Entanglement of purification: from spin chains to holography

Purification is a powerful technique in quantum physics whereby a mixed quantum state is extended to a pure state on a larger system. This process is not unique, and in systems composed of many degrees of freedom, one natural purification is the one with minimal entanglement. Here we study the entropy of the minimally entangled purification, called the entanglement of purification, in three model systems: an Ising spin chain, conformal field theories holographically dual to Einstein gravity, and random stabilizer tensor networks. We conjecture values for the entanglement of purification in all these models, and we support our conjectures with a variety of numerical and analytical results. We find that such minimally entangled purifications have a number of applications, from enhancing entanglement-based tensor network methods for describing mixed states to elucidating novel aspects of the emergence of geometry from entanglement in the AdS/CFT correspondence.


I. INTRODUCTION
In quantum physics, it is always possible to interpret the entropy of a physical system as arising from entanglement with an auxiliary system. Given a physical system in a mixed quantum state, one can introduce a fictitious auxiliary system such that the combined system is in a pure state. The state of the combined system is called a purification of the original mixed state and the entanglement between the purifier and the original system, as encoded in the entanglement entropy, recovers the von Neumann entropy of the original state.
For example, when studying the thermal physics of quantum systems, it is often useful to work with a state called the thermofield double which purifies the thermal Gibbs state. In the context of numerical simulations of strongly interacting quantum spin chains using tensor network methods, the thermofield double construction is useful because it maps thermal entropy to entanglement entropy and opens up new algorithmic tools [1]. In the context of the AdS/CFT correspondence, the thermofield double construction is also useful and takes on an interesting physical meaning. The AdS/CFT correspondence maps thermal states to black holes [2][3][4], and the thermofield double state is mapped is to a wormhole geometry that connects the original black hole with a second black hole (the auxiliary system) [5,6].
However, the thermofield double is only one purification of the thermal state; there all an infinite number of other purifications which are all related by the action of a unitary transformation on the auxiliary system.
In the context of tensor network methods, where entanglement is a precious resource, it would be especially useful to work with a purification which had the minimal possible entanglement. It is also interesting to ask if the minimal purification has any geometric meaning within the AdS/CFT correspondence. Indeed, we expect there to be a connection between these two directions given the relationship between tensor networks and the AdS/CFT correspondence [7].
Remarkably, the notion of a purification with the minimal possible entanglement has also been considered in quantum information science as one measure of the total correlations present in a bipartite mixed state.
This quantity is called the entanglement of purification [8], and here we study it in the context of three different classes of quantum many-body systems. We consider first a class of strongly coupled conformal field theories which are holographically dual to Einstein gravity. Next we study a spin chain whose low energy physics is described by an Ising conformal field theory. We also report a result in a random stabilizer state tensor network model [9,10]. Through a combination of analytical arguments and numerical calculations, we conjecture values for the entanglement of purification in all these systems, and, in the case of random stabilizer states, give a rigorous argument.
Our primary motivations are two fold. First, from the perspective of tensor network methods, specifically matrix product states [11], we want to investigate the minimal entanglement amongst purifications of a given thermal state. As indicated above, the minimal entanglement purification could be a useful technical tool in numerical simulations. Indeed, in our calculations we find that the entanglement of the thermofield double state can be reduced by as much as a factor of two, leading to a reduced bond dimension equal to the square root of the thermofield double bond dimension, a substantial reduction given a computational cost scaling like the third power of the bond dimension. Second, from the perspective of holographic models, we want to understand other geometric aspects of the bulk geometry in terms of quantum information. The Ryu-Takayanagi (RT) formula [12] relating entanglement entropy to minimal surfaces is the best example of this correspondence, but it is particularly interesting to search for quantum information measures that go beyond the minimal curve paradigm and capture other aspects of the geometry.

A. Technical introduction
The entanglement of purification (EP) is defined as follows [8]: let ρ AB be a density matrix on a bipartite system H A ⊗ H B . Let |ψ ∈ H AA ⊗ H BB be a purification of ρ AB , e.g., Tr A B |ψ ψ| = ρ AB , as illustrated schematically in Figure 1. The EP of ρ is given by: Here we minimize over all ψ and over all ways of partitioning the purification into A B , and S AA is the von Neumann entropy of the reduced density matrix obtained by tracing out the BB part of |ψ ψ|. To gain some familiarity with this definition, let us consider a few simple cases. If ρ AB is pure, then no purification is needed and E p = S(A) = S(B). If ρ AB is uncorrelated, then there exists a purification of the form |ψ 1 AA ⊗ |ψ 2 BB in which case E p = 0. If ρ AB is classically correlated, then it can be shown that E p = − i p i log p i , the Shannon entropy of the {p i } distribution (see Appendix A for the proof).
More generally, E p obeys a few key properties. For readers unfamiliar with these properties, we have for completeness included proofs of these properties drawn from the literature [8,13] in Appendix A.
• The E p is bounded above by the entanglement entropy: • The E p is monotonic, i.e. it never increases upon discarding a subsystem: • The E p is bounded below by half the mutual information: • For a tripartite system, we have the bound: • For a tripartite pure state, the E p is polygamous: We now proceed to study the entanglement of purification in the aforementioned three classes of physical systems. In the holographic models we proceed by proposing a new dictionary entry relating entanglement of purification to minimal cross-section of the "entanglement wedge" [14][15][16] bounded by the physical boundary and the RT surface [12]. More precisely, we argue that amongst the subset of purifications which have a geometrical gravity dual, the entanglement wedge cross section is the entanglement of purification. We do not show that it suffices to restrict to geometric purifications, but we give some plausibility arguments and show that our proposal obeys all the above properties of E p . Throughout we denote our holographic proposal for E p by E ph .
In the spin chain model we proceed numerically to approximately find the minimal entanglement purification. We start from the thermofield double state and succeed in removing entanglement, but we do not rigorously show that we have found the optimal purification. However, we do find that the numerical results are in remarkable accord with the holographic proposal, perhaps more even than one might expect given that one conformal field theory has central charge less than one (spin chain) while the other has very large central charge and a sparse low lying operator spectrum (AdS/CFT). Throughout we denote the output of our spin chain numerics byẼ p .
Finally, we also study a tensor network model composed of random stabilizer states. In this tensor network class, all entanglement consists of either Bell pairs or "cat states"/GHZ states. Using recent results on the GHZ content of random stabilizer tensor network states [10], we show that in this case the entanglement of purification is approximately 1 2 I(A : B) on average, i.e. near the lower bound. This is so despite the fact that entanglement entropy in such states is computed using a discrete version of the RT formula.
Note: After our holographic results were obtained and while preparing the manuscript, a very similar holographic proposal for the entanglement of purification appeared [17].
In this section we introduce and motivate our holographic prescription for entanglement of purification, denoted E ph . We discuss the core ideas justifying our proposal and give some sample calculations in the ground state and in thermal equilibrium at non-zero temperature. Later, in Sec. V, we discuss generalizations of our proposal to time-dependent situations and show that E ph obeys all the properties listed in the technical introduction. As we discuss in detail below, our proposal for the holographic dual of E p is strongly motivated by tensor network models of the AdS/CFT correspondence.
A. Proposal: time-independent geometry Suppose we have a geometry M dual to some pure state |ψ ABC . We want a holographic prescription for computing the entanglement of purification of the state on AB, which we will refer to as E ph (A : B). Our proposal is as follows. Let Σ be the RT surface associated with the combined region AB. The spatial region bounded by A, B, and Σ is called the entanglement wedge [18]. We consider the entanglement wedge as a new holographic geometry with boundary A ∪ B ∪ Σ, i.e. by discarding all the geometry from Σ to the old boundary C. The prescription is to find the minimum area surface X which can end on Σ that separates A from B. The E ph is then given by: where G N is Newton's constant. We illustrate this for two disjoint boundary intervals in global AdS 3 in Figure 2. As can be seen from this figure, the E ph in this case is only nonzero when the entanglement wedge is connected. In essence, E ph is the minimal cross-section of the entanglement wedge. are labelled on Figure 2). This second formulation makes the physical intuition more clear. The idea is to simplify the geometry M as much as possible by removing the geometry outside the entanglement wedge of AB. This is accomplished using some operation on C. The effect is to replace C with Σ. We then break up Σ into two pieces such that the combined entropy, as computed by the RT formula, is as small as possible.
The above intuition suggests that, if we restrict to holographic purifications, then then entanglement of purification is given by our minimal surface prescription. The more non-trivial claim is that it suffices to restrict to such holographic purifications.
We note that the idea of thinking about Σ as part of the new boundary is especially natural from the viewpoint of tensor networks and their connection to the AdS/CFT correspondence. For example, we show an analog of Σ in a MERA network in Figure 2). Similar pictures can be drawn for networks of perfect tensors or random tensors [9,19]. In the MERA example we can remove tensors from the shaded region by a unitary transformation acting on the complement of AB thereby simplifying the geometry of the tensor network. For example, the number of boundary legs in the purification of AB has gone from six to four by removing tensors below the lower red cut in Figure 2).
B. Sample calculations of E ph : pure AdS3 In this subsection, we provide explicit formulae for the E ph in empty AdS 3 . In the first case, no curve in the bulk separates A from B and we say that the E ph is zero. One could argue for this value of E ph by invoking the mutual information. In this regime, S(AB) = S(A) + S(B) and I(A : B) = 0. This implies that, to leading order in N (in the large-N limit), the reduced density matrix is a product state ρ AB = ρ A ⊗ ρ B . It can be seen that the E p of a product state is always zero. Apparently, according to our picture, the subleading 1/N corrections do not affect the E p .

a. Non-adjacent intervals in
Finding E ph in the second case involves finding the shortest distance between 2 geodesics in the hyperbolic plane. This is a nontrivial exercise in hyperbolic geometry, and we relegate the details to Appendix B and simply quote the result here. If we parametrize the two subsystems by A = (φ 1 − α 1 , φ 1 + α 1 ) and , then the E ph between the two geodesics is given by: The formula above applies of course whenever the entanglement wedge is connected. Note that the formula only depends on φ 1 , φ 2 through their difference, reflecting the rotational symmetry. Alternatively, if we parametrize the boundary intervals by their endpoints as A = (θ 1 , θ 2 ) and B = (θ 3 , θ 4 ) the formula becomes: Also, for the special case α 1 = α 2 ≡ α, φ 1 = π 2 , φ 2 = 3π 2 (i.e. two geodesic of the same size diametrically opposite each other) the above reduces to: This is the situation depicted on the left panel of Figure 2.
To get a sense of the formula (11), we can vary one endpoint of one of the two geodesics (with the other 3 endpoints kept fixed) and plot the E ph as a function of the varying endpoint. This is what we show in Figure 3 below. Note that the E ph is only nonzero in a certain range of the parameters. b. Adjacent intervals in AdS. Next we compute the E ph for two adjacent intervals, which is a special case of the non-adjacent case above, but we need to regulate the divergence. Consider 2 adjacent intervals A, B on the boundary, with half-widths α 1 and α 2 respectively. The E ph in this case is the shortest distance from the common endpoint of A and B to the RT surface of AB, and it can be found using the same techniques as in the previous case of non-adjacent intervals. Note also that the E ph in this case is divergent whereas it is finite in the previous case. We relegate the details to Appendix B again and only give the final result here: where is a near-boundary cutoff (the geodesic is regulated at Beltrami-Klein radial coordinate L AdS (1 − )), and . . . stand for terms which vanish as → 0.
In particular, in the symmetrical case where the two adjacent intervals have the same half-width α 1 = α 2 ≡ α, the above simplifies to [20]: We plot in Figure (4) the E ph as a function of α 2 for fixed values of α 1 . One can notice from the plot that the E ph is neither a convex nor a concave function of the boundary intervals' sizes. This is more or less expected, since the E p is known to be neither concave nor convex with respect to mixture of states [8]. π/4 (green) and π/3 (black). We set the cutoff to 0.1 and 4G N = 1.
Note that the E ph for adjacent intervals is essentially the mutual information (for the same choice of cutoff in the bulk, the two quantities differ by only (L AdS /4G N ) log 2, see section V B for more details).
Interestingly, the functional form of (15) is also the same as that of the logarithmic negativity for 2 adjacent intervals in a CFT [21] (see also [22]).
Next, we present some sample calculations for the BTZ black hole. We focus on the 1-sided black hole in this subsection, with metric [23]: and will consider the 2-sided black hole in the next subsection. The Hawking temperature is given by β/L AdS = 2πL AdS /r + . We distinguish between 2 cases: (1) when the entanglement wedge is topologically trivial (i.e. connected and simply connected), and (2) when the entanglement wedge is not simply connected due to the inclusion of the horizon.
a. Case (1). In the first case, we can use the fact that BTZ is a quotient of global AdS. Thus it is straightforward to map formulae (11) and (15) from AdS to derive the analogous formula for E ph in BTZ.
We do not even need the full coordinate transformation from global AdS to BTZ, but only the transformation of the boundary coordinates. It is known that the coordinate transformation from AdS to BTZ reduces to a conformal transformation on the boundary: Here (τ, θ) are the global AdS time and angle coordinates, and (t, φ) are the BTZ time and angle coordinates.
In particular, on the slice τ = 0 (or equivalently t = 0) we have: In particular, this implies: Next, we substitute the above into formula (13) for the E ph of two non-adjacent intervals in BTZ (such that the entanglement wedge is connected and simply connected): The case of two adjacent intervals in BTZ can be similarly handled.
b. Case (2). Next, we discuss the more complicated case where the entanglement wedge has a hole due to the horizon. In this case, the surface computing the E ph becomes disconnected.
Let us consider a few simple special cases, starting with the case where A and B are of the equal size, each slightly smaller than half the boundary circle (on one side of the BTZ black hole), as depicted in the left panel of Figure 5. Then the RT surface for AB has 3 connected components, one of which is the horizon.
The EP geodesic extends in the radial direction as depicted in Figure 5. When the Araki-Lieb inequality is saturated, the E ph coincides with S(B).
The E ph is: where r * is radial coordinate of the deepest point of the RT components that go to the boundary. It is related to the half-width α of the boundary intervals A or B by: In terms of α, the E ph can be written as: In particular, when α = π 2 the E ph is divergent. The regularized E ph in this case is: Next, consider the case where the union of A and B is the whole boundary circle, say A has half-width α and B has half-width π−α. Moreover, suppose α is either sufficiently large or sufficiently small enough that we are in the "entanglement plateau regime" [24]. This means the Araki-Lieb inequality is saturated, which in turn implies that the E p coincides with the entanglement entropy of the smaller subsystem, and the E ph is computed by the RT surface for the smaller region. This is depicted in the right panel of Figure  Now let us vary α from 0 to π/2. Initially E ph = S(A). Explicitly: At the critical angle α crit,EP given by: the RT surface exchanges dominance with a new saddle: the two radial geodesics crossing the horizon as depicted on the left panel of (5) and its E ph is given by (25). As α keeps increasing, the E ph levels off for a while since the surface remains two radial geodesics despite the change in α. At the second critical angle: the E ph surface snaps back to being the RT surface again. We plot the E ph versus α for 3 different choices of the horizon (or temperature) on the left panel of Figure (6), and we plot both the E ph and half the mutual information for a choice of r h on the right. Let us now elaborate on Figure 6. The fact that the E ph levels off for α close enough to π/2 can be accounted for by the fact that correlations in a thermal state are short-range (they are cut off at the thermal scale). Note that the mutual information, like the E p , is also a measure of the total correlation in the quantum state, and therefore should be expected to saturate for larger values of α. This is indeed the case as can be seen from Figure (6). Interestingly, the mutual information saturates at an angle somewhat smaller than the angle of E ph saturation. That this happens is a consistency check for our proposal: it implies that whenever the Araki-Lieb inequality is saturated, then E ph is indeed given by the entanglement entropy of the smaller region.
On the right panel of Figure (6), we have picked a particular value for the horizon. It is interesting to compare the two critical angles α 2 crit,EP and α crit,EE as a function of the horizon. If α 2 crit,EP > α crit,EE for some horizon size, then the argument above regarding the Araki-Lieb inequality would be in trouble! Recall that α crit,EE is given by: We plot in Figure (7) the two critical angles as a function of r + /L AdS . As can be seen from the plot, we always have α crit,EP < α crit,EE and we do have a consistent picture (i.e. E ph = S(B) whenever Araki-Lieb is saturated). Calculating E p exactly requires a global minimization over the space of purifications -a problem that is numerically difficult even for small wavefunctions. The existence of a geometric interpretation of E p , however, suggests that locality can be exploited during the minimization process. For numerical purposes, the locality of a many body state can be captured using a tensor network ansatz. Here we explain how E p can be approximately computed in 1D using such methods. In fact, as discussed by Hauschild, et al. [25], the solution suggests a potentially dramatic speedup of finite-temperature DMRG calculations which should prove useful in its own right.
In 1D, zero-temperature tensor network algorithms such as DMRG rely on the representation of a pure state as a matrix product state (MPS). [11,26] MPSs are a class of variational ansatz defined by the property that the entanglement entropy for a bipartition of the state into left and right regions is bounded from above by S L:R ≤ log(χ). Here χ is the "bond-dimension" of the MPS -more entanglement can be captured by using larger χ, but the computational cost generally scales as χ 3 .
When numerically simulating a mixed stateρ, one can either representρ as a matrix product operator (MPO), [27] or instead purifyρ and represent the purification as a MPS. [28] Purifications have several advantages over density operators; for instance the density matrix will remain positive definite by construction, regardless of numerical errors. However, as discussed there is a large space of possible purifications, and the choice may drastically effect the numerical difficulty. [29] For equilibrium calculations, it is standard to use the "thermofield double" (TFD) purification, where |n ,|ñ are the nth eigenstate of H with energy E n , on the physical and ancilla degrees of freedom respectively, β is the inverse temperature, and Z(β) = n e −βEn is the partition function. In this case the Hilbert space of the ancilla is identical to the physical one, so locality can be preserved by doubling each degree of freedom in the 1D chain. The MPS ansatz for the TFD state thus looks like a "caterpillar" (Figure 8), just like the MPO representation ofρ would, but the prescription for calculating observables differs.
The MPS representation of the TFD state is straightforward to obtain, for instance using the timeevolving block decimation (TEBD) algorithm. [1,27,28,30] At infinite temperature, β = 0, the TFD state can be constructed by preparing each physical degrees of freedom into a maximally entangled state with its corresponding ancilla, e.g., for a spin-1/2 chain we have where | ·· j denote the states of the physical and ancilla degrees of freedom on site j. This has zero entanglement across any cut and can therefore be represented by an MPS with bond dimension χ = 1.
To prepare a state at finite β using TEBD, [28,30] we apply e −βH/2 to the physical degrees of freedom by Trotterizing the imaginary time evolution into small local gates. During the application of the gates to the MPS, the entanglement of the TFD state grows, and hence the bond dimension χ.
Starting from the TFD purification, we may obtain other purifications by acting with a unitary U anc on the ancilla. Since the difficulty of MPS calculations increases with χ ∼ e S , we can try and use this freedom to reduce the entanglement of the purification. [29] Clearly the TFD is not itself optimal; as β → 0, the TFD puts both the physical and ancilla degrees of freedom into the ground state, |TFD, ∞ = |0 |0 , with entanglement twice that of the ground state. Very crudely speaking, this requires a bond dimension which is the square of the ground state's χ TFD ∼ χ 2 gs . The optimal purification would instead put the ancilla into a product state, e.g. |0 |↑,↑ · · · , which requires only χ gs , suggesting something approaching a quadratic speedup of finite temperature calculations might be possible. Minimizing the entanglement of the purification, and hence hopefully the χ of the MPS, is precisely the problem of calculating the entanglement of purification.
Of course, all of this relies on the ability to correctly find the optimizing unitary U anc . Given the TFD MPS, how do we best find the optimal unitary that minimizes entanglement entropy across a cut? Moreover, minimizing entanglement across a single cut is not very useful, since a priori this may increase the entanglement across other cuts, so we really want to minimize the sum of the entanglement entropy at each cut. This is, of course, a very difficult problem that we do not have an exact solution to.
Nevertheless, we can attempt to find an approximate solution by appealing to locality and restricting the structure of U anc to a unitary circuit formed from the successive application of local (here two-site) gates. We accomplish this practically as follows. [25] Starting from the β = 0 TFD state, we apply a small time step of imaginary time evolution to the physical degree of freedom, e −∆βH/2 |β = 0 , compressing the result as an MPS. We then act with a disentangling unitary U anc (0) which acts only on the ancilla.
The disentangler takes the form of a depth-two unitary circuit acting first on even, then on odd bonds,  bond, so we may locally (gate-by-gate) solve the minimization problem first calculating the even-bond unitaries, and then calculating the odd-bond unitaries holding the former fixed. Numerical algorithms for minimizing entanglement over a local gate have been discussed elsewhere.
[31] Other disentangling criteria are also possible -in this work we actually minimize the 2nd Renyi entropy for numerical efficiency (see Appendix C). This defines the optimal U [j,j+1] anc to apply, andẼ p is defined from the minimum. The purification at the next step is then defined by |∆β = U anc (0)e −∆βH/2 |β = 0 . We then continue the similarly, alternating application of e −∆βH/2 on the physical degrees of freedom with a layer of unitary disentangling U anc (β) on the ancilla. This builds up a state of the form shown in Figure 8, where A priori, the resulting purification need not be the optimal one, first because U anc was restricted to the form of a unitary circuit, and second because we determined the value of the initial layers using the low-β purification, independent of the subsequent layers. Indeed,Ẽ p is rather noisy at intermediate temperature, presumably an artifact of our algorithm. Nevertheless, the numerical experiments reveal that the entan-glementẼ p of the purification we obtain is remarkably consistent with the expected properties of the true entanglement of purification E p , as we now explore.
with J = h = 1/2, where σ x ,σ z are Pauli matrices. While this model is equivalent to a free fermion problem, we have verified that the results are insensitive to an integrability-breaking perturbation which is tuned to stay at the critical point. We obtain the entanglement entropy as a function of subsystem size L A , inverse temperature β, and total system size L, using the method just discussed, which we will refer to as the disentangled entanglement entropyẼ p (L A , β, L). If our disentangling unitary were optimal, thenẼ p would coincide with the entanglement of purification E p . as predicted from the holographic prescription.
In Figure 9, we show raw data forẼ p across the central cut (L A = L/2) as a function of β for a few system sizes L. For reference, we also show the entanglement of the TFD state as an upper bound (obtained by TEBD without disentangling) and half the mutual information as a lower bound (obtained via a thermal correlation matrix method [32]). We believe the noise is due to a landscape of local minima in the entanglement minimization step (see Appendix C).Ẽ p increases up to a maximum, before decreasing again and saturating the lower bound at high β. Note that the saturation of the lower bound at β → ∞ indicates that U anc has successfully transformed the ground state of the ancilla |0 to an unentangled state, realizing the desired reduction χ TFD = χ 2 gs → χ gs of the MPS. Next, we examine the dependence ofẼ p on the subsystem size L A , shown in Figure 9b). Also shown is the thermodynamic von Neumann entropy S A , SĀ for the subsystem A and its complement. There are three clear regimes in the behavior ofẼ p : for small L A ,Ẽ p coincides with S A , until it hits a plateau and saturates over a range of L A . Finally, as L A becomes the majority of the system,Ẽ p again coincides with the entropy of the smaller complement SĀ.
Remarkably, we find thatẼ p satisfies the scaling form where c = 1 2 is the central charge, and f is a universal function. More conveniently, as we will show in Section III A, this can be expressed as eẼ p −Sgs =f (L A /L, β/L) becoming a universal function of L A /L and β/L, where S gs is the ground state entropy (f is related to f by a constant factor). This is shown for L A /L = 1/2, 1/4 in Figure 10.
The qualitative agreement between the holographic and numerical results for the entanglement of purification is encouraging for both sides. It is evidence that the holographic prescription E ph does indeed correspond to the entanglement of purification. At the same time, another message is that although cal-culating E p numerically is difficult, it is possible to calculate it approximately with a practical algorithm.
This result is also encouraging for numerical calculations of this type in general, where bond dimension is the limiting factor. In our current algorithm, the computational gain from decreasing bond dimension is overshadowed by the cost of performing the disentangling at every time step, since our goal was to get as small an entanglement as possible. In principle, the algorithm can be modified to include the disentangling step more sporadically (every few time steps), or only when necessary (if bond dimension goes above a certain value).

A. Comparison with Holographic BCFT
Here we compare the numerical results, which were obtained from a spin chain with open boundary conditions, to the holographic proposal in the case of open boundary conditions. Since the conformal field theory has open boundary conditions, the appropriate tool is now "boundary conformal field theory" (BCFT), not to be confused with the conformal field theory at the boundary of AdS. The holographic calculations are based on an unproven but plausible proposal [33] for the gravity dual of BCFT (the proposal passes many checks). Throughout this section we consider two complementary regions, call them A and B, in the thermal state of a holographic CFT on an interval. We assume for simplicity that the size of region A is always less than or equal to the size of region B and that A and B together give the whole CFT.
The basic proposal for the gravity dual of BCFT is to solve Einstein's equations in the presence of an "end of the world brane" which terminates the bulk spacetime and which ends on the boundary of the boundary, i.e. the boundary of the CFT spacetime. In the simplest case, this brane is described just by a tension T .
One then solves the bulk Einstein equations plus the equation of motion of the brane to find a bulk spacetime with an asymptotic boundary and a bulk termination at the brane. The rules for calculating entanglement entropy are the same, but with the extra proviso that the end of the world brane never contributes.
Practically speaking, for the simple case of three dimensional Einstein gravity which we consider here, the geometry is either described by a part of empty AdS or a part of the BTZ black hole. At low or zero temperature, the dominant saddle point is the AdS geometry. The metric of AdS may be taken to be where h(z) = 1 − z 2 /z 2 0 and x is periodic with period 2πz 0 . The terminating brane is denoted Q and is described by the curve [33]: The turning point of this curve is at z = z 0 1 − L 2 AdS T 2 and its mirror continues after the turning point. The total length of the boundary interval is thus As the temperature is increased, the system experiences a first order Hawking-Page transition from an AdS geometry to a BTZ black hole geometry. The black hole geometry may be written as where f (z) = 1 − z 2 /z 2 H and the temperature is β = T −1 = 2πz H . The terminating brane is now The length of the boundary at z = 0 is still written as πz 0 , and for positive tension T the horizon z = z H includes more of the x coordinate. By analyzing the free energy of the AdS and BTZ saddle points, one can show that the Hawking-Page transition occurs when For example, if the string tension goes to zero, then the phase transition occurs when z 0 = z H . By contrast, as the string tension gets large, the phase transition occurs at larger and larger β.
Now to study the entanglement of purification of as a function of the relative size of A and B we must consider two variables. Fixing the total size, we must first determine, as a function of temperature, whether we are in the AdS or BTZ phase. Then, given the geometry, we must perform the minimization over curves according to the rules discussed above to find the holographic entanglement of purification. This procedure is somewhat involved, so we will not consider the general case here (we anyway do not expect an extremely detailed correspondence between the spin chain and holographic model -for example, the spin chain has no phase transition while the holographic model does). We will consider a few limits and special cases.
First, consider the limit of high temperature (or large interval size) and the case where A is just less than half the total system size, |A| = πz 0 /2. In this limit the boundary effects are mostly irrelevant, at least at finite temperature, and the calculations are simplified. The dominant geometry is the BTZ black hole and the minimal cross-section of the AB entanglement wedge is simply given by a curve which drops vertically from z = (the regulated asymptotic boundary) to z = z H . The length of this curve in Planck units is the holographic entanglement of purification; we find To remove the dependence on the cutoff, it is natural to compare to the ground state entropy of A. On general CFT grounds, the ground state entropy is given by where log g is the boundary entropy and L = πz 0 is the total length. In holographic BCFT, the boundary entropy is related to the string tension via When L A = L/2, the ground state entropy is S gs = c 6 log 2L π + log g. Hence the UV finite scaling form reads Another interesting comparison is to the entanglement between AA and BB (where A and B are the mirrors of A and B in the purifier) in the thermofield double state. This entanglement is actually just twice E p in this limit. Since the required bond dimension is χ ∼ e Ep , the minimal purification is predicted to require approximately the square root of the bond dimension needed for the thermofield double state. Note that in this limit, the holographic entanglement of purification is also approximately the mutual information, so if the holographic prescription is correct, then the lower bound on E p is close to being reached.
It is also possible to study E p as a function of the size of A. If the system is in the thermal AdS phase, then E p = S(A) provided A is less than half the total system. In the holographic model, what is in essence happening is that the dual gauge theory is confined and the system is essentially in its ground state except for a few thermal modes. Hence the large N part of the entanglement is like that of a pure state. If the system is in the BTZ phase, then E p = S(A) again for sufficiently small A, but beyond a critical size of A, E p saturates to the value as discussed above. These two features, tracking the entropy of A for small A and rapidly saturating for large A, are strikingly similar to the spin chain data, at least for sufficiently high temperature.
We conclude this discussion by working out the simplest example in slightly more detail. We consider the case of vanishing string tension, T → 0. Note that in this limit the boundary entropy goes to zero, Similarly, the Hawking-Page transition occurs for z 0 = z H . Geometrically, the key simplifying feature is that the Q boundary is now essentially vertical, i.e. independent of z. We already argued on general grounds that at low temperatures the holographic entanglement of purification is simply E p = S(A). Therefor let us consider the high temperature case.
In the high temperature phase, the entanglement entropy of A for any region A less than half the system size can be obtained by using a doubling trick. The entropy of a segment terminating at the boundary is simply one half the entropy of a segment of twice the size without the boundary. This is correct in the limit where Q is vertical. Thus if A is an interval of length L A then The entanglement of purification is given by the minimum length among two candidate curves, the minimal curve for A and the vertical segment running from z = to z = z H . For large L A , the vertical segment dominates. For small L A , the minimal curve for A dominates. By equating the entropy of A with the length of the vertical segment, we see that the two curves exchange dominance when Thus we have If L A is half the total system size, L A = πz 0 /2, then the switch occurs at However, the Hawking-Page transition occurs at z 0 /z H = 1, so the geometry switches to AdS before the change of minimal curve can occur in the BTZ geometry. Hence the scaling function e Ep−Sgs has the following form in the tensionless limit, By accident, in this limit the scaling function is actually continuous across the Hawking-Page transition.
We also consider the case L A = L/4 and zero brane tension. In this case the Hawking-Page transition still occurs at β/L = 2. But at high temperature (β/L < 2), we have a competition between the surface that drops vertically into the bulk and the one that terminates on Q, and they exchange dominance around β L ≈ 1.782.... The scaling form is found to be:

IV. RANDOM STABILIZER TENSOR NETWORKS
Having motivated the holographic prescription in part using tensor networks, in this section we discuss one concrete tensor network computation of E p . Unlike the previous two models, here our results are rigorously correct. Based on the relationship between tensor networks and the AdS/CFT correspondence, there has been considerable interest in designing tensor networks which obey the network version of the RT formula.
Random stabilizer tensor networks are one class that obeys the RT formula. Here we show, using the results of Ref. [10], that the entanglement of purification can be easily calculated in random stabilizer tensor networks and that it reduces to approximately 1 2 I (A : B). Consider a connected graph (V, E) and choose a subset V ∂ of the vertices called "boundary vertices".
These vertices are the analog of the CFT degrees of freedom which live on the boundary in the AdS/CFT correspondence. The remaining vertices are called "bulk vertices" and they are the analog of the gravity degrees of freedom in the AdS/CFT correspondence. We associate a tensor |V x to each vertex x ∈ V b and a maximally entangled state |e to each edge e ∈ E. The bond dimension is taken to be χ for all bonds so that |e = 1 √ χ χ−1 i=0 |ii and |V x is a tensor on a χ deg(x) dimensional space where deg(x) is the degree of vertex x. The final pure quantum state on V ∂ is The above construction is quite general. A stabilizer state can be constructed by first taking the bond dimension to be χ = p N for prime p. Then the maximally entangled states are stabilizer states. If the vertex tensors are also taken to be stabilizer states, then the resulting pure state on V ∂ is also a stabilizer state.
A random stabilizer state is obtained by drawing the tensors |V x uniformly at random from the set of all stabilizer states of the relevant dimension.
One of the main results of Ref. [9] is that such random stabilizer states obey the network RT formula.
Given a subset A of V ∂ , the entropy of A in state |ψ ∂ is given by the minimal number of bonds in the network which must be cut to isolate A, For the remainder of this section, all entropies will be measured in units of log p, so the RT formula reads S(A) = N |minimal cut|. This result fully characterizes the bipartite entanglement in random stabilizer tensor networks.
Recently, progress has also been made on properties of multipartite entanglement in random stabilizer states. Consider a tripartite stabilizer state |ψ ABC . It is known that, up to local unitary transformations, the entanglement content of such a state is given by Bell pairs and GHZ states [34,35]. Denote the Bell pair by and the GHZ state by Note that these states do not depend on N , i.e. they represent elementary units of entanglement. In this notation, the statement is that for any tripartite pure state there exist local unitaries U A , U B , and U C and factors A i , B i , and C i of the A, B, and C Hilbert spaces such that up to unentangled states.
Given this form, it is easy to calculate the entropy of any region, say A: Similarly, the mutual information is Finally, using results outlined in the introduction plus the fact that the state of AB reduces to products of decoupled mixed states, Bell pairs, and purely classically correlated states (arising from GHZ), the entanglement of purification can be calculated: Now, in the limit of large N , the numbers a, b, and c scale with N while the number g is order one [10].
Hence it follows that In other words, in random stabilizer tensor networks, the entanglement of purification is approximately the lower bound of one half the mutual information. This is in contrast to the holographic proposal, where E p and 1 2 I could differ by a large amount. Indeed, we could have considered an analog of the holographic proposal for random stabilizer tensor networks, but this proposal would be wrong in general.
The random stabilizer tensor network result does highlight an important caveat in the holographic discussion. Since such networks obey the RT formula, any property derived from RT is also obeyed in such networks. Similarly, one can show that in holographic systems which obey the RT formula, the lower bound of 1 2 I(A : B) is also consistent with all properties of E p . Hence it is prudent to emphasize that it is possible the holographic answer is simply one half the mutual information; however, it must be similarly emphasized that the entanglement structure of holographic states is known to be more complex than that of stabilizer states, e.g. the spectrum of density matrices is not flat.
One final note is appropriate. There are other classes of tensor network states that obey the network version of the RT formula, e.g. some tensor networks made of perfect tensors and random tensor network states. Especially in the case of random tensor networks, it is natural to conjecture that the holographic prescription giving E p in terms of the entanglement wedge cross section generalizes to its network version.
It would be very interesting to prove or refute this conjecture in the class of random tensor networks.

V. HOLOGRAPHIC PROPOSAL: GENERAL FORMULATION AND PROPERTIES
In this section we return to our holographic proposal and discuss some general features of it. First, we generalize it to time-dependent situations. Then we discuss some interesting features of the proposal, especially the case when E ph undergoes a first order phase transition. Finally, we show that our proposal in the time-independent case obeys all the properties of E p listed in the technical introduction. The time dependent case is more complex, and depends in principle on the actual dynamics of the theory, so we leave it for future work.  [38] to study the time dependence of the entanglement entropy. First, we find the HRT surface, which we will denote Γ: Γ a pair of spacelike geodesic crossing the wormhole connecting A to B. The HRT surface is disconnected and consists of 2 connected component, as depicted in Figure 11. By symmetry, the E ph should be the geodesic distance between the two midpoints of the connected components of Γ. We schematically depict this in Figure 11. Using the fact that BTZ is a quotient of AdS 3 , one can work out an analytical formula for the E ph as a function of the boundary time T 0 (by boundary time, we mean the Schwarzschild or Killing time on the boundary). In Kruskal coordinates, the BTZ metric reads: with φ ∼ φ+2π. We need the geodesic distance between any two spacelike-separated points X 1 = (u 1 , v 1 , φ 1 ) and X 2 = (u 2 , v 2 , φ 2 ) in the BTZ spacetime [39]: In particular, for two points on the boundary X 1 = (t 1 , φ 1 ) and X 2 = (t 2 , φ 2 ) (in the Schwarzschild coordinates of equation (38) with the renaming of the coordinate x → φ), we have the distance formula: where the sign of ± is plus if the two points belong to the same boundary, and minus if they belong to different boundaries, and is a regulator defined by integrating the geodesic up to the near-boundary hyperbola uv = −1 + 2 /z H . We now consider the 4 points a,b,c, and d which are the endpoints of A and B (the black semicircles on the right panel of Figure 11). Their coordinates are: Note that S(A) and S(B) are independent of T 0 . This is because both these RT surfaces lie on a spatial slice of fixed Schwarzschild time (which goes through the bifurcation surface of the black hole), and the metric is static in this time coordinate. The mutual information is nonzero from time T 0 = 0 to: at which point there is a phase transition and the mutual information jumps to zero. During the time 0 ≤ T 0 ≤ T * , the mutual information is given by: As for the E ph , it is given by the geodesic distance between the midpoint of the component of Γ connecting a to c, and the midpoint of the component connecting b to d. These two midpoints are located at (u, v) coordinates given by: At T 0 = 0, the midpoint of the HRT surface is the bifurcation circle of the black hole (u = v = 0). As T 0 → ∞, the midpoint approaches the singularity (u = v = 1). Using the distance formula (64), we find for the E ph : In particular, at boundary time T 0 = 0 the E ph is equal to half the circumference of the bifurcation circle of the black hole (divided by 4G). We plot in Figure (12) the time evolution of the E ph and (half) the mutual information. Note that, as expected, the E ph is greater than or equal to half the mutual information. A peculiar feature of the E ph in this case, as can be seen from Figure 12, is that even as the mutual information approaches zero continuously at the phase transition, the E ph remains finite and then jumps discontinuously to zero (with the difference between E ph and half the mutual information approximately constant in time until the phase transition). This behavior is somewhat counterintuitive, as one would expect the mutual information and the entanglement of purification to behave similarly to each other. Nevertheless this is also what occurs for 2 non-adjoint boundary intervals in empty AdS: when the entanglement wedge transitions from being connected to being disconnected, the mutual information approaches zero continuously while the E ph jumps discontinuously to zero. It would be interesting to understand this phenomenon in more details. In particular, it would be nice to construct explicit quantum states which have close to zero mutual information but nonzero E p .

B. Holographic check of inequalities
In this section, we show that E ph satisfies the inequalities mentioned in the technical introduction. We will first go through each inequality and check its validity in time-independent backgrounds. Then we will generalize the arguments to the time-dependent case at the end. in AdS, this bound is trivially satisfied because the E ph is UV-divergent at one endpoint but each RT surface for S(A) and S(B) diverges at both endpoints. For 2 non-adjacent intervals, the bound is also trivially true since the entanglement entropy diverges but the E ph is finite.
The BTZ case is more subtle. Consider for example the symmetrical case where A and B are each half the boundary on one side (their half-widths are both π/2). The E ph has already been computed: and the entanglement entropies are: The question of whether E p (A : B) ≤ S(A) then depends on the sign of the quantity 2L AdS log sinh πr+ 2L AdS . This quantity could be of either sign, depending on the size of the horizon relative to L, but we can invoke a thermodynamic argument to eliminate the negative case. Recall that the BT Z black hole undergoes the Hawking-Page transition to thermal AdS when the horizon is smaller than the AdS lengthscale, and only large black holes (with r + > L) are thermodynamically stable. For large black holes, we have that 2L AdS log sinh πr+ 2L AdS > 0 and the upper bound by the entanglement entropy is satisfied. b. Monotonicity. The monotonicity property is quite intuitively clear. For 2 adjacent intervals in AdS 3 , recall formula (15) for the E ph . If we differentiate this formula with respect to α 1 , we have: Since both α 1 and α 2 are in the range (0, π/2), the quantity above is always positive. This means the E ph But, by definition of a geodesic, we also have (ab) < (ac) + (bc) and (ad) < (ac) + (cd). Plugging the two inequalities above into I(A : B) above, we find thus proving the bound. Similar proofs can be constructed for two non-adjacent intervals as well as the BTZ black hole in a straightforward way, as well as for other asymptotically AdS geometries.
Even though we have established the lower bound, it is still interesting to explicitly compute the difference between E ph and half the mutual information in a few simple cases. For two arbitrary adjacent intervals of half-widths α 1 and α 2 , the E ph and mutual information are: Comparing the two expressions above, we find that this latter is larger than half the mutual information by an amount L AdS log 2.
Next, consider 2 non-adjacent intervals. For the simple special case where A and B have the same size α and are diametrically opposite each other (with α sufficiently large so that the entanglement wedge is connected), the E ph and mutual information are: and one can check that the first one is larger than the second. Finally, consider the BTZ black hole, with A, B taken to be each half the boundary (on one side). In this case the E ph and the mutual information are given by: To see that E p (A : B) > 1 2 I(A : B), we have to argue: This is easy to show: 2L AdS log sinh πr + 2L AdS = 2L AdS log e πr+/2L AdS − e −πr+/2L AdS 2 ≤ 2L AdS log e πr+/2L AdS = πr + (91) where we used the fact that the log is a monotonic function. This verifies the bound (7).
d. Tripartite bound . Next, consider the tripartite bound (8). We note a relevant fact: in a holographic state, the mutual information in holographic states is known to be monogamous: as proved in [40]. This property combined with the lower bound (7) implies the tripartite bound (8). To see this, let us replace B by BC in the bound (7). We obtain: Using the monogamy relation (92) to replace I(A : BC) on the right-hand side then yields the bound (8).
Thus, it will be sufficient to check the bound (7) holographically.
e. Polygamy of tripartite pure state. Finally, we check the polygamy of the E ph for tripartite pure states. Like the lower bound by the mutual information, this property is a simple consequence of Riemannian geometry as illustrated in Figure 14. Even though we draw AdS in Figure 14, it is clear from the proof above that it applies to any asymptotically AdS geometry, and not only empty AdS.
f. Generalization to time-dependent situations . Finally, we generalize the arguments above for timedependent backgrounds, starting with the lower bound by half the mutual information. Note that the geometrical argument presented above in the time-independent case does not directly apply due to the fact that in general, the different extremal surfaces involved lie on different spatial slices. However, one can adapt the techniques of [41] to prove this lower bound, as follows.
Consider for instance a spatial slice of the boundary of global AdS, and let A and B be "large", non-adjacent boundary intervals (we require them to be large so that the E ph is nonzero). By corollary (h) of Theorem 17 in [41], we know that there exists a spatial slice Σ containing the HRT surfaces for A, B and AB, and on which all these HRT surfaces are minimal. Thus, one can draw a picture analogous to the left panel of This is, again, established in [41].

VI. CONCLUSION AND FUTURE WORK
We presented an analysis of the entanglement of purification in three different model many-body systems.
In the case of random stabilizer tensor networks we were able to actually compute the entanglement of purification. Our holographic calculations focused for simplicity on the case of a three dimensional bulk, but the proposal obviously extends to any dimension. One technical challenge is to show that the desired properties of E p are obeyed by our holographic proposal in the time dependent case. We found reasonably good agreement between the holographic results and a numerical study of the Ising spin chain.
We mention two promising directions for future work within holography: (1) exploring the connection between the E p and the differential entropy [42] as well as kinematic space, and (2) exploring the connection between E p and the bit threads [43]. It has been discovered that the lengths of arbitrary curve in the bulk can be interpreted by terms of quantum information by a quantity called the differential entropy. This latter quantity is associated to a continuum of boundary intervals defined by the family of geodesics in the bulk tangential to the curve of interest. Equivalently, the length of curves can also be computed by integrating over the volume of a region in an auxiliary geometry called kinematic space. Remarkably, volume elements in kinematic space turn out to compute the conditional mutual information of 3 adjacent boundary intervals. Of course, the differential entropy/kinematic space interpretation also applies to the geodesic segments computing the E ph . Therefore, there seems to be deep connection between holographic entanglement of purification and other quantum-information-theoretical quantities such as the conditional mutual information.
On the other hand, the Ryu-Takayanagi formula has been reinterpreted recently via the min-cut/max-flow theorem as some kind of information flow [43]. Within this framework, a beautiful picture emerges for the lower bound of the E ph by half the mutual information, as follows: one can construct a flow in the bulk that computes half the mutual information and which is supported only in the entanglement wedge. The E ph then acts as the bottleneck that restricts this flow, in pretty much the same way as the diameter of a pipe contrains the amount of water flowing across it. Further explorations of this bit thread picture may help prove nontrivial properties of the E ph that are not easily seen otherwise.
In the context of spin chains, we have shown that a substantial reduction in entanglement relative to the thermofield double state is possible. One promising direction is to construct new tensor network algorithms that take some advantage of this potential reduction in entanglement. Finding the right balance between the cost of keeping unneeded entanglement and the cost of finding and removing it is an interesting challenge.
Finally, in the context of tensor network model of holography, we computed the entanglement of purification for random stabilizer tensor networks. Despite the fact that these networks obey the discrete RT formula, the discrete analog of the holographic proposal for E p was actually not obeyed in general. This is presumably due to the rather simple structure of entanglement in these networks. It would be very interesting to study random tensor networks, for example, to see if the analog of E ph does actually compute E p in that case.
Acknowledgements: MPZ is particularly indebted to Frank Pollmann, with whom we first considered and implemented the purification disentangling algorithm. BGS acknowledges support from the Simons Foundation as part of the It From Qubit Collaboration. This work is partially supported by NSF grants PHY1407744 and PHY1708139. MPZ acknowledges the hospitality of the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611.

Appendix A: Proofs of properties of Ep
In this appendix, we review the proofs of the properties of the E p mentioned in the Introduction [8,13], starting with the upper bound (5) by the entanglement entropy.
Proof : Let ρ AB be a bipartite density matrix with eigenvalues λ i and eigenvectors |ψ i . The standard purification of ρ AB : yields the entanglement entropy S(A) when we trace out the BB , and S(B) when we trace out the AA .
Since we have to minimize over all purifications in the definition of the EP, the bound (5) follows.
Proof : Let ρ ABC be the density matrix on ABC. If ρ ABC is pure, then the EP coincides with the entan- Next, we prove the lower bound (7) by the mutual information.
Proof : Let |ψ ABA B be the optimal pure state for the evaluation of E p (A : B), i.e. S(AA , |ψ ) = E p (A : B).
USing the subadditivity of the conditional entropy for a 4-party quantum state: which is equivalent to (7).
Next, we prove the lower bound (8) for the tripartite systems.
Proof : Let |ψ ABCA D be the optimal pure state for evaluating the EP, i.e. Next, we show that the E p in a state saturating the Araki-Lieb inequality is the entanglement entropy of the smaller subsystem.
Proof : Saturation of Araki-Lieb means: Note that the EP is bounded above by the entanglement entropy and below by half the mutual information: But Araki-Lieb saturation also implies I(A : B) = 2S(B). The above becomes: Hence E p (A : B) = S(B).
Next, we show that the E p in a tripartite pure state is polygamous (inequality 9).
Finally, we show that for a classically correlated state of the form ρ AB = i p i |i i| A ⊗ |i i| B , the E p is the Shannon entropy of the corresponding probability distribution: E p = − i p i log p i .
Proof : We copy the classical information to a third system C and consider the state: This state is unitarily related to the state ρ AB . Indeed, if we call V a unitary operator that copies the classical information V |i B |0 C = |i B |i C for some reference state |0 B , we then have: Using the inequalities previously established in this appendix, we have: In this appendix, we use the Beltrami-Klein model of the hyperbolic plane [44,45] together with its well-known properties to compute the shortest distance between any two geodesics in the hyperbolic plane To this effect, we use the following fact (also known as the ultraparallel theorem in hyperbolic geometry): Fact. Given any two geodesics in the hyperbolic plane which do not share a common endpoint on the boundary (i.e. given two ultra-parallel curves), then there exists a unique geodesic which is perpendicular to both of them. Moreover, this common perpendicular is the shortest curve between the two given geodesics.
By the fact above, we should construct the unique common perpendicular to the two given geodesics in order to find the shortest distance between them. We will work with the Beltrami-Klein (BK) model of the hyperbolic plane to construct the common perpendicular. The BK metric can be obtained from the usual global coordinates in AdS by a redefinition of the radial coordinate: In the BK model, geodesics are straight lines. For example, in Figure 15 we draw the RT surface as well as the EP surface for the case where A and B are of the same size and diametrically opposite from each other, both in the Poincaré disk model and BK model. In the simple case of Figure 15, the unique common perpendicular is easily seen to be the line connecting the midpoints of the two red lines (by symmetry). For more general boundary intervals A and B, finding the common perpendicular is a bit more involved, but the following fact is helpful: Fact. Let L be a geodesic in the hyperbolic plane. Another geodesic L is perpendicular to L if and only if it goes throught the pole of L when extended beyond the edge of the disk (in the Beltrami-Klein model). Here the pole of L is the intersection between the two lines tangential to the edge of the disk at the two endpoints of L.
Using the fact above, we can then construct the common perpendicular to any two geodesics as in Figure   16 below. Let a, b, c, d be 4 boundary points, and we have two geodesics L 1 and L 2 connecting a to b and c to d respectively. These two geodesics are black lines in Figure 16. By the fact above, we know that the unique common perpendicular to L 1 and L 2 passes through the poles of both L 1 and L 2 . The pole of L 1 is the point p, which is the intersection of the two tangential lines to the disk at a and b (depicted in red, dashed in the Figure). Similarly, the pole of L 2 is the point q. The green line connecting p to q is then the unique commone perpendicular to L 1 and L 2 .
Let m and n be the intersection of the green line with L 1 and with L 2 respectively, and let r and s be the two intersections of the green line with the edge of the disk. The shortest distance between L 1 and L 2 is then the distance between m and n. Using the standard formula for distance in the Beltrami-Klein model: where | · | is the Euclidean distance between the two points. Note that the distance is a function of a the cross-ratio of the 4 points. Our task now is to relate the 4 points m, n, r and s in the formula above to the 4 points a, b, c, d. Let us denote by α 1 , α 2 the half-widths of (a, d) and (b, c) respectively, and by φ 1 , φ 2 the midpoints of (a, d) and (b, c). Note that the intervals we are referring to are not (ab) and (cd) but the other two. We want to write down a formula for d(φ 1 , α 1 , φ 2 , α 2 ). After some analytical geometry, we find the formula (11) for E ph of 2 non-adjacent intervals. Unlike the non-adjacent case, the EP is now divergent. We regularize it length by introducing a cutoff at radius L AdS (1 − ) (dashed circle in the figure above). Thus, we want to compute the length of the green line segment between the dashed circle and the RT surface. As in the non-adjacent case, we parametrize A and B as (φ 1 − α 1 , φ 1 + α 1 ) and (φ 2 − α 2 , φ 2 + α 2 ) respectively. The fact that they are adjacent implies: After some analytical geometry, we find the distance formula: d = 1 2 log   g 2 tan 2 (α 1 + α 2 ) (2 sin α 1 sin α 2 + cos (α 1 + α 2 ) g 2 (1 − ) 2 − sec 2 (α 1 + α 2 ) sin 2 (α 1 − α 2 ) (2 sin α 1 sin α 2 − cos (α 1 + α 2 ) g 2 (1 − ) 2 − sec 2 (α 1 + α 2 ) sin 2 (α 1 − α 2 )   (B4) If we now expand in around = 0, we find the result (15) given in section II.

Appendix C: Minimization of 2nd Renyi Entropy
In this appendix we describe the disentangling step of the numerical calculation described in Section III in more detail.
The disentangling step seeks to efficiently find a unitary transformation on the ancilla degrees of freedom of our system which minimizes the total entropy. While this unitary could be any global unitary transformation, to make the problem tractable we instead sweep across the system, minimizing the Second Renyi Entropy between two sites at a time. Disentangling algorithms are discussed in more detail in Ref. [31].
Once the center of normalization for the MPS is on site i or i + 1, the state can be represented by the object Θ [28], depicted in Figure 18. We calculate the Second Renyi Entropy S 2 = − log Trρ 2 in the usual way, treating Θ as our state.
To minimize this quantity for our pair of sites, we use a modified steepest descent algorithm. In particular, we apply a unitary disentangler to the ancilla legs of Θ, and express S 2 in terms of this unitary. We then calculate the gradient of Trρ 2 with respect to this unitary, evaluated at the identity. This gradient is depicted graphically in Figure 19.
The algorithm then chooses a unitary disentangler close to this gradient, which we obtain via a singular value decomposition. In particular, for the decomposition where X and Z are unitary matrices, the two-site disentangler chosen by the algorithm is U = XZ. This selects the unitary closest to XY Z, as defined by the matrix norm.
As argued in Section III, this approach does well to approximate the entanglement of purification, but the data contains considerable noise for intermediate values of β. One method to reduce the noise is to choose two-site disentanglers which are closer to the identity. For example, an alternate approach would be to instead choose the decomposition for a small value of k, with U = XZ as before. This choice of disentangler corresponds to the standard steepest descent algorithm (again with the restriction that only unitary disentanglers are allowed). The choice (C1) corresponds to the large k limit of (C2). Figure 20 shows the entropy after disentangling using various values of k.
Unforunately, small values of k lead to sub-optimal disentanglers, as the algorithm converges on local minima more readily when k is small. As Figure 20 suggests, the noise becomes significant once the algorithm is able to escape some local minima, even for suboptimal purifications. This suggests that the noise is in part due to movement between local minima. Escaping these local minima, however, appears essential to produce a good approximation of the entanglement of purification, as we argue our algorithm accomplishes.