The Python's Lunch: geometric obstructions to decoding Hawking radiation

According to Harlow and Hayden [arXiv:1301.4504], the task of distilling information out of Hawking radiation appears to be computationally hard despite the fact that the quantum state of the black hole and its radiation is relatively un-complex. We trace this computational difficulty to a geometric obstruction in the Einstein-Rosen bridge connecting the black hole and its radiation. Inspired by tensor network models, we conjecture a precise formula relating the computational hardness of distilling information to geometric properties of the wormhole: specifically, to the exponential of the difference in generalized entropies between the two non-minimal quantum extremal surfaces that constitute the obstruction. Due to its shape, we call this obstruction the "Python's Lunch", in analogy to the reptile's postprandial bulge.


Introduction
Computational complexity is relevant to the study of quantum gravity in (at least) two ways: in its traditional role as a measure of the difficulty of carrying out tasks [1]; and as a possible holographic dual for properties of the spacetime behind horizons [2,3,4].
Harlow and Hayden [1] studied the complexity of the task of distilling a single qubit of information from Hawking radiation. They argued that the complexity of distillation grows exponentially with the entropy S of the black hole.
Later, in the context of the AdS/CFT duality, one of us proposed a holographic identification between the computational complexity of an entangled pair of boundary states and the size of the Einstein-Rosen bridge in the dual two-sided black hole [2].
At first sight there seems to be some tension between these two roles of complexity. While the complexity of decoding Hawking radiation is exponential in S, the volume of the wormhole connecting the black hole to its radiation is only polynomial.
The source of the discrepancy is that we are using two different definitions of complexity. The decoding task [1] is only hard because we are restricted to act solely on the radiation outside the black-hole horizon. In Ref. [2] there is no such restriction. The distinction between restricted and unrestricted complexity will be a central theme of this paper. In particular, we will be interested in the distinction between the holographic dual of unrestricted complexity, which was the subject of [2,3,4], and the holographic dual of restricted complexity, which we will develop in this paper. The main point of this paper is not to prove the Harlow-Hayden conjecture (like almost everything else in complexity theory, this is too hard) but to explain how it may be related to the geometry of wormholes.
Consider one possible decoding 'strategy' for distilling information while acting solely on the Hawking radiation. The first step in this strategy is to gather the radiation and collapse it into a second black hole. This new black hole is entangled with the first black hole, and the entanglement can be interpreted, according to ER=EPR [5], as a wormhole connecting them. At the Page time the wormhole would have a volume of order $S^2$, far less than the exponential complexity claimed by Harlow and Hayden, but still too large to easily implement the decoding. The next step would be to apply unitary operations to the second black hole in order to shorten the wormhole and bring it to the thermofield-double state. In that state the two horizons have no separation between them, and the structure of the entanglement is especially simple. Once this is accomplished, the decoding should be easy. The only potentially hard step in this strategy, therefore, is shortening the wormhole.
Figure 1: A spatial slice through a 'Python's Lunch' geometry. On the far left, the wormhole opens up to one asymptotic region with infinite cross-sectional area; on the far right, the wormhole opens up to the other asymptotic region, also with infinite cross-sectional area. In AdS-Schwarzschild black holes the cross-sectional area reaches a minimum in the middle of the wormhole and increases on either side. By contrast, in the Python's Lunch geometry the cross-sectional area first shrinks, then grows, then shrinks, then grows again, giving rise to a bulge in the middle of the wormhole: the eponymous Lunch. $A_L$ and $A_R$ are the areas of the minimal surfaces on each side and $A_{\max}$ is the area of the luncheon bulge.
Since the only potentially hard step is shortening the wormhole, and since the Harlow-Hayden argument shows that decoding information from the Hawking radiation alone is indeed exponentially hard, we conclude that shortening the wormhole from one side must be exponentially hard. This suggests that there must be some kind of obstruction in the wormhole, an obstruction which prevents us from efficiently shortening the wormhole from one side. Moreover, this obstruction cannot simply be a large volume, since the volume is not large.
In this paper we will conjecture that the geometric obstruction is a bulge in the wormhole, which because of its shape we call the "Python's Lunch", as depicted in Fig. 1. We will estimate the complexity of bypassing the Python's Lunch and find that, consistent with the Harlow-Hayden claim, it is indeed exponential. In Eq. 4.6 we will conjecture that the restricted complexity is dual to the size of the Python's Lunch via
$$\mathcal{C}_R \sim \exp\left[\frac{1}{2}\,\frac{A_{\max}-A_R}{4G\hbar}\right],$$
where $A_{\max}$ is the maximum cross-section of the wormhole and $A_R$ is the size of the throat connecting the wormhole to the radiation. In Eq. 4.9 we will make a covariant generalization of this conjecture. This proposal for the geometric dual of the restricted complexity is complementary to existing conjectures about the geometric duals of unrestricted complexity [2,3,4].
Despite our focus on restricted complexity, in Sec. 7 we find that one-sided Python's Lunches can also teach us about unrestricted complexity. We suggest an improvement to the definition of unrestricted holographic complexity conjectured to be dual to volume and action in Refs. [2,3,4]. Specifically, we argue that these conjectures should have defined complexity so as to permit not only unitary gates but also non-unitary projections.

The Shortening of Wormholes
In much of what follows we will assume that black holes can be modeled as "quantum computers", by which we mean collections of N qubits evolving by means of k-local all-to-all Hamiltonians or discrete gates. (Modeling black holes as quantum computers has been extremely fruitful in the study of quantum information scrambling [6,7], the onset of random-matrix behavior [8] and the derivation of the RT formula [9].) The number of qubits is determined by the entropy of the black hole,
$$N \sim S. \qquad (2.1)$$
We will encounter both (unitary) operator complexity and relative state complexity. The complexity of a unitary operator U may be defined as the minimal number of 2-qubit gates g needed to prepare it; in other words, the smallest n for which
$$U = g_n\, g_{n-1} \cdots g_1. \qquad (2.2)$$
There are other definitions, but for our purposes this one will do. The complexity of U will be denoted by $\mathcal{C}(U)$. By construction, it satisfies
$$\mathcal{C}(U) = \mathcal{C}(U^\dagger). \qquad (2.3)$$
The relative complexity of two states $|\psi\rangle$ and $|\phi\rangle$ is defined as the complexity of the least complex unitary that connects them, $|\psi\rangle = U|\phi\rangle$. In other words, it is the minimum number of gates required to transform $|\phi\rangle$ into $|\psi\rangle$,
$$|\psi\rangle = U|\phi\rangle = g_n\, g_{n-1}\cdots g_1|\phi\rangle.$$
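To make Eqs. 2.2 and 2.3 concrete, here is a minimal sketch that computes gate complexity by brute-force breadth-first search. It assumes a toy gate set (NOT and CNOT gates acting on 3 classical bits, so every 'unitary' is a permutation of the 8 computational basis states) rather than general 2-qubit gates; the names `GATES`, `gate_complexities` and `inverse` are illustrative. Because every gate in this toy set is its own inverse, the search should reproduce the symmetry $\mathcal{C}(U)=\mathcal{C}(U^\dagger)$.
```python
from collections import deque
from itertools import permutations

# Toy model (assumption): 3 classical bits, gates are NOT and CNOT, so each
# circuit is a permutation of the 2**3 computational basis states.
N_BITS = 3
DIM = 2 ** N_BITS


def not_gate(t):
    """Permutation implementing NOT on bit t."""
    return tuple(x ^ (1 << t) for x in range(DIM))


def cnot_gate(c, t):
    """Permutation implementing CNOT with control bit c and target bit t."""
    return tuple(x ^ (1 << t) if (x >> c) & 1 else x for x in range(DIM))


GATES = [not_gate(t) for t in range(N_BITS)] + [
    cnot_gate(c, t) for c, t in permutations(range(N_BITS), 2)
]


def compose(g, p):
    """Circuit p followed by gate g (apply p first, then g)."""
    return tuple(g[p[x]] for x in range(DIM))


def inverse(p):
    """Inverse permutation, the analogue of U^dagger."""
    inv = [0] * DIM
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def gate_complexities():
    """Breadth-first search from the identity: minimal gate count of every reachable circuit."""
    identity = tuple(range(DIM))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for g in GATES:
            q = compose(g, p)
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


complexity = gate_complexities()
print(len(complexity), "reachable circuits; max complexity", max(complexity.values()))
```
Breadth-first search is exact but scales with the size of the reachable group, so it only works for a handful of bits; for realistic N the minimal gate count cannot be computed this way.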
Relative complexity is denoted by $\mathcal{C}(\psi,\phi)$. Due to Eq. 2.3, it is symmetric in its arguments, $\mathcal{C}(\psi,\phi) = \mathcal{C}(\phi,\psi)$. A unitary operator on N qubits can be expanded as
$$U = \sum_{I,J} U_{IJ}\,|I\rangle\langle J|,$$
where I, J label a complete basis of N-qubit states in the computational basis. The same matrix plays a second role in representing a maximally entangled state of 2N qubits,
$$|\Psi\rangle = 2^{-N/2}\sum_{I,J} U_{IJ}\,|I\rangle|J^*\rangle,$$
where $|J^*\rangle$ is the time reversal of $|J\rangle$. A special case is $U_{IJ} = \delta_{IJ}$, which describes the infinite-temperature thermofield-double state,
$$|TFD\rangle = 2^{-N/2}\sum_{I}|I\rangle|I^*\rangle. \qquad (2.7)$$
The infinite-temperature $|TFD\rangle$ state may also be written as a product of N Bell pairs,
$$|TFD\rangle = \bigotimes_{n=1}^{N}\frac{|00\rangle + |11\rangle}{\sqrt{2}}. \qquad (2.8)$$
The thermofield-double state of a two-sided black hole is a finite-temperature state of an infinite number of qubits, but it is often modeled as an infinite-temperature state of a finite number of qubits. We too will make that approximation, so by $|TFD\rangle$ we will mean the state in Eq. 2.8. The state $|TFD\rangle$ is the simplest case of a maximally entangled state. The two subsystems, called A and B, are under the control of Alice and Bob respectively. The natural evolution of the system is governed by an overall Hamiltonian which is the sum of two non-interacting terms,
$$H = H_A + H_B. \qquad (2.9)$$
For simplicity we will assume that the two Hamiltonians are identical and real (so that we don't have to worry about the details of time reversal). The natural time-evolution operator is a product,
$$U(t) = e^{-iH_A t}\,e^{-iH_B t} = U_A(t)\,U_B(t). \qquad (2.10)$$
We define the time-evolved state $|TFD(t)\rangle$ by
$$|TFD(t)\rangle \equiv U(t)|TFD\rangle. \qquad (2.11)$$
In the maximally entangled case, $|TFD(t)\rangle$ can be constructed by evolving on only one side for a total time 2t,
$$|TFD(t)\rangle = U_A(2t)|TFD\rangle = U_B(2t)|TFD\rangle.$$
As t evolves, the linearly growing complexity of the state is dual to the linearly growing volume of the wormhole. For sub-exponential times the complexity of $U(t)$ is the sum of the complexities of $U_A(t)$ and $U_B(t)$, which equals the complexity of $U_A(2t)$, or equivalently of $U_B(2t)$.
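A short numerical check of two statements above, assuming NumPy and a small random real symmetric matrix as a stand-in for $H_A = H_B$ (both are assumptions for illustration, not a model of a black hole): the TFD of Eq. 2.7 equals the product of Bell pairs in Eq. 2.8, and evolving both sides for time t matches evolving one side alone for 2t. The helper names `tfd`, `bell_product` and `evolve` are illustrative.
```python
import numpy as np


def tfd(n):
    """Infinite-temperature TFD on n+n qubits (Eq. 2.7): 2^{-n/2} sum_I |I>_A |I>_B."""
    d = 2 ** n
    return np.eye(d).reshape(d * d) / np.sqrt(d)


def bell_product(n):
    """Product of n Bell pairs (Eq. 2.8), reordered so that all of A's qubits come first."""
    bell = np.eye(2) / np.sqrt(2)              # (|00> + |11>)/sqrt(2), axes (a_k, b_k)
    psi = bell
    for _ in range(n - 1):
        psi = np.tensordot(psi, bell, axes=0)  # outer product: axes (a1, b1, a2, b2, ...)
    order = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return psi.transpose(order).reshape(-1)


def evolve(h, t):
    """exp(-i h t) for Hermitian h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T


rng = np.random.default_rng(0)
n, t = 3, 0.7
d = 2 ** n
m = rng.normal(size=(d, d))
h = (m + m.T) / 2                                           # identical, real H_A = H_B (assumption)
state = tfd(n)
both_sides = np.kron(evolve(h, t), evolve(h, t)) @ state    # U_A(t) U_B(t)|TFD>
alice_only = np.kron(evolve(h, 2 * t), np.eye(d)) @ state   # U_A(2t)|TFD>
bob_only = np.kron(np.eye(d), evolve(h, 2 * t)) @ state     # U_B(2t)|TFD>
print(np.allclose(state, bell_product(n)), np.allclose(both_sides, alice_only))
```
The identity relies on H being real and symmetric, so that $e^{-iHt}$ equals its own transpose; for a generic complex Hamiltonian one would need the time-reversed operator on the other side instead.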
Given the evolved state $|TFD(t)\rangle$, Alice can return it to the initial $|TFD\rangle$ by applying $U_A^\dagger(2t)$. We will think of doing this in a series of small steps of low complexity,
$$U_A^\dagger(2t) = U_A^\dagger(\delta t)\,U_A^\dagger(\delta t)\cdots U_A^\dagger(\delta t).$$
Pictorially, Alice is shortening the long wormhole by a series of small incremental steps, as illustrated in Fig. 2.
Figure 2: Successive spatial slices through the wormhole. Since the two sides are maximally entangled, Alice is able to shorten the wormhole by unitary operations $U_A \otimes 1$ that act only on her side.
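The step-by-step shortening can be sketched numerically, again assuming NumPy and a random real symmetric matrix as a stand-in for the one-sided Hamiltonian. Alice repeatedly applies $U_A^\dagger(\delta t)\otimes 1$, and we track the overlap with $|TFD\rangle$, which should return to 1 once the total backward evolution time reaches 2t. The step size `dt` and the other parameter values are hypothetical choices.
```python
import numpy as np


def evolve(h, t):
    """exp(-i h t) for Hermitian h."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T


rng = np.random.default_rng(0)
n, t, dt = 3, 0.7, 0.1
d = 2 ** n
m = rng.normal(size=(d, d))
h = (m + m.T) / 2                                       # real, identical H_A = H_B (assumption)
tfd = np.eye(d).reshape(d * d) / np.sqrt(d)

state = np.kron(evolve(h, t), evolve(h, t)) @ tfd       # |TFD(t)>
step = np.kron(evolve(h, -dt), np.eye(d))               # U_A^dagger(dt) on Alice's side only
n_steps = int(round(2 * t / dt))
fidelities = []
for _ in range(n_steps):
    state = step @ state
    fidelities.append(abs(np.vdot(tfd, state)) ** 2)    # overlap with the short wormhole
print(n_steps, [round(f, 3) for f in fidelities])
```
No gate here touches Bob's side, which is the sense in which the shortening is one-sided; it works only because the state is maximally entangled.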
Bob may also accomplish the shortening by acting on $|TFD(t)\rangle$ with $U_B^\dagger(2t)$, or Alice and Bob may act together with $U_A^\dagger(t)\,U_B^\dagger(t)$.
Here are some questions to consider:
• Why might one be interested in shortening a wormhole? There are a number of reasons. One that we have already mentioned is that it would be a step in decoding Hawking radiation.
A second would involve the use of the entanglement as a resource for quantum teleportation. Quantum teleportation requires the use of pre-existing entangled qubits.
Not only must these qubits be entangled, they must also be brought to have low complexity in order to successfully teleport. In the language of ER=EPR, if we want to make a wormhole traversable [10,11], we first need to make it as short as possible.
• Why might one want to shorten the wormhole by processes which do not couple the two sides? If the two sides are being used to communicate over a long distance, then coupling them quantum-mechanically may be infeasible. Thus there are practical reasons why one might be interested in the complexity of shortening a wormhole by acting on it from one side.
• Are there situations in which it is easy to shorten a wormhole by interactions which involve both sides, but in which it is extremely difficult to do so from one side?
• Most of all we are interested in whether the answer to the previous question correlates with geometric properties of the wormhole, and if so, what properties?
With regard to this last question, we will argue that there is a particular kind of geometric obstruction which prevents us from efficiently shortening a wormhole by one-sided operations, even though the wormhole has small volume and can be easily shortened by two-sided operations. The shape of the obstruction suggests the name "Python's Lunch".

Restricted and Unrestricted Complexity
The restricted complexity $\mathcal{C}_R$ of a maximally entangled state of 2N qubits is the number of gates needed to construct it from the TFD state under the restriction that all gates act only on one side. We will sometimes use $\mathcal{C}_{R,A}$ to indicate that the restriction is on a specific subsystem (A in this case). Without loss of generality we can assume that the gates all act on Alice's side, or we may distribute them symmetrically between the two sides. A useful picture is provided by the tensor network (TN) description. The state $|\Psi\rangle$ is represented as a TN as in Fig. 3. The restricted relative complexity of $|\Psi\rangle$ and $|TFD\rangle$ is also the complexity of the unitary operator U corresponding to $|\Psi\rangle$, as defined in (2.5).
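The correspondence between $|\Psi\rangle$ and U can be checked directly. This sketch assumes NumPy and a Haar-random U (generated by a phase-corrected QR decomposition, an illustrative choice): the coefficients of $|\Psi\rangle = (U\otimes 1)|TFD\rangle$ reproduce $U_{IJ}$ up to the factor $2^{-N/2}$, the state is maximally entangled, and Alice alone can undo it with $U^\dagger\otimes 1$. The helper `haar_unitary` is hypothetical.
```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Gaussian matrix (phase-corrected)."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


rng = np.random.default_rng(3)
n = 3
d = 2 ** n
tfd = np.eye(d).reshape(d * d) / np.sqrt(d)
u = haar_unitary(d, rng)
psi = np.kron(u, np.eye(d)) @ tfd                   # |Psi> = (U ⊗ 1)|TFD>

coeffs = psi.reshape(d, d)                          # coeffs[I, J] = <I|<J|Psi>
u_recovered = np.sqrt(d) * coeffs                   # U_IJ = 2^{N/2} <I J|Psi>
rho_a = coeffs @ coeffs.conj().T                    # reduced density matrix of Alice's side
undone = np.kron(u.conj().T, np.eye(d)) @ psi       # Alice applies U^dagger on her side only
print(np.allclose(u_recovered, u), np.allclose(rho_a, np.eye(d) / d), np.allclose(undone, tfd))
```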
By acting with a layer of gates on Alice's side, a layer can be removed from the TN. By repeating this operation enough times, as in Fig. 4, the state can be brought to the simple state $|TFD\rangle$. The minimal number of gates needed to carry out the shortening operation defines the restricted complexity of $|\Psi\rangle$.
The restricted complexity would be an appropriate measure of the difficulty of the task of shortening the wormhole if the two computers were too far apart to directly couple.
The unrestricted complexity $\mathcal{C}_U$ is the number of 2-qubit gates needed to complete the shortening task, allowing for gates which couple the two computers. Fig. 5 shows such an example. It is obvious that $\mathcal{C}_R \geq \mathcal{C}_U$. In this paper we will be interested in the conditions under which the restricted complexity may be exponentially larger than the unrestricted complexity. This subject is not new; it was introduced by Harlow and Hayden [1] in the context of black hole physics, and elaborated on by Aaronson [12]. Our particular interest is to understand this large gap between restricted and unrestricted complexity through the geometry of the wormholes connecting entangled systems. The question is: can we identify the situations in which $\mathcal{C}_R \gg \mathcal{C}_U$ from the shape of the wormhole? To put it another way, can we identify a geometric obstruction to shortening the wormhole from one side?
If the state $|\Psi\rangle$ was prepared by acting with restricted gates on $|TFD\rangle$, and if the number of such gates is not exponentially large, we expect $\mathcal{C}_U = \mathcal{C}_R$. On the other hand, if $|\Psi\rangle$ was prepared from $|TFD\rangle$ by a circuit that allows interaction between the two computers, then we expect $\mathcal{C}_{R,A} \gg \mathcal{C}_U$ (assuming the circuit is longer than the scrambling length). In particular, if the number of unrestricted gates used in preparing $|\Psi\rangle$ is enough to scramble the system, then Harlow and Hayden have argued that the restricted complexity $\mathcal{C}_{R,A}$ will be exponential in N, and the same holds for $\mathcal{C}_{R,B}$. At the same time, the unrestricted complexity may be no bigger than polynomial in N.
Figure 5: A unitary $U_{AB}$ cannot in general be decomposed as $U_A \otimes U_B$. In the example in this figure, the horizontal red links between the left and right sides represent gates which couple qubits on the two sides.
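One simple diagnostic for the statement in Fig. 5 is the operator Schmidt rank across the A|B cut: a product $U_A\otimes U_B$ has rank 1, while gates that couple the two sides have rank greater than 1. The sketch below assumes NumPy and two single-qubit subsystems and checks a product gate, CNOT and SWAP. The function name `operator_schmidt_rank` is illustrative, and this rank is our choice of diagnostic, not a quantity used in the text.
```python
import numpy as np


def operator_schmidt_rank(u, d_a, d_b, tol=1e-10):
    """Number of terms needed to write u as sum_k A_k ⊗ B_k across the A|B cut."""
    t = u.reshape(d_a, d_b, d_a, d_b)                      # indices (a_out, b_out, a_in, b_in)
    m = t.transpose(0, 2, 1, 3).reshape(d_a * d_a, d_b * d_b)
    s = np.linalg.svd(m, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
pauli_z = np.diag([1.0, -1.0])
product = np.kron(hadamard, pauli_z)                       # U_A ⊗ U_B
cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
swap = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
for name, gate in [("product", product), ("CNOT", cnot), ("SWAP", swap)]:
    print(name, operator_schmidt_rank(gate, 2, 2))
```
A rank above 1 only shows that a gate couples the two sides; it does not by itself bound how many one-sided gates would be needed, which is the much harder question addressed in the text.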

The Python's Lunch
We now come to the Python's Lunch geometry: a wormhole with a bulge, as illustrated in Figs. 1 and 6. For simplicity we will assume that it consists of three regions, all of length polynomial in N, where N denotes the entropy (number of qubits). The two outer regions have areas $A_L \approx N\cdot(4G\hbar)$ and $A_R \approx (1+\gamma)N\cdot(4G\hbar)$, where $\gamma > 1$ is a numerical constant. The bulge between the two outer regions has a larger area,
$$A_{\max} \approx (1+\alpha)N\cdot(4G\hbar),$$
where $\alpha > \gamma$ is a constant independent of N. In order for the geometry to count as a Python's Lunch, we will mostly assume that the length of the lunch is larger than the scrambling time $t_* \sim \log N$. An alternative way to look at this geometry is as the tensor network (TN) in Fig. 6, which prepares a two-sided state.
Figure 6: The tensor network that corresponds to the Python's Lunch geometry. The throats and bulge (where the girth is constant) are composed of unitary gates, whereas the shoulders (where the girth changes) involve projections like those shown in Fig. 7.
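The min-max-min structure can be illustrated on a sampled profile of cross-sectional areas along a single slice. This is a toy sketch, assuming the areas are given as a list sampled from left to right (in units of $4G\hbar$, so that they roughly count qubits) and that strict local minima mark the throats; the function name `find_lunch` and both sample profiles are hypothetical.
```python
def find_lunch(areas):
    """Return (A_L, A_max, A_R) if the profile has two throats with a bulge between them, else None."""
    minima = [
        k for k in range(1, len(areas) - 1)
        if areas[k - 1] > areas[k] < areas[k + 1]      # strict local minimum = a throat
    ]
    if len(minima) < 2:
        return None                                    # a single throat: no lunch
    left, right = minima[0], minima[-1]
    bulge = max(areas[left:right + 1])                 # largest cross-section between the throats
    return areas[left], bulge, areas[right]


# Hypothetical profiles: Schwarzschild-like (one throat) and Python's Lunch (two throats).
schwarzschild = [50, 20, 10, 20, 50]
python_lunch = [50, 20, 10, 15, 25, 30, 25, 15, 12, 20, 60]
print(find_lunch(schwarzschild), find_lunch(python_lunch))
```
A real profile would come from the bulk geometry, and the covariant definition given later replaces this single-slice picture with extremal surfaces.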
If all the vertices in the tensor network were unitary gates, the number of qubits would be the same for every vertical cross-section, but tensor networks (unlike standard quantum circuits) allow certain non-unitary vertices called isometries. Inspection of Fig. 6 shows that some of the vertices involve three edges; those are the isometries. They occur in the transition regions where the area varies.
An isometry can be thought of as a unitary gate in which one of the legs has been projected onto the state $|0\rangle$, as shown in Fig. 7. This allows us to draw the TN as a collection of edges connecting unitary gates, but with a subset of the edges being projected. A portion of the TN with isometries is shown in Fig. 8. Reading the tensor network from left to right, the tensor network expands when we input an ancilla qubit, and contracts when we post-select on a qubit. In general, the number of input ancilla qubits, $m_L = \alpha N$, can be different from the number of post-selected qubits, $m_R = (\alpha-\gamma)N = \beta N$.
One question is whether the operator defined by the TN in Fig. 6 is approximately unitary, or equivalently, whether the two-sided state that it defines is approximately maximally entangled. So long as the right end of the tensor network is larger than the left end (at leading order), the TN will generically be an almost perfect isometry from the left to the right. The state on the left will therefore be almost exactly maximally entangled with the state on the right. We shall assume that this is indeed the case.
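The claims that the TN is generically close to an isometry (up to overall normalization) and that post-selection succeeds with probability about $2^{-m_R}$ can be checked in a toy model. The sketch assumes NumPy and, as a crude stand-in for a scrambling lunch, replaces the whole network by a single Haar-random unitary on $N + m_L$ qubits; the parameter values (N = 1, $m_L = 7$, $m_R = 2$) and the helper `haar_unitary` are arbitrary illustrative choices.
```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary via phase-corrected QR decomposition."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


rng = np.random.default_rng(1)
n, m_l, m_r = 1, 7, 2                        # left qubits, ancillas fed in, qubits post-selected
u = haar_unitary(2 ** (n + m_l), rng)
# Convention (assumption): ancillas and post-selected qubits are the low-order bits.
# Columns: the m_L ancillas start in |0...0>; rows: the last m_R qubits end in |0...0>.
tn_map = u[:: 2 ** m_r, :: 2 ** m_l]         # shape (2**(n + m_l - m_r), 2**n)
success_prob = np.linalg.norm(tn_map[:, 0]) ** 2
s = np.linalg.svd(tn_map, compute_uv=False)
print(tn_map.shape, round(success_prob, 3), round(s.min() / s.max(), 3))
```
A singular-value ratio near 1 means the map is an isometry up to the overall factor $2^{-m_R/2}$; making the right end larger relative to the left end pushes the ratio closer to 1.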
With this assumption, the TN can be shortened from the right by one-sided unitary operations. But how many one-sided k-local operations are required? If the TN is small (say, of polynomial size), one might conclude that the number of gates should also be small, but that is not the case.

Using Ancilla Qubits
Let's consider an initial state $|I\rangle$ in the computational basis and act on it with the TN, inserted from the left side. The output state is
$$|\psi\rangle = \frac{U_{TN}|I\rangle}{\left\| U_{TN}|I\rangle \right\|},$$
where $U_{TN}$ is the original map from left to right defined by the tensor network, $|I\rangle$ is the input state on the left, and $|\psi\rangle$ is the normalized output state on the right side after post-selection. We will be interested in the relative complexity of $|I\rangle$ and $|\psi\rangle$ when we allow Alice to prepare $m_L$ ancilla qubits that start in $|0\rangle$, and require that in the end $m_R$ qubits finish in the $|0\rangle$ state. We begin on the left side of Fig. 6 and, working from left to right, apply the unitary gates. The isometries on the left 'shoulder' of the Python's Lunch are straightforward: we simply couple in the ancillas and treat the isometries as unitary circuit elements.
But when we arrive at the right 'shoulder', the isometries correspond to final-state projections (post-selections). Final-state projections are not physically implementable processes, so we must do something new. That new thing is measurement: as Alice arrives at each isometry, she simply measures the dangling qubit in Fig. 8 in the Z basis. If she gets 0, she moves on to the next isometry and repeats the process. If at the end of all the isometries all measurements have given 0, she moves on to the right of the TN, and at the end she has prepared $|\psi\rangle$.
However, if she measures 1 at some point, she starts over and repeats the entire process until she succeeds in obtaining 0 for all post-selected qubits. On average she will have to repeat the procedure $2^{m_R}$ times, for $m_R$ post-selected qubits. The total number of gates she will have to apply is of order
$$2^{m_R}\,\mathcal{C}_{TN},$$
where $\mathcal{C}_{TN}$ is the number of nodes in the TN.
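The expected cost of the measure-and-restart strategy follows from the geometric distribution: each attempt succeeds with probability about $2^{-m_R}$, so on average $2^{m_R}$ attempts are needed. Below is a Monte Carlo sketch, assuming that every attempt costs a full $\mathcal{C}_{TN}$ gates (an overestimate, since a failed run can stop at the first outcome 1) and that the success probability is exactly $2^{-m_R}$ for every input; the values of `m_r`, `c_tn` and `trials` are arbitrary.
```python
import random


def repeat_until_success_cost(m_r, c_tn, rng):
    """Gates used by measure-and-restart; each attempt costs c_tn and succeeds with prob 2**-m_r."""
    p_success = 2.0 ** -m_r
    attempts = 1
    while rng.random() >= p_success:          # at least one post-selected qubit came out 1
        attempts += 1
    return attempts * c_tn


rng = random.Random(0)
m_r, c_tn, trials = 4, 100, 20000
mean_cost = sum(repeat_until_success_cost(m_r, c_tn, rng) for _ in range(trials)) / trials
print(mean_cost, 2 ** m_r * c_tn)              # Monte Carlo estimate vs 2**m_R * C_TN
```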
If this were the minimal protocol, we would say that the complexity $\mathcal{C}(\psi, I)$ is of order $2^{m_R}\,\mathcal{C}_{TN}$. However, in the appendix we show that there is a more efficient quantum procedure using a version of Grover's algorithm (which was applied to a similar problem by Kitaev and Yoshida in Ref. [13]).

The Complexity of the Python's Lunch
In Appendix A.2, we describe a protocol that uses Grover search to prepare $|\psi\rangle\otimes|0\rangle^{\otimes m_R}$ from an initial state $|I\rangle\otimes|0\rangle^{\otimes m_L}$ with a unitary circuit using $2^{m_R/2}\,\mathcal{C}_{TN}$ gates. Assuming that the length of the lunch is greater than the scrambling time, there is no reason to think that the task can be performed with fewer gates, thus implying that
$$\mathcal{C}\left(\psi\otimes 0^{m_R},\; I\otimes 0^{m_L}\right) \sim 2^{m_R/2}\,\mathcal{C}_{TN}.$$
Our initial version of this protocol works only for a single fixed input state $|I\rangle$.
In contrast, we are really interested in finding a unitary operation $U_{PL}$ that satisfies
$$U_{PL}\left(|I\rangle\otimes|0\rangle^{\otimes m_L}\right) \approx |\psi\rangle\otimes|0\rangle^{\otimes m_R}$$
for any input state $|I\rangle$. In Appendix A.4, we describe such a 'state-independent' protocol, based on a variant of the usual Grover-search algorithm. Our state-independent protocol succeeds with high probability for any input state $|I\rangle$, given either the assumption that there exists an exact isometry from left to right, or simply the assumption that the right system is parametrically larger than the left system and that the tensor network is scrambling (and so can be modelled using 2-designs). The complexity of this protocol is again given by $2^{m_R/2}\,\mathcal{C}_{TN}$. Since we have no good reason to think that a faster algorithm exists, we conjecture that
$$\mathcal{C}(U_{PL}) \sim 2^{m_R/2}\,\mathcal{C}_{TN}. \qquad (4.5)$$
Since we have assumed that $m_R$ is a finite fraction of N, i.e., $m_R = \beta N$, the complexity of $U_{PL}$ is exponential in N.
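To see where the square root comes from, here is a sketch of standard fixed-input amplitude amplification (Grover-style), not the state-independent variant of Appendix A.4. It assumes NumPy and uses a random normalized vector `s` as a stand-in for $U_{TN}(|I\rangle\otimes|0\rangle^{\otimes m_L})$; the 'good' subspace is the one in which the last $m_R$ qubits are $|0\rangle$. Each iteration uses one reflection about the good subspace and one reflection about `s`; in a circuit the latter would cost roughly two applications of the network, which is how about $2^{m_R/2}$ iterations translate into about $2^{m_R/2}\,\mathcal{C}_{TN}$ gates.
```python
import numpy as np


def grover_iterate(psi, s, good):
    """One amplitude-amplification step: (2|s><s| - I)(I - 2 Pi_good) applied to psi."""
    psi = np.where(good, -psi, psi)            # I - 2 Pi_good: flip the sign of 'good' amplitudes
    return 2 * s * np.vdot(s, psi) - psi       # 2|s><s| - I: reflect about |s>


rng = np.random.default_rng(2)
n_qubits, m_r = 10, 6
d = 2 ** n_qubits
s = rng.normal(size=d) + 1j * rng.normal(size=d)
s /= np.linalg.norm(s)                          # stand-in for U_TN(|I> ⊗ |0...0>)
good = (np.arange(d) % 2 ** m_r) == 0           # last m_R qubits (low-order bits) all |0>
p = np.sum(np.abs(s[good]) ** 2)                # one-shot post-selection probability, ~ 2**-m_R
theta = np.arcsin(np.sqrt(p))
k = int(round(np.pi / (4 * theta) - 0.5))       # number of iterations, ~ (pi/4) * 2**(m_R/2)
psi = s.copy()
for _ in range(k):
    psi = grover_iterate(psi, s, good)
p_after = np.sum(np.abs(psi[good]) ** 2)
print(round(p, 4), k, round(p_after, 4))
```
Measure-and-restart needs about $1/p \approx 2^{m_R}$ attempts, whereas the loop above needs only $k \approx (\pi/4)\,2^{m_R/2}$ iterations to make the post-selection succeed with high probability.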
That is our main technical result: the complexity of the unitary defined by a TN is expected to be $O(2^{m_R/2}\,\mathcal{C}_{TN})$, where $m_R$ is the difference between the maximal area of the lunch and the area of the right side (or, more generally, the larger side). In particular, when $m_R \sim N$ the complexity is exponential in the number of qubits N at either end. The surprising point about this result is that the TN that defines $U_{PL}$ can be as small as $\mathcal{C}_{TN} \sim N\log N$. We can now use the analogy between tensor networks and bulk geometry to conjecture a relationship between the restricted complexity and the geometry of a Python's Lunch.
Restricted Complexity Conjecture: In a Python's Lunch geometry with min-max-min areas $A_L$, $A_{\max}$ and $A_R$, and with the assumption $A_L < A_R$, we conjecture that the restricted complexity on the right system is
$$\mathcal{C}_R \sim \mathcal{C}_{TN}\,\exp\left[\frac{1}{2}\,\frac{A_{\max}-A_R}{4G\hbar}\right], \qquad (4.6)$$
where $\mathcal{C}_{TN}$ denotes the size of the tensor network and is related to the volume/action of the wormhole ($\mathcal{C}_{TN} = V/(G\,\ell_{AdS})$) from the CV/CA conjectures [2,3,4].
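As a worked example of Eq. 4.6 together with the parametrization $A_L \approx N$, $A_R \approx (1+\gamma)N$, $A_{\max} \approx (1+\alpha)N$, the helper functions below (hypothetical names) measure areas in units of $4G\hbar\ln 2$, so that an area equals a number of qubits and the exponential becomes $2^{m_R/2}$. The choice $\mathcal{C}_{TN} = N\log_2 N$ only illustrates the 'small network' case mentioned above, and the values of N, $\gamma$, $\alpha$ are arbitrary.
```python
import math


def lunch_parameters(n, gamma, alpha):
    """Ancilla counts for A_L = N, A_R = (1+gamma)N, A_max = (1+alpha)N (areas in qubits)."""
    a_left, a_right, a_max = n, (1 + gamma) * n, (1 + alpha) * n
    m_l = a_max - a_left                     # ancillas fed in on the left shoulder
    m_r = a_max - a_right                    # qubits post-selected on the right shoulder
    return m_l, m_r


def log2_restricted_complexity(a_left, a_max, a_right, c_tn):
    """log2 of C_TN * 2**(m_R/2), with m_R measured from the larger end of the lunch."""
    m_r = a_max - max(a_left, a_right)
    return m_r / 2 + math.log2(c_tn)


n, gamma, alpha = 100, 1.5, 2.0
m_l, m_r = lunch_parameters(n, gamma, alpha)
c_tn = n * math.log2(n)
log2_c = log2_restricted_complexity(n, (1 + alpha) * n, (1 + gamma) * n, c_tn)
print(m_l, m_r, round(log2_c, 2), round(math.log2(c_tn), 2))
```
Here the exponent contributed by the lunch (25 bits) dwarfs the logarithm of the network size (about 9.4 bits), which is the sense in which the bulge rather than the volume dominates the restricted complexity.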

Post-selection is Superpolynomial
In the previous subsection we argued that we can decode Hawking radiation by projecting out m qubits, and provided a version of a Grover search [14,15] that allows us to do this projection with a unitary that has complexity $\sqrt{2^m}$. Can we rule out the possibility that there is an even faster algorithm that can perform this projection?
On the one hand, we can almost certainly rule out the possibility that there could be an exponentially faster algorithm. There cannot be an algorithm that projects onto m qubits in a time that scales polynomially with m. Or, more precisely, if there were such an algorithm then it would contradict widely held conjectures in computational complexity theory. The complexity class of decision problems you can solve on a quantum computer if you were allowed post-selections (including post-selections onto states with exponentially small amplitude) is called PostBQP. It has been shown [16] that PostBQP is a fantastically powerful class: it is equal to the class PP. Conversely, if you could use a normal quantum computer to implement exponentially rare projections in polynomial time, this would imply BQP = PostBQP. Taking these results together would imply BQP = PP. But PP is a very large class that contains all sorts of problems not believed to be efficiently soluble on a quantum computer, including NP. It would therefore be in gross violation of widely held complexity assumptions if we could post-select in a time polynomial in m.
Footnote 8: ... but this generally doesn't help, because even though the wavefunction 'knows' the answer, linearity means there is no measurement we can do that has more than an O(1/d) chance of success that can induce the wavefunction to tell us what it knows. If we were able to project onto the (exponentially in m = log d) unlikely outcome that a measurement of the final qubit is $|1\rangle$, then we could find the answer $|x\rangle$ in one step.
On the other hand, it is not obvious how to rule out the existence of a polynomially faster algorithm that would still take a time exponential in m. It is not obvious that there can't be a protocol that would improve (say) the square root to a cube root. (The effect of such a speed-up would be to change the coefficient in the exponent of the conjecture Eq. 4.6 from 1/2 to 1/3.) It has been proved that Grover search amongst d items cannot be implemented with fewer than about $\frac{\pi}{4}\sqrt{d}$ queries [17,18,19]; there are also other known lower bounds for more general classes of algorithms that perform amplitude amplification [20,21,22], which are closely related to our state-independent protocol from Appendix A.4. But we have an advantage not available in the Grover task, which is that we know in advance which final state we wish to post-select on.

Covariant Lunches
So far in this paper, to determine whether the spacetime contains a Python's Lunch, we have implicitly assumed the existence of some preferred choice of bulk Cauchy slice. This is, in large part, a limitation of the tensor network toy models that we have been using to guide us, which resemble a bulk Cauchy slice rather than a full bulk spacetime. However, for the non-static spacetimes that we will be considering in future sections, it is not obvious how the correct slice should be chosen.
In earlier work, the complexity was conjectured to be dual to the volume of the maximal volume slice. An obvious possibility is to work entirely within this slice.
However, the covariant surface that is analogous to the minimal cut through a tensor network is the HRT surface [23,24,25], the minimal-area extremal surface homologous to one end of the wormhole. For spacetimes where quantum effects are important, such as evaporating black holes, it is, more precisely, the minimal-generalized-entropy quantum extremal surface, also known as the Engelhardt-Wall (EW) surface [26,27].
The existence of a second, locally minimal cut is analogous to the existence of a second extremal surface satisfying the same homology constraint. In all the cases that we will consider, there will also be a third extremal surface that lies between the first two surfaces and has a larger generalized entropy than either. This third surface has an important qualitative difference compared to the other two surfaces: we cannot choose a Cauchy slice within which every small (but not necessarily local) deformation of this third extremal surface increases its area (or generalized entropy). In particular, this means that it can never be the HRT (or EW) surface, which is always globally minimal within some Cauchy slice [28,29]. Instead of corresponding to one of the narrow constrictions at the ends of the python, this third surface is a covariant definition of the maximum size of the bulge in the middle of the lunch.
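In general, none of these surfaces will lie in the maximal volume slice. We therefore should not expect the correct covariant definition of a Python's Lunch to involve the maximal volume slice (although, in many examples, such as evaporating black holes, the maximal volume slice will also look like a Python's Lunch). Instead, we should think of a Python's Lunch as being defined by this set of three extremal surfaces: the two 'end surfaces' and the 'bulge surface' in the middle. With this new covariant definition of Python's Lunch we can modify our conjecture.
Covariant Restricted Complexity Conjecture: In a covariant Python's Lunch whose end surfaces have generalized entropies $S_{gen,L} < S_{gen,R}$ and whose bulge surface has generalized entropy $S_{gen,\max}$, the restricted complexity on the right system is
$$\mathcal{C}_R \sim \mathcal{C}_{TN}\,\exp\left[\frac{1}{2}\left(S_{gen,\max} - S_{gen,R}\right)\right], \qquad (4.9)$$
where again $\mathcal{C}_{TN}$ denotes the size of the tensor network.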
Naively, a covariant Python's Lunch seems a very specific and unusual feature of a spacetime. It needs to feature three extremal surfaces. Moreover the bulge surface needs to have greater area (or generalized entropy) than either end surface, and, unlike the end surfaces, within any Cauchy slice there should exist small deformations of the bulge surface that decrease its area (or generalized entropy). Nevertheless, every example that we consider of a spacetime with more than one extremal surface will turn out to have a Python's Lunch.
In Appendix B, we explain this phenomenon. We use 'maximin' techniques [28,29] to sketch an argument that almost all spacetimes with more than one extremal surface will contain a Python's Lunch. Specifically, we argue that one can generically find a third extremal surface by considering 'foliations' of a Cauchy slice from one extremal surface to the other, taking the maximal area (or generalized entropy) surface within that foliation, minimizing the maximum over all foliations, and then maximizing the resulting 'minimax' surface over all Cauchy slices. We call this a 'maximinimax' prescription for finding the bulge surface.

Evaporating Black Holes
In this section we will see how a Python's Lunch explains the exponential difficulty of decoding Hawking radiation.
After the Page time, an evaporating black hole is maximally entangled with its own Hawking radiation [30]. Harlow and Hayden [1] asked how hard it is to isolate a degree of freedom r in the radiation that is entangled with a particular quantum b of Hawking radiation that is about to be emitted by the black hole (the AMPS task [31]). A closely related task is to decode the state of a small unknown diary thrown into the (known) black hole, using only the state of the Hawking radiation. This is the Hayden-Preskill decoding task [32], and it is also expected to be exponentially hard. If we can get the black hole and Hawking radiation into a simple state, both tasks are simple. The difficulty in doing either task comes from the exponentially large restricted complexity of the combined state of the black hole and Hawking radiation.
Building on earlier ideas in [33,34], it was shown in [35,36,37,38,39] that the information-theoretic achievability (or otherwise) of the Hayden-Preskill and Harlow-Hayden tasks could be understood holographically using entanglement wedge reconstruction. After the Page time, a large part of the interior of the black hole is in the entanglement wedge of the early Hawking radiation, and so is encoded in the radiation. This is essentially a holographic derivation of black hole complementarity and ER=EPR [47,48]. We shall now see that the exponential computational difficulty of the Harlow-Hayden and Hayden-Preskill tasks can likewise be understood holographically as coming from the existence of a Python's Lunch.

Preliminary Example
We will consider a preliminary example. Consider a quantum computer initialized at t = 0 in some simple state $|I\rangle$, which then evolves for a time greater than the scrambling time $t_*$. The qubits are then split into two subsystems, Alice's and Bob's shares A and B. The two subsystems continue to evolve for a short time, but with no coupling between them. The process is illustrated in Fig. 9. Let us imagine sweeping across the TN with a series of cuts which foliate it, as in Fig. 10. It is obvious that the number of qubits crossed by the cuts first increases and then decreases. At its maximum, the number of qubits crossed is at least N log N. Therefore the geometry of the associated wormhole has a Python's Lunch.
Because the system is scrambled by the time it is split, the subsystems A and B are approximately maximally entangled. It follows that Alice can act unitarily on her side in order to bring the system to a state close to the TFD. The arguments of the previous section show that the restricted complexity is exponential in N, although the number of vertices in the TN is much smaller.
Figure 10: A foliation of the tensor network of Fig. 9 which interpolates between Alice's side and Bob's side. It is clear from the figure that the number of qubits cut by the slices increases and then decreases as the foliation sweeps round.

Hawking Radiation
In this section we will explain how the Python's Lunch geometry appears during the evaporation of a black hole. Note that, for the moment, we are restricting our attention to a single Cauchy slice through this black hole: a generic 'nice' Cauchy slice that stays close to the black-hole horizon. For concreteness, we can take it to be the maximal volume slice. In Sec. 5.3, we will discuss the full covariant description of this lunch.
A classical one-sided black hole (Bob's black hole) in a pure state is not connected to any purifying system by a wormhole. But it is connected to a growing "bridge to nowhere" (BTN) whose volume represents the complexity of the state. This is shown schematically in Fig. 11.
Figure 11: Successive spatial slices through a one-sided non-evaporating black hole that formed from collapse. The "whiskers" on the left side depict the infalling matter. The 'bridge to nowhere' grows and becomes elongated as the complexity increases.
Starting at the horizon and moving into the BTN the area remains constant for most of its length until it quickly shrinks at the far end. The whisker-like lines at the end represent the infalling matter which originally created the black hole. As time increases the BTN grows.
Evaporation modifies this picture and effectively turns the one-sided system into a two-sided system. The black hole becomes entangled with its own Hawking radiation, which in effect becomes a second side. The process, which was explained in [5], is depicted in Fig. 12 as a time sequence at times $t_1 < t_2 < t_3$. As the black hole radiates, the area of its horizon decreases. In the figure this is shown as a decrease in the thickness of the BTN as one moves from left to right. The interior modes that purify the Hawking radiation are shown as red dots. The partners of Hawking radiation emitted later in the evaporation are at the far right of the diagram. These interior modes are entangled with the Hawking radiation, and so they contribute to the generalized entropy of any region that contains them but not the Hawking radiation, or vice versa. Alternatively, in the language of ER=EPR, they can be thought of as being connected to the Hawking radiation by Planck-area "micro-wormholes". The homology constraint forces us to cut these micro-wormholes, which increases the generalized entropy. Now suppose that Alice collects the radiation in a second system A, shown as an elongated ellipse in Fig. 13. System A will now be connected to the bridge-to-nowhere via bulk entanglement or, equivalently, micro-wormholes.
Let us assume that, like a tensor network, the entropy of the Hawking radiation is given by the 'minimal cut' through this Cauchy slice, where the size of a cut is given by its generalized entropy. Of course, in general, this will only be true if we have chosen our Cauchy slice appropriately. However, we will be able to derive the correct qualitative conclusions by just studying the maximal volume slice.
We construct "cuts" separating A from the boundary at the right end of the "fishtail". The cuts can be characterized by an area. In Fig. 14, we see a series of such cuts as we sweep from A to the fishtail.
The generalized entropy of each cut consists of two contributions. One is the portion that cuts through the bulk entanglement between the interior and the Hawking radiation. The second contribution comes from the classical area required to cut the bridge-to-nowhere.

Figure 14: A non-temporal sweep of spatial cuts through the expandable space blimp geometry of Fig. 13, analogous to the sweep of the spatial slices in Fig. 10.

Let us track the generalized entropy as the cut proceeds:
• In the first cut in Fig. 14 the only contribution to the generalized entropy comes from the bulk entanglement. That contribution is proportional to the entropy in the radiation.
• The next cut also cuts the entanglement between the interior and the radiation.
However, it also cuts across the largest part of the BTN. We see that there is a quick increase in the generalized entropy of the cut.
• In the third and fourth cuts in Fig. 14, the cut moves to the right. As it does so, both contributions to the generalized entropy decrease.
• The final cut in Fig. 14 only cuts across the bridge-to-nowhere near the horizon of the black hole.
The evolution of the generalized entropy of the cut is shown in Fig. 15.
There are two minima to the generalized entropy. One is the cut through the bulk entanglement, represented by the green lines in the top picture. As we sweep across, the generalized entropy makes a fairly sudden increase, and then a slow gradual decrease to a second minimum, the horizon, at the fishtail. Up to subtleties involving the choice of Cauchy slice, these two minima correspond to the two quantum extremal surfaces found in [35,36].

Figure 15: For $t < t_{\rm page}$, the minimum generalized entropy cut is at the beginning, when the bulk entanglement between the black hole and the Hawking radiation is being cut. For $t > t_{\rm page}$, the minimum generalized entropy cut is near the end, when the cut is near the black hole horizon. In both cases, the largest cut comes near the beginning, when the generalized entropy is the sum of the semiclassical entropy of the radiation and the initial Bekenstein-Hawking entropy of the black hole.
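The two-minima structure can be made concrete with a small numerical sketch. The Python below scans a sampled generalized-entropy profile, ordered from the cut through the bulk entanglement to the cut at the horizon, and reports which end minimum is smaller and how far the bulge rises above the larger end. The profiles are hypothetical numbers (entropies in nats, with $A_0/4G_N = 100$), chosen only to mimic the forwards sweep before and after the Page time; the function names are illustrative.

```python
def local_minima(profile):
    """Indices i with profile[i] no larger than its neighbours (endpoints included)."""
    n = len(profile)
    return [
        i for i, s in enumerate(profile)
        if (i == 0 or s <= profile[i - 1]) and (i == n - 1 or s <= profile[i + 1])
    ]


def analyse_sweep(profile):
    """Return (index of the minimal cut, index of the bulge, bulge height).

    The bulge height is measured above the larger of the two end minima.
    """
    mins = local_minima(profile)
    first, last = mins[0], mins[-1]
    # The smaller end minimum plays the role of the entanglement-wedge cut.
    minimal_cut = first if profile[first] <= profile[last] else last
    # The bulge is the largest cut encountered between the two end minima.
    bulge = max(range(first, last + 1), key=lambda i: profile[i])
    height = profile[bulge] - max(profile[first], profile[last])
    return minimal_cut, bulge, height


# Hypothetical forwards-sweep profiles with A_0/4G_N = 100 (entropies in nats).
before_page = [30, 130, 120, 105, 90, 80]   # S_rad = 30, A_hor/4G_N = 80
after_page = [60, 160, 130, 100, 70, 55]    # S_rad = 60, A_hor/4G_N = 55
print(analyse_sweep(before_page))  # (0, 1, 50): minimal cut is the first (bulk) cut
print(analyse_sweep(after_page))   # (5, 1, 100): minimal cut is near the horizon
```

Before the Page time the bulge sits $S_{\rm rad} + (A_0 - A_{\rm hor})/4G_N$ above the larger end; just after it, the gap is $A_0/4G_N$.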
Early on, the horizon area $A_{\rm hor}$ (or, more precisely, the Bekenstein-Hawking entropy $A_{\rm hor}/4G_N$) is much larger than the bulk entanglement between the interior and the Hawking radiation. At very late times the horizon shrinks to zero while the bulk entanglement becomes very large.
At some point there is a crossover where the two minima are degenerate. This defines the Page time. Because the evaporation is irreversible, this happens when the horizon area $A_{\rm hor}$ is slightly larger than half its initial area $A_0$ (for Schwarzschild black holes in our universe it happens when $A_{\rm hor} \sim 0.6 A_0$ [49]).
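The claim that irreversibility pushes the Page time past the halfway point can be checked with a one-line model. The sketch below assumes, purely for illustration, that the radiation entropy grows as a fixed multiple $\kappa \ge 1$ of the Bekenstein-Hawking entropy lost, $S_{\rm rad} = \kappa (A_0 - A_{\rm hor})/4G_N$; the parameter $\kappa$ and the function name are hypothetical. Setting $S_{\rm rad} = A_{\rm hor}/4G_N$ gives $A_{\rm hor}/A_0 = \kappa/(1+\kappa)$ at the Page time, which equals $1/2$ only in the reversible limit $\kappa = 1$.

```python
def page_area_fraction(kappa: float) -> float:
    """A_hor / A_0 at the Page time under the hypothetical model
    S_rad = kappa * (A_0 - A_hor) / 4G_N, with kappa >= 1.

    The two minimal cuts are degenerate when S_rad = A_hor / 4G_N, i.e.
    kappa * (A_0 - A_hor) = A_hor, so A_hor / A_0 = kappa / (1 + kappa).
    """
    if kappa < 1.0:
        raise ValueError("irreversible evaporation needs kappa >= 1")
    return kappa / (1.0 + kappa)


print(page_area_fraction(1.0))  # reversible limit: exactly one half
print(page_area_fraction(1.5))  # illustrative kappa giving A_hor ~ 0.6 A_0
```

The value $\kappa = 1.5$ is used only because it reproduces the quoted $A_{\rm hor} \sim 0.6 A_0$; it is not taken from [49].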
The important point is that the geometry has a Python's Lunch separating the two minima. The generalized entropy at the maximum of the bulge is

$$S_{\rm gen}^{\max} \approx S_{\rm rad} + \frac{A_0}{4G_N}.$$

Figure 16: At late times, a more efficient way of sweeping between the two minimal cuts is to create two cuts near the horizon, and then to sweep one of these cuts "backwards" along the wormhole.

Now, suppose that, just after the Page time, Alice, who controls the Hawking radiation, wants to apply gates or a Hamiltonian to shrink the wormhole to the TFD associated with the black hole of area $A_{\rm hor}$. Assuming that the analogy between a tensor network and the Cauchy slice holds, she can use the protocol in the appendix to do so in a time that is $O(S)\, e^{A_0/8G_N}$. This is consistent with the restricted complexity being exponentially large.
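The step from an entropy gap to an exponential running time can be sketched numerically. Assume, as a hypothetical model in the spirit of the tensor-network picture, that the required post-selection succeeds with probability $p \approx e^{-\Delta S}$, where $\Delta S$ is the gap in generalized entropy between the bulge and the larger end cut (here $A_0/4G_N$, in nats). Repeat-until-success then needs about $1/p$ attempts, while amplitude amplification (Grover search) needs about $(\pi/4)/\sqrt{p}$ rounds, halving the exponent. This is consistent with the $e^{A_0/8G_N}$ scaling quoted above, but it is an illustration, not a derivation.

```python
import math


def log_repeat_until_success(delta_s: float) -> float:
    """ln of expected attempts, ~1/p, when post-selection succeeds with p = exp(-delta_s)."""
    return delta_s


def log_amplitude_amplification(delta_s: float) -> float:
    """ln of Grover rounds, ~(pi/4)/sqrt(p), for p = exp(-delta_s)."""
    return math.log(math.pi / 4.0) + 0.5 * delta_s


# Hypothetical numbers: A_0/4G_N = 200 nats, so just after the Page time the
# bulge (S_rad + A_0/4G_N) exceeds the larger end cut (S_rad) by delta_s = 200.
# Working with logarithms keeps the tiny probability p out of floating point.
delta_s = 200.0
print(log_repeat_until_success(delta_s))     # exponent A_0/4G_N
print(log_amplitude_amplification(delta_s))  # exponent ~ A_0/8G_N, up to O(1)
```

Each round still costs a circuit of size $O(S)$, which supplies the prefactor.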
Of course, so far we have only considered one way of sweeping through the Cauchy slice, from one minimal cut to the other. If we could find another way of sweeping through the slice that had a smaller maximal generalized entropy, it would suggest that a more efficient protocol exists, since in a tensor network less post-selection would be required.
In fact, as far as we can tell, the way of sweeping through the slice just described should be optimal both before and shortly after the Page time. However, at late times, an alternative way of sweeping through the slice becomes preferable. We now analyze this second way of sweeping through the slice, which is shown in Fig. 16.
• In the first cut in Fig. 16 the only contribution to the generalized entropy comes from the bulk entanglement and is proportional to the entropy in the radiation, as before.
• The next cut also cuts the bulk entanglement between the interior and the radiation. However, it also includes an additional "double cut" near the horizon. This gives an additional area contribution equal to $2A_{\rm hor}/4G_N$.
• In the third and fourth cuts in Fig. 16, one half of the double cut moves to the left. As it does so, its area increases, but the bulk entanglement decreases. Because black hole evaporation is irreversible, the second effect is slightly larger than the first, and so the generalized entropy slowly decreases.
• Finally, the left-moving cut reaches the end of the bridge-to-nowhere and disappears. The generalized entropy therefore has a sharp decrease by $A_0/4G_N$.

Figure 17: A comparison of the generalized entropy as a function of sweep parameter for the 'forwards' (Fig. 14) and 'reverse' (Fig. 16) ways of sweeping through the bridge to nowhere. Initially, the generalized entropy of both is given by $S_{\rm rad}$. The forwards sweep quickly increases by $A_0/4G_N$ and then steadily decreases as the sweep moves along the bridge-to-nowhere. The reverse sweep quickly increases by $2A_{\rm hor}/4G_N$, slowly decreases as the cut moves backwards along the bridge-to-nowhere, and then finally quickly decreases by $A_0/4G_N$. The reverse sweep has a smaller maximum size, and hence is more efficient, when $A_{\rm hor} < A_0/2$.
The evolution of the generalized entropy of the cut is shown in Fig. 17. The generalized entropy quickly increases as the double cut is added, reaching its maximum size of $S_{\rm rad} + 2A_{\rm hor}/4G_N$. It then slowly decreases, before a final sudden drop by $A_0/4G_N$. As shown in Fig. 17, this method of sweeping through the bridge-to-nowhere is therefore more efficient than the forwards sweep when $A_{\rm hor} < A_0/2$. Note that this transition happens strictly after the Page time which, although commonly described as happening at the halfway point of the evaporation, occurs when $A_{\rm hor} > A_0/2$.
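Choosing between the two sweeps reduces to comparing their maxima, $S_{\rm rad} + A_0/4G_N$ for the forwards sweep and $S_{\rm rad} + 2A_{\rm hor}/4G_N$ for the reverse sweep. A minimal Python sketch, with areas entered directly as $A/4G_N$ in nats and hypothetical function names, makes the crossover at $A_{\rm hor} = A_0/2$ explicit. $S_{\rm rad}$ shifts both maxima equally and so drops out of the choice.

```python
def sweep_maxima(s_rad: float, a0: float, a_hor: float) -> dict:
    """Peak generalized entropy of each sweep; areas given as A/4G_N in nats."""
    return {
        "forwards": s_rad + a0,          # cut across the widest part of the BTN
        "reverse": s_rad + 2.0 * a_hor,  # double cut near the late-time horizon
    }


def preferred_sweep(s_rad: float, a0: float, a_hor: float) -> str:
    """Sweep with the smaller maximum (ties go to the forwards sweep)."""
    maxima = sweep_maxima(s_rad, a0, a_hor)
    return min(maxima, key=maxima.get)


print(preferred_sweep(s_rad=60.0, a0=100.0, a_hor=55.0))   # forwards: 160 < 170
print(preferred_sweep(s_rad=120.0, a0=100.0, a_hor=20.0))  # reverse: 160 < 220
```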

The Covariant Lunch
Of course, as discussed in Sec. 4.4, the correct covariant definition of a Python's Lunch is not given by the shape of a single Cauchy slice, but by a set of three quantum extremal surfaces, the two end surfaces and the bulge surface in the middle. However, as discussed in Appendix B, there exist Cauchy slices within which the most efficient 'sweep' has locally minimal generalized entropy at the end surfaces, and locally maximal generalized entropy at the bulge surface. We can think of these as the Cauchy slices where the tensor network analogy is valid.
For an evaporating black hole, the locations of the end surfaces were calculated in [35,36]. The first end surface is the empty surface, containing no points. Its generalized entropy is simply the semiclassical entropy $S_{\rm rad}$ of the Hawking radiation. The second end surface, which becomes the EW surface after the Page time, lies at a radius that is $O(G_N)$ inside the horizon of the black hole, and at an infalling, or advanced, time that is one scrambling time in the past of the current boundary time. Its generalized entropy is given at leading order by the Bekenstein-Hawking entropy $A_{\rm hor}/4G_N$.
The bulge surface was obviously less of a focus in [35,36], because it is never the EW surface. However, the bulge surface that corresponds to the maximum size in the 'forwards sweep' was briefly discussed in [36]. For a one-sided black hole formed from collapse, it lies inside the infalling matter that created the black hole. As the black hole forms, a classical apparent horizon appears that moves outwards in a spacelike direction towards the lightlike event horizon (which, being teleological, formed before the infalling matter even arrived). After the infalling matter has passed through and the black hole begins to evaporate, the classical apparent horizon ends up a Planckian radial distance outside the event horizon. When no greybody factors are present, we can use Eqs. 89 and 90 from [35] to see that there will exist a quantum extremal surface, at sufficiently late times, at a radius where $r_s(v)$ is the Schwarzschild radius (and hence the radius of the apparent horizon) in the ingoing Vaidya metric describing the black hole, $c_{\rm evap}$ is the number of two-dimensional bosonic modes (i.e. the number of angular momentum modes in higher dimensions) involved in the evaporation, and $\Omega_{d-1}$ is the volume of the unit $(d-1)$-sphere, and at an infalling time $v$ when At what infalling time $v$ is Eq. 5.4 satisfied? Since the apparent horizon $r_s(v)$ goes from far inside to a Planckian distance outside the event horizon $r_{\rm hor}(v)$, a solution must exist (assuming the metric is smooth) somewhere inside the infalling matter. Generally it will be near the end of the infalling matter, when the apparent horizon has approached within a Planckian radial distance of the event horizon.

Figure 19: The covariant bulge surface (black) at late times (i.e. for the 'reverse' sweep) consists of two spheres (or points in two dimensions). Both lie slightly inside the EW surface (red), which determines the Hawking radiation entanglement wedge $E_A$ (green) and the entanglement wedge $E_B$ of the system containing the black hole (blue).
The early-time bulge surface is shown in Fig. 18. We note that its generalized entropy is indeed approximately equal to the initial Bekenstein-Hawking entropy of the black hole plus the entropy of the Hawking radiation, as expected from our analysis of the maximal volume Cauchy slice (see [35,36] for explicit calculations).
What about at late times, when our analysis of the maximal-volume slice suggested that a 'reverse sweep' through the bridge-to-nowhere was optimal? In this case, we expect that the dominant bulge surface should consist of the union of two spheres, both close to the late-time horizon. In general, calculating the location of an extremal surface with this topology is considerably harder than finding extremal surfaces consisting of a single sphere. However, it can be done for JT gravity plus free Dirac fermions, a theory we study in Appendix C. As hoped, we find an extremal surface that consists of two points (or zero-spheres). Both points lie one scrambling time in the infalling past of the current boundary time, just like the late-time EW surface. However, as shown in Fig. 19, both points are spacelike separated from, and further inside the horizon than, the EW surface. This is consistent with this surface being the maximum generalized entropy surface in a sweep through a Cauchy slice that goes from the empty surface to the late-time EW surface. Consistent with our expectations based on a single Cauchy slice, the generalized entropy of this surface is approximately $2A_{\rm hor}/4G_N + S_{\rm rad}$.

The Time Dependence of the Restricted Complexity
Using our covariant restricted complexity conjecture from Sec. 4.4, we can now make a precise conjecture about how the restricted complexity of the evaporating black hole state evolves over the course of the evaporation. We show a plot of $\log C_R$ against $A_0 - A_{\rm hor}$ over the course of the entire evaporation in Fig. 20.
There are three distinct phases to this evolution. Before the Page time, the EW surface is empty, with generalized entropy $S_{\rm rad}$, while the bulge surface is the forwards-sweep surface, with generalized entropy $S_{\rm rad} + A_0/4G_N$. Finally, the larger end surface has generalized entropy $A_{\rm hor}/4G_N$. According to our conjecture in Eq. 4.9, the restricted complexity is controlled by the difference in generalized entropy between the bulge surface and the larger end surface. It is given by

$$C_R \sim t \, \exp\left[\frac{1}{2}\left(S_{\rm rad} + \frac{A_0 - A_{\rm hor}}{4G_N}\right)\right].$$

The factor of $t$ here comes from the volume of the lunch, which controls $C_{\rm TN}$ according to the conjectures of Refs. [2,3,4]. The first phase transition happens at the Page time, where the EW surface becomes non-empty. This means that the larger end surface becomes the empty surface, with generalized entropy $S_{\rm rad}$. The restricted complexity is therefore

$$C_R \sim t \, \exp\left[\frac{A_0}{8G_N}\right],$$

and only changes linearly with time. Finally, when $A_{\rm hor} = A_0/2$, we have a second phase transition. This time, it is the bulge surface that changes. It becomes the reverse-sweep surface, with generalized entropy $S_{\rm rad} + 2A_{\rm hor}/4G_N$. The restricted complexity begins to decrease, and is given by

$$C_R \sim t \, \exp\left[\frac{A_{\rm hor}}{4G_N}\right].$$

Importantly, as the black hole completely evaporates the exponent tends to zero, and the restricted complexity becomes $O(t)$. This is exactly what we expect. The black hole has completely evaporated and so we have a one-sided system again. The restricted complexity will therefore be equal to the unrestricted complexity, which is $O(t)$.
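The three phases can be encoded compactly: the bulge is the smaller of the two sweep maxima, and the relevant end surface is the larger of the two minima. The Python sketch below implements $\log C_R \approx \log t + \tfrac{1}{2}(S_{\rm bulge} - S_{\rm end})$ under two labelled assumptions: areas are entered as $A/4G_N$ in nats, and the radiation entropy follows the hypothetical model $S_{\rm rad} = \kappa (A_0 - A_{\rm hor})/4G_N$ with an illustrative $\kappa$. It reproduces the three phases described above, including continuity at both transitions.

```python
import math


def s_rad_model(a0, a_hor, kappa=1.5):
    """Hypothetical radiation entropy: kappa times the Bekenstein-Hawking entropy lost."""
    return kappa * (a0 - a_hor)


def log_restricted_complexity(t, a0, a_hor, kappa=1.5):
    """ln C_R ~ ln t + (S_bulge - S_end) / 2, with areas given as A/4G_N (nats)."""
    s_rad = s_rad_model(a0, a_hor, kappa)
    s_bulge = s_rad + min(a0, 2 * a_hor)  # forwards sweep, or reverse once A_hor < A_0/2
    s_end = max(s_rad, a_hor)             # larger of the empty surface and the horizon QES
    return math.log(t) + 0.5 * (s_bulge - s_end)


# Exponent (t = 1) across the evaporation, for a hypothetical A_0/4G_N = 100.
# With kappa = 1.5 the Page time falls at A_hor/4G_N = 60.
for a_hor in (100, 80, 60, 55, 50, 20, 0):
    print(a_hor, log_restricted_complexity(1.0, 100, a_hor))
```

The exponent rises as $(S_{\rm rad} + (A_0 - A_{\rm hor})/4G_N)/2$, plateaus at $A_0/8G_N$ between the Page time and $A_{\rm hor} = A_0/2$, then falls as $A_{\rm hor}/4G_N$ to zero.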

Python's Lunches Beyond Black Hole Evaporation
Black hole evaporation is an inherently quantum mechanical phenomenon. It violates the null energy condition (NEC), for example, even when all the quantum fields in the theory would classically obey the NEC. One might therefore think that Python's Lunches themselves are an inherently quantum mechanical phenomenon and cannot exist in classical spacetimes. As we shall see, this is not at all true. Instead, there are numerous important examples, beyond black hole evaporation, of both classical and quantum lunches.
It is important to note that, in many of these examples, the size of the lunch is fixed in the semiclassical limit. This means that we would not expect a tensor network toy model to fully scramble the degrees of freedom over the course of the lunch. We should therefore be somewhat circumspect in conjecturing that the restricted complexities are actually exponentially large in these cases.
The first, and simplest, example of a Python's Lunch is a two-sided black hole with a heavy brane sitting in the Einstein-Rosen bridge, as shown in Fig. 21. The backreaction of the brane on the spacetime separates the left and right horizons, creating a Python's Lunch. The two end surfaces lie on the left and right bifurcation surfaces, while the bulge surface lies at the intersection of the brane worldline with the static slice. Unlike an evaporating black hole, this spacetime does not violate the null energy condition. However, it cannot be created from the thermofield-double state using semiclassical unitary Lorentzian evolution, because the entire worldline of the brane lies behind the black hole horizon. It can be created using a simple Euclidean path integral, as shown in Fig. 21, but Euclidean evolution is not unitary. Both facts are consistent with the restricted complexity of the state being very large.

A second classical example of a Python's Lunch is a one-sided black hole formed from collapse. This is shown in Fig. 22. The two ends of the python are just two halves of the single boundary. The bulging lunch in the middle is just the bridge-to-nowhere. On each side of the black hole there are classical extremal surfaces, which form the ends of the lunch. What about the bulge surface? At sufficiently early times, it seems likely that the bulge surface will go inside the horizon (for that matter, at sufficiently early times the end surfaces will also go inside the horizon). However, we would expect the area of any extremal surface going inside the horizon to grow with time. At late times, we should instead expect the bulge surface to 'wrap around' the horizon. Indeed, for a BTZ black hole, it will be a self-intersecting geodesic that winds once around the horizon.
It is easy to see that the unrestricted complexity of this one-sided black hole state is small. It will be proportional to the time since the black hole first formed. However, because this time evolution couples the two halves of the boundary, we should expect the restricted complexity to be very large, at least at late times. Again, the existence of a Python's Lunch corresponds to a large gap between restricted and unrestricted complexity.

Figure 22: A one-sided black hole forms a Python's Lunch between one half of the boundary, region A, and the other half, region Ā. There is an extremal surface on either side of the black hole, with the 'bridge-to-nowhere' of the black hole forming the bulging lunch.
A more quantum example, closer in spirit to the single-sided black hole evaporation studied in Sec. 5.2, but with a connected classical geometry everywhere, goes as follows. We start with the thermofield-double state, and then evolve it forwards in time, but with a small coupling between the left and right boundaries. This coupling will mix Hawking radiation between the two exteriors, causing Hawking radiation from the right to end up falling into the black hole from the left and vice versa.
Because the two sides of the black hole are in thermal equilibrium with one another, the size of the black hole will stay approximately constant. There will be no 'classical' Python's Lunch. However the coupling between the two sides will create long-range entanglement between the quantum fields at each end of the wormhole, via Hawking radiation escaping one end and then falling into the other. This creates a quantum lunch (in a classical python). See Fig. 23.
Again, the unrestricted complexity of the state should be small, because it was created by a simple unitary evolution from the thermofield-double state. However, this evolution coupled the two sides, and so the restricted complexity may well be large.
Figure 23: Coupled evolution of the thermofield-double state creates a quantum lunch in a classical python. The classical cross-section of the wormhole remains constant in size, but there is long range entanglement between the two sides, which increases the generalized entropy of a cut through the middle of the wormhole.

Our final example of a Python's Lunch is the AdS$_3$ vacuum. If we divide the boundary into two connected halves, as with the one-sided black hole discussed earlier in this section, there is no Python's Lunch. However, if each end of the python itself consists of two disconnected regions, as shown in Fig. 24, a lunch appears. There are two topologically distinct end extremal surfaces, plus a self-intersecting bulge surface in the middle. This suggests that the restricted complexity of constructing the vacuum state without coupling these two complementary disjoint regions may be very large. However, the small volume of the lunch suggests that it won't be fully scrambling and so our restricted complexity conjecture (4.6) may not apply.

What is Holographic Complexity?
There have been various proposed definitions for the bulk quantity that is holographically dual to boundary complexity. The two most prominent proposals have been the volume of the maximal-volume slice [2] and the action of the Wheeler-DeWitt patch [3,4]. In practice, these two proposals tend to give similar answers. Dual to this abundance of promising bulk quantities is an abundance of promising boundary quantities. Various definitions of boundary complexity have been considered. The original suggestion was that it should be unitary circuit complexity (the minimal number of simple gates, from a given primitive gate set, required to build the state from a simple starting point). There is also a 'continuous' variant of the unitary circuit complexity, which is defined using the Nielsen geometry [50]. Finally, there is the intuition that the volume of a slice is dual to the size of a tensor network required to build the state. As we have seen in this paper, this is different from the unitary circuit complexity, because a tensor network may contain non-unitary elements.
One way to make a highly complex state with a small volume/action is to have the state of the bulk fields be highly complex. This suggests that the holographic complexity should have a 'quantum correction', similar to the quantum corrections to the Ryu-Takayanagi formula, given by the complexity of the state of the bulk fields.
As we have seen in this paper, even when the bulk fields are in a simple state, the restricted unitary circuit complexity may be much larger than the volume/action if the geometry contains a Python's Lunch. This is not a contradiction with the conjectures of Refs. [2,3,4], since the unrestricted complexity is still small, and comparable to the volume/action. However, as we shall see in this section, even the unrestricted unitary circuit complexity can be much larger than the volume/action if we consider states that are prepared using non-unitary processes and therefore may contain 'one-sided Python's Lunches'. This is true even when the bulk state is very simple. This suggests that the correct boundary dual of holographic complexity is the size of the tensor network required to make the quantum state, or equivalently the circuit complexity where non-unitary post-selections onto the outcomes of simple measurements are allowed.

Figure 25: A one-sided black hole is allowed to evaporate, and then the Hawking radiation is measured. This produces a pure black hole microstate with interior modes that are in some simple state that depends on the measurement outcome. In the maximal volume slice, there is a local minimum of the generalized entropy near the horizon, where the area is smallest.

Measuring Radiation and One-sided Lunches
Suppose we take a one-sided black hole, and allow it to evaporate as in Sec. 5.2. However, rather than storing the radiation in a second system, we instead measure it in some complete basis. This basis does not have to be complicated: it can be a product basis, for example. The black hole will now be in a pure state; in particular, the interior modes that were previously entangled with the Hawking radiation will now be in a pure state that depends on the measurement outcome.
The resulting bulk geometry can be thought of as a 'one-sided Python's Lunch', with a bridge to nowhere which is largest at the end and then becomes gradually smaller as one approaches a quantum extremal surface near the horizon, as shown in Fig. 25. The exact location of this quantum extremal surface is hard to calculate, but it is easy to show that it should exist, as argued in Fig. 26.
The volume of the maximal volume slice and the action of the Wheeler-DeWitt patch will grow linearly with the time the black hole was allowed to evaporate for. We expect that this will also be the size of the minimal tensor network needed to describe the state. However, if this tensor network resembles the bulk geometry, its cross-section will be largest at the end of the wormhole and then become smaller near the horizon. Such a tensor network cannot generically be produced by a unitary circuit of the same size without allowing post-selection.

Figure 26: To argue that a non-empty quantum extremal surface should exist for the pure state of a two-dimensional black hole after measuring Hawking radiation, we consider Cauchy slices that asymptote to the maximal volume slice in the distant past, but which are allowed to vary elsewhere. In particular, we allow the slice to vary at infalling times approximately one scrambling time in the past of the current boundary time (this is where the non-empty extremal surface was found in [35,36]). As shown in Fig. 25, such slices should have a local minimum in their generalized entropy near the horizon. If we vary the Cauchy slice too far into the interior of the black hole, this local minimum will become small because the area becomes small. Conversely, if we push the slice too close to the past lightcone, the bulk entropy will become very small, which will also decrease the generalized entropy. By choosing our Cauchy slice to maximize the generalized entropy of this local minimum, we would necessarily find a non-empty extremal surface, in a location similar to the surface found in [35,36].
Instead, applying our restricted complexity conjecture in the special case where one system is trivial, so that the restricted complexity is just the unrestricted complexity, we find that the unitary circuit complexity should be proportional to $\exp[(S_{\rm gen}(\text{bulge}) - S_{\rm gen}(\text{QES}))/2]$, where the non-empty quantum extremal surface near the horizon is the larger end surface. This is exactly what we should expect. The state was prepared using a measurement, which can only be reproduced deterministically by using post-selection, or by using Grover search, as discussed in Sec. 4 and Appendix A. The complexity of this process is indeed exponential in the number of post-selected qubits. One might worry that this conclusion is wrong: what if we simply reversed time, ignoring the fact that the measurement had happened, until we got back to a time before the black hole ever formed? Once there was no black hole, it would presumably be easy to get back to a simple state using a simple circuit. The answer (see [51,33,35,36] for similar discussion) is that, by measuring the Hawking radiation, we necessarily create a small positive-energy localized shock that approaches the horizon of the black hole as we go backwards in time. At an infalling time approximately one scrambling time in the past, the backreaction of this shock becomes significant and it turns the black hole into a white hole. If we continue to evolve the boundary backwards in time, the black hole will never disappear and the interior modes will never escape. Instead we will just see a time-reversed version of Hawking radiation coming out of the newly created white hole. This is shown in Fig. 27.

Figure 27: (a) Penrose diagram for the spacetime of a one-sided black hole, with the Hawking radiation extracted and then measured. (b) If we reverse the time evolution after the measurement is done, the black hole won't disappear as one might naively expect. Instead, at one scrambling time in the past, the backreaction from the measurement (shown in red as a small shockwave) will create a white hole. The interior partners of the Hawking radiation will therefore never escape from behind the horizon.
How does this correspond to a tensor network model? As we evolve time backwards, the simplest tensor network describing the state initially becomes smaller, as the reverse time evolution undoes tensors that were previously added to the network by the forward evolution. However, we cannot 'undo' the projections created by the measurement of Hawking radiation. At one scrambling time in the past, these projections will have infected the entire cross section of the network. Further backwards time evolution cannot remove any more tensors from the network, and so instead will have to add new tensors: the size of the simplest tensor network therefore 'bounces' from this minimal size and begins to increase. This corresponds to the appearance of the white hole horizon.
Another example of a one-sided Python's Lunch is the brane-in-a-wormhole state discussed at the start of Sec. 6. In that section, we used it as an example of a two-sided lunch. However, the union of the left and right extremal surfaces is homologous to the union of the left and right boundaries. We therefore have a non-empty classical extremal surface for the union of the two boundaries, which creates a one-sided Python's Lunch. This suggests that even the unrestricted unitary complexity of the state may be very large. This is perfectly consistent, since we only know how to prepare such a state using a non-unitary Euclidean evolution.
We have argued that states with a non-empty extremal surface for the entire global boundary (i.e. a one-sided Python's Lunch) have high unrestricted unitary complexity. A good consistency check on this claim is that such states cannot be created by semiclassical Lorentzian evolution from states with a Cauchy slice that is entirely within the causal wedge of the boundary. This is indeed true: the entanglement wedge, defined using any extremal surface (not just the HRT surface) must contain the causal wedge, and hence the entire spacetime, by standard focussing arguments (this requires the generalized second law or quantum focussing conjecture [52] in the case of quantum extremal surfaces). It is therefore entirely consistent that they should always have very high unitary circuit complexity.

Post-selected State Complexity
The black hole with measured Hawking radiation appears to be a counterexample to the idea that unitary circuit complexity equals volume/action. The volume and action only grow linearly with the time the black hole is allowed to evaporate for, but the size of the simplest unitary circuit required to produce the state appears to grow exponentially. One response to this problem is to note that this state could only be produced by measuring Hawking radiation, and that a measurement really corresponds to entangling the state with an ancilla measurement apparatus. If we include the measurement apparatus, the unrestricted complexity of the state (including the measurement apparatus) will still be small.
This argument is somewhat unsatisfying. It would be nice to have a boundary quantity (such as complexity) that corresponds to volume/action even for states that can only be produced using post-selection, or by Euclidean path integrals as with the first example from Sec. 6.
There is an obvious candidate quantity. We just redefine the notion of state complexity to allow post-selection onto simple states (say $|0\rangle$). Equivalently, we define it as the size of the smallest tensor network required to make the state. With the usual definitions, the complexity of creating a state from a simple state is the same as the complexity of starting with the state in question and returning to the simple state. But when post-selections are allowed, this changes. The relevant complexity is $C_p(I, \psi)$, i.e. the complexity of creating the state $|\psi\rangle$ from the simple state $|0\rangle^{\otimes n}$, using both unitary gates and simple projections.
For specific states $|\psi\rangle$, allowing projections may dramatically reduce the number of operations required, as we have seen. However, for typical states, projections don't buy you much. As we shall see below with a counting argument, it still takes $\sim 4^N$ operations to get to the most general state.
As an aside, it is worth noting that it doesn't significantly matter whether we allow post-selection at arbitrary intermediate points in the state preparation process, or only at the end after all the unitary gates have been applied. This is because we can always implement the desired post-selection by using a unitary operator that 'measures' the relevant register into an ancilla quantum register (this is sometimes called a von Neumann measurement of the first kind) and then post-selecting the ancilla register after all the other unitaries have been applied.
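The deferred post-selection argument can be checked numerically on a toy example. The sketch below is illustrative only: the two-qubit system (qubits A and B), the random gates and all function names are assumptions, not taken from the text. It compares projecting qubit A onto $|0\rangle$ mid-circuit with copying A into a fresh ancilla C by a CNOT and post-selecting C only after the later gate has acted.

```python
import numpy as np


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases


def postselect_mid_circuit(psi_ab, u1, u2):
    """Apply u1 to AB, project qubit A onto |0>, then apply u2 (unnormalized result)."""
    v = (u1 @ psi_ab).reshape(2, 2)  # index order [a, b]
    v[1, :] = 0.0                    # discard the a = 1 branch
    return u2 @ v.reshape(4)


def postselect_at_end(psi_ab, u1, u2):
    """Copy A into ancilla C with a CNOT, apply u2 to AB, post-select C = 0 last."""
    state = np.zeros((2, 2, 2), dtype=complex)  # index order [a, b, c]
    state[:, :, 0] = (u1 @ psi_ab).reshape(2, 2)
    copied = state.copy()
    copied[1] = state[1][:, ::-1]        # CNOT: flip c when a = 1
    evolved = u2 @ copied.reshape(4, 2)  # u2 acts on the joint (a, b) index
    return evolved[:, 0]                 # keep only the ancilla outcome c = 0


rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
u1, u2 = random_unitary(4, rng), random_unitary(4, rng)
mid = postselect_mid_circuit(psi, u1, u2)
end = postselect_at_end(psi, u1, u2)
print(np.allclose(mid, end))
```

Both routes produce the same unnormalized vector, so they also succeed with the same probability.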

Post-selected Complexity can be Exponential
The maximal unitary circuit complexity of an $N$-qubit state scales exponentially with $N$. This can be established with a counting argument. On the one hand, Hilbert space is double-exponentially huge:
$$\text{number of } \epsilon\text{-balls in } N\text{-qubit Hilbert space} \sim 2^{2^N}. \tag{7.1}$$
On the other hand, the number of states that can be made with $C$ gates (for definiteness, let's say we have a universal 2-local gate together with a 1-local phase) is merely exponentially large. At each step, we can apply our 2-local gate in one of $\binom{N}{2}$ places, or apply our 1-local phase in one of $\binom{N}{1}$ places, so
$$\text{number of states made with } C \text{ gates} \sim \left[\binom{N}{2} + \binom{N}{1}\right]^C. \tag{7.2}$$
In order to reach all the states, $C$ must be exponentially big.
However, we have now changed our definition of 'complexity'. We have given ourselves the power not only to apply 2-local gates, but also to project the first $m$ qubits onto $|00\cdots0\rangle$. Since this increases our power, it decreases the complexity. This gives rise to the state-synthesis version of the PostBQP complexity class, and as we saw in Sec. 4.3 this is fantastically powerful: there are many quantum states that would normally be exponentially hard to make that are now easy. Are they, in fact, all now easy? Let's prove that the answer to this question is 'no'. We will argue that even granting ourselves the power to post-select, there are still states that are exponentially hard to make. This can again be established with a counting argument. We have more options than before. At each step, as well as applying our 2-local gate or our 1-local phase, we can also project on the first $m$ qubits for any $1 \le m \le N$. Thus the number of different states we can make is
$$\text{number of states made with } C \text{ operations} \sim \left[\binom{N}{2} + \binom{N}{1} + N\right]^C, \tag{7.3}$$
which, and this is the point, is still only exponentially big. We still need exponentially large $C$ to hit all of the double-exponentially numerous $\epsilon$-balls.
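The two counting estimates can be compared with a few lines of arithmetic. This is a rough sketch that, like the argument above, ignores the $\epsilon$-dependence of the ball count and treats every sequence of operations as producing a distinct state; the function names are hypothetical.

```python
import math


def log2_eps_balls(n_qubits):
    """log2 of the ~2^(2^N) count of eps-balls in N-qubit Hilbert space (Eq. 7.1)."""
    return 2.0 ** n_qubits


def log2_choices_per_step(n_qubits, postselect=False):
    """log2 of the number of options at each step of the circuit.

    Unitary gate set: a 2-local gate on one of C(N, 2) pairs or a 1-local phase
    on one of N qubits. With post-selection we may also project the first m
    qubits onto |0...0> for any 1 <= m <= N, adding N more options.
    """
    choices = math.comb(n_qubits, 2) + n_qubits
    if postselect:
        choices += n_qubits
    return math.log2(choices)


def min_operations(n_qubits, postselect=False):
    """Smallest C with (#choices)^C >= #eps-balls: a lower bound on worst-case complexity."""
    return math.ceil(log2_eps_balls(n_qubits) / log2_choices_per_step(n_qubits, postselect))


for n in (8, 16, 24):
    print(n, min_operations(n), min_operations(n, postselect=True))
```

Adding post-selection only changes the base of the exponent by an additive $N$, so the required number of operations still grows like $2^N/\log N$.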
It would be nice to have a definition of post-selected state complexity that did not rely on a choice of discretization of the Hilbert space into $\epsilon$-balls. For unitary state complexity, one such definition is the smallest geodesic distance from the identity to a unitary taking one state to the other in the so-called Nielsen geometry [50]. This is a right-invariant (but not left-invariant) metric on the space of unitaries where distances are much smaller in simple directions (generated by $k$-local Hamiltonians) than in other directions.
However, if we allow post-selection on a single qubit, then any state can be prepared using a unitary that has arbitrarily small complexity as measured by the Nielsen geometry. The reason for this is that the Nielsen metric (like any metric) is continuous. Hence the complexity of a unitary mapping
$$|0\rangle|\psi\rangle \to |0\rangle|\psi\rangle + \epsilon|1\rangle|\phi\rangle \quad (\text{up to normalization}) \tag{7.4}$$
can be made arbitrarily small by making $\epsilon$ sufficiently small. However, we can always post-select onto $|1\rangle$ and produce $|\phi\rangle$, no matter how small $\epsilon$ is. Instead, it seems that the right continuous measure of post-selected state complexity would be to make the cost of the post-selection $-\log p$, where $p$ is the probability of obtaining the correct measurement outcome. For typical states, where the amplitudes of the post-selected outcomes are not exceptionally large or small, this still corresponds to an $O(1)$ cost for each post-selected qubit.
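A toy check of why continuity of the Nielsen metric does not control post-selected complexity, and of how a $-\log p$ cost behaves. We assume the normalized form $\sqrt{1-\epsilon^2}\,|0\rangle|\psi\rangle + \epsilon|1\rangle|\phi\rangle$ of the state in Eq. (7.4) and random 3-qubit states $|\psi\rangle$, $|\phi\rangle$; the helper name is illustrative.

```python
import numpy as np


def tiny_rotation_then_postselect(psi, phi, eps):
    """Prepare sqrt(1 - eps^2)|0>|psi> + eps|1>|phi>, then post-select the first qubit on |1>.

    Returns the normalized post-selected state, the success probability p,
    and the continuous cost -log p.
    """
    state = np.stack([np.sqrt(1.0 - eps**2) * psi, eps * phi])  # rows: first qubit 0 / 1
    branch = state[1]
    p = float(np.vdot(branch, branch).real)
    return branch / np.sqrt(p), p, -np.log(p)


rng = np.random.default_rng(1)
psi, phi = (v / np.linalg.norm(v) for v in rng.normal(size=(2, 8)))
for eps in (1e-1, 1e-3, 1e-6):
    out, p, cost = tiny_rotation_then_postselect(psi, phi, eps)
    print(eps, np.allclose(out, phi), p, cost)
```

However small $\epsilon$ is, the output is exactly $|\phi\rangle$; only the cost $-\log p = -2\log\epsilon$ registers how unlikely the outcome was. For an outcome with $p = 1/2$ the cost is $\log 2$, i.e. $O(1)$ per qubit.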

Summary
This paper has addressed an apparent inconsistency between the holographic complexity conjectures [2,3,4] and the Harlow-Hayden result [1]. The inconsistency is manifest in an evaporating black hole slightly after the Page time: on the one hand, the volume or action of the black hole is only polynomial in the entropy S, and thus the holographic complexity must be moderate; on the other hand, Harlow & Hayden argue that the complexity of decoding the Hawking radiation must be exponentially large. The difference arises from using different definitions of complexity. The holographic complexity conjectures relate the volume/action of the geometry to unrestricted complexity, which allows gates that span the entire system; whereas Harlow & Hayden's result is about restricted complexity, which forbids gates that couple the interior of the black hole to the previously emitted Hawking radiation.
This distinction motivated us to ask: if action or volume are the geometric duals of unrestricted complexity [2,3,4], what is the geometric dual of restricted complexity?
We conjectured an answer. Exponentially large restricted complexity corresponds to the existence of a geometrical feature that we call a "Python's lunch". In a Python's lunch, the cross-sectional area of the wormhole grows and then shrinks again, in a min-max-min pattern. The restricted complexity, we conjectured in Eq. 4.6, is given by the exponential of the difference between the area of the maximum and the area of the larger of the two flanking minima.
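The min-max-min pattern can be read off a sampled area profile. The sketch below is a hypothetical helper, assuming a profile with a single interior bulge and areas already expressed in the units that enter the conjecture. It returns the difference between the bulge and the larger of the two flanking minima, the quantity that is exponentiated in Eq. 4.6 (whose precise normalization is not reproduced here).

```python
def lunch_size(profile):
    """Return (bulge, larger flanking minimum, difference) for a sampled area profile.

    `profile` lists cross-sectional areas (or generalized entropies) along the
    wormhole from one end to the other. Assumes a single interior maximum; a
    non-positive difference signals that there is no Python's lunch.
    """
    if len(profile) < 3:
        raise ValueError("need at least three samples")
    interior = range(1, len(profile) - 1)
    i_max = max(interior, key=lambda i: profile[i])
    left_min = min(profile[:i_max])
    right_min = min(profile[i_max + 1:])
    bulge = profile[i_max]
    larger_min = max(left_min, right_min)
    return bulge, larger_min, bulge - larger_min


print(lunch_size([3, 2, 5, 9, 6, 4, 7]))
```

The larger minimum, not the smaller one, sets the exponent, because the decoder must in effect squeeze through the bulge starting from the wider of the two constrictions.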
We tested this conjecture in a toy tensor-network model, and found agreement with the Harlow-Hayden estimate. We then made a covariant version of our conjecture, Eq. 4.9, by replacing the min and max areas of the Python's Lunch with generalized entropies of appropriate quantum extremal surfaces. With this generalization, we studied several examples of the Python's Lunch and estimated the restricted complexities in each case, including evaporating black holes, one-sided pure-state black holes, and empty AdS with two disjoint intervals. In all cases where we were able to test our conjecture, the restricted complexity was consistent with the size of the Python's Lunch.
Lastly, in Sec. 7 we returned to the subject of unrestricted complexity. We studied the example of black holes that have had all their Hawking radiation measured, and which therefore have been rendered pure. Using this example, we reconsidered exactly which boundary quantity it is that is holographically dual to the volume or action of the wormhole. In Refs. [2,3,4] it was conjectured that this quantity is the unrestricted unitary circuit complexity, which means the allowed primitive gates are all unitary. Instead, we argued that the definition of unrestricted holographic complexity should also permit non-unitary post-selection: holographic complexity should allow projections onto simple states.

A Complexity of Post-selection
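A lemma we will use repeatedly in this appendix is that if an $N$-qubit state $|\psi\rangle$ is simple, then so too is the unitary
$$\mathbb{1} - 2|\psi\rangle\langle\psi|.$$
This unitary flips the sign of the $|\psi\rangle$ component of a wavefunction, while leaving all orthogonal components unchanged. Let's explicitly construct a simple circuit that does this. First, note that what it means for $|\psi\rangle$ to be simple is precisely that there is a simple unitary $U_{|0\rangle\to|\psi\rangle}$ that connects it to the reference state, $|\psi\rangle = U_{|0\rangle\to|\psi\rangle}|0\rangle$. We can therefore write
$$\mathbb{1} - 2|\psi\rangle\langle\psi| = U_{|0\rangle\to|\psi\rangle}\,\big(\mathbb{1} - 2|0\rangle\langle 0|\big)\,U_{|0\rangle\to|\psi\rangle}^\dagger.$$
This first transforms to a basis in which the $|\psi\rangle$ component of the wavefunction becomes the $|0\rangle$ component, then flips the phase of the $|0\rangle$ component, then transforms back again. This construction upper bounds the complexity
$$C\big[\mathbb{1} - 2|\psi\rangle\langle\psi|\big] \le 2\,C\big[U_{|0\rangle\to|\psi\rangle}\big] + C\big[\mathbb{1} - 2|0\rangle\langle 0|\big].$$
We can understand the factor of 2 in this equation as arising from the fact that to make $\mathbb{1} - 2|\psi\rangle\langle\psi|$ the protocol sweeps twice over the circuit that manufactures the state, first to unmake it (the $\langle\psi|$ part) then to make it again (the $|\psi\rangle$ part).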
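The lemma is easy to confirm numerically. In this sketch a random unitary on three qubits stands in for the preparation circuit $U_{|0\rangle\to|\psi\rangle}$ (an assumption for illustration only: a random unitary is of course not simple), and we check that conjugating the reflection about $|0\rangle$ gives the reflection about $|\psi\rangle$.

```python
import numpy as np


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


def reflection_about(state):
    """The unitary 1 - 2|state><state| built directly from the vector."""
    return np.eye(len(state)) - 2.0 * np.outer(state, state.conj())


def reflection_from_preparation(u_prep):
    """U (1 - 2|0><0|) U^dagger: unprepare, flip the |0> component, re-prepare."""
    zero = np.zeros(u_prep.shape[0])
    zero[0] = 1.0
    return u_prep @ reflection_about(zero) @ u_prep.conj().T


rng = np.random.default_rng(2)
u_prep = random_unitary(8, rng)  # stand-in 3-qubit preparation circuit
psi = u_prep[:, 0]               # |psi> = U|0>
print(np.allclose(reflection_from_preparation(u_prep), reflection_about(psi)))
```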

A.1 Projecting on a ququit
Suppose we have a simple unitary $U$ that maps a simple state to a superposition
$$U|s\rangle|1\rangle = \tfrac{1}{2}\big(|\alpha\rangle|1\rangle + |\beta\rangle|2\rangle + |\gamma\rangle|3\rangle + |\delta\rangle|4\rangle\big),$$
where the last factor is a ququit. Since $U$ is simple, and $|s\rangle|1\rangle$ is easy to make, the right-hand side must also be easy to make.
• Question: how complex can it be to make $|\alpha\rangle|1\rangle$?
As we will see, the answer is "not complex".
One simple strategy to make $|\alpha\rangle|1\rangle$ is just to make $U|s\rangle|1\rangle$ and then measure the last ququit. Sometimes we'll find the last ququit to be $|2\rangle$, $|3\rangle$, or $|4\rangle$; if that happens we throw the state away and start over. Other times we'll find the last ququit to be $|1\rangle$ and can declare victory. In this way we can make an $|\alpha\rangle$-factory that has efficiency $1/4$. For some situations, this simple strategy suffices. But if the projection is a step buried deep within a larger circuit, starting again might not be so easy. Or if our initial state is entangled with another state that we do not control, then starting again might be impossible. And if we want to project not 1 ququit but $N$ ququits, the probability of success falls like $2^{-2N}$.
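The solution to this problem is to use a close cousin of Grover's algorithm, as we will now explore. First, let's define the 'pre-images' $|\tilde\alpha\rangle \equiv U^\dagger|\alpha\rangle|1\rangle$, $|\tilde\beta\rangle \equiv U^\dagger|\beta\rangle|2\rangle$, $|\tilde\gamma\rangle \equiv U^\dagger|\gamma\rangle|3\rangle$ and $|\tilde\delta\rangle \equiv U^\dagger|\delta\rangle|4\rangle$ of the four possible answers we could have gotten, so that
$$|s\rangle|1\rangle = \tfrac{1}{2}\big(|\tilde\alpha\rangle + |\tilde\beta\rangle + |\tilde\gamma\rangle + |\tilde\delta\rangle\big).$$
Next observe that it is simple to make the operator $U_\alpha$ that flips the sign of the $|\tilde\alpha\rangle$ term in a wavefunction while leaving the other terms invariant. We can do this by evolving the state with $U$, doing a sign flip controlled on the last ququit being $|1\rangle$, and then evolving back with $U^\dagger$:
$$U_\alpha = U^\dagger\big(\mathbb{1} - 2\,\mathbb{1}\otimes|1\rangle\langle 1|\big)U.$$
Finally, since $|s\rangle|1\rangle$ is by assumption easy to build, it must also be easy to build the operator that flips everything except $|s\rangle|1\rangle$,
$$U_s = 2\,|s\rangle|1\rangle\langle s|\langle 1| - \mathbb{1}.$$
Now we concatenate these easy operations to give the desired projection
$$U\,U_s\,U_\alpha\,|s\rangle|1\rangle = |\alpha\rangle|1\rangle.$$
Note that the only reason this worked was because there was an easily addressed ququit to diagnose the final branch. If we just had $U|s\rangle = \tfrac{1}{2}\big(|\alpha\rangle + |\beta\rangle + |\gamma\rangle + |\delta\rangle\big)$ then this method wouldn't have worked, and indeed couldn't have worked, since there is such a decomposition for any $|\alpha\rangle$, even if $U$ is the identity.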
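The one-shot ququit projection can be checked with a small real-valued example. Assumptions (all illustrative): the system is a single qubit, $|s\rangle$ is its first basis state, the ququit labels $|1\rangle,\dots,|4\rangle$ are stored as indices 0 to 3, the four branch states are random unit vectors, and $U$ is taken to be a Householder reflection that sends $|s\rangle|1\rangle$ to the equal-weight superposition. The operators follow the construction in the paragraph above: a sign flip on the pre-image of the $|1\rangle$ branch, then a reflection about $|s\rangle|1\rangle$.

```python
import numpy as np


def householder_map(e, t):
    """Real orthogonal (and symmetric) reflection sending unit vector e to unit vector t."""
    w = e - t
    if np.allclose(w, 0):
        return np.eye(len(e))
    return np.eye(len(e)) - 2 * np.outer(w, w) / (w @ w)


def one_shot_ququit_projection(seed=0, sys_dim=2):
    rng = np.random.default_rng(seed)
    q_dim = 4
    branches = [v / np.linalg.norm(v) for v in rng.normal(size=(q_dim, sys_dim))]
    labels = np.eye(q_dim)  # ququit |1>, ..., |4> stored at indices 0..3
    target = 0.5 * sum(np.kron(b, labels[k]) for k, b in enumerate(branches))
    s1 = np.kron(np.eye(sys_dim)[0], labels[0])  # |s>|1>
    U = householder_map(s1, target)              # U|s>|1> = target
    P1 = np.kron(np.eye(sys_dim), np.outer(labels[0], labels[0]))
    U_alpha = U.T @ (np.eye(len(s1)) - 2 * P1) @ U  # flips the pre-image of |alpha>|1>
    U_s = 2 * np.outer(s1, s1) - np.eye(len(s1))    # flips everything except |s>|1>
    out = U @ U_s @ U_alpha @ s1
    return out, np.kron(branches[0], labels[0])


out, expected = one_shot_ququit_projection()
print(np.allclose(out, expected))
```

A single Grover-type step is exact here because the target branch has amplitude exactly $1/2$.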

A.2 Projecting on a qudit
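The case of the last subsection was misleadingly easy: we needed only a single application of the Grover step $U_s V$ (written $U_s U_\alpha$ there) to hit the target state exactly. More generally, we may wish to project onto the value of a qudit, where we can think of $d = 2^m$ for $m$ qubits or $m/2$ ququits. And it may be that the amplitudes of the states are not evenly distributed. Let's start with an initial state $|s\rangle|0\rangle^{\otimes n}$ on a total of $k$ qubits, with $n$ ancillary qubits in the state $|0\rangle$. We then wish to project (post-select) onto the outcome of $m$ qubits. The output state after the unitary $U$ is applied can be expressed as
$$U|s\rangle|0\rangle^{\otimes n} = \sin\theta\,|\alpha\rangle|0\rangle^{\otimes m} + \cos\theta\,|\beta\rangle,$$
where $|\beta\rangle$ is some normalized state with $\langle 0|^{\otimes m}|\beta\rangle = 0$. Repeating the procedure of the last subsection gives
$$U\,U_s V\,|s\rangle|0\rangle^{\otimes n} = \sin 3\theta\,|\alpha\rangle|0\rangle^{\otimes m} + \cos 3\theta\,|\beta\rangle, \tag{A.12}$$
where we denote by
$$U_s \equiv 2\,|s\rangle|0\rangle^{\otimes n}\langle s|\langle 0|^{\otimes n} - \mathbb{1}, \tag{A.13}$$
$$V \equiv U^\dagger\big(\mathbb{1} - 2\,\mathbb{1}\otimes|0\rangle\langle 0|^{\otimes m}\big)U, \tag{A.14}$$
for $n$ ancillary and $m$ post-selected qubits. We see that in the last subsection we got lucky, since $\sin\theta = \tfrac12$ implies $\theta = \pi/6$ and hence $\sin 3\theta = \sin(\pi/2) = 1$. For more general $\theta$, a single iteration will not yield the desired projection. Iterating $l$ times gives
$$U\,(U_s V)^l\,|s\rangle|0\rangle^{\otimes n} = \sin[(2l+1)\theta]\,|\alpha\rangle|0\rangle^{\otimes m} + \cos[(2l+1)\theta]\,|\beta\rangle. \tag{A.15}$$
Thus the number of iterations needed to implement the projection is given by $(2l+1)\theta = \pi/2$, i.e.
$$l = \frac{\pi}{4\theta} - \frac{1}{2}. \tag{A.16}$$
When $U|s\rangle|0\rangle^{\otimes n}$ is an equal superposition of $d\,(=2^m)$ states, we have $\theta = \arcsin(1/\sqrt{d})$, and this gives the celebrated large-$d$ Grover scaling $l \sim \frac{\pi}{4}\sqrt{d}$.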
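The rotation-by-$2\theta$ behaviour of the iteration can be checked directly. In this sketch the assumptions are: a random real orthogonal $U$ on $k = 5$ qubits, $|s\rangle|0\rangle^{\otimes n}$ taken to be the first basis vector, and post-selection on the leading $m = 2$ qubits being $|00\rangle$. The post-selected amplitude after $l$ steps should equal $|\sin[(2l+1)\theta]|$, and the ideal iteration count is $\pi/(4\theta) - 1/2$.

```python
import numpy as np


def random_orthogonal(dim, rng):
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))


def grover_iterations(theta):
    """Ideal (generally non-integer) number of Grover steps, pi/(4 theta) - 1/2."""
    return np.pi / (4 * theta) - 0.5


def grover_amplitudes(k=5, m=2, iterations=6, seed=1):
    """Return theta and the post-selected amplitude after l = 0..iterations steps."""
    rng = np.random.default_rng(seed)
    dim = 2 ** k
    U = random_orthogonal(dim, rng)
    start = np.zeros(dim)
    start[0] = 1.0                                 # |s>|0...0>
    mask = np.zeros(dim)
    mask[: dim // 2 ** m] = 1.0                    # leading m qubits in |0...0>
    P = np.diag(mask)
    V = U.T @ (np.eye(dim) - 2 * P) @ U            # sign flip on the target pre-image
    U_s = 2 * np.outer(start, start) - np.eye(dim)  # reflection about the start state
    theta = np.arcsin(np.linalg.norm(P @ U @ start))
    amps, state = [], start
    for _ in range(iterations + 1):
        amps.append(np.linalg.norm(P @ U @ state))
        state = U_s @ (V @ state)
    return theta, amps


theta, amps = grover_amplitudes()
print(theta, grover_iterations(theta))
print([round(a, 4) for a in amps])
```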
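Finally, we must confront the possibility that Eq. A.16 does not give a whole number. This is problematic since, in general, even if $U$ is easy to implement, $\sqrt{U}$ may be hard. We could lower our ambitions by implementing only the integer part of $l$ and settling for an approximate projection. But we can do better. We first introduce a fresh qubit, and then use it to bleed some of the amplitude out of $\sin\theta\,|\alpha\rangle|0\rangle^{\otimes m}$:
$$|s'\rangle = U_\phi U|s\rangle|0\rangle^{\otimes n}|0\rangle = \cos\phi\,\sin\theta\,|\alpha\rangle|0\rangle^{\otimes m}|0\rangle + \sin\phi\,\sin\theta\,|\alpha\rangle|0\rangle^{\otimes m}|1\rangle + |\text{other}\rangle. \tag{A.17}$$
Then we repeat the iterative procedure using $|s'\rangle$ instead of $|s\rangle|0\rangle^{\otimes n}$ and the identity instead of $U$, and carefully choose $\phi$ so that the iteration count lands on the next integer greater than $\frac{\pi}{4\theta} - \frac12$:
$$\frac{\pi}{4\theta'} - \frac12 = \left\lceil \frac{\pi}{4\theta} - \frac12 \right\rceil, \qquad \sin\theta' \equiv \cos\phi\,\sin\theta. \tag{A.18}$$
Even though it is in general not easy to implement a fractional power of an easy unitary, for the specific unitaries we are considering it is.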
Let us see what this analysis means for the simplest possible case, that of an evenly split qubit, with $\theta = \pi/4$. Equation A.12 made the situation look hopeless: we would just cycle in a loop forever, $\theta = \pi/4 \to 3\pi/4 \to \pi/4 \to 3\pi/4 \to \cdots$, never getting any closer to the target state. But now we see that the correct procedure is to first add an extra qubit, and then use $\phi = \pi/4$ to transform the pair of qubits into an even superposition of a ququit, returning us to the exactly implementable example of Sec. A.1.
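The choice of bleed angle is a short calculation. The hypothetical helper below picks the smallest whole number of iterations $l \ge \pi/(4\theta) - 1/2$, sets the reduced angle $\theta'$ with $(2l+1)\theta' = \pi/2$, and solves $\sin\theta' = \cos\phi\,\sin\theta$ for $\phi$, matching the amplitude $\cos\phi\,\sin\theta$ of the kept branch in Eq. A.17. For $\theta = \pi/4$ it reproduces $\phi = \pi/4$ with a single iteration.

```python
import math


def pad_angle(theta):
    """Return (l, theta_new, phi) so that l Grover steps land exactly on the target.

    Assumes 0 < theta <= pi/2. l = ceil(pi/(4 theta) - 1/2), theta_new satisfies
    (2l + 1) theta_new = pi/2, and sin(theta_new) = cos(phi) sin(theta).
    """
    l = math.ceil(math.pi / (4 * theta) - 0.5 - 1e-12)  # tolerance keeps exact integers
    theta_new = math.pi / (2 * (2 * l + 1))
    ratio = min(1.0, math.sin(theta_new) / math.sin(theta))  # guard rounding above 1
    return l, theta_new, math.acos(ratio)


print(pad_angle(math.pi / 4))
print(pad_angle(0.1))
```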

A.3 Projecting on very unlikely outcomes
Suppose we wish to project onto a final state that has tiny amplitude, $\theta \ll 1$. For example, we may have an equal superposition over $m$ qubits with large $m$, giving $\theta = 2^{-m/2}$. How complex is this projection? Let us examine three possible methods:
• Measure-and-pray. If we measure the qubits and hope for the right answer, the probability that we get lucky is $\theta^2$, giving
$$\text{measure-and-pray: complexity} \sim \theta^{-2} = d = 2^m. \tag{A.19}$$
• All-at-once Grover-style projection. Using the method of Sec. A.2, we saw in Eq. A.18 that we can effect a square-root speed-up,
$$\text{all-at-once Grover: complexity} \sim \theta^{-1} = \sqrt{d} = 2^{m/2}. \tag{A.20}$$
• Step-by-step Grover-style projection. In Eq. A.20, we simultaneously projected onto the values of all the target qubits. An alternative strategy would be to project on each target ququit in turn, so that each individual projection is then onto a state that is not particularly unlikely. As discussed in

A.4 Removing the state dependence
So far we have only tried to construct a unitary that produces a single output state $|\alpha\rangle$, given input $|s\rangle|0\rangle^{\otimes n}$ and unitary $U$. We used a unitary sequence that works only for the particular input state $|s\rangle$. Our actual task is somewhat more complicated. We need to construct a unitary circuit that produces the same output as our post-selected circuit for any input state $|s\rangle$. In other words, the post-selected circuit implements some unitary $\tilde U$,
$$\big(\mathbb{1}\otimes\langle 0|^{\otimes m}\big)\,U\,|s\rangle|0\rangle^{\otimes n} = \frac{1}{\sqrt{C}}\,\tilde U|s\rangle \quad \text{for all } |s\rangle, \tag{A.23}$$
where $U$ is a simple unitary and $C$ is a numerical constant, and we want to find a unitary circuit that implements $\tilde U$ directly. It turns out that we can easily adapt our construction from the previous subsection to produce such a circuit, if such a unitary $\tilde U$ exists at all. Our construction is very closely related to robust oblivious amplitude amplification, which was independently introduced in [20,21,22]. We became aware of this work after this section of the manuscript had been completed. Importantly, for the moment, we shall assume that an exact unitary $\tilde U$ exists that exactly satisfies (A.23). We shall discuss what happens when (A.23) can only be satisfied approximately at the end of this subsection.
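Let $|i\rangle$ be a computational basis for the input Hilbert space. By our assumption that there exists some unitary $\tilde U$ satisfying (A.23), it follows that
$$\big(\mathbb{1}\otimes\langle 0|^{\otimes m}\big)\,U\,|i\rangle|0\rangle^{\otimes n} = \frac{1}{\sqrt{C}}\,\tilde U|i\rangle. \tag{A.24}$$
This implies that
$$U|i\rangle|0\rangle^{\otimes n} = \sin\theta\,|\alpha_i\rangle|0\rangle^{\otimes m} + \cos\theta\,|\beta_i\rangle, \tag{A.25}$$
where $\theta$ is independent of $i$, $\sin\theta = 1/\sqrt{C}$, and
$$\langle\alpha_i|\alpha_j\rangle = \langle\beta_i|\beta_j\rangle = \delta_{ij}, \qquad \langle 0|^{\otimes m}|\beta_i\rangle = 0 \tag{A.26}$$
for all $i$. We can get rid of the state dependence of the original protocol by replacing $U_s$ in (A.13) with
$$W \equiv 2\,\mathbb{1}\otimes|0\rangle\langle 0|^{\otimes n} - \mathbb{1}. \tag{A.27}$$
This is independent of the input state $|s\rangle$. For an arbitrary state $|s\rangle$, we again have
$$U|s\rangle|0\rangle^{\otimes n} = \sin\theta\,|\alpha\rangle|0\rangle^{\otimes m} + \cos\theta\,|\beta\rangle, \tag{A.28}$$
and one can check that, on the two-dimensional subspace spanned by $U^\dagger|\alpha\rangle|0\rangle^{\otimes m}$ and $U^\dagger|\beta\rangle$, $W$ acts exactly as the state-dependent reflection $U_s$ does. The key step is that (A.25) and (A.26) hold with the same $\theta$ for every basis state $|i\rangle$. The rest of the argument is identical to that in Appendix A.2, proving that $U(WV)^l|s\rangle|0\rangle^{\otimes n}$ will output the state $|\alpha\rangle|0\rangle^{\otimes m}$ with very high probability for any $|s\rangle$ and $l \approx \pi/(4\theta)$ iterations of the Grover step. For typical values of $\theta \sim 2^{-m/2}$, the complexity is $O[C(U)\,2^{m/2}]$ as before.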
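The state-independent iteration can be tested on a small example. This is a sketch under stated assumptions, not the construction of [20,21,22]: we take $n = m$ (so the input and output ancilla projectors coincide), build a random real orthogonal $U$ with exactly the structure $U|i\rangle|0\rangle^{\otimes n} = \sin\theta\,|\alpha_i\rangle|0\rangle^{\otimes m} + \cos\theta\,|\beta_i\rangle$ with orthonormal $|\alpha_i\rangle$ and $|\beta_i\rangle$, and pick $\theta = \pi/10$ so that $l = 2$ steps land exactly. Here $W = 2\,\mathbb{1}\otimes|0\rangle\langle 0|^{\otimes n} - \mathbb{1}$ is the input-independent reflection and $V$ the sign flip on the post-selected branch; the same $W$ is used for every input $|s\rangle$.

```python
import numpy as np


def structured_unitary(sys_qubits, anc_qubits, theta, rng):
    """Random real orthogonal U with U|i>|0^n> = sin(theta)|alpha_i>|0^m> + cos(theta)|beta_i>.

    Assumes n = m = anc_qubits. Index order is (system, ancilla), ancilla fastest,
    so |i>|0...0> is basis vector i * A.
    """
    D, A = 2 ** sys_qubits, 2 ** anc_qubits
    dim = D * A
    zero_anc = np.zeros((A, 1))
    zero_anc[0, 0] = 1.0
    U_tilde, _ = np.linalg.qr(rng.normal(size=(D, D)))   # alpha_i = columns of U_tilde
    off = [j for j in range(dim) if j % A != 0]          # ancilla not in |0...0>
    B, _ = np.linalg.qr(rng.normal(size=(len(off), D)))  # orthonormal beta_i live there
    beta = np.zeros((dim, D))
    beta[off, :] = B
    cols = np.sin(theta) * np.kron(U_tilde, zero_anc) + np.cos(theta) * beta
    # Complete the D orthonormal columns to a full orthogonal matrix.
    Q, _ = np.linalg.qr(np.hstack([cols, rng.normal(size=(dim, dim - D))]))
    Q[:, :D] = cols
    U = np.zeros((dim, dim))
    U[:, [i * A for i in range(D)]] = Q[:, :D]
    U[:, off] = Q[:, D:]
    return U, U_tilde


def oblivious_amplify(U, s, anc_qubits, steps):
    """Apply U (W V)^steps to |s>|0^n> using the input-independent reflection W."""
    dim, A = U.shape[0], 2 ** anc_qubits
    mask = np.zeros(dim)
    mask[::A] = 1.0
    Pi = np.diag(mask)                    # ancilla register in |0...0> (n = m here)
    W = 2 * Pi - np.eye(dim)              # reflection that never looks at |s>
    V = U.T @ (np.eye(dim) - 2 * Pi) @ U  # sign flip on the post-selected branch
    state = np.kron(s, np.eye(A)[0])      # |s>|0...0>
    for _ in range(steps):
        state = W @ (V @ state)
    return U @ state


rng = np.random.default_rng(3)
steps = 2
theta = np.pi / (2 * (2 * steps + 1))  # (2l + 1) theta = pi/2 exactly
U, U_tilde = structured_unitary(2, 2, theta, rng)
for _ in range(3):
    s = rng.normal(size=4)
    s /= np.linalg.norm(s)
    out = oblivious_amplify(U, s, 2, steps)
    print(np.allclose(out, np.kron(U_tilde @ s, np.eye(4)[0])))
```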
It is important to note that the protocol we just constructed relied crucially on the assumption that there exists an exact unitary satisfying Eq. A.23. If it is instead only approximate (a more realistic assumption), we can run into problems because we are applying an exponentially long circuit: small errors at each stage can add up to become very large. We now argue that this will not be the case so long as $n \gg m \gg 1$ and $U$ is scrambling. (If $U$ is not scrambling, we expect that much more efficient circuits may well exist anyway.) We want to show that a Grover search using (A.32) and a fixed number of repetitions (independent of $|\psi\rangle$) will work for an arbitrary input state $|\psi\rangle$. Scrambling unitaries are well modeled by typical elements of unitary 2-designs. This implies that the first and second moment calculations will be exactly given by Haar averages using Weingarten coefficients [53]. Specifically, we first compute the mean of $\sin^2\theta$,
$$\mathbb{E}_U\big[\sin^2\theta\big] = 2^{-m}.$$
In addition, we would like to estimate the variance of $\sin^2\theta$. We can also compute this quantity, since it only involves two copies of $U$ and two copies of $U^\dagger$:
$$\mathrm{Var}_U\big[\sin^2\theta\big] = \frac{2^{-m}\,(1 - 2^{-m})}{2^k + 1} \approx 2^{-m-k},$$
where $k$ is the total number of qubits. It follows that we can assume $\theta^2 = 2^{-m}$ for the state, with exponentially small error.
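The first two moments can be checked by Monte Carlo. Assumption: Haar-random $k$-qubit states stand in for $U|s\rangle|0\rangle^{\otimes n}$ with $U$ drawn from a 2-design (the two agree up to second moments). The closed-form comparison uses the standard Haar result for the weight $p$ of a random state on a fixed subspace of dimension $r$ out of $D$, namely $\mathbb{E}[p] = r/D$ and $\mathrm{Var}[p] = r(D-r)/[D^2(D+1)]$, with $D = 2^k$ and $r = 2^{k-m}$. Function names are illustrative.

```python
import numpy as np


def haar_postselection_stats(k, m, samples, seed=0):
    """Sample mean and variance of sin^2(theta): the weight of Haar-random k-qubit
    states on the subspace where the first m qubits are |0...0>."""
    rng = np.random.default_rng(seed)
    dim = 2 ** k
    z = rng.normal(size=(samples, dim)) + 1j * rng.normal(size=(samples, dim))
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # normalized Gaussian = Haar state
    p = np.sum(np.abs(z[:, : dim // 2 ** m]) ** 2, axis=1)
    return p.mean(), p.var()


def haar_prediction(k, m):
    """Exact Haar mean 2^-m and variance 2^-m (1 - 2^-m) / (2^k + 1)."""
    dim, M = 2 ** k, 2 ** m
    return 1 / M, (1 / M) * (1 - 1 / M) / (dim + 1)


print(haar_postselection_stats(6, 2, 20000))
print(haar_prediction(6, 2))
```

The relative fluctuation is $\sqrt{\mathrm{Var}}/\mathbb{E} \approx 2^{(m-k)/2}$, which is why $\theta^2 = 2^{-m}$ can be assumed for every input once $k \gg m$.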
In the state-independent protocol, we use $W$ instead of the state-dependent reflection $U_s$. This results in additional errors if (A.23) holds only approximately. However, the error in each Grover step can be bounded. For a fixed unitary $U$ this error can be computed explicitly, and, again using that $U$ is an element of a 2-design, we can calculate its mean over $U$. Again, one can do those integrals using Weingarten coefficients [53]; the result is that the amplitude of being mapped into a wrong state in each Grover step is $\sqrt{\epsilon} \sim 2^{-n/2}$. Our state-independent protocol requires application of (A.32) $2^{m/2}$ times, so the total accumulated error will be given by $\sqrt{2^{m-n}}$, which is exponentially small in the limit of interest, $n \gg m \gg 1$. This completes our argument.

B Maximinimax Prescription for the Bulge Surface
In this appendix, we argue that spacetimes with more than one extremal surface generically contain Python's Lunches. To do so, we use a variant on the maximin arguments introduced by Wall in Ref. [28]. Such arguments have numerous subtleties and require considerable effort to rule out as many edge cases as possible. We shan't worry too much about such details here; instead we will just give physics-level arguments that justify our construction. We shall restrict our attention to classical spacetimes obeying the null energy condition (NEC), although it is possible to generalize maximin arguments to include quantum effects [29]. Our starting assumption is the existence of two distinct extremal surfaces, the HRT surface $\chi_1$ and an additional surface $\chi_2$, homologous to the same boundary region. The HRT surface $\chi_1$ can be found by a maximin prescription, where one finds the globally minimal area surface within some Cauchy slice, and then maximises that area over all Cauchy slices. The second extremal surface $\chi_2$ must have equal or larger area. Generically it will have larger area and we assume that this is indeed the case.
We first argue that no point on the second extremal surface $\chi_2$ can be timelike separated from any point on the HRT surface $\chi_1$. Let $C_1$ be a Cauchy slice in which $\chi_1$ is the unique minimal surface. We define a new Cauchy slice $C_2$ constructed from $J^-[\chi_2]$, $J^-[C_1]$ and $\overline{J^+[\chi_2]}$, which are respectively the past of $\chi_2$, the past of $C_1$, and the complement of the future of $\chi_2$. Note that $C_2$ contains the surface $\chi_2$ and so is nowhere timelike separated from it. But by standard focussing arguments, the minimal area surface in $C_2$ is at least as big as the minimal area surface on $C_1$, with equality if and only if $\chi_1$ is in $C_2$. This is because we can focus any surface in $C_2$ to a surface in $C_1 \cap C_2$ with no greater area.
Since we assume that $\chi_1$ is the unique maximin surface, we find that $\chi_1 \subset C_2$, and so $\chi_1$ is not timelike separated from $\chi_2$. Assuming the spacetime is generic (i.e. the NEC holds as a strict inequality), we can also use standard focussing arguments to argue that $\chi_1$ cannot be lightlike separated from $\chi_2$.
Let us temporarily assume that there exists a Cauchy slice within which any sufficiently small deformation of $\chi_2$ that preserves the homology constraint, but is not necessarily local, will increase its area. (We shall consider the alternative possibility below.) If we deform $C_2$ in a sufficiently small neighborhood of $\chi_2$, we should then be able to find a new Cauchy slice, still containing $\chi_1 \cup \chi_2$, on which $\chi_2$ is minimal within a small neighborhood and $\chi_1$ is still globally minimal. This will be important later.
We define the Wheeler-DeWitt patch $W_{1,2}$ as the bulk domain of dependence of any spacelike slice bounded by $\chi_1 \cup \chi_2$. We can then construct a new surface $\chi_3$ by the following maximinimax procedure.
First we choose some Cauchy slice $C_3$ for the Wheeler-DeWitt patch $W_{1,2}$. Next, we choose a smooth non-degenerate function $\phi_3 : C_3 \to [0,1]$, where $\phi_3(\chi_1) = 0$ and $\phi_3(\chi_2) = 1$. Morally, the level sets of the function $\phi_3$ define a foliation of $C_3$. However, it is somewhat more general than this because the topology of the level set can change if $\phi_3$ has critical points. Formally, this is known as a 'sweepout' of $C_3$. This generality is necessary both for physical reasons, since in general an extremal surface may have arbitrary topology so long as it satisfies the homology constraint, and for mathematical reasons, to prevent the appearance of singularities in the surface if we insist that it have the 'wrong' topology. Finally, we choose the level set $\phi_3^{-1}(x_3)$, for $x_3 \in [0,1]$, of maximal area. Note that the level set $\phi_3^{-1}(x)$ will be singular if $x$ is a critical value of $\phi_3$, but so long as $\phi_3$ is non-degenerate, the singularities will be at isolated points and the area of the surface should still be well defined.
Having found the surface $\chi_3 = \phi_3^{-1}(x_3)$ of maximal area, we minimise that maximal area over all allowed functions $\phi_3$. Finally, we maximize that minimax area over all Cauchy slices for $W_{1,2}$. We call the resulting surface the maximinimax surface $\chi_3$. In other words, we have $\chi_3 = \phi_3^{-1}(x_3)$, where the level set $\phi_3^{-1}(x_3)$ is defined by the maximinimaximization
$$\mathrm{Area}[\chi_3] = \max_{C_3}\,\min_{\phi_3}\,\max_{x_3\in[0,1]}\,\mathrm{Area}\big[\phi_3^{-1}(x_3)\big].$$
Provided a unique maximinimax surface $\chi_3$ exists and does not lie at the boundary of any of the spaces we are optimizing over, it will be extremal, since, at linear order, an arbitrary variation of the surface $\chi_3$ can be achieved by a linear combination of variations in $C_3$, $\phi_3$ and $x_3$. We shall not try to rigorously prove that this will be true (even generically). After all, we have not even tried to rigorously define our construction: the geometric measure theory [54] required to do so would be well beyond the scope of this paper. However, we shall make a few comments about why we expect that a maximinimax surface should exist and not lie at the various boundaries of the optimization space we are searching over.
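A finite-dimensional cartoon may help picture the min-max step. This is an analogy only, not the construction above: grid points stand in for surfaces in a fixed Cauchy slice, a height function stands in for area, and a path between two low points stands in for a sweepout from $\chi_1$ to $\chi_2$. Minimizing, over paths, the largest height met along the way is then a mountain-pass problem, which a Dijkstra-style search with cost $\max(\text{current}, \text{next})$ solves exactly. The outer maximization over Cauchy slices is omitted, and the names and example grid are hypothetical.

```python
import heapq


def minimax_barrier(height, start, end):
    """Min over 4-neighbour grid paths from start to end of the max height on the path."""
    rows, cols = len(height), len(height[0])
    best = {start: height[start[0]][start[1]]}
    heap = [(best[start], start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            return cost
        if cost > best[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new = max(cost, height[nr][nc])  # bottleneck cost along the path
                if new < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new
                    heapq.heappush(heap, (new, (nr, nc)))
    raise ValueError("end is not reachable from start")


grid = [[0, 8, 0],
        [3, 9, 3],
        [4, 5, 4]]
print(minimax_barrier(grid, (0, 0), (0, 2)))
```

The optimal path detours around the high ridge, and the highest point it is forced over plays the role of the bulge: it exceeds both endpoints, just as the maximinimax surface has larger area than either $\chi_1$ or $\chi_2$.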
So long as $\phi_3$ is smooth and non-degenerate, the area of the level set should be a continuous function on a compact interval, and so a maximal area surface should exist.
One might worry that the minimization over functions $\phi_3$ could approach a function that is not smooth and non-degenerate, for which a maximal-area level set is not well defined. Our understanding of the results of Almgren-Pitts min-max theory [55,56,57] is that this will not end up being the case. Instead, the minimax surface $\chi_3 \subset C_3$ should be a well-defined varifold and will be a smooth (possibly self-intersecting) submanifold if the spacetime dimension is less than seven. Intuitively, it is reasonable to expect that, as the function $\phi_3$ becomes more badly behaved, the maximal area surface should only increase, rather than decrease, in area.
Similarly, we expect any bad behavior in the Cauchy slice $C_3$ will tend to decrease the area of the minimax surface. Hence a maximinimax surface $\chi_3$ should exist and be well behaved. For more detailed arguments in this direction, see [28]. For known examples, involving timelike de Sitter boundaries, where maximin surfaces do not exist, see [58].
How could the maximinimax surface $\chi_3$ end up on the boundary of the space of surfaces we are searching over? Firstly, the maximinimax surface $\chi_3$ could have nonzero intersection with $\chi_1 \cup \chi_2$. Suppose this intersection were not a connected component of $\chi_3$ (and $\chi_1 \cup \chi_2$). Since the surface $\chi_3$ cannot ever go outside the Wheeler-DeWitt patch $W_{1,2}$ and $\chi_1$ and $\chi_2$ are extremal, there must be some point where $\chi_3$ has nonzero mean curvature, within the Cauchy slice, where it bends 'inwards' into $W_{1,2}$. We could then decrease the maximal area of a level set of $\phi_3$ by deforming $\phi_3$ slightly so that the surface $\chi_3$ moves slightly inwards at this point, which gives the desired contradiction.
What about if the intersection is a connected component? In that case, the entire connected component will already be extremal and we don't need to worry about it. One might, of course, worry that $\chi_3$ could end up being the same as either $\chi_1$ or $\chi_2$, in which case we wouldn't have really found a new extremal surface. However, this is impossible, since, by assumption, there exists a Cauchy slice where any small deformation of $\chi_1$ or $\chi_2$ will increase their area, and hence neither can have maximal area within any $\phi_3$. The min-max surface in this Cauchy slice will have a larger area than either $\chi_1$ or $\chi_2$, which rules out either $\chi_1$ or $\chi_2$ being maximinimax.
Finally, one might worry that points in $\chi_3$ might end up lightlike separated from either other points in $\chi_3$, or points in $\chi_1$ or $\chi_2$. The first cannot happen, because of arguments similar to those in [28]. If a) the minimax surface contained a null segment, the area of the minimax could always be increased by a sufficiently small deformation of the Cauchy slice near this null segment. However, if b) the minimax surface did not contain a null segment, it could not be extremal within the Cauchy slice (using focussing arguments for generic spacetimes), which the minimax should be since its variation is unconstrained so long as it doesn't intersect $\chi_1$ or $\chi_2$. The second cannot happen because (using focussing in a generic spacetime) we could then increase the area of the minimax surface by deforming $\phi_3$ so that the level set is locally deformed along the lightcone towards $\chi_1$ or $\chi_2$.
We also note that the maximinimax construction automatically guarantees that the bulge surface $\chi_3$ has larger area than either $\chi_1$ or $\chi_2$.
Having shown that an intermediate bulge surface exists between $\chi_1$ and $\chi_2$ whenever $\chi_2$ is minimal with respect to any small deformation within some fixed Cauchy slice, we now consider the opposite case, where there exist small deformations of $\chi_2$ which decrease the area of $\chi_2$ within any Cauchy slice. In this case, $\chi_2$ cannot be an end surface and so must instead itself be the bulge surface. Without loss of generality, we assume that $\chi_1$ is contained in the interior $\mathrm{Int}[\chi_2]$ of $\chi_2$. We shall also assume that, in any Cauchy slice, there exist small deformations of $\chi_2$ that a) decrease the area and b) lie entirely in the exterior $\mathrm{Ext}[\chi_2]$ (defined as the complement of the interior $\mathrm{Int}[\chi_2]$). The alternative possibility, where there exist Cauchy slices where only deformations that enter the interior can decrease the area, but none where no deformations can decrease the area, should be non-generic and can be interpreted as the bulge surface and one end surface degenerating into one another.
We can now use the usual maximin construction to find a second end surface $\chi_3$. We simply constrain our search to Cauchy slices containing $\chi_2$, and to surfaces within that Cauchy slice that are entirely in the exterior $\mathrm{Ext}[\chi_2]$. As before, to show extremality, we just need a) the maximin surface to exist and b) variations of the Cauchy slice to be sufficient to freely vary the maximin surface.
The arguments for both are essentially identical to those for the original maximin construction. The only new potential obstructions that need to be ruled out are the maximin surface either a) intersecting, or b) being lightlike separated from, the surface $\chi_2$. In the first case, if the intersection was not a connected component, the maximin surface would have to bend inwards somewhere, which contradicts its minimality within the Cauchy slice. Any intersection on a connected component will be automatically extremal. Finally, the surface $\chi_3$ cannot simply be equal to $\chi_2$, since, by assumption, $\chi_2$ does not have globally minimal area within any Cauchy slice.
What about the possibility of lightlike separation from $\chi_2$? By focussing arguments in a generic spacetime, the change in area from a lightlike deformation of the surface $\chi_3$ towards $\chi_2$ (in direction $k^a$) would have to be positive at linear order. Since $\chi_3$ is minimal within the Cauchy slice, there must be some spacelike direction $r^a$ pointing away from the lightcone for which the change in area is nonnegative at linear order. However, this implies that a deformation in a timelike direction $t^a$ (that makes $\chi_3$ spacelike separated from $\chi_2$) must increase the area at linear order, in contradiction with the maximality of the Cauchy slice. Finally, we note that, since $\chi_3$ is minimal within a slice containing $\chi_2$, it must have smaller area than $\chi_2$. We therefore conclude that the generic situation when more than one extremal surface exists is to have a Python's Lunch: three extremal surfaces, with the middle bulge surface having larger area than either end surface.

C Explicit Calculation of the Late-time Bulge Surface
In this appendix, we explicitly calculate the location of the extremal surface that forms the bulge surface at late times in a particular theory. The theory is JT gravity with c Dirac fermions, and we consider a black hole that is evaporating using transparent boundary conditions, as in [36,37]. Dirac fermions are the only conformal field theory for which the calculation is possible, since they are the only conformal field theory for which the two-interval von Neumann entropy is known analytically.
We note that this surface only becomes the bulge surface when the horizon area, which in the case of JT gravity is the horizon dilaton value $\phi + \phi_0$, is less than half of its initial value. This means that the initial black hole cannot have been in the regime $\phi \ll \phi_0$ where JT gravity is justified as the dimensional reduction of a near-extremal black hole. Nonetheless, a) there is no obvious problem (other than UV issues, which are unimportant for this calculation) with defining JT gravity as a theory in its own right when $\phi \gtrsim \phi_0$, and b) it provides a calculable example of an extremal surface that should also exist in more general examples of evaporating black holes.
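The JT gravity action is given by
$$I = \frac{\phi_0}{16\pi G_N}\left[\int d^2x\,\sqrt{-g}\,R + 2\int_{\partial} K\right] + \frac{1}{16\pi G_N}\left[\int d^2x\,\sqrt{-g}\,\phi\,(R+2) + 2\int_{\partial}\phi_b K\right] + S_{CFT}[g], \tag{C.1}$$
where the scalar field $\phi$ is called the dilaton and $S_{CFT}[g]$ is the action for the CFT (in this case $c$ Dirac fermions) in the gravitational background. We also impose boundary conditions
$$g_{tt}\big|_{\partial} = -\frac{1}{\varepsilon^2}, \qquad \phi\big|_{\partial} = \frac{\phi_r}{\varepsilon}, \tag{C.2}$$
where $t$ is the physical boundary time, $\phi_r$ is the fixed renormalized boundary dilaton value and $\varepsilon$ is small. The metric of a static black hole in JT gravity is given by
$$ds^2 = -\frac{4\pi^2 T^2}{\sinh^2[\pi T(u - v)]}\,du\,dv, \tag{C.3}$$
with the dilaton profile given by
$$\phi = 2\pi T\phi_r \coth[\pi T(u - v)]. \tag{C.4}$$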
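Here $u$ and $v$ are the retarded and advanced times respectively. The Bekenstein-Hawking entropy $S_{BH}$ is given by
$$S_{BH} = \frac{\phi_0 + \phi_{hor}}{4G_N} = S_0 + \frac{\phi_{hor}}{4G_N}, \tag{C.5}$$
where $\phi_{hor} = 2\pi T\phi_r$ is the horizon dilaton value, obtained from (C.4) as $u - v \to \infty$, and $S_0 = \phi_0/4G_N$ is the extremal entropy.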
To extend our coordinate system behind the horizon we simply define the Kruskal-like coordinate $U = -\exp(-2\pi T u)$. We find
$$ds^2 = -\frac{8\pi T\,e^{2\pi T v}}{\big(1 + U e^{2\pi T v}\big)^2}\,dU\,dv, \tag{C.6}$$
$$\phi = 2\pi T\phi_r\,\frac{1 - U e^{2\pi T v}}{1 + U e^{2\pi T v}}. \tag{C.7}$$
In the near-horizon region $|U|\,e^{2\pi T v} \ll 1$, these simplify to
$$ds^2 = -8\pi T\,e^{2\pi T v}\,dU\,dv, \tag{C.8}$$
and
$$\phi = 2\pi T\phi_r\big(1 - 2U e^{2\pi T v}\big). \tag{C.9}$$
In the semiclassical limit, an evaporating black hole is well approximated by an ingoing Vaidya metric, where we simply promote the temperature $T$ to be a slowly varying function of the infalling time $v$. The change in temperature is determined by the rate of energy loss from the black hole, where we have
$$\frac{\pi\phi_r}{2G_N}\frac{dT}{dv} = \frac{dS_{BH}}{dv} = \frac{1}{T}\frac{dE}{dv} = -\frac{\pi c}{12}\,T. \tag{C.10}$$
Here the first equality uses (C.5), the second equality is the first law of black-hole thermodynamics and the last equality is the $(1+1)$-dimensional Stefan-Boltzmann law. It follows that, in the near horizon region, we have in agreement with the results from [36].
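The near-horizon expansion can be sanity-checked numerically. The sketch compares the static dilaton (C.4) with its near-horizon form (C.9), writing $w = -U e^{2\pi T v} = e^{-2\pi T(u-v)}$; expanding $\coth$ in $w$ shows that the relative error of the near-horizon form should be exactly $2w^2/(1+w)$. The parameter values are arbitrary choices for illustration.

```python
import numpy as np


def dilaton_static(u, v, T, phi_r):
    """phi = 2 pi T phi_r coth[pi T (u - v)], Eq. (C.4)."""
    return 2 * np.pi * T * phi_r / np.tanh(np.pi * T * (u - v))


def dilaton_near_horizon(u, v, T, phi_r):
    """phi = 2 pi T phi_r (1 - 2 U e^{2 pi T v}) with U = -exp(-2 pi T u), Eq. (C.9)."""
    U = -np.exp(-2 * np.pi * T * u)
    return 2 * np.pi * T * phi_r * (1 - 2 * U * np.exp(2 * np.pi * T * v))


T, phi_r, v = 0.7, 1.3, 0.4
for sep in (0.25, 0.5, 1.0):
    u = v + sep
    w = np.exp(-2 * np.pi * T * sep)
    exact = dilaton_static(u, v, T, phi_r)
    approx = dilaton_near_horizon(u, v, T, phi_r)
    print(sep, (exact - approx) / exact, 2 * w**2 / (1 + w))
```

The error is second order in $w$, so (C.9) is accurate once $u - v$ exceeds a few thermal lengths $1/(2\pi T)$.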
We are now ready to attempt our actual task: calculating the location of the late-time bulge surface. This is the union of two points $(U_1, v_1)$ and $(U_2, v_2)$, where we assume $U_2 > U_1$ and $v_2 < v_1$ (since the points need to be spacelike separated). In fact, if we started with a two-sided black hole, the bulge surface really consists of three points, where the third point $(U_3, v_3)$ lies close to the other 'end surface', near the initial bifurcation surface of the two-sided black hole. In the semiclassical limit we have $U_3 = \exp(O(1/G_N))$ and $v_3 = -O(1/G_N)$. The corrections to the generalized entropy gradient for $(U_3, v_3)$ from the existence of the additional points $(U_1, v_1)$ and $(U_2, v_2)$ are therefore highly suppressed, and we can treat $(U_3, v_3)$ as lying exactly on the quantum extremal end surface.
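The spacelike-separation requirement can be read off from the near-horizon metric $ds^2 = -8\pi T e^{2\pi T v}\,dU\,dv$: in the future-directed null coordinates $(U, v)$ the conformal factor is positive, so the causal relation between the two points is fixed by the sign of $\Delta U\,\Delta v$ alone. With the assumed orderings,

```latex
\Delta U\,\Delta v = (U_2 - U_1)(v_2 - v_1) < 0 ,
```

which is precisely the condition for two points of a two-dimensional spacetime to be spacelike separated; if instead both differences had the same sign, one point would lie in the causal future of the other.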
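The outgoing entropy is now the entropy of the union of the two intervals $[-1, U_1]$ and $[U_2, U_3]$, which for $c$ Dirac fermions is given by (see [59])

$$S_{out} = \frac{c}{6}\log\frac{(U_1 + 1)(U_2 - U_1)(U_3 - U_2)(U_3 + 1)}{(U_2 + 1)(U_3 - U_1)} - \frac{c}{12}\log\left(\epsilon^{(bdy)}_U\,\epsilon_{U_1}\,\epsilon_{U_2}\,\epsilon_{U_3}\right),$$

where $\epsilon_{U_3}$, $\epsilon_{U_2}$, $\epsilon_{U_1}$ and $\epsilon^{(bdy)}_U$ are the outgoing-mode cut-offs in units of $U$ at $U_3$, $U_2$, $U_1$ and the boundary respectively. Since $U_3 = \exp(O(1/G_N))$, terms involving $U_3$ do not contribute to the gradient of the entropy.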
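To illustrate why terms involving $U_3$ drop out of the gradient, here is a Python sketch that assumes the Casini-Fosco-Huerta two-interval formula for free fermions restricted to a single chiral (outgoing) sector, with coefficient $c/6$; the normalization and cut-off conventions of [59] may differ, and all names are hypothetical. It compares a finite-difference $\partial S_{out}/\partial U_1$ with the same derivative after dropping the $U_3$-dependent term, $\frac{c}{6}\left[\frac{1}{U_1+1} - \frac{1}{U_2-U_1}\right]$.

```python
import math

# Sketch under stated assumptions (hypothetical names): chiral entropy of
# c free Dirac fermions on [a1, b1] U [a2, b2], with a1 < b1 < a2 < b2,
# in the vacuum of the endpoint coordinate. eps holds the four UV cut-offs.

def chiral_two_interval_entropy(a1, b1, a2, b2, eps, c=1.0):
    cross = ((b1 - a1) * (b2 - a2) * (a2 - b1) * (b2 - a1)
             / ((a2 - a1) * (b2 - b1)))
    return (c / 6.0) * math.log(cross) - (c / 12.0) * sum(math.log(e) for e in eps)

def outgoing_entropy(U1, U2, U3, eps, c=1.0):
    """Outgoing entropy of [-1, U1] U [U2, U3]; eps = (eps_bdy, eps_U1, eps_U2, eps_U3)."""
    return chiral_two_interval_entropy(-1.0, U1, U2, U3, eps, c)

def dS_dU1(U1, U2, U3, eps, c=1.0, h=1e-6):
    """Central finite difference of the outgoing entropy in U1, cut-offs held fixed."""
    return (outgoing_entropy(U1 + h, U2, U3, eps, c)
            - outgoing_entropy(U1 - h, U2, U3, eps, c)) / (2.0 * h)

def dS_dU1_without_U3(U1, U2, c=1.0):
    """The same derivative with the U3-dependent term 1/(U3 - U1) dropped."""
    return (c / 6.0) * (1.0 / (U1 + 1.0) - 1.0 / (U2 - U1))

if __name__ == "__main__":
    eps = (1e-3, 1e-3, 1e-3, 1e-3)
    for U3 in (1e2, 1e4, 1e8):
        # The residual shrinks like c / (6 * (U3 - U1)).
        print(U3, dS_dU1(0.1, 0.3, U3, eps) - dS_dU1_without_U3(0.1, 0.3))
```

The residual equals $\frac{c}{6}(U_3 - U_1)^{-1}$, so for $U_3 = \exp(O(1/G_N))$ it is exponentially small; the same argument applies to $\partial S_{out}/\partial U_2$.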
Formally, the ingoing entropy should be calculated by a similar formula for the two intervals, where $\epsilon_{v_1}$ and $\epsilon_{v_2}$ are the ingoing cut-offs in units of $v$ at $v_1$ and $v_2$ respectively, and we have elided constant terms. As before, to correctly renormalize the entropy, we need the outgoing and ingoing cut-offs at each point to correspond to a fixed proper cut-off $\epsilon$; by (C.8) this means $8\pi T e^{2\pi T v_i}\,\epsilon_{U_i}\epsilon_{v_i} = \epsilon^2$ for $i = 1, 2$. As expected, we have $U_2 > U_1 > U_0$ and $v_2 < v_1 < v_0$. Finally, we note that the classical contribution to the generalized entropy for this surface is $2(\phi_0 + 2\pi T\phi_r)/4G_N$.$^{25}$ Meanwhile, up to subleading corrections, (C.23) is equal to $\frac{c}{6}\log U_3$, which is the entropy of the interior partners of the Hawking radiation. Hence, at leading order, the bulk von Neumann entropy is simply the entropy $S_{rad}$ of the Hawking radiation. We therefore find that the total generalized entropy at leading order is $2S_{BH} + S_{rad}$, where $S_{BH}$ is the final Bekenstein-Hawking entropy of the black hole.
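The leading-order bookkeeping can be written out in one line. Restoring the $1/4G_N$ normalization implied by $S_0 = \phi_0/4G_N$, using $\phi_{hor} = 2\pi\phi_r T$ at the final temperature $T$, and identifying $\frac{c}{6}\log U_3$ with $S_{rad}$ as in the paragraph above:

```latex
S_{gen} \approx 2\,\frac{\phi_0 + 2\pi T\phi_r}{4G_N} + \frac{c}{6}\log U_3
= 2\left(S_0 + \frac{\phi_{hor}}{4G_N}\right) + S_{rad}
= 2S_{BH} + S_{rad} .
```

Each of the two non-minimal points contributes one copy of the horizon area term, which is why $S_{BH}$ appears twice.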