Conditional Mutual Information of Bipartite Unitaries and Scrambling

One way to diagnose chaos in bipartite unitary channels is via the tripartite information of the corresponding Choi state, which for certain choices of the subsystems reduces to the negative conditional mutual information (CMI). We study this quantity from a quantum information-theoretic perspective to clarify its role in diagnosing scrambling. When the CMI is zero, we find that the channel has a special normal form consisting of local channels between individual inputs and outputs. However, we find that arbitrarily low CMI does not imply arbitrary proximity to a channel of this form, although it does imply a type of approximate recoverability of one of the inputs. When the CMI is maximal, we find that the residual channel from an individual input to an individual output is completely depolarizing when the other input is maximally mixed. However, we again find that this result is not robust. We also extend some of these results to the multipartite case and to the case of Haar-random pure input states. Finally, we look at the relationship between tripartite information and its Rényi-2 version, which is directly related to out-of-time-order correlation functions. In particular, we demonstrate an arbitrarily large gap between the two quantities.


Introduction
Recent research in quantum gravity has led to an interest in the scrambling and chaotic properties of many-body quantum systems [1][2][3][4][5][6][7]. The simplest model to consider is that of a unitary time evolution, $U_{AB\to CD}$, where $A, B$ and $C, D$ denote fixed bipartitions of past and future time slices of the quantum system, respectively. Typically, $A = C$ and $B = D$, and we merely use different letters to denote the past and future time slices, but we may also consider two different bipartitions if we want to compare the propagation between different subsystems.
For chaotic dynamics, we expect that the local degrees of freedom $A, B$ will get encoded nonlocally into $C, D$, i.e., scrambled. One way to formalize this intuition, proposed recently in [8], is to consider the Choi state dual to $U$, which is commonly used in quantum information theory to study the properties of quantum channels [9]. For the specific case of bipartite unitaries, the Choi states are used to study the capacity [10][11][12][13][14] and the cost of implementation [15,16]. In the present context, this is the pure state defined by
$$|\rho\rangle_{ABCD} = \left(I_{AB} \otimes U_{A'B'\to CD}\right) \left(|\Phi^+\rangle_{AA'} \otimes |\Phi^+\rangle_{BB'}\right), \qquad (1.1)$$
where $|\Phi^+\rangle$ denotes a maximally entangled state. Its tripartite information, $I_3 = I(A;C) + I(A;D) - I(A;CD)$, satisfies
$$I_3 = -I(A;B|C) = -I(A;B|D), \qquad (1.2)$$
where $I(A;B|D) = S(AD) + S(BD) - S(ABD) - S(D)$ is the conditional mutual information (CMI). In particular, the tripartite information is never positive as a consequence of the strong subadditivity of the von Neumann entropy, $I(A;B|D) \ge 0$. This is true for an arbitrary unitary time evolution, whether chaotic or not, contrary to previous expectations [8]. Interestingly, $I_3 \le 0$ is not true for general quantum states, but it has recently been proved in a different context, namely as a consequence of the Ryu-Takayanagi formula in holographic systems [17] (cf. [18,19]) and its tensor network models [20,21], where it can be interpreted as a consequence of the monogamy of entanglement [22]. Whether there exists a deeper common reason for the negativity of $I_3$ in these settings remains an open question.
In this paper, we aim to clarify the meaning of the tripartite information from the perspective of quantum information theory, based on the connection established above. We are particularly interested in the extreme cases, where the tripartite information attains its minimal or maximal values. We say that $U$ is minimally $I_3$-scrambling if $I_3 = 0$ and maximally $I_3$-scrambling if $I_3$ attains its maximally negative value.
We start in Section 2 by considering the case of minimal $I_3$-scrambling. Our first result shows that any such unitary has the following special form:
$$U_{AB\to CD} = U_{A_L\to C_L} \otimes U_{A_R\to D_L} \otimes U_{B_L\to C_R} \otimes U_{B_R\to D_R} \qquad (1.3)$$
for some decomposition $A = A_L \otimes A_R$ and likewise for $B$, $C$, $D$ (see fig. 2 for an illustration). That is, the unitary can be decomposed into, in general, four smaller unitaries which locally route the quantum information between the input and output subsystems. Such a 'criss-cross channel' exactly matches our intuition of what a non-scrambling process should look like. This result can also be interpreted as maximizing simultaneously achievable rates of communication between the input and output subsystems. For example, we have that
$$R_{A\to C} + R_{A\to D} = Q_{A\to CD} = \log|A|, \qquad (1.4)$$
where we write $R_{A\to C}$ and $R_{A\to D}$ for the simultaneously achievable (one-shot, zero-error) quantum communication rates from $A$ to $C$ and $D$, respectively, and $Q_{A\to CD}$ for the quantum capacity from $A$ to $CD$, which by unitarity is always equal to $\log|A|$, with $|A|$ the Hilbert space dimension of $A$. Note that logarithms in this paper are base 2, in accordance with the convention in quantum information. Lastly, our result can also be translated into a statement about the recoverability of the systems from partial information: for the purposes of recovering the quantum information from input $A$ given output $D$, access to the other input subsystem $B$ does not help.
It is interesting to ask to what extent the above statements can be generalized to the case where $I_3 \approx 0$. The latter result can be readily generalized to the approximate case using a recent result in quantum information theory [23], which asserts that we can find a quantum operation $R_{D\to BD}$ independent of the state at $A$ such that we can approximately recover $\rho_{ABD}$ from $\rho_{AD}$.
On the other hand, we show that (1.3) is not robust in the following, strongest possible sense: we explicitly construct a family of unitary quantum channels such that $I_3$ is arbitrarily close to zero, while their distance from any unitary of the form (1.3) is lower-bounded by a positive constant. Our construction implies that any robust version of (1.3) must necessarily depend on the Hilbert space dimension.
From the perspective of quantum information theory, our results complement the nonrobustness results of [24,25], which provide examples of tripartite states with vanishing conditional mutual information but non-vanishing trace distance to any quantum Markov chain state, that is, a state with a special normal form equivalent to having zero CMI. Here, on the other hand, we find a tripartite state with vanishing CMI and vanishing trace distance, but still with non-vanishing diamond norm distance to any quantum Markov chain state when the states are viewed as reduced Choi states of bipartite unitaries. This provides further evidence for the nonrobustness of normal forms for quantum Markov chains.
In Section 3 we then consider the other extreme case, where the tripartite information $I_3$ is maximally negative. This can be achieved by, e.g., perfect tensors [20], also known as absolutely maximally entangled states [26,27], such as those obtained by the random construction of [21]. Here, we give an explicit construction similar to that of [28] in the case $A = B = C = D$, which works in arbitrary odd dimensions. We also show that maximally scrambling unitaries do not exist if all the systems are qubits. Now suppose that $U$ is maximally $I_3$-scrambling and, for concreteness, that the dimension of $A$ is the smallest among the four subsystems, so that $I_3 = -2\log|A|$. Then the residual channels $N_{A\to C}$ and $N_{A\to D}$, obtained by feeding a maximally mixed state $\tau_B$ into $B$, applying the unitary, and tracing out either $D$ or $C$, are completely depolarizing. In other words, we cannot locally route any information from $A$ to $C$ or $D$, while we still have $R_{A\to CD} = \log|A|$ by unitarity. This characterization nicely complements (1.3) and (1.4). It also complements the recovery interpretation: with only $D$, we can recover none of the information from $A$, but with $BD$ we can recover all of it. However, we again find that we need to be cautious when generalizing this result to the approximate case: we construct a unitary such that $I_3$ is arbitrarily close to being maximally negative, but whose residual channel $N_{A\to C}$ is bounded away from the completely depolarizing channel.
In Section 4, we consider general values of $I_3$, again using the connection (1.2) to the conditional mutual information. The latter has an operational interpretation in the task of quantum state redistribution. More precisely, given a quantum state $\rho_{ACD}$ with purification $\rho_{ABCD}$, if one party possesses $AC$ and another party $D$, the former can send $A$ to the latter at an optimal rate of $\frac{1}{2} I(A;B|D) = -\frac{1}{2} I_3$ qubits [29]. This is intuitive: given that a strongly scrambling unitary will delocalize information from the inputs, we indeed expect that a larger number of qubits should be required to transfer systems. We show that this is consistent with our main results for minimal and maximal $I_3$-scrambling and give simple protocols that achieve the given qubit rate. Note that it is also possible to do similar analyses using other operational interpretations of CMI such as in the tasks of state deconstruction and conditional erasure [30].
Figure 3. A multiple input and multiple output (MIMO) unitary.
An appealing feature of the tripartite information is that it is related to out-of-time-order (OTO) correlators, an alternative diagnostic of chaos proposed to quantify the analog of the 'butterfly effect' in black holes [3]. OTO correlators can also be measured in various physical systems [31,32]. An OTO correlator of two local operators $O_A$ and $O_C$ is by definition an expectation value of the form
$$\langle O_A\, O_C(t)\, O_A\, O_C(t) \rangle_\beta,$$
where $U = e^{-iHt}$ is the time evolution operator and $O_C(t) = U^\dagger O_C U$. We define the average OTO correlator between $A$ and $C$ by averaging the above over orthonormal bases of operators on $A$ and $C$. In the infinite temperature limit, $\beta = 0$, it is known [8] that the average OTO correlators are directly related to $I_3^{(2)}$, a variant of the tripartite information defined in terms of the Rényi-2 entropy, $S_2(A) = -\log \operatorname{tr} \rho_A^2$, where the entropies are evaluated on the Choi state of $U$. Since $I_3^{(2)} \ge I_3$, the butterfly effect as measured by small OTO correlators implies $I_3$-scrambling. In Section 5, we show that the converse is not true: a unitary with almost maximally negative tripartite information can still have large OTO correlators. In fact, we find that the difference $I_3^{(2)} - I_3$ can be arbitrarily large.
Finally, many of the above results can be extended to the multipartite case, as we explain in Section 6. Let $U_{A_1 \dots A_n \to C_1 \dots C_m}$ be a multiple input and multiple output (MIMO) unitary as shown in fig. 3. We show that the natural generalization of minimal $I_3$-scrambling is to demand that $I_3(A_i; A_i^c; C_j) = 0$ for all $i$ and $j$, where we write $A_i^c$ for the subset of all input subsystems save for $A_i$. In this case, the unitary takes the following form, generalizing our result for the bipartite case:
$$U_{A_1 \dots A_n \to C_1 \dots C_m} = \bigotimes_{i=1}^{n} \bigotimes_{j=1}^{m} U_{i\to j},$$
where $U_{i\to j}$ is a local unitary mapping input subsystem $A_i$ to output subsystem $C_j$. We also give an explicit construction of a family of maximally scrambling MIMO unitaries when all systems are of the same large prime dimension.
The nonrobustness of various algebraic characterizations of chaos and scrambling, while undesirable, is one of the central messages of this article. It typically leads to dimensional dependencies, which, in the context of high energy physics where Hilbert spaces are typically high-dimensional, are of particular significance. We believe that this provides good motivation for the development of alternative, more robust characterizations and diagnostics, not only in the present context but also in the study of other quantum information concepts in high energy physics, such as quantum error correction in holographic systems.

Minimal scrambling
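In this section, we study properties of bipartite unitaries $U_{AB\to CD}$ where $I_3 \approx 0$. We first consider the exact case. Here, our main result is that the unitary has the following normal form:
Theorem 1. A unitary $U_{AB\to CD}$ is minimally $I_3$-scrambling, i.e., $I_3 = 0$, if and only if it can be decomposed into a tensor product of local unitaries. That is,
$$U_{AB\to CD} = U_{A_L\to C_L} \otimes U_{A_R\to D_L} \otimes U_{B_L\to C_R} \otimes U_{B_R\to D_R}$$
for some decomposition $A = A_L \otimes A_R$ and likewise for $B$, $C$, $D$. The dimensions of the subsystems are given by $\log|A_L| = \log|C_L| = \frac{1}{2} I(A;C)_U$ etc.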
See fig. 2 for an illustration. This result is consistent with the notion of scrambling as delocalization of quantum information. To see this, take a minimally $I_3$-scrambling unitary $U_{AB\to CD}$, and consider the residual channel
$$N_{A\to C}[\sigma_A] = \operatorname{tr}_D\left[ U \left(\sigma_A \otimes \rho_B\right) U^\dagger \right]$$
obtained by fixing an input state $\rho_B$ on $B$. Then, Theorem 1 implies that
$$N_{A\to C}[\sigma_A] = U_{A_L\to C_L}\, \sigma_{A_L}\, U_{A_L\to C_L}^\dagger \otimes U_{B_L\to C_R}\, \rho_{B_L}\, U_{B_L\to C_R}^\dagger.$$
Hence, for the purposes of quantum information transfer, the residual channel $N_{A\to C}$ is equivalent to the unitary quantum channel $U_{A_L\to C_L}$. Likewise, $N_{A\to D}$ is equivalent to the unitary channel $U_{A_R\to D_L}$, while $N_{A\to CD}$ is equivalent to their tensor product. In particular, the quantum information from $A$ can be perfectly transmitted using local decoders at $C$ and $D$, independent of the choice of input at $B$. Thus quantum information is perfectly routed through the system in a completely localized fashion, in agreement with the absence of scrambling.
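As a small numerical illustration of Theorem 1, the following Python sketch (using NumPy) builds the Choi state of a two-qubit unitary and evaluates $I_3 = I(A;C) + I(A;D) - I(A;CD)$. The helper names `choi_tensor`, `entropy_of` and `tripartite_information` are ours and not notation from the text, and the sketch assumes the Choi-state convention in which $A$ and $B$ label the reference systems. The identity and the swap are both of the criss-cross form, so their $I_3$ should vanish; a CNOT gate is included as a contrasting example.

```python
import numpy as np

def choi_tensor(U, dA, dB, dC, dD):
    """Amplitudes psi[a, b, c, d] = <c,d|U|a,b> / sqrt(dA*dB) of the Choi state."""
    return U.reshape(dC, dD, dA, dB).transpose(2, 3, 0, 1) / np.sqrt(dA * dB)

def entropy_of(psi, keep):
    """Von Neumann entropy (in bits) of the reduced state on the axes in `keep`."""
    rest = [i for i in range(psi.ndim) if i not in keep]
    rows = int(np.prod([psi.shape[i] for i in keep]))
    M = psi.transpose(list(keep) + rest).reshape(rows, -1)
    p = np.linalg.svd(M, compute_uv=False) ** 2   # spectrum of the reduced state
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def tripartite_information(U, dA, dB, dC, dD):
    """I_3(A;C;D) written out in terms of subsystem entropies."""
    psi = choi_tensor(U, dA, dB, dC, dD)
    A, C, D = 0, 2, 3
    S = lambda *k: entropy_of(psi, k)
    return S(A) + S(C) + S(D) - S(A, C) - S(A, D) - S(C, D) + S(A, C, D)

identity = np.eye(4)
swap = np.eye(4)[[0, 2, 1, 3]]   # |a,b> -> |b,a>
cnot = np.eye(4)[[0, 1, 3, 2]]   # |a,b> -> |a, a XOR b>

i3_identity = tripartite_information(identity, 2, 2, 2, 2)
i3_swap = tripartite_information(swap, 2, 2, 2, 2)
i3_cnot = tripartite_information(cnot, 2, 2, 2, 2)
print(i3_identity, i3_swap, i3_cnot)
```

For the CNOT gate one finds $S(AC) = 1$ and $S(AD) = 2$, so $I_3 = -1$: it is neither minimally nor maximally $I_3$-scrambling.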
From the perspective of quantum communication, we may state this as
$$Q_{A\to C} + Q_{A\to D} = Q_{A\to CD} = \log|A|,$$
where $Q$ is the quantum capacity of the corresponding channels, i.e., the maximum qubit rate at which quantum information can be transferred through the channels in the limit of many channel uses and vanishing error (see, e.g., [9] for details). The right-hand side equality is a consequence of unitarity. In fact, we actually get the even stronger result that
$$R_{A\to C} + R_{A\to D} = Q_{A\to CD} = \log|A|,$$
where $R_{A\to C}$ and $R_{A\to D}$ are simultaneously achievable, one-shot, zero-error quantum communication rates.
It is important to note that simultaneously achievable rates are different from the individual quantum capacities for general broadcast channels $A \to CD$. The former always satisfy the inequality $R_{A\to C} + R_{A\to D} \le Q_{A\to CD}$. However, the latter need not. This phenomenon is also found in classical communication capacities. Consider, e.g., the basis-dependent copying channel $A \to CD$ which sends a noiseless copy of $A$ to $C$ and $D$ as $|j\rangle_A \mapsto |j\rangle_C |j\rangle_D$. The individual capacities are $\log d$ but so is the overall capacity. While we cannot make the same construction for quantum capacities due to the no-cloning theorem, we can take advantage of the fact that the product of the dimensions of two subspaces can be greater than the sum to get a gap in quantum capacities as well. Consider a unitary for which, if we fix the input state $\rho^0_B = |0\rangle\langle 0|$, the resulting channel sends $|a\rangle \mapsto |a\rangle \otimes |0\rangle$ if $a \le d_0$, and $|a\rangle \mapsto |0\rangle \otimes |a\rangle$ otherwise. Therefore, $Q_{A\to C} \ge \log d_0$ by coding in the former, $d_0$-dimensional subspace, while $Q_{A\to D} \ge \log(d - d_0)$ by coding in the latter subspace. Hence the sum of the individual capacities is at least
$$\log d_0 + \log(d - d_0) = \log\left[d_0 (d - d_0)\right].$$
However, $Q_{A\to CD}$ is never larger than $\log|A| = \log d$, so whenever $d_0 (d - d_0) > d$ we obtain the inequality $Q_{A\to C} + Q_{A\to D} > Q_{A\to CD}$.
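The capacity gap above is plain arithmetic, which the following Python sketch spells out. The function name `capacity_comparison` and the chosen dimensions are illustrative assumptions; the only input from the text is that coding in the two subspaces gives $\log d_0$ and $\log(d - d_0)$ qubits, while the joint capacity is $\log d$.

```python
from math import log2

def capacity_comparison(d, d0):
    """Return (lower bound on Q_{A->C} + Q_{A->D}, Q_{A->CD}) in qubits."""
    individual_sum = log2(d0) + log2(d - d0)   # code in each subspace separately
    joint = log2(d)                            # log|A|, fixed by unitarity
    return individual_sum, joint

# d = 6, d0 = 3: the product of subspace dimensions (9) exceeds d (6).
individual_sum, joint = capacity_comparison(d=6, d0=3)
print(individual_sum > joint)

# d = 3, d0 = 1: the product (2) does not exceed d (3), so no gap arises.
small_sum, small_joint = capacity_comparison(d=3, d0=1)
print(small_sum > small_joint)
```

The second case shows that the strict inequality needs $d_0 (d - d_0) > d$; it is not automatic for every choice of $d_0$.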
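To prove Theorem 1, we first prove the corresponding statement for quantum states with vanishing conditional mutual information:
Proposition 2. Let $\rho_{ABCD}$ be a pure state such that (1) $I(A;B|C) = 0$, (2) $\rho_{AB}$ is maximally mixed, and (3) $|AB| = |CD|$. Then there exist decompositions $A = A_L \otimes A_R$, and likewise for $B$, $C$, $D$, such that $\rho_{ABCD}$ has the form
$$|\rho\rangle_{ABCD} = |\Phi^+\rangle_{A_L C_L} \otimes |\Phi^+\rangle_{A_R D_L} \otimes |\Phi^+\rangle_{B_L C_R} \otimes |\Phi^+\rangle_{B_R D_R}.$$
Proof. We note that assumptions 2 and 3 together imply that
$$\rho_{CD} = \tau_{CD}. \qquad (2.1)$$
From [33], we know that if $\rho_{ABC}$ is a quantum state with $I(A;B|C) = 0$ (assumption 1), then we can decompose $C$ into sectors, $C = \bigoplus_i C_{L,i} \otimes C_{R,i}$, such that
$$\rho_{ABC} = \bigoplus_i p_i\, \rho_{A C_{L,i}} \otimes \rho_{B C_{R,i}}$$
for some probability distribution $p_i$ and quantum states $\rho_{A C_{L,i}}$ and $\rho_{B C_{R,i}}$. Thus we can purify the sectors individually to obtain a purification of $\rho_{ABC}$ of the form
$$|\tilde\rho\rangle_{ABCD} = \sum_i \sqrt{p_i}\, |\eta_i\rangle_{A C_{L,i} D_{L,i}} \otimes |\xi_i\rangle_{B C_{R,i} D_{R,i}}, \qquad (2.4)$$
where we write $C_i = C_{L,i} \otimes C_{R,i}$ and $D_i = D_{L,i} \otimes D_{R,i}$. By Uhlmann's theorem (see, e.g., [9]), the purification in (2.4) only differs by a local unitary on $D$ from the four-party pure state $\rho_{ABCD}$, which likewise purifies $\rho_{ABC}$, and hence it suffices to establish the normal form for (2.4). Furthermore, they have the same reduced state on $CD$, namely, the maximally mixed state (2.1), which is unitarily invariant. Thus:
$$\operatorname{tr}_{AB} |\tilde\rho\rangle\langle\tilde\rho|_{ABCD} = \tau_{CD}. \qquad (2.5)$$
We may think of the left-hand side as a big block matrix with respect to $\bigoplus_{i,j} C_i \otimes D_j$ which is only supported on blocks where $i = j$. The right-hand side on the other hand is supported on all blocks $C_i \otimes D_j$. Thus (2.5) can only be true if there is only a single sector (and hence no pair with $i \ne j$). Suppressing the index $i$, this means that, in fact, $\rho_{ABC} = \rho_{A C_L} \otimes \rho_{B C_R}$, and its purification (2.4) reads
$$|\eta\rangle_{A C_L D_L} \otimes |\xi\rangle_{B C_R D_R}. \qquad (2.6)$$
Moreover, (2.5) becomes $\eta_{C_L D_L} \otimes \xi_{C_R D_R} = \tau_{CD}$, so that $|A| \ge |C_L D_L|$ and $|B| \ge |C_R D_R|$ by the Schmidt decomposition. But $|AB| = |CD|$ by assumption 3, thus in fact $|A| = |C_L D_L|$ and $|B| = |C_R D_R|$. Thus $|\eta\rangle_{A C_L D_L}$ is maximally entangled between $A$ and $C_L D_L$, and $|\xi\rangle_{B C_R D_R}$ is maximally entangled between $B$ and $C_R D_R$. The same is true of the desired normal form, which is given by a tensor product of maximally entangled states:
$$|\Phi^+\rangle_{A_L C_L} \otimes |\Phi^+\rangle_{A_R D_L} \otimes |\Phi^+\rangle_{B_L C_R} \otimes |\Phi^+\rangle_{B_R D_R}. \qquad (2.7)$$
Thus, by another application of Uhlmann's theorem there exist local unitaries on $A$, $B$ that transform (2.6) into (2.7). Absorbing all local unitaries into the tensor product decompositions, we obtain the desired result.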
The normal form in Theorem 1 now follows readily from Proposition 2, since the Choi state $\rho_{ABCD}$ associated with the unitary $U_{AB\to CD}$ satisfies all three assumptions of the proposition. The formula for the dimensions of the subsystems $A_L$ etc. follows directly from the normal form. For the converse, we observe that $-I_3 = I(A;B|C) = 0$ for any unitary of this normal form.
Theorem 1 does not appear to directly generalize to isometries $V_{AB\to CD}$. For example, consider the three-party GHZ state $|\mathrm{GHZ}\rangle_{ACD} = (|000\rangle + |111\rangle)/\sqrt{2}$, which is the Choi state of the isometry mapping $|0\rangle \mapsto |00\rangle$ and $|1\rangle \mapsto |11\rangle$. This is a special case of an isometry $V_{AB\to CD}$ where $B$ is trivial, and $I_3(A;C;D)$ is zero, just as for any tripartite pure state. However, the GHZ state is clearly not of the form in Proposition 2, even if we allow for maximally entangled states between $C$, $D$. This can be seen by the fact that tracing out any one of $A$, $C$, $D$ in the GHZ state gives a separable state, which is impossible for a triple of maximally entangled states unless they are all trivial.
It is well established in the quantum information literature that the conditional mutual information can be operationally interpreted in terms of the recoverability of quantum information for tripartite quantum states [23,34]. See also [35][36][37][38]. In particular, it is known that, for any quantum state $\rho_{ABD}$, the distance $\|\rho_{ABD} - R_{D\to BD}[\rho_{AD}]\|_1$ is bounded in terms of $I(A;B|D)$ (2.8), where $\|X\|_1 = \operatorname{tr}\sqrt{X^\dagger X}$ is the trace norm and $R_{D\to BD}$ a quantum channel that only depends on $\rho_{BD}$ [23]. Applied to the Choi state of a bipartite unitary $U_{AB\to CD}$ with $-I_3 \le \varepsilon$, we therefore obtain a recovery map with $R_{D\to BD}[\rho_{AD}] \approx \rho_{ABD}$. This is immediate from Theorem 1 when $I_3 = 0$. This recovery property of the state from local information is in stark contrast with the maximally scrambling case, such as in the model of black hole evaporation from [1], and we discuss this in more detail in Section 3.
In contrast to the interpretation in terms of recovery maps, Theorem 1 itself is not robust in the sense that there exist unitaries for which $I_3$ is arbitrarily close to zero, while their distance to any unitary of the form of Theorem 1 stays bounded away from zero. Here, we measure distance using the diamond norm between two quantum channels $N$ and $M$,
$$\|N - M\|_\diamond = \max_\rho \left\| (N \otimes \mathrm{id}_R)[\rho] - (M \otimes \mathrm{id}_R)[\rho] \right\|_1, \qquad (2.9)$$
where we optimize over all states $\rho$ on $AR$, with $R$ an auxiliary $n$-dimensional Hilbert space ($n = |A|$ is sufficient). As the trace distance quantifies how well one can experimentally distinguish quantum states [39], the diamond norm is a natural measure of how well one can distinguish two quantum channels even with an auxiliary system.
Our construction is explicit and goes as follows. We choose $A = B = C = D = \mathbb{C}^d$ and define a bipartite unitary $U_d$ that is maximally $I_3$-scrambling on some subspace and the identity otherwise. More precisely, where the infimum is over all unitaries $U_0$ with vanishing tripartite information.
That is, by making $U_d$ $I_3$-scrambling on a subspace whose relative size goes to zero for large $d$, we can make the tripartite information go to zero while still leaving a nonzero subspace that is $I_3$-scrambling, thereby keeping the diamond norm bounded away from zero. It is also interesting to note that the Choi state of $U_d$ converges to that of the identity channel, a quantum Markov chain state, in trace distance, while the channel itself does not converge to the identity nor to any minimally $I_3$-scrambling unitary in diamond norm.
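The qualitative behaviour described above can be explored numerically. The sketch below is only an illustration under two assumptions of ours that are not taken from the text: the scrambling block acts on the basis states with $a, b < d_S$, and it uses the rule $|a,b\rangle \mapsto |a+b, a-b\rangle$ (mod $d_S$) as a stand-in for a maximally $I_3$-scrambling unitary in odd dimension $d_S$. Everywhere else $U_d$ acts as the identity. The helper names are hypothetical.

```python
import numpy as np

def entropy_of(psi, keep):
    """Von Neumann entropy (bits) of the marginal of |psi> on the axes in `keep`."""
    rest = [i for i in range(psi.ndim) if i not in keep]
    rows = int(np.prod([psi.shape[i] for i in keep]))
    M = psi.transpose(list(keep) + rest).reshape(rows, -1)
    p = np.linalg.svd(M, compute_uv=False) ** 2
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def i3_of_u_d(d, d_s):
    """I_3(A;C;D) of the Choi state of U_d (assumed embedding, see lead-in)."""
    psi = np.zeros((d, d, d, d))            # axes: A, B, C, D
    for a in range(d):
        for b in range(d):
            if a < d_s and b < d_s:         # scrambling block (assumed rule)
                c, e = (a + b) % d_s, (a - b) % d_s
            else:                           # identity on the complement
                c, e = a, b
            psi[a, b, c, e] = 1.0 / d
    A, C, D = 0, 2, 3
    S = lambda *k: entropy_of(psi, k)
    return S(A) + S(C) + S(D) - S(A, C) - S(A, D) - S(C, D) + S(A, C, D)

# Fixed scrambling block (d_s = 3) inside a growing total dimension d.
for d in (3, 6, 12):
    print(d, round(i3_of_u_d(d, d_s=3), 3))
```

At $d = d_S = 3$ the whole space is scrambled and $I_3 = -2\log 3$; as $d$ grows with $d_S$ fixed, the magnitude of $I_3$ shrinks, even though the scrambling block itself is unchanged.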
On the other hand, we note that in terms of simultaneous local one-shot quantum capacities of $U_d$, $\lim_{d\to\infty} \left[ Q_{A\to CD} - (R_{A\to C} + R_{A\to D}) \right] = 0$. Indeed, by coding in the complementary subspace of $A_S$, $R_{A\to C} \ge \log(d - d_S)$ can be achieved. Asymptotically, this goes like $\log d$, since the relative size of the scrambling subspace goes to zero. The other inequality is trivial, so we have equality. Hence, one might be tempted to interpret $I_3$ as the difference between the sum of the simultaneous local quantum capacities $A \to C, D$ and the maximum possible value $\log|A|$, which is true in this example for the limit of large $d$. For finite $d$, however, we can find examples where this interpretation fails. The interpretation can be partially salvaged, however, by considering instead entanglement-assisted classical communication with random codes generated using maximally entangled states while fixing the input to $B$ to be maximally mixed. This follows from the observation
$$I(A;C) + I(A;D) = I(A;CD) + I_3 \le I(A;CD) \qquad (2.11)$$
and the fact that the entanglement-assisted classical communication rate of a channel $N_{A\to C}$ using such a code is given by the mutual information $I(A;C)$ of its Choi state [40,41].
Since the mutual information $I(A;CD) = 2\log|A|$ is as large as it can be, it is not just an achievable rate but in fact the capacity of the $A \to CD$ channel. Equation (2.11) therefore states that the sum of the two entanglement-assisted achievable rates is bounded above by the entanglement-assisted capacity.
Proposition 3 is a consequence of the following technical estimates proved in Appendix A: where the infimum is over all unitaries $U_0$ with vanishing tripartite information.
Indeed, (2.12) implies that the difference between the subsystem entropies vanishes in the limit of large $d$. This follows from the Fannes-Audenaert inequality [42,43], which asserts that, for any two quantum states $\rho$ and $\sigma$ on a $D$-dimensional Hilbert space,
$$|S(\rho) - S(\sigma)| \le T \log(D - 1) + h(T), \qquad (2.14)$$
where $T = \frac{1}{2}\|\rho - \sigma\|_1$ is the trace distance and $h(T) = -T\log T - (1 - T)\log(1 - T)$ is the binary entropy function, which can be upper bounded as $h(T) \le 2\sqrt{T}$. But $\Phi^+_{AC} \otimes \Phi^+_{BD}$ is the Choi state of the identity channel, which has zero tripartite information. Hence the tripartite information $I_3(A;B;C)_{U_d}$ goes to zero in the limit of large $d$. In the same limit, the right-hand side of (2.13) converges to 1. This concludes the proof of Proposition 3.
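To make the continuity bound concrete, here is a short Python sketch that evaluates the Fannes-Audenaert bound $T\log(D-1) + h(T)$ for a pair of commuting (diagonal) states, for which entropies and trace distance reduce to classical quantities. The function names and the example distributions are illustrative assumptions; the sketch also spot-checks the estimate $h(T) \le 2\sqrt{T}$ on a grid.

```python
from math import log2, sqrt

def shannon(p):
    """Entropy in bits of a probability vector (spectrum of a diagonal state)."""
    return -sum(x * log2(x) for x in p if x > 0)

def binary_entropy(t):
    return shannon([t, 1.0 - t])

def fannes_audenaert_bound(T, D):
    """Right-hand side T*log(D-1) + h(T) for trace distance T in dimension D."""
    return T * log2(D - 1) + binary_entropy(T)

# A pure state and a nearby mixed state, both diagonal in the same basis.
p = [1.0, 0.0, 0.0, 0.0]
q = [0.7, 0.1, 0.1, 0.1]
T = 0.5 * sum(abs(x - y) for x, y in zip(p, q))    # trace distance
gap = abs(shannon(p) - shannon(q))                 # entropy difference
bound = fannes_audenaert_bound(T, len(p))
print(T, gap, bound)

# Spot-check h(T) <= 2*sqrt(T) on a grid of trace distances.
grid_ok = all(binary_entropy(k / 100) <= 2 * sqrt(k / 100) for k in range(1, 100))
print(grid_ok)
```

This particular pair has spectrum $(1-T, T/(D-1), \dots)$ against a pure state, which is the family known to saturate the bound, so `gap` and `bound` agree up to rounding.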

Maximal scrambling
We now consider the opposite extreme where $I_3 \approx -2\log\min\{|A|, \dots, |D|\}$ and compare it to the results we obtained in the minimally $I_3$-scrambling case. Note that this is the most negative value it can take:
$$I_3 = I(A;C) + I(A;D) - I(A;CD) \ge -I(A;CD) = -2\log|A|, \qquad (3.1)$$
since the mutual information is always nonnegative. A similar inequality holds for the other subsystems.
We first discuss the existence of maximally $I_3$-scrambling unitaries in the case where $A = B = C = D = \mathbb{C}^d$. Clearly, $I_3 = -2\log d$ if and only if any bipartite subsystem is maximally mixed, i.e., if $S(AB) = S(AC) = \dots = 2\log d$. Such unitaries are precisely four-party perfect tensors, i.e., tensors that are unitary from any bipartition to the complement, as pointed out in [8]. This establishes the existence of maximally $I_3$-scrambling unitaries in sufficiently large prime dimension $d$, since a stabilizer state chosen at random will be a perfect tensor with high probability [21]. On the other hand, the explicit construction $U_S$ in (3.2), in which all arithmetic is modulo $d$, achieves the same for any odd dimension $d$. We require $d$ to be odd so that $U_S$ is unitary. It can be readily verified that $I_3 = -2\log d$. We note that (3.2) is a straightforward generalization of the three-qutrit code from [28]. It is interesting to observe that $U_S^2$ is minimally $I_3$-scrambling. In this sense, a unitary that is maximally $I_3$-scrambling can still have a very small recurrence time.
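The properties listed above can be checked numerically for a candidate map. The following Python sketch assumes the rule $|a\rangle_A |b\rangle_B \mapsto |a+b\rangle_C |a-b\rangle_D$ with arithmetic modulo $d$. This is our assumption: it is one map consistent with the stated properties (unitary only for odd $d$, $I_3 = -2\log d$, and a square that is a product of local unitaries), but it is not confirmed to be the construction of the text. The function names are hypothetical.

```python
import numpy as np

def scrambler(d):
    """Matrix of |a,b> -> |a+b, a-b> (mod d); rows index (c,d), columns (a,b)."""
    U = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            U[((a + b) % d) * d + (a - b) % d, a * d + b] = 1.0
    return U

def is_unitary(U):
    return bool(np.allclose(U.T @ U, np.eye(U.shape[0])))

def tripartite_information(U, d):
    """I_3(A;C;D) in bits for the Choi state of U acting on C^d x C^d."""
    psi = U.reshape(d, d, d, d).transpose(2, 3, 0, 1) / d   # axes: A, B, C, D
    def S(*keep):
        rest = [i for i in range(4) if i not in keep]
        M = psi.transpose(list(keep) + rest).reshape(d ** len(keep), -1)
        p = np.linalg.svd(M, compute_uv=False) ** 2
        p = p[p > 1e-12]
        return float(-(p * np.log2(p)).sum())
    A, C, D = 0, 2, 3
    return S(A) + S(C) + S(D) - S(A, C) - S(A, D) - S(C, D) + S(A, C, D)

U3 = scrambler(3)
print(is_unitary(U3), is_unitary(scrambler(2)))   # odd d works, even d collides
i3_single = tripartite_information(U3, 3)         # compare with -2*log2(3)
i3_square = tripartite_information(U3 @ U3, 3)    # square maps |a,b> -> |2a,2b>
print(i3_single, i3_square)
```

For even $d$ the assumed rule sends distinct inputs to the same output (for $d = 2$, both $|0,0\rangle$ and $|1,1\rangle$ go to $|0,0\rangle$), which is why unitarity fails there.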
The relationship to quantum error correcting codes can also be used to argue that there exists no maximally $I_3$-scrambling unitary for qubits ($d = 2$). Indeed, assume that such a unitary $U_{AB\to CD}$ exists and consider the isometry $V_{A\to BCD} := U_{AB'\to CD} |\Phi^+\rangle_{BB'}$ obtained by inputting one half of a maximally entangled state into $B'$. Then the perfect tensor property implies that we can correct for the erasure of any one of the output qubits $B$, $C$ and $D$. In other words, $V_{A\to BCD}$ would be a code for the qubit erasure channel of length 3. But this is ruled out by [44]. Hence, such a $U$ does not exist.
We return to the general setup, where the dimensions of the systems $A, \dots, D$ need not be equal, and consider the consequences of a unitary being maximally $I_3$-scrambling. In particular, we consider the residual channels from a single input to a single output. We expect the residual channels $A \to C$ etc. to be noisy since quantum information should be delocalized. Indeed, we find:
Proposition 5. Let $U_{AB\to CD}$ be a maximally $I_3$-scrambling unitary and $\rho_{ABCD}$ its Choi state. If either $A$ or $C$ has the smallest dimension among the four subsystems then $\rho_{AC}$ is maximally mixed and $I(A;C) = 0$.
As a consequence, the residual channel corresponding to the maximally mixed input on $B$ is completely depolarizing, i.e., its channel output is the maximally mixed state $\tau_C$ for any input state $\sigma_A$.
Proof. If the dimension of $A$ is smallest, maximal $I_3$-scrambling means that $I_3 = -2\log|A|$. Thus it follows from (3.1) that $I(A;C) = I(A;D) = 0$, since the mutual information is always nonnegative. Similarly, if $C$ is smallest then we have $I_3 = -2\log|C|$, which implies that $I(A;C) = I(B;C) = 0$.
In either case, we thus find that $I(A;C) = 0$ and hence that $\rho_{AC} = \rho_A \otimes \rho_C = \tau_{AC}$, since both $\rho_A$ and $\rho_C$ are maximally mixed. To see that this implies the second claim, we note that $\rho_{AC}$ is the Choi state of the residual channel $N_{A\to C}$. Hence, $N_{A\to C}[\sigma_A] = \tau_C$ for all input states $\sigma_A$.
Completely depolarizing channels have zero capacity of any kind, in agreement with our expectation that the quantum information at $A$ gets fully delocalized for maximally mixed input at $B$. In Appendix D we show that if $|D| \gg |AC|$ then $\rho_{AC} \approx \tau_{AC}$ for typical input states on $B$. Moreover, if $|D| \gg |AC|^2$ then the residual channel $N_{A\to C}$ is typically entanglement-breaking, in which case it still has zero quantum capacity.
In general, there exist input states on $B$ such that the corresponding residual channel $A \to C$ can still be used for communication. For example, consider the unitary defined in (3.2). If we fix the input on $B$ to a computational basis state $|0\rangle$, then the residual channel is the completely dephasing channel, which has maximal classical capacity. If we instead fix the input on $B$ to be in the state $\frac{1}{\sqrt{3}}(|0\rangle + \sqrt{2}\,|1\rangle)$ and consider the $d = 3$ case, we obtain a residual channel $A \to C$ with positive quantum capacity. To see this, we use the fact that the coherent information of a channel is a lower bound on the quantum capacity [45][46][47]. Evaluating it for this channel, we obtain $I(R\rangle C) = \frac{12}{9} - \frac{5}{9}\log 5 > 0$.
We can also interpret Proposition 5 from the perspective of recovery of quantum information. If we assume that the dimension of $A$ is smallest then both residual channels $A \to C$ and $A \to D$ are completely depolarizing. Given only $D$, none of the quantum information at $A$ can be recovered, while if we supplement it with $B$, perfect recovery is possible. More precisely, we can transfer entanglement from $A$ to $BD$ perfectly. This follows from the fact that $\rho_{ABCD}$ and $\Phi^+_{AA'} \otimes \Phi^+_{CC'}$ both purify the reduced state $\rho_{AC} = \tau_A \otimes \tau_C$, which by Uhlmann's theorem implies the existence of a decoding operation $D_{BD\to A'}$.
One of the motivations for studying scrambling unitaries comes from black hole physics. The preceding interpretation applies naturally to the model of black hole evaporation in [1] and was also discussed in [8]. We can schematically model black hole evaporation by a bipartite unitary time evolution $U_{A'B'\to CD}$, where $A$ is half of a Bell pair whose other half $A'$ enters the black hole at time $t_0$, $B$ is the Hawking radiation emitted before $t_0$, assumed to be maximally entangled with the black hole $B'$ at $t_0$, $C$ is the state of the remaining black hole at a later time $t_1$, and $D$ is the Hawking radiation emitted in the interval $[t_0, t_1]$. All indications are that black holes are highly scrambling [1][2][3]6]. If we assume that they are maximally $I_3$-scrambling then we find that $A$ cannot be recovered from the late-time Hawking radiation $D$ alone, while it would be possible when also given the old Hawking radiation $B$. In contrast, if the process were minimally $I_3$-scrambling then someone without knowledge of the quantum state at $A$ and with only the new Hawking radiation $D$ could apply a local operation $R_{D\to BD}$ to approximately recover the old Hawking radiation, so that the overall tripartite state $R_{D\to BD}[\rho_{AD}]$ is close to $\rho_{ABD}$ (eq. (2.8)).
Lastly, we consider the approximate case, where $I_3 \approx -2\log\min\{|A|, \dots, |D|\}$. For concreteness, we assume that the dimension of system $A$ is smallest among all four subsystems and $I_3 = -2\log|A| + \varepsilon$. Then, $I(A;D) \le \varepsilon$ as a consequence of (3.1) (cf. the proof of Proposition 5). Using Pinsker's inequality, this implies that $\|\rho_{AD} - \tau_A \otimes \tau_D\|_1 \le \sqrt{2\ln(2)\,\varepsilon}$. In particular, if we put one half of a maximally entangled state into the residual channel $A \to D$, then the resulting state is close to being completely uncorrelated. Likewise, $\rho_{AC} \approx \tau_A \otimes \tau_C$, and hence $\rho_{ABCD}$ and $\Phi^+_{AA'} \otimes \Phi^+_{CC'}$ still purify approximately the same state. It follows, again by Uhlmann's theorem, that there still exists a quantum operation $D_{BD\to A'}$ such that $D_{BD\to A'}[\rho_{ABD}] \approx \Phi^+_{AA'}$. In this sense, the recovery interpretation described above can be made robust.
On the other hand, the stronger conclusion of Proposition 5 is not robust in the sense that we can find unitaries such that the negative tripartite information goes to its maximal value, while the diamond norm (2.9) between the residual channel $N_{A\to C}$ and the completely depolarizing channel remains finite. Furthermore, we find that there are such unitaries with nonvanishing one-shot zero-error quantum capacity. That is, a unitary can be arbitrarily close to being maximally $I_3$-scrambling even though its residual channel can still transmit quantum information perfectly at a nonvanishing rate. The sequence of unitaries we use is again (2.10), except this time $d_S$ will be large. We still require that $d_S = d - d_0$ is odd, so that the existence of a maximally $I_3$-scrambling unitary $U_S$ is guaranteed. We can then establish the following result:

Proposition 6. Let $d_0$ be a constant and consider the family of unitaries $U_d$ for odd $d_S = d - d_0$. Then the tripartite information of $U_d$ becomes arbitrarily close to maximally negative as $d \to \infty$ (3.4), while the residual channels $N_{A\to C,d}$ have bounded distance away from the completely depolarizing channel $\Delta_{A\to C}$ (3.5). Moreover, their one-shot zero-error quantum capacities $Q_{A\to C,d}$ can be lower bounded as
$$Q_{A\to C,d} \ge \log d_0. \qquad (3.6)$$
To establish Proposition 6, we first note that the last bound (3.6) is immediate, since we can code perfectly using the $d_0$-dimensional subspaces. The first two bounds, (3.4) and (3.5), follow from the following lemma, proved in Appendix B, together with the Fannes-Audenaert inequality (2.14) that we similarly used to establish Proposition 3.

Lemma 7. Consider the unitaries $U_d$ from (2.10) and their Choi states $\rho_{ABCD,d}$. Then $\rho_{AD,d}$ is maximally mixed, and

Tripartite information and state redistribution
We now briefly discuss the meaning of general values of the tripartite information. Naturally, we would like to look for operational interpretations that hold in general. Using the equivalence between tripartite information and conditional mutual information, (1.2), one such interpretation is given by the task of quantum state redistribution, in which a party holding two quantum systems is to transfer one of the systems to a party holding one [29]. Specifically, given many copies of a quantum state $\rho_{ACD}$ with purification $\rho_{ABCD}$, a party with $AC$ can transmit $A$ to a party with $D$ using a rate of $\frac{1}{2} I(A;B|D)$ qubits of communication, $\frac{1}{2} I(A;C) - \frac{1}{2} I(A;D)$ ebits (i.e., shared Bell pairs of maximally entangled qubits) and no classical communication. Conversely, $\frac{1}{2} I(A;B|D)$ is the minimum rate of quantum communication required by any state redistribution protocol. This is consistent with the intuition of scrambling: a strongly scrambling unitary will delocalize the information from the inputs so that observers at individual outputs have little knowledge of the inputs. Hence, a large number of qubits should be required to transmit this information.
We can cross-check this intuition with our main results in the minimally and maximally scrambling cases and give explicit protocols in each case. For the minimally $I_3$-scrambling case, we cross-check Theorem 1 by applying this result to the reduced Choi state $\rho_{ACD}$ of the unitary. Using the above result, to transfer $A$ from $AC$ to $D$, we should not need any communication and should consume $\log\frac{|A_L|}{|A_R|}$ ebits, where we are using the notation of Theorem 1. This is consistent with our result as we can prepare $|\Phi\rangle_{A_R D_L}$ locally. Thus, we only need to consume $\log|A_L|$ ebits to transmit $A_L$. However, we can use the $\log|A_R|$ pre-existing ebits to transmit $A_L$ for a net ebit cost of $\log\frac{|A_L|}{|A_R|}$. No communication was done, so our qubit and bit costs are indeed zero.
In the maximally I 3 -scrambling case, we can cross-check with Proposition 5. In the case where A is the smallest system, [29] states that we should need log|A| qubits, zero ebits, and zero bits. This is achieved by the trivial protocol that transfers A to D over a quantum channel, in agreement with our result.
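The two cross-checks above can be reproduced numerically for small systems. The following is a minimal sketch, assuming all four systems have the same dimension $d = 3$ and using three illustrative permutation unitaries that are not taken from the text: the identity ($A \to C$, $B \to D$), the swap ($A \to D$, $B \to C$), and a hypothetical "mixer" $|x, y\rangle \mapsto |x + y, x - y\rangle$ with arithmetic modulo $d$. The helper names (`permutation_unitary`, `choi_state`, `redistribution_costs`) are introduced here for illustration only.

```python
import numpy as np

A, B, C, D = 0, 1, 2, 3  # tensor axes of the Choi state


def permutation_unitary(f, d):
    """Unitary on two d-level systems sending |x, y> to |f(x, y)> (f must be a bijection)."""
    U = np.zeros((d * d, d * d))
    for x in range(d):
        for y in range(d):
            c, e = f(x, y)
            U[c * d + e, x * d + y] = 1.0
    return U


def choi_state(U, d):
    """Pure Choi state as a tensor psi[a, b, c, e] = <c, e|U|a, b> / d."""
    return U.reshape(d, d, d, d).transpose(2, 3, 0, 1) / d


def entropy(psi, keep):
    """von Neumann entropy (in bits) of the reduced state on the axes in `keep`."""
    traced = [ax for ax in range(psi.ndim) if ax not in keep]
    # Contract psi with its conjugate over the traced-out axes.
    rho = np.tensordot(psi, psi.conj(), axes=(traced, traced))
    dim = int(np.prod([psi.shape[ax] for ax in keep]))
    evals = np.linalg.eigvalsh(rho.reshape(dim, dim))
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def redistribution_costs(U, d):
    """Return (qubit rate, ebit rate) for sending A from the AC party to the D party."""
    psi = choi_state(U, d)
    S = lambda *axes: entropy(psi, list(axes))
    cmi = S(A, D) + S(B, D) - S(D) - S(A, B, D)  # I(A;B|D)
    mi_ac = S(A) + S(C) - S(A, C)                # I(A;C)
    mi_ad = S(A) + S(D) - S(A, D)                # I(A;D)
    return cmi / 2, (mi_ac - mi_ad) / 2


d = 3
identity = permutation_unitary(lambda x, y: (x, y), d)
swap = permutation_unitary(lambda x, y: (y, x), d)
mixer = permutation_unitary(lambda x, y: ((x + y) % d, (x - y) % d), d)

for name, U in [("identity", identity), ("swap", swap), ("mixer", mixer)]:
    qubits, ebits = redistribution_costs(U, d)
    print(f"{name}: qubits = {qubits:.3f}, ebits = {ebits:.3f}")
```

In this sketch the identity and the swap both have zero qubit cost, with ebit costs $+\log 3$ and $-\log 3$ respectively (a negative value meaning that entanglement is gained), while the mixer has qubit cost $\log 3 = \log|A|$ and zero ebit cost, matching the pattern described above.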

Tripartite information and OTO correlators
An important property of the definition of scrambling using the tripartite information is that it can be related to scrambling as measured by out-of-time-order (OTO) correlators, as explained in the introduction. Specifically, we recall the formula for the product of average OTO correlators in terms of $I_3^{(2)}$, a Rényi-2 version of the tripartite information, defined in terms of the Rényi-2 entropy $S_2(\rho) = -\log \operatorname{tr} \rho^2$ instead of the von Neumann entropy. Since $S_2(\rho) \leq S(\rho)$ for any quantum state $\rho$, one obtains that $I_3^{(2)} \geq I_3$. Thus the 'butterfly effect' as measured by small OTO correlators implies $I_3$-scrambling [8].
However, the converse of this statement is not true. That is, an $I_3$-scrambling bipartite unitary can nevertheless have high OTO correlators. One example of this is again given by the family of unitaries $U_d$ defined in (2.10), where we find an arbitrarily large gap between $I_3$ and $I_3^{(2)}$. This is proved by explicit calculation in Appendix C, which evaluates $I_3^{(2)}$ for sufficiently large $d$. On the other hand, $I_3(A;B;C)_{U_d} \sim -2 \log d$ as a consequence of eqs. (2.12) and (2.14).
Together this establishes Proposition 8.
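To illustrate how the von Neumann and Rényi-2 entropies of a single state can differ by an arbitrarily large amount, consider the following minimal sketch. It uses a toy spectrum chosen purely for illustration (one eigenvalue $1/2$ and $N$ equal eigenvalues $1/(2N)$); this is an assumption made here and not the spectrum of the Choi state of $U_d$. The Rényi-2 entropy is dominated by the largest eigenvalue and stays below 2 bits, while the von Neumann entropy grows as $1 + \frac{1}{2} \log N$.

```python
import numpy as np


def entropies(p):
    """Return (von Neumann entropy, Renyi-2 entropy) in bits for a spectrum p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    S = -np.sum(p * np.log2(p))
    S2 = -np.log2(np.sum(p ** 2))
    return float(S), float(S2)


def toy_spectrum(N):
    """Toy spectrum: one eigenvalue 1/2 plus N equal eigenvalues 1/(2N)."""
    return np.concatenate(([0.5], np.full(N, 0.5 / N)))


for k in (4, 10, 16):
    S, S2 = entropies(toy_spectrum(2 ** k))
    print(f"N = 2^{k}: S = {S:.3f}, S_2 = {S2:.3f}, gap = {S - S2:.3f}")
```

The gap $S - S_2$ printed by the loop increases without bound as $N$ grows, which is one generic way in which a quantity built from Rényi-2 entropies can separate from its von Neumann counterpart.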
This large separation can be understood from the fact that we have large individual OTO correlators. To see this, it is useful to choose bases of local Hermitian operators, $\operatorname{tr} O_{D,i} O_{D,j} = d\,\delta_{i,j}$ etc., that are adapted to the scrambling and nonscrambling subspaces. Indeed, we can write $U_d$ in terms of $U_S$, the maximally $I_3$-scrambling unitary acting on $A_S B_S = C_S D_S$, and $I$, the identity operator on the complement. Furthermore, the number of such pairs of maximally correlated operators will be increasing without bound as $d \to \infty$.

Multipartite generalizations
The main results for the minimal and maximal cases above can be generalized to the multipartite setting. However, it is not clear, a priori, how to extend the definition of $I_3$-scrambling to unitaries with multiple inputs and multiple outputs (MIMO). In the following, we will justify defining $I_3$-scrambling for MIMO unitaries $U_{A_1 \dots A_n \to C_1 \dots C_m}$ using the tripartite informations obtained for each pair $(i, j)$ from the four subsystems $A_i$, $A_i^c$, $C_j$, $C_j^c$, where $A_i^c$ is the subset of all input subsystems except for $A_i$ and $C_j^c$ the subset of all output subsystems except for $C_j$ (fig. 3). Their equivalence with conditional mutual informations follows from the bipartite case, (1.2), if we partition the Choi state of $U$ into the four subsystems $A_i$, $A_i^c$, $C_j$, $C_j^c$.

Minimal scrambling
We define a minimally $I_3$-scrambling MIMO unitary to be a unitary $U_{A_1 \dots A_n \to C_1 \dots C_m}$ such that the tripartite information associated with $A_i$ and $C_j$ vanishes for all $i, j$. Again, we find that such a unitary can be decomposed into a tensor product of local unitaries connecting individual inputs and outputs, generalizing Theorem 1:

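Theorem 9. Let $U_{A_1 \dots A_n \to C_1 \dots C_m}$ be a MIMO unitary. Then $U$ is minimally $I_3$-scrambling if and only if it is of the form
$$U = \bigotimes_{i=1}^{n} \bigotimes_{j=1}^{m} U_{i \to j} \qquad (6.1)$$
for decompositions $A_i = \bigotimes_{j=1}^{m} A_{i \to j}$ for $i = 1, \dots, n$, $C_j = \bigotimes_{i=1}^{n} C_{i \to j}$ for $j = 1, \dots, m$ and unitaries $U_{i \to j} : A_{i \to j} \to C_{i \to j}$ for all $i, j$.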
We will prove Theorem 9 by viewing the MIMO unitary as a bipartite unitary where we group inputs and outputs. This will then allow us to iteratively apply Theorem 1 to decompose the MIMO unitary piece by piece. We will first peel off all the unitaries for a single input and then repeat for all other inputs. To do so, we need to show that we can decompose a MIMO unitary into a local unitary and a residual MIMO unitary such that the residual parts of $A_1$ and $C_1$ have zero mutual information on the residual unitary and such that the residual MIMO unitary is still minimally $I_3$-scrambling: Here, the residual unitary $\tilde U$ is a minimally $I_3$-scrambling MIMO unitary that satisfies $I(\tilde A_1; \tilde C_1)_{\tilde U} = 0$.
Proof. We apply Theorem 1 with $A = A_1$, $B = A_2 \dots A_n$, $C = C_1$ and $D = C_2 \dots C_m$. Thus we obtain the normal form of Theorem 1 for $U$. If we define the residual unitary $\tilde U$ as the tensor product of the three unitaries other than $U_{1 \to 1}$ on the right-hand side, then we obtain a decomposition as in the statement of the lemma. That $\tilde U$ is still minimally $I_3$-scrambling follows from the fact that the local unitary $U_{1 \to 1}$ and the overall unitary $U$ have zero tripartite information, in addition to the additivity of the von Neumann entropy for tensor product states (cf. [19,22]). The statement about the mutual information holds because $I(A_R; C_R)_{\tilde U} = 0$ by direct inspection of the normal form (6.1).
By iteratively applying Lemma 10, we find decompositions $A_1 = \bigotimes_{j=1}^{m} A_{1 \to j} \otimes \tilde A_1$ and $C_j = C_{1 \to j} \otimes \tilde C_j$ such that $U$ factors into a tensor product of local unitaries $U_{1 \to j} : A_{1 \to j} \to C_{1 \to j}$ with a residual unitary $\tilde U$. The latter is minimally $I_3$-scrambling and moreover satisfies $I(\tilde A_1; \tilde C_j)_{\tilde U} = 0$ for all $j$ (using monotonicity of the mutual information). However, we also need to make sure that this process will consume all of $A_1$, i.e., that the residual system $\tilde A_1$ is trivial. This is a consequence of the following lemma, applied to the residual unitary $\tilde U$.

Proof. First note that, for all $j = 1, \dots, m$, where the last equality follows from the assumption that $I(A_1; C_j) = 0$. This implies the following recursion formula: The first equality holds by plugging in (6.2), the inequality is strong subadditivity, and the last two follow by using that the reduced state $\rho_{C_1 \dots C_m}$ is maximally mixed by unitarity. If we start with (6.2) for $j = 1$ and successively apply the recursion formula, we obtain We conclude that $\log|A_1| = S(A_1) = 0$.
The above considerations thus allow us to completely peel off $A_1$ from the MIMO unitary, leaving a minimally $I_3$-scrambling MIMO unitary on the other inputs. We have thus proved the following lemma:

Theorem 9 now follows by applying Lemma 12 inductively to $A_1$, $A_2$, etc. After $n$ steps, there are no $A$-systems left. Since the residual operator is a unitary, the corresponding residual output systems likewise have to be trivial. We thus obtain the desired normal form. Conversely, that any MIMO unitary of the given normal form is minimally $I_3$-scrambling follows directly from the corresponding statement in Theorem 1, applied to the bipartitions $A_i, A_i^c$ and $C_j, C_j^c$. This concludes the proof of Theorem 9.

Maximal scrambling
On the other end, we define a maximally $I_3$-scrambling MIMO unitary as one for which the tripartite information associated with $A_i$ and $C_j$ attains its maximally negative value for all $i, j$. Applying Proposition 5 to the bipartition $A_i, A_i^c, C_j, C_j^c$, we conclude that the residual channels $N_{A_i \to C_j}$ are completely depolarizing whenever $A_i$ or $C_j$ is the smallest system (e.g., if all systems have the same dimension, as in a typical many-body scenario). We note that if the average OTO correlators between $A_i, C_j$ and $A_i, C_j^c$ are minimal for each $i$ and $j$, then the MIMO unitary is maximally $I_3$-scrambling.
By an explicit construction similar to that of eq. (3.2), we can establish that maximally I 3 -scrambling MIMO unitaries exist for arbitrarily large values of d.
Proposition 13. Let $M_n$ be the following $n \times n$ matrix,
$$M_n = I_n + E_n, \qquad (6.3)$$
where $I_n$ is the identity matrix and $E_n$ the matrix of ones. Then $U_{d,n} |\vec x\rangle = |M_n \vec x\rangle$ defines a maximally $I_3$-scrambling MIMO unitary. Here we write $|\vec x\rangle = |x_1\rangle \dots |x_n\rangle$, and all arithmetic is modulo $d$.
We prove this by showing that the following three criteria on a matrix $M$ are together sufficient to ensure that $U_M |\vec x\rangle = |M \vec x\rangle$ is maximally $I_3$-scrambling:
1. $M$ is an invertible matrix modulo $d$.
2. If we replace any row of M by any elementary row (i.e., a row with all 0's except for a single entry occupied by a 1) then the resulting matrix is still invertible modulo d.
3. All entries of M are invertible modulo d.
We then show that $M_n$ defined in (6.3) satisfies these conditions when $d$ is prime and $d > n + 1$. The detailed proof is given in Appendix E. It is an interesting open question to determine necessary and sufficient conditions on the dimensions for maximally $I_3$-scrambling MIMO unitaries to exist [26].
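The three criteria can be checked mechanically for small cases. The following is a minimal sketch, assuming that $d$ is prime (so that every nonzero pivot is invertible modulo $d$) and assuming that the matrix in (6.3) is $I_n + E_n$, i.e., 2 on the diagonal and 1 elsewhere, which is the form consistent with the determinant argument in Appendix E. The function names `det_mod`, `M_n` and `satisfies_criteria` are hypothetical helpers introduced here; modular inverses use `pow(x, -1, d)`, which requires Python 3.8 or later.

```python
from math import gcd


def det_mod(M, d):
    """Determinant of an integer matrix modulo a prime d (Gaussian elimination over Z_d)."""
    M = [[x % d for x in row] for row in M]
    n = len(M)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign
        det = det * M[col][col] % d
        inv = pow(M[col][col], -1, d)  # modular inverse of the pivot
        for r in range(col + 1, n):
            factor = M[r][col] * inv % d
            M[r] = [(a - factor * b) % d for a, b in zip(M[r], M[col])]
    return det % d


def M_n(n):
    """Assumed form of (6.3): identity plus all-ones matrix."""
    return [[2 if r == c else 1 for c in range(n)] for r in range(n)]


def satisfies_criteria(M, d):
    """Check criteria 1-3 for an integer matrix M and a prime modulus d."""
    n = len(M)
    if det_mod(M, d) == 0:  # criterion 1
        return False
    for i in range(n):      # criterion 2: replace row i by the elementary row e_j
        for j in range(n):
            N = [row[:] for row in M]
            N[i] = [1 if k == j else 0 for k in range(n)]
            if det_mod(N, d) == 0:
                return False
    # criterion 3: every entry is invertible modulo d
    return all(gcd(x % d, d) == 1 for row in M for x in row)


print(satisfies_criteria(M_n(3), 5))  # d = 5 is prime and larger than n + 1 = 4
print(satisfies_criteria(M_n(4), 5))  # d = n + 1, so det M_n = n + 1 vanishes mod d
```

Under these assumptions, the first call reports that the criteria hold, while the second fails criterion 1 because $\det M_n = n + 1 \equiv 0 \pmod d$.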
where the infimum is over all unitaries U 0 with vanishing tripartite information.
Proof. Recall that $U_d$ is given by We prove the first statement. The Choi state $\rho_d = |U_d\rangle\langle U_d|$ of $U_d$ is given by where we write $|U_S\rangle$ for the Choi state of the maximally $I_3$-scrambling unitary $U_S$. On the other hand, Hence, for small $d_S$, the overlap between the two Choi states is given by Using the relationship between trace distance and overlap of pure states [9], We have thus established (2.12).

We now prove the second statement. Let $U_0$ be a minimally $I_3$-scrambling unitary. By Theorem 1, we can write where $A = A_L \otimes A_R$ and similarly for $B$, $C$, $D$. Without loss of generality, $|C_R| \geq |C|^{1/2}$; otherwise, switch the roles of $A$, $B$ in the following. We consider a state of the form where $\sigma_{A_S}$ is an arbitrary state on $A_S \subseteq A = A_L \otimes A_R$. We will show that $U_d$ and $U_0$ lead to reduced density matrices on $C$ with markedly different entropies, implying that $U_d$, $U_0$ are well-distinguishable. It is clear from (A.1) and the form of On the other hand, using (A.3) we can compute the second reduced state as and hence that Thus, using the Fannes-Audenaert inequality (2.14), from which it follows that Hence, we can bound the trace distance between the output states using monotonicity, which in turn bounds the diamond norm (2.9): This establishes (2.13).

B Nonrobustness in the approximately maximal case
In this appendix we prove Lemma 7, restated again for convenience.

Lemma 7.
Consider the unitaries U d from (2.10) and their Choi states ρ ABCD,d . Then ρ AD,d is maximally mixed, and

Proof. We start with the formula in (A.2) for the Choi state of U d , which can be written as

D Maximal scrambling and typical inputs
In Proposition 5 we found that the residual channel $N_{A \to C}$ for maximally mixed input on $B$ is completely depolarizing. In other words, its Choi state is maximally mixed,

We calculate the first term using the swap trick: where $F_{AC}$ denotes the swap operator that exchanges the two copies of $AC$. The second moment of a Haar-random state is given by $\mathbb{E}\left[\sigma_B^{\otimes 2}\right] = \frac{I + F_B}{|B|(|B|+1)}$, where $I$ is the identity and $F_B$ the swap operator on the two copies of $B$. Thus: The first term can be bounded as where the last equality follows since the Choi state of $U$ is maximally mixed on $AC$. For the second term, we compute In the first step, we have relabeled $B$ to $B'$ and inserted two copies of the maximally mixed state $\tau_B$; in the second, we have extended the maximally mixed states to maximally entangled states $\Phi^+_{BB'}$ and teleported the swap operator from the $B'$ systems to the $B$ systems; in the third step, we have recognized the Choi state of $U$ and undone the swap trick; and in the last we have used that $\rho_D$ is maximally mixed. Together, we obtain the following bound on the mean square deviation: By the Cauchy-Schwarz inequality, $\|X\|_1^2 \leq |A||C| \, \|X\|_2^2$, we get

Now Markov's inequality gives
and we obtain the desired bound:

The fact that we need $|A||C| \ll |D|$ is intuitive: For any realization of the random pure state $\sigma_B$, the state $|\rho_{ACD}\rangle = U_{A'B \to CD} \left( |\Phi^+_{AA'}\rangle \otimes |\sigma_B\rangle \right)$ is a purification of $\rho_{AC}$. Thus, if $\rho_{AC}$ is to be maximally mixed then we clearly need that $|A||C| \leq |D|$, since otherwise the Schmidt rank cannot be $|A||C|$.
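Two standard facts used in the argument above, the swap trick and the norm inequality $\|X\|_1^2 \leq \dim \cdot \|X\|_2^2$, can be checked numerically. The following is a minimal sketch on a randomly generated density matrix; the dimension `dim = 6` merely stands in for $|A||C|$, and the random state is an assumption made for illustration rather than the reduced Choi state from the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6  # stands in for |A||C|

# Random density matrix rho = G G^dagger / tr(G G^dagger).
G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# Swap operator on two copies: F |i>|j> = |j>|i>.
F = np.zeros((dim * dim, dim * dim))
for i in range(dim):
    for j in range(dim):
        F[j * dim + i, i * dim + j] = 1.0

# Swap trick: tr(rho^2) = tr(F (rho tensor rho)).
purity_direct = np.trace(rho @ rho).real
purity_swap = np.trace(F @ np.kron(rho, rho)).real

# Deviation from the maximally mixed state and its trace and Hilbert-Schmidt norms.
X = rho - np.eye(dim) / dim
sing = np.linalg.svd(X, compute_uv=False)
norm1 = sing.sum()
norm2 = np.sqrt((sing ** 2).sum())

print(abs(purity_direct - purity_swap) < 1e-10)          # swap trick holds
print(abs(norm2 ** 2 - (purity_direct - 1 / dim)) < 1e-10)  # ||X||_2^2 = tr(rho^2) - 1/dim
print(norm1 ** 2 <= dim * norm2 ** 2 + 1e-12)            # Cauchy-Schwarz bound
```

The second check makes explicit why a purity calculation controls the distance to the maximally mixed state: the squared Hilbert-Schmidt deviation equals the purity minus $1/\dim$, and the Cauchy-Schwarz bound then converts this into a trace-norm statement at the cost of a factor $\dim$.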
One natural scenario to apply Proposition 14 is to the toy model of black hole evaporation discussed on p. 14 (with D and C interchanged). If A is small (e.g., a qubit) and the initial black hole B is in a typical pure state, the Hawking radiation emitted at later times D is decoupled from A if D is much smaller than the post-evaporation black hole C [1]. The only assumption necessary about the dynamics is that the black hole be maximally I 3 -scrambling.
Another natural scenario to apply Proposition 14 is in the context of maximally $I_3$-scrambling MIMO unitaries as discussed in Section 6. Here, $|A_i||C_j|$ is usually much smaller than $|C_j^c|$. Hence, if we input a random pure state into $A_i^c$ and half of a maximally entangled state into $A_i$, then with high probability the reduced state on $A_i C_j$ is close to being maximally mixed. We can make an even stronger statement by demanding that $\rho_{A_i C_j}$ be sufficiently close to the maximally mixed state, which by [48] would imply that $\rho_{A_i C_j}$ is separable. By the Choi-Jamiołkowski isomorphism, this means that the residual channel $N^{\sigma_{A_i^c}}_{A_i \to C_j}$ is entanglement-breaking. Using Proposition 14, the probability of this is at least $1 - |A_i|^2 |C_j|^2 / |C_j^c|$. In the case where all systems are of size $d$, the failure probability vanishes for large $n$ or $d$.

E Existence of maximally scrambling MIMO unitaries
In this appendix we prove Proposition 13. As discussed in Section 6, we first consider the case where $M_n$ is replaced by an arbitrary $n \times n$ matrix $M$ and identify sufficient conditions for the corresponding unitary $U_M |\vec x\rangle = |M \vec x\rangle$ to be maximally $I_3$-scrambling.

We can calculate the determinant by cofactor expanding along the elementary row. If $i = j$ then we obtain that $\det N_n = \pm \det M_{n-1}$, which is nonzero by the preceding. Otherwise, if $i \neq j$, then we find that $\det N_n$ is up to sign equal to the determinant of the following $(n-1) \times (n-1)$ matrix, which looks like $M_{n-1}$ except that a 2 is replaced by a 1. We can use determinant-preserving row operations to reduce this matrix to