One-shot decoupling

If a quantum system A, which is initially correlated to another system, E, undergoes an evolution separated from E, then the correlation to E generally decreases. Here, we study the conditions under which the correlation disappears (almost) completely, resulting in a decoupling of A from E. We give a criterion for decoupling in terms of two smooth entropies, one quantifying the amount of initial correlation between A and E, and the other characterizing the mapping that describes the evolution of A. The criterion applies to arbitrary such mappings in the general one-shot setting. Furthermore, the criterion is tight for mappings that satisfy certain natural conditions. Decoupling has a number of applications both in physics and information theory, e.g., as a building block for quantum information processing protocols. As an example, we give a one-shot state merging protocol and show that it is essentially optimal in terms of its entanglement consumption/production.


Introduction
Correlations in quantum systems, and in particular entanglement, have been the focus of both theoretical and experimental research in quantum information science over the past decades. As a result, we now have a fairly good (although still incomplete) understanding of quantum correlations and, in particular, of the processes that create them. In this work we take, so to speak, the opposite approach and study conditions under which two systems can be decoupled, i.e., brought to a state where they are uncorrelated. We call a system, B, decoupled from another system, E, if the joint state of the two systems, $\rho_{BE}$, has product form $\rho_B \otimes \rho_E$. Operationally, this means that the outcome of any measurement on B is statistically independent of the outcome of any measurement on E. Or, in information-theoretic terms, the system E does not give any information on B (and can therefore safely be ignored when studying B).
Decoupling theorem. Our goal is to characterize the conditions under which the evolution of a system results in decoupling. For this, we consider a system, A, that may initially be correlated to E. Furthermore, we assume that the system A undergoes an evolution, described by a trace-preserving completely positive map (TPCPM) $\bar{\mathcal{T}}$ from A to B, during which no interaction with E takes place (see Fig. 1). The main result of this work is a decoupling theorem, i.e., a criterion that provides necessary and sufficient conditions for decoupling (of B from E). The criterion depends on two entropic quantities, characterizing the initial state, $\rho_{AE}$, and the mapping $\bar{\mathcal{T}}$, respectively.

Fig. 1. Decoupling. The initial system, A, may be correlated to a reference system E. The evolution is modeled as a mapping $\bar{\mathcal{T}}$ from A to B. The final state of B is supposed to be independent of E. The subdivision of $\bar{\mathcal{T}}$ into a unitary $\mathcal{U}$ and a mapping $\mathcal{T}$ is required for the formulation of our decoupling criterion.
The decoupling criterion can be conceptually split into two parts, called achievability and converse part, which we now describe informally. The full technical statements are provided as Theorems 3.1 and 4.1 in Sects. 3 and 4, respectively. For their formulation, it is convenient to view $\bar{\mathcal{T}}$ as a sequence, $\bar{\mathcal{T}} = \mathcal{T} \circ \mathcal{U}$, where $\mathcal{U}$ is an arbitrary unitary on A, and $\mathcal{T}$ a fixed TPCPM from A to B.

Achievability: decoupling up to an error ε is achieved for most choices of $\mathcal{U}$ if

$$H^{\varepsilon}_{\min}(A|E)_{\rho} + H^{\varepsilon}_{\min}(A|B)_{\tau} \gtrsim 0. \qquad (1)$$

Converse: decoupling up to an error ε is not achieved for any choice of $\mathcal{U}$ if

$$H^{\varepsilon}_{\min}(A|E)_{\rho} + H^{\varepsilon}_{\max}(A|B)_{\tau} \lesssim 0. \qquad (2)$$

The criteria refer to the ε-smooth conditional min- and max-entropy introduced in [RW04,Ren05], which can be seen as generalizations of the von Neumann entropy (cf. Sect. 2 for definitions and properties). The ε-smooth conditional min-entropy $H^{\varepsilon}_{\min}(A|E)_{\rho}$ is a measure for the correlation present in the initial state $\rho_{AE}$; the larger this measure, the less A depends on E (see Table 1 for some typical examples). The quantities $H^{\varepsilon}_{\min}(A|B)_{\tau}$ (for the achievability) and $H^{\varepsilon}_{\max}(A|B)_{\tau}$ (for the converse) measure how well the mapping $\mathcal{T}$ conserves correlations. Roughly, they quantify the uncertainty one has about a "copy" of the input, A, given access to the output, B, of $\mathcal{T}$ (cf. Table 2). We note that the expressions for the achievability and for the converse essentially coincide in many cases of interest (see the discussion in Sect. 4).
As a typical example for decoupling, consider m qubits, A, that are classically maximally correlated to E (so that $H^{\varepsilon}_{\min}(A|E)_{\rho} = 0$, cf. second row of Table 1). Furthermore, assume that A undergoes a reversible evolution, U, after which we discard m − m' qubits, corresponding to a partial trace, $\mathcal{T} = \mathrm{Tr}_{m-m'}$ (see last example of Table 2). Our criterion then says that the remaining m' qubits will, for most evolutions U, be decoupled from E whenever m' < m/2. Conversely, if this condition is not satisfied, some correlation will necessarily be retained.
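The threshold in this example can be checked directly against criterion (1). The following worked computation is a sketch for ε → 0. It writes m' for the number of qubits that are kept and splits $A = A_1 A_2$ into the m' kept and the m − m' discarded qubits (a labeling introduced here for illustration only), and it uses the additivity of the min-entropy on product states.

```latex
\begin{align*}
H_{\min}(A|E)_{\rho} &= 0
  && \text{(classically maximally correlated input)} \\
\tau_{AB} &= \Phi_{A_1 B} \otimes \frac{1_{A_2}}{2^{\,m-m'}}
  && \text{(Choi-Jamio\l{}kowski state of } \mathrm{Tr}_{m-m'}\text{)} \\
H_{\min}(A|B)_{\tau} &= H_{\min}(A_1|B)_{\Phi} + H_{\min}(A_2)
  = -m' + (m - m') = m - 2m' \\
H_{\min}(A|E)_{\rho} + H_{\min}(A|B)_{\tau} > 0
  &\iff m' < \tfrac{m}{2}
\end{align*}
```

Here $\Phi_{A_1 B}$ denotes a maximally entangled state of m' qubit pairs, which contributes −m', while the discarded qubits are fully mixed and contribute m − m'.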
We mention that it is possible to phrase our achievability criterion for decoupling (1) in another (but equivalent) way, as a statement about TPCPMs $\bar{\mathcal{T}}$ from A to B that satisfy a certain condition for every unitary on A (see Corollary 3.2). For more details about this formulation, see the discussion in Sect. 3.1.

Table 1. How the term $H^{\varepsilon}_{\min}(A|E)_{\rho}$ (for ε → 0) in the decoupling criterion depends on the initial state $\rho_{AE}$. In all three examples, A is assumed to be a k-qubit system with a fixed orthonormal basis, and the states on E that enter the examples form an orthonormal family.

Table 2. How the term $H^{\varepsilon}_{\min}(A|B)_{\tau}$ in the decoupling criterion depends on the mapping $\mathcal{T}$. In all five examples, the input space, A, is assumed to consist of m qubits with orthonormal basis $\{|i\rangle_A\}_{i=1}^{2^m}$. The last two examples have a smaller output space consisting of only m' qubits. The penultimate one can be seen as a combination of the first and the second, and the last one can be seen as a combination of the first and the third. (The smooth conditional min-entropies are evaluated for ε → 0.)
Applications. The notion of decoupling has various applications in information theory and in physics. Many of these applications have in common that decoupling of a system B from a system E is used to show that B is maximally entangled with a complementary system, R. Indeed, under the assumption that R is chosen such that the joint state, $\rho_{BER}$, is pure, $\rho_{BE} = \rho_B \otimes \rho_E$ immediately implies that there exists a subsystem R' of R such that the state $\rho_{BR'}$ is pure. If, in addition, $\rho_B$ is fully mixed, $\rho_{BR'}$ is necessarily maximally entangled.
In the context of information theory, this type of argument is, for example, used to analyze state merging [HOW05,HOW07], i.e., the task of conveying a subsystem from a sender to a receiver (who already holds a possibly correlated subsystem) using classical communication and entanglement. Another example, where decoupling is used in a similar fashion, is the quantum reverse Shannon theorem [BSST02,BDH+09,BCR11]. In fact, the proof of this theorem given in [BCR11] refers to a coherent form of state merging (also known as the fully quantum Slepian-Wolf or mother protocol [ADHW09]) where the classical communication is replaced by quantum communication. Decoupling can also be used for the characterization of correlation and entanglement between systems, erasure processes, as well as channel capacities (see, e.g., [GPW05,Bus09,HHWY08]). In addition, its classical analogue, privacy amplification [BBCM95,RK05], is widely used in classical and quantum cryptography.
Decoupling processes are also crucial in physics. For example, the evolution of a thermodynamical system towards thermal equilibrium can be understood as a decoupling process, where the system under consideration decouples from the observer (somewhat analogous to the considerations in [LPSW09,Par89a,Par89b]). Recent work indeed shows that there is a close relation between smooth entropies and quantities that are relevant in thermodynamics [DRRV09,dRAR+11,Hut11,FDOR12,Abe13,HO13]. Similarly, black hole radiation may be analyzed from such a point of view [HP07,BP07,PZ13]. Finally, one-shot decoupling techniques were also applied in solid state physics in order to show that 1D quantum states with exponential decay of correlations have an efficient classical approximate description as a matrix product state [BH13].
History and related work. While various standard results in quantum information theory have been proved using ideas related to decoupling, the concept came into its own with the discovery of state merging protocols [HOW05,HOW07] and, later, the fully quantum Slepian Wolf protocol [ADHW09]. These are based on specific decoupling processes where the mapping T is either a projective measurement or a partial trace. In this early work, the decoupling was analyzed in terms of the dimensions of certain subsystems (rather than smooth conditional entropies).
Based on the diploma thesis of one of us [Ber08], we have generalized these decoupling results to include mappings T that consist of combinations of projective measurements and partial traces. Furthermore, we expressed the decoupling criterion in terms of smooth conditional entropies. Subsequently, one of the authors derived in his doctoral thesis [Dup09] a general decoupling theorem that can be applied to any type of mapping. This result is essentially (up to the use of different entropy measures) equivalent to Theorem 3.1 presented here. We also note that the aforementioned characterizations of decoupling can be seen as special cases of this general result.
The above work was mostly concerned with achievability. Converse results were so far only known in special cases. In particular, we derived in [BRW07] and [Ber08] (see also [Ren09]) converse theorems for the case where the mapping T is a projective measurement. The converse theorem presented here, Theorem 4.1, generalizes these results.
We emphasize that the use of smooth conditional entropies is essential for applications of the decoupling technique in physics (see the discussion in Sect. 6).
Structure of the paper. In Sect. 2 we introduce the notation and review the definitions and main properties of the entropy measures used in this work. Our main achievability result for decoupling is given in Sect. 3, whereas Sect. 4 contains a converse that is tight in many cases of interest. The use of the decoupling technique is illustrated in Sect. 5, where we show how to obtain optimal one-shot quantum state merging. We conclude with a discussion in Sect. 6.

Preliminaries
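For Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ with fixed orthonormal bases $\{|i\rangle_A\}$ and $\{|i\rangle_B\}$ and $|A| = |B|$, the canonical identity mapping from $\mathcal{L}(\mathcal{H}_A)$ to $\mathcal{L}(\mathcal{H}_B)$ with respect to these bases is denoted by $\mathcal{I}_{A\to B}$, i.e., $\mathcal{I}_{A\to B}(|i\rangle\langle j|_A) = |i\rangle\langle j|_B$.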
For $\rho \in \mathcal{P}(\mathcal{H})$, $\|\rho\|_\infty$ denotes the operator norm of ρ, which is equal to the maximum eigenvalue of ρ. The trace norm of $\rho \in \mathcal{L}(\mathcal{H})$ is defined as $\|\rho\|_1 = \mathrm{Tr}\sqrt{\rho^\dagger \rho}$, and the induced metric on $\mathcal{S}_{\leq}(\mathcal{H})$ is called trace distance. The fidelity between $\rho, \sigma \in \mathcal{S}_{\leq}(\mathcal{H})$ is defined as $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$. We will make use of the Choi-Jamiołkowski isomorphism, which relates CPMs to positive operators, and which we denote by J.

Smooth entropies.
The smooth entropy formalism [Ren05,RW04] has been introduced in (classical and quantum) information theory to study general one-shot scenarios, in which nothing needs to be assumed about the structure of the relevant probability distributions or quantum states (e.g., those modeling noise processes in a communication channel). The formalism therefore overcomes a limitation of the established theory, where it is usually assumed that the relevant processes can be modeled as asymptotic sequences of independent and identically distributed (iid) subprocesses.
In this section we provide the definitions of the underlying entropy measures, called smooth min- and max-entropy, and state some of their basic properties. Further properties are summarized in Appendix A. For a more detailed discussion of the smooth entropy formalism we refer to [Tom12,Ren05,KRS09,TCR09,TCR10,Dat09].
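Recall the following standard definitions. The von Neumann entropy of $\rho \in \mathcal{S}_{=}(\mathcal{H})$ is defined as $H(\rho) = -\mathrm{Tr}(\rho \log \rho)$, and the conditional von Neumann entropy of A given B for $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ is defined as

$$H(A|B)_{\rho} = H(\rho_{AB}) - H(\rho_B).$$

Definition 2.2. Let $\rho_{AB} \in \mathcal{S}_{\leq}(\mathcal{H}_{AB})$. The conditional min-entropy of A given B is defined as

$$H_{\min}(A|B)_{\rho} = \sup_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} \sup\{\lambda \in \mathbb{R} : 2^{-\lambda} \cdot 1_A \otimes \sigma_B \geq \rho_{AB}\}.$$

The conditional max-entropy of A given B is defined as

$$H_{\max}(A|B)_{\rho} = \sup_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} \log F(\rho_{AB}, 1_A \otimes \sigma_B)^2.$$

The smooth conditional min- and max-entropy are defined by extremizing the non-smooth versions over a set of nearby states, where nearby is quantified by the purified distance.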
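Definition 2.3. Let $\rho, \sigma \in \mathcal{S}_{\leq}(\mathcal{H})$. The purified distance between ρ and σ is defined as

$$P(\rho, \sigma) = \sqrt{1 - \bar{F}(\rho, \sigma)^2},$$

where $\bar{F}(\rho, \sigma) = F(\rho, \sigma) + \sqrt{(1 - \mathrm{Tr}\,\rho)(1 - \mathrm{Tr}\,\sigma)}$ denotes the generalized fidelity.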
The purified distance is a metric on S (H) [TCR10, Lemma 5]. As its name indicates, P(ρ, σ ) corresponds to the minimum trace distance between purifications of ρ and σ . For more about the purified distance we refer to [TCR10].
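As a small numerical illustration of how the purified distance relates to the trace distance, the following sketch evaluates both for two commuting (diagonal) normalized states, for which the fidelity reduces to the classical overlap $\sum_i \sqrt{p_i q_i}$ and the generalized fidelity coincides with the fidelity. The function names, the example distributions, and the factor 1/2 in the trace distance are assumptions made for this illustration.

```python
import math

def fidelity(p, q):
    """Fidelity of two diagonal states, given by their eigenvalue lists."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def purified_distance(p, q):
    """P = sqrt(1 - F^2) for normalized states (no subnormalization term)."""
    f = fidelity(p, q)
    return math.sqrt(max(0.0, 1.0 - f * f))

def trace_distance(p, q):
    """Half the trace norm of the difference of two diagonal states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Two hypothetical qubit states that are diagonal in the same basis.
p = [0.5, 0.5]
q = [0.9, 0.1]

D = trace_distance(p, q)
P = purified_distance(p, q)
print(f"trace distance    D = {D:.4f}")
print(f"purified distance P = {P:.4f}")
```

For normalized states the two distances satisfy $D \leq P \leq \sqrt{2D}$, which is the kind of equivalence used whenever the text converts between purified distance and trace distance.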
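Henceforth $\rho, \sigma \in \mathcal{S}_{\leq}(\mathcal{H})$ are called ε-close if $P(\rho, \sigma) \leq \varepsilon$, and this is denoted by $\rho \approx_{\varepsilon} \sigma$. We use the purified distance to specify an ε-ball around $\rho \in \mathcal{S}_{\leq}(\mathcal{H})$,

$$\mathcal{B}^{\varepsilon}(\rho) = \{\rho' \in \mathcal{S}_{\leq}(\mathcal{H}) : \rho' \approx_{\varepsilon} \rho\}.$$

Definition 2.4. Let $\varepsilon \geq 0$ and $\rho_{AB} \in \mathcal{S}_{\leq}(\mathcal{H}_{AB})$. The ε-smooth conditional min-entropy of A given B is defined as

$$H^{\varepsilon}_{\min}(A|B)_{\rho} = \sup_{\hat{\rho}_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})} H_{\min}(A|B)_{\hat{\rho}}.$$

The ε-smooth conditional max-entropy of A given B is defined as

$$H^{\varepsilon}_{\max}(A|B)_{\rho} = \inf_{\hat{\rho}_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})} H_{\max}(A|B)_{\hat{\rho}}.$$

We mention that the optimization problems defining the smooth conditional min- and max-entropy can be formulated as semi-definite programs [Tom12, Sect. 5.2.1]. This allows them to be computed efficiently by numerical means. The smooth conditional min- and max-entropy are dual to each other in the following sense.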
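Lemma 2.5 [TCR10, Lemma 16]. Let $\varepsilon \geq 0$, $\rho_{AB} \in \mathcal{S}_{\leq}(\mathcal{H}_{AB})$ and let $\rho_{ABC} \in \mathcal{S}_{\leq}(\mathcal{H}_{ABC})$ be an arbitrary purification of $\rho_{AB}$. Then, we have that

$$H^{\varepsilon}_{\max}(A|B)_{\rho} = -H^{\varepsilon}_{\min}(A|C)_{\rho}.$$

Smooth entropies satisfy various natural properties analogous to those known for the von Neumann entropy. One of them is the invariance under local isometries.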
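Lemma 2.6 [TCR10, Lemma 13/15]. Let $\varepsilon \geq 0$, $\rho_{AB} \in \mathcal{S}_{\leq}(\mathcal{H}_{AB})$, and let $U_{A\to C}$ and $V_{B\to D}$ be isometries from A to C and B to D, respectively. Then, we have that

$$H^{\varepsilon}_{\min}(A|B)_{\rho} = H^{\varepsilon}_{\min}(C|D)_{(U \otimes V)\rho(U \otimes V)^{\dagger}}, \qquad H^{\varepsilon}_{\max}(A|B)_{\rho} = H^{\varepsilon}_{\max}(C|D)_{(U \otimes V)\rho(U \otimes V)^{\dagger}}.$$

Another important property is the data processing inequality.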
Smooth entropies are generalizations of the von Neumann entropy, in the sense that the von Neumann entropy can be retrieved as a special case via the quantum asymptotic equipartition property (AEP).
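Lemma 2.8 [Tom12, Corollary 6.6 and 6.7]. Let $0 < \varepsilon < 1$ and $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then, we have that

$$\lim_{n\to\infty} \frac{1}{n} H^{\varepsilon}_{\min}(A|B)_{\rho^{\otimes n}} = H(A|B)_{\rho} = \lim_{n\to\infty} \frac{1}{n} H^{\varepsilon}_{\max}(A|B)_{\rho^{\otimes n}}.$$

For more properties of smooth entropies we refer to Appendix A and [Tom12]. For technical reasons we will also need the following auxiliary quantities.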
Definition 2.9. Let ρ AB ∈ S (H AB ). The conditional collision entropy of A given B is defined as Definition 2.10. Let ρ AB ∈ S (H AB ) and σ B ∈ S (H B ). We define It can be shown that It can be shown that Finally, we note that, since all Hilbert spaces in this paper are assumed to have finite dimension, the infima and suprema in the expressions above can be replaced by minima and maxima, respectively.
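To give a feeling for the non-smooth conditional min-entropy, the following sketch evaluates it in the purely classical special case, where it reduces to the negative logarithm of the optimal guessing probability, $H_{\min}(X|Y) = -\log_2 \sum_y \max_x p(x, y)$ (cf. [KRS09]). The function name, the base-2 logarithm, and the two example distributions are assumptions made for this illustration; genuinely quantum states, such as a maximally entangled state, are outside the scope of this simple formula.

```python
import math

def classical_min_entropy(joint):
    """H_min(X|Y) in bits for a joint distribution joint[x][y] = p(x, y).

    For each value y, the best guess for X is the most likely x, so the
    guessing probability is the sum over y of max_x p(x, y).
    """
    n_y = len(joint[0])
    p_guess = sum(max(row[y] for row in joint) for y in range(n_y))
    return -math.log2(p_guess)

k = 3          # number of bits in the system playing the role of A
d = 2 ** k

# A classically maximally correlated with E: p(x, y) = 1/d if x == y.
correlated = [[1.0 / d if x == y else 0.0 for y in range(d)] for x in range(d)]

# A uniformly random and independent of E: p(x, y) = 1/d^2.
independent = [[1.0 / d ** 2 for _ in range(d)] for _ in range(d)]

h_correlated = classical_min_entropy(correlated)
h_independent = classical_min_entropy(independent)
print("maximally correlated:", h_correlated)
print("independent uniform: ", h_independent)
```

The first case reproduces the value 0 quoted in the introduction for classically maximally correlated systems, while the independent uniform case yields the maximal value k.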

Achievability
In this section, we present and prove a general decoupling theorem (Theorem 3.1), which corresponds to the achievability part of the criterion sketched informally in Sect. 1. The theorem subsumes and extends previous results in this direction.

Statement of the decoupling theorem.
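As explained in the introductory section (see Fig. 1), we consider a mapping from a system A to a system B. The mapping consists of a unitary on A, selected randomly according to the Haar measure over the unitary group on $\mathcal{H}_A$, followed by an arbitrary mapping $\mathcal{T} = \mathcal{T}_{A\to B}$. In applications, $\mathcal{T}$ often consists of a measurement or a partial trace (see Table 2 for examples). The decoupling theorem then tells us how well the output, B, of the mapping $\mathcal{T}$ is decoupled (on average over the choices of the unitary) from a reference system E.

Theorem 3.1 (Decoupling Theorem). Let $\varepsilon > 0$, $\rho_{AE} \in \mathcal{S}_{=}(\mathcal{H}_{AE})$, and let $\mathcal{T}_{A\to B}$ be a CPM with Choi-Jamiołkowski representation $\tau_{AB} = J(\mathcal{T})$ such that $\mathrm{Tr}(\tau_{AB}) \leq 1$. Then, we have that

$$\int_{\mathbb{U}(A)} \left\| \mathcal{T}(U_A \rho_{AE} U_A^{\dagger}) - \tau_B \otimes \rho_E \right\|_1 dU \leq 2^{-\frac{1}{2} H^{\varepsilon}_{\min}(A|E)_{\rho} - \frac{1}{2} H^{\varepsilon}_{\min}(A|B)_{\tau}} + 12\varepsilon, \qquad (23)$$

where $\int \cdot \, dU$ denotes the integral over the Haar measure over the full unitary group on $\mathcal{H}_A$.

Here, the total CPM is of the form $\bar{\mathcal{T}} = \mathcal{T} \circ \mathcal{U}$ with the unitary channel $\mathcal{U}(\cdot) = U_A (\cdot) U_A^{\dagger}$ and $U_A$ chosen at random. We note that, equivalently, we may think of $\bar{\mathcal{T}}$ as a channel that chooses at random a unitary $U_A$ and outputs the choice of $U_A$, together with the output of $\mathcal{T}$.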
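To make the statement concrete, the following sketch estimates the averaged trace norm numerically for the example from the introduction: m = 4 qubits that are classically maximally correlated with E, a Haar-random unitary, and a partial trace that keeps a single qubit (m' = 1). It compares the average with the value $2^{-(m-2m')/2}$, which is what a bound of the form $2^{-\frac{1}{2}H_{\min}(A|E)_\rho - \frac{1}{2}H_{\min}(A|B)_\tau}$ gives in this example for ε → 0. The function names, the Gram-Schmidt sampler, and all parameters are choices made for this illustration and are not taken from the paper.

```python
import math
import random

def haar_unitary_columns(d, rng):
    """Columns U|i> of a Haar-random d x d unitary.

    Obtained by Gram-Schmidt orthonormalization of independent complex
    Gaussian vectors (an assumption of this sketch, not the paper's method).
    """
    cols = []
    for _ in range(d):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
        for u in cols:
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        cols.append([vi / norm for vi in v])
    return cols

def decoupling_error(m, rng):
    """Trace norm of rho_BE - (1/2) x rho_E for one random unitary, m' = 1.

    For the classically correlated input, the joint state is block diagonal
    in E, so the trace norm is the average over i of the Bloch-vector length
    of the qubit state obtained by reducing U|i> to its first qubit.
    """
    d = 2 ** m
    half = d // 2
    total = 0.0
    for psi in haar_unitary_columns(d, rng):
        p0 = sum(abs(a) ** 2 for a in psi[:half])   # <0|sigma|0>
        p1 = sum(abs(a) ** 2 for a in psi[half:])   # <1|sigma|1>
        c = sum(a * b.conjugate() for a, b in zip(psi[:half], psi[half:]))
        total += math.sqrt((p0 - p1) ** 2 + 4 * abs(c) ** 2)
    return total / d

rng = random.Random(0)          # fixed seed for reproducibility
m = 4
errors = [decoupling_error(m, rng) for _ in range(20)]
mean_error = sum(errors) / len(errors)
bound = 2 ** (-(m - 2) / 2)     # H_min(A|E) = 0 and H_min(A|B) = m - 2
print(f"mean trace norm over sampled unitaries: {mean_error:.3f}")
print(f"value of the bound for eps -> 0:        {bound:.3f}")
```

Increasing m while keeping m' = 1 should make both numbers shrink, in line with the exponential dependence on the entropies.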
The decoupling theorem (Theorem 3.1) provides a bound on the quality of decoupling that only depends on two entropic quantities, $H^{\varepsilon}_{\min}(A|E)_{\rho}$ and $H^{\varepsilon}_{\min}(A|B)_{\tau}$. The first is a measure for the correlations between A and E that are present in the initial state, $\rho_{AE}$. The second quantifies properties of the mapping $\mathcal{T}$, which is characterized by the bipartite state $\tau_{AB}$ obtained via the Choi-Jamiołkowski isomorphism J. Hence, as far as minimizing the right-hand side of (23) is concerned, no channel ends up being better suited for some types of states than for others, or vice versa. Furthermore, as discussed in Sect. 4, the bound in (23) is essentially optimal in many cases of interest. We also note that, using Markov's inequality, the expectation value over the unitaries U can be turned into a bound that holds for most unitaries. That is, for any $\mu > 0$,

$$\left\| \mathcal{T}(U_A \rho_{AE} U_A^{\dagger}) - \tau_B \otimes \rho_E \right\|_1 \leq \frac{1}{\mu} \left( 2^{-\frac{1}{2} H^{\varepsilon}_{\min}(A|E)_{\rho} - \frac{1}{2} H^{\varepsilon}_{\min}(A|B)_{\tau}} + 12\varepsilon \right)$$

holds with probability at least $1 - \mu$ (for U chosen according to the Haar measure). Finally, as sketched in the introductory section, the decoupling theorem (Theorem 3.1) can also be phrased in another (but equivalent) way.
Proof. By the decoupling theorem (Theorem 3.1) for the map T A→B , there exists a unitary U A such that Since there exists by assumption a unitary Furthermore, again by assumption, there exists a unitary W B such that

and thus we get
Finally, we arrive at the claim by combining this with (27).
To see why this alternative formulation (Corollary 3.2) is equivalent to the decoupling theorem (Theorem 3.1) we may think of the total map in Theorem 3.1 as a channel that chooses at random a unitary U A and outputs the choice of U A , together with the output of T . By inspection, this total map then fulfills the assumption of Corollary 3.2.
Our first step in proving Theorem 3.1 is to prove a version involving non-smooth min-entropies (Theorem 3.3). Then, in a second step, we show that smoothing preserves the essence of the theorem. Note that Theorem 3.3 may be of interest in cases where no smoothing is required since it is slightly more general: it applies to any completely positive T , not only trace-non-increasing ones.

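Theorem 3.3 (Non-Smooth Decoupling Theorem). Let $\rho_{AE} \in \mathcal{S}_{\leq}(\mathcal{H}_{AE})$ and let $\mathcal{T}_{A\to B}$ be a CPM with Choi-Jamiołkowski representation $\tau_{AB} = J(\mathcal{T})$. Then, we have that

$$\int_{\mathbb{U}(A)} \left\| \mathcal{T}(U_A \rho_{AE} U_A^{\dagger}) - \tau_B \otimes \rho_E \right\|_1 dU \leq 2^{-\frac{1}{2} H_2(A|E)_{\rho} - \frac{1}{2} H_2(A|B)_{\tau}},$$

where $\int \cdot \, dU$ denotes the integral over the Haar measure over the full unitary group on $\mathcal{H}_A$.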

Technical ingredients to the proof.
The proof of the non-smooth decoupling theorem (Theorem 3.3) is based on a few technical lemmas, which we state and prove in the following, and which may be of independent interest. We note that they partly generalize techniques developed in the context of privacy amplification [RK05,Ren05,TRSS10] as well as earlier work on decoupling (see, e.g., [HOW07]).
The second lemma involves averaging over Haar distributed unitaries. While it would take us too far afield to formally introduce the Haar measure, it can simply be thought of as the uniform probability distribution over the set of all unitaries on a Hilbert space.
The following then tells us the expected value of The following bounds the ratio of the purity of a bipartite state and the purity of the reduced state on one subsystem. Lemma 3.6. Let ξ AB ∈ P(H AB ). Then, we have that Proof. Letting A be a system isomorphic to A, we first prove the left-hand side where the inequality is due to an application of Cauchy-Schwarz. The right-hand side follows from the fact that ξ AB |A| · 1 A ξ B . This can in turn be seen from the fact that we can write In the main proof, we will need to bound the trace distance between two states. The following lemma will allow us to do this.
In particular, if M is Hermitian then, we have that This is a slight generalization of [Ren05, Lemma 5.1.3]. For completeness we give a different proof here.
Proof. We calculate where the inequality results from an application of Cauchy-Schwarz, and the maximizations are over all unitaries on A. The last equality follows from

Proof of the non-smooth decoupling theorem (Theorem 3.3).
Throughout the proof, we will denote with a prime the "twin" subsystems used when we take tensor copies of operators, and $F_S$ denotes a swap between S and S'. We first use Lemma 3.7 to bound the trace norm. For We then rewrite the above as Using Jensen's inequality we obtain We now simplify the integral We rewrite the first term as follows where we have used the swap trick (Lemma 3.4) with $F_{BE} = F_B \otimes F_E$ in the first equality, the definition of the adjoint of a superoperator in the third equality and the linearity of the trace in the fourth equality. We now compute the integral using a lemma about Haar distributed unitaries (Lemma 3.5) where α and β satisfy the following equations and In the third equality, we have used the fact that $\tilde{\tau}_{AB}$ is a Choi-Jamiołkowski representation of $\tilde{\mathcal{T}}$ (Lemma 2.1), and the fourth equality is due to the fact that the adjoint of the partial trace is tensoring with the identity. Solving this system of equations yields By applying Lemma 3.6, we can simplify this to $\alpha \leq \mathrm{Tr}[\tilde{\tau}_B^2]$ and $\beta \leq \mathrm{Tr}[\tilde{\tau}_{AB}^2]$. Substituting this into (44) and using the swap trick twice (Lemma 3.4), and then substituting into (42) yields Finally we get the theorem by using the definitions of $\tilde{\tau}_{AB}$, $\tilde{\rho}_{AE}$ and the definition of the conditional collision entropy (Definition 2.9).

Proof of the main decoupling theorem (Theorem 3.1).
We now prove our main result, which is obtained from the non-smooth decoupling theorem (Theorem 3.3) by replacing the conditional collision entropies by smooth conditional min-entropies. First, note that the conditional collision entropy is always greater than or equal to the conditional min-entropy (Lemma A.1) and therefore we are allowed to replace the $H_2$ terms on the right-hand side of the statement of Theorem 3.3 by $H_{\min}$ terms. Thus we only have to consider the smoothing.
Let having orthogonal support as well as δ ± AE ∈ P(H AE ). By the equivalence of purified distance and trace distance (Lemma B.1) we have τ AB − τ AB 1 2ε and hence where we have used the triangle inequality for the trace distance in the second inequality. We now deal with the second term above We deal with the third term in a similar fashion This results in

Converse
The main purpose of this section is to state and prove a theorem (Theorem 4.1) which implies that the achievability result of the previous section (Theorem 3.1) is essentially optimal for many natural choices of the mapping T .
Statement of the converse theorem.
According to Theorem 3.1, decoupling is achieved whenever the term $H^{\varepsilon}_{\min}(A|E)_{\rho} + H^{\varepsilon}_{\min}(A|B)_{\tau}$ is sufficiently larger than 0. Our converse now says that this is also a necessary condition (up to additive terms of the order log(1/ε) and the scaling of the smoothing parameter) if one replaces the smooth conditional min-entropy in the second term, $H^{\varepsilon}_{\min}(A|B)_{\tau}$ (which characterizes the channel), by a smooth conditional max-entropy.
Then, we have for any ε , ε > 0 that where

purification of ρ A , and
Note that we could also write $\omega_{AB} = |A| \sqrt{\rho_A}\, J(\mathcal{T}) \sqrt{\rho_A}$. In our formulation of the converse theorem, the mapping $\mathcal{T}$ is not necessarily preceded by a unitary, and the state that appears in the entropy term of the TPCPM is given by the more general expression $\omega_{AB} = \mathcal{T}_{A'\to B}(\rho_{AA'})$ (rather than $\tau_{AB} = J(\mathcal{T})$ as in Theorem 3.1, corresponding to the case where $\rho_A$ is fully mixed). However, if we apply the converse to a TPCPM of the form $\bar{\mathcal{T}} = \mathcal{T} \circ \mathcal{U}$, where $\mathcal{U}$ corresponds to a random unitary channel applied to the input, Theorem 4.1 simplifies to the following.

Corollary 4.2. For the same premises as in Theorem 4.1, but applied to the TPCPM $\bar{\mathcal{T}}_{A\to B} = \mathcal{T}_{A\to B} \circ \mathcal{U}_A$, where $\mathcal{U}_A$ corresponds to a Haar random unitary channel applied to the input, we have that
where τ AB = J (T ).
Proof. By assumption we have and since the unitary U A is chosen at random, this is equivalent to where F A→AU denotes the TPCPM that chooses at random a unitary U A and outputs the choice of U A . Now, let σ AU E R be a purification of σ AU E = F A→AU (ρ AE ) and note that σ A = 1 A |A| as well as σ E = ρ E . We apply Theorem 4.1 to (59) with the map T A→B and the state σ AU E to get for δ = 2 √ 6ε + 2ε + 2 √ ε + ε . Since the state σ AU E R and the maximally entangled state | | A A are both purifications of 1 A |A| , there exists by Uhlmann's theorem [Uhl76] , and by the invariance of the smooth conditional max-entropy under local isometries (Lemma 2.6) we get Finally, we show that H δ min (A|U E) σ in (60) is upper bounded by H δ min (A|E) ρ . Since the register U in σ AU E is classical, we can copy U to another register U resulting in the state σ AUU E . With Lemma A.7 we then have But now there exists an isometry V AU →A that reverses the action of the TPCPM F such that V AU →A (σ AU E ) = ρ AE (we let V act on the copy U instead of U ). Using the data processing inequality for the smooth conditional min-entropy (Lemma 2.7) and the invariance of the smooth conditional min-entropy under local isometries (Lemma 2.6), we conclude It can also be verified that the two terms, H ε min (A|B) τ (from the achievability in Theorem 3.1) and H ε max (A|B) τ (from the converse in Corollary 4.2), coincide whenever the relevant states are essentially flat (i.e., proportional to projectors). This is the case for many channels used in applications (e.g., for state merging, cf. Sect. 5). Examples of such channels are given in Table 2. Furthermore, as we shall explain in the discussion section (Sect. 6), the two terms coincide asymptotically for iid channels.

Proof of the converse theorem (Theorem 4.1). Let ρ AE R be a purification of ρ AE , W A→B B a Stinespring dilation [Sti55] of T A→B and definẽ
We have by Uhlmann's theorem [Uhl76] that ω AB andσ B E R are related by an isometry V A→E R , and hence by the invariance of the smooth conditional max-entropy under local isometries (Lemma 2.6) that Such a state exists by Uhlmann's theorem [Uhl76], and can be shown to satisfy P(σ , σ ) √ 6ε + 2ε. The latter bound is obtained from combined with the equivalence of purified distance and trace distance (Lemma B.1). Now, we know from a technical lemma about the conditional max-entropy (Lemma B.2) that where This implies that for any ε > 0. Tracing out the R system, we get We now define Note that G is a contraction, i.e., G ∞ 1, where we have used the operator monotonicity of f (t) = −1/t. At this point, we conjugate both sides of (70) by G B E to get which implies We will now need to show that ψ B E B is (2 √ 6ε + 2ε + 2 √ ε + ε )-close toσ B E B , because the invariance of the smooth conditional min-entropy under local isometries (Lemma 2.6) then implies the claim To this end, we shall define the following vectors We first show that all these vectors define subnormalized states such that the purified distance between them is well-defined. Since G B E is a contraction, we immediately get that |ψ B E R B 1 and |ψ B E R B 1. Furthermore, we have that We have ψ |ψ = √ 1 − ε , and where the inequality is due to the operator monotonicity of the square-root function. Therefore, we have that P(ψ ,σ ) 2 √ ε and furthermore P(ψ ,σ ) = P(ψ ,σ ), since Since conjugation by G is trace-non-increasing, we also have P(ψ , ψ) P(σ,σ ) √ 6ε + 2ε. This implies P(ψ,σ ) P(ψ, ψ ) + P(ψ ,σ ) + P(σ , σ ) + P(σ,σ )

One-Shot State Merging
As an example application of the decoupling theorem and its converse we discuss one-shot quantum state merging. This is a two-party task: its goal is to transfer the information contained in a quantum system, A, initially held by one party, Alice, to the other party, Bob. This should be achieved with only limited resources (such as entanglement or communication). It is taken into account that Bob may have access to a quantum system, B, correlated to A, which may be used to minimize the use of resources. The term one-shot is used to emphasize that the task is considered in the general one-shot scenario. As explained in the discussion section, the asymptotic iid results, where many independent copies of a given state are transferred, can be recovered as a special case. The notion of quantum state merging has been introduced in [HOW05,HOW07] and a protocol has been proposed that achieves the task in the asymptotic iid scenario. The more general one-shot setup we consider here was first analyzed in [Ber08] and preliminary results appeared in [KRS09].
We start by giving a formal definition of quantum state merging [HOW05,HOW07,Ber08]. Let $\rho_{AB}$ be the joint initial state of Alice and Bob's systems. We can view this state as part of a larger pure state $\rho_{ABE}$ that includes a reference system E. In this picture state merging means that Alice can send the A-part of $\rho_{ABE}$ to Bob's side without altering the joint state. We consider the particular setting proposed in [HOW05] where classical communication from Alice to Bob is free, but no quantum communication is possible. Furthermore, Alice and Bob have access to a source of entanglement and their goal is to minimize the number of entangled bits consumed during the protocol (or maximize the number of entangled bits that can be generated).
where $\rho_{B'BE} = (\mathcal{I}_{A\to B'} \otimes \mathcal{I}_{BE})\rho_{ABE}$ for a purification $\rho_{ABE}$ of $\rho_{AB}$, and $\Phi^K_{A_0B_0}$, $\Phi^L_{A_1B_1}$ are maximally entangled states on $A_0B_0$, $A_1B_1$ of Schmidt-rank K and L, respectively. The number $\log K - \log L$ is called entanglement cost. We are interested in quantifying the minimal entanglement cost for quantum state merging of $\rho_{AB}$ with error ε. For this, we use the achievability and converse for decoupling (Theorems 3.1 and 4.1). These allow us to derive essentially tight (up to additive terms of the order log(1/ε) and the scaling of the smoothing parameter) bounds on the entanglement cost.
The basic idea underlying our analysis of quantum state merging is the observation that the desired situation after the protocol execution is necessarily such that Alice's system is decoupled from the reference. Furthermore, it follows from Uhlmann's theorem [Uhl76] that this decoupling is also sufficient.
Proof. Let ρ AB E be a purification of ρ AB . The intuition is as follows. In the first step of the protocol, Alice decouples her part from the reference (employing Theorem 3.1), where she chooses a rank-L projective measurement as the TPCPM, and she sends the measurement result to Bob. For all measurement outcomes the post-measurement state on Alice's side is then approximately given by  [Uhl76]); this is then the second step of the protocol.
More formally, choose K and L such that

$$\log K - \log L = H^{\varepsilon^2/13}_{\max}(A|B)_{\rho} + 4\log(1/\varepsilon) + 2\log 13,$$

which is the entanglement cost of the protocol. (Since we need $K, L \in \mathbb{N}$, we cannot in general choose $\log K - \log L$ exactly equal to the right-hand side. Rather, we need to choose $K, L \in \mathbb{N}$ such that $\log K - \log L$ is minimal but still greater than or equal to it.) Choose N fixed orthogonal subspaces of dimension L on $AA_0$, where for simplicity we assume that $K \cdot |A|$ is divisible by L. (In general one has to choose N − 1 fixed orthogonal subspaces of dimension L and one of dimension $L' = K \cdot |A| - (N-1) \cdot L < L$. The proof remains the same, although some coefficients change.) Denote the projectors on these subspaces, followed by a fixed unitary mapping them to $A_1$, by $P^x_{A_0A\to A_1}$ and define the isometry Denote by $U_{A_0A}$ a unitary selected randomly according to the Haar measure over the unitary group on $\mathcal{H}_{A_0A}$ and write Now the first step of the protocol is to apply this unitary followed by the isometry (87), and to send the $X_B$ system to Bob. In order to take into account that the channel is classical, we keep a copy $X_A$ at Alice's side. By the decoupling theorem (Theorem 3.1) we get for that
where A 0 A is a copy of A 0 A, and We can simplify this using the superadditivity of the smooth conditional min-entropy (Lemma A.2) and the duality between smooth conditional min-and max-entropy (Lemma 2.5) Furthermore, because τ A 0 A A 1 X A is classical on X A , we can use a lemma about the conditional min-entropy of classical-quantum states (Lemma A.5) and get H ε 2 /13 where But since P x A 0 A→A 1 is a rank L projector, we can use a dimension lower bound of the conditional min-entropy (Lemma A.3) to conclude that for all x This together with (86), (91) and (94) implies and hence F(σ A 1 X A E , .1). In the second step of the protocol, Bob decodes the system to the state ρ B B E A 1 B 1 . A suitable decoder can be shown to exist using Uhlmann's theorem [Uhl76]. There exists an isometry V B B 0 X B →B B B 1 X B such that for and with that Expressing this in the purified distance (with Lemma B.1) and discarding X A X B , we obtain a ε-error quantum state merging protocol for ρ AB E .
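The choice of the Schmidt ranks K and L at the start of this proof can be sketched as follows. The code takes a value for the smooth max-entropy $H^{\varepsilon^2/13}_{\max}(A|B)_\rho$ as given (computing it is a separate problem), assumes logarithms to base 2, and picks one simple pair of integers satisfying $\log K - \log L \geq H^{\varepsilon^2/13}_{\max}(A|B)_\rho + 4\log(1/\varepsilon) + 2\log 13$. The function names and the rule of fixing one of the two ranks to 1 are assumptions made for this illustration; this is a feasible choice, not necessarily the minimal one required in the proof.

```python
import math

def merging_cost_target(h_max, eps):
    """Right-hand side of the condition on log K - log L (in bits).

    h_max is a given value of the smooth max-entropy term; eps is the
    error parameter of the merging protocol.
    """
    return h_max + 4 * math.log2(1 / eps) + 2 * math.log2(13)

def choose_schmidt_ranks(h_max, eps):
    """Return integers (K, L) with log2(K) - log2(L) >= target.

    If the target is non-negative, entanglement is consumed (L = 1);
    otherwise entanglement can be generated (K = 1).
    """
    target = merging_cost_target(h_max, eps)
    if target >= 0:
        return math.ceil(2 ** target), 1
    return 1, max(1, math.floor(2 ** (-target)))

# Hypothetical entropy values, chosen only to exercise both branches.
for h_max in (2.0, -30.0):
    eps = 0.1
    K, L = choose_schmidt_ranks(h_max, eps)
    cost = math.log2(K) - math.log2(L)
    print(f"h_max={h_max:+.1f}: K={K}, L={L}, "
          f"cost={cost:.4f} >= target={merging_cost_target(h_max, eps):.4f}")
```

A negative max-entropy term, as arises when A and B are strongly entangled, leads to a negative cost, i.e., the protocol generates entanglement rather than consuming it.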
Proof. We start by noting that any ε-error quantum state merging protocol for $\rho_{AB}$ can be assumed to have the following form: applying local operations at Alice's side, then sending a classical register from Alice to Bob, and finally applying local operations at Bob's side. For a purified state $\rho_{ABE}$, the protocol produces a state ε-close to $\Phi^L_{A_1B_1}\otimes\rho_{BB'E}$. As can be seen from the definition, it is a necessary step for any quantum state merging protocol to decouple Alice's part from the reference. The idea of the proof is to use the converse for decoupling (Theorem 4.1). This then results in the desired converse for quantum state merging.
More precisely, a general ε-error quantum state merging protocol for $\rho_{ABE}$ has the following form. First, some TPCPM is applied to the input state $\Phi^K_{A_0B_0}\otimes\rho_{ABE}$. By the Stinespring dilation [Sti55] we can think of this TPCPM as an isometry [...] (105), where the $M^x_{A_0A\to A_1A_G}$ are partial isometries and $A_G$, $X_A$ are additional 'garbage' registers on Alice's side that will be discarded in the end. The isometry $W$ results in the state [...] with [...]. The next step of the protocol is then to send the classical register $X_B$ to Bob.

Now let us analyze what the state $\gamma_{A_1A_GX_AE}$ has to look like. By the definition of quantum state merging (Definition 5.1) the state at the end of the protocol has to be ε-close to $\Phi^L_{A_1B_1}\otimes\rho_{BB'E}$. This implies that Alice's part $A_1$ has to be decoupled from the reference. But because the state $\Phi^L_{A_1B_1}\otimes\rho_{BB'E}$ is pure, this also implies that all additional registers that we might have at the end of the protocol have to be decoupled as well. Thus we need [...] (108), and in trace distance (using Lemma B.1) this reads [...] (109). Using the converse for decoupling (Theorem 4.1) for the isometry $W_{A_0A\to A_1A_GX_BX_A}$ in (105) followed by the partial trace over $X_B$, we get that the decoupling condition (109) implies for any $\varepsilon', \varepsilon'' > 0$ that [...], where for [...]. As a next step we simplify this in order to bring the converse into the desired form.
Choosing $\varepsilon' = \varepsilon^2$ and $\varepsilon'' = \varepsilon$, using a dimension upper bound for the smooth conditional min-entropy (Lemma A.4), and the duality between smooth conditional min- and max-entropy (Lemma 2.5), we obtain [...]. By the decoupling criterion in purified distance (Eq. (108)), the state $\omega_{A_0AA_1A_GX_A}$ has to be ε-close to a state [...], where $q_x$ is some probability distribution and $\xi^x$ [...], and by a lemma about the conditional max-entropy of classical-quantum states (Lemma A.6) [...]. Using the duality between conditional min- and max-entropy (Lemma 2.5) and a polar decomposition of $\xi^x$ [...]. Hence, the converse becomes [...] (117).

Discussion
The main contribution of this work is a decoupling theorem, i.e., a sufficient (Theorem 3.1) and necessary (Theorem 4.1) criterion for decoupling in terms of smooth conditional entropies. These criteria can then be applied to obtain tight characterizations of various operational tasks. As outlined in Sect. 5 by means of state merging, such applications are often possible because of a duality between independence and maximum entanglement: given a pure state $\rho_{BER}$ such that $\rho_B$ is maximally mixed, the property that the subsystem $B$ is independent of $E$ and the property that $B$ is fully entangled with $R$ are equivalent.

A crucial property of our decoupling criterion is that it gives (nearly optimal) bounds in a one-shot scenario, where the decoupling map $T$ may only be applied once (or, by replacing $T$ by $T^{\otimes k}$, any finite number of times). For a typical example, consider $m$ qubits, $A$, and assume that $A$ undergoes a reversible evolution, $U$, after which we discard $m - m'$ qubits, corresponding to a partial trace, $T = \mathrm{Tr}_{m-m'}$ (see last example of Table 2). Our decoupling theorem (Theorem 3.1) then shows that decoupling up to an error ε is achieved for most choices of $U$ if [...] (118). In contrast to this, the original decoupling results [ADHW09], formulated in terms of smooth non-conditional entropies, only show that decoupling up to an error ε is achieved for most choices of $U$ if [...] (119). To see that this latter bound may be arbitrarily weaker than the bound (118) that uses smooth conditional entropies, consider the following completely classical state. Let $A$ and $E$ be perfectly correlated, and let the marginal distribution of $A$ (and $E$) have one value that is taken with probability $1/2$, and be uniform over the remaining $2^m - 1$ values. Then we have (for $\varepsilon_0$ close to zero) [...] (120).

The difference between these two bounds is conceptually relevant. An example illustrating this is the quantitative Landauer's principle derived recently in [FDOR12]. The result, which is based on the bound (118), shows that correlations between the inputs and outputs of an irreversible mapping are relevant for the thermodynamic work cost of implementations of the mapping. These correlations would not be accounted for if a bound of the form (119) were used for the derivation of the principle.

In contrast to the original results on decoupling that are based on specific decoupling processes (where the mapping $T$ is either a partial trace [ADHW09] or a projective measurement [HOW07]), our decoupling criterion is also applicable to general mappings $T$. This extension is, e.g., employed in [Hut11, Sect. 5] in order to discuss the postulate of equal a priori probability in quantum statistical mechanics.
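The classical example used above to compare the conditional and non-conditional bounds can be checked numerically. The sketch below makes several assumptions that are not taken from the text: it ignores smoothing (i.e., it sets $\varepsilon_0 = 0$), uses base-2 logarithms, takes the classical max-entropy to be the Rényi entropy of order 1/2, $H_{\max} = 2\log\sum_x\sqrt{p_x}$, and all function names are hypothetical.

```python
import math

def example_marginal(m):
    """One value with probability 1/2, uniform over the other 2**m - 1 values."""
    rest = 2 ** m - 1
    return [0.5] + [0.5 / rest] * rest

def h_min(p):
    """Unsmoothed min-entropy in bits: -log2 of the largest probability."""
    return -math.log2(max(p))

def h_max(p):
    """Unsmoothed max-entropy in bits, assumed here to be Renyi order 1/2."""
    return 2 * math.log2(sum(math.sqrt(x) for x in p))

def h_min_cond_perfectly_correlated(p):
    """H_min(A|E) when A = E: the guessing probability is sum_e max_a p(a, e),
    and for a perfectly correlated pair max_a p(a, e) = p(e)."""
    return -math.log2(sum(p))

m = 10
p = example_marginal(m)
# For perfectly correlated A and E, the joint AE has the same nonzero
# probabilities as the marginal, so H_min(AE) = h_min(p) and H_max(E) = h_max(p).
conditional = h_min_cond_perfectly_correlated(p)
non_conditional = h_min(p) - h_max(p)
print(conditional, non_conditional)
```

Under these assumptions the conditional quantity stays at zero for every $m$, while the non-conditional difference decreases roughly linearly in $m$, which is the gap between the two bounds that the example is meant to exhibit.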
Our generalizations of the decoupling technique are crucial for other applications in physics as well, e.g., for the analysis of thermodynamic systems [dRAR+11], for finding an efficient classical description of 1D quantum states with an exponential decay of correlations [BH13], or for the study of black hole radiation [HP07, BP07, PZ13].
Information-theoretic applications other than state merging (cf. Sect. 5) have been investigated in the doctoral thesis of one of the authors [Dup09]. One of these applications is channel coding. Here, Alice wants to use a noisy quantum channel $N_{A\to B}$ to send qubits to Bob with fidelity at least $1 - \varepsilon$. The idea is that decoding is possible whenever a purification of the qubits Alice is sending is decoupled from the channel environment. One can therefore get a coding theorem directly from Theorem 3.1 by setting $T$ to be the complementary channel of $N$ (i.e., consider a Stinespring dilation [Sti55] [...]).

Another application where decoupling can be employed as a building block for constructing protocols is the simulation of noisy quantum channels using perfect classical channels together with pre-shared entanglement. The fully quantum reverse Shannon theorem asserts that this is possible using only a classical communication rate equal to the capacity of the channel to be simulated [BSST02, BDH+09]. In [BCR11], a proof of this theorem using one-shot decoupling has been proposed.

Our one-shot decoupling results contrast with (and are strictly more general than) the iid scenario⁸ usually considered in information theory, where statements are proved asymptotically under the assumption that the underlying processes (such as channel uses) are repeated many times independently. We note that asymptotic iid statements can be easily retrieved from the general one-shot results using the quantum asymptotic equipartition property (AEP) for smooth entropies [Ren05, TCR09] (see Lemma 2.8). Consider decoupling with a map of the form $\bar{T} = T \circ U$ (with $U$ a random unitary channel). If the map $T$ as well as the initial state $\rho_{AE}$ consist of many identical copies, i.e., $T^{\otimes n}$ and $\rho_{AE}^{\otimes n}$, then the achievability bound of Theorem 3.1, i.e., the condition that is sufficient for decoupling, turns into the criterion [...] (121), where $H$ denotes the (conditional) von Neumann entropy. Analogously, the converse in Corollary 4.2 (i.e., the condition which is necessary for decoupling for maps of this form) turns into [...] (122). In other words, in the iid scenario, the achievability bound (121) and the converse bound (122), taken together, imply an exact characterization of decoupling.
We have the following dimension lower and upper bounds for the (smooth) conditional min-entropy.
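that is, $\lambda$ is minimal such that $\lambda\cdot\mathbb{1}_{AB}\otimes\sigma_C - \tilde{\rho}_{ABC} \geq 0$. By taking the partial trace over $B$ we get $\lambda\cdot|B|\cdot\mathbb{1}_A\otimes\sigma_C - \tilde{\rho}_{AC} \geq 0$. Furthermore, by the monotonicity of the purified distance [TCR10, Lemma 7] we have that $\tilde{\rho}_{AC} \in \mathcal{B}^{\varepsilon}(\rho_{AC})$ and hence [...], where $\mu \in \mathbb{R}$ is minimal such that $\mu\cdot\mathbb{1}_A\otimes\sigma_C - \tilde{\rho}_{AC} \geq 0$. Thus $\lambda\cdot|B| \geq \mu$ and therefore [...].

The following lemma is about the conditional min-entropy of quantum-classical states.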
Lemma A.5. Let $\rho_{ABX} \in \mathcal{S}_{=}(\mathcal{H}_{ABX})$ with $\rho_{ABX} = \sum_x p_x\cdot\rho^x_{AB}\otimes|x\rangle\langle x|_X$ and $\rho^x_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ for all $x$. Then, we have that [...] (130).

Proof. By the operational interpretation of the conditional min-entropy as the maximal achievable singlet fraction [KRS09, Theorem 2] we have [...], where the maximum is taken over all TPCPMs $F_{BX\to A'}$, $|\Phi\rangle_{AA'} = |A|^{-1/2}\sum_x |x\rangle_A\otimes|x\rangle_{A'}$, and $\mathcal{H}_{A'} \cong \mathcal{H}_A$. Writing out the conditional min-entropy terms on the right-hand side of (130) in the same manner we obtain [...]. The claim is therefore equivalent to [...]. Now, because the state $\rho_{ABX}$ is classical on $X$, the maximization on the left-hand side can without loss of generality be restricted to TPCPMs that first measure on $X$ in the basis $\{|x\rangle\}$ and then apply some TPCPM $F^x_{B\to A'}$ conditioned on the measurement outcome $x$. By the linearity of the square of the fidelity when one argument is pure, the claim then follows.
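To make the content of such a statement concrete, the following sketch checks, for fully classical distributions only, the standard identity $H_{\min}(A|BX)_\rho = -\log\sum_x p_x\,2^{-H_{\min}(A|B)_{\rho^x}}$. That this identity is the form of the lemma is an assumption made here for illustration, as are the restriction to classical $A$ and $B$, the base-2 logarithm, and all function names and example numbers.

```python
import math

def h_min_cond(joint):
    """Classical H_min(A|Y) in bits for joint = {(a, y): prob}:
    minus log2 of the optimal guessing probability sum_y max_a p(a, y)."""
    best = {}
    for (a, y), prob in joint.items():
        best[y] = max(best.get(y, 0.0), prob)
    return -math.log2(sum(best.values()))

def via_formula(p_x, cond):
    """-log2 sum_x p_x 2^(-H_min(A|B) of the x-th conditional distribution)."""
    return -math.log2(sum(p * 2 ** (-h_min_cond(cond[x])) for x, p in p_x.items()))

def direct(p_x, cond):
    """H_min(A|BX) computed on the full joint, treating (b, x) as side information."""
    joint = {(a, (b, x)): p * q
             for x, p in p_x.items()
             for (a, b), q in cond[x].items()}
    return h_min_cond(joint)

# Hypothetical example data: two values of x, binary a and b.
p_x = {0: 0.25, 1: 0.75}
cond = {
    0: {(0, 0): 0.5, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.3},
    1: {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.4, (1, 1): 0.1},
}
print(direct(p_x, cond), via_formula(p_x, cond))
```

In the classical case the agreement is immediate, because the optimal guess can depend on $x$; the proof above argues that the same restriction to strategies that first read $x$ is without loss of generality for classical-quantum states.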
The following lemma is about the conditional max-entropy of quantum-classical states.
Finally we obtain P(ρ ABC , ρ ABC ) P(ρ ABC ,ρ ABC ) + P(ρ ABC , ρ ABC ) + P(ρ ABC , ρ ABC ) ε + ε + ε + ε = ε + 2ε + ε , and thus together with (147) [...].

From the definition of the conditional max-entropy (Definition 2.10) and Uhlmann's theorem [Uhl76] it is clear that the optimal value of the primal problem is $2^{H_{\max}(A|B)_{\rho|\sigma}}$. One can also easily show that strong duality holds (i.e., that the optimal value of the dual problem is equal to that of the primal problem). One simply needs to show that there exists a $Z_{AB}$ such that $Z_{AB}\otimes\mathbb{1}_C > \rho_{ABC}$, which holds for $Z_{AB} = 2\cdot\mathbb{1}_{AB}$. Now, we need to show that the optimal $Z_{AB}$ for this problem has the form given in the lemma statement. First, note that by Uhlmann's theorem [Uhl76], there must exist an optimal $X_{ABC}$ which has rank 1, assuming we consider the system $C$ to be large enough.
Tracing out $C$ and using the fact that $F(\rho, \varphi)^2 = 2^{H_{\max}(A|B)_{\rho|\sigma}}$, we get [...]. If $\sigma_B$ has full rank, we get the expression for $Z_{AB}$ by conjugating both sides by $\sigma_B^{-1/2}$. Finally, the fact that $\mathrm{Tr}[Z_{AB}\sigma_B] = 2^{H_{\max}(A|B)_{\rho|\sigma}}$ can simply be computed from the expression for $Z$.

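Lemma B.3. Let $\rho_{AB} \in \mathcal{S}_{\leq}(\mathcal{H}_{AB})$ and $\sigma_A \in \mathcal{S}_{\leq}(\mathcal{H}_A)$. Then, there exists $T_A \in \mathcal{L}(\mathcal{H}_A)$ with $\sigma_{AB} = (T_A\otimes\mathbb{1}_B)\rho_{AB}(T_A^\dagger\otimes\mathbb{1}_B)$ an extension of $\sigma_A$ such that $P(\rho_{AB}, \sigma_{AB}) = P(\rho_A, \sigma_A)$.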
Proof. Define $X_A = \sigma_A^{1/2}\rho_A^{1/2}$ and polar decompose $X_A = V_A(X_A^\dagger X_A)^{1/2}$. Furthermore define $T_A = \sigma_A^{1/2}V_A\rho_A^{-1/2}$, where the inverse is a generalized inverse.⁹ We have [...], which shows that $\sigma_{AB} = (T_A\otimes\mathbb{1}_B)\rho_{AB}(T_A^\dagger\otimes\mathbb{1}_B)$ is an extension of $\sigma_A$. Thus it remains to prove that $P(\rho_{AB}, \sigma_{AB}) = P(\rho_A, \sigma_A)$.
For this we first assume that $\rho_{AB}$ is pure and normalized, i.e., $\rho_{AB} = |\rho\rangle\langle\rho|_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then, we have that [...]. If $\rho_{AB} = |\rho\rangle\langle\rho|_{AB}$ is not normalized we obtain analogously [...]. The statement for a general $\rho_{AB}$ (not necessarily pure) follows by the monotonicity of the purified distance [TCR10, Lemma 7] under partial trace.
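The construction in this proof can be checked numerically. The sketch below is an illustration under assumptions not taken from the text: it uses NumPy, restricts to normalized full-rank states (so that the generalized inverse is an ordinary inverse and the purified distance is $\sqrt{1 - F^2}$), draws random test states from a fixed seed, and all function names are hypothetical.

```python
import numpy as np

def psd_power(mat, power):
    """Power of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 1e-15, None)  # guard against tiny negative rounding errors
    return (vecs * vals ** power) @ vecs.conj().T

def random_state(dim, rng):
    """Random full-rank density matrix (normalized Wishart matrix)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_out_B(mat, dA, dB):
    """Partial trace over the second tensor factor B of an operator on A (x) B."""
    return np.einsum('ibjb->ij', mat.reshape(dA, dB, dA, dB))

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1 (sum of singular values)."""
    prod = psd_power(rho, 0.5) @ psd_power(sigma, 0.5)
    return np.linalg.svd(prod, compute_uv=False).sum()

def purified_distance(rho, sigma):
    """Purified distance for normalized states."""
    return np.sqrt(max(0.0, 1.0 - fidelity(rho, sigma) ** 2))

def extension(rho_AB, sigma_A, dA, dB):
    """sigma_AB = (T (x) 1) rho_AB (T (x) 1)^dagger with T = sigma^{1/2} V rho_A^{-1/2}."""
    rho_A = trace_out_B(rho_AB, dA, dB)
    X = psd_power(sigma_A, 0.5) @ psd_power(rho_A, 0.5)
    W, _, Vh = np.linalg.svd(X)
    V = W @ Vh  # unitary factor of the polar decomposition X = V (X^dagger X)^{1/2}
    T = psd_power(sigma_A, 0.5) @ V @ psd_power(rho_A, -0.5)
    T_full = np.kron(T, np.eye(dB))
    sigma_AB = T_full @ rho_AB @ T_full.conj().T
    return (sigma_AB + sigma_AB.conj().T) / 2  # remove rounding asymmetry

rng = np.random.default_rng(0)
dA, dB = 3, 2
rho_AB = random_state(dA * dB, rng)
sigma_A = random_state(dA, rng)
sigma_AB = extension(rho_AB, sigma_A, dA, dB)
rho_A = trace_out_B(rho_AB, dA, dB)
print(purified_distance(rho_AB, sigma_AB), purified_distance(rho_A, sigma_A))
```

The two printed distances should agree up to numerical precision, and the marginal of `sigma_AB` on $A$ should reproduce `sigma_A`; the SVD is used here only as a convenient way to obtain the unitary factor of the polar decomposition.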