Quantum Conditional Mutual Information and Approximate Markov Chains

A state on a tripartite quantum system A ⊗ B ⊗ C forms a Markov chain if it can be reconstructed from its marginal on A ⊗ B by a quantum operation from B to B ⊗ C. We show that the quantum conditional mutual information I(A : C|B) of an arbitrary state is an upper bound on its distance to the closest reconstructed state. It thus quantifies how well the Markov chain property is approximated.


Introduction
The conditional mutual information I(A : C|B) ρ = H(ρ AB ) + H(ρ BC ) − H(ρ B ) − H(ρ ABC ) of a state ρ ABC on a tripartite system A ⊗ B ⊗ C is meant to quantify the correlations between A and C from the point of view of B. Here H(ρ) = −tr(ρ log₂ ρ) is the von Neumann entropy. Apart from its central role in traditional information theory, the conditional mutual information has recently found applications in new areas of computer science and physics. Examples include communication and information complexity (see [10] and references therein), de Finetti type theorems [8,9] and also the study of quantum many-body systems [34]. The importance of the conditional mutual information for such applications is due to its various useful properties. In particular, it has an additivity property called the chain rule:

$$I(A_1 \ldots A_n : C|B) = I(A_1 : C|B) + I(A_2 : C|BA_1) + \cdots + I(A_n : C|BA_1 \ldots A_{n-1}).$$
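The chain rule can be checked numerically in the simplest setting. The following is a minimal sketch that assumes all systems are classical (so the von Neumann entropy reduces to the Shannon entropy) and uses a randomly generated joint distribution over four bits; the helper names `entropy`, `marginal` and `cmi` are illustrative and not taken from the paper.

```python
import itertools
import math
import random

def entropy(p):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginal(p, keep):
    """Marginal of a joint distribution on the coordinate positions in `keep`."""
    out = {}
    for outcome, v in p.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def cmi(p, a, c, b):
    """I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for coordinate lists a, c, b."""
    return (entropy(marginal(p, a + b)) + entropy(marginal(p, b + c))
            - entropy(marginal(p, b)) - entropy(marginal(p, a + b + c)))

# Random joint distribution; coordinates are ordered as (A1, A2, B, C).
random.seed(0)
outcomes = list(itertools.product(range(2), repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
p = {o: w / total for o, w in zip(outcomes, weights)}

A1, A2, B, C = [0], [1], [2], [3]
lhs = cmi(p, A1 + A2, C, B)                      # I(A1 A2 : C | B)
rhs = cmi(p, A1, C, B) + cmi(p, A2, C, B + A1)   # I(A1 : C | B) + I(A2 : C | B A1)
print(lhs, rhs)
```

The two printed values agree up to floating-point error, because the entropy terms of the two summands telescope.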
When the B system is classical, the conditional mutual information I(A : C|B) has a simple interpretation: it is the average over the values b taken by B of the (unconditional) mutual information evaluated for the conditional state on the system A ⊗ C. This is crucial for applications because the (unconditional) mutual information can be related to operational quantities such as the distance to product states, for instance using Pinsker's inequality. However, when B is quantum, the conditional mutual information is significantly more complicated and much less is known about it. In fact, even the statement I(A : C|B) ≥ 0, also known as strong subadditivity of the von Neumann entropy, is a highly non-trivial theorem [37]. The structure of states that satisfy I(A : C|B) ρ = 0 was also studied [42,26]. It has been found that a zero conditional mutual information characterises states ρ ABC whose C system can be reconstructed just by acting on B, i.e., there exists a quantum operation T B→BC from the B to the B ⊗ C system such that

$$\rho_{ABC} = \mathcal{T}_{B \to BC}(\rho_{AB}). \tag{1}$$

States ρ ABC that satisfy this condition are called (quantum) Markov chains. When B is classical the condition (1) simply means that, for all values b taken by B, the conditional state on A ⊗ C is a product state. We say that A and C are independent given B.
A natural question that is very relevant for applications is to characterise states for which the conditional mutual information is approximately zero, i.e., for which it is guaranteed that I(A : C|B) ≤ ε for some ε > 0. In applications involving n systems A 1 , …, A n , such a guarantee is often obtained from an upper bound on the total conditional mutual information I(A 1 … A n : C|B) ≤ c (which can even be the trivial bound 2 log₂ dim C). The chain rule mentioned above then implies that, on average over i, we have I(A i : C|BA 1 … A i−1 ) ≤ c/n. The authors of [28] gave evidence for the difficulty of characterising such states in the quantum setting by finding states for which the conditional mutual information is small whereas their distance to any Markov chain is large (see also [15] for more extreme examples). Recent works [55,33,56] made the important observation that instead of considering the distance to a (perfect) Markov chain, another possibly more appropriate measure would be the accuracy with which Eq. 1 is satisfied. In fact, it was conjectured in [33] that the conditional mutual information is lower bounded by the trace distance between the two sides of Eq. 1 for a specific form of the map T B→BC , sometimes known as the Petz map (cf. Eq. 15 below). Later, in the context of studying Rényi generalisations of the conditional mutual information, the authors of [5] refined this conjecture by replacing the trace distance with the negative logarithm of the fidelity (see also [47]). Here, we prove a variant of this last conjecture where the map T B→BC does not necessarily have the form of a Petz map.
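Main result. We prove that for any state ρ ABC on A ⊗ B ⊗ C there exists a quantum operation T B→BC from the B system to the B ⊗ C system such that the reconstructed state

$$\sigma_{ABC} = \mathcal{T}_{B \to BC}(\rho_{AB}) \tag{2}$$

has a fidelity with ρ ABC of at least

$$F(\rho_{ABC}, \sigma_{ABC}) \geq 2^{-\frac{1}{2} I(A:C|B)_\rho}. \tag{3}$$

We refer to Theorem 5.1 for a more precise statement.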

Reformulations and implications.
A first immediate implication of our inequality is the strong subadditivity of the von Neumann entropy, I(A : C|B) ρ ≥ 0 [37]. The latter may be rewritten in terms of the conditional von Neumann entropy, H(A|B) ρ = H(ρ AB ) − H(ρ B ), as

$$H(A|B)_\rho \geq H(A|BC)_\rho \tag{4}$$

and is also known as the data processing inequality. Furthermore, (3) implies that if (4) holds with equality for some state ρ ABC then it satisfies the Markov chain condition (1), reproducing the result from [42,26]. The work presented here may thus be viewed as a robust extension of this result: if (4) holds with approximate equality then the Markov chain condition is fulfilled approximately.
Our result may also be rewritten as

$$\inf_{\sigma_{ABC}} D_{1/2}(\rho_{ABC} \| \sigma_{ABC}) \leq I(A:C|B)_\rho, \tag{5}$$

where the infimum ranges over all recovered states, i.e., states of the form (2), and where D 1/2 (ρ‖σ) = −2 log₂ F(ρ, σ) is the Rényi divergence of order α = 1/2 [38,54]. We remark that the quantity on the left hand side is equal to the surprisal of the fidelity of recovery, which has been introduced and studied in detail in [47]. Finally, we note that (3) also implies an upper bound on the trace distance, which we denote by ∆(·, ·), between ρ ABC and the recovered state σ ABC ,

$$\Delta(\rho_{ABC}, \sigma_{ABC}) \leq \sqrt{\ln(2)\, I(A:C|B)_\rho}. \tag{6}$$

The bound is readily verified using ∆(·, ·)² ≤ 1 − F(·, ·)² (cf. Lemma B.1) and 1 − 2^{−x} ≤ ln(2) x.
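For convenience, the two facts quoted above can be chained explicitly. The following short derivation is a sketch that reads the main inequality (3) as a lower bound of 2^{−I(A:C|B)/2} on the fidelity, and writes I for I(A : C|B) ρ :

```latex
\begin{align*}
\Delta(\rho_{ABC},\sigma_{ABC})^2
  &\le 1 - F(\rho_{ABC},\sigma_{ABC})^2
     && \text{(Lemma B.1)} \\
  &\le 1 - 2^{-I}
     && \text{(main inequality, squared)} \\
  &= 1 - e^{-\ln(2)\, I}
   \le \ln(2)\, I
     && \text{(since } e^{-y} \ge 1 - y \text{ for all real } y\text{)} .
\end{align*}
```

Taking the square root of both ends gives the stated bound on the trace distance.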
Tightness. One may ask whether, conversely to our main result, the conditional mutual information of a state ρ ABC also gives a lower bound on its distance to any reconstructed state σ ABC of the form (2).
To answer this question, we note that, as a consequence of the data processing inequality, we have The entropy difference on the right hand side can be bounded by the Alicki-Fannes inequality [1] in terms of the trace distance between the two states, yielding2 This can be seen as a converse to (6).To simplify the comparison, we may use which gives Note that a term proportional to the logarithm of the dimension of A is necessary in general as the trace distance is always upper bounded by 1, whereas the conditional mutual information may be as large as 2 log 2 dim A.
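The classical case. Inequality (3) is easily obtained in the case where B is classical, i.e., when ρ ABC is a qcq-state,

$$\rho_{ABC} = \sum_b P_B(b)\, |b\rangle\langle b|_B \otimes \rho_{AC,b} \tag{11}$$

for some probability distribution P B , an orthonormal basis {|b⟩} b of B, and a family of states {ρ AC,b } b on A ⊗ C. Let T B→BC be any map such that

$$\mathcal{T}_{B \to BC}(|b\rangle\langle b|_B) = |b\rangle\langle b|_B \otimes \rho_{C,b} \quad (\forall b), \tag{12}$$

where ρ C,b = tr A (ρ AC,b ). Then the reconstructed state σ ABC = T B→BC (ρ AB ) is the qcq-state

$$\sigma_{ABC} = \sum_b P_B(b)\, |b\rangle\langle b|_B \otimes \rho_{A,b} \otimes \rho_{C,b}, \tag{13}$$

where ρ A,b = tr C (ρ AC,b ). We remark that σ ABC is a Markov chain. Furthermore, a straightforward calculation shows that the relative entropy D(ρ ABC ‖σ ABC ) between ρ ABC and σ ABC is given by

$$D(\rho_{ABC} \| \sigma_{ABC}) = I(A:C|B)_\rho. \tag{14}$$

Inequality (3) then follows from Lemma B.2.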
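The classical case lends itself to a direct numerical check. The sketch below makes the further simplifying assumption that A and C are classical as well, so that the state is a probability distribution P(a, b, c), the reconstruction keeps (a, b) and redraws c from P(c|b), and the fidelity becomes the Bhattacharyya coefficient. The distribution is random and all variable names are illustrative.

```python
import itertools
import math
import random

# Random joint distribution P(a, b, c) with |A| = 2, |B| = 3, |C| = 2.
random.seed(1)
outcomes = list(itertools.product(range(2), range(3), range(2)))
weights = [random.random() for _ in outcomes]
total = sum(weights)
P = {o: w / total for o, w in zip(outcomes, weights)}

def marg(p, keep):
    """Marginal of p on the coordinate positions listed in `keep`."""
    out = {}
    for o, v in p.items():
        k = tuple(o[i] for i in keep)
        out[k] = out.get(k, 0.0) + v
    return out

def H(p):
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

P_ab, P_bc, P_b = marg(P, (0, 1)), marg(P, (1, 2)), marg(P, (1,))

# Reconstructed distribution: Q(a, b, c) = P(a, b) * P(c | b).
Q = {(a, b, c): P_ab[(a, b)] * P_bc[(b, c)] / P_b[(b,)] for (a, b, c) in P}

cmi = H(P_ab) + H(P_bc) - H(P_b) - H(P)                     # I(A:C|B)
rel_ent = sum(v * math.log2(v / Q[o]) for o, v in P.items() if v > 0)
fidelity = sum(math.sqrt(P[o] * Q[o]) for o in P)           # classical fidelity
trace_dist = 0.5 * sum(abs(P[o] - Q[o]) for o in P)

print(cmi, rel_ent, fidelity, 2 ** (-cmi / 2), trace_dist)
```

In this setting the relative entropy coincides with the conditional mutual information, the fidelity is bounded below by 2^{−I(A:C|B)/2}, and the trace distance is bounded above by the square root of ln(2) I(A:C|B).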

Related results. While the conditional mutual information is well understood in the classical case and has various interesting properties (see, e.g., [46]), these properties do not necessarily hold for quantum states. For example, identity (14) cannot be generalised directly to the case where B is non-classical (see [55] for a discussion). Furthermore, it has been discovered that there exist states ρ ABC that have a large distance to the closest Markov chain, while the conditional mutual information is small [28,15,21].
We remark that this is not in contradiction to (3) as the reconstructed state σ ABC , defined by (2), is not necessarily a Markov chain. (Note that this is a major difference to the classical case sketched above.) As mentioned above, the special case of (3) where I(A : C|B) = 0 has been studied in earlier work [42,26]. There, it has also been shown that the relevant reconstruction map T B→BC is of the form

$$\mathcal{T}_{B \to BC} : X_B \mapsto \rho_{BC}^{1/2} \left( \rho_B^{-1/2} X_B\, \rho_B^{-1/2} \otimes \mathrm{id}_C \right) \rho_{BC}^{1/2}. \tag{15}$$

However, it remained unclear whether this particular map also works in the case where I(A : C|B) is strictly larger than zero, even though several conjectures in this direction were proposed and studied [55,33,56,5]. We refer to [36] for a detailed account of the evolution of these conjectures. We note that our result provides some information about the structure of the map for which (3) holds (cf. Theorem 5.1), but leaves open the question whether it is of this particular form.
There is a large body of literature underlining the fundamental role that the conditional mutual information plays in quantum information theory. Notably, it has been shown to characterise the communication rate for the task of quantum state redistribution in the asymptotic limit of many independent copies of a resource state [19]. Furthermore, the quantum conditional mutual information is the basis for an important measure of entanglement, known as squashed entanglement [16]. The properties of this entanglement measure thus hinge on the properties of I(A : C|B). In this context, lower bounds on I(A : C|B) in terms of the distance of the marginal ρ AC from the set of separable states have been proved in [7] and later improved in [35]. We also note that another lower bound on the conditional mutual information in terms of a distance between certain operators derived from ρ ABC has recently been stated in [57]. This bound is based on a novel monotonicity bound for the relative entropy [11]. Our work may be used to obtain strengthened versions of some of these results. We are going to illustrate this for the case of squashed entanglement.
Applications. For us, one motivation to study how well the conditional mutual information characterises approximate Markov chains is in the context of device-independent quantum key distribution [2]. Another implication, proposed in [55,36], is a novel lower bound on the squashed entanglement of any bipartite state. The bound depends only on the trace distance to the closest k-extendible state, and also implies a strong lower bound in terms of the trace distance to the closest separable state (cf. Appendix D for details).
It would be interesting to investigate whether inequality (3) can lead to better quantum de Finetti theorems. In fact, the authors of [8,9] recently gave beautiful proofs of various de Finetti theorems using the conditional mutual information. For the quantum version, they apply an informationally complete measurement to reduce the problem to the classical case, but this comes at the cost of a factor that is exponential in the number of systems. We also believe that inequality (3) will be helpful in proving communication complexity lower bounds via the quantum information complexity [30,29,32,50].
Structure of the proof. The proof of inequality (3) is based on two main ideas, which we discuss in separate sections. The first is the use of one-shot entropy measures [43] to bound the von Neumann relative entropy (Section 2). The second is an extension of the method of de Finetti reductions [44,13,14,45] (Section 3). We use the latter to derive a general tool for evaluating the fidelity of permutation-invariant states (Section 4). The proof of (3) then proceeds in two main steps in which these techniques are applied successively (Section 5).

Typicality bounds on the relative entropy
In this section we are going to derive bounds on the relative entropy that will be used in the proof of Theorem 5.1. The method we use to obtain these bounds is inspired by a recent approach [3] to prove strong subadditivity of the von Neumann entropy (see Eq. 4). The idea there was to first prove strong subadditivity for one-shot entropies [43] and then use typicality or, more precisely, the Asymptotic Equipartition Property [49] to obtain the desired statement for the von Neumann entropy. Here we proceed analogously: we use one-shot versions of the relative entropy (defined in Appendix A) to obtain bounds on the von Neumann relative entropy.
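The (von Neumann) relative entropy D(ρ‖σ) for two non-negative operators ρ and σ is defined as

$$D(\rho \| \sigma) = \mathrm{tr}\big(\rho (\log_2 \rho - \log_2 \sigma)\big), \tag{16}$$

where we set D(ρ‖σ) = ∞ if the support of ρ is not contained in the support of σ. Our statements also refer to the trace distance. While this distance is often defined for density operators only, we define it here more generally for any non-negative operators ρ and σ by

$$\Delta(\rho, \sigma) = \tfrac{1}{2} \| \rho - \sigma \|_1 + \tfrac{1}{2} \big| \mathrm{tr}(\rho - \sigma) \big| \tag{17}$$

(see Section 3.2 of [48]). Note that the second term is zero if ρ and σ are both density operators. We also remark that the trace distance may be rewritten as

$$\Delta(\rho, \sigma) = \tfrac{1}{2} \mathrm{tr}(Y^+) + \tfrac{1}{2} \mathrm{tr}(Y^-) + \tfrac{1}{2} \big| \mathrm{tr}(Y^+) - \mathrm{tr}(Y^-) \big|, \tag{18}$$

where Y + and Y − are the positive and negative parts of ρ − σ, i.e., ρ − σ = Y + − Y − with Y + ≥ 0, Y − ≥ 0, and tr(Y + Y − ) = 0. It follows that we can write ∆ as

$$\Delta(\rho, \sigma) = \max\big[ \mathrm{tr}(Y^+), \mathrm{tr}(Y^-) \big]. \tag{19}$$

One can easily see from this expression that for any trace non-increasing completely positive map W we have

$$\Delta\big(\mathcal{W}(\rho), \mathcal{W}(\sigma)\big) \leq \Delta(\rho, \sigma). \tag{20}$$

Our first lemma provides an upper bound on the relative entropy in terms of sequences of operators that satisfy an operator inequality.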
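As an aside before the lemma, the generalised trace distance can be illustrated on a toy example. The sketch below assumes that ρ and σ commute, so both are represented by lists of eigenvalues in a common eigenbasis, and it takes the trace distance to be half the trace norm of ρ − σ plus half the absolute trace difference. The function names and the entrywise damping map are illustrative choices, not taken from the paper.

```python
def trace_distance(p, q):
    """Generalised trace distance for commuting non-negative operators,
    given as equal-length lists of eigenvalues in a common eigenbasis."""
    l1 = sum(abs(x - y) for x, y in zip(p, q))
    return 0.5 * l1 + 0.5 * abs(sum(p) - sum(q))

def pos_neg_traces(p, q):
    """Traces of the positive part Y+ and the negative part Y- of p - q."""
    y_plus = sum(max(x - y, 0.0) for x, y in zip(p, q))
    y_minus = sum(max(y - x, 0.0) for x, y in zip(p, q))
    return y_plus, y_minus

rho = [0.5, 0.3, 0.2]      # density operator (trace 1)
sigma = [0.1, 0.3, 0.1]    # sub-normalised operator (trace 0.5)

delta = trace_distance(rho, sigma)
y_plus, y_minus = pos_neg_traces(rho, sigma)

# A trace non-increasing completely positive map on diagonal operators:
# X -> K X K^dagger with K = diag(sqrt(d_i)) and 0 <= d_i <= 1.
damping = [1.0, 0.5, 0.0]
w_rho = [d * x for d, x in zip(damping, rho)]
w_sigma = [d * x for d, x in zip(damping, sigma)]

print(delta, max(y_plus, y_minus), trace_distance(w_rho, w_sigma))
```

The first two printed numbers coincide, reflecting the max-form of the trace distance, and the third is no larger, reflecting monotonicity under the damping map.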
Lemma 2.1.Let ρ be a density operator, let σ be a non-negative operator, and let { ρn } n∈N be a sequence of non-negative operators such that for some s ∈ R ρn ≤ 2 sn σ ⊗n (∀n ∈ N) and Then D(ρ σ) ≤ s.
Proof.By assumption, there exist c < 1 and n 0 ∈ N such that holds for all n ≥ n 0 .Let ǫ ∈ (c, 1).By Lemma A.5 we have where The claim then follows from the Asymptotic Equipartition Property of D ǫ H (• •) (Lemma A.8).The following lemma is in some sense a converse of Lemma 2.1.Lemma 2.2.Let ρ be a density operator, let σ be non-negative operator, and let s > D(ρ σ).Then there exists κ > 0 and a sequence of non-negative operators { ρn } n∈N with tr(ρ n ) ≤ 1 such that ρn ≤ 2 sn σ ⊗n (∀n ∈ N) and lim n→∞ Proof.The proof uses the smooth relative max-entropy D ǫ max (• •) defined in Appendix A. The Asymptotic Equipartition Property for this entropy measure (Lemma A.7) asserts that there exists n 0 ∈ N such that for any n ≥ n 0 D ǫn max (ρ ⊗n σ ⊗n ) < ns (26) for ǫ n > 0 chosen such that where c is independent of n.Inserting this into the definition of D ǫ max (• •) we find that there exists a non-negative operator ρn with tr(ρ n ) ≤ 1 such that ρn ≤ 2 sn σ ⊗n (28) and Eq. ( 27) may be rewritten as Inserting this in (29) and using Lemma B.1, we conclude that This proves (25) for any κ < κ ′ /2.(Note that for n < n 0 we may simply set ρn = 0 so that the left hand side of (25) holds for all n ∈ N.) The next lemma asserts that the relative entropy, evaluated for n-fold product states, has the following stability property: if one acts with the same trace non-increasing map on the two arguments then the relative entropy cannot substantially increase.This property is used in the proof of Theorem 5.1 (but see also Remark 2.4).
Lemma 2.3.Let ρ be a density operator, let σ be a non-negative operator on the same space, and let {W n } n∈N be a sequence of trace non-increasing completely positive maps on the n-fold tensor product of this space.If tr(W n (ρ ⊗n )) decreases less than exponentially in n, i.e., lim inf n→∞ e ξn tr W n (ρ ⊗n ) > 0 (32) for any ξ > 0, then Proof.Let δ > 0. Lemma 2.2 tells us that there exists κ > 0 and a sequence of non-negative operators and To abbreviate notation, we define r n = 1/tr(W n (ρ ⊗n )).Note that, by assumption, r n grows less than exponentially in n, so that holds for n sufficiently large.Let now k, n ∈ N and set m = kn.Applying W n and multiplying with the factor r n on the two sides of (34) yields As W n is trace non-increasing, where the first inequality follows from the monotonicity property of the trace distance ( 20), the second inequality follows from (36), and where the final equality follows from (35).We can now apply Lemma 2.1 to the density operator r n W n (ρ ⊗n ) and the non-negative operator r n W n (σ ⊗n ), which gives Noting that multiplying both arguments of the relative entropy with the same factor leaves it unchanged we conclude Taking the limit n → ∞ and noting that δ > 0 was arbitrary, the claim of the lemma follows.
Remark 2.4.Lemma 2.3 will be used in one of the steps of the proof of Theorem 5.1.We note that, alternatively, this step may also be based on the inequality (cf.Lemma 25 of [4]) which holds for any density operator ρ, any non-negative operator σ, any trace non-increasing completely positive map W, and ǫ = 1 − tr W(ρ) (see also Footnote 7).However, Lemma 2.3 provides a stronger stability condition for the relative entropy of product states (notably when ǫ ≫ 0) and may therefore be useful for generalisations of our results.
As a corollary of Lemma 2.3 we also obtain the well known Asymptotic Equipartition Property (see, e.g., Chapter 3 of [17]). We state it here explicitly as Lemma 2.5 because we are going to use it within the proof of Theorem 5.1 and because it illustrates the use of Lemma 2.3.

Lemma 2.5. Let ρ be a density operator. For any n ∈ N let ρ ⊗n = Σ_{s∈S n} s Π s , where S n is the set of eigenvalues of ρ ⊗n and where Π s , for s ∈ S n , is the projector onto the corresponding eigenspace. Furthermore, for any δ > 0, let S δ n be the subset of S n defined by

$$S_n^\delta = \big\{ s \in S_n : 2^{-n(H(\rho)+\delta)} \leq s \leq 2^{-n(H(\rho)-\delta)} \big\}.$$

Then

$$\lim_{n \to \infty} \sum_{s \in S_n^\delta} \mathrm{tr}(\Pi_s \rho^{\otimes n}) = 1$$

and the convergence is exponentially fast in n.
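The statement of the lemma can be illustrated numerically for a single qubit. The sketch below assumes a state ρ with eigenvalues p and 1 − p, so that the eigenvalues of ρ ⊗n are p^k (1 − p)^{n−k} with multiplicity given by the binomial coefficient, and it takes the typical set to consist of the eigenvalues between 2^{−n(H(ρ)+δ)} and 2^{−n(H(ρ)−δ)}. The function name `typical_weight` and the parameter values are illustrative.

```python
import math

def typical_weight(p, n, delta):
    """Total weight of the delta-typical eigenvalues of rho^{(x)n} for a
    qubit state rho with eigenvalues p and 1 - p."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # H(rho) in bits
    weight = 0.0
    for k in range(n + 1):
        # log2 of the eigenvalue s = p^k (1 - p)^(n - k)
        log2_s = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if -n * (h + delta) <= log2_s <= -n * (h - delta):
            # natural log of the multiplicity binom(n, k), via lgamma to avoid overflow
            log_mult = (math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1))
            weight += math.exp(log_mult + log2_s * math.log(2))
    return weight

weights = [typical_weight(0.2, n, 0.1) for n in (10, 100, 1000)]
print(weights)
```

The weights increase towards 1 as n grows. Working with logarithms keeps the individual terms representable even when p^k (1 − p)^{n−k} alone would underflow.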
Proof.Let n ∈ N and consider the projector }.An explicit evaluation of the relative entropy shows that Using and defining the map If we now assume, by contradiction, that tr(ρ ⊗n Π+ n ) = tr(W + n (ρ ⊗n )) decreases less than exponentially fast in n, Lemma 2.3 tells us that lim sup This is obviously in contradiction to (45) and thus proves that tr(ρ ⊗n Π+ n ) decreases exponentially fast in n.
Similarly, we may consider the projector Π− Here, instead of ( 44), we use that for any purification We may choose the purification such that ) and inserting (47) we obtain the bound Assume now, by contradiction, that tr(ρ ⊗n ) decreases less then exponentially fast in n.Then the last term of (49) approaches 0 in the limit of large n.In particular, we have which contradicts the statement of Lemma 2.3.We have thus shown that both tr(ρ ⊗n Π− n ) and tr(ρ ⊗n Π+ n ) decrease exponentially fast in n.The claim of the lemma follows because s∈S δ n We conclude this section with a remark that is going to be useful for our proof of Theorem 5.1.
Remark 2.6. Considering the decomposition ρ = Σ_{r∈R} r π_r , it is easy to see that all eigenvalues of ρ ⊗n have the form ∏_{r∈R} r^{n_r} , where (n_r) r∈R are partitions of n, i.e., elements from the set

$$\Big\{ (n_r)_{r \in R} : n_r \in \mathbb{N}_0, \ \sum_{r \in R} n_r = n \Big\}.$$

Hence, the set S n of eigenvalues of ρ ⊗n used within Lemma 2.5 has size at most the number of such partitions. Since |R| is the number of different eigenvalues of ρ, the size of S n is bounded by a polynomial in n of degree |R| − 1.

Generalised de Finetti reduction

The main result of this section, stated as Lemma 3.1, is motivated by a variant of the method of de Finetti reductions proposed in [14]. (This variant is also known as the postselection technique; we refer to [45] for a not too technical presentation.) De Finetti reductions are generally used to study states on n-fold product systems S ⊗n that are invariant under permutations of the subsystems [44,13]. More precisely, the idea is to reduce the analysis of any density operator ρ S n in the symmetric subspace Sym n (S) of S ⊗n to the generally simpler analysis of states of the form σ ⊗n S , where σ S is pure. We extend this method to the case where S = D ⊗ E is a bipartite space and where the marginal of ρ S n = ρ D n E n on D ⊗n is known to have the form ρ D n = σ ⊗n D for some given state σ D on D. Lemma 3.1 implies that, in this case, the analysis can be reduced to states of the form σ ⊗n DE , where σ DE is a purification of σ D . (We note that a similar extension has been proposed earlier for another variant of the de Finetti reduction method; see Remark 4.3.3 of [43].) Lemma 3.1 will play a central role for the derivation of the claims of Section 4 below. Its proof uses concepts from representation theory, which are presented in Appendix C.

Lemma 3.1. Let D and E be Hilbert spaces and let σ D be a non-negative operator on D.
Then there exists a probability measure dφ on the set of all purifications |φ φ| DE of σ D such that holds for any n ∈ N, any permutation-invariant purification ρ D n E n of σ ⊗n D , and For the following argument, we assume without loss of generality that d = dim(D) = dim(E), and that σ D has full rank and is therefore invertible on D. (If this is not the case one may embed the smaller space in one of dimension d and replace σ D by σ D + ǫ id D for ǫ > 0. The claim is then obtained in the limit ǫ → 0.) We define where {|d i D } i and {|e i E } i are orthonormal bases of D and E, respectively.Let now where dU is the Haar probability measure on the group of unitaries on E. Because (σ is obviously of the form for some suitably chosen measure dφ on the set of purifications |φ φ| DE of σ D .It therefore suffices to show that We do this by analysing the structure of T D n E n .For this we employ the Schur-Weyl duality, which equips the product space (D ⊗E) ⊗n with a convenient structure (see Appendix C).Specifically, according to Lemma C.1, the vector |θ ⊗n DE decomposes as where, for each Young diagram λ, for orthonormal bases , respectively, and The latter may always be written in the Schmidt decomposition as where {|u j U D,λ } j and {|ū j U E,λ } j are orthonormal bases of of U D,λ and U E,λ , respectively, and α λ,j are appropriately chosen coefficients, which we assume without loss of generality to be real.The marginal of |θ θ| ⊗n DE on D ⊗n ∼ = D,λ U D,λ ⊗ V D,λ is equal to the identity and can thus be written as Comparing this to (60) shows that all coefficients α λ,j in (62) must be equal to 1, i.e., Note also that, according to the Schur-Weyl duality (see, e.g., Theorem 1.10 of [12]), U ⊗n E acts on We may therefore write where T λ,λ ′ ,j,j ′ is the homomorphism between U E,λ ′ and U E,λ defined by Since this operator manifestly commutes with the action of the unitary, Schur's lemma (see, e.g., Lemma 0.8 of [12]), together with the fact that U E,λ and U E,λ ′ 
are inequivalent for λ = λ ′ , implies that it has the form for appropriately chosen coefficients µ λ,j,j ′ .Inserting this in (66) gives Because the marginal of T D n E n on D ⊗n , must be equal to the marginal of |θ θ| ⊗n DE , we conclude from (63) that µ λ,j,j ′ = 1 dim(U λ ) δ j,j ′ .Hence, where |ψ λ V E,λ is normalised.Defining the invertible operator we have Note that κ D n commutes with any permutation, because, according to the Schur-Weyl duality, permutations act like λ id U λ ⊗ V λ (π) on the decomposition of D ⊗n .Consequently, because the support of T D n E n is contained in the symmetric subspace Sym n (D ⊗ E), the same must hold for S D n E n .Furthermore, for any vector |Ω ∈ Sym n (D ⊗ E), it follows from its representation according to Lemma C.1 that Consider now the operator Since the support of ρ D n E n is contained in Sym n (D ⊗ E), the same must hold for Q D n E n and we find This, in turn, implies that To conclude the proof of (59), we note that ρ

Fidelity between permutation-invariant operators
The purpose of this section is to provide techniques to approximate the fidelity of permutation-invariant states.They play a key role in the proof of Theorem 5.1.The derivation of the statements below is based on the generalised de Finetti reduction method introduced in Section 3. Furthermore, we will use several established facts about the fidelity, which are summarised in Appendix B. where (That such a purification exists is the statement of Lemma B.10.We also note that, if ρ D n E n is already pure, then R can be chosen to be the trivial space C, i.e., dim(R) = 1.)According to Lemma B.11, there exists a permutation-invariant purification Let Γ be the set of vectors |φ DER on D ⊗ E ⊗ R such that tr ER (|φ φ| DER ) = σ D .According to Lemma 3.1 there exists a probability measure dφ on Γ such that where 2 .Using this we find We now set σ DER = |φ φ| DER , where |φ DER ∈ Γ is a vector that maximises the above expression.Note that, by the definition of the set Γ, σ DE is then a valid extension of the given operator σ D .(Furthermore, if R is the trivial space C then σ DE is pure.)Combining (80) with (82), and using the monotonicity of the fidelity under the partial trace (Lemma B.4), we conclude that Lemma 4.2.Let ρ R n S n be a permutation-invariant non-negative operator on (R ⊗ S) ⊗n and let σ RS be a non-negative operator on R ⊗ S. Furthermore, let W R n be a permutation-invariant operator on R ⊗n with W R n ∞ ≤ 1.Then there exists a unitary U R on R such that5 Hence, according to Lemma 4.1, there exists a purification σ RSE of σ RS such that where . 
We then use Lemma B.8 which asserts that Furthermore, again by Lemma 4.1, there exists a purification σRSE of σ SE such that where 2 .Because all purifications are unitarily equivalent, there exists a unitary Because the fidelity is non-decreasing under the partial trace (cf.Lemma B.4), we also have Combining all these equations, we obtain the desired claim.To see this, let R be a system that is isomorphic to R and let C be the isometry from R to R ⊗ R defined by It is straightforward to verify that, for W R n diagonal in the product basis {|r 1 ⊗ • • • ⊗ |r n } r1,...rn , we have Let now ρ R n S n E n and σ RSE be pure operators such that (85) holds.Furthermore, define ρR n Rn S n E n = C ⊗n ρ R n S n E n (C † ) ⊗n and σR RSE = Cσ RSE C † .Using Lemma B.6 we find Furthermore, we can carry out the proof steps as in ( 86) and (87), while keeping the system R⊗n , to obtain for some purification σR RSE of σ RSE .Because σR RSE is pure there must exist a unitary ŪR on R such that ŪR σR RSE Ū † R = σR RSE .Using this and the fact that the fidelity is non-decreasing under the partial trace we find Finally, by the definition of ρR n Rn S n E n and σR RSE , and using Lemma B.6 we have where Combining this with (85), ( 92), (93), and (94) we obtain again inequality (84).Furthermore, by construction, U R is diagonal in the basis {|r } r and satisfies Remark 4.4.If ρ R n S n has product form ρ ⊗n RS , the statement of Lemma 4.2 can be rewritten as Hence, for a family {W R n } n∈N of permutation-invariant non-negative operators such that W R n ∞ ≤ 1 we have sup 5 Main result and proof Theorem 5.1.For any density operator ρ ABC on A ⊗ B ⊗ C, where A, B, and C are separable Hilbert spaces, there exists a trace-preserving completely positive map T B→BC from the space of operators on B to the space of operators on B ⊗ C such that6 Furthermore, if A, B, and C are finite-dimensional then T B→BC has the form on the support of ρ B , where U B and V BC are unitaries on B and B ⊗ C, 
respectively.
Proof.We first note that, by Remark 5.2 below, it is sufficient to prove the statement for the case where A, B, and C are finite-dimensional.Let δ > 0, δ ′ > 0, and δ ′′ > 0. Let n ∈ N and let {Π b } b∈ Bn and {Π d } d∈ Dn be the families of projectors onto the eigenspaces of ρ ⊗n B and ρ ⊗n BC , labelled by their eigenvalues b ∈ Bn and d ∈ Dn , respectively.Furthermore, let Bδ ′ n and Dδ ′′ n be the subsets of Bn and Dn defined by Lemma 2.5 and define Note that for any η > 0 we have for n sufficiently large.Define the mapping on (A ⊗ B ⊗ C) ⊗n as well as the abbreviation It is easily seen that the map W n is trace non-increasing and completely positive.Furthermore, because of (101), we always have tr(W n (ρ ⊗n ABC )) = tr(Γ A n B n C n ) > 2/3 for η sufficiently small (using the gentle measurement lemma, see [53] for instance).Lemma 2.3 then tells us that, for n sufficiently large,7 where the last equality is the definition of the conditional entropy, H(C|AB) = −tr(ρ ABC log 2 ρ ABC ) + tr(ρ AB log 2 ρ AB ) = −D(ρ ABC ρ AB ⊗ id C ).The relation between the fidelity and the relative entropy (Lemma B.2) now allows us to conclude that We now use Lemma B.6 to remove the projector Π B n C n from the second argument and note that the factor tr(Γ A n B n C n ) > 2/3 can be absorbed by another factor 2 − 1 4 nδ for n sufficiently large.This shows that Because b∈ Bn Π b = id B n we can apply Lemma B.7, which gives where the equality follows from Hence, there exists b ∈ Bδ ′ n such that where the second inequality follows from Remark 2.6.By the definition of Π b we also have where b is the eigenvalue of ρ ⊗n B corresponding to Π b .By the definition of Bδ ′ n we also have and, hence, where the equality follows from Lemma B.6, which we will use repeatedly in the following.Furthermore, by Lemma 4.2, there must exist a unitary U B on B such that Combining now (106), ( 109), (111), and (112) we obtain 2 where B .Next we use that d∈ Dn Π d = id B n C n and apply again Lemma B.7 
to obtain where | Dδ ′′ n | ≤ poly(n) by Remark 2.6.Hence, there exists d ∈ Dδ ′′ n such that By the definition of Dδ ′′ n we have with d ≥ 2 −n(H(BC)+δ ′′ ) .This implies We use again Lemma 4.2, which asserts that there must exist a unitary V BC on B ⊗ C such that BC ) ⊗n γ ⊗n ABC (ρ BC ) ⊗n γ ⊗n ABC (ρ Combining this with (113), ( 115) and (117) yields We take the nth root, use H(BC) − H(B) − H(C|AB) = I(A : C|B), and insert the expression for γ ABC to rewrite this as Because this holds for all sufficiently large n ∈ N and because n poly(n) approaches 1 for n large, we conclude that Inequality (98) now follows because δ > 0, δ ′ > 0, and δ ′′ > 0 were arbitrary.It remains to verify that the map T B→BC is trace-preserving.But this follows from the observation that Remark 5.2.Any proof of the main claim of Theorem 5.1, which uses the assumption that A, B, and C are finite-dimensional Hilbert spaces, implies that the claim also holds under the less restrictive assumption that these spaces are separable.
To see this, let {P k A } k∈N , {P k B } k∈N , and {P k C } kC ∈N be sequences of finite-rank projectors on A, B, and C which converge to id A , id B , and id C , respectively, with respect to the weak (and, hence, also the strong) operator topology (see, e.g., Definition 2 of [23]).Define furthermore the density operators and We note that, for any k ∈ N, the sequence {ρ k,k ′ ABC } k ′ ∈N converges to ρ k ABC in the trace-norm (see, e.g., Corollary 2 of [23]).Also, {ρ k ABC } k∈N converges to ρ ABC in the trace norm.Let us first consider the left hand side of (123).Because, for any fixed finite dimension of system A, the conditional mutual information I(A : C|B) ρ = H(A|B) ρ − H(A|BC) ρ is continuous in ρ with respect to the trace norm [1], we have lim for any k ∈ N. In addition, using the fact that local projectors applied to the subsystems A and C can only decrease I(A : C|B) ρ , we find lim sup k→∞ The combination of these statements yields We now consider the right hand side of (123).Let δ > 0 and note that, for sufficiently large k and k ′ , we have Because the trace norm is monotonically non-increasing under trace-preserving completely positive maps, we also have for any T B→BC .Lemma B.9 then implies that But because this holds for any T B→BC , we have sup Because this holds for all δ > 0 and sufficiently large k and k ′ , we find that lim sup k→∞ lim sup To conclude the argument, we observe that if the inequality (123) is valid for finite-dimensional spaces A, B, and C we have in particular lim inf Combining this with (128) and ( 133) then proves the claim that the inequality holds for arbitrary separable spaces A, B, and C.
Remark 5.3. By Remark 4.3, the unitary $U_B$ chosen in (112) may be replaced by an operator which commutes with $\rho_B$ and satisfies $\|U_B\|_\infty \le 1$. Analogously, the unitary $V_{BC}$ chosen in (118) may be replaced by an operator of the form $V_{BC} = V'_B V''_{BC}$ where $V'_B$ commutes with $\rho_B$ and $V''_{BC}$ commutes with $\rho_{BC}$, and where Similarly to (122) one can see that the resulting recovery map $T_{B\to BC}$ is trace non-increasing. Furthermore, we have In particular, we have This implies that one can always choose a recovery map that exactly reproduces the marginal on $B$ and the marginal on $C$.
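Two elementary facts used at the end of the proof of Theorem 5.1 can be illustrated numerically: the identity $H(BC) - H(B) - H(C|AB) = I(A:C|B)$ and the fact that the $n$th root of a polynomial in $n$ approaches 1. The sketch below assumes a classical joint distribution on three bits (so that entropies are Shannon entropies) and a hypothetical polynomial $(n+1)^5$; both choices are for illustration only.

```python
import math
from itertools import product


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal of a joint distribution on the coordinate positions listed in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


# Hypothetical correlated distribution over bits (a, b, c).
weights = {abc: 1 + abc[0] + 2 * abc[1] * abc[2] + 3 * abc[0] * abc[2]
           for abc in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {abc: w / total for abc, w in weights.items()}

H_ABC = entropy(joint)
H_AB = entropy(marginal(joint, (0, 1)))
H_BC = entropy(marginal(joint, (1, 2)))
H_B = entropy(marginal(joint, (1,)))

# Definition from the introduction versus the rewritten form used in the proof.
cmi_definition = H_AB + H_BC - H_B - H_ABC
H_C_given_AB = H_ABC - H_AB
cmi_rewritten = H_BC - H_B - H_C_given_AB


def nth_root_of_poly(n, degree=5):
    """n-th root of the hypothetical polynomial (n + 1) ** degree."""
    return ((n + 1) ** degree) ** (1.0 / n)


print(cmi_definition, cmi_rewritten)
print([nth_root_of_poly(n) for n in (10, 1000, 10 ** 6)])
```

The identity is purely algebraic, so the two values agree up to rounding for any input distribution.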

Appendices
A One-shot relative entropies
In this appendix, we briefly review the generalised relative entropy introduced in [52,20] and the smooth max-relative entropy introduced in [18] (we will use a slightly modified variant defined in [49,48]).
Definition A.1. For any two non-negative operators $\rho$ and $\sigma$ and for any $\epsilon \in [0, \mathrm{tr}(\rho)]$, the generalised relative entropy is defined by where the optimisation is over operators $Q$. For $\rho$ a density operator, the $\epsilon$-smooth max-relative entropy is defined by where the optimisation is over non-negative operators $\tilde{\rho}$ with $\mathrm{tr}(\tilde{\rho}) \le 1$.
Remark A.2. The second argument, $\sigma$, of the two one-shot entropy measures of Definition A.1 may be rescaled easily because holds for any $\lambda > 0$.
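The rescaling behaviour described in Remark A.2 can be illustrated in a restricted setting. The sketch below is an assumption-laden example: it treats only the unsmoothed case $\epsilon = 0$ of the max-relative entropy, $D_{\max}(\rho\|\sigma) = \log_2 \min\{\lambda : \rho \le \lambda\sigma\}$, for diagonal operators, and it assumes the rescaling relation takes the familiar form $D(\rho\|\lambda\sigma) = D(\rho\|\sigma) - \log_2 \lambda$. The function name `d_max` and the numbers are hypothetical.

```python
import math


def d_max(p, q):
    """Unsmoothed max-relative entropy (in bits) of diagonal operators p and q.

    For commuting operators, p <= lam * q holds iff lam >= p_i / q_i for all i,
    so the minimal lam is the largest ratio. Assumes q_i > 0 wherever p_i > 0.
    """
    return math.log2(max(a / b for a, b in zip(p, q) if a > 0))


# Hypothetical example data.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
lam = 4.0

original = d_max(p, q)
rescaled = d_max(p, [lam * b for b in q])

# Scaling the second argument by lam shifts the value by -log2(lam).
print(original, rescaled, original - math.log2(lam))
```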
The generalised relative entropy may be expressed equivalently as follows.
Lemma A.3. For any two non-negative operators $\rho$ and $\sigma$, where the optimisation is over operators $Y$ and reals $\mu$.
Proof. As shown in [20], the expression on the right hand side of (137) is a semidefinite program whose dual has the form where the optimisation is over operators $X$ and reals $\mu$. Replacing $X$ by $\mu Y$ we can rewrite this as To conclude the proof, we note that the condition $\mu Y \ge 0$ can be replaced by $Y \ge 0$ because $\mu \ge 0$ and because for $\mu = 0$ the value of $Y$ is irrelevant.
Remark A.4. It is obvious from this representation that $D^{\epsilon}_H(\rho\|\sigma)$ is a monotonically non-increasing function in $\epsilon$.
The following lemma provides an upper bound on $D^{\epsilon}_H(\rho\|\sigma)$, expressed in terms of the trace distance of $\rho$ to an operator $\tilde{\rho}$ that satisfies a simple operator inequality.
Although we are not using this for our argument, we note that Lemma A.5 can be extended to a relation between the smooth relative max-entropy and the generalised relative entropy.
Lemma A.6. Let $\rho$ be a density operator, let $\sigma$ be a non-negative operator, and let $\epsilon > \epsilon' \ge 0$. Then
Proof. Let $\mu = 2^{-D^{\epsilon'}_{\max}(\rho\|\sigma)}$ and let $\tilde{\rho}$ be such that the expression on the right hand side of (138) is satisfied for $D^{\epsilon'}_{\max}(\rho\|\sigma)$. That is, we have $\tilde{\rho} \le \sigma/\mu$ as well as $1 - F(\rho, \tilde{\rho})^2 \le \epsilon'$, which, by Lemma B.1, implies $\Delta(\rho, \tilde{\rho}) \le \epsilon'$. Hence, by Lemma A.5,
The following claim about the smooth relative max-entropy for product states is known as the Quantum Asymptotic Equipartition Property [49]. (Note that this is a strictly more general statement than the "classical" Asymptotic Equipartition Property stated as Lemma 2.5.) While the proof in [49] applies to the case where $\sigma$ is a density operator, the slightly extended claim provided here follows directly from Remark A.2 (see also Footnote 9 of [49] as well as Chapter 6 of [48]).
Lemma A.7. For any density operator $\rho$, for any non-negative operator $\sigma$, for any $\epsilon \in (0, 1)$, and for sufficiently large $n \in \mathbb{N}$, where $c = c(\rho, \sigma)$ is independent of $n$ and $\epsilon$.
Because of Lemma A.6, almost the same upper bound also holds for $D^{\epsilon}_H(\cdot\|\cdot)$. In fact, as a consequence of the Quantum Stein's Lemma [27,41], the statement holds asymptotically with equality [20].
Lemma A.8. Let $\rho$ be a density operator, let $\sigma$ be a non-negative operator, and let $\epsilon \in (0, 1)$. Then
B General facts about the fidelity
In the literature, the definition and discussion of the fidelity $F(\rho, \sigma)$ is often restricted to the case where its arguments, $\rho$ and $\sigma$, are density operators (see, e.g., Chapter 9 of [40]). In this work, however, we need the fidelity for general non-negative operators. Recall that we defined $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$. Fortunately, most established properties of the fidelity are still valid in this more general case. For completeness, we state them in the following.
Lemma B.1. For any two non-negative operators $\rho$ and $\sigma$ with $\mathrm{tr}(\rho) \ge \mathrm{tr}(\sigma)$, the trace distance is upper bounded by
The following lemma relates the relative entropy to the fidelity (see also Section 5.4 of [25]).
Lemma B.2. For any non-negative operators $\rho$ and $\sigma$
Proof. Let $D_\alpha(\cdot\|\cdot)$ be the $\alpha$-Quantum Rényi Divergence as defined in [38,54]. As shown in these papers, for $\alpha = 1$ it is identical to the relative entropy, i.e., For $\alpha = 1/2$, it is related to the fidelity via Finally, $\alpha \mapsto D_\alpha(\rho\|\sigma)$ is a monotonically non-decreasing function in $\alpha$. Combining these statements, we find
Next we recall a statement that is known as Uhlmann's theorem [51].
where the maximisation is over all unitaries $U_R$ on $R$.
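As a sanity check on the kind of statements made in Lemmas B.1 and B.2, the sketch below evaluates the fidelity, the trace distance and the relative entropy for commuting density operators. It assumes the normalised special case, in which the standard bounds read $\Delta(\rho,\sigma) \le \sqrt{1 - F(\rho,\sigma)^2}$ and $D(\rho\|\sigma) \ge -2\log_2 F(\rho,\sigma)$; the general forms for non-negative operators stated in the lemmas are not reproduced here. For diagonal operators, $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$ reduces to $\sum_i \sqrt{p_i q_i}$. All names and numbers are hypothetical.

```python
import math


def fidelity(p, q):
    """F = || sqrt(p) sqrt(q) ||_1, which for diagonal operators is sum_i sqrt(p_i q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def trace_distance(p, q):
    """Trace distance of diagonal density operators."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def relative_entropy(p, q):
    """Relative entropy in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


# Hypothetical example data: two normalised distributions.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

F = fidelity(p, q)
delta = trace_distance(p, q)
D = relative_entropy(p, q)

# Normalised special cases of the two bounds.
print(delta, math.sqrt(1 - F ** 2))
print(D, -2 * math.log2(F))
```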
The following lemma is a direct consequence of Lemma B.3. It asserts that the fidelity is monotonically non-decreasing when a partial trace is applied to both arguments.
Lemma B.4. For any two non-negative operators $\rho_{DE}$ and $\sigma_{DE}$ we have
Using the Stinespring dilation theorem, the statement can be brought into the following more general form.
Lemma B.5. For any trace-preserving completely positive map $T$ we have
The next few claims allow us to keep track of the change of the fidelity when we apply operators to its arguments.
Lemma B.6. For any non-negative operators $\rho$ and $\sigma$ and any operator $W$ on the same space we have where the maximisation is taken over the set of unitaries $U_R$ on $R$.
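The monotonicity of the fidelity under the partial trace (Lemma B.4), which is a special case of Lemma B.5, can be illustrated for commuting operators. The following is a minimal sketch assuming classical joint distributions on $D \otimes E$, for which the partial trace over $E$ is marginalisation; the function names and the data are hypothetical.

```python
import math


def fidelity(p, q):
    """Fidelity of two diagonal operators given as {outcome: weight} dictionaries."""
    return sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in set(p) | set(q))


def trace_out_E(joint):
    """Partial trace over E for a diagonal operator on D (x) E, indexed by (d, e)."""
    out = {}
    for (d, _e), weight in joint.items():
        out[d] = out.get(d, 0.0) + weight
    return out


# Hypothetical example data on two bits (d, e).
rho_DE = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
sigma_DE = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}

F_DE = fidelity(rho_DE, sigma_DE)
F_D = fidelity(trace_out_E(rho_DE), trace_out_E(sigma_DE))

# Discarding E can only make the two operators harder to distinguish.
print(F_DE, F_D)
```

In this example the two marginals on $D$ coincide, so the fidelity after the partial trace equals 1 even though the joint operators differ.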
Lemma B.7. Let $\rho$ and $\sigma$ be non-negative operators and let $\{W_d\}_{d\in D}$ be a family of operators such that
Proof. Let $|\psi\rangle\langle\psi|_{DR}$ and $|\phi\rangle\langle\phi|_{DR}$ be purifications of $\rho_D = \rho$ and $\sigma_D = \sigma$, respectively. By Uhlmann's theorem (Lemma B.3), there exists a unitary $U_R$ on $R$ such that The assertion follows because, again by Uhlmann's theorem, holds for any $d \in D$.
Lemma B.8. Let $\rho_{DE}$ and $\sigma_{DE}$ be non-negative operators on $D \otimes E$ and let $W_E$ be a trace non-increasing completely positive map on $E$. Then
Proof. Let $X_E \mapsto \sum_e W_e X W_e^{\dagger}$ be an operator-sum representation of $W_E$. The second argument of the fidelity on the left hand side of (164), $\sigma'_{DE} = (I_D \otimes W_E)(\sigma_{DE})$, may then be written as Because, by assumption, $\sum_e W_e^{\dagger} W_e \le \mathrm{id}_E$, we have Together with the fact that the square root is operator monotone (cf. Theorem V.1.9 of [6]), this implies The claim then follows from Lemma B.4, which asserts that $F(\rho_{DE}, \sigma'_{DE}) \le F(\rho_D, \sigma'_D)$.
We also recall that the fidelity is continuous in its arguments with respect to the trace norm.
Finally, we provide a lemma (Lemma B.11) that simplifies the evaluation of the fidelity between permutation-invariant operators. It may be seen as a generalisation of a known result on symmetric purifications, which we state as Lemma B.10 (see, e.g., Lemma II.5 of [13] for a proof). Specifically, Lemma B.11 may be seen as a combination of this result and Uhlmann's theorem (Lemma B.3). We also note that the lemma may be generalised to symmetry groups other than the symmetric group.
Lemma B.10. For any permutation-invariant operator $\rho_{D^n}$ on $D^{\otimes n}$ and any space $R$ with $\dim(R) \ge \dim(D)$ there exists a permutation-invariant purification $\rho_{D^n R^n}$ on $(D \otimes R)^{\otimes n}$.
Lemma B.11. Let $\rho_{D^n}$ and $\sigma_{D^n}$ be permutation-invariant non-negative operators on $D^{\otimes n}$ and let $\rho_{D^n R^n}$ be a permutation-invariant purification of $\rho_{D^n}$. Then there exists a permutation-invariant purification
Proof. The proof of this lemma essentially follows the lines of the standard proof of Uhlmann's theorem (see, e.g., Chapter 9 of [40]), while keeping track of the permutation invariance of the relevant operators.
For the following, we assume without loss of generality that $\rho_{D^n}$ and $\sigma_{D^n}$ are invertible. (The claim for the cases where this assumption does not hold may be obtained by considering the operators $\rho_{D^n} + \epsilon\,\mathrm{id}_{D^n}$ and $\sigma_{D^n} + \epsilon\,\mathrm{id}_{D^n}$ for $\epsilon > 0$ and then taking the limit $\epsilon \to 0$.) Let $|\Psi\rangle_{D^n R^n}$ be a vector in $(D \otimes R)^{\otimes n}$ such that $\rho_{D^n R^n} = |\Psi\rangle\langle\Psi|_{DR}$ and define where $\{|d_x\rangle_{D^n}\}_x$ and $\{|r_x\rangle_{R^n}\}_x$ are orthonormal bases of $D^{\otimes n}$ and $R^{\otimes n}$, respectively. Let $U_{D^n}$ be the unitary operator in the left polar decomposition of where is non-negative. We now define the purification It is readily verified that this is indeed a purification of $\sigma_{D^n}$. The fidelity between the purifications is given by Exploiting now the particular form (177) of $|\Omega\rangle$ as well as (178), this can be rewritten as The claim (175) then follows by inserting the explicit expression for $Q_{D^n}$, i.e.,
To verify that $\sigma_{D^n R^n}$ is permutation-invariant, we first note that for any permutation-invariant Hermitian operator $X$ on an $n$-fold product space and for any real function $f$ the operator $f(X)$ is also permutation-invariant. (To see this, consider the decomposition $X = \sum_i x_i \Pi_i$, where $\Pi_i$ are the projectors onto the eigenspaces of $X$ and $x_i$ are the corresponding eigenvalues. Because $[X, \pi] = 0$ for any permutation $\pi$, we also have $[\Pi_i, \pi] = 0$ for any $i$. Using now that $f(X) = \sum_i f(x_i)\Pi_i$, we conclude that $[f(X), \pi] = 0$.) We therefore know, in particular, that for any permutation $\pi$. Furthermore, it follows from the explicit expression for $Q_{D^n}$ that this operator is also permutation-invariant. Similarly, since $U^{\dagger}_{D^n}$ can be written as it is also permutation-invariant. By assumption, we also have $\pi|\Psi\rangle$

C On the Schur-Weyl duality
The following lemma follows immediately from the considerations in Chapter 6 of [24] (see, in particular, Eq. 6.25).
Lemma C.1. Let $D$ and $E$ be Hilbert spaces with $\dim(D) = \dim(E) = d$ and let $n \in \mathbb{N}$. Furthermore, let $\Lambda_{n,d}$ be the set of Young diagrams of size $n$ with at most $d$ rows, and, for any $\lambda \in \Lambda_{n,d}$, let $U_\lambda$ and $V_\lambda$ be the corresponding irreducible representations of the unitary group $U(d)$ and the symmetric group $S_n$, respectively, so that, according to the Schur-Weyl duality (see, e.g., Theorem 1.10 of [12]) Then there exists a family $\{|\psi_\lambda\rangle_{V_{D,\lambda} V_{E,\lambda}}\}_{\lambda\in\Lambda_{n,d}}$ of maximally entangled normalised vectors on $V_{D,\lambda} \otimes V_{E,\lambda}$ such that any vector $|\Omega\rangle \in \mathrm{Sym}^n(D \otimes E)$ in the symmetric subspace of $(D \otimes E)^{\otimes n}$ can be decomposed as where $\{|\phi_\lambda\rangle_{U_{D,\lambda} U_{E,\lambda}}\}_{\lambda\in\Lambda_{n,d}}$ is a family of (not necessarily normalised) vectors on $U_{D,\lambda} \otimes U_{E,\lambda}$.
Proof. According to (186) and (187), the space $(D \otimes E)^{\otimes n}$ decomposes as Any vector $|\Omega\rangle \in (D \otimes E)^{\otimes n}$ can therefore always be written as where, for any $\lambda, \lambda' \in \Lambda_{n,d}$, $\{|\phi_{\lambda,\lambda',i}\rangle_{U_{D,\lambda} U_{E,\lambda'}}\}_i$ and $\{|\psi_{\lambda,\lambda',i}\rangle_{V_{D,\lambda} V_{E,\lambda'}}\}_i$ are families of vectors in $U_{D,\lambda} \otimes U_{E,\lambda'}$ and $V_{D,\lambda} \otimes V_{E,\lambda'}$, respectively. For any $\lambda \in \Lambda_{n,d}$, let $\{|v_k\rangle_{V_{D,\lambda}}\}_k$ and $\{|v_k\rangle_{V_{E,\lambda}}\}_k$ be orthonormal bases of $V_{D,\lambda}$ and $V_{E,\lambda}$, respectively, with respect to which the representations of the symmetric group $S_n$ are given by the same real-valued matrices. (Such bases always exist, see, e.g., [31].) We then define the maximally entangled vector $|\psi_\lambda\rangle_{V_{D,\lambda} V_{E,\lambda}}$ on $V_{D,\lambda} \otimes V_{E,\lambda}$ by Now, to prove the claim (188) for any permutation-invariant $|\Omega\rangle$, it suffices to show that the vectors on $V_{D,\lambda} \otimes V_{E,\lambda'}$ in (190) satisfy for all $\lambda, \lambda' \in \Lambda_{n,d}$ and for all $i$.
For any permutation $\pi$, let $V_\lambda(\pi)$ be its action on the irreducible space $V_\lambda$. Using that, by definition, the matrix elements $\langle v_k|V_{D,\lambda}(\pi)|v_{k'}\rangle = \langle v_k|V_{E,\lambda}(\pi)|v_{k'}\rangle$ are real-valued, it is easily verified that Furthermore, the Schur-Weyl duality (cf. Theorem 1.10 of [12]) states that $\pi$ acts on $\bigoplus_\lambda U_\lambda \otimes V_\lambda$ as Using this and that the vector $|\Omega\rangle$ is by assumption invariant under the action of $\pi$, we find that holds for all $\lambda, \lambda' \in \Lambda_{n,d}$ and for all $i$ for which the corresponding term in the sum (190) is nonzero.
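The index set $\Lambda_{n,d}$ of Lemma C.1 is easy to enumerate explicitly. The sketch below lists all Young diagrams of size $n$ with at most $d$ rows and checks the standard counting bound $|\Lambda_{n,d}| \le (n+1)^d$, which holds because each of the $d$ row lengths lies in $\{0, \ldots, n\}$. The function name `young_diagrams` is hypothetical, and whether the paper invokes exactly this bound is not shown in this section.

```python
def young_diagrams(n, d, max_part=None):
    """Yield the Young diagrams of size n with at most d rows.

    Each diagram is a tuple of non-increasing positive row lengths summing to n.
    """
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if d == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in young_diagrams(n - first, d - 1, first):
            yield (first,) + rest


# Example: diagrams of size 4 with at most 2 rows.
diagrams_4_2 = list(young_diagrams(4, 2))
print(diagrams_4_2)

# The number of diagrams grows only polynomially in n for fixed d.
for n in (5, 10, 20):
    count = sum(1 for _ in young_diagrams(n, 3))
    print(n, count, (n + 1) ** 3)
```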
Define now the density operator where the sum ranges over all permutations of $\{1, \ldots, k\}$. The density operator $\omega_{AC} = \omega_{AC_1}$ is then $k$-extendible by construction. Using the convexity of the trace distance we find $\Delta(\rho_{AC}, \omega_{AC}) = \Delta(\rho_{AC}, \omega_{AC_1}) \le$ Inserting now the bound (208) we conclude that The claim of the theorem follows because the above holds for any extension $\rho_{ACE}$ of $\rho_{AC}$.
We remark that the bound provided by Theorem D.1 does not depend on the dimension of the two subsystems $A$ and $C$. As mentioned above, this yields a quantitative claim on the faithfulness of squashed entanglement, which we formulate as Corollary D.3 below. Its proof uses the following statement about the distance of $k$-extendible states from the set of separable states, which we denote by $S_{A:C}$.
Corollary D.3 quantifies the faithfulness of squashed entanglement in terms of the trace norm. Compared to previously known versions of this claim [7], only the dimension of one of the two subsystems enters as a factor in the bound (213). (Because of the symmetry of the other involved quantities, one can always choose the lower-dimensional one.) We also note that the example of the totally antisymmetric state on $A \otimes C$ with $\dim A = \dim C = d$ shows that such a factor is necessary. Indeed, the squashed entanglement of this state is of the order $O(1/d)$ [15] whereas its trace distance to the closest separable state cannot be smaller than $\frac{1}{4}$. (To see this, note that for any product state $\sigma_A \otimes \sigma_C$ we have $\mathrm{tr}(\sigma_A \otimes \sigma_C \, \Pi_{\mathrm{as}}) \le \frac{1}{2}$ where $\Pi_{\mathrm{as}}$ denotes the projector onto the antisymmetric subspace of $A \otimes C$.)
12 The bound can also be obtained directly from Theorem II.7′ of [13], however with a quadratic dependence on $\dim C$.
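The parenthetical claim that $\mathrm{tr}(\sigma_A \otimes \sigma_C \, \Pi_{\mathrm{as}}) \le \frac{1}{2}$ for product states can be checked directly for pure product states; mixed product states then follow because the expression is linear in each factor. The sketch below builds $\Pi_{\mathrm{as}} = (\mathrm{id} - \mathrm{SWAP})/2$ explicitly and compares the overlap with the closed form $(1 - |\langle a|c\rangle|^2)/2$. The function names and the vectors are hypothetical illustration data.

```python
def antisym_projector(d):
    """Projector (id - SWAP) / 2 on C^d (x) C^d, with basis index (i, j) -> i * d + j."""
    dim = d * d
    P = [[0.0] * dim for _ in range(dim)]
    for i in range(d):
        for j in range(d):
            P[i * d + j][i * d + j] += 0.5   # identity part
            P[i * d + j][j * d + i] -= 0.5   # swap part
    return P


def overlap_with_product_state(a, c):
    """Return <a (x) c| Pi_as |a (x) c> for normalised vectors a and c."""
    d = len(a)
    P = antisym_projector(d)
    psi = [a[i] * c[j] for i in range(d) for j in range(d)]
    total = sum(psi[r].conjugate() * P[r][s] * psi[s]
                for r in range(d * d) for s in range(d * d))
    return total.real


# Hypothetical normalised vectors in dimension d = 3.
a = [0.6, 0.8, 0.0]
c = [1.0, 0.0, 0.0]

value = overlap_with_product_state(a, c)
inner = sum(x.conjugate() * y for x, y in zip(a, c))
closed_form = (1 - abs(inner) ** 2) / 2

# The overlap never exceeds 1/2, with equality for orthogonal a and c.
print(value, closed_form)
```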

Lemma 4.1.
Let $\rho_{D^n E^n}$ be a permutation-invariant non-negative operator on $(D \otimes E)^{\otimes n}$ and let $\sigma_D$ be a non-negative operator on $D$. Then there exists a non-negative extension $\sigma_{DE}$ of $\sigma_D$ on $D \otimes E$ such that

Remark 4.3.
If for some orthonormal basis $\{|r\rangle\}_r$ of $R$ the operator $W_{R^n}$ is diagonal in the corresponding product basis $\{|r_1\rangle \otimes \cdots \otimes |r_n\rangle\}_{r_1,\ldots,r_n}$ then inequality (84) also holds for an operator $U_R$ which is diagonal in the basis $\{|r\rangle\}_r$ and satisfies $\|U_R\|_\infty \le 1$, and for $d = \dim(R)^2 \dim(S)^2$.