Necessary criterion for approximate recoverability

A tripartite state $\rho_{ABC}$ forms a Markov chain if there exists a recovery map $\mathcal{R}_{B \to BC}$ acting only on the $B$-part that perfectly reconstructs $\rho_{ABC}$ from $\rho_{AB}$. To achieve an approximate reconstruction, it suffices that the conditional mutual information $I(A:C|B)_{\rho}$ is small, as shown recently. Here we ask what conditions are necessary for approximate state reconstruction. This is answered by a lower bound on the relative entropy between $\rho_{ABC}$ and the recovered state $\mathcal{R}_{B\to BC}(\rho_{AB})$. The bound consists of the conditional mutual information and an entropic correction term that quantifies the disturbance of the $B$-part by the recovery map.


Introduction
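A recovery map is a trace-preserving completely positive map that reconstructs parts of a composite system. More precisely, for a tripartite state $\rho_{ABC}$ on $A \otimes B \otimes C$ we can consider a recovery map $\mathcal{R}_{B\to BC}$ from $B$ to $B \otimes C$ that reconstructs the $C$-part from the $B$-part only. If such a reconstruction is perfectly possible, i.e., if
$$\rho_{ABC} = \mathcal{R}_{B\to BC}(\rho_{AB}), \tag{1}$$
we call $\rho_{ABC}$ a (quantum) Markov chain in order $A \leftrightarrow B \leftrightarrow C$. The structure of Markov chains is well understood. A state $\rho_{ABC}$ is a Markov chain if and only if there exists a decomposition of the $B$ system as $B = \bigoplus_j (b_j^L \otimes b_j^R)$ such that
$$\rho_{ABC} = \bigoplus_j P(j)\, \rho_{A b_j^L} \otimes \rho_{b_j^R C} \tag{2}$$
with states $\rho_{A b_j^L}$ on $A \otimes b_j^L$, $\rho_{b_j^R C}$ on $b_j^R \otimes C$, and a probability distribution $P$ [16]. Alternatively, a state $\rho_{ABC}$ is a Markov chain if and only if its conditional mutual information $I(A:C|B)_\rho$ vanishes [26,27], in which case the Petz recovery map
$$\mathcal{T}_{B\to BC} : X_B \mapsto \rho_{BC}^{1/2}\big(\rho_B^{-1/2} X_B \rho_B^{-1/2} \otimes \mathrm{id}_C\big)\rho_{BC}^{1/2} \tag{3}$$
recovers perfectly, i.e., (1) holds with $\mathcal{R}_{B\to BC} = \mathcal{T}_{B\to BC}$.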
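For classical (commuting) states the Petz map (3) reduces to $\rho_{BC}\rho_B^{-1}\rho_{AB}$, so the claim that Markov chains are recovered exactly can be checked with plain probability tables. The following is a minimal Python sketch under that classical assumption; the helper names (`marginal`, `cmi`, `petz_classical`) and the example distribution are illustrative choices, not taken from the text. Logarithms are in base 2.

```python
import math

def marginal(P, keep):
    """Marginalize a joint distribution {(a, b, c): prob} onto the positions in `keep`."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    """I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC)."""
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def petz_classical(P):
    """Classical Petz reconstruction: P_AB(a, b) * P_BC(b, c) / P_B(b)."""
    PAB, PBC, PB = marginal(P, (0, 1)), marginal(P, (1, 2)), marginal(P, (1,))
    return {(a, b, c): PAB[(a, b)] * PBC[(b2, c)] / PB[(b,)]
            for (a, b) in PAB for (b2, c) in PBC if b2 == b}

# Hypothetical Markov chain A <-> B <-> C: P(a, b, c) = P_AB(a, b) * K(c | b).
P_AB = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
K = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}
P = {(a, b, c): pab * K[b][c] for (a, b), pab in P_AB.items() for c in (0, 1)}

recovered = petz_classical(P)
print("I(A:C|B) =", cmi(P))
print("max |T(P_AB) - P| =", max(abs(recovered[x] - P[x]) for x in P))
```

Replacing `P` by a distribution that is not of the product form `P_AB(a, b) * K(c | b)` makes `cmi(P)` strictly positive and the reconstruction inexact.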
Tripartite states $\rho_{ABC}$ that have a small conditional mutual information are called approximate Markov chains. The justification for this terminology is a recent result [15] proving that for any state $\rho_{ABC}$ there exists a recovery map $\mathcal{R}_{B\to BC}$ such that
$$I(A:C|B)_\rho \geq -\log F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big), \tag{4}$$
where $F(\tau,\omega) := \|\sqrt{\tau}\sqrt{\omega}\|_1^2$ denotes the fidelity between $\tau$ and $\omega$. Inequality (4) shows that the Markov property (1) approximately holds whenever the conditional mutual information is small. However, there exist tripartite states with a small conditional mutual information whose distance to any Markov chain is nevertheless large [10,17]. As a consequence, approximate quantum Markov chains are not necessarily close to quantum Markov chains. We refer to Appendix A for a more detailed explanation of this phenomenon.
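In the classical case the fidelity in (4) becomes $F(p,q) = \big(\sum_x \sqrt{p(x)q(x)}\big)^2$, and the Petz map is one admissible choice of recovery map. The sketch below checks inequality (4) for a non-Markov distribution on $\{0,1\}^3$ chosen purely for illustration; the helpers are the same hypothetical classical toy functions as above, repeated so the block is self-contained.

```python
import itertools
import math

def marginal(P, keep):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    """I(A:C|B) in bits."""
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def petz_classical(P):
    PAB, PBC, PB = marginal(P, (0, 1)), marginal(P, (1, 2)), marginal(P, (1,))
    return {(a, b, c): PAB[(a, b)] * PBC[(b2, c)] / PB[(b,)]
            for (a, b) in PAB for (b2, c) in PBC if b2 == b}

def fidelity(P, Q):
    """Squared fidelity F = (sum_x sqrt(P(x) Q(x)))^2 for commuting (classical) states."""
    return sum(math.sqrt(P[x] * Q.get(x, 0.0)) for x in P) ** 2

# Illustrative distribution on {0,1}^3 that is not a Markov chain.
vals = [0.15, 0.05, 0.10, 0.20, 0.05, 0.15, 0.20, 0.10]
P = dict(zip(itertools.product((0, 1), repeat=3), vals))

I = cmi(P)
neg_log_F = -math.log2(fidelity(P, petz_classical(P)))
print(f"I(A:C|B) = {I:.4f}, -log F = {neg_log_F:.4f}")
```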
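Inequality (4) has been refined in a series of works [8,5,31,36,32,18,30]. The most recent (and strongest) result [30, Theorem 4.1] shows that for any state $\rho_{ABC}$ we have
$$I(A:C|B)_\rho \geq D_{\mathbb{M}}\big(\rho_{ABC} \,\big\|\, \mathcal{P}_{B\to BC}(\rho_{AB})\big) \tag{5}$$
for the explicit recovery map
$$\mathcal{P}_{B\to BC} := \int_{\mathbb{R}} \beta_0(\mathrm{d}t)\, \mathcal{P}^{[t]}_{B\to BC}, \qquad \mathcal{P}^{[t]}_{B\to BC} : X_B \mapsto \rho_{BC}^{\frac{1+it}{2}}\big(\rho_B^{-\frac{1+it}{2}} X_B \rho_B^{-\frac{1-it}{2}} \otimes \mathrm{id}_C\big)\rho_{BC}^{\frac{1-it}{2}}, \tag{6}$$
and the probability measure
$$\beta_0(\mathrm{d}t) := \frac{\pi}{2}\big(\cosh(\pi t) + 1\big)^{-1}\mathrm{d}t \tag{7}$$
on $\mathbb{R}$. Here $D_{\mathbb{M}}$ denotes the measured relative entropy, which is defined as
$$D_{\mathbb{M}}(\tau\|\omega) := \sup_{\mathcal{M}\in\mathbb{M}} D\big(\mathcal{M}(\tau)\,\big\|\,\mathcal{M}(\omega)\big), \tag{8}$$
where $\mathbb{M}$ is the set of all quantum-classical (measurement) channels $\mathcal{M}(\omega) = \sum_x (\operatorname{tr} M_x\omega)\,|x\rangle\langle x|$ with $\{M_x\}$ a positive operator valued measure (POVM) and $\{|x\rangle\}$ an orthonormal basis. A simple property of the measured relative entropy ensures that $D_{\mathbb{M}}(\tau\|\omega) \geq -\log F(\tau,\omega)$ for all states $\tau,\omega$ [31, Inequality (16)], which shows that (5) implies (4). We further note that the recovery map $\mathcal{P}_{B\to BC}$ given in (6) is universal in the sense that it only depends on $\rho_{BC}$, and it satisfies $\mathcal{P}_{B\to BC}(\rho_B) = \rho_{BC}$. Inequality (4) thus shows that there always exists a recovery map whose recovery error, measured by the negative logarithm of the fidelity, is at most the conditional mutual information. Hence a small conditional mutual information is a sufficient condition for a state to be approximately recoverable. In other words, (4) gives an entropic characterization of the set of tripartite states that can be approximately recovered.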
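Because $1/(\cosh x + 1) = 1/(2\cosh^2(x/2))$, the measure $\beta_0$ in (7) has the closed-form distribution function $(1 + \tanh(\pi t/2))/2$, so it is indeed a probability measure and (6) is an average of the rotated maps $\mathcal{P}^{[t]}_{B\to BC}$. A small numerical sketch (hypothetical helper names, simple trapezoidal rule) confirms the normalization and the closed form:

```python
import math

def beta0_density(t):
    """Density of beta_0(dt) = (pi/2) * (cosh(pi t) + 1)^(-1) dt on the real line."""
    return (math.pi / 2) / (math.cosh(math.pi * t) + 1)

def beta0_cdf(t):
    """Closed-form CDF: the integral from -inf to t equals (1 + tanh(pi t / 2)) / 2."""
    return 0.5 * (1 + math.tanh(math.pi * t / 2))

def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, n)))

# The tails beyond |t| = 30 carry mass of order exp(-94), so [-30, 30] is effectively R.
total = trapezoid(beta0_density, -30.0, 30.0)
partial = trapezoid(beta0_density, -30.0, 0.7)
print(total, partial, beta0_cdf(0.7))
```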
In this work, we are interested in the converse statement. This corresponds to an inequality that bounds the distance between $\rho_{ABC}$ and any reconstructed state $\mathcal{R}_{B\to BC}(\rho_{AB})$ from below by an entropic functional of $\rho_{ABC}$ and the recovery map $\mathcal{R}_{B\to BC}$ that involves the conditional mutual information. Such an inequality is a converse to (4) and gives a necessary condition for approximate recoverability.

Main result
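For any trace-preserving completely positive map $\mathcal{E}$ on a system $S$ we denote by $\mathrm{Inv}(\mathcal{E})$ the set of density operators $\tau$ on $S$ which are left invariant under the action of $\mathcal{E}$, i.e.,
$$\mathrm{Inv}(\mathcal{E}) := \{\tau \in \mathrm{S}(S) : \mathcal{E}(\tau) = \tau\}. \tag{9}$$
We may now quantify the deviation of any state $\rho$ from the set $\mathrm{Inv}(\mathcal{E})$ by
$$\Lambda_{\max}(\rho\|\mathcal{E}) := \inf_{\mu\in\mathrm{Inv}(\mathcal{E})} D_{\max}(\rho\|\mu), \tag{10}$$
where $D_{\max}(\omega\|\sigma) := \inf\{\lambda\in\mathbb{R} : \omega \leq 2^\lambda\sigma\}$ denotes the max-relative entropy. The $\Lambda_{\max}$-quantity has the property that it is zero if and only if $\mathcal{E}$ leaves $\rho$ invariant, i.e.,
$$\Lambda_{\max}(\rho\|\mathcal{E}) = 0 \iff \mathcal{E}(\rho) = \rho. \tag{11}$$
Main result. We prove that for any state $\rho_{ABC}$ on $A \otimes B \otimes C$ and any recovery map $\mathcal{R}_{B\to BC}$ from the $B$ system to the $B \otimes C$ system we have
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) \geq I(A:C|B)_\rho - \Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}), \tag{12}$$
where $D(\tau\|\sigma) := \operatorname{tr}\tau\log\tau - \operatorname{tr}\tau\log\sigma$ if $\mathrm{supp}(\tau) \subseteq \mathrm{supp}(\sigma)$ and $+\infty$ otherwise denotes the relative entropy, and $\mathcal{R}_{B\to B} := \operatorname{tr}_C \circ \mathcal{R}_{B\to BC}$ is the action of the recovery map $\mathcal{R}_{B\to BC}$ on $B$. We refer to Theorem 3.1 for a more precise statement.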
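As a concrete instance of (10), consider the replacement channel $\mathcal{E}(X) = \operatorname{tr}(X)\,\sigma$ on a single system (so, unlike $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$ in (12), there is no reference system here). Its only invariant state is $\sigma$, so $\Lambda_{\max}(\rho\|\mathcal{E}) = D_{\max}(\rho\|\sigma)$; if $\rho$ and $\sigma$ are diagonal in a common basis this is $\log_2 \max_x \rho(x)/\sigma(x)$. The sketch below, with hypothetical function names, evaluates this and illustrates property (11).

```python
import math

def d_max(p, q):
    """Classical max-relative entropy log2 max_x p(x)/q(x); inf if supp(p) is not inside supp(q)."""
    ratio = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            ratio = max(ratio, pi / qi)
    return math.log2(ratio)

def lambda_max_replacement(rho, sigma):
    """Lambda_max(rho || E) for the (illustrative) replacement channel E(X) = tr(X) sigma.
    Inv(E) = {sigma}, so the infimum in (10) collapses to D_max(rho || sigma)."""
    return d_max(rho, sigma)

sigma = [0.5, 0.3, 0.2]
print(lambda_max_replacement(sigma, sigma))            # rho = sigma is invariant, cf. (11)
print(lambda_max_replacement([0.2, 0.3, 0.5], sigma))  # largest ratio is 0.5 / 0.2
```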
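Cases where the $\Lambda_{\max}$-term vanishes. To interpret the term $\Lambda_{\max}$ in (12), note that the recovery map $\mathcal{R}_{B\to BC}$ generally does not only read the content of system $B$ in order to generate $C$, but also disturbs it. $\Lambda_{\max}$ quantifies the amount of this disturbance of $B$, taking system $A$ as a reference. In particular, $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}) = 0$ if $\mathcal{R}_{B\to BC}$ is "read only" on $B$, i.e., if $\rho_{AB} = \mathcal{R}_{B\to B}(\rho_{AB})$. Inequality (12) then simplifies to
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) \geq I(A:C|B)_\rho. \tag{13}$$
We further note that if $\mathcal{R}_{B\to BC}$ is "read only" on $B$, its output state $\sigma_{ABC} := \mathcal{R}_{B\to BC}(\rho_{AB})$ is a Markov chain. Indeed, since $\sigma_{AB} = \rho_{AB}$,
$$H(A|B)_\sigma = H(A|B)_\rho \leq H(A|BC)_\sigma \leq H(A|B)_\sigma, \tag{14}$$
where the two inequality steps follow from the data-processing inequality [19,20] applied for $\mathcal{R}_{B\to BC}$ and $\operatorname{tr}_C$, respectively, and hence $I(A:C|B)_\sigma = H(A|B)_\sigma - H(A|BC)_\sigma = 0$.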
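A classical "read only" recovery map keeps $b$ and draws $c$ from a fixed kernel $K(c|b)$. The sketch below (kernel and distribution are hypothetical, chosen for illustration) checks that the output is a Markov chain, as argued via (14), and that inequality (13) holds for the chosen input.

```python
import itertools
import math

def marginal(P, keep):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    """I(A:C|B) in bits."""
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def kl(P, Q):
    """Relative entropy D(P||Q) in bits (assumes supp P is contained in supp Q)."""
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

vals = [0.15, 0.05, 0.10, 0.20, 0.05, 0.15, 0.20, 0.10]
P = dict(zip(itertools.product((0, 1), repeat=3), vals))

# Hypothetical read-only recovery: keep b unchanged, then sample c from K(c | b).
K = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
P_AB = marginal(P, (0, 1))
sigma = {(a, b, c): P_AB[(a, b)] * K[b][c] for (a, b) in P_AB for c in (0, 1)}

print("I(A:C|B)_sigma =", cmi(sigma))                           # Markov chain, cf. (14)
print("D(rho||sigma) =", kl(P, sigma), ">= I(A:C|B)_rho =", cmi(P))  # inequality (13)
```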

Tightness of the main result
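We next discuss several aspects concerning the tightness of (12), which will also clarify the role of the $\Lambda_{\max}$-term. We first note that by combining (5) with (12) we obtain
$$D_{\mathbb{M}}\big(\rho_{ABC}\,\big\|\,\mathcal{P}_{B\to BC}(\rho_{AB})\big) \leq I(A:C|B)_\rho, \tag{15}$$
$$I(A:C|B)_\rho \leq \inf_{\mathcal{R}_{B\to BC}} \Big\{ D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) + \Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}) \Big\}, \tag{16}$$
where the recovery map $\mathcal{P}_{B\to BC}$ in (15) is given by (6) and the infimum is over all recovery maps $\mathcal{R}_{B\to BC}$ that map $B$ to $B \otimes C$. The main difference between the lower and the upper bound on the conditional mutual information given by (15) and (16), respectively, is the $\Lambda_{\max}$-term. In the following we discuss the necessity and optimality of this term.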
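Classical case. Inequalities (15) and (16) hold with equality in case $\rho_{ABC}$ is a classical state, i.e.,
$$\rho_{ABC} = \sum_{a,b,c} P_{ABC}(a,b,c)\, |a\rangle\langle a|_A \otimes |b\rangle\langle b|_B \otimes |c\rangle\langle c|_C \tag{17}$$
for some probability distribution $P_{ABC}$. To see this, we first note that if $\rho_{ABC}$ is classical (in which case $\rho_{ABC}$ and all its marginals commute pairwise), a straightforward calculation gives
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{T}_{B\to BC}(\rho_{AB})\big) = I(A:C|B)_\rho \tag{18}$$
for the Petz recovery map $\mathcal{T}_{B\to BC}$ defined in (3). Indeed, if $\rho_{ABC}$ is classical, then $\mathcal{T}_{B\to BC}(\rho_{AB}) = \rho_{BC}\rho_B^{-1}\rho_{AB}$. We further see that $\operatorname{tr}_C \mathcal{T}_{B\to BC}(\rho_{AB}) = \mathcal{T}_{B\to B}(\rho_{AB}) = \rho_{AB}$ and hence
$$\Lambda_{\max}(\rho_{AB}\|\mathcal{T}_{B\to B}) = 0. \tag{19}$$
This shows that in the classical case (16) is an equality and that the Petz recovery map $\mathcal{T}_{B\to BC}$ minimizes the right-hand side of (16). We further note that in the classical case the measured relative entropy coincides with the relative entropy, and the rotated Petz recovery map $\mathcal{P}_{B\to BC}$ that satisfies (15) simplifies to the Petz recovery map $\mathcal{T}_{B\to BC}$. This together with (18) then shows that (15) holds with equality in the classical case.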
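Identity (18) and the vanishing disturbance (19) can be checked directly on a classical example. In the sketch below the distribution is an arbitrary illustrative choice; `petz_classical` implements $\rho_{BC}\rho_B^{-1}\rho_{AB}$ for probability tables.

```python
import itertools
import math

def marginal(P, keep):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def kl(P, Q):
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

def petz_classical(P):
    """Classical Petz map applied to P_AB: P_AB(a, b) * P_BC(b, c) / P_B(b)."""
    PAB, PBC, PB = marginal(P, (0, 1)), marginal(P, (1, 2)), marginal(P, (1,))
    return {(a, b, c): PAB[(a, b)] * PBC[(b2, c)] / PB[(b,)]
            for (a, b) in PAB for (b2, c) in PBC if b2 == b}

vals = [0.10, 0.02, 0.08, 0.20, 0.25, 0.05, 0.12, 0.18]
P = dict(zip(itertools.product((0, 1), repeat=3), vals))
T = petz_classical(P)
P_AB = marginal(P, (0, 1))

print("D(P || T(P_AB)) =", kl(P, T), " I(A:C|B) =", cmi(P))   # equal, cf. (18)
print("tr_C T(P_AB) == P_AB:",
      all(abs(marginal(T, (0, 1))[k] - P_AB[k]) < 1e-12 for k in P_AB))  # cf. (19)
```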
Necessity of the $\Lambda_{\max}$-term. A natural question regarding (12) is whether the $\Lambda_{\max}$-term is necessary. We show that this is indeed the case by constructing an example proving that a large conditional mutual information does not imply that all recovery maps perform badly; hence the $\Lambda_{\max}$-term is indispensable.
More precisely, in Section 4.1 we construct a generic example showing that for any constant $\kappa < \infty$ there exists a classical state $\rho_{ABC}$ (i.e., a state of the form (17)) and a recovery map $\mathcal{R}_{B\to BC}$ satisfying $\mathcal{R}_{B\to BC}(\rho_B) = \rho_{BC}$ for which the conditional mutual information $I(A:C|B)_\rho$ strictly exceeds the max-relative entropy $D_{\max}(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$, by an amount governed by $\kappa$; this is inequality (20). A similar construction (also given in Section 4.1) yields another classical state $\rho_{ABC}$ and a recovery map satisfying $\mathcal{R}_{B\to BC}(\rho_B) = \rho_{BC}$ for which the same holds with the two arguments of the max-relative entropy swapped; this is inequality (21). These constructions therefore show that the term $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$, which measures the deviation from a "read only" map on $B$, is necessary to obtain a lower bound on the relative entropy between a state and its reconstructed version. The example has an even stronger implication: the $\Lambda_{\max}$-term is necessary even if one only tries to bound from below the max-relative entropy between a state and its reconstructed version, $D_{\max}(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$, which cannot be smaller than $D(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$. The two strict inequalities (20) and (21) also show that the $\Lambda_{\max}$-term remains necessary if one allows swapping the two arguments of the relative (or even max-relative) entropy. Furthermore, restricting the set of recovery maps to those satisfying $\mathcal{R}_{B\to BC}(\rho_B) = \rho_{BC}$ still requires the $\Lambda_{\max}$-term.
Since (18) holds for classical states, these examples also show that for the task of minimizing the relative entropy between $\rho_{ABC}$ and its reconstructed state $\mathcal{R}_{B\to BC}(\rho_{AB})$, the Petz recovery map can be far from optimal, even in the classical case. The examples further show that restricting to recovery maps that leave the $B$ system invariant (i.e., that only "read" the $B$-part) is a considerable restriction. We refer to Section 4.1 for more details about these examples.
Optimality of the $\Lambda_{\max}$-term. Even in the case where $\rho_{ABC}$ is not classical, (12) is still close to optimal. We present two arguments for this. First, we show that the $\Lambda_{\max}$-term cannot be replaced by a relative entropy measure that is smaller than the max-relative entropy. More precisely, (12) is violated if the max-relative entropy in the definition of $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$ is replaced with the α-Rényi relative entropy for any $\alpha \in [\frac{1}{2}, \infty)$. We refer to Section 4.2 for more information. Second, we show that the $\Lambda_{\max}$-term cannot be defined as a distance between $\rho_{AB}$ and $\mathcal{R}_{B\to B}(\rho_{AB})$. The $\Lambda_{\max}$-term in (12) quantifies the (max-relative entropy) distance between $\rho_{AB}$ and its closest state that is invariant under $\mathcal{R}_{B\to B}$. A natural question is whether (12) remains valid if the $\Lambda_{\max}$-term is replaced by the (max-relative entropy) distance between $\rho_{AB}$ and $\mathcal{R}_{B\to B}(\rho_{AB})$, i.e., $D_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}(\rho_{AB}))$.
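This, however, is ruled out. To see this, recall that by the example presented above in (20) there exists a tripartite state $\rho_{ABC}$ and a recovery map $\mathcal{R}_{B\to BC}$ for which $I(A:C|B)_\rho$ is large while $D_{\max}(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$ is comparatively small. The data-processing inequality for the max-relative entropy [13,33] under $\operatorname{tr}_C$ gives $D_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}(\rho_{AB})) \leq D_{\max}(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$, and the max-relative entropy cannot be smaller than the relative entropy. Together these imply
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) + D_{\max}\big(\rho_{AB}\,\big\|\,\mathcal{R}_{B\to B}(\rho_{AB})\big) \leq 2\, D_{\max}\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big),$$
which for the example behind (20) is strictly smaller than $I(A:C|B)_\rho$. This shows that (12) is no longer valid for the modified $\Lambda_{\max}$-term described above.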
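The two facts used in this argument, $D \leq D_{\max}$ and data processing of $D_{\max}$ under $\operatorname{tr}_C$, can be spot-checked classically. In the randomized sketch below, `sigma` is an arbitrary full-support distribution standing in for $\mathcal{R}_{B\to BC}(\rho_{AB})$ (an illustrative assumption), and the joint distributions are stored as flat lists indexed by $(ab, c)$.

```python
import math
import random

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def kl(p, q):
    """Relative entropy D(p||q) in bits for full-support distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_max(p, q):
    """Classical max-relative entropy log2 max_x p(x)/q(x) (q assumed full support)."""
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0))

def trace_out_last(p, d_last):
    """Classical analogue of tr_C: sum out the last register of a flattened joint distribution."""
    return [sum(p[i:i + d_last]) for i in range(0, len(p), d_last)]

random.seed(1)
d_ab, d_c = 4, 3
for _ in range(1000):
    rho = normalize([random.random() + 1e-3 for _ in range(d_ab * d_c)])
    sigma = normalize([random.random() + 1e-3 for _ in range(d_ab * d_c)])  # stands in for R(rho_AB)
    dm = d_max(rho, sigma)
    dm_marg = d_max(trace_out_last(rho, d_c), trace_out_last(sigma, d_c))
    assert kl(rho, sigma) <= dm + 1e-12             # D <= D_max
    assert dm_marg <= dm + 1e-12                    # data processing under tr_C
    assert kl(rho, sigma) + dm_marg <= 2 * dm + 1e-12
print("all checks passed")
```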

Related results
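Using the continuity of the conditional entropy, it is possible to derive an upper bound for the conditional mutual information of a state $\rho_{ABC}$ in terms of its distance to any reconstructed state $\sigma_{ABC} := \mathcal{R}_{B\to BC}(\rho_{AB})$, where $\mathcal{R}_{B\to BC}$ denotes an arbitrary recovery map [4,15]. This leads to a lower bound on the relative entropy between $\rho_{ABC}$ and $\mathcal{R}_{B\to BC}(\rho_{AB})$ that, however, depends on the dimension $d_A$ of the $A$ system. To see this, denote by $h(x) := -x\log x - (1-x)\log(1-x)$ the binary entropy function and let $\Delta(\tau,\omega) := \frac{1}{2}\|\tau-\omega\|_1$ be the trace distance between $\tau$ and $\omega$. The data-processing inequality [19,20] implies that
$$I(A:C|B)_\rho = H(A|B)_\rho - H(A|BC)_\rho \leq H(A|BC)_\sigma - H(A|BC)_\rho.$$
By the improved Alicki-Fannes inequality [38, Lemma 2], the right-hand side is at most $2\Delta(\rho_{ABC},\sigma_{ABC})\log d_A$ plus a dimension-independent function of $\Delta(\rho_{ABC},\sigma_{ABC})$, which can be simplified using an elementary estimate valid for all $x \in [0,1]$ together with $\Delta(\rho,\sigma) \in [0,1]$. Together with Pinsker's inequality [28,12] this gives a lower bound on $D(\rho_{ABC}\|\mathcal{R}_{B\to BC}(\rho_{AB}))$ in terms of $I(A:C|B)_\rho$ that explicitly involves $\log d_A$. The fact that this bound explicitly depends on the dimension of the system $A$ is unsatisfactory. Furthermore, the example discussed above in (20) shows that such a dependence on the dimension is unavoidable. A different approach to derive an upper bound for the conditional mutual information of a state $\rho_{ABC}$ in terms of its distance to a reconstructed state $\mathcal{R}_{B\to BC}(\rho_{AB})$ was taken in [14, Theorem 11 and Remark 12] (see also [30, Proposition F.1]). It was shown that for any state $\rho_{ABC}$ the conditional mutual information obeys an upper bound, (28), involving $\beta_0$, the rotated Petz recovery maps $\mathcal{P}^{[t]}_{B\to BC}$, and $\bar{D}_2$, where $\beta_0$ and $\mathcal{P}^{[t]}_{B\to BC}$ are given in (7) and (6), respectively, and $\bar{D}_2(\tau\|\omega) := \log\operatorname{tr}\tau^2\omega^{-1}$ denotes Petz' Rényi relative entropy of order 2 [25]. The examples discussed above imply that the left-hand side of (28) can be much larger than the relative entropy between $\rho_{ABC}$ and $\mathcal{R}_{B\to BC}(\rho_{AB})$ for the optimal recovery map $\mathcal{R}_{B\to BC}$. In other words, rotated Petz recovery maps are generally far from optimal recovery maps.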
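The dimension dependence can be made tangible by evaluating the two ingredients numerically. This sketch assumes that [38, Lemma 2] has the familiar form $|H(A|B)_\rho - H(A|B)_\sigma| \leq 2\varepsilon\log d_A + (1+\varepsilon)\,h\big(\tfrac{\varepsilon}{1+\varepsilon}\big)$ for trace distance $\varepsilon \leq 1$, and uses Pinsker's inequality in the form $\Delta \leq \sqrt{\tfrac{\ln 2}{2} D}$ with $D$ in bits. It chains them directly and is not the simplified expression stated in the text; all function names are hypothetical.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2(1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def alicki_fannes_winter(eps, d_a):
    """Assumed form of the continuity bound: 2 eps log2 d_A + (1 + eps) h(eps / (1 + eps))."""
    return 2 * eps * math.log2(d_a) + (1 + eps) * binary_entropy(eps / (1 + eps))

def pinsker_trace_distance(d_bits):
    """Pinsker: trace distance <= sqrt(ln(2) / 2 * D), with D measured in bits."""
    return math.sqrt(math.log(2) / 2 * d_bits)

def cmi_upper_bound(d_bits, d_a):
    """Upper bound on I(A:C|B)_rho given D(rho_ABC || R(rho_AB)) = d_bits.
    Valid because the continuity bound is increasing in eps on [0, 1]."""
    eps = min(1.0, pinsker_trace_distance(d_bits))
    return alicki_fannes_winter(eps, d_a)

for d_a in (2, 16, 1024):
    print(d_a, [round(cmi_upper_bound(d, d_a), 3) for d in (0.0, 0.01, 0.1)])
```

For a fixed relative entropy the resulting bound grows with $\log d_A$, which is exactly the unsatisfactory feature discussed above.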

One-shot relative entropies
The goal of this section is to derive a triangle-like inequality for the relative entropy (see Lemma 2.1), which will be used in the proof of our main result, Theorem 3.1. To understand Lemma 2.1 we need to review a few properties of one-shot relative entropy measures.
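Assuming that inequality (35) in Lemma 2.1 has the form $D(\rho\|\sigma) \leq D(\rho\|\tau) + D_{\max}(\tau\|\sigma)$ (consistent with Remark 2.3, which exchanges the two terms on its right-hand side), a short derivation sketch for full-rank states runs as follows. It is offered as an illustration of why such an inequality holds, not as the proof given in the original.

```latex
% Full-rank states, logarithms in base 2, and \lambda := D_{\max}(\tau\|\sigma),
% so that \tau \le 2^{\lambda}\sigma. Operator monotonicity of the logarithm gives
\log\tau \le \lambda\,\mathbb{1} + \log\sigma .
% Taking the expectation in the state \rho yields
\begin{align*}
  D(\rho\|\sigma) &= \operatorname{tr}\rho\log\rho - \operatorname{tr}\rho\log\sigma \\
                  &\le \operatorname{tr}\rho\log\rho - \operatorname{tr}\rho\log\tau + \lambda\operatorname{tr}\rho \\
                  &= D(\rho\|\tau) + D_{\max}(\tau\|\sigma).
\end{align*}
```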

Preliminaries
Let $\mathrm{S}(A)$ and $\mathrm{P}(A)$ denote the sets of density operators and nonnegative operators on $A$, respectively. For any linear operator $L$ on $A$, the trace norm is given by $\|L\|_1 := \operatorname{tr}|L|$ with $|L| := \sqrt{L^\dagger L}$. For $\rho,\sigma \in \mathrm{P}(A)$ we write $\rho \ll \sigma$ if the support of $\rho$ is contained in the support of $\sigma$. Throughout this document, Hilbert spaces are assumed to be separable. We use the min-relative entropy $D_{\min}$ as defined in [29] and the max-relative entropy [13,29]
$$D_{\max}(\rho\|\sigma) := \inf\{\lambda \in \mathbb{R} : \rho \leq 2^\lambda\sigma\}.$$
As the names suggest, the min-relative entropy cannot be larger than the max-relative entropy; more precisely, $D_{\min}(\rho\|\sigma) \leq D(\rho\|\sigma) \leq D_{\max}(\rho\|\sigma)$, with strict inequalities in the generic case [23,33]. The max-relative entropy turns out to be the largest relative entropy measure that satisfies the data-processing inequality and is additive under tensor products [33, Section 4.2.4].
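For commuting (classical) states the Rényi relative entropies reduce to $D_\alpha(p\|q) = \frac{1}{\alpha-1}\log_2\sum_x p(x)^\alpha q(x)^{1-\alpha}$, interpolating between the order-$\frac12$ quantity $-\log F$ (one common choice for a min-relative entropy; [29] may use a different convention), the relative entropy at $\alpha = 1$, and $D_{\max}$ at $\alpha = \infty$. The sketch below, with illustrative distributions, checks this monotonicity in $\alpha$.

```python
import math

def renyi(p, q, alpha):
    """Classical Renyi divergence D_alpha(p||q) in bits (full-support q assumed).
    For commuting states the sandwiched and Petz versions both reduce to this.
    alpha = 1 gives the relative entropy D, alpha = inf gives D_max."""
    pairs = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    if alpha == 1:
        return sum(pi * math.log2(pi / qi) for pi, qi in pairs)
    if math.isinf(alpha):
        return math.log2(max(pi / qi for pi, qi in pairs))
    return math.log2(sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in pairs)) / (alpha - 1)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
alphas = [0.5, 0.75, 1, 1.5, 2, 5, 50, math.inf]
values = [renyi(p, q, a) for a in alphas]
# Order 1/2 coincides with -log F, where F = (sum_x sqrt(p(x) q(x)))^2.
neg_log_fidelity = -2 * math.log2(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))
print([round(v, 4) for v in values])
```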
The following remarks show that Lemma 2.1 is optimal and that there is little flexibility in proving triangle-like inequalities for the relative entropy other than (35).
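Assuming again that (35) reads $D(\rho\|\sigma) \leq D(\rho\|\tau) + D_{\max}(\tau\|\sigma)$, a randomized classical spot-check of the inequality is straightforward. Random full-support distributions and the seed are arbitrary illustrative choices; the check only covers commuting states.

```python
import math
import random

def rand_dist(n):
    """Random full-support probability vector of length n."""
    w = [random.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def kl(p, q):
    """Relative entropy D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_max(p, q):
    """Classical max-relative entropy log2 max_x p(x)/q(x)."""
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0))

random.seed(0)
worst_gap = math.inf
for _ in range(2000):
    rho, tau, sigma = rand_dist(5), rand_dist(5), rand_dist(5)
    # Slack in D(rho||sigma) <= D(rho||tau) + D_max(tau||sigma); should never be negative.
    gap = kl(rho, tau) + d_max(tau, sigma) - kl(rho, sigma)
    worst_gap = min(worst_gap, gap)
print("smallest slack:", worst_gap)
```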
Remark 2.2. Lemma 2.1 is optimal in the sense that (35) is no longer valid if $D_{\max}$ is replaced with $D_\alpha$ for any $\alpha \in [\frac{1}{2}, \infty)$. To see this, let $p \in (0,1)$ and consider three classical distributions on $\{0,1\}\times\{0,1\}$ defined by A simple calculation shows that Choosing $p = 1 - 2^{-\alpha}$ reveals that In the limit $\alpha \to \infty$ the strict inequality (41) becomes an equality.
Remark 2.3. The statement of Lemma 2.1 is no longer true if the max-relative entropy and the relative entropy on the right-hand side of (35) are exchanged. To see this, consider the three classical binary probability distributions with $p, \varepsilon \in (0,1)$. This gives For $p = \frac{7}{8}$ and $\varepsilon = \frac{1}{8}$ we find that This shows that it is crucial which term in Lemma 2.1 carries a max-relative entropy.

Main result and proof
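Theorem 3.1. Let $A$, $B$, and $C$ be separable Hilbert spaces, let $\rho_{ABC} \in \mathrm{S}(A\otimes B\otimes C)$, and let $\mathcal{R}_{B\to BC}$ be a trace-preserving completely positive map from $B$ to $B\otimes C$. Then
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) \geq I(A:C|B)_\rho - \Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}). \tag{48}$$
The quantity $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$ is defined in (10) and $\mathcal{R}_{B\to B} := \operatorname{tr}_C \circ \mathcal{R}_{B\to BC}$. To prove the assertion of Theorem 3.1 we make use of a known lemma stating that the conditional mutual information of a tripartite density operator is bounded from above by its smallest relative entropy distance to Markov chains. Let $\mathrm{MC}(A\otimes B\otimes C)$ denote the set of Markov chains on $A\otimes B\otimes C$.
Lemma 3.2. For any $\rho_{ABC} \in \mathrm{S}(A\otimes B\otimes C)$ we have
$$I(A:C|B)_\rho \leq \inf_{\mu_{ABC}\in\mathrm{MC}(A\otimes B\otimes C)} D(\rho_{ABC}\|\mu_{ABC}).$$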
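Proof of Lemma 3.2. We provide a proof of the known statement of Lemma 3.2 because the original proof given in [17, Theorem 4] has since been simplified considerably (see the short note after the acknowledgements in [17]). The resulting proof is elegant and gives a good understanding of the assertion of the lemma. If $D(\rho_{ABC}\|\mu_{ABC}) = \infty$ there is nothing to prove. Otherwise, by definition of the relative entropy and the conditional mutual information we find for all $\rho_{ABC} \in \mathrm{S}(A\otimes B\otimes C)$ and all $\mu_{ABC} \in \mathrm{MC}(A\otimes B\otimes C)$
$$D(\rho_{ABC}\|\mu_{ABC}) - I(A:C|B)_\rho = D(\rho_{AB}\|\mu_{AB}) - D(\rho_B\|\mu_B) + D(\rho_{BC}\|\mu_{BC}) + \nu,$$
where
$$\nu := \operatorname{tr}\rho_{ABC}\big(\log\mu_{AB} + \log\mu_{BC} - \log\mu_B - \log\mu_{ABC}\big)$$
and identity operators on the remaining systems are suppressed. Using the structure of Markov chains given by (2) we have that $\mu_{AB} = \bigoplus_j P(j)\,\mu_{Ab_j^L}\otimes\mu_{b_j^R}$, $\mu_{BC} = \bigoplus_j P(j)\,\mu_{b_j^L}\otimes\mu_{b_j^R C}$, and $\mu_B = \bigoplus_j P(j)\,\mu_{b_j^L}\otimes\mu_{b_j^R}$. A straightforward calculation then shows that $\nu = 0$. The data-processing inequality [21,35] ensures that $D(\rho_B\|\mu_B) \leq D(\rho_{AB}\|\mu_{AB})$, and the nonnegativity of the relative entropy (which follows from Klein's inequality) guarantees that $D(\rho_{BC}\|\mu_{BC}) \geq 0$. This completes the proof.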
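The decomposition used in this proof can be verified numerically for classical distributions: for a Markov chain $\mu(a,b,c) = \mu_{AB}(a,b)\,\mu_{C|B}(c|b)$ the term $\nu$ vanishes, so $D(\rho\|\mu) - I(A:C|B)_\rho$ equals $D(\rho_{AB}\|\mu_{AB}) - D(\rho_B\|\mu_B) + D(\rho_{BC}\|\mu_{BC})$. The sketch below uses random ingredients chosen only for illustration.

```python
import itertools
import math
import random

def marginal(P, keep):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def kl(P, Q):
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

def rand_dist(keys):
    """Random full-support distribution over the given keys."""
    w = {k: random.random() + 1e-3 for k in keys}
    s = sum(w.values())
    return {k: v / s for k, v in w.items()}

random.seed(2)
keys = list(itertools.product(range(2), range(3), range(2)))
rho = rand_dist(keys)
# A Markov chain mu(a, b, c) = mu_AB(a, b) * mu_{C|B}(c | b) built from random ingredients.
mu_AB = rand_dist(list(itertools.product(range(2), range(3))))
mu_C_given_B = {b: rand_dist(range(2)) for b in range(3)}
mu = {(a, b, c): mu_AB[(a, b)] * mu_C_given_B[b][c] for (a, b, c) in keys}

def marginal_divergence(keep):
    return kl(marginal(rho, keep), marginal(mu, keep))

lhs = kl(rho, mu) - cmi(rho)
rhs = marginal_divergence((0, 1)) - marginal_divergence((1,)) + marginal_divergence((1, 2))  # nu = 0
print(lhs, rhs)
```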
In order to prove Theorem 3.1 we need one more lemma, which relates the distance to Markov chains to the $\Lambda_{\max}$-quantity defined in (10).
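Lemma 3.3. Let $\rho_{ABC} \in \mathrm{S}(A\otimes B\otimes C)$ and let $\mathcal{R}_{B\to BC}$ be a trace-preserving completely positive map from $B$ to $B\otimes C$. Then
$$\inf_{\mu_{ABC}\in\mathrm{MC}(A\otimes B\otimes C)} D(\rho_{ABC}\|\mu_{ABC}) \leq D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) + \Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}),$$
where $\mathcal{R}_{B\to B} = \operatorname{tr}_C \circ \mathcal{R}_{B\to BC}$ is the reduction of $\mathcal{R}_{B\to BC}$ to the output space $B$.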
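Proof of Lemma 3.3. The data-processing inequality for the max-relative entropy [13,33] implies that for any $\tau_{AB} \in \mathrm{S}(A\otimes B)$
$$D_{\max}\big(\mathcal{R}_{B\to BC}(\rho_{AB})\,\big\|\,\mathcal{R}_{B\to BC}(\tau_{AB})\big) \leq D_{\max}(\rho_{AB}\|\tau_{AB}). \tag{55}$$
Furthermore, because the data-processing inequality for the conditional entropy [19,20] implies that $H(A|BC)_{\mathcal{R}_{B\to BC}(\tau_{AB})} \geq H(A|B)_{\tau_{AB}}$ for any $\tau_{AB} \in \mathrm{S}(A\otimes B)$, we also have, writing $\mu_{ABC} := \mathcal{R}_{B\to BC}(\tau_{AB})$,
$$\tau_{AB} \in \mathrm{Inv}(\mathcal{R}_{B\to B}) \implies H(A|BC)_\mu \geq H(A|B)_\mu,$$
since in this case $\mu_{AB} = \mathcal{R}_{B\to B}(\tau_{AB}) = \tau_{AB}$. Note that the inequality on the right-hand side of the implication must, again by the data-processing inequality, be an equality, which means that $I(A:C|B)_\mu = 0$ and, hence, that $\mu_{ABC} \in \mathrm{MC}(A\otimes B\otimes C)$. This proves the general implication
$$\tau_{AB} \in \mathrm{Inv}(\mathcal{R}_{B\to B}) \implies \mathcal{R}_{B\to BC}(\tau_{AB}) \in \mathrm{MC}(A\otimes B\otimes C).$$
We now use it, together with Lemma 2.1, to obtain
$$\inf_{\mu_{ABC}\in\mathrm{MC}} D(\rho_{ABC}\|\mu_{ABC}) \leq \inf_{\tau_{AB}\in\mathrm{Inv}(\mathcal{R}_{B\to B})} D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\tau_{AB})\big) \leq D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) + \inf_{\tau_{AB}\in\mathrm{Inv}(\mathcal{R}_{B\to B})} D_{\max}\big(\mathcal{R}_{B\to BC}(\rho_{AB})\,\big\|\,\mathcal{R}_{B\to BC}(\tau_{AB})\big).$$
Combining this with (55) completes the proof.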
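As a sanity check of Theorem 3.1, consider the (illustrative) replacement recovery map $\mathcal{R}_{B\to BC}(X_B) = \operatorname{tr}(X_B)\,\sigma_{BC}$, which outputs $\rho_A \otimes \sigma_{BC}$. Its invariant states on $AB$ are $\omega_A \otimes \sigma_B$, and for classical $\rho_{AB}$ and $\sigma_B$ one can show (our own derivation, an assumption of this sketch) that $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B}) = \log_2 \sum_a \max_b \rho_{AB}(a,b)/\sigma_B(b)$. For this family the bound holds even without $\Lambda_{\max}$, since $D(\rho_{ABC}\|\rho_A\otimes\sigma_{BC}) \geq I(A:BC)_\rho \geq I(A:C|B)_\rho$; the point is only to evaluate every term of (48) concretely.

```python
import itertools
import math
import random

def marginal(P, keep):
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P):
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def cmi(P):
    return (entropy(marginal(P, (0, 1))) + entropy(marginal(P, (1, 2)))
            - entropy(marginal(P, (1,))) - entropy(P))

def kl(P, Q):
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

def rand_dist(keys):
    w = {k: random.random() + 1e-3 for k in keys}
    s = sum(w.values())
    return {k: v / s for k, v in w.items()}

def lambda_max_replacement(P_AB, sigma_B):
    """Assumed closed form of Lambda_max(P_AB || R_{B->B}) for R(X_B) = tr(X_B) sigma_BC:
    minimize D_max(P_AB || omega_A x sigma_B) over omega_A -> log2 sum_a max_b P_AB(a,b)/sigma_B(b)."""
    alphabet_a = {a for (a, _) in P_AB}
    return math.log2(sum(max(p / sigma_B[(b,)] for (a2, b), p in P_AB.items() if a2 == a)
                         for a in alphabet_a))

random.seed(3)
keys = list(itertools.product(range(2), range(3), range(2)))
worst = math.inf
for _ in range(500):
    P = rand_dist(keys)
    sigma_BC = rand_dist(list(itertools.product(range(3), range(2))))
    P_A = marginal(P, (0,))
    recovered = {(a, b, c): P_A[(a,)] * sigma_BC[(b, c)] for (a, b, c) in keys}  # P_A x sigma_BC
    lam = lambda_max_replacement(marginal(P, (0, 1)), marginal(sigma_BC, (0,)))
    worst = min(worst, kl(P, recovered) + lam - cmi(P))  # Theorem 3.1: never negative
print("smallest slack:", worst)
```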

On the tightness of the main result
In this section we construct examples that show two things. First, there exist classical tripartite states with a large conditional mutual information that can nevertheless be recovered well. This shows the necessity of the $\Lambda_{\max}$-term in the main bound (48), even if the relative entropy is replaced by the largest possible relative entropy measure, i.e., the max-relative entropy. Furthermore, the violation of such a bound without the $\Lambda_{\max}$-term can be made arbitrarily large. Second, our example shows that (48) is no longer valid if the max-relative entropy in the definition of $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$ is replaced with the α-Rényi relative entropy for any $\alpha \in [\frac{1}{2}, \infty)$. Both examples are classical, i.e., we consider tripartite states of the form (17). Such states are special because the corresponding density operator and all its marginals are simultaneously diagonalizable. As a result, we can use the classical notion of a distribution to describe such states.

A large conditional mutual information does not imply bad recovery
Let $\mathcal{X} = \{1, 2, \ldots, 2^n\}$ for $n \in \mathbb{N}$, let $p, q \in [0,1]$ such that $p + q \leq 1$, and consider two independent random variables $E_Z$ and $E_Y$ on $\{0,1\}$ and $\{0,1,2\}$, respectively, such that $\mathbb{P}(E_Z = 0) = p + q$, $\mathbb{P}(E_Y = 0) = p$, and $\mathbb{P}(E_Y = 1) = q$. Let $X \sim \mathcal{U}(\mathcal{X})$, where $\mathcal{U}(\mathcal{X})$ denotes the uniform distribution on $\mathcal{X}$, and define two random variables by where $U_Y \sim \mathcal{U}(\mathcal{X})$ and $U_Z \sim \mathcal{U}(\mathcal{X})$ are independent. This defines a tripartite distribution $P_{XYZ}$. A simple calculation reveals that Similarly we find We thus obtain We next define a recovery map $\mathcal{R}_{Y\to Y'Z'}$ that creates a tuple of random variables $(Y', Z')$ out of $Y$. Let the recovery map be such that where $U, U'$ are independent and uniformly distributed on $\mathcal{X}$. Let denote the distribution that is generated when applying the recovery map (described above) to $P_{XY}$. In the following we assume that $n$ is sufficiently large. It can be verified easily that . We note that $\mathbb{P}(X = Y) = p + pq + q^2$ according to the distribution $P_{XY}$ and hence and We are now ready to state the conclusion of this example. For $\kappa < \infty$, $p = \frac{1}{2}$, $q = 0$, and $n$ sufficiently large we find by combining (69) with (72) For $\kappa < \infty$, $p = q = \frac{1}{4}$, and $n$ sufficiently large, (69) and (73) imply This shows that there exist classical tripartite distributions $P_{XYZ}$ with a large conditional mutual information $I(X:Z|Y)_P$ and a recovery map $\mathcal{R}_{Y\to YZ}$ such that $\mathcal{R}_{Y\to YZ}(P_{XY})$ is close to $P_{XYZ}$ and $\mathcal{R}_{Y\to YZ}(P_Y) = P_{YZ}$. The closeness is measured with respect to the max-relative entropy.

Tightness of the Λ max -term
In this section we construct a classical example showing that our main result (48) is essentially tight, in the sense that it is no longer valid if the max-relative entropy in the definition of $\Lambda_{\max}(\rho_{AB}\|\mathcal{R}_{B\to B})$, given in (10), is replaced with an α-Rényi relative entropy for any $\alpha < \infty$. More precisely, for $\alpha \in [1, \infty]$ we define
$$\Lambda_\alpha(\rho\|\mathcal{E}) := \inf_{\mu\in\mathrm{Inv}(\mathcal{E})} D_\alpha(\rho\|\mu).$$
For $\alpha = \infty$ we have $\Lambda_\infty(\rho\|\mathcal{E}) = \Lambda_{\max}(\rho\|\mathcal{E})$. In this section we show that for all $\alpha < \infty$ there exists a (classical) tripartite state $\rho_{ABC}$ and a recovery map $\mathcal{R}_{B\to BC}$ satisfying $\mathcal{R}_{B\to BC}(\rho_B) = \rho_{BC}$ such that
$$D\big(\rho_{ABC}\,\big\|\,\mathcal{R}_{B\to BC}(\rho_{AB})\big) + \Lambda_\alpha(\rho_{AB}\|\mathcal{R}_{B\to B}) < I(A:C|B)_\rho. \tag{77}$$
To see this, consider the following classical example (where we switch to classical notation). Let $\mathcal{S} = \{0, \ldots, 2^n - 1\}$ and consider a tripartite distribution $Q_{XYZ}$ defined via the random variables $X \sim \mathcal{U}(\mathcal{S})$ and $X = Y = Z$. Let $Q'_{XYZ}$ be the distribution defined via the random variables $X \sim \mathcal{U}(\mathcal{S})$ and $Y \sim \mathcal{U}(\mathcal{S})$, where $X$ and $Y$ are independent, and $Z = (X + Y) \bmod 2^n$. For $p \in [0,1]$ we define a binary random variable $E$ such that $\mathbb{P}(E = 0) = p$. Consider the distribution We next define two recovery maps where $U \sim \mathcal{U}(\mathcal{S})$, respectively. We then define another recovery map as We note that the recovery map satisfies $\mathcal{R}_{Y\to Y'Z'}(P_Y) = P_{YZ}$. A simple calculation shows that and We thus find The distribution $\mathcal{R}_{Y\to Y'Z'}(P_{XY})$ generated by applying the recovery map to $P_{XY}$ can be decomposed as A simple calculation shows that and We thus have We note that the recovery map As a result we find where the final step follows by definition of the α-Rényi relative entropy and a straightforward calculation.
Recall that we need to prove (77), which in classical notation reads
$$D\big(P_{XYZ}\,\big\|\,\mathcal{R}_{Y\to Y'Z'}(P_{XY})\big) + \Lambda_\alpha(P_{XY}\|\mathcal{R}_{Y\to Y'}) < I(X:Z|Y)_P \quad \text{for all } \alpha < \infty. \tag{93}$$
As mentioned in (34), the α-Rényi relative entropy is monotone in $\alpha$, so it suffices to prove (93) for all $\alpha \in (\alpha_0, \infty)$, where $\alpha_0 \geq 0$ can be arbitrarily large.
The two steps (99) and (100) are both valid because $\alpha$ is sufficiently large. The final step uses (85). This example shows that (48) is no longer valid if the $\Lambda_{\max}$-term is replaced with a $\Lambda_\alpha$-term for any $\alpha \in [\frac{1}{2}, \infty)$. Note also that this example implies Remark 2.2 on the tightness of the triangle-like inequality for the relative entropy.
distance between τ and ω. For any state $\mu_{S_1\cdots S_k}$ on $S_1 \otimes \cdots \otimes S_k$ that forms a Markov chain in order $S_1 \leftrightarrow S_2 \cdots S_{k-1} \leftrightarrow S_k$, it follows by (2) that its reduced state $\mu_{S_1 S_k}$ on $S_1 \otimes S_k$ is separable. Using the monotonicity of the trace distance under trace-preserving completely positive maps [24, Theorem 9.2] we thus find $\Delta(\rho_{S_1\cdots S_k}, \mu_{S_1\cdots S_k}) \geq \Delta(\rho_{S_1 S_k}, \mu_{S_1 S_k}) \geq \frac{1}{2}$, showing that the state $\rho_{S_1\cdots S_k}$, despite having a conditional mutual information that is arbitrarily small (see (104)), is far from any Markov chain. As discussed in the introduction, states with a small conditional mutual information are called approximate Markov chains (which is justified by (4)). The example in this appendix shows that approximate quantum Markov chains are not necessarily close to quantum Markov chains.