Necessary Criterion for Approximate Recoverability

A tripartite state ρ ABC forms a Markov chain if there exists a recovery map R B→BC acting only on the B-part that perfectly reconstructs ρ ABC from ρ AB . To achieve an approximate reconstruction, it suffices that the conditional mutual information I(A : C|B) ρ is small, as shown recently. Here we ask what conditions are necessary for approximate state reconstruction.

This is answered by a lower bound on the relative entropy between ρ ABC and the recovered state R B→BC (ρ AB ). The bound consists of the conditional mutual information and an entropic correction term that quantifies the disturbance of the B-part by the recovery map.


Introduction
A recovery map is a trace-preserving completely positive map that reconstructs parts of a composite system. More precisely, for a tripartite state ρ ABC on A ⊗ B ⊗ C, we can consider a recovery map R B→BC from B to B ⊗ C that reconstructs the C-part from the B-part only. If such a reconstruction is perfectly possible, i.e., if

ρ ABC = R B→BC (ρ AB ),    (1)

we call ρ ABC a (quantum) Markov chain in order A ↔ B ↔ C. (Footnote 1: We usually omit the identity map and the identity operator in our notation when its use is clear from the context. For example, we write R B→BC (ρ AB ) instead of (I A ⊗ R B→BC )(ρ AB ) and ρ B ρ AB ρ B instead of (id A ⊗ ρ B ) ρ AB (id A ⊗ ρ B ). We will drop the order of the Markov chain if it is A ↔ B ↔ C.) The structure of Markov chains is well understood. A state ρ ABC is a Markov chain if and only if there exists a decomposition of the B system as

B = ⊕ j b L j ⊗ b R j  such that  ρ ABC = ⊕ j P(j) ρ Ab L j ⊗ ρ b R j C ,    (2)
with states ρ Ab L j on A ⊗ b L j , ρ b R j C on b R j ⊗ C, and a probability distribution P [19]. A measure that is useful to describe Markov chains is the conditional mutual information, given by

I(A : C|B) ρ := tr ρ ABC (log ρ ABC + log ρ B − log ρ AB − log ρ BC ),    (3)

whenever the trace is defined, i.e., whenever the operator ρ ABC (log ρ ABC + log ρ B − log ρ AB − log ρ BC ) is trace class. One often restricts to the case where the conditional von Neumann entropy H(A|B) = −D(ρ AB ‖ id A ⊗ ρ B ) is finite, where D(ρ ‖ σ) := tr ρ(log ρ − log σ) denotes the relative entropy between ρ and σ. Indeed, in this case, the data processing inequality [24,40] implies that H(A|BC) = −D(ρ ABC ‖ id A ⊗ ρ BC ) is also finite, and hence, the operators ρ ABC (log ρ AB − log ρ B ) and ρ ABC (log ρ ABC − log ρ BC ) are both trace class, implying that their difference is trace class, too. We further note that for finite-dimensional Hilbert spaces the conditional mutual information may be written as I(A : C|B) ρ = H(A|B) ρ − H(A|BC) ρ . It has been shown that a state ρ ABC is a Markov chain if and only if its conditional mutual information I(A : C|B) ρ vanishes [30,31]. Furthermore, the Petz recovery map (also known as transpose map)

T B→BC (X B ) := ρ BC^{1/2} (ρ B^{−1/2} X B ρ B^{−1/2} ⊗ id C ) ρ BC^{1/2}    (4)

recovers such states perfectly, i.e., (1) holds with R B→BC = T B→BC .
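The two facts above (vanishing conditional mutual information and perfect Petz recovery for Markov chains) are easy to verify numerically in the classical, commuting case. The following sketch builds a random classical Markov chain X ↔ Y ↔ Z and checks both; the distribution and all variable names are illustrative and not taken from the text.

```python
import numpy as np

# Classical toy Markov chain: P(x,y,z) = P(x) P(y|x) P(z|y).
# For such a distribution I(X:Z|Y) vanishes, and the classical Petz
# (transpose) map, which re-appends Z via P(z|y), recovers P_XYZ
# exactly from the marginal P_XY.
rng = np.random.default_rng(0)

def rows(a):                      # normalize rows into conditional distributions
    return a / a.sum(axis=-1, keepdims=True)

def H(p):                         # Shannon entropy in bits
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

px = rng.random(3); px /= px.sum()           # P(x)
py_x = rows(rng.random((3, 4)))              # P(y|x)
pz_y = rows(rng.random((4, 2)))              # P(z|y)
P = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]   # P(x,y,z)

# I(X:Z|Y) = H(XY) + H(YZ) - H(Y) - H(XYZ)
I_XZ_Y = H(P.sum(2)) + H(P.sum(0)) - H(P.sum((0, 2))) - H(P)

# Classical Petz map: T(P_XY)(x,y,z) = P(x,y) * P(z|y)
Pxy, Pyz, Py = P.sum(2), P.sum(0), P.sum((0, 2))
Q = Pxy[:, :, None] * (Pyz / Py[:, None])[None, :, :]

print(abs(I_XZ_Y) < 1e-9, np.allclose(Q, P))  # True True
```

For a generic non-Markov distribution the same code would report a strictly positive conditional mutual information and an imperfect recovery.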
Tripartite states ρ ABC that have a small conditional mutual information are called approximate Markov chains. The justification for this terminology is a recent result [16] proving that for any state ρ ABC there exists a recovery map R B→BC such that

−log F(ρ ABC , R B→BC (ρ AB )) ≤ I(A : C|B) ρ ,    (5)

where F(τ, ω) := ‖√τ √ω‖ 1 ^2 denotes the fidelity between τ and ω. Inequality (5) shows that the Markov property (1) approximately holds whenever the conditional mutual information is small. However, there exist tripartite states with a small conditional mutual information whose distance to any Markov chain is nevertheless large [11,20]. As a consequence, approximate quantum Markov chains are not necessarily close to quantum Markov chains. We refer to "Appendix A" for a more detailed explanation of this phenomenon.
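In the classical case inequality (5) can be spot-checked directly: for distributions the fidelity reduces to the squared Bhattacharyya overlap, and already the Petz recovery of Z from Y satisfies the bound. A minimal sketch on a generic random distribution (illustration only, not a construction from the paper):

```python
import numpy as np

# Classical check of (5) with the Petz recovery map: for distributions,
# F(P,Q) = (sum sqrt(P*Q))^2, and recovering Z from Y via
# Q(x,y,z) = P(x,y) P(z|y) satisfies -log2 F(P,Q) <= I(X:Z|Y).
rng = np.random.default_rng(1)

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

P = rng.random((3, 3, 3)); P /= P.sum()                 # generic P(x,y,z)
Pxy, Pyz, Py = P.sum(2), P.sum(0), P.sum((0, 2))
Q = Pxy[:, :, None] * (Pyz / Py[:, None])[None, :, :]   # Petz-recovered

I = H(Pxy) + H(Pyz) - H(Py) - H(P)                      # I(X:Z|Y) in bits
F = np.sqrt(P * Q).sum() ** 2                           # classical fidelity
print(-np.log2(F) <= I + 1e-12)  # True
```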
Inequality (5) has been refined in a series of works [5,8,21,35-37,41]. More precisely, the initial bound from [16] has been strengthened by replacing the right-hand side of (5) by the measured relative entropy between the original and the recovered state (see (9) below for a definition). This result came with a novel proof based on the notion of quantum state redistribution [8]. The proof has later been simplified by utilizing tools from semidefinite programming [5]. In [36], it was shown that there exists a universal recovery map, i.e., one that does not depend on the A system, that satisfies (5). Another major step was the discovery that (5), as well as generalizations thereof, can be obtained by complex interpolation theory [41], providing further insight into the structure of the recovery map. In [37], an intuitive proof of (5) based on the spectral pinching method was presented. In [21], it was shown that there exists an explicit recovery map (of the form (7)) that satisfies (5). The most recent result [35, Theorem 4.1] shows that for any state ρ ABC we have

D M (ρ ABC ‖ P B→BC (ρ AB )) ≤ I(A : C|B) ρ    (6)

for the explicit recovery map

P B→BC (X B ) := ∫ β 0 (t) ρ BC^{(1+it)/2} (ρ B^{−(1+it)/2} X B ρ B^{−(1−it)/2} ⊗ id C ) ρ BC^{(1−it)/2} dt    (7)

and the probability measure

β 0 (t) dt := (π/2) (cosh(πt) + 1)^{−1} dt    (8)

on R. D M denotes the measured relative entropy, which is defined as

D M (τ ‖ ω) := sup M∈M D(M(τ) ‖ M(ω)),    (9)

where M is the set of all quantum-classical channels M(ω) = Σ x tr(M x ω)|x⟩⟨x| with {M x } a positive operator valued measure (POVM) and {|x⟩} an orthonormal basis. A simple property of the measured relative entropy ensures that D M (τ ‖ ω) ≥ −log F(τ, ω) for all states τ, ω [8], which shows that (6) implies (5). We further note that the recovery map P B→BC given in (7) is universal in the sense that it only depends on ρ BC , and it satisfies P B→BC (ρ B ) = ρ BC . The interested reader can find additional information about the concepts and achievements around (5) in [34]. Inequality (5) shows that there always exists a recovery map whose recovery quality (measured in terms of the logarithm of the fidelity) is of the order of the conditional mutual information.
This shows that a small conditional mutual information is a sufficient condition for a state to be approximately recoverable. In other words, (5) gives an entropic characterization for the set of tripartite states that can be approximately recovered.
In this work, we are interested in the opposite direction. This corresponds to an inequality that bounds the distance between ρ ABC and any reconstructed state R B→BC (ρ AB ) from below by an entropic functional of ρ ABC and the recovery map R B→BC that involves the conditional mutual information. Such an inequality is a converse to (5) and gives a necessary condition for approximate recoverability.

Main Result
For any trace-preserving completely positive map E on a system S we denote by Inv(E) the set of density operators τ on S which are left invariant under the action of E, i.e.,

Inv(E) := {τ ∈ S(S) : E(τ) = τ}.    (10)

We may now quantify the deviation of any state ρ from the set Inv(E) by

Λ max (ρ ‖ E) := inf τ∈Inv(E) D max (ρ ‖ τ),    (11)

where D max (ω ‖ σ) := inf{λ ∈ R : ω ≤ 2^λ σ} denotes the max-relative entropy.
The Λ max -quantity has the property that it is zero if and only if E leaves ρ invariant, i.e.,

Λ max (ρ ‖ E) = 0  if and only if  E(ρ) = ρ.    (12)

Main Result We prove that for any state ρ ABC on A ⊗ B ⊗ C and any recovery map R B→BC from the B system to the B ⊗ C system we have

D(ρ ABC ‖ R B→BC (ρ AB )) ≥ I(A : C|B) ρ − Λ max (ρ AB ‖ R B→B ),    (13)

where D(τ ‖ σ) := tr τ log τ − tr τ log σ if supp(τ) ⊆ supp(σ) and +∞ otherwise denotes the relative entropy, and R B→B := tr C ◦ R B→BC is the action of the recovery map R B→BC on B. We refer to Theorem 3.1 for a more precise statement.
Cases where the Λ max -Term Vanishes To interpret the term Λ max in (13), note that the recovery map R B→BC generally not only reads the content of system B in order to generate C, but also disturbs it. Λ max quantifies the amount of this disturbance of B, taking system A as a reference. In particular, Λ max (ρ AB ‖ R B→B ) = 0 if R B→BC is "read only" on B, i.e., if ρ AB = R B→B (ρ AB ). Inequality (13) then simplifies to

D(ρ ABC ‖ R B→BC (ρ AB )) ≥ I(A : C|B) ρ .    (14)

We further note that in case R B→BC is a recovery map that is "read only" on B, its output state σ ABC := R B→BC (ρ AB ) is a Markov chain, since

H(A|B) σ = H(A|B) ρ ≤ H(A|BC) σ ≤ H(A|B) σ ,    (15)

where the two inequality steps follow from the data processing inequality [22,23] applied for R B→BC and tr C , respectively, and hence I(A : C|B) σ = H(A|B) σ − H(A|BC) σ = 0.

Tightness of the Main Result
We next discuss several aspects concerning the tightness of (13). This will also give a better understanding of the role of the Λ max -term. We first note that by combining (6) with (13) we obtain

I(A : C|B) ρ ≥ D M (ρ ABC ‖ P B→BC (ρ AB ))    (16)

and

I(A : C|B) ρ ≤ inf R B→BC { D(ρ ABC ‖ R B→BC (ρ AB )) + Λ max (ρ AB ‖ R B→B ) },    (17)

where the recovery map P B→BC in (16) is given by (7) and the infimum in (17) is over all recovery maps R B→BC that map B to B ⊗ C. The main difference between the lower and upper bound for the conditional mutual information given by (16) and (17), respectively, is the Λ max -term. Classical Case Inequalities (16) and (17) hold with equality in case ρ ABC is a classical state, i.e.,

ρ ABC = Σ x,y,z P ABC (x, y, z) |x⟩⟨x| A ⊗ |y⟩⟨y| B ⊗ |z⟩⟨z| C    (18)

for some probability distribution P ABC . To see this, we first note that if ρ ABC is classical (in which case ρ ABC and all its marginals commute pairwise) a straightforward calculation gives

D(ρ ABC ‖ T B→BC (ρ AB )) = I(A : C|B) ρ    (19)

for the Petz recovery map T B→BC defined in (4).
This shows that in the classical case (17) is an equality and that the Petz recovery map T B→BC minimizes the right-hand side of (17). We further note that in the classical case the measured relative entropy coincides with the relative entropy and the rotated Petz recovery map P B→BC that satisfies (16) simplifies to the Petz recovery map T B→BC . This together with (19) then shows that (16) holds with equality in the classical case.
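The classical equality (19) is also easy to confirm numerically: for any classical distribution, the relative entropy to the Petz-recovered distribution coincides with the conditional mutual information exactly. A hedged sketch on a generic random distribution (illustrative, not from the text):

```python
import numpy as np

# Check of the classical identity (19): for any classical P_XYZ and the
# Petz map T_{Y->YZ}(P_XY)(x,y,z) = P(x,y) P(z|y), the relative entropy
# D(P || T(P_XY)) equals the conditional mutual information I(X:Z|Y).
rng = np.random.default_rng(2)

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

P = rng.random((4, 3, 4)); P /= P.sum()
Pxy, Pyz, Py = P.sum(2), P.sum(0), P.sum((0, 2))
Q = Pxy[:, :, None] * (Pyz / Py[:, None])[None, :, :]

D = float((P * np.log2(P / Q)).sum())        # D(P || T(P_XY)) in bits
I = H(Pxy) + H(Pyz) - H(Py) - H(P)           # I(X:Z|Y) in bits
print(abs(D - I) < 1e-9)  # True
```

The identity holds because D(P ‖ P XY P Z|Y ) = E[log(P(z|x, y)/P(z|y))], which is exactly the conditional mutual information.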
Necessity of the Λ max -Term A natural question regarding (13) is whether the Λ max -term is necessary. Here we show that this is indeed the case by constructing an example which proves that a large conditional mutual information does not imply that all recovery maps perform badly; hence the Λ max -term is indispensable.
More precisely, in Sect. 4.1 we construct a generic example showing that for any constant κ < ∞ there exists a classical state ρ ABC (i.e., a state of the form (18)) such that

I(A : C|B) ρ > κ D max (ρ ABC ‖ R B→BC (ρ AB ))    (21)

for some recovery map R B→BC that satisfies R B→BC (ρ B ) = ρ BC . A similar construction (also given in Sect. 4.1) shows that there exists another classical state ρ ABC such that

I(A : C|B) ρ > κ D max (R B→BC (ρ AB ) ‖ ρ ABC )    (22)

for some recovery map R B→BC that satisfies R B→BC (ρ B ) = ρ BC . These constructions therefore show that an additional term like Λ max (ρ AB ‖ R B→B ), which measures the deviation from a "read only" map on B, is necessary to obtain a lower bound on the relative entropy between a state and its reconstructed version. The example has an even stronger implication. It shows that the Λ max -term is necessary even if one tries to bound the max-relative entropy between a state and its reconstructed version, i.e., D max (ρ ABC ‖ R B→BC (ρ AB )), which cannot be smaller than D(ρ ABC ‖ R B→BC (ρ AB )), from below. The two strict inequalities (21) and (22) show that the Λ max -term is also necessary if one would allow for swapping the two arguments of the relative (or even max-relative) entropy. Furthermore, restricting the set of recovery maps such that they satisfy R B→BC (ρ B ) = ρ BC still requires the Λ max -term.
Since (19) holds for classical states, these examples also show that for the task of minimizing the relative entropy between ρ ABC and its reconstructed state R B→BC (ρ AB ) the Petz recovery map can be far from optimal, even in the classical case. The examples further show that considering recovery maps that leave the B system invariant (i.e., that only "read" the B-part) is a considerable restriction. We refer to Sect. 4.1 for more information about these examples.
Optimality of the Λ max -Term Even in the case where ρ ABC is not classical, (13) is still close to optimal. We present two arguments why this is the case. First, we show that the Λ max -term cannot be replaced by a relative entropy measure that is smaller than the max-relative entropy. More precisely, (13) is violated if the max-relative entropy in the definition of Λ max (ρ AB ‖ R B→B ) is replaced with the α-Rényi relative entropy for any α ∈ [1/2, ∞). We refer to Sect. 4.2 for more information.
The Λ max -term in (13) quantifies the (max-relative entropy) distance between ρ AB and its closest state that is invariant under R B→B . A natural question is if (13) remains valid when the Λ max -term is replaced by the max-relative entropy distance between ρ AB and R B→B (ρ AB ), i.e., by D max (ρ AB ‖ R B→B (ρ AB )). This, however, is ruled out. To see this, we recall that by the example mentioned above in (21) there exists a tripartite state ρ ABC and a recovery map R B→BC with a large conditional mutual information I(A : C|B) ρ but a small D max (ρ ABC ‖ R B→BC (ρ AB )). The data processing inequality for the max-relative entropy [14,38] (applied for tr C ) and the fact that the max-relative entropy cannot be smaller than the relative entropy then imply

D max (ρ AB ‖ R B→B (ρ AB )) ≤ D max (ρ ABC ‖ R B→BC (ρ AB )),

so that for this example both D(ρ ABC ‖ R B→BC (ρ AB )) and the modified correction term are small while the conditional mutual information is large, which shows that (13) is no longer valid for the modified Λ max -term described above.

Related Results
Using the continuity of the conditional entropy, it is possible to derive an upper bound for the conditional mutual information of a state ρ ABC in terms of its distance to any reconstructed state σ ABC := R B→BC (ρ AB ), where R B→BC denotes an arbitrary recovery map [4,16]. This leads to a lower bound on the relative entropy between ρ ABC and R B→BC (ρ AB ) that, however, depends on the dimension of the A system. To see this, let h(p) := −p log p − (1 − p) log(1 − p) denote the binary entropy function and let Δ(τ, ω) := (1/2) ‖τ − ω‖ 1 be the trace distance between τ and ω. The data processing inequality [22,23] implies that

I(A : C|B) ρ = H(A|B) ρ − H(A|BC) ρ ≤ H(A|BC) σ − H(A|BC) ρ ,

where we used that H(A|B) ρ ≤ H(A|BC) σ . By the improved Alicki-Fannes inequality [43, Lemma 2] we find

I(A : C|B) ρ ≤ 2 Δ(ρ ABC , σ ABC ) log dim(A) + (1 + Δ(ρ ABC , σ ABC )) h( Δ(ρ ABC , σ ABC )/(1 + Δ(ρ ABC , σ ABC )) ).

Together with Pinsker's inequality [13,32], which bounds Δ(ρ ABC , σ ABC ) from above in terms of D(ρ ABC ‖ σ ABC )^{1/2} , this gives a lower bound on D(ρ ABC ‖ σ ABC ) in terms of I(A : C|B) ρ and dim(A). The fact that this bound explicitly depends on the dimension of the system A is unsatisfactory. Furthermore, the example discussed above in (21) shows that such a dependence on the dimension is unavoidable. A different approach to derive an upper bound for the conditional mutual information of a state ρ ABC in terms of its distance to a reconstructed state R B→BC (ρ AB ) was taken in [15, Theorem 11 and Remark 12] (see also [35, Proposition F.1]). It was shown that for any state ρ ABC

I(A : C|B) ρ ≤ ∫ β 0 (t) D̄ 2 (ρ ABC ‖ P [t] B→BC (ρ AB )) dt,    (29)

where β 0 and P B→BC are given in (8) and (7), respectively, P [t] B→BC denotes the rotated Petz recovery map at a fixed rotation parameter t, and D̄ 2 (τ ‖ ω) := log tr τ^2 ω^{−1} denotes Petz' Rényi relative entropy of order 2 [29]. The examples discussed above imply that the left-hand side of (29) can be much larger than the relative entropy between ρ ABC and R B→BC (ρ AB ) for the optimal recovery map R B→BC . In other words, rotated Petz recovery maps are generally far from optimal recovery maps.
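The Pinsker step in the chain above can be validated numerically. A small hedged sketch (random classical pairs, illustration only): with the relative entropy in nats and the trace distance Δ = (1/2)‖P − Q‖ 1 , every pair of distributions satisfies D(P‖Q) ≥ 2Δ².

```python
import numpy as np

# Numerical spot-check of Pinsker's inequality: D(P||Q) >= 2*Delta^2,
# with D in nats and Delta the trace (total variation) distance.
rng = np.random.default_rng(3)
ok = True
for _ in range(1000):
    p = rng.random(5) + 1e-9; p /= p.sum()
    q = rng.random(5) + 1e-9; q /= q.sum()
    D = float((p * np.log(p / q)).sum())     # relative entropy in nats
    delta = 0.5 * np.abs(p - q).sum()        # trace distance
    ok = ok and (D >= 2 * delta ** 2 - 1e-12)
print(ok)  # True
```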

One-Shot Relative Entropies
The goal of this section is to derive a triangle-like inequality for the relative entropy (see Lemma 2.1) which will be used in the proof of our main result, i.e., Theorem 3.1. To understand Lemma 2.1, we need to review a few properties of one-shot relative entropy measures.

Preliminaries
Let S(A) and P(A) denote the sets of density and nonnegative operators on A, respectively. For any linear operator L on A, the trace norm is given by ‖L‖ 1 := tr √(L† L). For ρ, σ ∈ P(A) we write ρ ≪ σ if the support of ρ is contained in the support of σ. Within this document our Hilbert spaces are assumed to be separable. We define the min-relative entropy [33] as

D min (ρ ‖ σ) := −log tr Π ρ σ,

where Π ρ denotes the projector onto the support of ρ, and the max-relative entropy [14,33] as

D max (ρ ‖ σ) := inf{λ ∈ R : ρ ≤ 2^λ σ}.

As the names suggest, the min-relative entropy cannot be larger than the max-relative entropy; more precisely we have

D min (ρ ‖ σ) ≤ D(ρ ‖ σ) ≤ D max (ρ ‖ σ),

with strict inequalities in the generic case [27,38]. The max-relative entropy turns out to be the largest relative entropy measure that satisfies the data processing inequality and is additive under tensor products [38, Section 4.2.4].
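For commuting (classical) arguments these one-shot quantities reduce to elementary expressions, which makes the ordering easy to illustrate. A minimal sketch (the specific distributions are chosen for illustration):

```python
import numpy as np

# Classical illustration of D_min <= D <= D_max. For distributions,
# D_min(p||q) = -log2(mass of q on supp(p)), D(p||q) = sum p log2(p/q),
# and D_max(p||q) = log2 max(p/q) on the support of p.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.125, 0.375, 0.5])

supp = p > 0
d_min = -np.log2(q[supp].sum())
d = float((p[supp] * np.log2(p[supp] / q[supp])).sum())
d_max = np.log2((p[supp] / q[supp]).max())
print(d_min <= d <= d_max)  # True
```

Here d_min = 1 bit (half of q's mass lies outside supp(p) is not counted: only 1/2 remains), while d_max = 2 bits, so all three values are distinct, matching the "strict in the generic case" remark.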
We also note that it follows immediately from the definition that the max-relative entropy cannot increase if the same positive map is applied to both arguments (see also [26, Theorem 2] for a more general statement).
The min- and max-relative entropies can be seen as the extreme points of a family of relative entropies called the minimal quantum Rényi relative entropy (also known as sandwiched Rényi relative entropy) [27,42]. For α ∈ [1/2, 1) ∪ (1, ∞) and ρ, σ ∈ P(A), this family is defined as

D̃ α (ρ ‖ σ) := (1/(α − 1)) log tr ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α .    (34)

It can be shown [27] that D̃ 1/2 (ρ ‖ σ) = −log F(ρ, σ), that lim α→1 D̃ α (ρ ‖ σ) = D(ρ ‖ σ), and that lim α→∞ D̃ α (ρ ‖ σ) = D max (ρ ‖ σ). Furthermore, the minimal quantum Rényi relative entropy is monotone in α, i.e.,

D̃ α (ρ ‖ σ) ≤ D̃ β (ρ ‖ σ)  for α ≤ β.    (35)
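For commuting arguments the sandwiched family reduces to the classical Rényi divergence D α (p‖q) = (1/(α − 1)) log Σ p^α q^{1−α}, which makes the limits and the monotonicity (35) easy to verify numerically. A hedged spot-check on a random classical pair:

```python
import numpy as np

# Classical (commuting) sandwiched Renyi divergence:
#   D_a(p||q) = log2(sum p^a q^(1-a)) / (a - 1).
# Checks: monotone in a; D_{1/2} = -log2 F; values approach D near a=1
# and stay below D_max.
rng = np.random.default_rng(4)
p = rng.random(6); p /= p.sum()
q = rng.random(6); q /= q.sum()

def d_alpha(a):
    return float(np.log2((p ** a * q ** (1 - a)).sum()) / (a - 1))

alphas = [0.5, 0.7, 0.9, 1.1, 2.0, 5.0, 20.0]
vals = [d_alpha(a) for a in alphas]
F = np.sqrt(p * q).sum() ** 2                    # classical fidelity
D = float((p * np.log2(p / q)).sum())            # relative entropy
d_max = float(np.log2((p / q).max()))            # max-relative entropy

print(all(x <= y + 1e-9 for x, y in zip(vals, vals[1:])),  # (35)
      abs(vals[0] + np.log2(F)) < 1e-9,                    # D_{1/2}=-log2 F
      vals[-1] <= d_max + 1e-9)                            # below D_max
```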
Remark 2.2. We note that if A is a finite-dimensional Hilbert space then (36) is valid for all α ∈ [1/2, ∞). This follows from the fact that t → t^{(1−α)/α} is operator anti-monotone [38] for α > 1 and that the function X → tr X^α is monotone on the set of Hermitian operators [9, Theorem 2.10].
Very recently, a similar triangle-like inequality for Rényi relative entropies that additionally involves trace-preserving completely positive maps has been established in [10]. The following remarks show that Lemma 2.1 is optimal and that there is not much flexibility to prove triangle-like inequalities for the relative entropy different from (36).
Remark 2.3. A simple calculation shows that Choosing p = 1 − 2^{−α} reveals that In the limit α → ∞, the strict inequality (42) becomes an equality.
Remark 2.4. The statement of Lemma 2.1 is no longer true if the max-relative entropy and the relative entropy on the right-hand side of (36) are exchanged. To see this, consider three classical binary probability distributions with parameters p, ε ∈ (0, 1). For p = 7/8 and ε = 1/8, a direct computation shows that the exchanged inequality fails. This shows that it is crucial which term in Lemma 2.1 carries a max-relative entropy.

Main Result and Proof
The quantity Λ max (ρ AB ‖ R B→B ) is defined in (11), and R B→B := tr C ◦ R B→BC . To prove the assertion of Theorem 3.1, we make use of a known lemma stating that the conditional mutual information of a tripartite density operator is bounded from above by its smallest relative entropy distance to Markov chains. Let MC(A ⊗ B ⊗ C) denote the set of Markov chains on A ⊗ B ⊗ C, i.e., tripartite density operators ρ ABC ∈ S(A ⊗ B ⊗ C) that satisfy (1).

Lemma 3.2 ([20, Theorem 4]). Let ρ ABC ∈ S(A ⊗ B ⊗ C). Then

I(A : C|B) ρ ≤ inf μ ABC ∈MC(A⊗B⊗C) D(ρ ABC ‖ μ ABC ).    (50)
Proof. The proof we provide here follows the lines of a proof by Jenčová (see the short note after the acknowledgements in [20]), but extends it to general separable spaces. Let μ ABC ∈ MC and assume without loss of generality that the relative entropy D(ρ ABC ‖ μ ABC ) is finite. (If there is no such state then the infimum in (50) equals infinity and the statement is trivial.) Due to the data processing inequality [24,40] we have

D(ρ AB ‖ μ AB ) ≤ D(ρ ABC ‖ μ ABC )    (51)

and

D(ρ B ‖ μ B ) ≤ D(ρ BC ‖ μ BC ) ≤ D(ρ ABC ‖ μ ABC ).    (52)

In particular, the relative entropies D(ρ AB ‖ μ AB ), D(ρ BC ‖ μ BC ), and D(ρ B ‖ μ B ) are finite. We thus have

D(ρ ABC ‖ μ ABC ) − D(ρ AB ‖ μ AB ) − D(ρ BC ‖ μ BC ) + D(ρ B ‖ μ B ) = I(A : C|B) ρ − tr ρ ABC (log μ ABC − log μ AB − log μ BC + log μ B ).

Using the Markov chain property (2) for μ ABC , i.e., that μ ABC decomposes as a direct sum of tensor products, it is straightforward to verify that

log μ ABC − log μ AB − log μ BC + log μ B = 0

on the support of μ ABC . The above can thus be simplified to

I(A : C|B) ρ = D(ρ ABC ‖ μ ABC ) − D(ρ AB ‖ μ AB ) − D(ρ BC ‖ μ BC ) + D(ρ B ‖ μ B ).

It follows from (51) and (52) that

I(A : C|B) ρ ≤ D(ρ ABC ‖ μ ABC ),

which concludes the proof.
In order to prove Theorem 3.1, we need one more lemma that relates the distance to Markov chains and the Λ max -quantity defined in (11).
Proof. For the proof, we first assume that the system A has a finite dimension, so that conditional entropies of the form H(A|B) are finite. The data processing inequality for the max-relative entropy [14,17,38] implies that Furthermore, because the data processing inequality for the conditional entropy [22,23] implies that H(A|BC) RB→BC (τAB ) ≥ H(A|B) τAB for any τ AB ∈ S(A ⊗ B), we also have Note that the inequality on the right-hand side of the implication must, again by the data processing inequality, be an equality, which means that I(A : C|B) μ = 0 and, hence, that μ ∈ MC. This proves the general implication We now use it to obtain Combining this with (58) completes the proof for the case where the system A is finite-dimensional.
To extend the claim to general separable Hilbert spaces, consider a sequence of finite-rank projectors Π k A on A, for k ∈ N, that, for k → ∞, converges to the identity in the weak, and hence also in the strong, operator topology [18]. It follows from the monotonicity of the max-relative entropy under positive maps and (56) for finite-dimensional A that The right-hand side can be bounded for any k ∈ N by where the first inequality uses that Π k A is a projector. The final step follows because τ AB is a density operator and hence tr Π k A τ AB ≤ 1 for any projector Π k A on A. Using once again the monotonicity of the max-relative entropy under positive maps we find with the above To conclude the proof, it thus suffices to establish that Because the max-relative entropy cannot increase if the same positive map is applied to both arguments, the max-relative entropy is non-decreasing for increasing k, and the lim sup may therefore be replaced by a lim. Hence, there exists a sequence (μ k ) k∈N of density operators in MC such that and we can assume without loss of generality that Π k A μ k ABC Π k A = μ k ABC . From here we proceed analogously to the proof of Lemma 11 in [18]. In particular, we use that the space, T (H), of trace-class operators on H = A ⊗ B ⊗ C (equipped with the trace norm) is isometrically isomorphic to the dual of the space K(H) of compact operators on H (equipped with the operator norm), with the isomorphism τ → ψ τ given by ψ τ (κ) = tr κτ , and that, by the Banach-Alaoglu theorem, the closed unit ball on T (H) is therefore compact with respect to the weak* topology. This implies that there exists a subsequence (μ k ) k∈Γ⊂N that converges in the weak* topology to an element μ ∈ T (H), i.e., lim k→∞ tr κμ k = tr κμ for all κ ∈ K(H). Because, for any k ∈ N, μ k is a density operator, μ is also a density operator. The convergence (73) also implies for any κ ∈ K(H).
By the definition of the max-relative entropy, the sequence on the left-hand side must converge to a nonnegative real for any κ ≥ 0. This implies (71).

On the Tightness of the Main Result
In this section, we construct examples that show two things. First, there exist classical tripartite states with a large conditional mutual information that can, nevertheless, be recovered well. This shows the necessity of the Λ max -term in the main bound (49), even if the relative entropy was replaced by the largest possible relative entropy measure, i.e., the max-relative entropy. Furthermore, the violation of such a bound without the Λ max -term can be made arbitrarily large. Second, our example shows that (49) is no longer valid if the max-relative entropy in the definition of Λ max (ρ AB ‖ R B→B ) is replaced with the α-Rényi relative entropy for any α ∈ [1/2, ∞). Both examples will be classical, i.e., we consider tripartite states of the form (18). Such states are special in that the corresponding density operators and all their marginals are simultaneously diagonalizable. As a result, we can use the classical notion of a distribution to describe such states.

A Large Conditional Mutual Information Does Not Imply Bad Recovery
Let X = {1, 2, . . . , 2^n } for n ∈ N, let p, q ∈ [0, 1] be such that p + q ≤ 1, and consider two independent random variables E Z and E Y on {0, 1} and {0, 1, 2}, respectively, such that P(E Z = 0) = p + q, P(E Y = 0) = p, and P(E Y = 1) = q. Let X ∼ U(X ), where U(X ) denotes the uniform distribution on X , and define the two random variables Y and Z as functions of (X, E Y , E Z , U Y , U Z ), where U Y ∼ U(X ) and U Z ∼ U(X ) are independent. This defines a tripartite distribution P XY Z , whose marginals and conditional mutual information follow by a simple calculation. We next define a recovery map R Y →Y′Z′ that creates a tuple of random variables (Y′, Z′) out of Y , where U, U′ are independent and uniformly distributed on X . Let Q XY′Z′ denote the distribution that is generated when applying the recovery map (described above) to P XY . In the following, we will assume that n is sufficiently large. It can be verified easily that Q Y′Z′ = P Y Z . Since P XY Z and Q XY′Z′ are classical distributions we have

D max (P XY Z ‖ Q XY′Z′ ) = max x,y,z log [ P XY Z (x, y, z) / Q XY′Z′ (x, y, z) ].

We note that P(X = Y ) = p + pq + q^2 according to the distribution P XY . We are now ready to state the conclusion of this example. For κ < ∞, p = 1/2, q = 0, and n sufficiently large we find, by combining (84) with (87), that (21) holds. For κ < ∞, p = q = 1/4, and n sufficiently large, (84) and (88) imply (22). This shows that there exist classical tripartite distributions P XY Z with a large conditional mutual information I(X : Z|Y ) P and a recovery map R Y →Y′Z′ whose output R Y →Y′Z′ (P XY ) is nevertheless close to P XY Z . The closeness is measured with respect to the max-relative entropy.
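A simplified numerical variant of this construction (the parameters and the recovery map differ from the text; it is meant only to illustrate the mechanism) can be checked exactly for small n: let X be uniform on {0, . . . , N − 1} with N = 2^n, and let Y and Z independently each be a copy of X with probability 1/2 and a fresh uniform symbol otherwise. Then I(X : Z|Y) grows with n, yet a recovery map that sometimes overwrites Y, emitting (Y, U), (U, Y), (Y, Y) or (U, U′) with probability 1/4 each (U, U′ fresh uniforms; re-using the old Y value as the new Z′ reproduces the Z = X correlation whenever Y was a copy of X), reconstructs P XYZ to within D max < 1 bit.

```python
import numpy as np

# Toy version of "large CMI does not imply bad recovery" (illustrative
# parameters, not the construction from the text).
def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cmi_and_dmax(n):
    N = 2 ** n
    a = np.full((N, N), 1.0 / (2 * N)) + np.eye(N) / 2   # a[x,w] = P(Y=w|X=x)
    P = a[:, :, None] * a[:, None, :] / N                # P(x,y,z)
    Pxy = P.sum(2)                                       # marginal, = a/N
    # Recovered distribution Q = R(P_XY) with the four branches above:
    Q = (Pxy[:, :, None] / N                             # (Y, U)
         + Pxy[:, None, :] / N                           # (U, Y): old Y -> Z'
         + Pxy[:, :, None] * np.eye(N)[None, :, :]       # (Y, Y)
         + 1.0 / N ** 3) / 4                             # (U, U')
    I = H(P.sum(2)) + H(P.sum(0)) - H(P.sum((0, 2))) - H(P)   # I(X:Z|Y)
    d_max = float(np.log2((P / Q).max()))
    return I, d_max

I4, d4 = cmi_and_dmax(4)
I6, d6 = cmi_and_dmax(6)
print(f"I(X:Z|Y): n=4 -> {I4:.2f} bits, n=6 -> {I6:.2f} bits; "
      f"D_max: {d4:.2f}, {d6:.2f} bits")
```

In this toy the conditional mutual information grows roughly linearly in n while D max (P ‖ Q) stays below one bit, so the ratio between the two can be made as large as desired, in the spirit of (21).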

Tightness of the Λ max -Term
In this section, we construct a classical example showing that our main result, i.e., (49), is essentially tight in the sense that it is no longer valid if the max-relative entropy in the definition of Λ max (ρ AB ‖ R B→B ), given in (11), is replaced with an α-Rényi relative entropy for any α < ∞. More precisely, for α ∈ [1, ∞] we define

Λ α (ρ ‖ E) := inf τ∈Inv(E) D̃ α (ρ ‖ τ).

For α = ∞ we have Λ ∞ (ρ ‖ E) = Λ max (ρ ‖ E). In this section, we show that for all α < ∞ there exists a (classical) tripartite state ρ ABC and a recovery map R B→BC that violate (49) with the Λ max -term replaced by a Λ α -term. To see this, consider the following classical example (where we switch to the classical notation). Let S = {0, . . . , 2^n − 1} and consider a tripartite distribution Q̄ XY Z defined via the random variables X ∼ U(S) and X = Y = Z. Let Q̂ XY Z be the distribution defined via the random variables X ∼ U(S) and Y ∼ U(S), where X and Y are independent, and Z = (X + Y ) mod 2^n . For p ∈ [0, 1], we define a binary random variable E such that P(E = 0) = p, and consider the distribution P XY Z obtained by mixing Q̄ XY Z and Q̂ XY Z according to E. We next define two recovery maps R̄ Y →Y′Z′ and R̂ Y →Y′Z′ that create the tuples (Y′, Z′) out of Y , where U ∼ U(S), and combine them into another recovery map R Y →Y′Z′ . We note that the recovery map satisfies R Y →Y′Z′ (P Y ) = P Y Z . A simple calculation then determines I(X : Z|Y ) P and the distribution R Y →Y′Z′ (P XY ) generated by applying the recovery map to P XY , which can be decomposed accordingly; the final step follows by definition of the α-Rényi relative entropy and a straightforward calculation.
Recall that we need to prove (92), which in the classical notation reads as for all α < ∞. As mentioned in (35), the α-Rényi relative entropy is monotone in α, which shows that it suffices to prove (108) for all α ∈ (α 0 , ∞), where α 0 ≥ 0 can be arbitrarily large.
Combining (106) and (107) shows that for any α ∈ (α 0 , ∞), where α 0 is sufficiently large, p = α^{−2} , and n = α, where we used that (1 − α^{−2} )^α (2^α − 1) ≤ 2^α for α ≥ 1. Using the simple inequality log(1 + x) ≤ log x + 2/x for x ≥ 1 gives where the final step is valid since α is assumed to be sufficiently large. Using once more log(1 + x) ≤ log x + 2/x for x ≥ 1 gives where poly(α) denotes an arbitrary polynomial in α. As a result, we obtain for a sufficiently large α The two steps (114) and (115) are both valid because α is sufficiently large. The final step uses (100). This example shows that (49) is no longer valid if the Λ max -term is replaced with a Λ α -term for any α ∈ [1/2, ∞). Note also that this example implies Remark 2.3 on the tightness of the triangle-like inequality for the relative entropy.

Open Questions
In this article, we introduced a new entropic quantity Λ max (ρ AB ‖ R B→B ) that measures how much the map R B→B disturbs the B system, taking system A as a reference. It would be interesting to better understand this quantity and its properties. For example, in case ρ ABC is a state whose marginals are all flat (i.e., proportional to projectors), is it possible to bound Λ max (ρ AB ‖ R B→B ) from above in terms of D(ρ ABC ‖ R B→BC (ρ AB ))? This would considerably simplify our main result (49) for this special case, which is of interest, e.g., in applications to condensed matter physics.

Ann. Henri Poincaré

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A. Approximate Markov Chains Can be far from Markov Chains
As mentioned in the introduction, it is known [11,20] that there exist tripartite states with a small conditional mutual information whose distance to any Markov chain is nevertheless large. For example, consider the state ρ S1 ...S d = |ψ⟩⟨ψ| S1 ...S d on S 1 ⊗ · · · ⊗ S d with dim S k = d > 1 for all k = 1, . . . , d, where |ψ⟩ S1 ...S d denotes the totally antisymmetric state, i.e., the normalized superposition (d!)^{−1/2} Σ π sgn(π) |π(1)⟩ ⊗ · · · ⊗ |π(d)⟩ over all permutations π of {1, . . . , d}. By the chain rule, the conditional mutual information terms I(S 1 : S k |S 2 . . . S k−1 ) ρ for k = 2, . . . , d sum to I(S 1 : S 2 . . . S d ) ρ ≤ 2 log d. Because the mutual information is nonnegative, there exists k ∈ {2, . . . , d} such that

I(S 1 : S k |S 2 . . . S k−1 ) ρ ≤ (2 log d)/(d − 1),    (119)

which can be arbitrarily small (as d gets large). By definition, the reduced state ρ S1 S k is the antisymmetric state on S 1 ⊗ S k that is far from separable [7, p. 53]. More precisely, for any separable state σ S1 S k on S 1 ⊗ S k we have Δ(ρ S1 S k , σ S1 S k ) ≥ 1/2, where Δ(τ, ω) := (1/2) ‖τ − ω‖ 1 denotes the trace distance between τ and ω. For any state μ S1 ...S k on S 1 ⊗ · · · ⊗ S k that forms a Markov chain in order S 1 ↔ S 2 . . . S k−1 ↔ S k , it follows by (2) that its reduced state μ S1 S k on S 1 ⊗ S k is separable. Using the monotonicity of the trace distance under trace-preserving completely positive maps [28, Theorem 9.2] we thus find Δ(ρ S1 ...S k , μ S1 ...S k ) ≥ Δ(ρ S1 S k , μ S1 S k ) ≥ 1/2, showing that the state ρ S1 ...S k , despite having a conditional mutual information that is arbitrarily small (see (119)), is far from any Markov chain.
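A small numerical companion to this example (assumptions: the two-party marginal of the totally antisymmetric state is the maximally mixed state on the antisymmetric subspace, as used above; the separability distance itself is cited, not verified here): its mutual information evaluates to log2(2d/(d − 1)), which stays bounded while the pairwise conditional terms in the d-party chain shrink like (119).

```python
import numpy as np

# Mutual information of the antisymmetric two-party state on C^d x C^d,
# i.e. the normalized projector (I - SWAP)/2, computed from eigenvalues.
def antisym_mutual_info(d):
    I2 = np.eye(d * d)
    SWAP = I2.reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)
    rho = (I2 - SWAP) / 2
    rho /= np.trace(rho)                      # normalized antisym projector
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    S12 = float(-(ev * np.log2(ev)).sum())
    # both single-party marginals are maximally mixed: S1 = S2 = log2(d)
    return 2 * np.log2(d) - S12

for d in (2, 3, 5, 9):
    print(d, round(antisym_mutual_info(d), 4),
          round(float(np.log2(2 * d / (d - 1))), 4))
```

The two printed columns agree, confirming the closed form I(S 1 : S k ) = log2(2d/(d − 1)), which tends to 1 bit as d grows even though the state remains far (in trace distance) from every separable, and hence every Markov-chain, marginal.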
As discussed in the introduction, states with a small conditional mutual information are called approximate Markov chains (which is justified by (5)).
The example in this appendix shows that approximate quantum Markov chains are not necessarily close to quantum Markov chains.