Entropy accumulation

We ask whether entropy accumulates, in the sense that the operationally relevant total uncertainty about an $n$-partite system $A = (A_1, \ldots, A_n)$ corresponds to the sum of the entropies of its parts $A_i$. The Asymptotic Equipartition Property implies that this is indeed the case to first order in $n$, under the assumption that the parts $A_i$ are identical and independent of each other. Here we show that entropy accumulation occurs more generally, i.e., without an independence assumption, provided one quantifies the uncertainty about the individual systems $A_i$ by the von Neumann entropy of suitably chosen conditional states. The analysis of a large system can hence be reduced to the study of its parts. This is relevant for applications. In device-independent cryptography, for instance, the approach yields essentially optimal security bounds valid against general attacks, as shown by Arnon-Friedman et al.


Introduction
In classical information theory, the uncertainty one has about a variable $A$ given access to side information $B$ can be operationally quantified by the number of bits one would need to learn, in addition to $B$, in order to reconstruct $A$. While this number generally fluctuates, it is, except with probability of order $\varepsilon > 0$, not larger than the $\varepsilon$-smooth max-entropy, $H_{\max}^\varepsilon(A|B)_\rho$, evaluated for the joint distribution $\rho$ of $A$ and $B$ [45]. Conversely, it is in the same way not smaller than the $\varepsilon$-smooth min-entropy, $H_{\min}^\varepsilon(A|B)_\rho$. This may be summarised by saying that the number of bits needed to reconstruct $A$ from $B$ is, with probability at least $1 - O(\varepsilon)$, contained in the interval

$$I = \big[\, H_{\min}^\varepsilon(A|B)_\rho,\; H_{\max}^\varepsilon(A|B)_\rho \,\big] \qquad (1)$$

whose boundaries are defined by the smooth entropies. We refer to Definition 2.2 below for a precise definition of these quantities. This approach to quantifying uncertainty can be extended to the case where $A$ and $B$ are quantum systems. The conclusion remains the same: the operationally relevant uncertainty interval is $I$ as defined by (1). The only difference is that $\rho$ is now a density operator, which describes the joint state of $A$ and $B$ [44,41,51].
Finding the boundaries of the interval $I$ is a central task of information theory. However, the smooth entropies of a large system $A$ are often difficult to calculate. It is therefore common to introduce assumptions that render this task more feasible. One extremely popular approach in standard information theory is to assume that the system consists of many mutually independent and identically distributed (IID) parts. More precisely, the IID Assumption demands that the system be of the form $A = A_1^n = A_1 \otimes \cdots \otimes A_n$, that the side information have an analogous structure $B = B_1^n = B_1 \otimes \cdots \otimes B_n$, and that the joint state of these systems be of the form $\rho_{A_1B_1\cdots A_nB_n} = \nu_{AB}^{\otimes n}$, for some density operator $\nu_{AB}$. A fundamental result from information theory, the Asymptotic Equipartition Property (AEP) [48] (see [54] for the quantum version), then asserts that the uncertainty interval satisfies

$$I \subseteq \big[\, n H(A|B)_\nu - c_\varepsilon\sqrt{n},\; n H(A|B)_\nu + c_\varepsilon\sqrt{n} \,\big], \qquad (2)$$

where $c_\varepsilon$ is a constant (independent of $n$) and where $H(A|B)_\nu$ is the conditional von Neumann entropy evaluated for the state $\nu_{AB}$. In other words, for large $n$, the operationally relevant total uncertainty one has about $A_1^n$ given $B_1^n$ is well approximated by $n H(A|B)_\nu = \sum_i H(A_i|B_i)_\rho$. In this sense, the entropy of the individual systems $A_i$ accumulates to the entropy of the total system $A_1^n$. In this work, we generalise this statement to the case where the individual pairs $A_iB_i$ are no longer independent of each other, i.e., where the IID assumption does not hold. Without loss of generality one may think of the pairs $A_iB_i$ as being generated by a sequence of processes $\mathcal M_i$, as shown in Figure 1. Each process $\mathcal M_i$ may pass information on to the next one via a "memory" register $R_i$. The state of the "future" pairs can thus depend on the "past" ones. The only assumption we make is that, given the side information $B_1^i$ generated until step $i$, the systems $A_1^i$ are independent of the next piece of side information $B_{i+1}$.
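For intuition, the classical, unconditional special case of (2) can be checked numerically. The sketch below is an illustration we add here, not part of the original analysis: it computes a standard classical proxy for the $\varepsilon$-smooth max-entropy of $n$ IID biased bits, namely the logarithm of the smallest support capturing probability $1-\varepsilon$, and compares its rate with the Shannon entropy $h(p)$.

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def smooth_max_entropy_iid_bits(n, p, eps):
    """log2 of the smallest set of n-bit strings with probability >= 1 - eps,
    a classical proxy for the eps-smooth max-entropy of n IID Bernoulli(p) bits.
    Strings are grouped by Hamming weight; for p < 1/2 the per-string probability
    decreases with the weight, so we add whole weight classes in that order
    (over-counting the support by less than one class, fine for illustration)."""
    mass, support = 0.0, 0
    for k in range(n + 1):
        # log-probability of a single string of weight k, and the class size
        log_p = k * math.log(p) + (n - k) * math.log(1 - p)
        count = math.comb(n, k)
        mass += count * math.exp(log_p)
        support += count
        if mass >= 1 - eps:
            break
    return math.log2(support)

p, eps = 0.3, 0.01
rates = {n: smooth_max_entropy_iid_bits(n, p, eps) / n for n in (50, 1000)}
# rates[n] approaches binary_entropy(p) ≈ 0.881 as n grows
```

The per-copy rate exceeds $h(p)$ by a correction that shrinks like $1/\sqrt n$, matching the $c_\varepsilon\sqrt n$ width of the interval in (2).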
This is captured by the requirement that $A_1^i \leftrightarrow B_1^i \leftrightarrow B_{i+1}$ forms a quantum Markov chain. Entropy accumulation is then the claim that

$$\sum_{i=1}^n \inf_{\omega} H(A_i|B_iR)_{\mathcal M_i(\omega)} \;\lesssim\; H_{\min}^\varepsilon(A_1^n|B_1^n)_\rho \;\le\; H_{\max}^\varepsilon(A_1^n|B_1^n)_\rho \;\lesssim\; \sum_{i=1}^n \sup_{\omega} H(A_i|B_iR)_{\mathcal M_i(\omega)}, \qquad (3)$$

up to correction terms of order $\sqrt n$, where, in the $i$th term of each sum, the infimum or supremum ranges over joint states $\omega_{R_{i-1}R}$ of the memory $R_{i-1}$ and a system $R$ isomorphic to it, and the conditional von Neumann entropy is evaluated for the state $(\mathcal M_i \otimes \mathcal I_R)(\omega_{R_{i-1}R})$, abbreviated by $\mathcal M_i(\omega)$, which describes the output pair $A_iB_i$ generated by $\mathcal M_i$ jointly with $R$. To illustrate (3) it is useful to think of a communication scenario with two parties, Alice and Bob, who are receiving information $A_1^n$ and $B_1^n$, respectively. Suppose that a source with memory $R_i$ generates this information sequentially in $n$ steps, described by maps $\mathcal M_i$ as depicted in Figure 1. Suppose furthermore that Bob would like to infer all $n$ values $A_i$ (which, for the purpose of this example, we assume to be classical).

Figure 1: Circuit diagram illustrating the decomposition of states $\rho_{A_1^nB_1^n}$ relevant for our main theorem. One starts with a state $\rho^0_{R_0}$, and each of the pairs $A_iB_i$ is generated sequentially, one after the other, by the process $\mathcal M_i$. The map $\mathcal M_i$ takes as input a state on $R_{i-1}$ and outputs a state on $R_i \otimes A_i \otimes B_i$.

As discussed above, for this he would require $N$ additional classical bits from Alice, where $N$ fluctuates (up to probability $\varepsilon$) within an interval $I$ with boundaries given by the entropies $H_{\min}^\varepsilon(A_1^n|B_1^n)$ and $H_{\max}^\varepsilon(A_1^n|B_1^n)$, which quantify Bob's uncertainty about $A_1^n$. While these entropies depend on the joint state $\rho_{A_1^nB_1^n}$ of the entire information generated by the source over all $n$ steps, they can, according to (3), be lower (or upper) bounded by a sum of terms that merely depend on the individual steps $\mathcal M_i$. Specifically, the minimum (or maximum) number $N$ of bits that Alice needs to send to Bob so that he can infer her values $A_i$ grows for each such value by the von Neumann entropy $H(A_i|B_iR)$, minimised (or maximised) over all possible states the memory $R_{i-1}$ could have been in right before the pair $A_iB_i$ was produced, and conditioned on $B_i$ as well as any information $R$ about this memory. The main result we derive in this work is actually a bit more general than (3), allowing one to take into account global information about the statistics of $A_1^n$ and $B_1^n$. This is relevant for applications. In quantum key distribution, for instance, $\mathcal M_i$ models the generation of the $i$th bit of the raw key. However, in this cryptographic scenario, $\mathcal M_i$ can depend on the attack strategy of an adversary, and is thus partially unknown. Hence, in order to bound the entropy of the raw key bits (which characterises an adversary's uncertainty), one must also take into account global statistical properties. These are inferred by tests carried out by the quantum key distribution protocol on a small sample of the generated bits. To incorporate such statistical information in the analysis, we consider for each $i$ an additional classical value $X_i$ derived from $A_i$ and $B_i$, as depicted in Figure 2.
Specifically, $X_i$ shall tell us whether position $i$ was included in the statistical test and, if so, the outcome of the test performed at step $i$. For this extended scenario, (3) still holds, but now the infimum and supremum are taken over a restricted set, containing only those states $\omega$ for which the resulting probability distribution on $X_i$ corresponds to the observed statistics.
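The source-with-memory picture can be made concrete in a purely classical toy model (our own illustration, with an arbitrarily chosen transition kernel and trivial side information $B_i$): a two-state memory register drives each output bit, and the total Shannon entropy of the outputs is bounded from below by the sum of the per-step entropies, each minimised over the possible memory states, exactly in the spirit of (3).

```python
import itertools, math

# Hypothetical two-state memory r ∈ {0, 1}; at each step the source emits a
# bit a and updates the memory according to kernel[r][(a, r_next)].
kernel = {
    0: {(0, 0): 0.6, (1, 1): 0.4},
    1: {(0, 1): 0.1, (1, 0): 0.9},
}
n = 8                       # number of steps
init = {0: 1.0, 1: 0.0}     # memory starts in state 0

def seq_prob(seq):
    """p(a_1..a_n), via the forward recursion over the hidden memory."""
    d = dict(init)
    for a in seq:
        d = {r2: sum(d[r] * kernel[r].get((a, r2), 0.0) for r in d)
             for r2 in (0, 1)}
    return sum(d.values())

def shannon(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# total entropy H(A_1^n), by brute-force enumeration of all 2^n sequences
probs = [seq_prob(s) for s in itertools.product((0, 1), repeat=n)]
H_total = shannon(probs)

# per-step emission distributions p(a | r) and their entropies
emit = {r: [sum(v for (a2, _), v in kernel[r].items() if a2 == a)
            for a in (0, 1)] for r in kernel}
worst_case_rate = min(shannon(emit[r]) for r in kernel)
# accumulation (Shannon version): H(A_1^n) >= n * min_r H(A_i | R_{i-1} = r)
```

The lower bound holds because, given the memory, each output is independent of the past, so conditioning the chain rule on $R_{i-1}$ can only decrease each term.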
Entropy accumulation has a number of theoretical and practical implications. For example, it serves as a technique to turn cryptographic security proofs that were restricted to collective attacks into security proofs against general attacks. This application is demonstrated in [5] for the case of a fully device-independent quantum key distribution protocol and a randomness expansion protocol. The resulting security bounds are essentially tight, implying that device-independent cryptography is possible with state-of-the-art technology. To illustrate the basic ideas behind such applications, we will present two concrete examples in more detail. The first is a proof of security of a variant of the E91 Quantum Key Distribution protocol. This new security proof has two advantages. First, its structure is modular and it may therefore be adapted to other cryptographic schemes (see also the discussion in Section 6). In addition, it achieves a strong level of security where no assumption is made on Bob's devices. This is sometimes referred to as one-sided measurement device independence; this level of security was partially achieved in [58] (under a memoryless-devices assumption, which we do not need) and later fully in [56], though with sub-optimal rates. The second example is the derivation of an upper bound on the fidelity achievable by Fully Quantum Random Access Codes.

Figure 2: Circuit diagram illustrating the decomposition of states $\rho_{A_1^nB_1^nX_1^n}$ relevant for the full version of our main theorem, which can take into account statistical information $X_1^n$. The individual pieces $X_i$ of this statistical information are classical values that can be determined from $A_i$ and $B_i$ without disturbing them. When $A_i$ and $B_i$ are themselves classical, this means that $X_i$ is a deterministic function of $A_i$ and $B_i$. For a precise definition in the general case we refer to Section 4.
The proof of the main result, Eq. (3), has a structure similar to the proof of the Quantum Asymptotic Equipartition Property [54], which we can retrieve as a special case (see Corollary 4.10). The idea is to first bound the smooth entropy of the entire sequence $A_1^n$ conditioned on $B_1^n$ by a conditional Rényi entropy of order $\alpha$, then decompose this entropy into a sum of conditional Rényi entropies for the individual terms $A_i$, and finally bound these in terms of von Neumann entropies. However, in contrast to previous arguments, we use a recently introduced version of conditional Rényi entropies, termed "sandwiched Rényi entropies" [64,37]. For these entropies, we derive a novel chain rule, which forms the core technical part of our proof. In addition, some of the concepts used in this work generalise techniques proposed in the recent security proofs for device-independent cryptography presented in [34,35]. In particular, the dominant terms of the lower bound on the amount of randomness obtained in [35], called rate curves, are similar to the tradeoff functions considered here (cf. Definition 4.1). (While the tradeoff functions considered in this work are defined in terms of conditional von Neumann entropies, the rate curves of [35] are equal to a difference of $(1+\varepsilon)$-Rényi entropies (see [35, Section 6]). The latter cannot be larger than the tradeoff functions, which yield asymptotically optimal randomness extraction rates, as shown in [5].)

Paper organisation: We begin with preliminaries and notation in Section 2. Section 3 is devoted to the central technical ingredient of our argument, a chain rule for Rényi entropies. The main result, the theorem on entropy accumulation, is then stated and proved in Section 4. In Section 5 we present the two sample applications mentioned above, before concluding with remarks and suggestions for future work in Section 6.

Notation
In the table below, we summarise some of the notation used throughout the paper:
$A, B, \ldots$ : Quantum systems, and their associated Hilbert spaces
$\mathrm D(A)$ : Set of normalised density operators on $A$
$\mathrm D_{\le}(A)$ : Set of sub-normalised density operators on $A$
$\operatorname{Pos}(A)$ : Set of positive semidefinite operators on $A$
$X^{-1}$ for $X \in \operatorname{Pos}(A)$ : Generalised inverse, such that $XX^{-1} = X^{-1}X$ is the projector onto the support of $X$
$A_1^n$ : Often used as shorthand for $A_1, \ldots, A_n$
$\log(x)$ : Logarithm of $x$ in base 2

Throughout the paper, we restrict ourselves to finite-dimensional Hilbert spaces. Furthermore, we use the following notation for classical-quantum states $\rho_{XA} \in \mathrm D(X \otimes A)$ with respect to the basis $\{|x\rangle\}_{x\in\mathcal X}$ of the system $X$. For any $x \in \mathcal X$, we let $\rho_{A,x} = \langle x|\rho_{XA}|x\rangle$, so that $\rho_{XA} = \sum_{x\in\mathcal X} |x\rangle\langle x| \otimes \rho_{A,x}$. To refer to the conditional state, we write $\rho_{A|x} = \rho_{A,x}/\operatorname{tr}(\rho_{A,x})$. An event $\Omega \subseteq \mathcal X$ in this paper refers to a subset of $\mathcal X$, and we can similarly define $\rho_{XA|\Omega} = \frac{1}{\rho[\Omega]}\sum_{x\in\Omega} |x\rangle\langle x| \otimes \rho_{A,x}$, where we introduced the notation $\rho[\Omega] = \sum_{x\in\Omega} \operatorname{tr}(\rho_{A,x})$. We also use the usual notation for the partial trace for conditional states, e.g., $\rho_{XA|\Omega} = \operatorname{tr}_B(\rho_{XAB|\Omega})$.
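As a small numerical sanity check of this notation (our own illustration), the following constructs a classical-quantum state with a three-valued $X$ and a qubit $A$, and computes the conditioning on an event $\Omega$:

```python
import numpy as np

# classical distribution p(x) and conditional qubit states rho_{A|x}
p = [0.5, 0.3, 0.2]
plus = np.array([[0.5, 0.5], [0.5, 0.5]])          # |+><+|
states = [np.diag([1.0, 0.0]), plus, np.eye(2) / 2]

def ket_bra(x, dim=3):
    e = np.zeros((dim, 1)); e[x] = 1.0
    return e @ e.T

# rho_XA = sum_x |x><x| ⊗ rho_{A,x}  with  rho_{A,x} = p(x) * rho_{A|x}
rho_XA = sum(np.kron(ket_bra(x), p[x] * states[x]) for x in range(3))

# conditioning on the event Omega = {0, 2}
Omega = [0, 2]
prob_Omega = sum(p[x] for x in Omega)              # rho[Omega]
rho_XA_Omega = sum(np.kron(ket_bra(x), p[x] * states[x])
                   for x in Omega) / prob_Omega
```

The conditioned state is again normalised, with the excluded blocks set to zero.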
For a density operator $\rho_{AB} \in \mathrm D(A \otimes B)$ on a bipartite Hilbert space $A \otimes B$ we define the operator $\mathcal M_{B|A}$ satisfying (4) and such that (5) holds. Specifically, for any map $\mathcal M$ one may define $\mathcal M_{B|A}$ via (6); it is then straightforward to verify the properties above. Conversely, for any $\mathcal M_{B|A}$ such that (4) holds, the map defined by (6) satisfies (4) and hence (5). It is also easy to verify that it is completely positive and trace non-increasing.
We mention here a slight abuse of terminology: for a completely positive map $\mathcal M_{B\leftarrow A}$ from $\mathrm L(A)$ to $\mathrm L(B)$, we often use a shorthand to indicate the systems it acts on and simply say that it maps $A$ to $B$.

Background on quantum Markov chains
The concept of quantum Markov chains will be used throughout the paper, and here we give some relevant basic facts about them. Let $\{a_j\}_{j\in J}$ and $\{c_j\}_{j\in J}$ be families of Hilbert spaces and let $B$ be a Hilbert space such that

$$B = \bigoplus_{j\in J} a_j \otimes c_j \qquad (7)$$

holds. Let us furthermore denote by $V_{\bigoplus_j a_jc_j \leftarrow B}$ the corresponding isomorphism. It is convenient to treat $\bigoplus_j a_j \otimes c_j$ as a subspace of the product $a \otimes c$ of the spaces $a = \bigoplus_{j\in J} a_j$ and $c = \bigoplus_{j\in J} c_j$.
The mapping $V$ may then be viewed as an embedding of $B$ into $a \otimes c$. Given a density operator $\rho_B$, we denote by $\rho_{ac}$ the density operator $V\rho_BV^\dagger$. More generally, for a multipartite density operator $\rho_{AB}$, we write $\rho_{Aac}$ for $V\rho_{AB}V^\dagger$. Furthermore, for any $j \in J$, we denote by $\rho_{Aa_jc_j}$ the projection of $\rho_{Aac}$ onto the subspace defined by $a_j \otimes c_j$. A tripartite density operator $\rho_{ABC}$ is said to obey the Markov chain condition $A \leftrightarrow B \leftrightarrow C$ if there exists a decomposition of $B$ of the form (7) such that

$$\rho_{ABC} = \bigoplus_{j\in J} q_j\, \rho_{Aa_j} \otimes \rho_{c_jC},$$

where $\{q_j\}_{j\in J}$ is a probability distribution and $\{\rho_{Aa_j}\}_{j\in J}$ and $\{\rho_{c_jC}\}_{j\in J}$ are families of density operators [39,28,26]. It follows from this decomposition that a state $\rho_{ABC}$ obeying the Markov chain condition can be reconstructed from $\rho_{AB}$ with a map $\mathcal T_{BC\leftarrow B}$ acting only on $B$ [39]:

$$\rho_{ABC} = (\mathcal I_A \otimes \mathcal T_{BC\leftarrow B})(\rho_{AB}).$$

Another useful characterisation of the Markov chain condition for $\rho_{ABC}$ is given by the entropic equality $I(A:C|B)_\rho = 0$ [39,28,26]. The conditional mutual information is defined as $I(A:C|B)_\rho = H(AB)_\rho + H(BC)_\rho - H(B)_\rho - H(ABC)_\rho$, where $H(\cdot)_\rho$ is the von Neumann entropy.
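The entropic characterisation is easy to test numerically. The sketch below (our own illustration) builds a state of the Markov form with one-dimensional $a_j, c_j$ (i.e., $B$ classical) and checks that $I(A:C|B) = 0$, while a GHZ state, which is not a Markov chain, has $I(A:C|B) = 1$:

```python
import numpy as np

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def cmi(rho, dims):
    """I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for rho on A⊗B⊗C."""
    dA, dB, dC = dims
    R = rho.reshape(dA, dB, dC, dA, dB, dC)
    rho_AB = np.einsum('abcdec->abde', R).reshape(dA * dB, dA * dB)
    rho_BC = np.einsum('abcaef->bcef', R).reshape(dB * dC, dB * dC)
    rho_B = np.einsum('abcaec->be', R)
    return (vn_entropy(rho_AB) + vn_entropy(rho_BC)
            - vn_entropy(rho_B) - vn_entropy(rho))

proj = lambda v: np.outer(v, v.conj())
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

# Markov state: rho_ABC = sum_j q_j rho_{A,j} ⊗ |j><j|_B ⊗ rho_{C,j}
q = [0.3, 0.7]
rho_markov = (q[0] * np.kron(np.kron(proj(ket0), proj(ket0)), np.eye(2) / 2)
              + q[1] * np.kron(np.kron(proj(plus), proj(ket1)), proj(ket1)))

# GHZ state (|000> + |111>)/sqrt(2): not a Markov chain, I(A:C|B) = 1
ghz = np.zeros(8); ghz[0] = ghz[7] = 1 / np.sqrt(2)
rho_ghz = proj(ghz)
```

For the Markov state the $B$ register labels the block $j$, so the decomposition above applies directly.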

Entropic quantities
The formulation of the main claim refers to smooth entropies, which can be defined as follows.
Definition 2.2. For any density operator $\rho_{AB}$ and for $\varepsilon \in [0,1]$, the $\varepsilon$-smooth min- and max-entropies of $A$ conditioned on $B$ are

$$H_{\min}^\varepsilon(A|B)_\rho = \max_{\tilde\rho}\,\max_{\sigma_B}\,\sup\big\{\lambda \in \mathbb R : \tilde\rho_{AB} \le 2^{-\lambda}\,\mathbb 1_A \otimes \sigma_B\big\},$$

$$H_{\max}^\varepsilon(A|B)_\rho = \min_{\tilde\rho}\,\max_{\sigma_B}\,\log\big\|\sqrt{\tilde\rho_{AB}}\,\sqrt{\mathbb 1_A\otimes\sigma_B}\big\|_1^2,$$

respectively, where $\tilde\rho$ is any non-negative operator with trace at most 1 that is $\varepsilon$-close to $\rho$ in terms of the purified distance [55,51], and where $\sigma_B$ is any density operator on $B$.
The proof we present here relies heavily on the sandwiched relative Rényi entropies introduced in [64,37]. These relative entropies can be used to define a conditional entropy.

Definition 2.3. For any density operator $\rho_{AB}$ and for $\alpha \in (0,1) \cup (1,\infty)$, the sandwiched $\alpha$-Rényi entropy of $A$ conditioned on $B$ is defined as

$$H_\alpha(A|B)_\rho = -\frac{1}{\alpha'}\log\Big\|\rho_B^{-\alpha'/2}\,\rho_{AB}\,\rho_B^{-\alpha'/2}\Big\|_\alpha,$$

where $\alpha' = \frac{\alpha-1}{\alpha}$ and where $\|X\|_\alpha = \Big(\operatorname{tr}\big[(X^\dagger X)^{\frac{\alpha}{2}}\big]\Big)^{\frac1\alpha}$. Note that $\alpha'$ is the inverse of the Hölder conjugate of $\alpha$.
We note that, while the function $X \mapsto \|X\|_\alpha$ is a norm for $\alpha \ge 1$, this is not the case for $\alpha < 1$, since it then fails to satisfy the triangle inequality. Some key properties of this function are summarised in Appendix A. Using them, the sandwiched Rényi entropies may be rewritten as $H_\alpha(A|B)_\rho = -\tilde D_\alpha(\rho_{AB}\,\|\,\mathbb 1_A\otimes\rho_B)$, where $\tilde D_\alpha$ denotes the sandwiched relative Rényi entropy. (The purified distance used in Definition 2.2 is defined as $P(\rho,\tilde\rho) = \sqrt{1 - \big(\operatorname{tr}\big|\sqrt{\rho}\sqrt{\tilde\rho}\big|\big)^2}$ whenever either $\rho$ or $\tilde\rho$ is normalised.) It turns out that there are multiple ways of defining conditional entropies from relative entropies; we refer to Appendix B for more details. Another variant that will be needed in this work is the following.

Definition 2.4. For any density operator $\rho_{AB}$ and for $\alpha \in (0,1) \cup (1,\infty)$, we define

$$H_\alpha^\uparrow(A|B)_\rho = -\inf_{\sigma_B}\,\tilde D_\alpha(\rho_{AB}\,\|\,\mathbb 1_A\otimes\sigma_B),$$

where the infimum is over all sub-normalised density operators on $B$.
Other relevant facts about the sandwiched Rényi entropy and the corresponding notion of relative entropy can be found in Appendix B.
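These quantities are straightforward to evaluate numerically for small systems. The sketch below (our own illustration) computes the sandwiched divergence and the conditional entropy $H_\alpha(A|B)_\rho = -\tilde D_\alpha(\rho_{AB}\|\mathbb 1_A\otimes\rho_B)$ for a two-qubit state, and checks that it approaches the conditional von Neumann entropy as $\alpha \to 1$ and is non-increasing in $\alpha$:

```python
import numpy as np

def mpow(rho, p, tol=1e-12):
    """Matrix power on the support (generalised inverse for negative p)."""
    w, v = np.linalg.eigh(rho)
    wp = np.array([x ** p if x > tol else 0.0 for x in w])
    return (v * wp) @ v.conj().T

def sandwiched_D(rho, sigma, alpha):
    """Sandwiched relative Rényi entropy, base 2."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    w = np.linalg.eigvalsh(s @ rho @ s)
    w = w[w > 1e-15]
    return float(np.log2((w ** alpha).sum()) / (alpha - 1))

def H_alpha(rho_AB, rho_B, dA, alpha):
    return -sandwiched_D(rho_AB, np.kron(np.eye(dA), rho_B), alpha)

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

# example: noisy maximally entangled two-qubit state
phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
rho_AB = 0.7 * np.outer(phi, phi) + 0.3 * np.eye(4) / 4
rho_B = np.eye(2) / 2                            # reduced state on B
H_vn = vn_entropy(rho_AB) - vn_entropy(rho_B)    # H(A|B) = H(AB) - H(B)
```

The monotonicity of $H_\alpha$ in $\alpha$ cited later in the proof of the main theorem is visible here as well.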

Chain rule for Rényi entropies
As explained in the introduction, our main result can be regarded as a generalisation of the Quantum Asymptotic Equipartition Property [54], corresponding to (2). The approach used in the proof of the latter is to bound both the smooth min-entropy and the von Neumann entropy by Rényi entropies with an appropriate parameter $\alpha$. The IID assumption is then used to decompose the Rényi entropy into a sum of $n$ terms. However, since our main claim, Eq. (3), is supposed to hold for general non-IID states, we do not have this luxury, and we must decompose the Rényi entropy into $n$ terms by other means. The tool we will use for this purpose is a chain rule for Rényi entropies, which we present as a separate theorem (Theorem 3.2). We start by stating a more general version that will be useful in the proof of the main theorem.
Proof. When $\alpha = 1$, this equality follows directly from the definition of the entropies. To prove the equality for $\alpha \in (0,1)\cup(1,\infty)$ we consider a purification $|\psi\rangle_{A_1A_2BE}$ of $\rho_{A_1A_2B}$. Using Lemma B.2 and setting $\alpha' = \frac{\alpha-1}{\alpha}$, we reduce the statement to the Markov chain condition stated in (9). Namely, there exists a decomposition $\bigoplus_j a_j \otimes b_j$ of the system $B_1$ such that (15) holds, where $\{q_j\}$ is a probability distribution and where $\{\rho_{Aa_j}\}$ and $\{\rho_{b_jB}\}$ are families of density operators. To prove (14), it only remains to show that the state $\nu_{A_1A_2B_1B_2}$ defined in (13) satisfies $\nu_{A_2B_2|A_1B_1} = \rho_{A_2B_2|A_1B_1}$. For that, we again use the fact that $\rho_{A_1B_1B_2}$ forms a Markov chain. As we will be using this statement later in other contexts, we state it as a claim.
Claim 3.4. Let $\rho_{A_1B_1A_2B_2}$ be a density operator such that the Markov chain condition $A_1 \leftrightarrow B_1 \leftrightarrow B_2$ holds. Then the state $\nu_{A_1A_2B_1B_2}$ defined in (13) satisfies $\nu_{A_2B_2|A_1B_1} = \rho_{A_2B_2|A_1B_1}$.

Given the Markov chain condition, the decomposition (15) allows us to write the conditional states in block form. It follows that the two sides agree on each block, where $\tilde\rho^0_{A_1a_j}$ denotes the projector onto the support of $\tilde\rho_{A_1a_j}$. For the first equality we used the fact that the support of the operator $\big(\tilde\rho_{A_1a_j}^{1/2}\,\tilde\rho_{a_j}^{-\alpha'}\,\tilde\rho_{A_1a_j}^{1/2}\big)^\alpha$ is the same as the support of $\tilde\rho_{A_1a_j}$. As a result, we find the desired identity. This concludes the proof of Claim 3.4 and gives the desired statement.
The following simple corollary expresses the above chain rules in terms of quantum channels, i.e., trace preserving completely positive (TPCP) maps, rather than conditional states.
where the supremum and infimum range over density operators $\omega_{RA_1B_1}$ on $R \otimes A_1 \otimes B_1$. Moreover, if $\rho^0_{RA_1B_1}$ is pure then we can optimise over pure states $\omega_{RA_1B_1}$.

Proof. We apply Theorem 3.3 to $\rho_{A_1B_1A_2B_2}$. It suffices to show that the optimisation over states $\nu_{A_1B_1A_2B_2}$ satisfying $\nu_{A_2B_2|A_1B_1} = \rho_{A_2B_2|A_1B_1}$ is contained in the optimisation over states $\omega_{RA_1B_1}$. For this, let $\nu_{A_1B_1A_2B_2}$ be any density operator satisfying $\nu_{A_2B_2|A_1B_1} = \rho_{A_2B_2|A_1B_1}$. Choosing $\omega_{RA_1B_1}$ appropriately, the claim follows.

Entropy accumulation
This section is devoted to the main result on entropy accumulation. The statement is formulated in its fully general form as Theorem 4.4 and presented in a slightly simplified version as Corollary 4.8. We also give a formulation that corresponds to statement (3) of the introduction (Corollary 4.9). Finally, we show how the Quantum Asymptotic Equipartition Property follows as a special case (cf. Corollary 4.10).
We consider a sequence of TPCP maps $\mathcal M_1, \ldots, \mathcal M_n$, where $\mathcal M_i$ maps $R_{i-1}$ to $R_i \otimes A_i \otimes B_i$, $A_i$ is finite-dimensional, and $X_i$ represents a classical value from an alphabet $\mathcal X$ that is determined by $A_i$ and $B_i$ together. More precisely, we require that $X_i$ be obtained from $A_iB_i$ by a TPCP map $\mathcal T_i$ of the form

$$\mathcal T_i(W_{A_iB_i}) = \sum_{y,z}\big(\Pi_{A_i,y}\otimes\Pi_{B_i,z}\big)\,W_{A_iB_i}\,\big(\Pi_{A_i,y}\otimes\Pi_{B_i,z}\big)\otimes|t(y,z)\rangle\langle t(y,z)|_{X_i},$$

where $\{\Pi_{A_i,y}\}$ and $\{\Pi_{B_i,z}\}$ are families of mutually orthogonal projectors on $A_i$ and $B_i$, and where $t: \mathcal Y \times \mathcal Z \to \mathcal X$ is a deterministic function (cf. Figs. 1 and 2). Special cases of interest are when $X_i$ is trivial and $\mathcal T_i$ is the identity map, and when $X_i = t(Y_i, Z_i)$, where $Y_i$ and $Z_i$ are classical parts of $A_i$ and $B_i$, respectively. Note that, by construction, $X_i$ is a deterministic function of $A_i$ and $B_i$: the maps $\mathcal T_i$ measure $A_iB_i$ with the projectors above and record the outcome $t(y,z)$ in $X_i$, without disturbing states that are classical on these registers. The entropy accumulation theorem stated below will hold for states of the form

$$\rho_{A_1^nB_1^nX_1^nE} = \big(\mathcal T_n\circ\mathcal M_n\circ\cdots\circ\mathcal T_1\circ\mathcal M_1\otimes\mathcal I_E\big)\big(\rho^0_{R_0E}\big), \qquad (26)$$

where $\rho^0_{R_0E} \in \mathrm D(R_0\otimes E)$ is a density operator on $R_0$ and an arbitrary system $E$. In addition, we require that the Markov conditions

$$A_1^{i-1} \leftrightarrow B_1^{i-1}E \leftrightarrow B_i \qquad (27)$$

be satisfied for all $i \in \{1, \ldots, n\}$.
Let $\mathbb P$ be the set of probability distributions on the alphabet $\mathcal X$ of $X_i$, and let $R$ be a system isomorphic to $R_{i-1}$. For any $q \in \mathbb P$ we define the set of states

$$\Sigma_i(q) = \Big\{\nu_{X_iA_iB_iR_iR} = (\mathcal T_i\circ\mathcal M_i\otimes\mathcal I_R)(\omega_{R_{i-1}R}) \;:\; \omega_{R_{i-1}R}\in\mathrm D(R_{i-1}\otimes R) \text{ and } \nu_{X_i} = q\Big\}, \qquad (28)$$

where $\nu_{X_i}$ denotes the probability distribution over $\mathcal X$ with the probabilities given by $\langle x|\nu_{X_i}|x\rangle$. A real function $f$ on $\mathbb P$ is called a min-tradeoff function for $\mathcal M_1,\ldots,\mathcal M_n$ if $f(q) \le \inf_{\nu\in\Sigma_i(q)} H(A_i|B_iR)_\nu$ for all $i$ and all $q \in \mathbb P$ (with the convention that the infimum over the empty set is $+\infty$); max-tradeoff functions are defined analogously, with the supremum in place of the infimum and the inequality reversed (cf. Definition 4.1).

Remark 4.2.
To determine the infimum $\inf_{\nu\in\Sigma_i(q)} H(A_i|B_iR)_\nu$, we may assume that $\omega_{R_{i-1}R}$ in the definition of $\Sigma_i(q)$ is pure: including a purifying system in $R$ cannot increase $H(A_i|B_iR)$, by strong subadditivity. Similarly, to calculate the supremum $\sup_{\nu\in\Sigma_i(q)} H(A_i|B_iR)_\nu$, we may assume that $\omega_{R_{i-1}R}$ is a product state, or equivalently that $R$ is trivial. This justifies the assumption, made in the definition of $\Sigma_i(q)$, that $R$ is isomorphic to $R_{i-1}$.

Remark 4.3. As we will see in the proof below, one can also impose in the definition of $\Sigma_i(q)$ the constraint that the system $R$ be isomorphic to $A_1^{i-1}B_1^{i-1}E$. Furthermore, if a part of the latter is classical in $\rho$, one can restrict $\Sigma_i(q)$ to states satisfying this property.
In the following, we denote by $\nabla f$ the gradient of a function $f$. (Note that in Theorem 4.4 and Proposition 4.5, $f$ is an affine function, so that $\nabla f$ is a constant.) We write $\operatorname{freq}(X_1^n)$ for the distribution on $\mathcal X$ defined by $\operatorname{freq}(X_1^n)(x) = \frac{|\{i\in\{1,\ldots,n\} : X_i = x\}|}{n}$. We also recall that in this context an event $\Omega$ is defined by a subset of $\mathcal X^n$, and we write $\rho[\Omega] = \sum_{x_1^n\in\Omega}\operatorname{tr}\big(\rho_{A_1^nB_1^nE,\,x_1^n}\big)$.

Theorem 4.4. Let $\mathcal M_1,\ldots,\mathcal M_n$ and $\rho_{A_1^nB_1^nX_1^nE}$ be such that (26) and the Markov conditions (27) hold, let $h \in \mathbb R$, let $f$ be an affine min-tradeoff function for $\mathcal M_1,\ldots,\mathcal M_n$, and let $\varepsilon \in (0,1)$. Then, for any event $\Omega \subseteq \mathcal X^n$ that implies $f(\operatorname{freq}(X_1^n)) \ge h$,

$$H_{\min}^\varepsilon(A_1^n|B_1^nE)_{\rho_{|\Omega}} > nh - c\sqrt{n} \qquad (29)$$

holds, where $c = 2\big(\log(1+2d_A)+\lceil\|\nabla f\|_\infty\rceil\big)\sqrt{1-2\log(\varepsilon\,\rho[\Omega])}$ and $d_A$ denotes the maximal dimension of the systems $A_i$. The analogous statement

$$H_{\max}^\varepsilon(A_1^n|B_1^nE)_{\rho_{|\Omega}} < nh + c\sqrt{n} \qquad (30)$$

holds if $f$ is replaced by an affine max-tradeoff function and if $\Omega$ implies $f(\operatorname{freq}(X_1^n)) \le h$.
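The bookkeeping around $\operatorname{freq}$ and affine tradeoff functions is elementary; the following sketch (our own illustration, with made-up values on the vertices of $\mathbb P$) shows how one would check whether an event "implies $f(\operatorname{freq}(X_1^n)) \ge h$":

```python
from collections import Counter
from math import isclose

alphabet = ("pass", "fail", "skip")

def freq(xs):
    n = len(xs)
    c = Counter(xs)
    return {x: c[x] / n for x in alphabet}

# an affine function on distributions is fixed by its values f(delta_x)
# on the vertices of the simplex (illustrative numbers):
f_vertex = {"pass": 0.9, "fail": -1.0, "skip": 0.0}

def f(q):
    return sum(q[x] * f_vertex[x] for x in alphabet)

def omega_implies(samples, h):
    """Check f(freq(x_1^n)) >= h for every observed string x_1^n in Omega."""
    return all(f(freq(xs)) >= h for xs in samples)

xs = ["pass"] * 90 + ["fail"] * 2 + ["skip"] * 8
# freq(xs) = {pass: 0.9, fail: 0.02, skip: 0.08}, so f(freq(xs)) = 0.79
```

Here $\|\nabla f\|_\infty$ is simply the largest absolute vertex value, $1.0$ in this toy example.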
Before proceeding to the proof, some remarks are in order. First, the Markov chain assumption on the state is important, as argued in Appendix C. Second, the system $E$ could have been included in $B_1$, but for the applications we consider it is clearer to keep a separate system $E$ that is not affected by the processes $\mathcal M_1,\ldots,\mathcal M_n$. Third, concerning the second-order term, it is possible to replace $d_A$ with appropriate entropic quantities, as in the Quantum Asymptotic Equipartition Property [54], which could be useful when the systems $A_i$ are infinite-dimensional. The dependence of the second-order term on the state and on the tradeoff function $f$ is studied in more detail in the subsequent work [19]. Finally, we note that the constraint that the tradeoff function be affine is not a severe restriction: given a convex min-tradeoff function, one can always choose a tangent hyperplane at a point of interest as an affine lower bound. This is illustrated in Corollary 4.7.
To prove the theorem, we will first show the following proposition, which is essentially a Rényi version of entropy accumulation; we then show how Theorem 4.4 follows from it.

Proposition 4.5. Let $\mathcal M_1,\ldots,\mathcal M_n$ and $\rho_{A_1^nB_1^nX_1^nE}$ be such that (26) and the Markov conditions (27) hold, and let $f$ be an affine min-tradeoff function. Then, for any event $\Omega$ which implies $f(\operatorname{freq}(X_1^n)) \ge h$,

$$H_\alpha^\uparrow(A_1^n|B_1^nE)_{\rho_{|\Omega}} > nh - n\,\frac{\alpha-1}{4}\,V^2 - \frac{\alpha}{\alpha-1}\log\frac{1}{\rho[\Omega]} \qquad (31)$$

holds for $\alpha$ satisfying $1 < \alpha < 1 + 2/V$, where $V = 2\log(1+2d_A) + 2\lceil\|\nabla f\|_\infty\rceil$. The analogous statement (32), with the directions of the inequalities reversed, holds if $f$ is replaced by an affine max-tradeoff function and if $\Omega$ implies $f(\operatorname{freq}(X_1^n)) \le h$.
Proof. We focus on proving the first inequality (31); the proof of the second inequality (32) is similar, and we only point out the main differences in the course of the proof. (Recall that an event $\Omega$ implies $f(\operatorname{freq}(X_1^n)) \ge h$ if $f(\operatorname{freq}(x_1^n)) \ge h$ for every $x_1^n \in \Omega$.) The first step of the proof is to construct a state that will allow us to lower-bound $H_\alpha^\uparrow(A_1^n|B_1^nE)_{\rho_{|\Omega}}$ using the chain rule of Theorem 3.3, while ensuring that the tradeoff function is taken into account. Let $[g_{\min}, g_{\max}]$ be the smallest real interval that contains the
range $f(\mathbb P)$ of $f$, and set $\bar g = \frac12(g_{\min}+g_{\max})$. Furthermore, for every $i$, let $\mathcal D_i$ be a map that prepares, controlled on the value $x$ of $X_i$, a state $\tau(x)$ on $D_i\bar D_i$, where $\tau(x)$ is a mixture between a maximally entangled state on $D_i\otimes\bar D_i$ and a fully mixed state, such that the marginal on $\bar D_i$ is uniform and such that $H(D_i|\bar D_i)_{\tau(x)} = \bar g - f(\delta_x)$ (here $\delta_x$ stands for the distribution with all the weight on element $x$). To ensure that this is possible, we need to choose $\dim D_i$ large enough, so we need to bound how large $\bar g - f(\delta_x)$ can be, positive or negative. By the definition of $\bar g$, $|\bar g - f(\delta_x)|$ cannot be larger than $\frac12|g_{\max}-g_{\min}| \le \|\nabla f\|_\infty$. We therefore take the dimension of the spaces $D_i$ to be equal to $2^{\lceil\|\nabla f\|_\infty\rceil}$. For later use, we note that we have (33). Now, let $\bar\rho$ be the state obtained from $\rho$ by appending, for each $i$, the systems $D_i\bar D_i$ in the state $\tau(x_i)$, controlled on $X_i$; note that $\bar\rho$ has the same marginal on $X_1^nA_1^nB_1^nE$ as $\rho$. One can think of the $D$ systems as an "entropy price" that encodes the tradeoff function: with these systems in place, the output entropy includes an extra term that allows the tradeoff function to be taken into account in the optimisation arising in Theorem 3.3. This is formalised by the inequalities (41) and (42), which are proven in Claim 4.6. To show the desired inequality (31), it now suffices to prove that $H_\alpha(A_1^nD_1^n|B_1^nE\bar D_1^n)_{\bar\rho}$ is lower bounded by (roughly) $n\bar g$. To do that, we are now going to use the chain rule for Rényi entropies in the form of Corollary 3.5 $n$ times on the state $\bar\rho$, with the appropriate substitutions at step $i$. To establish the Markov chain condition, we compute the conditional mutual information.
Using the chain rule for the conditional mutual information, we obtain a decomposition into two terms, and we first show that the second term is zero. By construction, $\bar D_i$ is prepared independently of all the other systems, which implies that $\bar D_1^{i-1}$ can be removed from the conditioning system without changing the value. Then, using the chain rule together with the non-negativity of the conditional mutual information, this shows that the quantity is bounded by $I(A_1^{i-1}:B_i\,|\,B_1^{i-1}E)_\rho$. But the assumed Markov condition on $\rho_{A_1^nB_1^nE}$ implies that this quantity is zero, which establishes the required condition to apply Corollary 3.5. We thus obtain (37), where we have invoked Lemma B.9 in the second inequality and (33) in the last. Note that the restriction of this lemma, namely that $\alpha$ satisfy $1 < \alpha < 1 + 1/\log(1+2d_Ad_D)$, is implied by our assumption that $\alpha < 1 + 2/V$. The infimum is taken over all states $\omega_{R_{i-1}R}$, where the system $R$ is isomorphic to $R_{i-1}$; this condition can be further strengthened by redoing the above argument with Theorem 3.2 instead of Corollary 3.5, which shows that the system $R$ can be taken to be isomorphic to $A_1^{i-1}B_1^{i-1}E$. Considering the right hand side of expression (40), we get, for any such state $\omega_{R_{i-1}R}$, a chain of (in)equalities in which the third equality comes from the fact that $X_i$ is determined by $A_iB_i$, and the first inequality follows from the monotonicity of the Rényi entropies in $\alpha$ [8,37]. The last equality holds because $f$ is affine, and the final inequality because $f$ is a min-tradeoff function. Putting everything together, Eq. (37) becomes the claimed bound.
This concludes the proof of the first inequality (31) of Proposition 4.5.
In order to show the second inequality (32), the same argument as before yields the analogous bound, where the supremum is over all states $\omega_{R_{i-1}R}$ with $R$ constrained as described in Remark 4.3. For any such state and a max-tradeoff function $f$, we obtain the corresponding upper bounds; it then suffices to combine these inequalities with inequality (38).
We now prove the claim used in the preceding proof.
Proof. We focus on proving inequality (41). The first step is to show that, as $X_1^n$ is a deterministic function of $A_1^nB_1^n$, the equality (43) holds. In order to do that, observe that for any $x_1^n \in \mathcal X^n$ the corresponding conditional states of $\bar\rho$ coincide; this implies the analogous relation for every $x_1^n$. By taking the sum over $x_1^n \in \Omega$ and then normalising by $\rho[\Omega]$, we can apply Lemma B.7 and prove the equality (43).
Let now $\sigma_{B_1^nE\bar D_1^n}$ be a state achieving the optimisation defining the entropy. Let furthermore $\mathcal S = \mathcal S_{D\bar D}$ be the TPCP map that applies a random (according to the Haar measure) unitary to $D$ and its conjugate to $\bar D$ (in such a way that the maximally entangled state on $D\bar D$ used to define $\tau(x)$ is preserved). It is then easy to see that the map $\mathcal S^{\otimes n}$ applied to the $n$ pairs $D_i\bar D_i$ leaves $\bar\rho_{|\Omega}$ invariant. Hence, by the data processing inequality, we obtain (44), where $\nu$ is a state defined by (45). We now use properties of $\bar\rho_{|\Omega}$ and $\bar\sigma$ to simplify the expression of $\nu$. Observing that (46) holds, we can rewrite $\nu$ accordingly; in addition, $\bar\rho_{|\Omega}$ has a product form on the $D$ systems. Returning to inequality (44), it is a direct consequence of the definition of $\tau(x)$ that the first term evaluates as claimed, where we have used that $f$ is an affine function. Using Lemma B.3 and (46) we can bound the second term on the right hand side of (47). Inserting this in (47) gives the desired bound. This concludes the proof of inequality (41). For the proof of inequality (42), we can follow similar steps. Finally, we prove Theorem 4.4 using Proposition 4.5.
Proof of Theorem 4.4. The first step is to use Lemma B.10 to lower-bound the smooth min-entropy by a Rényi entropy, as in (48). Then Proposition 4.5 yields (49), where we have used the fact that we are constrained to choose $\alpha \le 1 + 2/V$ in the last inequality. We now choose $\alpha = 1 + \frac{2}{V}\sqrt{\frac{1-2\log(\varepsilon\,\rho[\Omega])}{n}}$ and note that, as long as

$$1 - 2\log(\varepsilon\,\rho[\Omega]) < \frac{nV^2}{4} \qquad (50)$$

holds, the value $\alpha$ is strictly smaller than $1+2/V$ and therefore within the required bounds. Note also that if (50) does not hold then the term $c\sqrt n$ in the claim (29) is at least $nV/2 \ge n\log(1+2d_A)$, whereas the min-entropy is always at least $-n\log d_A$ and $nh$ is at most $n\log d_A$, which means that the claim is trivial. Finally, inserting (49) into the above yields the claim, as advertised. Once again, the max-entropy statement (30) holds by switching the direction of the inequalities, flipping the appropriate signs, and replacing every occurrence of $H_\alpha^\uparrow$ by $H_{1/\alpha}$.

It might seem restrictive to assume that the tradeoff function is affine. We next show that we may take a general convex function, provided the event $\Omega$ can be described as follows: $x^n \in \Omega$ if and only if $\operatorname{freq}(x^n) \in \tilde\Omega$, where $\tilde\Omega$ is a convex subset of $\mathbb P$.

Corollary 4.7. Let $\mathcal M_1,\ldots,\mathcal M_n$ and $\rho_{A_1^nB_1^nX_1^nE}$ be such that (26) and the Markov conditions (27) hold, let $h \in \mathbb R$, $\varepsilon \in (0,1)$, let $\tilde\Omega \subseteq \mathbb P$ be convex, and define the corresponding event $\Omega \subseteq \mathcal X^n$ by $x_1^n \in \Omega \Leftrightarrow \operatorname{freq}(x_1^n) \in \tilde\Omega$. Then, if $f$ is a differentiable and convex min-tradeoff function for $\mathcal M_1,\ldots,\mathcal M_n$ satisfying $f(q) \ge h$ for all $q \in \tilde\Omega$, the bound (29) holds. Similarly, if $f$ is a differentiable and concave max-tradeoff function for $\mathcal M_1,\ldots,\mathcal M_n$ satisfying $f(q) \le h$ for all $q \in \tilde\Omega$, the bound (30) holds.

Proof. Let us denote by $\operatorname{cl}(\tilde\Omega)$ the closure of the set $\tilde\Omega$. As $f$ is continuous on the compact set $\operatorname{cl}(\tilde\Omega)$ (it is even assumed to be differentiable on all of $\mathbb P$), we have $\min_{q\in\operatorname{cl}(\tilde\Omega)} f(q) = f(q_0)$ for some $q_0 \in \operatorname{cl}(\tilde\Omega)$. By continuity of $f$ and by definition of $h$, we have $f(q_0) \ge h$. Now consider the affine function $g(q) = (\nabla f)_{q_0}\cdot(q-q_0) + f(q_0)$. By convexity of $f$, we have $g(q) \le f(q)$ for all $q \in \mathbb P$, and thus $g$ is a min-tradeoff function.
In addition, as $\operatorname{cl}(\tilde\Omega)$ is convex, we can apply the first-order optimality conditions and get that $(\nabla f)_{q_0}\cdot(q-q_0) \ge 0$ for all $q \in \operatorname{cl}(\tilde\Omega)$. As a result, for all $q \in \operatorname{cl}(\tilde\Omega)$, we have $g(q) \ge f(q_0) \ge h$. This implies that if $x_1^n \in \Omega$, then $g(\operatorname{freq}(x_1^n)) \ge h$. We can then apply Theorem 4.4 with the affine tradeoff function $g$ and get the desired result, as $\|\nabla g\|_\infty \le \|\nabla f\|_\infty$.
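The tangent construction in this proof is purely convex-analytic and easy to visualise numerically. The sketch below (our own illustration, with an arbitrary convex function standing in for a min-tradeoff function on a binary alphabet) builds the affine lower bound $g$ at a point $q_0$ and checks $g \le f$ on the simplex:

```python
import numpy as np

# hypothetical convex "min-tradeoff" function on q = (q0, q1), q0 + q1 = 1
def f(q):
    return q[0] ** 2 + 0.5 * q[1] ** 2

def grad_f(q):
    return np.array([2.0 * q[0], q[1]])

q_star = np.array([0.4, 0.6])

def g(q):
    # tangent hyperplane of f at q_star: affine, and g <= f by convexity
    q = np.asarray(q, dtype=float)
    return float(grad_f(q_star) @ (q - q_star) + f(q_star))

# the affine bound matches f at q_star and lies below it elsewhere
grid = [(t, 1.0 - t) for t in np.linspace(0.0, 1.0, 101)]
```

Since the Hessian of this $f$ is positive semidefinite, the tangent lies below the graph everywhere, which is exactly what makes $g$ a valid affine min-tradeoff function.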
The proof for $H_{\max}^\varepsilon$ is analogous.

One natural choice for the event $\Omega$ is that the empirical distribution $\operatorname{freq}(X_1^n)$ takes a particular value $q$. This yields the following special case of Corollary 4.7.
Corollary 4.8. Under the conditions of Corollary 4.7, for any $q \in \mathbb P$ with $\rho[q] > 0$,

$$H_{\min}^\varepsilon(A_1^n|B_1^nE)_{\rho_{|q}} > n f(q) - c\sqrt{n},$$

where $\rho_{|q}$ denotes the state $\rho$ conditioned on the event that $\operatorname{freq}(X_1^n) = q$, and $\rho[q]$ the probability of this event. Note that an analogous statement of course holds for the max-entropy, replacing $f$ by a concave max-tradeoff function and changing the direction of the inequality accordingly.
The following corollary specialises the above to the formulation (3) of the introduction, in which no statistical test is performed, i.e., the systems $X_i$ are trivial. We provide the statement for the case of the lower boundary.
Proof. Note that the quantity $H_{\min}^\varepsilon(A_1^n|B_1^nE)_\rho$ only depends on the marginal of the state $\rho$ on $A_1^nB_1^nE$. Thus, we can modify the maps $\mathcal M_i$ in any way that does not affect the reduced state $\rho_{A_1^nB_1^nE}$ before applying Corollary 4.8. In particular, we change $\mathcal M_i$ so that the original value of $X_i$ is disregarded and replaced with the constant value $X_i = i$. The values $X_1,\ldots,X_n$ can then be regarded as random variables with alphabet $\mathcal X = \{1,\ldots,n\}$. We define the real affine function $f$ on $\mathbb P$ as

$$f(q) = \sum_{i=1}^n q(i)\,\inf_{\nu\in\Sigma_i(\delta_i)} H(A_i|B_iR)_\nu.$$

Note that for any $i \in \{1,\ldots,n\}$ and any $q \in \mathbb P$, we have either $q(i) < 1$, in which case $\Sigma_i(q) = \emptyset$ (we use the notation in (28)) and the min-tradeoff condition is trivial, or $q = \delta_i$, in which case $f(\delta_i)$ equals the infimum appearing in the min-tradeoff condition. As a result, $f$ is a min-tradeoff function for all $\mathcal M_i$ with $i \in \{1,\ldots,n\}$. We now fix $q \in \mathbb P$ such that $q(1) = \cdots = q(n) = \frac1n$, in which case the event $\operatorname{freq}(X_1^n) = q$ occurs with certainty. Because $\|\nabla f\|_\infty \le \log d_A$, which implies $\lceil\|\nabla f\|_\infty\rceil \le \log(1+2d_A)$, the claim follows immediately from Corollary 4.8.
As indicated in the introduction, in the special case where the individual pairs (A_i, B_i) are independent and identically distributed (IID), the entropy accumulation theorem corresponds to the Quantum Asymptotic Equipartition Property [54]. We can therefore formulate the latter as a corollary of Theorem 4.4.

Corollary 4.10. For any bipartite state ν_AB, any n ∈ N, and any ε ∈ (0, 1),

Proof. For any i = 1, . . . , n, let M_i be the TPCP map from R to XABR which sets AB to the state ν_AB, where X and R are trivial (one-dimensional) systems. The concatenation of these maps thus generates the state ρ_{A_1^n B_1^n} = ν_AB^⊗n. The claim is then obtained from Theorem 4.4 with the tradeoff function f being a constant equal to h = H(A|B)_ν and with Ω the certain event.
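In the IID case the accumulation rate per round is simply the conditional von Neumann entropy H(A|B)_ν = H(AB)_ν − H(B)_ν of a single pair. The following sketch, our own illustration and not part of the text, evaluates this quantity numerically for two extreme two-qubit states:

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -tr[rho log2 rho], in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def conditional_entropy(rho_ab, dA, dB):
    """H(A|B) = H(AB) - H(B) for a density operator on C^dA ⊗ C^dB."""
    rho = rho_ab.reshape(dA, dB, dA, dB)
    rho_b = np.trace(rho, axis1=0, axis2=2)   # partial trace over A
    return von_neumann_entropy(rho_ab) - von_neumann_entropy(rho_b)

# For nu_AB^⊗n the accumulated rate is n * H(A|B)_nu.  Two extremes:
phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
h_ent = conditional_entropy(np.outer(phi, phi), 2, 2)       # maximally entangled
h_cc  = conditional_entropy(np.diag([.5, 0, 0, .5]), 2, 2)  # classically correlated
```

For the maximally entangled pair H(A|B) = −1, while for the classically correlated pair H(A|B) = 0, illustrating that the rate h in the corollary can be negative for quantum side information.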

Applications
Entropy is a rather general notion and, accordingly, entropy accumulation has applications in various areas of physics, information theory, and computer science. An example from physics is the phenomenon of thermalisation. It is known that a system can only thermalise if its smooth min-entropy is sufficiently large [18]. To illustrate how Theorem 4.4 could give an estimate of this entropy, consider a system of interest (e.g., a cup of coffee) which is in contact with a large environment (the air around it). Suppose that, for an appropriately chosen discretisation of the evolution, the system interacts at each time step with a different part of the environment (e.g., with different air molecules bouncing off the coffee cup). Theorem 4.4 then provides a bound on the total entropy that is transferred to the environment in terms of the von Neumann entropy transferred in each time step. Because the joint time evolution of system and environment is unitary, this entropy flow to the environment can be expressed in terms of the entropy change of the system itself. The argument would therefore prove that the total entropy acquired by the system over many time steps is bounded by the sum of the entropies produced in each individual time step.
Another area where the notion of entropy plays a crucial role is quantum cryptography. Many proofs of security of cryptographic protocols involve lower-bounding the uncertainty that a dishonest adversary has about some system of interest. The state-of-the-art is to derive such bounds using a combination of de Finetti-type theorems as well as the Quantum Asymptotic Equipartition Property [41,42,15,4]. However, the use of de Finetti theorems comes with various disadvantages. Firstly, they are only applicable under certain assumptions on the symmetry of the protocols. Secondly, they introduce additional error terms that can be large in the practically relevant finite-size regime [47]. Finally, it is not known how to apply de Finetti theorems in a device-independent scenario (see [21] for an overview and references on device-independent cryptography). These problems can all be circumvented by the use of entropy accumulation, as demonstrated in [5] for the case of device-independent quantum key distribution and randomness expansion. The resulting security statements are valid against general attacks and essentially optimal in the finite-size regime.
In the remainder of this section, we illustrate the use of entropy accumulation with two concrete examples. The first is a security proof for a basic quantum key distribution protocol. The second is a novel derivation of an upper bound on the fidelity of fully quantum random access codes.

Sample application: Security of quantum key distribution
A Quantum Key Distribution (QKD) protocol enables two parties, Alice and Bob, to establish a common secret key, i.e., a string of random bits unknown to a potential eavesdropper, Eve. The setting is such that Alice and Bob can communicate over a quantum channel, which may however be fully controlled by Eve. In addition, Alice and Bob have a classical communication link which is assumed to be authenticated, i.e., Eve may read but cannot alter the classical messages exchanged between Alice and Bob. The protocol is said to be secure against general attacks if any attack by Eve is either detected (in which case the protocol aborts) or does not compromise the secrecy of the final key. Here, we will show that our main theorem can be directly applied to show security against general attacks for a fairly standard QKD protocol. As a bonus, our proof still holds even if we do not make any assumptions about Bob's measurement device: the POVM applied by Bob at every step of the protocol can be arbitrary, and may vary from one step to the next (thereby achieving one-sided measurement device independence as in [58], but without the restriction to memoryless devices; see also [56]). In fact, as shown in [5], the entropy accumulation theorem can be used to prove the security of fully device-independent quantum key distribution.
For concreteness, we consider here a variant of the E91 QKD protocol [22] (and note that any security proof for this protocol also implies security of the BB84 protocol [10,11]). The protocol consists of a sequence of instructions for Alice and Bob, as described in the box below. These depend on certain parameters, including the number, n, of qubits that need to be transmitted over the quantum channel, the maximum tolerated noise level, e, of this channel, as well as the key rate, r, which is defined as the number of final key bits divided by n. In the first protocol step, Alice and Bob need to measure their qubits at random in one of two mutually unbiased bases, which we term the computational and the diagonal basis. These are chosen with probability 1 − µ and µ, respectively, for some µ ∈ (0, 1). The protocol also invokes an error correction scheme, termed EC, which allows Bob to infer the measurement outcomes obtained by Alice for the set of indices S where the basis choices of Alice and Bob were the same. Note that if the protocol were implemented without any noise, then Bob's outcomes would match Alice's outcomes exactly on the indices S and no error correction would be required. However, in the presence of noise, such an error correction step is needed. For this, Alice needs to send classical error correcting information to Bob, whose maximum relative length is characterised by another parameter, ϑ_EC. We assume that EC is reliable. This means that, except with negligible probability, Bob either obtains a correct copy of Alice's string or he is notified that the string cannot be inferred.

The E91 Quantum Key Distribution Protocol

Protocol parameters:
n ∈ N : number of uses of the qubit channel
µ ∈ (0, 1) : probability for measurements in the diagonal basis
e ∈ (0, …) : maximum tolerated noise level

3. Parameter estimation: Bob counts the number of indices i ∈ S for which B̂_i = 1 and Ā_i ≠ Â_i. If this number is larger than eµ²n, then the protocol is aborted.

Privacy amplification:
Alice chooses a function F at random from a two-universal set of hash functions [63] mapping |S| bits to ⌊rn⌋ bits and announces F to Bob. Both Alice and Bob compute the final key as F(A_S) and F(Â_S), respectively.
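To make the protocol steps concrete, here is a toy classical simulation of sifting, parameter estimation, and privacy amplification. It is a sketch only: the quantum channel is replaced by independent bit flips, the numerical parameters are chosen arbitrarily, and the Toeplitz-matrix construction is one standard example of a two-universal hash family (the particular set of hash functions from [63] is not specified here):

```python
import numpy as np

rng = np.random.default_rng(7)

def toeplitz_hash(bits, seed, out_len):
    """Two-universal hashing with a random binary Toeplitz matrix, a
    standard instance of a two-universal family.  The seed must have
    length len(bits) + out_len - 1."""
    n = len(bits)
    idx = np.arange(out_len)[:, None] - np.arange(n)[None, :] + n - 1
    T = np.asarray(seed)[idx]                 # constant along diagonals
    return (T.astype(int) @ np.asarray(bits).astype(int)) % 2

# Toy classical run (channel = independent bit flips; parameters arbitrary)
n, mu, e, qber = 50_000, 0.2, 0.08, 0.03
B  = (rng.random(n) < mu).astype(int)         # Alice's bases (1 = diagonal)
Bh = (rng.random(n) < mu).astype(int)         # Bob's bases
A  = rng.integers(0, 2, n)                    # Alice's outcomes
Ah = A ^ (rng.random(n) < qber)               # Bob's noisy outcomes

S = np.flatnonzero(B == Bh)                   # sifting
test = S[B[S] == 1]                           # diagonal-basis rounds of S
errors = int(np.sum(A[test] != Ah[test]))
abort = errors > e * mu**2 * n                # parameter estimation

if not abort:                                 # privacy amplification
    out_len = 500                             # toy final-key length
    raw = A[S][:4 * out_len].astype(np.uint8)
    seed = rng.integers(0, 2, len(raw) + out_len - 1, dtype=np.uint8)
    key = toeplitz_hash(raw, seed, out_len)
```

With these parameters the expected error count on the test rounds is about qber · µ²n ≈ 60, well below the abort threshold eµ²n = 160, so the run proceeds to privacy amplification.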
The security of QKD against general attacks has been established in a sequence of works [32,33,49,13,41]. Specifically, for the E91 protocol, the following result has been shown.
Theorem 5.1. The E91 protocol is secure for any choice of protocol parameters satisfying (53), provided that n is sufficiently large.
Note that, because µ > 0 can be chosen arbitrarily small, the theorem implies that the E91 protocol can generate secret keys at an asymptotic rate of 1 − H_Sh(e) − ϑ_EC. We now show how this result can be obtained using the notion of entropy accumulation.
Proof. According to a standard result on two-universal hashing (see, for instance, Corollary 5.6.1 of [41]), the key F(A_S) computed in the privacy amplification step is secret to an adversary holding information E′ if the smooth min-entropy of A_S conditioned on E′ is sufficiently larger than the output size of the hash function F. Since, in our case, this size is ⌊rn⌋, the condition reads as (54), where the entropy is evaluated for the joint state ρ_|Ω of A_S and E′ conditioned on the event Ω that the protocol is not aborted and that Bob's guess Â_S of A_S is correct. The smoothing parameter ε ∈ (0, 1) specifies the desired level of secrecy, and we assume here that it is constant (independent of n). Because conditioning the smooth min-entropy of a classical variable on an additional bit cannot decrease its value by more than 1 (see, e.g., Proposition 5.10 of [51]), we may bound the smooth min-entropy in (54) as in (55), where E denotes all information held by Eve after the distribution step, and where |S|ϑ_EC is the maximum number of bits exchanged for error correction. Note that we also included the basis information B_1^n and B̂_1^n in the conditioning part because Eve may obtain this information during the sifting and information reconciliation steps. We are thus left with the task of lower bounding H_min^ε(A_S | B_1^n B̂_1^n E)_{ρ_|Ω}, which is usually the central part of any security proof. Since it is also the part where entropy accumulation is used, we formulate it separately as Claim 5.2 below. Inserting this claim into (55), we conclude that the secrecy condition (54) is fulfilled whenever the resulting inequality holds. But this is clearly the case for any choice of parameters satisfying (53), provided that n is sufficiently large.
It remains to show the separate claim, which we do using entropy accumulation.
Claim 5.2. Let A_1^n, B_1^n, B̂_1^n, and S be the information held by Alice and Bob as defined by the protocol, let E be the information gathered by Eve during the distribution step, and let Ω be the event that the protocol is not aborted and that Bob's guess Â_S of A_S is correct. Then, provided that Ω has non-negligible probability (i.e., its probability does not decrease exponentially fast in n), the bound (56) holds.

Proof. Let ρ^0_{Q_1^n Q̂_1^n E} be the joint state of Alice and Bob's qubit pairs before measurement, together with the information E gathered by Eve during the distribution step, and let ρ be the state obtained by applying the maps M_i, where M_i, for any i ∈ {1, . . . , n}, is the TPCP map from Q_i^n Q̂_i^n to Q_{i+1}^n Q̂_{i+1}^n A_i Ā_i B_i B̂_i X_i defined as follows: (i) B_i, B̂_i are random bits chosen independently according to the distribution (1 − µ, µ). Note that the values B_1^n and B̂_1^n correspond to the ones generated during the distribution step of the protocol. The same is true for A_1^n, with the modification that A_i holds the measurement outcome only if B_i = B̂_i. That is, A_i = ⊥ if and only if i ∉ S, where S is the set determined in the sifting step. We can therefore rewrite (56) as (57). To prove this inequality, we use Theorem 4.4 with the appropriate replacements of the systems. We note that X_i is a deterministic function of the classical registers A_i Ā_i and B_i B̂_i. To obtain the bound in (57), we need to define a min-tradeoff function. Let i ∈ {1, . . . , n} and consider the state ν obtained by applying M_i to an arbitrary state ω_{Q_i^n Q̂_i^n R}. Let furthermore ν_|b = ν_{X_i A_i Ā_i R | b} be the corresponding state obtained by conditioning on the event that B_i = B̂_i = b, for b ∈ {0, 1}. We may now bound the entropy of A_i using the entropic uncertainty relation proved in [12]. By the definition of X_i, we also obtain a bound in terms of the probability distribution ν_{X_i} on {0, 1, ⊥} defined by the state ν, where we have used that ν_{X_i}(0) + ν_{X_i}(1) = µ².
Furthermore, because A_i is classical, its von Neumann entropy cannot be negative. Combining this with the above, we find that f̃ is a min-tradeoff function for M_i. Furthermore, because the binary Shannon entropy H_Sh is concave, f̃ is convex. We may thus define a linearised min-tradeoff function f as a tangent hyperplane to f̃ at the point q_0 given by q_0(0) = (1 − e)µ², q_0(1) = eµ², and q_0(⊥) = 1 − µ². Furthermore, we define h correspondingly. Finally, note that the event Ω that Bob's guess of A_S is correct and that the protocol is not aborted implies that q = freq(X_1^n) satisfies q(1) ≤ µ²e and, hence, f(freq(X_1^n)) ≥ h. Since we assumed that Ω has non-negligible probability, Theorem 4.4 applies. (Note that the Markov chain conditions are satisfied because B_i and B̂_i are chosen at random, independently of any other information.) Furthermore, because Ā_i equals ⊥ unless B_i = B̂_i = 1, which occurs with probability µ², we obtain a corresponding bound on the entropy of Ā_i. Combining these inequalities with the chain rule for smooth entropies (see Theorem 15 of [60]) proves (57) and, hence, Claim 5.2.
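The asymptotic key rate 1 − H_Sh(e) − ϑ_EC discussed after Theorem 5.1 can be evaluated directly. In the sketch below we assume, purely for illustration, error correction operating at the Shannon limit, i.e. ϑ_EC = H_Sh(e); this choice is ours and is not fixed by the text:

```python
import numpy as np

def h_sh(p):
    """Binary Shannon entropy H_Sh(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def asymptotic_rate(e, theta_ec):
    """Asymptotic (mu -> 0) key rate 1 - H_Sh(e) - theta_EC."""
    return 1.0 - h_sh(e) - theta_ec

# with QBER e = 5% and Shannon-limit error correction theta_EC = H_Sh(e):
r = asymptotic_rate(0.05, h_sh(0.05))        # = 1 - 2 H_Sh(0.05)
r_zero = asymptotic_rate(0.11, h_sh(0.11))   # rate vanishes near e ≈ 11%
```

Under this assumption the rate is 1 − 2H_Sh(e), which is positive up to a noise level of roughly 11%, the well-known threshold for one-way BB84/E91-type protocols.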

Sample application: Fully quantum random access codes
One relatively simple application of our main result is to give upper bounds on the fidelity achieved by so-called Fully Quantum Random Access Codes (FQRACs). An FQRAC is a method for encoding m message qubits into n < m code qubits, such that any subset of k message qubits can be retrieved with high fidelity. Limits on the performance of random access codes with classical messages are rather well understood: the case k = 1 was studied in [38,1,2], and upper bounds on the success probability that decay exponentially in k were derived in [9,65,20]. In the fully quantum case, [20] gives similar upper bounds on the fidelity that decay exponentially in k. Here, we show that such exponential bounds for the fully quantum case can be obtained in a relatively elementary fashion via the concept of entropy accumulation. The example also highlights that entropy accumulation is already useful in its basic form (3), which does not involve the statistics information X_i. Indeed, here the bound on the entropy produced at every step comes from the bound on the number of code qubits. More precisely, for any input state which is classical on S, we must have a fidelity bound of the corresponding form, where R is a reference system of arbitrary dimension, and where S_{M'^m_1 S → M̂_S S} is a TPCP map that selects the k positions of M'^m_1 corresponding to those in S and outputs them into M̂_S. Moreover, F(ρ, σ) := ‖√ρ √σ‖_1 denotes the fidelity between two states ρ and σ.
Entropy accumulation gives the following constraint on FQRACs. Compared to the previously derived bound (Theorem 9 of [20]), the one obtained here is tighter for small k, whereas it is weaker for large k.
Proof. Since the fidelity bound must hold for any state ρ, it must in particular hold for the state consisting of m maximally entangled pairs and a uniform distribution over the subsets S. For every i ∈ {1, . . . , k}, define M_i as a TPCP map that does the following. Finally, define the state ρ^k, where |Φ⟩_{M_i M'_i} := (|00⟩ + |11⟩)/√2. The next step is to use Theorem 4.4 on the state ρ^k, with the tradeoff function f being the constant function equal to the corresponding infimum. (The bound of Theorem 9 of [20] has a pre-factor of the order of m and is therefore only non-trivial if k > log m.) Here the infimum is taken over states ν_i of the form
for some state ω_i. Here we also used Remark 4.3, which asserts that the system R used when defining the min-tradeoff function can be chosen isomorphic to A_1^{i−1} B_1^{i−1} E. Note that the Markov chain condition is immediate from the fact that J̃_i is chosen at random. As the systems X_i are trivial, we naturally take Ω to be the certain event, and Theorem 4.4 yields the corresponding bound. Furthermore, again by Remark 4.3, if part of B is classical in ρ, then it remains classical in ν. As a result, we can assume in the following that J̃_1^{i−1} is a classical system in ν_i. We continue by computing the expectation over the choice of J̃_i; the last inequality in this computation holds because J̃_i is classical, which implies that the first entropy in the bracket of the penultimate expression is non-negative, and because the second entropy in the bracket is upper bounded by n + k − 1. We now use Proposition 5.5 and Remark 5.6 of [51]; here the second inequality holds because the denominator in the logarithm can readily be lower bounded. Combining this with the above gives a lower bound on the smooth min-entropy. Conversely, note that, by assumption, the purified distance between ρ^k and the state consisting of k maximally entangled qubit pairs is suitably upper bounded. Since the max-entropy of k maximally entangled qubit pairs equals −k, we obtain a corresponding bound on the smooth max-entropy. We have thus derived the condition (64). It is easy to verify that this condition is violated whenever (65) is violated: if (65) fails, adding the square root of the first inequality to the second shows that inequality (64) is violated. Thus, the condition (65) must hold, and therefore also (58).

Conclusions
Informally speaking, entropy accumulation is the claim that the operationally relevant entropy (the smooth min- or max-entropy) of a multipartite system is well approximated by the sum of the von Neumann entropies of its individual parts. This has ramifications in various areas of science, ranging from quantum cryptography to thermodynamics. As described in Section 5, current cryptographic security proofs have various fundamental and practical limitations [46]. That these can be circumvented using entropy accumulation has already been demonstrated in [5] for the case of device-independent cryptography. We anticipate that the approach can be applied similarly to other cryptographic protocols. Examples include quantum key distribution protocols such as DPS and COW [27,50], for which full security has not yet been established. One may also expect to obtain significantly improved security bounds for protocols that involve high-dimensional information carriers and, in particular, continuous-variable protocols [25,62]. A strengthening of current security claims may likewise be obtained for other cryptographic constructions, such as bit commitment and oblivious transfer protocols (see, for example, [16,29,20]).
Entropy accumulation can also be used in statistical mechanics, e.g., to characterise thermalisation processes. At the beginning of Section 5 we outlined an argument that could confirm, and make precise, the intuition that entropy production (in terms of von Neumann entropy) is relevant for thermalisation. However, to base such arguments on physically realistic assumptions, it may be necessary to generalise Theorem 4.4 to the case where the Markov conditions (27) do not hold exactly. One possibility, motivated by the main result of [23], could be to replace them by the less stringent conditions (66).

23. These protocols do not have the required symmetries to employ standard techniques such as de Finetti-type theorems [42]. 24. The security of continuous-variable protocols against general attacks has been proved [43], but the bounds have an unfavourable scaling in the finite-size regime.
Another promising direction would be to apply entropy accumulation to estimate the entropy of low-energy states of many-body systems. One may expect that, under appropriate physical assumptions, these states possess a structure that permits a decomposition of the form described by Figure 1 such that the Markov conditions required for Theorem 4.4, or at least some relaxations of them such as (66), hold. This may for example be the case for systems whose states are well approximated by matrix product states (see, e.g., [59]). We leave the investigation of such applications, as well as the development of corresponding extensions of the entropy accumulation theorem, for future work.

Appendix A The function ‖·‖_α
We use an extension of the Schatten α-norm to the regime α > 0, defined for any operator X = X_{B←A} from a space A to a space B by ‖X‖_α := (tr[(X†X)^{α/2}])^{1/α}. It follows from the Singular Value Theorem that ‖X‖_α = ‖X†‖_α = ‖X^⊺‖_α = ‖X̄‖_α (see, e.g., Section 2 of [61]). The following is Lemma 12 from [37].
Lemma A.1. For any non-negative operator X and any α ∈ R_+, the following variational characterisation holds, where the supremum and infimum range over density operators Z.
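Since ‖X‖_α depends only on the singular values of X, the invariances stated above can be checked directly. The following numerical sketch is our own illustration, not part of the text:

```python
import numpy as np

def schatten(X, alpha):
    """||X||_alpha = (tr[(X^† X)^{alpha/2}])^{1/alpha}: the alpha-sum of
    the singular values, raised to 1/alpha.  A norm for alpha >= 1 and
    only a quasi-norm for 0 < alpha < 1."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s ** alpha) ** (1.0 / alpha))

X = np.array([[1.0, 2.0j], [0.0, 3.0]])
# invariance under adjoint, transpose, and complex conjugation
invariant = all(
    abs(schatten(X, a) - schatten(Y, a)) < 1e-9
    for a in (0.5, 1.0, 2.0)
    for Y in (X.conj().T, X.T, X.conj())
)
```

For α = 2 the quantity coincides with the Frobenius norm, and for α = 1 with the trace norm, which gives two easy consistency checks.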

Appendix B Properties of the sandwiched Rényi entropies
The sandwiched Rényi entropy from Definition 2.3 is a special case of the sandwiched Rényi relative entropy, defined as D_α(ρ‖σ) := (1/(α−1)) log tr[(σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)})^α].
In particular, for a bipartite density operator ρ_AB, the sandwiched α-Rényi entropy of A conditioned on B is related to this relative entropy by H_α(A|B)_ρ = −D_α(ρ_AB ‖ id_A ⊗ ρ_B). It turns out that this is not the only way to define a conditional entropy based on a relative entropy. One popular alternative is to replace the marginal ρ_B by a maximisation over arbitrary density operators on B, i.e., −inf_{σ_B} D_α(ρ_AB ‖ id_A ⊗ σ_B). We refer to [53] for a comparison of the different notions.
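For full-rank states the sandwiched relative entropy can be evaluated by diagonalisation. The sketch below (our own illustration, in bits) also checks the monotonicity of D_α in α on a commuting example, where the quantity reduces to the classical Rényi divergence:

```python
import numpy as np

def mpow(H, p):
    """Real power of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(H)
    return (V * np.power(w, p)) @ V.conj().T

def D_alpha(rho, sigma, alpha):
    """Sandwiched alpha-Renyi relative entropy (in bits), for full-rank
    sigma:  (1/(alpha-1)) * log2 tr[(sigma^{(1-alpha)/(2 alpha)} rho
    sigma^{(1-alpha)/(2 alpha)})^alpha]."""
    S = mpow(sigma, (1 - alpha) / (2 * alpha))
    return float(np.log2(np.trace(mpow(S @ rho @ S, alpha)).real)
                 / (alpha - 1))

rho, sigma = np.diag([0.75, 0.25]), np.diag([0.5, 0.5])
vals = [D_alpha(rho, sigma, a) for a in (0.5, 0.9, 1.5, 2.0)]
# D_alpha is non-decreasing in alpha, and D_alpha(rho||rho) = 0
```

On commuting (diagonal) states the formula reduces to (1/(α−1)) log₂ Σ_i p_i^α q_i^{1−α}, which gives an independent check of the implementation.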
The following lemma corresponds to Eq. 19 of [37]. For its proof, it is convenient to represent vectors on product systems as matrices. Let {|i⟩_A} and {|j⟩_B} be fixed orthonormal bases of A and B, respectively. To any vector on A ⊗ B we associate a corresponding linear operator. We emphasise that this definition is basis-dependent. Therefore, in expressions that involve this operator as well as the transpose operation Z → Z^⊺, it is understood that both are taken with respect to the same basis. It is straightforward to prove the corresponding properties for any operators X_{A′←A} and Y_{B′←B} (see, e.g., Section 2.4 of [61]).

Lemma B.2. For any density operators ρ and σ on the same Hilbert space and for α ∈ (0, 1) ∪ (1, ∞), we have the following variational expression, where |ψ⟩⟨ψ| is a purification of ρ and where the supremum ranges over all density operators τ on the purifying system. In particular, a corresponding identity holds for any pure ρ_ABE = |ψ⟩⟨ψ|.

Proof. Let us denote by A the Hilbert space on which ρ and σ act and by E the purifying space, so that |ψ⟩ is a vector on A ⊗ E. Then, using (67) and (72), the sandwiched Rényi entropy can be written accordingly. Using Lemma A.1 as well as (67) and (71), we obtain an expression in which the supremum is taken over density operators τ on E. The first equality of the lemma then follows by (73). Finally, the second equality is obtained via (69).
The next lemma concerns the conditioning on classical information.
Lemma B.3 (Proposition 5.1 of [52]). For any density operator ρ_ABX which is classical on X, i.e., of the form ρ_ABX = Σ_x p_x ρ_AB|x ⊗ |x⟩⟨x|_X, where ρ_AB|x are density operators on A ⊗ B and {|x⟩}_{x∈X} is an orthonormal basis of X, the conditional α-entropy decomposes accordingly.

Proof. Using the explicit form of ρ_ABX, it is straightforward to verify the corresponding operator identity. Taking the trace on both sides, the equality can be rewritten in terms of α-entropies, which concludes the proof.
The following lemma can be found as Lemma 3.9 in [36]; the statement and its proof are given here for the convenience of the reader. They can also be found in [52, Proposition 6.5].
Proof. Assume without loss of generality that σ has full support. By Lemma 6.1 in [51], we can find a λ with λ ≤ D_max^ε(ρ‖σ), where ∆ is the positive part of ρ − 2^λ σ. It suffices to upper-bound λ by D_α(ρ‖σ) + g(ε)/(α − 1). Now, let {|e_i⟩}_{i∈S} be an orthonormal basis consisting of eigenvectors of ρ − 2^λ σ, and let S_+ be the subset of S corresponding to positive eigenvalues. Define the non-negative numbers r_i = ⟨e_i|ρ|e_i⟩ and s_i = ⟨e_i|σ|e_i⟩. Note that for i ∈ S_+ we have r_i − 2^λ s_i = ⟨e_i|(ρ − 2^λ σ)|e_i⟩ ≥ 0 and therefore 2^λ s_i / r_i ≤ 1. We use this to bound

Now, we solve Equation (74) for tr[∆] and bound
It remains to upper-bound (1/(α−1)) log Σ_{i∈S} r_i^α s_i^{1−α} by D_α(ρ‖σ). To this end, we define the TPCP map F(X) = Σ_{i∈S} P_i X P_i, where P_i denotes the projector onto the subspace spanned by |e_i⟩. The lemma then follows from the data processing inequality.
The following two lemmas relate the entropy conditioned on a classical value x to the unconditioned entropy.
Lemma B.5. Let ρ_AB be a quantum state of the form ρ = Σ_x p_x ρ_AB|x, where {p_x} is a probability distribution over X. Then corresponding bounds hold for any x ∈ X, for α ∈ (1, ∞) and for α ∈ (0, 1).

Proof. For any σ_B and α ∈ (1, ∞), the claimed bound follows in two steps. For the first inequality, we used the fact that p_x ρ_AB|x ≤ ρ_AB; we then used that y ↦ y^α is a monotone function on [0, ∞). Taking the infimum over σ_B and then multiplying both sides by −1, we get the desired result. The proof is the same for α ∈ (0, 1), except that the direction of the inequality is reversed.
Lemma B.6. Let ρ_AB be a quantum state of the form ρ = Σ_x p_x ρ_AB|x, where {p_x} is a probability distribution over X. Then a corresponding bound holds for any x ∈ X and any α ∈ (1, 2].

Proof. We define the state ρ_ABX = Σ_{x′} p_{x′} ρ_AB|x′ ⊗ |x′⟩⟨x′|_X. Note that it is legitimate to use the notation ρ, as the reduced state on A ⊗ B corresponds to ρ_AB. As conditioning can only decrease the entropy, we obtain the claim.

Lemma B.7. Let E be a TPCP map from A ⊗ B to A ⊗ B ⊗ X defined by E(W_AB) = Σ_{y,z} (Π_{y,A} ⊗ Π_{z,B}) W_AB (Π_{y,A} ⊗ Π_{z,B}) ⊗ |t(y,z)⟩⟨t(y,z)|_X, where t : Y × Z → X is a (deterministic) function, {Π_{y,A}}_{y∈Y} and {Π_{z,B}}_{z∈Z} are mutually orthogonal projectors acting on A and B, respectively, and {|x⟩}_{x∈X} is an orthonormal basis of X. Let ρ_ABX = E(ω_AB), for an arbitrary state ω_AB. Then the entropy identities (78) and (79) hold for α ∈ [1/2, ∞).

Proof. We only prove Eq. (78); Eq. (79) is easier. Let M be the TPCP map from B to B defined by M(W_B) = Σ_z Π_{z,B} W_B Π_{z,B}. Using the data processing inequality and the fact that (I_AX ⊗ M)(ρ_ABX) = ρ_ABX, we obtain one direction, and similarly the other. We now show that for any state σ_B, we have D_α(ρ_ABX ‖ id_AX ⊗ M(σ_B)) = D_α(ρ_AB ‖ id_A ⊗ M(σ_B)). To make the notation lighter, we write Π_z for Π_{z,B} and Π_y for Π_{y,A} in the following. The relative entropy D_α(ρ_ABX ‖ id_AX ⊗ M(σ_B)) is then evaluated by a direct computation, which concludes the proof.
In the subsequent arguments we will use a further quantity, defined for any non-negative operators ρ and σ on the same space and for any α ∈ [0, 1) ∪ (1, ∞). As observed in [64,17,24], the Araki-Lieb-Thirring inequality [31,3] implies a corresponding relation between this quantity and the sandwiched relative entropy. Furthermore, we can define a conditional entropy H′ based on this quantity. In [53, Theorem 2], it is shown that H′ and H are duals of each other, in the sense that a corresponding identity holds for any pure state ρ_ABC. The following lemma is another variant of Lemma 8 of [54] (see also Lemma 6.3 of [51]).
The following lemma is a generalisation of Proposition 3.10 of [36]. In [52, Section 6.4.2], using a Taylor approximation, the factor in front of (α − 1) can be improved, although at the price of an error term containing non-explicit constants.
Lemma B.9. For any density operator ρ_AB and 1 < α < 1 + 1/log(1 + 2d_A), the following holds.

Proof. We start with the proof of the first inequality. Lemma B.8 implies that the corresponding bound holds for all 1 < α < 1 + 1/log(1 + 2d_A). Furthermore, because of (82), we have a matching bound. Combining this with the above concludes the proof of the first inequality. The second inequality follows directly from the monotonicity of the relative Rényi entropy in α [8,37].
Combining the two inequalities with the fact that −H(A|C) = H(A|B) concludes the proof.
The following lemma generalises a classical result originally proposed in [44]. It follows rather directly from similar statements proved in [54,51,36,34].
Proof. For the first inequality, we use Lemma B.4, from which the desired inequality follows directly. To prove the second inequality, we use the duality between smooth min- and max-entropy [55], which asserts that H_max^ε(A|B)_ρ = −H_min^ε(A|C)_ρ holds for any purification ρ_ABC of ρ_AB. We can then employ Proposition 6.2 of [51]. The claim then follows from (83).

Similarly, using the composition property, one can show that this set of conditions is equivalent to the corresponding set of conditions for i = 1, . . . , n − 1.
The latter can be expressed in terms of entropy equalities. The entropy accumulation statement for the smooth min-entropy in the simplified form (3) can thus be rewritten as (86). Note that, if one replaces the smooth min-entropy on the left-hand side by the von Neumann entropy, then this expression looks similar to the usual chain rule for von Neumann entropies, which holds for arbitrary ρ_{A_1^n B_1^n}. One may therefore wonder whether (86) might also hold without the Markov conditions (85). This is, however, not the case, as we now show with a specific example.
The example is classical, in the sense that A_1, . . . , A_n and B_1, . . . , B_n correspond to random variables and the map M_i takes a_1^{i−1} b_1^{i−1}