1 Introduction

In classical information theory, the uncertainty one has about a variable A given access to side information B can be operationally quantified by the number of bits one would need to learn, in addition to B, in order to reconstruct A. While this number generally fluctuates, it is—except with probability of order \(\varepsilon > 0\)—not larger than the \(\varepsilon \)-smooth max-entropy, \(H_{\max }^\varepsilon (A|B)_\rho \), evaluated for the joint distribution \(\rho \) of A and B [45]. Conversely, it is in the same way not smaller than the \(\varepsilon \)-smooth min-entropy, \(H_{\min }^\varepsilon (A|B)_\rho \). This may be summarised by saying that the number of bits needed to reconstruct A from B is with probability at least \(1-O(\varepsilon )\) contained in the interval

$$\begin{aligned} I = \bigl [H_{\min }^\varepsilon (A|B)_\rho , H_{\max }^\varepsilon (A|B)_\rho \bigr ] \ , \end{aligned}$$
(1)

whose boundaries are defined by the smooth entropies; see Definition 2.2 below for a precise definition of these quantities.

This approach to quantifying uncertainty can be extended to the case where A and B are quantum systems. The conclusion remains the same: the operationally relevant uncertainty interval is I as defined by (1). The only difference is that \(\rho \) is now a density operator, which describes the joint state of A and B [41, 44, 51].

Finding the boundaries of the interval I is a central task of information theory. However, the smooth entropies of a large system A are often difficult to calculate. It is therefore rather common to introduce certain assumptions to render this task more feasible. One extremely popular approach in standard information theory is to assume that the system consists of many mutually independent and identically distributed (IID) parts. More precisely, the IID Assumption demands that the system be of the form \(A = A_1^n = A_1 \otimes \cdots \otimes A_n\), that the side information have an analogous structure \(B = B_1^n = B_1 \otimes \cdots \otimes B_n\), and that the joint state of these systems be of the form \(\rho _{A_1 B_1 \cdots A_n B_n} = \nu _{A B}^{\otimes n}\), for some density operator \(\nu _{A B}\). A fundamental result from information theory, the Asymptotic Equipartition Property (AEP) [48] (see [54] for the quantum version), then asserts that the uncertainty interval satisfies

$$\begin{aligned} I \subset \left[ n \left( H(A | B)_{\nu } - \frac{c_\varepsilon }{\sqrt{n}}\right) , \, n \left( H(A | B)_{\nu } + \frac{c_\varepsilon }{\sqrt{n}} \right) \right] \ , \end{aligned}$$
(2)

where \(c_\varepsilon \) is a constant (independent of n) and where \(H(A|B)_{\nu }\) is the conditional von Neumann entropy evaluated for the state \(\nu _{A B}\). In other words, for large n, the operationally relevant total uncertainty one has about \(A_1^n\) given \(B_1^n\) is well approximated by \(n H(A|B)_{\nu } = \sum _{i} H(A_i | B_i)_{\rho }\). In this sense, the entropy of the individual systems \(A_i\) accumulates to the entropy of the total system \(A_1^n\).
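As a quick sanity check of the IID case, the following sketch (our own toy example, not taken from the paper) computes the classical conditional entropy for a small joint distribution and verifies that it is additive under IID products, which gives the leading term \(n H(A|B)_\nu \) of the interval in (2).

```python
# Numerical sketch of the IID case: for a classical joint distribution
# nu(a, b), the conditional entropy H(A|B) = H(AB) - H(B) is additive
# under IID products.
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float).flatten()
    return -sum(x * np.log2(x) for x in p if x > 0)

def cond_entropy(p_ab):
    """H(A|B) = H(AB) - H(B) for a joint distribution p_ab[a, b], in bits."""
    return shannon(p_ab) - shannon(p_ab.sum(axis=0))

nu = np.array([[0.4, 0.1],
               [0.1, 0.4]])        # joint distribution of one pair (A_i, B_i)

# n-fold IID product: np.kron combines the A indices and the B indices, so
# kron(nu, nu)[(a1, a2), (b1, b2)] = nu[a1, b1] * nu[a2, b2].
n = 3
nu_n = nu
for _ in range(n - 1):
    nu_n = np.kron(nu_n, nu)

print(cond_entropy(nu))            # H(A|B) for one pair
print(cond_entropy(nu_n))          # equals n * H(A|B)
```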

In this work, we generalise this statement to the case where the individual pairs \(A_i B_i\) are no longer independent of each other, i.e., where the IID assumption does not hold. Without loss of generality, one may think of the pairs \(A_i B_i\) as being generated by a sequence of processes \({\mathcal {M}}_i\), as shown in Fig. 1. Each process \({\mathcal {M}}_i\) may pass information on to the next one via a “memory” register \(R_i\). The state of the “future” pairs can thus depend on the “past” ones. The only assumption we make is that, given the side information \(B_1^i\) generated until step i, the systems \(A_1^i\) are independent of the next piece of side information \(B_{i+1}\). This is captured by the requirement that \(A_1^{i} \leftrightarrow B_1^{i} \leftrightarrow B_{i+1}\) forms a quantum Markov chain. Entropy accumulation is then the claim that

$$\begin{aligned} I \subset \left[ \sum _{i =1}^n \left( \inf _{\omega _{R_{i-1} R}} H(A_i | B_i R )_{{\mathcal {M}}_i(\omega )} - \frac{c_\varepsilon }{\sqrt{n}} \right) , \, \sum _{i =1}^n \left( \sup _{\omega _{R_{i-1} R}} H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \frac{c_\varepsilon }{\sqrt{n}} \right) \right] \ , \end{aligned}$$
(3)

where, in the ith term of each sum, the infimum or supremum ranges over joint states \(\omega _{R_{i-1} R}\) of the memory \(R_{i-1}\) and a system R isomorphic to it, and the conditional von Neumann entropy is evaluated for the state \(({{\mathcal {M}}_i \otimes {\mathcal {I}}_R})(\omega _{R_{i-1} R})\), abbreviated by \({\mathcal {M}}_i(\omega )\), which describes the output pair \(A_i B_i\) generated by \({\mathcal {M}}_i\) jointly with R.

To illustrate (3) it is useful to think of a communication scenario with two parties, Alice and Bob, who are receiving information \(A_1^n\) and \(B_1^n\), respectively. Suppose that a source with memory \(R_i\) generates this information sequentially in n steps, described by maps \({\mathcal {M}}_i\) as depicted in Fig. 1. Suppose furthermore that Bob would like to infer all n values \(A_i\) (which, for the purpose of this example, we assume to be classical). As discussed above, for this he would require N additional classical bits from Alice, where N lies, except with probability \(\varepsilon \), within an interval I with boundaries given by the entropies \(H_{\min }^{\varepsilon }(A_1^n|B_1^n)\) and \(H_{\max }^{\varepsilon }(A_1^n|B_1^n)\), which quantify Bob’s uncertainty about \(A_1^n\). While these entropies depend on the joint state \(\rho _{A_1^n B_1^n}\) of the entire information generated by the source over all n steps, they can, according to (3), be lower (or upper) bounded by a sum of terms that merely depend on the individual steps \({\mathcal {M}}_i\). Specifically, the minimum (or maximum) number N of bits that Alice needs to send to Bob so that he can infer her values \(A_i\) grows for each such value by the von Neumann entropy \(H(A_i|B_i R)\), minimised (or maximised) over all possible states the memory \(R_{i-1}\) could have been in right before the pair \(A_i B_i\) was produced, and conditioned on \(B_i\) as well as any information R about this memory.

Fig. 1

Circuit diagram illustrating the decomposition of states \(\rho _{A_1^n B_1^n}\) relevant for our main theorem. One starts with a state \(\rho ^0_{R_0}\), and each of the pairs \(A_i B_i\) is generated sequentially, one after the other, by the process \({\mathcal {M}}_i\). The map \({\mathcal {M}}_i\) takes as input a state on \(R_{i-1}\) and outputs a state on \(R_{i} \otimes A_i \otimes B_i\)

The main result we derive in this work is actually a bit more general than (3), allowing one to take into account global information about the statistics of \(A_1^n\) and \(B_1^n\). This is relevant for applications. In quantum key distribution, for instance, \({\mathcal {M}}_i\) models the generation of the ith bit of the raw key. However, in this cryptographic scenario, \({\mathcal {M}}_i\) can depend on the attack strategy of an adversary, and is thus partially unknown. Hence, in order to bound the entropy (which characterises an adversary’s uncertainty) of the raw key bits, one must also take into account global statistical properties. These are inferred by tests carried out by the quantum key distribution protocol on a small sample of the generated bits. To incorporate such statistical information in the analysis, we consider for each i an additional classical value \(X_i\) derived from \(A_i\) and \(B_i\), as depicted by Fig. 2. Specifically, \(X_i\) shall tell us whether position i was included in the statistical test, and if so, the outcome of the test performed at step i. For this extended scenario, (3) still holds, but now the infimum and supremum are taken over a restricted set, containing only those states \(\omega \) for which the resulting probability distribution on \(X_i\) corresponds to the observed statistics.

Fig. 2

Circuit diagram illustrating the decomposition of states \(\rho _{A_1^n B_1^n X_1^n}\) relevant for the full version of our main theorem, which can take into account statistical information \(X_1^n\). The individual pieces \(X_i\) of this statistical information are classical values that can be determined from \(A_i\) and \(B_i\) without disturbing them. When \(A_i\) and \(B_i\) are themselves classical, this means that \(X_i\) is a deterministic function of \(A_i\) and \(B_i\). For a precise definition in the general case we refer to Sect. 4

Entropy accumulation has a number of theoretical and practical implications. For example, it serves as a technique to turn cryptographic security proofs that were restricted to collective attacks into security proofs against general attacks. This application is demonstrated in [5] for the case of a fully device-independent quantum key distribution protocol and a randomness expansion protocol. The resulting security bounds are essentially tight, implying that device-independent cryptography is possible with state-of-the-art technology. To illustrate the basic ideas behind such applications, we will present two concrete examples in more detail. The first is a proof of security of a variant of the E91 Quantum Key Distribution protocol. This new security proof has two advantages. First, its structure is modular, and it may therefore be adapted to other cryptographic schemes (see also the discussion in Sect. 6). In addition, it achieves a strong level of security in which no assumption is made on Bob’s devices. This is sometimes referred to as one-sided measurement device independence; this level of security was partially achieved in [58] (under a memoryless-devices assumption, which we do not need) and later fully in [56], though with sub-optimal rates.

The second example is the derivation of an upper bound on the fidelity achievable by Fully Quantum Random Access Codes.

The proof of the main result, Eq. (3), has a structure similar to that of the proof of the Quantum Asymptotic Equipartition Property [54], which we recover as a special case (see Corollary 4.10). The idea is to first bound the smooth entropy of the entire sequence \(A_1^n\) conditioned on \(B_1^n\) by a conditional Rényi entropy of order \(\alpha \), then decompose this entropy into a sum of conditional Rényi entropies for the individual terms \(A_i\), and finally bound these in terms of von Neumann entropies. However, in contrast to previous arguments, we use a recently introduced version of conditional Rényi entropies, termed “sandwiched Rényi entropies” [37, 64]. For these entropies, we derive a novel chain rule, which forms the core technical part of our proof. In addition, some of the concepts used in this work generalise techniques proposed in the recent security proofs for device-independent cryptography presented in [34, 35]. In particular, the dominant terms of the lower bound on the amount of randomness obtained in [35], called rate curves, are similar to the tradeoff functions considered here (cf. Definition 4.1).

Paper organisation: We begin with preliminaries and notation in Sect. 2. Section 3 is devoted to the central technical ingredient of our argument, a chain rule for Rényi entropies. The main result, the theorem on entropy accumulation, is then stated and proved in Sect. 4. In Sect. 5 we present the two sample applications mentioned above, before concluding with remarks and suggestions for future work in Sect. 6.

2 Preliminaries

2.1 Notation

Below, we summarise some of the notation used throughout the paper:

\(A, B, C, \dots \) : quantum systems, and their associated Hilbert spaces
\({\mathcal {L}}(A,B)\) : set of linear operators from A to B
\({\mathcal {L}}(A)\) : \({\mathcal {L}}(A,A)\)
\(X_{AB}\) : operator in \({\mathcal {L}}(A \otimes B)\)
\(X_{B \leftarrow A}\) : operator in \({\mathcal {L}}(A, B)\)
\(\mathrm {D}(A)\) : set of normalised density operators on A
\(\mathrm {D}_{\leqslant }(A)\) : set of sub-normalised density operators on A
\(\mathrm {Pos}(A)\) : set of positive semidefinite operators on A
\(X^{-1}\) for \(X \in \mathrm {Pos}(A)\) : generalised inverse, i.e., the operator satisfying \(XX^{-1}X = X\)
\(X_A \geqslant Y_A\) : \(X_A - Y_A \in \mathrm {Pos}(A)\)
\(A_i^j\) (with \(j \geqslant i\)) : given n systems \(A_1,\dots ,A_n\), shorthand for \(A_i,\dots ,A_j\)
\(A^n\) : shorthand for \(A_1,\dots ,A_n\)
\(\log (x)\) : logarithm of x in base 2

Throughout the paper, we restrict ourselves to finite-dimensional Hilbert spaces. Furthermore, we use the following notation for classical-quantum states \(\rho _{X A} \in \mathrm {D}(X \otimes A)\) with respect to a fixed orthonormal basis \(\{ |x\rangle \}_{x \in {\mathcal {X}}}\) of the system X. For any \(x \in {\mathcal {X}}\), we let \(\rho _{A, x} = \langle x | \rho _{XA} | x \rangle \), so that \(\rho _{XA} = \sum _{x \in {\mathcal {X}}} |x\rangle \langle x| \otimes \rho _{A, x}\). To refer to the conditional state, we write \(\rho _{A|x} = \frac{\rho _{A,x}}{\mathrm {tr}(\rho _{A,x})}\). An event \(\Omega \subseteq {\mathcal {X}}\) in this paper refers to a subset of \({\mathcal {X}}\), and we can similarly define \(\rho _{XA|\Omega } = \frac{1}{\rho [\Omega ]} \sum _{x \in \Omega } |x\rangle \langle x| \otimes \rho _{A, x}\), where we introduced the notation \(\rho [\Omega ] = \sum _{x \in \Omega } \mathrm {tr}(\rho _{A, x})\). We also use the usual notation for the partial trace for conditional states, e.g., \(\rho _{XA|\Omega } = \mathrm {tr}_{B}(\rho _{XAB|\Omega })\).
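The cq-state conventions above can be made concrete in a small numerical sketch (the blocks below are our own example): a cq state is block-diagonal with sub-normalised blocks \(\rho _{A,x}\), and conditioning on an event \(\Omega \) keeps the blocks with \(x \in \Omega \) and renormalises by \(\rho [\Omega ]\).

```python
# Sketch of cq-state notation: blocks rho_{A,x} = p(x) rho_{A|x}, the event
# probability rho[Omega], and the conditioned state rho_{XA|Omega}.
import numpy as np

rho_A_x = {                  # sub-normalised blocks rho_{A,x}
    0: 0.5 * np.array([[1.0, 0.0], [0.0, 0.0]]),
    1: 0.3 * np.array([[0.5, 0.5], [0.5, 0.5]]),
    2: 0.2 * np.eye(2) / 2,
}

def prob(event):
    """rho[Omega] = sum_{x in Omega} tr(rho_{A,x})."""
    return sum(np.trace(rho_A_x[x]).real for x in event)

def conditioned(event):
    """Blocks of rho_{XA|Omega}: keep x in Omega, renormalise by rho[Omega]."""
    p = prob(event)
    return {x: rho_A_x[x] / p for x in event}

omega = {0, 1}
print(prob(omega))                                     # 0.8
cond = conditioned(omega)
print(sum(np.trace(b).real for b in cond.values()))    # 1.0
```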

For a density operator \(\rho _{A B} \in \mathrm {D}(A \otimes B)\) on a bipartite Hilbert space \(A \otimes B\) we define the operator

$$\begin{aligned} \rho _{A | B} = (\mathrm {id}_A \otimes \rho _B)^{-\frac{1}{2}} \rho _{A B} (\mathrm {id}_A \otimes \rho _B)^{-\frac{1}{2}} \ , \end{aligned}$$

which may be interpreted as the state of A conditioned on B, analogous to a conditional probability distribution. This operator was previously defined and studied in [6, 30]. In the following, we will usually drop identity operators from the notation when they are clear from the context. We would thus write, for instance,

$$\begin{aligned} \rho _{A | B} = \rho _B^{-\frac{1}{2}} \rho _{A B} \rho _B^{-\frac{1}{2}} \ . \end{aligned}$$
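The following sketch (our own example) computes \(\rho _{A|B}\) numerically with the generalised-inverse convention. For a product state \(\rho _A \otimes \rho _B\) one gets \(\rho _A \otimes \Pi _B\), where \(\Pi _B\) projects onto the support of \(\rho _B\), which the code checks for a rank-deficient \(\rho _B\).

```python
# Conditional operator rho_{A|B} = rho_B^{-1/2} rho_AB rho_B^{-1/2},
# with the generalised inverse handling the kernel of rho_B.
import numpy as np

def mpow(X, p):
    """X**p for positive semidefinite X; eigenvalues below a cutoff are
    treated as 0 (generalised-inverse convention for p < 0)."""
    w, V = np.linalg.eigh(X)
    wp = np.array([x ** p if x > 1e-12 else 0.0 for x in w])
    return (V * wp) @ V.conj().T

rho_A = np.diag([0.7, 0.3])
rho_B = np.diag([0.5, 0.5, 0.0])          # rank-deficient on purpose
rho_AB = np.kron(rho_A, rho_B)

idA_rhoB = np.kron(np.eye(2), rho_B)      # id_A (x) rho_B
rho_AgB = mpow(idA_rhoB, -0.5) @ rho_AB @ mpow(idA_rhoB, -0.5)

Pi_B = np.diag([1.0, 1.0, 0.0])           # projector onto support of rho_B
print(np.allclose(rho_AgB, np.kron(rho_A, Pi_B)))   # True
```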

Remark 2.1

Let A and \({\bar{A}}\) be two isomorphic Hilbert spaces with orthonormal bases \(\{ |a\rangle _A \}_a\) and \(\{ |a\rangle _{{\bar{A}}} \}_a\), and define the (unnormalised) vector

$$\begin{aligned} \Phi _{A {\bar{A}}} = \sum _{a} |a\rangle _A \otimes |a\rangle _{{\bar{A}}} \ . \end{aligned}$$

Then any trace-non-increasing map \({\mathcal {M}}= {\mathcal {M}}_{B \leftarrow {\bar{A}}}\) from \({\mathcal {L}}({\bar{A}})\) to \({\mathcal {L}}(B)\) can be represented as a “conditional state” (also known as the Choi-Jamiolkowski state) \(M_{B | A}\) on \(A \otimes B\) with the property that

$$\begin{aligned} M_{B | A} \geqslant 0 \qquad \text {and} \qquad \mathrm {tr}_B[M_{B|A}] \leqslant \mathrm {id}_A \end{aligned}$$
(4)

and such that

$$\begin{aligned} {\mathcal {M}}(\rho _{{\bar{A}}}) = \mathrm {tr}_A \bigl ( (\rho _A^{T} \otimes \mathrm {id}_B) \, M_{B|A} \bigr ) \end{aligned}$$
(5)

holds, where \(\rho _A^{T}\) denotes the transpose of \(\rho _{{\bar{A}}}\) with respect to the bases fixed above. Specifically, for any map \({\mathcal {M}}\) one may define

$$\begin{aligned} M_{B|A} = ({\mathcal {I}}_A \otimes {\mathcal {M}}_{B \leftarrow {\bar{A}}}) \bigl ( \Phi _{A {\bar{A}}} \Phi _{A {\bar{A}}}^{\dagger } \bigr ) \ ; \end{aligned}$$
(6)

it is then straightforward to verify the properties above.

Conversely, for any \(M_{B | A}\) such that (4) holds, the map defined by

$$\begin{aligned} {\mathcal {M}}(\rho _{{\bar{A}}}) = \mathrm {tr}_A \bigl ( (\rho _A^{T} \otimes \mathrm {id}_B) \, M_{B|A} \bigr ) \end{aligned}$$

satisfies (6) and hence (5). It is also easy to verify that it is completely positive and trace non-increasing.

We mention here a slight abuse of terminology: for a completely positive map \({\mathcal {M}}_{B \leftarrow A}\) from \({\mathcal {L}}(A)\) to \({\mathcal {L}}(B)\), we often use a shorthand to indicate the systems it acts on and simply say that it maps A to B.
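Remark 2.1 can be checked numerically. The sketch below (the channel and basis choice are our own; the construction is the standard Choi–Jamiołkowski one) builds the conditional state of an amplitude-damping channel from \(\Phi _{A {\bar{A}}}\) and recovers the channel's action on an arbitrary input via the reconstruction identity.

```python
# Choi-Jamiolkowski sketch: M_{B|A} = (I (x) M)(Phi Phi^dagger), and
# M(rho) = tr_A[(rho^T (x) id_B) M_{B|A}], transpose in the fixed basis.
import numpy as np

g = 0.3                                     # amplitude-damping parameter
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]])
K1 = np.array([[0, np.sqrt(g)], [0, 0]])

def M(rho):
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

d = 2
Choi = np.zeros((d * d, d * d), dtype=complex)
for a in range(d):
    for ap in range(d):
        E = np.zeros((d, d)); E[a, ap] = 1.0      # |a><a'| on A and on Abar
        Choi += np.kron(E, M(E))

rho = np.array([[0.6, 0.2], [0.2, 0.4]])
big = np.kron(rho.T, np.eye(d)) @ Choi
rhs = big.reshape(d, d, d, d).trace(axis1=0, axis2=2)   # partial trace over A
print(np.allclose(M(rho), rhs))                          # True
```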

2.2 Background on quantum Markov chains

The concept of quantum Markov chains will be used throughout the paper, and here we give some relevant basic facts about them. Let \(\{a_j\}_{j \in J}\) and \(\{c_j\}_{j \in J}\) be families of Hilbert spaces and let B be a Hilbert space such that

$$\begin{aligned} B \cong \bigoplus _{j \in J} a_j \otimes c_j \ , \end{aligned}$$
(7)

holds. Let us furthermore denote by \(V = \bigoplus _{j \in J} V_{a_j c_j \leftarrow B}\) the corresponding isomorphism. It is convenient to treat \(\bigoplus _j a_j \otimes c_j\) as a subspace of the product \(a \otimes c\) of the spaces

$$\begin{aligned} a = \bigoplus _{j \in J} a_j \qquad \text {and} \qquad c = \bigoplus _{j \in J} c_j \ . \end{aligned}$$

The mapping V may then be viewed as an embedding of B into \(a \otimes c\). Given a density operator \(\rho _B\), we denote by \(\rho _{a c}\) the density operator \(V \rho _B V^{\dagger }\). More generally, for a multi-partite density operator \(\rho _{A B}\), we write \(\rho _{A a c}\) for \(V \rho _{A B} V^{\dagger }\). Furthermore, for any \(j \in J\), we denote by \(\rho _{A a_j c_j}\) the projection of \(\rho _{A a c}\) onto the subspace defined by \(a_j \otimes c_j\), i.e.,

$$\begin{aligned} \rho _{A a_j c_j} = V_{a_j c_j \leftarrow B} \rho _{A B} V_{a_j c_j \leftarrow B}^{\dagger } \ . \end{aligned}$$
(8)

A tri-partite density operator \(\rho _{A B C}\) is said to obey the Markov chain condition \(A \leftrightarrow B \leftrightarrow C\) if there exists a decomposition of B of the form (7) such that

$$\begin{aligned} \rho _{A B C} \cong \rho _{A a c C} = \bigoplus _{j \in J} q_j {\hat{\rho }}_{A a_j} \otimes {\hat{\rho }}_{c_j C} \end{aligned}$$
(9)

where \(\{q_j\}_{j \in J}\) is a probability distribution and \(\{{\hat{\rho }}_{A a_j}\}_{j \in J}\) and \(\{{\hat{\rho }}_{c_j C}\}_{j \in J}\) are families of density operators [26, 28, 39]. It follows from this decomposition that a state \(\rho _{ABC}\) obeying the Markov chain condition can be reconstructed from \(\rho _{AB}\) with a map \({\mathcal {T}}_{BC \leftarrow B}\) acting only on B [39]:

$$\begin{aligned} \rho _{A B C} = {\mathcal {I}}_{A} \otimes {\mathcal {T}}_{BC \leftarrow B}(\rho _{AB}) \ . \end{aligned}$$
(10)

Another useful characterization of the Markov chain condition for \(\rho _{ABC}\) is given by the entropic equality \(I(A:C|B)_{\rho } = 0\) [26, 28, 39]. The conditional mutual information is defined as \(I(A:C|B)_{\rho } = H(AB)_{\rho } + H(BC)_{\rho } - H(B)_{\rho } - H(ABC)_{\rho }\) where \(H(A)_{\rho } = -\mathrm {tr}(\rho _{A} \log \rho _{A})\) is the von Neumann entropy.
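The entropic characterisation is easy to verify in the commutative case. The following sketch (our own example) builds a classical distribution of the Markov form \(p(a,b,c) = p(a,b)\,p(c|b)\) and checks that \(I(A:C|B)\) vanishes, while a generic distribution need not satisfy this.

```python
# Classical sanity check: I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC)
# vanishes for a Markov chain A <-> B <-> C.
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float).flatten()
    return -sum(x * np.log2(x) for x in p if x > 0)

def cmi(p):
    """I(A:C|B) for a joint distribution p[a, b, c]."""
    return (shannon(p.sum(axis=2)) + shannon(p.sum(axis=0))
            - shannon(p.sum(axis=(0, 2))) - shannon(p))

rng = np.random.default_rng(0)
p_ab = rng.random((2, 3)); p_ab /= p_ab.sum()
p_c_given_b = rng.random((3, 2)); p_c_given_b /= p_c_given_b.sum(axis=1, keepdims=True)

markov = np.einsum('ab,bc->abc', p_ab, p_c_given_b)   # Markov by construction
generic = rng.random((2, 3, 2)); generic /= generic.sum()

print(cmi(markov))     # ~ 0
print(cmi(generic))    # nonnegative; generically > 0
```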

2.3 Entropic quantities

The formulation of the main claim refers to smooth entropies, which can be defined as follows.

Definition 2.2

For any density operator \(\rho _{A B}\) and for \(\varepsilon \in [0,1]\) the \(\varepsilon \)-smooth min- and max-entropies of A conditioned on B are

$$\begin{aligned} H_{\min }^\varepsilon (A|B)_{\rho }&= - \log \inf _{{\tilde{\rho }}_{A B}} \inf _{\sigma _B} \left\| {\tilde{\rho }}_{A B}^{\frac{1}{2}} \sigma _B^{-\frac{1}{2}} \right\| _{\infty }^2 \\ H_{\max }^\varepsilon (A|B)_{\rho }&= \log \inf _{{\tilde{\rho }}_{A B}} \sup _{\sigma _B} \left\| {\tilde{\rho }}_{A B}^{\frac{1}{2}} \sigma _B^{\frac{1}{2}} \right\| _1^2 \ , \end{aligned}$$

respectively, where \({\tilde{\rho }}\) is any non-negative operator with trace at most 1 that is \(\varepsilon \)-close to \(\rho \) in terms of the purified distance [51, 55], and where \(\sigma _B\) is any density operator on B.

The proof we present here relies heavily on the sandwiched relative Rényi entropies introduced in [37, 64]. These relative entropies can be used to define a conditional entropy.

Definition 2.3

For any density operator \(\rho _{A B}\) and for \(\alpha \in (0, 1) \cup (1, \infty )\) the sandwiched \(\alpha \)-Rényi entropy of A conditioned on B is defined as

$$\begin{aligned} H_\alpha (A|B)_{\rho } = - \frac{1}{\alpha '} \log \left\| \rho _{A B}^{\frac{1}{2}} \rho _B^{\frac{-\alpha '}{2}} \right\| _{2 \alpha }^2 \ , \end{aligned}$$

where \(\alpha ' = \frac{\alpha - 1}{\alpha }\) and where \(\Vert X \Vert _\alpha = \mathrm {tr}\bigl ( (X^{\dagger } X)^{\frac{\alpha }{2}} \bigr )^{\frac{1}{\alpha }}\). Note that \(\alpha '\) is the inverse of the Hölder conjugate of \(\alpha \).

We note that, while the function \(X \mapsto \Vert X \Vert _\alpha \) is a norm for \(\alpha \geqslant 1\), this is not the case for \(\alpha < 1\) since it does not satisfy the triangle inequality. Some key properties of this function are summarised in Appendix A. Using them, the sandwiched Rényi entropies may be rewritten as

$$\begin{aligned} H_\alpha (A|B)_{\rho }&= \frac{\alpha }{1-\alpha } \log \left\| \rho _B^{\frac{1-\alpha }{2\alpha }} \rho _{A B} \rho _B^{\frac{1-\alpha }{2\alpha }} \right\| _\alpha \\&= \frac{1}{1-\alpha } \log \mathrm {tr}\left( \bigl ( \rho _B^{\frac{1-\alpha }{2\alpha }} \rho _{A B} \rho _B^{\frac{1-\alpha }{2\alpha }} \bigr )^{\alpha } \right) \ . \end{aligned}$$
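The last displayed formula can be evaluated numerically. The sketch below (our own test state) computes \(H_\alpha (A|B)_\rho \) for a noisy maximally entangled two-qubit state and checks that as \(\alpha \rightarrow 1\) it approaches the conditional von Neumann entropy \(H(AB)_\rho - H(B)_\rho \), and that it is non-increasing in \(\alpha \) at the sampled points.

```python
# Sandwiched Renyi conditional entropy via the trace formula, using matrix
# powers through eigendecomposition (generalised inverse for negative powers).
import numpy as np

def mpow(X, p):
    w, V = np.linalg.eigh(X)
    wp = np.array([x ** p if x > 1e-12 else 0.0 for x in w])
    return (V * wp) @ V.conj().T

def partial_trace_A(rho_ab, dA, dB):
    return rho_ab.reshape(dA, dB, dA, dB).trace(axis1=0, axis2=2)

def H_alpha(rho_ab, dA, dB, alpha):
    """1/(1-a) * log tr[(rho_B^{(1-a)/2a} rho_AB rho_B^{(1-a)/2a})^a]."""
    rho_b = np.kron(np.eye(dA), partial_trace_A(rho_ab, dA, dB))
    s = mpow(rho_b, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho_ab @ s, alpha)).real) / (1 - alpha)

def H_vn(rho):
    w = np.linalg.eigvalsh(rho)
    return -sum(x * np.log2(x) for x in w if x > 1e-12)

# test state: maximally entangled state mixed with white noise
phi = np.zeros((4, 1)); phi[0] = phi[3] = 1 / np.sqrt(2)
rho = 0.8 * (phi @ phi.T) + 0.2 * np.eye(4) / 4

h1 = H_vn(rho) - H_vn(partial_trace_A(rho, 2, 2))   # conditional von Neumann
print(H_alpha(rho, 2, 2, 1 + 1e-4), h1)             # nearly equal
print(H_alpha(rho, 2, 2, 2) <= h1)                  # True
```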

It turns out that there are multiple ways of defining conditional entropies from relative entropies. Another variant that will be needed in this work is the following:

Definition 2.4

For any density operator \(\rho _{A B}\) and for \(\alpha \in (0, 1) \cup (1, \infty )\), we define

$$\begin{aligned} H^{\uparrow }_\alpha (A|B)_{\rho } = - \inf _{\sigma _B} \frac{1}{\alpha '} \log \left\| \rho _{A B}^{\frac{1}{2}} \sigma _B^{\frac{-\alpha '}{2}} \right\| _{2 \alpha }^2 \ , \end{aligned}$$

where the infimum is over all sub-normalised density operators on B.

Other relevant facts about the sandwiched Rényi entropy and the corresponding notion of relative entropy can be found in Appendix B.

3 Chain Rule for Rényi Entropies

As explained in the introduction, our main result can be regarded as a generalisation of the Quantum Asymptotic Equipartition Property [54], corresponding to (2). The approach used for the proof of the latter is to bound both the smooth min-entropy and the von Neumann entropy by Rényi entropies with an appropriate parameter \(\alpha \). The IID assumption is then used to decompose the Rényi entropy into a sum of n terms. However, since our main claim, Eq. (3), is supposed to hold for general non-IID states, we do not have this luxury and must decompose the Rényi entropy into n terms by other means. The tool we will use for this purpose is a chain rule for Rényi entropies, which we present as a separate theorem (Theorem 3.2). We start by stating a more general version that will be useful in the proof of the main theorem.

Lemma 3.1

Let \(\rho _{A_1 A_2 B}\) and \(\sigma _B\) be density operators and let \(\alpha \in (0, \infty )\). Then

$$\begin{aligned} D_{\alpha }(\rho _{A_1 B} \Vert \mathrm {id}_{A_1} \otimes \sigma _B) - D_\alpha (\rho _{A_1 A_2 B}\Vert \mathrm {id}_{A_1 A_2} \otimes \sigma _B)&= H_{\alpha }(A_2 | A_1 B)_{\nu } \ , \end{aligned}$$

where

$$\begin{aligned} \nu _{A_1A_2B} = \nu ^{\frac{1}{2}}_{A_1B} \rho _{A_2|A_1B} \nu ^{\frac{1}{2}}_{A_1B} \, \text { with } \, \nu _{A_1B} = \frac{\left( \rho _{A_1B}^{\frac{1}{2}} \sigma ^{\frac{1-\alpha }{\alpha }}_{B} \rho _{A_1B}^{\frac{1}{2}}\right) ^{\alpha }}{\mathrm {tr}\left( \rho _{A_1B}^{\frac{1}{2}} \sigma ^{\frac{1-\alpha }{\alpha }}_{B} \rho _{A_1B}^{\frac{1}{2}}\right) ^{\alpha }} \ . \end{aligned}$$
(11)

We note that \(\nu _{A_1B} = \mathrm {tr}_{A_2}(\nu _{A_1A_2B})\), which justifies the notation.

Proof

When \(\alpha = 1\), this equality follows directly from the definition of the entropies. To prove the equality for \(\alpha \in (0,1) \cup (1, \infty )\) we consider a purification of \(\rho _{A_1A_2B}\). Using Lemma B.2 and setting \(\alpha ' = \frac{\alpha -1}{\alpha }\) we have

By the definition of \(\nu _{A_1B}\), we get

where we defined the pure state , which is a purification of \(\nu _{A_1A_2B}\). To conclude we use the fact that \(\nu _{A_1B} = \mathrm {tr}_{A_2}(\nu _{A_1A_2B})\) and Lemma B.2. \(\quad \square \)

By choosing \(\sigma _B = \rho _{B}\) in Lemma 3.1, we directly obtain a chain rule for the Rényi entropies:

Theorem 3.2

Let \(\rho _{A_1 A_2 B}\) be a density operator and let \(\alpha \in (0, \infty )\). Then

$$\begin{aligned} H_\alpha (A_1 A_2 | B)_{\rho }&= H_{\alpha }(A_1 | B)_{\rho } + H_{\alpha }(A_2 | A_1 B)_{\nu } \ , \end{aligned}$$
(12)

where

$$\begin{aligned} \nu _{A_1A_2B} = \nu ^{\frac{1}{2}}_{A_1B} \rho _{A_2|A_1B} \nu ^{\frac{1}{2}}_{A_1B} \, \text { with } \, \nu _{A_1B} = \frac{\left( \rho _{A_1B}^{\frac{1}{2}} \rho ^{\frac{1-\alpha }{\alpha }}_{B} \rho _{A_1B}^{\frac{1}{2}}\right) ^{\alpha }}{\mathrm {tr}\left( \rho _{A_1B}^{\frac{1}{2}} \rho ^{\frac{1-\alpha }{\alpha }}_{B} \rho _{A_1B}^{\frac{1}{2}}\right) ^{\alpha }} \ . \end{aligned}$$
(13)
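The chain rule (12) with the state \(\nu \) from (13) can be checked directly in the commutative case, where all operators are diagonal and the Rényi quantities reduce to plain sums. The sketch below (our own random example) does exactly this.

```python
# Commutative sanity check of the chain rule: for a classical (diagonal)
# rho[a1, a2, b], verify H_a(A1 A2|B)_rho = H_a(A1|B)_rho + H_a(A2|A1 B)_nu
# with nu(a1,b) ~ p(b)^{1-a} p(a1,b)^a and nu(a1,a2,b) = nu(a1,b) p(a2|a1,b).
import numpy as np

alpha = 1.7
rng = np.random.default_rng(1)
p = rng.random((2, 3, 2)); p /= p.sum()     # rho[a1, a2, b]

pB  = p.sum(axis=(0, 1))                    # rho_B
p1B = p.sum(axis=1)                         # rho_{A1 B}

def h_alpha(joint, marg):
    """Classical H_alpha(X|C) = 1/(1-a) log sum_{x,c} marg(c)^{1-a} joint(x,c)^a."""
    s = (joint ** alpha * marg ** (1 - alpha)).sum()
    return np.log2(s) / (1 - alpha)

lhs = h_alpha(p.reshape(6, 2), pB)          # H_alpha(A1 A2 | B)_rho
h1  = h_alpha(p1B, pB)                      # H_alpha(A1 | B)_rho

nu1B = pB ** (1 - alpha) * p1B ** alpha     # diagonal version of (13)
nu1B /= nu1B.sum()
nu = nu1B[:, None, :] * p / p1B[:, None, :]

h2 = h_alpha(nu.transpose(1, 0, 2).reshape(3, 4), nu1B.reshape(4))
print(lhs, h1 + h2)                         # equal up to rounding
```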

One drawback of the above result is that we are seldom interested in the particular state \(\nu \) defined in the theorem statement. It is therefore generally more useful to present the result in a slightly weaker form, where the state \(\nu \) is chosen to be the worst case over an appropriate class of density operators. When \(\rho \) obeys the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\), we obtain the following result.

Theorem 3.3

Let \(\rho _{A_1 B_1 A_2 B_2}\) be a density operator such that the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\) holds and let \(\alpha \in (0, \infty )\). Then

$$\begin{aligned} \inf _{\nu } H_{\alpha }(A_2 | B_2 A_1 B_1)_{\nu } \leqslant H_\alpha (A_1 A_2 | B_1 B_2)_{\rho } - H_{\alpha }(A_1 | B_1)_{\rho } \leqslant \sup _{\nu } H_{\alpha }(A_2 | B_2 A_1 B_1)_{\nu } \end{aligned}$$
(14)

where the supremum and infimum range over density operators \(\nu \) such that \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\) holds.

Proof

We apply Theorem 3.2 with \(B = B_1B_2\). The Markov chain condition implies that \(H_\alpha (A_1|B_1B_2)_{\rho } = H_\alpha (A_1|B_1)_{\rho }\). To see this for \(\alpha \in (\frac{1}{2}, \infty )\), we could use the recoverability condition (10) for Markov chains together with the monotonicity of \(D_{\alpha }\) under quantum channels [8, 24, 37, 64]. We can also see it for all \(\alpha \in (0, \infty )\) using the structure of a Markov chain stated in (9). Namely, there exists a decomposition \(\bigoplus _{j} a_j \otimes b_j\) of the system \(B_1\) such that

$$\begin{aligned} \rho _{A_1 B_1 B_2}&\cong \bigoplus _{j} q_j \, {\hat{\rho }}_{A_1 a_j} \otimes {\hat{\rho }}_{b_j B_2} \end{aligned}$$
(15)

holds, where \(\{q_j\}\) is a probability distribution and where \(\{{\hat{\rho }}_{A_1 a_j}\}\) and \(\{{\hat{\rho }}_{b_j B_2}\}\) are families of density operators. Then,

$$\begin{aligned} H_{\alpha }(A_1 | B_1 B_2)_{\rho }&= \frac{1}{1-\alpha } \log \mathrm {tr}\left( \bigl ( \rho _{B_1 B_2}^{\frac{-\alpha '}{2}} \rho _{A_1 B_1 B_2} \rho _{B_1 B_2}^{\frac{-\alpha '}{2}} \bigr )^{\alpha } \right) \\&= \frac{1}{1-\alpha } \log \mathrm {tr}\left( \bigoplus _{j} q_j \Bigl ( \bigl ( {\hat{\rho }}_{a_j}^{\frac{-\alpha '}{2}} \otimes {\hat{\rho }}_{b_j B_2}^{\frac{-\alpha '}{2}} \bigr ) \left( {\hat{\rho }}_{A_1 a_j} \otimes {\hat{\rho }}_{b_j B_2} \right) \bigl ( {\hat{\rho }}_{a_j}^{\frac{-\alpha '}{2}} \otimes {\hat{\rho }}_{b_j B_2}^{\frac{-\alpha '}{2}} \bigr ) \Bigr )^{\alpha } \right) \\&= \frac{1}{1-\alpha } \log \mathrm {tr}\left( \bigoplus _{j} q_j \bigl ( {\hat{\rho }}_{a_j}^{\frac{-\alpha '}{2}} {\hat{\rho }}_{A_1 a_j} {\hat{\rho }}_{a_j}^{\frac{-\alpha '}{2}} \bigr )^{\alpha } \otimes {\hat{\rho }}_{b_j B_2} \right) = H_{\alpha }(A_1 | B_1)_{\rho } \ . \end{aligned}$$

To prove (14), it only remains to show that the state \(\nu _{A_1A_2B_1B_2}\) defined in (13) satisfies \(\nu _{A_2B_2|A_1B_1} = \rho _{A_2B_2|A_1B_1}\). For that, we again use the fact that \(\rho _{A_1B_1B_2}\) forms a Markov chain. As we will be using this statement later in other contexts, we state it as a claim.

\(\square \)

Claim 3.4

Let \(\rho _{A_1 B_1 A_2 B_2}\) be a density operator such that the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\) holds, let \(\alpha \in (0, \infty )\) and let \(\nu _{A_1 B_1 A_2 B_2}\) be as in (13) with \(B \rightarrow B_1 B_2\). Then \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\).

Letting \(Z = \mathrm {tr}\left( \rho _{A_1B_1B_2}^{\frac{1}{2}} \rho ^{-\alpha '}_{B_1B_2} \rho _{A_1B_1B_2}^{\frac{1}{2}}\right) ^{\alpha }\), the decomposition (15) allows us to write

$$\begin{aligned} \nu _{A_1B_1B_2}&= \frac{1}{Z} \left( \rho _{A_1B_1B_2}^{\frac{1}{2}} \rho _{B_1B_2}^{-\alpha '} \rho _{A_1B_1B_2}^{\frac{1}{2}} \right) ^{\alpha } \end{aligned}$$
(16)
$$\begin{aligned}&= \frac{1}{Z} \left( \bigoplus _{j} q_j^{1-\alpha '} {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} {\hat{\rho }}_{a_j}^{-\alpha '} {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} \otimes {\hat{\rho }}_{b_j B_2}^{1-\alpha '} \right) ^{\alpha } \end{aligned}$$
(17)
$$\begin{aligned}&= \frac{1}{Z} \bigoplus _{j} q_j \left( {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} {\hat{\rho }}_{a_j}^{-\alpha '} {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} \right) ^{\alpha } \otimes {\hat{\rho }}_{b_j B_2} \ . \end{aligned}$$
(18)

It follows that

$$\begin{aligned} \nu _{A_1B_1}^{-\frac{1}{2}} \nu _{A_1B_1B_2}^{\frac{1}{2}}&= \bigoplus _{j} {\hat{\rho }}^0_{A_1 a_j} \otimes {\hat{\rho }}_{b_j}^{-\frac{1}{2}} {\hat{\rho }}_{b_j B_2}^{\frac{1}{2}} \end{aligned}$$
(19)
$$\begin{aligned}&= \rho _{A_1B_1}^{-\frac{1}{2}} \rho _{A_1B_1B_2}^{\frac{1}{2}} \ , \end{aligned}$$
(20)

where \({\hat{\rho }}^0_{A_1 a_j}\) is the projector onto the support of \({\hat{\rho }}_{A_1 a_j}\). For the first equality, we used the fact that the support of the operator \(\left( {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} {\hat{\rho }}_{a_j}^{-\alpha '} {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} \right) ^{\alpha }\) is the same as the support of \({\hat{\rho }}_{A_1 a_j}\). As a result, we find

$$\begin{aligned} \nu _{A_2B_2|A_1B_1}&= \nu _{A_1B_1}^{-\frac{1}{2}} \nu _{A_1B_1B_2}^{\frac{1}{2}} \rho _{A_2|A_1B_1B_2} \nu _{A_1B_1B_2}^{\frac{1}{2}} \nu _{A_1B_1}^{-\frac{1}{2}} \end{aligned}$$
(21)
$$\begin{aligned}&= \rho _{A_1B_1}^{-\frac{1}{2}} \rho _{A_1B_1B_2}^{\frac{1}{2}} \rho _{A_2|A_1B_1B_2} \rho _{A_1B_1B_2}^{\frac{1}{2}} \rho _{A_1B_1}^{-\frac{1}{2}} \end{aligned}$$
(22)
$$\begin{aligned}&= \rho _{A_2B_2 | A_1B_1} \ . \end{aligned}$$
(23)

This concludes the proof of Claim 3.4 and gives the desired statement. \(\quad \square \)

The following simple corollary expresses the above chain rules in terms of quantum channels, i.e., trace preserving completely positive (TPCP) maps, rather than conditional states.

Corollary 3.5

Let \(\rho ^0_{R A_1 B_1}\) be a density operator on \(R \otimes A_1 \otimes B_1\), \({\mathcal {M}}= {\mathcal {M}}_{A _2 B_2 \leftarrow R}\) be a TPCP map and \(\alpha \in (0, \infty )\). Assuming that \(\rho _{A_1 B_1 A_2 B_2} = {\mathcal {M}}(\rho ^0_{RA_1B_1})\) satisfies the Markov condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\), we have

$$\begin{aligned} \inf _{\omega } H_{\alpha }(A_2 | B_2 A_1 B_1)_{{\mathcal {M}}(\omega )}&\leqslant H_\alpha (A_1 A_2 | B_1 B_2)_{{\mathcal {M}}(\rho ^0)} - H_{\alpha }(A_1 | B_1)_{\rho ^0} \\&\leqslant \sup _{\omega } H_{\alpha }(A_2 | B_2 A_1 B_1)_{{\mathcal {M}}(\omega )} \end{aligned}$$

where the supremum and infimum range over density operators \(\omega _{R A_1 B_1}\) on \(R \otimes A_1 \otimes B_1\). Moreover, if \(\rho ^0_{R A_1 B_1}\) is pure then we can optimise over pure states \(\omega _{R A_1 B_1}\).

Proof

We apply Theorem 3.3 to \(\rho _{A_1 B_1 A_2 B_2}\). It suffices to show that the optimisation over \(\nu _{A_1 B_1 A_2 B_2}\) satisfying \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\) is contained in the optimisation over \(\omega _{RA_1B_1}\). For this, let \(\nu _{A_1 B_1 A_2 B_2}\) be any density operator satisfying \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\), i.e.,

$$\begin{aligned} \nu _{A_1 B_1 A_2 B_2} = \nu _{A_1 B_1}^{\frac{1}{2}} \rho _{A_2 B_2 | A_1 B_1} \nu _{A_1 B_1}^{\frac{1}{2}} \ . \end{aligned}$$
(24)

Now we choose

$$\begin{aligned} \omega _{R A_1 B_1} = \nu _{A_1 B_1}^{\frac{1}{2}} \rho _{A_1 B_1}^{-\frac{1}{2}} \rho ^0_{R A_1 B_1} \rho _{A_1 B_1}^{-\frac{1}{2}} \nu _{A_1 B_1}^{\frac{1}{2}} \ . \end{aligned}$$

We then see that

$$\begin{aligned} {\mathcal {M}}(\omega _{R A_1 B_1})&= \nu _{A_1 B_1}^{\frac{1}{2}} \rho _{A_1 B_1}^{-\frac{1}{2}} {\mathcal {M}}(\rho ^0_{R A_1 B_1}) \rho _{A_1 B_1}^{-\frac{1}{2}} \nu _{A_1 B_1}^{\frac{1}{2}} \\&= \nu _{A_1 B_1}^{\frac{1}{2}} \rho _{A_2 B_2 | A_1 B_1 } \nu _{A_1 B_1}^{\frac{1}{2}} \\&= \nu _{A_1 B_1 A_2 B_2} \ . \end{aligned}$$

\(\square \)

4 Entropy Accumulation

This section is devoted to the main result on entropy accumulation. The statement is formulated in its fully general form as Theorem 4.4 and presented in a slightly simplified version as Corollary 4.8. We also give a formulation that corresponds to statement (3) of the introduction (Corollary 4.9). Finally, we show how the Quantum Asymptotic Equipartition Property follows as a special case (cf. Corollary 4.10).

For \(i \in \{1,\dots ,n\}\), let \({\mathcal {M}}_i\) be a TPCP map from \(R_{i-1}\) to \(X_i A_i B_i R_i\), where \(A_i\) is finite-dimensional and where \(X_i\) represents a classical value from an alphabet \({\mathcal {X}}\) that is determined by \(A_i\) and \(B_i\) together. More precisely, we require that \({\mathcal {M}}_{i} = {\mathcal {T}}_{i} \circ {\mathcal {M}}'_i\), where \({\mathcal {M}}'_{i}\) is an arbitrary TPCP map from \(R_{i-1}\) to \(A_{i} B_{i} R_{i}\) and \({\mathcal {T}}_i\) is a TPCP map from \(A_{i}B_{i}\) to \(X_{i} A_i B_i\) of the form

$$\begin{aligned} {\mathcal {T}}_{i}(W_{A_i B_i}) = \sum _{y \in {\mathcal {Y}}, z \in {\mathcal {Z}}} \bigl (\Pi _{A_i, y} \otimes \Pi _{B_i, z}\bigr ) W_{A_i B_i} \bigl (\Pi _{A_i, y} \otimes \Pi _{B_i, z}\bigr ) \otimes |t(y,z) \rangle \langle t(y,z) |_{X_i} \ , \end{aligned}$$
(25)

where \(\{\Pi _{A_i, y}\}\) and \(\{\Pi _{B_i, z}\}\) are families of mutually orthogonal projectors on \(A_i\) and \(B_i\), and where \(t : {\mathcal {Y}}\times {\mathcal {Z}}\rightarrow {\mathcal {X}}\) is a deterministic function (cf. Figs. 1 and 2). Special cases of interest are when \(X_i\) is trivial and \({\mathcal {T}}_{i}\) is the identity map, and when \(X_i = t(Y_i, Z_i)\) where \(Y_i\) and \(Z_i\) are classical parts of \(A_i\) and \(B_i\), respectively. Note that the maps \({\mathcal {T}}_i\) have the property that, for any operator \({\bar{W}}_{X_iA_iB_i}\), if \({\bar{W}}_{X_iA_iB_i} = {\mathcal {T}}_{i}(W_{A_i B_i})\) then \({\bar{W}}_{X_i A_i B_i} = {\mathcal {T}}_{i}({\bar{W}}_{A_i B_i})\).
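The read-out maps \({\mathcal {T}}_i\) are pinchings followed by a classical relabelling, and the stated property \({\bar{W}}_{X_i A_i B_i} = {\mathcal {T}}_{i}({\bar{W}}_{A_i B_i})\) can be checked numerically. The following minimal Python sketch uses qubit systems with rank-one projectors and the hypothetical choice \(t(y,z) = y \oplus z\); it is an illustration under these assumptions, not part of the formal development.

```python
import numpy as np

def readout_map(W, t, d=2):
    """T: AB -> X AB with rank-one projectors |y><y|, |z><z| and X = t(Y, Z).
    The classical register X is represented as a dict x -> operator block."""
    out = {}
    for y in range(d):
        for z in range(d):
            P = np.zeros((d * d, d * d))
            P[d * y + z, d * y + z] = 1.0          # Pi_{A,y} (x) Pi_{B,z}
            block = P @ W @ P
            x = t(y, z)
            out[x] = out.get(x, np.zeros_like(block)) + block
    return out

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
W = G @ G.conj().T
W /= np.trace(W).real                               # random state on A (x) B

barW = readout_map(W, lambda y, z: y ^ z)           # X = Y xor Z

# T is trace preserving: the classical-quantum blocks sum to trace 1
assert np.isclose(sum(np.trace(b).real for b in barW.values()), 1.0)

# the stated property: applying T to the AB marginal of T(W) gives T(W) back
barW_AB = sum(barW.values())
again = readout_map(barW_AB, lambda y, z: y ^ z)
assert all(np.allclose(again[x], barW[x]) for x in barW)
```

The reapplication check works because the projectors \(\Pi _{A_i,y} \otimes \Pi _{B_i,z}\) are mutually orthogonal, so pinching twice is the same as pinching once.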

The entropy accumulation theorem stated below will hold for states of the form

$$\begin{aligned} \rho _{A_1^n B_1^n X_1^n E} = ({{\mathcal {M}}_n \circ \dots \circ {\mathcal {M}}_1} \otimes {\mathcal {I}}_E)(\rho ^0_{R_0 E}) \end{aligned}$$
(26)

where \(\rho ^0_{R_0 E} \in \mathrm {D}(R_0 \otimes E)\) is a density operator on the system \(R_0\) together with an arbitrary system E. In addition, we require that the Markov conditions

$$\begin{aligned} A_1^{i-1} \leftrightarrow B_1^{i-1} E \leftrightarrow B_{i} \end{aligned}$$
(27)

be satisfied for all \(i \in \{1, \ldots , n\}\).

Let \({\mathbb {P}}\) be the set of probability distributions on the alphabet \({\mathcal {X}}\) of \(X_i\), and let R be a system isomorphic to \(R_{i-1}\). For any \(q \in {\mathbb {P}}\) we define the set of states

$$\begin{aligned} \Sigma _i(q) = \bigl \{\nu _{X_i A_i B_i R_i R} = ({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R}) : \quad \omega \in \mathrm {D}(R_{i-1} \otimes R) \text { and } \nu _{X_i} = q \bigr \} \ , \end{aligned}$$
(28)

where \(\nu _{X_i}\) denotes the probability distribution over \({\mathcal {X}}\) with the probabilities given by \(\nu _{X_i}(x) = \mathrm {tr}\bigl (\nu _{X_i A_i B_i R_i R} \, |x \rangle \langle x |_{X_i}\bigr )\).

Definition 4.1

A real function f on \({\mathbb {P}}\) is called a min- or max-tradeoff function for \({\mathcal {M}}_i\) if it satisfies

$$\begin{aligned} f(q) \leqslant \inf _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu } \qquad \text {or} \qquad f(q) \geqslant \sup _{\nu \in \Sigma _i(q)} H(A_i |B_i R)_{\nu } \ , \end{aligned}$$

respectively.Footnote 11

Remark 4.2

To determine the infimum \(\inf _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu }\), we may assume that \(\omega _{R_{i-1} R}\) in the definition of \(\Sigma _i(q)\) is pure. In fact, including a purifying system in R cannot increase \(H(A_i | B_i R)\) because of strong subadditivity. Similarly, to calculate the supremum \(\sup _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu }\), we may assume that \(\omega _{R_{i-1} R}\) is a product state or that R is trivial. This justifies the fact that we assumed R is isomorphic to \(R_{i-1}\) in the definition of \(\Sigma _i(q)\).

Remark 4.3

As we will see in the proof below, one can also impose the constraint on the set \(\Sigma _i(q)\) that the system R be isomorphic to \(A_1^{i-1}B_1^{i-1}E\). Furthermore, if a part of the latter is classical in \(\rho \), one can restrict \(\Sigma _i(q)\) to states satisfying this property.

In the following, we denote by \(\nabla f\) the gradient of a function f. (Note that in Theorem 4.4 and Proposition 4.5, f is an affine function, so that \(\nabla f\) is a constant.) We write \(\mathsf {freq}(X_1^n)\) for the distribution on \({\mathcal {X}}\) defined by \(\mathsf {freq}(X_1^n)(x) = \frac{|\{i \in \{1,\dots ,n\} : X_i = x\}|}{n}\). We also recall that in this context, an event \(\Omega \) is defined by a subset of \({\mathcal {X}}^n\) and we write \(\rho [\Omega ] = \sum _{x_1^n \in \Omega }\mathrm {tr}(\rho _{A_1^n B_1^n E, x_1^n})\) for the probability of the event \(\Omega \) and

$$\begin{aligned} \rho _{|\Omega } = \frac{1}{\rho [\Omega ]} \sum _{x_1^n \in \Omega } \rho _{A_1^n B_1^n E, x_1^n} \otimes |x_1^n \rangle \langle x_1^n |_{X_1^n} \end{aligned}$$

for the state conditioned on \(\Omega \) (cf. Section 2.1).
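The notions \(\mathsf {freq}(X_1^n)\) and "\(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\)" are elementary and can be illustrated by a short sketch. The sample \(x_1^n\), the values \(f(\delta _x)\), and the event below are hypothetical choices for illustration only.

```python
from collections import Counter

def freq(xs):
    """Empirical distribution freq(X_1^n) on the observed alphabet."""
    n = len(xs)
    c = Counter(xs)
    return {x: c[x] / n for x in c}

xs = [0, 1, 0, 0, 1, 0, 1, 0]            # hypothetical outcomes, n = 8
q = freq(xs)
print(q)                                  # {0: 0.625, 1: 0.375}

# an affine f is fixed by its values on the vertices delta_x of the simplex
f_delta = {0: 1.0, 1: 0.2}                # hypothetical values f(delta_x)
f = lambda q: sum(q.get(x, 0.0) * f_delta[x] for x in f_delta)

# the event Omega = { x_1^n : freq(x_1^n)(1) <= 0.4 } implies
# f(freq(X_1^n)) >= h with h = 0.6 * 1.0 + 0.4 * 0.2 = 0.68
assert q[1] <= 0.4 and f(q) >= 0.68
```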

Theorem 4.4

Let \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}\), let f be an affine min-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\), and let \(\varepsilon \in (0,1)\). Then, for any event \(\Omega \subseteq {\mathcal {X}}^n\) that implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\),Footnote 12

$$\begin{aligned} H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&> n h - c \sqrt{n} \end{aligned}$$
(29)

holds for \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [\Omega ])}\), where \(d_A\) is the maximum dimension of the systems \(A_i\). Similarly,

$$\begin{aligned} H_{\max }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&< n h + c \sqrt{n} \end{aligned}$$
(30)

holds if f is replaced by an affine max-tradeoff function and if \(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \leqslant h\).

Before proceeding to the proof, some remarks are in order. The first is that the Markov chain assumption on the state is important, as argued in Appendix C. Secondly, the system E could have been included in \(B_1\), but for the applications we consider, it is clearer to keep a separate system E that is not affected by the processes \({\mathcal {M}}_1, \dots , {\mathcal {M}}_n\). Thirdly, concerning the second-order term, it is possible to replace \(d_A\) with appropriate entropic quantities, as in the Quantum Asymptotic Equipartition Property [54], which could be useful when the systems \(A_i\) are infinite-dimensional. The dependence of the second-order term on the state and on the tradeoff function f is studied in more detail in the subsequent work [19]. Finally, we note that the constraint that the tradeoff function be affine is not a severe restriction: given a convex min-tradeoff function, one can always choose a tangent hyperplane at a point of interest as an affine lower bound. This is illustrated in Corollary 4.7.
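To get a feel for the size of the second-order term, the following sketch evaluates \(c\) and the bound (29) for hypothetical parameter values (logarithms base 2; the choices of n, h, \(d_A\), \(\Vert \nabla f \Vert _\infty \), \(\varepsilon \) and \(\rho [\Omega ]\) are illustrative, not tied to any particular application).

```python
import math

def theorem_bound(n, h, d_A, grad_inf, eps, p_omega):
    """n h - c sqrt(n) with c as in Theorem 4.4 (logs base 2)."""
    c = (2 * (math.log2(1 + 2 * d_A) + math.ceil(grad_inf))
         * math.sqrt(1 - 2 * math.log2(eps * p_omega)))
    return n * h - c * math.sqrt(n), c

bound, c = theorem_bound(n=10**6, h=0.5, d_A=2, grad_inf=2.5,
                         eps=1e-6, p_omega=0.9)
print(c, bound)   # c is independent of n, so the rate bound/n tends to h
```

Since \(c\) does not depend on n, the min-entropy rate \(H_{\min }^{\varepsilon }/n\) approaches h at rate \(O(1/\sqrt{n})\).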

To prove the theorem, we will first show the following proposition, which is essentially a Rényi version of entropy accumulation. We then show how Theorem 4.4 follows from this proposition.

Proposition 4.5

Let \({\mathcal {M}}_1, \ldots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}\), and let f be an affine min-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\). Then, for any event \(\Omega \) which implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\),

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&> n h - n \left( \frac{\alpha -1}{4} \right) V^2 - \frac{\alpha }{\alpha - 1} \log \frac{1}{\rho [\Omega ]} \end{aligned}$$
(31)

holds for \(\alpha \) satisfying \(1< \alpha < 1 + \frac{2}{V}\), and \(V = 2 \left\lceil \Vert \nabla f \Vert _\infty \right\rceil + 2 \log (1+2 d_A)\), where \(d_A\) is the maximum dimension of the systems \(A_i\). Similarly,

$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n | B_1^n E)_{\rho _{|\Omega }}&< n h + n \left( \frac{\alpha -1}{4} \right) V^2 + \frac{\alpha }{\alpha - 1} \log \frac{1}{\rho [\Omega ]} \end{aligned}$$
(32)

holds if f is replaced by an affine max-tradeoff function and if \(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \leqslant h\).
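The two penalty terms in (31) pull in opposite directions: the first grows linearly in \(\alpha - 1\), the second diverges as \(\alpha \rightarrow 1\). The following sketch scans \(\alpha \) over \((1, 1 + 2/V)\) for hypothetical values of n, h, V and \(\rho [\Omega ]\), showing that the best total penalty scales like \(V \sqrt{n \log (1/\rho [\Omega ])}\), consistent with the \(c\sqrt{n}\) form of Theorem 4.4.

```python
import math

def prop_bound(alpha, n, h, V, p_omega):
    """Right-hand side of (31): n h - n (alpha-1) V^2 / 4 - alpha/(alpha-1) log(1/p)."""
    return (n * h - n * (alpha - 1) / 4 * V ** 2
            - alpha / (alpha - 1) * math.log2(1 / p_omega))

n, h, V, p = 10**6, 0.5, 8.0, 0.9
# scan alpha over (1, 1 + 2/V) on a log-spaced grid; the first penalty term
# grows with alpha, the second blows up as alpha -> 1
alphas = [1 + (2 / V) * 10 ** (-k / 100) for k in range(1, 500)]
best = max(prop_bound(a, n, h, V, p) for a in alphas)
penalty = n * h - best
print(penalty)  # comparable to V * sqrt(n * log(1/p)), i.e. O(sqrt(n))
```

Balancing the two terms analytically gives \(\alpha - 1 \approx \frac{2}{V}\sqrt{\log (1/\rho [\Omega ])/n}\), which lies inside the admissible range for large n.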

Proof

We focus on proving the first inequality (31). The proof of the second inequality (32) is similar; we only point out the main differences in the course of the proof.

The first step of the proof is to construct a state that will allow us to lower-bound \(H^\uparrow _{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}\) using the chain rule of Theorem 3.3, while ensuring that the tradeoff function is taken into account. Let \([g_{\min }, g_{\max }]\) be the smallest real interval that contains the range \(f({\mathbb {P}})\) of f, and set \({\bar{g}} = \frac{1}{2} (g_{\min } + g_{\max })\). Furthermore, for every i, let \({\mathcal {D}}_i : X_i \rightarrow X_i D_i {\bar{D}}_i\), with \(\dim D_i = \dim {\bar{D}}_i\), be a TPCP map defined as

$$\begin{aligned} {\mathcal {D}}_i(W_{X_i}) = \sum _{x \in {\mathcal {X}}} \langle x | W_{X_i} | x \rangle \, |x \rangle \langle x |_{X_i} \otimes \tau (x)_{D_i {\bar{D}}_i} \ , \end{aligned}$$
where \(\tau (x)\) is a mixture between a maximally entangled state on \(D_i \otimes {\bar{D}}_i\) and a fully mixed state such that the marginal on \({\bar{D}}_i\) is uniform and such that \(H_{\alpha }(D_i|{\bar{D}}_i)_{\tau (x)} = {\bar{g}} - f(\delta _x)\) (here \(\delta _x\) stands for the distribution with all the weight on element x). To ensure that this is possible, we need to choose \(\dim D_i\) large enough, so we need to bound how large \({\bar{g}} - f(\delta _x)\) can be, positive or negative. By the definition of \({\bar{g}}\), \(|{\bar{g}} - f(\delta _x)|\) cannot be larger than \(\frac{1}{2} |g_{\max } - g_{\min }| \leqslant \Vert \nabla f \Vert _\infty \). We therefore take the dimension of the spaces \(D_i\) to be equal to

$$\begin{aligned} d_D := \left\lceil 2^{\Vert \nabla f \Vert _\infty } \right\rceil \leqslant 2^{\left\lceil \Vert \nabla f \Vert _\infty \right\rceil } \ . \end{aligned}$$

For later use, we note that we have

$$\begin{aligned} \log (1 + 2 d_{A} d_D) \leqslant \left\lceil \Vert \nabla f \Vert _\infty \right\rceil + \log (1+2 d_A ) = \frac{V}{2}. \end{aligned}$$
(33)
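The dimension bound and inequality (33) are elementary arithmetic and can be checked directly; the value of \(\Vert \nabla f \Vert _\infty \) below is a hypothetical example.

```python
import math

grad_inf, d_A = 2.5, 2            # hypothetical ||grad f||_inf and dim A_i
d_D = math.ceil(2 ** grad_inf)    # dimension of the D_i systems; here d_D = 6

# d_D <= 2^{ceil(||grad f||_inf)}
assert d_D <= 2 ** math.ceil(grad_inf)

# inequality (33): log(1 + 2 d_A d_D) <= ceil(||grad f||_inf) + log(1 + 2 d_A) = V/2
V = 2 * math.ceil(grad_inf) + 2 * math.log2(1 + 2 * d_A)
lhs = math.log2(1 + 2 * d_A * d_D)
assert lhs <= math.ceil(grad_inf) + math.log2(1 + 2 * d_A)
assert math.isclose(math.ceil(grad_inf) + math.log2(1 + 2 * d_A), V / 2)
```

The first inequality in (33) follows from \(1 + 2 d_A d_D \leqslant d_D (1 + 2 d_A)\), which holds since \(d_D \geqslant 1\).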

Now, let

$$\begin{aligned} {\bar{\rho }} := ({\mathcal {D}}_n \circ \dots \circ {\mathcal {D}}_1)(\rho ) \ . \end{aligned}$$
(34)

Note that \({\bar{\rho }}_{X_1^n A_1^n B_1^n E} = \rho _{X_1^n A_1^n B_1^n E}\).

One can think of the D systems as an “entropy price” that encodes the tradeoff function. With these systems in place, the output entropy includes an extra term that allows the tradeoff function to be taken into account in the optimisation arising in Theorem 3.3. This is formalised by the following facts, which are proven in Claim 4.6:

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\geqslant H^{\uparrow }_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ , \end{aligned}$$
(35)
$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\leqslant H_{\frac{1}{\alpha }}(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ . \end{aligned}$$
(36)
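The "entropy price" states \(\tau (x)\) interpolate between a maximally entangled state (conditional entropy \(-\log d\)) and a fully mixed state (conditional entropy \(+\log d\)), so any target value \({\bar{g}} - f(\delta _x)\) in that interval can be realised. The sketch below computes \(H_\alpha (D|{\bar{D}})\) for this one-parameter family, using the fact (an assumption of this illustration, valid here because the \({\bar{D}}\) marginal is uniform) that the conditional Rényi entropy reduces to \(-\log d - \frac{1}{\alpha -1}\log \mathrm {tr}\,\tau ^\alpha \); the dimension and target value are hypothetical.

```python
import math

def H_alpha_cond(p, d, alpha):
    """H_alpha(D|Dbar) for tau_p = p * Phi + (1 - p) * I/d^2, whose Dbar
    marginal is uniform, so H_alpha = -log d - log(tr tau^alpha)/(alpha - 1)."""
    lam_big = p + (1 - p) / d**2              # eigenvalue on the entangled state
    lam_small = (1 - p) / d**2                # the remaining d^2 - 1 eigenvalues
    tr_pow = lam_big**alpha + (d**2 - 1) * lam_small**alpha
    return -math.log2(d) - math.log2(tr_pow) / (alpha - 1)

d, alpha = 8, 1.05
# the family sweeps the full interval [-log d, +log d]
assert math.isclose(H_alpha_cond(0.0, d, alpha), math.log2(d))
assert math.isclose(H_alpha_cond(1.0, d, alpha), -math.log2(d))

def mix_for_entropy(target, d, alpha, iters=60):
    """Bisect for the weight p with H_alpha(D|Dbar)_{tau(p)} = target
    (H_alpha is decreasing in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H_alpha_cond(mid, d, alpha) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = mix_for_entropy(1.25, d, alpha)           # target gbar - f(delta_x) = 1.25
assert abs(H_alpha_cond(p, d, alpha) - 1.25) < 1e-6
```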

The next step is to relate the entropies on the conditional state \(\rho _{|\Omega }\) to those on the unconditional state. To do this, we use Lemmas B.5 and B.6 applied to \({\bar{\rho }} = \rho [\Omega ] {\bar{\rho }}_{|\Omega } + ({\bar{\rho }} - \rho [\Omega ] {\bar{\rho }}_{|\Omega })\), together with the fact that \(H_{\alpha }^{\uparrow } \geqslant H_{\alpha }\), and obtainFootnote 13

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\geqslant H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} - \frac{\alpha }{\alpha - 1} \log \frac{1}{\rho [\Omega ]} - n {\bar{g}} + n h \ , \end{aligned}$$
(37)
$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\leqslant H_{\frac{1}{\alpha }}(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} + \frac{\alpha }{\alpha - 1} \log \frac{1}{\rho [\Omega ]} - n {\bar{g}} + n h \ . \end{aligned}$$
(38)

To show the desired inequality (31), it now suffices to prove that \(H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}}\) is lower bounded by (roughly) \(n {\bar{g}}\). To do that, we are now going to use the chain rule for Rényi entropies in the form of Corollary 3.5 n times on the state \({\bar{\rho }}\), with the following substitutions at step i:

  • \(A_1 \rightarrow A_1^{i-1} D_1^{i-1}\)

  • \(B_1 \rightarrow B_1^{i-1} E {\bar{D}}_1^{i-1} \)

  • \(A_2 \rightarrow A_i D_i\)

  • \(B_2 \rightarrow B_i {\bar{D}}_i\)

  • \(R \rightarrow R_{i-1}\)

  • \({\mathcal {M}}\rightarrow \mathrm {tr}_{X_i} \circ {\mathcal {D}}_i \circ {\mathcal {M}}_i\).

To establish the Markov chain condition, we compute the conditional mutual information. Using the chain rule, we obtain

$$\begin{aligned}&I(A_1^{i-1} D_1^{i-1} : B_i {\bar{D}}_i | B_1^{i-1} E {\bar{D}}_1^{i-1}) \nonumber \\&\quad = I(A_1^{i-1} : B_i {\bar{D}}_i | B_1^{i-1} E {\bar{D}}_1^{i-1}) + I(D_1^{i-1} : B_i {\bar{D}}_i | A_1^{i-1} B_1^{i-1} E {\bar{D}}_1^{i-1}) . \end{aligned}$$
(39)

We first show that the second term is zero. By construction, \(D_1^{i-1} {\bar{D}}_1^{i-1}\) conditioned on \(X_1^{i-1}\) is independent of all the other systems. This implies that \(I(D_1^{i-1} {\bar{D}}_1^{i-1} : B_i {\bar{D}}_i | X_1^{i-1} A_1^{i-1} B_1^{i-1} E ) = 0\). In addition, using the fact that \(X_1^{i-1}\) is determined by \(A_1^{i-1} B_1^{i-1}\), the systems \(X_1^{i-1}\) can be removed from the conditioning without changing the value. Then, using the chain rule together with the non-negativity of the conditional mutual information, this shows that \(I(D_1^{i-1} : B_i {\bar{D}}_i | A_1^{i-1} B_1^{i-1} E {\bar{D}}_1^{i-1}) = 0\). To compute the first term in (39), we use the fact that \({\bar{D}}_1^n\) is uniform and independent of \(A_1^n B_1^n E\), so that \(I(A_1^{i-1} : B_i {\bar{D}}_i | B_1^{i-1} E {\bar{D}}_1^{i-1}) = I(A_1^{i-1} : B_i | B_1^{i-1} E )\). But then the assumed Markov condition on \(\rho _{A_1^n B_1^n E}\) implies that this quantity is zero, which establishes the required condition to apply Corollary 3.5.
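The chain-rule decomposition (39) of the conditional mutual information is an identity valid for any state. The following sketch verifies it for a random classical joint distribution over four bits, with the positions standing in for the systems \(A_1^{i-1}\), \(D_1^{i-1}\), \(B_i {\bar{D}}_i\), and \(B_1^{i-1} E {\bar{D}}_1^{i-1}\); it is a classical illustration only.

```python
import itertools, math, random

random.seed(1)
# a random joint distribution p(a, d, b, c) over bits
keys = list(itertools.product(range(2), repeat=4))
w = [random.random() for _ in keys]
Z = sum(w)
p = {k: v / Z for k, v in zip(keys, w)}

def marg(p, idx):
    """Marginal over the variable positions in idx."""
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in idx)
        out[kk] = out.get(kk, 0.0) + v
    return out

def cmi(p, xs, ys, zs):
    """Classical I(X:Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)."""
    def H(idx):
        return -sum(v * math.log2(v) for v in marg(p, idx).values() if v > 0)
    return H(xs + zs) + H(ys + zs) - H(xs + ys + zs) - H(zs)

# positions: 0 = A, 1 = D, 2 = B, 3 = C
lhs = cmi(p, [0, 1], [2], [3])                            # I(AD : B | C)
rhs = cmi(p, [0], [2], [3]) + cmi(p, [1], [2], [0, 3])    # I(A:B|C) + I(D:B|AC)
assert math.isclose(lhs, rhs, abs_tol=1e-9)
```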

We thus obtain

$$\begin{aligned}&H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}}\nonumber \\&\quad \geqslant \sum _i \inf _{\omega _{R_{i-1} R}} H_{\alpha }(A_i D_i|B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )}\nonumber \\&\quad > \sum _i \inf _{\omega _{R_{i-1} R}} H(A_i D_i|B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )} - n(\alpha -1) \log ^2(1 + 2 d_{A} d_D) \nonumber \\&\quad \geqslant \sum _i \inf _{\omega _{R_{i-1} R}} H(A_i D_i|B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )} - n \frac{(\alpha -1)}{4} V^2 \ , \end{aligned}$$
(40)

where we have invoked Lemma B.9 in the second inequality and (33) in the last. Note that the restriction of this lemma that \(\alpha \) satisfy \(1< \alpha < 1+ 1/ \log (1 + 2 d_A d_D)\) is implied by our assumption that \(\alpha < 1 + 2/V\). The infimum is taken over all states \(\omega _{R_{i-1} R}\), where the system R is isomorphic to \(A_1^{i-1} D_1^{i-1} B_1^{i-1} {\bar{D}}_1^{i-1} E\). This condition can be further strengthened by redoing the above argument with Theorem 3.2 instead of Corollary 3.5. It turns out that the system R can be taken to be isomorphic to \(A_1^{i-1} B_1^{i-1} E\), as noted in Remark 4.3.

To prove that we can restrict in our optimisation the system R to be isomorphic to \(A_1^{i-1}B_1^{i-1}E\) and drop the systems \(D_1^{i-1} {\bar{D}}_1^{i-1}\), we use Theorem 3.2 directly instead of Corollary 3.5. In particular, using Lemma B.7 as for (43), we can write

$$\begin{aligned}&H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} \\&\quad = H_{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} \\&\quad = H_{\alpha }(A_1^{n-1} X_1^{n-1} D_1^{n-1} | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} + H_{\alpha }(A_{n} X_n D_{n} | B_1^n E {\bar{D}}_1^n A_1^{n-1} X_1^{n-1} D_1^{n-1})_{\nu ^n} \\&\quad = H_{\alpha }(A_1^{n-1} X_1^{n-1} D_1^{n-1} | B_1^{n-1} E {\bar{D}}_1^{n-1})_{{\bar{\rho }}} + H_{\alpha }(A_{n} X_n D_{n} | B_1^n E {\bar{D}}_1^n A_1^{n-1} X_1^{n-1} D_1^{n-1})_{\nu ^n} \ , \end{aligned}$$

where we used the Markov chain condition \(A_1^{n-1} X_1^{n-1} D_1^{n-1} \leftrightarrow B_1^{n-1} E {\bar{D}}_1^{n-1} \leftrightarrow B_n {\bar{D}}_n\) and we defined for all \(i \in \{1, \dots , n\}\)

$$\begin{aligned} \nu ^i_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}&= \frac{\left( {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^{i} E {\bar{D}}_1^{i}} {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}}\right) ^{\alpha }}{Z_i} \\ \nu ^i_{A_1^i X_1^i D_1^i B_1^{i} E {\bar{D}}_1^{i}}&= (\nu ^i_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}})^{\frac{1}{2}} {\bar{\rho }}_{A_i X_i D_i | A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}(\nu ^i_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}})^{\frac{1}{2}} \ , \end{aligned}$$

with \(Z_i = \mathrm {tr}\left( {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^{i} E {\bar{D}}_1^{i}} {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}}\right) ^{\alpha }\). We then use the chain rule \(n-2\) more times to get

$$\begin{aligned} H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}}&= \sum _{i} H_{\alpha }(A_i X_i D_i | B_1^{i} E {\bar{D}}_1^{i} A_1^{i-1} X_1^{i-1} D_1^{i-1})_{\nu ^i} \ . \end{aligned}$$

We now use the properties of \({\bar{\rho }}\) to simplify the entropies in the right hand side.

Using the properties of the systems \(D_1^i {\bar{D}}_1^i\), we get for any \(x \in {\mathcal {X}}^{i-1}\),

$$\begin{aligned}&{\bar{\rho }}_{A_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i},x}^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^{i} E {\bar{D}}_1^{i}} {\bar{\rho }}_{A_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i},x}^{\frac{1}{2}} \\&\quad = \left( \rho _{A_1^{i-1}B_1^{i} E,x} \otimes \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}} \otimes {\bar{\rho }}_{{\bar{D}}_i} \right) ^{\frac{1}{2}} \left( \rho _{B_1^{i} E} \otimes {\bar{\rho }}_{{\bar{D}}_1^{i}}\right) ^{\frac{1-\alpha }{\alpha }}\\&\qquad \left( \rho _{A_1^{i-1}B_1^{i} E,x} \otimes \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}} \otimes {\bar{\rho }}_{{\bar{D}}_i} \right) ^{\frac{1}{2}} \\&\quad = \left( \rho _{A_1^{i-1}B_1^{i} E,x} \right) ^{\frac{1}{2}} \left( \rho _{B_1^{i} E}\right) ^{\frac{1-\alpha }{\alpha }} \left( \rho _{A_1^{i-1}B_1^{i} E,x} \right) ^{\frac{1}{2}} \otimes \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}} {\bar{\rho }}_{D_1^{i-1}}^{\frac{1-\alpha }{\alpha }} \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}} \otimes {\bar{\rho }}^{\frac{1}{\alpha }}_{{\bar{D}}_i} \ , \end{aligned}$$

where we used the fact that \({\bar{\rho }}_{D_1^{i}} = \otimes _{j=1}^i {\bar{\rho }}_{D_j}\). Letting

$$\begin{aligned} \tau '(x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}&= \frac{\left( \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}} {\bar{\rho }}_{D_1^{i-1}}^{\frac{1-\alpha }{\alpha }} \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}}\right) ^{\alpha }}{\mathrm {tr}\left( \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}} {\bar{\rho }}_{D_1^{i-1}}^{\frac{1-\alpha }{\alpha }} \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}}\right) ^{\alpha }} \\ \nu ^i_{A_1^{i-1} B_1^{i} E, x}&= \frac{\mathrm {tr}\left( \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}} {\bar{\rho }}_{D_1^{i-1}}^{\frac{1-\alpha }{\alpha }} \tau (x)_{D_1^{i-1} {\bar{D}}_1^{i-1}}^{\frac{1}{2}}\right) ^{\alpha }}{Z_i}\\&\qquad \left( \left( \rho _{A_1^{i-1}B_1^{i} E,x} \right) ^{\frac{1}{2}} \left( \rho _{B_1^{i} E}\right) ^{\frac{1-\alpha }{\alpha }} \left( \rho _{A_1^{i-1}B_1^{i} E,x} \right) ^{\frac{1}{2}} \right) ^{\alpha } \ , \end{aligned}$$

we can write

In addition

As a result,

with

$$\begin{aligned}&\nu ^i_{A_1^i X_1^i D_1^i B_1^{i} E {\bar{D}}_1^{i},x} \\&\quad = \left( (\nu ^i_{A_1^{i-1} B_1^{i} E, x})^{\frac{1}{2}} \rho _{A_1^{i-1} B_1^{i} E, x}^{-\frac{1}{2}} \otimes \tau '(x)^{\frac{1}{2}} \tau (x)^{-\frac{1}{2}} \right) {\bar{\rho }}_{A_1^{i} X_i D_1^{i} B_1^{i} {\bar{D}}_1^{i} E, x}\\&\qquad \times \left( \rho _{A_1^{i-1} B_1^{i} E, x}^{-\frac{1}{2}} (\nu ^i_{A_1^{i-1} B_1^{i} E, x})^{\frac{1}{2}} \otimes \tau (x)^{-\frac{1}{2}} \tau '(x)^{\frac{1}{2}} \right) \\&\quad = \left( (\nu ^i_{A_1^{i-1} B_1^{i} E, x})^{\frac{1}{2}} \rho _{A_1^{i-1} B_1^{i} E, x}^{-\frac{1}{2}} \right) {\bar{\rho }}_{A_1^{i} X_i D_i {\bar{D}}_i B_1^{i} E, x} \left( \rho _{A_1^{i-1} B_1^{i} E, x}^{-\frac{1}{2}} (\nu ^i_{A_1^{i-1} B_1^{i} E, x})^{\frac{1}{2}} \right) \\&\qquad \otimes \tau '(x)_{D_1^{i-1} {\bar{D}}_1^{i-1}} \ . \end{aligned}$$

As the system \(D_1^{i-1}{\bar{D}}_1^{i-1}\) can be generated by only acting on \(X_1^{i-1}\), we have by data processing

$$\begin{aligned} H_{\alpha }(A_i X_i D_i | B_1^{i} E {\bar{D}}_1^{i} A_1^{i-1} X_1^{i-1} D_1^{i-1})_{\nu ^i}&= H_{\alpha }(A_i X_i D_i | B_1^{i} E {\bar{D}}_i A_1^{i-1} X_1^{i-1})_{\nu ^i} \ . \end{aligned}$$

We can then write

$$\begin{aligned} \nu ^i_{A_1^{i} X_1^{i} B_1^{i} E D_i {\bar{D}}_i}&= (\nu ^i_{A_1^{i-1} X_1^{i-1} B_1^{i} E})^{\frac{1}{2}} \rho _{A_1^{i-1} X_{1}^{i-1} B_1^{i} E}^{-\frac{1}{2}} {\bar{\rho }}_{A_1^{i} X_1^{i} B_1^{i} E D_i {\bar{D}}_i} \rho _{A_1^{i-1} X_1^{i-1} B_1^{i} E}^{-\frac{1}{2}} (\nu ^i_{A_1^{i-1} X_1^{i-1} B_1^{i} E})^{\frac{1}{2}} \ . \end{aligned}$$

We now use Claim 3.4 with the substitutions

  • \(A_1 \rightarrow X_1^{i-1} A_1^{i-1}\)

  • \(A_2 \rightarrow X_i A_i D_i {\bar{D}}_i\)

  • \(B_1 \rightarrow B_1^{i-1} E\)

  • \(B_2 \rightarrow B_i\)

and using the Markov property \(X_1^{i-1} A_1^{i-1} \leftrightarrow B_1^{i-1} E \leftrightarrow B_i\). Thus, we have

$$\begin{aligned} \nu _{X_i A_i D_i {\bar{D}}_i B_i | X_1^{i-1} A_1^{i-1} B_1^{i-1} E} = {\bar{\rho }}_{X_i A_i D_i {\bar{D}}_i B_i | X_1^{i-1} A_1^{i-1} B_1^{i-1} E} \ . \end{aligned}$$

As a result, as in the proof of Corollary 3.5, we then get

$$\begin{aligned} \nu ^i_{A_1^{i} X_1^{i} B_1^{i} E D_i {\bar{D}}_i}&= ({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega ^i_{R_{i-1} A_1^{i-1} X_{1}^{i-1} B_1^{i-1} E}) \ , \end{aligned}$$

where

$$\begin{aligned} \omega ^i_{R_{i-1} A_1^{i-1} X_1^{i-1} B_1^{i-1} E} := T_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E} \rho _{R_{i-1} A_1^{i-1} X_1^{i-1} B_1^{i-1} E} T^{\dagger }_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E} \ , \end{aligned}$$

with \(T_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E} = (\nu ^i_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E})^{\frac{1}{2}} (\rho _{X_1^{i-1} A_1^{i-1} B_1^{i-1} E})^{-\frac{1}{2}}\). Finally, we get

$$\begin{aligned}&H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}} \\&\quad = \sum _{i} H_{\alpha }(A_i D_i | B_1^{i} E {\bar{D}}_{i} A_1^{i-1} X_1^{i-1} )_{({\mathcal {D}}_i \circ {\mathcal {M}}_i) (\omega ^i)} \\&\quad \geqslant \sum _{i} \inf _{\omega _{R_{i-1} A_1^{i-1} B_1^{i-1} E}}H_{\alpha }(A_i D_i | B_1^{i} E {\bar{D}}_{i} A_1^{i-1} )_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega ^i)} \ , \end{aligned}$$

where in the inequality we used the fact that \(X_1^{i-1}\) is classical together with Lemma B.3. We point out that it is clear from this calculation that if part of the systems \(A_1^{i-1} B_1^{i-1} E\) is classical in \(\rho \), it remains classical in \(\omega ^i\). This proves the claims in Remark 4.3.

Considering the right hand side of expression (40), we get for any such state \(\omega _{R_{i-1} R}\),

$$\begin{aligned} H(A_i D_i | B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )}&= H(A_i X_i D_i | B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )} \\&= H(A_i X_i | B_i R)_{{\mathcal {M}}_i(\omega )} + H(D_i|{\bar{D}}_i X_i)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )}\\&= H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \sum _x q(x) H(D_i|{\bar{D}}_i)_{\tau (x)}\\&\geqslant H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \sum _x q(x) H_{\alpha }(D_i|{\bar{D}}_i )_{\tau (x)}\\&= H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \sum _x q(x) \bigl ( {\bar{g}} - f(\delta _x) \bigr ) \\&= H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + {\bar{g}} - f(q)\\&\geqslant {\bar{g}} \ \end{aligned}$$

where \(q = {\mathcal {M}}_i(\omega )_{X_i}\) denotes the distribution of \(X_i\) on \({\mathcal {X}}\) obtained from the state \({\mathcal {M}}_i(\omega )\). The third equality comes from the fact that \(X_i\) is determined by \(A_iB_i\). The first inequality follows from the monotonicity of the Rényi entropies in \(\alpha \) [8, 37]. The last equality holds because f is affine and the final inequality because f is a min-tradeoff function. Putting everything together, Eq. (37) becomes

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }} > n h - n \frac{(\alpha -1)}{4} V^2 - \frac{\alpha }{\alpha - 1} \log \frac{1}{\rho [\Omega ]} \ . \end{aligned}$$

This concludes the proof of the first inequality (31) of Proposition 4.5.

To show the second inequality (32), the same argument as before gives

$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}}&< \sum _i \sup _{\omega _{R_{i-1} R}} H(A_i D_i|B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )} + n \frac{(\alpha -1)}{4} V^2 \ , \end{aligned}$$

where the supremum is over all states \(\omega _{R_{i-1}R}\) with R constrained as described by Remark 4.3. For any such state and a max-tradeoff function f, we have

$$\begin{aligned} H(A_i D_i | B_i {\bar{D}}_i R)_{({\mathcal {D}}_i \circ {\mathcal {M}}_i)(\omega )}&\leqslant H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \sum _x q(x) H_{\frac{1}{\alpha }}(D_i|{\bar{D}}_i )_{\tau (x)}\\&= H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + \sum _x q(x) \bigl ( {\bar{g}} - f(\delta _x) \bigr ) \\&= H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + {\bar{g}} - f(q)\\&\leqslant {\bar{g}} \ . \end{aligned}$$

It then suffices to combine these inequalities with inequality (38). \(\quad \square \)

We now prove the claim used in the preceding proof.

Claim 4.6

For \(\alpha \in (1,2]\), \(\rho \) and \(\Omega \) as in the statement of Proposition 4.5 and \({\bar{\rho }}\) as defined in (34) (see also the preceding text for a definition of \({\bar{g}}\)), we have

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\geqslant H^{\uparrow }_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ , \end{aligned}$$
(41)
$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n | B_1^n E)_{\rho _{|\Omega }}&\leqslant H_{\frac{1}{\alpha }}(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ . \end{aligned}$$
(42)

Proof

We focus on proving inequality (41). The first step is to show that as \(X_1^n\) is a deterministic function of \(A_1^n B_1^n\), we have

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }}&= H^{\uparrow }_{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} \ . \end{aligned}$$
(43)

In order to do that, observe that for any \(x_1^n \in {\mathcal {X}}^n\), we have

$$\begin{aligned} {\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n} = \rho _{A_1^n B_1^n E, x_1^n} \otimes \tau (x_1^n)_{D_1^n {\bar{D}}_1^n} \ , \end{aligned}$$

where we introduced the notation \(\tau (x_1^n)_{D_1^n {\bar{D}}_1^n} = \tau (x_1)_{D_1 {\bar{D}}_1} \otimes \cdots \otimes \tau (x_n)_{D_n {\bar{D}}_n}\). This implies that for any \(x_1^n\), we have

$$\begin{aligned} {\bar{\rho }}_{X_1^n A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n} = ({\mathcal {T}}_n \circ \dots \circ {\mathcal {T}}_1)({\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n}) \ . \end{aligned}$$

By taking the sum over \(x_1^n \in \Omega \) and then normalising by \(\rho [\Omega ]\), we get

$$\begin{aligned} {\bar{\rho }}_{X_1^n A_1^n B_1^n E D_1^n {\bar{D}}_1^n | \Omega } = ({\mathcal {T}}_n \circ \dots \circ {\mathcal {T}}_1)({\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n | \Omega }) \ . \end{aligned}$$

Thus, we can apply Lemma B.7 and prove the equality (43).

Let now \(\sigma _{B_1^n E {\bar{D}}_1^n}\) be a state such that

$$\begin{aligned} H^\uparrow _{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = - D_{\alpha }({\bar{\rho }}_{A_1^n X_1^n D_1^n B_1^n E {\bar{D}}_1^n | \Omega } \Vert \mathrm {id}_{A_1^n X_1^n D_1^n} \otimes \sigma _{B_1^n E {\bar{D}}_1^n}) \ . \end{aligned}$$

Let furthermore \({\mathcal {S}}= {\mathcal {S}}_{D {\bar{D}}}\) be the TPCP map that applies a random (according to the Haar measure) unitary to D and its conjugate to \({\bar{D}}\) (in such a way that the maximally entangled state on \(D {\bar{D}}\) used to define \(\tau (x)\) is preserved). It is then easy to see that the map \({\mathcal {S}}^{\otimes n}\) applied to the n pairs \(D_i {\bar{D}}_i\) leaves \({\bar{\rho }}_{|\Omega }\) invariant. Hence, by the data processing inequality

$$\begin{aligned} H^\uparrow _{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }}&\leqslant - D_{\alpha }({\mathcal {S}}^{\otimes n}({\bar{\rho }}_{A_1^n X_1^n D_1^n B_1^n E {\bar{D}}_1^n | \Omega }) \Vert {\mathcal {S}}^{\otimes n}(\mathrm {id}_{A_1^n X_1^n D_1^n} \otimes \sigma _{B_1^n E {\bar{D}}_1^n})) \\&= - D_{\alpha }({\bar{\rho }}_{A_1^n X_1^n D_1^n B_1^n E {\bar{D}}_1^n | \Omega } \Vert \mathrm {id}_{A_1^n X_1^n D_1^n} \otimes {\bar{\sigma }}_{B_1^n E {\bar{D}}_1^n}) \ , \end{aligned}$$

where \({\bar{\sigma }}_{B_1^n E {\bar{D}}_1^n} = \sigma _{B_1^n E} \otimes {\bar{\rho }}_{{\bar{D}}_1^n}\). Lemma 3.1 then implies that

$$\begin{aligned} H^\uparrow _{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} \leqslant H^\uparrow _{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} + H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } \end{aligned}$$
(44)

where \(\nu \) is a state defined by

$$\begin{aligned} \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}&= \frac{\left( {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}} {\bar{\sigma }}^{-\alpha '}_{B_1^n E {\bar{D}}_{1}^n} {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}}\right) ^{\alpha }}{\mathrm {tr}\left( {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}} {\bar{\sigma }}^{-\alpha '}_{B_1^n E {\bar{D}}_{1}^n} {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}}\right) ^{\alpha }} \quad \text {and} \\ \nu _{A_1^n X_1^n B_1^n E D_1^n {\bar{D}}_1^n}&= \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}^{\frac{1}{2}} {\bar{\rho }}_{D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n | \Omega } \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}^{\frac{1}{2}} \ . \end{aligned}$$

We now use properties of \(\rho _{|\Omega }\) and \({\bar{\sigma }}\) to simplify the expression of \(\nu \). Observing that

$$\begin{aligned} {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_1^n | \Omega } = \rho _{A_1^n X_1^n B_1^n E | \Omega } \otimes {\bar{\rho }}_{{\bar{D}}_1^n} \ , \end{aligned}$$
(45)

we can write

In addition, as \({\bar{\rho }}_{|\Omega }\) is of the form

we have

where \(\rho _{A_1^n B_1^n E, x_1^n}^{0}\) is the projector onto the support of \(\rho _{A_1^n B_1^n E, x_1^n}\). Hence,

(46)

Getting back to the inequality (44), we have \(H^\uparrow _{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H^\uparrow _{\alpha }(A_1^n | B_1^n E )_{{\bar{\rho }}_{|\Omega }}\) using Eq. (45) to drop \({\bar{D}}_1^n\) and Lemma B.7 to drop \(X_1^n\). Moreover, using (46), we have that \(H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } = H_{\alpha }(D_1^n | X_1^n {\bar{D}}_1^n)_{\nu }\). Finally, we get

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} \leqslant H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }} + H_{\alpha }(D_1^n | {\bar{D}}_1^n X_1^n)_{\nu } \ . \end{aligned}$$
(47)

It is a direct consequence of the definition of \(\tau (x)\) that

$$\begin{aligned} H_{\alpha }(D_1^n | {\bar{D}}_1^n)_{\tau (x_1^n)}&= n {\bar{g}} - \sum _{i=1}^n f(\delta _{x_i}) \\&= n {\bar{g}} - n \sum _{x \in {\mathcal {X}}} \mathsf {freq}(x_1^n)(x) f(\delta _x)\\&= n {\bar{g}} - n f\left( \sum _{x \in {\mathcal {X}}} \mathsf {freq}(x_1^n)(x) \delta _x\right) = n {\bar{g}} - n f(\mathsf {freq}(x_1^n)) \ , \end{aligned}$$

where we have used that f is an affine function. Using Lemma B.3 and (46) we can bound the second term on the right hand side of (47) by

$$\begin{aligned} H_{\alpha }(D_1^n | {\bar{D}}_1^n X_1^n)_{\nu }&\leqslant \max _{x_1^n \in \Omega } H_{\alpha }(D_1^n | {\bar{D}}_1^n)_{\tau ({x_1^n})}\\&\leqslant \max _{x_1^n: \, f(\mathsf {freq}(x_1^n)) \geqslant h} n {\bar{g}} - n f(\mathsf {freq}(x_1^n)) \leqslant n {\bar{g}} - n h \ . \end{aligned}$$
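The affine-function identity used above, \(\sum _{i=1}^n f(\delta _{x_i}) = n f(\mathsf {freq}(x_1^n))\), can be checked numerically. The sketch below uses a hypothetical affine function f on distributions over a three-letter alphabet; the coefficients `c` and offset `b` are illustrative, not from the text.

```python
from collections import Counter

# Hypothetical affine function f(q) = sum_x c[x]*q(x) + b on probability
# distributions over the alphabet {0, 1, 2}; delta(x) is the point mass on x.
c = {0: 0.3, 1: -0.7, 2: 1.1}
b = 0.25

def f(q):
    return sum(c[x] * q.get(x, 0.0) for x in c) + b

def delta(x):
    return {x: 1.0}

def freq(xs):
    n = len(xs)
    return {x: k / n for x, k in Counter(xs).items()}

xs = [0, 1, 1, 2, 0, 2, 2, 1, 0, 2]
n = len(xs)

lhs = sum(f(delta(x)) for x in xs)   # sum_i f(delta_{x_i})
rhs = n * f(freq(xs))                # n * f(freq(x_1^n)), using that f is affine
assert abs(lhs - rhs) < 1e-9
```

The two sides agree exactly because an affine f commutes with convex combinations, which is the only property the derivation uses.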

Inserting this in (47) gives

$$\begin{aligned} H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }} \geqslant H^{\uparrow }_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ . \end{aligned}$$

This concludes the proof of inequality (41). For the proof of inequality (42), we can follow similar steps.

To prove inequality (42), the first step is to show that, as \(X_1^n\) is a deterministic function of \(A_1^n B_1^n\), we have

$$\begin{aligned} H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }}&= H_{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} \ . \end{aligned}$$
(48)

In order to do that, observe that for any \(x_1^n \in {\mathcal {X}}^n\), we have

$$\begin{aligned} {\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n} = \rho _{A_1^n B_1^n E, x_1^n} \otimes \tau (x_1^n)_{D_1^n {\bar{D}}_1^n} \ , \end{aligned}$$

where we introduced the notation \(\tau (x_1^n)_{D_1^n {\bar{D}}_1^n} = \tau (x_1)_{D_1 {\bar{D}}_1} \otimes \cdots \otimes \tau (x_n)_{D_n {\bar{D}}_n}\). This implies that for any \(x_1^n\), we have

$$\begin{aligned} {\bar{\rho }}_{X_1^n A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n} = ({\mathcal {T}}_n \circ \dots \circ {\mathcal {T}}_1)({\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n, x_1^n}) \ . \end{aligned}$$

By taking the sum over \(x_1^n \in \Omega \) and then normalising by \(\rho [\Omega ]\), we get

$$\begin{aligned} {\bar{\rho }}_{X_1^n A_1^n B_1^n E D_1^n {\bar{D}}_1^n | \Omega } = ({\mathcal {T}}_n \circ \dots \circ {\mathcal {T}}_1)({\bar{\rho }}_{A_1^n B_1^n E D_1^n {\bar{D}}_1^n | \Omega }) \ . \end{aligned}$$

Thus, we can apply Lemma B.7 and prove the equality (48). Theorem 3.2 then implies that

$$\begin{aligned} H_{\alpha }(A_1^n X_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H_{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} + H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } \end{aligned}$$
(49)

where \(\nu \) is a state defined by

$$\begin{aligned} \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}&= \frac{\left( {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^n E {\bar{D}}_{1}^n} {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}}\right) ^{\alpha }}{\mathrm {tr}\left( {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^n E {\bar{D}}_{1}^n} {\bar{\rho }}_{A_1^n X_1^n B_1^n E {\bar{D}}_{1}^n|\Omega }^{\frac{1}{2}}\right) ^{\alpha }} \quad \text {and} \\ \nu _{A_1^n X_1^n B_1^n E D_1^n {\bar{D}}_1^n}&= \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}^{\frac{1}{2}} {\bar{\rho }}_{D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n | \Omega } \nu _{A_1^n X_1^n B_1^n E {\bar{D}}_1^n}^{\frac{1}{2}} \ . \end{aligned}$$

We now use properties of \(\rho _{|\Omega }\) to simplify the expression of \(\nu \). Observing that

(50)

we can write

In addition, as \({\bar{\rho }}_{|\Omega }\) is of the form

we have

where \(\rho _{A_1^n B_1^n E, x_1^n}^{0}\) is the projector onto the support of \(\rho _{A_1^n B_1^n E, x_1^n}\). Hence,

(51)

Getting back to the inequality (49), we have \(H_{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H_{\alpha }(A_1^n | B_1^n E )_{{\bar{\rho }}_{|\Omega }}\) using Eq. (50) to drop \({\bar{D}}_1^n\) and Lemma B.7 to drop \(X_1^n\). Moreover, using (51), we have that \(H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } = H_{\alpha }(D_1^n | X_1^n {\bar{D}}_1^n)_{\nu }\). Finally, we get

$$\begin{aligned} H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }} + H_{\alpha }(D_1^n | {\bar{D}}_1^n X_1^n)_{\nu } \ . \end{aligned}$$
(52)

It is a direct consequence of the definition of \(\tau (x)\) that

$$\begin{aligned} H_{\alpha }(D_1^n | {\bar{D}}_1^n)_{\tau (x_1^n)}&= n {\bar{g}} - \sum _{i=1}^n f(\delta _{x_i}) \\&= n {\bar{g}} - n \sum _{x \in {\mathcal {X}}} \mathsf {freq}(x_1^n)(x) f(\delta _x) = n {\bar{g}} - n f\left( \sum _{x \in {\mathcal {X}}} \mathsf {freq}(x_1^n)(x) \delta _x\right) \\&= n {\bar{g}} - n f(\mathsf {freq}(x_1^n)) \ , \end{aligned}$$

where we have used that f is an affine function. Using Lemma B.3 and (51) we can bound the second term on the right hand side of (52) by

$$\begin{aligned} H_{\alpha }(D_1^n | {\bar{D}}_1^n X_1^n)_{\nu }&\geqslant \min _{x_1^n \in \Omega } H_{\alpha }(D_1^n | {\bar{D}}_1^n)_{\tau ({x_1^n})}\\&\geqslant \min _{x_1^n: \, f(\mathsf {freq}(x_1^n)) \leqslant h} n {\bar{g}} - n f(\mathsf {freq}(x_1^n)) \geqslant n {\bar{g}} - n h \ . \end{aligned}$$

Inserting this in (52) and replacing \(\alpha \) with \(\frac{1}{\alpha }\) gives

$$\begin{aligned} H_{\frac{1}{\alpha }}(A_1^n | B_1^n E)_{\rho _{|\Omega }} \leqslant H_{\frac{1}{\alpha }}(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} - n {\bar{g}} + n h \ . \end{aligned}$$

This concludes the proof of inequality (42). \(\quad \square \)

Finally, we prove Theorem 4.4 using Proposition 4.5.

Proof of Theorem 4.4

The first step is to use Lemma B.10 to lower-bound the smooth min-entropy by a Rényi entropy:

$$\begin{aligned} H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }} \geqslant H^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }} - \frac{g(\varepsilon )}{\alpha -1} \ . \end{aligned}$$
(53)

Then Proposition 4.5 yields

$$\begin{aligned} H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&> n h - n \frac{(\alpha -1)}{4} V^2 - \frac{1}{\alpha '} \log \frac{1}{\rho [\Omega ]} - \frac{g(\varepsilon )}{\alpha -1}\\&> n h - n \frac{(\alpha -1)}{4} V^2 - \frac{1}{\alpha '} \log \frac{1}{\rho [\Omega ]} - \frac{\log (2/\varepsilon ^2)}{\alpha -1} \\&\geqslant n h - n \frac{(\alpha -1)}{4} V^2 - \frac{1}{(\alpha -1)} \log \frac{2}{\rho [\Omega ]^2 \varepsilon ^2} \ , \end{aligned}$$

where we have used the fact that we are constrained to choose \(\alpha \leqslant 1 + \frac{2}{V} \leqslant 2\) in the last inequality. We now choose

$$\begin{aligned} \alpha := 1 + \frac{2 \sqrt{\log \frac{2}{\rho [\Omega ]^2 \varepsilon ^2} }}{\sqrt{n} V} \end{aligned}$$
(54)

and note that, as long as

$$\begin{aligned} n > \log \frac{2}{\rho [\Omega ]^2 \varepsilon ^2} \ , \end{aligned}$$
(55)

the value \(\alpha \) is strictly smaller than \(1 + \frac{2}{V}\) and therefore within the required bounds. Note also that if (55) does not hold then the term \(c \sqrt{n}\) in the claim (29) is at least \(n V \geqslant 2 n \log (1+ 2 d_A) \geqslant 2 n \log d_A\), whereas the min-entropy is always at least \(- n \log d_A\) and \(n f_{\min }(q)\) is at most \(n \log d_A\), which means that the claim is trivial. Finally, inserting (54) into the above yields

$$\begin{aligned} H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }} > n h - \sqrt{n} V \sqrt{\log \frac{2}{\rho [\Omega ]^2 \varepsilon ^2}} \ , \end{aligned}$$

as advertised. Once again, the max-entropy statement (30) holds by switching the direction of the inequalities, flipping the appropriate signs, and replacing every occurrence of \(H^{\uparrow }_{\alpha }\) by \(H_{\frac{1}{\alpha }}\). \(\quad \square \)
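As a sanity check on the choice (54), the following sketch (with illustrative values for n, V, \(\varepsilon \) and \(\rho [\Omega ]\); none are from the text) verifies that this \(\alpha \) balances the two penalty terms \(n \frac{(\alpha -1)}{4} V^2\) and \(\frac{1}{\alpha -1}\log \frac{2}{\rho [\Omega ]^2\varepsilon ^2}\), so that the total penalty equals \(\sqrt{n}\, V \sqrt{\log \frac{2}{\rho [\Omega ]^2\varepsilon ^2}}\), and that condition (55) is exactly what keeps \(\alpha \) below \(1 + \frac{2}{V}\).

```python
import math

# Illustrative values (not from the text); L = log_2(2 / (p_Omega^2 * eps^2)).
n, V, eps, p_omega = 10_000, 2.5, 0.01, 0.1
L = math.log2(2 / (p_omega ** 2 * eps ** 2))

alpha_minus_1 = 2 * math.sqrt(L) / (math.sqrt(n) * V)   # choice (54)

term1 = n * alpha_minus_1 * V ** 2 / 4                  # n (alpha-1) V^2 / 4
term2 = L / alpha_minus_1                               # L / (alpha-1)
assert abs(term1 - term2) < 1e-9                        # the two penalties balance
assert abs(term1 + term2 - math.sqrt(n) * V * math.sqrt(L)) < 1e-9

# Condition (55), n > L, is exactly what makes alpha < 1 + 2/V.
assert n > L and alpha_minus_1 < 2 / V
```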

It might seem restrictive to assume that the tradeoff function is affine. We next show that we may take a general convex function provided the event \(\Omega \) can be described as follows: \(x^n \in \Omega \) if and only if \(\mathsf {freq}(x^n) \in {\hat{\Omega }}\) where \({\hat{\Omega }}\) is a convex subset of \({\mathbb {P}}\).

Corollary 4.7

Let \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}, \varepsilon \in (0,1)\), let \({\hat{\Omega }}\) be a convex set \({\hat{\Omega }} \subseteq {\mathbb {P}}\) and define the corresponding event \(\Omega \subseteq {\mathcal {X}}^n\) by \(x_1^n \in \Omega \Leftrightarrow \mathsf {freq}(x_1^n) \in {\hat{\Omega }}\). Then, if f is a differentiable and convex min-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) satisfying \(f(q) \geqslant h\) for all \(q \in {\hat{\Omega }}\), we have

$$\begin{aligned} H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&> n h - c \sqrt{n} \end{aligned}$$
(56)

where \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [\Omega ])}\). Similarly, if f is a differentiable and concave max-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) satisfying \(f(q) \leqslant h\) for all \(q \in {\hat{\Omega }}\), we have

$$\begin{aligned} H_{\max }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }}&< n h + c \sqrt{n} \ . \end{aligned}$$
(57)

Proof

Let us denote by \(\mathrm {cl}({\hat{\Omega }})\) the closure of the set \({\hat{\Omega }}\). Since f is continuous on the compact set \(\mathrm {cl}({\hat{\Omega }})\) (it is even assumed to be differentiable on all of \({\mathbb {P}}\)), we have \(\min _{q \in \mathrm {cl}({\hat{\Omega }})} f(q) = f(q_0)\) for some \(q_0 \in \mathrm {cl}({\hat{\Omega }})\). By continuity of f and by definition of h, we have \(f(q_0) \geqslant h\). Now consider the affine function \(g(q) = (\nabla f)_{q_0} \cdot (q - q_0) + f(q_0)\). By convexity of f, we have \(g(q) \leqslant f(q)\) for all \(q \in {\mathbb {P}}\), so g is a min-tradeoff function. In addition, as \(\mathrm {cl}({\hat{\Omega }})\) is convex, the first-order optimality conditions give \((\nabla f)_{q_0} \cdot (q - q_0) \geqslant 0\) for all \(q \in \mathrm {cl}({\hat{\Omega }})\). As a result, for all \(q \in \mathrm {cl}({\hat{\Omega }})\), we have \(g(q) \geqslant f(q_0) \geqslant h\). This implies that if \(x_1^n \in \Omega \), then \(g(\mathsf {freq}(x_1^n)) \geqslant h\). We can then apply Theorem 4.4 with the affine tradeoff function g and obtain the desired result, as \(\Vert \nabla g \Vert _{\infty } \leqslant \Vert \nabla f \Vert _{\infty }\).

The proof for \(H^{\varepsilon }_{\max }\) is analogous. \(\quad \square \)
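The tangent construction in the proof above can be illustrated numerically. The sketch below uses a hypothetical convex function on binary distributions \(q = (1-p, p)\), namely \(f(p) = 1 - H_{\mathrm {Sh}}(p)\), and a hypothetical constraint set \({\hat{\Omega }} = \{p : 0.2 \leqslant p \leqslant 0.3\}\) (both choices are ours, for illustration only). It checks that the tangent g at the minimiser \(q_0\) lies below f everywhere, hence is itself an affine min-tradeoff candidate, and stays above \(h = f(q_0)\) on \({\hat{\Omega }}\).

```python
import math

# Hypothetical convex function on binary distributions q = (1-p, p):
# f(p) = 1 - H_Sh(p), convex in p and differentiable on (0, 1).
def f(p):
    def xlg(x):
        return x * math.log2(x) if x > 0 else 0.0
    return 1 + xlg(p) + xlg(1 - p)

def df(p):
    return math.log2(p / (1 - p))   # derivative of f

# Hypothetical convex constraint set Omega_hat = {p : 0.2 <= p <= 0.3}.
# f decreases on (0, 1/2], so its minimum over Omega_hat is at p0 = 0.3.
p0 = 0.3
h = f(p0)

def g(p):   # tangent of f at p0: an affine minorant
    return df(p0) * (p - p0) + h

ps = [i / 100 for i in range(1, 100)]
assert all(g(p) <= f(p) + 1e-12 for p in ps)                   # g <= f on all of P
assert all(g(p) >= h - 1e-12 for p in ps if 0.2 <= p <= 0.3)   # g >= h on Omega_hat
```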

One natural choice for the event \(\Omega \) is that the empirical distribution \(\mathsf {freq}(X_1^n)\) takes a particular value q. This yields the following special case of Corollary 4.7.

Corollary 4.8

Let \({\mathcal {M}}_1,\dots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold. Then, for any differentiable and convex min-tradeoff function f for \({\mathcal {M}}_1, \ldots , {\mathcal {M}}_n\) and for any \(q \in {\mathbb {P}}\), we have

$$\begin{aligned} H_{\min }^\varepsilon (A_1^n | B_1^n E)_{\rho _{|q}} > n f(q) - c \sqrt{n} \end{aligned}$$

where \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f(q) \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [q])}\), where \(\rho _{|q}\) denotes the state \(\rho \) conditioned on the event that \(\mathsf {freq}(X_1^n) = q\), and \(\rho [q]\) the probability of this event.

Note that an analogous statement holds of course for the max-entropy, replacing f by a concave max-tradeoff function and changing the inequality accordingly.

The following corollary specialises the above to the formulation (3), in which no statistical test is being done, i.e. the \(X_i\) systems are trivial. We provide the statement for the case of the lower boundary.

Corollary 4.9

Let \({\mathcal {M}}_1,\dots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n E}\) be such that (26) and the Markov conditions (27) hold. Then

$$\begin{aligned} H_{\min }^\varepsilon (A_1^n | B_1^n E)_{\rho } > \sum _ {i=1}^n \inf _{\omega _{R_{i-1} R}} H(A_i | B_i R)_{({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R})} - c \sqrt{n} \end{aligned}$$

where \(c = 3 \log (1+2 d_A) \sqrt{1- 2 \log \varepsilon }\).

Proof

Note that the quantity \(H_{\min }^\varepsilon (A_1^n | B_1^n E)_{\rho }\) only depends on the marginal of the state \(\rho \) on \(A_1^n B_1^n E\). Thus, we can modify the maps \({\mathcal {M}}_i\) in any way that does not affect the reduced state \(\rho _{A_1^n B_1^n E}\) before applying Corollary 4.8. In particular, we change \({\mathcal {M}}_i\) so that the original value of \(X_i\) is disregarded and replaced with the constant value \(X_i = i\). The values \(X_1, \ldots , X_n\) can then be regarded as random variables with alphabet \({\mathcal {X}}= \{1, \ldots , n\}\). We define the real function f on \({\mathbb {P}}\) as

$$\begin{aligned} f(q) = \sum _{i=1}^n q(i) \inf _{\omega _{R_{i-1} R}} H(A_i | B_i R)_{({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R})} \ . \end{aligned}$$

Note that for any \(i \in \{1,\dots , n\}\) and any \(q \in {\mathbb {P}}\), we have either \(q(i) \ne 1\) in which case \(\Sigma _i(q) = \emptyset \) (we use the notation in (28)) and the min-tradeoff condition is trivial or \(q(i) = 1\), in which case \(\Sigma _i(q) = \{({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R}) : \omega _{R_{i-1} R} \in \mathrm {D}(R_{i-1} \otimes R) \}\). Thus for any \(q \in {\mathbb {P}}\),

$$\begin{aligned} f(q) \le \inf _{\omega _{R_{i-1} R}} H(A_i | B_i R)_{({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R})} \ . \end{aligned}$$

As a result, f is a min-tradeoff function for all \({\mathcal {M}}_i\) for \(i \in \{1, \dots , n\}\). We now fix \(q \in {\mathbb {P}}\) such that \(q(1) = \cdots = q(n) = \frac{1}{n}\), in which case the event \(\mathsf {freq}(X_1^n) = q\) occurs with certainty. Because

$$\begin{aligned} \Vert \nabla f(q) \Vert _\infty \leqslant \log d_A \end{aligned}$$

holds, which implies that \(\left\lceil \Vert \nabla f(q) \Vert _\infty \right\rceil \leqslant \log (1+2d_A)\), the claim follows immediately from Corollary 4.8. \(\quad \square \)

As indicated in the introduction, in the special case where the individual pairs \((A_i, B_i)\) are independent and identically distributed (IID), the entropy accumulation theorem corresponds to the Quantum Asymptotic Equipartition Property [54]. We can therefore formulate the latter as a corollary of Theorem 4.4.Footnote 14

Corollary 4.10

For any bipartite state \(\nu _{A B}\), any \(n \in {\mathbb {N}}\), and any \(\varepsilon \in (0,1)\),

$$\begin{aligned} \frac{1}{n} H_{\min }^{\varepsilon }(A_1^n | B_1^n)_{\nu ^{\otimes n}} > H(A|B)_{\nu } - 2 \sqrt{\frac{1-2 \log \varepsilon }{n}} \log (1+2 d_A) \ . \end{aligned}$$

Proof

Let, for any \(i=1, \ldots , n\), \({\mathcal {M}}_i\) be the TPCP map from R to XABR which sets AB to state \(\nu _{A B}\) and where X and R are trivial (one-dimensional) systems. The concatenation of these maps thus generates the state \(\rho _{A_1^n B_1^n} = \nu _{A B}^{\otimes n}\). The claim is then obtained from Theorem 4.4 with the trade-off function f being a constant equal to \(h = H(A|B)_{\nu }\) and with \(\Omega \) as the certain event. \(\quad \square \)
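For intuition, the bound of Corollary 4.10 can be evaluated on a simple hypothetical classical state: B a uniform bit and A equal to B except with flip probability e, so that \(H(A|B)_\nu = H_{\mathrm {Sh}}(e)\). The sketch below (with illustrative values of e and \(\varepsilon \), and all logarithms to base 2) confirms that the correction term vanishes as n grows, so the rate approaches the von Neumann entropy.

```python
import math

def h_sh(p):   # binary Shannon entropy (base-2 logarithms)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0

# Hypothetical classical state: B uniform, A = B flipped with probability e,
# so H(A|B)_nu = H_Sh(e); A is a single qubit register, d_A = 2.
e, d_A, eps = 0.05, 2, 1e-6
h_ab = h_sh(e)

def lower_bound(n):   # right-hand side of Corollary 4.10
    return h_ab - 2 * math.sqrt((1 - 2 * math.log2(eps)) / n) * math.log2(1 + 2 * d_A)

# The correction term decays like 1/sqrt(n), so the bound approaches H(A|B).
assert lower_bound(10 ** 4) < lower_bound(10 ** 6) < h_ab
assert h_ab - lower_bound(10 ** 8) < 0.01
```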

5 Applications

Entropy is a rather general notion and, accordingly, entropy accumulation has applications in various areas of physics, information theory, and computer science. An example from physics is the phenomenon of thermalisation. It is known that a system can only thermalise if its smooth min-entropy is sufficiently large [18]. To illustrate how Theorem 4.4 could give an estimate of this entropy, consider a system of interest (e.g., a cup of coffee) which is in contact with a large environment (the air around it). Suppose that, for an appropriately chosen discretisation of the evolution, the system interacts at each time step with a different part of the environment (e.g., with different air molecules bouncing off the coffee cup).Footnote 15 Theorem 4.4 then provides a bound on the total entropy that is transferred to the environment in terms of the von Neumann entropy transferred in each time step. Because the joint time evolution of system and environment is unitary, this entropy flow to the environment could be expressed in terms of the entropy change of the system itself. The argument would therefore prove that the total entropy acquired by the system over many time steps is bounded by the sum of the entropies produced in each individual time step.

Another area where the notion of entropy plays a crucial role is quantum cryptography. Many proofs of security of cryptographic protocols involve lower-bounding the uncertainty that a dishonest adversary has about some system of interest. The state-of-the-art is to derive such bounds using a combination of de Finetti-type theorems as well as the Quantum Asymptotic Equipartition Property [4, 15, 41, 42]. However, the use of de Finetti theorems comes with various disadvantages. Firstly, they are only applicable under certain assumptions on the symmetry of the protocols. Secondly, they introduce additional error terms that can be large in the practically relevant finite-size regime [47]. Finally, it is not known how to apply de Finetti theorems in a device-independent scenario (see [21] for an overview and references on device-independent cryptography). These problems can all be circumvented by the use of entropy accumulation, as demonstrated in [5] for the case of device-independent quantum key distribution and randomness expansion. The resulting security statements are valid against general attacks and essentially optimal in the finite-size regime.

In the remainder of this section, we illustrate the use of entropy accumulation with two concrete examples. The first is a security proof for a basic quantum key distribution protocol. The second is a novel derivation of an upper bound on the fidelity of fully quantum random access codes.

5.1 Sample application: security of quantum key distribution

A Quantum Key Distribution (QKD) protocol enables two parties, Alice and Bob, to establish a common secret key, i.e., a string of random bits unknown to a potential eavesdropper, Eve. The setting is such that Alice and Bob can communicate over a quantum channel, which may however be fully controlled by Eve. In addition, Alice and Bob have a classical communication link which is assumed to be authenticated, i.e., Eve may read but cannot alter the classical messages exchanged between Alice and Bob. The protocol is said to be secure against general attacks if any attack by Eve is either detected (in which case the protocol aborts) or does not compromise the secrecy of the final key. Here, we will show that our main theorem can be directly applied to show security against general attacks for a fairly standard QKD protocol. As a bonus, our proof still holds even if we do not make any assumptions about Bob’s measurement device: the POVM applied by Bob at every step of the protocol can be arbitrary, and may vary from one step to the next (thereby achieving one-sided measurement device independence as in [58], but without the restriction to memoryless devices; see also [56]). In fact, as shown in [5], the entropy accumulation theorem can be used to prove the security of fully device-independent quantum key distribution.

For concreteness, we consider here a variant of the E91 QKD protocol [22] (and note that any security proof for this protocol also implies security of the BB84 protocol [10, 11]). The protocol consists of a sequence of instructions for Alice and Bob, as described in the box below. These depend on certain parameters, including the number, n, of qubits that need to be transmitted over the quantum channel, the maximum tolerated noise level, e, of this channel, as well as the key rate, r, which is defined as the number of final key bits divided by n. In the first protocol step, Alice and Bob need to measure their qubits at random in one of two mutually unbiased bases, which we term the computational and the diagonal basis. These are chosen with probability \(1-\mu \) and \(\mu \), respectively, for some \(\mu \in (0,1)\). The protocol also invokes an error correction scheme termed \(\mathrm {EC}\), which allows Bob to infer the measurement outcomes obtained by Alice for the set of indices S where the basis choices of Alice and Bob were the same. Note that if the protocol was implemented without any noise, then Bob’s outcomes would match exactly with Alice’s outcomes on the indices S and no error correction would be required. However, in the presence of noise, such an error correction step is needed. For this, Alice needs to send classical error correcting information to Bob, whose maximum relative length is characterised by another parameter, \(\vartheta _{\mathrm {EC}}\). We assume that \(\mathrm {EC}\) is reliable. This means that, except with negligible probability, Bob either obtains a correct copy of Alice’s string or he is notified that the string cannot be inferred.Footnote 16

(Protocol box: instructions of the E91 QKD protocol; figure not reproduced.)

The security of QKD against general attacks has been established in a sequence of works [13, 32, 33, 41, 49]. Specifically, for the E91 protocol, the following result has been shown.

Theorem 5.1

The E91 protocol is secure for any choice of protocol parameters satisfyingFootnote 17

$$\begin{aligned} r < 1 - H_{\mathrm {Sh}}(e) - \vartheta _{\mathrm {EC}} - 2 \mu \ , \end{aligned}$$
(58)

provided that n is sufficiently large.

Note that, because \(\mu > 0\) can be chosen arbitrarily small, the theorem implies that the E91 protocol can generate secret keys at an asymptotic rate of \(1-H_{\mathrm {Sh}}(e) - \vartheta _{\mathrm {EC}}\). We now show how this result can be obtained using the notion of entropy accumulation.
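To make the rate condition (58) concrete, the following sketch evaluates it for illustrative parameter values; in particular, the error-correction overhead \(\vartheta _{\mathrm {EC}}\) is hypothetically set 10% above the Shannon limit \(H_{\mathrm {Sh}}(e)\). None of these numbers come from the text.

```python
import math

def h_sh(p):   # binary Shannon entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0

# Illustrative parameters: 2% channel noise, error-correction overhead
# hypothetically 10% above the Shannon limit H_Sh(e), basis bias mu = 1%.
e = 0.02
theta_ec = 1.1 * h_sh(e)
mu = 0.01

max_rate = 1 - h_sh(e) - theta_ec - 2 * mu   # right-hand side of (58)
assert 0 < max_rate < 1                      # a positive key rate is achievable
```

At these values the bound comes out around 0.68 key bits per transmitted qubit, consistent with the asymptotic rate \(1 - H_{\mathrm {Sh}}(e) - \vartheta _{\mathrm {EC}}\) as \(\mu \rightarrow 0\).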

Proof

According to a standard result on two-universal hashing (see, for instance, Corollary 5.6.1 of [41]), the key \(F(A_S)\) computed in the privacy amplification step is secret to an adversary holding information \(E'\) if the smooth min-entropy of \(A_S\) conditioned on \(E'\) is sufficiently larger than the output size of the hash function F. Since, in our case, this size is \(\lfloor r n \rfloor \), the condition reads

$$\begin{aligned} n r \leqslant H_{\min }^{\varepsilon }(A_S | E')_{\rho _{|\Omega }} - O(1) \ , \end{aligned}$$
(59)

where the entropy is evaluated for the joint state \(\rho _{|\Omega }\) of \(A_S\) and \(E'\) conditioned on the event \(\Omega \) that the protocol is not aborted and that Bob’s guess \({\hat{A}}_S\) of \(A_S\) is correct. The smoothing parameter \(\varepsilon \in (0,1)\) specifies the desired level of secrecy,Footnote 18 and we assume here that it is constant (independent of n). Because conditioning the smooth min-entropy of a classical variable on an additional bit cannot decrease its value by more than 1 (see, e.g., Proposition 5.10 of [51]), we may bound the smooth min-entropy in (59) by

$$\begin{aligned} H_{\min }^\varepsilon (A_S | E')_{\rho _{|\Omega }} \geqslant H_{\min }^\varepsilon (A_S | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} - |S| \vartheta _{\mathrm {EC}} \geqslant H_{\min }^\varepsilon (A_S | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} - n \vartheta _{\mathrm {EC}} \ , \end{aligned}$$
(60)

where E denotes all information held by Eve after the distribution step, and where \(|S| \vartheta _{\mathrm {EC}}\) is the maximum number of bits exchanged for error correction. Note that we also included the basis information \(B_1^n\) and \({\bar{B}}_1^n\) in the conditioning part because Eve may obtain this information during the sifting and information reconciliation step. We are thus left with the task of lower-bounding \(H_{\min }^\varepsilon (A_S|B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }}\), which is usually the central part of any security proof. Since it is also the part where entropy accumulation is used, we formulate it separately as Claim 5.2 below. Inserting this claim into (60), we conclude that the secrecy condition (59) is fulfilled whenever

$$\begin{aligned} n r \leqslant n \bigl (1-H_{\mathrm {Sh}}(e) - \vartheta _{\mathrm {EC}} - 2\mu \bigr ) - o(n) \end{aligned}$$

holds. But this is clearly the case for any choice of parameters satisfying (58), provided that n is sufficiently large. \(\quad \square \)

It remains to show the separate claim, which we do using entropy accumulation.

Claim 5.2

Let \(A_1^n\), \(B_1^n\), \({\bar{B}}_1^n\), and S be the information held by Alice and Bob as defined by the protocol, let E be the information gathered by Eve during the distribution step, and let \(\Omega \) be the event that the protocol is not aborted and that Bob’s guess \({\hat{A}}_S\) of \(A_S\) is correct. Then, provided that \(\Omega \) has a non-negligible probability (i.e., it does not decrease exponentially fast in n),

$$\begin{aligned} H_{\min }^\varepsilon (A_S | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} > n \bigl (1 - 2\mu - H_{\mathrm {Sh}}(e)\bigr ) - o(n) \ . \end{aligned}$$
(61)

Proof

Let \(\rho ^0_{Q_1^n {\bar{Q}}_1^n E}\) be the joint state of Alice and Bob’s qubit pairs before measurement, together with the information E gathered by Eve during the distribution step, and let

$$\begin{aligned} \rho _{A_1^n {\bar{A}}_1^n B_1^n {\bar{B}}_1^n X_1^n E} = ({\mathcal {M}}_n \circ \cdots \circ {\mathcal {M}}_1 \otimes {\mathcal {I}}_E)(\rho ^0_{Q_1^n {\bar{Q}}_1^n E}) \ , \end{aligned}$$

where \({\mathcal {M}}_i\), for any \(i \in \{1, \ldots , n\}\), is the TPCP map from \(Q_i^n {\bar{Q}}_i^n\) to \(Q_{i+1}^n {\bar{Q}}_{i+1}^n A_i {\bar{A}}_i B_i {\bar{B}}_i X_i\) defined as follows:

  1. (i)

    \(B_i, {\bar{B}}_i\): random bits chosen independently according to the distribution \((1-\mu , \mu )\)

  2. (ii)

    \(A_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 0: &{} \text {outcome of measurement of }Q_i\text { in computational basis} \\ \text {if } B_i = {\bar{B}}_i = 1: &{} \text {outcome of measurement of } Q_i\text { in diagonal basis} \\ \text {if } B_i \ne {\bar{B}}_i: &{} \perp \end{array}\right. }\)

  3. (iii)

    \({\bar{A}}_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 1: &{} \text {outcome of measurement of }{\bar{Q}}_i\text { in diagonal basis} \\ \text {otherwise}: &{} \perp \end{array}\right. }\)

  4. (iv)

    \(X_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 1: &{} A_i \oplus {\bar{A}}_i \\ \text {otherwise}: &{} \perp \end{array}\right. }\)

  5. (v)

    \(Q_{i+1}^n\) and \({\bar{Q}}_{i+1}^n\) are left untouched.

Note that the values \(B_1^n\) and \({\bar{B}}_1^n\) correspond to the ones generated during the distribution step of the protocol. The same is true for \(A_1^n\), with the modification that \(A_i\) holds the measurement outcome only if \(B_i = {\bar{B}}_i \). That is, \(A_i \ne \perp \) if and only if \(i \in S\), where S is the set determined in the sifting step. We can therefore rewrite (61) as

$$\begin{aligned} H_{\min }^\varepsilon (A_1^n | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} > n \bigl (1 - 2\mu - H_{\mathrm {Sh}}(e)\bigr ) - o(n) \ . \end{aligned}$$
(62)

To prove this inequality, we use Theorem 4.4 with the replacements \(A_i \rightarrow A_i {\bar{A}}_i\), \(B_i \rightarrow B_i {\bar{B}}_i\), \(X_i \rightarrow X_i\), and \(R_i \rightarrow Q_{i+1}^n {\bar{Q}}_{i+1}^n\). We note that \(X_i\) is a deterministic function of the classical registers \(A_i {\bar{A}}_i\) and \(B_i {\bar{B}}_i\). To obtain the bound in (62), we need to define a min-tradeoff function. Let \(i \in \{1, \ldots , n\}\) and consider the state

$$\begin{aligned} \nu _{X_i A_i {\bar{A}}_i B_i {\bar{B}}_i R} = \mathrm {tr}_{Q_{i+1}^n {\bar{Q}}_{i+1}^n}({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{Q_{i}^n {\bar{Q}}_{i}^n R}) \ , \end{aligned}$$

where \(\omega _{Q_{i}^n {\bar{Q}}_{i}^n R}\) is an arbitrary state. Let furthermore \(\nu _{|b} = \nu _{X_i A_i {\bar{A}}_i R | b}\) be the corresponding state obtained by conditioning on the event that \(B_i = {\bar{B}}_i = b\), for \(b \in \{0, 1\}\). We may now bound the entropy of \(A_i\) using the entropic uncertainty relation proved in [12], which asserts that

$$\begin{aligned} H(A_i | R)_{\nu _{|0}} \geqslant 1 - H(A_i | {\bar{A}}_i)_{\nu _{|1}} \ . \end{aligned}$$

By the definition of \(X_i\), we also have

$$\begin{aligned} H(A_i | {\bar{A}}_i)_{\nu _{|1}} = H(X_i)_{\nu _{|1}} = H_{\mathrm {Sh}}\left( {\textstyle \frac{\nu _{X_i}(1)}{\nu _{X_i}(0) + \nu _{X_i}(1)}}\right) = H_{\mathrm {Sh}}\left( {\textstyle \frac{\nu _{X_i}(1)}{\mu ^2}}\right) \ , \end{aligned}$$

where we wrote \(\nu _{X_i}\) to denote the probability distribution on \(\{0, 1, \bot \}\) defined by the state \(\nu \), and where we have used that \(\nu _{X_i}(0) + \nu _{X_i}(1) = \mu ^2\). Furthermore, because \(A_i\) is classical, its von Neumann entropy cannot be negative, and, since conditioned on \(B_i = {\bar{B}}_i = 0\) the register \(A_i\) holds a single bit, we also have \(H(A_i | R)_{\nu _{|0}} \leqslant 1\); these two facts imply that

$$\begin{aligned} H(A_i | B_i {\bar{B}}_i R)_{\nu } \geqslant \nu _{B_i {\bar{B}}_i}(0,0) H(A_i | R)_{\nu _{|0}} = (1-\mu )^2 H(A_i | R)_{\nu _{|0}} \geqslant H(A_i | R)_{\nu _{|0}} - 2 \mu + \mu ^2 \ . \end{aligned}$$
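The last inequality above is elementary: \((1-\mu )^2 H - (H - 2\mu + \mu ^2) = (2\mu - \mu ^2)(1-H) \geqslant 0\) whenever \(0 \leqslant H \leqslant 1\). A quick numerical sweep confirms it:

```python
# Numerical sweep of the inequality (1-mu)^2 * H >= H - 2*mu + mu^2,
# valid for 0 <= H <= 1 (here H stands for H(A_i|R)_{nu|0}, at most 1 bit).
# Algebraically: (1-mu)^2*H - (H - 2*mu + mu^2) = (2*mu - mu^2)*(1 - H) >= 0.
mus = [i / 50 for i in range(51)]   # mu in [0, 1]
hs = [j / 50 for j in range(51)]    # H in [0, 1]
assert all((1 - m) ** 2 * H >= H - 2 * m + m ** 2 - 1e-12 for m in mus for H in hs)
```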

Combining this with the above, we find that

$$\begin{aligned} H(A_i {\bar{A}}_i | B_i {\bar{B}}_i R)_{\nu } \geqslant H(A_i | B_i {\bar{B}}_i R)_{\nu } \geqslant {\tilde{f}}(\nu _{X_i}) \end{aligned}$$

holds for

$$\begin{aligned} {\tilde{f}}(q) = {\left\{ \begin{array}{ll} 1 - 2 \mu + \mu ^2 - H_{\mathrm {Sh}}\left( {\textstyle \frac{q(1)}{\mu ^2}} \right) &{} \text {if }q(0) + q(1) = \mu ^2 \\ 1 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

In other words, \({\tilde{f}}\) is a min-tradeoff function for \({\mathcal {M}}_i\). Furthermore, because the binary Shannon entropy \(H_{\mathrm {Sh}}\) is concave, \({\tilde{f}}\) is convex. We may thus define a linearised min-tradeoff function f as a tangent hyperplane to \({\tilde{f}}\) at the point \(q_0\) given by \(q_0(0) = (1-e) \mu ^2\), \(q_0(1) = e \mu ^2\), and \(q_0(\bot ) = 1-\mu ^2\). Furthermore, we define

$$\begin{aligned} h = f(q_0) = {\tilde{f}}(q_0) = 1 - 2 \mu + \mu ^2 - H_{\mathrm {Sh}}(e) \ . \end{aligned}$$
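The construction of f and h can be checked numerically on the slice \(q(0) + q(1) = \mu ^2\), parametrised by \(t = q(1)\); the values of \(\mu \) and e below are illustrative (with \(e < 1/2\)). The sketch verifies that the tangent f at \(t_0 = e \mu ^2\) lies below the convex function \({\tilde{f}}\) and satisfies \(f \geqslant h\) whenever \(q(1)/\mu ^2 \leqslant e\).

```python
import math

def h_sh(p):   # binary Shannon entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0

mu, e = 0.1, 0.05    # illustrative values with e < 1/2

def f_tilde(t):      # tilde-f restricted to the slice q(0) + q(1) = mu^2
    return 1 - 2 * mu + mu ** 2 - h_sh(t / mu ** 2)

t0 = e * mu ** 2
slope = -math.log2((1 - e) / e) / mu ** 2   # d/dt of f_tilde at t0
h = f_tilde(t0)                             # = 1 - 2 mu + mu^2 - H_Sh(e)

def f(t):   # tangent of f_tilde at t0: the linearised min-tradeoff function
    return slope * (t - t0) + h

ts = [i * mu ** 2 / 1000 for i in range(1, 1000)]
assert all(f(t) <= f_tilde(t) + 1e-12 for t in ts)    # f lies below f_tilde
assert all(f(t) >= h - 1e-12 for t in ts if t <= t0)  # f >= h when q(1)/mu^2 <= e
```

The second assertion holds because the tangent has negative slope for \(e < 1/2\), so f is decreasing in t and attains h exactly at \(t_0\).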

Finally, note that the event \(\Omega \) that Bob’s guess of \(A_S\) is correct and that the protocol is not aborted implies that \(q = \mathsf {freq}(X_1^n)\) is such that \(\frac{q(1)}{\mu ^2} \leqslant e\) and, hence, \(f(\mathsf {freq}(X_1^n)) \geqslant h\). Since we assumed that \(\Omega \) has non-negligible probability, Theorem 4.4 implies that

$$\begin{aligned} H_{\min }^{\varepsilon /4}(A_1^n {\bar{A}}_1^n | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} > n h - o(n) = n \bigl (1 - 2\mu + \mu ^2 - H_{\mathrm {Sh}}(e)\bigr ) - o(n) \ . \end{aligned}$$

(Note that the Markov chain conditions are satisfied because \(B_i\) and \({\bar{B}}_i\) are chosen at random independently of any other information.) Furthermore, because \({\bar{A}}_i\) equals \(\bot \) unless \(B_i = {\bar{B}}_i = 1\), which occurs with probability \(\mu ^2\), we have

$$\begin{aligned} H_{\max }^{\frac{\varepsilon }{4}}({\bar{A}}_1^n | A_1^n B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} \leqslant H_{\max }^{\frac{\varepsilon }{4}}({\bar{A}}_1^n | B_1^n {\bar{B}}_1^n)_{\rho _{|\Omega }} \leqslant \mu ^2 n + o(n) \ . \end{aligned}$$

Combining these inequalities with the chain rule for smooth entropies (see Theorem 15 of [60]),

$$\begin{aligned} H_{\min }^\varepsilon (A_1^n | B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} \geqslant H_{\min }^{\varepsilon /4}(A_1^n {\bar{A}}_1^n| B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} - H_{\max }^{\varepsilon /4}({\bar{A}}_1^n | A_1^n B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }} - O(1) \ , \end{aligned}$$

proves (62) and, hence, Claim 5.2. \(\quad \square \)
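At leading order, the chain-rule combination above yields the net rate \((1-2\mu +\mu ^2-H_{\mathrm {Sh}}(e)) - \mu ^2 = 1-2\mu -H_{\mathrm {Sh}}(e)\). The following is a minimal numerical sketch of this arithmetic (our own function names; the \(o(n)\) and O(1) corrections are ignored):

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy H_Sh(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def net_rate(mu, e):
    """Leading-order rate of H_min^eps(A_1^n | B_1^n Bbar_1^n E):
    the accumulated min-entropy rate minus the max-entropy cost of Abar_1^n."""
    accumulated = 1 - 2 * mu + mu**2 - binary_entropy(e)  # from the min-entropy bound
    cost = mu**2                                          # from the max-entropy bound
    return accumulated - cost                             # equals 1 - 2*mu - H_Sh(e)

print(net_rate(0.1, 0.05))
```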

5.2 Sample application: fully quantum random access codes

One relatively simple application of our main result is to give upper bounds on the fidelity achieved by so-called Fully Quantum Random Access Codes (FQRAC). An FQRAC is a method for encoding m message qubits into \(n < m\) code qubits, such that any subset of k message qubits can be retrieved with high fidelity. Limits on the performance of random access codes with classical messages are rather well understood: the case \(k=1\) was studied in [1, 2, 38], and upper bounds on the success probability that decay exponentially in k were derived in [9, 20, 65]. In the fully quantum case, [20] gives similar upper bounds on the fidelity that decay exponentially in k. Here, we show that such exponential bounds for the fully quantum case can be obtained in a relatively elementary fashion via the concept of entropy accumulation. The example also highlights that entropy accumulation is already useful in its basic form (3), which does not involve the statistical information \(X_i\). Indeed, here the bound on the entropy produced at every step comes from the bound on the number of code qubits.

Definition 5.3

An \((\varepsilon ,m,n,k)\)-Fully Quantum Random Access Code (FQRAC) consists of an encoder \({\mathcal {E}}_{{M'}_1^m \rightarrow C_1^n}\) and a decoder \({\mathcal {D}}_{C_1^n S \rightarrow {\bar{M}}_S S}\), where \({M'}_1^m\) represents the m message qubits, \(C_1^n\) represents the n code qubits, S represents a classical description of a subset of \(\{1,\dots ,m\}\) of size k, and \({\bar{M}}_S\) represents the output of the decoder, corresponding to the k positions of \({M'}_1^m\) listed in S. Such a code must satisfy the following: for any state \(\rho _{R {M'}_1^m S}\) that is classical on S, we must have that

$$\begin{aligned} F\big ({\mathcal {S}}(\rho _{R {M'}_1^m S}), ({\mathcal {D}} \circ {\mathcal {E}})(\rho _{R {M'}_1^m S}) \big )^2 \geqslant 1 - \varepsilon , \end{aligned}$$

where R is a reference system of arbitrary dimension, and where \({\mathcal {S}}_{ {M'}_1^m S \rightarrow {\bar{M}}_S S}\) is a TPCP map that selects the k positions of \({M'}_1^m\) corresponding to those in S and outputs them into \({\bar{M}}_S\). Moreover, \(F(\rho ,\sigma ) := \Vert \sqrt{\rho } \sqrt{\sigma }\Vert _1\) refers to the fidelity between two states \(\rho \) and \(\sigma \).

Entropy accumulation gives the following constraint on FQRACs:

Theorem 5.4

An \((\varepsilon ,m,n,k)\)-FQRAC satisfies

$$\begin{aligned} 1-\varepsilon = f^2 < 2^{-k \left( \frac{m-n-k+1}{ 5 m} \right) ^2 + 3} \ . \end{aligned}$$
(63)

Compared to the previously derived bound (Theorem 9 of [20]), the one obtained here is tighter for small k,Footnote 19 whereas it is weaker for large k.
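To get a feel for (63), the following sketch (our own script; the parameter values are chosen arbitrarily) evaluates the right-hand side. Note that the bound is trivial, i.e., at least 1, unless \(k \left( \frac{m-n-k+1}{5m} \right) ^2 > 3\):

```python
def fqrac_bound(m, n, k):
    """Right-hand side of (63): 2**(-k*((m-n-k+1)/(5*m))**2 + 3)."""
    exponent = -k * ((m - n - k + 1) / (5 * m))**2 + 3
    return 2.0**exponent

# Retrieving k = 500 of m = 1000 message qubits from n = 10 code qubits:
print(fqrac_bound(1000, 10, 500))  # nontrivial bound on 1 - epsilon
# For k = 1 the right-hand side exceeds 1, so the bound is trivial:
print(fqrac_bound(1000, 10, 1))
```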

Proof

Since the fidelity bound must be true for any state \(\rho \), it must in particular be true for the state consisting of m maximally entangled pairs and a uniform distribution over subsets S. For every \(i \in \{1,\dots ,k\}\), define

$$\begin{aligned} {\mathcal {M}}_i: M_1^{m-i+1} \rightarrow M_1^{m-i} {\bar{J}}_i {\hat{M}}_i \end{aligned}$$

as a TPCP map that does the following:

  1. Generate an index \({\bar{J}}_i\) at random from \(\{ 1,\dots ,m - i +1 \}\).

  2. Move the contents of \(M_{{\bar{J}}_i}\) into \({\hat{M}}_i\), and set \(M_1^{m-i}\) to the contents of \(M_1^{m-i+1}\) with the \({\bar{J}}_i\)th position removed.
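The classical index bookkeeping performed by the maps \({\mathcal {M}}_i\) can be mimicked on register labels. This toy sketch (our own code) tracks only which positions move, not the quantum contents:

```python
import random

def apply_maps(m, k, seed=0):
    """Mimic M_1, ..., M_k on register labels: step i draws Jbar_i
    uniformly from the m - i + 1 remaining slots, moves that register
    into Mhat_i, and lets the remaining registers close ranks."""
    rng = random.Random(seed)
    registers = list(range(1, m + 1))  # labels of M_1, ..., M_m
    moved = []                         # contents of Mhat_1, ..., Mhat_k
    for _ in range(k):
        j = rng.randrange(len(registers))  # Jbar_i - 1 (zero-based)
        moved.append(registers.pop(j))     # move M_{Jbar_i} into Mhat_i
    return moved, registers

moved, remaining = apply_maps(m=8, k=3)
print(moved, remaining)
```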

Finally, define the state

$$\begin{aligned} \rho ^k = ({\mathcal {M}}_k \circ \cdots \circ {\mathcal {M}}_1)(\rho _{M_1^m C_1^n}) \ , \end{aligned}$$

where \(\rho _{M_1^m C_1^n} = ({\mathcal {I}}_{M_1^m} \otimes {\mathcal {E}}_{{M'}_1^m \rightarrow C_1^n})(\Phi _{M_1^m {M'}_1^m})\) is obtained by applying the encoder to one half of m maximally entangled qubit pairs \(\Phi _{M_1^m {M'}_1^m}\). The next step is to use Theorem 4.4 on the state \(\rho ^k\) with the identifications

$$\begin{aligned} A_i \rightarrow {\hat{M}}_i \qquad B_i \rightarrow {\bar{J}}_i \qquad E \rightarrow C_1^n \end{aligned}$$

and the tradeoff function f being the constant function equal to

$$\begin{aligned} \inf _{i, \nu ^i} H({\hat{M}}_i | {\hat{M}}_1^{i-1} {\bar{J}}_1^i C_1^n)_{\nu ^i} \ , \end{aligned}$$

where the infimum is taken over states \(\nu ^i\) of the form

$$\begin{aligned} \nu ^i_{{\hat{M}}_1^{i} {\bar{J}}_1^i C_1^n} = {\mathcal {M}}_i\left( \omega ^i_{{\hat{M}}_1^{i-1} {\bar{J}}_1^{i-1} M_1^{m-i+1} C_1^n} \right) \ , \end{aligned}$$

for some state \(\omega ^i\). Here we also used Remark 4.3, which asserts that the system R that is used when defining the min-tradeoff function can be chosen isomorphic to \(A_1^{i-1} B_1^{i-1} E\). Note that the Markov chain condition is immediate from the fact that \({\bar{J}}_i\) is chosen at random. As the systems \(X_i\) are trivial, we naturally take \(\Omega \) to be the certain event. We find that

$$\begin{aligned} H_{\min }^{f/2}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k}&\geqslant k \inf _{i, \nu ^i} H({\hat{M}}_i|{\hat{M}}_1^{i-1} {\bar{J}}_1^i C_1^n)_{\nu ^i} - \sqrt{4 k \log \frac{8}{f^2}} \log 5 \ . \end{aligned}$$

Furthermore, again by Remark 4.3, if part of B is classical in \(\rho \), then it remains classical in \(\nu \). As a result, we can assume in the following that \({\bar{J}}_{1}^{i-1}\) is a classical system in \(\nu ^i\).

We continue by computing the expectation over the choice of \({\bar{J}}_i\):

$$\begin{aligned} H({\hat{M}}_i|{\hat{M}}_1^{i-1} {\bar{J}}_1^i C_1^n)_{\nu ^i}&= \frac{1}{m-i+1} \sum _{j_i = 1}^{m-i+1} H(M_{j_i} | {\hat{M}}_1^{i-1} C_1^n {\bar{J}}_1^{i-1})_{\omega ^i} \end{aligned}$$
(64)
$$\begin{aligned}&\geqslant \frac{1}{m} \sum _{j_i = 1}^{m-i+1} H(M_{j_i} | M_1^{j_i-1} {\hat{M}}_1^{i-1} C_1^n {\bar{J}}_1^{i-1})_{\omega ^i} \end{aligned}$$
(65)
$$\begin{aligned}&= \frac{1}{m} H(M_{1}^{m-i+1} | {\hat{M}}_1^{i-1} C_1^n {\bar{J}}_1^{i-1})_{\omega ^i} \end{aligned}$$
(66)
$$\begin{aligned}&= \frac{1}{m} \left( H(M_{1}^{m-i+1} {\hat{M}}_1^{i-1} C_1^n | {\bar{J}}_1^{i-1})_{\omega ^i} - H({\hat{M}}_1^{i-1} C_1^n | {\bar{J}}_1^{i-1})_{\omega ^i}\right) \end{aligned}$$
(67)
$$\begin{aligned}&\geqslant \frac{-n-k+1}{m} \ , \end{aligned}$$
(68)

where the last inequality holds because \({\bar{J}}_1^{i-1}\) is classical, which implies that the first entropy in the bracket of the penultimate expression is non-negative, and because the second entropy in the bracket is upper bounded by \(n+k-1\), since \({\hat{M}}_1^{i-1} C_1^n\) comprises at most \(k-1+n\) qubits.

We now use Proposition 5.5 and Remark 5.6 of [51], which imply thatFootnote 20

$$\begin{aligned} H_{\max }^{\sqrt{1-f^2}}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k}\geqslant & {} H_{\min }^{f/2}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k} - \log \frac{1}{1-\bigl (f^2/2 + \sqrt{1-f^2} \sqrt{1-f^2/4}\bigr )^2} \\\geqslant & {} H_{\min }^{f/2}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k} - \log \frac{3}{f^3} \ , \end{aligned}$$

where the second inequality holds because the denominator in the logarithm is lower bounded by \(f^3/3\), as can be readily verified. Combining this with the above gives

$$\begin{aligned} H_{\max }^{\sqrt{1-f^2}}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k} \geqslant - k \left( \frac{n+k-1}{m} \right) - \sqrt{4 k \log \frac{8}{f^2}} \log 5 - \log \frac{3}{f^3} \ . \end{aligned}$$

Conversely, note that, by assumption, the purified distance between \(\rho ^k\) and the state consisting of k maximally entangled qubit pairs is upper bounded by \(\sqrt{1-(1-\varepsilon )} = \sqrt{1-f^2}\). Since the max-entropy of k maximally entangled qubit pairs equals \(-k\), we have

$$\begin{aligned} H_{\max }^{\sqrt{1-f^2}}({\hat{M}}_1^k | {\bar{J}}_1^k C_1^n)_{\rho ^k} \leqslant -k \ . \end{aligned}$$

We have thus derived the condition

$$\begin{aligned} \sqrt{4 k \log \frac{8}{f^2}} \log 5 \geqslant k \left( \frac{m-n-k+1}{m} \right) - \log \frac{3}{f^3} \ . \end{aligned}$$
(69)

It is easy to verify that this condition is violated whenever

$$\begin{aligned} \log \frac{8}{f^2} > k \left( \frac{m-n-k+1}{ 5 m} \right) ^2 \end{aligned}$$
(70)

is violated. In fact, if \(\log \frac{8}{f^2} \leqslant k \left( \frac{m-n-k+1}{ 5 m} \right) ^2\), then we have

$$\begin{aligned} 4k \log \frac{8}{f^2} \log ^2 5&\leqslant \frac{4 \log ^2 5}{25} k^2 \left( \frac{m-n-k+1}{ m} \right) ^2 \ , \text { and } \\ \log \frac{3}{f^3}&\leqslant \frac{3}{2} \log \frac{8}{f^2} \leqslant \frac{3}{50} k \left( \frac{m-n-k+1}{ m} \right) ^2 \leqslant \frac{3}{50} k \left( \frac{m-n-k+1}{ m} \right) \ . \end{aligned}$$

Adding the square root of the first inequality and the second one, we get that inequality (69) is violated. Thus, the condition (70) must hold, and therefore also (63). \(\quad \square \)
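The two numerical facts used in this proof, the bound \(1-\bigl (f^2/2+\sqrt{1-f^2}\sqrt{1-f^2/4}\bigr )^2 \geqslant f^3/3\) and the implication that (69) fails whenever (70) fails, can be spot-checked with a short script (our own code; logarithms base 2):

```python
import math

def lhs_69(f, k):
    """Left-hand side of (69)."""
    return math.sqrt(4 * k * math.log2(8 / f**2)) * math.log2(5)

def rhs_69(f, m, n, k):
    """Right-hand side of (69)."""
    return k * (m - n - k + 1) / m - math.log2(3 / f**3)

# Denominator inside the logarithm is at least f**3 / 3 for f in (0, 1]
for t in range(1, 101):
    f = t / 100
    denom = 1 - (f**2 / 2 + math.sqrt(1 - f**2) * math.sqrt(1 - f**2 / 4))**2
    assert denom >= f**3 / 3

# If log2(8/f**2) <= k*((m-n-k+1)/(5*m))**2, then (69) is violated
m, n, k = 1000, 10, 500
threshold = k * ((m - n - k + 1) / (5 * m))**2
f_min = math.sqrt(8 * 2.0**(-threshold))  # smallest f compatible with the premise
for t in range(100):
    f = f_min + (1 - f_min) * t / 99
    assert lhs_69(f, k) < rhs_69(f, m, n, k)
print("checks passed")
```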

6 Conclusions

Informally speaking, entropy accumulation is the claim that the operationally relevant entropy (the smooth min- or max-entropy) of a multipartite system is well approximated by the sum of the von Neumann entropies of its individual parts. This has ramifications in various areas of science, ranging from quantum cryptography to thermodynamics.

As described in Sect. 5, current cryptographic security proofs have various fundamental and practical limitations [46]. That these can be circumvented using entropy accumulation has already been demonstrated in [5] for the case of device-independent cryptography. We anticipate that the approach can be applied similarly to other cryptographic protocols. Examples include quantum key distribution protocols such as DPS and COW [27, 50], for which full security has not yet been established.Footnote 21 One may also expect to obtain significantly improved security bounds for protocols that involve high-dimensional information carriers and, in particular, continuous-variable protocols [25, 62].Footnote 22 A strengthening of current security claims may also be obtained for other cryptographic constructions, such as bit commitment and oblivious transfer protocols (see, for example, [16, 20, 29]).

Entropy accumulation can also be used in statistical mechanics, e.g., to characterise thermalisation processes. At the beginning of Sect. 5 we outlined an argument that could confirm, and make precise, the intuition that entropy production (in terms of von Neumann entropy) is relevant for thermalisation. However, to base such arguments on physically realistic assumptions, it may be necessary to generalise Theorem 4.4 to the case where the Markov conditions (27) do not hold exactly. One possibility, motivated by the main result of [23], could be to replace them by the less stringent conditions

$$\begin{aligned} H(B_i | B_1^{i-1} E) \approx H(B_i | A_1^{i-1} B_1^{i-1} E) \ . \end{aligned}$$
(71)

Another promising direction would be to apply entropy accumulation to estimate the entropy of low-energy states of many-body systems. One may expect that, under appropriate physical assumptions, these states possess a structure that permits a decomposition of the form described by Fig. 1 such that the Markov conditions required for Theorem 4.4, or at least some relaxations of them such as (71), hold. This may for example be the case for systems whose states are well approximated by matrix product states (see, e.g., [59]). We leave the investigation of such applications, as well as the development of corresponding extensions of the entropy accumulation theorem, for future work.