Abstract
We ask the question whether entropy accumulates, in the sense that the operationally relevant total uncertainty about an n-partite system \(A = (A_1, \ldots A_n)\) corresponds to the sum of the entropies of its parts \(A_i\). The Asymptotic Equipartition Property implies that this is indeed the case to first order in n—under the assumption that the parts \(A_i\) are identical and independent of each other. Here we show that entropy accumulation occurs more generally, i.e., without an independence assumption, provided one quantifies the uncertainty about the individual systems \(A_i\) by the von Neumann entropy of suitably chosen conditional states. The analysis of a large system can hence be reduced to the study of its parts. This is relevant for applications. In device-independent cryptography, for instance, the approach yields essentially optimal security bounds valid for general attacks, as shown by Arnon-Friedman et al. (SIAM J Comput 48(1):181–225, 2019).
1 Introduction
In classical information theory, the uncertainty one has about a variable A given access to side information B can be operationally quantified by the number of bits one would need to learn, in addition to B, in order to reconstruct A. While this number generally fluctuates, it is—except with probability of order \(\varepsilon > 0\)—not larger than the \(\varepsilon \)-smooth max-entropy, \(H_{\max }^\varepsilon (A|B)_\rho \), evaluated for the joint distribution \(\rho \) of A and B [45].Footnote 1 Conversely, it is in the same way not smaller than the \(\varepsilon \)-smooth min-entropy, \(H_{\min }^\varepsilon (A|B)_\rho \). This may be summarised by saying that the number of bits needed to reconstruct A from B is with probability at least \(1-O(\varepsilon )\) contained in the interval
\[ I = \bigl [ H_{\min }^{\varepsilon }(A|B)_\rho \,,\; H_{\max }^{\varepsilon }(A|B)_\rho \bigr ] \,, \qquad \mathrm {(1)} \]
whose boundaries are defined by the smooth entropies. We refer to Definition 2.2 below for a precise definition of these quantities.
This approach to quantifying uncertainty can be extended to the case where A and B are quantum systems. The conclusion remains the same: the operationally relevant uncertainty interval is I as defined by (1). The only difference is that \(\rho \) is now a density operator, which describes the joint state of A and B [41, 44, 51].
Finding the boundaries of the interval I is a central task of information theory. However, the smooth entropies of a large system A are often difficult to calculate. It is therefore rather common to introduce certain assumptions to render this task more feasible. One extremely popular approach in standard information theory is to assume that the system consists of many mutually independent and identically distributed (IID) parts. More precisely, the IID Assumption demands that the system be of the form \(A = A_1^n = A_1 \otimes \cdots \otimes A_n\), that the side information have an analogous structure \(B = B_1^n = B_1 \otimes \cdots \otimes B_n\), and that the joint state of these systems be of the form \(\rho _{A_1 B_1 \cdots A_n B_n} = \nu _{A B}^{\otimes n}\), for some density operator \(\nu _{A B}\). A fundamental result from information theory, the Asymptotic Equipartition Property (AEP) [48] (see [54] for the quantum version), then asserts that the uncertainty interval satisfies
\[ I \subseteq \bigl [ n H(A|B)_{\nu } - c_\varepsilon \sqrt{n} \,,\; n H(A|B)_{\nu } + c_\varepsilon \sqrt{n} \bigr ] \,, \qquad \mathrm {(2)} \]
where \(c_\varepsilon \) is a constant (independent of n) and where \(H(A|B)_{\nu }\) is the conditional von Neumann entropy evaluated for the state \(\nu _{A B}\). In other words, for large n, the operationally relevant total uncertainty one has about \(A_1^n\) given \(B_1^n\) is well approximated by \(n H(A|B)_{\nu } = \sum _{i} H(A_i | B_i)_{\rho }\). In this sense, the entropy of the individual systems \(A_i\) accumulates to the entropy of the total system \(A_1^n\).Footnote 2
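The mechanism behind the AEP can be made concrete with a small classical computation. The following sketch (pure Python; the bias p = 0.11 and the parameters are arbitrary toy choices, not taken from the paper) checks the typicality statement underlying Eq. (2): for n IID bits, the per-symbol surprisal \(-\frac{1}{n}\log P(x_1^n)\) concentrates around the Shannon entropy H(p).

```python
from math import comb, log2

def shannon_bit(p):
    """Binary Shannon entropy H(p) in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def typical_mass(n, p, delta):
    """Probability that an IID bit string x_1^n has per-symbol surprisal
    -log2 P(x_1^n) / n within delta of H(p).  Strings are grouped by
    their number of ones k, since P(x_1^n) depends only on k."""
    h = shannon_bit(p)
    mass = 0.0
    for k in range(n + 1):
        surprisal = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        if abs(surprisal - h) <= delta:
            mass += comb(n, k) * p**k * (1 - p)**(n - k)
    return mass

# For large n, almost all probability sits on strings whose surprisal is
# close to n * H(p) -- which is why both boundaries of the interval I
# are of the form n*H +/- O(sqrt(n)).
print(typical_mass(1000, 0.11, 0.1))
```

Shrinking delta like \(n^{-1/2}\) while keeping the captured mass constant reproduces the \(\sqrt{n}\) scaling of the second-order term.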
In this work, we generalise this statement to the case where the individual pairs \(A_i B_i\) are no longer independent of each other, i.e., where the IID assumption does not hold. Without loss of generality one may think of the pairs \(A_i B_i\) as being generated by a sequence of processes \({\mathcal {M}}_i\), as shown in Fig. 1. Each process \({\mathcal {M}}_i\) may pass information on to the next one via a “memory” register \(R_i\). The state of the “future” pairs can thus depend on the “past” ones.Footnote 3 The only assumption we make is that, given the side information \(B_1^i\) generated until step i, the systems \(A_1^i\) are independent of the next piece of side information \(B_{i+1}\). This is captured by the requirement that \(A_1^{i} \leftrightarrow B_1^{i} \leftrightarrow B_{i+1}\) forms a quantum Markov chain.Footnote 4 Entropy accumulation is then the claim that
\[ I \subseteq \Bigl [ \sum _{i=1}^{n} \inf _{\omega } H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} - c \sqrt{n} \,,\; \sum _{i=1}^{n} \sup _{\omega } H(A_i | B_i R)_{{\mathcal {M}}_i(\omega )} + c \sqrt{n} \Bigr ] \,, \qquad \mathrm {(3)} \]
where, in the ith term of each sum, the infimum or supremum ranges over joint states \(\omega _{R_{i-1} R}\) of the memory \(R_{i-1}\) and a system R isomorphic to it, and the conditional von Neumann entropy is evaluated for the state \(({{\mathcal {M}}_i \otimes {\mathcal {I}}_R})(\omega _{R_{i-1} R})\), abbreviated by \({\mathcal {M}}_i(\omega )\), which describes the output pair \(A_i B_i\) generated by \({\mathcal {M}}_i\) jointly with R.
To illustrate (3) it is useful to think of a communication scenario with two parties, Alice and Bob, who are receiving information \(A_1^n\) and \(B_1^n\), respectively. Suppose that a source with memory \(R_i\) generates this information sequentially in n steps, described by maps \({\mathcal {M}}_i\) as depicted in Fig. 1. Suppose furthermore that Bob would like to infer all n values \(A_i\) (which, for the purpose of this example, we assume to be classical). As discussed above, for this he would require N additional classical bits from Alice, where N fluctuates (up to probability \(\varepsilon \)) within an interval I with boundaries given by the entropies \(H_{\min }^{\varepsilon }(A_1^n|B_1^n)\) and \(H_{\max }^{\varepsilon }(A_1^n|B_1^n)\), which quantify Bob’s uncertainty about \(A_1^n\). While these entropies depend on the joint state \(\rho _{A_1^n B_1^n}\) of the entire information generated by the source over all n steps, they can, according to (3), be lower (or upper) bounded by a sum of terms that merely depend on the individual steps \({\mathcal {M}}_i\). Specifically, the minimum (or maximum) number N of bits that Alice needs to send to Bob so that he can infer her values \(A_i\) grows for each such value by the von Neumann entropy \(H(A_i|B_i R)\), minimised (or maximised) over all possible states the memory \(R_{i-1}\) could have been in right before the pair \(A_i B_i\) was produced, and conditioned on \(B_i\) as well as any information R about this memory.Footnote 5
Circuit diagram illustrating the decomposition of states \(\rho _{A_1^n B_1^n}\) relevant for our main theorem. One starts with a state \(\rho ^0_{R_0}\), and each of the pairs \(A_i B_i\) is generated sequentially, one after the other, by the process \({\mathcal {M}}_i\). The map \({\mathcal {M}}_i\) takes as input a state on \(R_{i-1}\) and outputs a state on \(R_{i} \otimes A_i \otimes B_i\)
The main result we derive in this work is actually a bit more general than (3), allowing one to take into account global information about the statistics of \(A_1^n\) and \(B_1^n\). This is relevant for applications. In quantum key distribution, for instance, \({\mathcal {M}}_i\) models the generation of the ith bit of the raw key. However, in this cryptographic scenario, \({\mathcal {M}}_i\) can depend on the attack strategy of an adversary, and is thus partially unknown. Hence, in order to bound the entropy (which characterises an adversary’s uncertainty) of the raw key bits, one must as well take into account global statistical properties. These are inferred by tests carried out by the quantum key distribution protocol on a small sample of the generated bits. To incorporate such statistical information in the analysis, we consider for each i an additional classical value \(X_i\) derived from \(A_i\) and \(B_i\), as depicted by Fig. 2. Specifically, \(X_i\) shall tell us whether position i was included in the statistical test, and if so, the outcome of the test performed at step i. For this extended scenario, (3) still holds, but now the infimum and supremum are taken over a restricted set, containing only those states \(\omega \) for which the resulting probability distribution on \(X_i\) corresponds to the observed statistics.
Circuit diagram illustrating the decomposition of states \(\rho _{A_1^n B_1^n X_1^n}\) relevant for the full version of our main theorem, which can take into account statistical information \(X_1^n\). The individual pieces \(X_i\) of this statistical information are classical values that can be determined from \(A_i\) and \(B_i\) without disturbing them. When \(A_i\) and \(B_i\) are themselves classical, this means that \(X_i\) is a deterministic function of \(A_i\) and \(B_i\). For a precise definition in the general case we refer to Sect. 4
Entropy accumulation has a number of theoretical and practical implications. For example, it serves as a technique to turn cryptographic security proofs that were restricted to collective attacks into security proofs against general attacks. This application is demonstrated in [5] for the case of a fully device-independent quantum key distribution protocol and a randomness expansion protocol. The resulting security bounds are essentially tight, implying that device-independent cryptography is possible with state-of-the-art technology. To illustrate the basic ideas behind such applications, we will present two concrete examples in more detail. The first is a proof of security of a variant of the E91 Quantum Key Distribution protocol. This new security proof has two advantages. First, its structure is modular and it may therefore be adapted to other cryptographic schemes (see also the discussion in Sect. 6). Second, it achieves a strong level of security in which no assumption is made on Bob’s devices. This is sometimes referred to as one-sided measurement device independence; this level of security was partially achieved in [58] (under a memoryless-devices assumption, which we do not need) and later fully in [56], though with sub-optimal rates.
The second example is the derivation of an upper bound on the fidelity achievable by Fully Quantum Random Access Codes.
The proof of the main result, Eq. (3), has a structure similar to that of the proof of the Quantum Asymptotic Equipartition Property [54], which we can retrieve as a special case (see Corollary 4.10). The idea is to first bound the smooth entropy of the entire sequence \(A_1^n\) conditioned on \(B_1^n\) by a conditional Rényi entropy of order \(\alpha \), then decompose this entropy into a sum of conditional Rényi entropies for the individual terms \(A_i\), and finally bound these in terms of von Neumann entropies. However, in contrast to previous arguments, we use a recently introduced version of conditional Rényi entropies, termed “sandwiched Rényi entropies” [37, 64]. For these entropies, we derive a novel chain rule, which forms the core technical part of our proof. In addition, some of the concepts used in this work generalise techniques proposed in the recent security proofs for device-independent cryptography presented in [34, 35]. In particular, the dominant terms of the lower bound on the amount of randomness obtained in [35], called rate curves, are similar to the tradeoff functions considered here (cf. Definition 4.1).Footnote 6
Paper organisation: We begin with preliminaries and notation in Sect. 2. Section 3 is devoted to the central technical ingredient of our argument, a chain rule for Rényi entropies. The main result, the theorem on entropy accumulation, is then stated and proved in Sect. 4. In Sect. 5 we present the two sample applications mentioned above, before concluding with remarks and suggestions for future work in Sect. 6.
2 Preliminaries
2.1 Notation
In the table below, we summarise some of the notation used throughout the paper:
Symbol | Definition |
---|---|
\(A, B, C, \dots \) | Quantum systems, and their associated Hilbert spaces |
\({\mathcal {L}}(A,B)\) | Set of linear operators from A to B |
\({\mathcal {L}}(A)\) | \({\mathcal {L}}(A,A)\) |
\(X_{AB}\) | Operator in \({\mathcal {L}}(A \otimes B)\) |
\(X_{B \leftarrow A}\) | Operator in \({\mathcal {L}}(A, B)\) |
\(\mathrm {D}(A)\) | Set of normalised density operators on A |
\(\mathrm {D}_{\leqslant }(A)\) | Set of sub-normalised density operators on A |
\(\mathrm {Pos}(A)\) | Set of positive semidefinite operators on A |
\(X^{-1}\) for \(X \in \mathrm {Pos}(A)\) | Generalised inverse, such that \(XX^{-1}X = X\) holds |
\(X_A \geqslant Y_A\) | \(X_A - Y_A \in \mathrm {Pos}(A)\) |
\(A_i^j\) (with \(j \geqslant i\)) | Given n systems \(A_1,\dots ,A_n\), this is a shorthand for \(A_i,\dots ,A_j\) |
\(A^n\) | Often used as shorthand for \(A_1,\dots ,A_n\) |
\(\log (x)\) | Logarithm of x in base 2 |
Throughout the paper, we restrict ourselves to finite-dimensional Hilbert spaces. Furthermore, we use the following notation for classical-quantum states \(\rho _{X A} \in \mathrm {D}(X \otimes A)\) with respect to a fixed orthonormal basis \(\{ |x\rangle \}_{x \in {\mathcal {X}}}\) of the system X. For any \(x \in {\mathcal {X}}\), we let
\[ \rho _{A, x} = \langle x | \rho _{X A} | x \rangle _{X} \,, \]
so that
\[ \rho _{X A} = \sum _{x \in {\mathcal {X}}} |x\rangle \langle x|_{X} \otimes \rho _{A, x} \,. \]
To refer to the conditional state, we write \(\rho _{A|x} = \frac{\rho _{A,x}}{\mathrm {tr}(\rho _{A,x})}\). An event \(\Omega \subseteq {\mathcal {X}}\) in this paper refers to a subset of \({\mathcal {X}}\) and we can similarly define
\[ \rho _{X A | \Omega } = \frac{1}{\rho [\Omega ]} \sum _{x \in \Omega } |x\rangle \langle x|_{X} \otimes \rho _{A, x} \,, \]
where we introduced the notation \(\rho [\Omega ] = \sum _{x \in \Omega } \mathrm {tr}(\rho _{A, x})\). We also use the usual notation for the partial trace for conditional states, e.g., \(\rho _{XA|\Omega } = \mathrm {tr}_{B}(\rho _{XAB|\Omega })\).
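For readers who want the bookkeeping spelled out, the classical special case of this notation is easy to implement: \(\rho _{X A}\) becomes a joint distribution, \(\rho _{A,x}\) a sub-normalised slice, and conditioning on an event \(\Omega \) a renormalisation. The joint distribution below is an arbitrary toy example, not one from the paper.

```python
from fractions import Fraction as F

# Toy classical "cq state": a joint distribution over pairs (x, a).  In
# the quantum case each slice rho_{A,x} is a sub-normalised operator.
rho_XA = {('x0', 'a0'): F(1, 4), ('x0', 'a1'): F(1, 4),
          ('x1', 'a0'): F(3, 8), ('x1', 'a1'): F(1, 8)}

def rho_Ax(rho, x):
    """Sub-normalised slice rho_{A,x}: the weights of a for a given x."""
    return {a: p for (x_, a), p in rho.items() if x_ == x}

def rho_A_given_x(rho, x):
    """Normalised conditional state rho_{A|x} = rho_{A,x} / tr(rho_{A,x})."""
    slice_ = rho_Ax(rho, x)
    tr = sum(slice_.values())
    return {a: p / tr for a, p in slice_.items()}

def prob_event(rho, omega):
    """rho[Omega] = sum over x in Omega of tr(rho_{A,x})."""
    return sum(p for (x, _), p in rho.items() if x in omega)

def rho_given_event(rho, omega):
    """Conditioned state rho_{XA|Omega}: restrict to Omega, renormalise."""
    z = prob_event(rho, omega)
    return {xa: p / z for xa, p in rho.items() if xa[0] in omega}
```

Exact rational arithmetic is used so the normalisation identities can be checked exactly.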
For a density operator \(\rho _{A B} \in \mathrm {D}(A \otimes B)\) on a bipartite Hilbert space \(A \otimes B\) we define the operatorFootnote 7
\[ \rho _{A | B} = (\mathrm {id}_A \otimes \rho _B^{-\frac{1}{2}}) \, \rho _{A B} \, (\mathrm {id}_A \otimes \rho _B^{-\frac{1}{2}}) \,, \]
which may be interpreted as the state of A conditioned on B, analogous to a conditional probability distribution. This operator was previously defined and studied in [6, 30]. In the following, we will usually drop identity operators from the notation when they are clear from the context. We would thus write, for instance,
\[ \rho _{A | B} = \rho _B^{-\frac{1}{2}} \, \rho _{A B} \, \rho _B^{-\frac{1}{2}} \,. \]
Remark 2.1
Let A and \({\bar{A}}\) be two isomorphic Hilbert spaces with orthonormal bases \(\{ |a\rangle _A \}_a\) and \(\{ |a\rangle _{{\bar{A}}} \}_a\), and define
\[ \psi _{A {\bar{A}}} = |\psi \rangle \langle \psi |_{A {\bar{A}}} \quad \text {with} \quad |\psi \rangle _{A {\bar{A}}} = \sum _a |a\rangle _A \otimes |a\rangle _{{\bar{A}}} \,. \]
Then any trace-non-increasing map \({\mathcal {M}}= {\mathcal {M}}_{B \leftarrow {\bar{A}}}\) from \({\mathcal {L}}({\bar{A}})\) to \({\mathcal {L}}(B)\) can be represented as a “conditional state” (also known as the Choi-Jamiolkowski state) \(M_{B | A}\) on \(A \otimes B\) with the property that
\[ \mathrm {tr}_B(M_{B|A}) \leqslant \mathrm {id}_A \qquad \mathrm {(4)} \]
and such that
\[ {\mathcal {M}}(\rho _{{\bar{A}}}) = \mathrm {tr}_A \bigl ( M_{B|A} \, (\rho _A^{T} \otimes \mathrm {id}_B) \bigr ) \qquad \mathrm {(5)} \]
holds. Specifically, for any map \({\mathcal {M}}\) one may define
\[ M_{B|A} = ({\mathcal {I}}_A \otimes {\mathcal {M}})(\psi _{A {\bar{A}}}) \,; \qquad \mathrm {(6)} \]
it is then straightforward to verify the properties above.
Conversely, for any \(M_{B | A}\) such that (4) holds, the map defined by
\[ {\mathcal {M}}(X_{{\bar{A}}}) = \mathrm {tr}_A \bigl ( M_{B|A} \, (X_A^{T} \otimes \mathrm {id}_B) \bigr ) \]
satisfies (6) and hence (5). It is also easy to verify that it is completely positive and trace non-increasing.
We mention here a slight abuse of terminology: for a completely positive map \({\mathcal {M}}_{B \leftarrow A}\) from \({\mathcal {L}}(A)\) to \({\mathcal {L}}(B)\), we often use a shorthand to indicate the systems it acts on and simply say that it maps A to B.
2.2 Background on quantum Markov chains
The concept of quantum Markov chains will be used throughout the paper, and here we give some relevant basic facts about them. Let \(\{a_j\}_{j \in J}\) and \(\{c_j\}_{j \in J}\) be families of Hilbert spaces and let B be a Hilbert space such thatFootnote 8
\[ B = \bigoplus _{j \in J} a_j \otimes c_j \qquad \mathrm {(7)} \]
holds. Let us furthermore denote by \(V = \bigoplus _{j \in J} V_{a_j c_j \leftarrow B}\) the corresponding isomorphism. It is convenient to treat \(\bigoplus _j a_j \otimes c_j\) as a subspace of the product \(a \otimes c\) of the spaces \(a = \bigoplus _j a_j\) and \(c = \bigoplus _j c_j\). The mapping V may then be viewed as an embedding of B into \(a \otimes c\). Given a density operator \(\rho _B\), we denote by \(\rho _{a c}\) the density operator \(V \rho _B V^{\dagger }\). More generally, for a multi-partite density operator \(\rho _{A B}\), we write \(\rho _{A a c}\) for \(V \rho _{A B} V^{\dagger }\). Furthermore, for any \(j \in J\), we denote by \(\rho _{A a_j c_j}\) the projection of \(\rho _{A a c}\) onto the subspace defined by \(a_j \otimes c_j\), i.e.,
\[ \rho _{A a_j c_j} = (\mathrm {id}_A \otimes \Pi _{a_j c_j}) \, \rho _{A a c} \, (\mathrm {id}_A \otimes \Pi _{a_j c_j}) \,, \qquad \mathrm {(8)} \]
where \(\Pi _{a_j c_j}\) denotes the orthogonal projection from \(a \otimes c\) onto \(a_j \otimes c_j\).
A tri-partite density operator \(\rho _{A B C}\) is said to obey the Markov chain condition \(A \leftrightarrow B \leftrightarrow C\) if there exists a decomposition of B of the form (7) such that
\[ \rho _{A a c C} = \bigoplus _{j \in J} q_j \, {\hat{\rho }}_{A a_j} \otimes {\hat{\rho }}_{c_j C} \,, \qquad \mathrm {(9)} \]
where \(\{q_j\}_{j \in J}\) is a probability distribution and \(\{{\hat{\rho }}_{A a_j}\}_{j \in J}\) and \(\{{\hat{\rho }}_{c_j C}\}_{j \in J}\) are families of density operators [26, 28, 39]. It follows from this decomposition that a state \(\rho _{ABC}\) obeying the Markov chain condition can be reconstructed from \(\rho _{AB}\) with a map \({\mathcal {T}}_{BC \leftarrow B}\) acting only on B [39]:
\[ \rho _{A B C} = ({\mathcal {I}}_A \otimes {\mathcal {T}}_{BC \leftarrow B})(\rho _{A B}) \,. \qquad \mathrm {(10)} \]
Another useful characterization of the Markov chain condition for \(\rho _{ABC}\) is given by the entropic equality \(I(A:C|B)_{\rho } = 0\) [26, 28, 39]. The conditional mutual information is defined as \(I(A:C|B)_{\rho } = H(AB)_{\rho } + H(BC)_{\rho } - H(B)_{\rho } - H(ABC)_{\rho }\) where \(H(A)_{\rho } = -\mathrm {tr}(\rho _{A} \log \rho _{A})\) is the von Neumann entropy.
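The entropic characterisation \(I(A{:}C|B)_\rho = 0\) is easy to check numerically. The sketch below does so for classical tripartite distributions (where the von Neumann entropies reduce to Shannon entropies); the Markov example is a noisy bit chain and the counterexample sets C = A, both arbitrary toy choices.

```python
from math import log2

def H(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(p_abc, keep):
    """Marginalise a dict {(a, b, c): prob} onto the index set `keep`."""
    out = {}
    for abc, p in p_abc.items():
        key = tuple(abc[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cmi(p_abc):
    """Conditional mutual information I(A:C|B)
    = H(AB) + H(BC) - H(B) - H(ABC)."""
    return (H(marginal(p_abc, (0, 1))) + H(marginal(p_abc, (1, 2)))
            - H(marginal(p_abc, (1,))) - H(p_abc))

def flip(bit, q):
    """Distribution of a bit flipped with probability q."""
    return {bit: 1 - q, 1 - bit: q}

# Markov chain A -> B -> C: B is a noisy copy of A, C a noisy copy of B.
markov = {}
for a in (0, 1):
    for b, pb in flip(a, 0.2).items():
        for c, pc in flip(b, 0.3).items():
            markov[(a, b, c)] = 0.5 * pb * pc

# Non-Markov example: C = A exactly, so B does not screen off A from C.
non_markov = {}
for a in (0, 1):
    for b, pb in flip(a, 0.2).items():
        non_markov[(a, b, a)] = 0.5 * pb
```

For the chain, `cmi(markov)` vanishes up to rounding; for the second distribution it equals H(A|B) > 0.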
2.3 Entropic quantities
The formulation of the main claim refers to smooth entropies, which can be defined as follows.
Definition 2.2
For any density operator \(\rho _{A B}\) and for \(\varepsilon \in [0,1]\) the \(\varepsilon \)-smooth min- and max-entropies of A conditioned on B are
\[ H_{\min }^{\varepsilon }(A|B)_{\rho } = \max _{{\tilde{\rho }}} \, \max _{\sigma _B} \, \sup \bigl \{ \lambda \in {\mathbb {R}} \,:\, {\tilde{\rho }}_{A B} \leqslant 2^{-\lambda } \, \mathrm {id}_A \otimes \sigma _B \bigr \} \]
and
\[ H_{\max }^{\varepsilon }(A|B)_{\rho } = \min _{{\tilde{\rho }}} \, \max _{\sigma _B} \, \log \Bigl \Vert \sqrt{{\tilde{\rho }}_{A B}} \, \sqrt{\mathrm {id}_A \otimes \sigma _B} \Bigr \Vert _1^2 \,, \]
respectively, where \({\tilde{\rho }}\) is any non-negative operator with trace at most 1 that is \(\varepsilon \)-close to \(\rho \) in terms of the purified distanceFootnote 9 [51, 55], and where \(\sigma _B\) is any density operator on B.
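In the classical case without side information, smoothing has a simple effect: it trims the largest probabilities. The sketch below computes this trace-distance-style smoothed min-entropy of a probability vector as an illustration; the definition above uses the purified distance, so the constants differ, but the mechanism is the same. The distributions are toy examples.

```python
from math import log2

def smooth_min_entropy(probs, eps):
    """Classical eps-smooth min-entropy of a distribution (no side
    information), with trace-distance smoothing onto sub-normalised
    distributions: remove a total of eps probability mass from the
    largest atoms (levelling them down to a common value lam), then
    return -log2 of the largest remaining probability."""
    p = sorted(probs, reverse=True)
    for k in range(1, len(p) + 1):
        # Cut the top k atoms down to a common level lam.
        lam = (sum(p[:k]) - eps) / k
        next_p = p[k] if k < len(p) else 0.0
        if lam >= next_p:
            return -log2(lam)
    raise ValueError("eps removes all probability mass")

# With eps = 0 this is the plain min-entropy -log2(max_x p(x)); any
# eps > 0 can only increase the value.
```

For instance, for the distribution (0.5, 0.25, 0.25) and \(\varepsilon = 0.2\), the top atom is lowered from 0.5 to 0.3, giving \(-\log _2 0.3\).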
The proof we present here heavily relies on the sandwiched relative Rényi entropies introduced in [37, 64]. These relative entropies can be used to define a conditional entropy.Footnote 10
Definition 2.3
For any density operator \(\rho _{A B}\) and for \(\alpha \in (0, 1) \cup (1, \infty )\) the sandwiched \(\alpha \)-Rényi entropy of A conditioned on B is defined as
\[ H_{\alpha }(A|B)_{\rho } = \frac{2 \alpha }{1 - \alpha } \log \Bigl \Vert \rho _{A B}^{\frac{1}{2}} \, (\mathrm {id}_A \otimes \rho _B)^{-\frac{\alpha '}{2}} \Bigr \Vert _{2 \alpha } \,, \]
where \(\alpha ' = \frac{\alpha - 1}{\alpha }\) and where \(\Vert X \Vert _\alpha = \mathrm {tr}\bigl ( (X^{\dagger } X)^{\frac{\alpha }{2}} \bigr )^{\frac{1}{\alpha }}\). Note that \(\alpha '\) is the inverse of the Hölder conjugate of \(\alpha \).
We note that, while the function \(X \mapsto \Vert X \Vert _\alpha \) is a norm for \(\alpha \geqslant 1\), this is not the case for \(\alpha < 1\) since it does not satisfy the triangle inequality. Some key properties of this function are summarised in Appendix A. Using them, the sandwiched Rényi entropies may be rewritten as
\[ H_{\alpha }(A|B)_{\rho } = \frac{\alpha }{1 - \alpha } \log \Bigl \Vert (\mathrm {id}_A \otimes \rho _B)^{-\frac{\alpha '}{2}} \, \rho _{A B} \, (\mathrm {id}_A \otimes \rho _B)^{-\frac{\alpha '}{2}} \Bigr \Vert _{\alpha } \,. \]
It turns out that there are multiple ways of defining conditional entropies from relative entropies. Another variant that will be needed in this work is the following:
Definition 2.4
For any density operator \(\rho _{A B}\) and for \(\alpha \in (0, 1) \cup (1, \infty )\), we define
\[ H^{\uparrow }_{\alpha }(A|B)_{\rho } = - \inf _{\sigma _B \in \mathrm {D}_{\leqslant }(B)} D_{\alpha }(\rho _{A B} \Vert \mathrm {id}_A \otimes \sigma _B) \,, \]
where \(D_{\alpha }\) denotes the sandwiched relative Rényi entropy (see Appendix B) and where the infimum is over all sub-normalised density operators on B.
Other relevant facts about the sandwiched Rényi entropy and the corresponding notion of relative entropy can be found in Appendix B.
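When \(\rho _{AB}\) is classical, all operators commute and the sandwiched conditional Rényi entropy reduces to \(\frac{1}{1-\alpha } \log _2 \sum _{a,b} p(a,b)^{\alpha } \, p(b)^{1-\alpha }\). The sketch below implements this commuting special case (with an arbitrary toy distribution) and checks two expected features: for a product distribution the conditioning has no effect, and as \(\alpha \rightarrow 1\) the value approaches the conditional Shannon entropy.

```python
from math import log2

def renyi_cond_entropy(p_ab, alpha):
    """H_alpha(A|B) for a classical joint distribution {(a, b): prob}:
    the commuting special case of the sandwiched definition,
    (1 / (1 - alpha)) * log2( sum_{a,b} p(a,b)^alpha * p(b)^(1-alpha) )."""
    p_b = {}
    for (a, b), p in p_ab.items():
        p_b[b] = p_b.get(b, 0.0) + p
    s = sum(p**alpha * p_b[b]**(1 - alpha)
            for (a, b), p in p_ab.items() if p > 0)
    return log2(s) / (1 - alpha)

def shannon_cond_entropy(p_ab):
    """H(A|B) = H(AB) - H(B), for comparison with the alpha -> 1 limit."""
    p_b = {}
    for (a, b), p in p_ab.items():
        p_b[b] = p_b.get(b, 0.0) + p
    h_ab = -sum(p * log2(p) for p in p_ab.values() if p > 0)
    h_b = -sum(p * log2(p) for p in p_b.values() if p > 0)
    return h_ab - h_b
```

The family is non-increasing in \(\alpha \), which is the direction exploited when trading the von Neumann entropy for a Rényi entropy of order \(\alpha > 1\) in the proofs below.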
3 Chain Rule for Rényi Entropies
As explained in the introduction, our main result can be regarded as a generalisation of the Quantum Asymptotic Equipartition Property [54], corresponding to (2). The approach used for the proof of the latter is to bound both the smooth min-entropy and the von Neumann entropy by Rényi entropies with an appropriate parameter \(\alpha \). The IID assumption is then used to decompose the Rényi entropy into a sum of n terms. However, since our main claim, Eq. (3), is supposed to hold for general non-IID states, we do not have this luxury, and we must somehow decompose the Rényi entropy into n terms using other means. The tool we will use for this purpose is a chain rule for Rényi entropies, which we present as a separate theorem (Theorem 3.2). We start by stating a more general version that will be useful in the proof of the main theorem.
Lemma 3.1
Let \(\rho _{A_1 A_2 B}\) and \(\sigma _B\) be density operators and let \(\alpha \in (0, \infty )\). Then
where
We note that \(\nu _{A_1B} = \mathrm {tr}_{A_2}(\nu _{A_1A_2B})\), which justifies the notation.
Proof
When \(\alpha = 1\), this equality follows directly from the definition of the entropies. To prove the equality for \(\alpha \in (0,1) \cup (1, \infty )\) we consider a purification of \(\rho _{A_1A_2B}\). Using Lemma B.2 and setting \(\alpha ' = \frac{\alpha -1}{\alpha }\) we have
By the definition of \(\nu _{A_1B}\), we get
where we defined the pure state , which is a purification of \(\nu _{A_1A_2B}\). To conclude we use the fact that \(\nu _{A_1B} = \mathrm {tr}_{A_2}(\nu _{A_1A_2B})\) and Lemma B.2. \(\quad \square \)
By choosing \(\sigma _B = \rho _{B}\) in Lemma 3.1, we directly obtain a chain rule for the Rényi entropies:
Theorem 3.2
Let \(\rho _{A_1 A_2 B}\) be a density operator and let \(\alpha \in (0, \infty )\). Then
where
One drawback of the above result is that we are seldom interested in the particular state \(\nu \) defined in the theorem statement. It is therefore generally more useful to present the result in a slightly weaker form, where the state \(\nu \) is chosen to be the worst case over an appropriate class of density operators. When \(\rho \) obeys the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\), we obtain the following result.
Theorem 3.3
Let \(\rho _{A_1 B_1 A_2 B_2}\) be a density operator such that the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\) holds and let \(\alpha \in (0, \infty )\). Then
\[ H_{\alpha }(A_1 A_2 | B_1 B_2)_{\rho } \;\geqslant \; H_{\alpha }(A_1 | B_1)_{\rho } + \inf _{\nu } H_{\alpha }(A_2 | B_2 A_1 B_1)_{\nu } \qquad \mathrm {(14)} \]
and
\[ H_{\alpha }(A_1 A_2 | B_1 B_2)_{\rho } \;\leqslant \; H_{\alpha }(A_1 | B_1)_{\rho } + \sup _{\nu } H_{\alpha }(A_2 | B_2 A_1 B_1)_{\nu } \,, \]
where the supremum and infimum range over density operators \(\nu \) such that \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\) holds.
Proof
We apply Theorem 3.2 with \(B = B_1B_2\). The Markov chain condition implies that \(H_\alpha (A_1|B_1B_2)_{\rho } = H_\alpha (A_1|B_1)_{\rho }\). To see this for \(\alpha \in (\frac{1}{2}, \infty )\), we could use the recoverability condition (10) for Markov chains together with the monotonicity of \(D_{\alpha }\) under quantum channels [8, 24, 37, 64]. We can also see it for all \(\alpha \in (0, \infty )\) using the structure of a Markov chain stated in (9). Namely, there exists a decomposition \(\bigoplus _{j} a_j \otimes b_j\) of the system \(B_1\) such that
\[ \rho _{A_1 B_1 B_2} = \bigoplus _{j} q_j \, {\hat{\rho }}_{A_1 a_j} \otimes {\hat{\rho }}_{b_j B_2} \qquad \mathrm {(15)} \]
holds, where \(\{q_j\}\) is a probability distribution and where \(\{{\hat{\rho }}_{A_1 a_j}\}\) and \(\{{\hat{\rho }}_{b_j B_2}\}\) are families of density operators. Then,
To prove (14), it only remains to show that the state \(\nu _{A_1A_2B_1B_2}\) defined in (13) satisfies \(\nu _{A_2B_2|A_1B_1} = \rho _{A_2B_2|A_1B_1}\). For that, we again use the fact that \(\rho _{A_1B_1B_2}\) forms a Markov chain. As we will be using this statement later in other contexts, we state it as a claim.
\(\square \)
Claim 3.4
Let \(\rho _{A_1 B_1 A_2 B_2}\) be a density operator such that the Markov chain condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\) holds, let \(\alpha \in (0, \infty )\) and let \(\nu _{A_1 B_1 A_2 B_2}\) be as in (13) with \(B \rightarrow B_1 B_2\). Then \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\).
Letting \(Z = \mathrm {tr}\Bigl [ \Bigl ( \rho _{A_1B_1B_2}^{\frac{1}{2}} \, \rho ^{-\alpha '}_{B_1B_2} \, \rho _{A_1B_1B_2}^{\frac{1}{2}} \Bigr )^{\alpha } \Bigr ]\), the decomposition (15) allows us to write
It follows that
where \({\hat{\rho }}^0_{A_1 a_j}\) is the projector onto the support of \({\hat{\rho }}_{A_1 a_j}\). We used for the first equality the fact that the support of the operator \(\left( {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} {\hat{\rho }}_{a_j}^{-\alpha '} {\hat{\rho }}_{A_1 a_j}^{\frac{1}{2}} \right) ^{\alpha }\) is the same as the support of \({\hat{\rho }}_{A_1 a_j}\). As a result, we find
This concludes the proof of Claim 3.4 and gives the desired statement. \(\quad \square \)
The following simple corollary expresses the above chain rules in terms of quantum channels, i.e., trace preserving completely positive (TPCP) maps, rather than conditional states.
Corollary 3.5
Let \(\rho ^0_{R A_1 B_1}\) be a density operator on \(R \otimes A_1 \otimes B_1\), \({\mathcal {M}}= {\mathcal {M}}_{A _2 B_2 \leftarrow R}\) be a TPCP map and \(\alpha \in (0, \infty )\). Assuming that \(\rho _{A_1 B_1 A_2 B_2} = {\mathcal {M}}(\rho ^0_{RA_1B_1})\) satisfies the Markov condition \(A_1 \leftrightarrow B_1 \leftrightarrow B_2\), we have
\[ H_{\alpha }(A_1 A_2 | B_1 B_2)_{\rho } \;\geqslant \; H_{\alpha }(A_1 | B_1)_{\rho ^0} + \inf _{\omega _{R A_1 B_1}} H_{\alpha }(A_2 | B_2 A_1 B_1)_{{\mathcal {M}}(\omega )} \]
and
\[ H_{\alpha }(A_1 A_2 | B_1 B_2)_{\rho } \;\leqslant \; H_{\alpha }(A_1 | B_1)_{\rho ^0} + \sup _{\omega _{R A_1 B_1}} H_{\alpha }(A_2 | B_2 A_1 B_1)_{{\mathcal {M}}(\omega )} \,, \]
where the supremum and infimum range over density operators \(\omega _{R A_1 B_1}\) on \(R \otimes A_1 \otimes B_1\). Moreover, if \(\rho ^0_{R A_1 B_1}\) is pure then we can optimise over pure states \(\omega _{R A_1 B_1}\).
Proof
We apply Theorem 3.3 to \(\rho _{A_1 B_1 A_2 B_2}\). It suffices to show that the optimisation over \(\nu _{A_1 B_1 A_2 B_2}\) satisfying \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\) is contained in the optimisation over \(\omega _{RA_1B_1}\). For this, let \(\nu _{A_1 B_1 A_2 B_2}\) be any density operator satisfying \(\nu _{A_2 B_2 | A_1 B_1} = \rho _{A_2 B_2 | A_1 B_1}\), i.e.,
Now we choose
We then see that
\(\square \)
4 Entropy Accumulation
This section is devoted to the main result on entropy accumulation. The statement is formulated in its fully general form as Theorem 4.4 and presented in a slightly simplified version as Corollary 4.8. We also give a formulation that corresponds to statement (3) of the introduction (Corollary 4.9). Finally, we show how the Quantum Asymptotic Equipartition Property follows as a special case (cf. Corollary 4.10).
For \(i \in \{1,\dots ,n\}\), let \({\mathcal {M}}_i\) be a TPCP map from \(R_{i-1}\) to \(X_i A_i B_i R_i\), where \(A_i\) is finite-dimensional and where \(X_i\) represents a classical value from an alphabet \({\mathcal {X}}\) that is determined by \(A_i\) and \(B_i\) together. More precisely, we require that \({\mathcal {M}}_{i} = {\mathcal {T}}_{i} \circ {\mathcal {M}}'_i\), where \({\mathcal {M}}'_{i}\) is an arbitrary TPCP map from \(R_{i-1}\) to \(A_{i} B_{i} R_{i}\) and \({\mathcal {T}}_i\) is a TPCP map from \(A_{i}B_{i}\) to \(X_{i} A_i B_i\) of the form
\[ {\mathcal {T}}_{i}(W_{A_i B_i}) = \sum _{y \in {\mathcal {Y}}} \sum _{z \in {\mathcal {Z}}} (\Pi _{A_i, y} \otimes \Pi _{B_i, z}) \, W_{A_i B_i} \, (\Pi _{A_i, y} \otimes \Pi _{B_i, z}) \otimes |t(y,z)\rangle \langle t(y,z)|_{X_i} \,, \qquad \mathrm {(25)} \]
where \(\{\Pi _{A_i, y}\}\) and \(\{\Pi _{B_i, z}\}\) are families of mutually orthogonal projectors on \(A_i\) and \(B_i\), and where \(t : {\mathcal {Y}}\times {\mathcal {Z}}\rightarrow {\mathcal {X}}\) is a deterministic function (cf. Figs. 1 and 2). Special cases of interest are when \(X_i\) is trivial and \({\mathcal {T}}_{i}\) is the identity map, and when \(X_i = t(Y_i, Z_i)\) where \(Y_i\) and \(Z_i\) are classical parts of \(A_i\) and \(B_i\), respectively. Note that the maps \({\mathcal {T}}_i\) have the property that, for any operator \({\bar{W}}_{X_iA_iB_i}\), if \({\bar{W}}_{X_iA_iB_i} = {\mathcal {T}}_{i}(W_{A_i B_i})\) then \({\bar{W}}_{X_i A_i B_i} = {\mathcal {T}}_{i}({\bar{W}}_{A_i B_i})\).
The entropy accumulation theorem stated below will hold for states of the form
\[ \rho _{A_1^n B_1^n X_1^n E} = (\mathrm {tr}_{R_n} \circ {\mathcal {M}}_n \circ \cdots \circ {\mathcal {M}}_1 \otimes {\mathcal {I}}_E)(\rho ^0_{R_0 E}) \,, \qquad \mathrm {(26)} \]
where \(\rho ^0_{R_0 E} \in \mathrm {D}(R_0 \otimes E)\) is a density operator on \(R_0\) and an arbitrary system E. In addition, we require that the Markov conditions
\[ A_1^{i-1} \leftrightarrow B_1^{i-1} E \leftrightarrow B_i \qquad \mathrm {(27)} \]
be satisfied for all \(i \in \{1, \ldots , n\}\).
Let \({\mathbb {P}}\) be the set of probability distributions on the alphabet \({\mathcal {X}}\) of \(X_i\), and let R be a system isomorphic to \(R_{i-1}\). For any \(q \in {\mathbb {P}}\) we define the set of states
\[ \Sigma _i(q) = \bigl \{ \nu _{X_i A_i B_i R_i R} = ({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R}) \,:\, \omega _{R_{i-1} R} \in \mathrm {D}(R_{i-1} \otimes R) \ \text {and} \ \nu _{X_i} = q \bigr \} \,, \]
where \(\nu _{X_i}\) denotes the probability distribution over \({\mathcal {X}}\) with the probabilities given by \(\nu _{X_i}(x) = \mathrm {tr}\bigl ( |x\rangle \langle x|_{X_i} \, \nu _{X_i} \bigr )\).
Definition 4.1
A real function f on \({\mathbb {P}}\) is called a min- or max-tradeoff function for \({\mathcal {M}}_i\) if it satisfies
\[ f(q) \leqslant \inf _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu } \quad \text {or} \quad f(q) \geqslant \sup _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu } \,, \]
respectively.Footnote 11
Remark 4.2
To determine the infimum \(\inf _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu }\), we may assume that \(\omega _{R_{i-1} R}\) in the definition of \(\Sigma _i(q)\) is pure. In fact, including a purifying system in R cannot increase \(H(A_i | B_i R)\) because of strong subadditivity. Similarly, to calculate the supremum \(\sup _{\nu \in \Sigma _i(q)} H(A_i | B_i R)_{\nu }\), we may assume that \(\omega _{R_{i-1} R}\) is a product state or that R is trivial. This justifies the fact that we assumed R is isomorphic to \(R_{i-1}\) in the definition of \(\Sigma _i(q)\).
Remark 4.3
As we will see in the proof below, one can also impose the constraint on the set \(\Sigma _i(q)\) that the system R be isomorphic to \(A_1^{i-1}B_1^{i-1}E\). Furthermore, if a part of the latter is classical in \(\rho \), one can restrict \(\Sigma _i(q)\) to states satisfying this property.
In the following, we denote by \(\nabla f\) the gradient of a function f. (Note that in Theorem 4.4 and Proposition 4.5, f is an affine function, so that \(\nabla f\) is constant.) We write \(\mathsf {freq}(X_1^n)\) for the distribution on \({\mathcal {X}}\) defined by \(\mathsf {freq}(X_1^n)(x) = \frac{|\{i \in \{1,\dots ,n\} : X_i = x\}|}{n}\). We also recall that in this context, an event \(\Omega \) is defined by a subset of \({\mathcal {X}}^n\) and we write \(\rho [\Omega ] = \sum _{x_1^n \in \Omega }\mathrm {tr}(\rho _{A_1^n B_1^n E, x_1^n})\) for the probability of the event \(\Omega \) and
\[ \rho _{A_1^n B_1^n X_1^n E | \Omega } = \frac{1}{\rho [\Omega ]} \sum _{x_1^n \in \Omega } \rho _{A_1^n B_1^n E, x_1^n} \otimes |x_1^n\rangle \langle x_1^n|_{X_1^n} \]
for the state conditioned on \(\Omega \) (cf. Section 2.1).
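The statistical bookkeeping here is entirely classical and easy to mirror in code. The sketch below (with a hypothetical alphabet and coefficients that are stand-ins for the test outcomes \(X_i\), not taken from the paper) computes \(\mathsf {freq}(X_1^n)\), evaluates an affine function \(f(q) = \sum _x c_x\, q(x)\), and tests membership in an event of the form \(\Omega = \{x_1^n : f(\mathsf {freq}(x_1^n)) \geqslant h\}\).

```python
from collections import Counter

def freq(xs):
    """Empirical distribution freq(X_1^n) on the alphabet of symbols."""
    n = len(xs)
    return {x: k / n for x, k in Counter(xs).items()}

def affine_f(coeffs):
    """Affine function on distributions, f(q) = sum_x coeffs[x] * q(x);
    the coefficient vector plays the role of the gradient of f."""
    return lambda q: sum(coeffs.get(x, 0.0) * p for x, p in q.items())

# Hypothetical test outcomes 'pass' / 'fail' / 'skip' with arbitrary
# coefficients -- illustrative stand-ins only.
f = affine_f({'pass': 1.0, 'fail': -2.0, 'skip': 0.0})

def in_omega(xs, h):
    """Membership in the event Omega = {x_1^n : f(freq(x_1^n)) >= h}."""
    return f(freq(xs)) >= h
```

Such an event is exactly the kind of condition ("the observed statistics were good enough") under which the theorem below lower-bounds the smooth min-entropy.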
Theorem 4.4
Let \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}\), let f be an affine min-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\), and let \(\varepsilon \in (0,1)\). Then, for any event \(\Omega \subseteq {\mathcal {X}}^n\) that implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\),Footnote 12
\[ H_{\min }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }} \;>\; n h - c \sqrt{n} \qquad \mathrm {(28)} \]
holds for \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [\Omega ])}\), where \(d_A\) is the maximum dimension of the systems \(A_i\). Similarly,
\[ H_{\max }^{\varepsilon }(A_1^n | B_1^n E)_{\rho _{|\Omega }} \;<\; n h + c \sqrt{n} \qquad \mathrm {(29)} \]
holds if f is replaced by an affine max-tradeoff function and if \(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \leqslant h\).
Before proceeding to the proof, some remarks are in order. The first is that the Markov chain assumption on the state is important, as argued in Appendix C. Secondly, the system E could have been included in \(B_1\), but for the applications we consider, it is clearer to keep a separate system E that is not affected by the processes \({\mathcal {M}}_1, \dots , {\mathcal {M}}_n\). Thirdly, concerning the second-order term, it is possible to replace \(d_A\) with appropriate entropic quantities, as in the Quantum Asymptotic Equipartition Property [54], which could be useful when the systems \(A_i\) are infinite-dimensional. The dependence of the second-order term on the state and on the tradeoff function f is studied in more detail in the subsequent work [19]. Finally, we note that the constraint that the tradeoff function be affine is not a severe restriction: given a convex min-tradeoff function, one can always choose a tangent hyperplane at a point of interest as an affine lower bound. This is illustrated in Corollary 4.7.
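The last remark—replacing a convex min-tradeoff function by a tangent hyperplane—can be illustrated in one dimension. The function g below is a generic convex stand-in (not a tradeoff function from the paper); convexity guarantees that the tangent at any point \(q_0\) is an affine lower bound, with equality at \(q_0\).

```python
def tangent(g, dg, q0):
    """Affine lower bound to a convex function g: the tangent at q0,
    t(q) = g(q0) + dg(q0) * (q - q0)."""
    g0, slope = g(q0), dg(q0)
    return lambda q: g0 + slope * (q - q0)

# Generic convex example together with its derivative -- a stand-in for
# a convex min-tradeoff function of a one-parameter family q of
# distributions.
g = lambda q: (q - 0.3) ** 2
dg = lambda q: 2 * (q - 0.3)

# Tangent taken at the "point of interest" q0 = 0.7.
t = tangent(g, dg, 0.7)
```

Applying the theorem with t in place of g then gives a valid bound that is tight at the operating point \(q_0\).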
To prove the theorem, we will first show the following proposition, which is essentially a Rényi version of entropy accumulation. We then show how Theorem 4.4 follows from this proposition.
Proposition 4.5
Let \({\mathcal {M}}_1, \ldots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}\), and let f be an affine min-tradeoff function f for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\). Then, for any event \(\Omega \) which implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\),
holds for any \(\alpha \) satisfying \(1< \alpha < 1 + \frac{2}{V}\), where \(V = 2 \left\lceil \Vert \nabla f \Vert _\infty \right\rceil + 2 \log (1+2 d_A)\) and \(d_A\) is the maximum dimension of the systems \(A_i\). Similarly,
holds if f is replaced by an affine max-tradeoff function and if \(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \leqslant h\).
Proof
We focus on proving the first inequality (31). The proof of the second inequality (32) is similar; we only point out the main differences in the course of the proof.
The first step of the proof is to construct a state that will allow us to lower-bound \(H^\uparrow _{\alpha }(A_1^n | B_1^n E)_{\rho _{|\Omega }}\) using the chain rule of Theorem 3.3, while ensuring that the tradeoff function is taken into account. Let \([g_{\min }, g_{\max }]\) be the smallest real interval that contains the range \(f({\mathbb {P}})\) of f, and set \({\bar{g}} = \frac{1}{2} (g_{\min } + g_{\max })\). Furthermore, for every i, let \({\mathcal {D}}_i : X_i \rightarrow X_i D_i {\bar{D}}_i\), with \(\dim D_i = \dim {\bar{D}}_i\), be a TPCP map defined as
![](http://media.springernature.com/lw338/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ211_HTML.png)
where \(\tau (x)\) is a mixture between a maximally entangled state on \(D_i \otimes {\bar{D}}_i\) and a fully mixed state such that the marginal on \({\bar{D}}_i\) is uniform and such that \(H_{\alpha }(D_i|{\bar{D}}_i)_{\tau (x)} = {\bar{g}} - f(\delta _x)\) (here \(\delta _x\) stands for the distribution with all the weight on element x). To ensure that this is possible, we need to choose \(\dim D_i\) large enough, so we need to bound how large \({\bar{g}} - f(\delta _x)\) can be, positive or negative. By the definition of \({\bar{g}}\), \(|{\bar{g}} - f(\delta _x)|\) cannot be larger than \(\frac{1}{2} |g_{\max } - g_{\min }| \leqslant \Vert \nabla f \Vert _\infty \). We therefore take the dimension of the spaces \(D_i\) to be equal to
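A dimension choice that makes this possible, and which is consistent with the constant \(V\) of Proposition 4.5 (we state it here as an inference from the bound \(|{\bar{g}} - f(\delta _x)| \leqslant \Vert \nabla f \Vert _\infty \), not as a verbatim quote), is:

```latex
% Inferred dimension choice for the D_i systems:
\dim D_i \;=\; 2^{\lceil \Vert \nabla f \Vert_\infty \rceil},
\qquad\text{so that}\qquad
\log \dim D_i \;=\; \lceil \Vert \nabla f \Vert_\infty \rceil
  \;\geqslant\; \lvert \bar{g} - f(\delta_x) \rvert .
```

Since \(H_{\alpha }(D_i|{\bar{D}}_i)_{\tau (x)}\) can be tuned continuously between \(-\log \dim D_i\) (maximally entangled) and \(\log \dim D_i\) (product of uniform states) by varying the mixing weight in \(\tau (x)\), this dimension suffices to realise \(H_{\alpha }(D_i|{\bar{D}}_i)_{\tau (x)} = {\bar{g}} - f(\delta _x)\). It is also compatible with \(V = 2 \left\lceil \Vert \nabla f \Vert _\infty \right\rceil + 2 \log (1+2 d_A) \geqslant 2 \log (1+2 d_A \dim D_i)\), the quantity that appears when Lemma B.9 is applied later in the proof.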
For later use, we note that we have
Now, let
Note that \({\bar{\rho }}_{X_1^n A_1^n B_1^n E} = \rho _{X_1^n A_1^n B_1^n E}\).
One can think of the D systems as an “entropy price” that encodes the tradeoff function. With these systems in place, the output entropy includes an extra term that allows the tradeoff function to be taken into account in the optimisation arising in Theorem 3.3. This is formalised by the following facts, which are proven in Claim 4.6:
The next step is to relate the entropies on the conditional state \(\rho _{|\Omega }\) to those on the unconditional state. To do this, we use Lemmas B.5 and B.6 applied to \({\bar{\rho }} = \rho [\Omega ] {\bar{\rho }}_{|\Omega } + ({\bar{\rho }} - \rho [\Omega ] {\bar{\rho }}_{|\Omega })\), together with the fact that \(H_{\alpha }^{\uparrow } \geqslant H_{\alpha }\), and obtain
To show the desired inequality (31), it now suffices to prove that \(H_{\alpha }(A_1^n D_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}}\) is lower bounded by (roughly) \(n {\bar{g}}\). To do that, we are now going to use the chain rule for Rényi entropies in the form of Corollary 3.5, applied n times to the state \({\bar{\rho }}\), with the following substitutions at step i:
-
\(A_1 \rightarrow A_1^{i-1} D_1^{i-1}\)
-
\(B_1 \rightarrow B_1^{i-1} E {\bar{D}}_1^{i-1} \)
-
\(A_2 \rightarrow A_i D_i\)
-
\(B_2 \rightarrow B_i {\bar{D}}_i\)
-
\(R \rightarrow R_{i-1}\)
-
\({\mathcal {M}}\rightarrow \mathrm {tr}_{X_i} \circ {\mathcal {D}}_i \circ {\mathcal {M}}_i\).
To establish the Markov chain condition, we compute the conditional mutual information. Using the chain rule, we obtain
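Spelled out with the substitutions listed above (we restate the decomposition for readability; it is the standard chain rule for conditional mutual information):

```latex
% Chain-rule decomposition of the Markov-condition CMI at step i:
I(A_1^{i-1} D_1^{i-1} : B_i \bar{D}_i \mid B_1^{i-1} E \bar{D}_1^{i-1})
  = I(A_1^{i-1} : B_i \bar{D}_i \mid B_1^{i-1} E \bar{D}_1^{i-1})
  + I(D_1^{i-1} : B_i \bar{D}_i \mid A_1^{i-1} B_1^{i-1} E \bar{D}_1^{i-1}) .
```

Both terms on the right-hand side are shown to vanish, which is exactly the Markov condition needed for Corollary 3.5.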
We first show that the second term is zero. By construction, \(D_1^{i-1} {\bar{D}}_1^{i-1}\) conditioned on \(X_1^{i-1}\) is independent of all the other systems. This implies that \(I(D_1^{i-1} {\bar{D}}_1^{i-1} : B_i {\bar{D}}_i | X_1^{i-1} A_1^{i-1} B_1^{i-1} E ) = 0\). In addition, using the fact that \(X_1^{i-1}\) is determined by \(A_1^{i-1} B_1^{i-1}\), the systems \(X_1^{i-1}\) can be removed from the conditioning without changing the value. Then, using the chain rule together with the non-negativity of the conditional mutual information, this shows that \(I(D_1^{i-1} : B_i {\bar{D}}_i | A_1^{i-1} B_1^{i-1} E {\bar{D}}_1^{i-1}) = 0\). To compute the first term in (39), we use the fact that \({\bar{D}}_1^n\) is uniform independently of \(A_1^n B_1^n E\) so that \(I(A_1^{i-1} : B_i {\bar{D}}_i | B_1^{i-1} E {\bar{D}}_1^{i-1}) = I(A_1^{i-1} : B_i | B_1^{i-1} E )\). But then the assumed Markov condition on \(\rho _{A_1^n B_1^n E}\) implies that this quantity is zero and establishes the required condition to apply Corollary 3.5.
We thus obtain
where we have invoked Lemma B.9 in the second inequality and (33) in the last. Note that the restriction of this lemma that \(\alpha \) satisfy \(1< \alpha < 1+ 1/ \log (1 + 2 d_A d_D)\) is implied by our assumption that \(\alpha < 1 + 2/V\). The infimum is taken over all states \(\omega _{R_{i-1} R}\), where the system R is isomorphic to \(A_1^{i-1} D_1^{i-1} B_1^{i-1} {\bar{D}}_1^{i-1} E\). This constraint on R can be tightened by redoing the above argument with Theorem 3.2 instead of Corollary 3.5. It turns out that the system R can be taken to be isomorphic to \(A_1^{i-1} B_1^{i-1} E\), as noted in Remark 4.3.
To prove that we can restrict in our optimisation the system R to be isomorphic to \(A_1^{i-1}B_1^{i-1}E\) and drop the systems \(D_1^{i-1} {\bar{D}}_1^{i-1}\), we use Theorem 3.2 directly instead of Corollary 3.5. In particular, using Lemma B.7 as for (43), we can write
where we used the Markov chain condition \(A_1^{n-1} X_1^{n-1} D_1^{n-1} \leftrightarrow B_1^{n-1} E {\bar{D}}_1^{n-1} \leftrightarrow B_n {\bar{D}}_n\) and we defined for all \(i \in \{1, \dots , n\}\)
with \(Z_i = \mathrm {tr}\left( {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}} {\bar{\rho }}^{\frac{1-\alpha }{\alpha }}_{B_1^{i} E {\bar{D}}_1^{i}} {\bar{\rho }}_{A_1^{i-1} X_1^{i-1} D_1^{i-1} B_1^{i} E {\bar{D}}_1^{i}}^{\frac{1}{2}}\right) ^{\alpha }\). We then use the chain rule \(n-2\) more times to get
We now use the properties of \({\bar{\rho }}\) to simplify the entropies in the right hand side.
![](http://media.springernature.com/lw646/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ212_HTML.png)
Using the properties of the systems \(D_1^i {\bar{D}}_1^i\), we get for any \(x \in {\mathcal {X}}^{i-1}\),
where we used the fact that \({\bar{\rho }}_{D_1^{i}} = \otimes _{j=1}^i {\bar{\rho }}_{D_j}\). Letting
we can write
![](http://media.springernature.com/lw523/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ213_HTML.png)
In addition
![](http://media.springernature.com/lw522/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ214_HTML.png)
As a result,
![](http://media.springernature.com/lw357/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ215_HTML.png)
with
As the system \(D_1^{i-1}{\bar{D}}_1^{i-1}\) can be generated by only acting on \(X_1^{i-1}\), we have by data processing
We can then write
We now use Claim 3.4 with the substitutions
-
\(A_1 \rightarrow X_1^{i-1} A_1^{i-1}\)
-
\(A_2 \rightarrow X_i A_i D_i {\bar{D}}_i\)
-
\(B_1 \rightarrow B_1^{i-1} E\)
-
\(B_2 \rightarrow B_i\)
and using the Markov property \(X_1^{i-1} A_1^{i-1} \leftrightarrow B_1^{i-1} E \leftrightarrow B_i\). Thus, we have
As a result, as in the proof of Corollary 3.5, we then get
where
with \(T_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E} = (\nu ^i_{X_1^{i-1} A_1^{i-1} B_1^{i-1} E})^{\frac{1}{2}} (\rho _{X_1^{i-1} A_1^{i-1} B_1^{i-1} E})^{-\frac{1}{2}}\). Finally, we get
where in the inequality we used the fact that \(X_1^{i-1}\) is classical together with Lemma B.3. We point out that it is clear from this calculation that if part of the systems \(A_1^{i-1} B_1^{i-1} E\) is classical in \(\rho \), it remains classical in \(\omega ^i\). This proves the claims in Remark 4.3.
Considering the right hand side of expression (40), we get for any such state \(\omega _{R_{i-1} R}\),
where \(q = {\mathcal {M}}_i(\omega )_{X_i}\) denotes the distribution of \(X_i\) on \({\mathcal {X}}\) obtained from the state \({\mathcal {M}}_i(\omega )\). The third equality comes from the fact that \(X_i\) is determined by \(A_iB_i\). The first inequality follows from the monotonicity of the Rényi entropies in \(\alpha \) [8, 37]. The last equality holds because f is affine and the final inequality because f is a min-tradeoff function. Putting everything together, Eq. (37) becomes
This concludes the proof of the first inequality (31) of Proposition 4.5.
In order to show the second inequality (32), using the same argument as before, we obtain
where the supremum is over all states \(\omega _{R_{i-1}R}\) with R constrained as described by Remark 4.3. For any such state and a max-tradeoff function f, we have
It then suffices to combine these inequalities with inequality (38). \(\quad \square \)
We now prove the claim used in the preceding proof.
Claim 4.6
For \(\alpha \in (1,2]\), \(\rho \) and \(\Omega \) as in the statement of Proposition 4.5 and \({\bar{\rho }}\) as defined in (34) (see also the preceding text for a definition of \({\bar{g}}\)), we have
Proof
We focus on proving inequality (41). The first step is to show that as \(X_1^n\) is a deterministic function of \(A_1^n B_1^n\), we have
In order to do that, observe that for any \(x_1^n \in {\mathcal {X}}^n\), we have
where we introduced the notation \(\tau (x_1^n)_{D_1^n {\bar{D}}_1^n} = \tau (x_1)_{D_1 {\bar{D}}_1} \otimes \cdots \otimes \tau (x_n)_{D_n {\bar{D}}_n}\). This implies that for any \(x_1^n\), we have
By taking the sum over \(x_1^n \in \Omega \) and then normalising by \(\rho [\Omega ]\), we get
Thus, we can apply Lemma B.7 and prove the equality (43).
Let now \(\sigma _{B_1^n E {\bar{D}}_1^n}\) be a state such that
Let furthermore \({\mathcal {S}}= {\mathcal {S}}_{D {\bar{D}}}\) be the TPCP map that applies a random (according to the Haar measure) unitary to D and its conjugate to \({\bar{D}}\) (in such a way that the maximally entangled state on \(D {\bar{D}}\) used to define \(\tau (x)\) is preserved). It is then easy to see that the map \({\mathcal {S}}^{\otimes n}\) applied to the n pairs \(D_i {\bar{D}}_i\) leaves \({\bar{\rho }}_{|\Omega }\) invariant. Hence, by the data processing inequality
where \({\bar{\sigma }}_{B_1^n E {\bar{D}}_1^n} = \sigma _{B_1^n E} \otimes {\bar{\rho }}_{{\bar{D}}_1^n}\). Lemma 3.1 then implies that
where \(\nu \) is a state defined by
We now use properties of \(\rho _{|\Omega }\) and \({\bar{\sigma }}\) to simplify the expression of \(\nu \). Observing that
![](http://media.springernature.com/lw391/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ45_HTML.png)
we can write
![](http://media.springernature.com/lw516/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ216_HTML.png)
In addition, as \({\bar{\rho }}_{|\Omega }\) is of the form
![](http://media.springernature.com/lw472/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ217_HTML.png)
we have
![](http://media.springernature.com/lw438/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ218_HTML.png)
where \(\rho _{A_1^n B_1^n E, x_1^n}^{0}\) is the projector onto the support of \(\rho _{A_1^n B_1^n E, x_1^n}\). Hence,
![](http://media.springernature.com/lw397/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ46_HTML.png)
Getting back to the inequality (44), we have \(H^\uparrow _{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H^\uparrow _{\alpha }(A_1^n | B_1^n E )_{{\bar{\rho }}_{|\Omega }}\) using Eq. (45) to drop \({\bar{D}}_1^n\) and Lemma B.7 to drop \(X_1^n\). Moreover, using (46), we have that \(H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } = H_{\alpha }(D_1^n | X_1^n {\bar{D}}_1^n)_{\nu }\). Finally, we get
It is a direct consequence of the definition of \(\tau (x)\) that
where we have used that f is an affine function. Using Lemma B.3 and (46) we can bound the second term on the right hand side of (47) by
Inserting this in (47) gives
This concludes the proof of inequality (41). For inequality (42), we follow similar steps. The first step is to show that, as \(X_1^n\) is a deterministic function of \(A_1^n B_1^n\), we have
In order to do that, observe that for any \(x_1^n \in {\mathcal {X}}^n\), we have
where we introduced the notation \(\tau (x_1^n)_{D_1^n {\bar{D}}_1^n} = \tau (x_1)_{D_1 {\bar{D}}_1} \otimes \cdots \otimes \tau (x_n)_{D_n {\bar{D}}_n}\). This implies that for any \(x_1^n\), we have
By taking the sum over \(x_1^n \in \Omega \) and then normalising by \(\rho [\Omega ]\), we get
Thus, we can apply Lemma B.7 and prove the equality (48). Theorem 3.2 then implies that
where \(\nu \) is a state defined by
We now use properties of \(\rho _{|\Omega }\) to simplify the expression of \(\nu \). Observing that
![](http://media.springernature.com/lw390/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ50_HTML.png)
we can write
![](http://media.springernature.com/lw529/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ219_HTML.png)
In addition, as \({\bar{\rho }}_{|\Omega }\) is of the form
![](http://media.springernature.com/lw471/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ220_HTML.png)
we have
![](http://media.springernature.com/lw438/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ221_HTML.png)
where \(\rho _{A_1^n B_1^n E, x_1^n}^{0}\) is the projector onto the support of \(\rho _{A_1^n B_1^n E, x_1^n}\). Hence,
![](http://media.springernature.com/lw405/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ51_HTML.png)
Getting back to the inequality (49), we have \(H_{\alpha }(A_1^n X_1^n | B_1^n E {\bar{D}}_1^n)_{{\bar{\rho }}_{|\Omega }} = H_{\alpha }(A_1^n | B_1^n E )_{{\bar{\rho }}_{|\Omega }}\) using Eq. (50) to drop \({\bar{D}}_1^n\) and Lemma B.7 to drop \(X_1^n\). Moreover, using (51), we have that \(H_{\alpha }(D_1^n | A_1^n X_1^n B_1^n E {\bar{D}}_1^n)_{\nu } = H_{\alpha }(D_1^n | X_1^n {\bar{D}}_1^n)_{\nu }\). Finally, we get
It is a direct consequence of the definition of \(\tau (x)\) that
where we have used that f is an affine function. Using Lemma B.3 and (51) we can bound the second term on the right hand side of (52) by
Inserting this in (52) and replacing \(\alpha \) with \(\frac{1}{\alpha }\) gives
This concludes the proof of inequality (42). \(\quad \square \)
Finally, we prove Theorem 4.4 using Proposition 4.5.
Proof of Theorem 4.4
The first step is to use Lemma B.10 to lower-bound the smooth min-entropy by a Rényi entropy:
Then Proposition 4.5 yields
where we have used the fact that we are constrained to choose \(\alpha \leqslant 1 + \frac{2}{V} \leqslant 2\) in the last inequality. We now choose
and note that, as long as
the value \(\alpha \) is strictly smaller than \(1 + \frac{2}{V}\) and therefore within the required bounds. Note also that if (55) does not hold then the term \(c \sqrt{n}\) in the claim (29) is at least \(n V \geqslant 2 n \log (1+ 2 d_A) \geqslant 2 n \log d_A\), whereas the min-entropy is always at least \(- n \log d_A\) and \(n f_{\min }(q)\) is at most \(n \log d_A\), which means that the claim is trivial. Finally, inserting (54) into the above yields
as advertised. Once again, the max-entropy statement (30) holds by switching the direction of the inequalities, flipping the appropriate signs, and replacing every occurrence of \(H^{\uparrow }_{\alpha }\) by \(H_{\frac{1}{\alpha }}\). \(\quad \square \)
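The choice of \(\alpha \) in (54) is an instance of a standard balancing argument: the bound consists of a penalty growing linearly in \(\alpha - 1\) and one growing like \(1/(\alpha -1)\). Schematically (with placeholder constants a and b, not the exact ones of the proof):

```latex
% Schematic trade-off behind the choice of alpha (a, b are placeholders):
H_{\min}^{\varepsilon} \;\gtrsim\; n h \;-\; a\, n\, (\alpha - 1) \;-\; \frac{b}{\alpha - 1} .
% Minimising the total penalty over alpha:
\frac{d}{d\alpha}\Bigl[ a n (\alpha - 1) + \frac{b}{\alpha - 1} \Bigr] = 0
\;\Longrightarrow\; \alpha - 1 = \sqrt{\frac{b}{a n}},
\qquad \text{penalty} \;=\; 2\sqrt{a b n} \;=\; O(\sqrt{n}) .
```

This is why the correction term in (29) scales as \(c \sqrt{n}\), with \(c\) collecting the dependence on \(V\), \(\varepsilon \) and \(\rho [\Omega ]\).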
It might seem restrictive to assume that the tradeoff function is affine. We next show that we may take a general convex function provided the event \(\Omega \) can be described as follows: \(x^n \in \Omega \) if and only if \(\mathsf {freq}(x^n) \in {\hat{\Omega }}\) where \({\hat{\Omega }}\) is a convex subset of \({\mathbb {P}}\).
Corollary 4.7
Let \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold, let \(h \in {\mathbb {R}}, \varepsilon \in (0,1)\), let \({\hat{\Omega }}\) be a convex set \({\hat{\Omega }} \subseteq {\mathbb {P}}\) and define the corresponding event \(\Omega \subseteq {\mathcal {X}}^n\) by \(x_1^n \in \Omega \Leftrightarrow \mathsf {freq}(x_1^n) \in {\hat{\Omega }}\). Then, if f is a differentiable and convex min-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) satisfying \(f(q) \geqslant h\) for all \(q \in {\hat{\Omega }}\), we have
where \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [\Omega ])}\). Similarly, if f is a differentiable and concave max-tradeoff function for \({\mathcal {M}}_1,\dots ,{\mathcal {M}}_n\) satisfying \(f(q) \leqslant h\) for all \(q \in {\hat{\Omega }}\), we have
Proof
Let us denote by \(\mathrm {cl}({\hat{\Omega }})\) the closure of the set \({\hat{\Omega }}\). Now as f is continuous on the compact set \(\mathrm {cl}({\hat{\Omega }})\) (it is even assumed to be differentiable on all of \({\mathbb {P}}\)), we have \(\min _{q \in \mathrm {cl}({\hat{\Omega }})} f(q) = f(q_0)\) for some \(q_0 \in \mathrm {cl}({\hat{\Omega }})\). By continuity of f and by definition of h, we have \(f(q_0) \ge h\). Now consider the affine function \(g(q) = (\nabla f)_{q_0} \cdot (q - q_0) + f(q_0)\). By convexity of f, we have that \(g(q) \le f(q)\) for all \(q \in {\mathbb {P}}\) and thus g is a min-tradeoff function. In addition, as \(\mathrm {cl}({\hat{\Omega }})\) is convex we can apply the first order optimality conditions and get that \((\nabla f)_{q_0} \cdot (q - q_0) \ge 0\) for all \(q \in \mathrm {cl}({\hat{\Omega }})\). As a result, for all \(q \in \mathrm {cl}({\hat{\Omega }})\), we have \(g(q) \geqslant f(q_0) \ge h\). This implies that if \(x_1^n \in \Omega \), then \(g(\mathsf {freq}(x^n)) \ge h\). We can then apply Theorem 4.4 with the affine tradeoff function g and get the desired result as \(\Vert \nabla g \Vert _{\infty } \le \Vert \nabla f \Vert _{\infty }\).
The proof for \(H^{\varepsilon }_{\max }\) is analogous. \(\quad \square \)
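The tangent-hyperplane step in this proof is easy to illustrate numerically. Below is a minimal sketch using a toy convex function on the one-dimensional simplex (it is not a genuine min-tradeoff function, and all names are ours):

```python
def tangent(f, grad_f, q0):
    """Affine tangent g(q) = (grad f)(q0) . (q - q0) + f(q0); convexity of f
    guarantees g <= f everywhere, with equality at q0."""
    g0, f0 = grad_f(q0), f(q0)
    return lambda q: sum(gi * (qi - q0i) for gi, qi, q0i in zip(g0, q, q0)) + f0

# toy convex function on distributions (p, 1-p): f(q) = q . q  (illustrative only)
f = lambda q: sum(x * x for x in q)
grad_f = lambda q: [2.0 * x for x in q]
q0 = [0.3, 0.7]
g = tangent(f, grad_f, q0)

assert abs(g(q0) - f(q0)) < 1e-12          # tangency at q0
for k in range(11):                         # g never exceeds f on the simplex
    q = [k / 10.0, 1.0 - k / 10.0]
    assert g(q) <= f(q) + 1e-12
```

Corollary 4.7 uses exactly such a g: it inherits the min-tradeoff property from f while being affine, and \(\Vert \nabla g \Vert _\infty \leqslant \Vert \nabla f \Vert _\infty \) keeps the second-order constant under control.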
One natural choice for the event \(\Omega \) is that the empirical distribution \(\mathsf {freq}(X_1^n)\) takes a particular value q. This yields the following special case of Corollary 4.7.
Corollary 4.8
Let \({\mathcal {M}}_1,\dots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n X_1^n E}\) be such that (26) and the Markov conditions (27) hold. Then, for any differentiable and convex min-tradeoff function f for \({\mathcal {M}}_1, \ldots , {\mathcal {M}}_n\) and for any \(q \in {\mathbb {P}}\), we have
where \(c = 2 \bigl (\log (1+2 d_A) + \left\lceil \Vert \nabla f(q) \Vert _\infty \right\rceil \bigr ) \sqrt{1- 2 \log (\varepsilon \rho [q])}\), where \(\rho _{|q}\) denotes the state \(\rho \) conditioned on the event that \(\mathsf {freq}(X_1^n) = q\), and \(\rho [q]\) the probability of this event.
Note that an analogous statement holds of course for the max-entropy, replacing f by a concave max-tradeoff function and changing the inequality accordingly.
The following corollary specialises the above to the formulation (3), in which no statistical test is being done, i.e. the \(X_i\) systems are trivial. We provide the statement for the case of the lower boundary.
Corollary 4.9
Let \({\mathcal {M}}_1,\dots , {\mathcal {M}}_n\) and \(\rho _{A_1^n B_1^n E}\) be such that (26) and the Markov conditions (27) hold. Then
where \(c = 3 \log (1+2 d_A) \sqrt{1- 2 \log \varepsilon }\).
Proof
Note that the quantity \(H_{\min }^\varepsilon (A_1^n | B_1^n E)_{\rho }\) only depends on the marginal of the state \(\rho \) on \(A_1^n B_1^n E\). Thus, we can modify the maps \({\mathcal {M}}_i\) in any way that does not affect the reduced state \(\rho _{A_1^n B_1^n E}\) before applying Corollary 4.8. In particular, we change \({\mathcal {M}}_i\) so that the original value of \(X_i\) is disregarded and replaced with the constant value \(X_i = i\). The values \(X_1, \ldots , X_n\) can then be regarded as random variables with alphabet \({\mathcal {X}}= \{1, \ldots , n\}\). We define the real function f on \({\mathbb {P}}\) as
Note that for any \(i \in \{1,\dots , n\}\) and any \(q \in {\mathbb {P}}\), we have either \(q(i) \ne 1\) in which case \(\Sigma _i(q) = \emptyset \) (we use the notation in (28)) and the min-tradeoff condition is trivial or \(q(i) = 1\), in which case \(\Sigma _i(q) = \{({\mathcal {M}}_i \otimes {\mathcal {I}}_R)(\omega _{R_{i-1} R}) : \omega _{R_{i-1} R} \in \mathrm {D}(R_{i-1} \otimes R) \}\). Thus for any \(q \in {\mathbb {P}}\),
As a result, f is a min-tradeoff function for all \({\mathcal {M}}_i\) for \(i \in \{1, \dots , n\}\). We now fix \(q \in {\mathbb {P}}\) such that \(q(1) = \cdots = q(n) = \frac{1}{n}\), in which case the event \(\mathsf {freq}(X_1^n) = q\) occurs with certainty. Because
which implies that \(\left\lceil \Vert \nabla f(q) \Vert _\infty \right\rceil \le \log (1+2d_A)\), the claim follows immediately from Corollary 4.8. \(\quad \square \)
As indicated in the introduction, in the special case where the individual pairs \((A_i, B_i)\) are independent and identically distributed (IID), the entropy accumulation theorem corresponds to the Quantum Asymptotic Equipartition Property [54]. We can therefore formulate the latter as a corollary of Theorem 4.4.
Corollary 4.10
For any bipartite state \(\nu _{A B}\), any \(n \in {\mathbb {N}}\), and any \(\varepsilon \in (0,1)\),
Proof
Let, for any \(i=1, \ldots , n\), \({\mathcal {M}}_i\) be the TPCP map from R to XABR which sets AB to state \(\nu _{A B}\) and where X and R are trivial (one-dimensional) systems. The concatenation of these maps thus generates the state \(\rho _{A_1^n B_1^n} = \nu _{A B}^{\otimes n}\). The claim is then obtained from Theorem 4.4 with the tradeoff function f chosen constant and equal to \(h = H(A|B)_{\nu }\) and with \(\Omega \) as the certain event. \(\quad \square \)
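For classical (diagonal) states \(\nu _{AB}\), the von Neumann entropy \(H(A|B)_{\nu }\) in the corollary reduces to the Shannon conditional entropy, which is simple to compute. The sketch below (classical case only; function names are ours) computes the rate that the AEP interval concentrates around:

```python
import math

def shannon(p):
    """Shannon entropy (in bits) of a probability list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def cond_entropy(p_ab):
    """H(A|B) = H(AB) - H(B) for a classical joint distribution p_ab[a][b]."""
    joint = [x for row in p_ab for x in row]
    p_b = [sum(row[b] for row in p_ab) for b in range(len(p_ab[0]))]
    return shannon(joint) - shannon(p_b)

# A a uniform bit, B a copy of A flipped with probability e: H(A|B) = h(e),
# so the AEP says H_min^eps(A^n|B^n)/n converges to h(e) as n grows
e = 0.1
p_ab = [[0.5 * (1 - e), 0.5 * e], [0.5 * e, 0.5 * (1 - e)]]
rate = cond_entropy(p_ab)
```

Here the IID structure is what Corollary 4.10 encodes by taking every \({\mathcal {M}}_i\) to prepare the same state \(\nu _{AB}\).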
5 Applications
Entropy is a rather general notion and, accordingly, entropy accumulation has applications in various areas of physics, information theory, and computer science. An example from physics is the phenomenon of thermalisation. It is known that a system can only thermalise if its smooth min-entropy is sufficiently large [18]. To illustrate how Theorem 4.4 could give an estimate of this entropy, consider a system of interest (e.g., a cup of coffee) which is in contact with a large environment (the air around it). Suppose that, for an appropriately chosen discretisation of the evolution, the system interacts at each time step with a different part of the environment (e.g., with different air molecules bouncing off the coffee cup). Theorem 4.4 then provides a bound on the total entropy that is transferred to the environment in terms of the von Neumann entropy transferred in each time step. Because the joint time evolution of system and environment is unitary, this entropy flow to the environment could be expressed in terms of the entropy change of the system itself. The argument would therefore prove that the total entropy acquired by the system over many time steps is bounded by the sum of the entropies produced in each individual time step.
Another area where the notion of entropy plays a crucial role is quantum cryptography. Many proofs of security of cryptographic protocols involve lower-bounding the uncertainty that a dishonest adversary has about some system of interest. The state of the art is to derive such bounds using a combination of de Finetti-type theorems as well as the Quantum Asymptotic Equipartition Property [4, 15, 41, 42]. However, the use of de Finetti theorems comes with various disadvantages. Firstly, they are only applicable under certain assumptions on the symmetry of the protocols. Secondly, they introduce additional error terms that can be large in the practically relevant finite-size regime [47]. Finally, it is not known how to apply de Finetti theorems in a device-independent scenario (see [21] for an overview and references on device-independent cryptography). These problems can all be circumvented by the use of entropy accumulation, as demonstrated in [5] for the case of device-independent quantum key distribution and randomness expansion. The resulting security statements are valid against general attacks and essentially optimal in the finite-size regime.
In the remainder of this section, we illustrate the use of entropy accumulation with two concrete examples. The first is a security proof for a basic quantum key distribution protocol. The second is a novel derivation of an upper bound on the fidelity of fully quantum random access codes.
5.1 Sample application: security of quantum key distribution
A Quantum Key Distribution (QKD) protocol enables two parties, Alice and Bob, to establish a common secret key, i.e., a string of random bits unknown to a potential eavesdropper, Eve. The setting is such that Alice and Bob can communicate over a quantum channel, which may however be fully controlled by Eve. In addition, Alice and Bob have a classical communication link which is assumed to be authenticated, i.e., Eve may read but cannot alter the classical messages exchanged between Alice and Bob. The protocol is said to be secure against general attacks if any attack by Eve is either detected (in which case the protocol aborts) or does not compromise the secrecy of the final key. Here, we will show that our main theorem can be directly applied to establish security against general attacks for a fairly standard QKD protocol. As a bonus, our proof still holds even if we do not make any assumptions about Bob’s measurement device: the POVM applied by Bob at every step of the protocol can be arbitrary, and may vary from one step to the next (thereby achieving one-sided measurement device independence as in [58], but without the restriction to memoryless devices; see also [56]). In fact, as shown in [5], the entropy accumulation theorem can be used to prove the security of fully device-independent quantum key distribution.
For concreteness, we consider here a variant of the E91 QKD protocol [22] (and note that any security proof for this protocol also implies security of the BB84 protocol [10, 11]). The protocol consists of a sequence of instructions for Alice and Bob, as described in the box below. These depend on certain parameters, including the number, n, of qubits that need to be transmitted over the quantum channel, the maximum tolerated noise level, e, of this channel, as well as the key rate, r, which is defined as the number of final key bits divided by n. In the first protocol step, Alice and Bob need to measure their qubits at random in one of two mutually unbiased bases, which we term the computational and the diagonal basis. These are chosen with probability \(1-\mu \) and \(\mu \), respectively, for some \(\mu \in (0,1)\). The protocol also invokes an error correction scheme termed \(\mathrm {EC}\), which allows Bob to infer the measurement outcomes obtained by Alice for the set of indices S where the basis choices of Alice and Bob were the same. Note that if the protocol was implemented without any noise, then Bob’s outcomes would match exactly with Alice’s outcomes on the indices S and no error correction would be required. However, in the presence of noise, such an error correction step is needed. For this, Alice needs to send classical error correcting information to Bob, whose maximum relative length is characterised by another parameter, \(\vartheta _{\mathrm {EC}}\). We assume that \(\mathrm {EC}\) is reliable. This means that, except with negligible probability, Bob either obtains a correct copy of Alice’s string or he is notified that the string cannot be inferred.
![figure a](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Figa_HTML.png)
The security of QKD against general attacks has been established in a sequence of works [13, 32, 33, 41, 49]. Specifically, for the E91 protocol, the following result has been shown.
Theorem 5.1
The E91 protocol is secure for any choice of protocol parameters satisfying
provided that n is sufficiently large.
Note that, because \(\mu > 0\) can be chosen arbitrarily small, the theorem implies that the E91 protocol can generate secret keys at an asymptotic rate of \(1-H_{\mathrm {Sh}}(e) - \vartheta _{\mathrm {EC}}\). We now show how this result can be obtained using the notion of entropy accumulation.
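This asymptotic rate is straightforward to evaluate numerically. The sketch below (function names ours) uses the standard binary Shannon entropy; the comment about the threshold assumes error correction operating at the Shannon limit \(\vartheta _{\mathrm {EC}} = H_{\mathrm {Sh}}(e)\), which is an idealisation:

```python
import math

def h_bin(p):
    """Binary Shannon entropy H_Sh(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def asymptotic_rate(e, theta_ec):
    """Asymptotic key rate 1 - H_Sh(e) - theta_EC (the mu -> 0 limit)."""
    return 1.0 - h_bin(e) - theta_ec

# with theta_EC = H_Sh(e) (Shannon-limit error correction, an idealisation),
# the rate becomes 1 - 2*H_Sh(e), which stays positive up to e of roughly 11%
```

For example, `asymptotic_rate(0.05, h_bin(0.05))` is comfortably positive, while the rate vanishes as the noise level approaches the well-known threshold near 11%.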
Proof
According to a standard result on two-universal hashing (see, for instance, Corollary 5.6.1 of [41]), the key \(F(A_S)\) computed in the privacy amplification step is secret to an adversary holding information \(E'\) if the smooth min-entropy of \(A_S\) conditioned on \(E'\) is sufficiently larger than the output size of the hash function F. Since, in our case, this size is \(\lfloor r n \rfloor \), the condition reads
where the entropy is evaluated for the joint state \(\rho _{|\Omega }\) of \(A_S\) and \(E'\) conditioned on the event \(\Omega \) that the protocol is not aborted and that Bob’s guess \({\hat{A}}_S\) of \(A_S\) is correct. The smoothing parameter \(\varepsilon \in (0,1)\) specifies the desired level of secrecy, and we assume here that it is constant (independent of n). Because conditioning the smooth min-entropy of a classical variable on an additional bit cannot decrease its value by more than 1 (see, e.g., Proposition 5.10 of [51]), we may bound the smooth min-entropy in (59) by
where E denotes all information held by Eve after the distribution step, and where \(|S| \vartheta _{\mathrm {EC}}\) is the maximum number of bits exchanged for error correction. Note that we also included the basis information \(B_1^n\) and \({\bar{B}}_1^n\) in the conditioning part because Eve may obtain this information during the sifting and information reconciliation step. We are thus left with the task of lower bounding \(H_{\min }^\varepsilon (A_S|B_1^n {\bar{B}}_1^n E)_{\rho _{|\Omega }}\), which is usually the central part of any security proof. Since it is also the part where entropy accumulation is used, we formulate it separately as Claim 5.2 below. Inserting this claim into (60), we conclude that the secrecy condition (59) is fulfilled whenever
holds. But this is clearly the case for any choice of parameters satisfying (58), provided that n is sufficiently large. \(\quad \square \)
It remains to show the separate claim, which we do using entropy accumulation.
Claim 5.2
Let \(A_1^n\), \(B_1^n\), \({\bar{B}}_1^n\), and S be the information held by Alice and Bob as defined by the protocol, let E be the information gathered by Eve during the distribution step, and let \(\Omega \) be the event that the protocol is not aborted and that Bob’s guess \({\hat{A}}_S\) of \(A_S\) is correct. Then, provided that \(\Omega \) has a non-negligible probability (i.e., it does not decrease exponentially fast in n),
Proof
Let \(\rho ^0_{Q_1^n {\bar{Q}}_1^n E}\) be the joint state of Alice and Bob’s qubit pairs before measurement, together with the information E gathered by Eve during the distribution step, and let
where \({\mathcal {M}}_i\), for any \(i \in \{1, \ldots , n\}\), is the TPCP map from \(Q_i^n {\bar{Q}}_i^n\) to \(Q_{i+1}^n {\bar{Q}}_{i+1}^n A_i {\bar{A}}_i B_i {\bar{B}}_i X_i\) defined as follows:
(i) \(B_i, {\bar{B}}_i\): random bits chosen independently according to the distribution \((1-\mu , \mu )\);

(ii) \(A_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 0: &{} \text {outcome of measurement of }Q_i\text { in computational basis} \\ \text {if } B_i = {\bar{B}}_i = 1: &{} \text {outcome of measurement of } Q_i\text { in diagonal basis} \\ \text {if } B_i \ne {\bar{B}}_i: &{} \perp \end{array}\right. }\)

(iii) \({\bar{A}}_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 1: &{} \text {outcome of measurement of }{\bar{Q}}_i\text { in diagonal basis} \\ \text {otherwise}: &{} \perp \end{array}\right. }\)

(iv) \(X_i = {\left\{ \begin{array}{ll} \text {if } B_i = {\bar{B}}_i = 1: &{} A_i \oplus {\bar{A}}_i \\ \text {otherwise}: &{} \perp \end{array}\right. }\)

(v) \(Q_{i+1}^n\) and \({\bar{Q}}_{i+1}^n\) are left untouched.
Note that the values \(B_1^n\) and \({\bar{B}}_1^n\) correspond to the ones generated during the distribution step of the protocol. The same is true for \(A_1^n\), with the modification that \(A_i\) holds the measurement outcome only if \(B_i = {\bar{B}}_i \). That is, \(A_i \ne \perp \) if and only if \(i \in S\), where S is the set determined in the sifting step. We can therefore rewrite (61) as
To prove this inequality, we use Theorem 4.4 with the replacements \(A_i \rightarrow A_i {\bar{A}}_i\), \(B_i \rightarrow B_i {\bar{B}}_i\), \(X_i \rightarrow X_i\), and \(R_i \rightarrow Q_{i+1}^n {\bar{Q}}_{i+1}^n\). We note that \(X_i\) is a deterministic function of the classical registers \(A_i {\bar{A}}_i\) and \(B_i {\bar{B}}_i\). To obtain the bound in (62), we need to define a min-tradeoff function. Let \(i \in \{1, \ldots , n\}\) and consider the state
where \(\omega _{Q_{i}^n {\bar{Q}}_{i}^n R}\) is an arbitrary state. Let furthermore \(\nu _{|b} = \nu _{X_i A_i {\bar{A}}_i R | b}\) be the corresponding state obtained by conditioning on the event that \(B_i = {\bar{B}}_i = b\), for \(b \in \{0, 1\}\). We may now bound the entropy of \(A_i\) using the entropic uncertainty relation proved in [12], which asserts that
By the definition of \(X_i\), we also have
where we wrote \(\nu _{X_i}\) to denote the probability distribution on \(\{0, 1, \bot \}\) defined by the state \(\nu \), and where we have used that \(\nu _{X_i}(0) + \nu _{X_i}(1) = \mu ^2\). Furthermore, because \(A_i\) is classical, its von Neumann entropy cannot be negative, which implies that
Combining this with the above, we find that
holds for
In other words, \({\tilde{f}}\) is a min-tradeoff function for \({\mathcal {M}}_i\). Furthermore, because the binary Shannon entropy \(H_{\mathrm {Sh}}\) is concave, \({\tilde{f}}\) is convex. We may thus define a linearised min-tradeoff function f as a tangent hyperplane to \({\tilde{f}}\) at the point \(q_0\) given by \(q_0(0) = (1-e) \mu ^2\), \(q_0(1) = e \mu ^2\), and \(q_0(\bot ) = 1-\mu ^2\). Furthermore, we define
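The role of the linearisation can be illustrated numerically. The snippet below uses the curve \(1 - H_{\mathrm {Sh}}(p)\), with \(p = q(1)/\mu ^2\), purely as a hypothetical stand-in for \({\tilde{f}}\) (the exact expression appears in the omitted display); only its convexity matters here. The tangent at any point \(p_0\) is an affine function lying below the curve everywhere, which is what makes the linearised f a valid min-tradeoff function.

```python
from math import log2

def h_bin(p):
    """Binary Shannon entropy H_Sh(p) = -p log p - (1-p) log(1-p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def f_tilde(p):
    """Hypothetical convex stand-in for the tradeoff curve: 1 - H_Sh(p)."""
    return 1.0 - h_bin(p)

def tangent_at(p0, eps=1e-6):
    """Affine function tangent to f_tilde at p0 (slope taken numerically)."""
    slope = (f_tilde(p0 + eps) - f_tilde(p0 - eps)) / (2.0 * eps)
    return lambda p: f_tilde(p0) + slope * (p - p0)
```

Because \({\tilde{f}}\) is convex, it lies above each of its tangents, so passing from \({\tilde{f}}\) to the tangent at \(q_0\) only weakens the entropy bound while making the tradeoff function affine, as required for Theorem 4.4.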
Finally, note that the event \(\Omega \) that Bob’s guess of \(A_S\) is correct and that the protocol is not aborted implies that \(q = \mathsf {freq}(X_1^n)\) is such that \(\frac{q(1)}{\mu ^2} \leqslant e\) and, hence, \(f(\mathsf {freq}(X_1^n)) \geqslant h\). Since we assumed that \(\Omega \) has non-negligible probability, Theorem 4.4 implies that
(Note that the Markov chain conditions are satisfied because \(B_i\) and \({\bar{B}}_i\) are chosen at random independently of any other information.) Furthermore, because \({\bar{A}}_i\) equals \(\bot \) unless \(B_i = {\bar{B}}_i = 1\), which occurs with probability \(\mu ^2\), we have
Combining these inequalities with the chain rule for smooth entropies (see Theorem 15 of [60]),
proves (62) and, hence, Claim 5.2. \(\quad \square \)
5.2 Sample application: fully quantum random access codes
One relatively simple application of our main result is to give upper bounds on the fidelity achieved by so-called Fully Quantum Random Access Codes (FQRACs). An FQRAC is a method for encoding m message qubits into \(n < m\) code qubits, such that any subset of k message qubits can be retrieved with high fidelity. Limits on the performance of random access codes with classical messages are rather well understood: the case \(k=1\) was studied in [1, 2, 38], and upper bounds on the success probability that decay exponentially in k were derived in [9, 20, 65]. In the fully quantum case, [20] gives similar upper bounds on the fidelity that decay exponentially in k. Here, we show that such exponential bounds for the fully quantum case can be obtained in a relatively elementary fashion via the concept of entropy accumulation. The example also highlights that entropy accumulation is already useful in its basic form (3), which does not involve the statistical information \(X_i\). Indeed, here the bound on the entropy produced at every step comes from the bound on the number of code qubits.
Definition 5.3
A \((\varepsilon ,m,n,k)\)-Fully Quantum Random Access Code (FQRAC) consists of an encoder \({\mathcal {E}}_{{M'}_1^m \rightarrow C_1^n}\) and a decoder \({\mathcal {D}}_{C_1^n S \rightarrow {\bar{M}}_S S}\), where \({M'}_1^m\) represents the m message qubits, \(C_1^n\) represents the n code qubits, S represents a classical description of a subset of \(\{1,\dots ,m\}\) of size k, and \({\bar{M}}_S\) represents the output of the decoder, corresponding to the k positions of \({M'}_1^m\) listed in S. Such a code must satisfy the following: for any state \(\rho _{R {M'}_1^m S}\) which is classical on S, we must have that
where R is a reference system of arbitrary dimension, and where \({\mathcal {S}}_{ {M'}_1^m S \rightarrow {\bar{M}}_S S}\) is a TPCP map that selects the k positions of \({M'}_1^m\) corresponding to those in S and outputs them into \({\bar{M}}_S\). Moreover, \(F(\rho ,\sigma ) := \Vert \sqrt{\rho } \sqrt{\sigma }\Vert _1\) refers to the fidelity between two states \(\rho \) and \(\sigma \).
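The fidelity in Definition 5.3 can be evaluated directly from the formula \(F(\rho ,\sigma ) = \Vert \sqrt{\rho } \sqrt{\sigma }\Vert _1\); a small NumPy sketch (the trace norm is computed as the sum of singular values):

```python
import numpy as np

def sqrtm_psd(m):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)        # clip tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1 (sum of singular values)."""
    s = np.linalg.svd(sqrtm_psd(rho) @ sqrtm_psd(sigma), compute_uv=False)
    return float(np.sum(s))
```

For pure states this reduces to the overlap \(|\langle \psi | \phi \rangle |\); for instance, a maximally mixed qubit and a basis state have fidelity \(1/\sqrt{2}\).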
Entropy accumulation gives the following constraint on FQRACs:
Theorem 5.4
A \((\varepsilon ,m,n,k)\)-FQRAC satisfies
Compared to the previously derived bound (Theorem 9 of [20]), the one obtained here is tighter for small k,Footnote 19 whereas it is weaker for large k.
Proof
Since the fidelity bound must be true for any state \(\rho \), it must in particular be true for the state consisting of m maximally entangled pairs and a uniform distribution over subsets S. For every \(i \in \{1,\dots ,k\}\), define
as a TPCP map that does the following:
1. Generate an index \({\bar{J}}_i\) at random from \(\{ 1,\dots ,m - i +1 \}\).
2. Move the contents of \(M_{{\bar{J}}_i}\) into \({\hat{M}}_i\), and set \(M_1^{m-i}\) to the contents of \(M_1^{m-i+1}\) with the \({\bar{J}}_i\)th position removed.
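The classical part of this map, drawing \({\bar{J}}_i\) and reshuffling the register list, is simple bookkeeping; a Python sketch (registers are placeholder values standing in for qubit systems):

```python
def remove_position(M, j):
    """Step 2: extract position j (1-indexed) of the register list M.
    Returns (contents of M_j, the list with the j-th position removed)."""
    return M[j - 1], M[:j - 1] + M[j:]

def select_step(M, rng):
    """One application of the map: draw Jbar uniformly from {1, ..., len(M)}
    (step 1), then move that position out of the register list (step 2)."""
    j = rng.randrange(1, len(M) + 1)
    mhat, rest = remove_position(M, j)
    return j, mhat, rest
```

Applying `select_step` k times to a list of length m leaves \(m - k\) registers, matching the shrinking domains \(M_1^{m-i}\) above.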
Finally, define the state
![](http://media.springernature.com/lw406/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ222_HTML.png)
where . The next step is to use Theorem 4.4 on the state \(\rho ^k\) with the identifications
and the tradeoff function f being the constant function equal to
where the infimum is taken over states \(\nu ^i\) of the form
for some state \(\omega ^i\). Here we also used Remark 4.3, which asserts that the system R that is used when defining the min-tradeoff function can be chosen isomorphic to \(A_1^{i-1} B_1^{i-1} E\). Note that the Markov chain condition is immediate from the fact that \({\bar{J}}_i\) is chosen at random. As the systems \(X_i\) are trivial, we naturally take \(\Omega \) to be the certain event. We find that
Furthermore, again by Remark 4.3, if part of B is classical in \(\rho \), then it remains classical in \(\nu \). As a result, we can assume in the following that \({\bar{J}}_{1}^{i-1}\) is a classical system in \(\nu ^i\).
We continue by computing the expectation over the choice of \({\bar{J}}_i\):
where the last inequality holds because \({\bar{J}}_i\) is classical, which implies that the first entropy in the bracket of the penultimate expression is non-negative, and because the second entropy in the bracket is upper bounded by \(n+k-1\).
We now use Proposition 5.5 and Remark 5.6 of [51], which imply thatFootnote 20
where the second inequality holds because the denominator in the logarithm is lower bounded by \(f^3/3\), as can be readily verified. Combining this with the above gives
Conversely, note that, by assumption, the purified distance between \(\rho ^k\) and the state consisting of k maximally entangled qubit pairs is upper bounded by \(\sqrt{1-(1-\varepsilon )} = \sqrt{1-f^2}\). Since the max-entropy of k maximally entangled qubit pairs equals \(-k\), we have
We have thus derived the condition
It is easy to verify that this condition is violated whenever
is violated. In fact, if \(\log \frac{8}{f^2} \leqslant k \left( \frac{m-n-k+1}{ 5 m} \right) ^2\), then we have
Adding the square root of the first inequality and the second one, we get that inequality (69) is violated. Thus, the condition (70) must hold, and therefore also (63). \(\quad \square \)
6 Conclusions
Informally speaking, entropy accumulation is the claim that the operationally relevant entropy (the smooth min- or max-entropy) of a multipartite system is well approximated by the sum of the von Neumann entropies of its individual parts. This has ramifications in various areas of science, ranging from quantum cryptography to thermodynamics.
As described in Sect. 5, current cryptographic security proofs have various fundamental and practical limitations [46]. That these can be circumvented using entropy accumulation has already been demonstrated in [5] for the case of device-independent cryptography. We anticipate that the approach can be applied similarly to other cryptographic protocols. Examples include quantum key distribution protocols such as DPS and COW [27, 50], for which full security has not yet been established.Footnote 21 One may also expect to obtain significantly improved security bounds for protocols that involve high-dimensional information carriers and, in particular, continuous-variable protocols [25, 62].Footnote 22 A strengthening of current security claims may likewise be obtained for other cryptographic constructions, such as bit commitment and oblivious transfer protocols (see, for example, [16, 20, 29]).
Entropy accumulation can also be used in statistical mechanics, e.g., to characterise thermalisation processes. At the beginning of Sect. 5 we outlined an argument that could confirm, and make precise, the intuition that entropy production (in terms of von Neumann entropy) is relevant for thermalisation. However, to base such arguments on physically realistic assumptions, it may be necessary to generalise Theorem 4.4 to the case where the Markov conditions (27) do not hold exactly. One possibility, motivated by the main result of [23], could be to replace them by the less stringent conditions
Another promising direction would be to apply entropy accumulation to estimate the entropy of low-energy states of many-body systems. One may expect that, under appropriate physical assumptions, these states possess a structure that permits a decomposition of the form described by Fig. 1 such that the Markov conditions required for Theorem 4.4, or at least some relaxations of them such as (71), hold. This may for example be the case for systems whose states are well approximated by matrix products states (see, e.g., [59]). We leave the investigation of such applications, as well as the development of corresponding extensions of the entropy accumulation theorem, for future work.
Notes
There is some freedom in how to count the number of bits, but the statement always holds up to additive terms of the order \(\log (1/\varepsilon )\).
The IID assumption corresponds to the special case where the systems \(R_{i}\) are trivial (ensuring the mutual independence of the pairs \(A_i B_i\)) and where the maps \({\mathcal {M}}_i\) are all the same (ensuring that the state of each pair \(A_i B_i\) is identical to all others).
The necessity of this condition is discussed in “Appendix C”.
In the special case where the source is IID, the memory register is trivial and no minimisation (or maximisation) is necessary. Expression (3) then reduces to (2), and one retrieves the (well known) result that the number of bits that Alice needs to communicate to Bob per value \(A_i\) is, up to second order terms, given by \(H(A_i|B_i)\).
While the tradeoff functions considered in this work are defined in terms of conditional von Neumann entropies, the rate curves of [35] are equal to a difference of \((1+\varepsilon )\)-Rényi entropies (see [35, Section 6]). The latter cannot be larger than the tradeoff functions, which yield asymptotically optimal randomness extraction rates (as shown in [5]).
Note that this operator is well defined, for the support of \(\rho _{A B}\) is contained in the support of \(\mathrm {id}_A \otimes \rho _B\).
\(\oplus \) denotes the orthogonal direct sum.
The purified distance is defined as \(P(\rho , {\tilde{\rho }}) = \sqrt{1 - \big (\mathrm {tr}\big | \sqrt{\rho } \sqrt{{\tilde{\rho }}} \big | \big )^2 }\) whenever either \(\rho \) or \({\tilde{\rho }}\) is normalized.
We note that there are at least two common variants for how to define a conditional entropy based on a relative entropy. We refer to “Appendix B” for more details.
If the set \(\Sigma _i(q)\) is empty then the infimum and supremum are by definition equal to \(\infty \) and \(-\infty \), respectively, so that the conditions are trivial.
We say that the event \(\Omega \) implies \(f(\mathsf {freq}(X_1^n)) \geqslant h\) if for every \(x_1^n \in \Omega , f(\mathsf {freq}(x_1^n)) \geqslant h\).
In the version of [54], the term \(1 + 2 d_A\) in the logarithm is replaced by an expression that depends on the entropy of A conditioned on B.
Any error correction scheme can be turned into a reliable one by appending a test where Alice and Bob compare a hash value computed from their (corrected) strings.
\(H_{\mathrm {Sh}}(e) = -e \log e - (1-e) \log (1-e)\) is the binary Shannon entropy.
Roughly, \(\varepsilon \) corresponds to the maximum probability by which one could encounter a deviation from perfect secrecy [40].
The bound of Theorem 9 of [20] has a pre-factor of the order of m and is therefore only non-trivial if \(k > \log m\).
Because \(\arcsin (f/2) + \arcsin (\sqrt{1-f^2}) < \arcsin (f) + \arcsin (\sqrt{1-f^2}) = \pi /2\), the condition of Remark 5.6 of [51] is satisfied.
These protocols do not have the required symmetries to employ standard techniques such as de Finetti-type theorems [42].
The security of continuous-variable protocols against general attacks has been proved [43], but the bounds have an unfavourable scaling in the finite-size regime.
The positive part of a Hermitian operator X is defined as \(\{X > 0\} X\), where \(\{X > 0\}\) is the projector onto the span of the eigenspaces of X with positive eigenvalues.
References
Ambainis, A., Nayak, A., Ta-Shma, A., Vazirani, U.: Dense quantum coding and a lower bound for 1-way quantum automata. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing, STOC ’99, pp. 376–383, New York, NY, USA. ACM (1999)
Ambainis, A., Nayak, A., Ta-Shma, A., Vazirani, U.: Dense quantum coding and quantum finite automata. J. ACM 49(4), 496–511 (2002). arXiv:quant-ph/9804043
Araki, H.: On an inequality of Lieb and Thirring. Lett. Math. Phys. 19(2), 167–170 (1990)
Arnon-Friedman, R., Renner, R.: de Finetti reductions for correlations. J. Math. Phys. 56(5) (2015). arXiv:1308.0312
Arnon-Friedman, R., Renner, R., Vidick, T.: Simple and tight device-independent security proofs. SIAM J. Comput. 48(1), 181–225 (2019)
Asorey, M., Kossakowski, A., Marmo, G., Sudarshan, E.G.: Relations between quantum maps and quantum states. Open Syst. Inf. Dyn. 12(04), 319–329 (2005). arXiv:quant-ph/0602228
Attal, S., Pautrat, Y.: From repeated to continuous quantum interactions. Ann. Henri Poincaré 7(1), 59–104 (2006)
Beigi, S.: Sandwiched Rényi divergence satisfies data processing inequality. J. Math. Phys. 54(12), 122202 (2013). arXiv:1306.5920
Ben-Aroya, A., Regev, O., de Wolf, R.: A hypercontractive inequality for matrix-valued functions with applications to quantum computing and LDCs. In: Proceedings of the FOCS (2008). arXiv:0705.3806
Bennett, C.H., Brassard, G.: Quantum cryptography: public key distribution and coin tossing. In: Proceedings of the International Conference on Computers, Systems and Signal Processing (1984)
Bennett, C.H., Brassard, G., Mermin, N.D.: Quantum cryptography without Bell’s theorem. Phys. Rev. Lett. 68(5), 557 (1992)
Berta, M., Christandl, M., Colbeck, R., Renes, J.M., Renner, R.: The uncertainty principle in the presence of quantum memory. Nat. Phys. 6, 659 (2010). arXiv:0909.0950
Biham, E., Boyer, M., Boykin, P.O., Mor, T., Roychowdhury, V.: A proof of the security of quantum key distribution (extended abstract). In: Proceedings of the ACM STOC, pp. 715–724, New York, NY, USA. ACM (2000)
Bruneau, L., Joye, A., Merkli, M.: Repeated interactions in open quantum systems. J. Math. Phys. 55(7), 1 (2014)
Christandl, M., König, R., Renner, R.: Postselection technique for quantum channels with applications to quantum cryptography. Phys. Rev. Lett. 102(2), 020504 (2009). arXiv:0809.3019
Damgård, I.B., Fehr, S., Salvail, L., Schaffner, C.: Cryptography in the bounded quantum-storage model. In: Proceedings of the FOCS, pp. 449–458 (2005). arXiv:quant-ph/0508222
Datta, N., Leditzky, F.: A limit of the quantum Rényi divergence. J. Phys. A: Math. Theor. 47(4), 045304 (2014). arXiv:1308.5961
del Rio, L., Hutter, A., Renner, R., Wehner, S.: Relative thermalization. arXiv:1401.7997 (2014)
Dupuis, F., Fawzi, O.: Entropy accumulation with improved second-order term. IEEE Trans. Inform. Theory 65(11), 7596–7612 (2019). arXiv:1805.11652
Dupuis, F., Fawzi, O., Wehner, S.: Entanglement sampling and applications. IEEE Transactions on Information Theory 61(2), 1093–1112 (2015). arXiv:1305.1316
Ekert, A., Renner, R.: The ultimate physical limits of privacy. Nature 507(7493), 443–447 (2014)
Ekert, A.K.: Quantum cryptography based on Bell’s theorem. Phys. Rev. Lett. 67(6), 661 (1991)
Fawzi, O., Renner, R.: Quantum conditional mutual information and approximate Markov chains. Commun. Math. Phys. 340(2), 575–611 (2015). arXiv:1410.0664
Frank, R.L., Lieb, E.H.: Monotonicity of a relative Rényi entropy. J. Math. Phys. 54(12), 122201 (2013). arXiv:1306.5358
Grosshans, F., Grangier, P.: Continuous variable quantum cryptography using coherent states. Phys. Rev. Lett. 88, 057902 (2002)
Hayden, P., Jozsa, R., Petz, D., Winter, A.: Structure of states which satisfy strong subadditivity of quantum entropy with equality. Commun. Math. Phys. 246(2), 359–374 (2004). arXiv:quant-ph/0304007
Inoue, K., Honjo, T.: Robustness of differential-phase-shift quantum key distribution against photon-number-splitting attack. Phys. Rev. A 71, 042305 (2005)
Koashi, M., Imoto, N.: Operations that do not disturb partially known quantum states. Phys. Rev. A 66(2), 022318 (2002)
König, R., Wehner, S., Wullschleger, J.: Unconditional security from noisy quantum storage. IEEE Trans. Inform. Theory 58(3), 1962–1984 (2012). arXiv:0906.1030
Leifer, M.S.: Conditional density operators and the subjectivity of quantum operations. AIP Conf. Proc. 889(1), 172–186 (2007). arXiv:quant-ph/0611233
Lieb, E., Thirring, W.: Inequalities for the moments of the eigenvalues of the Schrödinger equation and their relation to Sobolev inequalities. In: Lieb, E., Simon, B., Wightman, A.S. (eds.) Studies in Mathematical Physics: Essays in Honor of Valentine Bargmann, pp. 269–303 (1976)
Lo, H.-K., Chau, H.F.: Unconditional security of quantum key distribution over arbitrarily long distances. Science 283, 2050–2056 (1999)
Mayers, D.: Unconditional security in quantum cryptography. J. ACM 48, 351–406 (2001)
Miller, C.A., Shi, Y.: Robust protocols for securely expanding randomness and distributing keys using untrusted quantum devices. In: Proceedings of the ACM STOC, pp. 417–426. ACM, (2014). arXiv:1402.0489
Miller, C.A., Shi, Y.: Universal security for randomness expansion. arXiv:1411.6608v3 (2014)
Müller-Lennert, M.: Quantum Relative Rényi Entropies. Master’s thesis, ETH Zurich (2013)
Müller-Lennert, M., Dupuis, F., Szehr, O., Fehr, S., Tomamichel, M.: On quantum Rényi entropies: a new generalization and some properties. J. Math. Phys. 54(12), 122203 (2013). arXiv:1306.3142
Nayak, A.: Optimal lower bounds for quantum automata and random access codes. In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, p. 369, Washington, DC, USA, (1999). IEEE Computer Society
Petz, D.: Sufficiency of channels over von Neumann algebras. Q. J. Math. 39(1), 97–108 (1988)
Portmann, C., Renner, R.: Cryptographic security of quantum key distribution. arXiv:1409.3525 (2014)
Renner, R.: Security of quantum key distribution. PhD thesis, ETH Zurich (2005). arXiv:quant-ph/0512258
Renner, R.: Symmetry of large physical systems implies independence of subsystems. Nat. Phys. 3, 645–649 (2007). arXiv:quant-ph/0703069
Renner, R., Cirac, J.I.: de Finetti representation theorem for infinite-dimensional quantum systems and applications to quantum cryptography. Phys. Rev. Lett. 102, 110504 (2009)
Renner, R., Wolf, S.: Smooth Renyi entropy and applications. In: Proc. IEEE ISIT (2004)
Renner, R., Wolf, S.: Simple and tight bounds for information reconciliation and privacy amplification. In: Roy, B. (ed.) Proceedings of ASIACRYPT, volume 3788 of LNCS, pp. 199–216. Springer, Berlin (2005)
Scarani, V., Bechmann-Pasquinucci, H., Cerf, N.J., Dušek, M., Lütkenhaus, N., Peev, M.: The security of practical quantum key distribution. Rev. Mod. Phys. 81(3), 1301 (2009). arXiv:0802.4155
Scarani, V., Renner, R.: Quantum cryptography with finite resources: unconditional security bound for discrete-variable protocols with one-way postprocessing. Phys. Rev. Lett. 100, 200501 (2008)
Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
Shor, P.W., Preskill, J.: Simple proof of security of the BB84 quantum key distribution protocol. Phys. Rev. Lett. 85(2), 441–444 (2000). arXiv:quant-ph/0003004
Stucki, D., Brunner, N., Gisin, N., Scarani, V., Zbinden, H.: Fast and simple one-way quantum key distribution. Applied Physics Letters 87(19), 194108 (2005). arXiv:quant-ph/0506097
Tomamichel, M.: A Framework for Non-Asymptotic Quantum Information Theory. PhD thesis, ETH Zurich, (2012). arXiv:1203.2142
Tomamichel, M.: Quantum Information Processing with Finite Resources: Mathematical Foundations, vol. 5. Springer, Berlin (2015)
Tomamichel, M., Berta, M., Hayashi, M.: Relating different quantum generalizations of the conditional Rényi entropy. J. Math. Phys. 55(8), 082206 (2014). arXiv:1311.3887
Tomamichel, M., Colbeck, R., Renner, R.: A fully quantum asymptotic equipartition property. IEEE Trans. Inform. Theory 55, 5840–5847 (2009). arXiv:0811.1221
Tomamichel, M., Colbeck, R., Renner, R.: Duality between smooth min- and max-entropies. IEEE Trans. Inform. Theory, 56, 4674 (2010). arXiv:0907.5238v2
Tomamichel, M., Fehr, S., Kaniewski, J., Wehner, S.: A monogamy-of-entanglement game with applications to device-independent quantum cryptography. New J. Phys. 15(10), 103002 (2013)
Tomamichel, M., Hayashi, M.: A hierarchy of information quantities for finite block length analysis of quantum tasks. IEEE Trans. Inform. Theory 59(11), 7693–7710 (2013)
Tomamichel, M., Renner, R.: Uncertainty relation for smooth entropies. Phys. Rev. Lett. 106(11), 110506 (2011). arXiv:1009.2015
Verstraete, F., Cirac, J.I.: Matrix product states represent ground states faithfully. Phys. Rev. B 73(9), 094423 (2006)
Vitanov, A., Dupuis, F., Tomamichel, M., Renner, R.: Chain rules for smooth min- and max-entropies. IEEE Trans. Inform. Theory 59(5), 2603–2612 (2013). arXiv:1205.5231
Watrous, J.: Theory of quantum information (2011). https://cs.uwaterloo.ca/~watrous/LectureNotes.html
Weedbrook, C., Lance, A.M., Bowen, W.P., Symul, T., Ralph, T.C., Lam, P.K.: Quantum cryptography without switching. Phys. Rev. Lett. 93, 170504 (2004)
Wegman, M.N., Carter, J.L.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22(3), 265–279 (1981)
Wilde, M., Winter, A., Yang, D.: Strong converse for the classical capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi relative entropy. Comm. Math. Phys. 331(2), 593–622 (2014). arXiv:1306.1586
Wullschleger, J.: Bitwise quantum min-entropy sampling and new lower bounds for random access codes (2010). arXiv:1012.2291
Acknowledgements
The authors would like to thank Martin Müller-Lennert for allowing us to reproduce Lemma 3.9 from his Master’s thesis in this paper (as Lemma B.4). We also thank Rotem Arnon-Friedman and Thomas Vidick for useful discussions about applications of the entropy accumulation theorem, Carl Miller for discussions on randomness expansion, and Marco Tomamichel as well as the anonymous reviewers for comments on the manuscript. FD acknowledges the financial support of the Czech Science Foundation (GA ČR) project no. GA16-22211S and of the European Commission FP7 Project RAQUEL (Grant No. 323970). OF acknowledges support from the French National Research Agency via Project No. ANR-18-CE47-0011 (ACOM) and from LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007). RR acknowledges funding from the RAQUEL project, from the Swiss National Science Foundation (via Grant No. 200020-135048 and the National Centre of Competence in Research “Quantum Science and Technology”), from the European Research Council (Grant No. 258932), and from the US Air Force Office of Scientific Research (Grant Nos. FA9550-16-1-0245 and FA9550-19-1-0202).
Additional information
Communicated by M. M. Wolf
Appendices
Appendix A: The Function \(\Vert \cdot \Vert _\alpha \)
We use an extension of the Schatten \(\alpha \)-norm to the regime where \(\alpha > 0\), which is defined for any operator \(X = X_{B \leftarrow A}\) from a space A to a space B by
It follows from the singular value decomposition that \(\Vert X \Vert _\alpha = \Vert X^{\dagger } \Vert _\alpha = \Vert X^\intercal \Vert _{\alpha } = \Vert {\overline{X}} \Vert _{\alpha }\) (see, e.g., Section 2 of [61]), from which it also follows that
Note also that
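Numerically, \(\Vert X \Vert _\alpha \) is the \(\alpha \)-power sum of the singular values of X, which makes the invariances above easy to check; a short NumPy sketch (for \(0< \alpha < 1\) this is only a quasi-norm):

```python
import numpy as np

def schatten(X, alpha):
    """||X||_alpha = (sum_i s_i(X)^alpha)^(1/alpha), s_i the singular values.
    A norm for alpha >= 1, and only a quasi-norm for 0 < alpha < 1."""
    s = np.linalg.svd(np.asarray(X, dtype=complex), compute_uv=False)
    return float(np.sum(s ** alpha) ** (1.0 / alpha))
```

Since \(X\), \(X^{\dagger }\), \(X^\intercal \) and \({\overline{X}}\) share the same singular values, the four norms coincide.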
The following is Lemma 12 from [37].
Lemma A.1
For any non-negative operator X and for any \(\alpha \in {\mathbb {R}}^+\)
where the supremum and infimum range over density operators Z.
Appendix B: Properties of the Sandwiched Rényi Entropies
The sandwiched Rényi entropy from Definition 2.3 is a special case of the sandwiched Rényi relative entropy, which is defined as follows.
Definition B.1
For two density operators \(\rho \) and \(\sigma \) on the same Hilbert space and for \(\alpha \in (0, 1) \cup (1, \infty )\) the sandwiched relative Rényi entropy of order \(\alpha \) is defined as
where \(\alpha ' = \frac{\alpha - 1}{\alpha }\).
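For concreteness, the standard sandwiched form \(D_\alpha (\rho \Vert \sigma ) = \frac{1}{\alpha -1} \log \mathrm {tr}\big [\big (\sigma ^{-\alpha '/2} \rho \, \sigma ^{-\alpha '/2}\big )^\alpha \big ]\), with \(\alpha ' = \frac{\alpha -1}{\alpha }\), can be implemented directly (a sketch; logarithms are taken base 2, and \(\rho \) is assumed to be supported on the support of \(\sigma \)):

```python
import numpy as np

def mpow(m, p):
    """p-th power of a positive semidefinite matrix, taken on its support."""
    w, v = np.linalg.eigh(m)
    safe = np.where(w > 1e-12, w, 1.0)          # avoid 0 ** negative
    wp = np.where(w > 1e-12, safe ** p, 0.0)
    return (v * wp) @ v.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) = log2 tr[(sigma^{-a'/2} rho sigma^{-a'/2})^alpha] / (alpha - 1),
    with a' = (alpha - 1)/alpha.  Assumes supp(rho) is contained in supp(sigma)."""
    ap = (alpha - 1.0) / alpha
    s = mpow(sigma, -ap / 2.0)
    q = np.trace(mpow(s @ rho @ s, alpha)).real
    return float(np.log2(q) / (alpha - 1.0))
```

When \(\rho \) and \(\sigma \) commute, the expression reduces to the classical Rényi divergence \(\frac{1}{\alpha -1} \log \sum _i p_i^\alpha q_i^{1-\alpha }\).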
In particular, for a bipartite density operator \(\rho _{A B}\), the sandwiched \(\alpha \)-Rényi entropy of A conditioned on B is related to this relative entropy by
It turns out that this is not the only way to define a conditional entropy based on a relative entropy. One popular alternative is to replace the marginal \(\rho _B\) by a maximisation over arbitrary density operators on B:
We refer to [53] for a comparison of the different notions.
The following lemma corresponds to Eq. 19 of [37]. For its proof, it is convenient to represent vectors of product systems as matrices. Let \(\{ | e_i \rangle \}\) and \(\{ | f_j \rangle \}\) be fixed orthonormal bases of A and B, respectively. For any vector
![](http://media.springernature.com/lw256/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ223_HTML.png)
we define the linear operator
![](http://media.springernature.com/lw223/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ224_HTML.png)
We emphasise that this definition is basis-dependent. Therefore, in expressions that involve this operator as well as the transpose operation \(Z \mapsto Z^{\intercal }\), it is understood that both are taken with respect to the same basis. It is straightforward to prove the following properties (see, e.g., Section 2.4 of [61]). For any operators \(X_{A' \leftarrow A}\) and \(Y_{B' \leftarrow B}\),
![](http://media.springernature.com/lw427/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ76_HTML.png)
Furthermore,
![](http://media.springernature.com/lw624/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ77_HTML.png)
and, hence,
![](http://media.springernature.com/lw179/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ78_HTML.png)
Lemma B.2
For any density operators \(\rho \) and \(\sigma \) on the same Hilbert space and for \(\alpha \in (0, 1) \cup (1, \infty )\) we have
![](http://media.springernature.com/lw294/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ225_HTML.png)
where \(| \psi \rangle \) is a purification of \(\rho \) and where the supremum ranges over all density operators \(\tau \) on the purifying system. In particular, for any such pure \(| \psi \rangle \)
we have
![](http://media.springernature.com/lw328/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ226_HTML.png)
Proof
Let us denote by A the Hilbert space on which \(\rho \) and \(\sigma \) act and by E the purifying space, so that the purification \(| \psi \rangle \) is a vector on \(A \otimes E\). Then, using (72) and (77), the sandwiched Rényi entropy can be written as
![](http://media.springernature.com/lw413/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ227_HTML.png)
Using Lemma A.1 as well as (72) and (76) we obtain
![](http://media.springernature.com/lw457/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ228_HTML.png)
where the supremum is taken over density operators \(\tau \) on E. The first equality of the lemma then follows by (78). Finally, the second equality is obtained via (74). \(\quad \square \)
The next lemma concerns the conditioning on classical information.
Lemma B.3
(Proposition 5.1 of [52]). For any density operator \(\rho _{A B X}\) which is classical on X, i.e.,
![](http://media.springernature.com/lw230/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ229_HTML.png)
where \(\rho _{A B | x}\) are density operators on \(A \otimes B\) and \(\{ | x \rangle \}\) is an orthonormal basis of X, we have
Proof
Using the explicit form of \(\rho _{A B X}\), it is straightforward to verify that
![](http://media.springernature.com/lw424/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ230_HTML.png)
Taking the trace on both sides, the equality can be rewritten in terms of \(\alpha \)-entropies as
which concludes the proof. \(\quad \square \)
The following lemma appears as Lemma 3.9 in [36]; the statement and its proof are given here for the convenience of the reader (see also [52, Proposition 6.5]).
Lemma B.4
(Lemma 3.9 from [36], a variant of Proposition 6.2 of [51]). Let \(\rho \in \mathrm {D}_{\leqslant }(A)\) and \(\sigma \in \mathrm {Pos}(A)\) with \(\mathrm {supp}(\rho ) \subseteq \mathrm {supp}(\sigma )\), and define \(\varepsilon _{\max } := \sqrt{2 \mathrm {tr}\rho - (\mathrm {tr}\rho )^2}\). For \(\varepsilon \in (0, \varepsilon _{\max })\) and \(\alpha \in (1,2]\), we have
where \(g(\varepsilon ) = -\log \left( 1 - \sqrt{1 - \varepsilon ^2} \right) \), and \(D_{\max }^{\varepsilon }(\rho \Vert \sigma ) = \inf _{{\tilde{\rho }}} \inf \{ \lambda : {\tilde{\rho }} \leqslant 2^{\lambda } \sigma \}\), with the infimum ranging over all \({\tilde{\rho }}\) within \(\varepsilon \) of \(\rho \) in purified distance.
Proof
Assume without loss of generality that \(\sigma \) has full support. By Lemma 6.1 in [51], we can find a \(\lambda \) such that \(\lambda \geqslant D_{\max }^{\varepsilon }(\rho \Vert \sigma )\) where
and \(\Delta \) is the positive part of \(\rho - 2^{\lambda } \sigma \).Footnote 23 It suffices to upper-bound \(\lambda \) by \(D_{\alpha }(\rho \Vert \sigma ) + g(\varepsilon )/(\alpha -1)\). Now, let \(\{ e_i \}_{i \in S}\) be an orthonormal basis consisting of eigenvectors of \(\rho - 2^{\lambda } \sigma \). Let \(S_+\) be the subset of S corresponding to positive eigenvalues. Define the non-negative numbers
\(r_i = \langle e_i | \rho | e_i \rangle \) and \(s_i = \langle e_i | \sigma | e_i \rangle \) for \(i \in S\). Note that for \(i \in S_{+}\), we have \(r_i - 2^{\lambda } s_i = \langle e_i | (\rho - 2^{\lambda } \sigma ) | e_i \rangle \geqslant 0\) and therefore \(\frac{r_i}{s_i} 2^{-\lambda } \geqslant 1\). We use this to bound
Hence,
Now, we solve Eq. (79) for \(\mathrm {tr}[\Delta ]\) and bound
It remains to upper-bound \(\frac{1}{\alpha -1} \log \sum _{i \in S} r_i^{\alpha } s_i^{1-\alpha }\) by \(D_{\alpha }(\rho \Vert \sigma )\). To this end, we define the TPCP map \({\mathcal {F}}(X) = \sum _{i \in S} P_i X P_i\), where \(P_i\) denotes the projector onto the subspace spanned by \(e_i\). Note that \({\mathcal {F}}(\rho )\) and \({\mathcal {F}}(\sigma )\) are diagonal in the basis \(\{ e_i \}\) with eigenvalues \(r_i\) and \(s_i\), respectively, so that \(\frac{1}{\alpha -1} \log \sum _{i \in S} r_i^{\alpha } s_i^{1-\alpha } = D_{\alpha }({\mathcal {F}}(\rho ) \Vert {\mathcal {F}}(\sigma )) \leqslant D_{\alpha }(\rho \Vert \sigma )\), where the inequality is the data processing inequality. This concludes the proof. \(\quad \square \)
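The final step of this proof combines a pinching map with the data processing inequality: pinching in any orthonormal basis cannot increase the sandwiched Rényi divergence. This can be probed numerically; the sketch below uses a random-state ensemble and helper names of our own choosing, not from the paper.

```python
import numpy as np

def rand_density(d, rng):
    # random full-rank density operator (Hilbert-Schmidt ensemble)
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    M = G @ G.conj().T
    return M / np.trace(M).real

def mpow(M, p):
    # fractional power of a Hermitian positive definite matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.conj().T

def D_alpha(rho, sigma, a):
    # sandwiched Renyi divergence, base-2 logarithm
    X = mpow(sigma, (1 - a) / (2 * a))
    return np.log2(np.trace(mpow(X @ rho @ X, a)).real) / (a - 1)

def pinch(M, V):
    # F(M) = sum_i P_i M P_i for rank-one projectors onto the columns of V:
    # keeps only the diagonal of M in that basis
    d = np.diag(V.conj().T @ M @ V).real
    return (V * d) @ V.conj().T

rng = np.random.default_rng(0)
for _ in range(50):
    rho, sigma = rand_density(4, rng), rand_density(4, rng)
    _, V = np.linalg.eigh(rand_density(4, rng))   # a random orthonormal basis
    for a in (1.5, 2.0):
        # data processing: pinching cannot increase the divergence
        assert D_alpha(pinch(rho, V), pinch(sigma, V), a) <= D_alpha(rho, sigma, a) + 1e-9
```

After pinching, the two operators commute, so \(D_{\alpha }\) reduces to the classical expression \(\frac{1}{\alpha -1}\log \sum _i r_i^{\alpha } s_i^{1-\alpha }\) in terms of their diagonal entries, exactly as in the proof above.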
The following two lemmas relate the entropy conditioned on a classical value x to the unconditioned entropy.
Lemma B.5
Let \(\rho _{AB}\) be a quantum state of the form \(\rho = \sum _x p_x {\rho }_{AB|x}\), where \(\{ p_x \}\) is a probability distribution over \({\mathcal {X}}\). Then, for any \(x \in {\mathcal {X}}\) and any \(\alpha \in (1, \infty )\),
\(H_{\alpha }^{\uparrow }(A|B)_{\rho |x} \geqslant H_{\alpha }^{\uparrow }(A|B)_{\rho } + \frac{\alpha }{\alpha - 1} \log p_x ,\)
and for \(\alpha \in (0,1)\),
\(H_{\alpha }^{\uparrow }(A|B)_{\rho |x} \leqslant H_{\alpha }^{\uparrow }(A|B)_{\rho } + \frac{\alpha }{\alpha - 1} \log p_x .\)
Proof
For any \(\sigma _{B}\) and \(\alpha \in (1,\infty )\), we have
For the first inequality, we used the fact that \(\rho _{AB} = \sum _{x'} p_{x'} \rho _{AB|x'} \geqslant p_x \rho _{AB|x}\), which implies that \(\sigma _B^{\frac{1-\alpha }{2\alpha }} \rho _{AB} \sigma _B^{\frac{1-\alpha }{2\alpha }} \geqslant \sigma _B^{\frac{1-\alpha }{2\alpha }} p_x \rho _{AB|x} \sigma _B^{\frac{1-\alpha }{2\alpha }}\). We then used the fact that \(y \mapsto y^{\alpha }\) is a monotone function on \([0, \infty )\). Taking the infimum over \(\sigma _B\) and then multiplying both sides by \(-1\), we get the desired result. The proof is the same for \(\alpha \in (0,1)\) except that the direction of the inequality is reversed. \(\quad \square \)
Lemma B.6
Let \(\rho _{AB}\) be a quantum state of the form \(\rho = \sum _x p_x {\rho }_{AB|x}\), where \(\{ p_x \}\) is a probability distribution over \({\mathcal {X}}\). Then, for any \(x \in {\mathcal {X}}\) and any \(\alpha \in (1, 2]\),
Proof
We define the state \(\rho _{ABX} = \sum _x p_x \rho _{AB|x} \otimes |x \rangle \langle x |_X\). Note that it is legitimate to use the notation \(\rho \), as the reduced state on \(A \otimes B\) corresponds to \(\rho _{AB}\). As conditioning can only decrease the entropy, we obtain
\(\square \)
Lemma B.7
Let \({\mathcal {E}}\) be a TPCP map from \(A \otimes B\) to \(A \otimes B \otimes X\) defined by \({\mathcal {E}}(W_{AB}) = \sum _{y, z} (\Pi _{y,A} \otimes \Pi _{z,B}) W_{AB} (\Pi _{y,A} \otimes \Pi _{z,B}) \otimes |t(y,z) \rangle \langle t(y,z) |_X\), where \(t : {\mathcal {Y}}\times {\mathcal {Z}}\rightarrow {\mathcal {X}}\) is a (deterministic) function, \(\{\Pi _{y,A}\}_{y \in {\mathcal {Y}}}\) and \(\{\Pi _{z, B}\}_{z \in {\mathcal {Z}}}\) are mutually orthogonal projectors acting on A and B, respectively, and
\(\{ |x \rangle \}_{x \in {\mathcal {X}}}\) is an orthonormal basis on X. Let \(\rho _{ABX} = {\mathcal {E}}(\omega _{AB})\), for an arbitrary state \(\omega _{AB}\). Then for \(\alpha \in [\frac{1}{2}, \infty )\), we have
Proof
We only prove Eq. (83). Eq. (84) is easier. Let \({\mathcal {M}}\) be the TPCP map from B to B defined by \({\mathcal {M}}(W_B) = \sum _{z} \Pi _{z, B} W_{B} \Pi _{z, B}\). Using the data processing inequality and the fact that \(({\mathcal {I}}_{AX} \otimes {\mathcal {M}})(\rho _{ABX}) = \rho _{ABX}\), we have
Similarly,
We now show that for any state \(\sigma _B\), we have \(D_{\alpha }(\rho _{ABX} \Vert \mathrm {id}_{AX} \otimes {\mathcal {M}}(\sigma _B)) = D_{\alpha }(\rho _{AB} \Vert \mathrm {id}_{A} \otimes {\mathcal {M}}(\sigma _B))\). To make the notation lighter, we use in the following \(\Pi _{z}\) for \(\Pi _{z,B}\) and \(\Pi _{y}\) for \(\Pi _{y,A}\). The relative entropy \(D_{\alpha }(\rho _{ABX} \Vert \mathrm {id}_{AX} \otimes {\mathcal {M}}(\sigma _B))\) is defined in terms of
![](http://media.springernature.com/lw605/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ231_HTML.png)
where we used multiple times the orthogonality of the family \(\{\Pi _{z}\}\) and of the family \(\{\Pi _{y}\}\). Similarly, \(D_{\alpha }(\rho _{AB} \Vert \mathrm {id}_{A} \otimes {\mathcal {M}}(\sigma _B))\) is defined in terms of
This concludes the proof. \(\quad \square \)
In the subsequent arguments we will use the quantity
\(D'_{\alpha }(\rho \Vert \sigma ) := \frac{1}{\alpha - 1} \log \mathrm {tr}\left[ \rho ^{\alpha } \sigma ^{1-\alpha } \right] ,\)
which is defined for any non-negative operators \(\rho \) and \(\sigma \) on the same space and for any \(\alpha \in [0, 1) \cup (1, \infty )\). As observed in [17, 24, 64], it follows from the Araki-Lieb-Thirring inequality [3, 31] that \(D_{\alpha }(\rho \Vert \sigma ) \leqslant D'_{\alpha }(\rho \Vert \sigma )\).
Furthermore, we can define a conditional entropy based on this quantity:
In [53, Theorem 2], it is shown that \(H'\) and H are duals of each other, in the sense that
for any pure state \(\rho _{ABC}\).
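Taking \(D'_{\alpha }\) to be the Petz divergence \(\frac{1}{\alpha -1}\log \mathrm {tr}[\rho ^{\alpha }\sigma ^{1-\alpha }]\), the Araki-Lieb-Thirring relation \(D_{\alpha } \leqslant D'_{\alpha }\) can be checked numerically on random states. The helper names below are ours, not from the paper.

```python
import numpy as np

def rand_density(d, rng):
    # random full-rank density operator (Hilbert-Schmidt ensemble)
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    M = G @ G.conj().T
    return M / np.trace(M).real

def mpow(M, p):
    # fractional power of a Hermitian positive definite matrix
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.conj().T

def D_sandwiched(rho, sigma, a):
    # sandwiched Renyi divergence (base-2 logarithm)
    X = mpow(sigma, (1 - a) / (2 * a))
    return np.log2(np.trace(mpow(X @ rho @ X, a)).real) / (a - 1)

def D_petz(rho, sigma, a):
    # Petz (non-sandwiched) Renyi divergence: log2 tr[rho^a sigma^(1-a)] / (a-1)
    return np.log2(np.trace(mpow(rho, a) @ mpow(sigma, 1 - a)).real) / (a - 1)

rng = np.random.default_rng(1)
for _ in range(100):
    rho, sigma = rand_density(3, rng), rand_density(3, rng)
    for a in (1.1, 1.5, 2.0):
        # Araki-Lieb-Thirring: the sandwiched divergence never exceeds the Petz one
        assert D_sandwiched(rho, sigma, a) <= D_petz(rho, sigma, a) + 1e-9
```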
The following lemma is another variant of Lemma 8 of [54] (see also Lemma 6.3 of [51]).
Lemma B.8
Let \(\rho \) be a density operator and \(\sigma \) a non-negative operator, let \(\eta = \max (4, 2^{D'_2(\rho \Vert \sigma )} + 2^{-D'_0(\rho \Vert \sigma )} + 1)\), and let \(\alpha \in (1, 1 + 1/ \log \eta ) \). Then
Proof sketch.
The proof proceeds in the same way as the proof of Lemma 8 of [54]. The idea is to consider, for any \(\beta > 0\), the functions \(r_{\beta }\) and \(s_{\beta }\) from \({\mathbb {R}}^+\) to \({\mathbb {R}}^+\) defined by
It can be readily verified that \(r_\beta (t) \leqslant s_\beta (t)\) for all \(t > 0\), that \(s_{\beta }(t) = s_{\beta }(1/t)\), that \(s_{\beta }(t)\) is monotonically increasing for \(t > 1\), and that \(s_{\beta }(t)\) is concave for \(\beta < 1/2\) and \(t \geqslant 3\). It is then shown that
![](http://media.springernature.com/lw290/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ232_HTML.png)
where \(\beta = \alpha - 1\) and \(X = \rho \otimes {\sigma ^{-1}}^T\).
From there, we proceed in a slightly different way, noting that
Using this as well as Lemma 11 of [54], we obtain
![](http://media.springernature.com/lw519/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ233_HTML.png)
which holds because \(s_\beta \) is concave for \(\beta \leqslant \frac{1}{\log \eta } \leqslant \frac{1}{2}\), and because the eigenvalues of \(X+\frac{1}{X} + {\mathbb {I}}\) lie in the interval \([3, \infty )\). Using that
![](http://media.springernature.com/lw328/springer-static/image/art%3A10.1007%2Fs00220-020-03839-5/MediaObjects/220_2020_3839_Equ234_HTML.png)
and combining the inequalities above, we find
Applying Taylor’s theorem to an expansion around \(\beta = 0\) gives
where for the last inequality, we use the fact that \(\ln 2 \cosh (\ln 2) < 1\). \(\quad \square \)
The following lemma is a generalisation of Proposition 3.10 of [36]. In [52, Section 6.4.2] it is shown, using a Taylor approximation, that the factor in front of \((\alpha - 1)\) can be improved, although at the price of an error term with non-explicit constants.
Lemma B.9
For any density operator \(\rho _{A B}\) and \(1< \alpha < 1+ 1/\log (1 + 2 d_A)\)
where \(d_A = \dim A\).
Proof
We start with the proof of the first inequality. Lemma B.8 implies that
holds for all \(1< \alpha < 1 + \frac{1}{\log (1 + 2 d_A)}\), where we have used that
Furthermore, because of (87) we have
Combining this with the above concludes the proof of the first inequality.
The second inequality follows directly from the monotonicity of the relative Rényi entropy in \(\alpha \) [8, 37].
To prove the last inequality, we again use the duality relation (88):
We may now again use Lemma B.8 to obtain
Combining the two inequalities with the fact that \(-H(A|C) = H(A|B)\) concludes the proof. \(\quad \square \)
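The monotonicity of the relative Rényi entropy in \(\alpha \) invoked in the proof above can be verified numerically; the sketch below (random-state ensemble and helper names are ours) checks that the sandwiched divergence is non-decreasing in \(\alpha \).

```python
import numpy as np

def rand_density(d, rng):
    # random full-rank density operator
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    M = G @ G.conj().T
    return M / np.trace(M).real

def mpow(M, p):
    # fractional power of a Hermitian positive definite matrix
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.conj().T

def D_alpha(rho, sigma, a):
    # sandwiched Renyi divergence in base 2
    X = mpow(sigma, (1 - a) / (2 * a))
    return np.log2(np.trace(mpow(X @ rho @ X, a)).real) / (a - 1)

rng = np.random.default_rng(2)
alphas = [0.6, 0.8, 1.2, 1.5, 2.0, 3.0]
for _ in range(50):
    rho, sigma = rand_density(3, rng), rand_density(3, rng)
    vals = [D_alpha(rho, sigma, a) for a in alphas]
    # D_alpha is non-decreasing in alpha (for alpha >= 1/2)
    assert all(x <= y + 1e-9 for x, y in zip(vals, vals[1:]))
```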
The following lemma generalises a classical result originally proposed in [44]. It follows rather directly from similar statements proved in [34, 36, 51, 54].
Lemma B.10
For any density operator \(\rho \), any non-negative operator \(\sigma \), any \(\alpha \in (1, 2]\), and any \(0< \varepsilon < 1\),
where \(g(\varepsilon ) = - \log (1 - \sqrt{1-\varepsilon ^2}) < \log (2/\varepsilon ^2)\).
Proof
For the first inequality, we use Lemma B.4, which directly implies that
The desired inequality then follows because
To prove the second inequality we use the duality between smooth min- and max-entropy [55], which asserts that
holds for any purification \(\rho _{A B C}\) of \(\rho _{A B}\). We can then employ Proposition 6.2 of [51],Footnote 24
The claim then follows from (88):
\(\square \)
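The bound \(g(\varepsilon ) < \log (2/\varepsilon ^2)\) stated in Lemma B.10 follows from \(1 - \sqrt{1-\varepsilon ^2} \geqslant \varepsilon ^2/2\); a quick numerical confirmation (our sketch):

```python
import math

def g(eps):
    # g(eps) = -log2(1 - sqrt(1 - eps^2)), the smoothing penalty of Lemma B.10
    return -math.log2(1 - math.sqrt(1 - eps ** 2))

for k in range(1, 1000):
    eps = k / 1000
    # since 1 - sqrt(1 - eps^2) >= eps^2 / 2, we get g(eps) < log2(2 / eps^2)
    assert g(eps) < math.log2(2 / eps ** 2)
```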
Appendix C: Necessity of the Markov Chain Conditions
The aim of this section is to illustrate that the Markov chain conditions in Theorem 4.4 are important, in the sense that dropping them completely would render the statement invalid.
We first recall that a tri-partite density operator \(\rho _{A B C}\) has the Markov state property \(A \leftrightarrow B \leftrightarrow C\) if and only if the mutual information between A and C conditioned on B equals zero, i.e., \({I(A : C | B)_{\rho }} = 0\) (see [26, 39], as well as [23] for a robust version). Using the properties of the conditional mutual information, one can easily derive the following claims:
- Symmetry: \(A \leftrightarrow B \leftrightarrow C\) implies \(C \leftrightarrow B \leftrightarrow A\).
- Local processing of endpoints: \(A A' \leftrightarrow B \leftrightarrow C\) implies \(A \leftrightarrow B \leftrightarrow C\).
- Centering of information: \(A A' \leftrightarrow B \leftrightarrow C\) implies \(A \leftrightarrow A' B \leftrightarrow C\).
- Composition: \(A \leftrightarrow B \leftrightarrow C\) and \(A' \leftrightarrow A B \leftrightarrow C\) imply \(A A' \leftrightarrow B \leftrightarrow C\).
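For classical distributions, the Markov property and the symmetry claim above can be verified directly from the identity \(I(A : C | B) = H(AB) + H(BC) - H(B) - H(ABC)\); a minimal numerical sketch (function names ours):

```python
import numpy as np

def cond_mutual_info(p):
    # I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for a joint pmf p[a, b, c]
    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))
    return (H(p.sum(axis=2)) + H(p.sum(axis=0))
            - H(p.sum(axis=(0, 2))) - H(p))

rng = np.random.default_rng(1)
# a classical Markov chain A <-> B <-> C: p(a, b, c) = p(a, b) p(c|b)
p_ab = rng.random((2, 3)); p_ab /= p_ab.sum()
p_c_given_b = rng.random((3, 2)); p_c_given_b /= p_c_given_b.sum(axis=1, keepdims=True)
p = np.einsum('ab,bc->abc', p_ab, p_c_given_b)

assert abs(cond_mutual_info(p)) < 1e-12                       # Markov: I(A:C|B) = 0
assert abs(cond_mutual_info(p.transpose(2, 1, 0))) < 1e-12    # symmetry: C <-> B <-> A
```

The other properties in the list follow from similar Shannon-entropy identities.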
By the standard properties of Markov chains described above, it is straightforward to show that the set of Markov chain conditions (27) for a trivial E system is equivalent to the set of conditions
Similarly, using the composition property, one can show that this set of conditions is equivalent to the set of conditions
The latter can be expressed in terms of the entropy equalities
The entropy accumulation statement for the smooth min-entropy in the simplified form (3) can thus be rewritten as
Note that, if one replaces the smooth min-entropy on the left hand side by the von Neumann entropy then this expression looks similar to the usual chain rule for von Neumann entropies,
which holds for arbitrary \(\rho _{A_1^n B_1^n}\). One may therefore wonder whether (91) also holds without the Markov conditions (90). This is, however, not the case, as we now show with a specific example.
The example is classical, in the sense that \(A_1, \ldots , A_n\) and \(B_1, \ldots B_n\) correspond to random variables and the map \({\mathcal {M}}_i\) takes \(a_1^{i-1} b_1^{i-1}\) as input and outputs \(a_1^{i-1} b_1^{i-1}\) as is, together with \(A_i B_i\) generated from the conditional distribution \(\rho _{A_i B_i|A_1^{i-1} = a_1^{i-1}, B_1^{i-1} = b_1^{i-1}}\). With this setup, (91) can be written as
Actually, (92) is even weaker than what would follow from (91), as we are taking the infimum also over \(b_{i+1}^{n}\), but we will see that even this weaker inequality is false. Consider the following construction: let \(B_1, B_2, \ldots , B_n\) be n mutually independent and uniformly distributed n-bit strings. Furthermore, let C be a uniform random bit and let \(A = A_1^n\) be an n-bit string defined by
where \(\oplus \) denotes the bit-wise addition modulo 2. In other words, with probability 1/2, the string A is fully determined by \(B_1, \dots , B_n\), and with probability 1/2, A is completely random. We then have, for \(\varepsilon \ll 1\),
Furthermore, for any i and for any fixed \(a_1^{i-1}\), \(b_1^{i-1}\), and \(b_{i+1}^n\), we have
because the bit \(A_i\) is random with probability 1/2. Since there are n such terms in the sum on the right hand side of (92), that side scales linearly in n. But we have just seen that the left hand side is roughly equal to 1. This shows that the inequality, and hence also its quantum version (91), cannot hold in general if we drop the Markov chain conditions (90).
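For small n, the guessing probability of A given B in this construction can be computed exactly by brute force, confirming that the (non-smoothed) min-entropy \(H_{\min }(A|B) = -\log p_{\mathrm {guess}}(A|B)\) stays below one bit for every n. The code below is our illustration, not part of the original argument.

```python
from functools import reduce
from itertools import product
from math import log2

def p_guess(n):
    # Exact guessing probability of A given B = (B_1, ..., B_n) for the
    # construction above: A = B_1 xor ... xor B_n if C = 0, A uniform if C = 1.
    total = 0.0
    for b in product(range(2 ** n), repeat=n):   # enumerate all values of B
        x = reduce(lambda u, v: u ^ v, b)        # bit-wise XOR of the B_i
        # posterior of A given B = b: mass 1/2 + 2^-(n+1) on x, 2^-(n+1) elsewhere
        best = max(0.5 * (a == x) + 0.5 * 2 ** -n for a in range(2 ** n))
        total += 2 ** (-n * n) * best            # B is uniform over 2^(n*n) values
    return total

for n in (2, 3):
    # the optimal guess is the XOR of the B_i, correct with prob. 1/2 + 2^-(n+1)
    assert abs(p_guess(n) - (0.5 + 2 ** -(n + 1))) < 1e-12
    # hence H_min(A|B) = -log2 p_guess < 1, approaching one bit as n grows
    assert 0 < -log2(p_guess(n)) < 1.0
```

For \(n = 3\) this gives \(p_{\mathrm {guess}} = 0.5625\), i.e. about 0.83 bits of min-entropy, while the right hand side of (92) grows linearly in n.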
Dupuis, F., Fawzi, O. & Renner, R. Entropy Accumulation. Commun. Math. Phys. 379, 867–913 (2020). https://doi.org/10.1007/s00220-020-03839-5