Allowed region and optimal measurement for information versus disturbance in quantum measurements

We present graphs of information versus disturbance for general quantum measurements of completely unknown states. The information and the disturbance are each quantified by two measures: (i) the Shannon entropy and the estimation fidelity for the information and (ii) the operation fidelity and the physical reversibility for the disturbance. These measures are calculated for a single outcome and are plotted on four types of information-disturbance planes to show their allowed regions. In addition, we discuss the graphs of these measures averaged over all possible outcomes and the optimal measurements that saturate the upper bounds on the information for a given disturbance. The results considerably broaden the perspective on trade-offs between information and disturbance in quantum measurements.


Introduction
In quantum theory, a measurement that provides information about a system inevitably disturbs the state of the system, unless the original state is a classical mixture of the eigenstates of an observable. This feature is not only of great interest in the foundations of quantum mechanics but also plays an important role in quantum information processing and communication [1], such as in quantum cryptography [2][3][4][5]. As a result, the relationship between information and disturbance has been the subject of numerous studies [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] over many years. Most studies have discussed the disturbance only in terms of the size of the state change. However, the disturbance can also be discussed in terms of the reversibility of the state change [23][24][25][26] because the state change can be recovered with a nonzero probability of success if the measurement is physically reversible [27][28][29].
Intuitively, if a measurement provides more information about a system, the measurement changes the state of the system by a greater degree and the change becomes more irreversible. To show this trade-off, various inequalities have been derived using different formulations. For example, Banaszek [7] derived an inequality between the amount of information gain and the size of the state change using two fidelities, and Cheong and Lee [25] derived an inequality between the amount of information gain and the reversibility of the state change using the fidelity and reversal probability. These inequalities have been verified [30][31][32][33] in single-photon experiments.
In this paper, we present graphs of information versus disturbance for general quantum measurements of a d-level system in a completely unknown state. The information is quantified by the Shannon entropy [6] and the estimation fidelity [7], whereas the disturbance is quantified by the operation fidelity [7] and the physical reversibility [34]. These measures are calculated for a single outcome using the general formulas derived in Ref. [26] and are plotted on four types of information-disturbance planes to show the allowed regions. Moreover, we show the allowed regions for these measures averaged over all possible outcomes via an analogy with the center of mass. The allowed regions explain the structure of the relationship between the information and disturbance, including both the upper and lower bounds on the information for a given disturbance, even though the lower bounds can be violated by non-quantum effects such as classical noise and the observer's non-optimal estimation. In particular, the optimal measurements saturating the upper bounds are shown to be different for the four types of information-disturbance pairs. Therefore, our results broaden our understanding of the effects of quantum measurements and provide a useful tool for quantum information processing and communication.
Two of the above bounds have been shown by Banaszek [7] and Cheong and Lee [25] to be inequalities for the average values via different methods than ours. The most important difference is that they directly discussed the information and disturbance averaged over outcomes, whereas we start with those pertaining to each single outcome, derived [26] in the context of a physically reversible measurement [27][28][29]. Even though trade-offs between information and disturbance are conventionally discussed using the average values [6,7,9,10,16,18], physically reversible measurements strongly imply trade-offs at the level of a single outcome [11]. That is, in a physically reversible measurement, whenever a second measurement called the reversing measurement recovers the pre-measurement state of the first measurement, it erases all the information obtained by the first measurement (see the Erratum of Ref. [35]). This state recovery with information erasure occurs not on average but only when the reversing measurement yields a preferred single outcome. Moreover, starting from the level of a single outcome greatly simplifies the derivation of the allowed regions and optimal measurements. It is easy to show the allowed regions pertaining to a single outcome because the information and disturbance pertaining to a single outcome contain only a definite number of bounded parameters and have some useful invariances under parameter transformations. From these allowed regions, the allowed regions for the average values are shown using a graphical method based on an analogy with the center of mass, which makes it easy to construct the optimal measurements. In fact, without our method, it would be difficult to find all of the bounds and optimal measurements.

The rest of this paper is organized as follows. Section 2 reviews the procedure for quantifying the information and disturbance in quantum measurements.
Sections 3 and 4 show the allowed regions for information and disturbance pertaining to a single outcome and those for the average values over all possible outcomes. Section 5 discusses the optimal measurements to show their differences for the four types of information-disturbance pairs. Section 6 summarizes our results.

Information and Disturbance
First, the amount of information provided by a measurement is quantified. Suppose that the d-level system to be measured is known to be in one of a set of predefined pure states {|ψ(a)⟩}. The probability for |ψ(a)⟩ is given by p(a); however, which |ψ(a)⟩ is actually assigned to the system is unknown. Here we focus on the case in which no prior information about the system is available, assuming that {|ψ(a)⟩} is the set of all possible pure states and that p(a) is uniform according to the normalized invariant measure over the pure states. Because {|ψ(a)⟩} in this case is a continuous set of states, the index a actually represents a set of continuous parameters, such as hyperspherical coordinates in 2d dimensions as in Ref. [26], where the summation over a is replaced with an integral over the coordinates using the hyperspherical volume element.
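The normalized invariant measure over pure states can be sampled numerically by normalizing a vector of independent complex Gaussian amplitudes. This is a standard construction, not part of the paper; the sketch below is merely a convenient way to generate test ensembles.

```python
import math
import random

def random_pure_state(d, rng=random):
    """Sample a d-level pure state from the unitarily invariant measure by
    normalizing a vector of i.i.d. complex Gaussian amplitudes."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

psi = random_pure_state(4)
assert abs(sum(abs(a) ** 2 for a in psi) - 1.0) < 1e-9
```

Unitary invariance of the Gaussian distribution guarantees that the resulting states are distributed uniformly, which is what makes the uniform-prior assumption above operational.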
The system is then measured to obtain information about its state. A quantum measurement can be described by a set of measurement operators {M̂_m} satisfying

Σ_m M̂†_m M̂_m = Î, (1)

where m denotes the outcome of the measurement and Î is the identity operator. Here, the quantum measurement is assumed to be ideal [36] or efficient [8] in the sense that it has no classical noise yielding mixed post-measurement states, because we focus on the quantum nature of the measurement. When the system is in a state |ψ(a)⟩, the measurement {M̂_m} yields an outcome m with probability

p(m|a) = ⟨ψ(a)| M̂†_m M̂_m |ψ(a)⟩, (2)

changing the state into

|ψ(m,a)⟩ = M̂_m |ψ(a)⟩ / √p(m|a). (3)

Each measurement operator can be decomposed by a singular-value decomposition as M̂_m = Û_m D̂_m V̂_m, where Û_m and V̂_m are unitary operators and D̂_m is a diagonal operator in an orthonormal basis {|i⟩} with i = 1, 2, . . . , d such that D̂_m |i⟩ = λ_mi |i⟩. The diagonal elements {λ_mi} are called the singular values of M̂_m and satisfy 0 ≤ λ_mi ≤ 1. From the outcome m, the state of the system can be partially deduced. For example, Bayes's rule states that, given an outcome m, the probability that the state was |ψ(a)⟩ is p(a|m) = p(m|a) p(a)/p(m), where p(m) is the total probability of outcome m, p(m) = Σ_a p(m|a) p(a). That is, the outcome m changes the probability distribution for the states from {p(a)} to {p(a|m)}. This change decreases the Shannon entropy, which is known as a measure of the lack of information. The decrease I(m) = H({p(a)}) − H({p(a|m)}), which we define as the information gain, quantifies the amount of information provided by the outcome m of the measurement {M̂_m} [11,37]; its explicit form in terms of the singular values of M̂_m is given in Ref. [26]. The average of I(m) over all outcomes, I = Σ_m p(m) I(m), is equal to the mutual information [6] between the random variables {a} and {m} with p(m, a) = p(m|a) p(a), because p(a) is uniform. Alternatively, the state of the system can be estimated as a state |ϕ(m)⟩ depending on the outcome m. In the optimal estimation [7], |ϕ(m)⟩ is the eigenvector of M̂†_m M̂_m corresponding to its maximum eigenvalue.
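The Bayesian update and the resulting entropy decrease can be illustrated with a small finite ensemble. The paper's ensemble is continuous; the four candidate states, the measurement operator, and the value λ = 0.5 below are hypothetical choices made only for illustration.

```python
import math

def born_prob(M, psi):
    """p(m|a) = <psi| M†M |psi> for a 2x2 real matrix M (list of rows)."""
    Mpsi = [sum(M[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(abs(x) ** 2 for x in Mpsi)

def shannon(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

s = 1 / math.sqrt(2)
states = {"0": [1, 0], "1": [0, 1], "+": [s, s], "-": [s, -s]}
prior = {a: 0.25 for a in states}          # uniform prior over the toy ensemble

lam = 0.5
M0 = [[1, 0], [0, lam]]                    # operator with singular values (1, lam)

# Bayes's rule: p(a|m) = p(m|a) p(a) / p(m)
joint = {a: born_prob(M0, psi) * prior[a] for a, psi in states.items()}
p_m = sum(joint.values())                  # total probability of this outcome
posterior = {a: j / p_m for a, j in joint.items()}

H_prior = shannon(prior.values())
H_post = shannon(posterior.values())
assert H_post < H_prior                    # the outcome sharpens the distribution
```

The positive difference H_prior − H_post plays the role of the information gain I(m) for this discrete toy model.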
The quality of the estimate is evaluated by the estimation fidelity G(m) = Σ_a p(a|m) |⟨ϕ(m)|ψ(a)⟩|². As was found for I(m), G(m) also quantifies the amount of information; in terms of the singular values it becomes [26]

G(m) = (1/(d+1)) [1 + λ²_{m,max}/σ_m²],

where λ_{m,max} is the maximum singular value of M̂_m and σ_m² = Σ_i λ²_mi. Note that G(m) satisfies 1/d ≤ G(m) ≤ 2/(d+1). The average of G(m) over all outcomes, G = Σ_m p(m) G(m), becomes the mean estimation fidelity discussed in Ref. [7], even though G(m) itself was not derived in Ref. [7]. Note that G can be derived from G(m); however, G(m) cannot be derived from G. That is, G(m) characterizes the measurement {M̂_m} in more detail than G.

Next, the degree of disturbance caused by the measurement is quantified. When the measurement {M̂_m} yields an outcome m, the state of the system changes from |ψ(a)⟩ to |ψ(m,a)⟩, as given in Eq. (3). The size of this state change is evaluated by the operation fidelity F(m) = Σ_a p(a|m) |⟨ψ(a)|ψ(m,a)⟩|². F(m) quantifies the degree of disturbance caused when the measurement {M̂_m} yields the outcome m and is explicitly written in terms of the singular values of M̂_m as [26]

F(m) = (1/(d+1)) [1 + (Σ_i λ_mi)²/σ_m²].

Note that F(m) satisfies 2/(d+1) ≤ F(m) ≤ 1. Similar to G(m), the average of F(m) over all outcomes, F = Σ_m p(m) F(m), becomes the mean operation fidelity discussed in Ref. [7], even though F(m) was not derived in Ref. [7]. In addition to the size of the state change, the reversibility of the state change can also be regarded as a measure of the disturbance. Even though |ψ(a)⟩ and |ψ(m,a)⟩ are unknown, this state change is physically reversible if M̂_m has a bounded left inverse M̂_m⁻¹ [28,29]. To recover |ψ(a)⟩, a second measurement, called a reversing measurement, is made on |ψ(m,a)⟩. The reversing measurement is described by another set of measurement operators {R̂_μ^(m)} satisfying the completeness relation and, moreover, R̂_{μ0}^(m) M̂_m ∝ Î for a particular outcome μ0, where μ denotes the outcome of the reversing measurement.
When the reversing measurement yields the preferred outcome μ0, the state of the system reverts to |ψ(a)⟩ via the state change caused by the reversing measurement, because R̂_{μ0}^(m) M̂_m ∝ Î. For the optimal reversing measurement [34], the probability of recovery is maximized, and the reversibility of the state change is evaluated by this maximum success probability as

R(m) = d λ²_{m,min}/σ_m²,

where λ_{m,min} is the minimum singular value of M̂_m. The average of R(m) over all outcomes, R = Σ_m p(m) R(m), is the degree of physical reversibility of a measurement discussed in Ref. [34], whose explicit form in terms of the singular values is given in Ref. [25], even though R(m) was not derived in Ref. [25]. Therefore, the information and disturbance for a single outcome m are obtained as functions of the singular values of M̂_m: I(m) and G(m) for the information and F(m) and R(m) for the disturbance. Note that they are invariant under the interchange of any pair of singular values and under the rescaling of all the singular values by a constant c [26]. By contrast, the probability for the outcome m, p(m) = σ_m²/d, is invariant under the interchange but not under the rescaling.
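The single-outcome measures can be evaluated directly from the singular values. The closed forms below for G(m), F(m), and R(m) are reconstructions, checked only against the endpoint values (G = 1/d and F = R = 1 for the identity; G = 2/(d+1) and R = 0 for a rank-one projector) and against the averaged quantities of Refs. [7,34]; treat them as a sketch rather than the paper's displayed equations. I(m) is omitted because its closed form is not reproduced here.

```python
def single_outcome_measures(svals):
    """Outcome probability p(m), estimation fidelity G(m), operation fidelity
    F(m), and physical reversibility R(m) from the singular values of M_m
    (reconstructed closed forms; see the lead-in)."""
    d = len(svals)
    sigma2 = sum(s * s for s in svals)            # sigma_m^2
    p = sigma2 / d                                # p(m) = sigma_m^2 / d
    G = (1 + max(svals) ** 2 / sigma2) / (d + 1)
    F = (1 + sum(svals) ** 2 / sigma2) / (d + 1)
    R = d * min(svals) ** 2 / sigma2
    return {"p": p, "G": G, "F": F, "R": R}

d = 4
weakest = single_outcome_measures([1.0] * d)               # identity operator
strongest = single_outcome_measures([1.0] + [0.0] * (d - 1))  # rank-1 projector
assert abs(weakest["F"] - 1.0) < 1e-9 and abs(weakest["R"] - 1.0) < 1e-9
assert abs(weakest["G"] - 1 / d) < 1e-9                    # no information
assert abs(strongest["G"] - 2 / (d + 1)) < 1e-9            # maximal information
assert strongest["R"] == 0.0                               # fully irreversible
```

The assertions exercise the invariances noted above: rescaling all singular values by a constant c changes p(m) but leaves G, F, and R unchanged.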
As an important example, consider M̂^(d)_{k,l}(λ), defined in Ref. [26] as a measurement operator whose singular values are specified by two integers k and l together with a parameter λ satisfying 0 ≤ λ ≤ 1. Even though the information and disturbance for M̂^(d)_{k,l}(λ) can be calculated from Eqs. (9), (15), (20), and (27), calculating I(m) is not straightforward because of the degeneracy of the singular values. By taking the limit λ_mi → λ_mk, I(m) for M̂^(d)_{k,1}(λ) is found in Ref. [26]. Similarly, P̂^(d)_r is defined as a projective measurement operator of rank r.

Allowed Region
Next, we plot the information and disturbance for various measurement operators on a plane. In particular, the allowed region for information versus disturbance can be shown on the plane by plotting all physically possible measurement operators, that is, by varying every singular value over the range 0 ≤ λ_mi ≤ 1. It is easy to do this for I(m), G(m), F(m), and R(m); the resulting allowed regions are shown in Fig. 1. Figure 2 shows the allowed regions when d = 8 in blue (dark gray).
The above boundaries, (1, d − 1) and (k, 1), were first confirmed by brute-force numerical calculations in which every singular value was varied in steps of ∆λ_mi = 0.01 for d = 2, 3, . . . , 6 and ∆λ_mi = 0.02 for d = 7, 8. Moreover, for G(m) versus F(m) and for G(m) versus R(m), the boundaries can be analytically proven to be the true boundaries for arbitrary d (see Appendix A). Unfortunately, for I(m) versus F(m) and for I(m) versus R(m), it is difficult to prove analytically that the boundaries are the true ones. Nevertheless, they can be shown to satisfy the necessary conditions for the true boundaries using the Karush-Kuhn-Tucker (KKT) conditions [39], which generalize the method of Lagrange multipliers to handle inequality constraints in mathematical optimization. For example, to find the lower boundary for I(m) versus F(m), consider minimizing I(m) subject to F(m) = F_0 and λ_mi ≥ 0 (i = 1, 2, . . . , d). Then, M̂^(d)_{k,1}(λ) satisfies a necessary condition for a local minimum (see Appendix B).
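A coarse version of the brute-force scan can be sketched as follows. The closed forms used here for G(m) and F(m) are reconstructions consistent with the endpoint values quoted in the text (not the paper's displayed equations), so the scan only checks that every grid point falls inside the known coordinate ranges rather than reproducing the exact boundary curves.

```python
import itertools

def gf_point(svals):
    """(F(m), G(m)) for the singular values of one measurement operator
    (reconstructed closed forms; see the lead-in)."""
    d = len(svals)
    sigma2 = sum(s * s for s in svals)
    G = (1 + max(svals) ** 2 / sigma2) / (d + 1)
    F = (1 + sum(svals) ** 2 / sigma2) / (d + 1)
    return F, G

d = 3
grid = [i / 4 for i in range(5)]                   # steps of 0.25 over [0, 1]
points = [gf_point(s) for s in itertools.product(grid, repeat=d)
          if any(s)]                               # skip the all-zero operator
for F, G in points:
    assert 1 / d - 1e-9 <= G <= 2 / (d + 1) + 1e-9
    assert 2 / (d + 1) - 1e-9 <= F <= 1 + 1e-9
```

Refining the grid (the paper uses steps of 0.01 for d ≤ 6) fills out the allowed region whose boundaries are discussed above.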

Average over Outcomes
Here, the regions that are allowed for the information and disturbance averaged over all possible outcomes are discussed: I and G for the information and F and R for the disturbance. Unfortunately, it is difficult to show the allowed regions directly from their explicit forms written in terms of the singular values because the number of singular values contained in them is not definite due to the indefinite number of outcomes. Note that there are no physical limitations on the number of outcomes.
Instead, we show the allowed regions using the following analogy with the center of mass. In the measurement {M̂_m}, each measurement operator M̂_m corresponds to a point R_m in the allowed region pertaining to a single outcome, with weight p(m). This situation can be viewed as a set of particles, each with a mass p(m) located at a point R_m. The center of mass of these particles then indicates the average information and disturbance of the measurement. Conversely, for an arbitrary set of particles located in the allowed region pertaining to a single outcome, an equivalent measurement satisfying Eq. (1) can be constructed by rescaling and duplicating the measurement operators, as shown in Appendix C. For example, for d = 4, two particles with the same mass 1/2 located at P_1 and P_4 in Fig. 1 can be simulated by a measurement with five outcomes whose measurement operators are (1/√2)|i⟩⟨i| for i = 1, 2, 3, 4 together with (1/√2)Î. Therefore, the allowed region for the average information and disturbance can be shown by considering the center of mass of all possible sets of particles. Note that the center of mass may be located outside the region where the particles are situated, which means that the allowed region is extended by averaging over the outcomes; the resultant region is the convex hull of the original region. The regions extended by averaging are shown in Fig. 1 in yellow (light gray). As shown in Fig. 1(a), the lower boundary for G versus F is extended to the straight lines between P_k and P_{k+1} for k = 1, 2, . . . , d − 1, whereas the upper boundary is not extended owing to its convexity. By contrast, as shown in Fig. 1(b), the boundaries for G versus R are not extended at all. Meanwhile, as shown in Fig. 1(c), the lower boundary for I versus F is extended as in the case of G and, moreover, the upper boundary is extended a little higher when d ≥ 3 because the line (1, d − 1) has a slight dent near P_d; an analytic calculation of M̂^(d)_{1,d−1}(λ) confirms this dent when d ≥ 3.
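One way to realize the rescale-and-duplicate construction numerically is to rescale each operator by √(q_n/σ_n²) and cyclically permute its singular values, which preserves the corresponding point (by the interchange and rescaling invariances) while forcing the completeness relation. The cyclic-shift choice below is an assumption, one concrete realization rather than necessarily the paper's construction.

```python
import math

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def equivalent_measurement(particles, d):
    """Build a measurement equivalent to a set of 'particles' (mass q_n,
    singular values of M_n).  Each particle becomes d operators
    M_ns = sqrt(q_n / sigma_n^2) * D_n * S^s, with S a cyclic basis shift,
    so that sum_ns M_ns^T M_ns = I (one concrete construction; see lead-in)."""
    # cyclic shift: S|j> = |j+1 mod d>
    S = [[1.0 if i == (j + 1) % d else 0.0 for j in range(d)] for i in range(d)]
    ops = []
    for q, svals in particles:
        sigma2 = sum(s * s for s in svals)
        scale = math.sqrt(q / sigma2)
        power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        for _ in range(d):
            D = [[scale * svals[i] if i == j else 0.0 for j in range(d)]
                 for i in range(d)]
            ops.append(matmul(D, power))
            power = matmul(S, power)
    return ops

# d = 4: two equal-mass particles at P_1 (rank-1 projector) and P_4 (identity)
d = 4
ops = equivalent_measurement([(0.5, [1, 0, 0, 0]), (0.5, [1, 1, 1, 1])], d)
total = [[sum(matmul(transpose(M), M)[i][j] for M in ops) for j in range(d)]
         for i in range(d)]
for i in range(d):
    for j in range(d):
        assert abs(total[i][j] - (1.0 if i == j else 0.0)) < 1e-9
```

Merging the degenerate duplicates coming from the identity-like particle reduces the eight operators above to the five-outcome measurement described in the text.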
The upper boundary is therefore extended to the tangent line drawn from P d to the line (1, d − 1) between P d and the point of tangency T. As shown in Fig. 1(d), the upper boundary for I versus R is extended to the straight line between P 1 and P d , whereas the lower boundary is not extended. The case of d = 8 is shown in Fig. 2.
To find the point T on the upper boundary for I versus F, two line slopes are defined as functions of λ: the slope of the tangent line to the line (1, d − 1) at the point Q corresponding to M̂^(d)_{1,d−1}(λ), and the slope of the straight line from P_d to Q. These functions are shown for d = 4 in Fig. 3. Using the value λ_T at which the two slopes coincide, the measurement operator corresponding to T can be written as M̂^(d)_{1,d−1}(λ_T); this is the most efficient measurement operator in terms of the ratio of information gain to fidelity loss [26]. The upper boundary for G versus F and that for G versus R are equivalent to the inequalities of Banaszek [7] and Cheong and Lee [25], respectively, when the averages are explicitly calculated using p(m) = σ_m²/d. However, to our knowledge, this is the first derivation of the other two upper boundaries and of the four lower boundaries. The lower boundaries are less important than the upper boundaries in quantum information and can be violated by non-ideal measurements, which have classical noise yielding mixed post-measurement states, or by non-optimal estimations, in which unwise observers make incorrect choices for |ϕ(m)⟩ in G(m). Nevertheless, for the foundations of quantum mechanics, it is worth deriving both the upper and lower boundaries for ideal measurements with optimal estimation to examine the intrinsic nature and power of quantum measurements.
The case of d = 2 is special: the regions extended by averaging are the main parts of the allowed regions, as shown in Fig. 5. In this case, the allowed regions pertaining to a single outcome shrink to the line (1, 1) because a measurement operator can be represented by a single parameter via the rescaling invariance in Eq. (31) [24]. Moreover, the line (1, 1) in Fig. 5(c) has no dent, unlike the case of d ≥ 3, as an analytic calculation of M̂^(2)_{1,1}(λ) shows.

Optimal Measurement
Finally, we discuss the optimal measurements saturating the upper bounds on the information for a given disturbance. The upper bounds are denoted by the upper boundaries of the allowed regions for the average information and disturbance. Therefore, according to the analogy with the center of mass, a measurement is optimal for an information-disturbance pair if it is equivalent to a set of particles whose center of mass is on the upper boundary for that information-disturbance pair. The optimal measurements are different for the four types of information-disturbance pairs because the upper boundaries have different shapes on the four information-disturbance planes, as shown in Fig. 1.
The conditions for the optimal measurements are as follows. A measurement {M̂_m} is optimal for G versus F if all M̂_m's correspond to an identical point on the line (1, d − 1), as shown in Fig. 1(a), whereas it is optimal for G versus R if every M̂_m corresponds to a point on the line (1, d − 1) because the upper boundary for G versus R is the straight line (1, d − 1), as shown in Fig. 1(b). These conditions are equivalent to those in Refs. [7,25]. Similarly, when d ≥ 3, a measurement {M̂_m} is optimal for I versus F if all M̂_m's correspond to an identical point between T and P_1 on the line (1, d − 1) or if every M̂_m corresponds to either P_d or T, because the upper boundary for I versus F is the union of the convex curve (1, d − 1) between T and P_1 and the straight line between P_d and T, as shown in Fig. 1(c). However, when d = 2, the condition to be optimal for I versus F is the same as that for G versus F because the upper boundary is just the convex curve (1, d − 1), as shown in Fig. 5(c). Conversely, a measurement {M̂_m} is optimal for I versus R if every M̂_m corresponds to either P_d or P_1 because the upper boundary for I versus R is the straight line between P_d and P_1, as shown in Fig. 1(d).
Interestingly, an optimal measurement for G versus F is not necessarily optimal for I versus F, and an optimal measurement for G versus R is not necessarily optimal for I versus R. The relationships between the four conditions are illustrated in Fig. 7, excluding the strongest measurement, where all the measurement operators correspond to P_1, and the weakest measurement, where all the measurement operators correspond to P_d; these two measurements satisfy all four conditions. (Figure 7: Four conditions for optimal measurements. For example, the set G-F represents all measurements that are optimal for G versus F.) As a specific example, consider a measurement {M̂^(d)_m(λ)} with 0 < λ < 1. For a given λ, all M̂^(d)_m(λ)'s correspond to an identical point on the line (1, d − 1), as shown in Fig. 1(c). It is optimal for I versus F only if λ ≤ λ_T, with λ_T being defined by Eq. (43). Note that {M̂^(d)_m(λ)} is not optimal for I versus F if λ > λ_T or, equivalently, if F > F_T. The optimal measurement for this case can easily be constructed from the analogy with the center of mass by considering two particles: one located at T with mass q and the other located at P_d with mass 1 − q. According to Appendix C, the optimal measurement has d + 1 outcomes, whose measurement operators are given in Eq. (47). When d = 2, {M̂^(2)_m(λ)} is optimal for I versus F for arbitrary λ because the line (1, 1) is equal to the upper boundary, as shown in Fig. 5(c).
Conversely, the measurement {M̂^(d)_m(λ)} is not optimal for I versus R for any λ because the line (1, d − 1) is not equal to the upper boundary at all, as shown in Fig. 1(d). In this case, the upper boundary is the straight line between P_d and P_1. Therefore, the optimal measurement for I versus R can be constructed from the analogy with the center of mass by considering two particles: one located at P_1 with mass q and the other located at P_d with mass 1 − q. This measurement has d + 1 outcomes, whose measurement operators are √q |i⟩⟨i| for i = 1, 2, . . . , d together with √(1 − q) Î, where q = 1 − R for a given R. The average information and disturbance of this measurement are indicated by a point on the straight line between P_d and P_1, which is equal to the upper boundary. Of course, the measurements given in Eqs. (47) and (48) are also optimal for G versus R for arbitrary q. Even though their measurement operators correspond to different points on the line (1, d − 1), the point indicating the average values is still on the line (1, d − 1), which is equal to the upper boundary, because the line (1, d − 1) is straight, as shown in Fig. 1(b). However, except for q = 0 or 1, the measurement in Eq. (47) is optimal neither for G versus F nor for I versus R, and the measurement in Eq. (48) is optimal neither for G versus F nor for I versus F.

Summary
In summary, we have shown the allowed regions for information versus disturbance for quantum measurements of completely unknown states. The information and disturbance pertaining to a single outcome are quantified using the singular values of the measurement operator and are plotted on four types of information-disturbance planes to show the allowed regions pertaining to a single outcome. The allowed regions for the average values are also discussed via an analogy with the center of mass. These regions explicitly give not only the upper bounds but also the lower bounds on the information for a given disturbance, together with the optimal measurements saturating the upper bounds. Consequently, our results broaden our perspective on quantum measurements and provide a useful tool for quantum information processing and communication.

B Mathematical Optimization
Here, the mathematical optimizations of the information for a given disturbance are outlined for I(m) versus F(m) and for I(m) versus R(m), based on the method of Lagrange multipliers and its generalization known as the Karush-Kuhn-Tucker (KKT) conditions [39]. To find the lower boundary for I(m) versus F(m), consider minimizing I(m) subject to F(m) = F_0 and λ_mi ≥ 0 (i = 1, 2, . . . , d) using a Lagrange function L_F = −I(m) − α_F [F(m) − F_0] − Σ_i β_i λ_mi with multipliers α_F and {β_i}. The stationarity conditions obtained from the derivatives of I(m) and F(m) for M̂^(d)_{k,l}(λ) determine the multipliers, and the resulting {β_i} satisfy the requirements as multipliers for the inequality constraints, β_i ≥ 0 and β_i λ_mi = 0, for all i. There exists a parameter λ_0 such that F(m) for M̂^(d)_{k,1}(λ_0) is equal to F_0 if (k + 1)/(d + 1) ≤ F_0 ≤ (k + 2)/(d + 1). That is, M̂^(d)_{k,1}(λ_0) satisfies a necessary condition for a local minimum according to the KKT conditions. This result implies that the line (k, 1) is the lower boundary for I(m) versus F(m).
Similarly, letting λ_{m,min} = λ_md, consider maximizing I(m) subject to R(m) = R_0 and λ_mi − λ_md ≥ 0 (i = 1, 2, . . . , d − 1) using a Lagrange function L_R = −I(m) − α_R [R(m) − R_0] − Σ_i γ_i (λ_mi − λ_md) with multipliers α_R and {γ_i}. The derivatives of R(m) for M̂^(d)_{k,l}(λ) can be written when k + l = d, and the resulting stationarity conditions are satisfied along the corresponding boundary line.

C Equivalent Measurement

Here, we construct a measurement that is equivalent to a given set of particles located in the allowed region pertaining to a single outcome. The construction is not trivial because a measurement operator not only corresponds to a point but also gives the weight at that point; moreover, the measurement operators must satisfy Eq. (1). Consider a set of particles, where each particle n has a mass q_n and is located at a point R_n in the allowed region pertaining to a single outcome. Without loss of generality, the total mass can be assumed to be Σ_n q_n = 1. By definition, there exists a measurement operator M̂_n with singular values {λ_ni} that corresponds to the point R_n. In general, its weight p(n) = σ_n²/d is not equal to the mass q_n. However, the weight can be adjusted by rescaling and duplicating M̂_n. That is, for a particle n, d measurement operators {M̂_ns} with s = 1, 2, . . . , d are introduced by rescaling M̂_n by the factor √(q_n/σ_n²) and cyclically permuting its singular values; both operations leave the corresponding point R_n unchanged, and the operators satisfy Eq. (1) in the sense that Σ_{n,s} M̂†_ns M̂_ns = Σ_n q_n Î = Î when a pair of indices (n, s) is regarded as an outcome m. Therefore, {M̂_ns} is a measurement equivalent to the set of particles.
In this construction, one particle corresponds to d outcomes, even though the number of outcomes can be reduced when some singular values are degenerate. As a result, it suffices to consider measurements having at most 2d outcomes to study the allowed regions for the average values because for any point in the region there exists a set of two particles whose center of mass is located at that point.