1 Introduction

In quantum theory, a measurement that provides information about a system inevitably disturbs the state of the system, unless the original state is a classical mixture of eigenstates of the measured observable. This feature is not only of great interest to the foundations of quantum mechanics but also plays an important role in quantum information processing and communication [1], such as in quantum cryptography [2,3,4,5]. As a result, the relationship between information and disturbance has been the subject of numerous studies [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22] over many years. Most studies have discussed the disturbance only in terms of the size of the state change. However, the disturbance can also be discussed in terms of the reversibility of the state change [23,24,25,26], because the state change can be recovered with a nonzero probability of success if the measurement is physically reversible [27,28,29].

Intuitively, if a measurement provides more information about a system, it changes the state of the system to a greater degree, and the change becomes more irreversible. To capture this trade-off, various inequalities have been derived using different formulations. For example, Banaszek [7] derived an inequality between the amount of information gain and the size of the state change using two fidelities, and Cheong and Lee [25] derived an inequality between the amount of information gain and the reversibility of the state change using the fidelity and reversal probability. These inequalities have been verified in single-photon experiments [30,31,32,33].

In this paper, we present graphs of information versus disturbance for general quantum measurements of a d-level system in a completely unknown state. The information is quantified by the Shannon entropy [6] and the estimation fidelity [7], whereas the disturbance is quantified by the operation fidelity [7] and the physical reversibility [34]. These metrics are calculated for a single outcome using the general formulas derived in Ref. [26] and are plotted on four types of information–disturbance planes to show the allowed regions. Moreover, we show the allowed regions for these metrics averaged over all possible outcomes via an analogy with the center of mass. The allowed regions explain the structure of the relationship between information and disturbance, including both the upper and lower bounds on the information for a given disturbance, even though the lower bounds can be violated by non-quantum effects such as classical noise and the observer's non-optimal estimation. In particular, the optimal measurements saturating the upper bounds are shown to differ among the four types of information–disturbance pairs. Therefore, our results broaden our understanding of the effects of quantum measurements and provide a useful tool for quantum information processing and communication.

Two of the above bounds were previously shown by Banaszek [7] and by Cheong and Lee [25] as inequalities for the average values, via methods different from ours. The most important difference is that they directly discussed the information and disturbance averaged over outcomes, whereas we start with those pertaining to each single outcome, derived [26] in the context of a physically reversible measurement [27,28,29]. Even though trade-offs between information and disturbance are conventionally discussed using the average values [6,7,9,10,16,18], physically reversible measurements strongly imply trade-offs at the level of a single outcome [11]. That is, in a physically reversible measurement, whenever a second measurement, called the reversing measurement, recovers the pre-measurement state of the first measurement, it erases all the information obtained by the first measurement (see the Erratum of Refs. [35,36]). This state recovery with information erasure occurs not on average but only when the reversing measurement yields a preferred single outcome.

Moreover, starting from the level of a single outcome greatly simplifies the derivation of the allowed regions and optimal measurements. The allowed regions pertaining to a single outcome are easy to show because the corresponding information and disturbance contain only a definite number of bounded parameters and have some useful invariances under parameter transformations. From these regions, the allowed regions for the average values are obtained using a graphical method based on an analogy with the center of mass, which also makes it easy to construct the optimal measurements. In fact, without our method, it would be difficult to find all of the bounds and optimal measurements.

The rest of this paper is organized as follows. Section 2 reviews the procedure for quantifying the information and disturbance in quantum measurements. Sections 3 and 4 show the allowed regions for information and disturbance pertaining to a single outcome and those for the values averaged over all possible outcomes, respectively. Section 5 discusses the optimal measurements to show their differences for the four types of information–disturbance pairs. Section 6 summarizes our results.

2 Information and disturbance

First, the amount of information provided by a measurement is quantified. Suppose that the d-level system to be measured is known to be in one of a set of predefined pure states \(\{| \psi (a) \rangle \}\). The probability for \(| \psi (a) \rangle \) is given by p(a); however, which \(| \psi (a) \rangle \) is actually assigned to the system is unknown. Here we focus on the case where no prior information concerning the system is available, assuming that \(\{| \psi (a) \rangle \}\) is the set of all possible pure states and that p(a) is uniform according to the normalized invariant measure over the pure states. Because \(\{| \psi (a) \rangle \}\) in this case is a continuous set of states, the index a actually represents a set of continuous parameters, such as the hyperspherical coordinates in 2d dimensions used in Ref. [26], where the summation over a is replaced with an integral over the coordinates using the hyperspherical volume element.

A quantum measurement is performed to obtain information about the state of the system. It can be described by a set of measurement operators \(\{\hat{M}_{m}\}\) [1] that satisfy

$$\begin{aligned} \sum _{m}\hat{M}_{m}^\dagger \hat{M}_{m}=\hat{I}, \end{aligned}$$
(1)

where m denotes the outcome of the measurement and \(\hat{I}\) is the identity operator. Here, the quantum measurement has been assumed to be ideal [37] or efficient [8] in the sense that it does not have classical noise yielding mixed post-measurement states because we focus on the quantum nature of the measurement. When the system is in a state \(| \psi (a) \rangle \), the measurement \(\{\hat{M}_{m}\}\) yields an outcome m with probability

$$\begin{aligned} p(m|a)=\langle \psi (a) |\hat{M}_{m}^\dagger \hat{M}_{m}| \psi (a) \rangle , \end{aligned}$$
(2)

changing the state into

$$\begin{aligned} | \psi (m,a) \rangle =\frac{1}{\sqrt{p(m|a)}}\,\hat{M}_{m}| \psi (a) \rangle . \end{aligned}$$
(3)

Each measurement operator can be decomposed via a singular-value decomposition as

$$\begin{aligned} \hat{M}_{m}=\hat{U}_{m}\hat{D}_{m}\hat{V}_{m}, \end{aligned}$$
(4)

where \(\hat{U}_{m}\) and \(\hat{V}_{m}\) are unitary operators and \(\hat{D}_{m}\) is a diagonal operator in an orthonormal basis \(\{| i \rangle \}\) with \(i=1,2,\ldots ,d\) such that

$$\begin{aligned} \hat{D}_{m}=\sum _{i} \lambda _{mi} | i \rangle \langle i |. \end{aligned}$$
(5)

The diagonal elements \(\{\lambda _{mi}\}\) are called the singular values of \(\hat{M}_{m}\) and satisfy \(0\le \lambda _{mi}\le 1\).
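
As a minimal numerical sketch of Eqs. (4)–(5), a standard SVD routine returns the unitaries and singular values directly; the 3-level operator below is a hypothetical example, not taken from the text.

```python
import numpy as np

# Hypothetical 3-level measurement operator (not from the text), chosen so
# that its singular values do not exceed 1.
M = np.array([[0.6, 0.2, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.2, 0.4]])

# Eq. (4): M = U D V with D diagonal; numpy returns M = U @ diag(s) @ Vh.
U, s, Vh = np.linalg.svd(M)
print(s)                                      # singular values lambda_{mi}
print(np.allclose(M, U @ np.diag(s) @ Vh))    # True: Eq. (4) reconstructed

# A valid measurement operator requires all lambda_{mi} <= 1, so that
# I - M†M is positive semidefinite and Eq. (1) can be completed.
print(np.all(s <= 1.0))                       # True for this example
```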

From the outcome m, the state of the system can be partially deduced. For example, Bayes’s rule states that, given an outcome m, the probability that the state was \(| \psi (a) \rangle \) is given by

$$\begin{aligned} p(a|m) =\frac{p(m|a)\,p(a)}{p(m)}, \end{aligned}$$
(6)

where p(m) is the total probability of outcome m,

$$\begin{aligned} p(m) =\sum _a p(m|a)\,p(a). \end{aligned}$$
(7)

That is, the outcome m changes the probability distribution for the states from \(\{p(a)\}\) to \(\{p(a|m)\}\). This change decreases the Shannon entropy, which is known as a measure of the lack of information:

$$\begin{aligned} I(m)&=\left[ -\sum _a p(a)\log _2 p(a)\right] \nonumber \\&\quad -\left[ -\sum _a p(a|m)\log _2 p(a|m)\right] . \end{aligned}$$
(8)

Therefore, I(m), which we define as the information gain, quantifies the amount of information provided by the outcome m of the measurement \(\{\hat{M}_{m}\}\) [11, 38] and is explicitly written in terms of the singular values of \(\hat{M}_{m}\) as [26]

$$\begin{aligned} I(m)&=\log _2d-\frac{1}{\ln 2}\Bigl [\eta (d)- 1\Bigr ] \nonumber \\&\quad {}-\log _2\sigma _{m}^2 +\frac{1}{\sigma _{m}^2} \sum _{i}\frac{\lambda _{mi}^{2d}\log _2\lambda _{mi}^2}{\prod _{k\ne i}\left( \lambda _{mi}^2-\lambda _{mk}^2\right) }, \end{aligned}$$
(9)

where

$$\begin{aligned} \eta (n) =\sum ^{n}_{k=1}\frac{1}{k}, \quad \sigma _{m}^2 =\sum _{i}\lambda _{mi}^2. \end{aligned}$$
(10)

Note that I(m) satisfies

$$\begin{aligned} 0\le I(m) \le \log _2d-\frac{1}{\ln 2}[\eta (d)- 1]. \end{aligned}$$
(11)
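
To illustrate how Eq. (9) is evaluated, the following sketch assumes pairwise-distinct, strictly positive singular values (degenerate values require the limits discussed below); the sample values are hypothetical.

```python
import numpy as np
from math import log, log2

def eta(n):
    """Harmonic number, Eq. (10)."""
    return sum(1.0 / k for k in range(1, n + 1))

def info_gain(lam):
    """I(m) from Eq. (9); assumes distinct, positive singular values."""
    lam = np.asarray(lam, dtype=float)
    d = len(lam)
    s2 = np.sum(lam**2)                        # sigma_m^2, Eq. (10)
    I = log2(d) - (eta(d) - 1.0) / log(2) - log2(s2)
    for i in range(d):
        prod = np.prod([lam[i]**2 - lam[k]**2 for k in range(d) if k != i])
        I += lam[i]**(2 * d) * log2(lam[i]**2) / (s2 * prod)
    return I

lam = [0.9, 0.6, 0.35, 0.1]                    # hypothetical example, d = 4
print(info_gain(lam))                          # lies in the range of Eq. (11)
print(log2(4) - (eta(4) - 1.0) / log(2))       # upper bound of Eq. (11), ~0.44 bits
```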

The average of I(m) over all outcomes,

$$\begin{aligned} I=\sum _{m} p(m)\,I(m), \end{aligned}$$
(12)

is equal to the mutual information [6] between the random variables \(\{a\}\) and \(\{m\}\),

$$\begin{aligned} I=\sum _{m,a} p(m,a)\, \log _2\frac{p(m,a)}{p(m)\,p(a)} \end{aligned}$$
(13)

with \(p(m,a)=p(m|a)\,p(a)\) because p(a) is uniform.

Alternatively, the state of the system can be estimated as a state \(| \varphi (m) \rangle \) depending on the outcome m. In the optimal estimation [7], \(| \varphi (m) \rangle \) is the eigenvector of \(\hat{M}_{m}^\dagger \hat{M}_{m}\) corresponding to its maximum eigenvalue. The quality of the estimate is evaluated by the estimation fidelity such that

$$\begin{aligned} G(m) =\sum _a p(a|m)\,\bigl |\langle \varphi (m)|\psi (a) \rangle \bigr |^2. \end{aligned}$$
(14)

As was found for I(m), G(m) also quantifies the amount of information provided by the outcome m of the measurement \(\{\hat{M}_{m}\}\) [cf. Eq. (8)] and is explicitly written in terms of the singular values of \(\hat{M}_{m}\) as [26]

$$\begin{aligned} G(m) =\frac{1}{d+1}\left( \frac{\sigma _{m}^2 +\lambda _{m,\max }^2}{\sigma _{m}^2}\right) , \end{aligned}$$
(15)

where \(\lambda _{m,\max }\) is the maximum singular value of \(\hat{M}_{m}\). Note that G(m) satisfies

$$\begin{aligned} \frac{1}{d}\le G(m) \le \frac{2}{d+1}. \end{aligned}$$
(16)
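
Equation (15) can also be checked against the definition in Eq. (14) by Monte Carlo integration over Haar-random pure states. A sketch, with hypothetical singular values and \(\hat{U}_{m}=\hat{V}_{m}=\hat{I}\) so that \(| \varphi (m) \rangle \) is simply the basis vector belonging to the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
lam = np.array([1.0, 0.7, 0.4, 0.2])    # hypothetical singular values of M_m
# With U_m = V_m = I, M_m is diagonal and the optimal estimate |phi(m)> is
# the basis vector belonging to the largest singular value (index 0).

n = 200_000
psi = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)       # Haar-random states

p = np.sum(lam**2 * np.abs(psi)**2, axis=1)             # p(m|a), Eq. (2)
G_mc = np.sum(p * np.abs(psi[:, 0])**2) / np.sum(p)     # Eq. (14) with Eq. (6)

s2 = np.sum(lam**2)
print(G_mc)                                  # agrees up to sampling error (~1e-3)
print((s2 + lam.max()**2) / ((d + 1) * s2))  # Eq. (15)
```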

The average of G(m) over all outcomes,

$$\begin{aligned} G=\sum _{m} p(m)\,G(m), \end{aligned}$$
(17)

becomes the mean estimation fidelity discussed in Ref. [7] because

$$\begin{aligned} p(m)=\frac{\sigma _{m}^2}{d}, \quad \sum _{m} \sigma _{m}^2=d, \end{aligned}$$
(18)

even though G(m) was not derived in Ref. [7]. Note that G can be derived from G(m); however, G(m) cannot be derived from G. That is, G(m) characterizes the measurement \(\{\hat{M}_{m}\}\) in more detail than G.

Next, the degree of disturbance caused by the measurement is quantified. When the measurement \(\{\hat{M}_{m}\}\) yields an outcome m, the state of the system changes from \(| \psi (a) \rangle \) to \(| \psi (m,a) \rangle \), as given in Eq. (3). The size of this state change is evaluated by the operation fidelity such that

$$\begin{aligned} F(m) =\sum _a p(a|m)\bigl |\langle \psi (a)|\psi (m,a) \rangle \bigr |^2. \end{aligned}$$
(19)

F(m) quantifies the degree of disturbance caused when the measurement \(\{\hat{M}_{m}\}\) yields the outcome m and is explicitly written in terms of the singular values of \(\hat{M}_{m}\) as [26]

$$\begin{aligned} F(m) =\frac{1}{d+1}\left( \frac{\sigma _{m}^2 +\tau _{m}^2}{\sigma _{m}^2}\right) , \end{aligned}$$
(20)

where

$$\begin{aligned} \tau _{m} =\sum _{i}\lambda _{mi}. \end{aligned}$$
(21)

Note that F(m) satisfies

$$\begin{aligned} \frac{2}{d+1}\le F(m) \le 1. \end{aligned}$$
(22)

Similar to G(m), the average of F(m) over all outcomes,

$$\begin{aligned} F=\sum _{m} p(m)\,F(m), \end{aligned}$$
(23)

becomes the mean operation fidelity discussed in Ref. [7], even though F(m) was not derived in Ref. [7].

In addition to the size of the state change, the reversibility of the state change can also be regarded as a measure of the disturbance. Even though \(| \psi (a) \rangle \) and \(| \psi (m,a) \rangle \) are unknown, this state change is physically reversible if \(\hat{M}_{m}\) has a bounded left inverse \(\hat{M}_{m}^{-1}\) [28, 29]. To recover \(| \psi (a) \rangle \), a second measurement called a reversing measurement is made on \(| \psi (m,a) \rangle \). The reversing measurement is described by another set of measurement operators \(\{\hat{R}_\mu ^{(m)}\}\) that satisfy

$$\begin{aligned} \sum _\mu \hat{R}^{(m)\dagger }_\mu \hat{R}^{(m)}_\mu =\hat{I}, \end{aligned}$$
(24)

and, moreover, \(\hat{R}^{(m)}_{\mu _0}\propto \hat{M}_{m}^{-1}\) for a particular \(\mu =\mu _0\), where \(\mu \) denotes the outcome of the reversing measurement. When the reversing measurement yields the preferred outcome \(\mu _0\), the state of the system reverts to \(| \psi (a) \rangle \) via the state change caused by the reversing measurement because \(\hat{R}_{\mu _0}^{(m)}\hat{M}_{m}\propto \hat{I}\). For the optimal reversing measurement [34], the probability of recovery is given by

$$\begin{aligned} R(m,a)=\frac{\lambda _{m,\min }^2}{p(m|a)}, \end{aligned}$$
(25)

where \(\lambda _{m,\min }\) is the minimum singular value of \(\hat{M}_{m}\). The reversibility of the state change is then evaluated by this maximum success probability as

$$\begin{aligned} R(m) = \sum _a p(a|m)\,R(m,a). \end{aligned}$$
(26)
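
To make the reversal concrete, here is a minimal sketch (hypothetical singular values, \(\hat{U}_{m}=\hat{V}_{m}=\hat{I}\)): the reversing operator for the preferred outcome is \(\hat{M}_{m}^{-1}\) scaled by \(\lambda _{m,\min }\), so that its largest singular value is 1, and the success probability reproduces Eq. (25).

```python
import numpy as np

lam = np.array([0.9, 0.6, 0.3])                # hypothetical, d = 3
M = np.diag(lam)                               # measurement operator (U_m = V_m = I)

# Reversing operator for the preferred outcome mu_0: R0 = lam_min * M^{-1},
# scaled so that I - R0†R0 >= 0 and Eq. (24) can be completed.
R0 = lam.min() * np.linalg.inv(M)

psi = np.array([0.5, 0.5j, np.sqrt(0.5)])      # some pure state |psi(a)>
p_m = np.linalg.norm(M @ psi)**2               # p(m|a), Eq. (2)
post = M @ psi / np.sqrt(p_m)                  # |psi(m,a)>, Eq. (3)

p_rev = np.linalg.norm(R0 @ post)**2           # probability of outcome mu_0
recovered = R0 @ post / np.sqrt(p_rev)
print(np.isclose(np.abs(np.vdot(recovered, psi)), 1.0))   # True: state recovered
print(p_rev, lam.min()**2 / p_m)               # both equal R(m,a), Eq. (25)
```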

As was found for F(m), R(m) also quantifies the degree of disturbance caused when the measurement \(\{\hat{M}_{m}\}\) yields the outcome m [cf. Eq. (19)] and is explicitly written in terms of the singular values of \(\hat{M}_{m}\) as [26]

$$\begin{aligned} R(m) =d\left( \frac{\lambda _{m,\min }^2 }{\sigma _{m}^2}\right) . \end{aligned}$$
(27)

Note that R(m) satisfies

$$\begin{aligned} 0\le R(m) \le 1. \end{aligned}$$
(28)

The average of R(m) over all outcomes,

$$\begin{aligned} R=\sum _{m} p(m)\,R(m), \end{aligned}$$
(29)

is the degree of physical reversibility of a measurement discussed in Ref. [34], whose explicit form in terms of the singular values is given in Ref. [25], even though R(m) was not derived in Ref. [25].

Therefore, the information and disturbance for a single outcome m are obtained as functions of the singular values of \(\hat{M}_{m}\): I(m) and G(m) for the information and F(m) and R(m) for the disturbance. Note that they are invariant under the interchange of any pair of singular values,

$$\begin{aligned} \lambda _{mi} \longleftrightarrow \lambda _{mj} \quad \text {for any}\, (i,j), \end{aligned}$$
(30)

and under rescaling of all the singular values,

$$\begin{aligned} \lambda _{mi} \longrightarrow c\lambda _{mi} \quad \text {for all}\, i, \end{aligned}$$
(31)

by a constant c [26]. By contrast, the probability for the outcome m, \(p(m)=\sigma _{m}^2/d\), is invariant under the interchange but is not invariant under the rescaling.
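
These invariances are easy to verify numerically. The sketch below evaluates G(m), F(m), R(m), and p(m) from Eqs. (15), (20), (27), and (18) for hypothetical singular values; I(m) in Eq. (9) behaves in the same way.

```python
import numpy as np

def metrics(lam):
    """(G(m), F(m), R(m), p(m)) from Eqs. (15), (20), (27), (18)."""
    lam = np.asarray(lam, dtype=float)
    d = len(lam)
    s2 = np.sum(lam**2)                              # sigma_m^2
    G = (s2 + lam.max()**2) / ((d + 1) * s2)         # Eq. (15)
    F = (s2 + np.sum(lam)**2) / ((d + 1) * s2)       # Eqs. (20)-(21)
    R = d * lam.min()**2 / s2                        # Eq. (27)
    p = s2 / d                                       # Eq. (18)
    return G, F, R, p

lam = [0.9, 0.5, 0.3, 0.1]                           # hypothetical, d = 4
print(metrics(lam))
print(metrics([0.5, 0.9, 0.1, 0.3]))                 # interchange, Eq. (30)
print(metrics([0.5 * x for x in lam]))               # rescaling, Eq. (31)
# G, F, R are identical in all three cases; p(m) is unchanged under the
# interchange but scales by c^2 = 0.25 under the rescaling.
```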

As an important example, consider \(\hat{M}^{(d)}_{k,l}(\lambda )\), which is defined as a measurement operator whose singular values are

$$\begin{aligned} \underbrace{1,1,\ldots ,1}_{k}, \underbrace{\lambda ,\lambda ,\ldots ,\lambda }_{l}, \underbrace{0,0,\ldots ,0}_{d-k-l} \end{aligned}$$
(32)

with \(0\le \lambda \le 1\). Even though the information and disturbance for \(\hat{M}^{(d)}_{k,l}(\lambda )\) can be calculated from Eqs. (9), (15), (20), and (27), calculating I(m) is not straightforward due to the degeneracy of the singular values. By taking the limit \(\lambda _{mi}\rightarrow \lambda _{mk}\), I(m) is found to be [26]

$$\begin{aligned} I(m)&=\log _2d -\frac{1}{\ln 2}\Bigl [\eta (d)- 1\Bigr ]- \log _2\left( k+\lambda ^2\right) \nonumber \\&\quad \,\, +\frac{1}{k+\lambda ^2}\left[ \frac{\lambda ^{2(k+1)}\log _2\lambda ^2}{(\lambda ^2-1)^k} -\sum _{n=0}^{k-1}\frac{a^{(k+1)}_n}{(\lambda ^2-1)^{k-n}} \right] \end{aligned}$$
(33)

for \(\hat{M}^{(d)}_{k,1}(\lambda )\) and

$$\begin{aligned} I(m)&=\log _2d -\frac{1}{\ln 2}\Bigl [\eta (d)- 1\Bigr ] \nonumber \\&{}\quad -\log _2\left( 1+l\lambda ^2\right) -\frac{1}{1+l\lambda ^2} \sum _{n=0}^{l-1}\frac{c^{(l+1)}_n(\lambda )}{(1-\lambda ^2)^{l-n}} \end{aligned}$$
(34)

for \(\hat{M}^{(d)}_{1,l}(\lambda )\), where \(\{a^{(j)}_n\}\) and \(\{c^{(j)}_n(\lambda )\}\) are given by

$$\begin{aligned} a^{(j)}_n&=\frac{1}{\ln 2}\left( {\begin{array}{c}j\\ n\end{array}}\right) \Bigl [\eta (j)- \eta (j-n)\Bigr ], \end{aligned}$$
(35)
$$\begin{aligned} c^{(j)}_n(\lambda )&=\lambda ^{2(j-n)} \left[ \left( {\begin{array}{c}j\\ n\end{array}}\right) \log _2 \lambda ^2+a^{(j)}_n\right] . \end{aligned}$$
(36)

Similarly, \(\hat{P}^{(d)}_{r}\) is defined as a projective measurement operator of rank r. Note that \(\hat{M}^{(d)}_{k,l}(0)=\hat{P}^{(d)}_{k}\), \(\hat{M}^{(d)}_{k,l}(1)=\hat{P}^{(d)}_{k+l}\), and \(\hat{P}^{(d)}_{d}=\hat{I}\). For \(\hat{P}^{(d)}_{r}\), I(m) is found to be [39]

$$\begin{aligned} I(m)=\log _2\frac{d}{r}-\frac{1}{\ln 2}\Bigl [\eta (d)- \eta (r)\Bigr ]. \end{aligned}$$
(37)
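
As a consistency check on these limits, evaluating Eq. (33) near \(\lambda =0\) and \(\lambda =1\) should reproduce Eq. (37) for \(\hat{P}^{(d)}_{k}\) and \(\hat{P}^{(d)}_{k+1}\), respectively; a sketch:

```python
from math import comb, log, log2

def eta(n):
    return sum(1.0 / k for k in range(1, n + 1))

def a_coef(j, n):
    """Eq. (35)."""
    return comb(j, n) * (eta(j) - eta(j - n)) / log(2)

def info_k1(d, k, lam):
    """I(m) for M^(d)_{k,1}(lam), Eq. (33); requires 0 < lam < 1."""
    s2 = k + lam**2
    bracket = lam**(2 * (k + 1)) * log2(lam**2) / (lam**2 - 1)**k
    bracket -= sum(a_coef(k + 1, n) / (lam**2 - 1)**(k - n) for n in range(k))
    return log2(d) - (eta(d) - 1.0) / log(2) - log2(s2) + bracket / s2

def info_proj(d, r):
    """I(m) for a rank-r projector P^(d)_r, Eq. (37)."""
    return log2(d / r) - (eta(d) - eta(r)) / log(2)

d, k = 4, 2
print(info_k1(d, k, 1e-8), info_proj(d, k))          # lam -> 0: approaches P_k
print(info_k1(d, k, 1 - 1e-6), info_proj(d, k + 1))  # lam -> 1: approaches P_{k+1}
```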

3 Allowed region

Next, we plot the information and disturbance for various measurement operators on a plane. In particular, an allowed region for information versus disturbance can be shown on the plane by plotting all physically possible measurement operators, that is, by varying every singular value over the range \(0\le \lambda _{mi}\le 1\). It is easy to do this for I(m), G(m), F(m), and R(m) because they contain only a definite number of bounded parameters, i.e., d singular values, in contrast to I, G, F, and R. Moreover, from the interchange invariance in Eq. (30), measurement operators having the same singular values up to ordering correspond to the same point on the plane. Similarly, by the rescaling invariance in Eq. (31), \(\hat{M}_{m}\) and \(c\hat{M}_{m}\) correspond to the same point on the plane.

Figure 1a shows the allowed region for G(m) versus F(m) when \(d=4\) in blue (dark gray). In the figure, P\(_r\) and (k, l) represent the point corresponding to \(c\hat{P}^{(d)}_{r}\) and the line corresponding to \(c\hat{M}^{(d)}_{k,l}(\lambda )\) with \(0\le \lambda \le 1\), respectively. The upper boundary consists of one curved line \((1,d-1)\) connecting P\(_1\) and P\(_d\) as \(\lambda \) varies from 0 to 1, whereas the lower boundary consists of \(d-1\) curved lines (k, 1) connecting P\(_k\) to P\(_{k+1}\) for \(k=1,2,\ldots ,d-1\). Similarly, Fig. 1b shows the allowed region for G(m) versus R(m) when \(d=4\) in blue (dark gray). In this case, both the upper and lower boundaries consist of one straight line: \((1,d-1)\) for the upper boundary and \((d-1,1)\) for the lower boundary. Likewise, Fig. 1c, d shows the allowed regions for I(m) versus F(m) and for I(m) versus R(m), respectively. The measurement operators corresponding to the upper and lower boundaries are the same as for G(m), even though the lines have different shapes. Figure 2 shows the allowed regions when \(d=8\) in blue (dark gray).

Fig. 1

Four allowed regions for information versus disturbance for \(d=4\): a estimation fidelity G(m) versus operation fidelity F(m), b estimation fidelity G(m) versus physical reversibility R(m), c information gain I(m) versus operation fidelity F(m), and d information gain I(m) versus physical reversibility R(m). In each panel, the region pertaining to a single outcome is shown in blue (dark gray), and the extended region obtained by averaging over all outcomes is shown in yellow (light gray) (Color figure online)

Fig. 2

Four allowed regions for information versus disturbance for \(d=8\): a estimation fidelity G(m) versus operation fidelity F(m), b estimation fidelity G(m) versus physical reversibility R(m), c information gain I(m) versus operation fidelity F(m), and d information gain I(m) versus physical reversibility R(m). In each panel, the region pertaining to a single outcome is shown in blue (dark gray), and the extended region obtained by averaging over all outcomes is shown in yellow (light gray) (Color figure online)

The above boundaries, \((1,d-1)\) and (k, 1), were first confirmed by brute-force numerical calculations in which every singular value was varied in steps of \(\varDelta \lambda _{mi}=0.01\) for \(d=2,3,\ldots ,6\) and \(\varDelta \lambda _{mi}=0.02\) for \(d=7,8\). Moreover, for G(m) versus F(m) and for G(m) versus R(m), the boundaries can be proven analytically to be the true boundaries for arbitrary d (see “Appendix A”). However, for I(m) versus F(m) and for I(m) versus R(m), it is difficult to prove analytically that the boundaries are the true boundaries. Nevertheless, they can be shown to satisfy the necessary conditions for the true boundaries using the Karush–Kuhn–Tucker (KKT) conditions [40], which generalize the method of Lagrange multipliers to handle inequality constraints in mathematical optimization. For example, to find the lower boundary for I(m) versus F(m), consider minimizing I(m) subject to \(F(m)=F_0\) and \(\lambda _{mi}\ge 0\) (\(i=1,2,\ldots ,d\)). Then, \(\hat{M}^{(d)}_{k,1}(\lambda )\) satisfies a necessary condition for a local minimum; that is, for the Lagrange function

$$\begin{aligned} L_F=I(m)- \alpha _F \left[ F(m)-F_0\right] -\sum _i\beta _i\lambda _{mi}, \end{aligned}$$
(38)

\(\hat{M}^{(d)}_{k,1}(\lambda )\) satisfies \(\partial L_F/\partial \lambda _{mi}=0\) with KKT multipliers \(\alpha _F\) and \(\{\beta _i\}\) such that \(\beta _i\ge 0\) and \(\beta _i\lambda _{mi}=0\) for all i, and it has \(\lambda =\lambda _0\) such that \(F(m)=F_0\) if \((k+1)/(d+1)\le F_0\le (k+2)/(d+1)\). These mathematical optimizations are explained in “Appendix B”.
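
The brute-force confirmation can be reproduced in a few lines. The sketch below scans a coarser grid (\(\varDelta \lambda _{mi}=0.05\)) for the G(m)–F(m) plane of Fig. 1a, using the interchange invariance in Eq. (30) to restrict the scan to ordered tuples.

```python
import itertools
import numpy as np

def gf_point(lam, d):
    """(F(m), G(m)) from Eqs. (20) and (15)."""
    s2 = np.sum(lam**2)
    F = (s2 + np.sum(lam)**2) / ((d + 1) * s2)
    G = (s2 + lam.max()**2) / ((d + 1) * s2)
    return F, G

d, step = 4, 0.05
grid = np.arange(0.0, 1.0 + step / 2, step)
points = []
# interchange invariance (Eq. 30) lets us scan ordered tuples only
for lam in itertools.combinations_with_replacement(grid, d):
    lam = np.array(lam)
    if lam.max() == 0.0:
        continue                    # the zero operator is not a measurement
    points.append(gf_point(lam, d))
points = np.array(points)
# 'points' fills the single-outcome region of Fig. 1a; to see it, e.g.:
# import matplotlib.pyplot as plt
# plt.scatter(points[:, 0], points[:, 1], s=1); plt.show()
```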

4 Average over outcomes

Here, the regions that are allowed for the information and disturbance averaged over all possible outcomes are discussed: I and G for the information and F and R for the disturbance. Unfortunately, it is difficult to show the allowed regions directly from their explicit forms written in terms of the singular values because the number of singular values contained in them is not definite due to the indefinite number of outcomes. Note that there are no physical limitations on the number of outcomes.

Instead, we show the allowed regions using the following analogy with the center of mass. In the measurement \(\{\hat{M}_{m}\}\), each measurement operator \(\hat{M}_{m}\) corresponds to a point R\(_{m}\) in the allowed region pertaining to a single outcome with weight p(m). This situation can be viewed as a set of particles, each with a mass p(m) located at a point R\(_{m}\). The center of mass of these particles then indicates the average information and disturbance of the measurement. Conversely, for an arbitrary set of particles located in the allowed region pertaining to a single outcome, an equivalent measurement satisfying Eq. (1) can be constructed by rescaling and duplicating the measurement operators, as shown in “Appendix C”. For example, for \(d=4\), two particles with the same mass 1/2 located at P\(_1\) and P\(_4\) in Fig. 1 can be simulated by a measurement with five outcomes whose measurement operators are

$$\begin{aligned} \hat{M}_{m} ={\left\{ \begin{array}{ll} \frac{1}{\sqrt{2}}\,| m \rangle \langle m | &{} \quad {(m=1,2,3,4)} \\ \frac{1}{\sqrt{2}}\,\hat{I} &{} \quad {(m=5)}. \end{array}\right. } \end{aligned}$$
(39)

Therefore, the allowed region for the average information and disturbance can be shown by considering the center of mass of all possible sets of particles. Note that the center of mass may be located outside the region where the particles are situated, which means that the allowed region is extended by averaging over the outcomes. The resultant region is the convex hull of the original region.
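
The construction in Eq. (39) is easy to verify: the five operators satisfy Eq. (1), and by Eq. (18) the outcome weights reproduce the two masses of 1/2; a sketch:

```python
import numpy as np

d = 4
basis = np.eye(d)
# Eq. (39): four rescaled rank-1 projectors (each at P_1) plus a rescaled
# identity (at P_4)
ops = [np.sqrt(0.5) * np.outer(basis[m], basis[m]) for m in range(d)]
ops.append(np.sqrt(0.5) * np.eye(d))

# completeness relation, Eq. (1)
print(np.allclose(sum(M.conj().T @ M for M in ops), np.eye(d)))   # True

# outcome probabilities p(m) = sigma_m^2 / d, Eq. (18)
p = [np.sum(np.linalg.svd(M, compute_uv=False)**2) / d for M in ops]
print(p)   # [0.125, 0.125, 0.125, 0.125, 0.5]: mass 1/2 at P_1, mass 1/2 at P_4
```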

The regions extended by averaging are shown in Fig. 1 in yellow (light gray). As shown in Fig. 1a, the lower boundary for G versus F is extended to the straight lines between P\(_k\) and P\(_{k+1}\) for \(k=1,2,\ldots ,d-1\), whereas the upper boundary is not extended due to its convexity. By contrast, as shown in Fig. 1b, the boundaries for G versus R are not extended at all. Meanwhile, as shown in Fig. 1c, the lower boundary for I versus F is extended as in the case of G and, moreover, the upper boundary is extended a little higher when \(d\ge 3\) because the line \((1,d-1)\) has a slight dent near P\(_d\). In fact, an analytic calculation of \(\hat{M}^{(d)}_{1,d-1}(\lambda )\) shows that

$$\begin{aligned} \frac{\hbox {d}^2F(m)}{\hbox {d}I(m)^2}>0 \end{aligned}$$
(40)

near P\(_d\) when \(d\ge 3\). The upper boundary is therefore extended to the tangent line drawn from P\(_d\) to the line \((1,d-1)\) between P\(_d\) and the point of tangency T. As shown in Fig. 1d, the upper boundary for I versus R is extended to the straight line between P\(_1\) and P\(_d\), whereas the lower boundary is not extended. The case of \(d=8\) is shown in Fig. 2.

Fig. 3

Two line slopes for \(d=4\). \(D_4(\lambda )\) is the slope of the tangent line to the line (1,3) at a point Q, and \(S_4(\lambda )\) is the slope of the straight line from P\(_4\) to Q. Note that the horizontal axis is reversed because large \(\lambda \) corresponds to small I(m) in Fig. 1c

To find the point T on the upper boundary for I versus F, two line slopes are defined as functions of \(\lambda \): the slope of the tangent line to the line \((1,d-1)\) at the point Q corresponding to \(\hat{M}^{(d)}_{1,d-1}(\lambda )\),

$$\begin{aligned} D_d(\lambda )=\frac{\hbox {d}F(m)}{\hbox {d}I(m)}, \end{aligned}$$
(41)

and the slope of the straight line from P\(_d\) to Q,

$$\begin{aligned} S_d(\lambda )=\frac{F(m)-1}{I(m)}. \end{aligned}$$
(42)

These functions are shown for \(d=4\) in Fig. 3. Using \(\lambda _{\mathrm{T}}\) such that

$$\begin{aligned} D_d(\lambda _\mathrm{T})=S_d(\lambda _\mathrm{T}), \end{aligned}$$
(43)

the measurement operator corresponding to T can be written as \(\hat{M}^{(d)}_{1,d-1}(\lambda _\mathrm{T})\). In Fig. 4, \(\lambda _\mathrm{T}\) is shown with I(m) and F(m) at T, denoted by \(I_\mathrm{T}\) and \(F_\mathrm{T}\), respectively, for various d. When \(d=4\), T in Fig. 1c corresponds to \(\hat{M}^{(4)}_{1,3}(0.299)\) and the upper boundary for I versus F moves up between P\(_4\) and T, at most by \(3.5\times 10^{-3}\). This extension of the upper boundary becomes larger as d increases. For example, when \(d=8\), T in Fig. 2c corresponds to \(\hat{M}^{(8)}_{1,7}(0.120)\) and the upper boundary moves up at most by \(2.6\times 10^{-2}\). Interestingly, \(\hat{M}^{(d)}_{1,d-1}(\lambda _\mathrm{T})\) is the most efficient measurement operator in terms of the ratio of information gain to fidelity loss [26],

$$\begin{aligned} E_F(m)=\frac{I(m)}{1-F(m)}. \end{aligned}$$
(44)
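
As a numerical sketch, \(\lambda _\mathrm{T}\) can be located by implementing Eqs. (34), (41), and (42) on the line \((1,d-1)\) and solving Eq. (43) by bisection, with the derivative in Eq. (41) taken by central differences:

```python
import numpy as np
from math import comb, log, log2

def eta(n):
    return sum(1.0 / k for k in range(1, n + 1))

def a_coef(j, n):                                   # Eq. (35)
    return comb(j, n) * (eta(j) - eta(j - n)) / log(2)

def c_coef(j, n, lam):                              # Eq. (36)
    return lam**(2 * (j - n)) * (comb(j, n) * log2(lam**2) + a_coef(j, n))

def I_curve(d, lam):
    """I(m) on the line (1, d-1), i.e., Eq. (34) with l = d - 1."""
    l = d - 1
    s = sum(c_coef(l + 1, n, lam) / (1 - lam**2)**(l - n) for n in range(l))
    return (log2(d) - (eta(d) - 1.0) / log(2)
            - log2(1 + l * lam**2) - s / (1 + l * lam**2))

def F_curve(d, lam):
    """F(m) on the line (1, d-1), Eqs. (20)-(21)."""
    s2 = 1 + (d - 1) * lam**2
    return (s2 + (1 + (d - 1) * lam)**2) / ((d + 1) * s2)

def slope_gap(d, lam, h=1e-6):
    """D_d(lam) - S_d(lam), Eqs. (41)-(42), via central differences."""
    D = ((F_curve(d, lam + h) - F_curve(d, lam - h))
         / (I_curve(d, lam + h) - I_curve(d, lam - h)))
    return D - (F_curve(d, lam) - 1.0) / I_curve(d, lam)

def lambda_T(d):
    """Solve Eq. (43): scan for a sign change of D - S, then bisect.
    A tangency point exists only for d >= 3 (cf. Fig. 6 for d = 2)."""
    xs = np.linspace(0.02, 0.95, 400)
    gaps = [slope_gap(d, x) for x in xs]
    for i in range(len(xs) - 1):
        if gaps[i] * gaps[i + 1] < 0:
            lo, hi = xs[i], xs[i + 1]
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if slope_gap(d, lo) * slope_gap(d, mid) > 0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
    return None

print(lambda_T(4))   # ~0.299, as quoted in the text; lambda_T(8) gives ~0.120
```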
Fig. 4

Singular value \(\lambda _\mathrm{T}\), information \(I_\mathrm{T}\), and fidelity \(F_\mathrm{T}\) at the point of tangency T for various d

The upper boundary for G versus F and that for G versus R are equivalent to the inequalities of Banaszek [7] and of Cheong and Lee [25], respectively, where the averages are explicitly calculated using \(p(m)=\sigma _{m}^2/d\). However, to our knowledge, this is the first derivation of the other two upper boundaries and of the four lower boundaries. The lower boundaries are less important than the upper boundaries in quantum information and can be violated by non-ideal measurements, which have classical noise yielding mixed post-measurement states, or by non-optimal estimations, in which an unwise observer makes incorrect choices for \(| \varphi (m) \rangle \) in G(m). Nevertheless, for the foundations of quantum mechanics, it is worth deriving both the upper and lower boundaries for ideal measurements with optimal estimation to examine the intrinsic nature and power of quantum measurements.

Fig. 5

Four allowed regions for information versus disturbance for \(d=2\): a estimation fidelity G(m) versus operation fidelity F(m), b estimation fidelity G(m) versus physical reversibility R(m), c information gain I(m) versus operation fidelity F(m), and d information gain I(m) versus physical reversibility R(m). In each panel, the region pertaining to a single outcome is just the solid line denoted by (1, 1) and the extended region obtained by averaging over all outcomes is shown in yellow (light gray) (Color figure online)

The case of \(d=2\) is special: the regions extended by averaging are the main parts of the allowed regions, as shown in Fig. 5. In this case, the allowed regions pertaining to a single outcome shrink to the line (1, 1) because a measurement operator can be represented by a single parameter via the rescaling invariance in Eq. (31) [24]. Moreover, the line (1, 1) in Fig. 5c has no dent, unlike the case of \(d\ge 3\). In fact, it can be shown for \(\hat{M}^{(2)}_{1,1}(\lambda )\) that

$$\begin{aligned} \frac{\hbox {d}^2F(m)}{\hbox {d}I(m)^2}<0 \end{aligned}$$
(45)

near P\(_2\). The point T does not exist on the line (1, 1) because the slopes \(D_2(\lambda )\) and \(S_2(\lambda )\) in Eqs. (41) and (42) do not become equal except at \(\lambda =1\), as shown in Fig. 6.

Fig. 6

Two line slopes for \(d=2\). \(D_2(\lambda )\) is the slope of the tangent line to the line (1, 1) at a point Q, and \(S_2(\lambda )\) is the slope of the straight line from P\(_2\) to Q. Note that the horizontal axis is reversed because large \(\lambda \) corresponds to small I(m) in Fig. 5c

5 Optimal measurement

Finally, we discuss the optimal measurements saturating the upper bounds on the information for a given disturbance. The upper bounds are given by the upper boundaries of the allowed regions for the average information and disturbance. Therefore, according to the analogy with the center of mass, a measurement is optimal for an information–disturbance pair if it is equivalent to a set of particles whose center of mass is on the upper boundary for that pair. The optimal measurements differ among the four types of information–disturbance pairs because the upper boundaries have different shapes on the four information–disturbance planes, as shown in Fig. 1.

The conditions for the optimal measurements are as follows. A measurement \(\{\hat{M}_{m}\}\) is optimal for G versus F if all \(\hat{M}_{m}\)’s correspond to an identical point on the line \((1,d-1)\), because the upper boundary for G versus F is the convex curve \((1,d-1)\), as shown in Fig. 1a. By contrast, it is optimal for G versus R if every \(\hat{M}_{m}\) corresponds to a point on the line \((1,d-1)\), because the upper boundary for G versus R is the straight line \((1,d-1)\), as shown in Fig. 1b. These conditions are equivalent to those in Refs. [7, 25]. Similarly, when \(d\ge 3\), a measurement \(\{\hat{M}_{m}\}\) is optimal for I versus F if all \(\hat{M}_{m}\)’s correspond to an identical point between T and P\(_1\) on the line \((1,d-1)\) or if every \(\hat{M}_{m}\) corresponds to either P\(_d\) or T, because the upper boundary for I versus F is the union of the convex curve \((1,d-1)\) between T and P\(_1\) and the straight line between P\(_d\) and T, as shown in Fig. 1c. However, when \(d=2\), the condition to be optimal for I versus F is the same as that for G versus F because the upper boundary is just the convex curve \((1,d-1)\), as shown in Fig. 5c. Finally, a measurement \(\{\hat{M}_{m}\}\) is optimal for I versus R if every \(\hat{M}_{m}\) corresponds to either P\(_d\) or P\(_1\), because the upper boundary for I versus R is the straight line between P\(_d\) and P\(_1\), as shown in Fig. 1d.

Interestingly, an optimal measurement for G versus F is not necessarily optimal for I versus F, and an optimal measurement for G versus R is not necessarily optimal for I versus R. The relationships among the four conditions are illustrated in Fig. 7, excluding the strongest measurement, where all the measurement operators correspond to P\(_1\), and the weakest measurement, where all the measurement operators correspond to P\(_d\); these two measurements satisfy all four conditions.

Fig. 7

Four conditions for optimal measurements. For example, the set GF represents all measurements that are optimal for G versus F

As a specific example, consider a measurement \(\{\hat{M}_{m}^{(d)}(\lambda )\}\) with d outcomes, \(m=1,2,\ldots ,d\), where \(\hat{M}_{m}^{(d)}(\lambda )\) is defined by

$$\begin{aligned} \hat{M}_{m}^{(d)}(\lambda )\equiv \frac{1}{\sqrt{1+(d-1)\lambda ^2}} \left( | m \rangle \langle m |+\sum _{i\ne m} \lambda | i \rangle \langle i |\right) \end{aligned}$$
(46)

with \(0<\lambda <1\). For a given \(\lambda \), all \(\hat{M}_{m}^{(d)}(\lambda )\)’s correspond to an identical point on the line \((1,d-1)\) in the four information–disturbance planes because they are equivalent to \(\hat{M}^{(d)}_{1,d-1}(\lambda )\) via the interchange and rescaling invariances in Eqs. (30) and (31). The corresponding point on the line \((1,d-1)\) indicates the average information and disturbance of \(\{\hat{M}_{m}^{(d)}(\lambda )\}\). The measurement \(\{\hat{M}_{m}^{(d)}(\lambda )\}\) is optimal both for G versus F and for G versus R for arbitrary \(\lambda \) because the line \((1,d-1)\) is equal to the upper boundary, as shown in Fig. 1a, b.

However, the measurement \(\{\hat{M}_{m}^{(d)}(\lambda )\}\) is not necessarily optimal for I versus F because only a part of the line \((1,d-1)\) is equal to the upper boundary when \(d\ge 3\), as shown in Fig. 1c. It is optimal for I versus F only if \(\lambda \le \lambda _\mathrm{T}\), with \(\lambda _\mathrm{T}\) being defined by Eq. (43). Note that \(\hat{M}_{m}^{(d)}(\lambda _\mathrm{T})\) corresponds to T on the S-shaped curve \((1,d-1)\). If \(\lambda > \lambda _\mathrm{T}\), \(\hat{M}_{m}^{(d)}(\lambda )\) corresponds to a point on the concave part between P\(_d\) and T of the line \((1,d-1)\), where the upper boundary is equal to the straight line between P\(_d\) and T. This means that \(\{\hat{M}_{m}^{(d)}(\lambda )\}\) is not optimal for I versus F if \(\lambda > \lambda _\mathrm{T}\) or equivalently if \(F>F_\mathrm{T}\). The optimal measurement for this case can easily be constructed from the analogy with the center of mass by considering two particles: one located at T with mass q and the other located at P\(_d\) with mass \(1-q\). According to “Appendix C”, the optimal measurement has \(d+1\) outcomes whose measurement operators are

$$\begin{aligned} \hat{M}_{m} ={\left\{ \begin{array}{ll} \sqrt{q}\,\hat{M}_{m}^{(d)}(\lambda _\mathrm{T}) &{} \quad {(m=1,2,\ldots ,d)} \\ \sqrt{1-q}\,\hat{I} &{} \quad {(m=d+1)}, \end{array}\right. } \end{aligned}$$
(47)

where \(q=(1-F)/\left( 1-F_\mathrm{T}\right) \) for a given F. The average information and disturbance of this measurement are then indicated by a point on the straight line between P\(_d\) and T equal to a part of the upper boundary. By contrast, when \(d=2\), \(\{\hat{M}_{m}^{(2)}(\lambda )\}\) is optimal for I versus F for arbitrary \(\lambda \) because the line (1, 1) is equal to the upper boundary, as shown in Fig. 5c.

Conversely, the measurement \(\{\hat{M}_{m}^{(d)}(\lambda )\}\) is not optimal for I versus R for any \(\lambda \) because the line \((1,d-1)\) does not coincide with the upper boundary anywhere, as shown in Fig. 1d. In this case, the upper boundary is the straight line between P\(_d\) and P\(_1\). Therefore, the optimal measurement for I versus R can be constructed from the analogy with the center of mass by considering two particles: one located at P\(_1\) with mass q and the other located at P\(_d\) with mass \(1-q\). This measurement has \(d+1\) outcomes, whose measurement operators are

$$\begin{aligned} \hat{M}_{m} ={\left\{ \begin{array}{ll} \sqrt{q}\,| m \rangle \langle m | &{} \quad {(m=1,2,\ldots ,d)} \\ \sqrt{1-q}\,\hat{I} &{} \quad {(m=d+1)}, \end{array}\right. } \end{aligned}$$
(48)

where \(q=1-R\) for a given R. The average information and disturbance of this measurement are indicated by a point on the straight line between P\(_d\) and P\(_1\) equal to the upper boundary.
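
The construction in Eq. (48) can be checked directly: the sketch below verifies Eq. (1) and that the averages land at \((I,R)=(qI_{\max },1-q)\), i.e., on the straight line between P\(_d\) and P\(_1\); the measurement in Eq. (47) can be checked in the same way.

```python
import numpy as np
from math import log, log2

def eta(n):
    return sum(1.0 / k for k in range(1, n + 1))

d, R_target = 4, 0.6
q = 1.0 - R_target                       # q = 1 - R for a given R

basis = np.eye(d)
ops = [np.sqrt(q) * np.outer(basis[m], basis[m]) for m in range(d)]
ops.append(np.sqrt(1.0 - q) * np.eye(d))           # Eq. (48)
print(np.allclose(sum(M.conj().T @ M for M in ops), np.eye(d)))   # Eq. (1)

# Per-outcome values: each projective outcome sits at P_1 (R = 0, I = I_max
# by Eq. (37) with r = 1), the identity outcome at P_d (R = 1, I = 0);
# the weights follow Eq. (18).
I_max = log2(d) - (eta(d) - 1.0) / log(2)          # upper bound of Eq. (11)
p = [q / d] * d + [1.0 - q]                        # p(m) = sigma_m^2 / d
R_avg = sum(pi * Ri for pi, Ri in zip(p, [0.0] * d + [1.0]))
I_avg = sum(pi * Ii for pi, Ii in zip(p, [I_max] * d + [0.0]))
print(R_avg, I_avg, q * I_max)   # R = 1 - q and I = q * I_max: on the P_d-P_1 line
```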

Of course, the measurements given in Eqs. (47) and (48) are also optimal for G versus R for arbitrary q. Even though their measurement operators correspond to different points on the line \((1,d-1)\), the point indicating the average values is still on the line \((1,d-1)\), which is equal to the upper boundary, because the line \((1,d-1)\) is straight, as shown in Fig. 1b. However, except for \(q=0\) or 1, the measurement in Eq. (47) is optimal neither for G versus F nor for I versus R, and the measurement in Eq. (48) is optimal neither for G versus F nor for I versus F.

6 Summary

In summary, we have shown the allowed regions for information versus disturbance for quantum measurements of completely unknown states. The information and disturbance pertaining to a single outcome are quantified using the singular values of the measurement operator and are plotted on four types of information–disturbance planes to show the allowed regions pertaining to a single outcome. The allowed regions for the average values are also discussed via an analogy with the center of mass. These regions explicitly give not only the upper bounds but also the lower bounds on the information for a given disturbance, together with the optimal measurements saturating the upper bounds. Consequently, our results broaden our perspective on quantum measurements and provide a useful tool for quantum information processing and communication.