1 Introduction

In quantum theory, any measurement that provides information about a physical system also inevitably disturbs the system’s state in a way that depends on the measurement’s outcome. This trade-off between information and disturbance is of great interest in establishing the foundations of quantum mechanics and plays an important role in quantum information processing and communication techniques [1], such as quantum cryptography [2,3,4,5]. Many authors [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] have therefore discussed this trade-off using several different formulations. For example, Banaszek [7] found an inequality between the amount of information gained and the size of the state change, whereas Cheong and Lee [20] found another between the amount of information gained and the reversibility of the state change. Both inequalities have been verified in single-photon experiments [24,25,26,27].

Recently, we have also studied this trade-off, deriving the allowed regions in four types of information–disturbance plane [28]. These four information–disturbance pairs combine one information measure, namely the Shannon entropy [6] or estimation fidelity [7], with one disturbance measure, namely the operation fidelity [7] or physical reversibility [29]. The boundaries of the allowed regions give upper and lower bounds on the information for a given disturbance, together with the optimal measurements that saturate the upper bounds. The optimal measurements are different for each of the four pairs, because the allowed regions’ upper boundaries have different curvatures on each of the information–disturbance planes [28].

Contrary to expectations, the allowed regions show that measurements providing more information do not necessarily cause larger disturbances. This is because the allowed regions have finite areas: for any measurement corresponding to an interior point of an allowed region, there always exists another measurement near that point that provides more information with smaller disturbance. However, measurements that lie on the boundary of an allowed region in the information–disturbance plane are subject to a trade-off: modifying them to increase the information, i.e., moving along the boundary, also increases the disturbance at a rate set by the boundary’s slope.

In this paper, we obtain the first and second derivatives of the disturbance with respect to the information for measurements lying on the boundaries of the allowed regions, for each of the four information–disturbance pairs. These measurements are described by a diagonal operator with a continuous parameter and are applied to a d-level system in a completely unknown state. For such measurements, we calculate these derivatives to obtain the slopes and curvatures of the allowed regions’ boundaries, clarifying the regions’ shapes and thereby broadening our perspective on the trade-off between information and disturbance in quantum measurements. Indeed, it was difficult to judge from the allowed regions shown in Ref. [28] whether the slopes of the boundaries are finite and whether the curvatures of the boundaries are negative at some points; the first and second derivatives obtained in this paper give the values of these slopes and curvatures and thus answer both questions.

The rest of this paper is organized as follows: Sect. 2 reviews the procedure for quantifying the information and the disturbance in quantum measurements, giving their explicit forms for a fundamental class of measurements as functions of a certain parameter. Section 3 presents the first and second derivatives of the information and the disturbance for such measurements with respect to this parameter, while Sect. 4 gives the first and second derivatives of the disturbance with respect to the information. Finally, Sect. 5 summarizes our results.

2 Information and disturbance

In this section, we recall the information and the disturbance in quantum measurements at the single-outcome level [11, 30,31,32,33] and summarize the results of Ref. [28] in order for this paper to be self-contained. Suppose we want to measure a d-level system known to be in one of a predefined set of pure states \(\{| \psi (a) \rangle \}\), where the probability of the system being in the state \(| \psi (a) \rangle \) is p(a), but we do not know the system’s actual state. To study the case where no prior information about the system is available, we assume that the set \(\{| \psi (a) \rangle \}\) consists of all possible pure states and that p(a) is uniform according to a normalized invariant measure over the pure states.

First, we quantify the amount of information provided by a given quantum measurement [28]. An ideal quantum measurement [34] can be described by a set of measurement operators \(\{\hat{M}_m\}\) [1] that satisfy

$$\begin{aligned} \sum _m\hat{M}_m^\dagger \hat{M}_m=\hat{I}, \end{aligned}$$
(1)

where m denotes the outcome of the measurement and \(\hat{I}\) is the identity operator. When the system is in state \(| \psi (a) \rangle \), a measurement \(\{\hat{M}_m\}\) yields the outcome m with probability

$$\begin{aligned} p(m|a)=\langle \psi (a) |\hat{M}_m^\dagger \hat{M}_m| \psi (a) \rangle \end{aligned}$$
(2)

and changes the state to

$$\begin{aligned} | \psi (m,a) \rangle =\frac{1}{\sqrt{p(m|a)}}\,\hat{M}_m| \psi (a) \rangle . \end{aligned}$$
(3)
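As a concrete illustration, the following minimal Python sketch (ours, not part of the original formalism; the operator and state below are arbitrary choices) computes the outcome probability of Eq. (2) and the post-measurement state of Eq. (3) for a single measurement operator:

```python
import numpy as np

# Minimal sketch (ours): outcome probability, Eq. (2), and normalized
# post-measurement state, Eq. (3), for one measurement operator M_m.
def apply_measurement(M_m, psi):
    """Return p(m|a) and the normalized post-measurement state."""
    M_psi = M_m @ psi
    p = np.vdot(M_psi, M_psi).real           # <psi|M†M|psi>
    return p, M_psi / np.sqrt(p)

# Example: d = 2, M_m = diag(1, 0.5), |psi> = (|1> + |2>)/sqrt(2)
M_m = np.diag([1.0, 0.5])
psi = np.array([1.0, 1.0]) / np.sqrt(2)
p, psi_post = apply_measurement(M_m, psi)    # p = 0.625
```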

The measurement outcome provides some information about the system’s state. For example, given the outcome m, the probability that the initial state was \(| \psi (a) \rangle \) is given by

$$\begin{aligned} p(a|m) =\frac{p(m|a)\,p(a)}{p(m)} \end{aligned}$$
(4)

using Bayes’ rule, where

$$\begin{aligned} p(m) =\sum _a p(m|a)\,p(a) \end{aligned}$$
(5)

is the total probability of the outcome m. The outcome m therefore changes the state probability distribution from \(\{p(a)\}\) to \(\{p(a|m)\}\), decreasing the Shannon entropy by

$$\begin{aligned} I(m)&= \left[ -\sum _a p(a)\log _2 p(a)\right] -\,\left[ -\sum _a p(a|m)\log _2 p(a|m)\right] . \end{aligned}$$
(6)

This entropy change, I(m), quantifies the amount of information provided by a measurement \(\{\hat{M}_m\}\) with outcome m [11, 35] and satisfies

$$\begin{aligned} 0\le I(m) \le \log _2d-\frac{1}{\ln 2}[\eta (d)- 1], \end{aligned}$$
(7)

where

$$\begin{aligned} \eta (n)= {\left\{ \begin{array}{ll} \sum \limits _{k=1}^{n}\frac{1}{k} &{}\quad \text{(if }\;\; n=1,2,\ldots ) \\ 0 &{}\quad \text{(if }\;\; n=0). \end{array}\right. } \end{aligned}$$
(8)
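For reference, here is a short Python sketch (ours; the function names are our own) of the harmonic-number function \(\eta (n)\) in Eq. (8) and the resulting upper bound on I(m) in Eq. (7):

```python
import math

# Sketch (ours) of eta(n), Eq. (8), and the upper bound on I(m), Eq. (7).
def eta(n):
    return sum(1.0 / k for k in range(1, n + 1))   # eta(0) = 0 (empty sum)

def info_upper_bound(d):
    """log2(d) - [eta(d) - 1]/ln 2, the maximum of I(m) in Eq. (7)."""
    return math.log2(d) - (eta(d) - 1.0) / math.log(2)

# For a qubit (d = 2): 1 - 0.5/ln 2 ≈ 0.2787 bits.
print(info_upper_bound(2))
```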

Note that I(m) is a measure of the information generated by a single outcome, unlike

$$\begin{aligned} I=\sum _m p(m)\,I(m), \end{aligned}$$
(9)

which was discussed in Ref. [6].

The measurement outcome m can also be used to estimate the system’s state as \(| \varphi (m) \rangle \), where an optimal \(| \varphi (m) \rangle \) is the eigenvector of \(\hat{M}_m^\dagger \hat{M}_m\) corresponding to its maximum eigenvalue [7]. The quality of this estimate can be evaluated in terms of the estimation fidelity G(m):

$$\begin{aligned} G(m) =\sum _a p(a|m)\,\bigl |\langle \varphi (m)|\psi (a) \rangle \bigr |^2. \end{aligned}$$
(10)

This also quantifies the amount of information provided by the outcome m and satisfies

$$\begin{aligned} \frac{1}{d}\le G(m) \le \frac{2}{d+1}. \end{aligned}$$
(11)

Again, note that G(m) relates to a single outcome, unlike

$$\begin{aligned} G=\sum _m p(m)\,G(m), \end{aligned}$$
(12)

which was discussed in Ref. [7].

Next, we quantify the degree of disturbance caused by the measurement \(\{\hat{M}_m\}\) [28]. The outcome m changes the system’s state from \(| \psi (a) \rangle \) to \(| \psi (m,a) \rangle \), given by Eq. (3). The size of this change can be evaluated using the operation fidelity F(m):

$$\begin{aligned} F(m) =\sum _a p(a|m)\bigl |\langle \psi (a)|\psi (m,a) \rangle \bigr |^2. \end{aligned}$$
(13)

This quantifies the degree of disturbance caused when a measurement \(\{\hat{M}_m\}\) yields the outcome m and satisfies

$$\begin{aligned} \frac{2}{d+1}\le F(m) \le 1. \end{aligned}$$
(14)

Again, note that F(m) relates to a single outcome, unlike

$$\begin{aligned} F=\sum _m p(m)\,F(m), \end{aligned}$$
(15)

which was discussed in Ref. [7].

In addition to the size of the state change, the reversibility of the change can also be used to quantify the disturbance in the context of physically reversible measurements [36,37,38,39,40,41,42,43,44,45,46]. Even though \(| \psi (a) \rangle \) and \(| \psi (m,a) \rangle \) are unknown, the change can be physically reversed by a reversing measurement on \(| \psi (m,a) \rangle \) if \(\hat{M}_m\) has a bounded left inverse \(\hat{M}_m^{-1}\) [39, 40]. Such a reversing measurement can be described by another set of measurement operators \(\{\hat{R}_\mu ^{(m)}\}\) that satisfy

$$\begin{aligned} \sum _\mu \hat{R}^{(m)\dagger }_\mu \hat{R}^{(m)}_\mu =\hat{I} \end{aligned}$$
(16)

and \(\hat{R}^{(m)}_{\mu _0}\propto \hat{M}_m^{-1}\) for a particular \(\mu =\mu _0\), where \(\mu \) denotes the reversing measurement’s outcome. When this measurement on \(| \psi (m,a) \rangle \) yields the preferred outcome \(\mu _0\), the system’s state returns to \(| \psi (a) \rangle \) because \(\hat{R}_{\mu _0}^{(m)}\hat{M}_m\propto \hat{I}\). The state recovery probability for an optimal reversing measurement [29] is

$$\begin{aligned} R(m,a)=\frac{\inf _{| \psi \rangle }\, \langle \psi |\hat{M}_m^\dagger \hat{M}_m| \psi \rangle }{p(m|a)}, \end{aligned}$$
(17)

and we can use this to evaluate the reversibility of the state change as

$$\begin{aligned} R(m) = \sum _a p(a|m)\,R(m,a). \end{aligned}$$
(18)

This also quantifies the degree of disturbance caused when a measurement \(\{\hat{M}_m\}\) yields the outcome m and satisfies

$$\begin{aligned} 0\le R(m) \le 1. \end{aligned}$$
(19)

Again, note that R(m) relates to a single outcome, unlike

$$\begin{aligned} R=\sum _m p(m)\,R(m), \end{aligned}$$
(20)

which was discussed in Refs. [20, 29].
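To make these definitions concrete, the following Python sketch (our own consistency check, not taken from the paper) estimates G(m), F(m), and R(m) of Eqs. (10), (13), and (18) by Monte Carlo sampling over Haar-uniform pure states, using the optimal estimate \(| \varphi (m) \rangle \) and the optimal reversing measurement described above:

```python
import numpy as np

# Monte Carlo sketch (ours) of G(m), F(m), R(m), Eqs. (10), (13), (18),
# for a single measurement operator M on a Haar-uniform unknown pure state.
rng = np.random.default_rng(0)

def haar_state(d):
    """Sample a Haar-uniform pure state of a d-level system."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def mc_info_disturbance(M, n_samples=200_000):
    d = M.shape[0]
    E = M.conj().T @ M                       # M†M
    evals, evecs = np.linalg.eigh(E)
    phi = evecs[:, -1]                       # optimal estimate: top eigenvector
    lam_min = evals[0]                       # inf_psi <psi|M†M|psi>, cf. Eq. (17)
    p_sum = g_sum = f_sum = 0.0
    for _ in range(n_samples):
        psi = haar_state(d)
        p = np.vdot(psi, E @ psi).real       # p(m|a), Eq. (2)
        p_sum += p
        g_sum += p * abs(np.vdot(phi, psi))**2
        f_sum += abs(np.vdot(psi, M @ psi))**2
    # Weighting by p(m|a) implements p(a|m); dividing by the mean of
    # p(m|a) supplies the 1/p(m) factor in Eq. (4).
    return g_sum / p_sum, f_sum / p_sum, lam_min * n_samples / p_sum

# Example: d = 2, M = diag(1, 0.5).  The closed forms later in this section,
# Eqs. (27)-(29) with k = l = 1, lambda = 0.5, give G = 0.6, F ≈ 0.933, R = 0.4.
G, F, R = mc_info_disturbance(np.diag([1.0, 0.5]))
```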

As an important example, we consider a diagonal measurement operator \(\hat{M}^{(d)}_{k,l}(\lambda )\) with diagonal elements

$$\begin{aligned} \underbrace{1,1,\ldots ,1}_{k}, \underbrace{\lambda ,\lambda ,\ldots ,\lambda }_{l}, \underbrace{0,0,\ldots ,0}_{d-k-l} \end{aligned}$$
(21)

for \(k=1,2,\ldots ,d-1\) and \(l=1,2,\ldots ,d-k\), with a parameter \(\lambda \) satisfying \(0\le \lambda \le 1\). In an orthonormal basis \(\{| i \rangle \}\) with \(i=1,2,\ldots ,d\), the measurement operator \(\hat{M}^{(d)}_{k,l}(\lambda )\) can be written as

$$\begin{aligned} \hat{M}^{(d)}_{k,l}(\lambda )=\sum _{i=1}^{k} | i \rangle \langle i | +\sum _{i=k+1}^{k+l} \lambda | i \rangle \langle i |. \end{aligned}$$
(22)

The information provided and the disturbance caused by this operator can be quantified in terms of I(m), G(m), F(m), and R(m), given by Eqs. (6), (10), (13), and (18), as functions of the parameter \(\lambda \). Using the general formula derived in Ref. [33], I(m) can be calculated to be

$$\begin{aligned} I(m)&= \log _2d -\frac{1}{\ln 2}[\eta (d)- 1] -\,\log _2(k+l\lambda ^2) +\frac{1}{k+l\lambda ^2} J, \end{aligned}$$
(23)

where J is given by

$$\begin{aligned} J&= (-1)^l \sum _{n=0}^{k-1}\left( {\begin{array}{c}k+l-n-2\\ l-1\end{array}}\right) \, \frac{a^{(k+l)}_n}{(\lambda ^2-1)^{k+l-n-1}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n-2\\ k-1\end{array}}\right) \, \frac{c^{(k+l)}_n(\lambda )}{(1-\lambda ^2)^{k+l-n-1}}, \end{aligned}$$
(24)

with coefficients

$$\begin{aligned} a^{(j)}_n&= \frac{1}{\ln 2}\left( {\begin{array}{c}j\\ n\end{array}}\right) \left[ \eta (j)- \eta (j-n)\right] , \end{aligned}$$
(25)
$$\begin{aligned} c^{(j)}_n(\lambda )&= \lambda ^{2(j-n)} \left[ \left( {\begin{array}{c}j\\ n\end{array}}\right) \log _2 \lambda ^2+a^{(j)}_n\right] \end{aligned}$$
(26)

for \(n=0,1,\ldots ,j\). Likewise, G(m), F(m), and R(m) can be calculated to be [33]

$$\begin{aligned} G(m)&= \frac{1}{d+1}\left( 1+\frac{1}{k+l\lambda ^2}\right) , \end{aligned}$$
(27)
$$\begin{aligned} F(m)&= \frac{1}{d+1}\left[ 1+\frac{(k+l\lambda )^2}{k+l\lambda ^2}\right] , \end{aligned}$$
(28)
$$\begin{aligned} R(m)&= d\left( \frac{\lambda ^2}{k+l\lambda ^2}\right) \,\delta _{d,(k+l)}. \end{aligned}$$
(29)
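For numerical work, Eqs. (23)–(29) can be transcribed directly; the following Python sketch does so (helper names are ours). Because the series for J in Eq. (24) carries powers of \(\lambda ^2-1\) in its denominators, naive floating-point evaluation is reliable only for \(0<\lambda <1\), away from the endpoints, whose limits are given in Sect. 3.

```python
from math import comb, log, log2

# Direct transcription (a sketch) of Eqs. (23)-(29) for M^{(d)}_{k,l}(lambda).
def eta(n):                                   # Eq. (8)
    return sum(1.0 / i for i in range(1, n + 1))

def a_coef(j, n):                             # Eq. (25)
    return comb(j, n) * (eta(j) - eta(j - n)) / log(2)

def c_coef(j, n, lam):                        # Eq. (26)
    return lam**(2 * (j - n)) * (comb(j, n) * log2(lam**2) + a_coef(j, n))

def J(k, l, lam):                             # Eq. (24)
    s1 = sum(comb(k + l - n - 2, l - 1) * a_coef(k + l, n)
             / (lam**2 - 1)**(k + l - n - 1) for n in range(k))
    s2 = sum(comb(k + l - n - 2, k - 1) * c_coef(k + l, n, lam)
             / (1 - lam**2)**(k + l - n - 1) for n in range(l))
    return (-1)**l * s1 + (-1)**k * s2

def I_m(d, k, l, lam):                        # Eq. (23)
    w = k + l * lam**2
    return log2(d) - (eta(d) - 1) / log(2) - log2(w) + J(k, l, lam) / w

def G_m(d, k, l, lam):                        # Eq. (27)
    return (1 + 1 / (k + l * lam**2)) / (d + 1)

def F_m(d, k, l, lam):                        # Eq. (28)
    return (1 + (k + l * lam)**2 / (k + l * lam**2)) / (d + 1)

def R_m(d, k, l, lam):                        # Eq. (29)
    return d * lam**2 / (k + l * lam**2) if k + l == d else 0.0

# Example: d = 2, k = l = 1, lambda = 0.5 gives I ≈ 0.090 bits, G = 0.6,
# F ≈ 0.933, R = 0.4, consistent with the Monte Carlo sketch in Sect. 2.
```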
Fig. 1 Four allowed regions for information versus disturbance for \(d=4\): (a) estimation fidelity G(m) versus operation fidelity F(m); (b) estimation fidelity G(m) versus physical reversibility R(m); (c) information gain I(m) versus operation fidelity F(m); and (d) information gain I(m) versus physical reversibility R(m)

The measurement operator \(\hat{M}^{(d)}_{k,l}(\lambda )\) is very important for obtaining the allowed regions in the information–disturbance planes by plotting all physically possible measurement operators. We consider four different allowed regions, based on using I(m) or G(m) to quantify the information and F(m) or R(m) to quantify the disturbance. Figure 1 shows these four allowed regions for \(d=4\) in gray [28], where the lines (k, l) correspond to \(\hat{M}^{(d)}_{k,l}(\lambda )\) with \(0\le \lambda \le 1\) and the \(\mathrm{P}_r\)’s denote the points corresponding to the projective measurement operator of rank r:

$$\begin{aligned} \hat{P}^{(d)}_{r}=\sum _{i=1}^{r} | i \rangle \langle i |. \end{aligned}$$
(30)

Clearly, \(\hat{M}^{(d)}_{k,l}(0)=\hat{P}^{(d)}_{k}\), \(\hat{M}^{(d)}_{k,l}(1)=\hat{P}^{(d)}_{k+l}\), and \(\hat{P}^{(d)}_{d}=\hat{I}\). Thus, the line (k, l) connects \(\mathrm{P}_k\) to \(\mathrm{P}_{k+l}\), and the point \(\mathrm{P}_d\) is at the top left corner of the plot. In Fig. 1, the upper boundaries of the allowed regions consist of the lines \((1,d-1)\) corresponding to \(\hat{M}^{(d)}_{1,d-1}(\lambda )\), whereas the lower boundaries consist of the lines (k, 1) corresponding to \(\hat{M}^{(d)}_{k,1}(\lambda )\) for \(k=1,2,\ldots ,d-1\). Therefore, to find the values of the slopes and curvatures of the boundaries, we need to calculate the first and second derivatives of the disturbance with respect to the information for \(\hat{M}^{(d)}_{k,l}(\lambda )\).

The above allowed regions were obtained by considering ideal measurements, as in Eq. (3), with optimal estimates for G(m). Unfortunately, the lower boundaries can be violated by non-ideal measurements, which yield mixed post-measurement states due to classical noise, or by non-optimal estimates, which make suboptimal choices for \(| \varphi (m) \rangle \). Here, we ignore such non-quantum effects in order to focus on the quantum nature of measurement.

3 Derivatives with respect to \(\lambda ^2\)

To calculate the derivative of the disturbance with respect to the information for \(\hat{M}^{(d)}_{k,l}(\lambda )\), we first consider the derivatives of the information and the disturbance with respect to the parameter \(\lambda ^2\). For simplicity, we take derivatives with respect to \(\lambda ^2\) rather than \(\lambda \) itself. These derivatives are straightforward to calculate because the information and the disturbance are expressed as functions of \(\lambda =\sqrt{\lambda ^2}\) in Eqs. (23), (27), (28), and (29).

However, the expression for the derivative of I(m) is quite long, owing to the form of J given in Eq. (24). From Eq. (23), the first derivative of I(m) is

$$\begin{aligned}{}[I(m)]'&= -\frac{1}{\ln 2}\left( \frac{l}{k+l\lambda ^2}\right) -\,\frac{l}{(k+l\lambda ^2)^2} J +\frac{1}{k+l\lambda ^2} J', \end{aligned}$$
(31)

where primes represent derivatives with respect to \(\lambda ^2\). The first derivative of J can be written as

$$\begin{aligned} J'&= (-1)^l\sum _{n=0}^{k-1}\left( {\begin{array}{c}k+l-n-1\\ l\end{array}}\right) \, \frac{-l a^{(k+l)}_n}{(\lambda ^2-1)^{k+l-n}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n-1\\ k\end{array}}\right) \, \frac{k c^{(k+l)}_n(\lambda )}{(1-\lambda ^2)^{k+l-n}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n-2\\ k-1\end{array}}\right) \, \frac{(n+1)\,c^{(k+l)}_{n+1}(\lambda )}{(1-\lambda ^2)^{k+l-n-1}} \end{aligned}$$
(32)

because

$$\begin{aligned}{}[c^{(j)}_n(\lambda )]' = (n+1)\,c^{(j)}_{n+1}(\lambda ). \end{aligned}$$
(33)

Figure 2 shows \([I(m)]'\) as a function of \(\lambda \) for \(d=4\), for various (k, l). From this, we can observe that \([I(m)]'\le 0\).

Fig. 2 First derivative of I(m) with respect to \(\lambda ^2\) as a function of \(\lambda \), for \(\hat{M}^{(d)}_{k,l}(\lambda )\) with \(d=4\), for various (k, l)

In addition, the second derivative of I(m) is

$$\begin{aligned}{}[I(m)]''&= \frac{1}{\ln 2}\left[ \frac{l^2}{(k+l\lambda ^2)^2}\right] +\frac{2l^2}{(k+l\lambda ^2)^3} J -\,\frac{2l}{(k+l\lambda ^2)^2} J' +\frac{1}{k+l\lambda ^2}J'', \end{aligned}$$
(34)

and the second derivative of J can be written as

$$\begin{aligned} J''&= (-1)^{l} \sum _{n=0}^{k-1}\left( {\begin{array}{c}k+l-n\\ l+1\end{array}}\right) \, \frac{l(l+1) a^{(k+l)}_n}{(\lambda ^2-1)^{k+l-n+1}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n\\ k+1\end{array}}\right) \, \frac{k(k+1) c^{(k+l)}_n(\lambda )}{(1-\lambda ^2)^{k+l-n+1}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n-1\\ k\end{array}}\right) \, \frac{2k(n+1)\,c^{(k+l)}_{n+1}(\lambda )}{(1-\lambda ^2)^{k+l-n}} \nonumber \\&\quad +\,(-1)^k\sum _{n=0}^{l-1}\left( {\begin{array}{c}k+l-n-2\\ k-1\end{array}}\right) \, \frac{(n+2)(n+1)\,c^{(k+l)}_{n+2}(\lambda )}{(1-\lambda ^2)^{k+l-n-1}}. \end{aligned}$$
(35)
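As a quick consistency check (ours), one can differentiate the closed-form I(m) numerically with respect to \(u=\lambda ^2\) and compare against Eqs. (31) and (34); a central-difference sketch, reusing I_m from the code after Eq. (29):

```python
def dI_du(d, k, l, lam, h=1e-6):
    """Central difference of I(m) with respect to u = lambda**2."""
    u = lam**2
    return (I_m(d, k, l, (u + h)**0.5) - I_m(d, k, l, (u - h)**0.5)) / (2 * h)

def d2I_du2(d, k, l, lam, h=1e-4):
    u = lam**2
    return (I_m(d, k, l, (u + h)**0.5) - 2 * I_m(d, k, l, lam)
            + I_m(d, k, l, (u - h)**0.5)) / h**2

# d = 4, (k, l) = (1, 3), lambda = 0.5: the first derivative is negative and
# the second positive, matching [I(m)]' <= 0 and [I(m)]'' > 0 in Figs. 2 and 3.
print(dI_du(4, 1, 3, 0.5), d2I_du2(4, 1, 3, 0.5))
```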

Figure 3 shows \([I(m)]''\) as a function of \(\lambda \) for \(d=4\), for various (k, l). From this, we can observe that \([I(m)]''>0\).

Fig. 3 Second derivative of I(m) with respect to \(\lambda ^2\) as a function of \(\lambda \), for \(\hat{M}^{(d)}_{k,l}(\lambda )\) with \(d=4\), for various (k, l)

As shown in “Appendix A,” at \(\lambda =0\), J and its derivatives become

$$\begin{aligned} \lim _{\lambda \rightarrow 0} J= & {} a^{(k)}_{k-1}, \quad \lim _{\lambda \rightarrow 0} J'=l a^{(k-1)}_{k-1}, \nonumber \\ \lim _{\lambda \rightarrow 0} J''= & {} {\left\{ \begin{array}{ll} l(l+1) a^{(k-2)}_{k-1} &{}\quad \text{(if } \;\; k\ge 2) \\ +\,\infty &{}\quad \text{(if } \;\; k=1), \end{array}\right. } \end{aligned}$$
(36)

where \(a^{(j)}_{j+1}\) is given by

$$\begin{aligned} a^{(j)}_{j+1} =\frac{1}{(j+1)\ln 2} \end{aligned}$$
(37)

instead of Eq. (25). Here, \(J''\) in Eq. (36) diverges for \(k=1\) because

$$\begin{aligned} \lim _{\lambda \rightarrow 0} c^{(j)}_j(\lambda ) = \lim _{\lambda \rightarrow 0} \log _2\lambda ^2+a^{(j)}_{j}, \end{aligned}$$
(38)

which appears in the last sum of Eq. (35) when \(n=l-1\) if \(k=1\). The derivatives of I(m) at \(\lambda =0\) are thus

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\,[I(m)]'= & {} -\frac{l}{k^2\ln 2}, \end{aligned}$$
(39)
$$\begin{aligned} \lim _{\lambda \rightarrow 0}\,[I(m)]''= & {} {\left\{ \begin{array}{ll} \frac{l(k^2+3kl-2l)}{k^3(k-1)\ln 2} &{}\quad \text{(if } \;\; k\ge 2) \\ +\,\infty &{}\quad \text{(if } \;\; k=1). \end{array}\right. } \end{aligned}$$
(40)

Similarly, at \(\lambda =1\), J and its derivatives become

$$\begin{aligned} \lim _{\lambda \rightarrow 1} J= & {} a^{(k+l)}_{k+l-1}, \quad \lim _{\lambda \rightarrow 1} J'=l a^{(k+l)}_{k+l}, \nonumber \\ \lim _{\lambda \rightarrow 1} J''= & {} l(l+1) a^{(k+l)}_{k+l+1}, \end{aligned}$$
(41)

as shown in “Appendix B,” in which case the derivatives of I(m) are

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\,[I(m)]'= & {} 0, \end{aligned}$$
(42)
$$\begin{aligned} \lim _{\lambda \rightarrow 1}\,[I(m)]''= & {} \frac{kl}{(k+l)^2(k+l+1)\ln 2}. \end{aligned}$$
(43)

Likewise, from Eqs. (27), (28), and (29), the first derivatives of G(m), F(m), and R(m) are

$$\begin{aligned}{}[G(m)]'&= -\frac{l}{d+1} \left[ \frac{1}{(k+l\lambda ^2)^2}\right] , \end{aligned}$$
(44)
$$\begin{aligned} {}[F(m)]'&= \frac{kl}{d+1} \left[ \frac{(1-\lambda )(k+l\lambda )}{\lambda (k+l\lambda ^2)^2}\right] , \end{aligned}$$
(45)
$$\begin{aligned} {}[R(m)]'&= kd \left[ \frac{1}{(k+l\lambda ^2)^2} \right] \delta _{d,(k+l)}, \end{aligned}$$
(46)

respectively. These satisfy \([G(m)]'<0\), \([F(m)]'\ge 0\), and \([R(m)]'\ge 0\). Note that \([R(m)]'\) is proportional to \([G(m)]'\) with a non-positive proportionality constant, i.e., \([R(m)]'=\alpha [G(m)]'\) with

$$\begin{aligned} \alpha =-\frac{kd(d+1)}{l}\delta _{d,(k+l)}. \end{aligned}$$
(47)

In addition, the second derivatives of G(m), F(m), and R(m) are

$$\begin{aligned}{}[G(m)]''&= \frac{2l^2}{d+1} \left[ \frac{1}{(k+l\lambda ^2)^3}\right] , \end{aligned}$$
(48)
$$\begin{aligned} {}[F(m)]''&= -\frac{kl}{2(d+1)} \left[ \frac{(k+l\lambda ^2)^2+4l\lambda ^2(1-\lambda )(k+l\lambda )}{\lambda ^3(k+l\lambda ^2)^3}\right] , \end{aligned}$$
(49)
$$\begin{aligned} {}[R(m)]''&= -2kld \left[ \frac{1}{(k+l\lambda ^2)^3}\right] \delta _{d,(k+l)}, \end{aligned}$$
(50)

respectively. These satisfy \([G(m)]''>0\), \([F(m)]''<0\), and \([R(m)]''\le 0\); moreover, \([R(m)]''\) is proportional to \([G(m)]''\) with the same proportionality constant \(\alpha \) given in Eq. (47).
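The closed forms in Eqs. (44)–(50) are short enough to transcribe directly; the sketch below (our own function names) also verifies the proportionality relations involving \(\alpha \) from Eq. (47):

```python
# Direct transcription (a sketch) of the derivatives of G, F, R with
# respect to u = lambda^2, Eqs. (44)-(50), plus a check of Eq. (47).
def dG(d, k, l, lam):                         # Eq. (44)
    return -l / ((d + 1) * (k + l * lam**2)**2)

def dF(d, k, l, lam):                         # Eq. (45)
    return k * l * (1 - lam) * (k + l * lam) / ((d + 1) * lam * (k + l * lam**2)**2)

def dR(d, k, l, lam):                         # Eq. (46)
    return k * d / (k + l * lam**2)**2 if k + l == d else 0.0

def d2G(d, k, l, lam):                        # Eq. (48)
    return 2 * l**2 / ((d + 1) * (k + l * lam**2)**3)

def d2F(d, k, l, lam):                        # Eq. (49)
    w = k + l * lam**2
    return -k * l * (w**2 + 4 * l * lam**2 * (1 - lam) * (k + l * lam)) \
        / (2 * (d + 1) * lam**3 * w**3)

def d2R(d, k, l, lam):                        # Eq. (50)
    return -2 * k * l * d / (k + l * lam**2)**3 if k + l == d else 0.0

# [R]' = alpha [G]' and [R]'' = alpha [G]'' with alpha = -k d (d+1)/l
# when k + l = d, Eq. (47):
d_, k_, l_, lam_ = 4, 2, 2, 0.3
alpha = -k_ * d_ * (d_ + 1) / l_
assert abs(dR(d_, k_, l_, lam_) - alpha * dG(d_, k_, l_, lam_)) < 1e-12
assert abs(d2R(d_, k_, l_, lam_) - alpha * d2G(d_, k_, l_, lam_)) < 1e-12
```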

Table 1 Signs of the first and second derivatives of the information and disturbance with respect to \(\lambda ^2\)

The signs of the derivatives of I(m), G(m), F(m), and R(m) are summarized in Table 1. These signs mean that as \(\lambda ^2\) increases, I(m) and G(m) decrease while F(m) and R(m) increase; that is, there is a trade-off between the information and the disturbance for \(\hat{M}^{(d)}_{k,l}(\lambda )\).

4 Derivatives with respect to information

Using the derivatives of the information and the disturbance with respect to \(\lambda ^2\), we can now calculate the derivatives of the disturbance with respect to the information for \(\hat{M}^{(d)}_{k,l}(\lambda )\). Let f and g be arbitrary functions of \(\lambda \). Given the derivatives of f and g with respect to \(\lambda ^2\), the first and second derivatives of f with respect to g are

$$\begin{aligned} \frac{\mathrm{d}f}{\mathrm{d}g} = \frac{f'}{g'}, \quad \frac{\mathrm{d}^2f}{\mathrm{d}g^2} = \frac{f'' g'-f' g''}{(g')^3}. \end{aligned}$$
(51)

The same results can be obtained using derivatives with respect to \(\lambda \).
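Equation (51) is the standard rule for parametric derivatives. As an illustration (ours, assuming SymPy is available), applying it symbolically to G(m) and F(m) from Eqs. (27)–(28) reproduces Eqs. (52)–(53) below:

```python
import sympy as sp

# Symbolic check (ours) of the parametric rule, Eq. (51), applied to G and F.
lam, k, l, d = sp.symbols('lambda k l d', positive=True)
u = lam**2                                    # the parameter is u = lambda^2
G = (1 + 1 / (k + l * u)) / (d + 1)           # Eq. (27)
F = (1 + (k + l * lam)**2 / (k + l * u)) / (d + 1)  # Eq. (28)

du = sp.diff(u, lam)
Gp, Fp = sp.diff(G, lam) / du, sp.diff(F, lam) / du   # d/du via chain rule
Gpp, Fpp = sp.diff(Gp, lam) / du, sp.diff(Fp, lam) / du

dF_dG = sp.simplify(Fp / Gp)                          # Eq. (52)
d2F_dG2 = sp.simplify((Fpp * Gp - Fp * Gpp) / Gp**3)  # Eq. (53)
# dF_dG should simplify to -k*(1 - lambda)*(k + l*lambda)/lambda and
# d2F_dG2 to -k*(d + 1)*(k + l*lambda**2)**3/(2*l*lambda**3).
```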

From Eqs. (44), (45), (48), and (49), the first and second derivatives of F(m) with respect to G(m) can be calculated to be

$$\begin{aligned} \frac{\mathrm{d}F(m)}{\mathrm{d}G(m)}&= -k\left[ \frac{(1-\lambda )(k+l\lambda )}{\lambda }\right] , \end{aligned}$$
(52)
$$\begin{aligned} \frac{\mathrm{d}^2F(m)}{\mathrm{d}G(m)^2}&= -\frac{k(d+1)}{2l}\left[ \frac{(k+l\lambda ^2)^3}{\lambda ^3}\right] . \end{aligned}$$
(53)

Figures 4a and 5a show these derivatives as functions of G(m) (Eq. 27) for \(d=4\), for various (k, l). Because \(\lambda =0\) corresponds to \(\mathrm{P}_{k}\) and \(\lambda =1\) corresponds to \(\mathrm{P}_{k+l}\) for the lines (k, l) in Fig. 1, the derivatives become

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}F(m)}{\mathrm{d}G(m)} = -\infty , \quad \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}^2F(m)}{\mathrm{d}G(m)^2} = -\infty \end{aligned}$$
(54)

at \(\mathrm{P}_{k}\) and

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}F(m)}{\mathrm{d}G(m)} = 0, \quad \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}^2F(m)}{\mathrm{d}G(m)^2} = -\frac{k(k+l)^3(d+1)}{2l} \end{aligned}$$
(55)

at \(\mathrm{P}_{k+l}\). The first derivative of F(m) with respect to G(m) (Eq. 52) is non-positive and the second derivative (Eq. 53) is negative, which means that all the lines (k, l) in Fig. 1a are monotonically decreasing convex curves.

Fig. 4 First derivatives of the disturbance with respect to information for \(d=4\), for the four information–disturbance pairs: (a) estimation fidelity G(m) and operation fidelity F(m); (b) estimation fidelity G(m) and physical reversibility R(m); (c) information gain I(m) and operation fidelity F(m); and (d) information gain I(m) and physical reversibility R(m)

Fig. 5 Second derivatives of the disturbance with respect to information for \(d=4\), for the four information–disturbance pairs: (a) estimation fidelity G(m) and operation fidelity F(m); (b) estimation fidelity G(m) and physical reversibility R(m); (c) information gain I(m) and operation fidelity F(m); and (d) information gain I(m) and physical reversibility R(m)

In contrast, from Eqs. (44), (46), (48), and (50), the first and second derivatives of R(m) with respect to G(m) are constant:

$$\begin{aligned} \frac{\mathrm{d}R(m)}{\mathrm{d}G(m)}&= -\frac{kd(d+1)}{l}, \end{aligned}$$
(56)
$$\begin{aligned} \frac{\mathrm{d}^2R(m)}{\mathrm{d}G(m)^2}&= 0 \end{aligned}$$
(57)

if \(k+l=d\), and both derivatives are zero if \(k+l\ne d\). Figures 4b and 5b show these derivatives as functions of G(m) for \(d=4\), for various (k, l) satisfying \(k+l=d\). The first derivative of R(m) with respect to G(m) (Eq. 56) is negative and the second derivative (Eq. 57) is zero, which means that all the lines (k, l) in Fig. 1b are monotonically decreasing straight lines.

Similarly, from Eqs. (31), (34), (45), and (49), the first and second derivatives of F(m) with respect to I(m) are

$$\begin{aligned} \frac{\mathrm{d}F(m)}{\mathrm{d}I(m)}&= \frac{[F(m)]'}{[I(m)]'}, \end{aligned}$$
(58)
$$\begin{aligned} \frac{\mathrm{d}^2F(m)}{\mathrm{d}I(m)^2}&= \frac{[F(m)]''[I(m)]'-[F(m)]'[I(m)]''}{\left\{ [I(m)]'\right\} ^3}. \end{aligned}$$
(59)
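Numerically, Eqs. (58) and (59) can be evaluated by combining the closed-form \([F(m)]'\) and \([F(m)]''\) above with finite-difference derivatives of I(m); a sketch (ours), reusing I_m, dF, d2F, and the helpers dI_du and d2I_du2 from the earlier code:

```python
def dF_dI(d, k, l, lam):                         # Eq. (58)
    return dF(d, k, l, lam) / dI_du(d, k, l, lam)

def d2F_dI2(d, k, l, lam):                       # Eq. (59)
    Ip, Ipp = dI_du(d, k, l, lam), d2I_du2(d, k, l, lam)
    return (d2F(d, k, l, lam) * Ip - dF(d, k, l, lam) * Ipp) / Ip**3

# d = 4, (k, l) = (1, 3), lambda = 0.5: the slope is negative, as in Fig. 4c.
print(dF_dI(4, 1, 3, 0.5))
```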

Figures 4c and 5c show these derivatives as functions of I(m) (Eq. 23) for \(d=4\), for various (k, l). At \(\mathrm{P}_{k}\), they become

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}F(m)}{\mathrm{d}I(m)} = -\infty , \quad \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}^2F(m)}{\mathrm{d}I(m)^2} = -\infty \end{aligned}$$
(60)

because

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\,[F(m)]' = \infty , \quad \lim _{\lambda \rightarrow 0}\, [F(m)]'' = -\infty . \end{aligned}$$
(61)

Note that the numerator of Eq. (59) goes to positive infinity as \(\lambda \rightarrow 0\), because \([F(m)]''\) diverges faster than \([F(m)]'\) and \([I(m)]'<0\). In contrast, in the limit as \(\lambda \rightarrow 1\), Eqs. (58) and (59) yield the indeterminate form 0/0 due to Eq. (42) and

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\,[F(m)]' =0. \end{aligned}$$
(62)

However, by applying L’Hôpital’s rule and considering higher derivatives, we can find that

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}F(m)}{\mathrm{d}I(m)}= & {} -\frac{(k+l)(k+l+1)\ln 2}{2(d+1)}, \end{aligned}$$
(63)
$$\begin{aligned} \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}^2F(m)}{\mathrm{d}I(m)^2}= & {} {\left\{ \begin{array}{ll} +\,\infty &{}\quad \text{(if } \;\; k<l) \\ -\,\frac{k(2k+1)^3}{(2k+3)(d+1)} (\ln 2)^2 &{}\quad \text{(if } \;\; k=l) \\ -\,\infty &{}\quad \text{(if } \;\; k>l) \end{array}\right. } \end{aligned}$$
(64)

at \(\mathrm{P}_{k+l}\), as shown in “Appendix C.” The first derivative of F(m) with respect to I(m) (Fig. 4c) is negative, and the second derivative (Fig. 5c) is always negative if \(k\ge l\) but can be positive near \(\mathrm{P}_{k+l}\) if \(k<l\). This means that the lines (k, l) in Fig. 1c are monotonically decreasing convex curves if \(k\ge l\), but monotonically decreasing S-shaped curves if \(k<l\). In particular, even though it is difficult to see from Fig. 1c, the upper boundary \((1,d-1)\) has a slight dent near \(\mathrm{P}_d\) when \(d\ge 3\) [28].

Finally, from Eqs. (31), (34), (46), and (50), the first and second derivatives of R(m) with respect to I(m) are

$$\begin{aligned} \frac{\mathrm{d}R(m)}{\mathrm{d}I(m)}&= \frac{[R(m)]'}{[I(m)]'}, \end{aligned}$$
(65)
$$\begin{aligned} \frac{\mathrm{d}^2R(m)}{\mathrm{d}I(m)^2}&= \frac{[R(m)]''[I(m)]'-[R(m)]'[I(m)]''}{\left\{ [I(m)]'\right\} ^3}. \end{aligned}$$
(66)
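The same construction applies to Eqs. (65) and (66), with \([R(m)]'\) and \([R(m)]''\) nonzero only when \(k+l=d\); a brief sketch (ours), reusing the pieces above:

```python
def dR_dI(d, k, l, lam):                         # Eq. (65)
    return dR(d, k, l, lam) / dI_du(d, k, l, lam)

def d2R_dI2(d, k, l, lam):                       # Eq. (66)
    Ip, Ipp = dI_du(d, k, l, lam), d2I_du2(d, k, l, lam)
    return (d2R(d, k, l, lam) * Ip - dR(d, k, l, lam) * Ipp) / Ip**3

# d = 4, (k, l) = (2, 2), lambda = 0.5: slope negative, curvature positive,
# as in Figs. 4d and 5d.
print(dR_dI(4, 2, 2, 0.5), d2R_dI2(4, 2, 2, 0.5))
```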

Figures 4d and 5d show these derivatives as functions of I(m) for \(d=4\), for various (k, l) satisfying \(k+l=d\). (Both derivatives are zero if \(k+l\ne d\).) When \(k+l=d\), they become

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}R(m)}{\mathrm{d}I(m)}= & {} -\frac{kd\ln 2}{l}, \end{aligned}$$
(67)
$$\begin{aligned} \lim _{\lambda \rightarrow 0}\, \frac{\mathrm{d}^2R(m)}{\mathrm{d}I(m)^2}= & {} {\left\{ \begin{array}{ll} \frac{k^3}{k-1}\left( \frac{d\ln 2}{l}\right) ^2 &{}\quad \text{(if } \;\; k\ge 2) \\ +\,\infty &{}\quad \text{(if } \;\; k=1) \end{array}\right. } \end{aligned}$$
(68)

at \(\mathrm{P}_{k}\), and

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}R(m)}{\mathrm{d}I(m)} = -\infty , \quad \lim _{\lambda \rightarrow 1}\, \frac{\mathrm{d}^2R(m)}{\mathrm{d}I(m)^2} = +\infty \end{aligned}$$
(69)

at \(\mathrm{P}_{k+l}\). In Eq. (68), the second derivative diverges for \(k=1\) because of the corresponding result in Eq. (40), and the divergences seen in Eq. (69) likewise come from Eq. (42). Note that

$$\begin{aligned} \lim _{\lambda \rightarrow 1}\, \frac{1}{[I(m)]'} =-\infty \end{aligned}$$
(70)

because \([I(m)]'\) tends to zero from below, as shown in Fig. 2. The first derivative of R(m) with respect to I(m) (Fig. 4d) is negative and the second derivative (Fig. 5d) is positive, which means that all the lines (k, l) in Fig. 1d are monotonically decreasing concave curves.

Table 2 Signs of the first and second derivatives of the disturbance with respect to information

The signs of the derivatives for the four information–disturbance pairs are summarized in Table 2. All the first derivatives have negative signs, which implies that there is a trade-off between the information and the disturbance for each of the four pairs. In contrast, the second derivatives have different signs, which implies that the optimal measurements are different for each of the four pairs [28].

5 Conclusion

In this paper, we have obtained the first and second derivatives of the disturbance with respect to the information for a class of quantum measurements described by the measurement operator \(\hat{M}^{(d)}_{k,l}(\lambda )\) (Eq. 22). When the measurement performed on a d-level system in a completely unknown state yields a single outcome m, the information is quantified by the Shannon entropy I(m) (Eq. 23) and the estimation fidelity G(m) (Eq. 27), while the disturbance is quantified by the operation fidelity F(m) (Eq. 28) and the physical reversibility R(m) (Eq. 29). In these four information–disturbance planes, \(\hat{M}^{(d)}_{k,l}(\lambda )\) with \(0\le \lambda \le 1\) corresponds to a line (k, l), as shown in Fig. 1. In particular, the lines \((1,d-1)\) and (k, 1) form the boundaries of the allowed regions obtained by plotting all physically possible measurement operators in these planes [28].

The slope and curvature of each line (k, l) are given by the first and second derivatives of the disturbance with respect to the information for \(\hat{M}^{(d)}_{k,l}(\lambda )\). For the four information–disturbance pairs, the first derivatives are given by Eqs. (52), (56), (58), and (65) (shown for \(d=4\) in Fig. 4), while the second derivatives are given by Eqs. (53), (57), (59), and (66) (shown for \(d=4\) in Fig. 5). For F(m) as a function of G(m), all the lines (k, l) in Fig. 1a are monotonically decreasing convex curves, because the first and second derivatives are non-positive and negative, respectively, as shown in Figs. 4a and 5a. For R(m) as a function of G(m), all the lines (k, l) in Fig. 1b are monotonically decreasing straight lines, because the first and second derivatives are negative and zero, respectively, as shown in Figs. 4b and 5b. For F(m) as a function of I(m), the lines (k, l) in Fig. 1c are monotonically decreasing convex curves if \(k\ge l\) and monotonically decreasing S-shaped curves if \(k<l\), because the first derivative is negative and the second derivative is always negative if \(k\ge l\) but can be positive near \(\mathrm{P}_{k+l}\) if \(k<l\), as shown in Figs. 4c and 5c. Finally, for R(m) as a function of I(m), all the lines (k, l) in Fig. 1d are monotonically decreasing concave curves, because the first and second derivatives are negative and positive, respectively, as shown in Figs. 4d and 5d. See also Table 2 for a summary of the signs of the derivatives.

Based on these results, we can see that the boundaries \((1,d-1)\) and (k, 1) of the allowed regions have non-positive slopes for all four information–disturbance pairs, indicating that there is a trade-off between the information and the disturbance for measurements on these boundaries. When the information is increased by moving along a boundary, the disturbance also increases, i.e., F(m) and R(m) decrease. Moreover, the rate of change of the disturbance with respect to the information is given by the boundary’s slope. For example, if G(m) is increased by \(\varDelta G(m)\), F(m) decreases by about

$$\begin{aligned} \varDelta F(m)= \left| \frac{\mathrm{d}F(m)}{\mathrm{d}G(m)}\right| \varDelta G(m). \end{aligned}$$
(71)

Figure 4a shows that \(\left| \mathrm{d}F(m)/\mathrm{d}G(m)\right| \) is infinitely large near \(\mathrm{P}_{1}\), but almost zero near \(\mathrm{P}_{d}\).

In contrast, the curvatures of the boundaries \((1,d-1)\) and (k, 1) for the four information–disturbance pairs have different signs. This means that the allowed regions are extended in different ways when the information and disturbance are averaged over all possible outcomes, as with I, G, F, and R, given by Eqs. (9), (12), (15), and (20), because the allowed regions for the average values are the convex hulls of those for a single outcome [28]. The upper boundaries of the allowed regions for the average values correspond to the optimal measurements that saturate the upper information bounds for a given disturbance. Consequently, the optimal measurements are different for each of the four information–disturbance pairs [28].