1 Introduction

Let A be a Hermitian operator on a Hilbert space \(\mathcal {H}_{}\). Then, for any (not necessarily normalized) vector \(\left| \psi \right\rangle \in \mathcal {H}_{}\),

$$\begin{aligned} A\left| \psi \right\rangle = \left\langle A \right\rangle \left| \psi \right\rangle + \Delta A \left| \psi ^{\perp }_A \right\rangle , \end{aligned}$$
(1)

where \(\left\langle A \right\rangle = \left\langle \psi \big \vert A \big \vert \psi \right\rangle /\left\langle \psi \big \vert \psi \right\rangle \) is the expectation value of A, \(\Delta A = \sqrt{\left\langle A^2 \right\rangle - \left\langle A \right\rangle ^2}\) is its standard deviation, and \(\left| \psi ^{\perp }_A \right\rangle \) is a vector that is orthogonal to \(\left| \psi \right\rangle \), has equal norm \(\left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_A \right\rangle = \left\langle \psi \big \vert \psi \right\rangle \), and depends on the operator A.
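Although the identity is purely algebraic, readers may like to see it in action numerically. Here is a minimal sketch, assuming NumPy; the dimension, random seed, and operator are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Random Hermitian operator and random unit vector.
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (M + M.conj().T) / 2
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

expA = np.real(psi.conj() @ A @ psi)                        # <A> (real, A Hermitian)
dA = np.sqrt(np.real(psi.conj() @ A @ A @ psi) - expA**2)   # Delta A
psi_perp = (A @ psi - expA * psi) / dA                      # candidate |psi_perp_A>

print(np.allclose(A @ psi, expA * psi + dA * psi_perp))     # identity (1): True
print(abs(psi.conj() @ psi_perp) < 1e-12)                   # orthogonal to |psi>: True
print(np.isclose(np.linalg.norm(psi_perp), 1.0))            # equal (unit) norm: True
```

The last line of the construction mirrors the proof given in Sect. 2: for \(\Delta A \ne 0\), the orthogonal vector is \((A - \left\langle A \right\rangle )\left| \psi \right\rangle /\Delta A\).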

Equation (1) is the Aharonov–Vaidman identity, which first appeared in [1]. Yakir Aharonov has stated that he “[does not] understand why it doesn’t appear in every quantum book” [2]. The main purpose of this article is to explain why it should appear in undergraduate quantum mechanics textbooks.

The uncertainty relation that is proved most often in quantum mechanics classes and textbooks is the Robertson relation [4]:

$$\begin{aligned} \Delta A \Delta B \ge \frac{1}{2} \left| \left\langle \left[ A, B\right] \right\rangle \right| , \end{aligned}$$
(2)

where \([A,B] = AB-BA\) is the commutator.

As pointed out by Schrödinger [5], the Robertson relation can be extended to

$$\begin{aligned} \left( \Delta A \right) ^2 \left( \Delta B \right) ^2 \ge \left| \frac{1}{2} \left\langle \left\{ A,B\right\} \right\rangle -\left\langle A \right\rangle \left\langle B \right\rangle \right| ^2 + \left| \frac{1}{2}\left\langle [A,B] \right\rangle \right| ^2, \end{aligned}$$
(3)

where \(\{A,B\} = AB + BA\) is the anti-commutator.

Although not often emphasized in quantum mechanics classes, the Schrödinger relation is not harder to prove than the Robertson relation. In fact, the standard textbook proof of the Robertson relation effectively proves the Schrödinger relation and then throws away the anti-commutator term.

The proof almost universally adopted in textbooks is based on the Cauchy–Schwarz inequality. While this proof is elementary for those familiar with the mathematics of Hilbert spaces, it can be daunting for undergraduate physics students, who are likely encountering Hilbert spaces for the first time along with quantum mechanics.

In this article, I will review more direct proofs of (2) and (3) from the Aharonov–Vaidman identity that make use only of basic properties of complex numbers and inner products. These proofs previously appeared in [6], and the proof of the Robertson relation is also problem 3.10 in Aharonov and Rohrlich’s book “Quantum Paradoxes” [7]. The proof of the Aharonov–Vaidman identity itself uses similar ideas to one of the standard proofs of the Cauchy–Schwarz inequality, but is perhaps more memorable to undergraduate physics students because it uses concepts that have a physical meaning, i.e., expectation values and standard deviations. The proof of the Robertson and Schrödinger relations so obtained is not independent of the standard Cauchy–Schwarz based proof. I shall discuss their relationship and show that the Cauchy–Schwarz inequality can itself be derived from (1). The main virtue of the Aharonov–Vaidman based proof of the uncertainty relations is that it is more direct and involves fewer abstractions.

To be clear, I am not against using or teaching the Cauchy–Schwarz inequality. It has been called “one of the most widely used and important inequalities in all of mathematics” [8]. In fact, the Aharonov–Vaidman based proof still uses one instance of the Cauchy–Schwarz inequality, namely that if \(\left| \psi \right\rangle \) and \(\left| \phi \right\rangle \) are unit vectors then \(\left| \left\langle \phi \big \vert \psi \right\rangle \right| \le 1\). But this is easily motivated by the idea that \(\left\langle \phi \big \vert \psi \right\rangle \) is a generalization of the cosine of an angle, and it is used in a more direct way than in the standard proof. Students of quantum mechanics also need to know the Cauchy–Schwarz inequality to prove that the Born rule always yields well-defined probabilities. Physics students should learn the Cauchy–Schwarz inequality. I just think it should be used in a less abstract way where possible.

Besides the Robertson and Schrödinger relations, many other uncertainty relations are known. Indeed, since uncertainty relations have found applications in quantum information science [9,10,11,12,13,14,15] and quantum foundations [16, 17], proving new ones has become something of a sport. The two most common classes of uncertainty relations are those based on entropy [18] and those based on standard deviations [4, 5, 19]. Many of the standard deviation based relations can be derived from the Aharonov–Vaidman identity. I include a proof of the Maccone–Pati uncertainty relations [20] to illustrate this. While these are not the most recent or tightest known uncertainty relations, I include them because they have a simple and elegant Aharonov–Vaidman based proof. For more recent work on standard deviation uncertainty relations, see [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41].

Another place where relationships between standard deviations are important is in the propagation of uncertainty. In classical statistics, if random variables \(X_1, X_2,\ldots ,X_n\) have standard deviations \(\Delta X_1, \Delta X_2, \ldots , \Delta X_n,\) then a function \(f(X_1,X_2,\ldots ,X_n)\) of them has a standard deviation \(\Delta f\) that is, at least approximately, a function of \(\Delta X_1, \Delta X_2, \ldots , \Delta X_n\) (and their correlations if the variables are not independent). Formulas for the propagation of uncertainty tell us how to compute this function and are commonly used to estimate experimental errors. In quantum mechanics, similar formulas can be derived relating the standard deviations of observables. They differ from their classical counterparts due to the fact that quantum observables do not commute, but provided this is taken care of, they can be derived by the same methods as in the classical case. However, they can alternatively be derived from the Aharonov–Vaidman identity, as I shall explain.

Although the Aharonov–Vaidman identity is usually discussed for pure quantum states, it can be extended to mixed states, either by use of purification or an equivalent concept called an amplitude operator. Relations between standard deviations can be extended to mixed states, but obtaining tight bounds is sometimes more difficult than in the pure case due to the need to optimize over all purifications or amplitude operators that can represent a given mixed state.

The remainder of this article is structured as follows. Section 2 gives the proof of the Aharonov–Vaidman identity and a corollary that is useful for understanding the equality conditions in uncertainty relations. Section 3 presents the proof of the Robertson and Schrödinger relations based on the Aharonov–Vaidman identity. Section 4 explains the relationship with the standard textbook proof of the Robertson relation and shows how the Cauchy–Schwarz inequality can be derived from the Aharonov–Vaidman identity. Section 5 comments on the effective teaching of the Robertson uncertainty relation via the Aharonov–Vaidman identity. Section 6 presents Aharonov–Vaidman based proofs of the Maccone–Pati uncertainty relations. Section 7 describes how to use the Aharonov–Vaidman identity to derive formulas for the propagation of quantum uncertainty. Section 8 explains how to generalize the Aharonov–Vaidman relation to mixed states using amplitude operators. (The relationship between amplitude operators and purifications is discussed in Appendix A.) Finally, Sect. 9 presents the summary and conclusions.

I intend this article to be pedagogical and self-contained, so as to be accessible to undergraduate students and anyone teaching introductory quantum mechanics.

2 Proof of the Aharonov–Vaidman identity

Sometimes, it is useful to generalize the Aharonov–Vaidman identity to non-Hermitian operators, so we prove the more general version here.

Proposition 2.1

(The Aharonov–Vaidman Identity) Let A be a linear operator on a Hilbert space \(\mathcal {H}_{}\) and let \(\left| \psi \right\rangle \) be a (not necessarily normalized) vector in \(\mathcal {H}\). Then,

$$\begin{aligned} A \left| \psi \right\rangle = \left\langle A \right\rangle \left| \psi \right\rangle + \Delta A \left| \psi ^{\perp }_A \right\rangle , \end{aligned}$$
(4)

where \(\left\langle A \right\rangle = \left\langle \psi \big \vert A \big \vert \psi \right\rangle /\left\langle \psi \big \vert \psi \right\rangle \), \(\Delta A = \sqrt{\left\langle A^{\dagger }A \right\rangle - \left| \left\langle A \right\rangle \right| ^2}\), and \(\left| \psi ^{\perp }_A \right\rangle \) is a vector orthogonal to \(\left| \psi \right\rangle \) that depends on both \(\left| \psi \right\rangle \) and A and satisfies \(\left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_A \right\rangle = \left\langle \psi \big \vert \psi \right\rangle \).

Note that, if A is Hermitian, then this reduces to (1), where \(\left\langle A \right\rangle \) and \(\Delta A\) are the expectation value and standard deviation. In general, \(\left\langle A \right\rangle \) is a complex number, but \(\Delta A\) is always real and non-negative.

For most of what we need to do, it is sufficient to consider the case where \(\left| \psi \right\rangle \) is a unit vector, in which case \(\left| \psi ^{\perp }_A \right\rangle \) is also a unit vector. The exception is the proof of the Cauchy–Schwarz inequality (Proposition 4.1 in Sect. 4), which uses the identity with an unnormalized vector.

Proof

Given a vector \(\left| \psi \right\rangle \in \mathcal {H}_{}\), any other vector \(\left| \phi \right\rangle \in \mathcal {H}_{}\) can be written as \(\left| \phi \right\rangle = \alpha \left| \psi \right\rangle + \beta \left| \psi ^{\perp } \right\rangle \), where \(\alpha \) and \(\beta \) are complex coefficients and \(\left| \psi ^{\perp } \right\rangle \) is some vector that is orthogonal to \(\left| \psi \right\rangle \). By an appropriate rescaling of \(\beta \), we can ensure that \(\left\langle \psi ^{\perp } \big \vert \psi ^{\perp } \right\rangle = \left\langle \psi \big \vert \psi \right\rangle \). Applying this to \(\left| \phi \right\rangle = A\left| \psi \right\rangle \) gives

$$\begin{aligned} A\left| \psi \right\rangle = \alpha \left| \psi \right\rangle + \beta \left| \psi ^{\perp } \right\rangle . \end{aligned}$$
(5)

To determine \(\alpha \), take the inner product of (5) with \(\left| \psi \right\rangle \), which gives

$$\begin{aligned} \left\langle \psi \big \vert A \big \vert \psi \right\rangle = \alpha \left\langle \psi \big \vert \psi \right\rangle . \end{aligned}$$
(6)

Rearranging this gives \(\alpha =\left\langle A \right\rangle \).

To determine \(\beta \), substitute \(\alpha = \left\langle A \right\rangle \) into (5) and take the inner product of \(A\left| \psi \right\rangle \) with itself to obtain

$$\begin{aligned} \left\langle \psi \right| A^{\dagger }A\left| \psi \right\rangle&= \left| \left\langle A \right\rangle \right| ^2 \left\langle \psi \big \vert \psi \right\rangle + \vert \beta \vert ^2 \left\langle \psi ^{\perp } \big \vert \psi ^{\perp } \right\rangle \\&= \left| \left\langle A \right\rangle \right| ^2 \left\langle \psi \big \vert \psi \right\rangle + \vert \beta \vert ^2 \left\langle \psi \big \vert \psi \right\rangle , \end{aligned}$$

where we have used \(\left\langle \psi ^{\perp } \big \vert \psi ^{\perp } \right\rangle = \left\langle \psi \big \vert \psi \right\rangle \).

Rearranging and using \(\left\langle A^{\dagger }A \right\rangle = \left\langle \psi \big \vert A^{\dagger }A \big \vert \psi \right\rangle /\left\langle \psi \big \vert \psi \right\rangle \) gives

$$\begin{aligned} \vert \beta \vert ^2 = \left\langle A^{\dagger }A \right\rangle - \left| \left\langle A \right\rangle \right| ^2 = (\Delta A)^2. \end{aligned}$$
(7)

This means that \(\beta = (\Delta A) e^{i\theta }\) for some phase angle \(\theta \). If we define \(\left| \psi ^{\perp }_A \right\rangle = e^{i\theta }\left| \psi ^{\perp } \right\rangle ,\) then \(\left| \psi ^{\perp }_A \right\rangle \) is still orthogonal to \(\left| \psi \right\rangle \), its norm is unchanged, and we have (4). \(\square \)
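The proof is constructive: for \(\Delta A \ne 0\), the orthogonal vector is \(\left| \psi ^{\perp }_A \right\rangle = (A - \left\langle A \right\rangle )\left| \psi \right\rangle /\Delta A\), with the phase \(e^{i\theta }\) absorbed automatically. Here is a minimal numerical sketch of the general (non-Hermitian) case, assuming NumPy; note that \(\left\langle A \right\rangle \) comes out complex while \(\Delta A\) is real:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))  # generic, non-Hermitian
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

alpha = psi.conj() @ A @ psi                        # <A>, complex in general
dA = np.sqrt(np.real(psi.conj() @ A.conj().T @ A @ psi) - abs(alpha)**2)
psi_perp = (A @ psi - alpha * psi) / dA             # phase e^{i theta} absorbed

print(np.allclose(A @ psi, alpha * psi + dA * psi_perp))    # identity (4): True
print(abs(psi.conj() @ psi_perp) < 1e-12)                   # orthogonal: True
print(np.isclose(np.linalg.norm(psi_perp), 1.0))            # unit norm: True
```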

The following corollary is useful for finding the conditions for equality in uncertainty relations.

Corollary 2.2

In general, for two operators A and B, and for a unit vector \(\left| \psi \right\rangle \),

$$\begin{aligned} \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle = \frac{\left\langle A^{\dagger }B \right\rangle - \left\langle A \right\rangle ^*\left\langle B \right\rangle }{\Delta A\Delta B}. \end{aligned}$$
(8)

Proof

From Proposition 2.1, we have

$$\begin{aligned} A\left| \psi \right\rangle&= \left\langle A \right\rangle \left| \psi \right\rangle + \Delta A \left| \psi ^{\perp }_A \right\rangle , \end{aligned}$$
(9)
$$\begin{aligned} B\left| \psi \right\rangle&= \left\langle B \right\rangle \left| \psi \right\rangle + \Delta B \left| \psi ^{\perp }_B \right\rangle . \end{aligned}$$
(10)

Taking the inner product of these gives

$$\begin{aligned} \left\langle \psi \right| A^{\dagger }B\left| \psi \right\rangle = \left\langle A \right\rangle ^*\left\langle B \right\rangle + \Delta A \Delta B \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle . \end{aligned}$$
(11)

Rearranging gives the desired result. \(\square \)

Note that if A and B are Hermitian, then we have

$$\begin{aligned} \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle = \frac{\left\langle AB \right\rangle - \left\langle A \right\rangle \left\langle B \right\rangle }{\Delta A\Delta B}. \end{aligned}$$
(12)

If it is also the case that \([A,B] = 0,\) then (12) is the correlation, denoted \(\text {corr}_{A,B}\), that would be obtained from a joint measurement of A and B. The correlation is a well-known statistical measure of how two random variables are related to one another. Equation (12) is a formal generalization of the correlation, so we will also denote it \(\text {corr}_{A,B}\). However, if A and B do not commute, then \(\text {corr}_{A,B}\) is generally a complex number, there is no joint measurement of A and B of which \(\text {corr}_{A,B}\) could be the correlation, and AB is not an observable.

The real and imaginary parts of \(\text {corr}_{A,B}\) are

$$\begin{aligned} \text {Re} \left( \text {corr}_{A,B} \right)&= \frac{1}{2} \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle + \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) = \frac{\frac{1}{2}\left\langle \left\{ A, B \right\} \right\rangle - \left\langle A \right\rangle \left\langle B \right\rangle }{\Delta A\Delta B}, \end{aligned}$$
(13)
$$\begin{aligned} \text {Im} \left( \text {corr}_{A,B} \right)&= \frac{1}{2i} \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle - \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) = \frac{\left\langle \left[ A, B \right] \right\rangle }{2i\Delta A\Delta B}. \end{aligned}$$
(14)

The real part is also a formal generalization of the correlation, in that it reduces to the classical formula when A and B commute. We denote it \(\text {Rcorr}_{A,B}\).
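As a sanity check on (12)–(14), the following sketch (assuming NumPy; the helper names herm and perp are ad hoc) computes \(\text {corr}_{A,B}\) both from the definition and from the inner product \(\left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \) of Corollary 2.2:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

def herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

def perp(A, psi):
    # Expectation, standard deviation, and |psi_perp_A> for Hermitian A.
    e = np.real(psi.conj() @ A @ psi)
    s = np.sqrt(np.real(psi.conj() @ A @ A @ psi) - e**2)
    return e, s, (A @ psi - e * psi) / s

A, B = herm(), herm()
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)
eA, dA, pA = perp(A, psi)
eB, dB, pB = perp(B, psi)

corr = (psi.conj() @ A @ B @ psi - eA * eB) / (dA * dB)          # definition (12)
print(np.isclose(corr, pA.conj() @ pB))                          # Corollary 2.2: True
acomm = np.real(psi.conj() @ (A @ B + B @ A) @ psi)
comm = psi.conj() @ (A @ B - B @ A) @ psi
print(np.isclose(corr.real, (acomm / 2 - eA * eB) / (dA * dB)))  # (13): True
print(np.isclose(corr.imag, np.real(comm / 2j) / (dA * dB)))     # (14): True
```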

3 The Robertson and Schrödinger uncertainty relations

We are now in a position to prove the Robertson and Schrödinger uncertainty relations.

Proposition 3.1

(The Robertson uncertainty relation) Let A and B be two Hermitian operators on a Hilbert space \(\mathcal {H}_{}\). Then, for any unit vector \(\left| \psi \right\rangle \in \mathcal {H}_{}\)

$$\begin{aligned} \Delta A \Delta B \ge \frac{1}{2}\left| \left\langle [A,B] \right\rangle \right| . \end{aligned}$$
(15)

Proof

From the Aharonov–Vaidman identity, we have

$$\begin{aligned} A\left| \psi \right\rangle&= \left\langle A \right\rangle \left| \psi \right\rangle + \Delta A \left| \psi ^{\perp }_A \right\rangle , \end{aligned}$$
(16)
$$\begin{aligned} B\left| \psi \right\rangle&= \left\langle B \right\rangle \left| \psi \right\rangle + \Delta B \left| \psi ^{\perp }_B \right\rangle . \end{aligned}$$
(17)

Taking the inner product of these two equations, and then the complex conjugate of the result, gives

$$\begin{aligned} \left\langle \psi \big \vert AB \big \vert \psi \right\rangle&= \left\langle A \right\rangle \left\langle B \right\rangle + \Delta A \Delta B \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle , \end{aligned}$$
(18)
$$\begin{aligned} \left\langle \psi \big \vert BA \big \vert \psi \right\rangle&= \left\langle A \right\rangle \left\langle B \right\rangle + \Delta A \Delta B \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle . \end{aligned}$$
(19)

Subtracting these two equations gives

$$\begin{aligned} \left\langle \psi \big \vert (AB-BA) \big \vert \psi \right\rangle = \Delta A \Delta B \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle - \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) , \end{aligned}$$
(20)

or

$$\begin{aligned} \left\langle [A,B] \right\rangle = \Delta A \Delta B \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle - \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) . \end{aligned}$$
(21)

Since \(\left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \) is the complex conjugate of \(\left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \), we can rewrite this as

$$\begin{aligned} \left\langle [A,B] \right\rangle = 2 i \Delta A \Delta B \textrm{Im} \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right) . \end{aligned}$$
(22)

Taking the absolute value of both sides and rearranging it gives

$$\begin{aligned} \Delta A \Delta B \left| \textrm{Im} \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right) \right| = \frac{1}{2} \left| \left\langle [A,B] \right\rangle \right| . \end{aligned}$$
(23)

Because \(\left| \psi ^{\perp }_A \right\rangle \) and \(\left| \psi ^{\perp }_B \right\rangle \) are unit vectors, \(0 \le \left| \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right| ^2 \le 1\), and hence the absolute value of the imaginary part of \(\left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \) is also bounded between 0 and 1. Hence, we have

$$\begin{aligned} \Delta A \Delta B \ge \frac{1}{2} \left| \left\langle [A,B] \right\rangle \right| . \end{aligned}$$
(24)

\(\square \)

The condition for equality in the Robertson relation is \(\left| \text {Im} \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right) \right| = 1\) or, equivalently, \(\text {corr}_{A,B} = \pm i\). States that saturate the inequality are called (Robertson) intelligent states. The condition \(\text {corr}_{A,B} = \pm i\) can be used to find intelligent states, although this is not easier than solving for equality in the Robertson relation directly.
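As a concrete illustration (not drawn from the text above), the \(S_z\) eigenstate \(\left| z+ \right\rangle \) is a Robertson intelligent state for \(A = S_x\), \(B = S_y\). A short NumPy sketch, in units with \(\hbar = 1\), confirms the equality \(\Delta S_x \Delta S_y = \frac{1}{2} \left| \left\langle [S_x,S_y] \right\rangle \right| \) together with \(\text {corr}_{A,B} = +i\):

```python
import numpy as np

# Spin-1/2 operators in units with hbar = 1.
Sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
Sy = np.array([[0, -1j], [1j, 0]]) / 2
psi = np.array([1, 0], dtype=complex)               # |z+>

def stats(A):
    e = np.real(psi.conj() @ A @ psi)
    var = np.real(psi.conj() @ A @ A @ psi) - e**2
    return e, np.sqrt(max(var, 0.0))                # guard tiny negative rounding

(ex, dx), (ey, dy) = stats(Sx), stats(Sy)
bound = 0.5 * abs(psi.conj() @ (Sx @ Sy - Sy @ Sx) @ psi)
corr = (psi.conj() @ Sx @ Sy @ psi - ex * ey) / (dx * dy)
print(dx * dy, bound)                               # 0.25 0.25: equality
print(corr)                                         # 1j, i.e. corr = +i
```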

Proposition 3.2

(The Schrödinger uncertainty relation) Let A and B be two Hermitian operators on a Hilbert space \(\mathcal {H}_{}\). Then, for any unit vector \(\left| \psi \right\rangle \in \mathcal {H}_{},\)

$$\begin{aligned} \left( \Delta A \right) ^2 \left( \Delta B \right) ^2 \ge \left| \frac{1}{2} \left\langle \left\{ A,B\right\} \right\rangle -\left\langle A \right\rangle \left\langle B \right\rangle \right| ^2 + \left| \frac{1}{2}\left\langle [A,B] \right\rangle \right| ^2. \end{aligned}$$
(25)

Proof

Taking the sum of (18) and (19) gives

$$\begin{aligned} \left\langle \left\{ A,B \right\} \right\rangle = 2\left\langle A \right\rangle \left\langle B \right\rangle + \Delta A \Delta B \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle + \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) , \end{aligned}$$
(26)

or

$$\begin{aligned} \left\langle \left\{ A,B \right\} \right\rangle - 2\left\langle A \right\rangle \left\langle B \right\rangle = \Delta A \Delta B \left( \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle + \left\langle \psi ^{\perp }_B \big \vert \psi ^{\perp }_A \right\rangle \right) . \end{aligned}$$
(27)

Adding this to (21) gives

$$\begin{aligned} \left\langle \left\{ A,B \right\} \right\rangle - 2\left\langle A \right\rangle \left\langle B \right\rangle + \left\langle [A,B] \right\rangle = 2 \Delta A \Delta B \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle , \end{aligned}$$
(28)

or

$$\begin{aligned} \Delta A \Delta B \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle = \frac{1}{2}\left\langle \left\{ A,B \right\} \right\rangle - \left\langle A \right\rangle \left\langle B \right\rangle + \frac{1}{2} \left\langle [A,B] \right\rangle . \end{aligned}$$
(29)

Now, because A and B are Hermitian, \(\{A,B\}\) is Hermitian and \([A,B]\) is anti-Hermitian. Therefore, \(\left\langle \{A,B\} \right\rangle \) is real and \(\left\langle [A,B] \right\rangle \) is imaginary. Further, \(\left\langle A \right\rangle \), \(\left\langle B \right\rangle \), \(\Delta A\) and \(\Delta B\) are real. Therefore, taking the modulus squared of (29) gives

$$\begin{aligned} (\Delta A)^2 (\Delta B)^2 \left| \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right| ^2 = \left| \frac{1}{2}\left\langle \left\{ A,B \right\} \right\rangle - \left\langle A \right\rangle \left\langle B \right\rangle \right| ^2 + \left| \frac{1}{2} \left\langle [A,B] \right\rangle \right| ^2. \end{aligned}$$
(30)

Finally, because \(\left| \psi ^{\perp }_A \right\rangle \) and \(\left| \psi ^{\perp }_B \right\rangle \) are unit vectors, we have \(0 \le \left| \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right| ^2 \le 1\), from which the result follows. \(\square \)

The condition for equality in the Schrödinger relation is \(\left| \left\langle \psi ^{\perp }_A \big \vert \psi ^{\perp }_B \right\rangle \right| ^2 = \left| \text {corr}_{A,B} \right| ^2 = 1\). States that saturate the inequality are called (Schrödinger) intelligent states. The condition \(\left| \text {corr}_{A,B} \right| ^2 = 1\) can be used to find intelligent states, although this is not easier than solving for equality in the Schrödinger relation directly.
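It is worth noting that (30) is an exact identity; the inequality only enters at the final step. A quick numerical sketch (assuming NumPy, with arbitrary random choices) confirms this for generic A, B and \(\left| \psi \right\rangle \):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

def herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

A, B = herm(), herm()
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

eA = np.real(psi.conj() @ A @ psi); eB = np.real(psi.conj() @ B @ psi)
vA = np.real(psi.conj() @ A @ A @ psi) - eA**2
vB = np.real(psi.conj() @ B @ B @ psi) - eB**2
pA = (A @ psi - eA * psi) / np.sqrt(vA)             # |psi_perp_A>
pB = (B @ psi - eB * psi) / np.sqrt(vB)             # |psi_perp_B>

overlap2 = abs(pA.conj() @ pB)**2
acomm = np.real(psi.conj() @ (A @ B + B @ A) @ psi)
comm = psi.conj() @ (A @ B - B @ A) @ psi
rhs = abs(acomm / 2 - eA * eB)**2 + abs(comm / 2)**2
print(np.isclose(vA * vB * overlap2, rhs))          # (30) holds exactly: True
print(vA * vB >= rhs)                               # the Schrodinger bound (25): True
```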

4 The textbook proof and the Cauchy–Schwarz inequality

The textbook proofs of the Robertson and Schrödinger uncertainty relations are based on the Cauchy–Schwarz inequality

$$\begin{aligned} \left| \left\langle f \big \vert g \right\rangle \right| ^2 \le \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle . \end{aligned}$$
(31)

Note that the proofs given in Sect. 3 also make use of a special case of this inequality: that for unit vectors \(\left| \left\langle f \big \vert g \right\rangle \right| ^2 \le 1\). This is applied to \(\left| f \right\rangle = \left| \psi ^{\perp }_A \right\rangle \), \(\left| g \right\rangle = \left| \psi ^{\perp }_B \right\rangle \). My aim is not to eliminate any use of the Cauchy–Schwarz inequality, but just to argue that the proof is more memorable if the inequality is applied in a different way than in the standard proof.

In the standard proof, the Cauchy–Schwarz inequality is applied to the two vectors \(\left| f \right\rangle = \left( A - \left\langle A \right\rangle \right) \left| \psi \right\rangle \) and \(\left| g \right\rangle = \left( B - \left\langle B \right\rangle \right) \left| \psi \right\rangle \) to obtain

$$\begin{aligned} \left| \left\langle \psi \big \vert (A - \left\langle A \right\rangle )(B - \left\langle B \right\rangle ) \big \vert \psi \right\rangle \right| ^2 \le \left\langle \psi \big \vert (A - \left\langle A \right\rangle )^2 \big \vert \psi \right\rangle \left\langle \psi \big \vert (B- \left\langle B \right\rangle )^2 \big \vert \psi \right\rangle . \end{aligned}$$
(32)

A few lines of messy algebra and cancellations, which I will spare you the details of, yield

$$\begin{aligned} \left( \Delta A \right) ^2 \left( \Delta B \right) ^2 \ge \left| \frac{1}{2} \left\langle \left\{ A,B \right\} \right\rangle - \left\langle A \right\rangle \left\langle B \right\rangle + \frac{1}{2} \left\langle \left[ A,B \right] \right\rangle \right| ^2, \end{aligned}$$
(33)

from which we can derive the Schrödinger and Robertson relations by recognizing the real and imaginary parts of the right hand side.

As physics students do not often see the Cauchy–Schwarz inequality prior to their first course on quantum mechanics, most textbooks include a proof of this as well. One of the common proofs uses reasoning similar to that which we used to establish the Aharonov–Vaidman identity. It starts by recognizing that \(\left| g \right\rangle \) can be written as

$$\begin{aligned} \left| g \right\rangle = \alpha \left| f \right\rangle + \beta \left| f^{\perp } \right\rangle , \end{aligned}$$
(34)

where \(\left| f^{\perp } \right\rangle \) is a unit vector that is orthogonal to \(\left| f \right\rangle \). To find \(\alpha \), take the inner product of this with \(\left| f \right\rangle \), which yields \(\alpha = \left\langle f \big \vert g \right\rangle /\left\langle f \big \vert f \right\rangle \). Substituting this back into (34) and then taking the inner product of \(\left| g \right\rangle \) with itself gives

$$\begin{aligned} \left\langle g \big \vert g \right\rangle = \frac{\left| \left\langle f \big \vert g \right\rangle \right| ^2}{\left\langle f \big \vert f \right\rangle } + \left| \beta \right| ^2. \end{aligned}$$
(35)

The Cauchy–Schwarz inequality follows from this by recognizing that \(\left| \beta \right| ^2\) is real and non-negative.

Summarizing, the standard proof of the Robertson inequality consists of two steps: prove the Cauchy–Schwarz inequality, and then find convenient vectors to insert into it that yield terms involving \(\Delta A\) and \(\Delta B\) after some algebra. From the Aharonov–Vaidman identity, we can see that the reason the choice \(\left| f \right\rangle = \left( A - \left\langle A \right\rangle \right) \left| \psi \right\rangle \) and \(\left| g \right\rangle = \left( B - \left\langle B \right\rangle \right) \left| \psi \right\rangle \) is guaranteed to work is that \(\left| f \right\rangle = \Delta A \left| \psi ^{\perp }_A \right\rangle \) and \(\left| g \right\rangle = \Delta B \left| \psi ^{\perp }_B \right\rangle \).

After inserting these choices, one has to multiply out and simplify the expressions in the Cauchy–Schwarz inequality. This involves recognizing things like \(\left\langle A \right\rangle \left\langle \psi \big \vert A \big \vert \psi \right\rangle = \left\langle A \right\rangle ^2\) and then canceling several terms. It is difficult for students to follow the full details of this in a lecture. In the approach using the Aharonov–Vaidman relation, we already have expressions involving \(\Delta A\) and \(\Delta B\), so it is easier to see how to get an expression involving \(\Delta A \Delta B\). This expression has fewer terms and there is less cancellation to do.

Although the approach using the Aharonov–Vaidman identity uses the Cauchy–Schwarz inequality in a less convoluted way, it uses similar mathematical ideas. For vectors \(\left| f \right\rangle \) and \(\left| g \right\rangle \), we can write \(\left| g \right\rangle \) in terms of \(\left| f \right\rangle \) and an orthogonal vector, as in the proof of Cauchy–Schwarz, or we can write both vectors in terms of a third vector \(\left| h \right\rangle \) as

$$\begin{aligned} \left| f \right\rangle&= \alpha _1 \left| h \right\rangle + \beta _1 \left| h^{\perp }_f \right\rangle ,\end{aligned}$$
(36)
$$\begin{aligned} \left| g \right\rangle&= \alpha _2 \left| h \right\rangle + \beta _2 \left| h^{\perp }_g \right\rangle , \end{aligned}$$
(37)

where \(\left| h^{\perp }_f \right\rangle \) and \(\left| h^{\perp }_g \right\rangle \) are (generally different) vectors orthogonal to \(\left| h \right\rangle \) and \(\alpha _1,\beta _1,\alpha _2,\beta _2\) are complex coefficients. This is what we do in the proof of the Aharonov–Vaidman identity with the choices \(\left| f \right\rangle = A\left| \psi \right\rangle \), \(\left| g \right\rangle = B\left| \psi \right\rangle \) and \(\left| h \right\rangle = \left| \psi \right\rangle \). The advantage of this approach is that it immediately yields expressions involving the expectation values and standard deviations of the observables, with which it is easy to see what to do to get the uncertainty relations. From this point of view, the standard proof looks like shoehorning something into the Cauchy–Schwarz inequality that will yield standard deviations, and then backtracking to a point more easily obtained from the Aharonov–Vaidman identity. At the end of the day, both approaches use the same mathematics, but the Aharonov–Vaidman approach does so in a simpler and more direct way.

I would go so far as to say that whenever you are tempted to use the Cauchy–Schwarz inequality to prove a relationship between standard deviations of observables in quantum mechanics, you will have an easier time working from the Aharonov–Vaidman identity (and the special case \(\left| \left\langle f \big \vert g \right\rangle \right| ^2 \le 1\) of the Cauchy–Schwarz inequality for unit vectors) instead. Sections 6 and 7 give more examples of this.

I end this section by showing that you can prove the Cauchy–Schwarz inequality from the Aharonov–Vaidman identity. I include this not because I think it is the best way to prove the Cauchy–Schwarz inequality, but because finding alternative proofs of the Cauchy–Schwarz inequality is the mathematician’s equivalent of the sport of finding new uncertainty relations in quantum mechanics. It also shows that, in principle, there is nothing that can be proved using the Cauchy–Schwarz inequality that could not be proved using the Aharonov–Vaidman identity. Of course, outside the context of standard deviations in quantum mechanics, using the Aharonov–Vaidman identity instead of the Cauchy–Schwarz inequality is unlikely to yield a better proof.

Proposition 4.1

(Cauchy–Schwarz Inequality) Let \(\left| f \right\rangle \) and \(\left| g \right\rangle \) be two vectors in a Hilbert space \(\mathcal {H}_{}\). Then

$$\begin{aligned} \left| \left\langle f \big \vert g \right\rangle \right| ^2 \le \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle . \end{aligned}$$
(38)

Proof

First, note that the inequality trivially holds whenever \(\left\langle f \big \vert g \right\rangle = 0\) and that \(\left\langle f \big \vert f \right\rangle = 0\) implies \(\left\langle f \big \vert g \right\rangle = 0\). Therefore, we can assume that both \(\left\langle f \big \vert g \right\rangle \ne 0\) and \(\left\langle f \big \vert f \right\rangle > 0\).

Let \(P = \left| g \big \rangle \big \langle g \right| \). Note this is not necessarily a projector because \(\left| g \right\rangle \) does not have to be normalized, but it is a Hermitian operator. Applying the Aharonov–Vaidman identity to P and \(\left| f \right\rangle \) gives

$$\begin{aligned} P \left| f \right\rangle = \left\langle P \right\rangle \left| f \right\rangle + \Delta P \left| f^{\perp }_{P} \right\rangle , \end{aligned}$$
(39)

or equivalently,

$$\begin{aligned} \left| g \right\rangle \left\langle g \big \vert f \right\rangle = \frac{\left\langle f \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle }{\left\langle f \big \vert f \right\rangle } \left| f \right\rangle + \Delta P \left| f^{\perp }_P \right\rangle . \end{aligned}$$
(40)

Taking the inner product with \(\left| f^{\perp }_P \right\rangle \) gives

$$\begin{aligned} \left\langle f^{\perp }_P \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle = \Delta P \left\langle f \big \vert f \right\rangle , \end{aligned}$$
(41)

where we used the fact that \(\left\langle f^{\perp }_P \big \vert f^{\perp }_P \right\rangle = \left\langle f \big \vert f \right\rangle .\) Rearranging and taking the complex conjugate give

$$\begin{aligned} \left\langle g \big \vert f^{\perp }_P \right\rangle = \frac{\Delta P\left\langle f \big \vert f \right\rangle }{\left\langle f \big \vert g \right\rangle }. \end{aligned}$$
(42)

Now, taking the inner product of (40) with \(\left| g \right\rangle \) gives

$$\begin{aligned} \left\langle g \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle = \frac{\left\langle f \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle }{\left\langle f \big \vert f \right\rangle } \left\langle g \big \vert f \right\rangle + \Delta P \left\langle g \big \vert f^{\perp }_P \right\rangle . \end{aligned}$$
(43)

Multiplying both sides by \(\left\langle f \big \vert f \right\rangle /\left\langle g \big \vert f \right\rangle \) gives

$$\begin{aligned} \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle = \left\langle f \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle + \frac{\Delta P \left\langle g \big \vert f^{\perp }_P \right\rangle \left\langle f \big \vert f \right\rangle }{\left\langle g \big \vert f \right\rangle }. \end{aligned}$$
(44)

Substituting (42) into this gives

$$\begin{aligned} \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle = \left\langle f \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle + \frac{\left( \Delta P \right) ^2 \left| \left\langle f \big \vert f \right\rangle \right| ^2}{\left\langle f \big \vert g \right\rangle \left\langle g \big \vert f \right\rangle }, \end{aligned}$$
(45)

or

$$\begin{aligned} \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle = \left| \left\langle f \big \vert g \right\rangle \right| ^2 + \frac{\left( \Delta P \right) ^2 \left| \left\langle f \big \vert f \right\rangle \right| ^2}{\left| \left\langle f \big \vert g \right\rangle \right| ^2}. \end{aligned}$$
(46)

Now, the terms \(\Delta P\), \(\left\langle f \big \vert f \right\rangle \) and \(\left| \left\langle f \big \vert g \right\rangle \right| \) are all real and non-negative. Hence,

$$\begin{aligned} \left\langle f \big \vert f \right\rangle \left\langle g \big \vert g \right\rangle \ge \left| \left\langle f \big \vert g \right\rangle \right| ^2. \end{aligned}$$
(47)

\(\square \)
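Since every step before the final bound is an equality, (46) can also be checked numerically. Here is a sketch, assuming NumPy and deliberately unnormalized vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
f = rng.normal(size=d) + 1j * rng.normal(size=d)    # deliberately unnormalized
g = rng.normal(size=d) + 1j * rng.normal(size=d)

ff = np.real(f.conj() @ f)
gg = np.real(g.conj() @ g)
fg2 = abs(f.conj() @ g)**2

P = np.outer(g, g.conj())                           # P = |g><g|
expP = (f.conj() @ P @ f) / ff                      # expectations in |f>
varP = np.real(f.conj() @ P.conj().T @ P @ f) / ff - abs(expP)**2

print(np.isclose(ff * gg, fg2 + varP * ff**2 / fg2))  # identity (46): True
print(ff * gg >= fg2)                                 # Cauchy-Schwarz: True
```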

5 Pedagogical notes

To teach the Robertson uncertainty relation via the Aharonov–Vaidman identity, you first have to establish the Aharonov–Vaidman identity. For the purposes of proving the Robertson uncertainty relation, it is sufficient to restrict the operator in the identity to be Hermitian and the vector \(\left| \psi \right\rangle \) to be a unit vector, as I shall in this section.

In my experience, not all students immediately understand why, given a unit vector \(\left| \psi \right\rangle \), any other unit vector \(\left| \phi \right\rangle \) can be written as

$$\begin{aligned} \left| \phi \right\rangle = \alpha \left| \psi \right\rangle + \beta \left| \psi ^{\perp } \right\rangle , \end{aligned}$$
(48)

where \(\left| \psi ^{\perp } \right\rangle \) is a unit vector orthogonal to \(\left| \psi \right\rangle \). They will probably have seen Gram–Schmidt orthogonalization in a linear algebra class, but may have difficulty using that knowledge here due to the jump to abstract Hilbert spaces and Dirac notation. To aid intuition, I remark that \(\left| \psi \right\rangle \) and \(\left| \phi \right\rangle \) span a two-dimensional subspace of \(\mathcal {H}_{}\) and show them Fig. 1.

Fig. 1: Diagram showing that there exists a unit vector \(\left| \psi ^{\perp } \right\rangle \) such that \(\left| \psi \right\rangle \) and \(\left| \psi ^{\perp } \right\rangle \) form an orthonormal basis for the two-dimensional subspace of \(\mathcal {H}_{}\) spanned by \(\left| \psi \right\rangle \) and \(\left| \phi \right\rangle \).

By the process of Gram–Schmidt orthogonalization, we can construct an orthonormal basis for this subspace consisting of \(\left| \psi \right\rangle \) and

$$\begin{aligned} \left| \psi ^{\perp } \right\rangle = \frac{1}{\sqrt{1 - \left| \left\langle \phi \big \vert \psi \right\rangle \right| ^2}} \left( \left| \phi \right\rangle - \left| \psi \right\rangle \left\langle \psi \big \vert \phi \right\rangle \right) , \end{aligned}$$
(49)

from which we have (48) with \(\alpha = \left\langle \psi \big \vert \phi \right\rangle \) and \(\beta = \sqrt{1 - \left| \left\langle \phi \big \vert \psi \right\rangle \right| ^2}\).
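For students who prefer to see this concretely, here is a short NumPy sketch (illustrative only) that builds \(\left| \psi ^{\perp } \right\rangle \) from (49) and confirms the decomposition (48):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

def unit():
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

psi, phi = unit(), unit()
ov = psi.conj() @ phi                                    # <psi|phi>
psi_perp = (phi - ov * psi) / np.sqrt(1 - abs(ov)**2)    # equation (49)

alpha, beta = ov, np.sqrt(1 - abs(ov)**2)
print(abs(psi.conj() @ psi_perp) < 1e-12)                # orthogonal: True
print(np.isclose(np.linalg.norm(psi_perp), 1.0))         # unit norm: True
print(np.allclose(phi, alpha * psi + beta * psi_perp))   # decomposition (48): True
```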

In my quantum mechanics classes, I set students in-class activities that involve things like deriving important equations or making order of magnitude estimates. These take about 5–10 min each and are done in pairs. I usually do two or three such activities per class. I believe this increases active engagement and retention of the main principles. I try to reduce the number of long derivations that I do myself on the board, because I think they cause confusion about what the most important equations are and the derivations are rarely remembered by the students. However, I also do not want to set the students a long and complicated derivation to do themselves in class, so I try to find shorter derivations that they can do with guidance instead. The proof of the Robertson relation from the Aharonov–Vaidman relation is better suited to this approach than the standard proof.

After establishing (48), I set students the following activity.

In class activity

Given that \(A\left| \psi \right\rangle = \alpha \left| \psi \right\rangle + \beta \left| \psi ^{\perp } \right\rangle \), find \(\alpha \) and \(\beta \) in terms of the expectation value \(\left\langle A \right\rangle \) and standard deviation \(\Delta A\) of A in the state \(\left| \psi \right\rangle \).

Although some students can do this straight away, most need some help. During the course of the activity, I walk around the class to get an idea of how they are doing. When it seems like many students are stuck, I reveal the following three hints in sequence.

Hints

  1. 1.

    Try taking the inner product of \(A\left| \psi \right\rangle = \alpha \left| \psi \right\rangle + \beta \left| \psi ^{\perp } \right\rangle \) with other states.

  2. 2.

    Try taking the inner product of \(A\left| \psi \right\rangle \) with \(\left| \psi \right\rangle \).

  3. 3.

    Try taking the inner product of \(A\left| \psi \right\rangle \) with itself.

Although most students can get \(\alpha = \left\langle A \right\rangle \) either straight away or after the first hint, \(\vert \beta \vert = \Delta A\) is more challenging. After taking the inner product with \(\left| \psi \right\rangle \), the obvious instinct is to take the inner product with \(\left| \psi ^{\perp } \right\rangle \), which does not help, so the third hint is usually needed. After this, it is a short hop to the Robertson relation via the proof given in Sect. 3.

I think it would be more difficult to teach the standard proof in this way. One would either have to ask the students to derive the Cauchy–Schwarz inequality for themselves or derive the Robertson relation from Cauchy–Schwarz. The former is a bit abstract for a quantum mechanics class and the latter involves a lot of algebra and cancellations with a high potential for making mistakes. Both would require a large number of hints. In contrast, the proof of the Aharonov–Vaidman identity is relatively short, and I think that students who retain the identity are more likely to be able to reconstruct the proof of the Robertson relation for themselves.

6 Other uncertainty relations for standard deviations

Despite the ubiquity of the Schrödinger–Robertson uncertainty relations in quantum mechanics classes, there are good reasons to go beyond them. For example, consider a spin-1/2 particle with spin operators \(S_x\), \(S_y\) and \(S_z\). For this case, the Robertson relation is \(\Delta S_x \Delta S_y \ge \frac{\hbar }{2} \left| \left\langle S_z \right\rangle \right| \). Let \(\left| x+ \right\rangle \) be the spin-up state in the x direction. For this state we have \(\left\langle S_z \right\rangle = 0\), so the bound is trivial. This is perfectly valid because \(\left| x+ \right\rangle \) is an eigenstate of \(S_x\) and hence \(\Delta S_x = 0\). However, because \([S_x,S_y] \ne 0\), there is necessarily some uncertainty in \(S_y\), and in fact \(\Delta S_y = \hbar / 2\). The Schrödinger relation also yields only \(\Delta S_x \Delta S_y \ge 0\). So the Schrödinger–Robertson relations do not capture all uncertainty trade-offs that necessarily exist in quantum mechanics.
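A few lines of NumPy (illustrative, with \(\hbar = 1\)) confirm these numbers:

```python
import numpy as np

# Spin-1/2 operators with hbar = 1, so Delta S_y should come out as 1/2.
Sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
Sy = np.array([[0, -1j], [1j, 0]]) / 2
xp = np.array([1, 1], dtype=complex) / np.sqrt(2)    # |x+>

def stats(A):
    e = np.real(xp.conj() @ A @ xp)
    var = np.real(xp.conj() @ A @ A @ xp) - e**2
    return e, np.sqrt(max(var, 0.0))                 # guard tiny negative rounding

(ex, dx), (ey, dy) = stats(Sx), stats(Sy)
robertson = 0.5 * abs(xp.conj() @ (Sx @ Sy - Sy @ Sx) @ xp)
print(dx, dy)                                        # ~0.0 and 0.5
print(robertson)                                     # 0.0: a trivial bound
```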

More generally, for bounded operators A and B, any uncertainty relation of the form \(\Delta A \Delta B \ge f \left( A,B,\left| \psi \right\rangle \right) \) for some function f must necessarily have \(f \left( A,B,\left| \psi \right\rangle \right) = 0\) whenever \(\left| \psi \right\rangle \) is an eigenstate of A or B. For this reason, it makes sense to seek uncertainty relations that bound the sum of standard deviations \(\Delta A + \Delta B\), the sum of variances \(\left( \Delta A\right) ^2 + \left( \Delta B \right) ^2\), or more exotic combinations. We shall discuss the Maccone–Pati relations, and some simple generalizations, in this section.

Uncertainty relations are classified as either state dependent or state independent, depending on whether the right hand side of the inequality depends on the state \(\left| \psi \right\rangle \). For two observables A and B, a state dependent uncertainty relation is of the form \(f(\Delta A,\Delta B) \ge g(A, B, \left| \psi \right\rangle )\), where f and g are specified functions, whereas a state independent uncertainty relation would be of the form \(f(\Delta A,\Delta B) \ge g(A, B)\), noting that g is no longer allowed to depend on \(\left| \psi \right\rangle \).

On the face of it, a state-dependent uncertainty relation is a strange idea, since, for any given normalized state \(\left| \psi \right\rangle \), we can always just calculate the uncertainties \(\Delta A\) and \(\Delta B\) and get the exact value of \(f(\Delta A, \Delta B)\). Therefore, bounds on uncertainty that apply to all states seem more useful.

However, a state-dependent uncertainty relation can be a useful step in deriving a state independent one. This can happen in two ways. First, it may happen that, for a particular choice of the observables A and B, the function \(g(A, B, \left| \psi \right\rangle )\) turns out not to depend on \(\left| \psi \right\rangle \). For example, the Robertson relation \(\Delta A \Delta B \ge \frac{1}{2} \left| \left\langle \psi \big \vert [A,B] \big \vert \psi \right\rangle \right| \) is state dependent, but if we choose \(A = x\), \(B=p\), then \(\left| \left\langle \psi \big \vert [A,B] \big \vert \psi \right\rangle \right| = \hbar \) and so we get the Heisenberg relation \(\Delta x \Delta p \ge \frac{\hbar }{2}\), which is state independent. Since the main point of proving the Robertson uncertainty relation in a quantum mechanics class is to give a rigorous derivation of the Heisenberg relation, its state dependence does no harm. However, the utility of the Robertson relation for other classes of observable, such as spin components, is more questionable. Despite the fact that I have asked students to compute it for states of a spin-1/2 particle as a homework problem, I do not think there is ever a need to do this in practice, as it is just as easy to calculate the exact uncertainties.

The second way of obtaining a state independent uncertainty relation from a state dependent one is to optimize, i.e. if \(f(\Delta A,\Delta B) \ge g(A, B, \left| \psi \right\rangle )\) then

$$\begin{aligned} f(\Delta A,\Delta B) \ge \min _{\left| \psi \right\rangle } g(A, B, \left| \psi \right\rangle ). \end{aligned}$$
(50)

Of course, if \(f(\Delta A,\Delta B) = \Delta A \Delta B\) and A and B are bounded operators, then this leads to the trivial relation \(\Delta A \Delta B \ge 0\) because we can choose \(\left| \psi \right\rangle \) to be an eigenstate of either A or B. However, for sums and more general combinations of observables, optimization can lead to a nontrivial relation.

Further, if we are considering a set of experiments that can only prepare a subset of the possible states, then we can get an uncertainty relation that applies to those states by optimizing over the subset. An example might be experiments in which we can only prepare the system in a Gaussian state. Although this does not yield a state independent uncertainty relation, it is more useful than a completely state dependent one, as it allows us to bound the possible uncertainties for a class of relevant states.

To summarize, state-dependent uncertainty relations are a strange idea, and I am not sure whether they would ever have been considered had Robertson not introduced one as a waypoint in proving the Heisenberg relation. However, they can be useful in proving more generally applicable uncertainty relations. The relations that we discuss here are state dependent.

The remainder of this section is structured as follows. In Sect. 6.1, we use the Aharonov–Vaidman identity to prove two propositions, called the sum relations, that will be used repeatedly in what follows. In Sect. 6.2, we give an Aharonov–Vaidman based proof of the Maccone–Pati uncertainty relations, and in Sect. 6.3 we give some simple generalizations.

6.1 The sum relations

Proposition 6.1

Let A and B be linear operators acting on \(\mathcal {H}_{}\). Then, for any \(\left| \psi \right\rangle \in \mathcal {H}_{}\),

$$\begin{aligned} \Delta (A+B) \left| \psi ^{\perp }_{A+B} \right\rangle = \Delta A \left| \psi ^{\perp }_A \right\rangle + \Delta B \left| \psi ^{\perp }_B \right\rangle . \end{aligned}$$

Proof

Apply the Aharonov–Vaidman identity to \(A+B\) in two different ways. The first way is

$$\begin{aligned} (A + B) \left| \psi \right\rangle&= \left\langle A + B \right\rangle \left| \psi \right\rangle + \Delta (A + B) \left| \psi ^{\perp }_{A+B} \right\rangle \nonumber \\&= \left( \left\langle A \right\rangle + \left\langle B \right\rangle \right) \left| \psi \right\rangle + \Delta (A + B) \left| \psi ^{\perp }_{A+B} \right\rangle , \end{aligned}$$
(51)

and the second is

$$\begin{aligned} (A + B) \left| \psi \right\rangle&= A\left| \psi \right\rangle + B\left| \psi \right\rangle \nonumber \\&= \left( \left\langle A \right\rangle + \left\langle B \right\rangle \right) \left| \psi \right\rangle + \Delta A \left| \psi ^{\perp }_A \right\rangle + \Delta B \left| \psi ^{\perp }_B \right\rangle . \end{aligned}$$
(52)

Subtracting (52) from (51) and rearranging give the desired result. \(\square \)

The next proposition comes from [19]. Here, the proof relies on Proposition 6.1 and so is based on the Aharonov–Vaidman relation. The original proof uses a different method and is a little more complicated.

Proposition 6.2

(The sum relation) Let A and B be two linear operators acting on a Hilbert space \(\mathcal {H}_{}\). Then,

$$\begin{aligned} \Delta (A + B) \le \Delta A + \Delta B. \end{aligned}$$

Proof

Let \(\left| \psi \right\rangle \) in Proposition 6.1 be a unit vector. Then, starting from \(\Delta (A+B) \left| \psi ^{\perp }_{A+B} \right\rangle = \Delta A \left| \psi ^{\perp }_A \right\rangle + \Delta B \left| \psi ^{\perp }_B \right\rangle \) and taking the inner product with \(\left| \psi ^{\perp }_{A+B} \right\rangle \) gives

$$\begin{aligned} \Delta (A + B) = \Delta A \left\langle \psi ^{\perp }_{A+B} \big \vert \psi ^{\perp }_A \right\rangle + \Delta B \left\langle \psi ^{\perp }_{A+B} \big \vert \psi ^{\perp }_B \right\rangle . \end{aligned}$$

The left hand side of this equation is a real number, so the right hand side must be too. Therefore, we can take the real part of each term to give

$$\begin{aligned} \Delta (A + B) = \Delta A \text {Re} \left( \left\langle \psi ^{\perp }_{A+B} \big \vert \psi ^{\perp }_A \right\rangle \right) + \Delta B \text {Re} \left( \left\langle \psi ^{\perp }_{A+B} \big \vert \psi ^{\perp }_B \right\rangle \right) , \end{aligned}$$

but the real part of an inner product between two unit vectors is \(\le 1\), so we have

$$\begin{aligned} \Delta (A + B) \le \Delta A + \Delta B. \end{aligned}$$

\(\square \)

From the proof, we see that the equality condition for the sum relation is

$$\begin{aligned} \text {Rcorr}_{A + B, A} = \text {Rcorr}_{A+B,B} = 1. \end{aligned}$$
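A quick numerical check of Proposition 6.2 (assuming NumPy; the helper names are ad hoc) shows that the inequality is typically strict for random Hermitian operators:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 6

def herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

A, B = herm(), herm()
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

def dev(O):
    e = np.real(psi.conj() @ O @ psi)
    return np.sqrt(np.real(psi.conj() @ O @ O @ psi) - e**2)

print(dev(A + B) <= dev(A) + dev(B))                 # sum relation: True
print(dev(A + B), dev(A) + dev(B))                   # typically a strict gap
```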

Remark 6.3

For a set of linear operators \(A_1,A_2,\ldots ,A_n\) on a Hilbert space \(\mathcal {H}_{}\), Proposition 6.1 is easily generalized to

$$\begin{aligned} \Delta \left( \sum _{j=1}^n A_j \right) \left| \psi ^{\perp }_{\sum _{j=1}^n A_j} \right\rangle = \sum _{j=1}^n \Delta A_j \left| \psi _{A_j}^{\perp } \right\rangle , \end{aligned}$$
(53)

by applying the Aharonov–Vaidman identity to \(\sum _{j=1}^n A_j\). Similarly, Proposition 6.2 is easily generalized to

$$\begin{aligned} \Delta \left( \sum _{j=1}^n A_j \right) \le \sum _{j=1}^n \Delta A_j, \end{aligned}$$
(54)

by taking the inner product of (53) with \(\left| \psi ^{\perp }_{\sum _{j=1}^n A_j} \right\rangle \). We will also refer to the generalization in (54) as the sum relation.

6.2 The Maccone–Pati uncertainty relations

Since Robertson proved his uncertainty relation, there has been a steady stream of literature on uncertainty relations for variances and standard deviations. However, the field was reinvigorated in 2014, when Maccone and Pati [20] proved a pair of uncertainty relations for sums of variances, which give a nontrivial bound even when the state is an eigenstate of one of the observables.

Here, we give Aharonov–Vaidman based proofs of the Maccone–Pati relations.

Theorem 6.4

(The first Maccone–Pati uncertainty relation) Let A and B be Hermitian operators on a Hilbert space \(\mathcal {H}_{}\) and let \(\left| \psi \right\rangle \in \mathcal {H}_{}\) be a unit vector. Then,

$$\begin{aligned} \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 \ge \pm i \left\langle [A,B] \right\rangle + \left| \left\langle \psi ^{\perp } \big \vert (A \mp i B) \big \vert \psi \right\rangle \right| ^2, \end{aligned}$$
(55)

where \(\left| \psi ^\perp \right\rangle \) is any unit vector orthogonal to \(\left| \psi \right\rangle \).

Proof

We will prove \(\left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 \ge -i \left\langle [A,B] \right\rangle + \left| \left\langle \psi ^{\perp } \big \vert (A + i B) \big \vert \psi \right\rangle \right| ^2\) by applying the Aharonov–Vaidman identity to \((A + iB)\). The proof of the other inequality follows by replacing \(A + iB\) with \(A - iB\). Note that, even though A and B are Hermitian, \(A + iB\) is not, so it is crucial that we previously generalized the Aharonov–Vaidman identity to arbitrary linear operators.

Applying the Aharonov–Vaidman identity to \(A+iB\) gives

$$\begin{aligned} (A + iB) \left| \psi \right\rangle = \left( \left\langle A \right\rangle + i \left\langle B \right\rangle \right) \left| \psi \right\rangle + \Delta (A + iB) \left| \psi ^{\perp }_{A+iB} \right\rangle . \end{aligned}$$

Taking the inner product with any unit vector \(\left| \psi ^{\perp } \right\rangle \) orthogonal to \(\left| \psi \right\rangle \) gives

$$\begin{aligned} \left\langle \psi ^{\perp } \big \vert (A+iB) \big \vert \psi \right\rangle = \Delta (A + iB) \left\langle \psi ^{\perp } \big \vert \psi ^{\perp }_{A + iB} \right\rangle , \end{aligned}$$

and taking the modulus squared of this gives

$$\begin{aligned} \left| \left\langle \psi ^{\perp } \big \vert (A+iB) \big \vert \psi \right\rangle \right| ^2 = \left( \Delta (A + iB) \right) ^2 \left| \left\langle \psi ^{\perp } \big \vert \psi ^{\perp }_{A + iB} \right\rangle \right| ^2. \end{aligned}$$

Now, \(\left| \left\langle \psi ^{\perp } \big \vert \psi ^{\perp }_{A + iB} \right\rangle \right| \le 1\), so

$$\begin{aligned} \left( \Delta (A + iB) \right) ^2 \ge \left| \left\langle \psi ^{\perp } \big \vert (A+iB) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$

The result now follows by expanding \(\left( \Delta (A + iB) \right) ^2\) as follows.

$$\begin{aligned} \left( \Delta (A + iB) \right) ^2&= \left\langle (A - iB)(A + iB) \right\rangle - \left\langle A - iB \right\rangle \left\langle A + iB \right\rangle \\&= \left\langle A^2 \right\rangle + \left\langle B^2 \right\rangle + i\left\langle [A,B] \right\rangle - \left\langle A \right\rangle ^2 - \left\langle B \right\rangle ^2 \\&= \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + i \left\langle [A,B] \right\rangle . \end{aligned}$$

\(\square \)
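The following sketch (assuming NumPy) checks both signs of (55) in the situation where the Robertson bound is trivial: \(\left| \psi \right\rangle \) is taken to be an eigenstate of A, and \(\left| \psi ^{\perp } \right\rangle \) is a random unit vector orthogonal to it; a small tolerance absorbs floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 5

def herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

A, B = herm(), herm()
psi = np.linalg.eigh(A)[1][:, 0]                     # eigenstate of A: Delta A ~ 0
w = rng.normal(size=d) + 1j * rng.normal(size=d)
w -= (psi.conj() @ w) * psi                          # project out |psi>
perp = w / np.linalg.norm(w)                         # unit vector, <psi|perp> = 0

def var(O):
    e = np.real(psi.conj() @ O @ psi)
    return np.real(psi.conj() @ O @ O @ psi) - e**2

comm = psi.conj() @ (A @ B - B @ A) @ psi
for sign in (+1, -1):                                # the two signs in (55)
    rhs = np.real(sign * 1j * comm) + abs(perp.conj() @ (A - sign * 1j * B) @ psi)**2
    print(var(A) + var(B) + 1e-12 >= rhs)            # True for both signs
```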

Theorem 6.5

(The second Maccone–Pati uncertainty relation) Let A and B be linear operators on a Hilbert space \(\mathcal {H}_{}\) and let \(\left| \psi \right\rangle \in \mathcal {H}_{}\) be a unit vector. Then,

$$\begin{aligned} \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 \ge \frac{1}{2} \left| \left\langle \psi ^{\perp }_{A+B} \big \vert (A+B) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$
(56)

Proof

Applying the Aharonov–Vaidman identity to \(A+B\) gives

$$\begin{aligned} (A+B) \left| \psi \right\rangle = \left( \left\langle A \right\rangle + \left\langle B \right\rangle \right) \left| \psi \right\rangle + \Delta (A+B) \left| \psi ^{\perp }_{A+B} \right\rangle . \end{aligned}$$

Taking the inner product with \(\left| \psi ^{\perp }_{A+B} \right\rangle \) gives

$$\begin{aligned} \left\langle \psi ^{\perp }_{A+B} \big \vert (A+B) \big \vert \psi \right\rangle&= \Delta (A + B) \\&\le \Delta A + \Delta B, \end{aligned}$$

where the second line follows from the sum relation.

We could stop here and regard \(\Delta A + \Delta B \ge \left\langle \psi ^{\perp }_{A+B} \big \vert (A+B) \big \vert \psi \right\rangle \) as an uncertainty relation, but Maccone and Pati wanted a relation in terms of variances to compare to their first result. To do this, we take the modulus squared of both sides to obtain

$$\begin{aligned} \left( \Delta A + \Delta B\right) ^2 \ge \left| \left\langle \psi ^{\perp }_{A+B} \big \vert (A + B) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$

The result now follows from the real number inequality \(x^2 + y^2 \ge \frac{1}{2}(x+y)^2\) with \(x = \Delta A\) and \(y = \Delta B\). For completeness, this inequality is proved as follows.

$$\begin{aligned}&0 \le (x-y)^2 = x^2 + y^2 -2xy \\&\quad \Rightarrow x^2 + y^2 \ge 2xy \\&\quad \Rightarrow 2x^2 + 2y^2 \ge x^2 + y^2 + 2xy \\&\quad \Rightarrow 2x^2 +2 y^2 \ge (x+y)^2 \\&\quad \Rightarrow x^2 + y^2 \ge \frac{1}{2} (x+y)^2. \end{aligned}$$

\(\square \)
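A corresponding check of (56), again assuming NumPy with arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(8)
d = 5

def herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

A, B = herm(), herm()
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

def stats(O):
    e = np.real(psi.conj() @ O @ psi)
    var = np.real(psi.conj() @ O @ O @ psi) - e**2
    return var, (O @ psi - e * psi) / np.sqrt(var)

vS, pS = stats(A + B)                                # gives |psi_perp_{A+B}>
rhs = 0.5 * abs(pS.conj() @ (A + B) @ psi)**2
print(stats(A)[0] + stats(B)[0] >= rhs)              # relation (56): True
```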

6.3 Generalizations

Generalizations of the Maccone–Pati uncertainty relations can be obtained by applying the Aharonov–Vaidman identity to more general linear combinations \(\alpha A + \beta B\), where \(\alpha , \beta \in \mathbb {C}\). This gives

$$\begin{aligned} (\alpha A + \beta B) \left| \psi \right\rangle = \left( \alpha \left\langle A \right\rangle + \beta \left\langle B \right\rangle \right) \left| \psi \right\rangle + \Delta (\alpha A + \beta B) \left| \psi ^{\perp }_{\alpha A + \beta B} \right\rangle . \end{aligned}$$
(57)

Applying the strategy we used to prove Theorem 6.4, we can take the inner product of this with an arbitrary unit vector \(\left| \psi ^{\perp } \right\rangle \) that is orthogonal to \(\left| \psi \right\rangle \), which gives

$$\begin{aligned} \left\langle \psi ^{\perp } \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle = \Delta (\alpha A + \beta B) \left\langle \psi ^{\perp } \big \vert \psi ^{\perp }_{\alpha A + \beta B} \right\rangle . \end{aligned}$$

We can now take the modulus squared of this and recognize that \(0 \le \left| \left\langle \psi ^{\perp } \big \vert \psi ^{\perp }_{\alpha A + \beta B} \right\rangle \right| ^2 \le 1\) to obtain

$$\begin{aligned} \left| \left\langle \psi ^{\perp } \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle \right| ^2 \le \left[ \Delta (\alpha A + \beta B) \right] ^2. \end{aligned}$$

Next, we can expand \(\left[ \Delta (\alpha A + \beta B) \right] ^2\) and rearrange to obtain

$$\begin{aligned} \left| \alpha \right| ^2 \left( \Delta A \right) ^2 + \left| \beta \right| ^2 \left( \Delta B \right) ^2 \ge&- \text {Re}(\alpha ^* \beta ) \left( \left\langle \{A,B\} \right\rangle -2 \left\langle A \right\rangle \left\langle B \right\rangle \right) - i \text {Im} \left( \alpha ^* \beta \right) \left\langle [A,B] \right\rangle \nonumber \\&+ \left| \left\langle \psi ^{\perp } \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$
(58)

Substituting \(\alpha = 1\), \(\beta = \mp i\) immediately yields the first Maccone–Pati uncertainty relation.
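
Relation (58) can be spot-checked in the same way. The sketch below (the helper names and the dimension are again arbitrary illustrative choices, not part of the derivation) draws random complex \(\alpha \), \(\beta \) and a random unit vector orthogonal to \(\left| \psi \right\rangle \):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def rand_herm():
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return m + m.conj().T

def rand_state():
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def orth_unit(psi):
    # Random unit vector orthogonal to psi (one Gram-Schmidt step).
    v = rand_state()
    v = v - (psi.conj() @ v) * psi
    return v / np.linalg.norm(v)

def ev(op, psi):
    return psi.conj() @ op @ psi  # complex in general

for _ in range(1000):
    A, B, psi = rand_herm(), rand_herm(), rand_state()
    perp = orth_unit(psi)
    alpha = rng.normal() + 1j * rng.normal()
    beta = rng.normal() + 1j * rng.normal()
    ab = np.conj(alpha) * beta
    var_a = (ev(A @ A, psi) - ev(A, psi) ** 2).real
    var_b = (ev(B @ B, psi) - ev(B, psi) ** 2).real
    anti = ev(A @ B + B @ A, psi).real
    comm = ev(A @ B - B @ A, psi)  # purely imaginary for Hermitian A, B
    lhs = abs(alpha) ** 2 * var_a + abs(beta) ** 2 * var_b
    rhs = (-ab.real * (anti - 2 * ev(A, psi).real * ev(B, psi).real)
           - (1j * ab.imag * comm).real
           + abs(perp.conj() @ (alpha * A + beta * B) @ psi) ** 2)
    assert lhs >= rhs - 1e-9
```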

Alternatively, we can apply the strategy used to prove Theorem 6.5. Starting from (57), we can take the inner product with \(\left| \psi ^{\perp }_{\alpha A + \beta B} \right\rangle \) and rearrange to obtain

$$\begin{aligned} \Delta (\alpha A + \beta B) = \left\langle \psi ^{\perp }_{\alpha A + \beta B} \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle . \end{aligned}$$

Using the sum relation together with \(\Delta (\alpha A) = \vert \alpha \vert \Delta A\) gives

$$\begin{aligned} \vert \alpha \vert \Delta A + \vert \beta \vert \Delta B \ge \left\langle \psi ^{\perp }_{\alpha A + \beta B} \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle . \end{aligned}$$

Finally, squaring and using the inequality \(x^2 + y^2 \ge \frac{1}{2}(x+y)^2\) gives

$$\begin{aligned} \vert \alpha \vert ^2 \left( \Delta A \right) ^2 + \vert \beta \vert ^2 \left( \Delta B \right) ^2 \ge \frac{1}{2} \left| \left\langle \psi ^{\perp }_{\alpha A + \beta B} \big \vert (\alpha A + \beta B) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$
(59)

The inequalities (58) and (59) are related to some of the generalizations of the Maccone–Pati uncertainty relations that have previously appeared in the literature [21, 28]. For example, (58) can be used to derive an uncertainty relation that has appeared in the literature under the name “weighted uncertainty relation” [28]. To do so, we set \(\alpha = \sqrt{\lambda }\), \(\beta = \mp i/\sqrt{\lambda }\) in (58), where \(\lambda > 0\). This yields

$$\begin{aligned} \lambda \left( \Delta A \right) ^2 + \frac{1}{\lambda } \left( \Delta B \right) ^2 \ge \pm i \left\langle [A,B] \right\rangle + \frac{1}{\lambda } \left| \left\langle \psi ^{\perp } \big \vert ( \lambda A \mp i B) \big \vert \psi \right\rangle \right| ^2. \end{aligned}$$

This is an uncertainty relation in its own right, but the relation in [28] comes from adding this to (55), which yields

$$\begin{aligned} (1 + \lambda ) \left( \Delta A \right) ^2 + \left( 1 + \frac{1}{\lambda } \right) \left( \Delta B \right) ^2&\ge \pm 2 i \left\langle [A,B] \right\rangle + \left| \left\langle \psi ^{\perp }_1 \big \vert ( A \mp i B) \big \vert \psi \right\rangle \right| ^2 \nonumber \\&\quad + \frac{1}{\lambda } \left| \left\langle \psi ^{\perp }_2 \big \vert ( \lambda A \mp i B) \big \vert \psi \right\rangle \right| ^2, \end{aligned}$$
(60)

where \(\left| \psi _1^{\perp } \right\rangle \) and \(\left| \psi _2^{\perp } \right\rangle \) are (possibly different) unit vectors that are orthogonal to \(\left| \psi \right\rangle \).

This is intended as a simple example of a generalization that is easily obtained from the Aharonov–Vaidman identity, but I expect many other uncertainty relations that are usually proved using the Cauchy–Schwarz inequality or the parallelogram law would also have simple Aharonov–Vaidman based proofs.
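
As a closing sanity check on this section, here is a numerical spot-check of (60) (the random operators, states, weights, and orthogonal vectors below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def rand_herm():
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return m + m.conj().T

def rand_state():
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def orth_unit(psi):
    v = rand_state()
    v = v - (psi.conj() @ v) * psi
    return v / np.linalg.norm(v)

def ev(op, psi):
    return psi.conj() @ op @ psi

for _ in range(500):
    A, B, psi = rand_herm(), rand_herm(), rand_state()
    lam = rng.uniform(0.1, 10.0)
    var_a = (ev(A @ A, psi) - ev(A, psi) ** 2).real
    var_b = (ev(B @ B, psi) - ev(B, psi) ** 2).real
    comm = ev(A @ B - B @ A, psi)  # purely imaginary
    for s in (+1, -1):  # the upper/lower sign choices in (60)
        p1, p2 = orth_unit(psi), orth_unit(psi)
        lhs = (1 + lam) * var_a + (1 + 1 / lam) * var_b
        rhs = (s * (2j * comm).real
               + abs(p1.conj() @ ((A - s * 1j * B) @ psi)) ** 2
               + abs(p2.conj() @ ((lam * A - s * 1j * B) @ psi)) ** 2 / lam)
        assert lhs >= rhs - 1e-9
```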

7 Quantum propagation of uncertainty

In this section, we develop generalizations of the classical formulas for the propagation of uncertainty. We start with the case of linear functions in Sect. 7.1, for which exact formulas are easy to obtain, before moving on to the general, possibly nonlinear, case in Sect. 7.2, for which we have to employ a Taylor series approximation.

7.1 Linear functions

We start with the simplest case: a sum of two observables. Classically, if A and B are random variables, then

$$\begin{aligned} \left[ \Delta (A + B) \right] ^2 = \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + 2 \Delta A \Delta B \, \textrm{corr}_{A,B}. \end{aligned}$$
(61)

Consider an experiment consisting of multiple runs. On each run, the quantities A and B are measured. These quantities are formalized as random variables because we assume that our experiments are subject to random statistical fluctuations, and that the “true” values that we are seeking are the means \(\left\langle A \right\rangle \) and \(\left\langle B \right\rangle \) of these random processes. We then use the average values calculated from the data as estimates of \(\left\langle A \right\rangle \) and \(\left\langle B \right\rangle \), and the standard deviations as a measure of the error in our experiment. If we are actually interested in the quantity \(A+B,\) then we would sum the averages to form our estimate of \(\left\langle A+B \right\rangle \), and we would use (61) to determine the error in our estimate of \(\left\langle A+B \right\rangle \). Using (61) in this way is called the propagation of uncertainty or propagation of error.

If the random variables A and B are independent, which would be the case if the randomness were due to independent statistical errors, then \(\textrm{corr}_{A,B} = 0\) and we would have

$$\begin{aligned} \left[ \Delta (A + B) \right] ^2 = \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2, \end{aligned}$$

which is the formula for propagation of uncertainty that is most commonly used in practice.
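
As a concrete illustration, the following snippet generates correlated samples from a made-up linear model (the model and sample size are arbitrary) and confirms that (61) holds exactly for the sample moments:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Correlated "measurement records" for A and B from a made-up linear model.
z1, z2 = rng.normal(size=(2, n))
A = 2.0 * z1 + 1.0
B = 1.5 * (0.6 * z1 + 0.8 * z2) - 2.0  # correlated with A through z1

corr = np.corrcoef(A, B)[0, 1]
lhs = np.var(A + B)
rhs = np.var(A) + np.var(B) + 2 * np.std(A) * np.std(B) * corr
assert np.isclose(lhs, rhs)  # (61) is an identity, not an approximation
```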

We now want to generalize these formulas by replacing classical random variables with quantum observables. The generalization of (61) is as follows.

Theorem 7.1

Let A and B be Hermitian operators on a Hilbert space \(\mathcal {H}_{}\). Then,

$$\begin{aligned} \left[ \Delta (A + B) \right] ^2&= \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + 2 \Delta A \Delta B \, \textrm{Rcorr}_{A,B} \end{aligned}$$
(62)
$$\begin{aligned}&= \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + \left\langle \{A,B\} \right\rangle - 2\left\langle A \right\rangle \left\langle B \right\rangle . \end{aligned}$$
(63)

Proof

Proposition 6.1 implies that, for any unit vector \(\left| \psi \right\rangle \in \mathcal {H}_{}\),

$$\begin{aligned} \Delta (A + B) \left| \psi _{A+B}^{\perp } \right\rangle = \Delta A \left| \psi _A^{\perp } \right\rangle + \Delta B \left| \psi _B^{\perp } \right\rangle . \end{aligned}$$

Taking the inner product of this with itself gives

$$\begin{aligned} \left[ \Delta \left( A + B \right) \right] ^2&= \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + \Delta A \Delta B \left( \left\langle \psi _A^{\perp } \big \vert \psi _B^{\perp } \right\rangle + \left\langle \psi _B^{\perp } \big \vert \psi _A^{\perp } \right\rangle \right) \\&= \left( \Delta A \right) ^2 + \left( \Delta B \right) ^2 + 2\Delta A \Delta B \, \textrm{Re}\left( \left\langle \psi _A^{\perp } \big \vert \psi _B^{\perp } \right\rangle \right) . \end{aligned}$$

Applying (13) completes the proof. \(\square \)
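
Both (62) and (63) are exact identities, so they can be confirmed numerically to machine precision. The sketch below (arbitrary illustrative helpers and dimension) builds \(\left| \psi _A^{\perp } \right\rangle \) and \(\left| \psi _B^{\perp } \right\rangle \) explicitly from the Aharonov–Vaidman identity and computes \(\textrm{Rcorr}_{A,B} = \textrm{Re}\left( \left\langle \psi _A^{\perp } \big \vert \psi _B^{\perp } \right\rangle \right) \):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4

def rand_herm():
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return m + m.conj().T

def rand_state():
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def ev(op, psi):
    return (psi.conj() @ op @ psi).real

def std(op, psi):
    return np.sqrt(max(ev(op @ op, psi) - ev(op, psi) ** 2, 0.0))

def av_perp(op, psi):
    # |psi_perp_op> from the Aharonov-Vaidman identity.
    return (op @ psi - ev(op, psi) * psi) / std(op, psi)

for _ in range(1000):
    A, B, psi = rand_herm(), rand_herm(), rand_state()
    rcorr = np.real(av_perp(A, psi).conj() @ av_perp(B, psi))
    lhs = std(A + B, psi) ** 2
    rhs62 = (std(A, psi) ** 2 + std(B, psi) ** 2
             + 2 * std(A, psi) * std(B, psi) * rcorr)
    rhs63 = (std(A, psi) ** 2 + std(B, psi) ** 2
             + ev(A @ B + B @ A, psi) - 2 * ev(A, psi) * ev(B, psi))
    assert np.isclose(lhs, rhs62) and np.isclose(lhs, rhs63)
```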

Remark 7.2

For operators \(A_1,A_2,\ldots ,A_n\) and real numbers \(\alpha _1,\alpha _2,\ldots ,\alpha _n\), Theorem 7.1 is easily generalized to

$$\begin{aligned} \left[ \Delta \left( \sum _{j=1}^n \alpha _j A_j \right) \right] ^2&= \sum _{j=1}^n \alpha _j^2 \left( \Delta A_j \right) ^2 + \sum _{j \ne k} \alpha _j \alpha _k \Delta A_j \Delta A_k \, \textrm{Rcorr}_{A_j,A_k} \\&= \sum _{j=1}^n \alpha _j^2 \left( \Delta A_j \right) ^2 + \frac{1}{2} \sum _{j \ne k} \alpha _j \alpha _k \left( \left\langle \{ A_j,A_k \} \right\rangle - 2 \left\langle A_j \right\rangle \left\langle A_k \right\rangle \right) . \end{aligned}$$

Although Theorem 7.1 is a true theorem about quantum observables, it cannot be used to propagate uncertainty in the same way as its classical counterpart. Classically, we can measure A and B together in the same run of the experiment. We can then estimate \(A+B\) by summing the average values of A and B that we found in the experiment. We also have all the information we need to calculate the uncertainty \(\Delta (A+B)\), i.e., \(\Delta A\), \(\Delta B\), \(\left\langle A \right\rangle \), \(\left\langle B \right\rangle \) and \(\left\langle AB \right\rangle \), so we can determine the uncertainty without doing any more experiments.

In quantum mechanics, this is not the case. When A and B do not commute, they cannot both be accurately measured on the same run of an experiment. We can still estimate their expectation values by measuring A on half of the runs of the experiment and B on the other half and taking averages. Since \(\left\langle A + B \right\rangle = \left\langle A \right\rangle + \left\langle B \right\rangle \), summing these averages is still a way of estimating \(\left\langle A+B \right\rangle \). However, we do not have enough information to calculate \(\Delta (A+B)\). The reason is that \(\Delta (A+B)\) is the uncertainty in a direct measurement of \(A + B\). Since A and B do not commute, this requires a different experimental setup from a measurement of A and B alone.

If we wanted to use (62) to calculate \(\Delta (A+B)\), we would also have to estimate \(\left\langle \{ A,B\} \right\rangle \). The most straightforward way of doing this would be to measure the observable \(\{A,B\} = AB + BA\), but this requires yet another different experimental setup, and one that is likely to be at least as complicated as measuring \(A+B\) directly.

An exception is the case where \(\{A,B\} = c I\) for some constant c, in which case \(\left\langle \{ A,B\} \right\rangle = c\) regardless of the state. In particular, this is true of the Pauli observables \(\sigma _x\), \(\sigma _y\), \(\sigma _z\) of a qubit, for which \(\{\sigma _j,\sigma _k\} = 2\delta _{jk} I\), where j and k run over x, y, z. Therefore, if we measure \(\sigma _x\) on many qubits prepared in the same way and \(\sigma _y\) on another set of such qubits, we can estimate \(\left\langle \sigma _x + \sigma _y \right\rangle \) and \(\Delta (\sigma _x + \sigma _y)\) without doing any further experiments using the formula

$$\begin{aligned} \left[ \Delta \left( \sigma _x + \sigma _y \right) \right] ^2= \left( \Delta \sigma _x \right) ^2 + \left( \Delta \sigma _y \right) ^2 - 2\left\langle \sigma _x \right\rangle \left\langle \sigma _y \right\rangle . \end{aligned}$$
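
The following sketch simulates this procedure: it samples projective measurements of \(\sigma _x\) and \(\sigma _y\) separately on copies of a random qubit state, plugs the sample statistics into the formula above, and compares the result with the exact value of \([\Delta (\sigma _x + \sigma _y)]^2\). (The sampling routine and sample size are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(5)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

def sample(op, n):
    # Simulate n projective measurements of op on copies of |psi>.
    vals, vecs = np.linalg.eigh(op)
    probs = np.abs(vecs.conj().T @ psi) ** 2
    return rng.choice(vals, size=n, p=probs / probs.sum())

def ev(op):
    return (psi.conj() @ op @ psi).real

x, y = sample(sx, 100_000), sample(sy, 100_000)
est = np.var(x) + np.var(y) - 2 * np.mean(x) * np.mean(y)
exact = ev((sx + sy) @ (sx + sy)) - ev(sx + sy) ** 2
print(est, exact)  # agree up to sampling error; no joint measurement needed
```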

When \(\{A,B\} \ne c I\), I do not know of any situations in which (62) would be useful in practice, but from a theoretical point of view it is the appropriate generalization of (61) to quantum mechanics, and this bolsters the case that \(\text {Rcorr}_{A,B}\) is the appropriate quantum generalization of the classical correlation.

7.2 Nonlinear functions

For nonlinear functions f(A,B) of two random variables A and B, it is common to use a first-order Taylor expansion of f(A,B) about the point \((\left\langle A \right\rangle ,\left\langle B \right\rangle )\) to derive an approximation for the variance \([\Delta f(A,B)]^2\) to second order in \(\Delta A\) and \(\Delta B\). This yields the formula

$$\begin{aligned} \left[ \Delta f(A,B) \right] ^2&\approx \left( \left. \frac{\partial f}{\partial A} \right| _{A = \left\langle A \right\rangle ,B=\left\langle B \right\rangle } \right) ^2 \left( \Delta A \right) ^2 + \left( \left. \frac{\partial f}{\partial B} \right| _{A=\left\langle A \right\rangle , B = \left\langle B \right\rangle } \right) ^2 \left( \Delta B \right) ^2 \\&\quad + 2 \left. \frac{\partial f}{\partial A} \right| _{A = \left\langle A \right\rangle ,B=\left\langle B \right\rangle } \left. \frac{\partial f}{\partial B} \right| _{A=\left\langle A \right\rangle ,B = \left\langle B \right\rangle } \Delta A \Delta B \, \textrm{corr}_{A,B}. \end{aligned}$$

To avoid cluttering notation, I will write \(\bar{A}\) for \(A= \left\langle A \right\rangle \), so that we can more compactly write

$$\begin{aligned} \left[ \Delta f(A,B) \right] ^2 \approx \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}}^2 \left( \Delta A \right) ^2 + \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}}^2 \left( \Delta B \right) ^2 + 2 \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \left. \frac{\partial f}{\partial B} \right| _{\bar{A},\bar{B}} \Delta A \Delta B \, \textrm{corr}_{A,B}. \end{aligned}$$
(64)

When A and B are independent, this reduces to

$$\begin{aligned} \left[ \Delta f(A,B) \right] ^2 \approx \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}}^2 \left( \Delta A \right) ^2 + \left. \frac{\partial f}{\partial B} \right| _{\bar{A},\bar{B}}^2 \left( \Delta B \right) ^2, \end{aligned}$$

which is the most commonly used form.
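
As a concrete classical example, take \(f(A,B) = AB\) with independent Gaussian A and B whose standard deviations are small compared to their means. The Monte Carlo variance then closely matches the first-order formula (the distributions and parameters below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
mu_a, sig_a = 10.0, 0.1
mu_b, sig_b = 5.0, 0.2

A = rng.normal(mu_a, sig_a, size=n)
B = rng.normal(mu_b, sig_b, size=n)
f = A * B  # nonlinear function of the two random variables

# First-order propagation: df/dA = B and df/dB = A, evaluated at the means.
approx = mu_b ** 2 * sig_a ** 2 + mu_a ** 2 * sig_b ** 2
print(np.var(f), approx)  # close, because sig_a << mu_a and sig_b << mu_b
```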

The quantum generalization of (64) is as follows.

Theorem 7.3

Let A and B be Hermitian operators on a Hilbert space \(\mathcal {H}_{}\) and consider a function \(f: \mathfrak {H}(\mathcal {H}_{}) \times \mathfrak {H}(\mathcal {H}_{}) \rightarrow \mathfrak {H}(\mathcal {H}_{}),\) where \(\mathfrak {H}(\mathcal {H}_{})\) is the space of Hermitian operators on \(\mathcal {H}_{}\). Then,

$$\begin{aligned} \left[ \Delta f(A,B) \right] ^2&\approx \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}}^2 \left( \Delta A \right) ^2 + \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}}^2 \left( \Delta B \right) ^2 \nonumber \\&\quad + 2 \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \left. \frac{\partial f}{\partial B} \right| _{\bar{A},\bar{B}} \Delta A \Delta B \, \textrm{Rcorr}_{A,B}, \end{aligned}$$
(65)

where \(\approx \) means equality to second order in \(\Delta A\) and \(\Delta B.\)

Proof

Consider the first-order Taylor expansion of f(A,B) about \(A = \left\langle A \right\rangle \), \(B = \left\langle B \right\rangle \), writing \(f_0 = f(\left\langle A \right\rangle ,\left\langle B \right\rangle )\) for the constant term,

$$\begin{aligned} f(A,B) \approx f_0 + \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \left( A - \left\langle A \right\rangle I \right) + \left. \frac{\partial f}{\partial B} \right| _{\bar{A},\bar{B}} \left( B - \left\langle B \right\rangle I \right) . \end{aligned}$$

Terms proportional to the identity have zero standard deviation, so applying Proposition 6.1 to this gives

$$\begin{aligned} \left[ \Delta f(A,B) \right] \left| \psi ^{\perp }_{f(A,B)} \right\rangle \approx \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \Delta A \left| \psi ^{\perp }_A \right\rangle + \left. \frac{\partial f}{\partial B} \right| _{\bar{A},\bar{B} } \Delta B \left| \psi ^{\perp }_B \right\rangle . \end{aligned}$$

Taking the inner product of this with itself gives

$$\begin{aligned} \left[ \Delta f(A,B) \right] ^2&\approx \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}}^2 \left( \Delta A \right) ^2 + \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}}^2 \left( \Delta B \right) ^2 \\&\quad + 2 \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}} \Delta A \Delta B \, \textrm{Re} \left( \left\langle \psi _A^{\perp } \big \vert \psi _B^{\perp } \right\rangle \right) \\&= \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}}^2 \left( \Delta A \right) ^2 + \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}}^2 \left( \Delta B \right) ^2 \\&\quad + 2 \left. \frac{\partial f}{\partial A} \right| _{\bar{A},\bar{B}} \left. \frac{\partial f}{\partial B} \right| _{\bar{A}, \bar{B}} \Delta A \Delta B \, \textrm{Rcorr}_{A,B}. \end{aligned}$$

\(\square \)
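
Testing Theorem 7.3 in full generality is awkward because f is operator-valued, but the single-observable special case \(f(A) = A^2\), where (65) reduces to \([\Delta f]^2 \approx 4 \left\langle A \right\rangle ^2 (\Delta A)^2\), already illustrates the approximation. The sketch below (a diagonal, position-like observable and a Gaussian wavepacket over its spectrum, both arbitrary illustrative choices) shows the ratio of the exact variance to the approximation tending to 1 as \(\Delta A \rightarrow 0\):

```python
import numpy as np

# A position-like observable, diagonal in the computational basis, so
# f(A) = A^2 acts eigenvalue-wise and there are no ordering issues.
grid = np.linspace(0.0, 20.0, 4001)  # eigenvalues of A
x0 = 10.0                            # centre of the wavepacket

for sig in (1.0, 0.3, 0.1):
    psi = np.exp(-(grid - x0) ** 2 / (4 * sig ** 2))
    psi /= np.linalg.norm(psi)
    p = np.abs(psi) ** 2             # Born-rule distribution over eigenvalues
    mean_a = np.sum(p * grid)
    var_a = np.sum(p * grid ** 2) - mean_a ** 2
    f = grid ** 2                    # f(A) = A^2 on the spectrum
    var_f = np.sum(p * f ** 2) - np.sum(p * f) ** 2
    approx = (2 * mean_a) ** 2 * var_a
    print(sig, var_f / approx)       # ratio -> 1 as Delta A -> 0
```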

Remark 7.4

For operators \(A_1,A_2,\ldots ,A_n\) and a function \(f(A_1,A_2,\ldots ,A_n)\), Theorem 7.3 is easily generalized to

$$\begin{aligned} \left[ \Delta f \left( A_1,A_2,\ldots ,A_n \right) \right] ^2&\approx \sum _{j=1}^n \left. \frac{\partial f}{\partial A_j} \right| _{\bar{A}}^2 \left( \Delta A_j \right) ^2 \nonumber \\&\quad + \sum _{j \ne k} \left. \frac{\partial f}{\partial A_j} \right| _{\bar{A}} \left. \frac{\partial f}{\partial A_k} \right| _{\bar{A}} \Delta A_j \Delta A_k \, \textrm{Rcorr}_{A_j,A_k}, \end{aligned}$$
(66)

where \(\bar{A}\) is shorthand for \(A_1 = \left\langle A_1 \right\rangle , A_2 = \left\langle A_2 \right\rangle , \ldots , A_n = \left\langle A_n \right\rangle \).

As a formula for propagating uncertainty, (65) inherits all of the problems of (62), but the problems are compounded further by the use of the first-order Taylor approximation. This approximation is valid when \(\Delta A\) and \(\Delta B\) are suitably small compared to \(\left\langle A \right\rangle \), \(\left\langle B \right\rangle \), \(f(\left\langle A \right\rangle ,\left\langle B \right\rangle )\) and the derivatives of f(A,B) at \(A=\left\langle A \right\rangle \), \(B=\left\langle B \right\rangle \). This is often the case in classical experiments, where everything can be measured with a small statistical error. However, in quantum mechanics, when A and B do not commute, the (various) uncertainty relations tell us that there is necessarily a trade-off between the sizes of \(\Delta A\) and \(\Delta B\). If one of them is small, then the other may be forced to be large. For example, for the Pauli observables \(\sigma _x\) and \(\sigma _y\), at least one of the uncertainties must be comparable in size to 1, which is the largest possible value of \(\left\langle \sigma _x \right\rangle \) or \(\left\langle \sigma _y \right\rangle \).

A case where the formula works well is a continuous-variable system in which \(\Delta x \sim \Delta p \sim \sqrt{\hbar }\) and \(\left\langle x \right\rangle \), \(\left\langle p \right\rangle \) are large compared to \(\sqrt{\hbar }\). But this is a case where you would expect classical physics to be a good approximation anyway.

I do not know whether there is a practical use of (65), but it is nonetheless a correct formal generalization of (64).

8 Dealing with mixed states

So far, we have dealt exclusively with the case of pure state vectors \(\left| \psi \right\rangle \). However, all of our results can be extended to more general density operators \(\rho \), which can represent mixed states. The most familiar way to do this is to make use of the concept of a purification of a density operator. Given a density operator \(\rho _S\) on a Hilbert space \(\mathcal {H}_{S}\), where S stands for “system”, we can always find a pure state vector \(\left| \psi \right\rangle _{SE} \in \mathcal {H}_{S}\otimes \mathcal {H}_{E}\), where E is the “environment”, such that

$$\begin{aligned} \rho _S = \textrm{Tr}_{E} \left( \left| \psi \big \rangle \big \langle \psi \right| _{SE} \right) , \end{aligned}$$

and \(\textrm{Tr}_E\) is the partial trace over \(\mathcal {H}_{E}\). You can then apply the Aharonov–Vaidman identity to operators of the form \(A_S \otimes I_E\) acting on a purification to obtain results about the density operator \(\rho _S\).

However, to make the parallels to the pure state case as close as possible, I prefer to use an equivalent concept, called an amplitude operator. The equivalence between amplitude operators and purifications is discussed in Appendix A.

Definition 8.1

Given a density operator \(\rho _S\) on a Hilbert space \(\mathcal {H}_{S}\), an amplitude operator for \(\rho _S\) is a linear operator \(L_S: \mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\), where \(\mathcal {H}_{E}\) is any Hilbert space, such that

$$\begin{aligned} \rho _S = L_S L_S^{\dagger }. \end{aligned}$$

The reason for the name amplitude operator is that, in pure-state quantum mechanics, an amplitude is a complex number \(\alpha \) such that \(\vert \alpha \vert ^2\) is a probability. A density operator is a non-commutative generalization of a probability distribution [42, 43], and hence an amplitude operator ought to be an operator that “squares” to a density operator.

Given a density operator \(\rho _S\), one obvious way of constructing an amplitude operator is to set \(\mathcal {H}_{E}=\mathcal {H}_{S}\) and \(L_S = \sqrt{\rho }_S\), but there are infinitely many alternatives, as the following proposition shows.

Proposition 8.2

An operator \(L_S:\mathcal {H}_{E}\rightarrow \mathcal {H}_{S}\) is an amplitude operator for \(\rho _S\) if and only if

$$\begin{aligned} L_S = \sqrt{\rho }_S U_{S \vert E}, \end{aligned}$$

where \(U_{S \vert E}:\mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\) is a semi-unitary operator, i.e. it satisfies \(U_{S \vert E}U_{S \vert E}^{\dagger } = I_S\).

Proof

An operator of the form \(L_S = \sqrt{\rho }_SU_{S \vert E}\) obviously satisfies Definition 8.1. For the other direction, assume \(L_S\) is an amplitude operator. Like any operator, it admits a polar decomposition \(L_S = P_S U_{S \vert E}\), where \(P_S\) is a positive semi-definite operator on \(\mathcal {H}_{S}\) and \(U_{S \vert E}: \mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\) is semi-unitary.Footnote 4 The definition of an amplitude operator then implies that \(\rho _S = P_S U_{S \vert E} U_{S \vert E}^{\dagger } P_S = P_S^2\), so we must have \(P_S = \sqrt{\rho }_S\). \(\square \)
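
Proposition 8.2 doubles as a recipe for generating amplitude operators numerically: form \(\sqrt{\rho }_S\) from an eigendecomposition and multiply on the right by a random semi-unitary obtained from a QR factorization. A minimal sketch (the dimensions \(d_S = 3\), \(d_E = 5\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
d_s, d_e = 3, 5  # arbitrary system and "environment" dimensions

# Random normalized density operator rho_S.
m = rng.normal(size=(d_s, d_s)) + 1j * rng.normal(size=(d_s, d_s))
rho = m @ m.conj().T
rho /= np.trace(rho).real

# sqrt(rho) via the eigendecomposition (rho is positive semi-definite).
w, v = np.linalg.eigh(rho)
sqrt_rho = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T

# Random semi-unitary U: H_E -> H_S with U U^dagger = I_S (needs d_e >= d_s).
g = rng.normal(size=(d_e, d_s)) + 1j * rng.normal(size=(d_e, d_s))
q, _ = np.linalg.qr(g)  # q has orthonormal columns
U = q.conj().T

L = sqrt_rho @ U  # an amplitude operator for rho, as in Proposition 8.2
assert np.allclose(U @ U.conj().T, np.eye(d_s))
assert np.allclose(L @ L.conj().T, rho)
```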

Going back to the analogy between amplitudes and amplitude operators, multiplying an amplitude \(\alpha \) by a phase factor \(e^{i\phi }\) does not change the probability it represents. Similarly, multiplying an amplitude operator \(L_S\) on the right by a semi-unitary \(V_{E \vert E'}\), i.e., an operator \(V_{E\vert E'}:\mathcal {H}_{E'}\rightarrow \mathcal {H}_{E}\) satisfying \(V_{E\vert E'} V_{E \vert E'}^{\dagger } = I_E\), does not change the density operator it represents. Although one might think it desirable to work directly with probabilities or density operators to eliminate these ambiguities, the mathematical manipulations we need to do in quantum mechanics are often linear in the amplitudes or amplitude operators, but would be nonlinear if we used probabilities or density operators. Therefore, it is often more convenient to live with the ambiguity.

Since every operator has a polar decomposition, the only requirement for \(L_S\) to be an amplitude operator for some density operator is that \(\textrm{Tr}_{S} \left( L_S L_S^{\dagger } \right) = 1\). If we want to work with unnormalized density operators, i.e., any positive operator, then any operator \(L_S:\mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\) is the amplitude operator for some (possibly unnormalized) density operator. This is analogous to the fact that any vector in \(\mathcal {H}_{S}\) represents a (possibly unnormalized) pure state.

The strategy for generalizing the Aharonov–Vaidman identity, and everything that follows from it, is to replace the state vector \(\left| \psi \right\rangle _S\) with an amplitude operator \(L_S\). The reason this works is that the space of linear operators mapping \(\mathcal {H}_{E}\) to \(\mathcal {H}_{S}\), which we denote \(\mathfrak {L}_{S \vert E}\), is itself a Hilbert space with inner product \(\left\langle L_S,M_S \right\rangle = \textrm{Tr}_{E} \left( L_S^{\dagger }M_S \right) \), known as the Hilbert–Schmidt inner product.Footnote 5 Since the Aharonov–Vaidman identity is valid for any Hilbert space, it must be valid on \(\mathfrak {L}_{S \vert E}\) as well.

Proposition 8.3

(The Aharonov–Vaidman Identity for operators) Let \(A_S\) be a linear operator on a Hilbert space \(\mathcal {H}_{S}\) and let \(L_S:\mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\). Then,

$$\begin{aligned} A_S L_S = \left\langle A_S \right\rangle L_S + \left( \Delta A_S \right) L_{A_S}^{\perp }, \end{aligned}$$
(67)

where \(\left\langle A_S \right\rangle = \textrm{Tr}_{S} \left( A_SL_SL_S^{\dagger } \right) /\textrm{Tr}_{S} \left( L_S L_S^{\dagger } \right) \), \(\Delta A_S = \sqrt{\left\langle A_S^{\dagger }A_S \right\rangle - \left| \left\langle A_S \right\rangle \right| ^2}\), and \(L^{\perp }_{A_S}:\mathcal {H}_{E}\rightarrow \mathcal {H}_{S}\) is an amplitude operator that is orthogonal to \(L_S\), i.e., \(\textrm{Tr}_{E} \left( L^{\dagger }_S L^{\perp }_{A_S} \right) = 0\), satisfies \(\textrm{Tr}_{S} \left( L^{\perp }_{A_S}L_{A_S}^{\perp \dagger } \right) = \textrm{Tr}_{S} \left( L_S L_S^{\dagger } \right) \), and depends on both \(L_S\) and \(A_S\).

The proof of this proposition is essentially the same as the proof of the vector Aharonov–Vaidman identity (Proposition 2.1) with the standard inner product replaced by the Hilbert–Schmidt inner product. The only difference is that the cyclic property of the trace also needs to be used to write things in the exact form given in Proposition 8.3. I leave this as an exercise for the reader.

Since \(\rho _S = L_S L_S^{\dagger }\) is always a (possibly unnormalized) density operator, we can write

$$\begin{aligned} \left\langle A_S \right\rangle = \frac{\textrm{Tr}_{S} \left( A_SL_SL_S^{\dagger } \right) }{\textrm{Tr}_{E} \left( L_S^{\dagger }L_S \right) } = \frac{\textrm{Tr}_{S} \left( A_S \rho _S \right) }{\textrm{Tr}_{S} \left( \rho _S \right) }. \end{aligned}$$

We can also introduce the density operator \(\rho _{A_S}^{\perp } = L^{\perp }_{A_S} L_{A_S}^{\perp \dagger }\), which will be normalized in the same way as \(\rho _S\), i.e., \(\textrm{Tr}_{S} \left( \rho _{A_S}^{\perp } \right) \) = \(\textrm{Tr}_{S} \left( \rho _S \right) \).

In particular, when \(L_S\) is normalized so that \(\rho _S = L_S L_S^{\dagger }\) is a normalized density operator, i.e., \(\textrm{Tr}_{S} \left( L_SL_S^{\dagger } \right) = 1\), then \(\rho _{A_S}^{\perp }\) is also normalized, i.e., \(\textrm{Tr}_{S} \left( \rho _{A_S}^{\perp } \right) = 1\).

As defined, \(\rho _{A_S}^{\perp } = L^{\perp }_{A_S} L_{A_S}^{\perp \dagger }\) looks like it depends on the choice of the amplitude operator \(L_S\). In fact, it does not. It only depends on \(\rho _S\) and \(A_S\). To see this, rewrite the operator Aharonov–Vaidman identity as

$$\begin{aligned} L_{A_S}^{\perp } = \frac{1}{\Delta A_S} \left( A_S - \left\langle A_S \right\rangle I_S \right) L_S, \end{aligned}$$

and then we have

$$\begin{aligned} \rho _{A_S}^{\perp }&= L^{\perp }_{A_S} L_{A_S}^{\perp \dagger } \\&= \frac{1}{(\Delta A_S)^2} \left( A_S - \left\langle A_S \right\rangle I_S \right) L_S L_S^{\dagger } \left( A_S^{\dagger } - \left\langle A_S \right\rangle ^* I_S \right) \\&= \frac{1}{(\Delta A_S)^2} \left( A_S - \left\langle A_S \right\rangle I_S \right) \rho _S \left( A_S^{\dagger } - \left\langle A_S \right\rangle ^* I_S \right) , \end{aligned}$$

which is clearly independent of the choice of \(L_S\). Note that, although \(L_S\) and \(L_{A_S}^{\perp }\) are Hilbert–Schmidt orthogonal, \(\rho _S\) and \(\rho _{A_S}^{\perp }\) are generally not.
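
Both the operator Aharonov–Vaidman identity and this independence claim are easy to confirm numerically. The sketch below builds \(L_{A_S}^{\perp }\) from the rearranged identity, checks the Hilbert–Schmidt orthogonality and normalization conditions of Proposition 8.3, and verifies that \(\rho _{A_S}^{\perp }\) is the same for \(L_S = \sqrt{\rho }_S\) and for \(L_S = \sqrt{\rho }_S U_{S \vert E}\) (helper names and dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
d_s, d_e = 3, 5

def rand_c(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Normalized density operator rho and Hermitian observable A on H_S.
m = rand_c((d_s, d_s))
rho = m @ m.conj().T
rho /= np.trace(rho).real
a = rand_c((d_s, d_s))
A = a + a.conj().T

w, v = np.linalg.eigh(rho)
sqrt_rho = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T
q, _ = np.linalg.qr(rand_c((d_e, d_s)))
U = q.conj().T  # semi-unitary: U U^dagger = I_S

def rho_perp(L):
    # L_perp from the rearranged operator Aharonov-Vaidman identity.
    mean_a = np.trace(A @ L @ L.conj().T).real
    var_a = np.trace(A @ A @ L @ L.conj().T).real - mean_a ** 2
    L_perp = (A - mean_a * np.eye(d_s)) @ L / np.sqrt(var_a)
    assert abs(np.trace(L.conj().T @ L_perp)) < 1e-10  # HS-orthogonal to L
    assert np.isclose(np.trace(L_perp @ L_perp.conj().T).real, 1.0)
    return L_perp @ L_perp.conj().T

# rho_perp depends only on rho and A, not on which L represents rho.
assert np.allclose(rho_perp(sqrt_rho), rho_perp(sqrt_rho @ U))
```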

To generalize the results of this paper from state vectors to density operators, we replace the vector Aharonov–Vaidman identity with its operator counterpart applied to amplitude operators, and we replace the usual inner product with the Hilbert–Schmidt inner product. In many cases, the final result is independent of the amplitude operator used to represent the state. Although we use the amplitude operator in the proof, it drops out of the final result because it only appears in the combination \(L_S L_S^{\dagger } = \rho _S\), as in the expression we derived for \(\rho _{A_S}^{\perp }\). In fact, the final formulas are usually the same as in the pure state case, except that we have to interpret \(\left\langle A_S \right\rangle \) as \(\textrm{Tr}_{S} \left( A_S \rho _S \right) \) rather than \(\left\langle \psi \big \vert A_S \big \vert \psi \right\rangle \).

However, this is not true for the Maccone–Pati uncertainty relations and their generalizations, which do depend on the choice of amplitude operator \(L_S\).

Theorem 8.4

(The first Maccone–Pati uncertainty relation for amplitude operators) Let \(A_S\) and \(B_S\) be Hermitian operators on a Hilbert space \(\mathcal {H}_{S}\) and let \(\rho _S\) be a normalized density operator on \(\mathcal {H}_{S}\). Then,

$$\begin{aligned} \left( \Delta A_S \right) ^2 + \left( \Delta B_S \right) ^2 \ge \pm i \left\langle [A_S,B_S] \right\rangle + \left| \textrm{Tr}_{E} \left( L_S^{\perp \dagger } (A_S \mp i B_S)L_S \right) \right| ^2, \end{aligned}$$
(68)

where \(L_S: \mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\) is any amplitude operator for \(\rho _S\), and \(L_S^{\perp }: \mathcal {H}_{E} \rightarrow \mathcal {H}_{S}\) is any normalized amplitude operator orthogonal to \(L_S\) that has the same input space \(\mathcal {H}_{E}\).

Note that, to obtain the tightest possible bound on \(\left( \Delta A \right) ^2 + \left( \Delta B \right) ^2\), the right hand side of (68) should be maximized over all possible choices of \(L_S\) and \(L_S^{\perp }\). To do this in practice, a bound on the largest dimension \(d_E\) required to obtain the maximum is needed. I conjecture that \(d_E = 2d_S\) is sufficient because this allows \(L_S\) and \(L_S^{\perp }\) to have orthogonal kernels on \(\mathcal {H}_{E}\), but I do not have a proof of this.

Theorem 8.5

(The second Maccone–Pati uncertainty relation for amplitude operators) Let \(A_S\) and \(B_S\) be linear operators on a Hilbert space \(\mathcal {H}_{S}\) and let \(\rho _S\) be a normalized density operator on \(\mathcal {H}_{S}\). Then,

$$\begin{aligned} \left( \Delta A_S \right) ^2 + \left( \Delta B_S \right) ^2 \ge \frac{1}{2} \left| \textrm{Tr}_{E} \left( L^{\perp \dagger }_{A_S + B_S} (A_S+B_S) L_S \right) \right| ^2, \end{aligned}$$
(69)

where \(L_S\) is any amplitude operator for \(\rho _S\) and

$$\begin{aligned} L_{A_S+B_S}^{\perp } = \frac{1}{\Delta (A_S + B_S)} \left( A_S + B_S - \left\langle A_S + B_S \right\rangle I_S \right) L_S. \end{aligned}$$

In this case, to obtain the tightest bound, we have to maximize the right hand side over \(L_S\). We do not have to separately optimize over \(L_{A_S+B_S}^{\perp }\) because it is a function of \(L_S\), \(A_S\) and \(B_S\). However, its dependence on \(L_S\) makes the problem into a complicated nonlinear optimization.

9 Summary and conclusions

In this paper, I discussed how the standard textbook uncertainty relations of Robertson and Schrödinger can be derived from the Aharonov–Vaidman identity in a more direct way than the standard proof. I also demonstrated the identity’s usefulness in proving other uncertainty relations, such as the Maccone–Pati relations, and the quantum formulas for propagation of uncertainty. Finally, I gave a mixed-state generalization of the Aharonov–Vaidman identity in terms of amplitude operators. I hope that this has persuaded you that the Aharonov–Vaidman identity belongs in undergraduate textbooks and that it ought to be a first-line tool in proving relationships between standard deviations in quantum mechanics. I am sure there are other uncertainty relations that have elegant Aharonov–Vaidman based proofs, and I hope that this method will lead to new and useful uncertainty relations that have not been discovered before.

The Aharonov–Vaidman identity naturally gives rise to two quantum generalizations of the correlation, \(\textrm{corr}_{A,B}\) and \(\textrm{Rcorr}_{A,B}\). It would be interesting to determine whether these quantities have an operational meaning in the case where A and B do not commute. On the more formal side, perhaps there is a pseudo-probability representation of quantum mechanics, such as the Wigner function [45,46,47] or the Kirkwood–Dirac distribution [48,49,50], for which these quantities are the correlations of the corresponding observables on the appropriate phase space. This might help to find uses for the propagation of error formulas in cases where the observables do not commute.