We have seen how quantum mechanical kets (states and other vectors) can be represented by wave-functions. Kets are the vectors of the vector space which encompasses a quantum system. Kets are either states or the result of operators on states. Most of the previous chapters deal with wave mechanics, where the kets are continuous functions of space, and therefore, the vector space has infinite dimension.

However, kets are often represented as finite-dimensional vectors, sometimes for convenience (e.g., orbital angular momentum), and sometimes by necessity (e.g., spin). Such a ket is called a discrete ket or “finite-dimensional ket.” It may be written as a column of N complex numbers, where the vector space has finite dimension, N. Most notably:

All of angular momentum, both orbital and spin, can be described by finite-dimensional vector spaces.

In addition, lots of other things can be represented or well-approximated by finite-dimensional states, including an ammonia molecule’s nitrogen position, electron configurations in atoms and molecules, or excitations of an oscillator where only a finite number of states are likely.

Column vectors, row vectors, and matrices are perfect for quantum mechanics (QM), since they are defined to be the elements of linear transformations, and so represent the fundamental axiom of QM: systems exist in a linear superposition of states. As such, no description of QM is complete without matrix mechanics: the QM of systems which can be represented by finite-dimensional vectors and matrices. Just as wave-functions are vectors in a vector space, finite-dimensional vectors are also vectors in a vector space. (Some quantum references call a finite-dimensional vector space a “Hilbert space,” but some mathematicians insist a Hilbert space must be infinite dimensional. Therefore, we use the generic term “vector space” for finite-dimensional cases.)

Because of the simple representation of discrete kets and operators as finite vectors and matrices, many QM concepts are easier to describe in finite-dimensional systems, even if they also apply to continuum systems. For example, density matrices are much easier to visualize and understand in finite-dimensional vector spaces.

Note that the dimension of the quantum state vector space describing a system has nothing to do with the dimension of the physical space of that system. Most of the systems we will consider exist in ordinary three-dimensional (3D) space, but are described by quantum state spaces of many different dimensions. For example, particles orbit in 3D space, but the state space of orbital angular momentum for l = 0 is 1D, for l = 1 is 3D, for l = 2 is 5D, and for arbitrary l is (2l + 1)D.

4.1 Finite-Dimensional Kets, Bras, and Inner Products

There is a strong analogy between continuous state spaces (wave-function spaces) and discrete state spaces. When written in Dirac notation, all the formulas of wave mechanics apply equally well to matrix mechanics, which illustrates again the utility of Dirac notation. Most of the wave mechanics algebra of kets, bras, and operators has simple analogs in finite dimensions. We describe those analogies as we go. Note that discrete-space QM uses the standard mathematics of linear algebra, which is not derived from the continuous spaces, but is analogous to them.

Finite-dimensional kets have N components and N basis kets, i.e., any ket can be written as a linear combination of N basis vectors. For example,

$$ {\rm{For}}\;N = 3,\quad | \psi \rangle = \left[{\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\{{a_3}}\end{array}} \right] = {a_1}\left| {{\phi _1}} \right\rangle +{a_2}\left| {{\phi _2}} \right\rangle + {a_3}\left| {{\phi _3}}\right\rangle . $$

(We often use 3D quantum state spaces as examples, because they are nontrivial and illustrative. However, this has nothing to do with ordinary 3D space.) In wave mechanics, a ket \( \left| \psi\right\rangle \leftrightarrow \psi ( x ) \), where given x, ψ(x) is the complex value of the ket \( \left| \psi\right\rangle\) at position x. In finite dimensions, \( \left| \psi\right\rangle \leftrightarrow {{\psi }_{j}} \), where given an index j, \({{\psi }_{j}}\) is the complex value of the jth component of \( \left| \psi\right\rangle\). For general N:

$$ \left| \psi\right\rangle =\left[ \begin{matrix} {{a}_{1}}\\ {{a}_{2}}\\ \vdots\\ {{a}_{N}}\\\end{matrix} \right]=\sum\limits_{k=1}^{N}{{a}_{k}}\left| {{\phi }_{k}} \right\rangle ={{a}_{1}}\left| {{\phi }_{1}} \right\rangle +{{a}_{2}}\left| {{\phi }_{2}} \right\rangle +\cdots, $$

where \(\left| {{\phi }_{k}} \right\rangle \equiv \) basis kets and the \({{a}_{k}}\) are complex.

Inner products: Inner products and bras are analogous to those of wave mechanics:

Let

$$ \left| \chi\right\rangle =\left[ \begin{matrix} {{c}_{1}}\\ {{c}_{2}}\\ {{c}_{3}}\\\end{matrix} \right]. $$

Then:

$$ \left\langle\chi| \psi\right\rangle =\sum\limits_{j=1}^{N}{c_{j}^{*}{{a}_{j}}},\quad \text{analogous to} \quad\left\langle\alpha| \beta\right\rangle =\int_{-\infty }^{\infty }{{{\alpha }^{*}}(x)\beta (x)\ dx}. $$

Therefore, bras are written as row vectors, with conjugated components, so that an inner product is given by ordinary matrix multiplication:

$$ \left\langle \chi \right| \equiv {\left( {\left| \chi \right\rangle } \right)^\dag } = \left[{\begin{array}{*{20}{c}}{c_1^*}&{c_2^*}&{c_3^*}\end{array}} \right] \Rightarrow \left\langle \chi | \psi \right\rangle = \left[{\begin{array}{*{20}{c}}{c_1^*}&{c_2^*}&{c_3^*}\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\{{a_3}}\end{array}} \right] = c_1^*{a_1} + c_2^*{a_2} + c_3^*{a_3}.$$

Recall that the dagger symbol acting on a ket produces the dual bra: \( {{( \left| \chi\right\rangle)}^{\dagger }}\equiv \left\langle\chi\right| \).

Kets are written as column vectors, and bras are written as row vectors.

The squared magnitude of a vector is then:

$$ {{\left| \left| \psi\right\rangle\right|}^{2}}=\left\langle\psi| \psi\right\rangle =\sum\limits_{j=1}^{N}{{{a}_{j}}^{*}{{a}_{j}}}=\sum\limits_{j=1}^{N}{{{\left| {{a}_{j}} \right|}^{2}}}. $$

All of these definitions comply with standard mathematical definitions.
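For concreteness, here is a minimal numpy sketch (with arbitrary illustrative components, not from the text) of kets as column vectors, bras as conjugate-transposed rows, and the resulting inner product and squared magnitude:

```python
import numpy as np

# Kets as column vectors, bras as conjugate-transposed rows (N = 3).
# The component values are arbitrary illustrations.
ket_psi = np.array([[1 + 1j], [2j], [3]])            # |psi>, components a_j
ket_chi = np.array([[1], [1j], [0]])                 # |chi>, components c_j

bra_chi = ket_chi.conj().T                           # <chi| = (|chi>)^dagger, a row vector

inner = (bra_chi @ ket_psi).item()                   # <chi|psi> = sum_j c_j* a_j
norm_sq = (ket_psi.conj().T @ ket_psi).item().real   # <psi|psi> = sum_j |a_j|^2

print(inner, norm_sq)
```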

4.2 Finite-Dimensional Linear Operators

Operators acting on kets (vectors): Matrix multiplication is defined to be the most general linear operation possible on a discrete vector. Therefore:

Any discrete linear operator can be written as a matrix, which operates on a vector by matrix multiplication.

The matrix elements are, in general, complex. For example, an operator in a 3D quantum state space (not physical 3D space) can be written:

$$ \hat{B}=\left[ \begin{matrix} {{B}_{11}} & {{B}_{12}} & {{B}_{13}}\\ {{B}_{21}} & {{B}_{22}} & {{B}_{23}}\\ {{B}_{31}} & {{B}_{32}} & {{B}_{33}}\\\end{matrix} \right]. $$

It is important to have a good mental image of a matrix multiplying a vector (Fig. 4.1).

Fig. 4.1 Visualization of a matrix premultiplying a vector, yielding a weighted sum of the matrix columns

Each component of the vector multiplies the corresponding column of the matrix. These “weighted” columns (vectors) are then added (horizontally) to produce the final result. Thus, when used as a linear operator, matrix multiplication of a vector converts each vector component into a whole vector with full N components, and sums those vectors. Matrix multiplication is linear, which means:

$$ \hat{B}( a\left| v \right\rangle +\left| w \right\rangle)=a\hat{B}\left| v \right\rangle +\hat{B}\left| w \right\rangle$$

for all \(a,\left| v \right\rangle ,\left| w \right\rangle \).
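A small numpy sketch of the Fig. 4.1 picture (with arbitrary values) confirms that premultiplying a vector by a matrix yields the weighted sum of the matrix columns:

```python
import numpy as np

# B|v> equals the sum of the columns of B, each weighted by the
# corresponding component of |v>. B and v are arbitrary test values.
B = np.array([[1, 2j, 0],
              [0, 1, -1j],
              [3, 0, 1]])
v = np.array([1j, 2, -1])

direct = B @ v                                            # ordinary matrix multiplication
weighted_columns = sum(v[k] * B[:, k] for k in range(3))  # weighted sum of columns

assert np.allclose(direct, weighted_columns)
```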

Average values: Now let us look at the elements of the \(\hat{B}\) matrix, \(B_{ij}\), another way. First, consider the diagonal elements, \(B_{ii}\). Recall that we compute the average value of B on a ket \(\left| \psi\right\rangle \) as:

$$ \left\langle B \right\rangle \ \text{in}\ \text{state}\ \left| \psi\right\rangle =\left\langle\psi\right|\hat{B}\left| \psi\right\rangle . $$

If \(\left| \psi\right\rangle \) is just a single basis vector, say \(\left| {{\phi }_{1}} \right\rangle \), then

$$ \left| \psi \right\rangle = \left[ {\begin{array}{*{20}{c}}1\\0\\0\end{array}} \right],\quad \left\langle \psi \right|\hat B\left| \psi \right\rangle = \left[ {\begin{array}{*{20}{c}}1&0&0\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{{B_{11}}}&.&.\\.&.&.\\.&.&.\end{array}} \right]\left[ {\begin{array}{*{20}{c}}1\\0\\0\end{array}} \right] = {B_{11}}. $$

More generally, ψ is a superposition: \( \left| \psi \right\rangle = {a_1}\left| {{\phi _1}} \right\rangle + {a_2}\left| {{\phi _2}} \right\rangle + {a_3}\left| {{\phi _3}} \right\rangle \). Looking at all the diagonal elements, \(\left\langle{{\phi }_{i}} \right|\hat{B}\left| {{\phi }_{i}} \right\rangle ={{B}_{ii}}\), we see that each basis vector \( \left| {{\phi }_{i}} \right\rangle\) in \(\left| \psi\right\rangle \) contributes an amount \(B_{ii}\) to the average of B, weighted by \( {{\left| {{a}_{i}} \right|}^{2}} \) (the squared magnitude of the amount of \( \left| {{\phi }_{i}} \right\rangle\) in \(\left| \psi\right\rangle \)).

Now, what do the off-diagonal elements mean?

When computing an average, off-diagonal elements of an operator matrix describe the interaction or “coupling” between two different basis vectors for the given operator.

That is, in addition to the diagonal contributions just described, if \(\left| \psi\right\rangle \) contains two different components \( \left| {{\phi }_{i}} \right\rangle\) and \( \left| {{\phi }_{j}} \right\rangle\), then their interaction produces two additional contributions to the average of B. For example:

$$ \begin{array}{l}\left| \psi \right\rangle = \left[ {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\0\end{array}} \right], \\\left\langle \psi \right|\hat B\left| \psi \right\rangle =\left[ {\begin{array}{*{20}{c}} {{a_1}^*}&{{a_2}^*}&0\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{{B_{11}}}&{{B_{12}}}&.\\{{B_{21}}}&{{B_{22}}}&.\\.&.&.\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\0\end{array}} \right] = {\left| {{a_1}} \right|^2}{B_{11}} + {\left| {{a_2}} \right|^2}{B_{22}} + a_1^*{a_2}{B_{12}} + a_2^*{a_1}{B_{21}}\;.\end{array} $$

The off-diagonal elements \(B_{12}\) and \(B_{21}\) produce the last two interaction terms on the right. When evaluating \(\left\langle\psi\right|\hat{B}\left| \psi\right\rangle \) for a Hermitian operator, those terms are conjugates of each other, and their sum is real: the ith component of \(\hat{B}\left| \psi\right\rangle \) has half of the i-j interaction and the jth component of \(\hat{B}\left| \psi\right\rangle \) has the other half (the two halves being complex conjugates of each other). Consequently, the average value of a Hermitian operator in any state is real. In particular, the average value in an eigenstate is real, which means the eigenvalues are also real.

Thus, each off-diagonal element \(B_{ij}\) of \(\hat{B}\) is the result of the interaction of two basis components \( \left| {{\phi }_{i}} \right\rangle\) and \( \left| {{\phi }_{j}} \right\rangle\), where i and j take on all values 1, 2,…,N. To find the numbers \(B_{ij}\), the interactions are computed as if both \(a_i\) and \(a_j\) were 1. When we later multiply by some actual ket \( \left| \psi\right\rangle\), the elements \(B_{ij}\) get properly weighted by the components \(a_i\) and \(a_j\) of \( \left| \psi\right\rangle\). Thus, we can write the elements of \(\hat{B}\) as the operation of \(\hat{B}\) between all pairs of basis vectors:

$$ {{B}_{ij}}=\left[ \begin{matrix} \left\langle{{\phi }_{1}} \right|\hat{B}\left| {{\phi }_{1}} \right\rangle& \left\langle{{\phi }_{1}} \right|\hat{B}\left| {{\phi }_{2}} \right\rangle& \left\langle{{\phi }_{1}} \right|\hat{B}\left| {{\phi }_{3}} \right\rangle \\ \left\langle{{\phi }_{2}} \right|\hat{B}\left| {{\phi }_{1}} \right\rangle& \left\langle{{\phi }_{2}} \right|\hat{B}\left| {{\phi }_{2}} \right\rangle& \left\langle{{\phi }_{2}} \right|\hat{B}\left| {{\phi }_{3}} \right\rangle \\ \left\langle{{\phi }_{3}} \right|\hat{B}\left| {{\phi }_{1}} \right\rangle& \left\langle{{\phi }_{3}} \right|\hat{B}\left| {{\phi }_{2}} \right\rangle& \left\langle{{\phi }_{3}} \right|\hat{B}\left| {{\phi }_{3}} \right\rangle \\\end{matrix} \right]. $$

Again, each column i of the B matrix is a vector which will get weighted by the ket component \(a_i\). The vector sum of these weighted vectors is the resultant vector \(\hat{B}\left| \psi\right\rangle \).

Note that we sometimes take the inner product with some bra other than \( \left\langle\psi\right| \). For example, in perturbation theory, we often take inner products \(\left\langle{{\phi }_{k}} \right|\hat{H}\left| {{\phi }_{m}} \right\rangle \), where the bra and ket are different states. Our visualization works just as well for an inner product where the bra is different from the ket: \(\left\langle\chi\right|\hat{B}\left| \psi\right\rangle \).
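A numpy sketch (arbitrary B, \(\left| \chi\right\rangle \), and \(\left| \psi\right\rangle \)) shows that sandwiching \(\hat{B}\) between standard basis kets picks out individual matrix elements, while a general bra and ket give \(\left\langle\chi\right|\hat{B}\left| \psi\right\rangle \):

```python
import numpy as np

# With the standard basis kets |phi_i>, the sandwich <phi_i|B|phi_j>
# picks out the matrix element B[i, j]. All values are illustrative.
N = 3
B = np.arange(1.0, 10.0).reshape(N, N) + 1j * np.eye(N)
basis = np.eye(N)                       # columns are |phi_1>, |phi_2>, |phi_3>

for i in range(N):
    for j in range(N):
        assert np.isclose(basis[:, i].conj() @ B @ basis[:, j], B[i, j])

chi = np.array([1, 1j, 0])
psi = np.array([0.5, 0, 0.5j])
sandwich = chi.conj() @ B @ psi         # <chi|B|psi>, bra and ket may differ
```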

Recall our discussion of “local” and “nonlocal” operators on wave-functions. Local operators depend only on (at most) an infinitesimal neighborhood around a point. In finite dimensions, this is analogous to the diagonal elements of an operator. The off-diagonal elements are like “nonlocal” effects: the value of one component of the vector being acted on contributes to a different component of the result.

Visualization of an inner product with an operator: Besides the matrix multiplication shown previously, it is also sometimes helpful to visualize the inner product \(\left\langle\psi\right|\hat{B}\left| \psi\right\rangle \) another way. This visualization will be used later for density matrices. An inner product with an operator is trilinear: it is linear in the row vector (bra), linear in the operator matrix, and linear in the column vector (ket). Therefore, the inner product can be written as a sum of terms, each term containing exactly three factors: one from the bra, one from the operator, and one from the ket. We can see this by explicit multiplication of an example:

$$\begin{array}{ll}\left\langle \psi \right|\hat{B}\left| \psi \right\rangle &=\underbrace{\left[ \begin{matrix} a_{1}^{*} & a_{2}^{*} & a_{3}^{*} \\ \end{matrix} \right]}_{\left\langle \psi \right|}\ \underbrace{\left[ {{a}_{1}}\left[ \begin{matrix} {{b}_{11}} \\ {{b}_{21}} \\ {{b}_{31}} \\ \end{matrix} \right]+{{a}_{2}}\left[ \begin{matrix} {{b}_{12}} \\ {{b}_{22}} \\ {{b}_{32}} \\ \end{matrix} \right]+{{a}_{3}}\left[ \begin{matrix} {{b}_{13}} \\ {{b}_{23}} \\ {{b}_{33}} \\ \end{matrix} \right]\ \right]}_{\hat{B}\left| \psi \right\rangle }\\&=\left( \begin{matrix} a_{1}^{*}{{a}_{1}}{{b}_{11}}+a_{1}^{*}{{a}_{2}}{{b}_{12}}+a_{1}^{*}{{a}_{3}}{{b}_{13}} \\ + \\ a_{2}^{*}{{a}_{1}}{{b}_{21}}+a_{2}^{*}{{a}_{2}}{{b}_{22}}+a_{2}^{*}{{a}_{3}}{{b}_{23}} \\ + \\ a_{3}^{*}{{a}_{1}}{{b}_{31}}+a_{3}^{*}{{a}_{2}}{{b}_{32}}+a_{3}^{*}{{a}_{3}}{{b}_{33}} \\ \end{matrix} \right)\begin{array}{*{35}{l}} \}={{P}_{11}} \\ {} \\ \}={{P}_{22}} \\ {} \\ \}={{P}_{33}} \\ \end{array}\end{array}.$$
(4.1)

(We define \(P_{ij}\) shortly.) Indeed, the inner product is a sum of \(N^2\) terms, of three factors each. Note that this is equivalent to:

$$ \left\langle\psi\right|\hat{B}\left| \psi\right\rangle =\sum\limits_{i=1}^{N}{{}}\sum\limits_{j=1}^{N}{a_{i}^{*}{{B}_{ij}}{{a}_{j}}}. $$
(4.2)
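A short numpy check of Eq. (4.2), with arbitrary test values, confirms that the double sum of three-factor terms equals the row-matrix-column product:

```python
import numpy as np

# The double sum over bra, operator, and ket factors equals the
# matrix-product form of the inner product. B and a are test values.
N = 3
B = np.array([[1, 2j, 0],
              [0, 3, 1j],
              [1, 0, 2]])
a = np.array([0.5, 0.5j, np.sqrt(0.5)])   # components of |psi>

double_sum = sum(a[i].conj() * B[i, j] * a[j]
                 for i in range(N) for j in range(N))
matrix_form = a.conj() @ B @ a            # <psi|B|psi>

assert np.isclose(double_sum, matrix_form)
```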

It will be useful (for density matrices) to separate the pieces of the inner product sum in Eq. (4.2) into the products of the bra and ket components, and then separately, the elements of the operator matrix. First, construct the N × N matrix of all combinations of the components of \(\left| \psi\right\rangle \) and \(\left\langle\psi\right|\):

$$ \begin{array}{ll}\left| \psi \right\rangle &= \left[ {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\{{a_3}}\end{array}} \right], \left| \psi \right\rangle \otimes \left\langle \psi \right| \equiv \left| \psi \right\rangle \left\langle \psi \right| = \left[ {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\{{a_3}}\end{array}} \right]\left[ {\begin{array}{*{20}{c}}{a_1^*}&{a_2^*}&{a_3^*}\end{array}} \right] \\&= \left[ {\begin{array}{*{20}{c}}{\begin{array}{*{20}{c}}{a_1^*{a_1}}\\{a_1^*{a_2}}\\{a_1^*{a_3}}\end{array}}&{\begin{array}{*{20}{c}}{a_2^*{a_1}}\\{a_2^*{a_2}}\\{a_2^*{a_3}}\end{array}}&{\begin{array}{*{20}{c}}{a_3^*{a_1}}\\{a_3^*{a_2}}\\{a_3^*{a_3}}\end{array}}\end{array}} \right]\;,\\{\rm{or}}\; &\quad\quad{\left[ {\;\left| \psi \right\rangle \left\langle\psi \right|\;} \right]_{\;ij}} = a_j^*{a_i}\;.\end{array} $$
(4.3)

This N × N matrix is the outer product (aka tensor product) of \(\left| \psi\right\rangle \) with its dual bra. It is a 3 × 1 matrix times a 1 × 3 matrix, producing a 3 × 3 matrix, under the standard rules of matrix multiplication. (Also, the trace of the outer product of a bra and a ket is their inner product.) The \(\left| \psi\right\rangle \left\langle\psi\right|\) matrix lists “how much” of each basis pair will contribute to the inner product \(\left\langle\psi\right|\hat{B}\left| \psi\right\rangle \). Then from Eq. (4.2), a general inner product becomes:

$$ \left\langle \psi \right|\hat B\left| \psi \right\rangle = \sum\limits_{i = 1}^N {} \sum\limits_{j = 1}^N {{B_{ij}}a_i^*{a_j}} = \sum\limits_{i = 1}^N {} \sum\limits_{j = 1}^N {{B_{ij}}} {\left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right]_{ji}} $$
(4.4)

The last factor is the transpose of \(\left| \psi\right\rangle \left\langle\psi\right|\). Figure 4.2 illustrates the computation of the inner product.

Fig. 4.2 A visualization of the inner product, with an operator

The total inner product, given in Eq. (4.1) and Fig. 4.2, is the sum of the inner products of the B matrix rows with the columns of \(\left| \psi\right\rangle \left\langle\psi\right|\): (B row 1)·(\(\left| \psi\right\rangle \left\langle\psi\right|\) column 1) + (B row 2)·(\(\left| \psi\right\rangle \left\langle\psi\right|\) column 2) + …. These N terms are the diagonal elements of the matrix product \(B[ \left| \psi\right\rangle \left\langle\psi\right| ]\), which we define for convenience as \(P\equiv {{P}_{ij}}\).

$$ {\bf{B}}\left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right] \equiv {P_{ij}} = \left[ {\begin{array}{*{20}{c}}{{P_{11}}}&.&.\\.&{{P_{22}}}&.\\.&.&{{P_{33}}}\end{array}} \right]. $$

Then Eq. (4.4) becomes:

$$ \left\langle \psi \right|\hat B\left| \psi \right\rangle = \sum\limits_{i = 1}^N {} {\left[ {{\bf{B}}\left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right]} \right]_{ii}} = {P_{11}} + {P_{22}} + {P_{33}} = {\mathop{\rm Tr}\nolimits} \left( {{\bf{B}}\left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right]} \right). $$
(4.5)

Note, also:

$$ {{\rm {Tr}}} \left( {{\bf{UV}}} \right) = {\mathop{\rm {Tr}}\nolimits} \left( {{\bf{VU}}} \right) \quad \Rightarrow \quad \left\langle \psi \right|\hat B\left| \psi \right\rangle = {\mathop{\rm {Tr}}\nolimits} \left[ {{\bf{B}}\left( {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right)} \right] = {\mathop{\rm {Tr}}\nolimits} \left[ {\left( {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right){\bf{B}}} \right]. $$
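A numpy sketch (arbitrary Hermitian B and a normalized \(\left| \psi\right\rangle \)) verifies Eq. (4.5) and the cyclic property of the trace:

```python
import numpy as np

# <psi|B|psi> = Tr(B |psi><psi|), and Tr(UV) = Tr(VU).
B = np.array([[1, 2j],
              [-2j, 3]])
psi = np.array([[0.6], [0.8j]])               # normalized: 0.36 + 0.64 = 1

outer = psi @ psi.conj().T                    # |psi><psi|
avg_direct = (psi.conj().T @ B @ psi).item()  # <psi|B|psi>
avg_trace = np.trace(B @ outer)               # Tr(B |psi><psi|)

assert np.isclose(avg_direct, avg_trace)
assert np.isclose(np.trace(B @ outer), np.trace(outer @ B))
```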

Finite-dimensional adjoint operators: We now show that finite-dimensional adjoints are given by the conjugate transpose of the original operator matrix. Recall the definition of the adjoint \({{\hat{B}}^{\dagger }}\) of the operator \(\hat{B}\):

$$ \left\langle\psi\right|{{\hat{B}}^{\dagger }}\equiv {{( \hat{B}\left| \psi\right\rangle)}^{\dagger }}. $$

In other words, the adjoint operator acts to the left the way the original operator acts to the right, i.e., the adjoint produces a bra from \( \left\langle\psi\right| \) whose elements are the conjugates of the column vector \(\hat{B}\left| \psi\right\rangle \). \({{\hat{B}}^{\dagger }}\) is pronounced “bee dagger.”

To construct our adjoint matrix, first note the visualization of postmultiplying a bra (row vector) by a matrix (which is analogous to the visualization of premultiplying a vector by a matrix):

We see that postmultiplying a bra produces a weighted sum of the rows of the matrix (much like premultiplying a vector produces a weighted sum of the columns). By definition of the adjoint, the elements of the resulting bra (row vector) must be the conjugates of the corresponding ket (column vector). Comparing Fig. 4.3 to Eq. (4.1), we see that this happens if the rows of \({{\hat{B}}^{\dagger }}\) are the conjugates of the columns of \(\hat{B}\). Therefore:

Fig. 4.3 Visualization of a matrix postmultiplying a row vector, yielding a weighted sum of the matrix rows

The adjoint operator matrix is the conjugate-transpose of the original operator matrix.

$$ {\rm{For}}\,{\rm{example}}\,\hat B = \left[ {\begin{array}{*{20}{c}}1&{2i}\\{ - 3i}&{ - 1 - 5i}\end{array}} \right] \Rightarrow {\hat B^\dag } = \left[ {\begin{array}{*{20}{c}}1&{3i}\\{ - 2i}&{ - 1 + 5i}\end{array}} \right]. $$
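A one-line numpy check of this example: the adjoint is just the conjugate transpose.

```python
import numpy as np

# The adjoint of the example matrix above is its conjugate transpose.
B = np.array([[1, 2j],
              [-3j, -1 - 5j]])
B_dag = B.conj().T

assert np.allclose(B_dag, np.array([[1, 3j],
                                    [-2j, -1 + 5j]]))
```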

Hermitian (self-adjoint) operators are very important in QM: all physical observable operators are Hermitian. The eigenvalues of a Hermitian matrix are real, and hence all observable properties, and their averages, are real. A Hermitian matrix equals its own conjugate-transpose, i.e., it is conjugate-symmetric across the main diagonal. For example,

$$ \hat B = \left[ {\begin{array}{*{20}{c}}1&2&{3 + 4i}\\2&{ - 2}&{1 - i}\\{3 - 4i}&{1 + i}&2\end{array}} \right] \Rightarrow \hat B = {\hat B^\dag }\;{\rm{and}}\;\hat B\;{\rm{is}}\;{\rm{Hermitian}}{\rm{.}} $$

It follows immediately that the diagonal elements of a Hermitian matrix are real.

A simple example of an operator and its adjoint is the spin-1/2 raising and lowering operators, which add (or subtract) 1 to the \(m_s\) quantum number of a spin ket (if possible):

$$ {\hat s_ + }\left( {\begin{array}{*{20}{c}}0\\1\end{array}} \right) = \left( {\begin{array}{*{20}{c}}1\\0\end{array}} \right), {\rm{and}} {\hat s_ + }\left( {\begin{array}{*{20}{c}}1\\0\end{array}} \right) = \left( {\begin{array}{*{20}{c}}0\\0\end{array}} \right) \Rightarrow {\hat s_ + } = \left[ {\begin{array}{*{20}{c}}0&1\\0&0\end{array}} \right], {\rm{and}} \hat s_ + ^\dag = \left[ {\begin{array}{*{20}{c}}0&0\\1&0\end{array}} \right]. $$

Then:

$$ \left( {\begin{array}{*{20}{c}}0&1\end{array}} \right)\hat s_ + ^\dag = \left( {\begin{array}{*{20}{c}}1&0\end{array}} \right), {\rm{and}} \left( {\begin{array}{*{20}{c}}1&0\end{array}} \right)\hat s_ + ^\dag = \left( {\begin{array}{*{20}{c}}0&0\end{array}} \right). $$

(We see explicitly that \( \hat{s}_+^{\dagger} = \hat{s}_- \).)

Non-Hermitian operators: For non-Hermitian operators, the interactions between the ith and jth components are not conjugate, or necessarily related at all. Consider the N = 3 space of angular momentum j = 1, and \(\hat{J}_-\) acting on vectors with components labeled \((a_1, a_0, a_{-1})\):

$$ {\hat J_ - }\left( {\begin{array}{*{20}{c}}{{a_1}}\\0\\0\end{array}} \right) = \left[ {\begin{array}{*{20}{c}}0&0&0\\{\sqrt 2 }&0&0\\0&{\sqrt 2 }&0\end{array}} \right]\left( {\begin{array}{*{20}{c}}{{a_1}}\\0\\0\end{array}} \right) = \left( {\begin{array}{*{20}{c}}0\\{\sqrt 2 {a_1}}\\0\end{array}} \right), {\hat J_ - }\left( {\begin{array}{*{20}{c}}0\\{{a_0}}\\0\end{array}} \right) = \left[ {\begin{array}{*{20}{c}}0&0&0\\{\sqrt 2 }&0&0\\0&{\sqrt 2 }&0\end{array}} \right]\left( {\begin{array}{*{20}{c}}0\\{{a_0}}\\0\end{array}} \right) = \left( {\begin{array}{*{20}{c}}0\\0\\{\sqrt 2 {a_0}}\end{array}} \right). $$

We see that under \(\hat{J}_-\), the \(a_1\) component of the input vector contributes to the \(a_0\) result, but the \(a_0\) component of the input vector does not contribute to the \(a_1\) result. There is no symmetry across the main diagonal of the \(\hat{J}_-\) matrix, and therefore no symmetry between the effects of, say, the \(a_1\) input on the \(a_0\) output, and the \(a_0\) input on the \(a_1\) output.

These finite-dimensional operator matrix principles apply to any dimension N.

Representing wave-functions as finite-dimensional kets: Note that in some cases, wave-functions that we have thought of as infinite dimensional may actually be finite dimensional. For example, the l = 1 orbital angular momentum states are given by the three functions \(Y_{1,1}(\theta, \phi)\), \(Y_{1,0}(\theta, \phi)\), and \(Y_{1,-1}(\theta, \phi)\). Although these can be written in the infinite-dimensional basis of θ and φ, the l = 1 space is only 3D.

4.3 Getting to Second Basis: Change of Bases

QM is a study of vectors, and vectors are often expressed in terms of components along some set of basis vectors:

$$\mathbf{r}=a{{\mathbf{e}}_{x}}+b{{\mathbf{e}}_{y}}+c{{\mathbf{e}}_{z}}\quad \text{where}\quad {\mathbf{e}_{x}}\text{, }{\mathbf{e}_{y}}\text{, and }{\mathbf{e}_{z}}\text{ are basis vectors,}\quad \text{or}$$
$$ \left| \chi\right\rangle =a\left| z+ \right\rangle +b\left| z- \right\rangle \quad \text{where} \quad\left| z+ \right\rangle ,\left| z- \right\rangle\ \text{are basis vectors.} $$

Our choice of basis vectors (i.e., our basis) is, in principle, arbitrary, since all observable calculations are independent of basis. However, usually one or two particular bases are significantly more convenient than others. Therefore, it is often helpful to change our basis: i.e., we transform our components from one basis to another. Note that such a transformation does not change any of our vectors; it only changes how we write the vectors, and how the internals of some calculations are performed, without changing any observable results.

A basis change transforms the components of vectors and operators; it does not transform the vectors or operators themselves.

Angular momentum provides many examples where changing bases is very helpful. The infamous Clebsch–Gordan coefficients are used to change bases.

Many references refer to the “transformation of basis vectors,” but this is a misnomer. We do not transform our basis vectors; we choose new ones. We can write the new basis vectors as a superposition of old basis vectors, and we can even write these relations in matrix form, but this is fundamentally a different mathematical process than transforming the components of a vector.

In matrix mechanics, we can change the basis of a ket by multiplying it by a transformation matrix. Such a matrix is not a quantum operator in the usual sense. The usual quantum operators we have discussed so far change a ket into another ket; a transformation matrix changes the representation of a ket in one basis into a representation of the same ket in another basis. We will show that for orthonormal bases, the transformation matrix is unitary (it preserves the magnitudes of kets), as it must be, since it does not change the ket.

We describe here basis changes in the notation of QM, but the results apply to all vectors in any field of study. We consider mostly orthonormal bases, since more general bases are rarely used in QM. We rely heavily on our visualization of matrices and matrix multiplication, from Eq. (4.2). In matrix mechanics, there are two parts to changing bases: transforming vector components, and transforming the operator matrices.

We use a notation similar to [18, Chap. 6]. Note that some references use a notation in which the roles of U and \(U^{\dagger}\) are reversed from our notation below. (And some references use both notations, in different parts of the book.)

4.3.1 Transforming Vector Components to a New Basis

We describe a general transformation matrix in two complementary ways. We start with an N-dimensional vector expressed in the orthonormal basis \(b_i\):

$$ \left| w \right\rangle ={{a}_{1}}\left| {{b}_{1}} \right\rangle +{{a}_{2}}\left| {{b}_{2}} \right\rangle +\ldots +{{a}_{N}}\left| {{b}_{N}} \right\rangle. $$
(4.6)

It is important to note that the previous is a vector equation and is true in any basis. The components of \( \left| w \right\rangle\) in the b-basis are \(a_i\), i = 1,…,N. Furthermore, the \(a_i\) are given by:

$$ {{a}_{i}}=\left\langle{{b}_{i}} | w \right\rangle . $$

How would we convert the components \(a_i\) of \( \left| w \right\rangle\) into a new basis, \(n_j\)? That is:

$$ \left| w \right\rangle = {c_1}\left| {{n_1}} \right\rangle + {c_2}\left| {{n_2}} \right\rangle + \cdots + {c_N}\left| {{n_N}} \right\rangle , \quad {c_j} = \left\langle n_j | w \right\rangle . $$

The inner product on the right, for each \(c_j\), can be evaluated in any basis we choose. In particular, it can be evaluated in our old basis, the b-basis. In matrix notation, we have:

$$ {{c}_{j}}=\left\langle{{n}_{j}} | w \right\rangle ={{( {{\mathbf{n}}_{j}}\ \text{bra}\ \to)}_{\text{old}}}{{\left( \begin{matrix} {{a}_{1}}\\ {{a}_{2}}\\ :\\\end{matrix} \right)}_{\text{old}}}. $$

This is true for every j, so we can write the entire transformation as a single matrix multiplication:

$$ \left| w \right\rangle = {\left( {\begin{array}{*{20}{c}}{{c_1}}\\{{c_2}}\\:\end{array}} \right)_{{\rm{new}}}} = \underbrace {\left( {\begin{array}{*{20}{c}}{{{\left( {{{\bf{n}}_1}\;{\rm{bra}} \to } \right)}_{{\rm{old}}}}}&{...}\\{{{\left( {{{\bf{n}}_2}\;{\rm{bra}} \to } \right)}_{{\rm{old}}}}}&{...}\\:&{}\end{array}} \right)}_{N \times N\;{\rm{matrix}} \equiv U}{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}}. $$
(4.7)

(Recall that when multiplying a vector by a matrix, the jth element of the result is the dot-product (without conjugating the row) of the jth matrix row with the given vector.) In other words, the rows of the transformation matrix are the bras of the new basis vectors written in the old basis. We call the transformation matrix U.
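A numpy sketch of Eq. (4.7), using an arbitrary orthonormal pair in a 2D space as the “new” basis (these particular vectors are chosen only for illustration):

```python
import numpy as np

# Rows of U are the bras of the new basis vectors, written in the old basis.
n1 = np.array([1, 1j]) / np.sqrt(2)     # |n_1> in the old basis
n2 = np.array([1, -1j]) / np.sqrt(2)    # |n_2> in the old basis

U = np.vstack([n1.conj(), n2.conj()])   # row j = <n_j| in the old basis

w_old = np.array([2, 3j])               # components a_i in the old basis
w_new = U @ w_old                       # components c_j in the new basis

# The ket itself is unchanged, so its magnitude is the same in both bases:
assert np.isclose(np.vdot(w_old, w_old), np.vdot(w_new, w_new))
```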

We now consider a second view of the same transformation matrix. The vector Eq. (4.6) is true in any basis, so we can write it in the new basis. Recall that a matrix multiplied by a vector produces a weighted sum of the matrix columns, therefore:

$$ \left| w \right\rangle ={{\left( \begin{matrix} {{c}_{1}}\\ {{c}_{2}}\\ :\\\end{matrix} \right)}_{\text{new}}}=\underbrace{\left( \begin{matrix} {{\left( \begin{matrix} {{\mathbf{b}}_{1}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}} & {{\left( \begin{matrix} {{\mathbf{b}}_{2}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}} & ...\\\end{matrix} \right)}_{N\times N\ \text{matrix}\equiv U}{{\left( \begin{matrix} {{a}_{1}}\\ {{a}_{2}}\\ :\\\end{matrix} \right)}_{\text{old}}}={{a}_{1}}{{\left( \begin{matrix} {{\mathbf{b}}_{1}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}}+{{a}_{2}}{{\left( \begin{matrix} {{\mathbf{b}}_{2}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}}+\ \ldots . $$
(4.8)

In other words, the columns of the transformation matrix are the old basis vectors written in the new basis. This view is equivalent to our first view of the transformation matrix (for orthonormal bases).

Transforming bra components: By the definition of adjoint, we can determine how to transform the components of a bra. We start with the transformation of the ket components, and use the definition of adjoint:

$$ \left| w \right\rangle ={\left( {\begin{array}{*{20}{c}}{{c_1}}\\ {{c_2}}\\ :\end{array}}\right)_{{\rm{new}}}} = U{\left({\begin{array}{*{20}{c}}{{a_1}}\\ {{a_2}}\\ :\end{array}}\right)_{{\rm{old}}}} \Rightarrow \left\langle w \right| =\left( {\begin{array}{*{20}{c}} {c_1^*}&{c_2^*}&{...}\end{array}} \right){U^\dag }. $$
(4.9)

This follows directly from visualizing how premultiplying a column vector (ket) compares to postmultiplying a row vector (bra) .

Summary of vector basis change: The transformation matrix can be viewed two ways: (1) as a set of row vectors, which are the bras of the new basis vectors written in the old basis; or (2) as a set of column vectors, which are the old basis vectors written in the new basis. These two views are equivalent.

In the rare case of a nonorthonormal basis, we note that the second view of the transformation matrix is still valid, since the vector Eq. (4.6) is always true, even in nonorthonormal bases. In contrast, the first view of the transformation matrix is not valid in nonorthonormal bases, because the components in such bases cannot be found from simple inner products. This means the transformation matrix for nonorthonormal bases is not unitary.

Basis change example: Again we use spin 1/2 as a simple example of a two-dimensional (2D) vector space, though we need not understand spin at all for this example. Suppose we want to transform the components of the 2D vector \(\left| \chi\right\rangle \) from the z-basis to the x-basis, i.e., given a, b, we wish to find c and d such that:

$$ \left| \chi\right\rangle = {\left( {\begin{array}{*{20}{c}} a\\ b\end{array}} \right)_z} = {\left( {\begin{array}{*{20}{c}} c\\ d\end{array}} \right)_x} \Rightarrow \left| \chi \right\rangle= a\left| {z + } \right\rangle + b\left| {z - } \right\rangle =c\left| {x + } \right\rangle + d\left| {x - } \right\rangle . $$

(This is analogous to transforming the components of a 2D spatial vector in the x-y plane to some other set of basis axes.) We are given the new x-basis vectors in the old z-basis:

$$ \begin{array}{l}\left| {x + } \right\rangle = \left( {1/\sqrt 2 } \right)\left| {z + } \right\rangle + \left( {1/\sqrt 2 } \right)\left| {z - } \right\rangle = \left( {\begin{array}{*{20}{c}}{1/\sqrt 2 }\\{1/\sqrt 2 }\end{array}} \right),\\\left| {x - } \right\rangle = \left( {1/\sqrt 2 } \right)\left| {z + } \right\rangle - \left( {1/\sqrt 2 } \right)\left| {z - } \right\rangle = \left( {\begin{array}{*{20}{c}}{1/\sqrt 2 }\\{ - 1/\sqrt 2 }\end{array}} \right)\;.\end{array} $$

Then the rows of the transformation matrix U are the conjugates of the coefficients of \(\left| x+ \right\rangle \) and \(\left| x- \right\rangle \) (since the coefficients are real, the conjugation is invisible):

$$ \begin{array}{l} c = \left\langle {{x + }} \mathrel{\left |{\vphantom {{x + } \chi }} \right. \kern-\nulldelimiterspace}{\chi } \right\rangle , d = \left\langle {{x - }} \mathrel{\left| {\vphantom {{x - } \chi }} \right. \kern-\nulldelimiterspace}{\chi } \right\rangle , {\rm{or}}\\ {\left({\begin{array}{*{20}{c}} c\\ d \end{array}} \right)_x} =\underbrace {\left[ {\begin{array}{*{20}{c}} {{{\left( {x +\;{\rm{bra}} \to } \right)}_z}}\\ {{{\left( {x - \;{\rm{bra}} \to} \right)}_z}} \end{array}} \right]}_{2 \times2\;{\rm{matrix}}}{\left( {\begin{array}{*{20}{c}} a\\ b\end{array}} \right)_z} = \left( {\begin{array}{*{20}{c}} {1/\sqrt2 }&{1/\sqrt 2 }\\ {1/\sqrt 2 }&{ - 1/\sqrt 2 } \end{array}}\right){\left( {\begin{array}{*{20}{c}} a\\ b \end{array}}\right)_z}\;.\end{array} $$

Also, we can just as well interpret the very same transformation matrix as:

$$ \left| \chi\right\rangle ={{\left( \begin{matrix} c\\ d\\\end{matrix} \right)}_{x}}=\underbrace{\left[ \begin{matrix} {{\left( \begin{matrix} z+\\ \downarrow \\\end{matrix} \right)}_{x}} & {{\left( \begin{matrix} z-\\ \downarrow \\\end{matrix} \right)}_{x}}\\\end{matrix} \right]}_{2\times 2\ \text{matrix}}\left( \begin{matrix} a\\ b\\\end{matrix} \right), $$

which tells us that the columns of the matrix are the z-basis vectors in the x-basis. Thus:

$$ \left| {z + } \right\rangle = \left( {1/\sqrt 2 }\right)\left| {x + } \right\rangle + \left( {1/\sqrt 2 }\right)\left| {x - } \right\rangle ,\;\;\; \left| {z - }\right\rangle = \left( {1/\sqrt 2 } \right)\left| {x + }\right\rangle - \left( {1/\sqrt 2 } \right)\left| {x - }\right\rangle . $$
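The same example runs numerically in a few lines of numpy; the final assertion anticipates the next subsection by checking that U is unitary (a, b are arbitrary values):

```python
import numpy as np

# Rows of U are <x+| and <x-| written in the z-basis; U maps z-basis
# components (a, b) to x-basis components (c, d).
U = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

a, b = 0.6, 0.8j
c, d = U @ np.array([a, b])

assert np.allclose(U.conj().T @ U, np.eye(2))                    # U is unitary
assert np.isclose(abs(a)**2 + abs(b)**2, abs(c)**2 + abs(d)**2)  # magnitude preserved
```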

4.3.2 The Transformation Matrix is Unitary

A basis-changing transformation matrix is often named U, because such a matrix is “unitary.” This important property will help us in the next section on transforming operator matrices. Recall that a unitary matrix is one which preserves the magnitude of all vectors. We prove that any basis transformation matrix is unitary using two different methods. We also show that every unitary matrix satisfies \(U^{\dagger} = U^{-1}\), which is an equivalent definition of “unitary.”

The magnitude of a vector is a property of the vector itself, and is independent of the basis in which we represent the vector. Consider again our vector \(\left| w \right\rangle \) in the two bases \(b_i\) and \(n_j\). Then

$$ {\left| {\left| w \right\rangle } \right|^2} = \left\langle {w}\mathrel{\left | {\vphantom {w w}}\right. \kern-\nulldelimiterspace}{w} \right\rangle = \left( {\begin{array}{*{20}{c}}{a_1^*}&{a_2^*}&{...}\end{array}} \right)\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right) = \left( {\begin{array}{*{20}{c}}{c_1^*}&{c_2^*}&{...}\end{array}} \right)\left( {\begin{array}{*{20}{c}}{{c_1}}\\{{c_2}}\\:\end{array}} \right). $$

But the components \(a_i\) and \(c_j\) are related by:

$$ \left| w \right\rangle = {\left( {\begin{array}{*{20}{c}}{{c_1}}\\{{c_2}}\\:\end{array}} \right)_{{\rm{new}}}} = U{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}} \Rightarrow \left\langle w \right| = {\left( {\begin{array}{*{20}{c}}{c_1^*}&{c_2^*}&{...}\end{array}} \right)_{{\rm{new}}}} = {\left( {\begin{array}{*{20}{c}}{a_1^*}&{a_2^*}&{...}\end{array}} \right)_{{\rm{old}}}}{U^\dag }. $$

Then the inner product becomes:

$$ \left\langle {w}\mathrel{\left | {\vphantom {w w}}\right. \kern-\nulldelimiterspace}{w} \right\rangle = \left( {\begin{array}{*{20}{c}}{c_1^*}&{c_2^*}&{...}\end{array}} \right)\left( {\begin{array}{*{20}{c}}{{c_1}}\\{{c_2}}\\:\end{array}} \right) = \left( {\begin{array}{*{20}{c}}{a_1^*}&{a_2^*}&{...}\end{array}} \right){U^\dag }U\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right). $$

Since this is true for every possible vector \(\left| w \right\rangle \), it must be that \(U^{\dagger}U = \mathbf{1}_{N\times N}\) (the N × N identity matrix); this holds for every unitary matrix U. This means:

$$ {{U}^{\dagger }}={{U}^{-1}}. $$

There is another way to see that the basis transformation matrix is unitary, using the descriptions of our transformation matrices from the previous section. The transformation matrix from the old b-basis to the new n-basis can be written [from Eq. (4.8)]:

$$ {{U}_{b\to n}}=\underbrace{\left( \begin{matrix} {{\left( \begin{matrix} {{\mathbf{b}}_{1}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}} & {{\left( \begin{matrix} {{\mathbf{b}}_{2}}\\ \downarrow \\\end{matrix} \right)}_{\text{new}}} & \ldots \\\end{matrix} \right)}_{N\times N\ \text{matrix}}. $$
(4.10)

The reverse transformation, from n to b, can be written [from the form of Eq. (4.7), but going from n to b]:

$$ {U_{n \to b}} = \underbrace {\left( {\begin{array}{*{20}{c}}{{{\left( {{{\bf{b}}_1}\;{\rm{bra}} \to } \right)}_{{\rm{new}}}}}&{...}\\{{{\left( {{{\bf{b}}_2}\;{\rm{bra}} \to } \right)}_{{\rm{new}}}}}&{...}\\:&{}\end{array}} \right)}_{N \times N\;{\rm{matrix}}}. $$
(4.11)

But \(U_{b \to n}\) and \(U_{n \to b}\) perform inverse transformations, and are therefore matrix inverses of each other. By inspection of the previous two equations, we see that they are adjoints (conjugate transposes) of each other. These statements are completely general, and true for any two orthonormal bases of any finite-dimensional vector space. Therefore, it is generally true that a basis transformation matrix is always unitary:

$$ {{U}^{\dagger }}={{U}^{-1}}, $$

as before.

A property of unitary matrices is that their rows (taken as vectors) are orthogonal to each other, and their columns are orthogonal to each other.

We can see this orthogonality from the preservation of inner products. Consider two basis vectors, \(i\ne j\). Their inner product is zero in both the old and the new bases:

$${{\left( {{\mathbf{b}}_{j}}\ bra\to \right)}_{old}}{{\left(\begin{matrix}{{\mathbf{b}}_{i}} \\\downarrow \\\end{matrix} \right)}_{old}}=0={{\left( \underbrace{{{\left( {{\mathbf{b}}_{j}}\ bra\to \right)}_{old}}{{U}^{\dagger }}}_{row\ j} \right)}_{new}}{{\left( \underbrace{U{{\left( \begin{matrix}{{\mathbf{b}}_{i}} \\ \downarrow \\\end{matrix}\right)}_{old}}}_{column\ i} \right)}_{new}}.$$

Since \(U^{\dagger}\) is the conjugate-transpose of U, row j of \(U^{\dagger}\) is the conjugate of column j of U, so the previous expression is the dot product of column j of U with column i. By the rules of matrix multiplication, it is also the ji element of the matrix product \(U^{\dagger}U = \mathbf{1}_{N\times N}\), which is zero since i ≠ j. Hence the columns of a unitary matrix are orthogonal. Furthermore, since \(UU^{\dagger} = \mathbf{1}_{N\times N}\), we can also say that the columns of \(U^{\dagger}\) are orthogonal. Since the columns of \(U^{\dagger}\) are the conjugates of the rows of U, the rows of U are also orthogonal.

A unitary matrix which is also real is called an orthogonal matrix. Rotation matrices for ordinary 3D spatial vectors are orthogonal.

4.3.3 Transforming Operators to a New Basis

Any operator can be defined in terms of its action on vectors. This action is conceptually independent of basis, though the components of both vectors and operator matrices clearly depend on our choice of basis. For a given operator Â, we write its matrix elements \(A_{ij}\) in the old basis as:

$$ {{[ {\hat{A}} ]}_{\text{old}}}={{\left[ \begin{matrix} {{A}_{11}} & {{A}_{12}} & {} & ...\\ {{A}_{21}} & {{A}_{22}} & {} & {}\\ {} & {} & \ddots& {}\\ : & {} & {} & {{A}_{NN}}\\\end{matrix} \right]}_{\text{old}}}. $$

Then, by definition, the elements of  in the new basis must satisfy:

$$ {{[ {\hat{A}} ]}_{\text{new}}}{{\left( \begin{matrix} {{c}_{1}}\\ {{c}_{2}}\\ :\\\end{matrix} \right)}_{\text{new}}}=U{{[ {\hat{A}} ]}_{\text{old}}}{{\left( \begin{matrix} {{a}_{1}}\\ {{a}_{2}}\\ :\\\end{matrix} \right)}_{\text{old}}}. $$

A simple way to find the matrix elements of  in the new basis is to have [Â]new first transform the components of \(\left| w \right\rangle \), which are given in the new basis, back to the old basis, then act on the vector in the old basis with [Â]old, and finally transform the result to the new basis again:

$$ {\left[ {\hat A} \right]_{{\rm{new}}}} = U{\left[ {\hat A}\right]_{{\rm{old}}}}{U^{ - 1}}\quad {\rm{(operator}}\;{\rm{matrix}}\;{\rm{transformation)}}{\rm{.}} $$

For orthonormal bases, \(U^{-1} = U^{\dagger}\), so this is sometimes written as \({{[ {\hat{A}} ]}_{\text{new}}}=U{{[ {\hat{A}} ]}_{\text{old}}}{{U}^{\dagger }}\).
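A numpy sketch of the operator transformation, reusing the spin z-to-x transformation matrix from the earlier example and taking \(S_z\) (in units of ħ) as the operator:

```python
import numpy as np

# [A]_new = U [A]_old U^dagger, with A = S_z and the z->x basis change.
U = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)
Sz_old = 0.5 * np.array([[1, 0],
                         [0, -1]])      # S_z in the z-basis (units of hbar)

Sz_new = U @ Sz_old @ U.conj().T        # S_z written in the x-basis

# In the x-basis, S_z is purely off-diagonal: it couples |x+> and |x->.
assert np.allclose(Sz_new, 0.5 * np.array([[0, 1],
                                           [1, 0]]))
```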

We now consider how a basis change looks for an inner product with an operator. Such inner products are basis-independent. For example, for a given operator Â, every inner product \(\left\langle v \right|\hat{A}\left| w \right\rangle\) must be the same in any basis. In our (old) b-basis, we can write the inner product:

$$ \left\langle v \right|\hat A\left| w \right\rangle = {\left( {\begin{array}{*{20}{c}}{{v_1}^*}&{{v_2}^*}&{...}\end{array}} \right)_{{\rm{old}}}}{\left[ {\hat A} \right]_{{\rm{old}}}}{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}}. $$
(4.12)

In our new basis \(n_j\), with transformation matrix U, we must have, for every possible \(\left| v \right\rangle \) and \(\left| w \right\rangle \):

$$ \left\langle v \right|\hat A\left| w \right\rangle = {\left( {\begin{array}{*{20}{c}}{{v_1}^*}&{{v_2}^*}&{...}\end{array}} \right)_{{\rm{old}}}}{U^\dag }\;{\left[ {\hat A} \right]_{{\rm{new}}}}\;U{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}}. $$

Now use \({{[ {\hat{A}} ]}_{\text{new}}}=U{{[ {\hat{A}} ]}_{\text{old}}}{{U}^{-1}},\ \ \text{and}\ \ {{U}^{\dagger }}U={{U}^{-1}}U={{\mathbf{1}}_{N\times N}}\):

$$ \left\langle v \right|\hat A\left| w \right\rangle = {\left( {\begin{array}{*{20}{c}}{{v_1}^*}&{{v_2}^*}&{...}\end{array}} \right)_{{\rm{old}}}}{U^\dag }U\,{\left[ {\hat A} \right]_{{\rm{old}}}}\,{U^{ - 1}}U{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}} = {\left( {\begin{array}{*{20}{c}}{{v_1}^*}&{{v_2}^*}&{...}\end{array}} \right)_{{\rm{old}}}}{\left[ {\hat A} \right]_{{\rm{old}}}}{\left( {\begin{array}{*{20}{c}}{{a_1}}\\{{a_2}}\\:\end{array}} \right)_{{\rm{old}}}}, $$

which is the same as before, Eq. (4.12). Thus, our transformations for kets, bras, and operator matrices satisfy the requirement that inner products are independent of basis.

4.4 Density Matrices

Density matrices and mixed states are important concepts required for many real-world situations. Experimentalists frequently require density matrices to describe their results. Density matrices are the quantum analog of the classical concept of ensembles of particles (or systems). Ensembles are used heavily in Statistical Mechanics.

Up until now, we have described systems of distinct “particles,” where the particles are in definite quantum states. Even in a definite state, though, the measurable (i.e., observable) properties may be statistical (and thus not definite). This latter property, quite different from classical mechanics, gives rise to a striking new quantum result for the classical concept of ensembles. An ensemble is just a bunch of identical particles (or systems), possibly each in a different state, but we know the statistics of the distribution of states. For example, a particle drawn from a thermal bath is in an unknown quantum state, due to the randomness of thermal systems. However, if we draw many such particles (an ensemble) from the bath, we can predict the statistics of their properties from the bath temperature. While a known quantum state of a particle may be given by a ket, the state of an ensemble, or of a single particle drawn from it, is given by a density matrix.

We consider here ensembles only for finite-dimensional (and therefore discrete) quantum systems, though the concept extends to more general (continuous) systems. We use some examples from the QM of angular momentum, which is a topic discussed later in this book.

Instead of having a single particle in a definite state, suppose we have an ensemble of particles. If all the particles in the ensemble are in identical quantum states, then we have nothing new. All our QM so far applies to every particle, and extends to the ensemble as a whole. But suppose the ensemble is a mixture of particles in several different quantum states. What then? Can we compute average values of measurable quantities? If we know the fractions of all the constituent states in the ensemble, then of course we can compute the average value of any observable, and we do it in the straightforward, classical way. We will see that this idea of a classical mixture of quantum particles leads to a “density matrix”: a way of defining all the properties of such a mixture. However, we will also see that quantum ensembles have a highly nonclassical nature.

The density matrix is essentially the quantum state of an ensemble.

For example, suppose we have an ensemble of electrons, 3/4 are spin \(\left| z+ \right\rangle \), and 1/4 are spin \(\left| x+ \right\rangle \). If we measure spin in the z-direction of many particles from the ensemble, we will get an average which is the simple weighted average of the two states:

For \(\left| z+ \right\rangle \):

$$ {{\left\langle {{s}_{z}} \right\rangle }_{\left| z+ \right\rangle }}=\frac{\hbar }{2}, $$

for \(\left| x+ \right\rangle \):

$$ {{\left\langle {{s}_{z}} \right\rangle }_{\left| x+ \right\rangle }}=0. $$

Then:

$$ {{[ {{s}_{z}} ]}_{\text{ensemble}}}=\frac{3}{4}{{\left\langle {{s}_{z}} \right\rangle }_{\left| z+ \right\rangle }}+\frac{1}{4}{{\left\langle {{s}_{z}} \right\rangle }_{\left| x+ \right\rangle }}=\frac{3}{4}\cdot \frac{\hbar }{2}. $$

Following [16], we use square brackets [B] to explicitly distinguish the ensemble average of an observable \(\hat{B}\) from a “pure state average” \(\left\langle B \right\rangle \equiv \left\langle\psi\right|\hat{B}\left| \psi\right\rangle \), the average of an observable for a particle in a known quantum state (which may be a superposition). This average [B] is a number, distinct from the matrix for the operator \(\hat{B}\equiv [ {\hat{B}} ]\). Expanding the previous average values into bra-operator-ket inner products, we get:

$$ {\left\langle {{s_z}} \right\rangle _{\left| {z + } \right\rangle }} = \left\langle {z + } \right|{\hat s_z}\left| {z + } \right\rangle = \frac{\hbar }{2}, $$
$$ {\left\langle {{s_z}} \right\rangle _{\left| {x + } \right\rangle }} = \left\langle {x + } \right|{\hat s_z}\left| {x + } \right\rangle = 0. $$

Then:

$$ {\left[ {{s_z}} \right]_{{\rm{ensemble}}}} = \frac{3}{4}\left\langle {z + } \right|{\hat s_z}\left| {z + } \right\rangle + \frac{1}{4}\left\langle {x + } \right|{\hat s_z}\left| {x + } \right\rangle = \frac{3}{4} \cdot \frac{\hbar }{2}\;. $$

So far, this is very simple.

Now let us consider a more general case: an ensemble consists of a mix of an arbitrary number of quantum states, each with an arbitrary fraction of occurrence (i.e., probability). Note that even in finite-dimensional systems, there are an infinite number of quantum states, because the N basis vectors can be combined with complex coefficients in infinitely many ways. Therefore, the number of states in the mixture is unrelated to the vector space dimension, N. Say we have a mix of M states, \(\left| {{\psi }^{(k)}} \right\rangle \), k = 1,…,M, each with a fraction of occurrence in the ensemble (or weight) \(w_k\). As in the spin example, we can simply compute the average value of many measurements of particles from the ensemble by taking a weighted average of the quantum averages [16, 3.4.6, p. 177]:

$$ [ B ]=\sum\limits_{k=1}^{M}{{{w}_{k}}\left\langle{{\psi }^{(k)}} \right|\hat{B}\left| {{\psi }^{(k)}} \right\rangle }\quad \text{where}\quad {{w}_{k}}\text{ are real, and }\sum\limits_{k=1}^{M}{{{w}_{k}}}=1. $$
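A numpy sketch of this weighted average, using the earlier mixture of 3/4 \(\left| z+ \right\rangle \) and 1/4 \(\left| x+ \right\rangle \) (spin in units of ħ):

```python
import numpy as np

# [B] = sum_k w_k <psi_k|B|psi_k> for the 3/4 |z+>, 1/4 |x+> mixture.
sz = 0.5 * np.array([[1, 0],
                     [0, -1]])
z_plus = np.array([1, 0])
x_plus = np.array([1, 1]) / np.sqrt(2)

weights = [0.75, 0.25]
states = [z_plus, x_plus]
avg = sum(w * (s.conj() @ sz @ s) for w, s in zip(weights, states))

assert np.isclose(avg, 0.75 * 0.5)      # [s_z] = (3/4)(hbar/2)
```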

A mixed state is quite different from a superposition. For one thing, a mixed state has no phase information relating the constituent states \(\left| {{\psi }^{(k)}} \right\rangle \): the \(w_k\) are real.

Everything we have done so far is independent of basis: the \(\left| {{\psi }^{(k)}} \right\rangle \) are arbitrary states, and will be superpositions in some bases, but not others. We use the term constituent to mean one of the states, \(\left| {{\psi }^{(k)}} \right\rangle \), of the mixture. This is distinct from “component,” which refers to the complex coefficient of a basis function in a superposition. The constituents in a mixture are quantum states, independent of basis.

An ensemble with only one constituent (M = 1) is called a pure state : each particle is in a definite quantum state.

4.4.1 Development of the Density Matrix

We now use the outer-product method of computing the average value of an operator on a definite state (Fig. 4.2): we overlay each matrix \( {{\left[ \left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right| \right]}^{T}}_{ij}=a_{i}^{(k)*}a_{j}^{(k)} \) with \(B_{ij}\), multiply the overlapping element pairs, and add the resulting products, as in Eq. (4.2):

$$ [B]=\sum\limits_{k=1}^{M}{{{w}_{k}}\left( \sum\limits_{i=1}^{N}{\sum\limits_{j=1}^{N}{a_{i}^{(k)*}a_{j}^{(k)}{{B}_{ij}}}} \right)}, \text{where }\left| {{\psi }^{(k)}} \right\rangle =\left( \begin{matrix} a_{1}^{(k)}\\ a_{2}^{(k)}\\ :\\ a_{N}^{(k)}\\\end{matrix} \right).$$

The \(a_{i}^{(k)}\ \text{and}\ a_{j}^{(k)}\) are the components of each constituent state \(\left| {{\psi }^{(k)}} \right\rangle \), in any chosen basis [16, 3.4.7.8 p. 177]. This essentially computes \(\left\langle{{\psi }^{(k)}} \right|\hat{B}\left| {{\psi }^{(k)}} \right\rangle \) for each k, and takes the weighted sum.

Now instead, we can move the summation over k inside the other two sums, i.e., we compute the weighted average of the \(\left| {{\psi }^{(k)}} \right\rangle \left\langle{{\psi }^{(k)}} \right|\) matrices first, and then sum that against the elements of the operator matrix \(B_{ij}\):

$$ \left[ B \right] = \sum\limits_{k = 1}^M {{w_k}\left( {\sum\limits_{i = 1}^N {} \sum\limits_{j = 1}^N {a_i^{(k)*}a_j^{(k)}{B_{ij}}} } \right)} = \sum\limits_{i = 1}^N {} \sum\limits_{j = 1}^N {{B_{ij}}\left( {\sum\limits_{k = 1}^M {{w_k}a_i^{(k)*}a_j^{(k)}} } \right)} . $$

The advantage is that we have separated the definition of the ensemble from the operator \(\hat{B}\): the last factor in parentheses depends only on the ensemble, and not on the operator in question. This factor is an N × N matrix whose transpose is so useful that we give it its own name, the density matrix [16, 3.4.8, p. 177]:

$$ {\bf{\rho }} \equiv {\rho _{ ij}} \equiv \sum\limits_{k =1}^M {{w_k}a_j^{(k)*}a_i^{(k)}} = \sum\limits_{k = 1}^M{{w_k}\left| {{\psi ^{(k)}}} \right\rangle \left\langle {{\psi^{(k)}}} \right|} ,\;\;i,j = 1,\; \ldots ,\;N. $$

Then [16, 3.4.7, p. 177]:

$$ [ B ]=\sum\limits_{i=1}^{N}{\sum\limits_{j=1}^{N}{{{B}_{ij}}{{[ {{\mathbf{\rho }}^{T}} ]}_{ij}}}}=\sum\limits_{i=1}^{N}{{{[ \mathbf{B\rho } ]}_{ii}}}=\operatorname{Tr}( \mathbf{B\rho } )\ . $$
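A numpy sketch building ρ for the same 3/4 \(\left| z+ \right\rangle \), 1/4 \(\left| x+ \right\rangle \) mixture and recovering the ensemble average as a trace:

```python
import numpy as np

# rho = sum_k w_k |psi_k><psi_k|, and [B] = Tr(B rho).
sz = 0.5 * np.array([[1, 0],
                     [0, -1]])
z_plus = np.array([[1], [0]])
x_plus = np.array([[1], [1]]) / np.sqrt(2)

rho = 0.75 * (z_plus @ z_plus.conj().T) + 0.25 * (x_plus @ x_plus.conj().T)

assert np.isclose(np.trace(rho), 1)             # trace of rho is 1
assert np.isclose(np.trace(sz @ rho), 0.375)    # [s_z] = (3/4)(hbar/2) = 3hbar/8
```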

Again, the matrix ρ is independent of the operator \(\hat{B}\), but just like quantum operators, the elements of ρ depend on the basis in which we choose to write it. Off-diagonal elements in ρ indicate one or more superpositions (in our chosen basis) in the constituent states \(\left| {{\psi }^{(k)}} \right\rangle \). Note, though, that a zero off-diagonal element \(\rho_{ij}\) does not mean there is no superposition of \( \left| {{\phi }_{i}} \right\rangle\) and \( \left| {{\phi }_{j}} \right\rangle\) in the mixture, because it is possible that two constituents of the mixture have off-diagonal elements that cancel. ρ is complex, even though the \(w_k\) are real, since the constituent states \(\left| {{\psi }^{(k)}} \right\rangle \) can have components with complex coefficients. Furthermore, \(\left| \psi\right\rangle \left\langle\psi\right|\) is Hermitian (Fig. 4.2 and related equations), and therefore ρ is Hermitian.

We need to compute ρ only once, from the weights and states of the ensemble , and we can then use it for any operator. This independence of density matrix from observable operator leads to a new quantum concept, which does not exist in classical mechanics: any two ensembles which have the same density matrix must be considered physically identical, regardless of how they were prepared. This identity is because every physical measurement (every operator acting on the ensemble) has the same statistics for both ensembles.

If no physical measurement in the universe can distinguish two things (ensembles), we must consider them physically identical.

For example, consider an ensemble of spin-1/2 particles, 50% in \(\left| z+ \right\rangle \) and 50% in \(\left| z- \right\rangle \). The density matrix is

$$ \left| {z + }\right\rangle = \left[ {\begin{array}{*{20}{c}} 1\\ 0\end{array}} \right], \left| {z - } \right\rangle = \left[{\begin{array}{*{20}{c}} 0\\ 1 \end{array}} \right] \Rightarrow{ \bf{\rho }} = 50\% \left( {\left[ {\begin{array}{*{20}{c}}1&0\\ 0&0 \end {array}} \right]} \right) + 50\% \left( {\left[{\begin{array}{*{20}{c}} 0&0\\ 0&1 \end{array}} \right]} \right) =\left[ {\begin{array}{*{20}{c}} {0.5}&0\\ 0&{0.5}\end{array}} \right]. $$

The ensemble average spin in the z-direction is \([s_z] = 0\). In fact, the average spin in any direction \(\mathbf{n}\) is \([s_{\mathbf{n}}] = 0\).

Now consider a second ensemble, 50% in \(\left| x+ \right\rangle \), and 50% in \(\left| x- \right\rangle \). We write the states in the z-basis as:

$$\begin{array}{c}\left| {x + }\right\rangle= \left[{\begin{array}{*{20}{c}}{1/\sqrt 2 }\\{1/\sqrt 2 }\end{array}}\right],\;\;\;\left| {x -}\right\rangle= \left[{\begin{array}{*{20}{c}}{1/\sqrt 2 }\\{ - 1/\sqrt 2} \end{array}}\right]\;\;\;\;\Rightarrow\\ {\bf{\rho}} = 50\%\left( {\left[{\begin{array}{*{20}{c}}{0.5}&{0.5}\\{0.5}&{0.5}\end{array}}\right]}\right) + 50\%\left( {\left[{\begin{array}{*{20}{c}}{0.5}&{ -0.5}\\ { -0.5}&{0.5}\end{array}}\right]}\right) =\left[{\begin{array}{*{20}{c}}{0.25}&{0.25}\\{0.25}&{0.25}\end{array}}\right] +\left[{\begin{array}{*{20}{c}}{0.25}&{ -0.25}\\ { -0.25}&{0.25}\end{array}}\right] =\left[{\begin{array}{*{20}{c}}{0.5}&0\\0&{0.5}\end{array}}\right]\;.\end{array} $$

You might think this second ensemble is different from the first. It was formed very differently, from different states. However, the average spin in any direction is again 0. Furthermore, since the density matrix is identical to the first ensemble, the average of any operator is identical to that of the first ensemble. No measurement in the universe can distinguish the two ensembles; therefore as scientists, we must admit that they are physically the same!
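The indistinguishability is easy to verify numerically: the two mixtures produce the same density matrix.

```python
import numpy as np

# 50/50 |z+>,|z-> versus 50/50 |x+>,|x-> give identical density matrices.
z_plus = np.array([[1], [0]])
z_minus = np.array([[0], [1]])
x_plus = np.array([[1], [1]]) / np.sqrt(2)
x_minus = np.array([[1], [-1]]) / np.sqrt(2)

rho_z = 0.5 * (z_plus @ z_plus.conj().T) + 0.5 * (z_minus @ z_minus.conj().T)
rho_x = 0.5 * (x_plus @ x_plus.conj().T) + 0.5 * (x_minus @ x_minus.conj().T)

assert np.allclose(rho_z, rho_x)        # both equal half the identity matrix
```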

Even though the constituent states are quite distinct, the resulting ensembles are the same. (Note that we have not yet shown that all statistics are the same, but we do so in the next section.)

Classically, this collapsing of ensembles does not occur, because classically, we can make as many measurements as we want on each particle of the ensemble, and each without disturbing its state. For example, we could measure both the x and z components of angular momentum , and distinguish between particles with angular momentum in the x or z direction. This would allow us to distinguish the two previous ensembles. But in the real world of QM, we can only measure one component of a particle’s state, and in the process of that measurement, destroy other information about its state. For example, if we measure a particle’s spin in the z-direction to be + ħ/2, we cannot tell whether the particle was actually a \(\left| z+ \right\rangle \) particle, or whether it was an \(\left| x+ \right\rangle \) or \(\left| x- \right\rangle \) particle which happened to measure + ħ/2 in the z-direction. Hence, we cannot distinguish between the two ensembles described previously. And furthermore, no other measurement can distinguish the ensembles, either.

4.4.2 Further Properties of Density Matrices

The phases of the constituent \(\left| {{\psi }^{(k)}} \right\rangle \) are arbitrary: Adding an arbitrary phase to a constituent state \(\left| {{\psi }^{(k)}} \right\rangle \) has no effect on the density matrix, because any phase cancels in the outer product:

$$ {\left[ {\;\left| {{\psi ^{(k)}}} \right\rangle \left\langle {{\psi ^{(k)}}} \right|\;} \right]_{\;ij}} = a_j^{(k)*}a_i^{(k)}. $$
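A quick numerical illustration of this cancellation (a sketch; the state and phase below are arbitrary choices of ours):

    import numpy as np

    psi = np.array([0.6, 0.8j], dtype=complex)    # any normalized state
    phase = np.exp(1j * 1.234)                    # arbitrary overall phase
    outer_1 = np.outer(psi, psi.conj())           # |psi><psi|
    outer_2 = np.outer(phase * psi, (phase * psi).conj())
    print(np.allclose(outer_1, outer_2))          # True: the phase cancels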

Diagonal elements are probabilities: In a basis \(\left| {{\phi }_{i}} \right\rangle \), the diagonal elements \({{\rho }_{dd}}\), d = 1, …, N, are the probabilities of measuring the basis state \(\left| {{\phi }_{d}} \right\rangle \) for a particle in the mixed state ρ. We see this by combining probabilities for the constituent states:

$$ {{\rho }_{dd}}=\sum\limits_{k=1}^{M}{{{w}_{k}}a_{d}^{(k)*}a_{d}^{(k)}}=\sum\limits_{k=1}^{M}{{{w}_{k}}\Pr \left( \left| {{\phi }_{d}} \right\rangle \text{ in }\left| {{\psi }^{(k)}} \right\rangle \right)}=\Pr \left( \left| {{\phi }_{d}} \right\rangle \right). $$

Since the sum of the probabilities is 1, we see that the sum of the diagonal elements (i.e., the trace) of ρ is 1. Put another way, consider a state \(\left| \psi \right\rangle \) that exists in a 3D vector space:

$$\begin{array}{ll} \left| \psi \right\rangle =\left[ \begin{matrix} {{a}_{1}} \\ {{a}_{2}} \\ {{a}_{3}} \\ \end{matrix} \right],\quad \quad & \left| \psi \right\rangle \otimes \left\langle \psi \right|\equiv \left| \psi \right\rangle \left\langle \psi \right|=\left[ \begin{matrix} {{a}_{1}} \\ {{a}_{2}} \\ {{a}_{3}} \\ \end{matrix} \right]\left[ \begin{matrix} {{a}_{1}}^{*} & {{a}_{2}}^{*} & {{a}_{3}}^{*} \\ \end{matrix} \right]\\&=\left[ \begin{matrix} {{a}_{1}}{{a}_{1}}^{*} & {{a}_{1}}{{a}_{2}}^{*} & {{a}_{1}}{{a}_{3}}^{*} \\ {{a}_{2}}{{a}_{1}}^{*} & {{a}_{2}}{{a}_{2}}^{*} & {{a}_{2}}{{a}_{3}}^{*} \\ {{a}_{3}}{{a}_{1}}^{*} & {{a}_{3}}{{a}_{2}}^{*} & {{a}_{3}}{{a}_{3}}^{*} \\ \end{matrix} \right]\end{array}$$
(4.13)

The diagonal elements of \(\left| \psi\right\rangle \left\langle\psi\right|\) are just the terms in the dot product \(\left\langle\psi\right|\left. \psi\right\rangle ={{a}_{1}}{{a}_{1}}^{*}+{{a}_{2}}{{a}_{2}}^{*}+{{a}_{3}}{{a}_{3}}^{*}=1\). The trace of a matrix is the sum of the diagonal elements, so we have:

$$ {\mathop{\rm Tr}\nolimits} \left( {\left| \psi \right\rangle \left\langle \psi \right|} \right) = 1\quad{\rm{for}}\;{\rm{a}}\;{\rm{normalized}}\;{\rm{state}}\;\left| \psi \right\rangle . $$

Tr() is a linear operator on matrices. The density matrix ρ is a weighted sum of matrices, where each matrix has a trace of 1, and the sum of the weights equals 1. Thus:

$$\operatorname{Tr}\left( \mathbf{\rho } \right)=\operatorname{Tr}\left(\sum\limits_{k=1}^{M}{{{w}_{k}}\left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right|} \right)=\sum\limits_{k=1}^{M}{{{w}_{k}}}\underbrace{\operatorname{Tr}\left(\left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right| \right)}_{1}=\sum\limits_{k=1}^{M}{{{w}_{k}}}=1.$$
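Both properties, diagonal probabilities and unit trace, are easy to verify numerically. A minimal sketch, with two constituent states chosen arbitrarily by us:

    import numpy as np

    # Mixed state: 70% in |psi1>, 30% in |psi2> (normalized 3D states).
    psi1 = np.array([1, 1, 1], dtype=complex) / np.sqrt(3)
    psi2 = np.array([1, 1j, 0], dtype=complex) / np.sqrt(2)
    rho = 0.7 * np.outer(psi1, psi1.conj()) + 0.3 * np.outer(psi2, psi2.conj())

    print(rho.diagonal().real)                 # probabilities of the 3 basis states
    print(np.isclose(np.trace(rho).real, 1))   # True: Tr(rho) = 1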

Sometimes \({{\rho }_{dd}}\) is written as \( \left\langle {{\phi }_{d}} \right|\mathbf{\rho }\left| {{\phi }_{d}} \right\rangle \), but that notation can be misleading, because it does not make physical sense for a density matrix to act on a ket \( \left| {{\phi }_{d}} \right\rangle \). As noted earlier, ρ is an ensemble state, and by extension, ρ is a particle state (a mixed state). It has no business acting on another kind of particle state (a pure state).

A different kind of operator: The density matrix ρ is a Hermitian matrix expressed in a particular basis. You can change ρ to a different basis using the same similarity transform as for an operator matrix:

$$ \rho ' = U\rho {U^{ - 1}}, $$

where \(U\equiv \) the unitary transformation matrix.

However, ρ is not a quantum operator, in the usual sense. Some books use the unfortunate term “density operator,” but just because a matrix is expressed in a basis, that does not make it an ordinary operator. In fact, the density matrix is a kind of quantum state. It is used with ordinary quantum operators, but in a different way than kets are. Recall that a quantum operator acts on a ket to produce another ket; density matrices do not usually do that. [We can consider a density matrix to be a rank-2 tensor acting on another rank-2 tensor (a quantum operator) to produce a scalar.]

Basis independent density matrix: Recalling the definition of outer product from Eq. (4.3), we have seen that we can write the density matrix in a basis-independent way, as a sum of outer products of the constituent states with themselves:

$$ {\bf{\rho }} \equiv \sum\limits_{k = 1}^M {{w_k}\left| {{\psi^{(k)}}} \right\rangle \left\langle {{\psi ^{(k)}}} \right|}\quad{\rm{(basis}}\;{\rm{independent)}}{\rm{.}} $$

As with vectors, if we want to write ρ as a set of numbers, we must choose a basis.

For an ensemble in a pure state, where every particle has the exact same quantum state \(\left| \psi \right\rangle \), M = 1, and the density matrix is:

$$ {\bf{\rho }} = \left| \psi \right\rangle \left\langle \psi \right|\quad {\rm{(pure}}\;{\rm{state)}}{\rm{.}} $$

Average from trace: We saw in Eq. (4.5) that the average in a pure quantum state \(\left| \psi \right\rangle \) can be computed from \([ B ]=\operatorname{Tr}( \mathbf{B}\left| \psi \right\rangle \left\langle \psi \right| )\). The average of a mixed state is just the weighted average of the pure averages:

$$\begin{array}{ll} \left[ B \right] &= \sum\limits_{k = 1}^M {{w_k}{\mathop{\rm Tr}\nolimits} \left( {{\bf{B}}\left[ {\left| {{\psi ^{(k)}}} \right\rangle \left\langle {{\psi ^{(k)}}} \right|} \right]} \right)} = {\mathop{\rm Tr}\nolimits} \left( {\sum\limits_{k = 1}^M {{w_k}{\bf{B}}\left[ {\left| {{\psi ^{(k)}}} \right\rangle \left\langle {{\psi ^{(k)}}} \right|} \right]} } \right)\\& = {\mathop{\rm Tr}\nolimits} \left( {{\bf{B}}\sum\limits_{k = 1}^M {{w_k}\left[ {\left| {{\psi ^{(k)}}} \right\rangle \left\langle {{\psi ^{(k)}}} \right|} \right]} } \right) = {\mathop{\rm Tr}\nolimits} \left( {{\bf{B\rho }}} \right) = {\mathop{\rm Tr}\nolimits} \left( {{\bf{\rho B}}} \right)\;. \end{array}$$

Note that the trace of a matrix is unchanged by a basis change (unitary transformation), so \([ B ]=\operatorname{Tr}( \mathbf{\rho B} )\) in any basis, and we can choose to evaluate it in any convenient basis.
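A numerical sketch of this basis independence; the mixed state and the random unitary below are illustrative choices of ours, with hbar = 1:

    import numpy as np

    rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # Hermitian, Tr = 1
    Sz  = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)   # s_z, with hbar = 1

    avg = np.trace(rho @ Sz).real                            # [s_z] = Tr(rho B)

    # Change basis with a random unitary U; the average is unchanged.
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
    rho_p = U @ rho @ U.conj().T
    Sz_p  = U @ Sz  @ U.conj().T
    print(np.isclose(np.trace(rho_p @ Sz_p).real, avg))      # True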

Trace of pure and mixed states: ρ is a Hermitian matrix, and therefore can be diagonalized. Then there exists a basis in which:

$$ {\bf{\rho }} = \left[ {\begin{array}{*{20}{c}} {{\rho _{11}}}&0&0\\ 0&{{\rho _{22}}}&0\\ 0&0& \ddots \end{array}} \right],\quad {\mathop{\rm Tr}\nolimits} \left( {\bf{\rho }} \right) \equiv \sum\limits_{d = 1}^N {{\rho _{dd}}} = 1,\quad {\rm{and}}\quad {\rho _{dd}} \le 1,\;\;d = 1,\; \ldots ,N. $$

In a pure state, there is only one constituent quantum state \( \left| \psi \right\rangle \), so M = 1 and \({{w}_{1}}=1\). We can choose a diagonal basis of ρ in which \( \left| \psi \right\rangle \) is one of the basis vectors; then ρ has the simple form:

$$ \mathbf{\rho }=\left[ \begin{matrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & \ddots \\\end{matrix} \right]. $$

In this case of a pure state, and only in this case, \({{\bf{\rho }}^{2}}=\mathbf{\rho }\) and \(\operatorname{Tr}( {{\bf{\rho }}^{2}} )=1\). Therefore, since these equations are independent of basis, we have in any basis:

$$ {{\bf{\rho }}^2} = {\bf{\rho }},\quad {\rm{and}}\quad {\mathop{\rm Tr}\nolimits} \left( {{{\bf{\rho }}^2}} \right) = 1\quad{\rm{(pure}}\;{\rm{state,}}\;{\rm{any}}\;{\rm{basis)}}{\rm{.}} $$

In a mixed state, in a basis where ρ is diagonal, all its diagonal elements satisfy \({{\rho }_{dd}}<1\). Then \({{\bf{\rho }}^{2}}=\operatorname{diag}( \rho _{11}^{2},\rho _{22}^{2},\ldots )\) has the squares of all the diagonal elements of ρ, and thus each nonzero diagonal element of \({{\bf{\rho }}^{2}}\) is strictly less than the corresponding element of ρ. This implies that, since Tr(ρ) = 1, \(\operatorname{Tr}( {{\bf{\rho }}^{2}} )<1\). Since Tr() is basis-independent, all of these statements are true in any basis. Thus \(\operatorname{Tr}( {{\bf{\rho }}^{2}} )\), in any basis, is a test of whether an ensemble is pure or mixed:

$$ {\mathop{\rm Tr}\nolimits} \left( {{{\bf{\rho }}^2}} \right) <1\quad{\rm{(mixed}}\;{\rm{state,}}\;{\rm{any}}\;{\rm{basis)}}{\rm{.}} $$
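This test is a one-liner numerically; a minimal sketch:

    import numpy as np

    def purity(rho):
        # Tr(rho^2): exactly 1 for a pure state, strictly < 1 for a mixed state.
        return np.trace(rho @ rho).real

    pure  = np.array([[1, 0], [0, 0]], dtype=complex)      # |z+><z+|
    mixed = np.array([[0.5, 0], [0, 0.5]], dtype=complex)  # 50-50 ensemble

    print(purity(pure))   # 1.0
    print(purity(mixed))  # 0.5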

Statistics beyond averages: Recall that QM predicts the possible outcomes of an experiment (the “spectrum” of results), and their probabilities of occurrence. What are the spectra and probabilities of mixed ensembles? Any observable \(\hat{B}\) has a PDF (probability distribution function) of its possible values. The moments of the PDF (averages of powers of the observable) are computed from \(\hat{B}\) and ρ:

$$ [ {{{\hat{B}}}^{n}} ]=\sum\limits_{i=1}^{N}{\sum\limits_{j=1}^{N}{{{\rho }_{ij}}{{[ {{\mathbf{B}}^{n}} ]}_{ji}}}}, $$

where \(\mathbf{B}\equiv \) operator matrix of \(\hat{B}\) in our basis.
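For example, the first two moments of s_x in a sample mixed state (a sketch; the weights and hbar = 1 are our illustrative choices):

    import numpy as np

    rho = np.array([[0.75, 0], [0, 0.25]], dtype=complex)  # 75% |z+>, 25% |z->
    Sx  = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)  # s_x, with hbar = 1

    def moment(rho, B, n):
        # n-th moment: [B^n] = Tr(rho B^n) = sum_ij rho_ij (B^n)_ji.
        return np.trace(rho @ np.linalg.matrix_power(B, n)).real

    print(moment(rho, Sx, 1))  # 0.0:  the mean of s_x vanishes
    print(moment(rho, Sx, 2))  # 0.25: the mean square is (hbar/2)^2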

It is well known that the moments of a bounded random variable fully define its PDF. Therefore, since ρ fully defines all the moments, it also fully defines the PDF of measurements for any operator. Furthermore, any two ensembles with the same density matrix, regardless of how they were prepared or whether their constituent states are the same, are physically indistinguishable by any measurement or statistics of measurements. Ultimately, then, we must conclude that the two ensembles are physically the same.

Time evolution of the density matrix: The state of an ensemble may change with time (e.g., it may be approaching equilibrium). Then its density matrix is a function of time, because its constituent states are evolving in time. We find the equation of motion (EOM) for ρ from the EOMs for its constituent kets and bras. From the Schrödinger equation:

$$\begin{array}{@{}l@{}}i\hbar \dfrac{\partial}{{\partial {t}}}| {\psi \rangle} ={\hat H}| {\psi \rangle},\;\;\; {\rm{and}} - i{\hbar}\dfrac{\partial}{\partial {t}}{\langle \psi} | ={\langle {\psi}} |{\hat H} \Rightarrow\\ \dfrac{\partial}{\partial {t}}\left({| \psi\rangle\langle\psi |}\right) =\left({\dfrac{\partial}{{\partial{t}}}\left| \psi\right\rangle}\right)\left\langle\psi \right|+ \left| \psi\right\rangle\dfrac{\partial}{{\partial{t}}}\left\langle\psi \right|=\dfrac{1}{{i\hbar}}\left[ {{\hat{H}}\left| \psi\right\rangle\left\langle\psi \right| -\left| \psi\right\rangle\left\langle\psi\right|\hat {H}}\right].\end{array}$$

Note that \( \left( {\hat H\left| \psi \right\rangle } \right)\left\langle \psi \right| = \hat H\left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right] \) is just the matrix product of the matrix Ĥ with the matrix \(\left| \psi\right\rangle \left\langle\psi\right|\), and also \(\left| \psi \right\rangle \left( {\left\langle \psi \right|\hat H} \right) = \left[ {\;\left| \psi \right\rangle \left\langle \psi \right|\;} \right]\hat H\) is a matrix product. We can see this by writing the matrix \(\left| \psi\right\rangle \left\langle\psi\right|\) explicitly, and using linearity of Ĥ:

$$\begin{array}{@{}l@{}} {\rm{Let}}\;\;\left| \psi \right\rangle = \left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right].\;\;{\rm{Then:}}\;\; \left| \psi \right\rangle \left\langle \psi \right| = \left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {a_1^*}&{a_2^*}&{a_3^*} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {a_1^*\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]}&{a_2^*\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]}&{a_3^*\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]} \end{array}} \right]\\[8pt] \Rightarrow \left( {\hat H\left| \psi \right\rangle } \right)\left\langle \psi \right| = \left[ {\begin{array}{*{20}{c}} {a_1^*\hat H\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]}&{a_2^*\hat H\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]}&{a_3^*\hat H\left[ {\begin{array}{*{20}{c}} {{a_1}}\\ {{a_2}}\\ {{a_3}} \end{array}} \right]} \end{array}} \right] = \hat H\left( {\left| \psi \right\rangle \left\langle \psi \right|} \right).\end{array}$$

Then, using linearity over the constituents of ρ:

$$\begin{array}{ll} & \boldsymbol{\rho }\equiv \sum\limits_{k=1}^{M}{{{w}_{k}}\left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right|}\quad \Rightarrow \\[8pt] & \frac{\partial }{\partial t}\boldsymbol{\rho }=\sum\limits_{k=1}^{M}{{{w}_{k}}\frac{1}{i\hbar }\left( \hat{H}\left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right|-\left| {{\psi }^{(k)}} \right\rangle \left\langle {{\psi }^{(k)}} \right|\hat{H} \right)}=\frac{1}{i\hbar }\left[ \hat{H}\boldsymbol{\rho }-\boldsymbol{\rho }\hat{H} \right]=\frac{1}{i\hbar }\left[ \hat{H},\boldsymbol{\rho } \right].\end{array}$$

Note that ∂ρ/∂t is the time derivative of a matrix, which is itself a matrix. The previous form is reminiscent of the time evolution of the average value of an operator, but for the density matrix ρ, the Hamiltonian comes first in the commutator, and there is no inner product (since ρ takes the place of the quantum state). In contrast, for the time evolution of the average of an operator, the Hamiltonian comes second, and we must take an inner product with the quantum state: \({d\langle \hat{A}\rangle }/{dt}\;={\left\langle \psi \right|[ \hat{A},\,\hat{H} ]\left| \psi \right\rangle }/{i\hbar }\;\).

Note also that the time evolution depends only on ρ, and not at all on its constituent states, once again confirming that two ensembles with the same density matrix are physically identical, regardless of how they were constructed or whether their constituent quantum states are the same.
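We can verify this EOM numerically. The sketch below evolves ρ with the exact propagator U(t) = exp(-iHt/ħ) (elementary here because H is diagonal), then compares a finite-difference time derivative against the commutator; the Hamiltonian and initial state are illustrative choices of ours:

    import numpy as np

    hbar, omega = 1.0, 2.0
    H = (hbar * omega / 2) * np.array([[1, 0], [0, -1]], dtype=complex)

    def U(t):
        # Propagator exp(-iHt/hbar); exact because H is diagonal.
        return np.diag(np.exp(-1j * np.diagonal(H) * t / hbar))

    rho0 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # pure |x+><x+|
    t, dt = 0.3, 1e-6
    rho_t = U(t) @ rho0 @ U(t).conj().T

    lhs = (U(t + dt) @ rho0 @ U(t + dt).conj().T - rho_t) / dt  # d(rho)/dt
    rhs = (H @ rho_t - rho_t @ H) / (1j * hbar)                 # [H, rho]/(i hbar)
    print(np.allclose(lhs, rhs, atol=1e-5))                     # True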

Note that in some realistic systems, such as thermal ensembles, collisions and other multibody interactions make it essentially impossible to compute the long-term time evolution exactly. One generally turns, then, to statistical methods.

Continuous density functions: The concept of a density matrix extends to the continuum case, where the density matrices are replaced by density functions [16, p. 182]. We do not address that case.

4.4.3 Density Matrix Examples

We first consider two pure spin-1/2 states. In the z-basis:

$$ \left| {z + } \right\rangle = \left( {\begin{array}{*{20}{c}}1\\ 0 \end{array}} \right) \Rightarrow {\bf{\rho }} = \left|{z + } \right\rangle \left\langle {z + } \right| = \left({\begin{array}{*{20}{c}} 1\\ 0 \end{array}} \right)\left({\begin{array}{*{20}{c}} 1&0 \end{array}} \right) = \left[{\begin{array}{*{20}{c}} 1&0\\ 0&0 \end{array}} \right],{{\bf{\rho }}^2} = \left[ {\begin{array}{*{20}{c}} 1&0\\ 0&0\end{array}} \right]. $$

ρ is Hermitian, and satisfies Tr(ρ) = 1, as required for all density matrices, and also \({{\bf{\rho }}^{2}}=\mathbf{\rho }\), as required for all pure states.

As another example, consider the pure state \(\left| x+ \right\rangle \):

$$ \left| {x + } \right\rangle = \frac{1}{{\sqrt 2 }}\left( {\begin{array}{*{20}{c}} 1\\ 1 \end{array}} \right) \Rightarrow {\bf{\rho }} = \left| {x + } \right\rangle \left\langle {x + } \right| = \left[ {\begin{array}{*{20}{c}} {1/2}&{1/2}\\ {1/2}&{1/2} \end{array}} \right],\quad {{\bf{\rho }}^2} = \left[ {\begin{array}{*{20}{c}} {1/2}&{1/2}\\ {1/2}&{1/2} \end{array}} \right]. $$

Again, ρ is Hermitian, satisfies Tr(ρ) = 1 (as required for all density matrices), and \({{\bf{\rho }}^{2}}=\mathbf{\rho }\) (as required for all pure states).

What is the density matrix for a thermal ensemble of spin-1/2 particles? We first solve this by brute force, and then describe some subtler reasoning. Since a thermal ensemble has no preferred direction, it must have zero average spin in every direction. We note the spin operator for a general direction (θ, φ):

$$ \begin{array}{ll}{{\hat s}_{\theta ,\phi }} &= \sin \;\theta \cos \;\phi {{\hat s}_x} + \sin \;\theta \sin \;\phi {{\hat s}_y} + \cos \;\theta {{\hat s}_z}\\&= \frac{\hbar }{2}\left( {\sin \;\theta \cos \;\phi \left[{\begin{array}{*{20}{c}}0&1\\1&0\end{array}} \right] + \sin \;\theta \sin \;\phi \left[ {\begin{array}{*{20}{c}}0&{ - i}\\i&0\end{array}} \right] + \cos \;\theta \left[ {\begin{array}{*{20}{c}}1&0\\0&{ - 1}\end{array}} \right]} \right) \\&= \frac{\hbar }{2}\left[ {\begin{array}{*{20}{c}}{\cos \;\theta }&{\sin \;\theta {e^{ - i\phi }}}\\{\sin \;\theta {e^{ + i\phi }}}&{ - \cos \;\theta }\end{array}} \right]\;.\end{array} $$

We must find a constant density matrix such that for every (θ, φ) we have zero average:

$$ [ {{s}_{\theta ,\phi }} ]=\operatorname{Tr}( \mathbf{\rho }{{\mathbf{s}}_{\theta ,\phi }} )=\operatorname{Tr}\left( \left[ \begin{matrix} a & b\\ {{b}^{*}} & d\\\end{matrix} \right]\frac{\hbar }{2}\left[ \begin{matrix} \cos \ \theta& \sin \ \theta {{e}^{-i\phi }}\\ \sin \ \theta {{e}^{+i\phi }} & -\cos \ \theta \\\end{matrix} \right] \right)=0, $$

where we must solve for the ρ matrix elements a, b, and d, and we have used that ρ is Hermitian. Then:

$$\begin{array}{@{}l@{}} {\mathop{\rm Tr}\nolimits} \left({\left[{\begin{array}{*{20}{c}} a&b\\[8pt] {{b^*}}&d \end{array}}\right]\left[ {\begin{array}{*{20}{c}} {\cos \;\theta }&{\sin\;\theta {e^{- i\phi}}}\\[8pt] {\sin\;\theta {e^{+i\phi }}}&{-\cos\;\theta }\end{array}} \right]} \right) \\= a\cos \;\theta +b\sin \;\theta {e^{+ i\phi }} + {b^*}\sin \;\theta {e^{- i\phi}} - d\cos \;\theta = 0 \Rightarrow \\ \quad \left( {a - d}\right)\cos \;\theta = 0,\;\;\;\qquad \sin \;\theta {\mathop{\rm{Re}}\nolimits} \left\{{b{e^{i\phi }}} \right\} = 0\;.\end{array}$$

The constraint on a and d comes from θ = 0, where the b terms vanish. Since Tr(ρ) = a + d = 1, and a = d, we must have a = d = 1/2. Then \(\operatorname{Re}\{ b{{e}^{i\phi }} \}=0\) for all φ, which can only happen if b = 0. So:

$$ {\bf{\rho }} = \left[ {\begin{array}{*{20}{c}} {1/2}&0\\ 0&{1/2} \end{array}} \right]\quad {\rm{(thermal}}\;{\rm{ensemble)}}{\rm{.}} $$

We recognize the thermal ensemble as that of a 50-50 mix of \(\left| z+ \right\rangle \) and \(\left| z- \right\rangle \). Since an ensemble is fully characterized by its density matrix, it must be that a thermal ensemble is identical to a 50-50 mix of \(\left| z+ \right\rangle \) and \(\left| z- \right\rangle \). By rotational symmetry, any 50-50 mix of opposing spins is also a thermal ensemble.
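As a numerical confirmation, the following sketch checks that ρ = I/2 gives zero average spin along several random directions, using the general spin operator matrix above (hbar = 1 is our illustrative choice):

    import numpy as np

    hbar = 1.0
    rho_thermal = 0.5 * np.eye(2, dtype=complex)  # the thermal ensemble

    def s_op(theta, phi):
        # Spin operator along (theta, phi), from the matrix derived above.
        return (hbar / 2) * np.array(
            [[np.cos(theta), np.sin(theta) * np.exp(-1j * phi)],
             [np.sin(theta) * np.exp(1j * phi), -np.cos(theta)]])

    rng = np.random.default_rng(1)
    for theta, phi in rng.uniform(0, np.pi, size=(4, 2)):
        avg = np.trace(rho_thermal @ s_op(theta, phi)).real
        print(np.isclose(avg, 0.0))  # True for every direction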

We went to substantial effort to arrive at this simple result. Could we have deduced it more directly? Yes. The laws of physics are rotationally invariant: no matter what direction we face, the laws are the same. Therefore, in any 50-50 mix of opposing spins, the probability of measuring spin “up” along any given direction equals the probability of measuring spin “down” along that direction. Therefore, the average spin in all directions is zero. This shows that any 50-50 mix of opposing spins is also a thermal ensemble.

4.4.4 Density Matrix Summary

A quantum state \( \left| \psi \right\rangle \) characterizes a single particle. A density matrix ρ characterizes an ensemble of particles, and defines everything there is to know about it [16, p. 178t]. However, if we draw a single particle from the ensemble, we also say that the particle is in the “mixed state” ρ, because we cannot know exactly which quantum state the particle is in. The density matrix for such a particle then takes the place of a quantum state ket. The density matrix then defines everything that can be known about that particle.

Particles in a definite quantum state have a state vector; particles drawn from an ensemble have a state matrix, the density matrix, ρ.

Two ensembles with the same density matrix are physically identical, even if their constituent quantum states are different. This is a purely quantum mechanical effect (i.e., it has no classical analog).