1 Introduction

The approach of studying thermalization through the analysis of closed quantum systems with huge numbers of degrees of freedom has led, among other things, to the eigenstate thermalization hypothesis (ETH) [4, 6, 27], to the discovery of canonical typicality [5, 12, 18], and more recently to the discovery of dynamical typicality [1, 2, 16, 21, 22, 23], which is the fact that most pure states \(\psi \) with a given quantum expectation value \(\langle \psi |A|\psi \rangle \) of a macroscopic observable A also have nearly the same \(\langle \psi |B|\psi \rangle \) for any other observable B (and likewise also nearly the same \(\langle \psi _t|B|\psi _t \rangle \)). Here, we provide a very simple proof of an important special case of this statement, namely for A a projection and \(\langle \psi |A|\psi \rangle =1\). Put differently, we show that most \(\psi \) from a macroscopically large subspace of Hilbert space have almost the same expectation values of bounded observables.

Our second result concerns the long-time behavior of \(\langle \psi _t|B|\psi _t \rangle \) under the unitary evolution \(\psi _t=\exp (-iHt)\psi _0\) (taking \(\hbar =1\)) and extends previous results of Reimann and Gemmer [23] as well as von Neumann’s [17] result now known as normal typicality [10, 13]. In particular, our result avoids certain unrealistic assumptions of von Neumann’s.

As usual for the description of macroscopic closed quantum systems, we restrict our consideration to a micro-canonical energy interval \([E-\Delta E,E]\) that is small in macroscopic units but large enough to contain very many eigenvalues of the Hamiltonian H; for a system of N particles, relevant intervals contain of order \(\exp (N)\) eigenvalues. Let \(\mathcal {H}\) be the corresponding spectral subspace, i.e., the range of \(\mathbbm {1}_{[E-\Delta E,E]}(H)\), or energy shell, and let \({\mathbb {S}}(\mathcal {H}) = \{\psi \in \mathcal {H}: \Vert \psi \Vert =1\}\) denote the unit sphere and \(D:=\dim \mathcal {H}<\infty \). Following von Neumann [17], we assume that different macro states \(\nu \) of the system correspond to mutually orthogonal subspaces \(\mathcal {H}_\nu \) (macro spaces) of \(\mathcal {H}\) such that

$$\begin{aligned} \mathcal {H}= \bigoplus _{\nu }\mathcal {H}_\nu . \end{aligned}$$
(1)

Different vectors in the same \(\mathcal {H}_\nu \) are regarded as “looking macroscopically equal”. For example, the “macroscopic look” could be defined in terms of mutually commuting self-adjoint operators \(M_1,\ldots ,M_K\) regarded as the “macroscopic observables” [17]; then \(\mathcal {H}_\nu \) are the joint eigenspaces and \(\nu =(m_1,\ldots ,m_K)\) is the corresponding list of eigenvalues. Let \(P_\nu \) denote the projection onto \(\mathcal {H}_\nu \). Although some macro spaces will have much larger dimensions \(d_\nu :=\dim \mathcal {H}_\nu \) than others, all \(d_\nu \) will be very large, roughly comparable to \(\exp (N)\).

In this setting, it is natural to consider initial states \(\psi _0\) from a certain macro space and ask about the time evolution of the macroscopic superposition weights \(\Vert P_\nu \psi _t\Vert ^2\). We present two general, theoretical findings about these weights that mainly arise just from the hugeness of the \(d_\nu \)’s. The first finding (dynamical typicality) is that the curve given by \(\Vert P_\nu \psi _t\Vert ^2\) as a function of t is nearly \(\psi _0\)-independent once we fix the macro state of \(\psi _0\). In other words, if \(\psi _0\) is purely random in \(\mathcal {H}_\mu \), then the superposition weights are nearly deterministic. The second finding (generalized normal typicality) is that in the long run, as \(t\rightarrow \infty \), \(\Vert P_\nu \psi _t\Vert ^2\) is nearly constant, meaning it is close for most \(t\in [0,\infty )\) to a t-independent and \(\psi _0\)-independent value, once we fix the macro state of \(\psi _0\). This does not mean that \(\Vert P_\nu \psi _t\Vert ^2\) converges as \(t\rightarrow \infty \) (it does not), but that the time periods in which \(\Vert P_\nu \psi _t\Vert ^2\) is far from that value tend to be short compared to the time intervals separating these periods. One can say that the \(\Vert P_\nu \psi _t\Vert ^2\) equilibrate in the long run; however, this equilibration does not correspond to thermal equilibrium in the sense of thermodynamics; rather, thermal equilibrium at time t would correspond to \(\Vert P_\nu \psi _t\Vert ^2\approx 1\) for one particular \(\nu \) (the macro state of thermal equilibrium, \(\mathcal {H}_\nu =\mathcal {H}_\textrm{eq}\)) and \(\Vert P_\nu \psi _t\Vert ^2\approx 0\) for all other \(\nu \)’s. We therefore speak of normal equilibrium when \(\Vert P_\nu \psi _t\Vert ^2\) assumes its long-term value for all \(\nu \).

Our results are typicality statements, i.e., they concern the way most \(\psi _0\) behave, notwithstanding the existence of few exceptional \(\psi _0\) that behave differently. However, a statement about most \(\psi _0\) in \(\mathbb {S}(\mathcal {H})\) would be of limited interest because it could be violated by every system outside of thermal equilibrium, as usually most \(\psi _0\) in \(\mathbb {S}(\mathcal {H})\) are in thermal equilibrium (meaning they are close to \(\mathcal {H}_\textrm{eq}\)) [11]. Instead, we make more specific statements: we allow an arbitrary initial macro space \(\mathcal {H}_\mu \), possibly far from thermal equilibrium, and make statements about most \(\psi _0\) in \(\mathbb {S}(\mathcal {H}_\mu )\). Such statements are also naturally of interest when we ask about the increase of the quantum Boltzmann entropy observable [14]

$$\begin{aligned} {\hat{S}} = \sum _\nu S(\nu ) P_\nu \,, \end{aligned}$$
(2)

where

$$\begin{aligned} S(\nu )=k_B \log d_\nu \end{aligned}$$
(3)

is the quantum Boltzmann entropy of the macro state \(\nu \), and \(k_B\) is the Boltzmann constant. Note that a quantum system can be in a superposition of different macro states and thus also in a superposition of different entropy values.
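As a minimal numerical sketch of (2) and (3), the expectation \(\langle \psi |{\hat{S}}|\psi \rangle =\sum _\nu \Vert P_\nu \psi \Vert ^2 S(\nu )\) can be evaluated as follows. The sketch assumes that the macro spaces are consecutive coordinate blocks of \({\mathbb {C}}^D\) and sets \(k_B=1\); the function name `entropy_expectation`, the dimensions, and the test vector are illustrative choices, not taken from the paper.

```python
import numpy as np

def entropy_expectation(psi, dims, k_B=1.0):
    """Return (<psi|S_hat|psi>, weights) for macro spaces given as consecutive blocks."""
    psi = np.asarray(psi, dtype=complex)
    dims = np.asarray(dims)
    edges = np.concatenate(([0], np.cumsum(dims)))
    # Superposition weights ||P_nu psi||^2, one per macro space
    weights = np.array([np.sum(np.abs(psi[a:b]) ** 2)
                        for a, b in zip(edges[:-1], edges[1:])])
    entropies = k_B * np.log(dims)          # S(nu) = k_B log d_nu, Eq. (3)
    return float(weights @ entropies), weights

# Illustrative state: equal-weight superposition of two macro states
dims = [2, 20, 200]
psi = np.zeros(sum(dims), dtype=complex)
psi[0] = 1 / np.sqrt(2)                     # component in the first macro space
psi[30] = 1 / np.sqrt(2)                    # component in the third macro space
value, weights = entropy_expectation(psi, dims)
```

In this toy state the weights are 1/2, 0, 1/2, so the expectation is the average of \(\log 2\) and \(\log 200\), illustrating that a superposition of macro states carries a superposition of entropy values.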

In Sect. 2, we formulate our theorem about dynamical typicality and compare it to related results in the literature. In Sect. 3, we do the same for generalized normal typicality. In Sect. 4, we prove our result on dynamical typicality. In Sect. 5, we formulate further variants of our results. In Sect. 6, we discuss conclusions for realistic sizes of \(d_\nu \). In Sect. 7, we outline the proof of generalized normal typicality. In Sect. 8, we collect the remaining proofs. In Sect. 9, we conclude.

2 Dynamical Typicality

2.1 Mathematical Description

For formulating theorems, we introduce the following terminology. Suppose that for each \(\psi \in \mathbb {S}(\mathcal {H}_\mu )\), the statement \(s(\psi )\) is either true or false, and let \(\varepsilon >0\). We say that \(s(\psi )\) is true for \((1-\varepsilon )\)-most \(\psi \in \mathbb {S}(\mathcal {H}_\mu )\) if and only if

$$\begin{aligned} u_\mu \big (\big \{\psi \in {\mathbb {S}}(\mathcal {H}_\mu ): s(\psi )\big \}\big ) \ge 1-\varepsilon \,, \end{aligned}$$
(4)

where \(u_\mu \) is the normalized uniform measure over \(\mathbb {S}(\mathcal {H}_\mu )\). Similarly, given \(T>0\) and \(\delta >0\), we say that a statement s(t) is true for \((1-\delta )\)-most \(t\in [0,T]\) if and only if

$$\begin{aligned} \tfrac{1}{T} \big |\big \{t\in [0,T]: s(t)\big \}\big | \ge 1-\delta \,, \end{aligned}$$
(5)

where |S| means the length of the set \(S\subset {\mathbb {R}}\); and that s(t) is true for \((1-\delta )\)-most \(t\in [0,\infty )\) if and only if the lim inf of the left-hand side of (5) as \(T\rightarrow \infty \) is \(\ge 1-\delta \).
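To make the notion of “\((1-\delta )\)-most \(t\in [0,T]\)” concrete, the left-hand side of (5) can be estimated on a uniform time grid. The following is only a sketch: the helper name `fraction_of_times`, the grid size, and the toy statement \(|\cos t|\le 1/2\) are assumptions made for illustration.

```python
import numpy as np

def fraction_of_times(statement, T, n_grid=200_001):
    """Grid estimate of |{t in [0,T]: s(t)}| / T for a vectorized boolean s."""
    t = np.linspace(0.0, T, n_grid)
    return float(np.mean(statement(t)))

# Toy statement: |cos t| <= 1/2, which is true on one third of each period
frac = fraction_of_times(lambda t: np.abs(np.cos(t)) <= 0.5, T=2 * np.pi)

delta = 0.7
holds_for_most_t = frac >= 1 - delta        # "true for (1 - delta)-most t in [0, T]"
```

With this toy statement the estimated fraction is close to 1/3, so the statement is true for \((1-\delta )\)-most t only if \(\delta \ge 2/3\).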

The first finding we mentioned can be expressed as follows.

Theorem 1

(Dynamical typicality) Let \(\mu , \nu \) be arbitrary macro states. There is a function \(w_{\mu \nu }:{\mathbb {R}} \rightarrow [0,1]\) such that for every \(t\in {\mathbb {R}}\) and every \(\varepsilon >0\), for \((1-\varepsilon )\)-most \(\psi _0 \in {\mathbb {S}}(\mathcal {H}_\mu )\),

$$\begin{aligned} \Bigl | \Vert P_\nu \psi _t\Vert ^2-w_{\mu \nu }(t)\Bigr | \le \frac{1}{\sqrt{\varepsilon d_\mu }}. \end{aligned}$$
(6)

Moreover, for every \(\mu ,\nu \), every \(T>0\), and \((1-\varepsilon )\)-most \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\),

$$\begin{aligned} \frac{1}{T}\int _0^T \! \bigl | \Vert P_\nu \psi _t\Vert ^2-w_{\mu \nu }(t)\bigr |^2 dt \le \frac{1}{\varepsilon d_\mu }. \end{aligned}$$
(7)

That is, if \(d_\mu \gg 1/\varepsilon \), then for any t and purely random \(\psi _0\) from \(\mathcal {H}_\mu \), the random value \(\Vert P_\nu \psi _t\Vert ^2\) is very probably close to the non-random value \(w_{\mu \nu }(t)\). The latter can in fact be taken to be the average of \(\Vert P_\nu \psi _t\Vert ^2\) over \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\), which is

$$\begin{aligned} w_{\mu \nu }(t) := \frac{1}{d_\mu } {{\,\textrm{tr}\,}}\Bigl [ P_\mu \exp (iHt) P_\nu \exp (-iHt)\Bigr ]. \end{aligned}$$
(8)

Likewise, the whole curve of \(\Vert P_\nu \psi _t\Vert ^2\) as a function of \(t\in [0,T]\) is very probably close, in the \(L^2\) norm, to \(w_{\mu \nu }(t)\) as a function of t. (Smallness of the \(L^2\) norm implies further that \(\bigl | \Vert P_\nu \psi _t\Vert ^2-w_{\mu \nu }(t)\bigr |\) is small for most t; however, this statement, which is equivalent to saying that the expression is small for most pairs \((t,\psi _0)\in [0,T]\times \mathbb {S}(\mathcal {H}_\mu )\), follows already from (6); note that the quantifiers “most t” and “most \(\psi _0\)” commute. Moreover, it also follows from (7) by letting \(T\rightarrow \infty \) that the long-time average of \(\bigl | \Vert P_\nu \psi _t\Vert ^2-w_{\mu \nu }(t)\bigr |^2\) is small, but this statement is actually weaker than for finite T, and it will be superseded below by a more specific statement in our second result, generalized normal typicality.) A more general statement for arbitrary operators B instead of \(P_\nu \) and a tighter error bound is formulated in Sect. 5.
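The content of (6) and (8) can be checked numerically on a small toy system. The sketch below assumes macro spaces given by consecutive coordinate blocks and uses a generic random Hermitian matrix as Hamiltonian; the dimensions, the time t, the sample size, and all variable names are illustrative assumptions, not the simulation behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [40, 80]                      # toy macro-space dimensions (assumption)
D, mu, nu = sum(dims), 0, 1
edges = np.concatenate(([0], np.cumsum(dims)))
block = [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]

# Toy Hamiltonian: a random Hermitian matrix (assumption)
A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
H = (A + A.conj().T) / 2
evals, V = np.linalg.eigh(H)

t = 0.3
U = (V * np.exp(-1j * evals * t)) @ V.conj().T      # exp(-iHt)

# Eq. (8): w = tr[P_mu exp(iHt) P_nu exp(-iHt)] / d_mu
#        = sum of |U_jk|^2 over rows j in block nu and columns k in block mu
w = np.sum(np.abs(U[block[nu], block[mu]]) ** 2) / dims[mu]

# Uniform samples from the unit sphere of H_mu: normalized complex Gaussians
n_samples = 500
g = rng.normal(size=(dims[mu], n_samples)) + 1j * rng.normal(size=(dims[mu], n_samples))
psi0 = np.zeros((D, n_samples), dtype=complex)
psi0[block[mu]] = g / np.linalg.norm(g, axis=0)

psi_t = U @ psi0
weights = np.sum(np.abs(psi_t[block[nu]]) ** 2, axis=0)   # ||P_nu psi_t||^2

eps = 0.1
bound = 1 / np.sqrt(eps * dims[mu])                        # right-hand side of (6)
fraction_within = float(np.mean(np.abs(weights - w) <= bound))
```

Theorem 1 guarantees that at least a fraction \(1-\varepsilon \) of initial states satisfies (6); in such toy runs the sampled fraction is typically much closer to 1, since the Chebyshev bound is far from tight.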

As a further remark, we observe that another quantity is also deterministic for purely random \(\psi _0\) from \(\mathbb {S}(\mathcal {H}_\mu )\): not only is the probability \(\Vert P_\nu \psi _t\Vert ^2\) associated with \(\mathcal {H}_\nu \) at time t nearly deterministic, but also the probability current between \(\mathcal {H}_\nu \) and \(\mathcal {H}_{\nu '}\),

$$\begin{aligned} J_{\nu \nu '} := -i\left( \langle \psi _t|P_\nu H P_{\nu '}|\psi _t\rangle -\langle \psi _t|P_{\nu '}HP_\nu |\psi _t\rangle \right) = 2 {{\,\textrm{Im}\,}}\langle \psi _t|P_\nu H P_{\nu '}|\psi _t \rangle . \end{aligned}$$
(9)

This quantity expresses the amount of probability passing, per unit time, from \(\nu '\) to \(\nu \) minus that from \(\nu \) to \(\nu '\); it satisfies a discrete version of the continuity equation, viz.,

$$\begin{aligned} \partial _t \Vert P_\nu \psi _t\Vert ^2 = \sum _{\nu '} J_{\nu \nu '}. \end{aligned}$$
(10)

In Sect. 8.2 we will show that the probability current between two macro spaces is deterministic.
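The continuity equation (10) can be verified numerically by comparing a finite-difference time derivative of \(\Vert P_\nu \psi _t\Vert ^2\) with the sum of currents (9). The sketch below is a toy check under stated assumptions: block-diagonal macro projections, a small random Hermitian H, and a central difference with step `dt`; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [3, 4, 5]                                   # toy macro-space dimensions
D = sum(dims)
edges = np.concatenate(([0], np.cumsum(dims)))
blocks = [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]

A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
H = (A + A.conj().T) / 2                           # toy Hermitian Hamiltonian
evals, V = np.linalg.eigh(H)

psi0 = rng.normal(size=D) + 1j * rng.normal(size=D)
psi0 /= np.linalg.norm(psi0)

def evolve(t):
    """psi_t = exp(-iHt) psi_0 via the eigen-decomposition of H."""
    return V @ (np.exp(-1j * evals * t) * (V.conj().T @ psi0))

def weight(psi, nu):
    """Superposition weight ||P_nu psi||^2."""
    return np.sum(np.abs(psi[blocks[nu]]) ** 2)

def current(psi, nu, nup):
    """J_{nu nu'} = 2 Im <psi| P_nu H P_nu' |psi>, Eq. (9)."""
    return 2 * np.imag(psi[blocks[nu]].conj() @ H[blocks[nu], blocks[nup]] @ psi[blocks[nup]])

t, dt = 0.7, 1e-5
psi = evolve(t)
max_error = 0.0
for nu in range(len(dims)):
    lhs = (weight(evolve(t + dt), nu) - weight(evolve(t - dt), nu)) / (2 * dt)
    rhs = sum(current(psi, nu, nup) for nup in range(len(dims)))
    max_error = max(max_error, abs(lhs - rhs))
```

The term with \(\nu '=\nu \) vanishes because \(\langle \psi _t|P_\nu HP_\nu |\psi _t\rangle \) is real, and the currents are antisymmetric, \(J_{\nu \nu '}=-J_{\nu '\nu }\), as one expects for a flow of probability.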

2.2 Previous Results About Dynamical Typicality

Bartsch and Gemmer [2] introduced the name “dynamical typicality” for the following closely related phenomenon: Given an observable A and \(a\in {\mathbb {R}}\), there is a function a(t) such that for every \(t\in {\mathbb {R}}\) and most \(\psi _0\in \mathbb {S}(\mathcal {H})\) with \(\langle \psi _0|A|\psi _0 \rangle \approx a\), \(\langle \psi _t|A|\psi _t \rangle \approx a(t)\). Müller, Gross, and Eisert [16] proved a rigorous version of this fact that also implies that for every operator B whose operator norm (largest absolute eigenvalue or singular value) is not too large, there is a value b such that for most \(\psi _0\in \mathbb {S}(\mathcal {H})\) with \(\langle \psi _0|A|\psi _0 \rangle \approx a\), \(\langle \psi _0|B|\psi _0 \rangle \approx b\). As Reimann [22] pointed out, this also implies that for every \(t\in {\mathbb {R}}\) and most \(\psi _0\in \mathbb {S}(\mathcal {H})\) with \(\langle \psi _0|A|\psi _0 \rangle \approx a\), \(\langle \psi _t|B|\psi _t \rangle \approx b(t)\) for suitable b(t). Setting \(A=P_\mu \), \(a=1\), and \(B=P_\nu \), this yields that for every \(t\in {\mathbb {R}}\) and most \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\), \(\langle \psi _t|P_\nu |\psi _t \rangle =\Vert P_\nu \psi _t\Vert ^2\) is nearly deterministic. For technical reasons, the proofs of Müller, Gross, and Eisert [16] and Reimann [22] do not actually cover the case that A is a projection and \(a=1\). As was pointed out to us by one of the referees of our paper, Balz et al. [1] provide a general result that covers Theorem 1 as a special case. Although our proof strategy is similar to the one in [1], we decided to present our proof in this paper, because it is very simple and transparent and could help to make the phenomenon of dynamical typicality, which is striking at first sight, a textbook result. Theorem 1 can also be obtained through a proof strategy used by Reimann and Gemmer [23].

A further related result is given by Strasberg et al. [28], who consider repeated measurements at \(0<t_1<t_2<\cdots<t_r<T\) of all \(P_\nu \)’s and argue that the probability distribution of the outcomes is essentially indistinguishable from the joint distribution of \(X_{t_1},\ldots ,X_{t_r}\) for a suitable Markov process \(X_t\) on the set of \(\nu \)’s. This includes the claim that omitting one of the measurements does not significantly alter the distribution of the other outcomes, so the distribution of \(X_t\) should agree with \(\Vert P_\nu \psi _t\Vert ^2\), which is in line with our result.

3 Generalized Normal Typicality

3.1 Motivation

It is well known that for most \(\phi \in {\mathbb {S}}(\mathcal {H})\),

$$\begin{aligned} \Vert P_\nu \phi \Vert ^2 \approx \frac{d_\nu }{D}\,, \end{aligned}$$
(11)

provided that \(d_\nu \) and \(D:=\dim \mathcal {H}\) are large [10]. Under the additional condition that relative to a fixed decomposition (1) into macro spaces the eigenbasis of H is chosen purely randomly among all orthonormal bases (and some further technical conditions that are not very restrictive), (11) holds also for the eigenstates of H, and it can be shown that every \(\psi _0\in {\mathbb {S}}(\mathcal {H})\) evolves so that for most times t,

$$\begin{aligned} \Vert P_\nu \psi _t\Vert ^2 \approx \frac{d_\nu }{D}. \end{aligned}$$
(12)

This fact is known as normal typicality [10, 13, 17, 20].

The assumption of a purely random eigenbasis can be regarded as expressing that the energy eigenbasis is unrelated to the orthogonal decomposition (1). In most realistic systems, however, the energy eigenbasis and the macro decomposition (1) are not unrelated. If they were unrelated, then the system would very rapidly go from any macro space \(\mathcal {H}_\nu \) directly to the thermal equilibrium macro space \(\mathcal {H}_\textrm{eq}\) (a macro space containing most dimensions of \(\mathcal {H}\), \(d_\textrm{eq}/D\approx 1\)) [7,8,9]. But that does not happen in most systems because thermal equilibrium requires that energy (and other quantities) is rather evenly distributed over all degrees of freedom, and for getting evenly distributed, it needs to get transported through space, which usually requires time and passage through other macro states, cf. Fig. 1.

Fig. 1

Example of time evolution of superposition weights \(\Vert P_\nu \psi _t\Vert ^2\), here in a Hilbert space of dimension \(D=2222\) decomposed into 4 macro spaces of dimensions \(d_1=2\) (green curve), \(d_2=20\) (red curve), \(d_3=200\) (blue curve), and \(d_4=2000\) (purple curve). The four curves add up to 1 at each t. At large t, the equilibrium subspace \(\mathcal {H}_4\) has the biggest contribution. \(\psi _0\) was chosen purely randomly from \(\mathbb {S}(\mathcal {H}_2)\) (i.e., \(\mu =2\), so the red curve starts at 1, all others at 0). The Hamiltonian is a random band matrix (i.e., only entries sufficiently close to the main diagonal are significantly nonzero) in a basis aligned with the macro spaces, but with a wide enough bandwidth to still ensure delocalized eigenfunctions. Thus, parts of \(\psi _t\) reach \(\mathcal {H}_4\) only after passing through \(\mathcal {H}_3\), as mirrored in the fact that the blue curve increases first before it decreases in favor of the purple curve. Along with each of the four curves, also its deterministic approximation \(w_{2\nu }(t)\) (in black) is drawn; dynamical typicality asserts that it is a good approximation (Color figure online)

That is why we are interested in generalizations of normal typicality that apply also to Hamiltonians whose eigenbasis is not unrelated to \(\mathcal {H}_\nu \). For such H, eigenvectors \(\phi \) must be expected to have superposition weights \(\Vert P_\nu \phi \Vert ^2\) not always near \(d_\nu /D\). Our result actually applies to all Hamiltonians, at the expense that it does not apply to all initial quantum states \(\psi _0\). As noted already, a statement about most \(\psi _0\in {\mathbb {S}}(\mathcal {H})\) would be limited to systems starting out in thermal equilibrium. Our result states that for any macro state \(\mu \), most \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\) evolve so that for most times t

$$\begin{aligned} \Vert P_\nu \psi _t\Vert ^2 \approx M_{\mu \nu }\,, \end{aligned}$$
(13)

provided that \(d_\mu \) is large. See Theorem 2 for the precise quantitative statement and the definition of \(M_{\mu \nu }\). The proof (see Sect. 8) builds particularly on techniques developed by Short and Farrelly [24, 25], but is also related to a series of works on quantum equilibration (e.g., [15, 19]) in which the long-time behavior of \(\langle \psi _t|B|\psi _t \rangle \) is studied under various assumptions on B and \(\psi _0\).

The \(M_{\mu \nu }\) are actually the averages of \(\Vert P_\nu \psi _t\Vert ^2\) over \(t\in [0,\infty )\) and over \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\). Thus, they depend only on H and the decomposition (1), but not on t or \(\psi _0\).

In this setting, thermalization means that \(M_{\mu \,\textrm{eq}}\approx 1\) for every \(\mu \), i.e., that for all macro states \(\mu \) the overwhelming majority of micro states eventually reach thermal equilibrium in the sense that \(\psi _t\) lies almost completely in \(\mathcal {H}_\textrm{eq}\) and spends most of the time in the long run there. The time scale on which thermalization happens can be read off from the function \(w_{\mu \,\textrm{eq}}(t)\), while the other \(w_{\mu \nu }(t)\) provide information about the detailed path to thermal equilibrium passing through intermediate macro states.

3.2 Statement of Result

In the following we consider Hamiltonians with spectral decomposition

$$\begin{aligned} H=\sum _{e\in {\mathcal {E}}} e \,\Pi _e, \end{aligned}$$
(14)

where \({\mathcal {E}}\) is the set of distinct eigenvalues of H and \(\Pi _e\) the projection onto the eigenspace of H with eigenvalue e. The quantitative bounds in our theorem depend on the Hamiltonian only through the following characteristics of the distribution of its eigenvalues: the maximum degeneracy \(D_E := \max _{e\in {\mathcal {E}}} {{\,\textrm{tr}\,}}(\Pi _e)\) of an eigenvalue and the maximal gap degeneracy

$$\begin{aligned} D_G := \max _{E\in \mathbb {R}} \# \bigl \{ (e,e')\in {\mathcal {E}}\times {\mathcal {E}}: e\ne e' \text { and }e-e'=E \bigr \}. \end{aligned}$$
(15)

Theorem 2

(Generalized normal typicality) Let \(\mu ,\nu \) be any macro states and define

$$\begin{aligned} M_{\mu \nu }&:= \frac{1}{d_\mu } \sum _{e\in {\mathcal {E}}} {{\,\textrm{tr}\,}}\left( P_\mu \Pi _e P_\nu \Pi _e\right) . \end{aligned}$$
(16)

Then for any \(\varepsilon , \delta >0\), \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) are such that for \((1-\delta )\)-most \(t\in [0,\infty )\)

$$\begin{aligned} \biggl | \Vert P_\nu \psi _t\Vert ^2 - M_{\mu \nu } \biggr |\le 4\,\sqrt{\frac{D_E D_G}{\delta \varepsilon d_\mu }\min \left\{ 1,\frac{d_\nu }{d_\mu }\right\} }. \end{aligned}$$
(17)

Thus, as soon as \(d_\mu \gg D_E D_G\), i.e., as soon as the dimension of \(\mathcal {H}_\mu \) is huge and no eigenvalue and no gap of H is macroscopically degenerate, for most initial states \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) the superposition weight \(\Vert P_\nu \psi _t\Vert ^2\) will be close to the fixed value \(M_{\mu \nu }\) for most times t.
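For a concrete finite-dimensional H, the quantities \(D_E\), \(D_G\), and \(M_{\mu \nu }\) entering Theorem 2 can be computed directly from an eigen-decomposition. The following sketch assumes block-diagonal macro projections and decides equality of eigenvalues and of gaps up to a numerical tolerance `tol`, which is an assumption of the sketch and not part of the definitions (14)–(16); the function names are hypothetical.

```python
import numpy as np

def spectral_data(H, tol=1e-9):
    """Eigen-decomposition plus D_E and D_G, treating eigenvalues within tol as equal."""
    evals, V = np.linalg.eigh(H)                       # eigenvalues in ascending order
    cuts = np.where(np.diff(evals) > tol)[0] + 1
    groups = np.split(np.arange(len(evals)), cuts)     # index sets of the eigenspaces
    levels = np.array([evals[g].mean() for g in groups])
    D_E = max(len(g) for g in groups)                  # maximal degeneracy
    diffs = (levels[:, None] - levels[None, :])[~np.eye(len(levels), dtype=bool)]
    if diffs.size == 0:
        D_G = 0
    else:
        # Bin the gaps e - e' on a grid of width tol and count the fullest bin
        _, counts = np.unique(np.round(diffs / tol).astype(np.int64), return_counts=True)
        D_G = int(counts.max())
    return V, groups, D_E, D_G

def long_time_weights(V, groups, dims):
    """Matrix M[mu, nu] of Eq. (16) for macro spaces given as consecutive blocks."""
    dims = np.asarray(dims)
    edges = np.concatenate(([0], np.cumsum(dims)))
    blocks = [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]
    M = np.zeros((len(dims), len(dims)))
    for g in groups:
        Ve = V[:, g]                                   # orthonormal basis of one eigenspace
        # G[nu] = Ve^dagger P_nu Ve, so tr(P_mu Pi_e P_nu Pi_e) = tr(G[mu] G[nu])
        G = [Ve[b].conj().T @ Ve[b] for b in blocks]
        for m in range(len(dims)):
            for n in range(len(dims)):
                M[m, n] += np.trace(G[m] @ G[n]).real
    return M / dims[:, None]

# Degenerate toy spectrum {0, 0, 1, 2}: D_E = 2, and the gap 1 occurs twice
_, _, D_E_toy, D_G_toy = spectral_data(np.diag([0.0, 0.0, 1.0, 2.0]))

# Random Hermitian toy Hamiltonian with three macro spaces
rng = np.random.default_rng(3)
dims = [2, 4, 6]
A = rng.normal(size=(12, 12)) + 1j * rng.normal(size=(12, 12))
V, groups, D_E, D_G = spectral_data((A + A.conj().T) / 2)
M = long_time_weights(V, groups, dims)
```

Two consistency checks follow from (16) alone: each row of M sums to 1 because \(\sum _\nu P_\nu \) is the identity on \(\mathcal {H}\), and \(d_\mu M_{\mu \nu }=d_\nu M_{\nu \mu }\) by cyclicity of the trace.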

Fig. 2

The same simulation as in Fig. 1, only for longer times. The horizontal black lines indicate the values of the weights \(M_{2\nu }\). The inset shows a part of the figure in magnification. Theorem 2 states that the displayed behavior is typical of initial states in \(\mathcal {H}_2\): up to fluctuations that are either small or rare, \(\Vert P_\nu \psi _t\Vert ^2\) is close to \(M_{2\nu }\)

For comparison, Reimann and Gemmer [23] also concluded that \(\langle \psi _t|A|\psi _t \rangle \) is nearly constant, but for a different ensemble based on the condition \(\langle \psi _0|A|\psi _0 \rangle \approx a\). We also provide a statement analogous to Theorem 2 for \(\langle \psi _t|B|\psi _t \rangle \) with arbitrary observable B instead of \(P_\nu \) in Theorem 4 below.

3.3 Example

We illustrate Theorem 2 within a simple random matrix model. We partition the D-dimensional Hilbert space \(\mathcal {H}:= {\mathbb {C}}^D = {\mathbb {C}}^{d_1} \oplus {\mathbb {C}}^{d_2}\oplus {\mathbb {C}}^{d_3}\oplus {\mathbb {C}}^{d_4}=:\bigoplus _{\nu =1}^4\mathcal {H}_\nu \) into four macro spaces \(\mathcal {H}_\nu \) of dimension \(d_\nu \), i.e., \(\mathcal {H}_1\) is spanned by the first \(d_1\) canonical basis vectors, \(\mathcal {H}_2\) by the next \(d_2\) canonical basis vectors, and so on. The Hamiltonian H is a random \(D\times D\) matrix that has a band structure (i.e., mainly near-diagonal entries) and thus couples neighboring macro spaces more strongly than distant ones. More precisely, we choose \(H=(h_{ij})_{ij}\) to be a self-adjoint random matrix such that \(h_{ii} \sim {\mathcal {N}}(0, \sigma _{ii}^2)\) and \(h_{ij} \sim {\mathcal {N}}(0, \sigma _{ij}^2/2) + i {\mathcal {N}}(0, \sigma _{ij}^2/2)\) for \(i\ne j\), where

$$\begin{aligned} \sigma _{ij}^2 := \exp (-s|i-j|) \end{aligned}$$
(18)

with some \(s>0\) that controls the bandwidth. That is, the variances decrease exponentially in the distance from the diagonal.
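A random band matrix of this type could be generated along the following lines. This is a sketch under stated assumptions: the entries above the diagonal are drawn independently and the lower triangle is fixed by self-adjointness, and the function name `random_band_hamiltonian`, the seed, and the small toy values of D and s are illustrative and differ from those used for the figures.

```python
import numpy as np

def random_band_hamiltonian(D, s, rng):
    """Self-adjoint random matrix with variances sigma_ij^2 = exp(-s |i - j|), Eq. (18)."""
    idx = np.arange(D)
    sigma2 = np.exp(-s * np.abs(idx[:, None] - idx[None, :]))
    # Off-diagonal entries: real and imaginary parts each with variance sigma_ij^2 / 2
    re = rng.normal(size=(D, D)) * np.sqrt(sigma2 / 2)
    im = rng.normal(size=(D, D)) * np.sqrt(sigma2 / 2)
    upper = np.triu(re + 1j * im, k=1)
    # Diagonal entries: real with variance sigma_ii^2 = 1
    diagonal = rng.normal(size=D) * np.sqrt(np.diag(sigma2))
    return upper + upper.conj().T + np.diag(diagonal)

# Toy size (assumption): D = 222 with macro spaces of dimensions 2, 20, 200
H = random_band_hamiltonian(222, 0.05, np.random.default_rng(4))
```

Since the standard deviation \(\sigma _{ij}=\exp (-s|i-j|/2)\) decays on the scale \(2s^{-1}\), that number serves as a rough measure of the bandwidth.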

In Figs. 1 and 2 the weights \(\Vert P_\nu \psi _t\Vert ^2\) are plotted for the values \(s=0.005\), \(d_\nu = 2 \times 10^{\nu -1}\), and a random initial vector \(\psi _0\in \mathcal {H}_2\). In Fig. 1 the plot shows the initial phase where the system first passes through the 3rd macro state before settling mostly in the “equilibrium space” \(\mathcal {H}_4\). Note that the bandwidth is roughly \(2s^{-1}= 400 \approx D^{0.77}\gg D^{0.5}\) and we thus expect to be in the regime of delocalized eigenfunctions, which is also confirmed by the numerical results.

Theorem 2 states that the long-term behavior depicted in Fig. 2 is typical of initial states \(\psi _0\in \mathcal {H}_2\): after some time the system equilibrates, the superposition weights \(\Vert P_\nu \psi _t\Vert ^2\) approach values \(M_{2\nu }\) independent of the initial state, and stay close to them after the initial phase of equilibration. We also see that these values differ from the ones one would expect if normal typicality held: for example, while \(d_4/D \approx 0.90\) in our simulation, one finds that \(M_{24} \approx 0.82\).

The average entropy as a function of time is plotted in Fig. 3. As expected, it increases up to small fluctuations.

Fig. 3

The average entropy \(\langle \psi _t|\hat{S}|\psi _t\rangle = \sum _\nu \Vert P_\nu \psi _t\Vert ^2 S(\nu )\) as a function of time t for \(k_\textrm{B}=1\) and the same simulation as in Figs. 1 and 2. The tendency to increase can be regarded as a reflection of the second law of thermodynamics

4 Proof of Theorem 1

The proof is very simple, based on an application of Chebyshev’s, respectively Markov’s, inequality to the following formulas for Hilbert space averages and Hilbert space variances [5, App. C]: For any Hilbert space \(\mathcal {H}\) of dimension d, uniformly distributed \(\psi \in \mathbb {S}(\mathcal {H})\), and any operator B on \(\mathcal {H}\),

$$\begin{aligned} \mathbb {E}\bigl [ \langle \psi |B|\psi \rangle \bigr ]&= \frac{1}{d} {{\,\textrm{tr}\,}}B \end{aligned}$$
(19)
$$\begin{aligned} {{\,\textrm{Var}\,}}\bigl [ \langle \psi |B|\psi \rangle \bigr ]&= \frac{1}{d(d+1)} \Bigl ({{\,\textrm{tr}\,}}(B^\dagger B)-\frac{|{{\,\textrm{tr}\,}}B|^2}{d} \Bigr ). \end{aligned}$$
(20)

(As usual, the variance of a complex random variable Z is defined as Var \(Z:= {\mathbb {E}}\bigl [|Z-{\mathbb {E}}(Z)|^2\bigr ] = {\mathbb {E}}\bigl [|Z|^2\bigr ]-|{\mathbb {E}}(Z)|^2\).) Dropping the last term and replacing \(d+1\) by d, we obtain the trivial upper bound

$$\begin{aligned} {{\,\textrm{Var}\,}}\bigl [ \langle \psi |B|\psi \rangle \bigr ] \le \frac{{{\,\textrm{tr}\,}}(B^\dagger B)}{d^2}. \end{aligned}$$
(21)
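Formulas (19)–(21) are easy to test by Monte Carlo sampling. The sketch below assumes that uniformly distributed unit vectors are obtained by normalizing complex Gaussian vectors; the dimension, the sample size, and the random operator B are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_samples = 6, 200_000
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # arbitrary operator

# Uniform points on the unit sphere of C^d: normalized complex Gaussian vectors
psi = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
x = np.einsum("ni,ij,nj->n", psi.conj(), B, psi)             # <psi|B|psi> per sample

mean_exact = np.trace(B) / d                                  # Eq. (19)
tr_BdB = np.trace(B.conj().T @ B).real
var_exact = (tr_BdB - abs(np.trace(B)) ** 2 / d) / (d * (d + 1))   # Eq. (20)
var_bound = tr_BdB / d ** 2                                   # Eq. (21)

mean_mc = x.mean()
var_mc = np.mean(np.abs(x - mean_mc) ** 2)                    # variance of a complex variable
```

The sampled mean and variance should agree with (19) and (20) up to statistical error, and the exact variance never exceeds the cruder bound (21).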

Now we insert \(\mathcal {H}_\mu \) for \(\mathcal {H}\) and \(B=P_\mu \exp (iHt) P_\nu \exp (-iHt) P_\mu \); we write \(\mathbb {E}_\mu \) and \({{\,\textrm{Var}\,}}_\mu \) for expectation and variance over uniformly distributed \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\). We observe first that

$$\begin{aligned} \mathbb {E}_\mu \bigl [ \Vert P_\nu \psi _t\Vert ^2 \bigr ] = \frac{1}{d_\mu } {{\,\textrm{tr}\,}}\bigl [P_\mu \exp (iHt) P_\nu \exp (-iHt) \bigr ] = w_{\mu \nu }(t). \end{aligned}$$
(22)

For the variance, since \(|{{\,\textrm{tr}\,}}(CD)|\le \Vert C\Vert {{\,\textrm{tr}\,}}(|D|)\) for any operators C, D, with \(\Vert C\Vert \) the operator norm of C [26, Thm. 3.7.6], we have that

$$\begin{aligned} {{\,\textrm{tr}\,}}(B^\dagger B)&= {{\,\textrm{tr}\,}}\Bigl (P_\mu \exp (iHt) P_\nu \exp (-iHt) P_\mu \exp (iHt) P_\nu \exp (-iHt) P_\mu \Bigr ) \end{aligned}$$
(23)
$$\begin{aligned}&\le \Vert P_\mu \Vert \, \Vert \exp (iHt)\Vert \,\Vert P_\nu \Vert \, \Vert \exp (-iHt)\Vert \cdots \Vert \exp (-iHt)\Vert \, {{\,\textrm{tr}\,}}P_\mu \end{aligned}$$
(24)
$$\begin{aligned}&= d_\mu . \end{aligned}$$
(25)

We thus obtain that

$$\begin{aligned} {{\,\textrm{Var}\,}}_\mu \bigl [ \Vert P_\nu \psi _t\Vert ^2 \bigr ] \le \frac{1}{d_\mu }. \end{aligned}$$
(26)

The Chebyshev inequality then yields the first claim, (6).
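Spelled out, and writing \(\eta >0\) for a deviation threshold (a symbol introduced here only for illustration), Chebyshev’s inequality together with (22) and (26) gives

$$\begin{aligned} u_\mu \Bigl (\Bigl \{\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu ): \bigl | \Vert P_\nu \psi _t\Vert ^2-w_{\mu \nu }(t)\bigr | > \eta \Bigr \}\Bigr ) \le \frac{{{\,\textrm{Var}\,}}_\mu \bigl [ \Vert P_\nu \psi _t\Vert ^2 \bigr ]}{\eta ^2} \le \frac{1}{d_\mu \eta ^2}\,, \end{aligned}$$

and the choice \(\eta =1/\sqrt{\varepsilon d_\mu }\) makes the right-hand side equal to \(\varepsilon \), which is (6).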

For the second claim, Fubini’s theorem allows us to interchange expectation and integral. Thus,

$$\begin{aligned} \mathbb {E}_\mu \biggl [\int _0^T \bigl | \Vert P_\nu \psi _t\Vert ^2 - w_{\mu \nu }(t) \bigr |^2\, dt\biggr ]&= \int _0^T \mathbb {E}_\mu \Bigl [ \bigl | \Vert P_\nu \psi _t\Vert ^2 - w_{\mu \nu }(t) \bigr |^2\Bigr ]\, dt \end{aligned}$$
(27)
$$\begin{aligned}&= \int _0^T {{\,\textrm{Var}\,}}_\mu \bigl [ \Vert P_\nu \psi _t\Vert ^2 \bigr ]\, dt \end{aligned}$$
(28)
$$\begin{aligned}&\le \frac{T}{d_\mu } \end{aligned}$$
(29)

by (26). Markov’s inequality then yields the second claim, (7).\(\square \)

As a side remark, the arguments of the proof also yield the following upper bound on the Hilbert space variance over subspaces of dimension \(d_\mu \) for arbitrary B:

$$\begin{aligned} {{\,\textrm{Var}\,}}_\mu \bigl [ \langle \psi |B|\psi \rangle \bigr ] \le \frac{{{\,\textrm{tr}\,}}(P_\mu B^\dagger P_\mu B P_\mu )}{d_\mu ^2} \le \frac{\Vert B\Vert \, {{\,\textrm{tr}\,}}(|B|)}{d_\mu ^2}. \end{aligned}$$
(30)

5 More General Results

5.1 Dynamical Typicality

Here is a variant of Theorem 1 that allows for an arbitrary operator B instead of \(P_\nu \) and provides a tighter error bound:

Theorem 3

Let \(\mu \) be an arbitrary macro state and let B be any operator on \(\mathcal {H}\). There is a function \(w_{\mu B}: {\mathbb {R}}\rightarrow {\mathbb {C}}\) such that for every \(t\in {\mathbb {R}}\) and every \(\varepsilon >0\), for \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\),

$$\begin{aligned} \Bigl | \langle \psi _t|B|\psi _t\rangle -w_{\mu B}(t)\Bigr |\le \min \Biggl \{ \frac{\Vert B\Vert }{\sqrt{\varepsilon d_\mu }},~ \sqrt{\frac{\Vert B\Vert {{\,\textrm{tr}\,}}(|B|)}{\varepsilon d_\mu ^2}},~ \sqrt{\frac{18\pi ^3 \log (4/\varepsilon )}{d_\mu }} \Vert B\Vert \Biggr \}. \end{aligned}$$
(31)

Moreover, for every \(\mu \) and B, every \(T>0\), and \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\),

$$\begin{aligned} \frac{1}{T}\int _0^T \! \bigl | \langle \psi _t|B|\psi _t\rangle -w_{\mu B}(t)\bigr |^2 dt \le \frac{\Vert B\Vert ^2}{\varepsilon d_\mu }. \end{aligned}$$
(32)

In fact, the function \(w_{\mu B}(t)\) is the average of \(\langle \psi _t|B|\psi _t\rangle \) over \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\), which is

$$\begin{aligned} w_{\mu B}(t) := \frac{1}{d_\mu } {{\,\textrm{tr}\,}}\left[ P_\mu \exp (iHt) B \exp (-iHt)\right] . \end{aligned}$$
(33)

The proof of Theorem 3 (see Sect. 8.1) is largely analogous to that of Theorem 1. The bound involving \(\sqrt{\log (4/\varepsilon )}\) instead of \(1/\sqrt{\varepsilon }\) can be obtained by using Lévy’s lemma instead of the Chebyshev inequality. However, it turns out that for all other results in this paper, the bounds provided by Markov’s and Chebyshev’s inequality are better than those provided by Lévy’s lemma. That is because in many cases, Lévy’s lemma yields a bound that is better in \(\varepsilon \) but worse in \(d_\mu \), which in our situation is worse because \(d_\mu \) is usually much larger than any relevant \(1/\varepsilon \); see Sect. 8.1 for more detail.
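To see how the three expressions inside the minimum in (31) compare, one can simply evaluate them. The sketch below assumes that log denotes the natural logarithm; the function name `bounds_eq31` and the toy numbers (a projection B of rank \(10^3\), \(d_\mu =10^6\)) are illustrative assumptions.

```python
import math

def bounds_eq31(norm_B, trace_abs_B, d_mu, eps):
    """Evaluate the three expressions inside the minimum in Eq. (31)."""
    return {
        "chebyshev": norm_B / math.sqrt(eps * d_mu),
        "trace": math.sqrt(norm_B * trace_abs_B / (eps * d_mu ** 2)),
        "levy": math.sqrt(18 * math.pi ** 3 * math.log(4 / eps) / d_mu) * norm_B,
    }

# Toy numbers (assumptions): ||B|| = 1, tr|B| = 10^3, d_mu = 10^6
moderate = bounds_eq31(1.0, 1e3, 1e6, eps=1e-2)
tiny = bounds_eq31(1.0, 1e3, 1e6, eps=1e-5)
best_moderate = min(moderate, key=moderate.get)
```

The first and third expressions in (31) have the same dependence on \(d_\mu \), so their ordering is decided by \(\varepsilon \) alone: the first is smaller exactly when \(1/\varepsilon <18\pi ^3\log (4/\varepsilon )\), which holds for \(\varepsilon =10^{-2}\) but fails for \(\varepsilon =10^{-5}\).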

5.2 Generalized Normal Typicality

The next result, Theorem 4, provides a somewhat more general version of Theorem 2 that concerns arbitrary operators B instead of \(P_\nu \), as well as finite time intervals instead of \([0,\infty )\). To formulate it, we define the number \(d_E:=\#{\mathcal {E}}\) of distinct eigenvalues and the maximal number of gaps in an energy interval of length \(\kappa >0\),

$$\begin{aligned} G(\kappa ) := \max _{E\in {\mathbb {R}} } \,\#\big \{&(e,e') \in {\mathcal {E}}\times {\mathcal {E}}\,:\, e\not =e' \text{ and } e-e' \in [E,E+\kappa )\big \}. \end{aligned}$$
(34)

It follows that \(D_G = \lim _{\kappa \rightarrow 0^+} G(\kappa )\).
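For a given finite list of eigenvalues, \(G(\kappa )\) from (34) can be computed with a sliding window over the sorted gaps, because the maximum over E is attained when the window \([E,E+\kappa )\) starts at one of the gaps. The sketch assumes that distinct eigenvalues can be identified by exact floating-point equality; the function name `max_gaps_in_window` and the toy spectrum are illustrative.

```python
import numpy as np

def max_gaps_in_window(levels, kappa):
    """G(kappa) of Eq. (34): largest number of gaps e - e' inside some [E, E + kappa)."""
    levels = np.unique(np.asarray(levels, dtype=float))       # distinct eigenvalues
    diffs = levels[:, None] - levels[None, :]
    diffs = np.sort(diffs[~np.eye(len(levels), dtype=bool)])  # all gaps with e != e'
    if diffs.size == 0:
        return 0
    # For a window starting at diffs[i], count the gaps that are < diffs[i] + kappa
    right = np.searchsorted(diffs, diffs + kappa, side="left")
    return int(np.max(right - np.arange(diffs.size)))

# Toy spectrum (assumption): the positive gaps are 1, 1, 1.5, 2, 2.5, 3.5
levels = [0.0, 1.0, 2.0, 3.5]
G_small = max_gaps_in_window(levels, 1e-9)    # only exactly equal gaps are counted together
G_half = max_gaps_in_window(levels, 0.5)
G_wide = max_gaps_in_window(levels, 0.75)
```

In this toy spectrum the gap 1 occurs twice, so \(G(\kappa )=2\) for all small \(\kappa \), in line with \(D_G=\lim _{\kappa \rightarrow 0^+}G(\kappa )\); the value increases to 3 once the window is wide enough to also contain the gap 1.5.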

Theorem 4

Let B be an operator on \(\mathcal {H}\), let \(\varepsilon ,\delta ,\kappa ,T>0\), let \(\mu \) be any macro state, and define

$$\begin{aligned} M_{\mu B}&:= \frac{1}{d_\mu } \sum _{e\in {\mathcal {E}}} {{\,\textrm{tr}\,}}\left( P_\mu \Pi _e B \Pi _e\right) . \end{aligned}$$
(35)

Then \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) are such that for \((1-\delta )\)-most \(t\in [0,T]\)

$$\begin{aligned}&\biggl |\langle \psi _t|B|\psi _t\rangle - M_{\mu B} \biggr | \\ \nonumber&\quad \le 4 \,\sqrt{\frac{D_E \,G(\kappa ) \Vert B\Vert }{\delta \varepsilon d_\mu } \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} }. \end{aligned}$$
(36)

Thus, as soon as \(d_\mu \gg D_E G(\kappa ) \Vert B\Vert ^2\) and T is large enough, the right-hand side of (36) is small and the expectation \(\langle \psi _t|B|\psi _t\rangle \) is close to a fixed value \(M_{\mu B}\) for most times \(t\in [0,T]\) and most initial states \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\). However, the times T required to make the right-hand side of (36) small are usually extremely large. For example, for a system of N particles, \(\mathcal {H}\) has dimension of the order \(\exp (N)\); provided that no eigenvalue is hugely degenerate, there are of the order \(\exp (N)\) energy eigenvalues. In order to obtain a small error, we need to keep \(G(\kappa )\) small. For \(\kappa \sim \exp (-N)\Delta E\), already the number of nearest-neighbor gaps with \(e-e'\in [0,\kappa )\) will be of order \(\exp (N)\), and will thus contribute of order \(\exp (N)\) to \(G(\kappa )\). So, we need \(\kappa \ll \exp (-N)\) and therefore \(T \gg \exp (N)\) to obtain a small error in (36).

For the proof of Theorems 2 and 4 we need, besides Hilbert space averages and variances, also Hilbert space covariances of two operators. The covariance of two complex random variables X, Y is to be understood as

$$\begin{aligned} \textrm{Cov}[X,Y]&:= \mathbb {E}\bigl [ (X-\mathbb {E}X)^* (Y- \mathbb {E}Y) \bigr ] \end{aligned}$$
(37)
$$\begin{aligned}&= \mathbb {E}[X^*Y]- (\mathbb {E}X)^* \, \mathbb {E}Y. \end{aligned}$$
(38)

Lemma 1

(Hilbert Space Covariance) For uniformly distributed \(\psi \in \mathbb {S}(\mathcal {H})\) with \(\dim \mathcal {H}=d\) and any two operators B, C on \(\mathcal {H}\),

$$\begin{aligned} \textrm{Cov}\Bigl [ \langle \psi |B|\psi \rangle , \langle \psi |C|\psi \rangle \Bigr ]&= \frac{{{\,\textrm{tr}\,}}(B^\dagger C)}{d(d+1)}- \frac{{{\,\textrm{tr}\,}}(B^\dagger ) {{\,\textrm{tr}\,}}(C)}{d^2(d+1)}. \end{aligned}$$
(39)

Put differently,

$$\begin{aligned} \mathbb {E}\bigl [ \langle \psi |B|\psi \rangle ^* \langle \psi |C|\psi \rangle \bigr ] = \frac{{{\,\textrm{tr}\,}}(B^\dagger ){{\,\textrm{tr}\,}}(C)+{{\,\textrm{tr}\,}}(B^\dagger C)}{d(d+1)}. \end{aligned}$$
(40)

By inserting \(\mathcal {H}_\mu \) for \(\mathcal {H}\), it follows that for uniformly distributed \(\psi \in \mathbb {S}(\mathcal {H}_\mu )\) and any two operators B, C on \(\mathcal {H}\),

$$\begin{aligned}&{\mathbb {E}}_\mu \big [\langle \psi |B^\dagger |\psi \rangle \langle \psi |C|\psi \rangle \big ] \nonumber \\&\quad =\frac{1}{d_\mu (d_\mu +1)} \big ({{\,\textrm{tr}\,}}(P_\mu B^\dagger ) {{\,\textrm{tr}\,}}(P_\mu C) + {{\,\textrm{tr}\,}}(P_\mu B^\dagger P_\mu C)\big ). \end{aligned}$$
(41)
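The identity (40) can be checked numerically by sampling. The following Python sketch compares a Monte Carlo estimate with the closed formula. The function names are ours, and the sampling relies on the standard fact that a normalized complex Gaussian vector is uniformly distributed on the unit sphere. The sample size and seed are arbitrary choices, so the agreement is only up to statistical error.

```python
import numpy as np

def haar_states(n, d, rng):
    """n unit vectors drawn uniformly from the unit sphere of C^d."""
    z = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def second_moment_formula(B, C):
    """Right-hand side of (40)."""
    d = B.shape[0]
    Bd = B.conj().T
    return (np.trace(Bd) * np.trace(C) + np.trace(Bd @ C)) / (d * (d + 1))

def second_moment_sampled(B, C, n=200_000, seed=0):
    """Monte Carlo estimate of E[ <psi|B|psi>^* <psi|C|psi> ]."""
    rng = np.random.default_rng(seed)
    psi = haar_states(n, B.shape[0], rng)
    b = np.einsum("ni,ij,nj->n", psi.conj(), B, psi)   # <psi|B|psi> for each sample
    c = np.einsum("ni,ij,nj->n", psi.conj(), C, psi)   # <psi|C|psi> for each sample
    return np.mean(b.conj() * c)
```

Since B and C need not be self-adjoint, both sides are complex in general, and the comparison should be made on the absolute value of the difference.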

6 Realistic Dimensions and Entropy

As indicated before, for a system of N particles or more generally of N degrees of freedom the dimension D is of order \(\exp (N)\). We actually expect \(D\approx \exp (s_\textrm{eq} N/k_\textrm{B})\), where \(s_\textrm{eq}\) is the entropy per particle in the thermal equilibrium state, and accordingly for all macro spaces \(\mathcal {H}_\mu \),

$$\begin{aligned} d_\mu = \exp (s_\mu N/k_\textrm{B}). \end{aligned}$$
(42)

The following corollary to Theorem 2 shows that in this situation and assuming that no eigenvalues or gaps are macroscopically degenerate, fluctuations of the time-dependent superposition weights around their expected values are exponentially small in the number of particles with a rate controlled by the entropy per particle in the initial macro state.

Corollary 1

Assume (42). Then, for all macro states \(\mu ,\nu _- ,\nu _+ \) with

$$\begin{aligned} s_{\nu _-} \le s_\mu \le s_{\nu _+} \end{aligned}$$
(43)

it holds for \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) for \((1-\delta )\)-most of the time that

$$\begin{aligned} \biggl | \Vert P_{\nu _+}\psi _t\Vert ^2 - M_{\mu \nu _+} \biggr |&\le \frac{4\sqrt{D_E D_G}}{\sqrt{\varepsilon \delta }} \,\exp \left( -\frac{s_\mu N}{2k_\textrm{B}}\right) , \end{aligned}$$
(44)
$$\begin{aligned} \biggl | \Vert P_{\nu _-}\psi _t\Vert ^2 - M_{\mu \nu _-} \biggr |&\le \frac{4\sqrt{D_E D_G}}{\sqrt{\varepsilon \delta }} \,\exp \left( -\frac{(s_\mu -\frac{s_{\nu _-}}{2})N}{k_\textrm{B}}\right) . \end{aligned}$$
(45)

In particular, if \(s_\mu ,s_{\nu _{\pm }}\) are fixed and \(N\rightarrow \infty \), the error bounds are exponentially small. Note also that the numerical experiment in Fig. 2 is consistent with the idea that the fluctuations of the superposition weights in macro spaces \(\nu _+\) of larger entropy than the initial state \(\mu \) are controlled by the entropy \(s_\mu \) of the initial macro state, while the fluctuations of the superposition weights in macro spaces \(\nu _-\) of smaller entropy than the initial state \(\mu \) are controlled by the entropy difference \(s_\mu - s_{\nu _-}/2\) and thus even smaller. However, from the green line in Fig. 2 (corresponding to \(\Vert P_{1}\psi _t\Vert ^2\)) it is also apparent that the fluctuations of \(\Vert P_{\nu }\psi _t\Vert ^2\) might exceed the value of \(M_{\mu \nu }\). Indeed, if we assume that the weights \(M_{\mu \nu }\) scale like in the case of normal typicality, i.e.,

$$\begin{aligned} M_{\mu \nu } \approx \frac{d_\nu }{D} \approx \exp \left( -\frac{s_\textrm{eq} - s_\nu }{k_\textrm{B}}N\right) \,, \end{aligned}$$
(46)

then the relative error in (44) is only small if \(s_{\nu _+} > s_\textrm{eq} - s_\mu /2\), and the relative error in (45) is only small if \(s_{\nu _-} > 2(s_\textrm{eq} - s_\mu )\).
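For completeness, the two conditions can be read off by dividing the exponential factors in (44) and (45) by the heuristic weight (46). This is only a sketch at the level of exponents: we treat (46) as an equality and ignore the prefactor \(4\sqrt{D_E D_G}/\sqrt{\varepsilon \delta }\).

$$\begin{aligned} \frac{\exp \left( -\frac{s_\mu N}{2k_\textrm{B}}\right) }{\exp \left( -\frac{(s_\textrm{eq}-s_{\nu _+})N}{k_\textrm{B}}\right) }&= \exp \left( -\frac{\bigl (s_{\nu _+}-(s_\textrm{eq}-\frac{s_\mu }{2})\bigr )N}{k_\textrm{B}}\right) , \\ \frac{\exp \left( -\frac{(s_\mu -\frac{s_{\nu _-}}{2})N}{k_\textrm{B}}\right) }{\exp \left( -\frac{(s_\textrm{eq}-s_{\nu _-})N}{k_\textrm{B}}\right) }&= \exp \left( -\frac{\bigl (\frac{s_{\nu _-}}{2}-(s_\textrm{eq}-s_\mu )\bigr )N}{k_\textrm{B}}\right) . \end{aligned}$$

Both ratios decay as \(N\rightarrow \infty \) exactly when the bracket in the exponent is positive, which gives the two stated conditions.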

More generally, the question remains under which conditions one can prove that even for \(M_{\mu \nu }\) close to 0, the relative error in (17) and thus the relative deviation of \(\Vert P_\nu \psi _t\Vert ^2\) from \(M_{\mu \nu }\) will be small. In a separate work [29], we study this question for specific distributions of the random matrix H.

7 Outline of Proof of Theorem 4

Before we provide the technical details of the proof of Theorem 4 in Sect. 8, we explain now the main strategy and the key ideas. The first step is to control the time variance

$$\begin{aligned} \left\langle \bigl |\langle \psi _t|B|\psi _t\rangle -M_{\psi _0 B}\bigr |^2\right\rangle _T := \frac{1}{T} \int _0^T \!\! \bigl |\langle \psi _t|B|\psi _t\rangle -M_{\psi _0 B}\bigr |^2 \,dt \end{aligned}$$
(47)

of the quantity \(\langle \psi _t|B|\psi _t\rangle \), where

$$\begin{aligned} M_{\psi _0 B} =\overline{\langle \psi _t|B|\psi _t\rangle } := \lim _{T\rightarrow \infty } \frac{1}{T} \int _0^T\langle \psi _t|B|\psi _t\rangle \; dt \end{aligned}$$
(48)

is just the time-average of \(\langle \psi _t|B|\psi _t\rangle \). The time variance (47) was the subject of several earlier investigations concerning thermalization in closed quantum systems. It is usually controlled in terms of the effective dimension [19, 24, 25]

$$\begin{aligned} d_{eff } := \Bigl (\sum _e \langle \psi _0|\Pi _e|\psi _0\rangle ^2 \Bigr )^{-1} \end{aligned}$$
(49)

of the initial state \(\psi _0\), a measure for the number of distinct energies that contribute significantly to \(\psi _0\). In Sect. 8.7 we slightly improve the bound of [25] (relevant when \(d_\nu \ll d_\mu \)) so that we can show that, after averaging the initial state over \({\mathbb {S}}(\mathcal {H}_\mu )\), one obtains that

$$\begin{aligned}&{\mathbb {E}}_\mu \Bigl [ \left\langle \bigl |\langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B}\bigr |^2\right\rangle _T \Bigr ]\\&\quad \le \frac{2D_E G(\kappa )}{d_\mu +1} \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert ^2, \frac{{{\,\textrm{tr}\,}}(B^\dagger B)}{d_\mu }\right\} .\nonumber \end{aligned}$$
(50)

The second step is to show that \(M_{\psi _0 B}\) is very close to \(M_{\mu B}\) for most states \(\psi _0\in \mathbb {S}(\mathcal {H}_\mu )\). To this end we observe that \( {\mathbb {E}}_\mu (M_{\psi _0 B})= M_{\mu B} \) and then bound the variance according to

$$\begin{aligned} {\mathbb {E}}_\mu \Bigl [ \bigl |M_{\psi _0 B} - M_{\mu B}\bigr |^2 \Bigr ]&\le \frac{\Vert B\Vert }{d_\mu +1} \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} .\end{aligned}$$
(51)

A careful application of Markov’s inequality then shows that (50) and (51) together imply (36).

8 Remaining Proofs

8.1 Proof of Theorem 3

The phenomenon of concentration of measure, i.e., that on a sphere in high dimension, “nice” functions are nearly constant, is often expressed by means of (e.g., [28, Sec. II.C])

Lemma 2

(Lévy’s Lemma) For any Hilbert space \(\mathcal {H}\) with dimension d, any \(f:\mathbb {S}(\mathcal {H})\rightarrow \mathbb {R}\) with Lipschitz constant \(\eta (f)\), and any \(\varepsilon >0\),

$$\begin{aligned} |f(\psi )-\mathbb {E}f| \le \sqrt{\frac{9\pi ^3 \log (4/\varepsilon )}{2d}}\, \eta (f) \end{aligned}$$
(52)

for \((1-\varepsilon )\)-most \(\psi \in \mathbb {S}(\mathcal {H})\).

Alternatively, Chebyshev’s inequality yields that

$$\begin{aligned} |f(\psi )-\mathbb {E}f| \le \sqrt{\frac{{{\,\textrm{Var}\,}}(f)}{\varepsilon }} \end{aligned}$$
(53)

for \((1-\varepsilon )\)-most \(\psi \in \mathbb {S}(\mathcal {H})\). In the important special case \(f\ge 0\), Markov’s inequality yields that

$$\begin{aligned} f(\psi ) \le \frac{\mathbb {E}f}{\varepsilon } \end{aligned}$$
(54)

for \((1-\varepsilon )\)-most \(\psi \in \mathbb {S}(\mathcal {H})\), while Lévy’s lemma can be used in this situation to obtain that

$$\begin{aligned} f(\psi ) \le \mathbb {E}f + \sqrt{\frac{9\pi ^3 \log (4/\varepsilon )}{2d}}\, \eta (f). \end{aligned}$$
(55)

Which bound is best depends on \(\eta (f)\), \({{\,\textrm{Var}\,}}(f)\), and \(\mathbb {E}f\). For quadratic functions \(f(\psi )=\langle \psi |B|\psi \rangle \), \(\eta (f)= 2\Vert B\Vert \) on \(\mathbb {S}(\mathcal {H})\), while expectation and variance are given by (19) and (20); the first two bounds in (31) arise from the Chebyshev bound (53) with different ways of bounding the variance, and the third from Lévy’s lemma (52).

As remarked already, the other results in this paper are not improved by using Lévy’s lemma instead of Markov’s and Chebyshev’s inequality. That is basically because the relevant functions \(f\ge 0\) have means that are small like 1/dimension but Lipschitz constants of order 1, so that (55) yields errors of order \(1/\sqrt{\text {dimension}}\). Now it is of little interest to make \(\varepsilon \) smaller than \(10^{-200}\). (Borel once argued [3, Chap. 6] that events with a probability of \(10^{-200}\) or less can be expected to never occur in the history of the universe.) On the other hand, the dimensions are large like \(10^N\), so the advantage of (55) over (54) in \(\varepsilon \) does not compensate for its disadvantage in the dimension.
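The comparison between (54) and (55) can be made quantitative on a logarithmic scale. The following Python sketch is an illustration under assumptions that are ours: \(\mathbb {E}f = 10^{-N}\), dimension \(d=10^N\), Lipschitz constant \(\eta =1\), and the logarithm in (52) taken as the natural logarithm. The function names are hypothetical. Working with \(\log _{10}\) of the bounds avoids floating-point underflow for large N.

```python
import math

def log10_markov(N, log10_eps):
    """log10 of the Markov bound (54), E f / eps, for E f = 10**(-N)."""
    return -N - log10_eps

def log10_levy(N, log10_eps, eta=1.0):
    """log10 of the Levy bound (55) for E f = 10**(-N) and d = 10**N."""
    log_term = math.log(4.0) - log10_eps * math.log(10.0)      # natural log of 4/eps
    log10_fluct = 0.5 * (math.log10(9 * math.pi**3 * log_term / 2) - N) + math.log10(eta)
    hi, lo = max(-N, log10_fluct), min(-N, log10_fluct)
    return hi + math.log10(1.0 + 10.0 ** (lo - hi))            # log10(10**hi + 10**lo)
```

With \(\varepsilon =10^{-200}\), the Lévy bound is the smaller one for N = 100, whereas for N = 1000 the Markov bound is smaller by roughly 300 orders of magnitude, in line with the remark above.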

Proof of Theorem 3

By (19) after inserting \(\mathcal {H}_\mu \) for \(\mathcal {H}\) and \(P_\mu \exp (iHt) B \exp (-iHt) P_\mu \) for B,

$$\begin{aligned} \mathbb {E}_\mu \langle \psi _t|B|\psi _t \rangle = \frac{1}{d_\mu } {{\,\textrm{tr}\,}}\bigl ( P_\mu \exp (iHt) B \exp (-iHt) \bigr ) = w_{\mu B}(t). \end{aligned}$$
(56)

Lévy’s lemma with \(\eta =2\Vert B\Vert \) yields the third bound in (31).

By (21) after inserting \(\mathcal {H}_\mu \) for \(\mathcal {H}\) and \(P_\mu \exp (iHt) B \exp (-iHt) P_\mu \) for B,

$$\begin{aligned} {{\,\textrm{Var}\,}}_\mu \langle \psi _t|B|\psi _t \rangle \le \frac{1}{d_\mu ^2}{{\,\textrm{tr}\,}}\Bigl ( P_\mu \exp (iHt) B^\dagger \exp (-iHt) P_\mu \exp (iHt) B \exp (-iHt) P_\mu \Bigr ). \end{aligned}$$
(57)

We give two upper bounds for the last expression. First, using \(|{{\,\textrm{tr}\,}}(CD)| \le \Vert C\Vert {{\,\textrm{tr}\,}}(|D|)\) and \(\Vert B^\dagger \Vert = \Vert B\Vert \),

$$\begin{aligned} (57)&\le \frac{1}{d_\mu ^2} \Vert P_\mu \Vert \Vert \exp (-iHt)\Vert \cdots \Vert \exp (-iHt)\Vert {{\,\textrm{tr}\,}}P_\mu \end{aligned}$$
(58)
$$\begin{aligned}&= \frac{1}{d_\mu ^2} \Vert B\Vert ^2 d_\mu = \frac{\Vert B\Vert ^2}{d_\mu }. \end{aligned}$$
(59)

Second, by leaving B rather than \(P_\mu \) inside the trace,

$$\begin{aligned} (57) \le \frac{1}{d_\mu ^2} \Vert B\Vert {{\,\textrm{tr}\,}}(|B|). \end{aligned}$$
(60)

From these two bounds on the variance, (53) yields the first two bounds in (31). For the second claim, (32), of Theorem 3, the proof works as for Theorem 1 with the bound (59) for \({{\,\textrm{Var}\,}}_\mu \langle \psi _t|B|\psi _t \rangle \). \(\square \)

8.2 Probability Current

In order to see that also the probability current \(J_{\nu \nu '}(t)\) as defined in (9) is deterministic, we verify that \(\langle \psi _t|P_\nu H P_{\nu '}|\psi _t \rangle \) is deterministic. This can be obtained in the same way as for Theorem 3 by considering \(B=P_\mu \exp (iHt) P_\nu H P_{\nu '} \exp (-iHt) P_\mu \) instead of \(B=P_\mu \exp (iHt) P_\nu \exp (-iHt) P_\mu \) and noting that \(\Vert B\Vert \le \Vert H\Vert = \max \{|E-\Delta E|, |E|\}\). Physically, we expect E to be comparable to the particle number N and thus of order \(\log D\), so \(|J_{\nu \nu '}(t)-\mathbb {E}J_{\nu \nu '}(t)|\) is bounded by a constant times \(\log D/\sqrt{\varepsilon d_\mu }\) (which would be small if we imagine \(d_\mu \sim D^\alpha \) with \(0<\alpha < 1\) and fixed \(\varepsilon \)) for \((1-\varepsilon )\)-most \(\psi _0 \in \mathbb {S}(\mathcal {H}_\mu )\). Likewise, 1/T times the squared \(L^2\) norm of \(J_{\nu \nu '}(t)-\mathbb {E}J_{\nu \nu '}(t)\) over [0, T] is bounded by a constant times \(\log ^2 D/(\varepsilon d_\mu )\) (which should be small).

8.3 Hilbert Space Covariance

For the proof of Lemma 1, we need the fourth moments of a random vector that is uniformly distributed over the unit sphere. So consider any Hilbert space \(\mathcal {H}\) of dimension d and a uniformly distributed \(\psi \in \mathbb {S}(\mathcal {H})\). Let \(\left( \varphi _m\right) _{m}\) be an orthonormal basis of \(\mathcal {H}\) and \(a_m := \langle \varphi _m|\psi \rangle \). Then [17, 5, App. A.2 and C.1]

$$\begin{aligned} (i)&\, {\mathbb {E}}(a_k^* a_l a_m^* a_n) = 0 \quad \text{ if } \text{ an } \text{ index } \text{ occurs } \text{ only } \text{ once }, \end{aligned}$$
(61a)
$$\begin{aligned} (ii)&\, {\mathbb {E}}\left( a_k^{*2} a_l^2\right) =0 \quad \text{ for }\; k\ne l, \end{aligned}$$
(61b)
$$\begin{aligned} (iii)&\, {\mathbb {E}}\left( |a_k|^4\right) = \frac{2}{d(d+1)}, \end{aligned}$$
(61c)
$$\begin{aligned} (iv)&\, {\mathbb {E}}\left( |a_k|^2 |a_l|^2\right) = \frac{1}{d(d+1)} \quad \text{ for }\; k \ne l. \end{aligned}$$
(61d)

Proof of Lemma 1

Let \((\varphi _m)_m\) be an orthonormal basis of \(\mathcal {H}\). Then we can write \(\psi \in {\mathbb {S}}(\mathcal {H})\) as

$$\begin{aligned} \psi = \sum _m a_m \varphi _m \end{aligned}$$
(62)

with coefficients \(a_m = \langle \varphi _m|\psi \rangle \). By (61), we get that

$$\begin{aligned} {\mathbb {E}}\bigl [\langle \psi |B|\psi \rangle ^* \langle \psi |C|\psi \rangle \bigr ]&= \sum _{k,l,k',l'} \langle \varphi _k|B^\dagger |\varphi _l\rangle \langle \varphi _{k'}|C|\varphi _{l'}\rangle {\mathbb {E}}\left( a_{k}^* a_l a_{k'}^* a_{l'}\right) \end{aligned}$$
(63)
$$\begin{aligned}&= \frac{1}{d(d+1)} \sum _{k,l,k',l'} \langle \varphi _k|B^\dagger |\varphi _l\rangle \langle \varphi _{k'}|C|\varphi _{l'}\rangle \left( \delta _{kl} \delta _{k'l'}+\delta _{kl'}\delta _{k'l}\right) \end{aligned}$$
(64)
$$\begin{aligned}&= \frac{1}{d(d+1)} \left( \sum _{k,l} \langle \varphi _k|B^\dagger |\varphi _k\rangle \langle \varphi _l|C|\varphi _l\rangle + \langle \varphi _k|B^\dagger |\varphi _l\rangle \langle \varphi _l|C|\varphi _k\rangle \right) \end{aligned}$$
(65)
$$\begin{aligned}&= \frac{1}{d(d+1)} \bigl ({{\,\textrm{tr}\,}}(B^\dagger ) {{\,\textrm{tr}\,}}(C) + {{\,\textrm{tr}\,}}(B^\dagger C)\bigr ). \end{aligned}$$
(66)

Thus,

$$\begin{aligned} \textrm{Cov}\bigl [ \langle \psi |B|\psi \rangle , \langle \psi |C|\psi \rangle \bigr ]&= \mathbb {E}\bigl [ \langle \psi |B|\psi \rangle ^* \langle \psi |C|\psi \rangle \bigr ] - \mathbb {E}\bigl [ \langle \psi |B|\psi \rangle ^*\bigr ] \, \mathbb {E}\bigl [ \langle \psi |C|\psi \rangle \bigr ] \end{aligned}$$
(67)
$$\begin{aligned}&= \frac{{{\,\textrm{tr}\,}}(B^\dagger ) {{\,\textrm{tr}\,}}(C) + {{\,\textrm{tr}\,}}(B^\dagger C)}{d(d+1)} -\frac{{{\,\textrm{tr}\,}}(B^\dagger ) {{\,\textrm{tr}\,}}(C)}{d^2} \end{aligned}$$
(68)
$$\begin{aligned}&= \frac{{{\,\textrm{tr}\,}}(B^\dagger C)}{d(d+1)} - \frac{{{\,\textrm{tr}\,}}(B^\dagger ) {{\,\textrm{tr}\,}}(C)}{d^2(d+1)}. \end{aligned}$$
(69)

\(\square \)

8.4 Computing and Estimating some Averages over \(\varvec{{\mathbb {S}}(\mathcal {H}_\mu )}\)

As a preparation for the proof of Theorem 4, we derive in this section some upper bounds for relevant time and Hilbert space variances. We first note that it is well known that the limit in

$$\begin{aligned} M_{\psi _0 B} = \overline{\langle \psi _t|B|\psi _t\rangle } := \lim _{T\rightarrow \infty } \frac{1}{T}\int _0^T \langle \psi _t|B|\psi _t\rangle \, dt \end{aligned}$$
(70)

exists for all B and is given by

$$\begin{aligned} M_{\psi _0 B} = \langle \psi _0|\sum _{e\in {\mathcal {E}}}\Pi _e B \Pi _e|\psi _0 \rangle . \end{aligned}$$
(71)

From (19), applied to \(\mathcal {H}_\mu \), we then obtain that

$$\begin{aligned} {\mathbb {E}}_\mu M_{\psi _0 B} = \frac{1}{d_\mu }\sum _{e\in {\mathcal {E}}} {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e)= M_{\mu B}. \end{aligned}$$
(72)
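The formula (71) for the infinite-time average can be illustrated numerically for a small matrix. The following Python sketch evaluates the finite-time average in closed form, using \(\langle e^{i\omega t}\rangle _T = (e^{i\omega T}-1)/(i\omega T)\) for \(\omega \ne 0\), and compares it with the dephased expectation. The function names and the tolerance `tol` used to decide when two eigenvalues are regarded as equal are our assumptions.

```python
import numpy as np

def _energy_basis(H, B, psi0):
    """Eigenvalues of H, coefficients of psi0 and matrix of B in the eigenbasis."""
    evals, U = np.linalg.eigh(H)
    return evals, U.conj().T @ psi0, U.conj().T @ B @ U

def dephased_expectation(H, B, psi0, tol=1e-9):
    """<psi0| sum_e Pi_e B Pi_e |psi0>, the right-hand side of (71)."""
    evals, c, Bt = _energy_basis(H, B, psi0)
    same = np.abs(evals[:, None] - evals[None, :]) <= tol   # pairs within one eigenspace
    return np.sum(c.conj()[:, None] * np.where(same, Bt, 0) * c[None, :])

def finite_time_average(H, B, psi0, T, tol=1e-9):
    """(1/T) * integral over [0, T] of <psi_t|B|psi_t>, in closed form."""
    evals, c, Bt = _energy_basis(H, B, psi0)
    w = evals[:, None] - evals[None, :]                     # e - e'
    same = np.abs(w) <= tol
    w_safe = np.where(same, 1.0, w)                         # avoid division by zero
    phase = np.where(same, 1.0, (np.exp(1j * w_safe * T) - 1) / (1j * w_safe * T))
    return np.sum(c.conj()[:, None] * Bt * phase * c[None, :])
```

The difference between the two quantities is of order \(1/T\) times the inverse of the smallest nonzero gap, so it becomes small only for T large on the scale of the inverse gaps.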

Proposition 1

Let \(\psi _0\) be uniformly distributed in \({\mathbb {S}}(\mathcal {H}_\mu )\), and let B be any operator on \(\mathcal {H}\). Then for every \(\kappa , T>0\),

$$\begin{aligned} {\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle }\right| ^2\right\rangle _T\right)&\le \frac{2D_E G(\kappa )}{d_\mu +1} \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert ^2, \frac{{{\,\textrm{tr}\,}}(B^\dagger B)}{d_\mu }\right\} , \end{aligned}$$
(73)
$$\begin{aligned} {{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle }&\le \frac{\Vert B\Vert }{d_\mu +1} \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} . \end{aligned}$$
(74)

Proof

We start similarly to the proof of Theorem 1 in [25] and compute

$$\begin{aligned}&\left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle }\right| ^2 \right\rangle _T = \left\langle \left| \sum _{e,e'} e^{i(e-e')t} \langle \psi _0 |\Pi _e B \Pi _{e'}|\psi _0\rangle - \sum _e \langle \psi _0| \Pi _e B \Pi _e|\psi _0\rangle \right| ^2 \right\rangle _T \end{aligned}$$
(75)
$$\begin{aligned}&\quad = \left\langle \left| \sum _{e\ne e'} e^{i(e-e')t} \langle \psi _0| \Pi _e B \Pi _{e'}|\psi _0\rangle \right| ^2 \right\rangle _T \end{aligned}$$
(76)
$$\begin{aligned}&\quad = \sum _{\begin{array}{c} e\ne e'\\ e''\ne e''' \end{array}} \left\langle e^{i(e-e'-e''+e''')t}\right\rangle _T \langle \psi _0| \Pi _e B \Pi _{e'}|\psi _0\rangle \langle \psi _0| \Pi _{e'''} B^\dagger \Pi _{e''}|\psi _0\rangle . \end{aligned}$$
(77)

By averaging over \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\), we obtain

$$\begin{aligned}&{\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T \right) \nonumber \\&= \sum _{\begin{array}{c} e\ne e'\\ e''\ne e''' \end{array}} \left\langle e^{i(e-e'-e''+e''')t} \right\rangle _T {\mathbb {E}}_\mu \Bigl [\langle \psi _0| \Pi _e B \Pi _{e'}|\psi _0\rangle \langle \psi _0| \Pi _{e'''} B^\dagger \Pi _{e''}|\psi _0\rangle \Bigr ] \end{aligned}$$
(78)
$$\begin{aligned}&= \frac{1}{d_\mu (d_\mu +1)} \sum _{\begin{array}{c} e\ne e'\\ e''\ne e''' \end{array}} \left\langle e^{i(e-e'-e''+e''')t} \right\rangle _T \Bigl [{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'}) {{\,\textrm{tr}\,}}(P_\mu \Pi _{e'''} B^\dagger \Pi _{e''}) \nonumber \\&\quad + {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'} P_\mu \Pi _{e'''} B^\dagger \Pi _{e''})\Bigr ], \end{aligned}$$
(79)

where we applied Lemma 1 in the form (41) in the second equality.

Next we compute the ensemble variance of \(\overline{\langle \psi _t|B|\psi _t\rangle }\): By (71) and (20) for \(\mathcal {H}_\mu \),

$$\begin{aligned}&{{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle } \nonumber \\&= \frac{1}{d_\mu (d_\mu +1)} \sum _{e,e'} {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e P_\mu \Pi _{e'} B^\dagger \Pi _{e'}) - \frac{1}{d_\mu ^2(d_\mu +1)} \left| \sum _e {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e)\right| ^2. \end{aligned}$$
(80)

In the rest of the proof we use the computed expressions to prove the upper bounds for \({\mathbb {E}}_\mu \left( \langle \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B}\right| ^2 \rangle _T \right) \) and \( {{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle }\). To this end, we define the vector v with components \(v_\alpha := \langle \psi _0| \Pi _{e'} B^\dagger \Pi _e|\psi _0\rangle \) for \(\alpha = (e,e') \in {\mathcal {G}} := \{({\bar{e}},{\bar{e}}')\in {\mathcal {E}}\times {\mathcal {E}}: {\bar{e}}\ne {\bar{e}}'\}\). Moreover, we define the Hermitian matrix

$$\begin{aligned} R_{\alpha \beta } := \left\langle e^{i(G_\alpha -G_\beta )t} \right\rangle _T \end{aligned}$$
(81)

with \(G_\alpha :=e-e'\) for \(\alpha =(e,e')\). Then we obtain with (77) that

$$\begin{aligned} \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle }\right| ^2 \right\rangle _T&= \sum _{\alpha ,\beta } v_\alpha ^* R_{\alpha \beta } v_\beta \end{aligned}$$
(82)
$$\begin{aligned}&\le \Vert R\Vert \sum _{\alpha } |v_\alpha |^2 \end{aligned}$$
(83)
$$\begin{aligned}&= \Vert R\Vert \sum _{e\ne e'} \bigl |\langle \psi _0|\Pi _e B \Pi _{e'}|\psi _0\rangle \bigr |^2 \end{aligned}$$
(84)

and thus

$$\begin{aligned}&{\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T\right) \nonumber \\&\le \Vert R\Vert \sum _{e,e'} {\mathbb {E}}_\mu \bigl [\langle \psi _0|\Pi _e B \Pi _{e'}|\psi _0\rangle \langle \psi _0|\Pi _{e'} B^\dagger \Pi _e|\psi _0\rangle \bigr ] \end{aligned}$$
(85)
$$\begin{aligned}&= \frac{\Vert R\Vert }{d_\mu (d_\mu +1)} \sum _{e,e'} \left[ \left| {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'})\right| ^2 + {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger \Pi _e)\right] \end{aligned}$$
(86)

by (41). Short and Farrelly [25] showed for arbitrary \(\kappa >0\) and \(T>0\) that

$$\begin{aligned} \Vert R\Vert \le G(\kappa ) \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) . \end{aligned}$$
(87)
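For a small spectrum, the matrix R from (81) can be written down explicitly, since \(\langle e^{i\omega t}\rangle _T = (e^{i\omega T}-1)/(i\omega T)\) for \(\omega \ne 0\). The following Python sketch computes its operator norm and the right-hand side of (87). The function names are ours, and we assume here that \(d_E\) denotes the number of distinct eigenvalues. Entries with \(|\omega T|\) below a small threshold are set to 1 to avoid numerical cancellation.

```python
import numpy as np

def gap_matrix_norm(eigenvalues, T):
    """Operator norm of R from (81) and the list of gaps G_alpha."""
    e = np.unique(np.asarray(eigenvalues, dtype=float))
    diff = e[:, None] - e[None, :]
    gaps = diff[~np.eye(len(e), dtype=bool)]        # G_alpha for alpha = (e, e'), e != e'
    w = gaps[:, None] - gaps[None, :]               # G_alpha - G_beta
    small = np.abs(w * T) < 1e-8                    # treat as equal gaps
    w_safe = np.where(small, 1.0, w)
    R = np.where(small, 1.0, (np.exp(1j * w_safe * T) - 1) / (1j * w_safe * T))
    return np.linalg.norm(R, 2), gaps

def short_farrelly_bound(gaps, d_E, kappa, T):
    """Right-hand side of (87), with G(kappa) from a sliding window over the sorted gaps."""
    g = np.sort(gaps)
    counts = np.searchsorted(g, g + kappa, side="left") - np.searchsorted(g, g, side="left")
    return int(counts.max()) * (1 + 8 * np.log2(d_E) / (kappa * T))
```

Since R has ones on the diagonal and is positive semidefinite, its norm is at least 1. The bound (87) is informative when \(G(\kappa )\) is much smaller than the number of gaps.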

Moreover, we estimate

$$\begin{aligned} \sum _{e,e'} |{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'})|^2&= \sum _{e,e'} |{{\,\textrm{tr}\,}}(\Pi _{e'} P_\mu \Pi _e \Pi _e B \Pi _{e'})|^2 \end{aligned}$$
(88)
$$\begin{aligned}&\le \sum _{e,e'} \underbrace{{{\,\textrm{tr}\,}}(\Pi _{e'} P_\mu \Pi _e P_\mu )}_{\le {{\,\textrm{tr}\,}}(\Pi _{e'})\le D_E} {{\,\textrm{tr}\,}}(\Pi _{e'} B^\dagger \Pi _e B) \end{aligned}$$
(89)
$$\begin{aligned}&\le D_E {{\,\textrm{tr}\,}}(B^\dagger B), \end{aligned}$$
(90)

where we used the Cauchy-Schwarz inequality for operators A, B with scalar product \({{\,\textrm{tr}\,}}(A^\dagger B)\) and that \(|{{\,\textrm{tr}\,}}(CD)|\le \Vert C\Vert {{\,\textrm{tr}\,}}(|D|)\). Similarly we find that

$$\begin{aligned} \sum _{e,e'} |{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'})|^2&\le \sum _{e,e'} {{\,\textrm{tr}\,}}(\Pi _{e'}P_\mu \Pi _e P_\mu ) \underbrace{{{\,\textrm{tr}\,}}(\Pi _{e'} B^\dagger \Pi _e B)}_{\le {{\,\textrm{tr}\,}}(\Pi _{e'})\Vert B\Vert ^2} \end{aligned}$$
(91)
$$\begin{aligned}&\le D_E \Vert B\Vert ^2 d_\mu . \end{aligned}$$
(92)

This shows that

$$\begin{aligned} \sum _{e,e'} |{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'})|^2&\le D_E \min \{\Vert B\Vert ^2 d_\mu , {{\,\textrm{tr}\,}}(B^\dagger B)\}. \end{aligned}$$
(93)

Next we compute

$$\begin{aligned} \sum _{e,e'} {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger \Pi _e)&= \sum _e {{\,\textrm{tr}\,}}\left( \Pi _e P_\mu \Pi _e B \left( \sum _{e'} \Pi _{e'}P_\mu \Pi _{e'}\right) B^\dagger \right) \end{aligned}$$
(94)
$$\begin{aligned}&\qquad \le \sum _{e} {{\,\textrm{tr}\,}}(\Pi _e P_\mu \Pi _e) \left\| B \left( \sum _{e'} \Pi _{e'} P_\mu \Pi _{e'}\right) B^\dagger \right\| \end{aligned}$$
(95)
$$\begin{aligned}&\qquad \le \Vert B\Vert ^2 \sum _{e} {{\,\textrm{tr}\,}}(\Pi _e P_\mu ) \end{aligned}$$
(96)
$$\begin{aligned}&\qquad = \Vert B\Vert ^2 d_\mu , \end{aligned}$$
(97)

where we used in the third line that \(\Vert \sum _{e'} \Pi _{e'}P_\mu \Pi _{e'}\Vert \le 1\), which follows immediately from

$$\begin{aligned} \left\| \sum _{e'} \Pi _{e'}P_\mu \Pi _{e'} \psi _0 \right\| ^2 = \sum _{e'} \Vert \Pi _{e'}P_\mu \Pi _{e'}\psi _0\Vert ^2&\le \sum _{e'} \Vert \Pi _{e'}\psi _0\Vert ^2 = \Vert \psi _0\Vert ^2. \end{aligned}$$
(98)

Similarly we estimate

$$\begin{aligned} \sum _{e,e'} {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger \Pi _e)&= \sum _{e'} {{\,\textrm{tr}\,}}\left( \left( \sum _e \Pi _e P_\mu \Pi _e\right) B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger \right) \end{aligned}$$
(99)
$$\begin{aligned}&\qquad \le \sum _{e'} {{\,\textrm{tr}\,}}(B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger ) \end{aligned}$$
(100)
$$\begin{aligned}&\qquad = \sum _{e'} {{\,\textrm{tr}\,}}(\Pi _{e'}B^\dagger B \Pi _{e'} \Pi _{e'} P_\mu \Pi _{e'}) \end{aligned}$$
(101)
$$\begin{aligned}&\qquad \le \sum _{e'} {{\,\textrm{tr}\,}}(\Pi _{e'} B^\dagger B) \end{aligned}$$
(102)
$$\begin{aligned}&\qquad = {{\,\textrm{tr}\,}}(B^\dagger B). \end{aligned}$$
(103)

The previous two estimates show that

$$\begin{aligned} \sum _{e,e'} {{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _{e'} P_\mu \Pi _{e'} B^\dagger \Pi _e) \le \min \{\Vert B\Vert ^2 d_\mu , {{\,\textrm{tr}\,}}(B^\dagger B)\}. \end{aligned}$$
(104)

Putting everything together, we arrive at the upper bound

$$\begin{aligned} {\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T\right) \le \frac{2D_E G(\kappa )}{d_\mu +1} \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert ^2, \frac{{{\,\textrm{tr}\,}}(B^\dagger B)}{d_\mu }\right\} . \end{aligned}$$
(105)

Finally we turn to the upper bound for \({{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle }\). To this end, we estimate

$$\begin{aligned} \sum _{e,e'}{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e P_\mu \Pi _{e'} B^\dagger \Pi _{e'})&= {{\,\textrm{tr}\,}}\left( P_\mu \left( \sum _e \Pi _e B \Pi _e\right) P_\mu \left( \sum _{e'} \Pi _{e'} B^\dagger \Pi _{e'}\right) \right) \end{aligned}$$
(106)
$$\begin{aligned}&\le {{\,\textrm{tr}\,}}(P_\mu ) \left\| \left( \sum _e \Pi _e B \Pi _e\right) P_\mu \left( \sum _{e'} \Pi _{e'} B^\dagger \Pi _{e'}\right) \right\| \end{aligned}$$
(107)
$$\begin{aligned}&\le d_\mu \Vert B\Vert ^2 \end{aligned}$$
(108)

and

$$\begin{aligned} \sum _{e,e'}{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e P_\mu \Pi _{e'} B^\dagger \Pi _{e'})&= {{\,\textrm{tr}\,}}\left( B \left( \sum _e \Pi _e P_\mu \left( \sum _{e'} \Pi _{e'} B^\dagger \Pi _{e'}\right) P_\mu \Pi _e\right) \right) \end{aligned}$$
(109)
$$\begin{aligned}&\le {{\,\textrm{tr}\,}}(|B|) \left\| \sum _e \Pi _e P_\mu \left( \sum _{e'} \Pi _{e'} B^\dagger \Pi _{e'}\right) P_\mu \Pi _e\right\| \end{aligned}$$
(110)
$$\begin{aligned}&\le {{\,\textrm{tr}\,}}(|B|) \left\| P_\mu \left( \sum _{e'} \Pi _{e'}B^\dagger \Pi _{e'}\right) P_\mu \right\| \end{aligned}$$
(111)
$$\begin{aligned}&\le {{\,\textrm{tr}\,}}(|B|)\Vert B\Vert . \end{aligned}$$
(112)

This shows that

$$\begin{aligned} \sum _{e,e'}{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e P_\mu \Pi _{e'} B^\dagger \Pi _{e'})&\le \Vert B\Vert \min \left\{ d_\mu \Vert B\Vert , {{\,\textrm{tr}\,}}(|B|)\right\} \end{aligned}$$
(113)

and thus

$$\begin{aligned} {{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle }&\le \frac{1}{d_\mu (d_\mu +1)} \sum _{e,e'}{{\,\textrm{tr}\,}}(P_\mu \Pi _e B \Pi _e P_\mu \Pi _{e'} B^\dagger \Pi _{e'}) \end{aligned}$$
(114)
$$\begin{aligned}&\le \frac{\Vert B\Vert }{d_\mu +1} \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} . \end{aligned}$$
(115)

\(\square \)

8.5 Proofs of Theorems 2 and 4

Theorem 2 follows immediately from Theorem 4 by setting \(B=P_\nu \), choosing \(\kappa \) small enough such that \(G(\kappa )=D_G\), and then taking the limit \(T\rightarrow \infty \).

Proof of Theorem 4

Markov’s inequality implies

$$\begin{aligned}&{\mathbb {P}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B} \right| ^2 \right\rangle _T \ge \frac{4D_E G(\kappa ) \Vert B\Vert }{\varepsilon d_\mu } \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} \right) \nonumber \\&\qquad \le \frac{{\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B}\right| ^2 \right\rangle _T\right) }{4 D_E G(\kappa ) \Vert B\Vert \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} } \varepsilon d_\mu \end{aligned}$$
(116)
$$\begin{aligned}&\qquad \le \frac{\min \left\{ \Vert B\Vert ^2, \frac{{{\,\textrm{tr}\,}}(B^\dagger B)}{d_\mu }\right\} }{2\Vert B\Vert \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} }\varepsilon \end{aligned}$$
(117)
$$\begin{aligned}&\qquad \le \frac{\varepsilon }{2}, \end{aligned}$$
(118)

where we used the bounds from Proposition 1 and that \({{\,\textrm{tr}\,}}(B^\dagger B) \le \Vert B\Vert {{\,\textrm{tr}\,}}(|B|)\). This means that for \((1-\frac{\varepsilon }{2})\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\),

$$\begin{aligned} \left\langle \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B} \right| ^2 \right\rangle _T < \frac{4D_E G(\kappa ) \Vert B\Vert }{\varepsilon d_\mu } \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} . \end{aligned}$$
(119)

Again with the help of Markov’s inequality we obtain that, with \(\lambda \) the Lebesgue measure on \({\mathbb {R}}\),

$$\begin{aligned}&\frac{\lambda \Bigl \{t\in [0,T]:\left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B} \right| ^2 \ge \frac{4D_E G(\kappa ) \Vert B\Vert }{\delta \varepsilon d_\mu } \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} \Bigr \}}{T} \end{aligned}$$
(120)
$$\begin{aligned}&\qquad \le \frac{\delta \varepsilon d_\mu \left\langle \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B} \right| ^2\right\rangle _T}{4D_E G(\kappa ) \Vert B\Vert \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu } \right\} } \end{aligned}$$
(121)
$$\begin{aligned}&\qquad \le \delta . \end{aligned}$$
(122)

This shows that for \((1-\frac{\varepsilon }{2})\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) we have for \((1-\delta )\)-most \(t\in [0,T]\) that

$$\begin{aligned} \left| \langle \psi _t|B|\psi _t\rangle - M_{\psi _0 B}\right| \le 2\left( \frac{D_E G(\kappa ) \Vert B\Vert }{\delta \varepsilon d_\mu }\left( 1+\frac{8\log _2 d_E}{\kappa T}\right) \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} \right) ^{1/2}. \end{aligned}$$
(123)

Next we prove in a similar way an upper bound for \(|M_{\psi _0 B}-M_{\mu B}|\), keeping in mind that \(M_{\mu B} = {\mathbb {E}}_\mu M_{\psi _0 B}\). An application of Chebyshev’s inequality and Proposition 1 shows that

$$\begin{aligned} {\mathbb {P}}_\mu \left( |M_{\psi _0 B}-M_{\mu B}|\ge \sqrt{\frac{2 \Vert B\Vert \min \left\{ \Vert B\Vert ,\frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} }{d_\mu \varepsilon }}\right)&\le \frac{{{\,\textrm{Var}\,}}_\mu \overline{\langle \psi _t|B|\psi _t\rangle }}{2\Vert B\Vert \min \left\{ \Vert B\Vert , \frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} } d_\mu \varepsilon \end{aligned}$$
(124)
$$\begin{aligned}&\le \frac{\varepsilon }{2}. \end{aligned}$$
(125)

This implies for \((1-\frac{\varepsilon }{2})\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) that

$$\begin{aligned} |M_{\psi _0 B}-M_{\mu B}| \le \sqrt{2}\left( \frac{\Vert B\Vert }{\varepsilon d_\mu }\min \left\{ \Vert B\Vert ,\frac{{{\,\textrm{tr}\,}}(|B|)}{d_\mu }\right\} \right) ^{1/2}. \end{aligned}$$
(126)

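Both (123) and (126) hold simultaneously for \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\). We may assume \(\delta \le 1\) (otherwise there is nothing to show), and \(D_E G(\kappa )\ge 1\) as soon as \({\mathcal {E}}\) contains at least two eigenvalues, so the right-hand side of (126) is at most \(\sqrt{2}\) times the square root appearing in (123). With the triangle inequality and \(2+\sqrt{2}\le 4\) we finally obtain the stated upper bound for \(|\langle \psi _t|B|\psi _t\rangle - M_{\mu B}|\).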

\(\square \)

8.6 Proof of Corollary 1

From Theorem 4 we obtain immediately that for \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) for \((1-\delta )\)-most of the time

$$\begin{aligned} \biggl |\Vert P_{\nu _+}\psi _t\Vert ^2 - M_{\mu \nu _+}\biggr |&\le 4\sqrt{\frac{D_E D_G}{\delta \varepsilon }} \exp \left( -\frac{s_\mu N}{2k_\textrm{B}}\right) \min \left\{ 1,\exp \left( \frac{(s_{\nu _+}-s_\mu )N}{2k_\textrm{B}}\right) \right\} \end{aligned}$$
(127)
$$\begin{aligned}&= 4 \frac{\sqrt{D_E D_G}}{\sqrt{\varepsilon \delta }} \exp \left( -\frac{s_\mu N}{2k_\textrm{B}}\right) . \end{aligned}$$
(128)

Similarly, we find for \((1-\varepsilon )\)-most \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) for \((1-\delta )\)-most of the time that

$$\begin{aligned} \biggl |\Vert P_{\nu _-}\psi _t\Vert ^2 - M_{\mu \nu _-}\biggr |&\le 4\sqrt{\frac{D_E D_G}{\varepsilon \delta }} \exp \left( -\frac{s_\mu N}{2k_B}\right) \min \left\{ 1,\exp \left( \frac{(s_{\nu _-}-s_\mu )N}{2k_B}\right) \right\} \end{aligned}$$
(129)
$$\begin{aligned}&= 4\frac{\sqrt{D_E D_G}}{\sqrt{\varepsilon \delta }} \exp \left( -\frac{(s_\mu -\frac{s_{\nu _-}}{2})N}{k_B}\right) . \end{aligned}$$
(130)
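
The passage from (129) to (130) is exponent bookkeeping. As a sketch, assuming \(s_{\nu _-}<s_\mu \) (as the notation \(\nu _-\) suggests), the minimum in (129) is attained by the exponential, and the two exponents add up as

$$\begin{aligned} -\frac{s_\mu N}{2k_B} + \frac{(s_{\nu _-}-s_\mu )N}{2k_B} = -\frac{(2s_\mu -s_{\nu _-})N}{2k_B} = -\frac{(s_\mu -\frac{s_{\nu _-}}{2})N}{k_B}. \end{aligned}$$

In (127), by contrast, assuming \(s_{\nu _+}\ge s_\mu \) makes the exponential at least 1, so the minimum equals 1 and (128) follows.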

This finishes the proof.\(\square \)

8.7 Alternative Estimate in Terms of Effective Dimension

In Proposition 1, we have provided two upper bounds (73) for

$$\begin{aligned} {\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T\right) . \end{aligned}$$

There is an alternative way of obtaining one of the two bounds in (73) using a result of Short and Farrelly [25] based on the concept of effective dimension. We briefly explain this alternative derivation and then comment on why we also need the other bound in (73).

In [25] the authors show that

$$\begin{aligned} \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T&\le \frac{G(\kappa ) \Vert B\Vert ^2}{d_{eff }} \left( 1+\frac{8\log _2 d_E}{\kappa T}\right) , \end{aligned}$$
(131)

where the effective dimension \(d_{eff } = d_{eff }(\psi _0)\) of a state \(\psi _0\) is

$$\begin{aligned} d_{eff } = \left( \sum _e \langle \psi _0|\Pi _e|\psi _0\rangle ^2\right) ^{-1}. \end{aligned}$$
(132)
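
To illustrate definition (132), the following minimal Python sketch computes \(d_{eff }\) from a list of populations \(p_e=\langle \psi _0|\Pi _e|\psi _0\rangle \). The function name `effective_dimension` and the example populations are hypothetical and chosen only for illustration; the sketch assumes that \(\psi _0\) is normalized and that the \(\Pi _e\) sum to the identity, so the populations sum to 1.

```python
def effective_dimension(populations):
    """Return d_eff = 1 / sum_e p_e**2 for populations p_e = <psi_0|Pi_e|psi_0>."""
    total = sum(populations)
    # Assumption of this sketch: normalized state, complete set of projections.
    if abs(total - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    return 1.0 / sum(p * p for p in populations)


# Equal weight on four energy eigenspaces (illustrative values).
uniform = effective_dimension([0.25] * 4)
# Almost all weight on a single eigenspace (illustrative values).
peaked = effective_dimension([0.97, 0.01, 0.01, 0.01])
print(uniform, peaked)
```

A state spread evenly over \(n\) eigenspaces has \(d_{eff }=n\), whereas a state concentrated on one eigenspace has \(d_{eff }\) close to 1, which is why a large \(d_{eff }\) makes the right-hand side of (131) small.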

Taking an average over \(\psi _0\in {\mathbb {S}}(\mathcal {H}_\mu )\) yields the bound

$$\begin{aligned} {\mathbb {E}}_\mu \left( \left\langle \left| \langle \psi _t|B|\psi _t\rangle - \overline{\langle \psi _t|B|\psi _t\rangle } \right| ^2 \right\rangle _T\right)&\le \frac{2D_E G(\kappa ) \Vert B\Vert ^2}{d_\mu +1}\left( 1+\frac{8\log _2 d_E}{\kappa T}\right) . \end{aligned}$$
(133)

To see this, note that the only quantity on the right-hand side of (131) that depends on \(\psi _0\) is the effective dimension \(d_{eff }\); therefore, it suffices to estimate \({\mathbb {E}}_\mu d_{eff }^{-1}\). With the help of (41) and the usual arguments we find

$$\begin{aligned} {\mathbb {E}}_\mu d_{eff }^{-1}&= \sum _e {\mathbb {E}}_\mu \left( \langle \psi _0|\Pi _e|\psi _0\rangle \langle \psi _0|\Pi _e|\psi _0\rangle \right) \end{aligned}$$
(134)
$$\begin{aligned}&= \frac{1}{d_\mu (d_\mu +1)}\sum _e \left( {{\,\textrm{tr}\,}}(P_\mu \Pi _e)^2+{{\,\textrm{tr}\,}}(P_\mu \Pi _e P_\mu \Pi _e)\right) \end{aligned}$$
(135)
$$\begin{aligned}&\le \frac{1}{d_\mu (d_\mu +1)}\sum _e \left( \underbrace{{{\,\textrm{tr}\,}}(\Pi _e)}_{\le D_E}{{\,\textrm{tr}\,}}(P_\mu \Pi _e)+ {{\,\textrm{tr}\,}}(P_\mu \Pi _eP_\mu )\right) \end{aligned}$$
(136)
$$\begin{aligned}&\le \frac{2D_E}{d_\mu (d_\mu +1)}\sum _e {{\,\textrm{tr}\,}}(P_\mu \Pi _e) \end{aligned}$$
(137)
$$\begin{aligned}&= \frac{2D_E}{d_\mu +1}\,, \end{aligned}$$
(138)

and (133) immediately follows.
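
The chain (134)–(138) can be checked numerically on a small toy example. The sketch below is only an illustration under stated assumptions: the dimensions, the block structure of the eigenspaces, the random subspace and all function names are hypothetical choices, with \(H\) taken diagonal in the standard basis so that each \(\Pi _e\) projects onto a block of coordinates. It evaluates the averaged trace expression (summed over \(e\)) exactly, compares it with a Monte Carlo average over random complex unit vectors in the subspace, and checks it against \(2D_E/(d_\mu +1)\).

```python
import math
import random


def orthonormal_basis(d, D, rng):
    """Gram-Schmidt on random real Gaussian vectors: d orthonormal vectors in R^D."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis


def exact_mean_inverse_deff(basis, blocks):
    """Sum over e of [tr(P Pi_e)^2 + tr(P Pi_e P Pi_e)], divided by d(d+1)."""
    d, D = len(basis), len(basis[0])
    # Projection onto the subspace: P[i][j] = sum_k v_k[i] * v_k[j].
    P = [[sum(b[i] * b[j] for b in basis) for j in range(D)] for i in range(D)]
    total = 0.0
    for block in blocks:
        tr_p_pi = sum(P[i][i] for i in block)
        tr_p_pi_p_pi = sum(P[i][j] * P[j][i] for i in block for j in block)
        total += tr_p_pi ** 2 + tr_p_pi_p_pi
    return total / (d * (d + 1))


def sampled_mean_inverse_deff(basis, blocks, n_samples, rng):
    """Monte Carlo average of sum_e <psi|Pi_e|psi>^2 over random unit vectors."""
    d, D = len(basis), len(basis[0])
    acc = 0.0
    for _ in range(n_samples):
        # Normalized complex Gaussian coefficients give a uniform unit vector.
        c = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d)]
        norm = math.sqrt(sum(abs(z) ** 2 for z in c))
        psi = [sum(c[k] * basis[k][i] for k in range(d)) / norm for i in range(D)]
        acc += sum(sum(abs(psi[i]) ** 2 for i in block) ** 2 for block in blocks)
    return acc / n_samples


rng = random.Random(0)
D, d_mu = 24, 12                                  # toy dimensions (assumption)
blocks = [[2 * e, 2 * e + 1] for e in range(12)]  # eigenspaces of dimension 2
D_E = max(len(block) for block in blocks)         # maximal degeneracy
basis = orthonormal_basis(d_mu, D, rng)

exact = exact_mean_inverse_deff(basis, blocks)
sampled = sampled_mean_inverse_deff(basis, blocks, 2000, rng)
bound = 2 * D_E / (d_mu + 1)
print(exact, sampled, bound)
```

The exact and sampled values should agree up to sampling error, and both should lie below the bound; the check says nothing about how tight (138) is in general.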

The second estimate in Proposition 1 is sharper than (133) if \({{\,\textrm{tr}\,}}(B^\dagger B)/d_\mu \ll \Vert B\Vert ^2\), i.e., roughly speaking, if only a few (compared to \(d_\mu \)) eigenvalues of \(B^\dagger B\) are close to the largest eigenvalue and most are much smaller. This becomes relevant, for example, when estimating the transitions from \(\mathcal {H}_\mu \) into a lower entropy macro space \(\mathcal {H}_\nu \), cf. (45). Then \(B=P_\nu \) and

$$\begin{aligned} {{\,\textrm{tr}\,}}(B^\dagger B)/d_\mu = d_\nu /d_\mu \ll 1 = \Vert P_\nu \Vert ^2. \end{aligned}$$

9 Conclusions

Our results concern the behavior of typical pure states \(\psi _0\) from a high-dimensional subspace \(\mathcal {H}_\mu \) of Hilbert space under the unitary time evolution. We find that for any operator B, due to the large dimension, the curve \(t\mapsto \langle \psi _t|B|\psi _t \rangle \) is nearly deterministic (a fact that can also be obtained from [1, 23]), and that in the long run \(t\rightarrow \infty \) it is nearly constant. In von Neumann’s framework of an orthogonal decomposition \(\mathcal {H}=\oplus _\nu \mathcal {H}_\nu \) into macro spaces, this means that the time-dependent distribution over the macro states given by the superposition weights \(\Vert P_\nu \psi _t\Vert ^2\) is nearly deterministic and in the long run nearly constant, i.e., it reaches normal equilibrium, a situation analogous (but not identical) to thermal equilibrium. Through our theorems, we have provided explicit error bounds.

Von Neumann’s [17] prior result in the same direction was based on unrealistic assumptions, saying essentially that H is unrelated to \(\mathcal {H}_\nu \). Our result has the advantage of being applicable regardless of relations between H and \(\mathcal {H}_\nu \). The question of whether the deviation from the mean is small compared to the mean even when the mean itself is small will be analyzed further elsewhere [29].