1 Introduction

Random unitaries are central resources in quantum information science. They appear in many applications including algorithms, cryptography, and communication. Moreover, they are important toy models for random chaotic systems, capturing phenomena like thermalization or scrambling of quantum information.

An idealized model of a random unitary is the uniform distribution over the unitary group, also known as the Haar measure. However, the Haar measure is an unrealistic model for large systems because the number of random coins and gates needed to generate an element of the Haar distribution scales polynomially with the Hilbert-space dimension, i.e., exponentially with the number of qubits or independent degrees of freedom. To resolve this dilemma, approximate t-designs have been proposed as physically and computationally realistic alternatives to the Haar measure. They approximate the behavior of the Haar measure when one only cares about the first t moments.

Several constructions of t-designs have been proposed based on either random or structured circuits. While structured circuits can in some cases be more efficient [22, 25, 48], random quantum circuits have other advantages. They are plausible models for chaotic random processes in nature, such as scrambling in black holes [17, 55], the spread of entanglement in condensed matter systems [46, 47], the growth of quantum complexity [12], and decoupling [18]. Moreover, they are practical candidates for benchmarking the computational advantage of quantum computers over classical models, since they seem to capture the power of a generic polynomial-size unitary circuit. Indeed, the Google quantum AI group has recently run a random unitary circuit on a 53-qubit superconducting device and has argued that this should be hard to simulate classically [5, 8] (see Fig. 1 for a demonstration of their proposal). Here the random gates are useful not only for the 2-design property, specifically “anti-concentration”, but also for evading the sort of structure which would lend itself to easy simulation, such as being made of Clifford gates.

All previous random-circuit-based constructions of t-designs required the circuits to have linear depth. In this paper, we show that certain random circuit models with small depth are approximate t-designs. We consider two models of random circuits. The first uses nearest-neighbor interactions on a D-dimensional lattice: we apply random \(\text {U}(d^2)\) gates on neighboring qudits of the lattice in a certain order.

Depending on the application we want, we can define convergence to the Haar measure in different ways. For example, for scrambling [17] we measure convergence w.r.t. the quantity \({\mathbb {E}}_C \Vert \rho _S(s) - \frac{I}{2^{|S|}} \Vert ^2_1\), where \(\rho _S(s)\) is the density matrix \(\rho (s)\) reduced to a subset S of qudits and \(\rho (s)\) is the quantum state that results from s steps of the random process. But for anti-concentration, which corresponds loosely to a claim that typical circuit outputs have nearly maximal entropy, we use a norm related to \({\mathbb {E}}_C\sum _x |\langle x|C|0\rangle |^4\). For other measures of convergence to the Haar measure see [42] or Sect. 2.4. In general, these measures are equivalent, but moving between them involves factors that are exponentially large in the number of qudits, i.e., if one norm converges to \(\epsilon \) then the translation implies only that another norm converges to \(2^{O(n)} \epsilon \). Some of the known size/depth bounds for designs are of the form \(O(f(n,t)(n + \ln 1/\epsilon ))\) (e.g. [13]) and in 1-D simple arguments yield an \(\Omega (n + \ln (1/\epsilon ))\) lower bound [17]. In this case, replacing \(\epsilon \) with \(2^{-O(n)}\epsilon \) will not change the asymptotic scaling. [13] defined a strong notion of convergence which implies all the mentioned definitions.

However, in D-dimensional lattices the natural lower bound is \(\Omega (n^{1/D}+\ln (1/\epsilon ))\). Our main challenge in this work is to show that this depth bound is asymptotically achievable, and along the way, we need to deal with the fact that we can no longer freely pay norm-conversion costs of \(2^{O(n)}\). We are able to achieve the desired \({{\,\textrm{poly}\,}}(t)(n^{1/D} + \ln (1/\epsilon ))\) in many operationally relevant norms, but due in part to the difficulty of converting between norms, we do not establish it in all cases. The asymptotic dependency on t in our result for \(D =2\) is \(O(t \ln t)\) times the best asymptotic dependency on t for the \(D=1\) architecture, according to the strong measure defined in [13], which gave a bound of \(t^{10.5}\); recently this bound was improved to \(t^{5 + o_t(1)}\) by Haferkamp [31]. The dependency on t in our result is hence \(t^{6 + o_t(1)} \ln t\).

Approximate unitary designs We will consider several notions of approximate designs in this paper. First, we will introduce some notation. A degree-(t, t) monomial in \(C\in \text{ U }(({\mathbb {C}}^d)^{\otimes n})\) is degree t in the entries of C and degree t in the entries of \(C^*\). We can collect all these monomials into a single matrix of dimension \(d^{2nt}\) by defining \(C^{\otimes t,t}:= C^{\otimes t} \otimes C^{*\otimes t}\). We say that \(\mu \) is an exact [unitary] t-design if the expectations of all degree-(t, t) monomials of \(\mu \) match those of the Haar measure. We can express this succinctly in terms of the operator

$$\begin{aligned} G_\mu ^{(t)} = \mathop {{\mathbb {E}}}\limits _{C \sim \mu } \left[ C^{\otimes t} \otimes C^{*\otimes t} \right] . \end{aligned}$$
(1)

Then \(\mu \) is an exact t-design iff \(G_\mu ^{(t)} = G_{\text {Haar}}^{(t)}\). Since \(G_{\text {Haar}}^{(t)}\) is a projector, we sometimes call \(G_\mu ^{(t)}\) a quasi-projector operator and we will later use the fact that it can sometimes be shown to be very close to a projector.

Most definitions of approximate designs demand that some norm of \(G_\mu ^{(t)} - G_{\text {Haar}}^{(t)}\) be small. Three norms that we will consider are based on viewing \(G_{\mu }^{(t)}\) as either a vector of length \(d^{4nt}\), a matrix of dimension \(d^{2nt}\) or a quantum operation acting on a space of dimension \(d^{nt}\). In each case, one can show that the t-design property implies the \(t'\)-design property for \(1\le t'\le t\).

Definition 1

(Monomial definition of t-designs). \(\mu \) is a monomial-based \(\epsilon \)-approximate t-design if every monomial has expectation within \(\epsilon d^{-nt}\) of its value under the Haar measure. In other words,

$$\begin{aligned} \left\| \textsf{vec}\left[ G^{(t)}_\mu \right] - \textsf{vec}\left[ G^{(t)}_{\text {Haar}} \right] \right\| _\infty \le \frac{\epsilon }{d^{nt}}. \end{aligned}$$
(2)

\(\textsf{vec}(A)\) is a vector consisting of the elements of matrix A (in the computational basis) and \(\Vert \cdot \Vert _\infty \) refers to the vector \(\ell _\infty \) norm.

The monomial measure is natural when studying anti-concentration, since a sufficient condition for anti-concentration is that \({\mathbb {E}}_C |\langle 0|C|0\rangle |^4\) is close to the quantity that arises from the Haar measure, namely \( \frac{2}{2^n(2^n+1)}\). This is achieved by [monomial measure] 2-designs.
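As a quick numerical illustration (our code, not from the paper), one can sample Haar-random unitaries and check the stated value of \({\mathbb {E}}_C |\langle 0|C|0\rangle |^4\); the helper haar_unitary and the parameter choices below are ours:

    import numpy as np

    def haar_unitary(dim, rng):
        # Haar-random unitary: QR-decompose a complex Ginibre matrix and
        # fix the phases of R's diagonal (Mezzadri's recipe).
        z = (rng.standard_normal((dim, dim)) +
             1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        return q * (np.diag(r) / np.abs(np.diag(r)))

    rng = np.random.default_rng(0)
    n, trials = 3, 50_000
    d = 2 ** n
    estimate = np.mean([abs(haar_unitary(d, rng)[0, 0]) ** 4
                        for _ in range(trials)])
    print(estimate)           # Monte Carlo estimate of E |<0|C|0>|^4
    print(2 / (d * (d + 1)))  # the Haar value 2/(2^n (2^n+1)) quoted above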

If the operator-norm distance between \(G_\mu ^{(t)}\) and \(G_\text {Haar}^{(t)}\) is small then instead of calling \(\mu \) an approximate design we call it a t-tensor product expander [36]. This controls the rate at which certain nonlinear (i.e. degree-t polynomial) functions of the state converge to the average value they would have under the Haar measure. We can also measure the distance between \(G_\mu ^{(t)}\) and \(G_\text {Haar}^{(t)}\) in the 1-norm (i.e. trace norm) and this notion of approximate designs has been considered before [4, 54], although it does not have direct operational meaning. We will show \({{\,\textrm{poly}\,}}(t)(n^{1/D}+\ln (1/\epsilon ))\)-depth convergence in each of these measures.

Finally, we can consider \(G_\mu ^{(t)}\) to be a superoperator using the following canonical map. Define \(\text {Ch}\left[ \sum _i X_i \otimes Y_i^T\right] \) by \( \text {Ch}\left[ \sum _i X_i \otimes Y_i^T\right] (Z):= \sum _i X_i Z Y_i\). Thus

$$\begin{aligned} \text {Ch}\left[ G^{(t)}_{\mu }\right] (Z) = \mathop {{\mathbb {E}}}\limits _{C \sim \mu } \left[ C^{\otimes t} Z C^{\dagger \otimes t} \right] . \end{aligned}$$
(3)

Note that \(\text {Ch}\left[ G^{(t)}_{\mu }\right] \) is completely positive and trace preserving, i.e., a quantum channel. For superoperators \({\mathcal {M}},{\mathcal {N}}\) we say that \({\mathcal {M}}\preceq {\mathcal {N}}\) if \({\mathcal {N}}-{\mathcal {M}}\) is a completely positive (cp) map. Based on this ordering, a strong notion of being an approximate design was proposed by Andreas Winter and first appeared in [13].

Definition 2

(Strong definition of t-designs). A distribution \(\mu \) is a strong \(\epsilon \)-approximate t-design if

$$\begin{aligned} (1-\epsilon ) \text {Ch}\left[ G^{(t)}_\text {Haar}\right] \preceq \text {Ch}\left[ G^{(t)}_\mu \right] \preceq (1+\epsilon ) \text {Ch}\left[ G^{(t)}_\text {Haar}\right] . \end{aligned}$$
(4)

Circuit models The result of [13] constructs t-designs in the strong measure (Definition 2) for \(D=1\) and linear depth, and we generalize this result to construct weak monomial designs for arbitrary D and \(O(n^{1/D})\) depth. We also show that the same construction converges to the Haar measure in other norms: diamond, infinity and trace norm. Our proof techniques do not seem to yield t-designs in the strong measure. We do not even know whether the construction of “strong” t-designs in sub-linear depth is possible.

The second model we consider is circuits with long-range two-qubit interactions. In this model, at each step, we pick a pair of qubits uniformly at random and apply a random \(\text {U}(4)\) gate on them. This model is the standard one when considering bounded-depth circuit classes, such as \(\textsf{QNC}\). Physically, it could model chaotic systems with long-range interactions. Following Oliveira, Dahlsten and Plenio [51] (see also [17, 18, 34]), we can map the \(t=2\) moments of this process onto a simple random walk on the points \(\{1,2,\ldots ,n\}\). We map this random walk to the classical (and exactly solvable) Ehrenfest model, meaning a random walk with a linear bias towards the origin. Further challenges are that this mapping introduces random and heavy-tailed delays and that the norm used for anti-concentration is exponentially sensitive to some of the probabilities. However, we are able to show (in Sect. 4) that after \(O(n \ln ^2 n)\) rounds of this process the resulting distribution over the unitary group converges to the Haar measure in the mentioned norm.

For a distribution p its collision probability is defined as \(\text {Coll}(p) = \sum _x p_x^2\). If \(\text {Coll}(p)\) is large (\(\Omega (1)\)) then most of the probability mass of p is concentrated on a constant number of outcomes, and if it is small (\(\approx 1/2^n\)) then p is anti-concentrated. The norm that we consider for anti-concentration is basically the expected collision probability of the output distribution of a random circuit. The expected collision probability for the Haar measure is \(\frac{2}{2^n+1}\) and our result shows that a typical circuit of size \(O(n \ln ^2 n)\) outputs a distribution with expected collision probability \(\frac{2}{2^n} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) \). Along with the Paley–Zygmund anti-concentration inequality this result proves that these circuits have the following anti-concentration property:

$$\begin{aligned} \min _x \mathop {\Pr }\limits _{C\sim \mu } \left[ |\langle x|C|0\rangle |^2 \ge \frac{1}{2^{n+1}} \right] \ge \text {constant}. \end{aligned}$$
(5)

Here \(\mu \) is the distribution of random circuits we consider, and x is any n-bit string. This bound is related to the hardness of classical simulation for random circuits. We furthermore show that sub-logarithmic depth quantum circuits in this model have expected collision probability \(\frac{2}{2^n+1} \omega (1)\). The best anti-concentration depth bound we get from this model is \(O(\ln ^2 n)\). However, we are able to construct a natural family of random circuits with depth \(O(\ln n \ln \ln n)\) that are anti-concentrated.
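As a concrete, purely illustrative check of this criterion, the following Python sketch estimates the expected collision probability \({\mathbb {E}}_C \sum _x |\langle x|C|0\rangle |^4\) of the long-range model (cf. Definition 7 below) by brute-force statevector simulation; the helper functions and all parameter choices, in particular the constant in front of \(n \ln ^2 n\), are ours, and at such small sizes constants rather than asymptotics dominate:

    import numpy as np

    def haar_unitary(dim, rng):
        z = (rng.standard_normal((dim, dim)) +
             1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        return q * (np.diag(r) / np.abs(np.diag(r)))

    def apply_two_qubit(psi, u, i, j, n):
        # Apply a 4x4 unitary u to qubits i and j of an n-qubit state vector.
        psi = np.moveaxis(psi.reshape((2,) * n), (i, j), (0, 1))
        psi = (u @ psi.reshape(4, -1)).reshape((2, 2) + (2,) * (n - 2))
        return np.moveaxis(psi, (0, 1), (i, j)).reshape(-1)

    def collision_probability(n, s, rng):
        # One sample of Coll(C) = sum_x |<x|C|0^n>|^4 for the long-range
        # model: s steps, each applying a Haar-random U(4) gate to a
        # uniformly random pair of qubits.
        psi = np.zeros(2 ** n, dtype=complex)
        psi[0] = 1.0
        for _ in range(s):
            i, j = rng.choice(n, size=2, replace=False)
            psi = apply_two_qubit(psi, haar_unitary(4, rng), i, j, n)
        return np.sum(np.abs(psi) ** 4)

    rng = np.random.default_rng(1)
    n = 6
    s = int(3 * n * np.log(n) ** 2)  # the O(n ln^2 n) scale, constant 3 is ours
    coll = np.mean([collision_probability(n, s, rng) for _ in range(200)])
    print(coll, 2 / (2 ** n + 1))    # estimate vs. the Haar value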

The organization of this paper The rest of this introductory section states the basic results, ideas and implications related to the main results. In particular, in Sect. 1.1 we explain the connections between our results and recent experiments, such as those of the Google AI group, aiming to demonstrate the superiority of quantum computers over classical computers on specific tasks. In Sect. 1.2, we describe the models we consider in this paper. In Sect. 1.3, we state the main results of this paper, including proof sketches and basic ideas. We then give a brief overview of previous work related to this paper in Sect. 1.4 and several open questions in Sect. 1.5.

The organization of the rest of this paper is as follows. In Sect. 2 we introduce the preliminary concepts, definitions and tools needed for our proofs. In Sect. 3 we give detailed proofs about how we get approximate t-designs on D-dimensional lattices. In Sect. 4 we give detailed proofs related to anti-concentration bounds from circuits with all-to-all interactions. In Sect. 5 we provide alternative proofs for anti-concentration via low-depth D-dimensional lattices and in Sect. 6 we provide improvements on the existing scrambling and decoupling bounds. Appendix A gives a proof of Theorem 3 about the implications of the anti-concentration bounds we obtain for the computational difficulty of simulating low-depth random quantum circuits. Finally, Appendix B gives background on the basic properties of Krawtchouk polynomials, which we use in Sect. 4.

1.1 Connections with quantum computational supremacy experiments

Outperforming classical computers, even for an artificial problem such as sampling from the output of an ideal quantum circuit, would be a significant milestone for early quantum computers; this goal has recently been called “quantum computational supremacy” [35, 52]. The reason to study quantum computational supremacy in its own right (as opposed to general quantum algorithms) is that it appears to be a distinctly easier task than full-scale quantum computing, and even various non-universal forms [2, 3, 8, 14, 15, 29] of quantum computing can be shown to be hard to simulate classically. For example, the outputs of constant-depth quantum circuits cannot be simulated exactly by classical computers unless the \({{\,\mathrm{\textsf{PH}}\,}}\) collapses [56]. In general, families of quantum circuits have this property if they are universal under postselection, meaning that after measuring all the qubits at the end of the circuit and producing a string of bits, we condition on the values of some of these bits and use the other bits for the output.

However, these hardness results are not robust under noise and error in measurements. A central open question in the theory of quantum computational supremacy is whether simulating these distributions to within constant or \(1/{{\,\textrm{poly}\,}}(n)\) variational distance would still be hard. It is plausible to conjecture that if such a robust hardness of sampling is true, it would also hold for generic circuits [1, 3] (although see [49] for a counterexample). A standard approach to proving such a robust hardness result for generic circuits has been to prove that “anti-concentration” holds, and to use this to relate additive error approximation to average-case relative error approximation; see e.g. [16]. Here “anti-concentration” means having near-maximal entropy in the output of a quantum circuit, which implies that any fixed amplitude of a quantum circuit is likely to be \(\ge \frac{\Omega (1)}{2^n}\). This property implies that estimating the amplitudes additively (within \(\pm \frac{1}{{{\,\textrm{poly}\,}}(n) \cdot 2^n}\)) is on average as hard as computing them within inverse polynomial relative error. This lets us turn an assumption about the average-case hardness of relative-error approximation of the amplitudes into a hardness result for the sampling problems. Approximate t-designs (and even approximate 2-designs) have the desired “anti-concentration” property.

For experimental verification of quantum computational supremacy we can consider the following sampling task: let \(\mu \) be a distribution over random circuits that satisfies

$$\begin{aligned} \mathop {\Pr }\limits _{C\sim \mu } \left[ |\langle 0|C|0\rangle |^2 \ge \frac{1}{2^{n+1}} \right] \ge 1/8 - 1/{{\,\textrm{poly}\,}}(n). \end{aligned}$$
(6)

(which we call the anti-concentration property). Let \({\mathcal {C}}_x\) be the family of circuits constructed by first applying a circuit \(C \sim \mu \) and then an X gate to each qubit with probability 1/2 (and identity with probability 1/2). A similar line of reasoning as in Bremner-Montanaro-Shepherd (see Theorems 6 and 7 of [16]) implies that

Theorem 3

Fix \(\epsilon >0\) and \(0<\delta <1/8\). Let \(\mu \) be a \(\frac{1}{{{\,\textrm{poly}\,}}(n)}\)-approximate 2-design. If there exists a \({{\,\mathrm{\textsf{BPP}}\,}}\) machine which takes \(C \sim \mu \) as input and for at least a \(1-\delta \) fraction of such inputs outputs a probability distribution that is within \(\epsilon \) total variation distance from the probability distribution \(p_x = |\langle x | C| 0\rangle |^2\), then there exists an \({{\,\mathrm{\textsf{FBPP}}\,}}^{{{\,\mathrm{\textsf{NP}}\,}}}\) algorithm that succeeds with probability \(1-\delta \) and computes the value \(|\langle 0|C'|0\rangle |^2\) within multiplicative error \(\frac{2(\epsilon + 1/{{{\,\textrm{poly}\,}}(n)})}{\delta }\) for a \(1/8 -\frac{1}{{{\,\textrm{poly}\,}}(n)}\) fraction of circuits \(C' \sim {\mathcal {C}}_x\).

This theorem is proved in Appendix A. If we further conjecture that the \({{\,\mathrm{\textsf{PH}}\,}}\) is infinite and that amplitudes of the random circuits in Theorem 3 are \(\#{\textsf{P}}\)-hard to approximate on average, then this implies that classical computers cannot efficiently sample from any distribution close to the ones generated by these circuits. At the moment, it is only known that nearly exact computation of these amplitudes is \(\# {\textsf{P}}\)-hard [10, 44, 45]. It is an open question whether the average-case approximation task is also \(\#{\textsf{P}}\)-hard.

The linear to sub-linear improvement of the depth required for anti-concentration provided in this paper is likely to be significant for near-term quantum computers that will be constrained both in terms of the number of qubits (n) and noise rate per gate (\(\delta \)). Due to the constraints in the number of qubits (say 50-100), quantum computational supremacy will only be possible without the overhead of error correction, since even the most efficient known schemes for fault-tolerant quantum computation reduce the number of usable qubits by more than a factor of two [21]. Thus a quantum circuit with S gates will have an expected \(S\delta \) errors. Recent work due to Yung and Gao [57] and the Google group [9] states that noisy random quantum circuits with \(O(\ln n)\) random errors output distributions that are nearly uniform, and thus are trivially classically simulable. Thus S can be at most \(\sim \ln (n)/\delta \). In proposed near-term quantum devices [6, 8, 26, 50] we can expect \(n\sim 10^2\) and \(\delta \sim 10^{-2}\). Thus the \(S=O(n \ln ^2 n)\) bound for long-range interactions or the \( S = O(n \sqrt{n})\) bound for 2-D lattices from our work is much closer to being practical than the previous \(S=O(n^2)\). (This assumes that the constants are reasonable. We have not made an effort to calculate them rigorously, but for the case of long-range interactions we do present a heuristic that suggests that in fact \(\approx \frac{5}{6} n\ln n\) gates are necessary and sufficient.)
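For illustration, here is the budget arithmetic above in a few lines of Python (our numbers, with all constant factors set to 1, which the text explicitly does not justify):

    import numpy as np

    # Back-of-envelope noise budget: S gates at error rate delta per gate
    # give ~ S*delta expected errors; beyond ~ ln(n)/delta errors the
    # output is nearly uniform and classically easy.
    n, delta = 100, 1e-2
    print("noise budget ln(n)/delta ~", round(np.log(n) / delta), "gates")
    for name, size in [("previous bound, S = n^2", n ** 2),
                       ("2-D lattice, S = n*sqrt(n)", n ** 1.5),
                       ("long-range, S = n*ln^2(n)", n * np.log(n) ** 2)]:
        print(f"{name}: {size:.0f} gates, ~{size * delta:.0f} expected errors")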

1.2 Our models

We consider two models of random quantum circuits. The first involves nearest-neighbor local interactions on a D-dimensional lattice and the second involves long-range random two-qubit gates. The order of gates in the first model has some structure but in the second model it is chosen at random. Hence, we can view the second model as the natural dynamics of an n-qubit system, connected as a complete graph.

We first define the following random circuit model for \(D=1\) which was also considered in [13]:

Definition 4

(Random circuits on one-dimensional lattices). \(\mu ^{\text {lattice}, n}_{1, s}\) is the distribution over unitary circuits resulting from the following random process.

(Algorithm box, rendered as an image in the original. Per the operator form \(g^s\) in Definitions 18 and 22, each of the s steps applies independent Haar-random \(\text {U}(d^2)\) gates to the even pairs \((1,2),(3,4),\ldots \) and then to the odd pairs \((2,3),(4,5),\ldots \).)

This definition assumes that n is even but we modify it in the obvious way when n is odd. Another modification which would not change our results would be to put the qudits on a ring so that sites n and 1 are connected.

Building on this, we define the following distribution of random circuits on a two-dimensional lattice.

Fig. 1

The architecture proposed by the quantum AI group at Google to demonstrate quantum supremacy consists of a 2D lattice of superconducting qubits. This figure depicts two illustrative timesteps in this proposal. At each timestep, 2-qubit gates (blue) are applied across some pairs of neighboring qubits

Definition 5

(Random circuits on two-dimensional lattices). Consider a two-dimensional lattice with n qudits. Let \(r_{\alpha ,i}\) be the i\(^{\text {th}}\) row of the lattice in direction \(\alpha \in \{1,2\}\), for \(1 \le i \le \sqrt{n}\). For each \(\alpha \in \{1,2\}\) let \(\text {SampleAllRows}(\alpha )\) denote the following procedure (see Fig. 2):

(Algorithm box, rendered as an image in the original. Per Definitions 18 and 22, \(\text {SampleAllRows}(\alpha )\) runs s steps of the one-dimensional process of Definition 4 on every row \(r_{\alpha ,i}\) in direction \(\alpha \) in parallel.)

Now define \(\mu ^{\text {lattice}, n}_{2, c, s}\) to be the distribution over unitary circuits resulting from the following random process:

(Algorithm box, rendered as an image in the original. Per the quasi-projector \(g_R^s (g_C^s g_R^s)^c\) of Definition 22, the circuit applies \(\text {SampleAllRows}(1)\) and then c rounds of \(\text {SampleAllRows}(2)\) followed by \(\text {SampleAllRows}(1)\).)

This distribution has depth \((2c+1)2s\) and is related but not identical to the Google AI group’s experiment [5, 8], see Fig. 1. For our results on t-designs, we will take c to be \({{\,\textrm{poly}\,}}(t)\) and s to be \({{\,\textrm{poly}\,}}(t) \cdot \sqrt{n}\). We believe that our result can be extended to any natural family of circuits with nearest-neighbor interactions. We also assume for convenience that \(\sqrt{n}\) is an integer, but believe that this assumption is not fundamentally necessary.

Next, we give a recursive definition for our random circuit model on arbitrary D-dimensional lattices. We view a D-dimensional lattice as a collection of \(n^{1/D}\) sub-lattices of size \(n^{1-1/D}\), labeled \(\xi _{1}, \ldots , \xi _{n^{1/D}}\). We label the rows of the lattice in the D-th direction by \(r_1,\ldots , r_{n^{1-1/D}}\).

Definition 6

(Random circuits on D-dimensional lattices). \(\mu ^{\text {lattice}, n}_{D,c,s}\) is the distribution resulting from the following random process.

(Algorithm box, rendered as an image in the original. Per the recursion in Definition 23, the process applies the \((D-1)\)-dimensional process independently to each sub-lattice, and then c rounds of s steps of random gates on the rows in direction D followed by the \((D-1)\)-dimensional process on the sub-lattices.)

Next, we define the model with long-range interactions on a complete graph.

Definition 7

(Random circuit models on complete graphs). \(\mu ^{\text {CG}}_s\) is the distribution over unitary circuits resulting from the following random process.

(Algorithm box, rendered as an image in the original: for each of s steps, pick a pair of qubits uniformly at random and apply an independent Haar-random \(\text {U}(4)\) gate to them; cf. Definition 24.)

The size of the circuits in this ensemble is s.
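The algorithm boxes above were rendered as images in the original; the following Python sketch (our code, with the schedules inferred from the operator definitions in Sect. 2.2, so the details reflect our reading rather than the authors' exact pseudocode) generates the gate schedules for the complete-graph model and the 1-D and 2-D lattice models:

    import random

    def complete_graph_schedule(n, s, rng=random):
        # Definition 7: at each of s steps, a uniformly random pair of
        # qubits receives an independent Haar-random U(4) gate.
        return [tuple(rng.sample(range(n), 2)) for _ in range(s)]

    def one_d_step(n):
        # One step of the 1-D model (Definition 4), as we read it off the
        # operator form g^s in Definitions 18 and 22: a layer of gates on
        # the even pairs (1,2),(3,4),... then one on the odd pairs (2,3),...
        even = [(i, i + 1) for i in range(0, n - 1, 2)]
        odd = [(i, i + 1) for i in range(1, n - 1, 2)]
        return [even, odd]

    def two_d_schedule(m, c, s):
        # Definition 5 on an m x m lattice, following the quasi-projector
        # g_R^s (g_C^s g_R^s)^c of Definition 22: a row block, then c
        # rounds of a column block followed by a row block; each block is
        # s one-dimensional steps run on every row (or column) in parallel.
        def all_rows(direction):
            layers = []
            for _ in range(s):
                for layer in one_d_step(m):
                    full = []
                    for r in range(m):
                        for a, b in layer:
                            full.append(((r, a), (r, b)) if direction == 1
                                        else ((a, r), (b, r)))
                    layers.append(full)
            return layers
        schedule = all_rows(1)
        for _ in range(c):
            schedule += all_rows(2) + all_rows(1)
        return schedule  # (2c+1)*2s layers, the depth stated in the text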

1.3 Our results

Our first result is the following.

Theorem 8

Let \(s, c,n > 0\) be positive integers with \(\mu ^{\text {lattice},n}_{2,c,s}\) defined as in Definition 5.

1. \(s = {{\,\textrm{poly}\,}}(t)\left( \sqrt{n} +\ln \frac{1}{\delta }\right) ,\ c = O\left( t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| \textsf{vec}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\infty \le \frac{\delta }{d^{nt}}\).

2. \(s = {{\,\textrm{poly}\,}}(t)\left( \sqrt{n} + \ln \frac{1}{\delta }\right) ,\ c = O\left( t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\diamond \le \delta \).

3. \(s = {{\,\textrm{poly}\,}}(t)\left( \sqrt{n} + \ln \frac{1}{\delta } \right) ,\ c = O\left( t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1 \le \delta \).

4. \(\left\| G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \le c \cdot \sqrt{n} \cdot e^{-s/{{\,\textrm{poly}\,}}(t)} + \frac{1}{d^{O( c \sqrt{n} )}}\).

Fig. 2

The random circuit model in Definition 5. Each black circle is a qudit and each blue link is a random \(\text {SU}(d^2)\) gate. The model does \(O(\sqrt{n}{{\,\textrm{poly}\,}}(t))\) rounds alternating between applying (1) and (2). Then for \(O(\sqrt{n}{{\,\textrm{poly}\,}}(t))\) rounds it alternates between (3) and then (4). This entire loop is then repeated \(O({{\,\textrm{poly}\,}}(t))\) times

The three norms in the above theorem refer to the vector \(\ell _\infty \) norm, the superoperator diamond norm \(\Vert \cdot \Vert _\diamond \) (see Sect. 2.1) and the operator \(S_\infty \) norm, also known simply as the operator norm.

Proof sketch for part 1

We first give a brief overview of the proof in [13] and explain why their construction requires a circuit to have linear depth. Let \(G_{i,i+1}\) be the projector operator for a random two-qudit gate applied to qudits i and \(i+1\), and let \(G = \frac{1}{n-1} \sum _i G_{i,i+1}\). Then \(G_s = G^s\) is the quasi-projector corresponding to a 1-D random circuit with size s. [13] observed that \(G - G_{\text {Haar}}\) corresponds to a certain local Hamiltonian and \(\epsilon = 1- \Vert G - G_{\text {Haar}}\Vert _\infty \) is its spectral gap. The central technical result of [13] is the bound \(\epsilon \ge \frac{1}{n\cdot {{\,\textrm{poly}\,}}(t)}\). As a result, \(\Vert G_s - G_{\text {Haar}}\Vert _\infty \le (1 - \frac{1}{n\cdot {{\,\textrm{poly}\,}}(t)})^s\). In general \(G - G_{\text {Haar}}\) has rank \(e^{O(n)}\), and in order to construct a strong approximate t-design (Definition 2), one needs to apply a sequence of expensive changes of norm that lose factors polynomial in the overall dimension of G, i.e., \(e^{O(nt)}\). Thus, in order to compensate for such exponentially large factors, one needs to choose \(s = O(n^2\cdot {{\,\textrm{poly}\,}}(t))\), meaning depth growing linearly with n. Brown and Fawzi [17] furthermore observed that if G is the projector corresponding to one step of a random circuit on a 2-D lattice, the spectral gap still remains \(1-\Vert G - G_{\text {Haar}}\Vert _\infty = O\big (\frac{1}{n\cdot {{\,\textrm{poly}\,}}(t)}\big )\), so using the same proof strategy one needs linear depth.

The new ingredient we contribute is to show that if \(s = O(\sqrt{n})\) one can replace \(G^{(t)}_{\mu ^{\text {lattice},n}_{2,1,s}}\) with a certain quasi-projector \(G'\), such that

(1) \(G'-G_{\text {Haar}}\) has rank \(t!^{O(\sqrt{n})}\),

(2) \(\Vert G' -G_{\text {Haar}}\Vert _\infty \approx 1/e^{\Omega (\sqrt{n})}\), and

(3) \(G^{(t)}_{\mu ^{\text {lattice},n}_{2,1,s}} \approx G'\) in various norms.

We first use (1) to relate the monomial definition of t-designs to the infinity norm and then use (2) to bound the infinity norm

$$\begin{aligned} \left\| \textsf{vec}\left[ G^{(t)}_{\mu ^{\text {lattice},n}_{2,c,s}}\right] - \textsf{vec}\left[ G^{(t)}_{\text {Haar}}\right] \right\| _\infty \approx t!^{O(\sqrt{n})} \left\| G'^c -G^{(t)}_{\text {Haar}}\right\| _\infty \cdot \frac{t!}{d^{nt}} \approx \frac{t!^{O(\sqrt{n})}}{e^{\Omega (c \cdot \sqrt{n})}} \cdot \frac{1}{d^{nt}}. \end{aligned}$$
(7)

For \(c = t \ln t\) the error bound is \(e^{-\Omega (\sqrt{n})} \cdot \frac{1}{d^{nt}}\). As a result, using (3),

$$\begin{aligned} \left\| \textsf{vec}\left[ G^{(t)}_{\mu ^{\text {lattice},n}_{2,c,s}}\right] - \textsf{vec}\left[ G^{(t)}_{\text {Haar}}\right] \right\| _\infty \approx \left\| \textsf{vec}\left[ G'^c \right] - \textsf{vec}\left[ G^{(t)}_{\text {Haar}}\right] \right\| _\infty \approx e^{-\Omega (\sqrt{n})} \cdot \frac{1}{d^{nt}}. \end{aligned}$$
(8)

This step requires a certain change of norm for which we only have to pay a factor like \(e^{O(\sqrt{n})}\), which we justify by bounding the ranks of the right intermediate operators. The factor of \(1/d^{nt}\) comes from the fact that the Haar measure itself has monomial expectation values on this order (in fact as large as \(t!/d^{nt}\), but we are suppressing the t-dependence in this proof sketch).

We now briefly describe the construction of \(G'\). Let \(G_R\) (and \(G_C\)) be the projector operators corresponding to applying a Haar unitary to each row (and column) independently. Then \(G' = G_R G_C\). \(G'\) has rank \(t!^{O(\sqrt{n})}\) because \(G_R\) and \(G_C\) are each tensor products of \(\sqrt{n}\) Haar projectors, each with rank t!. Let \(V_R\), \(V_C\), and \(V_\text {Haar}\) be respectively the subspaces that \(G_R\), \(G_C\) and \(G_{\text {Haar}}\) project onto. In order to prove (2), in Sect. 3.6.1 we first use the fact that our circuits are computationally universal to argue that \(V_C \cap V_R = V_\text {Haar}\). We then prove that the angle between \(V_R \cap V_\text {Haar}^\perp \) and \(V_C \cap V_\text {Haar}^\perp \) is very close to \(\pi /2\), i.e., \(\approx \pi /2 \pm \frac{1}{d^{\sqrt{n}}}\). This implies that \(G_C G_R = G_{\text {Haar}} + P\), where P is a small matrix in the sense that \(\Vert P\Vert _{\infty } \approx 1/d^{\sqrt{n}}\). Choosing \(c = {{\,\textrm{poly}\,}}(t)\) we obtain (2). To show (1), it is not hard to see that the rank of \(G' - G_{\text {Haar}}\) is indeed \(t!^{O(\sqrt{n})}\). For (3) we use the construction of t-designs from [13]. In particular, our random circuit model first applies an \(O(\sqrt{n})\) depth circuit to each row and then an \(O(\sqrt{n})\) depth circuit to each column, and repeats this for \({{\,\textrm{poly}\,}}(t)\) rounds. The result of [13] implies that each of these rounds is effectively the same as applying a strong approximate t-design to the rows or columns of the lattice. We then analyze how these designs behave under composition in various norms and prove (3). \(\square \)

Our second result generalizes Theorem 8 to arbitrary dimensions.

Theorem 9

There exists a value \(\delta = 1/d^{\Omega (n^{1/D})}\) such that for some large enough c depending on D and t:

1. \(s > c \cdot n^{1/D} \implies \left\| \textsf{vec}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\infty \le \frac{\delta }{d^{nt}}\).

2. \(s > c \cdot n^{1/D} \implies \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\diamond \le \delta \).

3. \(s > c \cdot n^{1/D} \implies \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \le \delta \).

4. \(s > c \cdot n^{1/D} \implies \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1 \le \delta \).

In order to understand the implication of this result for anti-concentration, let us first define:

Definition 10

(Anti-concentration). We say a family of circuits \(\mu \) satisfies the \((\alpha ,\beta )\) anti-concentration property if for any \(x \in \{0,1\}^n\)

$$\begin{aligned} \mathop {\Pr }\limits _{U \sim \mu } \left[ |\langle x|U|0\rangle |^2 \ge \frac{\alpha }{2^n}\right] \ge \beta \end{aligned}$$
(9)

As mentioned before, unitary 2-designs imply a strong anti-concentration bound. In particular:

Theorem 11

Let \(\mu \) be an \(\epsilon \)-approximate 2-design in the monomial measure. Then, for any \(0 \le \delta \le 1\), \(\mu \) satisfies the \((\alpha , \beta )\) anti-concentration property with \(\alpha = \delta (1-\epsilon )\) and \(\beta = \frac{(1-\delta )^2 (1-\epsilon )^2}{2 (1+\epsilon )}\).

Proof

(See Appendix A and also Theorem 5 of [32]). The proof is based on the Paley–Zygmund anti-concentration inequality: for a non-negative random variable X and \(0 \le \delta \le 1\) we have

$$\begin{aligned} \Pr \left[ X \ge \delta \cdot {\mathbb {E}}X\right] \ge (1-\delta )^2 \frac{{\mathbb {E}}[X]^2}{{\mathbb {E}}[X^2]}. \end{aligned}$$
(10)

\(\square \)
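To see where the constants in Theorem 11 come from, suppose (as the monomial condition guarantees, up to the bookkeeping carried out in Appendix A) that \({\mathbb {E}}_\mu X \ge (1-\epsilon ) {\mathbb {E}}_{\text {Haar}} X\) and \({\mathbb {E}}_\mu X^2 \le (1+\epsilon ) {\mathbb {E}}_{\text {Haar}} X^2\) for \(X = |\langle x|C|0\rangle |^2\), where \({\mathbb {E}}_{\text {Haar}} X = 2^{-n}\) and \({\mathbb {E}}_{\text {Haar}} X^2 = \frac{2}{2^n(2^n+1)}\). Applying (10) at the threshold \(\delta \, {\mathbb {E}}_\mu X\) gives

$$\begin{aligned} \mathop {\Pr }\limits _{C \sim \mu } \left[ X \ge \frac{\delta (1-\epsilon )}{2^n} \right] \ge (1-\delta )^2 \frac{(1-\epsilon )^2 \, 2^{-2n}}{(1+\epsilon ) \frac{2}{2^n(2^n+1)}} = (1-\delta )^2 \frac{(1-\epsilon )^2 (2^n+1)}{2^{n+1} (1+\epsilon )} \ge \frac{(1-\delta )^2 (1-\epsilon )^2}{2(1+\epsilon )}, \end{aligned}$$

which matches \(\alpha = \delta (1-\epsilon )\) and \(\beta = \frac{(1-\delta )^2 (1-\epsilon )^2}{2 (1+\epsilon )}\).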

We remark that, based on the result of [24], the 2-design property, while sufficient, is not necessary for anti-concentration. In Sect. 5 we give an alternative proof of anti-concentration in \(O(D \cdot n^{\frac{1}{D}})\) depth based on different ideas. That method directly implies anti-concentration without establishing the approximate 2-design property.

For these spatially local circuits we also improve on some bounds in [17] and [18] about scrambling and decoupling, removing polylogarithmic factors. Here we give an informal statement of the result with full details and definitions found in Sect. 6.

Theorem 12

(Informal). Random quantum circuits acting on D-dimensional lattices composed of n qubits are scramblers and decouplers in the sense of [17] and [18] after \(O(D \cdot n^{1/D})\) steps.

Our last result concerns the fully connected model. If \(s = O(n \ln ^2 n)\) and \(d=2\) then \(\mu ^{\text {CG}}_{s}\) satisfies the anti-concentration criterion of Definition 10 for constant \(\alpha \) and \(\beta \), i.e., (5). We phrase our result in terms of the expected “collision probability” of the output distribution of \(C \sim \mu ^{\text {CG}}_s\), from which a bound similar to the one in Theorem 11 follows using the Paley–Zygmund inequality (10). In particular, if C is a quantum circuit on n qubits, starting from \(|0^n\rangle \) the collision probability is

$$\begin{aligned} \text {Coll}(C):= \sum _{x \in \{0,1\}^n} |\langle x|C|0\rangle |^4. \end{aligned}$$
(11)

For the Haar measure \({\mathbb {E}}_{C \sim \textrm{ Haar}}\text {Coll}(C) = \frac{2}{2^n+1}\), and for the uniform distribution this value is \(1/2^n\). In contrast, a depth-1 random circuit has expected collision probability \( (\sqrt{\frac{2}{5}} )^n\), which is exponentially larger than what we expect from the Haar measure.
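The depth-1 value can be checked directly: the output state is a tensor product of the first columns of independent Haar-random \(\text {U}(4)\) gates, and each two-qubit factor contributes \({\mathbb {E}} \sum _x |\langle x|U|00\rangle |^4 = \frac{2}{5}\). A small Monte Carlo check (our code):

    import numpy as np

    def haar_unitary(dim, rng):
        z = (rng.standard_normal((dim, dim)) +
             1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        return q * (np.diag(r) / np.abs(np.diag(r)))

    rng = np.random.default_rng(3)
    n, trials = 6, 20_000
    vals = []
    for _ in range(trials):
        # Depth-1 circuit: independent Haar U(4) gates on (1,2),(3,4),...;
        # the output state is the tensor product of the gates' first columns.
        psi = np.ones(1, dtype=complex)
        for _ in range(n // 2):
            psi = np.kron(psi, haar_unitary(4, rng)[:, 0])
        vals.append(np.sum(np.abs(psi) ** 4))
    print(np.mean(vals), (2 / 5) ** (n / 2))  # the two should agree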

Theorem 13

There exists a c such that when \(s > c n \ln ^2 n\),

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_s} \text {Coll}(C) \le \frac{29}{2^n}. \end{aligned}$$
(12)

Moreover, if \(s \le \frac{1}{3 c'} n \ln n\) for some large enough \(c'\), then

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_s} \text {Coll}(C) \ge \frac{1.6 ^{n^{1-1/c'}}}{2^n}. \end{aligned}$$
(13)

Proof Sketch

For the upper bound, we translate the convergence time of the expected collision probability to the mixing time of a certain classical Markov chain (which we call \(X_0,X_1,\ldots \)). This Markov chain has also been considered in previous work [18, 34, 51]. Part of our contribution is to analyze this Markov chain in a new norm. The Markov chain has n sites labeled \(1,\ldots ,n\), and from each site x it can move only to \(x-1\), x or \(x+1\). Such chains are known as “birth and death” chains, and in our case the chain results from representing the state of the system by a Pauli operator and then taking x to be the Hamming weight of that Pauli operator. It is known [51] that the probability of moving to site \(x+1\) is \(\approx \frac{6}{5} \frac{x(n-x)}{n^2}\) and the probability of moving to site \(x-1\) is \(\approx \frac{2}{5} \frac{x(x-1)}{n^2}\). The major difficulty in proving mixing for this Markov chain is that the norm in which we have to prove mixing is exponentially sensitive to small fluctuations (measured in either the 1-norm or the 2-norm). Indeed, given the starting condition

$$\begin{aligned} \Pr [X_0 = k] = \frac{\binom{n}{k}}{2^n-1}, \end{aligned}$$
(14)

we would like to show that

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _C[\text {Coll}(C)] \approx \sum _{k=1}^n\frac{\Pr \left[ X_t = k\right] }{3^k}, \end{aligned}$$
(15)

is \(\le O(2^{-n})\). We can think of (15) as a weighted 1-norm on probability distributions.

Our proof will compute the distribution of \(X_t\) for \(t = O(n \ln ^2 n)\) nearly exactly. One distinctive feature of this chain is that when \(k/n \ll 1\), the probability of moving is O(k/n) and the chain is strongly biased to move towards the right. When k reaches \(\Omega (n)\), the chain becomes more like the standard discrete Ehrenfest chain, which is a random walk with a linear bias towards (in this case) \(k=\frac{3}{4} n\). Thus the small-k region needs to be handled separately. This is especially true for anti-concentration thanks to the \(1/3^k\) term in (15), so that even a small probability of waiting for a long time in this region can have a large effect on the collision probability.
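To make the chain concrete, the following sketch (our code; the transition probabilities are only the leading-order rates quoted above, so this illustrates the chain rather than reproducing the exact analysis of Sect. 4) evolves the starting distribution (14) and evaluates the weighted 1-norm (15):

    import numpy as np
    from math import comb

    def weighted_sum(n, t):
        # Transition matrix on Hamming weights 1..n with the approximate
        # rates from [51]: P(x -> x+1) ~ (6/5) x(n-x)/n^2 and
        # P(x -> x-1) ~ (2/5) x(x-1)/n^2 (both vanish at the boundaries).
        P = np.zeros((n + 1, n + 1))
        for x in range(1, n + 1):
            up = 1.2 * x * (n - x) / n ** 2
            down = 0.4 * x * (x - 1) / n ** 2
            if x < n:
                P[x, x + 1] = up
            if x > 1:
                P[x, x - 1] = down
            P[x, x] = 1 - up - down
        # Starting distribution of Eq. (14), evolved for t steps.
        p = np.array([0.0] + [comb(n, k) / (2 ** n - 1)
                              for k in range(1, n + 1)])
        for _ in range(t):
            p = p @ P
        # The weighted 1-norm of Eq. (15).
        return sum(p[k] / 3.0 ** k for k in range(1, n + 1))

    n = 30
    for t in [0, n, int(n * np.log(n) ** 2)]:
        # In units of 2^{-n}: starts around (4/3)^n and decays towards a
        # constant as the chain mixes.
        print(t, weighted_sum(n, t) * 2 ** n)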

The approach of [18, 27, 34] has been to relate the original Markov chain to an “accelerated” chain which is conditioned on moving at each step. The position of the original chain can be recovered from the accelerated chain by adding a geometrically distributed “wait time” at each step. Then standard tools from the analysis of Markov chains, such as comparison theorems and log-Sobolev inequalities, can be used to bound the convergence rate of the accelerated chain. Finally, this can be related back to the original chain by arguing that the accelerated chain is unlikely to spend too long on small values of k, allowing us to bound the wait time. For our purposes, this process does not produce sharp enough bounds, due to the heavy-tailed wait times combined with fairly weak bounds on how quickly the accelerated chain converges and leaves the small-k region.

We will sharpen this approach by incompletely accelerating; i.e., we will couple the original chain to a chain that moves with a carefully chosen (but always \(\Omega (1)\)) probability. In particular, we will introduce a chain where the probabilities of moving from x to \(x-1\), x or \(x+1\) are each affine functions of x. In fact our new “accelerated” chain is only accelerated for \(x< \frac{5}{6}n\) and is actually more likely to stand still for \(x\ge \frac{5}{6}n\). This will allow us to exactly solve for the probability distribution of the accelerated chain after any number of steps, using a method of Kac to relate this distribution to the solution of a differential equation. Our solution can be expressed simply in terms of Krawtchouk polynomials, which have appeared in other exact solutions to random processes on the hypercube. We relate this back to the original chain with careful estimates of the mean and large-deviation properties of the wait time. This ends up showing only that the collision probability is small for t in some interval \([t_1,t_2]\), and to show that it is small for a specific time, we need to prove that the collision probability decreases monotonically when we start in the state \(|0^n\rangle \). A further subtlety is that (15) technically only applies when all qubits have been hit by gates and we need to extend this analysis to include the non-negligible probability that some qubits have never been acted on by a gate.

Because previous works achieved quantitatively less sharp bounds, they could omit some of these steps. For example, [27, 34] used \(O(n^2)\) gates, which meant that the probability of most bad events was exponentially small. By contrast, with \(O(n\ln ^2(n))\) gates, there is probability \(n^{-O(\ln n)}\) of missing at least one qubit, and so we cannot afford to let this be an additive correction to our target collision probability of \(\text {constant} \cdot 2^{-n}\). Likewise, [18] used only \(O(n \ln ^2(n))\) gates but achieved a collision probability of \(2^{\epsilon n-n}\) for small constant \(\epsilon \), which allowed them to use a simpler version of the accelerated chain whose convergence they bounded using generic tools from the theory of Markov chains.

For the lower bound we just consider the event that the initial Hamming weight does not change throughout the process. The initial state with Hamming weight k has probability mass \(\Pr [X_0=k]=\frac{\binom{n}{k}}{2^n-1}\). Starting with Hamming weight k, the probability of not moving in each step is \(e^{-O(k/n)}\), so if \(t= c n \ln n\) for \(c \ll 1\) then we have \(\Pr [X_t=k | X_0=k] \ge e^{-O(k t/n)}\). Hence

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu _t} \text {Coll}(C) \ge \sum _{k = 1}^n \frac{\binom{n}{k}}{2^n-1}\frac{\Pr [X_t=k | X_0=k]}{3^k} \ge \sum _{k = 1}^n \frac{\binom{n}{k}}{2^n-1}\frac{e^{-O(kt/n)}}{3^k} \approx \frac{1}{2^n}\left( 1 + e^{- 3 t/n}\right) ^n \ge \frac{2^{n^{1-O(1)}}}{2^n} \end{aligned}$$
(16)

\(\square \)

A natural question is whether there is a common generalization of our Theorems 9 and 13. In physics, the \(D\rightarrow \infty \) limit is often considered a good proxy for the fully connected model. This raises the question of whether we needed Theorem 13 to handle the fully connected case, or whether it would be enough to use Theorem 9 in the large D limit. However, Theorem 9 works only for \(D = O(\ln n/ \ln \ln n)\), and the best depth bound we can get from this theorem is \(e^{O(\ln n/\ln \ln n)}\), which is far above the \(O(\ln ^2(n))\) achievable by Theorem 13. On the other hand, in Sect. 5 we give an alternative proof for anti-concentration via circuits on D-dimensional lattices with \(t=2\) and \(D = O(\ln n)\). Using that approach we can make the depth as small as \(O(\ln n \ln \ln n)\). We conjecture that \(O(\ln n)\) depth should also be possible.

In order to establish rigorous bounds, our results involve some inequalities that are not always tight. As a result, the upper bound on collision probability in Theorem 13 has a factor of 29 rather than the \(2+o(1)\) that we would expect and the bound on the number of gates required may be too high by a factor of \(\ln (n)\). Since determining the precise number of gates needed for anti-concentration may have utility in near-term quantum hardware, we also undertake a heuristic analysis of what depth seems to be required to achieve anti-concentration. Here we ignore the possibility of large fluctuations in the wait time, for example, and simply set it equal to its expected value. We also freely make the continuum approximation for the biased random walk that ignores wait time, obtaining the Ornstein–Uhlenbeck process. The resulting analysis (found in Sect. 4.6) suggests that \(\frac{5}{6} n \ln n + o(n \ln n)\) gates are needed to achieve anti-concentration comparable to the Haar measure.

This result can also be useful for understanding the near-term power of certain variational quantum algorithms, such as VQE and QAOA. [20, 43] show that when a gate sequence is drawn from a 2-design, the gradients used for optimizing VQE and other algorithms become exponentially small. This is called the “barren plateau” phenomenon. Our result would suggest that this occurs in 2-D circuits once the depth is \(\gtrsim \sqrt{n}\).

1.4 Previous work

The time evolution of the 2nd moments of random quantum circuits was first studied by Oliveira, Dahlsten and Plenio [51], who investigated their entanglement properties. This was extended by [27, 34] to show that after linear depth, random circuits on the complete graph yield approximate 2-designs. In [13] Brandão-Harrow-Horodecki (BHH) extended this result and showed that for a 1-D lattice these random quantum circuits become \(\epsilon \)-approximate t-designs after depth \(t^{10.5} \cdot O(n + \ln \frac{1}{\epsilon })\). This result was subsequently improved to \(t^{5 + o_t(1)} O(n + \ln \frac{1}{\epsilon })\) by Haferkamp [31]. All of these results (except [51]) directly imply anti-concentration after the mentioned depths. The construction of t-designs in [13] is in a stronger measure than the one in HL [34]. The gap of the second-moment operator was calculated exactly for \(D=1\) and fully connected circuits by Žnidarič [58], and a heuristic estimate for the \(t^{\text {th}}\) moment operator was given by Brown and Viola for fully connected circuits [19].

In [17, 18] Brown and Fawzi considered “scrambling” and “decoupling” with random quantum circuits. In particular, they showed that for a D-dimensional lattice scrambling occurs in depth \(O(n^{1/D} {\text {polylog}}(n))\), and for complete graphs, they showed that after polylogarithmic depth these circuits demonstrate both decoupling and scrambling. For the case of D-dimensional lattices they showed that for the Markov chain K, after depth \(n^{1/D} {\text {polylog}}(n)\), a string of Hamming weight 1 gets mapped to a string with linear Hamming weight with probability \(1-1/{\text {poly}}(n)\). While this result is related to ours, it does not seem to yield the results we need, e.g. for anti-concentration, due to the powers of Hilbert space dimension that are lost when changing norms.

In [46, 47] Nahum, Ruhman, Vijay and Haah considered operator spreading for random quantum circuits on D-dimensional lattices. They considered the case when a single Pauli operator starts from a certain point on the lattice and they analyze the probability that after a certain time a non-identity Pauli operator appears at an arbitrary point on the lattice. For \(D=1\) they showed that this probability function satisfies a biased diffusion equation. Their result in this case is exact. For \(D=2\) they explained, both numerically and theoretically, that this probability function spreads as an almost circular wave whose front satisfies the one dimensional Kardar-Parisi-Zhang equation. They moreover explained: 1) the bulk of the occupied region is in equilibrium, 2) fluctuations of size \(\sim t^{1/3}\) appear at the boundary of this region, and 3) the area of the occupied region grows like \(t^2\), where t is the depth of the circuit. As far as we understand this result does not directly lead to the construction of t-designs, and rigorous bounds on the quality of the approximations made in that paper are not known.

If we assume that qudits have infinite local dimension (\(d \rightarrow \infty \)) then the evolution of Pauli strings on a 2-D lattice is closely related to Eden’s model [28], for which certain explicit solutions are known. However, apart from the \(d\rightarrow \infty \) limit, this model differs from ours also in that it considers only starting with a single occupied site and running for a time much less than the graph diameter (or equivalently, considering an infinitely large 2-D lattice), while we consider the initial distribution obtained by starting in the \(|0^n\rangle \) state.

After the first preprint version of this paper was posted online, [24] improved on our results in several ways. Contrary to what we expected, they proved that random quantum circuits acting on linear chains or complete graphs anti-concentrate after depth \(\Theta (\ln n)\). It is left as an important open question whether the same bound holds for \(D = 2,3, \ldots \). They also proved one of the conjectures of this paper, namely that the constant factor in the circuit-size bound for the complete graph model is 5/6. The initial presentation of this result had a mistake in the heuristic reasoning and predicted the constant factor to be 5/3; this was pointed out and corrected in [24].

1.5 Open questions

1. Is it possible to construct “strong” t-designs (Definition 2) using sub-linear depth random circuits? If we can show that the off-diagonal moments (see Definition 36) of the distribution, which have expectation zero according to the Haar measure, become smaller than \(1/d^{3nt}\) in sub-linear depth, then our construction of monomial designs implies the construction of strong designs. On the other hand, we cannot rule out the possibility that strong designs require linear depth.

2. How large are the constant factors in the bounds reported in this paper? Based on a heuristic argument in Sect. 4.6 for the complete graph architecture we conjecture that such random circuits of size \(s = \frac{5}{6} n (\ln n + \epsilon )\) are \(O(\epsilon )\)-approximate 2-designs. See Conjecture 1 for a precise conjectured bound for obtaining 2-designs. In work appearing after the first version of our paper, Ref. [24] proved this conjecture for anti-concentration. Our result had achieved an upper bound of \(O(n \ln ^2 n)\).

3. We believe our dependence on n is essentially optimal. But our depth scales with t as \(t^\alpha \) for some \(\alpha \gtrsim 5\) that is almost certainly not optimal. At the moment the best lower bound is \(\Omega (t\ln n)\) depth for any circuit, or \(\Omega (n^{1/D})\) in D dimensions. Indeed, very recently [37] provided strong analytical evidence that \(\alpha =1\) for the one-dimensional architecture (\(D=1\)). The argument, however, contains uncontrolled approximations and is not known to extend even to \(D=2\), although such an extension seems plausible. Intriguingly, for constant n and with a different gate model, some results are known that are completely independent of t [11].

4. If we pick an arbitrary graph and apply random gates on the edges of this graph, after what depth do these circuits become t-designs? We conjecture that if the graph has large expansion and diameter l, then the answer is O(l). However, if the graph has a tight bottleneck (like a binary tree), then even though the graph has small diameter, we suspect that certain measures of t-designs (including the monomial measure) require linear depth. Ideally, the t-design time for any graph could be related to other properties of the graph such as mixing time, cover time, etc.

5. Can we prove a comparison lemma for random circuits, i.e., can we show that if two random circuits are close to each other, then they become t-designs after roughly the same amount of time? Such a comparison lemma may imply that other natural families of low-depth circuits are approximate t-designs. A related question is whether deleting random gates from a circuit family can ever speed up convergence to being a t-design. Such a bound has been called a “censoring” inequality in the Markov-chain literature.

6. Our results do not say much about the actual constants that appear in the asymptotic bounds for the required size for anti-concentration. We conjecture the leading term in the anti-concentration time for random circuits on complete graphs is \(\frac{5}{6} n \ln n\). For the D-dimensional case our bounds inherit constant factors from [13]. Simple numerical simulation and also the analysis of [8, 46, 47] suggest that the constant should be \(\approx 1\).

7. For the case of D-dimensional circuits, our result does not say much about the dynamics of the distribution when depth is \(\ll n^{1/D}\). Such a result may explain the dynamics of entanglement in random circuits. [46, 47] consider this problem for the case when a single Pauli operator starts at the middle of the lattice; however, their result does not apply to arbitrary initial operators.

8. The best anti-concentration lower bound we are able to prove is \(\Omega (\ln n)\). For D-dimensional lattices one would expect a lower bound of \(\Omega (n^{1/D})\) based on the following intuition for circuits of depth \(s < n^{1/D}\): for \(s \ll n^{1/D}\), we expect any two non-overlapping clusters of \(s^D\) qubits to be close to independent Haar-random unitaries. Hence, a crude model for such circuits would be \(n/s^D\) copies of Haar-random unitaries, each on \(s^D\) qubits. In this case we would expect the collision probability to be \( \approx \frac{2^{n/s^D}}{2^n}\). Very interestingly, the recent result of [24] refuted this intuition for \(D=1\) and showed an upper bound of \(O(\ln n)\) on the depth at which anti-concentration is achieved. It seems plausible that for \(D = 2,3, \ldots \) we would also have anti-concentration in depth \(O(\ln n)\), since it holds both for \(D=1\) and for fully connected circuits.

2 Preliminaries

2.1 Basic definitions

We need the following norms:

Definition 14

For a superoperator \({\mathcal {E}}\) the diamond norm [40] is defined as \(\Vert {\mathcal {E}}\Vert _\diamond := \sup _d \Vert {\mathcal {E}} \otimes \textrm{id}_d\Vert _{1 \rightarrow 1}\), where for a superoperator A the \(1 \rightarrow 1\) norm is defined as \( \Vert A\Vert _{1 \rightarrow 1}:= \sup _{X \ne 0} \frac{\Vert A (X)\Vert _1}{\Vert X\Vert _1}\).

A matrix is called positive semi-definite (psd) if it is Hermitian and has all non-negative eigenvalues. A superoperator \({\mathcal {A}}\) is called completely positive (cp) if for any \(d \ge 0\), \({\mathcal {A}} \otimes \text {id}_d\) maps psd matrices to psd matrices. A superoperator is called trace-preserving completely positive (tpcp) if it preserves the trace and is furthermore cp.

Let S be a set of qudits, then

Definition 15

\(\text {Haar}(S)\) is the Haar measure on \(\text {U}(({\mathbb {C}}^d)^{\otimes |S|})\). We write \(\text {Haar}(i,j)\) for the two-qudit Haar measure on the qudits indexed by i and j; also, for an integer m, the notation \(\text {Haar}(m)\) means the Haar measure on m qudits.

We now define expected monomials, moment superoperators and quasi-projectors for a distribution \(\mu \) over the unitary group:

Definition 16

Let \(n,t >0\) be positive integers and \(\mu \) be any distribution over the n-qudit unitary group \(\text{ U }(({\mathbb {C}}^d)^{\otimes n})\). Then \(G_\mu ^{(t)}:= {\mathbb {E}}_{C\sim \mu } \left[ C^{\otimes t,t} \right] \) is the quasi-projector of \(\mu \). Here \(C^{\otimes t,t} = C^{\otimes t} \otimes C^{*\otimes t}\). Also \(G^{(t)}_{(i,j)} = G^{(t)}_{\text {Haar}(i,j)}\). Using this definition, we also introduce the following quantities:

1. Let \(i_1, j_1, \ldots , i_t, j_t, k_1, l_1, \ldots , k_t, l_t \in [d]^n\) be any 4t words in \([d]^n\). Then the \(i_1,\ldots ,l_t\) monomial is the expected value of a balanced monomial of \(\mu \), defined as

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu } \left[ C_{i_1, j_1} \ldots C_{i_t, j_t} C^*_{k_1, l_1} \ldots C^*_{k_t l_t} \right] = \langle i_1, \ldots , j_t|G_\mu ^{(t)}|k_1, \ldots , l_t\rangle \end{aligned}$$
    (17)

    where \(C_{a,b}\) is the (a, b) entry of the unitary matrix C.

2. Let \(\textrm{ad}_X (\cdot ):= X (\cdot ) X^\dagger \). Then \(\text {Ch}\left[ G^{(t)}_\mu \right] := {\mathbb {E}}_{C \sim \mu } \left[ \textrm{ad}_{C^{\otimes t}} \right] \) is the \(t^{\text {th}}\) moment superoperator of \(\mu \).

Next, we define the building blocks of our t-design constructions.

Definition 17

(Rows of a lattice). For \(1 \le i \le n^{1-1/D}\), \(r_{\alpha ,i}\) is the i-th row of a D-dimensional lattice in the \(\alpha \)-th direction. We will label the qudits in row i by \((\alpha ,i,1),\ldots ,(\alpha ,i,n^{1/D})\). Assume for convenience that \(n^{1/D}\) is an even integer and define the sets of pairs \(E_{\alpha ,i}:= \{((\alpha ,i,1),(\alpha ,i,2)),\ldots , ((\alpha ,i,n^{1/D}-1), (\alpha ,i,n^{1/D}))\}\) and \(O_{\alpha ,i}:= \{((\alpha ,i,2),(\alpha ,i,3)),\ldots , ((\alpha ,i,n^{1/D}-2), (\alpha ,i,n^{1/D}-1))\}\).

Definition 18

(Elementary random circuits). The elementary quasi-projector in direction \(\alpha \) is

$$\begin{aligned} g_{\text {Rows} (\alpha , n)}:=\prod _{1\le l \le n^{1-1/D}} \bigotimes _{(i,j) \in E_{\alpha ,l}} G^{(t)}_{(i,j)} \cdot \bigotimes _{(i,j) \in O_{\alpha ,l}} G^{(t)}_{(i,j)}=: \prod _{1\le l \le n^{1-1/D}} g_{r_{\alpha ,l}}. \end{aligned}$$
(18)

For the 2-D lattice we write \(g_R\) and \(g_C\) for \(g_1\) and \(g_2\), respectively.

The following defines the moment superoperator and quasi-projector of the Haar measure on the rows of a D-dimensional lattice in a specific direction.

Definition 19

(Idealized model with Haar projectors on rows). Let \(1\le \alpha \le D\) be one of the directions of a D-dimensional lattice then

$$\begin{aligned} G_{\text {Rows}(\alpha ,n)}:= \prod _{ 1 \le i \le n^{1-1/D}} G^{(t)}_{\text {Haar}(r_{\alpha ,i})}=:\prod _{ 1 \le i \le n^{1-1/D}} G_{r_{\alpha ,i}}. \end{aligned}$$
(19)

For a 2-D lattice we use \(G_R\) and \(G_C\) for \(G_1\) and \(G_2\), respectively.

Next, we define moment operators and projectors corresponding to the Haar measure on the sub-lattices of a D-dimensional lattice. We view a D-dimensional lattice as a collection of \(n^{1/D}\) smaller lattices each with dimension \(D-1\), composed of \(n^{1-1/D}\) qudits. We label these sub-lattices with \(\text{ Planes }(D):= \{p_1, \ldots , p_{n^{1/D}}\}\).

Definition 20

(Haar measure on sub-lattices). \(G_{\text {Planes}(D)} = \bigotimes _{p \in \text {Planes}(D)} G^{(t)}_{\text {Haar}(p)}\equiv G^{(t)\otimes n^{1/D}}_{\text {Haar}(n^{1-1/D})}\).

Definition 21

For \(d=2\), \(t=2\) and a superoperator \({\mathcal {A}}\) define

$$\begin{aligned} \text {Coll}({\mathcal {A}}):= {\textrm{Tr}}\left( \sum _{x \in \{0,1\}^n} |x\rangle \langle x| \otimes |x\rangle \langle x| {\mathcal {A}}(|0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n| ) \right) . \end{aligned}$$
(20)

In particular, for a distribution \(\mu \) over circuits of size s the expected collision probability is defined as

$$\begin{aligned} \text {Coll}_s:= \text {Coll}\left( \text {Ch}\left[ G^{(2)}_{\mu }\right] \right) . \end{aligned}$$
(21)
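To make Definition 21 concrete, here is a minimal Monte Carlo sketch (again an illustration assuming numpy/scipy) that estimates the expected collision probability for fully Haar-random circuits on n qubits, where the exact value is \(2/(2^n+1)\):

```python
import numpy as np
from scipy.stats import unitary_group

# Estimate E_C sum_x |<x|C|0^n>|^4 for a Haar-random C on n qubits.
n = 3
N = 2 ** n
samples = 500
acc = 0.0
for _ in range(samples):
    psi = unitary_group.rvs(N)[:, 0]   # first column of C equals C|0^n>
    p = np.abs(psi) ** 2               # output probability distribution
    acc += np.sum(p ** 2)              # collision probability for this circuit
print("estimate:  ", acc / samples)
print("Haar value:", 2 / (N + 1))
```

For a random circuit distribution \(\mu \), \(\text {Coll}_s\) would instead be estimated by sampling circuits from \(\mu \); the Haar value above is the \(s \rightarrow \infty \) benchmark.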

Remark 1

For \(d=2\), \(t=2\), and two-qudit gates drawn from the Haar measure on \(\text {U}(4)\), \(\text {Ch}\left[ G^{(2)}_{(i,j)}\right] \) is the following map in the Pauli basis:

$$\begin{aligned} \text {Ch}\left[ G^{(2)}_{(i,j)}\right] (\sigma _p \otimes \sigma _q) = {\left\{ \begin{array}{ll} \sigma _0 \otimes \sigma _0 &{} p = q = 00\\ \frac{1}{15}\sum _{s\in \{0,1,2,3\}^2 \backslash 00} \sigma _s \otimes \sigma _s&{} p = q \ne 00\\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(22)

More generally, if S is a collection of qubits and \(p, q \in \{0,1,2,3\}^S\), then

$$\begin{aligned} \text {Ch}\left[ G^{(2)}_S\right] (\sigma _p \otimes \sigma _q) = {\left\{ \begin{array}{ll} \sigma _0 \otimes \sigma _0 &{} p = q = 0\\ \frac{1}{4^{|S|}-1}\sum _{s\in \{0,1,2,3\}^{S} \backslash 0} \sigma _s \otimes \sigma _s&{} p = q \ne 0\\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(23)

See [34, 51] for proofs of these remarks.
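Remark 1 can also be verified numerically. The sketch below (a sanity check under the same numpy/scipy assumptions; the actual proofs are in [34, 51]) averages \((U \sigma _p U^\dagger )^{\otimes 2}\) over Haar-random \(U \in \text {U}(4)\) for a non-identity Pauli \(\sigma _p\) and compares the result with the right-hand side of (22):

```python
import numpy as np
from scipy.stats import unitary_group

# Check Eq. (22): for p = q != 00, the two-copy twirl of sigma_p (x) sigma_p
# should converge to (1/15) * sum_{s != 00} sigma_s (x) sigma_s.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [np.kron(a, b) for a in (I2, X, Y, Z) for b in (I2, X, Y, Z)]

sigma_p = paulis[5]                 # X (x) X, an arbitrary non-identity choice
samples = 2000
avg = np.zeros((16, 16), dtype=complex)
for _ in range(samples):
    U = unitary_group.rvs(4)
    A = U @ sigma_p @ U.conj().T
    avg += np.kron(A, A) / samples

target = sum(np.kron(s, s) for s in paulis[1:]) / 15
print("deviation:", np.linalg.norm(avg - target))   # shrinks like 1/sqrt(samples)
```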

2.2 Operator definitions of the models

Definition 22

(Random circuits on a two-dimensional lattice). The quasi-projector of \(\mu ^{\text {lattice},n}_{2,c,s}\) is \(G_{{\mu ^{\text {lattice},n}_{2,c,s}}}^{(t)} = g^s_R (g^s_C g^s_{R} )^c\).

This definition generalizes to arbitrary D dimensions as follows:

Definition 23

(Recursive definition for random circuits on D-dimensional lattices). The quasi-projector of \(\mu ^{\text {lattice},n}_{D,c,s}\) is specified by the recursive formula:

$$\begin{aligned} G^{(t)}_{ \mu ^{\text {lattice},n}_{D,c,s}} = G^{(t) \otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} \left( g_{\text {Rows}(D,n)}^s G^{(t)\otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} \right) ^c. \end{aligned}$$
(24)

It will be useful to our proofs to also define:

1.

    \({\tilde{G}}_{n,D,c} = \left( {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} G_{\text {Rows}(D,n)} {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} \right) ^c\)

2.

    \({{\hat{G}}}_{n,D,c} = G_{\text {Rows}(D,n)} {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} G_{\text {Rows}(D,n)}\)

In particular, \({\tilde{G}}_{n,D,c}\) is the same as \(G_{\mu ^{\text {lattice},n}_{D,c,s}}\) except that we have replaced each \(g_{\text {Rows}(D,n)}^s\) with \(G_{\text {Rows}(D,n)}\), so the s dependence drops out. Definition 22 is a special case of Definition 23, but we include both for convenience.

Definition 24

\(G^{(t)}_{\mu ^{\text {CG}}_{s}} = \left( \frac{1}{{n \atopwithdelims ()2}} \sum _{i \ne j} G^{(t)}_{(i,j)} \right) ^s\).
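For intuition, the following sketch samples a state from \(\mu ^{\text {CG}}_{s}\) applied to \(|0^n\rangle \) for qubits (\(d=2\)): each of the s steps applies an independent Haar-random \(\text {U}(4)\) gate to a uniformly random pair of qudits. The helper apply_two_qubit is hypothetical, written only for this illustration.

```python
import numpy as np
from scipy.stats import unitary_group

def apply_two_qubit(psi, U, i, j, n):
    """Apply a 4x4 unitary U to qubits i and j of an n-qubit state vector."""
    T = np.moveaxis(psi.reshape([2] * n), (i, j), (0, 1))
    shape = T.shape
    T = (U @ T.reshape(4, -1)).reshape(shape)
    return np.moveaxis(T, (0, 1), (i, j)).reshape(-1)

def sample_cg_state(n, s, rng):
    """One sample from mu^CG_s acting on |0^n>: s Haar gates on random pairs."""
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    for _ in range(s):
        i, j = rng.choice(n, size=2, replace=False)
        psi = apply_two_qubit(psi, unitary_group.rvs(4), i, j, n)
    return psi

psi = sample_cg_state(n=5, s=50, rng=np.random.default_rng(1))
print(np.linalg.norm(psi))   # 1.0, since the evolution is unitary
```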

2.2.1 Summary of the definitions.

See below for a summary of the definitions:

\(\Vert \cdot \Vert _\diamond \): superoperator diamond norm (Definition 14)
\(\Vert \cdot \Vert _p\): matrix p-norm for \(p \in [1, \infty ]\) (Definition 14)
\(\text {Haar}\): the Haar measure (Definition 15)
\(\text {Haar}(S)\): Haar measure on a subset S of the qudits (Definition 15)
\(\text {Haar}(i,j)\): Haar measure on qudits i and j (Definition 15)
\(C^{\otimes t,t}\): \(C^{\otimes t} \otimes C^{*\otimes t}\) (Definition 16)
\(G^{(t)}_\mu \): average of \(C^{\otimes t,t}\) over \(C\sim \mu \) (Definition 16)
\(G^{(t)}_\text {Haar}\): projector onto the vectors invariant under \(C^{\otimes t,t}\) (Definition 16)
\(G^{(t)}_{(i,j)}\): Haar projector of order t on qudits i and j (Definition 16)
\(\langle i,j| G^{(t)}_\mu |k,l\rangle \): moment of order t, \({\mathbb {E}}_{C\sim \mu } [C_{i_1, j_1} \ldots C_{i_t, j_t} C^*_{k_1, l_1} \ldots C^*_{k_t, l_t}]\) (Definition 16)
\(\text {Ch}[G^{(t)}_\mu ]\): moment superoperator, equal to \({\mathbb {E}}_{C \sim \mu } [\textrm{ad}_{C^{\otimes t}} ]\) (Definition 16)
\(r_{\alpha ,i}\): i-th row in the \(\alpha \) direction, with \(i\in [n^{1-1/D}]\) and \(\alpha \in [D]\) (Definition 17)
\(\text {Rows}(\alpha ,n)\): the collection of rows of a lattice (with n points) in the \(\alpha \) direction (Definition 17)
\(g_{\text {Rows}(\alpha ,n)}\): two-qudit gates applied to even then odd neighbors in each row in the \(\alpha \) direction (Definition 18)
\(g_{r_{\alpha ,i}}\): two-qudit gates applied to even then odd neighbors in the i-th row in the \(\alpha \) direction (Definition 18)
\(g_{R}\), \(g_C\): \(g_{\text {Rows}(1,n)}\) and \(g_{\text {Rows}(2,n)}\) when \(D=2\) (Definition 18)
\(G_{\text {Rows}(\alpha ,n)}\): Haar projector applied to each row in the \(\alpha \)-th direction (Definition 19)
\(G_R\) (\(G_C\)): Haar projector applied to each row (column) of a 2-D lattice (Definition 19)
\(G_{\text {Planes}(\alpha )}\): Haar projector applied to each plane perpendicular to the direction \(\alpha \) (Definition 20)
\(\text {Coll}({\mathcal {A}})\): collision probability of the superoperator \({\mathcal {A}}\) (Definition 21)
\(\text {Coll}_s\): the expected collision probability of a random circuit after s steps (Definition 21)
\(\mu ^{\text {lattice},n}_{D,c,s}\): the distribution over D-dimensional circuits with n qudits (Definition 23)
\({\tilde{G}}_{n,D,c}\): same as \(G^{(t)}_{\mu ^{\text {lattice},n}_{D,c,s}}\) except that each \(g^s_{\text {Rows}(\alpha ,n)}\) is replaced with \(G_{\text {Rows}(\alpha ,n)}\) (Definition 23)
\({{\hat{G}}}_{n,D,c}\): one block of \({\tilde{G}}_{n,D,c}\), defined as \(G_{\text {Rows}(D,n)} {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} G_{\text {Rows}(D,n)}\) (Definition 23)
\(\mu ^{\text {CG}}_{s}\): the distribution over complete-graph circuits with s random two-qudit gates (Definition 24)
\(\measuredangle (A,B)\): \(\cos ^{-1}\max _{x\in A, y \in B} \langle x,y\rangle \), the angle between two vector spaces A and B (Sect. 3.6.1)

2.3 Elementary tools

If A is a matrix and \(\sigma _i\) are the singular values of A, then for \(p \in [1,\infty )\) the Schatten p-norm of A is defined as \(\Vert A\Vert _p:= (\sum _i \sigma ^p_i)^{1/p}\). The \(\infty \)-norm of A is \(\Vert A\Vert _\infty := \max _i \sigma _i\). The 1-norm is related to the \(\infty \)-norm by \(\Vert A\Vert _1 \le {{\,\textrm{rank}\,}}(A) \cdot \Vert A\Vert _\infty \). Moreover, for \(p \in [1, \infty ]\) and any two matrices A and B, \(\Vert A \otimes B\Vert _p = \Vert A\Vert _p\cdot \Vert B\Vert _p\).

If \({\mathcal {A}}\) and \({\mathcal {B}}\) are superoperators, then \(\Vert {\mathcal {A}}\otimes {\mathcal {B}}\Vert _\diamond = \Vert {\mathcal {A}}\Vert _\diamond \cdot \Vert {\mathcal {B}}\Vert _\diamond \).

\(\text {Ch}\left[ \cdot \right] \) is the linear map from matrices to superoperators such that for any two equally sized matrices A and B, \(\text {Ch}\left[ A\otimes B^*\right] = A [\cdot ] B^\dagger \). Note that \(\text {Ch}\left[ \cdot \right] \) is multiplicative under composition, in the sense that \(\text {Ch}\left[ A \otimes B^*\right] \circ \text {Ch}\left[ C \otimes D^*\right] = \text {Ch}\left[ A C \otimes (BD)^*\right] \) for any equally sized matrices A, B, C, D.

Consider the Haar measure over \(\text{ U }(d)\). \(\text {Ch}\left[ G^{(t)}_\text {Haar}\right] \) (defined in the previous section) is the projector onto the span of the permutation operators (permuting length-t words over the alphabet [d]). In particular, for any matrix \(X \in {\mathbb {C}}^{d^t \times d^t}\) we can write

$$\begin{aligned} \text {Ch}\left[ G^{(t)}_\text {Haar}\right] [X] = \sum _{\pi \in S_t} {\textrm{Tr}}(V(\pi ) X) Wg(\pi ), \end{aligned}$$
(25)

where \(V(\pi )\) is the permutation matrix \(\sum _{(i_1, \ldots , i_t) \in [d]^t} |i_1, \ldots , i_t\rangle \langle i_{\pi (1)}, \ldots , i_{\pi (t)}|\), and \(Wg(\pi )\) is a linear combination of permutations. Specifically

$$\begin{aligned} Wg(\pi ) = \sum _{\sigma \in S_t} \alpha (\pi ^{-1} \sigma ) V(\sigma ). \end{aligned}$$
(26)

Here the coefficients \(\alpha (\cdot )\) are known as Weingarten functions (see [23]). If \(\mu ,\nu \in S_t\) then let \(\textsf{dist}(\mu ,\nu )\) denote the minimal number of transpositions needed to generate \(\mu ^{-1}\nu \) from the identity permutation. Then \(\alpha (\cdot )\) is defined by inverting the Gram matrix of the permutation operators, whose entries are \({\textrm{Tr}}(V(\mu )^\dagger V(\nu )) = d^{\,t-\textsf{dist}(\mu ,\nu )}\):

$$\begin{aligned} \sum _{\mu ,\nu \in S_t} \alpha (\mu ^{-1}\nu )|\mu \rangle \langle \nu | = \left( \sum _{\mu ,\nu \in S_t} d^{\,t-\textsf{dist}(\mu ,\nu )}|\mu \rangle \langle \nu |\right) ^{-1}. \end{aligned}$$
(27)

Note that \(\alpha (\pi )\) is always real and \(|\alpha (\lambda )| = O(1/d^{t+\textsf{dist}(\lambda )})\). Thus for large d, \(Wg(\pi ) \approx V(\pi )/d^t\).
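For t = 2 these quantities are fully explicit. The sketch below (a numerical illustration, assuming numpy) inverts the \(2\times 2\) Gram matrix \(d^{\,2-\textsf{dist}(\mu ,\nu )}\) for \(S_2 = \{e, (12)\}\) and recovers the classical values \(\alpha (e) = \frac{1}{d^2-1}\) and \(\alpha ((12)) = \frac{-1}{d(d^2-1)}\):

```python
import numpy as np

# Weingarten coefficients for t = 2: invert the Gram matrix of the two
# permutation operators on (C^d)^{(x)2}; its entries are d^(2 - dist(mu, nu)).
d = 5
gram = np.array([[d ** 2, d],
                 [d, d ** 2]], dtype=float)
wg = np.linalg.inv(gram)
print(wg[0, 0], "vs", 1 / (d ** 2 - 1))          # alpha(identity)
print(wg[0, 1], "vs", -1 / (d * (d ** 2 - 1)))   # alpha(transposition)
```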

Furthermore, if \(X_{MN}\) is a matrix on a bipartite system MN and the Haar unitary acts only on subsystem M, then

$$\begin{aligned} \text {Ch}\left[ G^{(t)}_\text {Haar}\right] [X] = \sum _{\pi \in S_t} \mathop {{\textrm{Tr}}}\limits _M \left( (V(\pi )_M \otimes I_N) X_{MN}\right) \otimes {{\,\textrm{Wg}\,}}(\pi )_M. \end{aligned}$$
(28)

Let A, B be matrices. For the superoperator \({\mathcal {D}}\equiv B {\textrm{Tr}}[A \cdot ]\) we use the notation \({\mathcal {D}}= B A^*\). We need the following observation:

$$\begin{aligned} V(\pi ) V^*(\sigma ) = \text {Ch}\left[ |\psi _{\pi }\rangle \langle \psi _\sigma |\right] , \end{aligned}$$
(29)

where \(|\psi _\pi \rangle = (I \otimes V(\pi )) \frac{1}{\sqrt{d^t}} \sum _{i \in [d]^t} |i\rangle |i\rangle \).

We need the following lemmas:

Lemma 25

If A is a (possibly rectangular) matrix, then \(A A^\dagger \) and \(A^\dagger A\) have the same nonzero spectra.

Lemma 26

If A and B are matrices and \(\Vert \cdot \Vert _*\) is a unitarily invariant norm, then \(\Vert A B\Vert _*\le \Vert A\Vert _*\Vert B\Vert _\infty \).

Proof

This lemma can be viewed as a consequence of Russo-Dye theorem, which states that the extreme points of the unit ball for \(\Vert \cdot \Vert _\infty \) are the unitary matrices. Thus we can write \(B = \Vert B\Vert _\infty \sum _i p_i U_i\) for \(\{p_i\}\) a probability distribution and \(\{U_i\}\) a set of unitary matrices. We use this fact along with the triangle inequality and then unitary invariance to obtain

$$\begin{aligned} \Vert A B\Vert _*= \left\| A \left( \Vert B\Vert _\infty \sum _i p_iU_i\right) \right\| _*\le \Vert B\Vert _\infty \sum _i p_i \Vert A U_i \Vert _*= \Vert B\Vert _\infty \sum _i p_i \Vert A \Vert _*= \Vert A\Vert _*\Vert B\Vert _\infty . \end{aligned}$$
(30)

\(\square \)

A similar argument applies to superoperators.

Lemma 27

If \({\mathcal {A}}\) is a superoperator and \({\mathcal {B}}\) is a trace-preserving completely positive (tpcp) superoperator, then \(\Vert {\mathcal {A}}{\mathcal {B}}\Vert _\diamond \le \Vert {\mathcal {A}}\Vert _\diamond \).

Proof

Let d be at least the input dimension of both \({\mathcal {A}}\) and \({\mathcal {B}}\). Then \(\Vert {\mathcal {A}}\Vert _\diamond = \max _{\Vert X\Vert _1 \le 1} \Vert ({\mathcal {A}}\otimes {{\,\textrm{id}\,}}_d)(X)\Vert _1\) and \(\Vert {\mathcal {A}}{\mathcal {B}}\Vert _\diamond = \max _{\Vert X\Vert _1 \le 1} \Vert ({\mathcal {A}}\otimes {{\,\textrm{id}\,}}_d)({\mathcal {B}}\otimes {{\,\textrm{id}\,}}_d)(X)\Vert _1\). Since \({\mathcal {B}}\) is a tpcp superoperator, \(\Vert ({\mathcal {B}}\otimes {{\,\textrm{id}\,}}_d)(X)\Vert _1 \le 1\), and so \(\Vert {\mathcal {A}}{\mathcal {B}}\Vert _\diamond \) maximizes over a set contained in the set maximized over by \(\Vert {\mathcal {A}}\Vert _\diamond \). \(\square \)

These give rise to the following well-known bound, often called “the hybrid argument.”

Lemma 28

Let \(\Vert \cdot \Vert _*\) be a unitarily invariant norm. If \(A_1, \ldots ,A_t\) and \(B_1, \ldots , B_t\) have \(\infty \)-norm at most 1, then

$$\begin{aligned} \Vert A_1\ldots A_t - B_1 \ldots B_t\Vert _*\le \sum _{i}\Vert A_i- B_i\Vert _*. \end{aligned}$$
(31)

This is also true for superoperators and the diamond norm, if each superoperator is a tpcp map.
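The proof is the standard telescoping argument: writing

$$\begin{aligned} A_1\ldots A_t - B_1 \ldots B_t = \sum _{i=1}^{t} A_1 \ldots A_{i-1} (A_i - B_i) B_{i+1} \ldots B_t, \end{aligned}$$

one applies the triangle inequality together with (a two-sided application of) Lemma 26, since every prefix \(A_1 \ldots A_{i-1}\) and suffix \(B_{i+1} \ldots B_t\) has \(\infty \)-norm at most 1.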

We will need a similar bound for tensor products.

Lemma 29

Suppose \(\Vert A - B\Vert _*\le \epsilon \) for some norm \(\Vert \cdot \Vert _*\) that is multiplicative under tensor product. Then for any integer \(M > 0\)

$$\begin{aligned} \left\| A^{\otimes M} - B^{\otimes M}\right\| _*\le (\Vert B\Vert _*+\epsilon )^M -\Vert B\Vert _*^M. \end{aligned}$$
(32)

The same holds for superoperators and the diamond norm. In particular, \(\Vert A^{\otimes M} - B^{\otimes M}\Vert _*\le 2M \Vert B\Vert _*^M \epsilon \) for \(\epsilon \le \frac{1}{2M}\) (assuming \(\Vert B\Vert _*\ge 1\)).
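Lemma 29 follows by expanding \(A^{\otimes M} = (B + (A-B))^{\otimes M}\) and grouping terms by the number k of tensor factors equal to \(A - B\); by the triangle inequality and multiplicativity under tensor product,

$$\begin{aligned} \left\| A^{\otimes M} - B^{\otimes M}\right\| _*\le \sum _{k=1}^{M} {M \atopwithdelims ()k} \epsilon ^k \Vert B\Vert _*^{M-k} = (\Vert B\Vert _*+\epsilon )^M -\Vert B\Vert _*^M. \end{aligned}$$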

We need the following definition and lemma:

Definition 30

Let X and Y be two real valued random variables on the same totally ordered sample space \(\Omega \). Then we say X is stochastically dominated by Y if for all \(x \in \Omega \), \(\Pr [X \ge x] \le \Pr [Y \ge x]\). We denote this by \(X \preceq Y\).

Lemma 31

(Coupling). \(X \preceq Y\) if and only if there exists a coupling (a joint probability distribution) between X and Y such that the marginals of this coupling are exactly X and Y and that with probability 1, \(X\le Y\).
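The 'if' direction can be realized by the standard shared-uniform (inverse-CDF) construction. A minimal numerical sketch for finite distributions (a hypothetical example, assuming numpy) follows; feeding one uniform sample through both inverse CDFs produces a coupling with \(X \le Y\) pointwise whenever \(X \preceq Y\):

```python
import numpy as np

def inverse_cdf(pmf, support, u):
    # smallest point of the support whose CDF reaches u
    cdf = np.cumsum(pmf)
    return support[min(np.searchsorted(cdf, u), len(support) - 1)]

support = np.array([0, 1, 2, 3])
pmf_x = np.array([0.4, 0.3, 0.2, 0.1])   # X puts more mass on small values
pmf_y = np.array([0.1, 0.2, 0.3, 0.4])   # Y stochastically dominates X

rng = np.random.default_rng(0)
pairs = [(inverse_cdf(pmf_x, support, u), inverse_cdf(pmf_y, support, u))
         for u in rng.uniform(size=10000)]
print(all(x <= y for x, y in pairs))     # True: the coupling satisfies X <= Y
```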

2.4 Various measures of convergence to the Haar measure

Definition 32

Let \(\mu \) be a distribution over n-qudit gates. Let \(\epsilon \) be a positive real number.

1.

    (Strong designs) \(\mu \) is a strong \(\epsilon \)-approximate t-design if

    $$\begin{aligned} (1-\epsilon ) \cdot \text {Ch}\left[ G^{(t)}_{\text {Haar}}\right] \preceq \text {Ch}\left[ G^{(t)}_{\mu }\right] \preceq (1+\epsilon ) \cdot \text {Ch}\left[ G^{(t)}_{\text {Haar}}\right] , \end{aligned}$$
    (33)

    or equivalently if

    $$\begin{aligned} (1-\epsilon ) \left( \text {Ch}\left[ G^{(t)}_{\text {Haar}}\right] \otimes \textrm{id} \right) \phi ^{\otimes t}_{d^n} \preceq \left( \text {Ch}\left[ G^{(t)}_{\mu }\right] \otimes \textrm{id} \right) \phi _{d^n}^{\otimes t} \preceq (1+\epsilon ) \left( \text {Ch}\left[ G^{(t)}_{\text {Haar}}\right] \otimes \textrm{id} \right) \phi _{d^n}^{\otimes t}. \end{aligned}$$
    (34)

    The \(\preceq \) in (33) denotes cp ordering, and the \(\preceq \) in (34) denotes the psd ordering on matrices.

2.

    (Monomial definition) \(\mu \) is a monomial-based \(\epsilon \)-approximate t-design if every balanced monomial of degree at most t deviates from its Haar value by at most \(\frac{\epsilon }{d^{nt}}\); equivalently,

    $$\begin{aligned} \left\| \textsf{vec}\left[ G^{(t)}_\mu \right] -\textsf{vec}\left[ G^{(t)}_\text {Haar}\right] \right\| _\infty \le \frac{\epsilon }{d^{nt}}. \end{aligned}$$
    (35)

    Here for a matrix A, \(\textsf{vec}(A)\) is a vector consisting of the entries of A (in the computational basis).

3.

    (Diamond definition) \(\mu \) is an \(\epsilon \)-approximate t-design in the diamond measure if

    $$\begin{aligned} \left\| \text {Ch}\left[ G^{(t)}_{\mu }\right] - \text {Ch}\left[ G^{(t)}_{\text {Haar}}\right] \right\| _\diamond \le \epsilon . \end{aligned}$$
    (36)
4.

    (Trace definition) \(\mu \) is an \(\epsilon \)-approximate t-design in the trace measure if

    $$\begin{aligned} \left\| G_{\mu }^{(t)} - G_{\text {Haar}}^{(t)}\right\| _1 \le \epsilon . \end{aligned}$$
    (37)
5.

    (TPE) \(\mu \) is a \((d,\epsilon ,t)\) t-copy tensor product expander (TPE) if

    $$\begin{aligned} \left\| G_{\mu }^{(t)} - G_{\text {Haar}}^{(t)}\right\| _\infty \le \epsilon . \end{aligned}$$
    (38)
6.

    (Anti-concentration) \(\mu \) is an \(\epsilon \) approximate anti-concentration design if

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C\sim \mu } |\langle 0|C|0\rangle |^4 \le \mathop {{\mathbb {E}}}\limits _{C\sim \text {Haar}} |\langle 0|C|0\rangle |^4 \cdot (1+\epsilon ). \end{aligned}$$
    (39)
7.

    (Approximate scramblers) \(\mu \) is an \(\epsilon \)-approximate scrambler if for any density matrix \(\rho \) and subset S of qubits with \(|S| \le n/3 \)

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu } \left\| \rho _S(C) - \frac{I}{2^{|S|}} \right\| ^2_1 \le \epsilon . \end{aligned}$$
    (40)

    where \(\rho _S(C) = {\textrm{Tr}}_{\backslash S}(C \rho C^\dagger )\) and \({\textrm{Tr}}_{\backslash S}\) is the trace over the subset of qubits complementary to S. (A numerical illustration appears after this definition.)

8.

    (Weak approximate decouplers) Let \(M, M', A, A'\) be systems composed of \(m, m, n-m\), and \(n-m\) qubits respectively, and let \(\phi _{MM'}\), \(\phi _{AA'}\), and \(\psi _{A'}\) be respectively the maximally entangled state on \(MM'\), the maximally entangled state on \(AA'\), and a pure state on \(A'\). \(\mu \) is an \((m,\alpha ,\epsilon )\)-approximate weak decoupler if for any subsystem S of \(M'A'\) with size \(\le \alpha \cdot n\), when a circuit \(C \sim \mu \) is applied to \(M'A'\),

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu }\left\| \rho _{MS}(C) - \frac{I}{2^{m}} \otimes \frac{I}{2^{|S|}}\right\| _1\le \epsilon . \end{aligned}$$
    (41)

    We consider two definitions: in the first the initial state is \(\phi _{MM'} \otimes \phi _{AA'}\), and in the second it is \(\phi _{MM'} \otimes \psi _{A'}\). Here \(\rho _{MS}(C)\) is the reduced density matrix on MS after the application of \(C \sim \mu \).
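As a numerical illustration of item 7 (approximate scramblers), the following sketch (same numpy/scipy assumptions as above, with initial state \(\rho = |0^n\rangle \langle 0^n|\) and fully Haar-random C) shows that the reduced state on a small subset S is close to maximally mixed:

```python
import numpy as np
from scipy.stats import unitary_group

# Estimate E_C || rho_S(C) - I/2^|S| ||_1^2 for Haar-random C, rho = |0^n><0^n|.
n, k = 6, 2                    # n qubits, S = the first k qubits
N, K = 2 ** n, 2 ** k
samples = 100
acc = 0.0
for _ in range(samples):
    psi = unitary_group.rvs(N)[:, 0].reshape(K, N // K)
    rho_S = psi @ psi.conj().T                  # traces out the last n - k qubits
    eigs = np.linalg.eigvalsh(rho_S - np.eye(K) / K)
    acc += np.sum(np.abs(eigs)) ** 2            # squared trace distance
print("estimate:", acc / samples)               # small when k << n
```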

3 Approximate t-Designs by Random Circuits with Nearest-Neighbor Gates on D-Dimensional Lattices

In this section we prove Theorems 8 and 9, which state that our random circuit models defined for D-dimensional lattices (Definition 5) form approximate t-designs in several measures.

We begin in Sect. 3.1 by outlining some basic utility lemmas. The technical core of the proof is contained in the lemmas in Sect. 3.2 in which we bound various norms of products of Haar projectors onto overlapping sets of qubits. These are proved in Sects. 3.5 and 3.6 respectively. We show how to use these lemmas to prove our main theorems in Sect. 3.3 (for a 2-D grid) and in Sect. 3.4 (for a lattice in \(D>2\) dimensions).

3.1 Basic lemmas

In this section we state some utility lemmas which are largely independent of the details of our circuit models.

3.1.1 Comparison lemma for random quantum circuits.

Definition 33

A superoperator \({\mathcal {C}}\) is completely positive (cp) if for any psd matrix X, \(({\mathcal {C}}\otimes \textrm{id})(X)\) is also psd. For superoperators \({\mathcal {A}}\) and \({\mathcal {B}}\), \({\mathcal {A}}\preceq {\mathcal {B}}\) if \({\mathcal {B}}-{\mathcal {A}}\) is cp.

Our comparison lemma is simply the following:

Lemma 34

(Comparison). Suppose we have the following cp ordering between superoperators \({\mathcal {A}}_1 \preceq {\mathcal {B}}_1, \ldots ,{\mathcal {A}}_t \preceq {\mathcal {B}}_t\). Then \({\mathcal {A}}_t \ldots {\mathcal {A}}_1 \preceq {\mathcal {B}}_t \ldots {\mathcal {B}}_1\).

Corollary 35

(Overlapping designs). If \(K_1, \ldots , K_t\) are respectively the moment superoperators of \(\epsilon _1, \ldots , \epsilon _t\)-approximate strong t-designs, each on a potentially different subset \(S_1, \ldots , S_t\) of qudits, then

$$\begin{aligned} (1-\epsilon _1)\cdots (1-\epsilon _t)\, \text {Ch}\left[ G^{(t)}_{\text {Haar}(S_1)} \cdots G^{(t)}_{\text {Haar}(S_t)}\right] \preceq K_1 \ldots K_t \preceq (1+\epsilon _1)\cdots (1+\epsilon _t)\, \text {Ch}\left[ G^{(t)}_{\text {Haar}(S_1)} \cdots G^{(t)}_{\text {Haar}(S_t)}\right] . \end{aligned}$$
(42)

3.1.2 Bound on the value of off-diagonal monomials.

We first formally define an off-diagonal monomial.

Definition 36

(Off-diagonal monomials). A diagonal monomial of balanced degree t of a unitary matrix C is a balanced monomial that can be written as a product of absolute squares of entries, i.e., \(|C_{a_1,b_1}|^2 \ldots |C_{a_t,b_t}|^2\). A monomial is off-diagonal if it is balanced and not diagonal.

We now define the set of diagonal indices as \({\mathcal {D}}= \{|i,j\rangle \langle i',j'|: i=i',j=j', i,i',j,j' \in [d]^{nt}\}\) and the set of off-diagonal indices as \({\mathcal {O}}= \{|i,j\rangle \langle i',j'|: i\ne i' \text { or } j\ne j', i,i',j,j' \in [d]^{nt}\}\). We note that a diagonal monomial can be written as \({\textrm{Tr}}(C^{\otimes t,t} x)\) for some \(x \in {\mathcal {D}}\) and similarly, an off-diagonal monomial can be written as \({\textrm{Tr}}(C^{\otimes t,t} x)\) for some \(x \in {\mathcal {O}}\).

We relate the strong definition of designs to the monomial definition via the following lemma.

Lemma 37

Let \(\delta > 0\). Assume that \(\text {Ch}\left[ G_\mu ^{(t)}\right] \) and \(\text {Ch}\left[ G_\nu ^{(t)}\right] \) are two moment superoperators that satisfy the following completely positive ordering

$$\begin{aligned} (1 - \delta ) \cdot \text {Ch}\left[ G_\nu ^{(t)}\right] \preceq \text {Ch}\left[ G_\mu ^{(t)}\right] \preceq (1 + \delta ) \cdot \text {Ch}\left[ G_\nu ^{(t)}\right] . \end{aligned}$$
(43)

Let \({\mathcal {O}}\) and \({\mathcal {D}}\) be respectively the set of off-diagonal and diagonal indices for monomials. Then

$$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( x G^{(t)}_\mu \right) | \le \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( x G^{(t)}_\nu \right) | (1+\delta ) + 2\delta \cdot \max _{y \in {\mathcal {D}}} |{\textrm{Tr}}\left( y G^{(t)}_\nu \right) |. \end{aligned}$$
(44)

3.1.3 Bound on the moments of the Haar measure.

We need the following bound on the t-th monomial moment of the Haar measure. Assume we have m qudits.

Lemma 38

(Moments of the Haar measure). Let \(G_{\text {Haar}(m)}^{(t)}\) be the quasi-projector operator for the Haar measure on m qudits. Then

$$\begin{aligned} \max _y \left\| G_{\text {Haar}(m)}^{(t)} y G_{\text {Haar}(m)}^{(t)}\right\| _1 \le \frac{t^{O(t)}}{d^{mt}}. \end{aligned}$$
(45)

Here the maximization is taken over matrix elements in the computational basis like

\(y= |i_1,\ldots , i_t, i'_1,\ldots , i'_t\rangle \langle j_1,\ldots , j_t, j'_1,\ldots , j'_t|\). Each label (e.g. \(i_j\)) is in \([d]^m\).

3.2 Gap bounds for the product of overlapping Haar projectors

We will later need the following results, with proofs deferred until Sect. 3.6.

Lemma 39

\(\Vert G_C G_R -G_\text {Haar}^{(t)}\Vert _\infty \le 1/d^{\Omega (\sqrt{ n})}\).

Lemma 40

Let \(D = O(\ln n / \ln \ln n)\) with small enough constant factor, then \(\Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} - G_{\text {Haar}} \Vert _\infty \le 1/d^{\Omega (n^{1-1/D})}\).

Lemma 41

Let \(|x\rangle \) and \(|y\rangle \) be two computational basis states. For small enough \(D = O(\ln n / \ln \ln n)\) and large enough c, \(|\langle x | {\tilde{G}}_{n,D,c} -G_\text {Haar}| y\rangle | \le \frac{\epsilon }{d^{nt}}\) for some \(\epsilon = 1/d^{\Omega (n^{1/D})}\).

Lemma 42

For large enough c, \(\left\| \text {Ch}\left[ ( G_R G_C G_R )^{c} -G_\text {Haar}^{(t)}\right] \right\| _\diamond = \frac{t^{O(\sqrt{n} t)}}{d^{\Omega (c \sqrt{n})}}\).

Lemma 43

For small enough \(D = O(\ln n / \ln \ln n)\) and large enough c,

$$\begin{aligned} \left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)})^{c} - G_\text {Haar}^{(t)}\right] \right\| _\diamond = \frac{t^{O(t n^{1-1/D})}}{d^{\Omega (c n^{1-1/D})}}. \end{aligned}$$
(46)

In these last two lemmas, we see that c will need to grow with t. We believe that a sharper analysis could reduce this dependence, but since we already have a \({{\,\textrm{poly}\,}}(t)\) dependence in s, improving Lemmas 42 and 43 would not make a big difference. In fact, even in 1-D, [13] found a sharp n dependence but their factor of \({{\,\textrm{poly}\,}}(t)\) (which we inherit) is probably not optimal.

3.3 Proof of Theorem 8; t-designs on two-dimensional lattices

Theorem

(Restatement of Theorem 8). Let \(s, c,n > 0\) be positive integers with \(\mu ^{\text {lattice},n}_{2,c,s}\) defined as in Definition 5.

1.

    \(s = {{\,\textrm{poly}\,}}(t)\left( \sqrt{n} + \ln \frac{1}{\delta } \right) ,\ c= O\left( t \ln t+ \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| \textsf{vec}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\infty \le \frac{\delta }{d^{nt}}\).

2.

    \(s = {{\,\textrm{poly}\,}}(t) \left( \sqrt{n} + \ln \frac{1}{\delta }\right) ,\ c = O\left( t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\diamond \le \delta \).

3.

    \(s = {{\,\textrm{poly}\,}}(t) \left( \sqrt{n} + \ln \frac{1}{\delta } \right) , c = O\left( t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}}\right) \implies \left\| G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1 \le \delta \).

4.

    \(\left\| G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \le c \cdot \sqrt{n} \cdot e^{-s/{{\,\textrm{poly}\,}}(t)} + \frac{1}{d^{\Omega ( c \sqrt{n} )}}\).

Proof

1.

    This item corresponds to convergence of the individual moments of the Haar measure. A balanced moment of a distribution \(\mu \) can be written as

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu } [C_{i_1, j_1} \ldots C_{i_t,j_t} C^{*}_{i'_1, j'_1} \ldots C^{*}_{i'_t,j'_t} ] =\langle i, i'| G_\mu ^{(t)} | j,j'\rangle = {\textrm{Tr}}[G_\mu ^{(t)} \cdot |j,j'\rangle \langle i,i'|]\nonumber \\ \end{aligned}$$
    (47)

    where \(|i\rangle := |i_1,\ldots ,i_t\rangle \) and so on for \(|i'\rangle ,|j\rangle ,|j'\rangle \). The same moment can also be written as

    $$\begin{aligned} {\textrm{Tr}}\left( |j\rangle \langle j'| \text {Ch}\left[ G_\mu ^{(t)}\right] (|i\rangle \langle i'|)\right) \end{aligned}$$
    (48)

    We will see that the strong design condition gives us strong bounds, first for the “diagonal” case (\(i=i',j=j'\)) and then for the off-diagonal case. This is because when we interpret \(G_\mu ^{(t)}\) as a quantum operation, the diagonal monomials correspond to \({\textrm{Tr}}(Y G_\mu ^{(t)} X)\) for psd matrices X, Y, and so the strong design condition applies directly. For the off-diagonal moments we need to do a bit more work. For both the diagonal and off-diagonal monomials, our strategy will be to first compare with the entries of \(G_R (G_C G_R)^c G_R\) and then to compare with \(G_{\text {Haar}}\).

    First observe that \(\text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] = \text {Ch}\left[ g^s_R (g^s_C g^s_R)^c\right] \) and \(s = {{\,\textrm{poly}\,}}(t) \cdot (\sqrt{n} + \ln (1/\delta ))\), so corollary 6 of [13] implies that each \(g^s_i\) for \(i \in \{R,C\}\) is a \(\frac{\delta }{4t!}\)-approximate t-design. Hence, using corollary 35,

    $$\begin{aligned} \left( 1 - \tfrac{\delta }{4t!}\right) \text {Ch}\left[ G_R (G_C G_R)^c G_R\right] \preceq \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] \preceq \left( 1 + \tfrac{\delta }{4t!}\right) \text {Ch}\left[ G_R (G_C G_R)^c G_R\right] . \end{aligned}$$
    (49)

    Note that we chose \({{\,\textrm{poly}\,}}(t)\) large enough so that the error is as small as \(\frac{\delta }{4t!}\). This choice will be helpful later.

    Focusing first on diagonal monomials \(\left| i\right\rangle \left\langle i\right| ,\left| j\right\rangle \left\langle j\right| \) we can bound

    $$\begin{aligned}&\left( 1+ \tfrac{\delta }{4t!}\right) {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \text {Ch}\left[ G_R (G_C G_R)^c G_R\right] (\left| i\right\rangle \left\langle i\right| )\right) - {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] (\left| i\right\rangle \left\langle i\right| )\right) \\&\quad = {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \left[ \left( 1+\tfrac{\delta }{4t!}\right) \text {Ch}\left[ G_R (G_C G_R)^c G_R\right] - \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] \right] (\left| i\right\rangle \left\langle i\right| )\right) \ge 0. \end{aligned}$$
    (50)

    In other words, for diagonal monomials

    $$\begin{aligned} {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] (\left| i\right\rangle \left\langle i\right| )\right)&\le \left( 1+ \tfrac{\delta }{4t!}\right) {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \text {Ch}\left[ G_R \left( G_C G_R\right) ^c G_R\right] (\left| i\right\rangle \left\langle i\right| )\right) \\&= \left( 1+ \tfrac{\delta }{4t!}\right) {\textrm{Tr}}\left( G_R ( G_C G_R)^c G_R \left| i,j\right\rangle \left\langle i,j\right| \right) . \end{aligned}$$
    (51)

    Similarly, using the first inequality in (49)

    $$\begin{aligned} {\textrm{Tr}}\left( \left| j\right\rangle \left\langle j\right| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] (\left| i\right\rangle \left\langle i\right| )\right) \ge \left( 1- \tfrac{\delta }{4t!}\right) {\textrm{Tr}}\left( G_R ( G_C G_R)^c G_R \left| i,j\right\rangle \left\langle i,j\right| \right) . \end{aligned}$$
    (52)

    The next step is to bound \({\textrm{Tr}}\left( G_R (G_C G_R)^c G_R\, x\right) \):

    $$\begin{aligned} \left| {\textrm{Tr}}\left( G_R (G_C G_R)^c G_R x\right) - {\textrm{Tr}}\left( G_\text {Haar}^{(t)} x\right) \right|&= \left| {\textrm{Tr}}\left( (G_R (G_C G_R)^c G_R - G_\text {Haar}^{(t)} ) x \right) \right| \\&= \left| {\textrm{Tr}}\left( ((G_C G_R)^c - G_\text {Haar}^{(t)} ) G_R x G_R \right) \right| \\&\le \left\| (G_C G_R)^c - G_\text {Haar}^{(t)} \right\| _\infty \cdot \left\| G_R x G_R \right\| _1 \\&\le \left\| G_C G_R- G_\text {Haar}^{( t)} \right\| ^c_\infty \cdot \left( \max _{y \in [d]^{2 \sqrt{n} t}} \Vert G_{r_{1,1}} y G_{r_{1,1}}\Vert _1 \right) ^{\sqrt{n}}. \end{aligned}$$
    (53)

    In the third line we have used Hölder's inequality. In the last inequality we have used the fact that \(G_R\) is a tensor product of the \(G_{r_{1,i}}\) over the \(\sqrt{n}\) rows in the first direction; by symmetry we can just consider \(G_{r_{1,1}}\).

    Using Lemma 38

    $$\begin{aligned} \max _{y\in [d]^{2 \sqrt{n} t} } \left\| G_{r_{1,1}} y G_{r_{1,1}}\right\| _1 \le \frac{t^{O(t)}}{d^{t \sqrt{n}}}. \end{aligned}$$
    (54)

    Furthermore, using Lemma 39

    $$\begin{aligned} \left\| G_C G_{R}- G_\text {Haar}^{( t)} \right\| _\infty \le \frac{1}{d^{\Omega (\sqrt{n})}}. \end{aligned}$$
    (55)

    Therefore,

    $$\begin{aligned} \left\| G_C G_R- G_\text {Haar}^{( t)} \right\| ^c_\infty \cdot \left( \max _{y \in [d]^{2 \sqrt{n} t}} \Vert G_{r_{1,1}} y G_{r_{1,1}}\Vert _1 \right) ^{\sqrt{n}} \le \frac{1}{d^{\Omega (c \cdot \sqrt{n})}} \cdot \Big (\frac{t^{O(t)}}{d^{t \sqrt{n}}} \Big )^{\sqrt{n}}. \end{aligned}$$
    (56)

    As a result, for some large enough \(c = O(t \ln t + \frac{\ln 1/\delta }{\sqrt{n}})\) we conclude

    $$\begin{aligned} \left| {\textrm{Tr}}\left( G_R (G_C G_R)^c G_R x\right) - {\textrm{Tr}}\left( G_\text {Haar}^{(t)} x\right) \right| \le \Vert G_C G_R- G_\text {Haar}^{(t)} \Vert ^c_\infty \cdot \left( \max _{y\in [d]^{2 \sqrt{n} t}} \Vert G_{r_{1,1}} y G_{r_{1,1}}\Vert _1 \right) ^{\sqrt{n}} \le \frac{\delta }{4 d^{nt}}. \end{aligned}$$
    (57)

    As a result, using (51), (52), (57), and Lemma 38, any diagonal monomial \(x = \left| i,j\right\rangle \left\langle i,j\right| \) satisfies

    $$\begin{aligned} \left| {\textrm{Tr}}\left( G^{(t)}_{\mu ^{\text {lattice},n}_{2,c,s}} x\right) - {\textrm{Tr}}\left( G^{(t)}_{\text {Haar}} x\right) \right|&\le \left| {\textrm{Tr}}\left( G_R ( G_C G_R)^c G_R x\right) -{\textrm{Tr}}\left( G^{(t)}_{\text {Haar}} x\right) \right| + \frac{\delta }{4t!} \left| {\textrm{Tr}}\left( G_R ( G_C G_R)^c G_R x\right) \right| \\&\le \frac{\delta }{4d^{nt}}+ \frac{\delta }{4t!} \left( {\textrm{Tr}}\left( G^{(t)}_{\text {Haar}} x\right) + \frac{\delta }{4d^{nt}}\right) \\&\le \frac{\delta }{4d^{nt}}+ \frac{\delta }{4t!} \left( \frac{t!}{d^{nt}}+ \frac{\delta }{4d^{nt}}\right) \\&\le \frac{\delta }{d^{nt}}. \end{aligned}$$
    (58)

    Next, we bound the expected off-diagonal monomials of the distribution. The value of the off-diagonal monomials according to the Haar measure is zero. So it is enough to bound \(\max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\mu x \right) |\), where \({\mathcal {O}}\) is the set of off-diagonal indices for moments. In order to do this we use Lemma 37 for \(\mu = \mu ^{\text {lattice},n}_{2,c,s}\) and \(\nu \) a distribution with moment superoperator \(\text {Ch}\left[ G_R (G_C G_R)^c G_R\right] \).

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}(G^{(t)}_{\mu ^{\text {lattice},n}_{2,c,s}}\, x)| \le \left( 1+\frac{\delta }{4t!}\right) \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}(G_R (G_C G_R)^c G_R\, x)| + \frac{\delta }{2t!} \cdot \max _{y \in {\mathcal {D}}} {\textrm{Tr}}(G_R (G_C G_R)^c G_R\, y). \end{aligned}$$
    (59)

    Here \({\mathcal {D}}\) is the set of diagonal indices. Using (57),

    $$\begin{aligned} \max _{y \in {\mathcal {D}}} {\textrm{Tr}}(G_R (G_C G_R)^c G_R y) \le \max _{y \in {\mathcal {D}}} {\textrm{Tr}}\left( G_\text {Haar}^{(t)} y\right) + \frac{\delta }{4 d^{nt}} \le \frac{t!}{d^{nt}} + \frac{\delta }{4 d^{nt}}. \end{aligned}$$
    (60)

    In order to bound \(\max _{x \in {\mathcal {O}}} {\textrm{Tr}}(G_R (G_C G_R)^c G_R x)\), we first make the observation that since \(x \in {\mathcal {O}}\), \({\textrm{Tr}}(G_\text {Haar}^{(t)} x) =0\). Therefore

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}(G_R (G_C G_R)^c G_R x)|&= \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}((G_R (G_C G_R)^c G_R - G_\text {Haar}^{(t)})x)| \\&= \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}(((G_C G_R)^c- G_\text {Haar}^{(t)}) G_R x G_R)| \\&\le \frac{\delta }{4 d^{nt}}. \end{aligned}$$
    (61)

    Therefore, using (57), (59), (60) and (61) we conclude

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}(G^{(t)}_{\mu ^{\text {lattice},n}_{2,c,s}}\, x )| \le \frac{\delta }{4 d^{nt}} \left( 1+\frac{\delta }{4t!}\right) + \frac{\delta }{2t!} \Big (\frac{t!}{d^{nt}} + \frac{\delta }{4 d^{nt}}\Big ) \le \frac{\delta }{d^{nt}}. \end{aligned}$$
    (62)
2.
    $$\begin{aligned} \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\diamond&\le \left\| \text {Ch}\left[ g^s_R (g^s_C g^s_R)^c - (G_R G_C G_R)^c \right] \right\| _\diamond + \left\| \text {Ch}\left[ (G_R G_C G_R)^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\le 4c \cdot \left\| \text {Ch}\left[ (g^s_{r_{1,1}})^{\otimes \sqrt{n}} - G_{r_{1,1}}^{\otimes \sqrt{n}}\right] \right\| _\diamond + \Big (\frac{t^t}{d^{c}}\Big )^{O(\sqrt{n})} \\&\le 4c \cdot \sqrt{n} \cdot \left\| \text {Ch}\left[ g^s_{r_{1,1}} - G_{r_{1,1}}\right] \right\| _\diamond + \Big (\frac{t^t}{d^{c}}\Big )^{O(\sqrt{n})} \\&\le \delta /2 + \delta /2 = \delta . \end{aligned}$$
    (63)

    In the first line we have used the triangle inequality and the fact that \(\text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)}\right] = \text {Ch}\left[ g^s_R (g^s_C g^s_R)^c\right] \). In the second line, for the first term we have used Lemma 28 and that all operators are compositions of moment superoperators; for the second term we have used Lemma 42. In the third inequality we have used Lemma 29. In the fourth inequality, the first \(\delta /2\) comes from lemma 3 and corollary 6 of [13] for \(s= {{\,\textrm{poly}\,}}(t) \cdot (\sqrt{n} + \ln \frac{1}{\delta } )\), and the second \(\delta /2\) follows from the choice \(c = O(t \ln t + \frac{\ln (1/\delta )}{\sqrt{n}})\).

3.

    Let \(Q_0:= G_{r_{1,1}}\) and \(Q_1:= G_{r_{1,1}} - g^s_{r_{1,1}}\), and for \(x \in \{0,1\}^{\sqrt{n}}\) let \(Q_x = Q_{x_1} \otimes \cdots \otimes Q_{x_{\sqrt{n}}}\). Here \(\Vert Q_0\Vert _1 = t!\) and \(\Vert Q_x\Vert _1 = t!^{\sqrt{n} - |x|} \cdot \Vert G_{r_{1,1}} - g^s_{r_{1,1}}\Vert _1^{|x|}\), where \(|x|\) is the Hamming weight of x.

    $$\begin{aligned} \Vert G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \Vert _1&\le \Vert (g^s_C g^s_R)^c - (G_C G_R)^c \Vert _1 + \Vert (G_C G_R)^c - G_\text {Haar}^{(t)} \Vert _1 \\&\le 4 c \cdot \Vert (g^s_{r_{1,1}})^{\otimes \sqrt{n}} - G_{r_{1,1}}^{\otimes \sqrt{n}} \Vert _1 + t^{O(t) \sqrt{n}} \Vert G_C G_R - G_\text {Haar}^{(t)} \Vert ^c_\infty \end{aligned}$$
    (64)

    We bound the two terms separately. First

    $$\begin{aligned} 4 c\, \Vert (g^s_{r_{1,1}})^{\otimes \sqrt{n}} - G_{r_{1,1}}^{\otimes \sqrt{n}} \Vert _1&\le 4 c \sum _{x \in \{0,1\}^{\sqrt{n}} : x \ne 0} \Vert Q_x\Vert _1 \\&\le 4 c \left[ (t! + \Vert g^s_{r_{1,1}} - G_{r_{1,1}}\Vert _1) ^{\sqrt{n}} - t!^{\sqrt{n}}\right] \\&= 4c\, t!^{\sqrt{n}} \left( (1 + \Vert g^s_{r_{1,1}} - G_{r_{1,1}}\Vert _1 / t!)^{\sqrt{n}}-1\right) \\&\le 8c \sqrt{n}\, t!^{\sqrt{n}}\, \Vert g^s_{r_{1,1}} - G_{r_{1,1}}\Vert _1 \end{aligned}$$

    The last line needs s to be large enough that \(\sqrt{n}\, \Vert g^s_{r_{1,1}} - G_{r_{1,1}}\Vert _1 \le 1/2\). Converting to the \(\infty \)-norm via \(\Vert X\Vert _1 \le {{\,\textrm{rank}\,}}(X) \Vert X\Vert _\infty \le d^{2t\sqrt{n}} \Vert X\Vert _\infty \) on a row, we continue:

    $$\begin{aligned}&\le 8c\sqrt{n}\, (d^{2t}\, t!)^{\sqrt{n}}\cdot \Vert g^s_{r_{1,1}} - G_{r_{1,1}} \Vert _\infty \\&\le 8c\sqrt{n}\, (d^{2t}\, t!)^{\sqrt{n}} (1-1/{{\,\textrm{poly}\,}}(t))^s \\&\le \delta /2. \end{aligned}$$
    (65)

    Now we bound the second term of (64).

    $$\begin{aligned} t^{O(t) \sqrt{n}} \Vert G_C G_R - G_\text {Haar}^{(t)} \Vert ^c_\infty&\le t^{O(t) \sqrt{n}} (d^{-\Omega (\sqrt{n})})^c \quad \text {using Lemma}~39 \end{aligned}$$
    (66)
    $$\begin{aligned}&\le t^{C_1t\sqrt{n}} d^{-cC_2\sqrt{n}} \quad \text {for some universal constants}\, C_1,C_2>0 \end{aligned}$$
    (67)
    $$\begin{aligned}&= (t^{C_1t}/d^{cC_2})^{\sqrt{n}} \nonumber \\&\le \delta /2. \end{aligned}$$
    (68)

    In the last step we need to choose the implicit constant in the definition of c based on \(C_1,C_2\).

4.
    $$\begin{aligned} \Vert G_{\mu ^{\text {lattice},n}_{2,c,s}}^{(t)} - G_\text {Haar}^{(t)} \Vert _\infty&\le \Vert (g^s_C g^s_R )^c - (G_C G_R)^c \Vert _\infty + \Vert (G_C G_R)^c - G_\text {Haar}^{(t)} \Vert _\infty \\&\le 4 c \cdot \Vert (g^s_{r_{1,1}})^{\otimes \sqrt{n}} - G_{r_{1,1}}^{\otimes \sqrt{n}} \Vert _\infty + \Vert G_C G_R - G_\text {Haar}^{(t)} \Vert ^c_\infty \\&\le 4 c \cdot \sqrt{n} \cdot \Vert g^s_{r_{1,1}} - G_{r_{1,1}} \Vert _\infty + \frac{1}{d^{\Omega ( c \sqrt{n} )}} \\&\le 4 c \cdot \sqrt{n} \cdot e^{-s/{{\,\textrm{poly}\,}}(t)} + \frac{1}{d^{\Omega ( c \sqrt{n} )}}. \end{aligned}$$
    (69)

    These steps follow from the proof of part 1. \(\square \)

3.4 Proof of Theorem 9; t-designs on D-dimensional lattices

Throughout this section we treat D and t as constants.

Theorem

(Restatement of Theorem 9). There exists a value \(\delta = 1/d^{\Omega (n^{1/D})}\) such that for some large enough c depending on D and t:

1.

    \(s > c \cdot n^{1/D} \implies \left\| \textsf{vec}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\infty \le \frac{\delta }{d^{nt}}\).

2.

    \(s > c \cdot n^{1/D} \implies \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)}-G_\text {Haar}^{(t)}\right] \right\| _\diamond \le \delta \).

3.

    \(s > c \cdot n^{1/D} \implies \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \le \delta \).

4.

    \(s > c \cdot n^{1/D} \implies \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1 \le \delta \).

Proof

1.

    Consider the moment superoperator \(\text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}\right] \) for the D-dimensional random circuit distribution, where for \(3\le \alpha \le D\) the moment superoperator \(\kappa _{\alpha }\) of the \(\alpha \)-dimensional model is defined according to the recursive formula \(\kappa _{\alpha } = \kappa ^{\otimes n^{1/\alpha }}_{\alpha -1} \left( \text {Ch}[g_{\text {Rows}(\alpha ,n)}]^s\, \kappa ^{\otimes n^{1/\alpha }}_{\alpha -1}\right) ^c \).

    Using corollary 6 of [13], if \(s = \Omega (n^{1/D})\) with a large enough constant, then each \(g^s_{\text {Rows}(\alpha ,n)}\) for \(1\le \alpha \le D\) satisfies a \(1/d^{\Omega (n^{1/D})}\)-approximate t-design property. Hence, using corollary 35,

    $$\begin{aligned} \left( 1 - 1/d^{\Omega (n^{1/D})}\right) \text {Ch}[{\tilde{G}}_{n,D,c}] \preceq \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)}\right] \preceq \left( 1 + 1/d^{\Omega (n^{1/D})}\right) \text {Ch}\left[ {\tilde{G}}_{n,D,c}\right] . \end{aligned}$$
    (70)

    Therefore,

    $$\begin{aligned} (1 - 1/d^{\Omega (n^{1/D})})\, {\textrm{Tr}}({\tilde{G}}_{n,D,c}\, x) \le {\textrm{Tr}}\left( G^{(t)}_{\mu ^{\text {lattice},n}_{D,c,s}}\, x\right) \le (1+ 1/d^{\Omega (n^{1/D})})\, {\textrm{Tr}}({\tilde{G}}_{n,D,c}\, x), \end{aligned}$$
    (71)

    where x is a matrix of the form \(|i, j\rangle \langle i',j'|\) for \(i, j, i',j' \in [d]^{nt}\).

    Next, we use Lemma 41. This lemma along with the bound in (71) and Lemma 38 proves the stated bound for diagonal monomials:

    $$\begin{aligned} |{\textrm{Tr}}( G^{(t)}_{\mu ^{\text {lattice},n}_{D,c,s}} x) -{\textrm{Tr}}(G^{(t)}_{\text {Haar}} x)|&\le |{\textrm{Tr}}({\tilde{G}}_{n,D,c} x) - {\textrm{Tr}}(G^{(t)}_{\text {Haar}} x)| + \frac{|{\textrm{Tr}}({\tilde{G}}_{n,D,c} x)|}{d^{\Omega (n^{1/D})}} \\&\le \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}} + \left( |{\textrm{Tr}}(G^{(t)}_{\text {Haar}} x)| + \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}}\right) \frac{1}{d^{\Omega (n^{1/D})}} \\&\le \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}} + \left( \frac{t!}{d^{nt}} + \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}}\right) \frac{1}{d^{\Omega (n^{1/D})}} \\&\le \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}}. \end{aligned}$$
    (72)

    Next, we bound the off-diagonal monomials \(\max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\mu x \right) |\). Again, we use Lemma 37 for \(\mu = \mu ^{\text {lattice},n}_{D,c,s}\) and \(\nu \) a distribution with moment superoperator \(\text {Ch}[{\tilde{G}}_{n,D,c}]\) (equivalently, with quasi-projector \({\tilde{G}}_{n,D,c}\)):

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}( G^{(t)}_{\mu ^{\text {lattice},n}_{D,c,s}} x) | \le (1+1/d^{\Omega (n^{1/D})}) \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}({\tilde{G}}_{n,D,c} x)| + \frac{1}{d^{\Omega (n^{1/D})}} \max _{y \in {\mathcal {D}}} {\textrm{Tr}}({\tilde{G}}_{n,D,c} y). \end{aligned}$$
    (73)

    Using Lemma 41

    $$\begin{aligned} \max _{y \in {\mathcal {D}}} {\textrm{Tr}}({\tilde{G}}_{n,D,c} y) \le \max _{y \in {\mathcal {D}}} {\textrm{Tr}}\left( G_\text {Haar}^{(t)} y\right) + \frac{1/d^{\Omega (n^{1/D})}}{ d^{nt}} \le \frac{t!}{d^{nt}} + \frac{1/d^{\Omega (n^{1/D})}}{ d^{nt}}. \end{aligned}$$
    (74)

    Similar to (61) we can show

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( {\tilde{G}}_{n,D,c} x\right) |&= \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( ({\tilde{G}}_{n,D,c} - G_\text {Haar}^{(t)})x\right) | \\&\le \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( (({{\hat{G}}}_{n,D,c})^{c}- G_\text {Haar}^{(t)})\, {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}}\, x\, {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}}\right) | \\&\le \frac{1/d^{\Omega (n^{1/D})}}{ d^{nt}}, \end{aligned}$$
    (75)

    Therefore, using (73), (74) and (75) we conclude that every off-diagonal monomial satisfies

    $$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( G^{(t)}_{\mu ^{\text {lattice},n}_{D,c,s}} x\right) |\le & {} \frac{1/d^{\Omega (n^{1/D})}}{d^{nt}}. \end{aligned}$$
    (76)
2.

    Let \(\epsilon _{D,n}:= \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}\right] - \text {Ch}\left[ G_\text {Haar}^{(t)}\right] \right\| _\diamond \). We use induction to show that \(\epsilon _{D,n} = 1/d^{\Omega (n^{1/D})}\) for any integers n and D. This is true for \(D=2\) by Theorem 8. Assuming \(\epsilon _{D-1, n} = 1/d^{\Omega (n^{1/(D-1)})}\) for any n, we show that \(\epsilon _{D,n} = 1/d^{\Omega (n^{1/D})}\).

    $$\begin{aligned} \epsilon _{D,n}&:= \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}- G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\le \left\| \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}} - {\tilde{G}}_{n,D,c}\right] \right\| _\diamond + \left\| \text {Ch}\left[ {\tilde{G}}_{n,D,c} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\le {{\,\textrm{poly}\,}}(n) \cdot \left\| \text {Ch}\left[ (g^s_{r_{1,1}})^{\otimes n^{1-1/D}} - G_{r_{1,1}}^{\otimes n^{1-1/D}}\right] \right\| _\diamond + \left\| \text {Ch}\left[ ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\, G_{\text {Rows}(D,n)}\, {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c})^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\le O(n^{1-1/D}) \cdot \left\| \text {Ch}\left[ g^s_{r_{1,1}} - G_{r_{1,1}}\right] \right\| _\diamond + \left\| \text {Ch}\left[ ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\, G_{\text {Rows}(D,n)}\, {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c})^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\le \frac{1}{d^{\Omega (n^{1/D})}}+\left\| \text {Ch}\left[ ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\, G_{\text {Rows}(D,n)}\, {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c})^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond . \end{aligned}$$
    (77)

    The second line is the triangle inequality. The third is by Lemma 28 and the definition \({\tilde{G}}_{n,D,c} = ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\, G_{\text {Rows}(D,n)}\, {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c})^c\). The fourth is by Lemma 29. The fifth is by lemma 3 and corollary 6 of [13], which assert that after depth linear in the side length \(n^{1/D}\), the 1-D random circuit on a row is an \(\epsilon \)-approximate t-design in the diamond measure with \(\epsilon \) exponentially small in \(n^{1/D}\); the polynomial prefactor is absorbed into the exponent.

    Next, we bound \(\left\| \text {Ch}\left[ ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\,G_{\text {Rows}(D,n)}\,{\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c})^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond \). We first relate this expression to the superoperator \(\text {Ch}\left[ G_{\text {Planes}(D)}\right] \), using the triangle inequality and Lemmas 27 and 28:

    $$\begin{aligned}&\left\| \text {Ch}\left[ ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D},D-1, c}\, G_{\text {Rows}(D,n)}\,{\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c})^c - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad \le \left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)}\,{\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c}\,G_{\text {Rows}(D,n)})^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad = \left\| \text {Ch}\left[ \left( G_{\text {Rows}(D,n)} ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c}- G_{\text {Planes}(D)}) G_{\text {Rows}(D,n)} +G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} \right) ^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad \le O\left( \Vert \text {Ch}\left[ {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c}- G_{\text {Planes}(D)}\right] \Vert _\diamond \right) +\left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} )^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad \le O(n) \left\| \text {Ch}\left[ {\tilde{G}}_{n^{1-1/D}, D-1, c}- G^{(t)}_{\text {Haar}(p_1)}\right] \right\| _\diamond +\left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} )^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad \le O(n)\, \epsilon _{D-1, n^{1-1/D}} +\left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} )^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \\&\quad \le \frac{1}{d^{\Omega (n^{1/D})}} +\left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)})^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond . \end{aligned}$$

    The first inequality is by Lemma 27. The equality inserts and removes \(G_{\text {Planes}(D)}\). The next inequality is by the triangle inequality together with Lemmas 27 and 28. The following one is by Lemma 29 and the fact that \(G_{\text {Planes}(D)}\) is a tensor product of Haar moment operators on the planes. The penultimate step uses the induction hypothesis: \(\epsilon _{D-1, n^{1-1/D}} = 1/d^{\Omega ((n^{1-1/D})^{1/(D-1)})} = \frac{1}{d^{\Omega (n^{1/D})}}\).

    Using Lemma 43, \(\left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)}\, G_{\text {Planes}(D)}\, G_{\text {Rows}(D,n)} )^{c-1} - G_\text {Haar}^{(t)}\right] \right\| _\diamond = \frac{1}{d^{\Omega (n^{1/D})}}\), and this completes the proof.

3.

    Define \(\epsilon _{D,n}:= \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1\). By induction, assume \(\epsilon _{D-1, n} = 1/d^{\Omega (n^{1/(D-1)})}\) for all n. We would like to show that \(\epsilon _{D,n} = 1/d^{\Omega (n^{1/D})}\).

    $$\begin{aligned} \epsilon _{D,n}:= \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _1 = \left\| G^{(t) \otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} (g_{\text {Rows}(D,n)}^s G^{(t)\otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} )^c-G_\text {Haar}^{(t)}\right\| _1 \end{aligned}$$
    (78)

    Write \(G^{(t) \otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} = G_{\text {Planes}(D)} + (G^{(t) \otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} - G_{\text {Planes}(D)})=: Z_0 + Z_1\). Our strategy is to expand (78) in terms of \(G_{\text {Planes}(D)}\):

    $$\begin{aligned}&\left\| (Z_0 + Z_1) \left( g_{\text {Rows}(D,n)}^s (Z_0 + Z_1) \right) ^c-G_\text {Haar}^{(t)}\right\| _1 \\&\quad = \left\| \sum _{\phi \in \{0,1\}^{c+1}} Z_{\phi _0} \prod _{i=1}^c (g_{\text {Rows}(D,n)}^s Z_{\phi _i}) - G_\text {Haar}^{(t)} \right\| _1 \\&\quad \le \underbrace{\sum _{\phi \in \{0,1\}^{c+1} \backslash 0^{c+1}} \left\| Z_{\phi _0} \prod _{i=1}^c (g_{\text {Rows}(D,n)}^s Z_{\phi _i}) \right\| _1}_{(1)} + \underbrace{\left\| Z_0 (g_{\text {Rows}(D,n)}^s Z_0)^c - G_\text {Haar}^{(t)} \right\| _1}_{(2)} \end{aligned}$$
    (79)

    To bound (1), observe that each term contains at least one \(Z_1\). We would like to bound \(\Vert Z_1\Vert _1\). Observe that \(G_{\text {Planes}} =G^{(t)\otimes n^{1/D}}_{\text {Haar}(n^{1-1/D})}\), so

    $$\begin{aligned} \Vert Z_1\Vert _1&= \left\| G^{(t) \otimes n^{1/D}}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} -G^{(t)\otimes n^{1/D}}_{\text {Haar}(n^{1-1/D})} \right\| _1\\&= \left\| \sum _{i=1}^{n^{1/D}} G^{(t) \otimes (i-1)}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} \otimes (G^{(t) }_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} - G^{(t)}_{\text {Haar}(n^{1-1/D})}) \otimes G^{(t)\otimes (n^{1/D}-i)}_{\text {Haar}(n^{1-1/D})} \right\| _1\\&\le \sum _{i=1}^{n^{1/D}} \left\| G^{(t)}_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}}\right\| _1^{i-1} \left\| G^{(t) }_{\mu ^{\text {lattice},n^{1-1/D}}_{D-1,c,s}} - G^{(t)}_{\text {Haar}(n^{1-1/D})}\right\| _1 \left\| G^{(t)}_{\text {Haar}(n^{1-1/D})}\right\| _1^{ n^{1/D}-i} \end{aligned}$$
    (80)
    $$\begin{aligned}&\le \sum _{i=1}^{n^{1/D}} (t! + \epsilon _{D-1,n})^{i-1} \epsilon _{D-1,n} t!^{n^{1/D}-i} \end{aligned}$$
    (81)
    $$\begin{aligned}&\le n^{1/D} (t! + \epsilon _{D-1,n})^{n^{1/D}} d^{-\Omega (n^{1/(D-1)})}. \end{aligned}$$
    (82)

    This final expression is \(\le d^{-\Omega (n^{1/D})}\) for n sufficiently large relative to d, t, and D. Equation (81) uses the induction hypothesis as well as the fact that \(G^{(t)}_{\text {Haar}(m)}\) is a projector of rank \(\le t!\) for any m. (In fact this is an equality when \(d^m \ge t\).) This last fact is standard and can be found in Lemma 17 of [13], with the relevant math background in [30, 33].

    For (2), we observe that \((G_{\text {Planes}(D)} g_{\text {Rows}(D,n)}^s)^c- G_\text {Haar}^{(t)}\) has rank \(t!^{O(n^{1/D})}\) so the cost of moving to the infinity norm is moderate:

    $$\begin{aligned} \left\| (G_{\text {Planes}(D)} g_{\text {Rows}(D,n)}^s)^c- G_\text {Haar}^{(t)}\right\| _1\le & {} t!^{O(n^{1/D})} \left\| (G_{\text {Planes}(D)} g_{\text {Rows}(D,n)}^s)^c- G_\text {Haar}^{(t)}\right\| _\infty \end{aligned}$$
    (83)
    $$\begin{aligned} \le t!^{O(n^{1/D})} \left\| G_{\text {Planes}(D)}\, g_{\text {Rows}(D,n)}^s- G_\text {Haar}^{(t)}\right\| ^c_\infty \end{aligned}$$
    (84)

    We now bound \(\left\| G_{\text {Planes}(D)} g_{\text {Rows}(D,n)}^s- G_\text {Haar}^{(t)}\right\| _\infty \) using a variant of the proof of part 3 of this theorem.

    $$\begin{aligned} \left\| G_{\text {Planes}(D)}\, g_{\text {Rows}(D,n)}^s- G_\text {Haar}^{(t)}\right\| _\infty \le \left\| g_{\text {Rows}(D,n)}^s- G_{r_{1,1}}^{\otimes n^{1-1/D}}\right\| _\infty + \left\| G_{\text {Planes}(D)}\, G_{r_{1,1}}^{\otimes n^{1-1/D}}- G_\text {Haar}^{(t)}\right\| _\infty \end{aligned}$$
    (85)

    Using [13] and Lemma 29

    $$\begin{aligned} \left\| g_{\text {Rows}(D,n)}^s- G_{r_{1,1}}^{\otimes n^{1-1/D}}\right\| _\infty \le O(n^{1-1/D})\left\| g_{r_{1,1}}^s- G_{r_{1,1}}\right\| _\infty = \frac{1}{d^{\Omega (n^{1/D})}}. \end{aligned}$$

    Moreover, using Lemma 40, \(\left\| G_{\text {Planes}(D)} G_{r_{1,1}}^{\otimes n^{1-1/D}}- G_\text {Haar}^{(t)}\right\| _\infty = \frac{1}{d^{\Omega (n^{1-1/D})}}\). This completes the proof, by taking c large enough that the \(d^{-\Omega (c\, n^{1-1/D})}\) factor arising from the c-th power in (84) dominates the \(t!^{O(n^{1/D})}\) prefactor. Here we are ignoring the dependence on d, t, and D; taking this into account properly would yield a depth that scales polynomially with t, with the degree of the polynomial depending on D.

4.

    Define \(\epsilon _{D,n}:= \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \). We show by induction that \(\epsilon _{D,n} = 1/d^{\Omega (n^{1/D})}\) for all integers n and D: assuming \(\epsilon _{D-1, n} = 1/d^{\Omega (n^{1/(D-1)})}\) for all n, we show that \(\epsilon _{D,n} = 1/d^{\Omega (n^{1/D})}\).

    $$\begin{aligned} \epsilon _{D,n}&:= \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - G_\text {Haar}^{(t)} \right\| _\infty \\&\le \left\| G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(t)} - {\tilde{G}}_{n,D,c} \right\| _\infty + \left\| {\tilde{G}}_{n,D,c} - G_\text {Haar}^{(t)} \right\| _\infty \\&\le {{\,\textrm{poly}\,}}(n) \cdot \left\| (g^s_{r_{1,1}})^{\otimes n^{1-1/D}} -G_{r_{1,1}}^{\otimes n^{1-1/D}} \right\| _\infty + \left\| G_{\text {Rows}(D,n)} ({\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} -G_{\text {Planes}(D)}) + G_{\text {Rows}(D,n)}\, G_{\text {Planes}(D)} - G_\text {Haar}^{(t)}\right\| ^c_\infty \\&\le O(n)\, \frac{1}{d^{\Omega (n^{1/D})}}+ O(n)\, \epsilon _{D-1, n^{1-1/D}} + \frac{1}{d^{\Omega (n^{1-1/D})\, c}}\\&\le d^{- \Omega (n^{1/D})}. \end{aligned}$$
    (86)

    These steps follow from the proof of part 2.

\(\square \)

3.5 Proofs of the basic lemmas stated in Sect. 3.1

3.5.1 Comparison lemma for random quantum circuits.

Lemma

(Restatement of Lemma 34). Suppose we have the following cp ordering between superoperators \({\mathcal {A}}_1 \preceq {\mathcal {B}}_1, \ldots ,{\mathcal {A}}_t \preceq {\mathcal {B}}_t\). Then \({\mathcal {A}}_t \ldots {\mathcal {A}}_1 \preceq {\mathcal {B}}_t \ldots {\mathcal {B}}_1\).

Proof

We first prove the following claim

Claim

If \({\mathcal {A}}\preceq {\mathcal {B}}\) and \({\mathcal {C}}\preceq {\mathcal {D}}\) are cp maps, then \({\mathcal {A}}{\mathcal {C}}\preceq {\mathcal {B}}{\mathcal {D}}\).

Proof

The class of cp maps is closed under composition and addition. Therefore \({\mathcal {B}}{\mathcal {D}}- {\mathcal {A}}{\mathcal {C}}= ({\mathcal {B}}-{\mathcal {A}}) {\mathcal {D}}+ {\mathcal {A}}({\mathcal {D}}-{\mathcal {C}})\) is cp. \(\square \)

The proof (of Lemma 34) is by induction. We show for all \(1 \le i \le t\)

$$\begin{aligned} {\mathcal {A}}_i \ldots {\mathcal {A}}_1 \preceq {\mathcal {B}}_i \ldots {\mathcal {B}}_1. \end{aligned}$$
(87)

Clearly this is true for \(i=1\). Suppose it is true for some \(1 \le i < t\). Then \({\mathcal {A}}_i \ldots {\mathcal {A}}_1 \preceq {\mathcal {B}}_i \ldots {\mathcal {B}}_1\) and \({\mathcal {A}}_{i+1} \preceq {\mathcal {B}}_{i+1}\), so using the claim, \({\mathcal {A}}_{i+1} \ldots {\mathcal {A}}_1 \preceq {\mathcal {B}}_{i+1} \ldots {\mathcal {B}}_1\). \(\square \)

Corollary

(Restatement of Corollary 35). If \(K_1, \ldots , K_t\) are respectively the moment superoperators of \(\epsilon _1, \ldots , \epsilon _t\)-approximate strong t-designs, each on a potentially different subset of qudits, then

$$\begin{aligned}{} & {} \text {Ch}\left[ G^{(t)}_{\text {Haar}(S_1)} \ldots G^{(t)}_{\text {Haar}(S_t)}\right] (1-\epsilon _1)\ldots (1-\epsilon _t) \preceq K_1 \ldots K_t \nonumber \\{} & {} \quad \preceq \text {Ch}\left[ G^{(t)}_{\text {Haar}(S_1)} \ldots G^{(t)}_{\text {Haar}(S_t)}\right] (1+\epsilon _1)\ldots (1+\epsilon _t). \end{aligned}$$
(88)

Proof

This is immediate from Lemma 34, Definition 16, and the observation that if \(A \preceq B\) then \(A \otimes {{\,\textrm{id}\,}}\preceq B \otimes {{\,\textrm{id}\,}}\). \(\square \)

3.5.2 Bound on the value of off-diagonal monomials.

Lemma

(Restatement of Lemma 37). Let \(\delta > 0\). Assume that \(\text {Ch}\left[ G_\mu ^{(t)}\right] \) and \(\text {Ch}\left[ G_\nu ^{(t)}\right] \) are two moment superoperators that satisfy the following completely positive ordering

$$\begin{aligned} (1 - \delta ) \cdot \text {Ch}\left[ G_\nu ^{(t)}\right] \preceq \text {Ch}\left[ G_\mu ^{(t)}\right] \preceq (1 + \delta ) \cdot \text {Ch}\left[ G_\nu ^{(t)}\right] . \end{aligned}$$
(89)

Let \({\mathcal {O}}\) and \({\mathcal {D}}\) be respectively the set of off-diagonal and diagonal indices for monomials. Then

$$\begin{aligned} \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( x G^{(t)}_\mu \right) | \le \max _{x \in {\mathcal {O}}} |{\textrm{Tr}}\left( x G^{(t)}_\nu \right) | (1+\delta ) + 2\delta \cdot \max _{y \in {\mathcal {D}}} |{\textrm{Tr}}\left( y G^{(t)}_\nu \right) |. \end{aligned}$$
(90)

Proof

Let \(\phi _N:= |\phi _N\rangle \langle \phi _N|\), where

$$\begin{aligned} |\phi _N\rangle := \frac{1}{\sqrt{N}} \sum _{x\in [d]^n} |x\rangle |x\rangle \end{aligned}$$
(91)

is the n-qudit maximally entangled state and \(N = d^n\).

We use the following standard lemma, which we state without proof (see e.g. [13]).

Lemma 44

Let \(\mu \) and \(\nu \) be two distributions over the n-qudit unitary group. Then \(\text {Ch}[G^{(t)}_\mu ] \preceq \text {Ch}[G^{(t)}_\nu ]\) if and only if

$$\begin{aligned} \left( \text {Ch}\left[ G^{(t)}_\nu \right] \otimes \textrm{id} - \text {Ch}\left[ G^{(t)}_\mu \right] \otimes \textrm{id} \right) \phi ^{\otimes t}_N \end{aligned}$$
(92)

is a psd matrix.

We now prove Lemma 37 using Lemma 44. First,

$$\begin{aligned} \phi _N^{\otimes t}= & {} \frac{1}{N^{t}} \sum |i_1, \ldots , i_t\rangle \langle j_1, \ldots , j_t|\otimes |i_1, \ldots , i_t\rangle \langle j_1, \ldots , j_t|\nonumber \\\equiv & {} \frac{1}{N^{t}} \sum |i\rangle \langle j|\otimes |i\rangle \langle j|. \end{aligned}$$
(93)

For \(i,j,k,l \in [d]^{nt}\), define

$$\begin{aligned} M^{(\mu ,t)}_{k, i,l,j} = \langle k|\text {Ch}\left[ G^{(t)}_\mu \right] \left( |i\rangle \langle j| \right) |l\rangle . \end{aligned}$$
(94)

Then

$$\begin{aligned} (\text {Ch}[G_\mu ^{( t)}] \otimes \textrm{id} ) \phi _N^{\otimes t} = \frac{1}{N^{t}} \sum M^{(\mu ,t)}_{a,b,c,d} |a\rangle \langle c|\otimes |b\rangle \langle d|, \end{aligned}$$
(95)

and

$$\begin{aligned} (\text {Ch}[G^{(t)}_\nu ] \otimes \textrm{id} ) \phi _N^{\otimes t}= \frac{1}{N^{t}} \sum M^{(\nu ,t)}_{a,b,c,d} |a\rangle \langle c|\otimes |b\rangle \langle d|. \end{aligned}$$
(96)

Therefore, since \(\text {Ch}\left[ G_\mu ^{(t)}\right] \preceq (1+\delta ) \text {Ch}[G_\nu ^{(t)}]\), Lemma 44 implies that the matrix

$$\begin{aligned} A= & {} ( (1+\delta ) \text {Ch}[G_\nu ^{(t)} ]\otimes \textrm{id}-\text {Ch}\left[ G_\mu ^{(t)}\right] \otimes \textrm{id} ) \phi _N^{\otimes t} \nonumber \\= & {} \frac{1}{N^{t}} \sum ( (1+\delta ) M^{(\nu ,t)}_{a,b,c,d} - M^{(\mu ,t)}_{a,b,c,d} ) |a\rangle |b\rangle \langle c|\langle d| \end{aligned}$$
(97)

is psd. We use the following fact about psd matrices, which we state without proof.

Fact. If A is psd, then the maximum absolute value of the off-diagonal entries of A is at most the maximum diagonal entry.
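To see why, note that if A is psd then for any \(i \ne j\) the \(2\times 2\) principal submatrix on rows and columns \(\{i,j\}\) is also psd, so its determinant is nonnegative, giving

$$\begin{aligned} |A_{ij}|^2 \le A_{ii} A_{jj} \le \Big ( \max _k A_{kk}\Big )^2. \end{aligned}$$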

Then using the above fact

$$\begin{aligned} \max _{x \in {\mathcal {O}}} |(1 + \delta ) {\textrm{Tr}}\left( G^{(t)}_\nu x \right) - {\textrm{Tr}}\left( G^{(t)}_\mu x \right) | \le \max _{y \in {\mathcal {D}}} |(1 + \delta ) {\textrm{Tr}}( G^{(t)}_\nu y) - {\textrm{Tr}}\left( G^{(t)}_\mu y \right) |.\nonumber \\ \end{aligned}$$
(98)

Hence

$$\begin{aligned} \max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\mu x \right) |\le & {} \max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\nu x \right) | (1+\delta ) \nonumber \\{} & {} + \max _{y \in {\mathcal {D}}} |(1 + \delta ) {\textrm{Tr}}\left( G^{(t)}_\nu y \right) - {\textrm{Tr}}\left( G^{(t)}_\mu y \right) |. \end{aligned}$$
(99)

Now if \(y \in {\mathcal {D}}\), then

$$\begin{aligned} {\textrm{Tr}}\left( G^{(t)}_\nu y \right) (1-\delta ) \le {\textrm{Tr}}\left( G^{(t)}_\mu y \right) \le {\textrm{Tr}}\left( G^{(t)}_\nu y \right) (1+\delta ). \end{aligned}$$
(100)

Using this in (99),

$$\begin{aligned} \max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\mu x \right) | \le \max _{x \in {\mathcal {O}}} | {\textrm{Tr}}\left( G^{(t)}_\nu x \right) | (1+\delta ) + 2\delta \cdot \max _{y \in {\mathcal {D}}} {\textrm{Tr}}\left( G^{(t)}_\nu y \right) . \end{aligned}$$
(101)

\(\square \)

3.5.3 Bounds on the moments of the Haar measure.

Lemma

(Restatement of Lemma 38). Let \(G_{\text {Haar}(m)}^{(t)}\) be the quasi-projector operator for the Haar measure on m qudits. Then

$$\begin{aligned} \max _y \left\| G_{\text {Haar}(m)}^{(t)} y G_{\text {Haar}(m)}^{(t)}\right\| _1 \le \frac{t^{O(t)}}{d^{mt}}. \end{aligned}$$
(102)

Here the maximization is taken over computational-basis matrix elements of the form \(y= |i_1,\ldots , i_t, i'_1,\ldots , i'_t\rangle \langle j_1,\ldots , j_t, j'_1,\ldots , j'_t|\), where each label (e.g. \(i_j\)) is in \([d]^m\).

Proof

First observe that

$$\begin{aligned} \max _y \Vert G_\text {Haar}^{(t)} y G_\text {Haar}^{(t)}\Vert _1= & {} \max _{a, b} {\textrm{Tr}}\sqrt{G_\text {Haar}^{(t)} |a\rangle \langle b| G_\text {Haar}^{(t)} G_\text {Haar}^{(t)} |b\rangle \langle a| G_\text {Haar}^{(t)}} \end{aligned}$$
(103)
$$\begin{aligned}= & {} \max _{a, b} \sqrt{\langle a| G_\text {Haar}^{(t)} |a\rangle \cdot \langle b| G_\text {Haar}^{(t)} |b \rangle } \end{aligned}$$
(104)
$$\begin{aligned}= & {} \max _{i} \langle i|G_\text {Haar}^{(t)}|i\rangle . \end{aligned}$$
(105)

The following lemma concludes the proof.

Lemma 45

(Moments of the Haar measure). The largest t-th monomial moment of the Haar measure is at most \(\frac{t!}{d^{tm}}\).

Proof

Consider a particular balanced moment of the Haar measure. Using Hölder's inequality,

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C\sim \text {Haar}} | C_{a_1, b_1} \ldots C_{a_t,b_t}|^2 \le \prod _{i \in [t]} ({\mathbb {E}}|C_{a_i,b_i}|^{2t})^{1/t} \le t! / d^{tm}. \end{aligned}$$
(106)

If the moment is not balanced, the expectation is zero and hence the bound still holds. Here we have used a closed form expression for \({\mathbb {E}}|C_{a_i,b_i}|^{2t}\); see Corollary 2.4 and Proposition 2.6 of [23]. \(\square \)
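For concreteness, the closed form for a Haar-random \(C \in \text {U}(N)\) with \(N = d^m\) gives

$$\begin{aligned} {\mathbb {E}}|C_{a,b}|^{2t} = \frac{t!\, (N-1)!}{(N+t-1)!} = \frac{t!}{N(N+1)\cdots (N+t-1)} \le \frac{t!}{d^{mt}}, \end{aligned}$$

which is the bound used above.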

3.6 Proofs of the projector overlap lemmas from Sect. 3.2

3.6.1 Extended quasi-orthogonality of permutation operators with application to random circuits on 2-dimensional lattices.

In this section we prove Lemma 39

Lemma

(Restatement of Lemma 39). \(\Vert G_C G_R -G_\text {Haar}^{(t)}\Vert _\infty \le 1/d^{\Omega (\sqrt{ n})}\).

First, we need a description of the subspaces the projectors \(G_R, G_C\) and \(G_\text {Haar}\) project onto. Consider a \(\sqrt{n}\times \sqrt{n}\) square lattice with n qudits as the collection of points \(A:= [\sqrt{n}]\times [\sqrt{n}]\). We use the following interpretation of the Hilbert space a quasi-projector acts on. This interpretation is also used in [13]. Denote \(R_j (C_j)\) as the j-th row (column) of A for \(j \in [\sqrt{n}]\). Here we assume each point of A consists of t pairs of qudits, each with local dimension d. Thereby, the lattice becomes the Hilbert space \({\mathcal {H}}:= \bigotimes _{(x,y) \in A} {\mathbb {C}}_{(x,y)}^{d^{2 t}}\), and has dimension \(d^{2 t n}\).

We are interested in a certain subspace of \({\mathcal {H}}\), and in order to understand it we need the following notation. To each point \((x,y) \in A\) and each permutation \(\pi \in S_t\) we assign the quantum state \(|\psi _\pi \rangle := (I\otimes V(\pi ))|\Phi _{d,t}\rangle \). Here \(|\Phi _{d,t}\rangle \) is the maximally entangled state \(\frac{1}{\sqrt{d^t}}\sum _{x \in [d]^t} |x,x\rangle \), \(V: S_t \rightarrow GL({\mathbb {C}}^{d^{t}})\) is a representation of \(S_t\) with the map \(V(\pi ): |x_1, x_2, \ldots , x_t\rangle \mapsto |x_{\pi ^{-1}(1)}, x_{\pi ^{-1}(2)}, \ldots , x_{\pi ^{-1}(t)}\rangle \), and \(S_t\) is the symmetric group over t elements.

Given these definitions define the following basis states in \({\mathcal {H}}\):

$$\begin{aligned} |R_{\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}}\rangle := \bigotimes _{v_1 \in R_1} |\psi _{\pi _1}\rangle _{v_1} \otimes \bigotimes _{v_2 \in R_2} |\psi _{\pi _2}\rangle _{v_2} \otimes \ldots \otimes \bigotimes _{v_{\sqrt{n}} \in R_{\sqrt{n}}} |\psi _{\pi _{\sqrt{n}}}\rangle _{v_{\sqrt{n}}}, \end{aligned}$$
(107)

and,

$$\begin{aligned} |C_{\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}}\rangle := \bigotimes _{v_1 \in C_1} |\psi _{\pi _1}\rangle _{v_1} \otimes \bigotimes _{v_2 \in C_2} |\psi _{\pi _2}\rangle _{v_2} \otimes \ldots \otimes \bigotimes _{v_{\sqrt{n}} \in C_{\sqrt{n}}} |\psi _{\pi _{\sqrt{n}}}\rangle _{v_{\sqrt{n}}}, \end{aligned}$$
(108)

for each \(\sqrt{n}\)-tuple of permutations \((\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}) \in S_t^{\sqrt{n}}\). Here \(S_t^{\sqrt{n}}\) is the \(\sqrt{n}\)-fold Cartesian product \(S_t \times \ldots \times S_t\) of \(S_t\) with itself. Let \(H_{t,n}\) denote the subset consisting of tuples of permutations in which not all of the permutations are equal; for example, elements like \((\pi , \pi , \ldots , \pi )\) are not contained in this set. Notice that these basis states are not orthogonal to each other, and if \(t>d^n\) they are not even linearly independent.
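For example, for \(t=2\) we have \(S_2 = \{e, \tau \}\) with \(\tau \) the transposition, so \(H_{2,n}\) consists of the \(2^{\sqrt{n}}-2\) mixed tuples, i.e. every tuple in \(\{e,\tau \}^{\sqrt{n}}\) except \((e, \ldots , e)\) and \((\tau , \ldots , \tau )\).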

Here we define two vector spaces \(V_R, V_C \subseteq {\mathcal {H}}\), with:

$$\begin{aligned} V_R:=\mathop {{\textrm{span}}}\limits _{\mathbb {C}}\left\{ |R_{\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}}\rangle : (\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}) \in S_t^{\sqrt{n}} \right\} , \end{aligned}$$
(109)

and,

$$\begin{aligned} V_C:=\mathop {{\textrm{span}}}\limits _{\mathbb {C}}\left\{ |C_{\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}}\rangle : (\pi _1, \pi _2, \ldots , \pi _{\sqrt{n}}) \in S_t^{\sqrt{n}} \right\} , \end{aligned}$$
(110)

and we call them row and column vector spaces, respectively. Also, denote the intersection between them by \(V_\text {Haar}:= V_R \cap V_C\). Equivalently:

$$\begin{aligned} V_\text {Haar}= \mathop {{\textrm{span}}}\limits _{\mathbb {C}}\left\{ \bigotimes _{v\in A} |\psi _\pi \rangle _v: \pi \in S_t \right\} . \end{aligned}$$
(111)

Then define \({\tilde{V}}_R:= V_R\cap V^\perp _\text {Haar}\) and \({\tilde{V}}_C:= V_C\cap V^\perp _\text {Haar}\). Define the angle between two vector spaces A and B as

$$\begin{aligned} \cos \measuredangle (A,B):= \max _{x\in A,\, y \in B,\, \Vert x\Vert _2 = \Vert y\Vert _2 = 1} \langle x,y\rangle . \end{aligned}$$
(112)

We need the following definition of a Gram matrix

Definition 46

(Gram matrix). Let \(v_1, \ldots , v_m\) be unit vectors that are not necessarily orthogonal to each other. Then the Gram matrix corresponding to this set of vectors is defined by \([J_{ij}] = \langle v_i | v_j\rangle \).

We also need the following lemma

Lemma 47

(Perron-Frobenius [53]). If A is a (not necessarily symmetric) \(d \times d\) matrix, then:

$$\begin{aligned} ||A||_\infty \le \sqrt{\max _{i\in [d]} \sum _j |A_{i,j}|\cdot \max _{j\in [d]} \sum _i |A_{i,j}|}. \end{aligned}$$
(113)
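As a quick numerical sanity check of this bound (an illustration, not part of the proof; it assumes numpy), one can compare both sides on random matrices:

import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.normal(size=(30, 30))   # a (not necessarily symmetric) square matrix
    lhs = np.linalg.norm(A, ord=2)  # largest singular value of A
    rhs = np.sqrt(np.abs(A).sum(axis=1).max() * np.abs(A).sum(axis=0).max())
    assert lhs <= rhs + 1e-9        # the bound of Lemma 47 holds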

Let \(G_R, G_C\) and \(G_{\text {Haar}}\) be the quasi-projectors defined in Sect. 2.1. From [13] we know that \(G_R\), \(G_C\) and \(G_\text {Haar}\) are indeed projectors onto \(V_R, V_C\) and \(V_\text {Haar}\), respectively. Define the inner-product matrix Q between the basis states of \(V_R\) and \(V_C\), with entries:

$$\begin{aligned}{}[Q]_{g,h}:=\langle R_g|C_h\rangle , \, g,h \in H_{t,n}. \end{aligned}$$
(114)

The goal is to prove \(\Vert G_C G_R -G_\text {Haar}^{(t)}\Vert _\infty \le 1/d^{\Omega (\sqrt{ n})}\). This basically means that the composition of \(G_R\) and \(G_C\) is close to \(G_{\text {Haar}}\).

Also let \(c_{d,n,t} =\frac{1}{1-\frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}}}\) be a number very close to 1.

The proof is in three main steps. First we relate \(\Vert G_CG_R-G_{\text {Haar}}\Vert _\infty \) to \(\measuredangle ({{\tilde{V}}}_R, {{\tilde{V}}}_C)\):

Proposition 48

\(\Vert G_CG_R-G_{\text {Haar}}\Vert _\infty \le \cos ^2 \measuredangle ({{\tilde{V}}}_R, {{\tilde{V}}}_C)\).

Next, we relate \(\measuredangle ({{\tilde{V}}}_R, {{\tilde{V}}}_C)\) to \(||Q||_\infty \)

Proposition 49

\(|\cos \measuredangle ({{\tilde{V}}}_R, {{\tilde{V}}}_C)| \le c_{d,n,t} \cdot ||Q||_\infty \).

Then we bound \(||Q||_\infty \):

Proposition 50

\(||Q||_\infty \le \big (\frac{1}{d}+\frac{1}{d^{\sqrt{n}-1}}+\frac{2 t^2}{d^{\sqrt{n}}} \big )^{\sqrt{n}}\).

Propositions 48, 49 and 50 imply the proof of Lemma 39.
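Concretely, chaining the three propositions (and assuming for illustration that \(d \ge 4\) and \(2t^2 \le d^{\sqrt{n}-1}\), so that each factor in Proposition 50 is at most 3/d):

$$\begin{aligned} \left\| G_C G_R -G_\text {Haar}^{(t)}\right\| _\infty \le \cos ^2 \measuredangle ({{\tilde{V}}}_R, {{\tilde{V}}}_C) \le c^2_{d,n,t} \Vert Q\Vert ^2_\infty \le c^2_{d,n,t} \left( \frac{3}{d}\right) ^{2\sqrt{n}} = \frac{1}{d^{\Omega (\sqrt{n})}}. \end{aligned}$$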

Proof of Proposition 48

We use the following result of Jordan

Proposition 51

(Jordan). If P and Q are two projectors, then the Hilbert space V they act on can be decomposed, as a direct sum, into one-dimensional or two-dimensional subspaces, each invariant under the action of both P and Q.

which implies

Corollary 52

There are orthonormal bases \(e_1, \ldots , e_K\), \(f_1, \ldots , f_K\), \(q_1, \ldots , q_T\), and angles \(0\le \theta _1 \le \theta _2 \le \ldots \le \theta _K \le \pi /2\) such that:

$$\begin{aligned} V_R = \mathop {{\textrm{span}}}\limits _{{\mathbb {C}}} \left\{ e_1,\ldots , e_K, q_1, \ldots , q_T \right\} , \end{aligned}$$
(115)

and

$$\begin{aligned} V_C = \mathop {{\textrm{span}}}\limits _{{\mathbb {C}}} \{\cos \theta _1 e_1 + \sin \theta _1 f_1,\ldots , \cos \theta _K e_K + \sin \theta _K f_K, q_1, \ldots , q_T \}, \end{aligned}$$
(116)

and

$$\begin{aligned} V_\text {Haar}= \mathop {{\textrm{span}}}\limits _{{\mathbb {C}}} \{q_1, \ldots , q_T \}. \end{aligned}$$
(117)

In other words, both \(G_R\) and \(G_C\) can be decomposed into \(2\times 2\) blocks, each corresponding to one of the angles \(\theta _i\), such that \(G_C\) on this block looks like

$$\begin{aligned} G^{2 \times 2}_C= \begin{pmatrix} 1 &{} 0\\ 0&{} 0\\ \end{pmatrix}, \end{aligned}$$
(118)

and \(G_R\)

$$\begin{aligned} G^{2\times 2}_R = \begin{pmatrix} \cos ^2 \theta _i &{} \sin \theta _i \cos \theta _i\\ \sin \theta _i \cos \theta _i &{} \sin ^2 \theta _i\\ \end{pmatrix}. \end{aligned}$$
(119)

Hence \(G_C G_R\) looks like

$$\begin{aligned} G^{2\times 2}_C G^{2\times 2}_R = \begin{pmatrix} \cos ^2 \theta _i &{} \sin \theta _i \cos \theta _i\\ 0 &{} 0\\ \end{pmatrix}, \end{aligned}$$
(120)

which has largest singular value \(|\cos ^2 \theta _i|\). Propositions 49 and 50 along with this observation imply that the largest singular value of \(G_C G_R - G_\text {Haar}\) is \(1/d^{\Omega (\sqrt{n})}\). \(\square \)

Proof of Proposition 49

An arbitrary unit vector in \({\tilde{V}}_R\) can be written as \(|\psi _x\rangle = \frac{\sum _{{{\tilde{\pi }}} \in H_{t,n}} x_{{{\tilde{\pi }}}} |R_{{{\tilde{\pi }}}}\rangle }{\sqrt{\sum _{{{\tilde{\pi }}}, {{\tilde{\sigma }}} \in H_{t,n}} x_{{{\tilde{\pi }}}} x_{{{\tilde{\sigma }}}} \langle R_{{{\tilde{\pi }}}}| R_{{\tilde{\sigma }}} \rangle }}\). Let \(|x\rangle \) be the vector with entries \(x_{{\tilde{\pi }}_1, \ldots , {{\tilde{\pi }}}_{\sqrt{n}}}\). Similarly, a typical vector inside \({{\tilde{V}}}_C\) can be represented as \(|\psi _y\rangle = \frac{\sum _{{{\tilde{\pi }}} \in H_{t,n}} y_{{{\tilde{\pi }}}} |C_{{{\tilde{\pi }}}}\rangle }{\sqrt{\sum _{{{\tilde{\pi }}}, {{\tilde{\sigma }}} \in H_{t,n}} y_{{{\tilde{\pi }}}} y_{{{\tilde{\sigma }}}} \langle C_{{{\tilde{\pi }}}}| C_{{\tilde{\sigma }}} \rangle }}\), with corresponding vector \(|y\rangle \).

Let \({\tilde{J}}\) and \({\tilde{J}}'\) be the Gram matrices corresponding to the basis described for \({\tilde{V}}_R\) and \({\tilde{V}}_C\), respectively. Then:

$$\begin{aligned} \langle \psi _x | \psi _y\rangle= & {} \frac{\sum _{{{\tilde{\pi }}}, {{\tilde{\sigma }}} \in H_{t,n}} x_{{{\tilde{\pi }}}} \langle R_{{{\tilde{\pi }}}}| R_{{{\tilde{\sigma }}}}\rangle y_{{{\tilde{\sigma }}}}}{\sqrt{\sum _{{{\tilde{\pi }}}, {{\tilde{\sigma }}} \in H_{t,n}} x_{{{\tilde{\pi }}}} x_{{{\tilde{\sigma }}}} \langle R_{{{\tilde{\pi }}}}| R_{{\tilde{\sigma }}} \rangle } \cdot \sqrt{\sum _{{{\tilde{\pi }}}, {{\tilde{\sigma }}} \in H_{t,n}} y_{{{\tilde{\pi }}}} y_{{{\tilde{\sigma }}}} \langle C_{{{\tilde{\pi }}}}| C_{{\tilde{\sigma }}} \rangle }}\nonumber \\= & {} \frac{\langle x|Q |y\rangle }{\sqrt{\langle x| {\tilde{J}}|x\rangle }.\sqrt{\langle y|{\tilde{J}}'|y\rangle }}.\nonumber \\ \end{aligned}$$
(121)

To see that this quantity is at most \(c_{d,n,t} \Vert Q\Vert _\infty \), we go through the following calculation:

$$\begin{aligned} \langle \psi _x | \psi _y\rangle = \frac{\langle x|Q |y\rangle }{\sqrt{\langle x| {\tilde{J}}|x\rangle }\cdot \sqrt{\langle y|{\tilde{J}}'|y\rangle }} \le c_{d,n,t}\, \frac{\langle x|Q |y\rangle }{\Vert x\Vert _2 \Vert y\Vert _2} \le c_{d,n,t}\, \Vert Q\Vert _\infty . \end{aligned}$$

(122)

For the first inequality we used the following proposition

Proposition 53

If \({\tilde{J}}\) is the Gram matrix for the basis states, \(|R_{\cdot }\rangle \) or \(|C_{\cdot }\rangle \) in (107) and (108) for \({\tilde{V}}_R\) or \({\tilde{V}}_C\), then for any \(|x\rangle \) with \(||x||_2=1\):

$$\begin{aligned} \langle x|{\tilde{J}}|x\rangle \ge \left( 1-\frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}}\right) =\frac{1}{c_{d,n,t}}. \end{aligned}$$
(123)

For the second inequality we used Cauchy–Schwarz. \(\square \)

In order to prove Proposition 50 we need the following tool. If \(\vec {x}_1,\vec {x}_2, \ldots ,\vec {x}_K\) are d-dimensional vectors, their multi-product is defined to be:

$$\begin{aligned} \textsf{multiprod} (\vec {x_1},\vec {x_2}, \ldots ,\vec {x_K}):= \sum ^d_{i=1} x_{1 i} x_{2 i}\ldots x_{K i}. \end{aligned}$$
(124)

Proposition 54

(Majorization). Let \(\vec {x}_1,\vec {x}_2, \ldots ,\vec {x}_K\) be d-dimensional, non-negative and real vectors. If \(\vec {x}_i^{\downarrow }\) is \(\vec {x}_i\) in descending order, then:

$$\begin{aligned} \textsf{multiprod} \left( \vec {x}_1,\vec {x}_2, \ldots ,\vec {x}_K\right) \le \textsf{multiprod} (\vec {x}_1^{\downarrow },\vec {x}_2^{\downarrow }, \ldots ,\vec {x}_K^{\downarrow }). \end{aligned}$$
(125)

Proof

The \(K=2\) version of the claim is that \(\langle \vec x_1, \vec x_2\rangle \le \langle \vec x_1^\downarrow , \vec x_2^\downarrow \rangle \). This is a standard fact. To prove it, observe that WLOG we can assume \(\vec x_1 = \vec x_1^\downarrow \). Then for any out-of-order pair \(x_{2i} < x_{2j}\) with \(i<j\), swapping \(x_{2i}\) and \(x_{2j}\) does not decrease \(\langle \vec x_1, \vec x_2\rangle \). Applying this repeatedly we end with \(\langle \vec x_1^\downarrow , \vec x_2^\downarrow \rangle \).

This same argument works if we replace the inner product with a sum over the first \(d' \le d\) terms, i.e. \(\sum _{i=1}^{d'} x_{1i}x_{2i}\). Thus the same argument shows that

$$\begin{aligned} \vec {x}_1 \circ \vec {x}_2 \preceq \vec {x}_1^\downarrow \circ \vec {x}_2^\downarrow . \end{aligned}$$
(126)

The proposition now follows by induction on K. \(\square \)
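As a small example with \(K=2\) and \(d=2\): \(\textsf{multiprod} ((1,3),(4,2)) = 1\cdot 4 + 3\cdot 2 = 10\), while after sorting both vectors in descending order, \(\textsf{multiprod} ((3,1),(4,2)) = 3\cdot 4 + 1\cdot 2 = 14\).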

We also need the following upper bound:

Proposition 55

Let \(e\in S_t\) be the identity permutation. Define \(f_t: {\mathbb {R}}_{>1} \rightarrow {\mathbb {R}}_{>1}\) with the map:

$$\begin{aligned} f_t(\alpha )=\sum _{\sigma \in S_t} \frac{1}{\alpha ^{\textsf{dist}(e,\sigma )}}, \end{aligned}$$
(127)

for \(\alpha > 1\). Then as long as \(2t^2 \le \alpha \)

$$\begin{aligned} f_t(\alpha ) \le 1 + \frac{2 t^2}{\alpha }. \end{aligned}$$
(128)
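As a sanity check for \(t=2\): \(S_2\) contains the identity (at distance 0) and a single transposition (at distance 1), so \(f_2(\alpha ) = 1 + \frac{1}{\alpha }\), consistent with the bound \(1 + \frac{2\cdot 2^2}{\alpha } = 1 + \frac{8}{\alpha }\).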

For \(\sigma _1, \ldots , \sigma _M \in S_t\) define the function:

$$\begin{aligned} h(D,t, \sigma _1, \ldots , \sigma _M):= \sum _{\pi \in S_t} \frac{1}{D^{\textsf{dist}(\pi , \sigma _1)+\ldots +\textsf{dist}(\pi , \sigma _M)}}. \end{aligned}$$
(129)

Proposition 56

Let \((\sigma _1,\ldots , \sigma _M) \in H\) be permutations that are not all equal to each other. Then:

$$\begin{aligned} h(D,t, \sigma _1,\ldots , \sigma _M) \le \frac{1}{D}+\frac{1}{D^{M-1}} + \frac{2 t^2}{D^M}. \end{aligned}$$
(130)

Proof of Proposition 50

In order to prove this, we show that the sum of the entries in each row is small, and then use Lemma 47 to obtain the result. Consider a particular row \((\sigma _1, \ldots , \sigma _{\sqrt{n}}) \in H\); the sum of the entries in this row is:

$$\begin{aligned} \sum _{(\pi _1, \ldots , \pi _{\sqrt{n}})\in H} \langle R_{\pi _1, \ldots , \pi _{\sqrt{n}}}|C_{\sigma _1, \ldots , \sigma _{\sqrt{n}}}\rangle= & {} \sum _{\pi _1, \ldots , \pi _{\sqrt{n}}\in S_t} \langle R_{\pi _1, \ldots , \pi _{\sqrt{n}}}|C_{\sigma _1, \ldots , \sigma _{\sqrt{n}}}\rangle \nonumber \\{} & {} -\sum _{\pi \in S_t} \langle R_{\pi , \ldots , \pi }|C_{\sigma _1, \ldots , \sigma _{\sqrt{n}}}\rangle . \end{aligned}$$
(131)

The lower bound:

$$\begin{aligned} \sum _{\pi \in S_t} \langle R_{\pi , \ldots , \pi }|C_{\sigma _1, \ldots , \sigma _{\sqrt{n}}}\rangle \ge 0, \end{aligned}$$
(132)

is good enough. The goal is to find a good upper bound for \(S:=\sum _{\pi _1, \ldots , \pi _{\sqrt{n}}\in S_t} \langle R_{\pi _1, \ldots , \pi _{\sqrt{n}}}|C_{\sigma _1, \ldots , \sigma _{\sqrt{n}}}\rangle \). But S simplifies to:

$$\begin{aligned} S= \left( \sum _{\pi \in S_t} \frac{1}{d^{\textsf{dist}(\pi ,\sigma _1)+\ldots +\textsf{dist}(\pi , \sigma _{\sqrt{n}})}} \right) ^{\sqrt{n}} = h(d,t,\sigma _1, \ldots , \sigma _{\sqrt{n}})^{\sqrt{n}}. \end{aligned}$$
(133)

Now we use Proposition 56 and find the upper bound:

$$\begin{aligned} S\le \left( \frac{1}{d}+\frac{1}{d^{\sqrt{n}-1}}+\frac{2 t^2}{d^{\sqrt{n}}} \right) ^{\sqrt{n}}. \end{aligned}$$
(134)

This bounds the maximum row sum (and, by the symmetry between rows and columns, the maximum column sum), and hence by Lemma 47 the \(\infty \)-norm. \(\square \)

Proof of Proposition 53

We will prove the statement for the row space; the same argument works for the column space. First, for any unit vector \(|x\rangle \), \(\langle x | {{\tilde{J}}}_R | x\rangle \ge \lambda _{\min } ({{\tilde{J}}}_R)\). Let \(J(\sqrt{n})\) be the Gram matrix for the Haar subspace on one row of the grid. The entries of \(J(\sqrt{n})\) are given by:

$$\begin{aligned} J(\sqrt{n})_{\pi ,\sigma }:= \left( \frac{1}{d^{\textsf{dist}(\pi ,\sigma )}} \right) ^{\sqrt{n}}= \left( \frac{1}{d^{\sqrt{n}}} \right) ^{\textsf{dist}(\pi ,\sigma )}. \end{aligned}$$
(135)

Let P be the projector that projects out the subspace spanned by \(\{|R_{\pi , \ldots , \pi }\rangle : \pi \in S_t\}\). Then \({{\tilde{J}}} = P J(\sqrt{n})^{\otimes \sqrt{n}} P^\dagger \). We first need the following proposition

Proposition 57

If J is the Gram matrix of the vectors \( \{ |\psi _\pi \rangle ^{\otimes m}: \pi \in S_t \}\), then:

$$\begin{aligned} 1- \frac{t(t-1)}{2 d^m}\le \lambda _{\min }(J) \end{aligned}$$
(136)

Using this proposition \(\lambda _{\min } (J(\sqrt{n})) \ge 1- \frac{t(t-1)}{2 d^{\sqrt{n}}}\), and therefore \(\lambda _{\min } (J(\sqrt{n})^{\otimes \sqrt{n}}) \ge (1- \frac{t(t-1)}{2 d^{\sqrt{n}}})^{\sqrt{n}} \ge 1- \frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}}\). This implies that \(J(\sqrt{n})^{\otimes \sqrt{n}} \succeq I (1-\frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}})\), and therefore \({{\tilde{J}}} \succeq P P^\dagger (1-\frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}})\). This means that restricted to \({{\tilde{V}}}_R\) the minimum eigenvalue of \({{\tilde{J}}}_R\) is at least \((1-\frac{\sqrt{n} t(t-1)}{2 d^{\sqrt{n}}})\).

\(\square \)
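As a sanity check of Proposition 57 in the simplest case \(t=2\): the Gram matrix is \(J = \begin{pmatrix} 1 &{} q\\ q &{} 1 \end{pmatrix}\) with \(q = \langle \psi _e|\psi _\tau \rangle ^{m} = d^{-m}\), so \(\lambda _{\min }(J) = 1 - d^{-m}\), matching the claimed bound \(1- \frac{t(t-1)}{2 d^m} = 1 - \frac{1}{d^m}\).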

Proof of Proposition 56

Let \(C= \{ \sigma _1,\ldots , \sigma _M \}\). Then \(h=h_1+h_2\), where:

$$\begin{aligned} h_1 = \sum _{\pi \in C} \frac{1}{D^{\textsf{dist}(\pi , \sigma _1)+\ldots +\textsf{dist}(\pi , \sigma _M)}}, \end{aligned}$$
(137)

and,

$$\begin{aligned} h_2 = \sum _{\pi \in S_t/C} \frac{1}{D^{\textsf{dist}(\pi , \sigma _1)+\ldots +\textsf{dist}(\pi , \sigma _M)}}. \end{aligned}$$
(138)

We then find useful upper bounds for \(h_1\) and \(h_2\) separately. Suppose that C has distinct elements \(\{\tau _1,\ldots ,\tau _K\}\) with \(\tau _1\) appearing \(\mu _1\) times, \(\tau _2\) appearing \(\mu _2\) times, etc. Define

$$\begin{aligned} S&= \left\{ (\mu _1,\ldots , \mu _K) \in {\mathbb {Z}}_{\ge 0}^K: \mu _1 + \ldots + \mu _K =M, \max _{i} \mu _i < M\right\} \end{aligned}$$
(139)
$$ \begin{aligned} P&= \Big \{(\mu _1, \ldots , \mu _K) \in S: \exists i,j, \, \mu _i = M-1 \, \& \, \mu _j = 1\Big \} \end{aligned}$$
(140)

Now we can bound \(h_1\) by

$$\begin{aligned} h_1= & {} \sum _{\pi \in C} \frac{1}{D^{\mu _1 \textsf{dist}(\pi , \tau _1)+\ldots +\mu _K\textsf{dist}(\pi , \tau _K)}}\nonumber \\\le & {} \max _{(\mu _1,\ldots , \mu _K) \in S}\frac{D^{\mu _1}+ \ldots +D^{\mu _K}}{D^M}\nonumber \\\le & {} \max _{(\mu _1,\ldots , \mu _K) \in \text {conv} (P)}\frac{D^{\mu _1}+ \ldots +D^{\mu _K}}{D^M} \nonumber \\\le & {} \frac{D^{M-1}+D}{D^M} \nonumber \\= & {} \frac{1}{D}+\frac{1}{D^{M-1}}. \end{aligned}$$
(141)

Here \({{\,\textrm{conv}\,}}\) denotes the convex hull and (141) uses the fact that \(K\ge 2\), since \(\sigma _1,\ldots ,\sigma _M\) are assumed to be not all equal. To justify (141), observe that \(f (\mu ) = D^{\mu _1} + \ldots + D^{\mu _K}\) is a convex function and the maximization is over a convex set whose extreme points lie in P. Therefore the maximum is achieved at a point in P.

In order to find a bound on \(h_2\), for each \(\sigma \in C\) we will define the following vector \(\vec X_\sigma \) whose entries are labeled by \(\pi \in S_t\).

$$\begin{aligned} \vec X_{\sigma ,\pi } = {\left\{ \begin{array}{ll} 0 &{} \text { if }\pi \in C \\ D^{-\textsf{dist}(\sigma ,\pi )}&{} \text { if }\pi \not \in C \end{array}\right. } \end{aligned}$$
(142)

Then \(h_2 = \textsf{multiprod}(\vec X_{\sigma _1},\ldots ,\vec X_{\sigma _M})\). We can use Proposition 54 to show that

$$\begin{aligned} h_2= \textsf{multiprod}(\vec {X}_{\sigma _1},\ldots ,\vec {X}_{\sigma _M} ) \le \textsf{multiprod}(\vec {X}^{\downarrow }_{\sigma _1},\ldots , \vec {X}^{\downarrow }_{\sigma _M} ). \end{aligned}$$
(143)

We will also define \(\vec X_e\) (where e denotes the identity element of \(S_t\)) by

$$\begin{aligned} \vec X_{e,\pi } = D^{-\textsf{dist}(e,\pi )}. \end{aligned}$$
(144)

Observe that \(\vec X_\sigma \) can be obtained from \(\vec X_e\) by zeroing out the elements in locations corresponding to C and reordering the remaining elements. Thus for each \(\sigma \in C\)

$$\begin{aligned} \vec X_{\sigma }^{\downarrow } \preceq \vec X_e^{\downarrow }. \end{aligned}$$
(145)

We use Proposition 54 again to bound

$$\begin{aligned} h_2 \le \textsf{multiprod}(\underbrace{\vec X_e,\ldots ,\vec X_e}_{M \text { times}}) - 1 = f_t(D^{M})- 1 \le \frac{2 t^2}{D^M}. \end{aligned}$$
(146)

\(\square \)

3.6.2 Extended quasi-orthogonality of permutation operators with application to random circuits on D-dimensional lattices.

In this section we prove Lemmas 40, 41, 42 and 43. Before getting to the proofs we go over some notation and definitions.

Let \(\text {Rows}(D,n):= \{r_1, \ldots , r_{n^{1-1/D}}\}\) be the set of rows in the D-th direction and let \(V_{\text {Rows}(D,n)}\) be the subspace \(G_{\text {Rows}(D,n)}\) projects onto. Then \(V_{\text {Rows}(D,n)} = V_{\text {Haar}(r_1)} \otimes \ldots \otimes V_{\text {Haar}(r_{n^{1-1/D}})}\). A spanning set for \(V_{\text {Rows}(D,n)}\) is \(H_{\text {Rows}(D,n)}:= \{|D_{\sigma _1, \ldots ,\sigma _{n^{1-1/D}}} \rangle : \sigma _1, \ldots ,\sigma _{n^{1-1/D}} \in S_t\}\). Here \(V_{\text {Haar}(S)}\) is the Haar subspace (like \(V_\text {Haar}\)) on a subset of qudits S. \(|D_{\sigma _1, \ldots ,\sigma _{n^{1-1/D}}} \rangle \) is the basis state representing maximally entangled states for each qudit such that the qudits in the first row are permuted by \(\sigma _1\), the qudits in the second row are permuted by \(\sigma _2\), and so on. In other words:

$$\begin{aligned} |D_{\sigma _1,\ldots , \sigma _{n^{1-1/D}}}\rangle = \bigotimes _{r_i \in \text {Rows}(D,n)} \bigotimes _ {v \in r_i} |\psi _{\sigma _i}\rangle _{v}. \end{aligned}$$
(147)

We view the D-dimensional lattice as \(n^{1/D}\) \((D-1)\)-dimensional sub-lattices, each composed of \(n^{1-1/D}\) qudits. More concretely, the full lattice is the set \(A = [n^{1/D}]^D\). For \(1 \le \beta \le n^{1/D}\), denote \(p_\beta =\{(x_1, \ldots , x_D) \in A: x_D = \beta \}\). We denote the set of these sub-lattices by \(\text {Planes}(D):= \{p_1, \ldots , p_{n^{1/D}}\}\). (This terminology is chosen to match the \(D=3\) case but the arguments here apply to any \(D>2\).) These sub-lattices are connected to each other by the rows in \(\text {Rows}(D,n)\). \(V_{\text {Planes}(D)} = V_{\text {Haar}(p_1)} \otimes \ldots \otimes V_{\text {Haar}(p_{n^{1/D}})}\) is the span of \(H_{\text {Planes}(D)}:= \{|F_{\pi _1, \ldots ,\pi _{n^{1/D}}} \rangle : \pi _1, \ldots ,\pi _{n^{1/D}} \in S_t\}\). Here \(|F_{\pi _1, \ldots ,\pi _{n^{1/D}}} \rangle \) is the basis state of maximally entangled states for each qudit, such that the qudits in \(p_1\) are permuted by \(\pi _1\), qudits in \(p_2\) are permuted by \(\pi _2\) and so on. In other words:

$$\begin{aligned} |F_{\pi _1,\ldots , \pi _{{n^{1/D}}}}\rangle = \bigotimes _{p_i \in \text {Planes}(D)} \bigotimes _ {v \in p_i} |\psi _{\pi _i}\rangle _{v}. \end{aligned}$$
(148)

Then \(G_{\text {Planes}(D)}\) is the projector onto \(V_{\text {Planes}(D)}\).

Let \({\tilde{V}}_{\text {Planes}(D)}:= V_{\text {Planes}(D)}\cap V_\text {Haar}^\perp \) and \({\tilde{V}}_{\text {Rows}(D,n)}:= V_{\text {Rows}(D,n)}\cap V_\text {Haar}^\perp \) be respectively the parts of \(V_{\text {Planes}(D)}\) and \(V_{\text {Rows}(D,n)}\) orthogonal to \(V_\text {Haar}\). Also define \({{\tilde{H}}}_{\text {Rows}(D,n)}\) and \({{\tilde{H}}}_{\text {Planes}(D)}\) the same as \(H_{\text {Rows}(D,n)}\) and \(H_{\text {Planes}(D)}\), excluding basis states labeled by permutations that are all equal to each other. For example, \(F_{\pi ,\ldots , \pi } \notin {{\tilde{H}}}_{\text {Planes}(D)}\). Define the overlap matrix \([Q]_{gh}:= \langle g|h\rangle \), for \(g \in H_{\text {Planes}(D)}\) and \(h \in H_{\text {Rows}(D,n)}\). Let \({\tilde{J}}_{\text {Planes}(D)}\) and \({\tilde{J}}_{\text {Rows}(D,n)}\) be the Gram matrices corresponding to \({\tilde{H}}_{\text {Planes}(D)}\) and \({\tilde{H}}_{\text {Rows}(D,n)}\), respectively. In other words, \([{{\tilde{J}}}_{\text {Rows}(D,n)}]_{g,h} = \langle g|h\rangle \) for \(g,h \in {{\tilde{H}}}_{\text {Rows}(D,n)}\) and \([{\tilde{J}}_{\text {Planes}(D)}]_{g,h} = \langle g|h\rangle \) for \(g,h \in {{\tilde{H}}}_{\text {Planes}(D)}\).

We first prove Lemma 40, which basically states that the composition of \(G_{\text {Planes}(D)}\) and \(G_{\text {Rows}(D,n)}\) is very close to \(G^{(t)}_\text {Haar}\), or equivalently, that \({{\tilde{V}}}_{\text {Rows}(D,n)}\) and \({{\tilde{V}}}_{\text {Planes}(D)}\) are almost orthogonal:

Lemma

(Restatement of Lemma 40). Let \(D = O(\ln n / \ln \ln n)\) with small enough constant factor, then \(\Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} - G_{\text {Haar}} \Vert _\infty \le 1/d^{\Omega (n^{1-1/D})}\).

Proof

The proof is very similar to the proof of Lemma 39. In particular, we need generalized versions of Propositions 48, 49 and 50. The generalization of Proposition 48 states that \(\cos ^2(\measuredangle ({\tilde{V}}_{\text {Planes}(D)},{\tilde{V}}_{\text {Rows}(D,n)}))\) upper bounds the largest singular value of \(G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} - G_{\text {Haar}}\). Proposition 49 generalizes to the statement that the cosine of this angle is at most

$$\begin{aligned} \frac{1}{\sqrt{ \lambda _{\min }({\tilde{J}}_{\text {Planes}(D)})\lambda _{\min }( {\tilde{J}}_{\text {Rows}(D,n)})}} \Vert Q\Vert _\infty \le c_{D,d,n,t} \Vert Q\Vert _\infty . \end{aligned}$$
(149)

where \(1/c_{D,d,n,t}\) is a lower bound on \(\sqrt{ \lambda _{\min }({\tilde{J}}_{\text {Planes}(D)})\lambda _{\min }( {\tilde{J}}_{\text {Rows}(D,n)})}\).

We first bound \(\Vert Q\Vert _\infty \). Using Lemma 47

$$\begin{aligned} \Vert Q\Vert _\infty \le \sqrt{\max _h \sum _g Q_{gh} \max _g \sum _h Q_{gh}}=: \omega . \end{aligned}$$
(150)

Similar to the calculations in Sect. 3.6.1

$$\begin{aligned} Q_{F_{\pi _1, \ldots ,\pi _{n^{1/D}}},\, D_{\sigma _1, \ldots ,\sigma _{n^{1-1/D}}}} = \frac{1}{d^{\sum _{i=1}^{n^{1/D}} \sum _{j=1}^{n^{1-1/D}} \textsf{dist}(\pi _i, \sigma _j)}}. \end{aligned}$$
(151)

Let \(\alpha \) (\(\beta \)) be respectively the set of tuples of permutations \(\sigma _1, \ldots , \sigma _{n^{1-1/D}}\) (\(\pi _1,\ldots , \pi _{n^{1/D}}\)) that are not all equal. We compute

$$\begin{aligned}{} & {} \max _{\pi _1, \ldots , \pi _{n^{1/D}} \in \beta } \sum _{\sigma _1, \ldots , \sigma _{n^{1-1/D}}} \frac{1}{d^{\sum _{i=1}^{n^{1/D}} \sum _{j=1}^{n^{1-1/D}} \textsf{dist}(\pi _i, \sigma _j)}}\nonumber \\{} & {} \quad = \max _{\pi _1, \ldots , \pi _{n^{1/D}} \in \beta } \left( \sum _{\sigma } \frac{1}{d^{\sum _{i=1}^{n^{1/D}} \textsf{dist}(\pi _i, \sigma )}}\right) ^{n^{1-1/D}}\end{aligned}$$
(152)
$$\begin{aligned}{} & {} \quad = \max _{\pi _1, \ldots , \pi _{n^{1/D}} \in \beta } h(d,t, \pi _1, \ldots , \pi _{n^{1/D}})^{n^{1-1/D}}\end{aligned}$$
(153)
$$\begin{aligned}{} & {} \quad \le \left( \frac{1}{d}+\frac{1}{d^{n^{1/D}-1}}+\frac{2 t^2}{d^{n^{1/D}}} \right) ^{n^{1-1/D}}\end{aligned}$$
(154)
$$\begin{aligned}{} & {} \quad = \frac{1}{d^{\Omega (n^{1-1/D})}}. \end{aligned}$$
(155)

and

$$\begin{aligned}{} & {} \max _{\sigma _1, \ldots , \sigma _{n^{1-1/D}} \in \alpha } \sum _{\pi _1, \ldots , \pi _{n^{1/D}}} \frac{1}{d^{\sum _{i=1}^{n^{1/D}} \sum _{j=1}^{n^{1-1/D}} \textsf{dist}(\pi _i, \sigma _j)}}\nonumber \\{} & {} \quad =\max _{\sigma _1, \ldots , \sigma _{n^{1-1/D}} \in \alpha } \left( \sum _{\pi } \frac{1}{d^{\sum _{j=1}^{n^{1-1/D}} \textsf{dist}(\pi , \sigma _j)}}\right) ^{n^{1/D}}\end{aligned}$$
(156)
$$\begin{aligned}{} & {} \quad = \max _{\sigma _1, \ldots , \sigma _{n^{1-1/D}} \in \alpha } h(d,t, \sigma _1, \ldots , \sigma _{n^{1-1/D}})^{n^{1/D}}\end{aligned}$$
(157)
$$\begin{aligned}{} & {} \quad \le \left( \frac{1}{d}+\frac{1}{d^{n^{1-1/D}-1}}+\frac{2 t^2}{d^{n^{1-1/D}}} \right) ^{n^{1/D}}\end{aligned}$$
(158)
$$\begin{aligned}{} & {} \quad = \frac{1}{d^{\Omega (n^{1/D})}}. \end{aligned}$$
(159)

Hence

$$\begin{aligned} \omega = \frac{1}{d^{\Omega (n^{1-1/D})}}. \end{aligned}$$
(160)

Next, we have to show that \(c_{D,d,n,t}\) is not too large. Using exactly the same steps as in the proof of Proposition 53, we can show that

$$\begin{aligned} \lambda _{\min }({\tilde{J}}_{\text {Planes}(D)}) \ge 1-\frac{n^{1/D} t(t-1)}{2 d^{n^{1-1/D}}}, \end{aligned}$$
(161)

and

$$\begin{aligned} \lambda _{\min }({\tilde{J}}_{\text {Rows}(D,n)}) \ge 1-\frac{n^{1-1/D} t(t-1)}{2 d^{n^{1/D}}}. \end{aligned}$$
(162)

\(\square \)

Next, we use this result to prove Lemma 41. Recall the expression \({\tilde{G}}_{n,D,c} \) from Definition 23

$$\begin{aligned} {\tilde{G}}_{n,D,c} = ({\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} G_{\text {Rows}(D,n)} {\tilde{G}}^{\otimes n^{1/D}}_{n^{1-1/D}, D-1, c} )^c, \end{aligned}$$
(163)

where c is a constant depending on D and t, but independent of n. Note that \({\tilde{G}}_{n,D,c} = {\tilde{G}}^\dagger _{n,D,c}\) if \({\tilde{G}}_{n^{1-1/D},D-1,c} = {\tilde{G}}^\dagger _{n^{1-1/D},D-1,c}\). Also let \({{\hat{G}}}_{n,D,c}:= G_{\text {Rows}(D,n)} {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} G_{\text {Rows}(D,n)}\). Hence \({\tilde{G}}_{n,D,c} = {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} ({{\hat{G}}}_{n,D,c})^{c-1} {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}}\).

Claim

For small enough \(D = O(\ln n / \ln \ln n)\) and large enough c, \(\Vert {{\hat{G}}}_{n,D,c} - G_\text {Haar}\Vert _\infty \le 1/d^{\Omega (n^{1/D})}\).

Proof

The proof is by induction. The base case \(D = 2\) is by Lemma 39. We assume that for any large enough m, \(\Vert {{\hat{G}}}_{m,D-1,c} - G_\text {Haar}\Vert _\infty \le \frac{1}{d^{\Omega (m^{1/(D-1)})}}\), and we show that \(\Vert {{\hat{G}}}_{n,D,c} - G_\text {Haar}\Vert _\infty \le \frac{1}{d^{\Omega (n^{1/D})}}\).

$$\begin{aligned} \left\| {{\hat{G}}}_{n,D,c} - G_\text {Haar}\right\| _\infty\le & {} \left\| G_{\text {Rows}(D,n)} {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} G_{\text {Rows}(D,n)} - G_\text {Haar}\right\| _\infty \end{aligned}$$
(164)
$$\begin{aligned}\le & {} \left\| {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} G_{\text {Rows}(D,n)} - G_\text {Haar}\right\| _\infty \end{aligned}$$
(165)
$$\begin{aligned}= & {} \Big \Vert ({\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} -G_{\text {Planes}(D)}) G_{\text {Rows}(D,n)}\end{aligned}$$
(166)
$$\begin{aligned}{} & {} + G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}- G_\text {Haar}\Big \Vert _\infty \end{aligned}$$
(167)
$$\begin{aligned}\le & {} \Big \Vert ({\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} -G_{\text {Planes}(D)}) G_{\text {Rows}(D,n)}\Big \Vert _\infty \end{aligned}$$
(168)
$$\begin{aligned}{} & {} + \Big \Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}- G_\text {Haar}\Big \Vert _\infty \end{aligned}$$
(169)
$$\begin{aligned}\le & {} \Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} -G_{\text {Planes}(D)}\Big \Vert _\infty \nonumber \\{} & {} + \Big \Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}- G_\text {Haar}\Big \Vert _\infty \end{aligned}$$
(170)
$$\begin{aligned}\le & {} n^{1/D} \Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c} - G_{\text {Haar}(p_1)} \Big \Vert _\infty \nonumber \\{} & {} + \Big \Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}- G_\text {Haar}\Big \Vert _\infty \end{aligned}$$
(171)
$$\begin{aligned}\le & {} n^{1/D} \frac{1}{d^{O( n^{(1-1/D)\cdot 1/{(D-1)}})}} + 1/d^{\Omega (n^{1-1/D})}\end{aligned}$$
(172)
$$\begin{aligned}\le & {} \frac{n^{1/D}}{d^{\Omega (n^{1/D})}} + 1/d^{\Omega (n^{1-1/D})}\end{aligned}$$
(173)
$$\begin{aligned}\le & {} \frac{1}{d^{\Omega (n^{1/D})}}. \end{aligned}$$
(174)

\(\square \)

Lemma

(Restatement of Lemma 41). Let \(|x\rangle \) and \(|y\rangle \) be two computational basis states. For small enough \(D = O(\ln n / \ln \ln n)\) and large enough c, \(|\langle x | {\tilde{G}}_{n,D,c} -G_\text {Haar}| y\rangle | \le \frac{\epsilon }{d^{nt}}\) for some \(\epsilon = 1/d^{\Omega (n^{1/D})}\).

Proof

The proof is by induction. Our induction hypothesis is \(\max _x |\langle x | ({\tilde{G}}_{n,D,c} -G_\text {Haar})^2 | x\rangle | \le \frac{\epsilon }{d^{nt}}\). First, we show this bound (for sub-lattices of dimension \(D-1\)) implies the statement of this theorem:

$$\begin{aligned} |\langle x | {\tilde{G}}_{n,D,c} -G_\text {Haar}| y\rangle |= & {} |\langle x | {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} ({{\hat{G}}}_{n,D,c}^{c-1} - G_\text {Haar}) {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} | y\rangle |\nonumber \\\le & {} \Big \Vert {{{\hat{G}}}}_{n,D,c} - G_\text {Haar}\Big \Vert ^{c-1}_\infty \Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} |y\rangle \langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}}\Big \Vert _1\nonumber \\\le & {} \Big \Vert {{{\hat{G}}}}_{n,D,c} - G_\text {Haar}\Big \Vert ^{c-1}_\infty \nonumber \\{} & {} \times \max _{x,y \in [d]^{2 t n^{1-1/D}}}\Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c} |y\rangle \langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}\Big \Vert ^{ n^{1/D}}_1\nonumber \\\le & {} \frac{1}{d^{\Omega (c \cdot n^{1/D})}} \max _{x,y \in [d]^{2 t n^{1-1/D}}}\Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c}|y\rangle \langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}\Big \Vert ^{ n^{1/D}}_1\nonumber \\\le & {} \frac{1}{d^{\Omega (c \cdot n^{1/D})}} \max _{x \in [d]^{2 t n^{1-1/D}}} |\langle x| {\tilde{G}}^2_{n^{1-1/D},D-1,c}|x\rangle |^{ n^{1/D}}\nonumber \\\le & {} \frac{1}{d^{\Omega (c \cdot n^{1/D})}} \nonumber \\{} & {} \times \max _{x \in [d]^{2 t n^{1-1/D}}} (\langle x|G_\text {Haar}|x\rangle + |\langle x| ( {\tilde{G}}_{n^{1-1/D},D-1,c}-G_\text {Haar})^2|x\rangle |)^{ n^{1/D}}\nonumber \\\le & {} \frac{1}{d^{\Omega (c \cdot n^{1/D})}} \left( \max _{x \in [d]^{2 t n^{1-1/D}}}\frac{t! + 1/d^{n^{(1-1/D) \cdot \frac{1}{D-1}}}}{d^{n^{1-1/D}t}} \right) ^{ n^{1/D}}\nonumber \\\le & {} \frac{\epsilon }{d^{nt}}. \end{aligned}$$
(175)

Next, assuming \(\max _x |\langle x | ({\tilde{G}}_{n^{1-1/D},D-1,c} -G_\text {Haar})^2 | x\rangle | \le \frac{\epsilon }{d^{n^{1-1/D} t}}\), we show \(\max _x |\langle x | ({\tilde{G}}_{n,D,c} -G_\text {Haar})^2 | x\rangle | \le \frac{\epsilon }{d^{nt}}\). The proof is very similar to the above calculation:

$$\begin{aligned} |\langle x | ({\tilde{G}}_{n,D,c} -G_\text {Haar})^2 | y\rangle |= & {} \Big |\langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} ({{{\hat{G}}}}_{n,D,c}^{c-1} - G_\text {Haar}) {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} \nonumber \\{} & {} \times ({{{\hat{G}}}}_{n,D,c}^{c-1} - G_\text {Haar}) {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} |y\rangle \Big |\nonumber \\\le & {} \Big \Vert ({{{\hat{G}}}}_{n,D,c}^{c-1} - G_\text {Haar}) {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} ({{{\hat{G}}}}_{n,D,c}^{c-1} - G_\text {Haar}) \Big \Vert _\infty \nonumber \\{} & {} \times \Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}} |y\rangle \langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}^{\otimes n^{1/D}}\Big \Vert _1\nonumber \\\le & {} \Big \Vert {{{\hat{G}}}}_{n,D,c}^{c-1} - G_\text {Haar}\Big \Vert _\infty \nonumber \\{} & {} \times \max _{x,y \in [d]^{2 t n^{1-1/D}}}\Big \Vert {\tilde{G}}_{n^{1-1/D},D-1,c} |y\rangle \langle x| {\tilde{G}}_{n^{1-1/D},D-1,c}\Big \Vert ^{ n^{1/D}}_1\nonumber \\\le & {} \frac{\epsilon }{d^{nt}}. \end{aligned}$$
(176)

In the third line we have used Lemma 26. We skip the calculations after the third line because they are similar to those of (175). \(\square \)

Next, we prove Lemma 43. Lemma 42 is a special case of this lemma and we skip its proof.

Lemma

(Restatement of Lemma 43). For small enough \(D = O(\ln n / \ln \ln n)\) and large enough c,

$$\begin{aligned} \left\| \text {Ch}\left[ ( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)})^{c} - G_\text {Haar}^{(t)}\right] \right\| _\diamond \le \frac{t^{O(t n^{1-1/D})}}{d^{\Omega (c n^{1-1/D})}}. \end{aligned}$$
(177)

Proof

As discussed in Sect. 2.3, the superoperator \(\text {Ch}[G^{(t)}_\text {Haar}]\) can be written in the following canonical form

$$\begin{aligned} \text {Ch}[G^{(t)}_\text {Haar}] [X] = \sum _{\pi \in S_t} {\textrm{Tr}}(V(\pi )X) {{\,\textrm{Wg}\,}}(\pi ). \end{aligned}$$
(178)

Using the notation defined in Sect. 2.3, \({\mathcal {X}}[\text {Ch}[G^{(t)}_\text {Haar}]] = G^{(t)}_\text {Haar}\) and

$$\begin{aligned} \left( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}\right) ^c - G^{(t)}_\text {Haar}=: \sum _{a, b \in S_t^{\times n^{1-1/D}}} |D_a\rangle \Lambda _{a,b} \langle D_b|. \end{aligned}$$
(179)

Using the definition of \(\Lambda \) we can write

$$\begin{aligned}{} & {} \text {Ch}\left[ (G_{\text {Rows}(D,n)}G_{\text {Planes}(D)}G_{\text {Rows}(D,n)})^c - G^{(t)}_\text {Haar}\right] \nonumber \\{} & {} \quad = \text {Ch}\left[ \sum _{a, b \in S_t^{\times n^{1-1/D}}} |D_a\rangle \Lambda _{a,b} \langle D_b|\right] = \sum _{a, b\in S_t^{\times n^{1-1/D}}} \frac{1}{d^{nt}}V(a) \Lambda _{a,b} V^*(b).\nonumber \\ \end{aligned}$$
(180)

Therefore

$$\begin{aligned}{} & {} \Vert \text {Ch}[(G_{\text {Rows}(D,n)}G_{\text {Planes}(D)}G_{\text {Rows}(D,n)})^c - G^{(t)}_\text {Haar}]\Vert _\diamond \nonumber \\{} & {} \quad \le \sum _{a, b\in S_t^{\times n^{1-1/D}}} |\Lambda _{a,b}| \Vert \frac{1}{d^{nt}}V(a) V^*(b)\Vert _\diamond \nonumber \\{} & {} \quad \le \sum _{a, b\in S_t^{\times n^{1-1/D}}} |\Lambda _{a,b}| \le t^{O(t n^{1-1/D})} \Vert \Lambda \Vert _\infty . \end{aligned}$$
(181)

Here we have used \(\Vert \frac{1}{d^{nt}}V(a) V^*(b)\Vert _\diamond \le 1\). This is because \(V(a) V^*(b)\) is a tensor product of \(n^{1-1/D}\) superoperators, i.e., \( \otimes _i V(a_i) V^*(b_i)\), and hence \(\Vert V(a) V^*(b)\Vert _\diamond = \prod _i \Vert V(a_i) V^*(b_i)\Vert _\diamond \). It is enough to show that each of \(\Vert V(a_i) V^*(b_i)\Vert _\diamond \) is bounded by 1.

$$\begin{aligned} \frac{1}{d^{nt}}\left\| V(a_1)V(b_1)^*\right\| _\diamond= & {} \frac{1}{d^{nt}} \sup _{X: \left\| X\right\| _1 = 1} \left\| \mathop {{\textrm{Tr}}}\limits _A (V(a_1)_A \otimes {{\,\textrm{id}\,}}_B X_{AB}) \otimes V_A(b_1)\right\| _1\nonumber \\= & {} \sup _{X: \Vert X\Vert _1 = 1} \Big \Vert \mathop {{\textrm{Tr}}}\limits _A (V(a_1)_A \otimes {{\,\textrm{id}\,}}_B X_{AB})\Big \Vert _1\cdot \frac{1}{d^{nt}} \Big \Vert V_A(b_1)\Big \Vert _1\nonumber \\\le & {} \sup _{X: \Vert X\Vert _1 = 1} \Big \Vert V_{(a_1)} \otimes {{\,\textrm{id}\,}}X_{AB}\Big \Vert _1 \cdot 1\nonumber \\= & {} \sup _{X: \Vert X\Vert _1 = 1} \Vert X_{AB}\Vert _1\nonumber \\\le & {} 1. \end{aligned}$$
(182)

It remains to bound \(\Vert \Lambda \Vert _\infty \). Let \(|a\rangle \) be an orthonormal basis labeled according to the indices of \(\Lambda \). Define

$$\begin{aligned} T:= \sum _{a,b} \sqrt{\Lambda }_{a b} |D_a\rangle \langle b|. \end{aligned}$$
(183)

\(T T^\dagger = \sum _{a, b} |D_a\rangle \Lambda _{a,b} \langle D_b|\) and \(T^\dagger T = \sum _{a, b} |a\rangle (\sqrt{\Lambda }J \sqrt{\Lambda })_{ab} \langle b|\), where \([J]_{a,b} = \langle D_a | D_b\rangle \). First of all, using Lemma 25, \(T T^\dagger \) and \(T^\dagger T\) have the same spectra. Hence

$$\begin{aligned} \Vert \sum _{a, b} |D_a\rangle \Lambda _{a,b} \langle D_b|\Vert _\infty = \Vert T T^\dagger \Vert _\infty = \Vert T^\dagger T\Vert _\infty = \Vert \sqrt{\Lambda }J \sqrt{\Lambda }\Vert _\infty \end{aligned}$$
(184)

Therefore

$$\begin{aligned} \Vert \Lambda \Vert _\infty{} & {} \le \Big \Vert \sum _{a, b} |D_a\rangle \Lambda _{a,b} \langle D_b|\Big \Vert _\infty + \Vert \sqrt{\Lambda }(J-{{\,\textrm{id}\,}}) \sqrt{\Lambda }\Vert _\infty \nonumber \\{} & {} \le \Big \Vert \left( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}\right) ^c - G^{(t)}_\text {Haar}\Big \Vert _\infty + \Vert \sqrt{\Lambda }\Vert _\infty \Vert J-{{\,\textrm{id}\,}}\Vert _\infty \Vert \sqrt{\Lambda }\Vert _\infty \nonumber \\{} & {} = \Big \Vert \left( G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)}\right) ^c - G^{(t)}_\text {Haar}\Big \Vert _\infty + \Vert \Lambda \Vert _\infty \Vert J-{{\,\textrm{id}\,}}\Vert _\infty \end{aligned}$$
(185)

As a result

$$\begin{aligned} \Vert \Lambda \Vert _\infty \le \frac{ \left\| (G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)})^c - G^{(t)}_\text {Haar}\right\| _\infty }{1 -\Vert J-{{\,\textrm{id}\,}}\Vert _\infty }. \end{aligned}$$
(186)

In Lemma 40 we showed that \(\Vert (G_{\text {Rows}(D,n)} G_{\text {Planes}(D)} G_{\text {Rows}(D,n)})^c - G^{(t)}_\text {Haar}\Vert _\infty \le \Vert G_{\text {Planes}(D)} G_{\text {Rows}(D,n)} - G^{(t)}_\text {Haar}\Vert ^c_\infty = 1/d^{\Omega (c n^{1-1/D})}\). It is enough to show that \(\Vert J-{{\,\textrm{id}\,}}\Vert _\infty \) is small. But J is a tensor product of \(n^{1-1/D}\) Gram matrices \(J_1\) such that \(\Vert J_1 - {{\,\textrm{id}\,}}\Vert _\infty = \frac{O(t^2)}{d^{n^{1/D}}}\) (see Proposition 57), hence \(\Vert J-{{\,\textrm{id}\,}}\Vert _\infty \le n^{1-1/D} \frac{O(t^2)}{d^{n^{1/D}}}\), which is bounded by 1/2 for large enough n and constant t and D. As a result, \(\Vert \Lambda \Vert _\infty = 1/d^{\Omega (c n^{1-1/D})}\). Combining this with (181) we find that

$$\begin{aligned} \left\| \text {Ch}\left[ (G_{\text {Rows}(D,n)} G_{\text {Planes}(D)}G_{\text {Rows}(D,n)})^c - G^{(t)}_\text {Haar}\right] \right\| _\diamond \le \frac{t^{O(t n^{1-1/D})}}{d^{\Omega (c n^{1-1/D})}}.\nonumber \\ \end{aligned}$$
(187)

\(\square \)

4 \(O(n \ln ^2 n)\)-Size Random Circuits with Long-Range Gates Output Anti-concentrated Distributions

Recall that for a circuit C, \(\text {Coll}(C)\) is the collision probability,

$$\begin{aligned} \sum _{x \in \{0,1\}^n} |\langle x |C|0\rangle |^4, \end{aligned}$$
(188)

of C in the computational basis. Also recall that \(\mu ^{\text {CG}}_{t}\) is the distribution over random circuits obtained from applying t random long-range gates. Unlike the previous section, where we used t to denote the degree of a monomial, here we use t for time, i.e. the number of time-steps in a random circuit.

The goal of this section is to prove the following theorem:

Theorem

(Restatement of Theorem 13). There exists a c such that when \(t > c n \ln ^2 n\),

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_t} \text {Coll}(C) \le \frac{29}{2^n}. \end{aligned}$$
(189)

Moreover if \(t \le \frac{1}{3 c'} n \ln n\) for some large enough \(c'\), then

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_t} \text {Coll}(C) \ge \frac{1.6 ^{n^{1-1/c'}}}{2^n}. \end{aligned}$$
(190)

Our strategy is to relate the convergence of the expected collision probability to a classical Markov chain mixing problem. In Sect. 4.1 we go over the notation and definitions we use in the proof of this theorem. In Sect. 4.2 we prove the theorem. This proof is based on several lemmas which we will prove in Sects. 4.3 and 4.5.

4.1 Background: random circuits with long-range gates and Markov chains

Previous work [17, 18, 34, 51] demonstrates that if we only care about the second moment of \(\mu ^{\text {CG}}_t\), then the corresponding moment superoperator is related to a certain classical Markov chain. In particular, the action of the moment superoperator on the basis \(\text{ P}^2_n:= \left\{ \sigma _p \otimes \sigma _p: p \in \{0,1,2,3\}^n\right\} \) defines a classical Markov chain. We now describe this connection.

We first start with some basic properties of moment superoperators.

Fact 58

Let \(\mu \) and \(\mu _1, \ldots , \mu _K\) be distributions over circuits.

  1.

    If \(\mu \) is a convex combination of \(\mu _1, \ldots , \mu _K\) then \(\text {Ch}\left[ G^{(2)}_\mu \right] \) is the same convex combination of \(\text {Ch}\left[ G^{(2)}_{\mu _1}\right] , \ldots ,\text {Ch}\left[ G^{(2)}_{\mu _K}\right] \).

  2.

    If \(\mu \) is the composition of a circuit from \(\mu _1\) with a circuit from \(\mu _2\), then \(\text {Ch}\left[ G^{(2)}_\mu \right] = \text {Ch}\left[ G^{(2)}_{\mu _2}\right] \circ \text {Ch}\left[ G^{(2)}_{\mu _1}\right] \).

Recall that \(\text {Ch}\left[ G^{(2)}_{i,j}\right] \) denotes \(\text {Ch}\left[ G^{(2)}_{\text{ U }(4)}\right] \) applied to qubits i and j. Since \(\mu ^{\text {CG}}_1\) is a convex combination of two-qubit random \(\text{ U }(4)\) gates, the first point above implies that

$$\begin{aligned} \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_1}\right] = \frac{2}{n(n-1)} \sum _{i < j}\text {Ch}\left[ G^{(2)}_{i,j}\right] \end{aligned}$$
(191)

and since \(\mu ^{\text {CG}}_t\) is the t-fold composition of \(\mu ^{\text {CG}}_1\) with itself, the second item implies that

$$\begin{aligned} \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] = \left( \frac{2}{n(n-1)} \sum _{i < j}\text {Ch}\left[ G^{(2)}_{i,j}\right] \right) ^t. \end{aligned}$$
(192)

The moment superoperator \(\text {Ch}[G^{(2)}_{\text{ U }(4)}]\) has the following simple action on the Pauli basis:

$$\begin{aligned}{} & {} \text {Ch}[G^{(2)}_{\text{ U }(4)}] (\sigma _p\otimes \sigma _q \otimes \sigma _a\otimes \sigma _b)\nonumber \\{} & {} \quad = {\left\{ \begin{array}{ll} \sigma _0\otimes \sigma _0 \otimes \sigma _0\otimes \sigma _0 &{} pq = ab = 00 \\ \frac{1}{15} \sum _{\begin{array}{c} c, d \in \{0,1,2,3\} \\ cd \ne 00 \end{array}} \sigma _c \otimes \sigma _d \otimes \sigma _c \otimes \sigma _d &{} pq = ab \ne 00\\ 0 &{} \text {otherwise} \end{array}\right. } \nonumber \\ \end{aligned}$$
(193)

In particular the action of \(\text {Ch}[G^{(2)}_{\text{ U }(4)}]\) on the Pauli basis \(\text{ P}^2_2\) is a stochastic matrix, and for any pair \(i\ne j\) the action of \(\text {Ch}[G^{(2)}_{\text{ U }(4)}]\) on qubits i, j can be represented by a stochastic matrix acting on \(\text{ P}^2_n\). Using (192), \(\text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \) on \(\text{ P}^2_n\) is also a stochastic matrix. We can describe this stochastic matrix as a Markov chain on state space \({\mathcal {S}}= \{0,1,2,3\}^n\), with \(S_t\in {\mathcal {S}}\) describing the string at time t.

It turns out that the expected collision probability depends on the subset of qubits that have been hit by the random circuit. If a subset of m qubits (out of n) never has a gate applied to it, then the expected collision probability converges to a value \(\approx \frac{2^m}{2^n}\) rather than \(\frac{1}{2^n +1}\). So we need to separately track which qubits have been hit by a gate throughout this process. Let \(H_t\in 2^{[n]}\) denote the set of qubits that have been hit by at least one gate by time t, where \(2^{[n]}\) denotes the power set of [n].

Together \((S_t,H_t)\) can be modeled as the following Markov chain.

Definition 59

Let \((S_0,H_0), (S_1,H_1), (S_2,H_2), \ldots \in {\mathcal {S}}\times 2^{[n]}\) be the following classical Markov chain. Initially \(H_0 = \emptyset \) and \(S_0\) is a random element of \(\{0,3\}^n \backslash 0^n\). At each time step t we choose a random pair \(i,j \in [n]\) with \(i\ne j\). We let \(H_{t+1} = H_t \cup \{i,j\}\) so that \(H_t\) represents the set of all indices chosen up to time t. We determine \(S_{t+1}\) from \(S_t\) using the transition rule of (193). Specifically if the ij positions of \(S_t\) are both 0, then we leave them equal to 00, and otherwise we replace them with a uniformly random element of \(\{01,02,\ldots ,33\}\).
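The following minimal Python sketch simulates one trajectory of this chain, encoding Pauli labels as integers 0–3 (an illustration of Definition 59, not part of the proof):

import numpy as np

def simulate(n, steps, rng):
    # S_0: uniform over {0,3}^n \ {0^n}; H_0 = empty set
    S = 3 * rng.integers(0, 2, size=n)
    while not S.any():
        S = 3 * rng.integers(0, 2, size=n)
    H = set()
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)  # random pair of qubits
        H |= {int(i), int(j)}
        if S[i] != 0 or S[j] != 0:
            # replace with a uniformly random pair in {0,...,3}^2 \ {00}
            a, b = 0, 0
            while a == 0 and b == 0:
                a, b = rng.integers(0, 4, size=2)
            S[i], S[j] = int(a), int(b)
    return S, H

S, H = simulate(n=10, steps=50, rng=np.random.default_rng(1))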

Suppose we condition on \(H_t \subseteq H\) for some set H with \(|H|=n-m\). Let

$$\begin{aligned} P_t^{(n-m)}(k):= \Pr \left[ | S_t(H)|=k | H_t \subseteq H\right] . \end{aligned}$$
(194)

We can use this notation since the RHS of (194) depends only on |H|, t, n, k and not on H.

For a function \(f: [n] \rightarrow {\mathbb {R}}\) we define \(\Vert \cdot \Vert _*\) to be the following norm

$$\begin{aligned} \Vert f\Vert _*:= \sum _{k \in [n]} \frac{|f(k)|}{3^k}. \end{aligned}$$
(195)
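For example, the uniform distribution over nonzero Pauli strings is stationary for the chain above and induces the weight distribution \(\pi (k) = {n \atopwithdelims ()k} 3^k/(4^n-1)\), whose norm is

$$\begin{aligned} \Vert \pi \Vert _*= \sum _{k=1}^n {n \atopwithdelims ()k} \frac{3^k}{4^n-1} \cdot \frac{1}{3^k} = \frac{2^n-1}{4^n-1} = \frac{1}{2^n+1}, \end{aligned}$$

which is the stationary value \(\frac{1}{2^n+1}\) referred to in Sect. 4.2.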

4.1.1 Summary of the definitions.

See below for a summary of the definitions:

\(\text {Coll}(C)\): the collision probability of circuit C; Equation (188).

\(G^{(t)}_\mu \): average of \(C^{\otimes t,t}\) over \(C\sim \mu \); Definition 16.

\(G^{(t)}_{i,j}\): Haar projector of order t on qudits i and j; Definition 16.

\(\mu ^{\text {CG}}_{t}\): the distribution over circuits with t random two-qubit gates; Definition 24.

\(P^2_n\): \(\{\sigma _p \otimes \sigma _p : p \in \{0,1,2,3\}^n\}\); Sect. 4.1.

\(S_0, S_1, \ldots \): Markov chain of Pauli strings; Definition 59.

\(H_t\): subset of [n] that is covered according to the Markov chain of Pauli strings; Definition 59.

\(S'_0, S'_1, \ldots \): accelerated Markov chain of binary strings with decoupled coordinates; Definition 82.

\(X_t\): \(|S_t|\); Sect. 4.5.

\(Y_t\): steps of the accelerated Markov chain Q; Sect. 4.5.

\(P_t^{(n-m)}(k)\): \(\Pr \left[ | S_t(H)|=k \mid H_t \subseteq H\right] \) for any fixed H with \(|H|=n-m\); Equation (194).

\(P_t(k)\): \(\Pr \left[ | S_t(H)|=k\right] \), also equal to \(P_t^{(n)}(k)\); Equation (194).

\(\Vert f\Vert _*\): \(\sum _{x=1}^n \frac{|f(x)|}{3^x}\); Equation (195).

\(\sum _{x=1}^n |f(x)|\frac{3n}{x 3^x}\): a rescaled norm used in Proposition 74.

P: transition matrix of the birth and death Markov chain; Equation (229).

Q: transition matrix of the partially accelerated Markov chain; Equation (234).

\(T_{\text {left (right)}} (Y^s)\): wait time for the steps \(Y_0,\ldots , Y_s\) on the left (right) hand side of site \(\frac{5}{6} n\); Sect. 4.5.2.

\(\nu \): \(\frac{3}{4} n\); Sect. 4.5.2.

\(\nu _\tau \): \(Y_0 \exp (- \frac{\tau }{\nu }) + \nu (1- \exp (- \frac{\tau }{\nu }))\); Sect. 4.5.2.

\(\beta \): \(8(4+c) \ln n\) for a constant c fixed in advance; Sect. 4.5.2.

\(x_0\): \(\nu /\beta \); Sect. 4.5.2.

\(N_x\): \(\sum _{j = 1}^s I\{Y_j = x\}\), the number of visits to site x; Sect. 4.5.2.

A: \(\cap _{1 \le x\le x_0} \{N_x \le \beta x\}\); Sect. 4.5.2.

\(M_s\): \(\min _{1 \le j \le s} \{Y_j\}\); Sect. 4.5.2.

\(y^s\): shorthand for \((y_0, \ldots , y_s)\); Sect. 4.5.6.

\(\textsf{Bin} (n,p)\): binomial distribution over n independent trials, each succeeding with probability p.

\(\textsf{Geo}(\alpha )\): geometric distribution with mean \(\frac{1}{\alpha }\).

\(\textsf{Pois} (\tau )\): Poisson distribution with mean \(\tau \).

\(\textsf{Unif} [a,b]\): continuous uniform distribution on the interval [a, b].

4.2 Proof of Theorem 13: bound on the collision probability

Before giving the proof we state the following three main theorems. The first one relates the expected collision probability to the \(\Vert \cdot \Vert _*\) norm of the probability vector on the state space of the Markov chain of weights. More concretely

Theorem 60

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_t} \text {Coll}(C) \le \frac{1}{2^n} + \sum _{m=0}^n {n \atopwithdelims ()m} e^{ -t m/n} \Vert P_t^{(n-m)}\Vert _*\end{aligned}$$
(196)

This result is proved in Sect. 4.3.

The second theorem shows that for \(t \approx n \ln ^2 n\), \(\Vert P^{(n)}_t\Vert _*\approx \text {Constant} \times \frac{1}{2^n + 1}\), where \(\frac{1}{2^n+1}\) is the value of this norm at the stationary state.

Theorem 61

There exists a constant c such that if \(t = c n \ln ^2 n\) then \(\Vert P^{(m)}_t\Vert _*\le \frac{28}{2^m + 1}\) for every \(0 \le m \le n\).

This result is proved in Sect. 4.5.

The third theorem gives an exact expression for the collision probability in terms of the Markov chain \(S_0, S_1, \ldots \). We use this to compute the lower bound.

Theorem 62

\(\text {Coll}_{\mu ^{\text{ CG }}_t} = \frac{1}{2^n} \left( 1 + \sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [S_t = p | S_0 = q]\right) \)

This expression is the same as equation (215), which is derived in Sect. 4.3.

Proof of Theorem 13

We first prove the upper bound. There are two major steps.

Combining Theorems 60 and 61 and choosing \(t=cn\ln ^2(n)\) we obtain

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu ^{\text {CG}}_{t}} \text {Coll}(C)\le & {} \frac{1}{2^n} + \sum _{m=0}^n {n \atopwithdelims ()m} e^{ -t m/n} \frac{28}{2^{n-m}}\end{aligned}$$
(197)
$$\begin{aligned}\le & {} \frac{1}{2^n} (1 +28 (1 + 2 e^{-t/n})^n)\end{aligned}$$
(198)
$$\begin{aligned}\le & {} \frac{1}{2^n} (1 +28 (1 + \frac{2}{n^{c \ln n}})^n)\end{aligned}$$
(199)
$$\begin{aligned}\le & {} \frac{29}{2^n+1}. \end{aligned}$$
(200)

Here we need to assume n is larger than some universal constant. This can be done by adjusting c to cover the finite set of cases where n is too small.

For the lower bound we use the expression in Theorem 62 and bound it according to

$$\begin{aligned} \text {Coll}_{\mu ^{\text{ CG }}_t}\ge & {} \frac{1}{2^n}\sum _{p \in \{0,3\}^n} \Pr [S_t = p | S_0 = p], \end{aligned}$$
(201)
$$\begin{aligned}= & {} \frac{1}{2^n}\sum _{k=0}^n \sum _{p \in \{0,3\}^n: |p|_H = k} \Pr [S_t = p | S_0 = p], \end{aligned}$$
(202)
$$\begin{aligned}\ge & {} \frac{1}{2^n}\sum _{k=0}^n {n \atopwithdelims ()k} r_k^t, \end{aligned}$$
(203)

where

$$\begin{aligned} r_k = \frac{14}{15} (1 - \frac{k}{n}) (1-\frac{k}{n-1}) + \frac{1}{15} \ge e^{- 3 \frac{k}{n}}, \end{aligned}$$
(204)

is the probability that a string of Hamming weight k does not change after one step of the Markov chain. Assuming \(t \le \frac{1}{3 c'} n \ln n\), we have

$$\begin{aligned} \text {Coll}_{\mu ^{\text{ CG }}_t}\ge & {} \frac{1}{2^n}\sum _{k=0}^n {n \atopwithdelims ()k} e^{- 3 \frac{kt}{n}},\end{aligned}$$
(205)
$$\begin{aligned}= & {} \frac{1}{2^n}(1 + e^{- 3 t/n})^n,\end{aligned}$$
(206)
$$\begin{aligned}\ge & {} \frac{1}{2^n}\exp \big (\frac{n^{1-1/c'}}{1 + n^{-1/c'}}\big )\end{aligned}$$
(207)
$$\begin{aligned}\ge & {} \frac{1}{2^n} \cdot 1.6 ^{n^{1-1/c'}} \end{aligned}$$
(208)

\(\square \)
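As a numeric spot-check of the bound (204) used in this lower-bound argument: for moderate-to-large n one can verify \(r_k \ge e^{-3k/n}\) directly (for very small n the inequality can fail near \(k=n-1\), but those finitely many cases are absorbed into the constants, as discussed above). A sketch:

```python
import math

def r(k, n):
    """Probability (204) that a weight-k string is unchanged in one step."""
    return (14/15) * (1 - k/n) * (1 - k/(n - 1)) + 1/15

for n in (50, 200, 1000):
    assert all(r(k, n) >= math.exp(-3 * k / n) for k in range(n + 1))
```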

4.3 Proof of Theorem 60: relating collision probability to a Markov chain

In this section we relate the expected collision probability of a random circuit with long-range gates to the \(\Vert \cdot \Vert _*\) norm of the probability vector \(P^{(m)}_t\) defined in Sect. 4.1. We will prove several intermediate results along the way to Theorem 60.

Theorem 63

(Section 3 of [34]). We can write

$$\begin{aligned} \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] (\sigma _q \otimes \sigma _q) = \sum _{p \in \{0,1,2,3\}^n} \Pr [S_t = p|S_0 = q] \sigma _p \otimes \sigma _p. \end{aligned}$$
(209)

Proof of Theorem 62

We can write the expected collision probability in terms of the moment superoperator \(\text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \). We use the notation \(\text {Coll}_{\mu ^{\text {CG}}_t} = {\mathbb {E}}_{C \sim \mu ^{\text {CG}}_t} \text {Coll}(C)\):

$$\begin{aligned} \text {Coll}_{\mu ^{\text {CG}}_t}= & {} \sum _{z \in \{0,1\}^n} \mathop {{\mathbb {E}}}\limits _{C \sim {\mu ^{\text {CG}}_t}} | \langle z | C | 0 \rangle |^4\nonumber \\= & {} \sum _{z \in \{0,1\}^n} \langle z| \otimes \langle z| \mathop {{\mathbb {E}}}\limits _{C \sim {\mu ^{\text {CG}}_t}} \left( C |0^n\rangle \langle 0^n| C^\dagger \otimes C |0^n\rangle \langle 0^n| C^\dagger \right) |z\rangle \otimes |z\rangle \nonumber \\= & {} {\textrm{Tr}}\sum _{z \in \{0,1\}^n} \left| z\right\rangle \left\langle z\right| \otimes \left| z\right\rangle \left\langle z\right| \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \left( |0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|\right) \end{aligned}$$
(210)

It is useful to write \(|0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|\) and \(\sum _{z \in \{0,1\}^n} \left| z\right\rangle \left\langle z\right| \otimes \left| z\right\rangle \left\langle z\right| \) in the Pauli basis:

$$\begin{aligned} |0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|&= \frac{1}{4^n} \sum _{p,q \in \{0,3\}^n} \sigma _p \otimes \sigma _q. \end{aligned}$$
(211)
$$\begin{aligned} \sum _{z \in \{0,1\}^n} \left| z\right\rangle \left\langle z\right| \otimes \left| z\right\rangle \left\langle z\right|&= \frac{1}{2^n} \sum _{p \in \{0,3\}^n} \sigma _p \otimes \sigma _p. \end{aligned}$$
(212)

Then the collision probability becomes:

$$\begin{aligned} \text {Coll}_{\mu ^{\text {CG}}_t}= & {} \frac{1}{2^n} +(1- \frac{1}{2^n})\frac{1}{2^n}{\textrm{Tr}}\left( \sum _{z \in \{0,1\}^n} \left| z\right\rangle \left\langle z\right| \otimes \left| z\right\rangle \left\langle z\right| \right) \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \nonumber \\{} & {} \times \left( \frac{1}{2^n-1} \sum _{q \in \{0,3\}^n\backslash 0^n} \sigma _q \otimes \sigma _q \right) \end{aligned}$$
(213)

Using Theorem 63

$$\begin{aligned}{} & {} \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \left( \frac{1}{2^n-1} \sum _{q \in \{0,3\}^n\backslash 0^n} \sigma _q \otimes \sigma _q\right) \nonumber \\ {}{} & {} \quad = \frac{1}{2^n-1} \sum _{\begin{array}{c} p \in \{0,1,2,3\}^n\backslash 0^n \\ q \in \{0,3\}^n\backslash 0^n \end{array}} \Pr [S_t = p | S_0 = q]\sigma _p \otimes \sigma _p.\nonumber \\ \end{aligned}$$
(214)

Taking the trace against (212), only the terms with \(p \in \{0,3\}^n\) survive, and as a result

$$\begin{aligned} \text {Coll}_{\mu ^{\text{ CG }}_t}= & {} \frac{1}{2^n}\left( 1 +\sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [S_t = p | S_0 = q]\right) \end{aligned}$$
(215)

\(\square \)

For a string \(a \in \{0,1,2,3\}^n\) and a subset \(A \in 2^{[n]}\) we let a(A) denote the substring of a restricted to A.

Lemma 64

For \(H \subseteq [n]\) and \(p,q \in \{0,1,2,3\}^n\)

$$\begin{aligned} \Pr [S_t = p | S_0 = q, H_t=H] = \frac{1}{{|H| \atopwithdelims ()|p(H)|} 3^{|p(H)|}}\Pr [|S_t(H)|=|p(H)| \big | S_0 = q, H_t=H]\nonumber \\ \end{aligned}$$
(216)

if \(q([n] \backslash H) = p([n] \backslash H)\) and 0 otherwise.

In other words, once we condition on \(H_t=H\), the probability distribution of \(S_t(H)\) depends only on its Hamming weight.

Proof

Conditioned on \(H_t=H\), the sites of \([n]\backslash H\) are never hit, so the event \(q([n] \backslash H) \ne p([n] \backslash H)\) has zero probability. On H every site has been hit, and the transition rule treats the values 1, 2, 3 symmetrically, so each of them is equally likely to appear at any non-zero position of \(S_t (H)\); conditioned on its Hamming weight \(|p(H)|\), the string \(S_t(H)\) is uniform over the \({|H| \atopwithdelims ()|p(H)|} 3^{|p(H)|}\) strings of that weight, giving the stated formula. \(\square \)

Using Theorem 62 and Lemma 64 we obtain

Corollary 65

$$\begin{aligned} \text {Coll}_{\mu ^{\text {CG}}_t} = \frac{1}{2^{n}} + (1-1/2^n) \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{1 \le k \le |H|} \frac{\Pr [|S_t(H)|=k \big | H_t=H]}{3^{k}}. \nonumber \\ \end{aligned}$$
(217)

Proof

Expanding Theorem 62 we have

$$\begin{aligned} \text {Coll}_{\mu ^{\text{ CG }}_t}&= \frac{1}{2^n} \left( 1 + \sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [S_t = p | S_0 = q]\right) \end{aligned}$$
(218)
$$\begin{aligned}&= \frac{1}{2^n} \left( 1 + \sum _{H \subseteq [n]} \sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [H_t = H | S_0 = q] \Pr [S_t = p | S_0 = q, H_t = H]\right) \end{aligned}$$
(219)
$$\begin{aligned}&= \frac{1}{2^n} \left( 1 + \sum _{H \subseteq [n]} \sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [H_t = H] \Pr [S_t = p | S_0 = q, H_t = H]\right) . \end{aligned}$$
(220)

Using Lemma 64 in the above we have

$$\begin{aligned}&= \frac{1}{2^n} + \frac{1}{2^n} \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{p,q \in \{0,3\}^n\backslash 0^n} \Pr [S_t = p | S_0 = q, H_t = H],\end{aligned}$$
(221)
$$\begin{aligned}&= \frac{1}{2^n} + \frac{1}{2^n} \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{\begin{array}{c} p,q \in \{0,3\}^n\backslash 0^n\\ p([n]\backslash H) = q([n]\backslash H) \end{array}}\nonumber \\&\quad \times \frac{1}{{|H| \atopwithdelims ()|p(H)|} 3^{|p(H)|}}\Pr [|S_t(H)|=|p(H)| \big | S_0 = q, H_t=H],\end{aligned}$$
(222)
$$\begin{aligned}&= \frac{1}{2^n} + \frac{1}{2^n} \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{\begin{array}{c} q \in \{0,3\}^n \end{array}}\sum _{1 \le k \le |H|}\sum _{\begin{array}{c} p \in \{0,3\}^n\backslash 0^n\\ p([n]\backslash H) = q([n]\backslash H)\\ |p(H)| = k \end{array}}\nonumber \\&\quad \times \frac{1}{{|H| \atopwithdelims ()|p(H)|} 3^{|p(H)|}}\Pr [|S_t(H)|=|p(H)| \big | S_0 = q, H_t=H],\end{aligned}$$
(223)
$$\begin{aligned}&= \frac{1}{2^n} + \frac{1}{2^n} \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{1 \le k \le |H|} \sum _{\begin{array}{c} q \in \{0,3\}^n \end{array}}\sum _{\begin{array}{c} p \in \{0,3\}^n\backslash 0^n\\ p([n]\backslash H) = q([n]\backslash H)\\ |p(H)| = k \end{array}}\nonumber \\&\quad \times \frac{1}{{|H| \atopwithdelims ()k} 3^{k}}\Pr [|S_t(H)|=k \big | S_0 = q, H_t=H],\end{aligned}$$
(224)
$$\begin{aligned}&= \frac{1}{2^n} + \frac{1}{2^n} \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{1 \le k \le |H|} \sum _{\begin{array}{c} q \in \{0,3\}^n \end{array}}\nonumber \\&\quad \times \frac{1}{3^{k}}\Pr [|S_t(H)|=k \big | S_0 = q, H_t=H], \end{aligned}$$
(225)
$$\begin{aligned}&= \frac{1}{2^{n}} + (1-1/2^n) \sum _{H \subseteq [n]} \Pr [H_t = H] \sum _{1 \le k \le |H|} \frac{\Pr [|S_t(H)|=k \big | H_t=H]}{3^{k}}. \end{aligned}$$
(226)

\(\square \)

The standard coupon-collector bound is

Lemma 66

(coupon collector). Let \(H \subseteq [n]\). Then \(\Pr [H_t \subseteq H] \le e^{-(n-|H|)t/n }\).

Proof

Let \(E^{(i)}_H\) be the event that at step i of the circuit the random gate lands completely inside the set H. Then \(\Pr [E^{(i)}_H] = \frac{|H| (|H|-1)}{n(n-1)}\). Since the gates are chosen independently, \(\Pr [H_t \subseteq H] = \prod _{i=1}^t \Pr [E^{(i)}_H] = \big (\frac{|H| (|H|-1)}{n(n-1)}\big )^t \le \big (\frac{|H|}{n}\big )^t \le e^{-(n-|H|)t/n }\). \(\square \)
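A quick Monte Carlo check of Lemma 66 (the helper name is ours; only the positions of the t random gates matter, so we sample those directly):

```python
import math, random

def pr_inside(n, h, t, trials=20000):
    """Estimate Pr[H_t subset of H] for H = {0, ..., h-1} of size h."""
    ok = 0
    for _ in range(trials):
        ok += all(max(random.sample(range(n), 2)) < h for _ in range(t))
    return ok / trials

n, h, t = 20, 15, 10
print(pr_inside(n, h, t), math.exp(-(n - h) * t / n))  # estimate vs bound
```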

We now have all the pieces to prove Theorem 60.

Proof of Theorem 60

Using Corollary 65, the total collision probability is

$$\begin{aligned} \text {Coll}_{\mu ^{\text {CG}}_t}&= \frac{1}{2^n} + (1-1/2^n)\sum _{H\subseteq [n]} \Pr [H_t=H] \sum _{k=1}^n \frac{\Pr [|S_t(H)|=k|H_t=H]}{ 3^{k}} \nonumber \\&= \frac{1}{2^n} + (1-1/2^n)\sum _{H\subseteq [n]}\sum _{k=1}^n \frac{\Pr [|S_t(H)|=k, H_t=H]}{ 3^{k}} \nonumber \\&\le \frac{1}{2^n} + (1-1/2^n)\sum _{H\subseteq [n]}\sum _{k=1}^n \frac{\Pr [|S_t(H)|=k, H_t \subseteq H]}{ 3^{k}} \nonumber \\&\le \frac{1}{2^n} + \sum _{H\subseteq [n]} \Pr [H_t \subseteq H] \sum _{k=1}^n \frac{\Pr [|S_t(H)|=k | H_t \subseteq H]}{ 3^{k}} \nonumber \\&\le \frac{1}{2^n} + \sum _{H\subseteq [n]} \Pr [H_t\subseteq H] \sum _{k=1}^n \frac{P_t^{(|H|)}(k)}{ 3^{k}} \nonumber \\&\le \frac{1}{2^n} + \sum _{H\subseteq [n]} e^{-(n-|H|)t/n } \sum _kP_t^{(|H|)}(k) / 3^{k} \quad \text {Lemma}~66 \nonumber \\&= \frac{1}{2^n} +\sum _{m = 0}^n {n \atopwithdelims ()m} e^{-mt/n }\Vert P^{(n-m)}_t\Vert _*. \quad \text {setting }m=n-|H| \end{aligned}$$
(227)

\(\square \)

4.4 Proof of Proposition 67: collision probability is non-increasing in time

When we try to recover the original chain from the accelerated chain, we find that s steps of the accelerated chain typically correspond to \(t=O(s)\) steps of the original chain, but with a significant variance. This means that our bounds on the collision probability of the accelerated chain translate only into bounds at a random time of the original chain. This issue can be addressed using the following fact.

Proposition 67

\({\mathbb {E}}_{C \sim \mu ^{\text {CG}}_t} \text {Coll}(C)\) is a non-increasing function of t.

Proof

\(\text {Ch}[G_{\mu ^{\text {CG}}_1}]\) corresponds to an average of \(n(n-1)/2\) projectors (with respect to the Hilbert–Schmidt inner product). Hence it is a psd matrix with maximum eigenvalue \(\le 1\). Let \(\alpha = \sum _{p \in \{0,3\}^n} \sigma _p \otimes \sigma _p\). (210) may be written as

$$\begin{aligned}&\sum _{z \in \{0,1\}^n} {\textrm{Tr}}\left( |z\rangle \langle z| \otimes |z\rangle \langle z| \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \left( |0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|\right) \right) \nonumber \\&\quad ={\textrm{Tr}}\left( \frac{\alpha }{2^n} \text {Ch}\left[ G^{(2)}_{\mu ^{\text {CG}}_t}\right] \left( |0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|\right) \right) \end{aligned}$$
(228)

Using (212), terms of the form \(\sigma _p \otimes \sigma _q\) for \(p \ne q\) in the decomposition of \(|0^n\rangle \langle 0^n| \otimes |0^n\rangle \langle 0^n|\) do not contribute to the collision probability. Therefore, using this observation and (228), the collision probability after t steps is proportional to \({\textrm{Tr}}( \alpha \, \text {Ch}[G_{\mu ^{\text {CG}}_1}]^t \,\alpha )\). Since \(\text {Ch}[G_{\mu ^{\text {CG}}_1}]\) has all eigenvalues between 0 and 1, we conclude that the collision probability cannot increase with t. \(\square \)

This argument relied on the starting state being \(|0^n\rangle \). There exist starting states, such as \(|+\rangle ^{\otimes n}\), for which the collision probability increases when random gates are applied.
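Proposition 67, together with the exact expression of Theorem 62, is easy to confirm numerically for small n by building the full transition matrix of the Pauli-string chain of Definition 59; a sketch (names ours):

```python
import numpy as np
from itertools import product

n = 3
states = list(product(range(4), repeat=n))       # Pauli strings {0,1,2,3}^n
index = {s: a for a, s in enumerate(states)}
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
nz = [(a, b) for a in range(4) for b in range(4) if (a, b) != (0, 0)]

T = np.zeros((len(states), len(states)))
for s in states:
    for (i, j) in pairs:
        if s[i] == 0 and s[j] == 0:
            T[index[s], index[s]] += 1 / len(pairs)
        else:
            for (a, b) in nz:
                u = list(s); u[i], u[j] = a, b
                T[index[s], index[tuple(u)]] += 1 / (len(pairs) * 15)

Z = [s for s in states if set(s) <= {0, 3} and any(s)]  # {0,3}^n \ 0^n
def coll(t):
    """Collision probability via Theorem 62."""
    Tt = np.linalg.matrix_power(T, t)
    return (1 + sum(Tt[index[q], index[p]] for q in Z for p in Z)) / 2**n

# decreases from 1 towards the Haar value 2/(2^n + 1) = 2/9 for n = 3
vals = [coll(t) for t in range(25)]
assert all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))
```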

4.5 Proof of Theorem 61: the Markov chain analysis

Consider the following birth-and-death Markov chain on the state space \(\{0, 1,2, \ldots , n\}\).

$$\begin{aligned} P(k,l): = {\left\{ \begin{array}{ll} 1-\dfrac{2k (3n-2k-1)}{5n(n-1)} &{} l=k\\ \dfrac{2k(k-1)}{5n(n-1)} &{} l=k-1\\ \dfrac{6k(n-k)}{5n(n-1)} &{} l=k+1\\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(229)

This Markov chain is reducible in general; however, restricted to the state space \(\{0\}\) or to \(\{1,2,\ldots ,n\}\) it is irreducible. Consider the following initial distribution over the state space \(\{1,2,\ldots ,n\}\):

$$\begin{aligned} P^{(n)}_0 (k) = \frac{{n \atopwithdelims ()k}}{2^n-1} \qquad k \in \{1,2,\ldots ,n\} \end{aligned}$$
(230)

We claim that

Lemma 68

$$\begin{aligned} P_t^{(n)} = P^t P^{(n)}_0 \end{aligned}$$
(231)

Proof

The proof follows from the fact that \(\Pr [|S_t| = l \big | |S_0| = k] = P^t(k,l)\) which was shown in Lemma 5.2 of [34]. \(\square \)
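The following sketch builds P from (229), evolves the initial distribution (230), and watches the \(\Vert \cdot \Vert _*\) norm decay towards its stationary value \(1/(2^n+1)\); we use the row-vector convention \(P_t = P_0 P^t\):

```python
import numpy as np
from math import comb

def P_matrix(n):
    """Transition matrix (229) of the weight chain on {0, 1, ..., n}."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        down = 2 * k * (k - 1) / (5 * n * (n - 1))
        up = 6 * k * (n - k) / (5 * n * (n - 1))
        P[k, k] = 1 - down - up
        if k > 0:
            P[k, k - 1] = down
        if k < n:
            P[k, k + 1] = up
    return P

def star_norm(v):
    """||f||_* = sum_{k=1}^n |f(k)| / 3^k, as in (195)."""
    return sum(abs(v[k]) / 3 ** k for k in range(1, len(v)))

n = 12
P = P_matrix(n)
p0 = np.array([0] + [comb(n, k) for k in range(1, n + 1)], float) / (2**n - 1)
for t in (0, 100, 500, 2000):
    pt = p0 @ np.linalg.matrix_power(P, t)
    print(t, star_norm(pt))        # decays towards 1/(2^n + 1)
```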

We now prove Theorem 61 which gives a sharp upper bound on \(\Vert P^{(n)}_t\Vert _*\). Throughout this section we drop the superscript (n). Moreover we use the notation \(X_t:= |S_t|\).

Proof overview: The philosophy of our analysis is to consider an acceleration of the chain P: a chain with transition matrix Q which is the same as P but moves faster. As mentioned in the introduction, previous work [17, 34] considered a “fully accelerated” chain, but we will instead carefully choose the amount of acceleration so that the transition probabilities are affine functions of x. This allows an exact solution of the dynamics of the partially accelerated chain using a method of Kac [39], as we describe in Sect. 4.5.4. We then analyze how long P should wait at each step of its walk in order to simulate the steps of Q. To do this we prove bounds on how many times each site of the Markov chain is visited during the accelerated walk, and based on that we count how many steps the original chain should wait. This analysis is carried out in Sect. 4.5.2. Along the way, during the wait-time analysis, we further modify the partially accelerated chain to run in continuous time, so that in time t we sample \(t'\) from \(\text {Pois}(t)\) (the Poisson distribution) and move \(t'\) steps. The resulting chain is also exactly solvable, its solution turns out to be extremely simple, and it exemplifies the connection of the accelerated walk with the well-known Ornstein–Uhlenbeck process (see Proposition 76). We also need to analyze the error from moving to continuous time, which turns out to be a straightforward analysis of the Poisson distribution.

Now suppose that the accelerated chain goes through a sequence of transitions \(Y_0, Y_1, \ldots , Y_s\).

Let \(p(x) = P(x,x+1)\) and \(q(x) = P(x, x-1)\). We first consider the chain P conditioned on moving at every single step. This chain at site x has probability of moving forward and backwards \(\dfrac{p(x)}{p(x)+q(x)}\) and \(\dfrac{q(x)}{p(x)+q(x)}\), respectively. We can compute these probabilities

$$\begin{aligned} Q^a(x,x)= & {} 0,\nonumber \\ Q^a(x, x+1)= & {} \dfrac{3 (n-x)}{3n-2x-1},\nonumber \\ Q^a(x,x-1)= & {} \dfrac{x-1}{3n-2x-1}. \end{aligned}$$
(232)

Such a chain is called accelerated. The chain \(Q^a\) was used in [18, 27, 34] but we will not use it in this paper.

Instead of an accelerated chain we now define a partially accelerated chain as:

$$\begin{aligned} Q^w(x,x)= & {} w(x),\nonumber \\ Q^{w}(x,x+1)= & {} (1-w(x)) \dfrac{3 (n-x)}{3n-2x-1},\nonumber \\ Q^w(x,x-1)= & {} (1-w(x)) \dfrac{x-1}{3n-2x-1}. \end{aligned}$$
(233)

for arbitrary probability value w(x). Setting \(w(x) = \frac{2x}{3n-1}\) the partially accelerated chain becomes affine:

$$\begin{aligned} Q(x,x)= & {} \frac{2x}{3n-1},\nonumber \\ Q(x,x+1)= & {} \dfrac{3 (n-x)}{3n-1},\nonumber \\ Q(x,x-1)= & {} \dfrac{x-1}{3n-1}. \end{aligned}$$
(234)

By “affine” we mean that the transition probabilities are degree-1 polynomials in x. Let \(X_0, X_1, \ldots \) be the steps of the Markov chain evolving according to the transition matrix P and \(Y_0, Y_1, \ldots \) be the Markov chain according to Q. We now describe a coupling between these two.

4.5.1 Coupling between X and Y chains.

For \(x < \frac{5}{6} n\) let \(\alpha (x) = 1- \frac{p(x) + q(x)}{1- w(x)} = 1 - \frac{2x(3n-1)}{5n(n-1)}\). If \(0< x < \frac{5}{6} n\), \(0< \alpha (x) < 1\). So for this range we can view \(\alpha (x)\) as a probability.

For \(x \ge \frac{5}{6} n\), let \(\beta (x)\) be the solution to the following equation

$$\begin{aligned} p(x) + q(x) = 1 - w(x) + \beta (x) w(x) ( p(x) + q(x)). \end{aligned}$$
(235)

We can solve for \(\beta (x)\) to find

$$\begin{aligned} \beta (x) = \frac{1}{w(x)} \frac{-\alpha (x)}{1 - \alpha (x)} = \frac{2x(3n-1)-5n(n-1)}{4x^2}. \end{aligned}$$
(236)

For \(x\ge \frac{5}{6} n\) we have \(\alpha (x) < 0\), so from the first expression for \(\beta (x)\) we see that \(\beta (x) > 0\). From the second expression for \(\beta (x)\) we can calculate the upper bound \(\beta (x) \le 1/4 + \frac{6}{5 n}\).
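The closed forms for \(\alpha (x)\) and \(\beta (x)\), and the bounds just stated, can be verified in exact arithmetic; a sketch:

```python
from fractions import Fraction as F

n = 30
for x in range(1, n):
    p = F(6 * x * (n - x), 5 * n * (n - 1))    # P(x, x+1) from (229)
    q = F(2 * x * (x - 1), 5 * n * (n - 1))    # P(x, x-1)
    w = F(2 * x, 3 * n - 1)                    # holding probability of Q, (234)
    alpha = 1 - (p + q) / (1 - w)
    assert alpha == 1 - F(2 * x * (3 * n - 1), 5 * n * (n - 1))
    if 6 * x >= 5 * n:                         # the regime x >= 5n/6
        beta = (1 / w) * (-alpha) / (1 - alpha)
        assert beta == F(2 * x * (3 * n - 1) - 5 * n * (n - 1), 4 * x * x)
        assert 0 < beta <= F(1, 4) + F(6, 5 * n)
```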

Coupling 69 (the coupling procedure between the X and Y chains used below; presented as a figure in the original).

Definition 70

For a tuple L and a number x, let \(L_{\text {left}(x)}\) be the same as L except that we remove the elements that are \(> x\). Similarly define \(L_{\text {right}(x)}\) to be the tuple resulting from removing the elements that are \(< x\).

Theorem 71

Assume \(X_0 = Y_0\) and fix \(s>0\), and let \({{\mathcal {Y}}}:= (Y_0, Y_1, \ldots , Y_s)\). Define

$$\begin{aligned} S:= \{ i: {{\mathcal {Y}}}_{\text {right}\big (\frac{5}{6}n\big )} [i] ={{\mathcal {Y}}}_{\text {right}\big (\frac{5}{6}n\big )} [i+1]\}. \end{aligned}$$
(237)

Let

$$\begin{aligned} T_{\text {left}} ({\mathcal {Y}}) = \sum _{y \in {{\mathcal {Y}}}_{ \text {left} \big (\frac{5}{6} n\big )}} \textsf{Geo}({\alpha _{y}}) \end{aligned}$$
(238)

and

$$\begin{aligned} T_{\text {right}} ({\mathcal {Y}}) = \sum _{y \in S} \textsf{Bern} (\beta (y)) \end{aligned}$$
(239)

then the process in Coupling 69 satisfies

$$\begin{aligned} Y_s = X_{s + T_{\text {left}}({\mathcal {Y}}) - T_{\text {right}}({\mathcal {Y}})} \end{aligned}$$
(240)

Proof

We prove this by induction, following Coupling 69. For the base case we have \(X_0 = Y_0\). Now suppose that for \(s > 0\), \(Y_s = X_{s + T_{\text {left}} - T_{\text {right}}}\), and let \(Y_{s+1}\) be the \((s+1)\)-th step. There are two possibilities: if \(Y_s < \frac{5}{6} n\), then \(\alpha (Y_s) > 0\). In this case, s will be incremented once while t may be incremented many times; the number of times t advances is distributed according to \(\textsf{Geo}(\alpha (Y_s))\). Let \(X'=Y_{s+1}\), i.e. the location on the chain after one step of Q. We show that \(X'\) is distributed according to \(X_{s + T_{\text {left}} - T_{\text {right}} + \textsf{Geo}(\alpha (Y_s))}\). To see this note:

$$\begin{aligned} \Pr [X' = x | X_{s + T_{\text {left}} - T_{\text {right}}} = x]= & {} \alpha (x) + (1-\alpha (x)) w(x) \nonumber \\= & {} 1 - p(x) - q(x) = P(x,x)\nonumber \\ \Pr [X' = x+1 | X_{s + T_{\text {left}} - T_{\text {right}}} = x]= & {} (1-\alpha (x)) (1-w(x)) \frac{p(x)}{p(x) + q(x)} =p(x)\nonumber \\= & {} P(x,x+1)\nonumber \\ \Pr [X' = x-1 | X_{s + T_{\text {left}} - T_{\text {right}}} = x]= & {} (1-\alpha (x)) (1-w(x)) \frac{q(x)}{p(x) + q(x)}\nonumber \\= & {} q(x) = P(x,x-1). \end{aligned}$$
(241)

Now if \(Y_s \ge \frac{5}{6} n\), then if \(Y_{s+1} \ne Y_s\) we have \(X_{s + T_{\text {left}} - T_{\text {right}} + 1} = Y_{s+1}\). But if \(Y_{s+1} = Y_s\), then with probability \(\beta (Y_s)\) the X process skips this step, i.e., \(Y_s = X_{s + 1 + T_{\text {left}} - T_{\text {right}} - 1}\). Let \(x \ge \frac{5}{6} n\), and let \(E_+\) be the event that \(X_{s + T_{\text {left}} - T_{\text {right}} + 1} = x+1\) conditioned on \(X_{s + T_{\text {left}} - T_{\text {right}}} = x\). Then

$$\begin{aligned} \Pr [E_+] = (1-w(x)) \frac{p(x)}{p(x) + q(x)} + \beta (x) w(x) \Pr [E_+] \end{aligned}$$
(242)

which implies that \(\Pr [E_+] = P(x, x+1)\). Similarly if we define \(E_-\) to be the event that \(X_{s + T_{\text {left}} - T_{\text {right}} + 1} = x-1\) conditioned on \(X_{s + T_{\text {left}} - T_{\text {right}}} = x\), then

$$\begin{aligned} \Pr [E_-] = (1-w(x)) \frac{q(x)}{p(x) + q(x)} + \beta (x) w(x) \Pr [E_-] \end{aligned}$$
(243)

which implies that \(\Pr [E_-] = P(x,x-1)\). Using this, if \(E_0\) is the event that \(X_{s + T_{\text {left}} - T_{\text {right}} + 1} = x\) conditioned on \(X_{s + T_{\text {left}} - T_{\text {right}}} = x\), then \(\Pr [E_0] = \Pr [(E_+ \cup E_-)^{{\textsf{c}}}] = P(x,x)\). \(\square \)

We need the following two theorems, which assert that (1) the wait time during the accelerated process is not too long, and (2) the accelerated chain mixes after \(O(n \ln n)\) steps in the \(\Vert \cdot \Vert _*\) norm.

Theorem 72

(Wait-time bound). Let \(Y_0, Y_1, \ldots , Y_s\) be s steps of the accelerated Markov chain defined in (234) with \(Y_0 \sim {{\,\textrm{Bin}\,}}(n,1/2)\), and let \(W_s\) be the number of steps the Markov chain \(X_0, X_1, \ldots \) has waited after s steps of the accelerated chain. Then for \(s = O(n \ln n)\) and for any constant \(\alpha > 0\), there exists a constant c such that

$$\begin{aligned} \Pr [T_{\text {left}}(s) \ge c n \ln ^2 n] \le 2^{-n}\cdot n^{-\alpha }. \end{aligned}$$
(244)

Theorem 73

(Accelerated-chain mixing). If \(s \ge 3 n \ln n\) then

$$\begin{aligned} \Vert Q_s\Vert _*\le \frac{27}{2^n + 1}(1+ \frac{1}{{{\,\textrm{poly}\,}}(n)}). \end{aligned}$$
(245)

The following proposition combines Theorems 72 and 73 to show that the original Markov chain mixes rapidly in the \(\Vert \cdot \Vert _{*}\) norm.

Proposition 74

Let

(246)

For any \(t_0 \le t_1 \le t_2\):

(247)

where, \(T= t_2 - t_1+1\).

Proof of Theorem 61

We need to find suitable values for \(t_0, t_1, t_2\). Let \(t_0 = 3 n \ln n\), so that Theorem 73 applies to the accelerated chain in Proposition 74. Next, choose c to be large enough so that (using Theorem 72) if \(t_1 = c n \ln ^2 n\)

$$\begin{aligned} \Pr [T_{\text {left}}(t_0) \ge t_1-t_0] \le \frac{1}{2^n +1} \frac{1}{n^3}. \end{aligned}$$
(248)

Finally, let \(c' > c\) be any constant and choose \(t_2 =c' t_1\). Using Theorem 73 we conclude that:

$$\begin{aligned} \frac{1}{T} \sum _{\tau = t_1}^{t_2} \Vert P_\tau \Vert _*\le \frac{28}{2^n+1}. \end{aligned}$$
(249)

This implies that there exists a value \(t_1 \le t^*\le t_2\) such that

$$\begin{aligned} \Vert P_{t^*}\Vert _*\le \frac{1}{T} \sum _{\tau = t_1}^{t_2} \Vert P_\tau \Vert _*\le \frac{28}{2^n+1}. \end{aligned}$$
(250)

Since \(t^*\) is within a constant factor of \(n \ln ^2 n\), this completes the proof.

\(\square \)

It remains to prove Theorems 72 and 73 and Proposition 74. We prove Theorem 72 in Sects. 4.5.2 and 4.5.3, Theorem 73 in Sects. 4.5.4 and 4.5.5, and Proposition 74 in Sect. 4.5.6.

4.5.2 Wait-time analysis.

In this section we prove Theorem 72. Before getting to the proof we need some preliminaries. Sites with low Hamming weight have the largest wait times. Hence, intuitively, we want to say that during the accelerated walk, these sites are not hit so often. More formally, let \(N_x = \sum _{\tau =1}^s I \{ Y_\tau \le x \}\) and let \(\beta >1\).

Proposition 75

Let \(\nu =\frac{3}{4} n\). For \(x \le \nu / \beta \), \(\Pr [N_x \ge \beta x ] \le s^{3/2} e \cdot e^{- \frac{\beta }{8} x}\).

If we set \(\beta = 8(4+c)\ln n\) then Proposition 75 implies that

$$\begin{aligned} \Pr [N_x \ge \beta x ] \le \frac{1}{{n \atopwithdelims ()x} n^c}. \end{aligned}$$
(251)

Let x(0) denote the corresponding \(\nu /\beta \), i.e.

$$\begin{aligned} x(0):= \frac{\nu }{8(4+c) \ln n}. \end{aligned}$$
(252)

Proof

We observe that \(N_x\) conditioned on \(Y_0 =z \ge 1\) is stochastically dominated by the same variable conditioned on \(Y_0 =1\); the proof is by taking the natural coupling that keeps the latter walk always \(\le \) the former. Hence we can assume that the walk starts out from \(Y_0 =1\) and we will obtain a valid upper bound.

In [34] (see the proof of lemma A.5) the authors show that

$$\begin{aligned} \Pr [N_x \ge \beta x ] \le \sum _{\tau =\beta x}^s \Pr [Y_\tau \le x ]. \end{aligned}$$
(253)

To understand these probabilities we will develop an exactly solvable analogue for \(Y_\tau \). Although \(Y_\tau \) is a random walk in discrete time and space, we can approximate it by a process that takes place in continuous time and space. If \(Y_\tau \) were an unbiased random walk then we could approximate it with Brownian motion. However, it is biased to always drift towards the point \(\frac{3}{4} n\). The continuous-time-and-space random process which diffuses like Brownian motion but is biased to drift towards a fixed point is called the Ornstein–Uhlenbeck process. We will not prove a formal connection between \(Y_\tau \) and the Ornstein–Uhlenbeck process, but instead will prove bounds on \(Y_\tau \) that are inspired by the analogous facts about Ornstein–Uhlenbeck.

Proposition 76

(Connection with the Ornstein–Uhlenbeck process). Define

$$\begin{aligned} \nu _\tau := z e^{-\frac{4\tau }{3n}} + \frac{3}{4}n\left( 1-e^{-\frac{4\tau }{3n}}\right) . \end{aligned}$$
(254)

Then we can bound

$$\begin{aligned} \Pr [Y_\tau \le x ] \le \sqrt{\tau }e \cdot e^{-\frac{(\nu _\tau -x)^2}{2 \nu _\tau }} \end{aligned}$$
(255)

The proof is in Sect. 4.5.3.

This proposition is inspired by the fact that the exact solution to the Ornstein–Uhlenbeck process is a Gaussian with mean and variance both equal to \(\nu _\tau \). We can see that once \(\tau \gtrsim n\ln n\), this is close to a Gaussian centered at \(\frac{3}{4}n\), i.e. the stationary distribution.
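The bound of Proposition 76 is also easy to compare against a direct simulation of the chain Q; a sketch (the bound is loose in this regime, as expected from the \(\sqrt{\tau }e\) prefactor):

```python
import math, random

def q_step(x, n):
    """One step of the affine chain Q of (234)."""
    u = random.random() * (3 * n - 1)
    if u < 3 * (n - x):
        return x + 1
    if u < 3 * (n - x) + (x - 1):
        return x - 1
    return x

def estimate(n, z, tau, x, trials=5000):
    hit = 0
    for _ in range(trials):
        y = z
        for _ in range(tau):
            y = q_step(y, n)
        hit += (y <= x)
    return hit / trials

n, z, tau, x = 60, 10, 120, 25
nu = z * math.exp(-4 * tau / (3 * n)) + 0.75 * n * (1 - math.exp(-4 * tau / (3 * n)))
bound = math.sqrt(tau) * math.e * math.exp(-(nu - x) ** 2 / (2 * nu))
print(estimate(n, z, tau, x), bound)   # estimate should fall below the bound
```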

Note that \(\nu _\tau \) is an increasing function of \(\tau \), and furthermore, for \(\nu _\tau \ge x\), \(e^{-\frac{(\nu _\tau -x)^2}{2 \nu _\tau }}\) is decreasing in \(\nu _\tau \), and therefore \(\tau \). Hence the sum in (253) can be bounded by

$$\begin{aligned} \Pr [N_x \ge \beta x ] \le s^{3/2} e \cdot \exp \left( {-\frac{(\nu (1-e^{-\frac{\beta x}{\nu }})-x)^2}{2 \nu (1- e^{-\frac{\beta x}{\nu }})}}\right) . \end{aligned}$$
(256)

Using the following inequalities

$$\begin{aligned} \frac{u}{1+u} \le 1-e^{-u} \le u, \text { for } u \le 1. \end{aligned}$$
(257)

we find that

$$\begin{aligned} \Pr [N_x \ge \beta x ] \le s^{3/2} e \cdot \exp \left( {-\frac{\left( \frac{\beta x}{1 + \frac{\beta x}{\nu }}-x\right) ^2}{2 \beta x}}\right) . \end{aligned}$$
(258)

Since \(\frac{\beta x}{\nu }< 1\), this yields

$$\begin{aligned} \Pr [N_x \ge \beta x ] \le s^{3/2} e \cdot e^{- \frac{\beta }{8} x}. \end{aligned}$$
(259)

\(\square \)

Now following [18, 27, 34], define the good event \(A:= \cap _{1 \le x \le x(0)} \{ N_x \le \beta \cdot x \}\). Recall that \(\beta = 8(4+c)\ln n\) and \(x(0)=\nu /\beta \).

Proposition 77

\(\Pr [A^{{\textsf{c}}}|Y_0] \le \frac{2}{{n \atopwithdelims ()Y_0} n^{c-2}}\).

To prove Proposition 77, we will need a bound on the minimum site visited during the accelerated walk. Let \(M_s:= \min _{1 \le i\le s} \{Y_i \}\). Then

Proposition 78

\(\Pr [ M_s \le a | Y_0 = z ] \le s\frac{{n \atopwithdelims ()a} 3^a}{{n \atopwithdelims ()z} 3^z}\)

We need the following lemma which is a standard fact about Markov chains.

Lemma 79

Let \(Y_0, Y_1, \ldots \) be a Markov chain with stationary distribution \(\pi \). Then for any x, y in the state space and integer \(s > 0\)

$$\begin{aligned} \Pr [ Y_s = y | Y_0 = x ] \le \frac{\pi _y}{\pi _x}. \end{aligned}$$
(260)

Proof

$$\begin{aligned} \Pr [ Y_s = y | Y_0 = x ]= & {} \frac{1}{\pi _x} \pi _x \Pr [ Y_s = y | Y_0 = x ]\end{aligned}$$
(261)
$$\begin{aligned}\le & {} \frac{1}{\pi _x}\sum _z \pi _z \Pr [ Y_s = y | Y_0 = z ]\end{aligned}$$
(262)
$$\begin{aligned}\le & {} \frac{\pi _y}{\pi _x} \end{aligned}$$
(263)

\(\square \)

Proof of Proposition 78

$$\begin{aligned} \Pr [ M_s \le a | Y_0 = z ]&\le \Pr [ \cup _{1 \le i \le s} \{ Y_i = a \} | Y_0 = z ]\nonumber \\&\le \sum _{j = 1}^s \Pr [Y_j = a | Y_0=z]\nonumber \\&\le s \cdot \frac{\pi _a}{\pi _z}&\text {using Lemma}~79\nonumber \\&= s \cdot \frac{{n \atopwithdelims ()a} 3^a}{{n \atopwithdelims ()z} 3^z}. \end{aligned}$$
(264)

\(\square \)

Now we show that the event \(A= \cap _{1 \le x \le x(0)} \{ N_x \le \beta \cdot x \}\) is very likely.

Proof of Proposition 77

The proof is very similar to the proof of lemma 4.5 in Brown and Fawzi [18].

$$\begin{aligned} \Pr [A^{{\textsf{c}}}]= & {} \Pr [ \cup _x N_x> \beta \cdot x ]\nonumber \\\le & {} \sum _x \Pr [ N_x> \beta \cdot x ]\nonumber \\\le & {} \sum _{ x<M_s} \Pr [ N_x> \beta \cdot x ] + \sum _{M_s\le x<Y_0} \Pr [ N_x> \beta \cdot x ]\nonumber \\ {}{} & {} + \sum _{x(0) \ge x\ge Y_0} \Pr [ N_x > \beta \cdot x ] \end{aligned}$$
(265)

In the last line we have used the fact that \(M_s\le Y_0\). Now we will handle each term in (265) separately. When \(x<M_s\), \(N_x=0\), so \(\sum _{ x<M_s} \Pr [ N_x > \beta \cdot x ]=0\). Next, when \(x\ge Y_0\), we can use Proposition 75 to bound \(\Pr [N_x>\beta x] \le \frac{1}{\left( {\begin{array}{c}n\\ x\end{array}}\right) n^{c}} \le \frac{1}{\left( {\begin{array}{c}n\\ Y_0\end{array}}\right) n^{c}}\) (since \(Y_0 \le x \le x(0) \le n/2\)). Finally, when \(M_s\le x<Y_0\), we have

$$\begin{aligned} \Pr [ N_x> \beta \cdot x ]&= \Pr [ N_x> \beta \cdot x | M_s \le x ] \Pr [M_s \le x ]\nonumber \\&\le \Pr [ N_x > \beta \cdot x |Y_0 = 1 ] \Pr [M_s \le x ]\nonumber \\&\le \Pr [M_s \le x ] \cdot \frac{1}{{n\atopwithdelims ()x} n^{c}}&\text {using Proposition}~75 \end{aligned}$$
(266)
$$\begin{aligned}&\le s\,\frac{{n\atopwithdelims ()x}}{{n\atopwithdelims ()Y_0} 3^{Y_0-x}} \cdot \frac{1}{{n\atopwithdelims ()x} n^{c}}&\text {using Proposition}~78 \end{aligned}$$
(267)

We now combine these contributions and sum over x to obtain

$$\begin{aligned} \Pr [A^{{\textsf{c}}}]&\le s\frac{1}{{n\atopwithdelims ()Y_0}n^c} \sum _{x<Y_0} 3^{x-Y_0}+ \sum _{Y_0 \le x \le x(0)} \frac{1}{{n\atopwithdelims ()Y_0} n^{c}} \end{aligned}$$
(268)
$$\begin{aligned}&\le s\frac{1}{2 {n\atopwithdelims ()Y_0}n^c} + \frac{1}{{n\atopwithdelims ()Y_0} n^{c-1}}\end{aligned}$$
(269)
$$\begin{aligned}&\le \frac{2}{{n\atopwithdelims ()Y_0} n^{c-2}}. \end{aligned}$$
(270)

\(\square \)

Proof of Theorem 72

Recall that the initial position \(Y_0\) on the chain is distributed according to a binomial distribution centered around n/2. Hence it is enough to show that, starting from position \(Y_0\) on the chain, the probability that the wait time is larger than the bound stated in the theorem is at most

$$\begin{aligned} \frac{1}{{n \atopwithdelims ()Y_0} {{\,\textrm{poly}\,}}(n)}. \end{aligned}$$
(271)

If such a bound holds, then the probability of waiting too long is bounded by

$$\begin{aligned} \frac{1}{2^n-1} \sum _{Y_0 =1}^n \frac{{n\atopwithdelims ()Y_0}}{{n \atopwithdelims ()Y_0} {{\,\textrm{poly}\,}}(n)} = \frac{1}{2^n+1}\cdot \frac{1}{{{\,\textrm{poly}\,}}(n)}. \end{aligned}$$
(272)

We now establish this.

Let a be a constant. Consider the following bound on the wait-time random variable \(T_{\text {left}}(s) = T_{\text {left}}({Y_0}) + \ldots + T_{\text {left}}({Y_s})\):

$$\begin{aligned} \Pr [ T_{\text {left}}(s) \ge a ]\le & {} \Pr [ T_{\text {left}}(s) \ge a | A ] + \Pr [ A^{{\textsf{c}}} ]\nonumber \\\le & {} \sum _{m=1}^{Y_0}\Pr [ T_{\text {left}}(s) \ge a | A, M_s=m ] \Pr [M_s=m ] + \Pr [ A^{{\textsf{c}}} ]\nonumber \\\le & {} \frac{1}{{n\atopwithdelims ()Y_0} 3^{Y_0}}\sum _{m=1}^{Y_0}\Pr [ T_{\text {left}}(s) \ge a | A, M_s=m] {n \atopwithdelims ()m} 3^m + \frac{2}{{n\atopwithdelims ()Y_0} n^{c-2}},\nonumber \\ \end{aligned}$$
(273)

using Propositions 78 and 77.

Let \(\rho _x = N_x - N_{x-1}\) be the number of times site x has been visited during s rounds of the accelerated walk. Recall from Sect. 4.5 that

$$\begin{aligned} T_{\text {left}}(s) \preceq \sum _{x=1}^{5n/6} \rho _x \cdot \textsf{Geo}({\frac{6 x}{5 n}}). \end{aligned}$$
(274)

Hence we need a concentration bound for sums of geometric random variables. Fortunately we know the following Chernoff-type tail bounds on the sum of geometric random variables.

Theorem 80

(Janson [38]). Let \(G = \sum _{i=1}^n \textsf{Geo}({p_i})\) be the sum of independent geometric random variables with parameters \(p_1, \ldots , p_n\), and let \(p^*= \min _i p_i\) and \(\phi := \sum _{i=1}^n \frac{1}{p_i} = {\mathbb {E}}G\), then for any \(\lambda \ge 1\)

$$\begin{aligned} \Pr [G \ge \lambda \phi ] \le \frac{1}{\lambda } (1-p^*)^{(\lambda -1-\ln \lambda ) \phi }, \end{aligned}$$
(275)

The bound we need for our results is:

Corollary 81

Let G be a sum of s geometric random variables with parameters \(p^*= p_1 \le \ldots \le p_s\) and \({\mathbb {E}}G = \phi \). If \(T > 3 c \ln ( c) \phi \), then

$$\begin{aligned} \Pr [G > T] \le \frac{1}{3 c \ln c} (1-p^*)^{T(1-1/c)}. \end{aligned}$$
(276)

In particular, if \(T > {\mathbb {E}}G\), then for any constant c there exists a constant \(c'\) such that

$$\begin{aligned} \Pr [G > c' T] \le (1-p^*)^{c T}. \end{aligned}$$
(277)

Proof

It is enough to show that if \(\lambda > 3 c \ln c\) then \(\lambda -1-\ln \lambda > \lambda (1-1/c)\). Let \(f(\lambda ):=\frac{ \lambda }{c} - \ln (e\lambda )\) for \(c>1\); f is an increasing function for \(\lambda >c\). We need a point \(\lambda ^*\) with \(f(\lambda ^*) >0\), and one can check that \(\lambda ^*= 3 c \ln c\) works for all sufficiently large c. \(\square \)

In order to employ Corollary 81 in the context of wait time (specifically (273)) we just need to find an upper bound on the expected wait time. Now we condition on A. Hence for \(x \le x(0)\), \(N_x \le \beta x\). Among all possibilities given by event A, the wait time is maximized when the minimum visited site (\(M_s\)) is visited as often as possible (see Brown-Fawzi [18] for a discussion). So it will suffice to bound the wait time for the situation when \(x \le x(0)\), \(\rho _x = \beta \) and for \(x = x(0)\), \(\rho _x = s - \beta x(0)\). In this case, the expected wait time (conditioned on any starting point) is bounded by

$$\begin{aligned} {\mathbb {E}}[T_{\text {left}}(s) | A] \le \beta \sum _{1 \le x \le x(0)} \frac{5 n}{2 x} + (s - \beta x(0)) \frac{5 n}{ 2 x(0)} \end{aligned}$$
(278)

Assuming the parameters in Proposition 75 we find that \({\mathbb {E}}[T_{\text {left}}(s) | A] = O(n \ln ^2 n + s \ln n)\), and in particular if \(s = O(n \ln n)\) then \({\mathbb {E}}[T_{\text {left}}(s) | A] = O(n \ln ^2 n)\).

Therefore, using Corollary 81, for any \(C >0\) there exists a large enough constant \(C'\) such that

$$\begin{aligned} \Pr [ T_{\text {left}}(s) \ge C' n \ln ^2 n \,|\, A, M_s=m] \le e^{- C \cdot \frac{m}{n} \cdot n \ln ^2 n}. \end{aligned}$$
(279)

Combining this with (273) and choosing C large enough yields

$$\begin{aligned} \Pr [ T_{\text {left}}(s) \ge C' n \ln ^2 n ]\le & {} \frac{1}{{n\atopwithdelims ()Y_0} 3^{Y_0}}\sum _{m=1}^{Y_0}e^{- C \cdot \frac{m}{n} \cdot n \ln ^2 n} {n \atopwithdelims ()m} 3^m + \frac{2}{{n\atopwithdelims ()Y_0} n^{c-2}}\nonumber \\\le & {} \frac{3}{{n \atopwithdelims ()Y_0} \cdot n^{c-2}}. \end{aligned}$$
(280)

and this completes the proof. \(\square \)

4.5.3 Proof of Proposition 76: Connection with the Ornstein–Uhlenbeck process.

We first define a new Markov chain \(S'_0, S'_1, S'_2, \ldots \) which is easier to analyze and gives us useful bounds for the Markov chain \(S_0, S_1, S_2, \ldots \).

Definition 82

\(S'_0, S'_1, S'_2, \ldots \) is the following Markov chain. The state space is \(\{0,1\}^{n}\). The initial string \(S'_0\) is sampled uniformly at random from \(\{0,1\}^n \backslash 0^n\). At each step t, \(S'_{t+1}\) results from \(S'_t\) by picking a random position of \(S'_t\). If that position holds a 0 we flip it; if it holds a 1, with probability 1/3 we flip it and with probability 2/3 we leave it unchanged.

The Hamming weight of these strings evolves as a birth-and-death chain on the state space \(\{0,1,2,\ldots , n\}\): for a string \(S' \in \{0,1\}^{n}\) of Hamming weight x, the probability that the weight increases by 1 is \(1- x/n\) and the probability that it decreases by 1 is \(\frac{x}{3 n}\). Let \(Q'\) be the transition matrix describing the Hamming weight.

We now claim that:

Proposition 83

Starting from a common Hamming weight \(\ge 1\), at any time t, \(Y_t\) stochastically dominates \(Y'_t := |S'_t|\), meaning that

$$\begin{aligned} \Pr [Y'_t \ge k] \le \Pr [Y_t \ge k] \end{aligned}$$
(281)

Proof

It is enough to observe that for \(0 \le x \le n\), the probability of moving forward for Q is larger than the probability of moving forward for \(Q'\), and also the probability of moving backwards for Q is smaller than the probability of moving backwards for \(Q'\). \(\square \)
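The domination of Proposition 83 can be checked by simulating both chains; a sketch (expect the \(Q'\)-tail to sit below the Q-tail, up to sampling noise):

```python
import random

def q_step(x, n):        # accelerated chain Q of (234)
    u = random.random() * (3 * n - 1)
    if u < 3 * (n - x):
        return x + 1
    if u < 3 * (n - x) + (x - 1):
        return x - 1
    return x

def qp_step(x, n):       # weight chain Q' of Definition 82
    u = random.random()
    if u < 1 - x / n:
        return x + 1
    if u < 1 - x / n + x / (3 * n):
        return x - 1
    return x

def run(step, n, z, t):
    x = z
    for _ in range(t):
        x = step(x, n)
    return x

n, z, t, trials = 40, 5, 60, 4000
for k in (20, 25, 30):
    py = sum(run(q_step, n, z, t) >= k for _ in range(trials)) / trials
    pyp = sum(run(qp_step, n, z, t) >= k for _ in range(trials)) / trials
    print(k, pyp, py)    # expect pyp <= py, up to sampling noise
```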

Now suppose that we simulate \(Q'\) for T steps. Let \(f_l\) be the number of times that site l is hit after T steps. Then \((f_1, \ldots , f_n) \sim \textsf{Multi} (T, \frac{1}{n}, \ldots , \frac{1}{n})\), the multinomial distribution over n items summing to T, each item occurring with probability 1/n.

We now take T itself to be a random variable distributed as \(T \sim \textsf{Pois}(\tau )\), where \(\tau \) is some positive real number. It turns out that defining T in this way makes \(f_1,\ldots ,f_n\) independent. Moreover, for any \(l \in \{1,\ldots , n\}\),

$$\begin{aligned} f_l \sim \textsf{Pois} (\tau /n). \end{aligned}$$
(282)

In other words, the number of times each site is hit is independently distributed according to a Poisson distribution. This technique is sometimes called Poissonization.

Now suppose the l’th bit of \(S'_0\) starts out from 0 and that \(f_l = k\). We find that the probability of ending up with a 1 in this case is

$$\begin{aligned} p_k = \frac{3}{4} \left( 1 - \left( \frac{-1}{3}\right) ^k \right) , \end{aligned}$$
(283)

and the probability of reaching a 0 is

$$\begin{aligned} 1-p_k = \frac{1}{4} + \frac{3}{4} \left( \frac{-1}{3}\right) ^k. \end{aligned}$$
(284)

Using these two probabilities and taking the expectation over the Poisson measure we can compute

$$\begin{aligned} \Pr [S'_T [l]=1 | S'_0[l] = 0]= & {} \sum _{k=0}^\infty \frac{e^{-\tau /n}}{k!} (\tau /n)^k \left( \frac{3}{4}-\frac{3}{4} \left( -1/3\right) ^k\right) \nonumber \\= & {} \frac{3}{4} \left( 1-e^{-\frac{4\tau }{3n}}\right) \nonumber \\=: & {} \alpha _\tau . \end{aligned}$$
(285)

Note that the T on the LHS is still a random variable distributed according to \({{\,\mathrm{\textsf{Pois}}\,}}(\tau )\).
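Both the closed form (283) and the Poisson average (285) are quick to verify; a sketch:

```python
import math

def p_closed(k):                   # equation (283)
    return 0.75 * (1 - (-1/3) ** k)

# one-site recursion from Definition 82: a 0 flips to 1 when hit; a 1 stays 1
# with probability 2/3, so p_{k+1} = (1 - p_k) + (2/3) p_k = 1 - p_k / 3
p = 0.0
for k in range(30):
    assert abs(p - p_closed(k)) < 1e-12
    p = 1 - p / 3

# averaging over k ~ Pois(tau/n) reproduces alpha_tau of (285)
tau, n = 7.0, 10
lam = tau / n
avg = sum(math.exp(-lam) * lam ** k / math.factorial(k) * p_closed(k)
          for k in range(80))
assert abs(avg - 0.75 * (1 - math.exp(-4 * lam / 3))) < 1e-12
```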

For the case when the l’th bit starts out equal to 1 and \(f_l = k \), we find the probabilities in a similar way. The probability of ending up at 1 is

$$\begin{aligned} p_k = \frac{3}{4} + \frac{1}{4} \left( \frac{-1}{3}\right) ^k, \end{aligned}$$
(286)

and the probability of ending up at 0 is

$$\begin{aligned} 1-p_k = \frac{1}{4} - \frac{1}{4} \left( \frac{-1}{3}\right) ^k. \end{aligned}$$
(287)

We then compute

$$\begin{aligned} \Pr [S'_T [l] =1 | S'_0 [l] = 1]= & {} \sum _{k=0}^\infty \frac{e^{-\tau /n}}{k!} (\tau /n)^k \left( \frac{3}{4}+\frac{1}{4} \left( -1/3\right) ^k\right) \nonumber \\= & {} \frac{3}{4} + \frac{1}{4}e^{-\frac{4\tau }{3n}}\nonumber \\=: & {} \beta _\tau . \end{aligned}$$
(288)

As a result conditioned on \(|S'_0|=z\),

$$\begin{aligned} Y'_T \sim Y'_{{{\,\mathrm{\textsf{Pois}}\,}}(\tau )} \sim \textsf{Bin} (n-z, \alpha _\tau ) + {{\,\textrm{Bin}\,}}(z, \beta _\tau ). \end{aligned}$$
(289)

This has expectation equal to

$$\begin{aligned} {\mathbb {E}}\left[ Y'_T | Y'_0 = z \right] = z e^{-\frac{4\tau }{3n}} + \frac{3}{4}n\left( 1-e^{-\frac{4\tau }{3n}}\right) , \end{aligned}$$
(290)

which is simply \(\nu _\tau \), first introduced in (254). Next, using a standard Chernoff bound for a sum of binomial random variables, we can show that for all \(x < \nu _\tau \)

$$\begin{aligned} \Pr [ Y'_{{{\,\mathrm{\textsf{Pois}}\,}}(\tau )} \le x ] \le e^{-\nu _\tau \frac{(1-x/\nu _\tau )^2}{2}} = e^{-\frac{(\nu _\tau -x)^2}{2 \nu _\tau }}. \end{aligned}$$
(291)

This bound is exactly the one that we expect from an Ornstein–Uhlenbeck process.

Fix a number \(x \in [n]\). Let B (the bad event) be \(\{|S'_T| \le x\}\). Then

$$\begin{aligned} \Pr [B]&= \sum _{s=0}^\infty \Pr [T = s] \Pr [B | T = s ] \end{aligned}$$
(292)
$$\begin{aligned}&\ge \Pr [T = \tau ] \Pr [B | T = \tau ] \end{aligned}$$
(293)
$$\begin{aligned}&\ge \Pr [T = \tau ] \Pr [|S'_\tau | \le x ] \end{aligned}$$
(294)

We can evaluate \(\Pr [T = \tau ] = \frac{\tau ^\tau }{\tau !} e^{-\tau } \ge \frac{1}{\sqrt{\tau }e}\) (for integer \(\tau \)), where we have used Stirling's formula in the form \(\frac{\tau !}{(\tau /e)^\tau } \le e\sqrt{\tau }\). Together with the bound in (294) we find that

$$\begin{aligned} \Pr [Y'_\tau \le x ] \le \sqrt{\tau }e \cdot \Pr [B] \end{aligned}$$
(295)

Combining this inequality with (291) we conclude that

$$\begin{aligned} \Pr [Y'_\tau \le x ] \le \sqrt{\tau }e \cdot e^{-\frac{(\nu _\tau -x)^2}{2 \nu _\tau }} \end{aligned}$$
(296)

Using Proposition 83

$$\begin{aligned} \Pr [Y_\tau \le x ] \le \sqrt{\tau }e \cdot e^{-\frac{(\nu _\tau -x)^2}{2 \nu _\tau }} \end{aligned}$$
(297)

If \(\tau \ge \frac{3}{4} n \ln n\) then \(\frac{3}{4} n \ge \nu _\tau \ge \frac{3}{4} n-1\). Therefore

$$\begin{aligned} \Pr [Y_\tau \le x ] \le \sqrt{\frac{3}{4} n \ln n} e \cdot e^{-\frac{2 \big (\frac{3}{4} n -x -1\big )^2}{3 n}} \end{aligned}$$
(298)

4.5.4 Proof of Theorem 73: exact solution to the Markov chain Q.

In this section we give an exact solution to the Markov chain Q defined in Sect. 4.5. Here, by giving an exact solution we mean we can find the eigenvalues and eigenvectors of the transition matrix explicitly and evaluate the norm \(\Vert Q_t\Vert _*\). The construction follows nearly directly from a result of Kac [39].

Recall the transition probabilities of the Markov chain Q from Equation (234). In (234), Q is defined over the state space [n]. Without loss of generality, and for convenience, we relabel the state space to \(\{0,1,2,\ldots , n-1\}\) via \(i = x-1\) and redefine the transition matrix according to:

$$\begin{aligned} p_i:= & {} Q(i,i+1) = \dfrac{3 (n-i-1)}{3n-1},\nonumber \\ q_i:= & {} Q(i,i-1) = \dfrac{i}{3n-1},\nonumber \\ r_i:= & {} Q(i,i) = \dfrac{ 2 (i+1)}{3n-1}. \end{aligned}$$
(299)

for \(i \in \{0,1,2,3, \ldots , n-1\}\).

Now we consider the eigenvalue problem

$$\begin{aligned} x^ {(\lambda )} Q = \lambda x^{ (\lambda )}, \end{aligned}$$
(300)

where \(x^{(\lambda )}\), a row vector with entries \(x^{(\lambda )}(i)\), is the left eigenvector corresponding to the eigenvalue \(\lambda \). For now we drop the superscript \(\lambda \) in \(x^{(\lambda )}\). Expanding this equation we have

$$\begin{aligned} p_{i-1} x({i-1}) + r_i x(i) + q_{i+1} x({i+1})= \lambda x(i). \end{aligned}$$
(301)

Notice that \(q_0 = p_{n-1} = 0\). Define the generating function

$$\begin{aligned} g_\lambda (z) = \sum _{i=0}^\infty x(i) z^i, \end{aligned}$$
(302)

where for \(i\ge n\), we set \(x(i)=0\). It suffices to solve (301) subject to the boundary conditions \(x(-1) = x(n) = 0\). For \(i>0\) we can write

$$\begin{aligned} p_{i-1} x(i-1) z^i + r_i x(i) z^i + q_{i+1} x(i+1) z^i= \lambda x(i) z^i, \end{aligned}$$
(303)

assuming \(x(-1)=0\). Using the coefficients of (299) we get

$$\begin{aligned} \dfrac{3(n-i)}{3n-1} x(i-1) z^i + \dfrac{2 (i+1)}{3n-1} x(i) z^i + \dfrac{i+1}{3n-1} x(i+1) z^i= \lambda x(i) z^i. \end{aligned}$$
(304)

For \(i=0\) the equation is

$$\begin{aligned} x(1)= ((3 n-1) \lambda - 2)x(0). \end{aligned}$$
(305)

Summing (\(\sum _{i=0}^\infty \)) over the first term in the left-hand side of (304) we obtain

$$\begin{aligned} \frac{3(n-1)}{3n-1} z \cdot g_\lambda (z) - \big (\frac{3}{3n-1}\big )z^2 \frac{d}{dz} g_\lambda (z). \end{aligned}$$
(306)

Similarly for the second term we get

$$\begin{aligned} \frac{2}{3n-1} g_\lambda (z) + \big (\frac{2}{3n-1}\big )z\frac{d}{dz} g_\lambda (z), \end{aligned}$$
(307)

and for the third term

$$\begin{aligned} \big (\frac{1}{3n-1}\big )\frac{d}{dz} g_\lambda (z), \end{aligned}$$
(308)

and for the term on the right-hand side

$$\begin{aligned} \lambda g_\lambda (z) \end{aligned}$$
(309)

Let \(\lambda ' = \lambda \frac{3n-1}{3(n-1)} - \frac{2}{3(n-1)}\). Putting all of these together we obtain the following first order differential equation

$$\begin{aligned} \dfrac{1}{g_\lambda (z)}\dfrac{d}{dz} g_\lambda (z)= (n-1) \dfrac{3 \lambda ' -3z }{- 3 z^2 +2 z + 1}, \end{aligned}$$
(310)

with the boundary conditions

$$\begin{aligned} g_\lambda (0)= & {} x(0), \end{aligned}$$
(311)
$$\begin{aligned} \dfrac{d^{n}}{d z^{n}} g (0)= & {} 0. \end{aligned}$$
(312)

Assume \(n-1\) is divisible by 4. Solving this differential equation and applying the first boundary condition (\(g_\lambda (0) = x(0)\)) we get

$$\begin{aligned} g_\lambda (z) = x^{(\lambda )}(0) (1+3z)^{\frac{n-1}{4} (1+3\lambda ')} (1-z)^{\frac{n-1}{4} (3-3\lambda ')}. \end{aligned}$$
(313)

The second boundary condition basically says that \(g_\lambda (z)\) should be a polynomial of degree at most \(n-1\). This implies that \(3 \lambda ' (n-1)/4\) should be an integer. Since the exponents of both the \((1+3z)\) and the \((1-z)\) terms should be nonnegative, we can further constrain \(3\lambda '(n-1)/4\) to lie in the interval \([-\frac{n-1}{4},3\frac{n-1}{4}]\). These constraints are enough to determine the n eigenvalues \(\lambda _0,\ldots ,\lambda _{n-1}\). They must (up to an irrelevant choice of ordering) satisfy

$$\begin{aligned} 3\lambda '_m\frac{n-1}{4} = 3\frac{n-1}{4} - m. \end{aligned}$$
(314)

Rearranging and solving for \(\lambda _m\) we have

$$\begin{aligned} \lambda _m = 1-\frac{4m}{3n-1}. \end{aligned}$$
(315)

The eigenvalue gap is exactly \(\dfrac{4}{3 n-1}\). Note for \(m=0\) we get \(\lambda _0 =1\) and

$$\begin{aligned} g_1(z) = x^{(1)}(0) (1+3 z)^{n-1} = x^{(1)}(0) \sum _{i=0}^{n-1} {n -1 \atopwithdelims ()i} 3^i z^i = \sum _i \pi (i) z^i. \end{aligned}$$
(316)

In the last equation we have introduced \(\pi (i)\), which is the stationary distribution. This is a binomial centered around \(\frac{3}{4}(n-1)\) and shifted by 1. Its mean \(\frac{3}{4}n+\frac{1}{4}\) differs from that of the non-accelerated chain by an offset of \(\approx \frac{1}{4}\). We might expect a shift like this because the accelerated chain spends less time on lower values of x.
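The spectrum (315) and the stationary distribution (316) can be confirmed numerically from the relabeled matrix (299); a sketch:

```python
import numpy as np
from math import comb

def Q_matrix(n):
    """Relabeled chain (299) on {0, ..., n-1}."""
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = 2 * (i + 1) / (3 * n - 1)
        if i + 1 < n:
            Q[i, i + 1] = 3 * (n - i - 1) / (3 * n - 1)
        if i - 1 >= 0:
            Q[i, i - 1] = i / (3 * n - 1)
    return Q

n = 9                                    # n - 1 divisible by 4
Q = Q_matrix(n)
assert np.allclose(Q.sum(axis=1), 1)     # stochastic
eigs = np.sort(np.linalg.eigvals(Q).real)[::-1]
predicted = np.array([1 - 4 * m / (3 * n - 1) for m in range(n)])
assert np.allclose(eigs, predicted)      # matches (315)

pi = np.array([comb(n - 1, i) * 3 ** i for i in range(n)], float) / 4 ** (n - 1)
assert np.allclose(pi @ Q, pi)           # stationary distribution (316)
```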

Since the stationary distribution has unit 1-norm, we can evaluate

$$\begin{aligned} x^{(1)}(0) = \frac{1}{4^{n-1}} \end{aligned}$$
(317)

The eigenvector for each eigenvalue \(\lambda \) can be read off (up to normalization) from the generating function \(g_\lambda (z)\). From here on we index the eigenvectors by m rather than \(\lambda \): we write \(x^{(m)}\) for the eigenvector corresponding to \(\lambda _m\) and denote its i-th component by \(x^{(m)}(i)\), for \(i \in \{0,1,2,3, \ldots , n-1\}\). In particular, \(x^{(0)}\) is the eigenvector with eigenvalue \(\lambda _0 = 1\).

4.5.5 Exact solution to the Markov chain Q implies a good upper bound on \(\Vert Q_t\Vert _{\small \Box }\).

We want to use the above exact solution to derive a bound on \(\Vert Q_t\Vert _{\small \Box }\). We begin by stating some facts.

  1. \(\lambda _m = 1-\frac{4m}{3n-1}\le e^{- \frac{4m}{3n-1}}\) for \(m \in [0, n-1]\).

  2. \(g_m(z) = x^{(m)}(0) (1+3z)^{n-m-1} (1-z)^{m} = \sum _{i=0}^{n-1}x^{(m)}(i) z^i\) for \(m \in [0, n-1]\).

  3. \(x^{(m)} Q = \lambda _m x^{(m)} = (1-\dfrac{4m}{3n-1} ) x^{(m)}\) for \(m \in \left[ 0, n-1\right] \).

  4. Q is a reversible Markov chain on \(\{0,\ldots ,n-1\}\) with stationary distribution \(\pi (i) = \left( {\begin{array}{c}n-1\\ i\end{array}}\right) 3^i / 4^{n-1}\).

Since the \(x^{(m)}\)’s are the left eigenvectors of Q, they can be used to find the right eigenvectors \(y^{(m)}\):

$$\begin{aligned} y^{(m)}(i) = \dfrac{x^{(m)}(i)}{\pi (i)}. \end{aligned}$$
(318)

Left and right eigenvectors are orthonormal with respect to each other, i.e., for any \(l,m\in \{0,\ldots ,n-1\}\)

$$\begin{aligned} \sum _{i=0}^{n-1} x^{(m)}(i) y^{(l)}(i) = \sum _{i=0}^{n-1} \dfrac{x^{(m)}(i)x^{(l)}(i)}{\pi (i)} = \delta _{m,l}. \end{aligned}$$
(319)

We define the following inner product between functions

$$\begin{aligned} (f,g):= \sum _{i} \dfrac{1}{\pi (i)} f(i) g(i), \end{aligned}$$
(320)

according to which \(\{x^{(m)}: m \in \{0,\ldots ,n-1\}\}\) forms an orthonormal basis, i.e.,

$$\begin{aligned} (x^{(i)},x^{(j)}):= \delta _{i,j}. \end{aligned}$$
(321)

We denote the initial distribution by \(Q_0(i) = \dfrac{1}{2^n-1}{n\atopwithdelims ()i+1}\). The eigenvector corresponding to eigenvalue 1 is \(x^{(0)} = \pi \), the stationary distribution. We write this initial vector as a combination of eigenvectors of the chain

$$\begin{aligned} Q_0 = \sum _{i=0}^{n-1} \alpha _i x^{(i)} \qquad \text {with}\qquad \alpha _i = (x^{(i)}, Q_0). \end{aligned}$$
(322)

Therefore after t steps

$$\begin{aligned} Q_t = \sum _{m=0}^{n-1} \alpha _m \lambda _m^t x^{(m)} = \sum _{m=0}^{n-1} (x^{(m)}, Q_0) \lambda _m^t x^{(m)}. \end{aligned}$$
(323)

We are interested in

(324)

Using Equation (323) this can be evaluated as

(325)
(326)
(327)

As a result the problem reduces to evaluating the overlaps \(\alpha _m = (x^{(m)}, Q_0)\).

$$\begin{aligned} \alpha _m= & {} (x^{(m)}, Q_0)\nonumber \\= & {} \sum _{i =0}^{n-1} x^{(m)} (i) \frac{\frac{{n \atopwithdelims ()i+1}}{2^n-1}}{\frac{{n-1 \atopwithdelims ()i}}{4^{n-1}} 3^i} \end{aligned}$$
(328)
$$\begin{aligned}= & {} 3 \cdot \frac{4^{n-1}}{2^n-1 }\sum _{i =0}^{n-1} x^{(m)} (i) \frac{n}{(i+1) \cdot 3^{i+1}} \end{aligned}$$
(329)
$$\begin{aligned}= & {} 3n \cdot \frac{4^{n-1}}{2^n-1 }\int _{z = 0}^{1/3}g_{m} (z)dz \end{aligned}$$
(330)

Now we evaluate the integral \(\int _{z = 0}^{1/3}g_{m} (z) dz\). We consider two cases, one for \(m = 0\) and one for \(m > 0\):

  1. \(m = 0\): In this case \(g_0(z) = x^{(0)}(0)(1 + 3z)^{n-1}\). Therefore

    $$\begin{aligned} \int _{z = 0}^{1/3}g_{0} (z)dz&= x^{(0)}(0) \int _{z = 0}^{1/3} (1 + 3z)^{n-1}dz \end{aligned}$$
    (331)
    $$\begin{aligned}&\le x^{(0)}(0) \frac{2^{n}}{3 \cdot n} \end{aligned}$$
    (332)
    $$\begin{aligned}&= \frac{4 }{2^{n} \cdot 3 n} \qquad \text {using Equation}~(317) \end{aligned}$$
    (333)

  2. \(m > 0\): In this case we give an upper bound on the integral

    $$\begin{aligned} \int _{z = 0}^{1/3}g_{m} (z)dz= & {} x^{(m)}(0) \int _{z = 0}^{1/3}(1+3z)^{n-m-1} (1-z)^{m}dz\end{aligned}$$
    (334)
    $$\begin{aligned}\le & {} x^{(m)}(0) 2^{n-1} \int _{z = 0}^{1/3} \big (\frac{1-z}{1+3z}\big )^{m}dz\end{aligned}$$
    (335)
    $$\begin{aligned}\le & {} x^{(m)}(0) 2^{n-1} \int _{z = 0}^{1/3} (1-z)^{m}dz\end{aligned}$$
    (336)
    $$\begin{aligned}\le & {} x^{(m)}(0) \frac{2^{n-1}}{m+1} \end{aligned}$$
    (337)

As a result we conclude that

$$\begin{aligned} \alpha _m \le {\left\{ \begin{array}{ll} 1+ \frac{1}{2^{n-1}} &{} m=0\\ x^{(m)}(0)\, 4^{n-1}\, \frac{3n}{2(m+1)}\big (1+\frac{1}{2^n-1}\big ) &{} m > 0\\ \end{array}\right. } \end{aligned}$$
(338)

The last step is to evaluate \(x^{(m)}(0)\). In order to do this we need some insight from a well-studied class of polynomials known as the Krawtchouk polynomials. It turns out that the Krawtchouk polynomials naturally appear in the expansion of \((1+3z)^{n-m-1}(1-z)^m\), as the coefficients of the monomials \(z^t\). The degree-t Krawtchouk polynomial is defined as:

$$\begin{aligned} K^{(t)} (x):= \sum _{i=0}^{t} {x \atopwithdelims ()i} {n-x-1 \atopwithdelims ()t-i} 3^{t-i} (-1)^i. \end{aligned}$$
(339)

(Elsewhere in the literature the Krawtchouk polynomials have been defined with the 3 above replaced by either 1 or some other number.) Now we evaluate the coordinates in each \(x^{(m)}\) vector.

$$\begin{aligned} (1+3z)^{n-m-1}(1-z)^m= & {} \sum _{i=0}^{n-m-1} {n-m-1 \atopwithdelims ()i} 3^i z^i \sum _{j=0}^m {m \atopwithdelims ()j} (-1)^j z^j,\nonumber \\= & {} \sum _{i=0}^{n-m-1} \sum _{j=0}^m {n-m-1 \atopwithdelims ()i} {m \atopwithdelims ()j} 3^i (-1)^j z^{i+j}\nonumber \\= & {} \sum _{t=0}^{n-1} z^t \sum _{i=0}^{t} {m \atopwithdelims ()i} {n-m-1 \atopwithdelims ()t-i} 3^{t-i} (-1)^i,\nonumber \\=: & {} \sum _{t=0}^{n-1} z^t K^{(t)} (m). \end{aligned}$$
(340)

Hence these Krawtchouk polynomials define the eigenstates, up to overall normalization, according to

$$\begin{aligned} x^{(m)}(i) = x^{(m)}(0) K^{(i)} (m). \end{aligned}$$
(341)

Moreover, using the orthonormality (321) of the \(x^{(m)}\)'s together with (341), we have

$$\begin{aligned} 4^{n-1}\, {x^{(m)}(0)}^2 \sum _{t=0}^{n-1} \dfrac{{K^{(t)}(m)}^2}{{n-1 \atopwithdelims ()t} 3^t} =1. \end{aligned}$$
(342)

In order to compute \(x^{(m)}(0)\) we prove the following proposition.

Proposition 84

\(\sum _{t=0}^{n-1} \dfrac{{K^{(t)}(m)}^2}{{n-1 \atopwithdelims ()t} 3^t} = \dfrac{4^{n-1}}{{n-1 \atopwithdelims ()m} 3^m}\).

Proving this will require two lemmas that establish symmetry and orthogonality properties of Krawtchouk polynomials.

Lemma 85

(Orthogonality). Define

$$\begin{aligned} k^{(t)}(x):= \sum _{i=0}^t {x \atopwithdelims ()i}{N-x \atopwithdelims ()t-i} p^{t-i} (-q)^i, \end{aligned}$$
(343)

for \(p,q \in [0,1]\) with \(p+q=1\). These Krawtchouk polynomials satisfy the orthogonality relation

$$\begin{aligned} \sum _{x=0}^{N} {N\atopwithdelims ()x} p^x q^{N-x} k^{(t)} (x) k^{(s)} (x) = {N \atopwithdelims ()t} (pq)^t \delta _{t,s}. \end{aligned}$$
(344)

Lemma 86

(Symmetry). The Krawtchouk polynomials obey the following symmetry relation.

$$\begin{aligned} \dfrac{{n-1\atopwithdelims ()x}}{3^t}K^{(t)}(x) = \dfrac{{n-1\atopwithdelims ()t}}{3^x}K^{(x)}(t). \end{aligned}$$
(345)

These two lemmas are proved in Appendix B.
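Both lemmas are also easy to confirm in exact integer arithmetic for small n; a sketch:

```python
from math import comb

n = 10

def K(t, x):
    """Krawtchouk polynomial K^(t)(x) as defined in (339)."""
    return sum(comb(x, i) * comb(n - x - 1, t - i) * 3 ** (t - i) * (-1) ** i
               for i in range(t + 1))

# symmetry (Lemma 86), cleared of denominators
for t in range(n):
    for x in range(n):
        assert comb(n-1, x) * 3**x * K(t, x) == comb(n-1, t) * 3**t * K(x, t)

# orthogonality in the form (347)
for t in range(n):
    for s in range(n):
        lhs = sum(comb(n-1, x) * 3**x * K(t, x) * K(s, x) for x in range(n))
        assert lhs == (comb(n-1, t) * 3**t * 4**(n-1) if s == t else 0)
```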

Proof of Proposition 84

Using Lemma 85 with \(p=3/4\), \(q=1/4\), and \(N=n-1\), we have

$$\begin{aligned} 4^t k^{(t)} (x) =\sum _{i=0}^t {x \atopwithdelims ()i}{n-x-1 \atopwithdelims ()t-i} 3^{t-i} (-1)^i = K^{(t)}(x), \end{aligned}$$
(346)

Therefore, taking \(t=s\) in (344) and using (346), we obtain

$$\begin{aligned} \sum _{x=0}^{n-1} {n-1 \atopwithdelims ()x} 3^x {K^{(t)}(x)}^2 = {n-1 \atopwithdelims ()t}3^t 4^{n-1}. \end{aligned}$$
(347)

We now use the symmetry from Lemma 86 to obtain

$$\begin{aligned} K^{(t)}(x) = \dfrac{3^t}{3^x} \frac{{n-1\atopwithdelims ()t}}{{n-1\atopwithdelims ()x}}K^{(x)}(t). \end{aligned}$$
(348)

As a result

$$\begin{aligned} \sum _{x=0}^{n-1} \dfrac{ {K^{(x)}(t)}^2}{{3^x}{n-1\atopwithdelims ()x}} = \frac{4^{n-1}}{{n-1 \atopwithdelims ()t}3^t}. \end{aligned}$$
(349)

This concludes the proof. \(\square \)

A corollary of Proposition 84 is that

$$\begin{aligned} {x^{(m)}(0)} = \sqrt{\dfrac{{n-1 \atopwithdelims ()m}3^m}{(4^n-1)\,4^{n-1}}}. \end{aligned}$$
(350)

Plugging this into Equation (338) we get

$$\begin{aligned} \alpha _m \le {\left\{ \begin{array}{ll} 2 &{} m=0\\ \sqrt{{n-1 \atopwithdelims ()m}3^m} \frac{3n}{2(m+1)} &{} m > 0\\ \end{array}\right. } \end{aligned}$$
(351)

Now we are ready to prove Theorem 73.

Proof of Theorem 73

Using Equations (327) and (351)

[Displays (352)-(358), the chain of estimates combining (327) with (351), did not survive extraction.]

\(\square \)

4.5.6 Proof of Proposition 74: Combining wait-time analysis with the analysis of the accelerated chain.

Proposition

(Restatement of Proposition 74). Let

[Display (359) did not survive extraction.]

For any \(t_0 \le t_1 \le t_2\):

[Display (360), the bound asserted by Proposition 74, did not survive extraction.]

where \(T= t_2 - t_1+1\).

Proof of Proposition 74

Let \(\tau \sim \textsf{Unif} (t_1, t_2)\). Then

$$\begin{aligned} \frac{1}{T} \sum _{s = t_1}^{t_2} \Vert P_s\Vert _*= \mathop {{\mathbb {E}}}\limits _\tau \Vert P_\tau \Vert _*\end{aligned}$$
(361)

We use the notation \(y^s = (y_1, \ldots , y_s)\), for \(y_j\) running over [n]. Consider the event \(\{X_\tau = k\}\). This event is equivalent to the disjoint union \(\cup _{s \ge 0} \cup _{y^s \in [n]^s: y_s = k} \{Y^s=y^s\}\cap \{W_{s-1} < \tau \le W_s\}\). Here \(y_0 \sim {{\,\textrm{Bin}\,}}(n,1/2)\), conditioned on \(y_0\ne 0\). Therefore

$$\begin{aligned} \Pr [X_\tau = k]= & {} \sum _{s \ge 0} \sum _{y^s: y_s = k} \Pr [Y^s = y^s] \Pr [W_{s-1}< \tau \le W_s]\nonumber \\= & {} \sum _{0 \le s< t_0} \sum _{y^s: y_s = k} \Pr [Y^s = y^s] \Pr [W_{s-1}< \tau \le W_s]\nonumber \\{} & {} +\sum _{t_0 \le s } \sum _{y^s: y_s = k} \Pr [Y^s = y^s] \Pr [W_{s-1}< \tau \le W_s]. \end{aligned}$$
(362)

We first argue about the time average of the first term.

$$\begin{aligned}&\mathop {{\mathbb {E}}}\limits _\tau \sum _{0 \le s<t_0} \sum _{y^s: y_s = k}\Pr [Y^s = y^s] \Pr [W_{s-1}< \tau \le W_s] \nonumber \\&\quad \le \mathop {{\mathbb {E}}}\limits _\tau \sum _{0 \le s<t_0} \sum _{y^s: y_s = k}\Pr [Y^s = y^s] \Pr [W_s \ge \tau ] \nonumber \\&\quad = \mathop {{\mathbb {E}}}\limits _\tau \sum _{0 \le s<t_0} \sum _{y^s: y_s = k} \Pr [Y^s = y^s]\Pr [s + T_{\text {left}(y^s)} - T_{\text {right}(y^s)} \ge \tau ]\nonumber \\&\quad \le \mathop {{\mathbb {E}}}\limits _\tau \sum _{0 \le s<t_0} \sum _{y^s: y_s = k} \Pr [Y^s = y^s] \Pr [T_{\text {left}(y^s)} \ge \tau -s]\nonumber \\&\quad \le \sum _{0 \le s <t_0} \sum _{y^s: y_s = k} \Pr [Y^s = y^s] \Pr [T_{\text {left}(y^s)} \ge t_1 - t_0]\nonumber \\&\quad \le t_0 \cdot \Pr [T_{\text {left}}(t_0) \ge t_1-t_0]. \end{aligned}$$
(363)

In the last step we have used the fact that \(T_{\text {left}(y^s)}\) is a nondecreasing function of s, together with \(\tau \ge t_1\) and \(s < t_0\). To bound the contribution to the \(\Vert \cdot \Vert _*\) norm, observe that \(\Vert (1,1,\ldots ,1) \Vert _*= 1/3 + 1/3^2 + \cdots \le 1/2\). Thus the contribution from the first term is \(\le \frac{t_0}{2} \cdot \Pr [T_{\text {left}(y^{t_0})} \ge t_1 - t_0]\).

Next we argue about the time average of the second term in (362).

$$\begin{aligned}&\sum _{\begin{array}{c} s \ge t_0\\ y^s: y_s = k \end{array}} \Pr [Y^s = y^s] \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s] \nonumber \\&\quad \le \sum _{\begin{array}{c} t_0 \le s \le 4 t_2\\ y^s: y_s = k \end{array}} \Pr [Y^s = y^s] \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s].&\text {(part i)} \end{aligned}$$
(364)
$$\begin{aligned}&\qquad + \sum _{s > 4 t_2} \max _{y^s: y_s = k} \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s].&\text {(part ii)} \end{aligned}$$
(365)

We now analyze each part separately; a numerical check of the resulting bounds (369) and (372) follows the case analysis.

  1. (part i)

    Write

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s] = \mathop {{\mathbb {E}}}\limits _W \mathop {{\mathbb {E}}}\limits _\tau I [W_{s-1}< \tau \le W_s]. \end{aligned}$$
    (366)

    Here \({\mathbb {E}}_W\) is the expectation value over wait times \(W_{y_1}, \ldots , W_{y_s}\), and \(I[W_{s-1}< \tau \le W_s]\) is the indicator of the event \(W_{s-1}< \tau \le W_s\).

    We first bound \({\mathbb {E}}_W {\mathbb {E}}_\tau I [W_{s-1}< \tau \le W_s]\). Fix \(y^s\) such that \(y_s=k\), and for integers \(a\le b\), let \([a,b]\) denote the set \(\{a,a+1,\ldots , b\}\). Then

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _W \mathop {{\mathbb {E}}}\limits _\tau I [W_{s-1}< \tau \le W_s],&= \mathop {{\mathbb {E}}}\limits _W \frac{|[t_1,t_2]\cap [W_{s-1},W_s]|}{T},\nonumber \\&\le \mathop {{\mathbb {E}}}\limits _W \frac{|[W_{s-1},W_s]|}{T}. \end{aligned}$$
    (367)

    There are two possibilities for the random variable \(|[W_{s-1},W_s]| = W_s - W_{s-1}\); one for \(k < \frac{5}{6}n\) and one for \(k \ge \frac{5}{6}n\):

    $$\begin{aligned} W_s - W_{s-1} \sim {\left\{ \begin{array}{ll} \textsf{Geo}({1-\alpha (k)}) &{} k < \frac{5}{6}n\\ \textsf{Bern} (\beta _k) &{} k \ge \frac{5}{6}n \end{array}\right. } \end{aligned}$$
    (368)

    Therefore

    $$\begin{aligned} \mathop {{\mathbb {E}}}\limits _W [W_s - W_{s-1}]&\le \mathop {{\mathbb {E}}}[\textsf{Geo}({1-\alpha (k)})] + \mathop {{\mathbb {E}}}[\textsf{Bern} (\beta _k)]\nonumber \\&\le \frac{5 n (n-1)}{2k (3n-1)} + 1/2\nonumber \\&\le \frac{3n}{k}. \end{aligned}$$
    (369)

    Using this in (367) and (364) we find the bound

    $$\begin{aligned}&\sum _{\begin{array}{c} t_0 \le s \le 4 t_2\\ y^s: y_s = k \end{array}} \Pr [Y^s = y^s] \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s]\nonumber \\&\quad \le \sum _{\begin{array}{c} t_0 \le s \le 4 t_2\\ y^s: y_s = k \end{array}} \Pr [Y^s = y^s] \frac{3n}{kT} \le \sum _{t_0 \le s \le 4 t_2} \Pr [Y_s = k] \frac{3n}{kT} \end{aligned}$$
    (370)
  2. (part ii)

    For the second part we use

    $$\begin{aligned}&\sum _{s> 4 t_2} \max _{y^s: y_s = k} \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau \le W_s] \nonumber \\&\quad \le \sum _{s> 4 t_2} \max _{y^s: y_s = k} \mathop {{\mathbb {E}}}\limits _\tau \Pr [W_{s-1}< \tau ]\nonumber \\&\quad \le \sum _{s> 4 t_2} \max _{y^s: y_s = k} \max _{t_1 \le \tau \le t_2} \Pr [W_{s-1}< \tau ]\nonumber \\&\quad \le \sum _{s> 4 t_2} \max _{y^s: y_s = k} \max _{t_1 \le \tau \le t_2} \Pr [s-1 + T_{\text {left}(y^s)}< \tau + T_{\text {right}(y^s)}]\nonumber \\&\quad \le \sum _{s> 4 t_2} \max _{y^s: y_s = k} \max _{t_1 \le \tau \le t_2} \Pr [s-1 - \tau<T_{\text {right}(y^s)}]\nonumber \\&\quad \le \sum _{s > 4 t_2} \max _{y^s: y_s = k} \Pr [s-1 - t_2 <T_{\text {right}(y^s)}] \end{aligned}$$
    (371)

    Now recall from Equation (239) that \(T_{\text {right}(y^s)}\) is stochastically dominated by \(\textsf{Bin}(s,1/2)\). So the RHS of (371) is bounded by:

    $$\begin{aligned}&\le \sum _{s> 4 t_2} \Pr [s - t_2 \le \textsf{Bin} (s,1/2)]\nonumber \\&\le \sum _{s> 4 t_2} \sum _{k = s -t_2}^s \frac{{s \atopwithdelims ()k}}{2^s}\nonumber \\&\le \sum _{s> 4 t_2} t_2 \frac{{s \atopwithdelims ()t_2}}{2^s}&\text {(using }s> 4 t_2)\nonumber \\&\le \sum _{s> 4 t_2} t_2 \frac{{s \atopwithdelims ()s/4}}{2^s}&\text {(using }s> 4 t_2)\nonumber \\&\le \sum _{s> 4 t_2} t_2 \frac{(4 e)^{s/4}}{2^s}\nonumber \\&\le t_2 \cdot \sum _{s > 4 t_2} \frac{1}{1.09^s}\nonumber \\&\le 12 t_2 \frac{1}{1.4^{t_2}} \end{aligned}$$
    (372)
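Both numbered bounds above are easy to test numerically. The following sketch (ours, purely illustrative) checks the wait-time estimate (369) and the tail estimate (372) over a modest parameter range; the finite cutoff in the tail sum is harmless because its terms decay geometrically for \(s > 4t_2\).

```python
# Numerical checks of the bounds (369) and (372).
from math import comb

# (369): 5n(n-1) / (2k(3n-1)) + 1/2 <= 3n/k for 1 <= k <= n
for n in range(2, 200):
    for k in range(1, n + 1):
        assert 5 * n * (n - 1) / (2 * k * (3 * n - 1)) + 0.5 <= 3 * n / k

# (372): sum_{s > 4 t2} t2 * C(s, t2) / 2^s <= 12 t2 / 1.4^t2
def tail(t2, cutoff=400):
    return sum(t2 * comb(s, t2) / 2 ** s
               for s in range(4 * t2 + 1, 4 * t2 + cutoff))

for t2 in range(1, 40):
    assert tail(t2) <= 12 * t2 / 1.4 ** t2
print("bounds (369) and (372) hold on the tested ranges")
```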

Using (370), (372), (363) and (362)

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{\tau } \Pr [X_\tau = k] \le t_0 \cdot \Pr [T_{\text {left}}(t_0) \ge t_1-t_0] + \sum _{t_0 \le s \le 4 t_2} \Pr [Y_s = k] \frac{3n}{kT} + 12 t_2 \frac{1}{1.4^{t_2}}\nonumber \\ \end{aligned}$$
(373)

Therefore

[Display (374), concluding the proposition from (373), did not survive extraction.]

\(\square \)

4.6 Towards exact constants

Here we discuss what constant factors we may expect from the bound in Theorem 13. We do not consider the case of D-dimensional graphs here.

What is the right time scale in order to get anti-concentration? Since Pauli strings of weight k have contribution \(1/3^k\) as well as expected wait-time of \(\approx n/k\), it seems reasonable to guess that lower values of k contribute more to the anti-concentration probability. On the other hand, the initial distribution of k is centered around n/2. Still, enough probability mass survives at low values of k to yield a non-trivial lower bound in Theorem 13.

Thus, let us focus initially on walks starting with weight \(k=1\). Here the expected “escape time” from the low-k sector (say to \(k=n/2\)) is \(\approx \frac{5}{6} n \ln n\), and, simultaneously, it takes \(\approx \frac{5}{6} n \ln n\) time to hit \(\frac{3}{4} n -o(n)\). This is the basis for the following conjecture. A special case of this conjecture for anti-concentration was recently resolved in [24].

Conjecture 1

If \(t = \frac{5}{6} n \ln n + o(n \ln n)\) then \(\mathop {\Pr }\limits _{C \sim \mu ^{(\text {CG})}_t} [|\langle x|C|0\rangle |^2 \ge \frac{\alpha }{2^n} ] = \Omega (1)\).

Here is the reasoning behind this conjecture. Recall that the transition matrix P is a birth-death chain, with the probabilities of moving forward, moving backward, and staying put at site l being \(p_l\), \(q_l\) and \(r_l\), respectively. Let \(\pi \) be the stationary distribution. Let \(T_l= \min \{t: X_t \ge l\}\) be the hitting time of site l, starting from the first site. For any birth-death chain started at site \(l-1\) [41], the expected time to move one step forward is

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{l-1} [ T_l] = \frac{1}{q_l} \sum _{i=1}^{l-1} \dfrac{\pi (i)}{\pi (l)}. \end{aligned}$$
(375)
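As a sanity check of this standard formula (an illustration only; the chain below is a generic small birth-death chain with a reflecting bottom site, not our Pauli-weight chain), one can solve the first-step equations exactly and compare with (375), with the sum running over all sites below l.

```python
# Verify the birth-death hitting-time formula (375) on a random small chain.
import random
from fractions import Fraction
import numpy as np

random.seed(1)
N = 8
p = [Fraction(random.randint(1, 5), 12) for _ in range(N)]  # forward probs
q = [Fraction(random.randint(1, 5), 12) for _ in range(N)]  # backward probs
q[0], p[N - 1] = Fraction(0), Fraction(0)                   # boundaries

# stationary distribution via detailed balance: pi(i+1) q_{i+1} = pi(i) p_i
pi = [Fraction(1)]
for i in range(N - 1):
    pi.append(pi[-1] * p[i] / q[i + 1])

for l in range(1, N):
    # h_j = E_j[T_l] solves (p_j + q_j) h_j - p_j h_{j+1} - q_j h_{j-1} = 1
    A = np.zeros((l, l)); b = np.ones(l)
    for j in range(l):
        A[j, j] = float(p[j] + q[j])
        if j > 0:
            A[j, j - 1] = -float(q[j])
        if j + 1 < l:
            A[j, j + 1] = -float(p[j])
    h = np.linalg.solve(A, b)
    formula = sum(pi[:l]) / (pi[l] * q[l])                  # RHS of (375)
    assert abs(h[l - 1] - float(formula)) < 1e-6 * max(1.0, float(formula))
print("formula (375) verified on a random birth-death chain")
```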

In our chain

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{l-1} [ T_l]=\frac{5}{2} \sum _{i=1}^{l-1} \dfrac{{n \atopwithdelims ()i}}{{n-2 \atopwithdelims ()l-2} 3^{l-i}}. \end{aligned}$$
(376)

In order to bound this we use the inequalities (proven in [7])

$$\begin{aligned} {n-2 \atopwithdelims ()l-2} \le {n-2 \atopwithdelims ()i-1} \left( \dfrac{n-i-1}{i} \right) ^{l-i-1}, \end{aligned}$$
(377)

and

$$\begin{aligned} {n \atopwithdelims ()i} \le {n \atopwithdelims ()l-1} \left( \dfrac{l-1}{n-l+2} \right) ^{l-i-1}. \end{aligned}$$
(378)

Therefore

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{l-1} [ T_l]\le & {} \frac{5}{6} \sum _{i=1}^{l-1} \dfrac{{n \atopwithdelims ()l-1}}{{n-2 \atopwithdelims ()l-2} } \left( \dfrac{l-1}{3(n-l+2)}\right) ^{l-i-1}, \end{aligned}$$
(379)
$$\begin{aligned}\le & {} \frac{5}{6} n \left( \dfrac{1}{l-1}+\frac{1}{3n/4-l+7/4} \right) \left( 1 + O(1/n) \right) . \end{aligned}$$
(380)

The last line holds for \(l < \frac{3}{4} n\). The transition from (379) to (380) is directly inspired by Equation (2) of [the arXiv version of] [7].
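The binomial inequalities (377) and (378) can likewise be spot-checked numerically over the range in which they are used; the following sketch (ours) does so for \(l < \frac{3}{4}n\).

```python
# Spot-check of the inequalities (377) and (378) from [7].
from math import comb

for n in range(4, 60):
    for l in range(2, 3 * n // 4):
        for i in range(1, l):
            assert (comb(n - 2, l - 2)
                    <= comb(n - 2, i - 1) * ((n - i - 1) / i) ** (l - i - 1))
            assert (comb(n, i)
                    <= comb(n, l - 1) * ((l - 1) / (n - l + 2)) ** (l - i - 1))
print("inequalities (377) and (378) hold on the tested range")
```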

To bound the time of reaching \(\frac{3}{4} n-\delta \) for some \(\delta \ge 0\) we sum (380) over \(1 \le l \le \frac{3}{4} n-\delta \) and neglect the \(1+O(1/n)\) corrections.

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _1[T_{\frac{3}{4}n-\delta }]&\le \frac{5}{6} n \sum _{l=1}^{\frac{3}{4}n-\delta } \left( \frac{1}{l}+\frac{1}{3n/4-l+11/4} \right) \end{aligned}$$
(381)
$$\begin{aligned}&\approx \frac{5}{6} n \left( \ln \left( \frac{\frac{3}{4}n-\delta }{1}\right) + \ln \left( \frac{3n/4+7/4}{\delta + 11/4}\right) \right) \end{aligned}$$
(382)
$$\begin{aligned}&= \frac{5}{6} n\left( \ln \frac{n^2}{\delta +1} + O(1)\right) . \end{aligned}$$
(383)

Using this bound, for \(a < b\) we can also compute \({\mathbb {E}}_{a} [T_b]\) as \({\mathbb {E}}_{1} [T_b] - {\mathbb {E}}_{1} [T_a]\). We wish to estimate \({\mathbb {E}}_a[T_b]\) in two main regimes. Recall that our starting distribution is \({{\,\textrm{Bin}\,}}(n,1/2)\) and the stationary distribution is \({{\,\textrm{Bin}\,}}(n,3/4)\). Thus we need to know the time for most of the probability mass to reach \(\approx \frac{3}{4}n\), and for the left tail of the initial distribution to reach the left tail of the final distribution. (The right tail is less demanding and less important, because it does not have the long wait times and it is suppressed by the \(1/3^k\) factors.) For the bulk of the probability distribution we use the estimate \({\mathbb {E}}_{n/2} [ T_{3/4 n - O(1)}] \lesssim \frac{5}{6} n \ln n\). For the left tail, we use the bound \({\mathbb {E}}_1[T_{0.74n}] \lesssim \frac{5}{6}n\ln n\). In each case the time required is \(\frac{5}{6} n\ln n + O(n)\).
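The quality of the approximation (382)-(383) is easy to gauge numerically; the following sketch (ours) evaluates the sum in (381) as printed, including the 11/4 offset, and compares it with the closed form \(\frac{5}{6} n \ln \frac{n^2}{\delta +1}\).

```python
# Compare the sum in (381) with the closed form (383).
from math import log

def T_bound(n, delta):
    L = int(3 * n / 4 - delta)
    return 5 / 6 * n * sum(1 / l + 1 / (3 * n / 4 - l + 11 / 4)
                           for l in range(1, L + 1))

for n in (10**3, 10**4, 10**5):
    for delta in (0, n // 100):
        ratio = T_bound(n, delta) / (5 / 6 * n * log(n ** 2 / (delta + 1)))
        print(n, delta, round(ratio, 3))   # stays close to 1, as in (383)
```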

5 Alternative Proof for Anti-concentration of the Outputs of Random Circuits with Nearest-Neighbor Gates on D-Dimensional Lattices

5.1 The \(D=2\) case

In this section we consider a simplified version of \(\mu ^{\text {lattice},n}_{2,c,s}\), with \(c=1\), for which \(K^{(t)}_{\mu ^{\text {lattice},n}_{2,1,s}} = k^s_R k^s_C\). We prove the following:

Theorem 87

If \(s = O ( \sqrt{n} + \ln (1/\epsilon ) )\) then \(\mu ^{\text {lattice},n}_{2,1,s}\) satisfies

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C\sim \mu ^{\text {lattice},n}_{2,1,s}}\text {Coll}(C) \le \frac{2}{2^n+1} (1+\epsilon ). \end{aligned}$$
(384)

This result is already established in Theorem 8; here we give an alternative proof based on a reduction to a classical probabilistic process. This alternative approach may help with the analysis of random circuits on arbitrary graphs.

We use the following two statements.

Lemma 88

(Brandão-Harrow-Horodecki '13 [13]). Let \(t = O(\sqrt{n} + \ln \frac{1}{\epsilon })\). Then

$$\begin{aligned} \text {Ch}[g^t_{\text {Rows}}] \preceq \bigotimes _{i \in \text {Rows}} \text {Ch}[G_i] \cdot (1+\epsilon ). \end{aligned}$$
(385)

The same holds for \(\text {Ch}[g^t_{\text {Columns}}]\).

Proof

This result is proved by Brandão-Harrow-Horodecki in [13]. \(\square \)

Proposition 89

Let \(K_i\) be an \(\epsilon \)-approximate 2-design on row or column \(i \in \{R,C\}\), in the sense that

$$\begin{aligned} K_i \preceq (1 + \epsilon ) \text {Ch}[G_i] \end{aligned}$$
(386)

then for any sequence of rows or columns \(i_1,\ldots , i_t\)

$$\begin{aligned} \text {Coll}( K_{i_t} \ldots K_{i_1}) \le (1+\epsilon )^t \text {Coll}( \text {Ch}[G_{i_t}] \ldots \text {Ch}[G_{i_1}]). \end{aligned}$$
(387)

Proof

This proposition is proved in Sect. 3.5.1. \(\square \)

Putting these together

$$\begin{aligned} \text {Coll}(\mu ^{\text {lattice},n}_{2, 1, s}) \le ( 1+\epsilon )^2 \text {Coll}(\text {Ch}[G_R G_C]). \end{aligned}$$
(388)

Therefore our objective is to show the following.

Proposition 90

$$\begin{aligned} \text {Coll}(\text {Ch}[G_R G_C]) \le \frac{2}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) . \end{aligned}$$
(389)

Proof

Using the Markov chain interpretation discussed in Sect. 4, the initial distribution on the chain is

$$\begin{aligned} V_0:= \frac{1}{2^n} (\sigma _0\otimes \sigma _0 + \sum _{\begin{array}{c} p \in \{0,3\}^n\\ p\ne 0 \end{array}} \sigma _p\otimes \sigma _p ), \end{aligned}$$
(390)

and after the application of a large enough random quantum circuit the distribution converges to

$$\begin{aligned} V^*:= \frac{1}{2^n} \sigma _0\otimes \sigma _0 + (1-\frac{1}{2^n} ) \cdot \frac{1}{4^n-1}\sum _{\begin{array}{c} p \in \{0,1,2,3\}^n\\ p\ne 0 \end{array}} \sigma _p\otimes \sigma _p, \end{aligned}$$
(391)

and we want to see how fast this convergence happens.

For clarity, throughout this proof we denote distributions along the full lattice by capital letters (such as V) and distributions on individual rows or columns by lowercase letters (such as \(v^i\) for the distribution v on row or column i). Also, for simplicity we write 0 instead of \(\sigma _0\otimes \sigma _0\), and \(\sigma _0^i\) for all zeros across row or column i.

\(V_0\) is separable across any subset of nodes. So the initial distribution along each row or column is exactly

$$\begin{aligned} \frac{1}{2^{\sqrt{n}}} (\sigma _0 + \sum _{\begin{array}{c} p \in \{0,3\}^{\sqrt{n}}\\ p\ne 0 \end{array}} \sigma _p\otimes \sigma _p ) =: v_0. \end{aligned}$$
(392)

After one application of \(\text {Ch}[G_R]\), each such distribution becomes

$$\begin{aligned} v^*:= & {} \frac{1}{2^{\sqrt{n}}} \sigma _0\otimes \sigma _0 + (1-\frac{1}{2^{\sqrt{n}}} ) \frac{1}{4^{\sqrt{n}}-1}\sum _{\begin{array}{c} p \in \{0,1,2,3\}^{\sqrt{n}}\\ p\ne 0 \end{array}} \sigma _p\otimes \sigma _p \nonumber \\=: & {} \frac{1}{2^{\sqrt{n}}} \sigma _0 + (1-\frac{1}{2^{\sqrt{n}}} ) v. \end{aligned}$$
(393)

Here we have defined

$$\begin{aligned} v:= \frac{1}{4^{\sqrt{n}}-1}\sum _{\begin{array}{c} p \in \{0,1,2,3\}^{\sqrt{n}}\\ p\ne 0 \end{array}} \sigma _p\otimes \sigma _p. \end{aligned}$$
(394)

Therefore the distribution along the full chain is \(V_1:= ( \frac{1}{2^{\sqrt{n}}}\sigma _0 + (1-\frac{1}{2^{\sqrt{n}}})v )^{\otimes \sqrt{n}}\). We also use the notation \(v^y \sigma _0^{\backslash y}:= \bigotimes _{i: y_i = 1} v \otimes \bigotimes _{i: y_i = 0} \sigma _0\), for \(y \in \{0,1\}^{\sqrt{n}}\).

Before getting to the analysis, we should first understand the main reason why \(\text {Coll}(\text {Ch}[G_R])\) is large.

After we apply \(\text {Ch}[G_R]\), the collision probability across each row is exactly \(\frac{2}{2^{\sqrt{n}}+1}\). So the collision probability across the whole lattice is \(\approx \frac{2^{\sqrt{n}}}{2^n}\), which is larger than what we want by a factor of \(2^{\sqrt{n}}\). The crucial observation here is that if in (393) we project out all the \(\sigma _0\) terms across each row, then the bound becomes \(\approx \frac{1}{2^n}\). So what really slows this process down are the \(\sigma _0\) terms. The issue is that, after an application of \(\text {Ch}[G_R]\), the all-zeros states get projected to themselves. However, if one applies \(\text {Ch}[G_C]\), they partially mix with other rows. So the objective is to show that after applying \(\text {Ch}[G_C] \text {Ch}[G_R]\) a constant number of times, these zeros disappear with large enough probability. A small numerical illustration of this obstruction follows.
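The following short computation (ours; it assumes each row is an exact 2-design, so that each row has collision probability exactly \(\frac{2}{2^{\sqrt{n}}+1}\)) illustrates the gap: one round of row gates leaves the lattice-wide collision probability a factor \(\approx 2^{\sqrt{n}}\) above the Haar value \(\frac{2}{2^n+1}\).

```python
# Illustrate why a single application of Ch[G_R] is not enough.
for r in (3, 4, 5, 6):                     # r = sqrt(n)
    n = r * r
    rows_only = (2 / (2 ** r + 1)) ** r    # product over sqrt(n) rows
    haar = 2 / (2 ** n + 1)
    print(n, rows_only / haar)             # grows roughly like 2^sqrt(n)
```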

Let \(V_s\) be the distribution along the full chain after we apply \( (\text {Ch}[G_C] \text {Ch}[G_R] )^s\). Eventually we want to compute

$$\begin{aligned} \text {Coll}(\text {Ch}[G_s])=\frac{1}{2^n} {\textrm{Tr}}\left( v_0^{\otimes \sqrt{n}}V_s \right) =: \kappa (V_s). \end{aligned}$$
(395)

Here we have defined the map

$$\begin{aligned} \kappa : A \mapsto \frac{1}{2^n} {\textrm{Tr}}( V_0 A ). \end{aligned}$$
(396)

As a result

$$\begin{aligned} V_1 = \bigotimes _{r \in \text {Rows}} \left( \frac{1}{2^{\sqrt{n}}} \sigma _0^r + \left( 1-\frac{1}{2^{\sqrt{n}}}\right) v^r \right) = \sum _{y \in \{0,1\}^{\sqrt{n}}} \frac{1}{2^{\sqrt{n}(\sqrt{n} - |y|)}} \left( 1-\frac{1}{2^{\sqrt{n}}}\right) ^{|y|} v^{y} \sigma _0^{\backslash y}. \end{aligned}$$
(397)

An important observation here is that

$$\begin{aligned} \kappa _i \left( \frac{1}{2^{\sqrt{n}}} \sigma _0^i \right) = \frac{1}{2^{\sqrt{n}}}, \quad \kappa \left( (1- \frac{1}{2^{\sqrt{n}}}) v^{ i} \right) =\frac{(1- \frac{1}{2^{\sqrt{n}}})}{2^{\sqrt{n}}+1} < \frac{1}{2^{\sqrt{n}}}. \end{aligned}$$
(398)

The relevant observation is that applying \(\kappa \) to the summation in (397) yields

$$\begin{aligned} \kappa (V_1) < \frac{1}{2^n} \sum _{y \in \{0,1\}^{\sqrt{n}}} 1 = \frac{2^{\sqrt{n}}}{2^n}. \end{aligned}$$
(399)

In other words, each \(\sigma _0\) factor contributes a 1 to the above summation. That means that if we had started with the distribution

$$\begin{aligned} V' = \bigotimes _{r \in \text {Rows}} \left( o(1/\sqrt{n}) \frac{1}{2^{\sqrt{n}}} \sigma _0^r + \left( 1-o(1/\sqrt{n}) \frac{1}{2^{\sqrt{n}}} \right) v^r \right) , \end{aligned}$$
(400)

then we would have obtained

$$\begin{aligned} \kappa (V') = \frac{2}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) , \end{aligned}$$
(401)

which is exactly what we want. The last relevant piece of information is that if \(v''_j\) is a distribution over row j that with probability 1 contains a nonzero item, then when \(\text {Ch}[G_j]\) is applied to it, it instantly gets mapped to \(v_j\). This phenomenon is related to strong stationarity in Markov chain theory.

We claim that already after the first application of \(\text {Ch}[G_C]\), the expected collision probability obeys the bound claimed in this theorem. In order to see this, we consider the distribution \(V_1\) of (397), this time along each column. For any set of columns \(j_1, \ldots , j_k\) let \(E_{j_1, \ldots , j_k}\) be the event that these columns are all zeros, and the rest of the columns have at least one non-zero element in them. Here we use the notation \(E_{j_1, \ldots , j_k} \equiv E_y\) for \(y \in \{0,1\}^{\sqrt{n}}\) such that the \(j_1, \ldots , j_k\) coordinates of y are ones and the rest of its bits are zeros.

Therefore

$$\begin{aligned} \text {Coll}(\text {Ch}[G_C] V_1 )= & {} \sum _{y \in \{0,1\}^{\sqrt{n}}} \Pr [E_y ] \kappa \left( \sigma _0^y v^{\backslash y} \right) \\= & {} \frac{1}{2^n} + \sum _{y \in \{0,1\}^{\sqrt{n}}\backslash 0} \Pr [E_y ] \left( \frac{1}{2^{\sqrt{n}+1}} \right) ^{\sqrt{n} - |y|}. \end{aligned}$$

Let \(p_0:= \frac{1}{2^{\sqrt{n}}}+ \frac{1}{4}(1- \frac{1}{2^{\sqrt{n}}})\). The main observation is that for each such y,

$$\begin{aligned} \Pr [E_y ] \le p_0^{\sqrt{n} |y|} \left( 1-p^{\sqrt{n}}_0 \right) ^{\sqrt{n}-|y|}. \end{aligned}$$
(402)

Therefore

$$\begin{aligned} \text {Coll}(\text {Ch}[G_C] V_1 )\le & {} \frac{1}{2^{n}}+\sum _{y \in \{0,1\}^{\sqrt{n}}\backslash 0}p_0^{\sqrt{n} |y|} \left( 1-p^{\sqrt{n}}_0 \right) ^{\sqrt{n}-|y|} \left( \frac{1}{2^{\sqrt{n}+1}} \right) ^{\sqrt{n} - |y|}\nonumber \\= & {} \frac{1}{2^n}+ \left( p_0^{\sqrt{n}}+ \left( 1-p_0^{\sqrt{n}}\right) \frac{1}{2^{\sqrt{n}+1}} \right) ^{\sqrt{n}}\nonumber \\= & {} \frac{1}{2^n} + \frac{1-\frac{1}{2^{n}}}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) \nonumber \\= & {} \frac{2}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) , \end{aligned}$$
(403)

and this completes the proof. \(\square \)
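One can also evaluate the final bound exactly for small lattices; the sketch below (ours, exact rational arithmetic) confirms that the quantity in the second line of (403) stays within a \(1+o(1)\) factor of the Haar value \(\frac{2}{2^n+1}\).

```python
# Exact evaluation of the bound in the second line of (403).
from fractions import Fraction

def coll_bound(r):                      # r = sqrt(n); n = r*r qubits
    n = r * r
    p0 = Fraction(1, 2 ** r) + Fraction(1, 4) * (1 - Fraction(1, 2 ** r))
    inner = p0 ** r + (1 - p0 ** r) * Fraction(1, 2 ** (r + 1))
    return Fraction(1, 2 ** n) + inner ** r

for r in range(3, 9):
    n = r * r
    ratio = coll_bound(r) / Fraction(2, 2 ** n + 1)
    print(n, float(ratio))              # bounded by 1 + o(1)
```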

5.2 Generalization to arbitrary D-dimensional case

See Sect. 2.1 for definitions in this section. In particular, we need definitions for \(\text {Ch}[g_i]\), \(K_i\) and \(\text {Ch}[G_i]\) for each coordinate i of the lattice, and \(K_t = (\prod _i k_i)^t\).

In this section we prove the following.

Theorem 91

D-dimensional \(O\big (D n^{1/D} + D \ln \big (\frac{D}{\epsilon }\big )\big )\)-depth random circuits on n qubits have expected collision probability \(\frac{2}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) \).

Proof

The proof is basically a generalization of the proof of Theorem 87. Here we sketch an outline and avoid repeating details. In particular, we need generalizations of Lemma 88 and Proposition 89.

The generalization of Lemma 88 is simply that \(k_i^t\) for \(t = O(n^{1/D} + \ln \frac{D}{\epsilon })\) is an \(\frac{\epsilon }{D}\)-approximate 2-design. Proposition 89 naturally generalizes to: if for each coordinate \(K_i\) is an \(\frac{\epsilon }{D}\)-approximate 2-design then

$$\begin{aligned} \text {Coll}\left( \prod _i K_i \right) \le \left( 1+\frac{\epsilon }{D}\right) ^{D} \cdot \text {Coll}\left( \prod _i \text {Ch}[G_i] \right) . \end{aligned}$$
(404)

Our objective is then to show

$$\begin{aligned} \text {Coll}\left( \prod _i \text {Ch}[G_i] \right) = \frac{2}{2^{n}+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) . \end{aligned}$$
(405)

This last step may be the most non-trivial part of this proof.

Here we just outline the proof; for detailed discussion see the proof of Proposition 90. We first separate the all-zeros state of the chain, which contributes \(1/2^{n}\) to the expected collision probability. After the application of \(G_1\) on the first coordinate, each row in this coordinate will be the all-zeros vector with probability \(1/2^{n^{1/D}}\) and V with probability \(1-1/2^{n^{1/D}}\). After the application of \(G_2\), each plane in the directions 1, 2 will be all zeros with probability \(\approx 1/2^{2 n^{1/D}}\) and V with probability \(\approx 1- 1/2^{2 n^{1/D}}\). After the application of \(G_3\), each hyperplane in the directions 1, 2, 3 is all zeros with probability \(\approx 1/2^{3 n^{1/D}}\) and V otherwise, and so on. Eventually, after the application of \(G_D\), the distribution along the chain is all zeros with probability \(\approx 1/2^{D n^{1/D}}\) and V otherwise. At this point the distribution along each individual row in each coordinate is \(\approx 1/2^{D n^{1/D}} 0 + (1-1/2^{D n^{1/D}}) V\). So the collision probability across each such row is

$$\begin{aligned} \approx \frac{1}{2^{D n^{1/D}}} +\frac{1}{2^{n^{1/D}}}. \end{aligned}$$
(406)

Therefore the collision probability across the full chain is

$$\begin{aligned} \approx \frac{1}{2^n} + \left( \frac{1}{2^{D n^{1/D}}} +\frac{1}{2^{n^{1/D}}} \right) ^{n^{1-1/D}} \approx \frac{1}{2^n} + \frac{1}{2^{n}} \exp \left( \frac{1}{2^{D n^{1/D}}} n^{1-1/D}\right) . \end{aligned}$$
(407)

\(\square \)
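The heuristic estimate (407) can be read off numerically for small D; the following sketch (ours, floating point only) evaluates it with \(n = m^D\) sites and shows the expected approach to \(\frac{2}{2^n+1}\).

```python
# Rough numerical reading of the estimate (407), with m = n^{1/D}.
def coll_estimate(m, D):
    n = m ** D
    per_row = 1 / 2 ** (D * m) + 1 / 2 ** m     # per-row estimate (406)
    return 1 / 2 ** n + per_row ** (n // m)     # n^{1-1/D} = n/m rows

for D in (2, 3):
    for m in (4, 6, 8):
        n = m ** D
        print(D, n, coll_estimate(m, D) * 2 ** n)  # approaches 2 = 2^n * 2/(2^n+1)
```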

Corollary 92

\(O(\ln n \ln \ln n)\)-depth random circuits with long-range gates have expected collision probability \(\frac{2}{2^n+1} \left( 1+\frac{1}{{{\,\textrm{poly}\,}}(n)}\right) \).

Proof

Set \(D = \ln n\) in Theorem 91. \(\square \)

6 Scrambling and Decoupling with Random Quantum Circuits

In this section we reconstruct some of the results of Brown and Fawzi [17, 18]. The paper [17] proves random circuit depth bounds required for scrambling and some weak notions of decoupling. We are able to use our proof technique to reconstruct and improve on the results of that paper. The paper [18], on the other hand, introduces a stronger notion of decoupling with random circuits; unfortunately, our method does not seem to yield any results about this model.

We first define an approximate scrambler based on [17].

Definition 93

(Scramblers). \(\mu \) is an \(\epsilon \)-approximate scrambler if for any density matrix \(\rho \) and subset S of qubits with \(|S| \le n/3 \)

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C \sim \mu } \Vert \rho _S(C) - \frac{I}{2^{|S|}} \Vert ^2_1 \le \epsilon . \end{aligned}$$
(408)

where \(\rho _S(C) = {\textrm{Tr}}_{\backslash S}(C \rho C^\dagger )\) and \({\textrm{Tr}}_{\backslash S}\) is the trace over the subset of qubits complementary to S.

We show that small depth circuits from \(\mu ^{\text {lattice},n}_{D,c,s}\) are good scramblers.

Theorem 94

If \(s = O(D \cdot n^{1/D} + \ln D)\) and \(c= 1\) then \(\mu ^{\text {lattice},n}_{D,c,s}\) is a \(\frac{1}{{{\,\textrm{poly}\,}}(n)}\)-approximate scrambler. In particular, for \(D = O(\ln n)\) this corresponds to an ensemble of \(O(\ln n \ln \ln n)\) depth circuits that are \(\frac{1}{{{\,\textrm{poly}\,}}(n)}\)-approximate scramblers.

Brown and Fawzi show a circuit depth bound of \(O(\ln ^2 n)\) for random circuits with long-range interactions. Our result improves this to \(O(\ln n \ln \ln n)\) depth. We believe that the right bound should be \(O(\ln n)\). Moreover, their result does not include a bound for the case of D-dimensional lattices.

Proof

We first use the bound \({\mathbb {E}}_{C \sim \mu } \Vert \rho _S(C) - \frac{I}{2^{|S|}} \Vert ^2_1 \le 2^{|S|} {\mathbb {E}}_{C \sim \mu } {\textrm{Tr}}(\rho ^2_S(C))-1\) (to see why this holds see [17]). Next, consider an arbitrary density matrix

$$\begin{aligned} \rho = \sum _{i,j} \rho _{i,j} |i\rangle \langle j|. \end{aligned}$$
(409)

We first find an expression for \({\textrm{Tr}}_{\backslash S} (C \rho C^{\dagger })\)

$$\begin{aligned} {\textrm{Tr}}_{\backslash S} (C \rho C^{\dagger })= & {} \sum _{i,j} \rho _{i,j} {\textrm{Tr}}_{\backslash S} (C |i\rangle \langle j| C^{\dagger })\nonumber \\= & {} \sum _{i,j} \rho _{i,j} {\textrm{Tr}}_{\backslash S} \sum _{g, h} C_{ig} C^*_{jh} |g\rangle \langle h|\nonumber \\= & {} \sum _{i,j} \rho _{i,j} \sum _{{\tilde{g}}, {\tilde{h}}} \sum _p C_{i,{\tilde{g}};p} C^*_{j,{\tilde{h}};p} |{\tilde{g}}\rangle \langle {\tilde{h}}|. \end{aligned}$$
(410)

Therefore

$$\begin{aligned} \mathop {{\mathbb {E}}}\limits _{C\sim \mu ^{\text {lattice},n}_{D,c,s}}\mathop {{\textrm{Tr}}}\limits _S \left( {\textrm{Tr}}_{\backslash S} \left( C \rho C^{\dagger }\right) \right) ^2= & {} \mathop {{\mathbb {E}}}\limits _{C\sim \mu ^{\text {lattice},n}_{D,c,s}} \mathop {{\textrm{Tr}}}\limits _S\left( \sum _{i,j} \rho _{i,j} \sum _{{\tilde{g}}, {\tilde{h}}} \sum _p C_{i,{\tilde{g}};p} C^*_{j,{\tilde{h}};p} |{\tilde{g}}\rangle \langle {\tilde{h}}| \right) ^2\nonumber \\= & {} \mathop {{\mathbb {E}}}\limits _{C\sim \mu ^{\text {lattice},n}_{D,c,s}}\sum _{i,j} \sum _{k,l} \sum _{{\tilde{g}}_1, {\tilde{h}}_1} \sum _{{\tilde{g}}_2, {\tilde{h}}_2} \sum _{p,q}\nonumber \\{} & {} \rho _{i,j}\rho _{k,l} C_{i,{\tilde{g}}_1;p} C^*_{j,{\tilde{h}}_1;p}C_{k,{\tilde{g}}_2;q} C^*_{l,{\tilde{h}}_2;q} \delta _{{\tilde{h}}_1 = {\tilde{g}}_2}\delta _{{\tilde{h}}_2 = {\tilde{g}}_1}\nonumber \\= & {} \mathop {{\mathbb {E}}}\limits _{C\sim \mu ^{\text {lattice},n}_{D,c,s}} \sum _{i,j,k,l} \sum _{a,b,c,d}\rho _{i,j}\rho _{k,l} C_{i, a;b} C^*_{j,c;b}C_{k,c;d} C^*_{l,a;d}\nonumber \\= & {} {\textrm{Tr}}\left( \rho \otimes \rho \text {Ch}\left[ G^{(2)}_{\mu ^{\text {lattice},n}_{D,c,s}}\right] \left( \sum _{a,b,c,d} |ab\rangle \langle cb| \otimes |cd\rangle \langle ad|\right) \right) \nonumber \\= & {} {\textrm{Tr}}\left( \rho \otimes \rho \text {Ch}\left[ G^{(2)}_{\mu ^{\text {lattice},n}_{D,c,s}}\right] (A ) \right) . \end{aligned}$$
(411)

Both \(\rho \otimes \rho \) and A are psd; therefore, using Lemma 34,

$$\begin{aligned} {\textrm{Tr}}\left( \rho \otimes \rho \text {Ch}[G_\mu ^{(2)}] (A ) \right) \le (1+\epsilon )^{D} \cdot {\textrm{Tr}}\left( \rho \otimes \rho \prod _{1\le i \le D} \text {Ch}[G_i] (A ) \right) . \end{aligned}$$
(412)

Next, using Equation 3 of [17], we reduce the computation of \({\textrm{Tr}}(\rho \otimes \rho \prod _{1\le i \le D} \text {Ch}[G_i] (A ) )\) to the following probabilistic process: starting from the uniform distribution over \(\{0,3\}^{n} \backslash I^n\), show that after the application of \(\prod _{1\le i \le D} \text {Ch}[G_i]\) the probability that the string on the Markov chain K defined in Sect. 4 has weight \(\le n/3\) is \({{\,\textrm{poly}\,}}(n)/2^{n}\); this reconstructs Theorem A.1 of [17].

The initial state on the chain is \(\frac{1}{2^n}\sum _{p \in \{0,3\}^n\backslash 0} \sigma _p \otimes \sigma _p\); we add the term \(\frac{1}{2^n} \sigma _0\otimes \sigma _0\), which can only slow down the process. With this modification, each site is initially and independently \(Z \otimes Z\) or \(I \otimes I\), each with probability 1/2.

Using the proof of Theorem 91, after the application of \(\prod _i \text {Ch}[G_i]\) the distribution along each row is \(\approx 1/2^{D n^{1/D}} \sigma _0 \otimes \sigma _0 + (1-1/2^{D n^{1/D}}) V\). Therefore the probability that each site is zero is at most \(1/4 + 1/2^{D n^{1/D}} =: 1/4 + \delta =: p_0\). Hence the probability of having weight at most n/3 is at most

$$\begin{aligned} \sum _{k=1}^{n/3} \frac{{n \atopwithdelims ()k}}{4^n-1} p_0^{n-k} \left( 1-p_0\right) ^{k}= & {} \sum _{k=1}^{n/3} \frac{{n \atopwithdelims ()k}}{4^n-1} \left( 1/4 + \delta \right) ^{n-k} \left( 3/4-\delta \right) ^{k}\nonumber \\ {}\le & {} e^{4 n \delta } \sum _{k=1}^{n/3} \frac{{n \atopwithdelims ()k}}{4^n-1} 1/4^{n-k} (3/4)^{k} \end{aligned}$$
(413)

which is within a factor \(1 + O\left( n/2^{D n^{1/D}}\right) \) of what we would expect from the Haar measure. Also, when \(D = O(\ln n)\) with a proper constant, this value is \(1 + 1/ {{\,\textrm{poly}\,}}(n)\). \(\square \)

Next, we consider the following notion of decoupling defined in [17]. Consider a maximally entangled state \(\Phi _{MM'}\) along equally sized systems M and \(M'\), each with m qubits, and a pair of equally sized systems A and \(A'\). Similar to [17] we consider two models for \(AA'\): 1) a pure state \(|0\rangle _A\langle 0|\) on system A with \(n-m\) qubits, and 2) a maximally entangled state \(\phi _{AA'}\). We then apply a random circuit to the systems \(M' A\), and we want that for a small subsystem S of the output the final state \(\rho _{MS} (t)\) be decoupled, in the sense that \(\rho _{MS} (t) \approx I/2^{m+s}\).

Definition 95

(Weak decouplers). A distribution \(\mu \) over \(\text{ U }(2^n)\) is an \(\epsilon \)-approximate weak decoupler if \(\mathop {{\mathbb {E}}}\limits _{C \sim \mu }\Vert \rho _{MS} (C) - \frac{I_M}{2^{|M|}} \otimes \frac{I_S}{2^{|S|}}\Vert _1 \le \epsilon \).

Theorem 96

Let D be a constant integer. If \(s = O(D \cdot n^{1/D})\) and \(c= 1\) then there exists a constant \(c'<1\) such that if \(m < c' n^{1/D}\) then \(\mu ^{\text {lattice},n}_{D,c,s}\) is a \(\frac{1}{{{\,\textrm{poly}\,}}(n)}\)-approximate weak decoupler.

The depth bound Brown and Fawzi find in [17] for this problem is \(n^{1/D} \cdot O(\ln n)\), for \(m = {{\,\textrm{poly}\,}}(n)\).

Proof

We first show that the bound we want to calculate for the 1-norm in this theorem can be written as \({\textrm{Tr}}\left( E \text {Ch}\left[ G_{\mu ^{\text {lattice},n}_{D,c,s}}\right] F\right) \) where E and F are psd matrices. Hence, using Lemma 34, we can use the overlapping projectors \(\prod _i \text {Ch}[G_i]\) instead of \(\text {Ch}[G_{\mu ^{\text {lattice},n}_{D,c,s}}]\) as the second-moment operator.

We start with the case where \(\psi _A\) is the pure state \(|0\rangle _A\langle 0|\). The initial state is the (pure) density matrix

$$\begin{aligned} \rho _{\textrm{init}} = \frac{1}{2^m} \sum _{i,j} |i\rangle \langle j| \otimes |i0\rangle \langle j0| \end{aligned}$$
(414)

where \(|i\rangle \) runs through the computational basis of M and 0 is the initial state of A. After the application of a circuit C

$$\begin{aligned} \rho _{\textrm{init}} \mapsto \rho _C = \frac{1}{2^m} \sum _{i,j,k,l} |i\rangle \langle j| \otimes |k\rangle \langle l| C_{i0;k}C^*_{j0;l}. \end{aligned}$$
(415)

where \(C_{a;b}\) is the (a, b) entry of C. The density matrix corresponding to the subsystem MS becomes

$$\begin{aligned} \frac{1}{2^m} \sum _{i,j, k',l', q'} |i\rangle \langle j| \otimes |k'\rangle \langle l'| C_{i0;k'q'}C^*_{j0;l'q'}. \end{aligned}$$
(416)

We use the bound (also used in [17])

$$\begin{aligned} \Vert \rho _{MS} (C) - \frac{I_M}{2^{|M|}} \otimes \frac{I_S}{2^{|S|}}\Vert _1 \le 2^{m+s} {\textrm{Tr}}(\rho ^2_{MS}(C))-1. \end{aligned}$$
(417)

Next, using the proof of Theorem 94, \({\mathbb {E}}_{C\sim \mu }{\textrm{Tr}}(\rho ^2_{MS}(C))\) can be written as \({\textrm{Tr}}(E \,\text {Ch}[G^{(2)}_\mu ] F )\) where E and F are psd; hence \({\textrm{Tr}}(E \,\text {Ch}[G_{\mu ^{\text {lattice},n}_{D,c,s}}^{(2)}] F ) \le {\textrm{Tr}}(E \prod _i \text {Ch}[G_i] F) (1+\epsilon )\). Hence we can just use \(\prod _i \text {Ch}[G_i]\) to bound the expectation \({\mathbb {E}}_{C \sim \mu ^{\text {lattice},n}_{D,c,s}}\Vert \rho _{MS} (C) - \frac{I_M}{2^{|M|}} \otimes \frac{I_S}{2^{|S|}}\Vert _1\).

Next, we do the same calculation for the case when \(\psi _{AA'}\) is the maximally entangled state \(\frac{1}{2^{n-m}}\sum _{i,j} |i\rangle \langle j| \otimes |i\rangle \langle j|\). Therefore the initial density matrix is

$$\begin{aligned} \rho _{\textrm{init}}= & {} \frac{1}{2^n} \sum _{i,j,k,l} |i\rangle _M\langle j| \otimes |i\rangle _{M'}\langle j| \otimes |k\rangle _A\langle l| \otimes |k\rangle _{A'}\langle l|\nonumber \\ {}= & {} \frac{1}{2^n} \sum _{i,j,k,l} |i\rangle _M\langle j| \otimes |i k \rangle _{M'A}\langle j l| \otimes |k\rangle _{A'}\langle l|. \end{aligned}$$
(418)

After the application of the random circuit this gets mapped to

$$\begin{aligned} \rho _{\textrm{init}} \mapsto \rho (C) = \frac{1}{2^n} \sum _{i,j,k,l} \sum _{z,w} |i\rangle _M\langle j| \otimes |z \rangle _{M'A}\langle w| \otimes |k\rangle _{A'}\langle l| C_{ik, z} C^*_{jl, w}. \end{aligned}$$
(419)

Again we can use a bound similar to (417), and, as in the proof of Theorem 94, we can show that after tracing out a subsystem, the trace of the square of the resulting density matrix can be written as \({\textrm{Tr}}\left( E \,\text {Ch}[G^{(2)}_\mu ] F \right) \) for E and F psd.

As proved in Theorem 3.5 of [17], the task is to show that, starting from the uniform distribution over all strings with weight \(\le m = O(n^{1/D})\), the probability that after the application of the random circuit the weight of the string on the chain is \(\ge n/2\) is at least \(1-1/4^m\). It is enough to show this for an initial state with Hamming weight 1. Without loss of generality assume the nonzero digit of this string is in the first row of the first direction. After the application of \(G_1\), the first row in this direction becomes V. Using the Chernoff bound for independent Bernoulli trials, with probability at least \(1- e^{-\Omega (n^{1/D})}\) there are at most \(1/4 \cdot n^{1/D} \cdot 2^{1/D}\) zeros on this row. After the application of \(G_2\), with probability at least \(1- e^{-\Omega (n^{1/D})}\) there are at most \(1/4 \cdot n^{2/D} \cdot 2^{2/D}\) zeros, and so on. Hence after the completion of \(\prod _i G_i\), with probability at least \(1- e^{-\Omega (n^{1/D})}\) there are at most \(1/4 \cdot n^{D/D} \cdot 2^{D/D} = n/2\) zeros on the chain. For constant D the failure probability is at most \(e^{-\Omega (n^{1/D})}\), and we can choose the constant \(c'\) small enough so that if \(m < c' n^{1/D}\) the probability of failure is at most \(1/4^m\). \(\square \)