Approximate Unitary t-Designs by Short Random Quantum Circuits Using Nearest-Neighbor and Long-Range Gates

We prove that $\mathrm{poly}(t)\cdot n^{1/D}$-depth local random quantum circuits with two-qudit nearest-neighbor gates on a $D$-dimensional lattice with $n$ qudits are approximate $t$-designs in various measures. These include the "monomial" measure, meaning that the monomials of a random circuit from this family have expectation close to the value that would result from the Haar measure. Previously, the best bound was $\mathrm{poly}(t)\cdot n$ due to Brandão–Harrow–Horodecki (Commun Math Phys 346(2):397–434, 2016) for $D=1$. We also improve the "scrambling" and "decoupling" bounds for spatially local random circuits due to Brown and Fawzi (Scrambling speed of random quantum circuits, 2012).
One consequence of our result is that, assuming the polynomial hierarchy (PH) is infinite and that certain counting problems are #P-hard "on average", sampling within total variation distance from these circuits is hard for classical computers. Previously, exact sampling from the outputs of even constant-depth quantum circuits was known to be hard for classical computers under these assumptions. However, the standard strategy for extending this hardness result to approximate sampling requires the quantum circuits to have a property called "anti-concentration", meaning roughly that the output has near-maximal entropy. Unitary 2-designs have the desired anti-concentration property. Our result improves the required depth for this level of anti-concentration from linear depth to a sub-linear value, depending on the geometry of the interactions. This is relevant to a recent experiment by the Google Quantum AI group to perform such a sampling task with 53 qubits on a two-dimensional lattice (Arute et al. in Nature 574(7779):505–510, 2019; Boixo et al. in Nat Phys 14(6):595–600, 2018) (and related experiments by USTC), and confirms their conjecture that $O(\sqrt{n})$ depth suffices for anti-concentration.
The proof is based on a previous construction of $t$-designs by Brandão et al. (2016), an analysis of how approximate designs behave under composition, and an extension of the quasi-orthogonality of permutation operators developed by Brandão et al. (2016). Different versions of the approximate design condition correspond to different norms, and part of our contribution is to introduce the norm corresponding to anti-concentration and to establish equivalence between these various norms for low-depth circuits. For random circuits with long-range gates, we use different methods to show that anti-concentration happens at circuit size $O(n\ln^2 n)$, corresponding to depth $O(\ln^3 n)$. We also show a lower bound of $\Omega(n\ln n)$ for the size of such circuits in this case.
We also prove that anti-concentration is possible in depth $O(\ln n \ln\ln n)$ (size $O(n\ln n \ln\ln n)$) using a different model.


Introduction
Random unitaries are central resources in quantum information science. They appear in many applications including algorithms, cryptography, and communication. Moreover, they are important toy models for random chaotic systems, capturing phenomena like thermalization or scrambling of quantum information.
An idealized model of a random unitary is the uniform distribution over the unitary group, also known as the Haar measure. However, the Haar measure is an unrealistic model for large systems because the number of random coins and gates needed to generate an element of the Haar distribution scales exponentially with the size of the system (i.e. polynomially with the dimension, meaning exponentially in the number of qubits or independent degrees of freedom). To resolve this dilemma, approximate $t$-designs have been proposed as physically and computationally realistic alternatives to the Haar measure. They approximate the behavior of the Haar measure if one only cares about the first $t$ moments.
Several constructions of $t$-designs have been proposed based on either random or structured circuits. While structured circuits can in some cases be more efficient [20,18,37], random quantum circuits have other advantages. They are plausible models for chaotic random processes in nature, such as scrambling in black holes [14,42], the spread of entanglement in condensed-matter systems [35,36], and decoupling [15]. Moreover, they are practical candidates for benchmarking the computational advantage of quantum computation over classical models, since they seem to capture the power of a generic polynomial-size unitary circuit. Indeed, the Google Quantum AI group has recently proposed running a random unitary circuit on a 49-qubit superconducting device and has argued that this should be hard to simulate classically [7] (see Figure 1 for a depiction of their proposal). Here the random gates are useful not only for the 2-design property, specifically "anti-concentration", but also for evading the sort of structure which would lend itself to easy simulation, such as being made of Clifford gates.
All previous random-circuit-based constructions of $t$-designs required the circuits to have linear depth. In this paper, we show that certain random circuit models with small depth are approximate $t$-designs. We consider two models of random circuits. The first is nearest-neighbor local interactions on a $D$-dimensional lattice. In this model, we apply random $U(d^2)$ gates on neighboring qudits of a $D$-dimensional lattice in a certain order.
Depending on the application, we can define convergence to the Haar measure in different ways. For example, for scrambling [14] we measure convergence w.r.t. the quantity $\mathbb{E}_C \big\| \rho_S(s) - \tfrac{\mathbb{1}}{2^{|S|}} \big\|_1^2$, where $\rho_S(s)$ is the density matrix $\rho(s)$ reduced to a subset $S$ of qudits and $\rho(s)$ is the quantum state that results from $s$ steps of the random process. But for anti-concentration, which corresponds loosely to a claim that typical circuit outputs have nearly maximal entropy, we use a norm related to $\mathbb{E}_C \sum_x |\langle x|C|0\rangle|^4$. For other measures of convergence to the Haar measure see [34] or Section 2.4. In general, these measures are equivalent, but moving between them involves factors that are exponentially large in the number of qudits: if one norm converges to $\varepsilon$, this implies only that another norm converges to $2^{O(n)}\varepsilon$. Some of the known size/depth bounds for designs are of the form $O(f(n,t)(n + \ln 1/\varepsilon))$ (e.g. [10]), and in 1-D simple arguments yield an $\Omega(n + \ln(1/\varepsilon))$ lower bound [14]. In this case, replacing $\varepsilon$ with $2^{-O(n)}\varepsilon$ will not change the asymptotic scaling.
However, in $D$-dimensional lattices the natural lower bound is $\Omega(n^{1/D} + \ln(1/\varepsilon))$. Our main challenge in this work is to show that this depth bound is asymptotically achievable, and along the way we need to deal with the fact that we can no longer freely pay norm-conversion costs of $2^{O(n)}$. We are able to achieve the desired $\mathrm{poly}(t)(n^{1/D} + \ln(1/\varepsilon))$ in many operationally relevant norms, but due in part to the difficulty of converting between norms, we do not establish it in all cases.
Approximate unitary designs. We will consider several notions of approximate designs in this paper. First, we introduce some notation. A degree-$(t,t)$ monomial in $C \in U((\mathbb{C}^d)^{\otimes n})$ is degree $t$ in the entries of $C$ and degree $t$ in the entries of $C^*$. We can collect all these monomials into a single matrix of dimension $d^{2nt}$ by defining $C^{\otimes t,t} := C^{\otimes t} \otimes C^{*\otimes t}$. We say that $\mu$ is an exact [unitary] $t$-design if expectations of all $(t,t)$ moments of $\mu$ match those of the Haar measure. We can express this succinctly in terms of the operator
$$G^{(t)}_\mu := \mathop{\mathbb{E}}_{C\sim\mu}\big[C^{\otimes t,t}\big].$$
Then $\mu$ is an exact $t$-design iff $G^{(t)}_\mu = G^{(t)}_{\mathrm{Haar}}$. Since $G^{(t)}_{\mathrm{Haar}}$ is a projector, we sometimes call $G^{(t)}_\mu$ a quasi-projector, and we will later use the fact that it can sometimes be shown to be very close to a projector.
Most definitions of approximate designs demand that some norm of $G^{(t)}_\mu - G^{(t)}_{\mathrm{Haar}}$ be small. Three norms that we will consider are based on viewing $G^{(t)}_\mu$ as either a vector of length $d^{4nt}$, a matrix of dimension $d^{2nt}$, or a quantum operation acting on a space of dimension $d^{nt}$. In each case, one can show that the $t$-design property implies the $t'$-design property for $1 \le t' \le t$.
Definition 1 (Monomial definition of t-designs). $\mu$ is a monomial-based $\varepsilon$-approximate $t$-design if every monomial has expectation within $\varepsilon\, d^{-nt}$ of that resulting from the Haar measure. In other words,
$$\big\|\mathrm{vec}\big(G^{(t)}_\mu - G^{(t)}_{\mathrm{Haar}}\big)\big\|_\infty \le \varepsilon\, d^{-nt},$$
where $\mathrm{vec}(A)$ is a vector consisting of the elements of matrix $A$ (in the computational basis) and $\|\cdot\|_\infty$ refers to the vector $\ell_\infty$ norm.
The monomial measure is natural when studying anti-concentration, since a sufficient condition for anti-concentration is that $\mathbb{E}_C |\langle 0|C|0\rangle|^4$ is close to the quantity that arises from the Haar measure, namely $\frac{2}{2^n(2^n+1)}$. This is achieved by [monomial measure] 2-designs.
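As a numerical sanity check on the Haar value quoted above, one can estimate $\mathbb{E}_C |\langle 0|C|0\rangle|^4$ for Haar-random unitaries by Monte Carlo for a few qubits. The sketch below is our own illustration (the function names are ours, and it is not part of the paper's proofs):

```python
import numpy as np

def haar_unitary(dim, rng):
    """Sample a Haar-random unitary: QR-decompose a complex Gaussian
    matrix, then fix the phases of R's diagonal so the law is exactly Haar."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # rescale columns by the diagonal phases

def fourth_moment_estimate(n_qubits, samples, seed=0):
    """Monte Carlo estimate of E_C |<0|C|0>|^4 over the Haar measure."""
    rng = np.random.default_rng(seed)
    dim = 2 ** n_qubits
    vals = [abs(haar_unitary(dim, rng)[0, 0]) ** 4 for _ in range(samples)]
    return float(np.mean(vals))
```

For $n$ qubits the Haar value is $2/(2^n(2^n+1))$; e.g. for $n=2$ it is $2/20 = 0.1$, and the estimator approaches this as the sample count grows.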
If the operator-norm distance between $G^{(t)}_\mu$ and $G^{(t)}_{\mathrm{Haar}}$ is small, then instead of calling $\mu$ an approximate design we call it a $t$-tensor-product expander [29]. This controls the rate at which certain nonlinear (i.e. degree-$t$ polynomial) functions of the state converge to the average value they would have under the Haar measure. We can also measure the distance between $G^{(t)}_\mu$ and $G^{(t)}_{\mathrm{Haar}}$ in the 1-norm (i.e. trace norm); this notion of approximate designs has been considered before [4,41], although it does not have direct operational meaning. We will show $\mathrm{poly}(t)(n^{1/D} + \ln(1/\varepsilon))$-depth convergence in each of these measures.
Finally, we can consider $G^{(t)}_\mu$ to be a superoperator using the following canonical map. Define
$$\mathrm{Ch}\big[G^{(t)}_\mu\big](\rho) := \mathop{\mathbb{E}}_{C\sim\mu}\big[C^{\otimes t}\, \rho\, (C^\dagger)^{\otimes t}\big].$$
Note that $\mathrm{Ch}[G^{(t)}_\mu]$ is completely positive and trace preserving, i.e., a quantum channel. For superoperators $M, N$ we say that $M \preceq N$ if $N - M$ is a completely positive (cp) map. Based on this ordering, a strong notion of being an approximate design was proposed by Andreas Winter and first appeared in [10].
Definition 2 (Strong definition of t-designs). A distribution $\mu$ is a strong $\varepsilon$-approximate $t$-design if
$$(1-\varepsilon)\,\mathrm{Ch}\big[G^{(t)}_{\mathrm{Haar}}\big] \;\preceq\; \mathrm{Ch}\big[G^{(t)}_\mu\big] \;\preceq\; (1+\varepsilon)\,\mathrm{Ch}\big[G^{(t)}_{\mathrm{Haar}}\big].$$
Circuit models. The result of [10] constructs $t$-designs in the strong measure (Definition 2) for $D=1$ and linear depth, and we generalize this result to construct weak monomial designs for arbitrary $D$ and $O(n^{1/D})$ depth. We also show that the same construction converges to the Haar measure in other norms: the diamond, infinity and trace norms. Our proof techniques do not seem to yield $t$-designs in the strong measure. We do not even know whether the construction of "strong" $t$-designs in sub-linear depth is possible. The second model we consider is circuits with long-range two-qubit interactions. In this model, at each step we pick a pair of qubits uniformly at random and apply a random $U(4)$ gate on them. This model is the standard one when considering bounded-depth circuit classes, such as QNC. Physically, it could model chaotic systems with long-range interactions. Following Oliveira, Dahlsten and Plenio [39] (see also [14,15,27]), we can map the $t=2$ moments of this process onto a simple random walk on the points $\{1, 2, \ldots, n\}$. We map this random walk to the classical (and exactly solvable) Ehrenfest model, meaning a random walk with a linear bias towards the origin. Further challenges are that this mapping introduces random and heavy-tailed delays and that the norm used for anti-concentration is exponentially sensitive to some of the probabilities. However, we are able to show (in Section 4) that after $O(n\ln^2 n)$ rounds of this process the resulting distribution over the unitary group converges to the Haar measure in the aforementioned norm.
For a distribution $p$, its collision probability is defined as $\mathrm{Coll}(p) = \sum_x p_x^2$. If $\mathrm{Coll}(p)$ is large ($\Omega(1)$) then the support of $p$ is concentrated around a constant number of outcomes, and if it is small ($\approx 1/2^n$) then it is anti-concentrated. The norm that we consider for anti-concentration is essentially the expected collision probability of the output distribution of a random circuit. The expected collision probability for the Haar measure is $\frac{2}{2^n+1}$, and our result shows that a typical circuit of size $O(n\ln^2 n)$ outputs a distribution with expected collision probability $\frac{2}{2^n}\big(1 + \frac{1}{\mathrm{poly}(n)}\big)$. Along with the Paley–Zygmund anti-concentration inequality, this result proves that these circuits have the following anti-concentration property:
$$\Pr_{C\sim\mu}\Big[|\langle x|C|0\rangle|^2 \ge \frac{1}{2^{n+1}}\Big] \ge \frac{1}{8} - \frac{1}{\mathrm{poly}(n)}.$$
Here $\mu$ is the distribution of random circuits we consider, and $x$ is any $n$-bit string. This bound is related to the hardness of classical simulation for random circuits. We furthermore show that sub-logarithmic-depth quantum circuits in this model have expected collision probability $\frac{2}{2^n+1}\cdot\omega(1)$. The best anti-concentration depth bound we get from this model is $O(\ln^2 n)$. However, we are able to construct a natural family of random circuits with depth $O(\ln n \ln\ln n)$ that are anti-concentrated.
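To make the collision-probability scale concrete, the following sketch (our own illustration, with our own function names) compares $\mathrm{Coll}(p)$ for the uniform distribution, $1/2^n$, with the expected value $\approx 2/(2^n+1)$ for the output distribution of a Haar-random state:

```python
import numpy as np

def collision_probability(p):
    """Coll(p) = sum_x p_x^2 for a probability vector p."""
    return float(np.sum(np.asarray(p) ** 2))

def haar_output_distribution(n_qubits, rng):
    """Output probabilities |<x|psi>|^2 of a Haar-random n-qubit state,
    sampled as a normalized complex Gaussian vector."""
    dim = 2 ** n_qubits
    psi = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    psi /= np.linalg.norm(psi)
    return np.abs(psi) ** 2
```

Averaging `collision_probability(haar_output_distribution(n, rng))` over many states approaches $2/(2^n+1)$, only a factor $\approx 2$ above the uniform-distribution value $1/2^n$, i.e., nearly maximal anti-concentration.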

Connections with quantum supremacy experiments
Outperforming classical computers, even on an artificial problem such as sampling from the output of an ideal quantum circuit, would be a significant milestone for early quantum computers; this goal has recently been called "quantum supremacy" [28,40]. The reason to study quantum supremacy in its own right (as opposed to general quantum algorithms) is that it appears to be a distinctly easier task than full-scale quantum computing, and even various non-universal forms [2,3,7,11,12,24] of quantum computing can be shown to be hard to simulate classically. For example, the outputs of constant-depth quantum circuits cannot be simulated exactly by classical computers unless PH collapses [43]. In general, families of quantum circuits have this property if they are universal under postselection, meaning that after measuring all the qubits at the end of the circuit and producing a string of bits, we condition on the values of some of these bits and use the remaining bits for the output.
However, these hardness results can be non-robust and it is not known whether simulating these distributions to within constant or even 1/ poly(n) variational distance is still hard. The main way known to achieve a robust hardness of sampling is to show that the outputs of the quantum circuit are "anti-concentrated" meaning that they have near-maximal entropy (see [13,3,1]). Approximate t-designs (and even approximate 2-designs) have the desired "anti-concentration" property.
For experimental verification of quantum supremacy we can consider the following sampling task: let $\mu$ be a distribution over random circuits that satisfies
$$\Pr_{C\sim\mu}\Big[|\langle 0|C|0\rangle|^2 \ge \frac{1}{2^{n+1}}\Big] \ge \frac{1}{8} - \frac{1}{\mathrm{poly}(n)} \qquad (6)$$
(which we call the anti-concentration property). Let $\mathcal{C}_x$ be the family of circuits constructed by first applying a circuit $C\sim\mu$ and then an $X$ gate to each qubit with probability $1/2$ (and identity with probability $1/2$). A similar line of reasoning as in Bremner–Montanaro–Shepherd (see Theorems 6 and 7 of [13]) implies:
Theorem 3. Fix $\varepsilon > 0$ and $0 < \delta < 1/8$. If there exists a BPP machine which takes $C\sim\mu$ as input and, for at least a $1-\delta$ fraction of such inputs, outputs a probability distribution that is within total variation distance $\varepsilon$ of the distribution $p_x = |\langle x|C|0\rangle|^2$, then there exists an $\mathsf{FBPP}^{\mathsf{NP}}$ algorithm that succeeds with probability $1-\delta$ and computes the value $|\langle 0|C'|0\rangle|^2$ within multiplicative error $\frac{2(\varepsilon + 1/\mathrm{poly}(n))}{\delta}$ for a $1/8 - \frac{1}{\mathrm{poly}(n)}$ fraction of circuits $C'\sim\mathcal{C}_x$.
As a result, if one conjectures that there exist $\varepsilon$ and $\delta$ for which the approximation task discussed in Theorem 3 above is #P-hard, then an efficient classical algorithm for sampling from such random quantum circuits would imply that PH is finite. This theorem is proved in Appendix A.
The improvement from linear to sub-linear depth for anti-concentration provided in this paper is likely to be significant for near-term quantum computers, which will be constrained both in the number of qubits ($n$) and the noise rate per gate ($\delta$). Due to the constraint on the number of qubits (say 50–100), quantum supremacy will only be possible without the overhead of error correction, since even the most efficient known schemes for fault-tolerant quantum computation reduce the number of usable qubits by more than a factor of two [17]. Thus a quantum circuit with $S$ gates will have an expected $S\delta$ errors. Recent work due to Yung and Gao [44] and the Google group [8] indicates that noisy random quantum circuits with $O(\ln n)$ random errors output distributions that are nearly uniform, and thus are trivially classically simulable. Thus $S$ can be at most $\ln(n)/\delta$. In proposed near-term quantum devices [5,21,38,7] we can expect $n \sim 10^2$ and $\delta \sim 10^{-2}$. Thus the $S = O(n\ln^2 n)$ bound for long-range interactions or the $S = O(n\sqrt n)$ bound for 2-D lattices from our work is much closer to being practical than the previous $S = O(n^2)$. (This assumes that the constants are reasonable. We have not made an effort to calculate them rigorously, but for the case of long-range interactions we present a heuristic suggesting that in fact $\approx \frac{5}{3}n\ln n$ gates are necessary and sufficient.)

Our models
We consider two models of random quantum circuits. The first involves nearest-neighbor local interactions on a $D$-dimensional lattice and the second involves long-range random two-qubit gates. The order of gates in the first model has some structure, but in the second model it is chosen at random. Hence, we can view the second model as the natural dynamics of an $n$-qubit system connected as a complete graph. We first define the following random circuit model for $D=1$, which was also considered in [10]:
Definition 4 (Random circuits on one-dimensional lattices). $\mu^{\mathrm{lattice},n}_{1,s}$ is the distribution over unitary circuits resulting from the following random process:
• Repeat $s$ times: apply independent random $U(d^2)$ gates to the qudit pairs $(1,2), (3,4), \ldots, (n-1,n)$, and then to the pairs $(2,3), (4,5), \ldots, (n-2,n-1)$.
This definition assumes that n is even but we modify it in the obvious way when n is odd. Another modification which would not change our results would be to put the qudits on a ring so that sites n and 1 are connected.
Building on this, we define the following distribution of random circuits on a two-dimensional lattice.
Definition 5 (Random circuits on two-dimensional lattices). Consider a two-dimensional lattice with $n$ qudits. Let $r_{\alpha,i}$ be the $i$th row of the lattice in direction $\alpha\in\{1,2\}$, for $1\le i\le \sqrt n$. For each $\alpha\in\{1,2\}$, let SampleAllRows($\alpha$) denote the following procedure (see Figure 2): for each $1\le i\le \sqrt n$, sample a random circuit from $\mu^{\mathrm{lattice},\sqrt n}_{1,s}$ and apply it to $r_{\alpha,i}$. Now define $\mu^{\mathrm{lattice},n}_{2,c,s}$ to be the distribution over unitary circuits resulting from the following random process:
• Repeat these steps $c$ times: apply SampleAllRows(1) and then SampleAllRows(2).
This distribution has depth $(2c+1)\cdot 2s$ and is related, but not identical, to the Google AI group's proposal [7]; see Figure 1. For our results on $t$-designs, we will take $c$ to be $\mathrm{poly}(t)$ and $s$ to be $\mathrm{poly}(t)\cdot\sqrt n$. We believe that our result can be extended to any natural family of circuits with nearest-neighbor interactions. We also assume for convenience that $\sqrt n$ is an integer, but believe that this assumption is not fundamentally necessary.
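The row/column gate ordering of Definition 5 can be sketched programmatically. The code below is our own illustrative reconstruction: the alternating even/odd "brickwork" layering inside each row is our assumption about the 1-D model of Definition 4, and the function names are ours.

```python
def brickwork_layers(sites, steps):
    """`steps` repetitions of an even layer then an odd layer of
    nearest-neighbor pairs on a line of `sites` qudits (assumed 1-D model)."""
    layers = []
    for _ in range(steps):
        layers.append([(i, i + 1) for i in range(0, sites - 1, 2)])
        layers.append([(i, i + 1) for i in range(1, sites - 1, 2)])
    return layers

def lattice_2d_schedule(side, c, s):
    """c rounds of SampleAllRows on rows then on columns of a side x side
    lattice.  All rows (or columns) run in parallel, so each time step is
    one merged layer.  Qudit (row, col) has index row*side + col."""
    schedule = []
    line = brickwork_layers(side, s)  # same pair pattern for every row/column
    for _ in range(c):
        for alpha in (0, 1):  # 0: rows, 1: columns
            for layer in line:
                merged = []
                for i in range(side):
                    for a, b in layer:
                        if alpha == 0:
                            merged.append((i * side + a, i * side + b))
                        else:
                            merged.append((a * side + i, b * side + i))
                schedule.append(merged)
    return schedule
```

For example, `lattice_2d_schedule(4, c=1, s=1)` has 4 parallel layers (even/odd over rows, then even/odd over columns), matching depth $2s$ per SampleAllRows call.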

Figure 1:
The architecture proposed by the quantum AI group at Google to demonstrate quantum supremacy consists of a 2D lattice of superconducting qubits. This figure depicts two illustrative timesteps in this proposal. At each timestep, 2-qubit gates (blue) are applied across some pairs of neighboring qubits.
Next, we give a recursive definition of our random circuit model on arbitrary $D$-dimensional lattices. We view a $D$-dimensional lattice as a collection of $n^{1/D}$ sub-lattices of size $n^{1-1/D}$, labeled $\xi_1, \ldots, \xi_{n^{1/D}}$. We label the rows of the lattice in the $D$th direction by $r_1, \ldots, r_{n^{1-1/D}}$. Definition 6 (Random circuits on D-dimensional lattices). $\mu^{\mathrm{lattice},n}_{D,c,s}$ is the distribution resulting from the following random process.
1. Repeat these steps $c$ times: apply an independent random circuit drawn from $\mu^{\mathrm{lattice},n^{1-1/D}}_{D-1,c,s}$ to each sub-lattice $\xi_i$, and then an independent random circuit drawn from $\mu^{\mathrm{lattice},n^{1/D}}_{1,s}$ to each row $r_j$.
Next, we define the model with long-range interactions on a complete graph.
Definition 7 (Random circuit models on complete graphs). $\mu^{\mathrm{CG}}_s$ is the distribution over unitary circuits resulting from the following random process.
• Repeat the following step $s$ times (view $s$ as $O(n\ln^2 n)$):
• Pick a random pair of qudits $(i, j)$ and apply a random $U(d^2)$ gate between them.
The size of the circuits in this ensemble is s.
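For small $n$ one can watch the collision probability of this ensemble converge toward the Haar value $2/(2^n+1)$ by direct statevector simulation. This is a numerical illustration under our own naming, not the paper's analysis:

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary via QR of a complex Gaussian, with phase fix."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def apply_gate(psi, gate, i, j, n):
    """Apply a 4x4 gate to qubits i < j of an n-qubit statevector."""
    psi = psi.reshape((2,) * n)
    psi = np.tensordot(gate.reshape(2, 2, 2, 2), psi, axes=[[2, 3], [i, j]])
    return np.moveaxis(psi, [0, 1], [i, j]).reshape(-1)

def mean_collision(n, size, circuits, seed=0):
    """Average Coll(C) over random complete-graph circuits of `size` gates."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(circuits):
        psi = np.zeros(2 ** n, dtype=complex)
        psi[0] = 1.0
        for _ in range(size):
            i, j = sorted(rng.choice(n, size=2, replace=False))
            psi = apply_gate(psi, haar_unitary(4, rng), int(i), int(j), n)
        total += float(np.sum(np.abs(psi) ** 4))
    return total / circuits
```

A size-0 circuit has $\mathrm{Coll} = 1$; as the number of gates grows, the ensemble average drops toward $2/(2^n+1)$ ($\approx 0.118$ for $n = 4$).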
Figure 2: The random circuit model in Definition 5. Each black circle is a qudit and each blue link is a random $SU(d^2)$ gate. The model does $O(\sqrt n\,\mathrm{poly}(t))$ rounds alternating between applying (1) and (2). Then for $O(\sqrt n\,\mathrm{poly}(t))$ rounds it alternates between (3) and (4). This entire loop is then repeated $\mathrm{poly}(t)$ times.

Our results
Our first result is the following.
Theorem 8. Let $s, c, n > 0$ be positive integers with $\mu^{\mathrm{lattice},n}_{2,c,s}$ defined as in Definition 5.
The three norms in the above theorem refer to the vector $\ell_\infty$ norm, the superoperator diamond norm $\|\cdot\|_\diamond$ (see Section 2.1), and the operator $S_\infty$ norm, also known simply as the operator norm.
Proof sketch for part 1. We first give a brief overview of the proof in [10] and explain why their construction requires a circuit of linear depth. Let $G_{i,i+1}$ be the quasi-projector for a random two-qudit gate applied to qudits $i$ and $i+1$, and let $G = \frac{1}{n-1}\sum_{i=1}^{n-1} G_{i,i+1}$. Then $G^s$ is the quasi-projector corresponding to a 1-D random circuit of size $s$. [10] observed that $G - G_{\mathrm{Haar}}$ corresponds to a certain local Hamiltonian, and $\Delta = 1 - \|G - G_{\mathrm{Haar}}\|_\infty$ is its spectral gap. The central technical result of [10] is the bound $\Delta \ge \frac{1}{n\cdot\mathrm{poly}(t)}$. As a result, $\|G^s - G_{\mathrm{Haar}}\|_\infty \le (1 - \frac{1}{n\cdot\mathrm{poly}(t)})^s$. In general $G - G_{\mathrm{Haar}}$ has rank $e^{O(n)}$, and in order to construct a strong approximate $t$-design (Definition 2), one needs to apply a sequence of expensive changes of norm that lose factors polynomial in the overall dimension of $G$, i.e., $e^{O(nt)}$. Thus, in order to compensate for such exponentially large factors, one needs to choose $s = O(n^2\cdot\mathrm{poly}(t))$, meaning depth growing linearly with $n$. Brown and Fawzi [14] furthermore observed that if $G$ is the quasi-projector corresponding to one step of a random circuit on a 2-D lattice, the spectral gap remains of order $\frac{1}{n\cdot\mathrm{poly}(t)}$, and using the same proof strategy one again needs linear depth.
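In symbols, the bookkeeping that forces linear depth in the argument just described runs roughly as follows (a sketch of the norm conversion, with constants suppressed):

```latex
\|G^s - G_{\mathrm{Haar}}\|_\infty
  \le \Big(1 - \tfrac{1}{n\,\mathrm{poly}(t)}\Big)^{s}
  \le e^{-s/(n\,\mathrm{poly}(t))},
\qquad\text{so}\qquad
e^{O(nt)}\, e^{-s/(n\,\mathrm{poly}(t))} \le \varepsilon
\;\Longleftarrow\;
s \ge n\,\mathrm{poly}(t)\,\big(O(nt) + \ln(1/\varepsilon)\big).
```

Here the $e^{O(nt)}$ factor is the cost of converting from the $\infty$-norm to the strong (cp-order) condition of Definition 2; the resulting size is $O(n^2\,\mathrm{poly}(t))$, i.e., depth linear in $n$.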
The new ingredient we contribute is to show that if $s = O(\sqrt n)$ and $c = \mathrm{poly}(n)$, one can replace $G^{(t)}_{\mu^{\mathrm{lattice},n}_{2,c,s}}$ with a certain quasi-projector $G'$ satisfying three properties: (1) $G'$ is close to $G_{\mathrm{Haar}}$ in operator norm, (2) $G' - G_{\mathrm{Haar}}$ has low rank, and (3) $G'$ is close to the true quasi-projector of our circuit ensemble. Combining (1) and (2), and then using (3), we obtain the desired bound on $\|\mathrm{vec}(G^{(t)}_\mu - G^{(t)}_{\mathrm{Haar}})\|_\infty$. This step requires certain changes of norm for which we only have to pay a factor like $e^{O(\sqrt n)}$, which we justify by bounding the ranks of the relevant intermediate operators. The factor of $1/d^{nt}$ comes from the fact that the Haar measure itself has monomial expectation values of this order (in fact as large as $t!/d^{nt}$, but we suppress the $t$-dependence in this proof sketch). We now briefly describe the construction of $G'$. Let $G_R$ (resp. $G_C$) be the projector corresponding to applying a Haar unitary to each row (resp. column) independently. Then $G' = (G_R G_C)^c$. Let $V_R$, $V_C$, and $V_{\mathrm{Haar}}$ be respectively the subspaces that $G_R$, $G_C$ and $G_{\mathrm{Haar}}$ project onto. In order to prove (1), in Section 3.6.1 we first use the fact that our circuits are computationally universal to argue that $V_C \cap V_R = V_{\mathrm{Haar}}$. We then prove that the angle between $V_R \cap V_{\mathrm{Haar}}^\perp$ and $V_C \cap V_{\mathrm{Haar}}^\perp$ is very close to $\pi/2$, i.e., $\approx \pi/2 \pm \frac{1}{d^{\sqrt n}}$. This implies that $G_C G_R = G_{\mathrm{Haar}} + P$, where $P$ is a small matrix in the sense that $\|P\|_\infty \approx 1/d^{\sqrt n}$. Choosing $c = \mathrm{poly}(n)$ we obtain (1). To show (2), it is not hard to see that the rank of $G' - G_{\mathrm{Haar}}$ is indeed $e^{O(\sqrt n)}$. For (3) we use the construction of $t$-designs from [10]. In particular, our random circuit model first applies an $O(\sqrt n)$-depth circuit to each row and then an $O(\sqrt n)$-depth circuit to each column, and repeats this for $\mathrm{poly}(n)$ rounds. The result of [10] implies that each of these rounds is effectively the same as applying a strong approximate $t$-design to the rows or columns of the lattice. We then analyze how these designs behave under composition in various norms and prove (3).
Our second result generalizes Theorem 8 to arbitrary dimensions. Theorem 9. There exists a value $\delta = 1/d^{\Omega(n^{1/D})}$ such that for some large enough $c$ depending on $D$ and $t$: For these spatially local circuits we also improve on some bounds in [14] and [15] about scrambling and decoupling, removing polylogarithmic factors. Here we give an informal statement of the result, with full details and definitions found in Section 6.
Theorem 10 (Informal). Random quantum circuits acting on $D$-dimensional grids composed of $n$ qubits are scramblers and decouplers in the sense of [14] and [15] after $O(D\cdot n^{1/D})$ steps.
Our last result concerns the fully connected model. If $s = O(n\ln^2 n)$ and $d = 2$, then $\mu^{\mathrm{CG}}_s$ satisfies the anti-concentration criterion (see (5)). We phrase our result in terms of the expected "collision probability" of the output distribution of $C\sim\mu^{\mathrm{CG}}_s$, from which (5) will follow using the Paley–Zygmund inequality. In particular, if $C$ is a quantum circuit on $n$ qubits starting from $|0^n\rangle$, the collision probability is
$$\mathrm{Coll}(C) = \sum_x |\langle x|C|0^n\rangle|^4.$$
For the Haar measure $\mathbb{E}_{C\sim\mathrm{Haar}}\,\mathrm{Coll}(C) = \frac{2}{2^n+1}$, and for the uniform distribution this value is $1/2^n$. In contrast, a depth-1 random circuit has expected collision probability $(2/5)^{n/2}$, which is exponentially larger than what we expect from the Haar measure.
Theorem 11. There exists a $c$ such that when $s > c\, n\ln^2 n$, the expected collision probability of $C\sim\mu^{\mathrm{CG}}_s$ is at most $\frac{2}{2^n}\big(1 + \frac{1}{\mathrm{poly}(n)}\big)$. Moreover, if $t \le \frac{1}{3c'}\, n\ln n$ for some large enough $c'$, then the expected collision probability after $t$ gates remains larger than the Haar value by an $\omega(1)$ factor.
Proof Sketch. For the upper bound, we translate the convergence time of the expected collision probability to the mixing time of a certain classical Markov chain (which we call $X_0, X_1, \ldots$). This Markov chain has also been considered in previous work [39,27,15]. Part of our contribution is to analyze this Markov chain in a new norm. The Markov chain has $n$ sites labeled $1, \ldots, n$, and from each site $x$ it can move only to $x-1$, $x$ or $x+1$. Such chains are known as "birth and death" chains; in our case the chain arises from representing the state of the system by a Pauli operator and taking $x$ to be the Hamming weight of that Pauli operator. It is known [39] that the probability of moving to site $x+1$ is $\approx \frac{6}{5}\frac{x(n-x)}{n^2}$ and the probability of moving to site $x-1$ is $\approx \frac{2}{5}\frac{x(x-1)}{n^2}$. The major difficulty in proving mixing for this Markov chain is that the norm in which we have to prove mixing is exponentially sensitive to small fluctuations (measured in either the 1-norm or the 2-norm). Indeed, given the starting condition, we would like to show that
$$\sum_k \Pr[X_t = k]\cdot 3^{-k} \qquad (13)$$
is $\le O(2^{-n})$. We can think of (13) as a weighted 1-norm on probability distributions. Our proof will compute the distribution of $X_t$ for $t = O(n\ln^2 n)$ nearly exactly. One distinctive feature of this chain is that when $k/n \ll 1$, the probability of moving is $O(k/n)$ and the chain is strongly biased to move towards the right. When $k$ reaches $\Omega(n)$, the chain becomes more like the standard discrete Ehrenfest chain, i.e., a random walk with a linear bias towards (in this case) $k = \frac{3}{4}n$. Thus the small-$k$ region needs to be handled separately. This is especially true for anti-concentration thanks to the $1/3^k$ term in (13), so that even a small probability of waiting for a long time in this region can have a large effect on the collision probability.
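The stationary behavior of this chain, and the weighted sum $\sum_k \Pr[X_t=k]\,3^{-k}$ (our reconstruction of the quantity in (13)), can be checked numerically for small $n$. The sketch below is our own illustration using the approximate transition rates quoted above; the target value $1/(2^n+1) \approx 2^{-n}$ follows from detailed balance, which gives stationary weights $\pi_k \propto \binom{n}{k}3^k$:

```python
import numpy as np
from math import comb

def transition_matrix(n):
    """Birth-and-death chain on Hamming weights {1,...,n} of a Pauli string,
    with the approximate up/down probabilities quoted from [39]."""
    P = np.zeros((n + 1, n + 1))
    for x in range(1, n + 1):
        up = (6 / 5) * x * (n - x) / n ** 2
        down = (2 / 5) * x * (x - 1) / n ** 2
        if x < n:
            P[x, x + 1] = up
        if x > 1:
            P[x, x - 1] = down
        P[x, x] = 1 - up - down
    P[0, 0] = 1.0  # weight 0 (the identity) is never reached from k >= 1
    return P

def weighted_norm(n, t):
    """Evolve Pr[X_0 = k] = C(n,k)/(2^n - 1) for t steps and return the
    weighted sum  sum_k Pr[X_t = k] * 3^(-k)."""
    q = np.zeros(n + 1)
    for k in range(1, n + 1):
        q[k] = comb(n, k) / (2 ** n - 1)
    P = transition_matrix(n)
    for _ in range(t):
        q = q @ P
    return float(sum(q[k] * 3.0 ** (-k) for k in range(n + 1)))
```

At stationarity the weighted sum equals $\sum_k \binom{n}{k}/(4^n-1) = 1/(2^n+1)$, i.e., essentially the Haar collision scale; the chain's drift also concentrates the weight near $k = \frac{3}{4}n$, as described above.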
The approach of [27,22,15] has been to relate the original Markov chain to an "accelerated" chain which is conditioned on moving at each step. The status of the original chain can be recovered from the accelerated chain by adding a geometrically distributed "wait time" at each step. Then standard tools from the analysis of Markov chains, such as comparison theorems and log-Sobolev inequalities, can be used to bound the convergence rate of the accelerated chain. Finally, it can be related back to the original chain by arguing that the accelerated chain is unlikely to spend too long on small values of k, allowing us to bound the wait time. For our purposes, this process does not produce sharp enough bounds, due to the heavy-tailed wait times combined with fairly weak bounds on how quickly the accelerated chain converges and leaves the small-k region.
We will sharpen this approach by accelerating incompletely; i.e., we will couple the original chain to a chain that moves with a carefully chosen (but always $\Omega(1)$) probability. In particular, we will introduce a chain where the probabilities of moving from $x$ to $x-1$, $x$ or $x+1$ are each affine functions of $x$. In fact our new "accelerated" chain is only accelerated for $x < \frac{5}{6}n$ and is actually more likely to stand still for $x \ge \frac{5}{6}n$. This will allow us to exactly solve for the probability distribution of the accelerated chain after any number of steps, using a method of Kac to relate this distribution to the solution of a differential equation. Our solution can be expressed simply in terms of Krawtchouk polynomials, which have appeared in other exact solutions of random processes on the hypercube. We relate this back to the original chain with careful estimates of the mean and large-deviation properties of the wait time. This ends up showing only that the collision probability is small for $t$ in some interval $[t_1, t_2]$; to show that it is small at a specific time, we need to prove that the collision probability decreases monotonically when we start in the state $|0^n\rangle$. A further subtlety is that (13) technically only applies when all qubits have been hit by gates, and we need to extend this analysis to include the non-negligible probability that some qubits have never been acted on by a gate.
Because previous works achieved quantitatively less sharp bounds, they could omit some of these steps. For example, [22,27] used O(n^2) gates, which meant that the probability of most bad events was exponentially small. By contrast, in depth O(n ln^2(n)), there is probability n^{−O(ln n)} of missing at least one qubit, and so we cannot afford to let this be an additive correction to our target collision probability of constant · 2^{−n}. Likewise, [15] used only O(n ln^2(n)) gates but achieved a collision probability of 2^{−n+n^ε} for a small constant ε, which allowed them to use a simpler version of the accelerated chain whose convergence they bounded using generic tools from the theory of Markov chains.
For the lower bound we just consider the event that the initial Hamming weight does not change throughout the process. The initial state with Hamming weight k has probability mass Pr[X_0 = k] = binom(n, k)/(2^n − 1). Starting with Hamming weight k, the probability of not moving in each step is e^{−O(k/n)}, so if t = c n ln n for a constant c then we have Pr[X_t = k | X_0 = k] ≥ e^{−O(kt/n)}. Our lower bound is based on the following intuition for how circuits of depth s < n^{1/D} should behave. A crude model for such circuits would be to treat them as n/s^D independent Haar-random unitaries, each on s^D qubits. In this case their collision probability would be ≈ 2^{n/s^D} · 2^{−n}. Our lower bound asymptotically matches this intuition. We believe that this should be tight (up to some constant in the exponent) throughout the range 1 ≤ s ≤ n^{1/D}, but in our upper bound we focus only on the s ∼ n^{1/D} end of the range. Our intuition is that after depth s any two non-overlapping clusters of s^D qubits will be close to Haar random. There are n/s^D such clusters, so a simple calculation suggests the mentioned asymptotic bound. One therefore expects that after depth s ≈ n^{1/D} the distribution across the full lattice becomes close to Haar. The goal of this work is to prove this intuition rigorously.
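The crude block model can be made quantitative using the standard fact that a Haar-random pure state on m qubits has expected collision probability 2/(2^m + 1); a minimal numerical sketch:

```python
# Crude block model for a depth-s circuit on n qubits in D dimensions:
# treat it as n/s^D independent Haar-random unitaries on s^D qubits each.
# Collision probabilities multiply across independent blocks, giving
# roughly 2^{n/s^D} * 2^{-n}, versus ~ 2 * 2^{-n} for global Haar.

def haar_collision_probability(m):
    """Exact expected collision probability sum_x p(x)^2 when measuring a
    Haar-random pure state on m qubits: 2 / (2^m + 1)."""
    return 2.0 / (2 ** m + 1)

def block_model_collision(n, s, D):
    """Collision probability of the block model (assumes s^D divides n)."""
    block = s ** D
    assert n % block == 0
    return haar_collision_probability(block) ** (n // block)

# Example: n = 64 qubits, D = 1, depth s = 4 -> 16 clusters of 4 qubits.
p_model = block_model_collision(64, 4, 1)
p_haar = haar_collision_probability(64)
```

As expected, `p_model` exceeds the global Haar value until the cluster size reaches the whole system.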
A natural question is whether there is a common generalization of our Theorems 9 and 11. In physics, the D → ∞ limit is often considered a good proxy for the fully connected model. This raises the question of whether we needed Theorem 11 to handle the fully connected case, or whether it would suffice to use Theorem 9 in the large-D limit. However, Theorem 9 works only for D = O(ln n/ ln ln n), and the best depth bound we can extract from it is e^{O(ln n/ ln ln n)}, which is far above the O(ln^2(n)) achievable by Theorem 11. However, in Section 5 we give an alternative proof of anti-concentration of outputs via circuits on D-dimensional lattices with t = 2 and D = O(ln n). Using that approach we can make the depth as small as O(ln n ln ln n). We conjecture that O(ln n) depth should also be possible.
In order to establish rigorous bounds, our results involve some inequalities that are not always tight. As a result, the upper bound on collision probability in Theorem 11 has a factor of 29 rather than the 2 + o(1) that we would expect and the bound on the number of gates required may be too high by a factor of ln(n).
Since determining the precise number of gates needed for anti-concentration may have utility for near-term quantum hardware, we also undertake a heuristic analysis of what depth seems to be required to achieve anti-concentration. Here we ignore the possibility of large fluctuations in the wait time, for example, and simply set it equal to its expected value. We also freely make the continuum approximation for the biased random walk that ignores wait time, obtaining the Ornstein–Uhlenbeck process. The resulting analysis (found in Section 4.6) suggests that (5/3) n ln n + o(n ln n) gates are needed to achieve anti-concentration comparable to the Haar measure.
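The continuum approximation can be illustrated on a toy chain with affine drift, whose mean relaxes exactly geometrically and is well approximated by the Ornstein–Uhlenbeck exponential (the rates below are placeholders, not those derived in Section 4.6):

```python
import math

# Toy version of the continuum approximation: a birth-death chain with
# affine expected drift E[X_{t+1} - X_t | X_t = x] = a - b*x relaxes to its
# equilibrium mean mu = a/b like an Ornstein-Uhlenbeck process.
a, b = 0.75, 0.001
mu = a / b  # equilibrium mean of the chain

def exact_mean(x0, t):
    """Exact mean of the discrete affine-drift chain after t steps."""
    return mu + (x0 - mu) * (1.0 - b) ** t

def ou_mean(x0, t):
    """Ornstein-Uhlenbeck (continuum) approximation of the same mean."""
    return mu + (x0 - mu) * math.exp(-b * t)
```

The two expressions differ only by the O(b) gap between (1 − b)^t and e^{−bt}, which is negligible in the regime of interest.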

Previous work
In [27] Harrow and Low (HL) considered random quantum circuits on a complete graph and showed that after linear depth these random quantum circuits become approximate 2-designs (with a missing piece of the proof provided by Diniz and Jonathan [22]). In [10] Brandão–Harrow–Horodecki (BHH) extended this result and showed that on a 1-D lattice these random quantum circuits become ε-approximate t-designs after depth t^{10} · O(n + ln(1/ε)). Both of these results directly imply anti-concentration after the mentioned depths. The construction of t-designs in [10] uses a stronger measure than the one in HL [27]. The gap of the second-moment operator was calculated exactly for D = 1 and for fully connected circuits by Žnidarič [45], and a heuristic estimate for the t-th moment operator was given by Brown and Viola for fully connected circuits [16].
In [15,14] Brown and Fawzi considered "scrambling" and "decoupling" with random quantum circuits. In particular, they showed that on a D-dimensional lattice scrambling occurs in depth O(n^{1/D} polylog(n)), and for complete graphs they showed that after polylogarithmic depth these circuits exhibit both decoupling and scrambling. For the case of D-dimensional lattices they showed that, for the Markov chain K, after depth n^{1/D} polylog(n) a string of Hamming weight 1 gets mapped to a string of linear Hamming weight with probability 1 − 1/poly(n). While this result is related to ours, it does not seem to yield the results we need, e.g. for anti-concentration, due to the powers of Hilbert space dimension that are lost when changing norms.
In [36,35] Nahum, Ruhman, Vijay and Haah considered operator spreading for random quantum circuits on D-dimensional lattices. They considered the case where a single Pauli operator starts at a given point of the lattice, and they analyzed the probability that after a certain time a non-identity Pauli operator appears at an arbitrary point of the lattice. For D = 1 they showed that this probability function satisfies a biased diffusion equation; their result in this case is exact. For D = 2 they explained, both numerically and theoretically, that this probability function spreads as an almost circular wave whose front satisfies the one-dimensional Kardar–Parisi–Zhang equation. They moreover explained that: 1) the bulk of the occupied region is in equilibrium, 2) fluctuations of size ∼ t^{1/3} appear at the boundary of this region, and 3) the area of the occupied region grows like t^2, where t is the depth of the circuit. As far as we understand, this result does not directly lead to the construction of t-designs, and rigorous bounds on the quality of the approximations made in that paper are not known.
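A caricature of the D = 1 front dynamics (with a placeholder front velocity p, not a value fitted from [36,35]) illustrates ballistic growth of the front with diffusive fluctuations:

```python
import math

# Minimal caricature of the D = 1 operator-spreading picture: the right edge
# of the occupied region advances by one site with probability p per step,
# so the front position after t steps is Binomial(t, p): the mean grows
# ballistically while fluctuations grow only like sqrt(t).
p = 0.6  # placeholder front velocity

def front_mean(t):
    """Ballistic growth of the mean front position."""
    return p * t

def front_std(t):
    """Diffusive O(sqrt(t)) fluctuations of the front position."""
    return math.sqrt(t * p * (1.0 - p))
```

Here std/mean decays like t^{−1/2}; in D = 2 the analogous front fluctuations instead grow like t^{1/3}, the KPZ exponent quoted above.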
If we assume that qudits have infinite local dimension (d → ∞), then the evolution of Pauli strings on a 2-D lattice is closely related to Eden's model [23], for which Eden found certain explicit solutions. However, apart from the d → ∞ limit, his model differs from ours also in that it considers only a single occupied starting site and runs for a time much less than the graph diameter (or equivalently, considers an infinitely large 2-D lattice), while we consider the initial distribution obtained by starting in the |0^n⟩ state.

Open questions
1. Is it possible to construct "strong" t-designs (Definition 2) using sub-linear depth random circuits?
If we can show that the off-diagonal moments (see Definition 34) of the distribution, which have expectation zero under the Haar measure, become smaller than 1/d^{3nt} in sub-linear depth, then our construction of monomial designs implies the construction of strong designs. On the other hand, we cannot rule out the possibility that strong designs require linear depth.
2. Can we tighten and generalize our bounds for the case of a complete graph? We conjecture that such random circuits of size s = (5/3) n(ln n + ε) are O(ε)-approximate 2-designs. Right now the best we can prove is that circuits of size O(n ln^2 n) have the same (certain) moments as Haar up to constant factors. Can we generalize this closeness within constant factors to all moments?
3. Is the dependency of our results on either n or t tight? At the moment the best lower bound is Ω(t ln n) depth for any circuit. It seems likely that the degree of the polynomial in t is wasteful. Intriguingly, for constant n and with a different gate model, some results are known that are completely independent of t [9].
4. If we pick an arbitrary graph and apply random gates on the edges of this graph, after what depth do these circuits become t-designs? We conjecture that if the graph has large expansion and diameter l, then the answer is O(l). However, if the graph has a tight bottleneck (like a binary tree), then even though the graph has small diameter, we suspect that certain measures of t-designs (including the monomial measure) require linear depth. Ideally, the t-design time for any graph could be related to other properties of the graph such as mixing time, cover time, etc.
5. Can we prove a comparison lemma for random circuits, i.e., can we show that if two random circuits are close to each other, then they become t-designs after roughly the same amount of time? Such a comparison lemma might imply that other natural families of low-depth circuits are approximate t-designs. A related question is whether deleting random gates from a circuit family can ever speed up convergence to being a t-design. Such a bound has been called a "censoring" inequality in the Markov-chain literature.
6. Our results do not say much about the actual constants that appear in the asymptotic bounds on the size required for anti-concentration. We conjecture that the leading term in the anti-concentration time for random circuits on complete graphs is (5/3) n ln n. See Conjecture 1 in Section 4.6 for a precise statement and some justification.
For the D-dimensional case our bounds inherit constant factors from [10]. Simple numerical simulation and also the analysis of [36,35,7] suggest that the constant should be ≈ 1.
7. For the case of D-dimensional circuits, our result does not say much about the dynamics of the distribution at depths below n^{1/D}. Such a result might explain the dynamics of entanglement in random circuits. [36,35] consider this problem for the case when a single Pauli operator starts at the middle of the lattice; however, their result does not apply to arbitrary initial operators.

Basic definitions
We need the following norms. Definition 12. For a superoperator E the diamond norm [32] is defined as ‖E‖_⋄ := sup_d ‖E ⊗ id_d‖_{1→1}, where for a superoperator A the 1 → 1 norm is defined as ‖A‖_{1→1} := sup_{X ≠ 0} ‖A(X)‖_1 / ‖X‖_1. A matrix is called positive semi-definite (psd) if it is Hermitian and has all non-negative eigenvalues. A superoperator A is called completely positive (cp) if for any d ≥ 0, A ⊗ id_d maps psd matrices to psd matrices. A superoperator is called trace-preserving completely positive (tpcp) if it preserves the trace and is furthermore cp.
Definition 13. Let S be a set of qudits. Haar(S) is the Haar measure on U((C^d)^{⊗|S|}). We write Haar(i, j) for the two-qudit Haar measure on the qudits indexed by i and j; also, if m is an integer, the notation Haar(m) means the Haar measure on m qudits.
We now define expected monomials, moment superoperators and quasi-projectors for a distribution µ over the unitary group. Definition 14. Let n, t > 0 be positive integers and let µ be any distribution over the n-qudit unitary group. Using this definition we will also use the following quantities: 1. Let i_1, j_1, . . . , i_t, j_t, k_1, l_1, . . . , k_t, l_t ∈ [d]^n be any 2t-tuple of words in [d]^n. Then the (i_1, . . . , l_t) monomial is the expected value of a balanced monomial of µ, defined as E_{C∼µ}[C_{i_1,j_1} · · · C_{i_t,j_t} C̄_{k_1,l_1} · · · C̄_{k_t,l_t}], where C_{a,b} is the (a, b) entry of the unitary matrix C.

Next, we define the building blocks of our t-design constructions.
For the 2-D lattice we write g_R and g_C for g_1 and g_2, respectively.
The following defines the moment superoperator and quasi-projector of the Haar measure on the rows of a D-dimensional lattice in a specific direction.
Definition 17 (Idealized model with Haar projectors on rows). Let 1 ≤ α ≤ D be one of the directions of a D-dimensional lattice. For a 2-D lattice we use G_R and G_C for G_1 and G_2, respectively.
Next, we define moment operators and projectors corresponding to the Haar measure on the sub-lattices of a D-dimensional lattice. We view a D-dimensional lattice as a collection of n^{1/D} smaller lattices, each of dimension D − 1 and composed of n^{1−1/D} qudits. We label these sub-lattices by Planes(D) := {p_1, . . . , p_{n^{1/D}}}.
In particular, for a distribution µ over circuits of size s the expected collision probability is defined accordingly. Remark 1. For d = 2, t = 2, and when ν is the Haar measure on U(4), Ch[G_{(i,j)}] is an explicit map in the Pauli basis; more generally, if S is a collection of qubits and p, q ∈ {0, 1, 2, 3}^S, then the action of Ch[G] on Pauli strings can be written down explicitly. See [39,27] for the proofs of these remarks.

Operator definitions of the models
Definition 20 (Random circuits on a two-dimensional lattice). The quasi-projector of µ_{lattice,n,2,c,s}. The generalization of this definition to arbitrary D dimensions is given by Definition 21 (Recursive definition for random circuits on D-dimensional lattices): the quasi-projector of µ_{lattice,n,D,c,s} is specified by a recursive formula. It will be useful for our proofs to also define G̃_{n,D,c,s}, which is the same as G_{µ_{lattice,n,D,c,s}} except that we have replaced g^s_{Rows(D,n)} with G_{Rows(D,n)}. Definition 20 is a special case of Definition 21, but we include both for convenience.

Summary of the definitions
See below for a summary of the definitions:
- Haar(i, j): Haar measure on qudits i and j (Definition 13)
- G_{(i,j)}: Haar projector of order t on qudits i and j (Definition 14)
- Rows(α, n): the collection of rows of a lattice (with n points) in the α direction (Definition 15)
- g_{Rows(α,n)}: two-qudit gates applied to even then odd neighbors in each row in the α direction (Definition 16)
- g_{r(α,i)}: two-qudit gates applied to even then odd neighbors in the i-th row in the α direction (Definition 16)
- g_R and g_C: g_{Rows(1,n)} and g_{Rows(2,n)} when D = 2 (Definition 16)
- G_{Rows(α,n)}: Haar projector applied to each row in the α-th direction (Definition 17)
- G_R and G_C: Haar projector applied to each row (column) of a 2-D lattice (Definition 17)
- G_{Planes(α)}: Haar projector applied to each plane perpendicular to the direction α
- ∠(A, B) := cos^{−1} max_{x∈A, y∈B} ⟨x, y⟩: the angle between two vector spaces A and B (Section 3.6.1)

Elementary tools
If A is a matrix and σ_i are the singular values of A, then for p ∈ [1, ∞) the Schatten p-norm of A is defined as ‖A‖_p := (Σ_i σ_i^p)^{1/p}. Ch is the linear map from matrices to superoperators such that for any two equally sized matrices A and B, Ch[A ⊗ B̄](X) := A X B^†; it satisfies Ch[A ⊗ B̄] Ch[C ⊗ D̄] = Ch[AC ⊗ (BD)‾], for any equally sized matrices A, B, C, D.
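For concreteness, a small sketch computing Schatten p-norms of a 2 × 2 real matrix from its singular values (obtained here by hand from the eigenvalues of AᵀA, to keep the example dependency-free):

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]] via the eigenvalues of A^T A."""
    # A^T A = [[a^2 + c^2, ab + cd], [ab + cd, b^2 + d^2]]
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    tr, det = p + r, p * r - q * q
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return math.sqrt((tr + disc) / 2), math.sqrt(max((tr - disc) / 2, 0.0))

def schatten(a, b, c, d, p):
    """Schatten p-norm ||A||_p = (sigma_1^p + sigma_2^p)^(1/p)."""
    s1, s2 = singular_values_2x2(a, b, c, d)
    return (s1 ** p + s2 ** p) ** (1.0 / p)
```

As a sanity check, p = 2 recovers the Frobenius norm and p = 1 the trace norm, with ‖A‖_p non-increasing in p.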

Consider the Haar measure over U(d). Ch[G^{(t)}_{Haar}] (defined in the previous section) is the projector onto the matrix vector space of permutation operators (permuting length-t words over the alphabet [d]). In particular, for any matrix X ∈ C^{d^t × d^t} we can write Ch[G^{(t)}_{Haar}](X) = Σ_{π∈S_t} Tr[V(π)^† X] W(π), where V(π) is the permutation matrix Σ_{(i_1,...,i_t)∈[d]^t} |i_1, . . . , i_t⟩⟨i_{π(1)}, . . . , i_{π(t)}|, and W(π) is a linear combination of permutations, specifically W(π) = Σ_{σ∈S_t} α(π^{−1}σ) V(σ). Here the coefficients α(·) are known as Weingarten functions (see [19]). If µ, ν ∈ S_t then let dist(µ, ν) denote the number of transpositions needed to generate µ^{−1}ν from the identity permutation. Then we can define α(·) by the following relation.
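The permutation action V(π)|x_1, . . . , x_t⟩ = |x_{π^{−1}(1)}, . . . , x_{π^{−1}(t)}⟩ and the representation property V(π)V(σ) = V(π ∘ σ), on which the Weingarten calculus relies, can be checked directly on basis tuples; a small self-contained sketch:

```python
from itertools import permutations, product

def apply_V(pi, x):
    """Apply V(pi) to the basis tuple x, where pi[k] = pi(k) (0-indexed):
    V(pi)|x_1..x_t> = |x_{pi^{-1}(1)} .. x_{pi^{-1}(t)}>."""
    inv = [0] * len(pi)
    for k, v in enumerate(pi):
        inv[v] = k                      # inv = pi^{-1}
    return tuple(x[inv[k]] for k in range(len(x)))

def compose(pi, sigma):
    """(pi o sigma)(k) = pi(sigma(k))."""
    return tuple(pi[sigma[k]] for k in range(len(pi)))

# Check V(pi) V(sigma) = V(pi o sigma) on every basis tuple for d = 2, t = 3.
d, t = 2, 3
ok = all(
    apply_V(pi, apply_V(sigma, x)) == apply_V(compose(pi, sigma), x)
    for pi in permutations(range(t))
    for sigma in permutations(range(t))
    for x in product(range(d), repeat=t)
)
```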
Note that α(π) is always real. Let A, B be matrices. For the superoperator D ≡ B Tr[A ·] we use the notation D = B A^*. We need the following observation and lemma. Lemma 23. If A is a (possibly rectangular) matrix, then AA^† and A^†A have the same nonzero spectra.
Lemma 24. If A and B are matrices and · * is a unitarily invariant norm, then AB * ≤ A * B ∞ .
Proof. This lemma can be viewed as a consequence of the Russo–Dye theorem, which states that the extreme points of the unit ball for ‖·‖_∞ are the unitary matrices. Thus we can write B = ‖B‖_∞ Σ_i p_i U_i for {p_i} a probability distribution and {U_i} a set of unitary matrices. We use this fact along with the triangle inequality and then unitary invariance to obtain the claim. A similar argument applies to superoperators.
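As a quick numeric sanity check of Lemma 24 (not a proof), one can compare the trace norm of AB against ‖A‖_1 ‖B‖_∞ for explicit 2 × 2 matrices:

```python
import math

# Numeric check of ||AB||_1 <= ||A||_1 * ||B||_inf for 2x2 real matrices.

def svals(m):
    """Singular values of a 2x2 matrix via the eigenvalues of A^T A."""
    (a, b), (c, d) = m
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    tr, det = p + r, p * r - q * q
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return math.sqrt((tr + disc) / 2), math.sqrt(max((tr - disc) / 2, 0.0))

def matmul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 2.0], [3.0, -1.0]]
B = [[0.5, -0.3], [0.7, 0.2]]
lhs = sum(svals(matmul(A, B)))        # ||AB||_1  (trace norm)
rhs = sum(svals(A)) * max(svals(B))   # ||A||_1 * ||B||_inf
```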
Lemma 25. If A is a superoperator and B is a tpcp superoperator then AB ≤ A .
Proof. Let d be ≥ the input dimensions of both A and B. Then and so AB is maximizing over a set which is contained in the set maximized over by A .
These give rise to the following well-known bound, which often is called "the hybrid argument." This is also true for superoperators and the diamond norm, if each superoperator is a tpcp map.
We will need a similar bound for tensor products.
Lemma 27. Suppose ‖A − B‖_* ≤ ε for some norm ‖·‖_* that is multiplicative under tensor products. Then for any integer M > 0, ‖A^{⊗M} − B^{⊗M}‖_* ≤ Mε · max(‖A‖_*, ‖B‖_*)^{M−1}. The same holds for superoperators and the diamond norm; in particular, for tpcp maps the bound becomes Mε. We need the following definition and lemma. Definition 28. Let X and Y be two real-valued random variables on the same totally ordered sample space, such that Pr[X ≥ a] ≤ Pr[Y ≥ a] for every a. We represent this by X ⪯ Y.
Lemma 29 (Coupling). X ⪯ Y if and only if there exists a coupling (a joint probability distribution) between X and Y such that the marginals of this coupling are exactly X and Y and, with probability 1, X ≤ Y.
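A minimal sketch of how such a coupling can be realized: feed one shared uniform sample through the generalized inverse CDFs of both distributions (the pmfs below are illustrative):

```python
import random

def inv_cdf(pmf, u):
    """Generalized inverse CDF of a pmf over {0, 1, ..., len(pmf)-1}:
    the smallest k with CDF(k) >= u."""
    acc = 0.0
    for k, p in enumerate(pmf):
        acc += p
        if u <= acc:
            return k
    return len(pmf) - 1

pmf_X = [0.5, 0.3, 0.2]   # X puts more mass on small values
pmf_Y = [0.2, 0.3, 0.5]   # Y stochastically dominates X (CDF_X >= CDF_Y)

# Sharing the uniform u couples (X, Y) with the correct marginals and
# X <= Y pointwise, as Lemma 29 promises.
random.seed(0)
ok = all(inv_cdf(pmf_X, u) <= inv_cdf(pmf_Y, u)
         for u in (random.random() for _ in range(1000)))
```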

Various measures of convergence to the Haar measure
Definition 30. Let µ be a distribution over n-qudit gates. Let be a positive real number.
or equivalently if The first is cp ordering and the second is psd ordering.

(Monomial definition) µ is a monomial-based ε-approximate t-design if for any balanced monomial
Here for a matrix A, vec(A) is a vector consisting of the entries of A (in the computational basis).

(Diamond definition) µ is an ε-approximate t-design in the diamond measure if
4. (Trace definition) µ is an ε-approximate t-design in the trace measure if 6. (Anti-concentration) µ is an approximate anti-concentration design if 7. (Approximate scramblers) µ is an ε-approximate scrambler if for any density matrix ρ and subset S of qubits with |S| ≤ n/3, where ρ_S(C) = Tr_{\S} CρC^† and Tr_{\S} is the trace over the subset of qubits complementary to S.
We consider two definitions. In the first definition the initial state is φ_{MM′} ⊗ φ_{AA′}; in the second we consider the reduced density matrix on M′S after the application of C ∼ µ.
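The anti-concentration benchmark implicit in these definitions is the Haar value of the collision probability, 2/(N + 1) for Hilbert space dimension N; a Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import random, math

# Monte Carlo check: for a Haar-random pure state on N = 2^n dimensions the
# expected collision probability sum_x p(x)^2 equals 2/(N+1), i.e. roughly
# twice the minimum possible value 1/N -- the hallmark of anti-concentration.

def haar_state(N, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

def collision_prob(state):
    """Collision probability of the output distribution p(x) = |<x|psi>|^2."""
    return sum(abs(z) ** 4 for z in state)

rng = random.Random(1234)
N = 8  # n = 3 qubits
est = sum(collision_prob(haar_state(N, rng)) for _ in range(2000)) / 2000
exact = 2.0 / (N + 1)
```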

Approximate t-designs by random circuits with nearest-neighbor gates on D-dimensional lattices
In this section we prove Theorems 8 and 9, which state that our random circuit models defined for D-dimensional lattices (Definition 5) form approximate t-designs in several measures. We begin in Section 3.1 by outlining some basic utility lemmas. The technical core of the proof is contained in the lemmas of Section 3.2, in which we bound various norms of products of Haar projectors onto overlapping sets of qubits. The lemmas of Sections 3.1 and 3.2 are proved in Sections 3.5 and 3.6, respectively. We show how to use these lemmas to prove our main theorems in Section 3.3 (for a 2-D grid) and in Section 3.4 (for a lattice in D > 2 dimensions).

Basic lemmas
In this section we state some utility lemmas, which are largely independent of the details of our circuit models.

Comparison lemma for random quantum circuits
Our comparison lemma is simply the following:

Lemma 32 (Comparison). Suppose we have the following cp ordering between superoperators
Corollary 33 (Overlapping designs). If K_1, . . . , K_t are respectively the moment superoperators of ε_1-, . . . , ε_t-approximate strong k-designs, each on a potentially different subset of qudits, then

Bound on the value of off-diagonal monomials
We first formally define an off-diagonal monomial.
Definition 34 (Off-diagonal monomials). A diagonal monomial of balanced degree t of a unitary matrix C is a balanced monomial that can be written as a product of absolute squares, i.e., |C_{a_1,b_1}|^2 · · · |C_{a_t,b_t}|^2. A monomial is off-diagonal if it is balanced and not diagonal.
We now define the set of diagonal indices D and the set of off-diagonal indices O. We note that a diagonal monomial can be written as Tr(C^{⊗t,t} x) for some x ∈ D, and similarly an off-diagonal monomial can be written as Tr(C^{⊗t,t} x) for some x ∈ O, where C^{⊗t,t} := C^{⊗t} ⊗ C̄^{⊗t}.
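The identification of diagonal monomials with such traces can be checked directly in the smallest case t = 1, where Tr((C ⊗ C̄)(|b⟩⟨a| ⊗ |b⟩⟨a|)) factorizes into ⟨a|C|b⟩ · ⟨a|C̄|b⟩ = |C_{a,b}|^2; a sketch with an explicit 2 × 2 unitary:

```python
import cmath

# Degree t = 1 diagonal monomial written as a trace: the tensor-product
# trace factorizes into a product of the two single-copy matrix elements.

theta = 0.7
C = [[cmath.exp(1j * 0.3) * cmath.cos(theta), -cmath.sin(theta)],
     [cmath.sin(theta), cmath.exp(-1j * 0.3) * cmath.cos(theta)]]  # unitary
a, b = 0, 1
trace_form = C[a][b] * C[a][b].conjugate()  # Tr((C (x) conj(C)) |b><a| (x) |b><a|)
monomial = abs(C[a][b]) ** 2                # the diagonal monomial |C_{a,b}|^2
```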
We relate the strong definition of designs to the monomial definition via the following lemma.
Lemma 35. Let δ > 0, and assume Ch[G^{(t)}_µ] and Ch[G^{(t)}_ν] are two moment superoperators that satisfy the following completely positive ordering. Let O and D be respectively the sets of off-diagonal and diagonal indices for monomials. Then

Bound on the moments of the Haar measure
We need the following bound on the t-th monomial moment of the Haar measure. Assume we have m qudits.

Lemma 36 (Moments of the Haar measure). Let G^{(t)}_{Haar(m)} be the quasi-projector operator for the Haar measure on m qudits. Then the following bound holds, where the maximization is taken over matrix elements in the computational basis.

Gap bounds for the product of overlapping Haar projectors
Here we state several lemmas that address the specific structure of our circuit models. We need the following results: In these last two lemmas, we see that c will need to grow with t. We believe that a sharper analysis could reduce this dependence, but since we already have a poly(t) dependence in s, improving Lemmas 40 and 41 would not make a big difference. In fact, even in 1-D, [10] found a sharp n dependence, but their factor of poly(t) (which we inherit) is probably not optimal.

Proof of Theorem 8; t-designs on two-dimensional lattices
Theorem (Restatement of Theorem 8). Let s, c, n > 0 be positive integers, with µ_{lattice,n,2,c,s} defined as in Definition 5. Proof.
1. This item corresponds to convergence of the individual moments of the Haar measure. A balanced moment of a distribution µ can be written as below, where |i⟩ := |i_1, . . . , i_t⟩ and similarly for |i′⟩, |j⟩, |j′⟩. The same moment can also be written in a second form. We will see that the strong design condition gives us strong bounds, first for the "diagonal" case (i = i′, j = j′) and then for the off-diagonal case. This is because when we interpret G^{(t)}_µ as a quantum operation, the diagonal monomials correspond to Tr[Y G^{(t)}_µ(X)] for psd matrices X, Y, and so the strong design condition applies directly. For the off-diagonal moments we need to do a bit more work.
For each of the diagonal and off-diagonal monomials, our strategy will be to first compare with the entries of G_R (G_C G_R)^c G_R and then to compare to G_Haar.
First observe that since Ch[G^{(t)}_{µ_{lattice,n,2,c,s}}] = (g^s_R g^s_C)^c g^s_R and s = poly(t) · (√n + ln(1/δ)), Corollary 6 of [10] implies that each g^s_i for i ∈ {R, C} is a δ-approximate t-design. Hence, using Corollary 33, note that we chose poly(t) large enough so that the error is as small as δ/(4 · t!). This choice will be helpful later.
Focusing first on diagonal monomials |i⟩⟨i|, |j⟩⟨j| we can bound In other words, for diagonal monomials Similarly, using the first inequality in (47) The next step is to bound In the third line we have used Hölder's inequality. In the last inequality we have used the fact that G_1 is a tensor product of G_{r_{1,i}} across each column in the first direction; by symmetry we can just consider G_{r_{1,1}}.
Using Lemma 36 Furthermore, using Lemma 37 therefore As a result, for some large enough As a result, using Lemma 36 any diagonal monomial satisfies Next, we bound the expected off-diagonal monomials of the distribution. The value of the off-diagonal monomials under the Haar measure is zero, so it is enough to bound max_{x∈O} |Tr G where O is the set of off-diagonal indices for moments. In order to do this we use Lemma 35 for µ = µ_{lattice,n,2,c,s} and ν a distribution with moment superoperator Ch Here D is the set of diagonal monomials. Using (55) In order to bound max_{x∈O} Tr( therefore using (55), (57) and (59) we conclude 2. Ch In the first line we have used the triangle inequality and the definition K In the second line, for the first term we have used Lemma 26 and that all operators are compositions of moment superoperators; for the second term we have used Lemma 40. In the third inequality we have used Lemma 27. In the fourth inequality, the first term (δ/2) comes from Lemma 3 and Corollary 6 of [10] for s = poly(t) · (√n + ln(1/δ)), and the second δ/2 is by the 3.
In the second and fourth inequalities where we have used 4.
These steps follow from the proof of part 1.

Proof of Theorem 9; t-designs on D-dimensional lattices
Throughout this section we treat D and t as constants.
Theorem (Restatement of Theorem 9). There exists a value δ = 1/d^{Ω(n^{1/D})} such that for some large enough c depending on D and t: Proof.

Consider the moment superoperator for the
Therefore, we next use Lemma 39. This lemma, along with the bound in (65) and Lemma 36, proves the stated bound for diagonal monomials. Next, we bound the off-diagonal monomials max_{x∈O} |Tr G. Similarly to (59) we can show, therefore using (67), (68) and (69) we conclude the bound for any monomial M. We use induction to show that ε_{D,n} = 1/d^{Ω(n^{1/D})} for any integers n and D. This is true for D = 2 by Theorem 8. Assuming ε_{D−1,n} = 1/d^{Ω(n^{1/(D−1)})} for any n, we show that ε_{D,n} = 1/d^{Ω(n^{1/D})}.
The third line is by the triangle inequality. The fourth inequality is by Lemma 26. The fifth line is by Lemma 27 and the definition of G̃_{n,D,c,s}. The sixth line is by Lemma 3 and Corollary 6 of [10], which assert that after depth linear in the side length n^{1/D}, the random circuit model we consider is an ε-approximate t-design in the diamond measure, and that ε can be made exponentially small in n^{1/D}.
We first relate this expression to the superoperator Ch[G_{Planes(D)}]. Using the triangle inequality and Lemma 26: To bound (1), observe that each term contains at least one Z_1. We would like to bound ‖Z_1‖_1. Observe that G_{Planes} = G^{(t)⊗n^{1/D}}_{Haar(n^{1−1/D})}, so This final expression is ≤ d^{−Ω(n^{1/D})} for n sufficiently large relative to d, t, D. Eq. (75) uses the induction hypothesis as well as the fact that G_{Haar(m)} is a projector of rank ≤ t! for any m. (In fact this is an equality when m ≥ log_d t.) This last fact is standard and can be found in Lemma 17 of [10], with the relevant mathematical background in [25,26].
Haar has rank t!^{O(n^{1/D})}, so the cost of moving to the infinity norm is moderate: We now bound G_{Planes(D)} g^s_{Rows(D,n)} − G These steps follow from the proof of part 2.
3.5 Proofs of the basic lemmas stated in Section 3.1

Comparison lemma for random quantum circuits
Lemma (Restatement of Lemma 32). Suppose we have the following cp ordering between superoperators. Proof. We first prove the following claim. Claim: If A ⪰ B and C ⪰ D in the cp ordering, then AC ⪰ BD. Proof. The class of cp maps is closed under composition and addition. Therefore clearly this is true for i = 1. Suppose also this is true for 1 < k < t.
Proof. The class of cp maps is closed under composition and addition. Therefore Clearly this is true for i = 1. Suppose also this is true for 1 < k < t.
Corollary (Restatement of Corollary 33). If K_1, . . . , K_t are respectively the moment superoperators of ε_1-, . . . , ε_t-approximate strong k-designs, each on a potentially different subset of qudits, then Ch G

Bound on the value of off-diagonal monomials
Lemma (Restatement of Lemma 35). Let δ > 0. Assume that Ch[G^{(t)}_µ] and Ch[G^{(t)}_ν] are two moment superoperators that satisfy the following completely positive ordering. Let O and D be respectively the sets of off-diagonal and diagonal indices for monomials. Let |Φ⟩ be the n-qudit maximally entangled state, and N = d^n.
We use the following standard lemma, which we leave without proof (see e.g. [10]). Lemma 42. Let µ and ν be two distributions over the n-qudit unitary group. Then Ch[G is a psd matrix.
We now apply Lemma 42 to prove Lemma 35. First, and Therefore, since Ch[G_ν], the following matrix is psd. We use the following fact about psd matrices, which we leave without proof. Fact. If A is psd, then the maximum absolute value of an off-diagonal entry of A is at most the maximum diagonal entry.
Then using the above fact Hence then using this in (93) Here the maximization is taken over matrix elements in the computational basis like y = |i_1, . . . , i_t, i′_1, . . . , i′_t⟩⟨j_1, . . . , j_t, j′_1, . . . , j′_t|. Each label (e.g.
Proof. First observe that The below lemma concludes the proof.
If the moment is not balanced the expectation is zero and hence the bound still holds. Here we have used a closed-form expression for E|C_{a_i,b_i}|^{2k}; see Corollary 2.4 and Proposition 2.6 of [19] for a reference.
3.6 Proofs of the projector overlap lemmas from Section 3.2

3.6.1 Extended quasi-orthogonality of permutation operators with application to random circuits on 2-dimensional lattices

In this section we prove Lemma 37. Lemma (Restatement of Lemma 37).
First, we need a description of the subspaces that the projectors G_R, G_C and G_Haar project onto. Consider a √n × √n square lattice with n qudits as the collection of points A. We use the following interpretation of the Hilbert space a quasi-projector acts on; this interpretation is also used in [10]. Here we assume each point of A consists of t pairs of qudits, each with local dimension d. Thereby, the lattice becomes the Hilbert space H := ⊗_{(x,y)∈A} C^{d^{2t}}_{(x,y)}, which has dimension d^{2tn}.
We are interested in a certain subspace of H, and in order to understand it we need the following notation. For each point (x, y) ∈ A and each permutation π ∈ S_t we assign the quantum state |ψ_π⟩ := (I ⊗ V(π)) |Φ_{d,t}⟩, where |Φ_{d,t}⟩ is the maximally entangled state, V(π) : |x_1, x_2, . . . , x_t⟩ → |x_{π^{−1}(1)}, x_{π^{−1}(2)}, . . . , x_{π^{−1}(t)}⟩ is a representation of S_t, and S_t is the symmetric group on t elements.
Given these definitions, define the following basis states in H: |R_{π_1,...,π_{√n}}⟩ and |C_{π_1,π_2,...,π_{√n}}⟩ for each √n-tuple of permutations (π_1, π_2, . . . , π_{√n}). Denote by H_{t,n} the subset consisting of tuples of permutations in which not all of the permutations are equal; for example, tuples like (π, π, . . . , π) are not contained in this set. Notice that these basis vectors are not orthogonal to each other, and if t > d^n they are not even linearly independent.
We need the following definition of a Gram matrix. Definition 44 (Gram matrix). Let v_1, . . . , v_m be normalized vectors that are not necessarily orthogonal to each other. Then the Gram matrix corresponding to this set of vectors is the matrix J with entries J_{ij} = ⟨v_i, v_j⟩. We also need the following lemma.

Lemma 45 (Perron–Frobenius). If A is a (not necessarily symmetric) d-dimensional matrix, then: Let G_R, G_C and G_Haar be the quasi-projectors defined in Section 2.1. From [10] we know that G_R, G_C and G_Haar are indeed projectors onto V_R, V_C and V_Haar, respectively. Define the inner-product matrix Q between V_R and V_C with entries: The goal is to prove the stated bound; this basically means that the composition of G_R and G_C is close to G_Haar.
Also let c_{d,n,t} = 1 − t(t−1)/(2d^{√n}), a number very close to 1.
Proof of Proposition 46. We use the following result of Jordan. Proposition 49 (Jordan). If P and Q are two projectors, then the Hilbert space V they act on can be decomposed, as a direct sum, into one-dimensional and two-dimensional subspaces, each of which is invariant under the action of both P and Q simultaneously.
This implies Corollary 50. There are orthonormal bases e_1, . . . , e_K, f_1, . . . , f_K, q_1, . . . , q_T, and angles 0 ≤ θ_1 ≤ θ_2 ≤ . . . ≤ θ_K ≤ π/2 such that: and and In other words, both G_R and G_C can be decomposed into 2 × 2 blocks, each corresponding to one of the angles θ_i, such that on such a block G_C looks like Hence G_C G_R looks like Let |x⟩ be the vector with entries x_{π_1,...,π_{√n}}. Similarly, a typical vector inside Ṽ_C can be represented as |ψ_y⟩ = Σ_{π∈H_{t,n}} y_π |C_π⟩ / (Σ_{π,σ∈H_{t,n}} y_π y_σ ⟨C_π|C_σ⟩)^{1/2}. Also represent the corresponding vector |y⟩ similarly.
For the second line we used the following proposition. Proposition 51. If J̃ is the Gram matrix for the basis states |R_·⟩ or |C_·⟩ in (101) and (102) for Ṽ_R or Ṽ_C, then for any |x⟩ with ‖x‖_2 = 1: For the third line we used Cauchy–Schwarz.
In order to prove Proposition 48 we need the following tool. If x_1, x_2, . . . , x_K are d-dimensional vectors, their multi-product is defined to be multiprod(x_1, . . . , x_K) := Σ_{i=1}^d x_{1,i} x_{2,i} · · · x_{K,i}. Proposition 52 (Majorization). Let x_1, x_2, . . . , x_K be d-dimensional, non-negative, real vectors. If x_i^↓ is x_i sorted in descending order, then multiprod(x_1, . . . , x_K) ≤ multiprod(x_1^↓, . . . , x_K^↓). Proof. The K = 2 version of the claim is that ⟨x_1, x_2⟩ ≤ ⟨x_1^↓, x_2^↓⟩. This is a standard fact. To prove it, observe that WLOG we can assume x_1 = x_1^↓. Then for any out-of-order pair x_{2,i} < x_{2,j} with i < j, we do not decrease ⟨x_1, x_2⟩ by swapping x_{2,i} and x_{2,j}. Applying this repeatedly we end with ⟨x_1^↓, x_2^↓⟩. The same argument works if we replace the inner product with a sum over the first d′ ≤ d terms, i.e. Σ_{i=1}^{d′} x_{1,i} x_{2,i}; thus the same argument shows the corresponding partial-sum inequality. The proposition now follows by induction on K.
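Proposition 52 is easy to check numerically; the following sketch compares a multi-product before and after sorting each vector in descending order:

```python
from functools import reduce

def multiprod(vectors):
    """multiprod(x_1,...,x_K) = sum_i x_{1,i} * x_{2,i} * ... * x_{K,i}."""
    d = len(vectors[0])
    return sum(reduce(lambda acc, v: acc * v[i], vectors, 1.0)
               for i in range(d))

xs = [[0.2, 0.9, 0.5], [0.7, 0.1, 0.4], [0.3, 0.8, 0.6]]
sorted_xs = [sorted(v, reverse=True) for v in xs]
lhs = multiprod(xs)          # original ordering
rhs = multiprod(sorted_xs)   # every vector sorted in descending order
```

By Proposition 52, `lhs <= rhs` for any non-negative inputs.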
Proof of Proposition 51. We will prove the statement for the row space; the same argument works for the column space. First, for any normalized vector $|x\rangle$, $\langle x|\tilde{J}_R|x\rangle \ge \lambda_{\min}(\tilde{J}_R)$. Let $J(\sqrt n)$ be the Gram matrix for the Haar subspace on one row of the grid; its entries are as displayed. Let $P$ be the projector that projects out the subspace spanned by $\{|R_{\pi,\ldots,\pi}\rangle : \pi \in S_t\}$. Then $\tilde{J} = P\, J(\sqrt n)^{\otimes \sqrt n}\, P^\dagger$. We first need the following proposition.

Proposition 55. If $J$ is the Gram matrix of the vector space spanned by $\{|\psi_\pi\rangle^{\otimes m} : \pi \in S_t\}$, then the displayed bound holds.

Using this proposition, $\lambda_{\min}(J(\sqrt n)) \ge 1 - \frac{t(t-1)}{2d^{\sqrt n}}$, and therefore $\lambda_{\min}\!\left(J(\sqrt n)^{\otimes \sqrt n}\right) \ge \left(1 - \frac{t(t-1)}{2d^{\sqrt n}}\right)^{\sqrt n}$. This means that, restricted to $\tilde{V}_R$, the minimum eigenvalue of $\tilde{J}_R$ is at least this quantity.

Proof of Proposition 54. Let $C = \{\sigma_1, \ldots, \sigma_M\}$. Then $h = h_1 + h_2$, with $h_1$ and $h_2$ as displayed. We then find useful upper bounds for $h_1$ and $h_2$ separately. Suppose that $C$ has distinct elements $\{\tau_1, \ldots, \tau_K\}$, with $\tau_1$ appearing $\mu_1$ times, $\tau_2$ appearing $\mu_2$ times, etc. We can then bound $h_1$ as displayed. Here $\mathrm{conv}$ denotes the convex hull, and (135) uses the fact that $K \ge 2$, since $\sigma_1, \ldots, \sigma_M$ are assumed to be not all equal. To justify (135), observe that $f(\mu) = D^{\mu_1} + \cdots + D^{\mu_K}$ is a convex function and the maximization is over a convex set whose extreme points lie in $P$; therefore the maximum is achieved at a point in $P$.
In order to find a bound on $h_2$, for each $\sigma \in C$ we define the following vector $X_\sigma$, whose entries are labeled by $\pi \in S_t$.
Then $h_2 = \mathrm{multiprod}(X_{\sigma_1}, \ldots, X_{\sigma_M})$, and we can use Proposition 52 to bound this quantity. We also define $X_e$ (where $e$ denotes the identity element of $S_t$) by $X_{e,\pi} = D^{-\mathrm{dist}(e,\pi)}$.
Observe that $X_\sigma$ can be obtained from $X_e$ by zeroing out the entries in locations corresponding to $C$ and reordering the remaining entries. Thus for each $\sigma \in C$ the displayed comparison holds, and we use Proposition 52 again to bound the result.

Extended quasi-orthogonality of permutation operators with application to random circuits on D-dimensional lattices
In this section we prove Lemmas 38, 39, 40 and 41. Before getting to the proofs we go over some notation and definitions. Let $\mathrm{Rows}(D,n) := \{r_1, \ldots, r_{n^{1-1/D}}\}$ be the set of rows in the $D$-th direction and let $V_{\mathrm{Rows}(D,n)}$ be the subspace that $G_{\mathrm{Rows}(D,n)}$ projects onto. Then $V_{\mathrm{Rows}(D,n)} = V_{\mathrm{Haar}(r_1)} \otimes \cdots \otimes V_{\mathrm{Haar}(r_{n^{1-1/D}})}$. A spanning set for $V_{\mathrm{Rows}(D,n)}$ is $H_{\mathrm{Rows}(D,n)} := \{|D_{\sigma_1,\ldots,\sigma_{n^{1-1/D}}}\rangle : \sigma_1, \ldots, \sigma_{n^{1-1/D}} \in S_t\}$. Here $V_{\mathrm{Haar}(S)}$ is the Haar subspace (like $V_{\mathrm{Haar}}$) on a subset of qudits $S$, and $|D_{\sigma_1,\ldots,\sigma_{n^{1-1/D}}}\rangle$ is the basis state of maximally entangled states on each qudit such that the qudits in the first row are permuted by $\sigma_1$, the qudits in the second row are permuted by $\sigma_2$, and so on. Similarly, we use the spanning set $\{|F_{\pi_1,\ldots,\pi_{n^{1-1/D}}}\rangle : \pi_1, \ldots, \pi_{n^{1-1/D}} \in S_t\}$, where $|F_{\pi_1,\ldots,\pi_{n^{1-1/D}}}\rangle$ is the basis state of maximally entangled states on each qudit such that the qudits in $p_1$ are permuted by $\pi_1$, the qudits in $p_2$ are permuted by $\pi_2$, and so on. Here $1/c_{D,d,n,t}$ is a lower bound on $\lambda_{\min}(J_{\mathrm{Planes}(D)})\,\lambda_{\min}(J_{\mathrm{Rows}(D,n)})$.
Next, assuming the displayed bound on the maximum over $|x\rangle$, the proof is very similar to the above calculation: in the third line we have used Lemma 24, and we skip the calculations after the third line because they are similar to those of (169).
Next, we prove Lemma 41. Lemma 40 is a special case of this lemma, so we skip its proof.
Using the definition of $\Lambda$ we can write the displayed expression. It is enough to compute $\|\Lambda\|_\infty$. Let $\{|a\rangle\}$ be an orthonormal basis labeled according to the indices of $\Lambda$.
First of all, using Lemma 23, $TT^\dagger$ and $T^\dagger T$ have the same spectra; hence the displayed identity holds. In Lemma 38 we showed that $G_{\mathrm{Rows}(D,n)}\, G_{\mathrm{Columns}(D)}\, G_{\mathrm{Rows}(D,n)}$ is close to the Haar projector up to an error decaying with $d^{n^{1/D}}$, which is bounded by $1/2$ for large enough $n$ and constant $t$ and $D$. As a result, $\|\Lambda\|_\infty = 1/d^{O(c n^{1-1/D})}$. Combining this with (175) we obtain the claimed bound.

O(n ln^2 n)-size random circuits with long-range gates output anticoncentrated distributions

Recall that for a circuit $C$, $\mathrm{Coll}(C)$ is the collision probability of $C$ in the computational basis. Also recall that $\mu^{(CG)}_t$ is the distribution over random circuits obtained by applying $t$ random long-range gates. Unlike the previous section, where we used $t$ to denote the degree of a monomial, here we use $t$ for time, i.e. the number of time-steps in a random circuit.
The goal of this section is to prove the following theorem.

Theorem (Restatement of Theorem 11). There exists a constant $c$ such that when $t > c\, n \ln^2 n$, the stated upper bound on the expected collision probability holds. Moreover, if $t \le \frac{1}{3c'}\, n \ln n$ for some large enough constant $c'$, then the stated lower bound holds.

Our strategy is to relate the convergence of the expected collision probability to a classical Markov chain mixing problem. In Section 4.1 we go over the notation and definitions we use in the proof of this theorem. In Section 4.2 we prove the theorem. This proof is based on several lemmas, which we prove in Sections 4.3 and 4.5.

Background: random circuits with long-range gates and Markov chains
Previous work [39,27,15,14] demonstrates that if we only care about the second moment of $\mu^{(CG)}_t$, then the corresponding moment superoperator is related to a certain classical Markov chain. In particular, the action of the moment superoperator on the basis $\mathcal{P}_{2^n} := \{\sigma_p \otimes \sigma_p : p \in \{0,1,2,3\}^n\}$ is a classical Markov chain. We now describe this connection.
We first start with some basic properties of moment superoperators.
1. If $\mu$ is a convex combination of $\mu_1, \ldots, \mu_K$, then $\mathrm{Ch}[G^{(2)}_\mu]$ is the same convex combination of $\mathrm{Ch}[G^{(2)}_{\mu_1}], \ldots, \mathrm{Ch}[G^{(2)}_{\mu_K}]$.
2. If $\mu$ is the composition of a circuit from $\mu_1$ with a circuit from $\mu_2$, then $\mathrm{Ch}[G^{(2)}_\mu]$ is the corresponding composition of $\mathrm{Ch}[G^{(2)}_{\mu_1}]$ and $\mathrm{Ch}[G^{(2)}_{\mu_2}]$.

Recall that $\mathrm{Ch}[G^{(2)}_{i,j}]$ denotes $\mathrm{Ch}[G^{(2)}_{U(4)}]$ applied to qubits $i$ and $j$. Since $\mu^{CG}_1$ is a convex combination of two-qubit random $U(4)$ gates, the first point above implies that $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_1}]$ is the corresponding average of the $\mathrm{Ch}[G^{(2)}_{i,j}]$, and since $\mu^{CG}_t$ is the $t$-fold composition of $\mu^{CG}_1$ with itself, the second item implies that $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_t}] = \left(\mathrm{Ch}[G^{(2)}_{\mu^{CG}_1}]\right)^t$ (186). The moment superoperator $\mathrm{Ch}[G^{(2)}_{U(4)}]$ has the displayed simple action on the Pauli basis. In particular, the action of $\mathrm{Ch}[G^{(2)}_{U(4)}]$ on the Pauli basis $\mathcal{P}_{2^2}$ is a stochastic matrix, and for any pair $i \ne j$ the action of $\mathrm{Ch}[G^{(2)}_{U(4)}]$ on qubits $i, j$ can be represented by a stochastic matrix acting on $\mathcal{P}_{2^n}$. Using (186), $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_t}]$ is also a stochastic matrix. We can describe this stochastic matrix as a Markov chain on state space $S = \{0,1,2,3\}^n$, with $S_t \in S$ describing the string at time $t$.
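As a small illustration of this stochastic-matrix picture, the following sketch builds the $16 \times 16$ transition matrix for a single Haar-random $U(4)$ gate on two qubits, using the standard 2-design property that the identity Pauli pair is fixed while every non-identity pair maps to the uniform mixture over the 15 non-identity pairs:

```python
import numpy as np
from itertools import product

# Second-moment action of a Haar-random U(4) gate on the two-qubit Pauli
# basis {sigma_p (x) sigma_p : p in {0,1,2,3}^2}: the identity pair (0,0) is
# fixed, and every non-identity pair is mapped to the uniform mixture over
# the 15 non-identity pairs (the standard 2-design property of U(4)).
states = list(product(range(4), repeat=2))       # 16 two-qubit Pauli labels
T = np.zeros((16, 16))
for i, s in enumerate(states):
    if s == (0, 0):
        T[i, i] = 1.0                            # identity is a fixed point
    else:
        for j, s2 in enumerate(states):
            if s2 != (0, 0):
                T[i, j] = 1.0 / 15.0             # uniform over non-identity

assert np.allclose(T.sum(axis=1), 1.0)           # T is a stochastic matrix

# The uniform measure on the 15 non-identity pairs is stationary.
pi = np.full(16, 1.0 / 15.0)
pi[0] = 0.0
assert np.allclose(pi @ T, pi)
```

Averaging copies of this matrix over pairs $(i,j)$ and raising to the $t$-th power gives the stochastic matrix of the full chain on $\{0,1,2,3\}^n$.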
It turns out that the expected collision probability depends on the subset of qubits that have been hit by the random circuit. If a subset of $m$ qubits (out of $n$) never has a gate applied to it, then the expected collision probability converges to a value like $\approx \frac{2^m}{2^n}$ rather than $\frac{1}{2^n+1}$. So we need to separately track which qubits have been hit by a gate throughout this process. Let $H_t \in 2^{[n]}$ denote the set of qubits that have been hit by at least one gate by time $t$, where $2^{[n]}$ denotes the power set of $[n]$.
Together (S t , H t ) can be modeled as the following Markov chain.
We can use this notation since the RHS of (188) depends only on $|H|$, $t$, $n$, $k$ and not on $H$ itself.
For a function $f : [n] \to \mathbb{R}$ we define $\|\cdot\|_*$ to be the following norm:

Summary of the definitions
See below for a summary of the definitions:

Notation / Definition / Reference:
- $\mathrm{Coll}(C)$: the collision probability of circuit $C$
- the Haar projector of order $t$ on qudits $i$ and $j$: Definition 14
- $\mu^{CG}_t$: the distribution over circuits with $t$ random two-qubit gates
- steps of the accelerated Markov chain $Q$: Section 4.5
- $P$

Proof of Theorem 11: bound on the collision probability
Before giving the proof we state the following three main theorems. The first relates the expected collision probability to the $\|\cdot\|_*$ norm of the probability vector on the state space of the Markov chain of weights. More concretely:

Theorem 58. For $t > n \ln(n/\delta)/2$ the stated bound holds. This result is proved in Section 4.3.
The second theorem shows that for $t \approx n \ln^2 n$, $\|P^{(n)}_t\|_* \approx \mathrm{const} \times \frac{1}{2^n+1}$, where $\frac{1}{2^n+1}$ is the value of this norm at the stationary state.
Theorem 59. There exists a constant $c$ such that if $t = c\, n \ln^2 n$ then $\|P^{(n)}_t\|_*$ obeys the stated bound. This result is proved in Section 4.5.
The third theorem gives an exact expression for the collision probability in terms of the Markov chain S 0 , S 1 , . . .. We use this to compute the lower bound.
The proof of this expression is the same as that of equation (209) and is derived in Section 4.3.
Proof of Theorem 11. We first prove the upper bound. There are two major steps. Combining Theorems 58 and 59 and choosing $t = c\, n \ln^2 n$, we obtain the bound (191), involving terms of the form $\binom{n}{m} e^{-tm/n}\, \frac{28}{2^{n-m}}$. Here we need to assume $n$ is larger than some universal constant; this can be arranged by adjusting $c$ to cover the finite set of cases where $n$ is too small. For the lower bound we use the expression in Theorem 60 and bound it in terms of the probability that a string of Hamming weight $k$ does not change after one step of the Markov chain. Assuming $t \le \frac{1}{3c'}\, n \ln n$, the stated lower bound follows.

Proof of Theorem 58: relating collision probability to a Markov chain
In this section we relate the expected collision probability of a random circuit with long-range gates to the $\|\cdot\|_*$ norm of the probability vector $P^{(m)}_t$ defined in Section 4.1. We will prove several intermediate results along the way to Theorem 58.
Theorem 61 (Section 3 of [27]). We can write the moment superoperator in the displayed form.

Proof of Theorem 60. We can write the expected collision probability in terms of the moment superoperator $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_t}]$; we use the notation $\mathrm{Coll}_{\mu^{CG}_t}$ for this expectation. It is useful to write $|0^n\rangle\langle 0^n| \otimes |0^n\rangle\langle 0^n|$ and $\sum_{z \in \{0,1\}^n} |z\rangle\langle z| \otimes |z\rangle\langle z|$ in the Pauli basis. Then the collision probability becomes the displayed expression. Using Theorem 61 we obtain the next displayed equality, and as a result the stated formula. For a string $a \in \{0,1,2,3\}^n$ and a subset $A \in 2^{[n]}$, we let $a(A)$ denote the substring of $a$ restricted to $A$. In other words, conditioning on hitting all sites does not increase the collision probability by very much. (In fact, it seems likely to decrease it, but it is easier to prove the upper bound here.)

Proof. Our claim follows from the fact that for any event $E$ the displayed inequality holds.
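The identity behind restricting attention to $Z$-type Paulis can be checked directly: for any state $\rho$ on $n$ qubits, $\sum_z \langle z|\rho|z\rangle^2 = \frac{1}{2^n}\sum_{p\in\{0,3\}^n}\mathrm{Tr}(\rho\,\sigma_p)^2$. A minimal check on a fixed two-qubit circuit (the circuit itself is purely illustrative):

```python
import numpy as np

# Collision probability of rho equals (1/2^n) * sum over Z-type Pauli strings
# p in {I, Z}^n of Tr(rho sigma_p)^2 -- the content of keeping only
# alpha = sum_{p in {0,3}^n} sigma_p (x) sigma_p.
I = np.eye(2); Z = np.diag([1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]], dtype=float)

# A small fixed 2-qubit circuit acting on |00>, producing a Bell state.
U = CNOT @ np.kron(H, I)
psi = U @ np.array([1.0, 0.0, 0.0, 0.0])
rho = np.outer(psi, psi)

p = np.abs(psi) ** 2
coll_direct = np.sum(p ** 2)                     # sum_z p(z)^2

coll_pauli = 0.0
for a in (I, Z):
    for b in (I, Z):
        coll_pauli += np.trace(rho @ np.kron(a, b)) ** 2
coll_pauli /= 4                                  # 1 / 2^n with n = 2

assert np.isclose(coll_direct, coll_pauli)       # both equal 1/2 here
```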

Proof of Proposition 66: collision probability is non-increasing in time
When we try to recover the original chain from the accelerated chain, we find that $s$ steps of the accelerated chain typically correspond to $t = O(s)$ steps of the original chain, but with significant variance. This means that our bounds on the collision probability of the accelerated chain translate only into bounds for a distribution of running times of the original chain. This issue can be addressed using the following fact.
The moment superoperator $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_1}]$ corresponds to an average of $n(n-1)/2$ projectors (using the Hilbert–Schmidt inner product). Hence it is a psd matrix with maximum eigenvalue $\le 1$. Let $\alpha = \sum_{p \in \{0,3\}^n} \sigma_p \otimes \sigma_p$. Then (204) may be written in the displayed form. Using (206), terms of the form $\sigma_p \otimes \sigma_q$ for $p \ne q$ in the decomposition of $|0^n\rangle\langle 0^n| \otimes |0^n\rangle\langle 0^n|$ do not contribute to the collision probability. Therefore, using this observation and (221), the collision probability after $t$ steps is proportional to the displayed trace expression involving $\alpha$ and $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_1}]^t$. Since $\mathrm{Ch}[G^{(2)}_{\mu^{CG}_1}]$ has all eigenvalues between 0 and 1, we conclude that the collision probability after $t$ steps cannot increase in $t$.
This argument relied on the starting state being $|0^n\rangle$. There exist starting states, such as $|+\rangle^{\otimes n}$, for which the collision probability increases when random gates are applied.

Proof of Theorem 59: the Markov chain analysis
Consider the following birth-and-death Markov chain on the state space {0, 1, 2, . . . , n}.
This Markov chain is reducible in general; however, if we restrict the state space to $\{0\}$ or to $\{1, 2, \ldots, n\}$, it is irreducible. Consider the following initial distribution over the state space $\{1, 2, \ldots, n\}$, for which we claim the displayed identity.

Proof. The proof follows from the fact that $\Pr[\,|S_t| = l \mid |S_0| = k\,] = P^t(k, l)$, which was shown in Lemma 5.2 of [27].
We now prove Theorem 59, which gives a sharp upper bound on $\|P^{(n)}_t\|_*$. Throughout this section we drop the superscript $(n)$. Moreover, we use the notation $X_t := |S_t|$.
Proof overview: The philosophy of our analysis is to consider an acceleration of the chain $P$: a chain with transition matrix $Q$ which is the same as $P$ but moves faster. As mentioned in the introduction, previous work [27,14] considered a "fully accelerated" chain, but we will instead carefully choose the amount of acceleration so that the transition probabilities are affine functions of $x$. This will allow an exact solution of the dynamics of this partially accelerated chain using a method of Kac [31], as we describe in Section 4.5.4. We then analyze how much time $P$ should wait in each step of its walk in order to simulate steps of $Q$. To do this we prove bounds on how many times each site of the Markov chain is visited during the accelerated walk, and based on that we count how many steps the original chain should wait. This analysis is given in Section 4.5.2. Along the way, during the wait-time analysis, we further modify the partially accelerated chain to run in continuous time: at time $t$ we sample $t' \sim \mathrm{Pois}(t)$ (the Poisson distribution) and move $t'$ steps. The resulting chain is also exactly solvable; the solution turns out to be extremely simple and exemplifies the connection of the accelerated walk with the well-known Ornstein–Uhlenbeck process (see Proposition 75). We need to analyze the error incurred by moving to continuous time, which turns out to be a straightforward analysis of the Poisson distribution. Now suppose that the accelerated chain goes through a sequence of transitions $Y_0, Y_1, \ldots, Y_s$. Let $p(x) = P(x, x+1)$ and $q(x) = P(x, x-1)$. We first consider the chain $P$ conditioned on moving at every single step. At site $x$ this chain moves forward with probability $\frac{p(x)}{p(x)+q(x)}$ and backward with probability $\frac{q(x)}{p(x)+q(x)}$, and we can compute these probabilities explicitly. Such a chain is called accelerated. The chain $Q_a$ was used in [27,22,15], but we will not use it in this paper.
Instead of a fully accelerated chain, we now define a partially accelerated chain as displayed, for an arbitrary probability value $w(x)$. Setting $w(x) = \frac{2x}{3n-1}$, the partially accelerated chain becomes affine; by "affine" we mean that the transition probabilities are degree-1 polynomials in $x$. Let $X_0, X_1, \ldots$ be the steps of the Markov chain evolving according to the transition matrix $P$ and $Y_0, Y_1, \ldots$ be the Markov chain evolving according to $Q$. We now describe a coupling between these two.
For $x \ge \frac{5}{6}n$, let $\beta(x)$ be the solution to the displayed equation; we can solve for $\beta(x)$ explicitly. For $x \ge \frac{5}{6}n$ we have $\alpha(x) < 0$, so from the first expression for $\beta(x)$ we see that $\beta(x) > 0$. From the second expression for $\beta(x)$ we can calculate the upper bound $\beta(x) \le \frac{1}{4} + \frac{6}{5n}$.
Coupling 68. The following describes a coupling between X 0 , X 1 , . . . and Y 0 , Y 1 , . . .. It takes as input an arbitrary x ∈ [n]. We write A ← a to mean that we assign value a to variable A.
• Set X 0 ← x and Y 0 ← x.
• Repeat the following steps.
- If $\alpha(X_t) > 0$, then the displayed update is performed. In this case, the $X$ chain may move more slowly than the $Y$ chain, so one step of the $Y$ chain corresponds to one or more steps of the $X$ chain.

Set
- Else: otherwise, there is the possibility of advancing the $X$ chain while the $Y$ chain waits.

Definition 69. For a tuple $L$ and a number $x$, let $L_{\mathrm{left}(x)}$ be the same as $L$ except that we remove the elements which are $> x$. Similarly define $L_{\mathrm{right}(x)}$ to be the tuple resulting from removing the elements that are $< x$.
Theorem 70. Assume $X_0 = Y_0$, fix $s > 0$, and let $Y := (Y_0, Y_1, \ldots, Y_s)$. Define $T_{\mathrm{left}}$ and $T_{\mathrm{right}}$ as displayed; then the process in Coupling 68 satisfies the stated identity.

Proof. We prove this by induction on Coupling 68. For the base case we have $X_0 = Y_0$. Now suppose that for $s > 0$, $Y_s = X_{s + T_{\mathrm{left}} - T_{\mathrm{right}}}$. Let $Y_{s+1}$ be the $(s+1)$-th step. There are two possibilities. If $Y_s < \frac{5}{6}n$, then $\alpha(Y_s) > 0$. In this case, $s$ is incremented once while $t$ may be incremented many times; the number of times $t$ advances is distributed according to $\mathrm{Geo}(\alpha(Y_s))$. Let $X = Y_{s+1}$, i.e. the location on the chain after one step of $Q$. We show that $X$ is distributed according to $X_{s + T_{\mathrm{left}} - T_{\mathrm{right}} + \mathrm{Geo}(\alpha(Y_s))}$, as shown by the displayed computation. Otherwise, with probability $\beta(Y_s)$ the $X$ process skips this step, i.e. $Y_s = X_{s+1+T_{\mathrm{left}} - T_{\mathrm{right}} - 1}$. Let $x \ge \frac{5}{6}n$. Let $E_+$ be the event that $X_{s+T_{\mathrm{left}}-T_{\mathrm{right}}+1} = x+1$ conditioned on $X_{s+T_{\mathrm{left}}-T_{\mathrm{right}}} = x$. Then the displayed computation implies that $\Pr(E_+) = P(x, x+1)$. Similarly, if we define $E_-$ to be the event that $X_{s+T_{\mathrm{left}}-T_{\mathrm{right}}+1} = x-1$ conditioned on $X_{s+T_{\mathrm{left}}-T_{\mathrm{right}}} = x$, then the analogous computation implies that $\Pr(E_-) = P(x, x-1)$.

We need the following two theorems, which assert that (1) the wait time during the accelerated process is not too long, and (2) the accelerated chain mixes after $O(n \ln^2 n)$ steps in the $\|\cdot\|_*$ norm.
Also, the following theorem combines Theorems 71 and 72 to argue that the original Markov chain mixes rapidly in the $\|\cdot\|_*$ norm.
Proof of Theorem 59. We need to find suitable values for $t_0, t_1, t_2$. Let $t_0 = 3n \ln n$, so that $\max_{s \in [t_0, t_2]} \|Q_s\|_* \le \frac{27}{2^n+1}\left(1 + \frac{1}{\mathrm{poly}(n)}\right)$ in Proposition 73. Next, choose $c$ large enough so that (using Theorem 71) the stated bound holds with $t_1 = c\, n \ln^2 n$. Finally, let $c' > c$ be any constant and choose $t_2 = c' t_1$. Using Theorem 72 we conclude the displayed bound. This implies that there exists a value $t_1 \le t^* \le t_2$ for which the desired bound holds. Since $t^*$ is within a constant factor of $n \ln^2 n$, this completes the proof.

Wait-time analysis
In this section we prove Theorem 71. Before getting to the proof we need some preliminaries. Sites with low Hamming weight have the largest wait times. Hence, intuitively, we want to say that during the accelerated walk these sites are not hit too often. More formally, let $N_x = \sum_{\tau=1}^{s} \mathbb{I}\{Y_\tau \le x\}$ and let $\beta > 1$.
Proof. We observe that $N_x$ conditioned on $Y_0 = z \ge 1$ is stochastically dominated by the same variable conditioned on $Y_0 = 1$; the proof is by taking the natural coupling which ensures that the latter walk is always $\le$ the former. Hence we can assume that the walk starts from $Y_0 = 1$, and we will obtain a valid upper bound. In [27] (see the proof of Lemma A.5) the authors show the displayed bound. To understand these probabilities we will develop an exactly solvable analogue of $Y_\tau$. Although $Y_\tau$ is a random walk in discrete time and space, we can approximate it by a process that takes place in continuous time and space. If $Y_\tau$ were an unbiased random walk, we could approximate it with Brownian motion. However, it is biased to always drift towards the point $\frac{3}{4}n$. The continuous-time-and-space random process which diffuses like Brownian motion but is biased to drift towards a fixed point is the Ornstein–Uhlenbeck process. We will not prove a formal connection between $Y_\tau$ and the Ornstein–Uhlenbeck process, but will instead prove bounds on $Y_\tau$ that are inspired by the analogous facts about Ornstein–Uhlenbeck.
Proposition 75 (Connection with the Ornstein–Uhlenbeck process). Define $\nu_\tau$ as in (247). Then we can bound the displayed probabilities. The proof is in Section 4.5.3. This proposition is inspired by the fact that the exact solution of the Ornstein–Uhlenbeck process is a Gaussian with mean and variance both equal to $\nu_\tau$. We can see that once $\tau \gtrsim n \ln n$, this is close to a Gaussian centered at $\frac{3}{4}n$, i.e. the stationary distribution. Note that $\nu_\tau$ is an increasing function of $\tau$; furthermore, for $\nu_\tau \ge x$, $e^{-\frac{(\nu_\tau - x)^2}{2\nu_\tau}}$ is decreasing in $\nu_\tau$, and therefore in $\tau$. Hence the sum in (246) can be bounded as displayed. Using the stated inequalities, and since $\frac{\beta x}{\nu} < 1$, we obtain the claimed bound. Now, following [27,22,15], define the good event $A := \cap_{1 \le x \le x(0)} \{N_x \le \beta \cdot x\}$. Recall that $\beta = 8(4+c)\ln n$ and $x(0) = \nu/\beta$.
To prove Proposition 76, we will need a bound on the minimum site visited during the accelerated walk. Let $M_s := \min_{1 \le i \le s} Y_i$. Then:

Proposition 77. $\Pr[M_s \le a \mid Y_0 = z] \le s\, \frac{\binom{n}{a} 3^a}{\binom{n}{z} 3^z}$.

We need the following lemma, which is a standard fact about Markov chains.
Lemma 78. Let $Y_0, Y_1, \ldots$ be a reversible Markov chain with stationary distribution $\pi$. Then for any $x, y$ in the state space and integer $s > 0$, $\Pr[Y_s = y \mid Y_0 = x] \le \frac{\pi(y)}{\pi(x)}$.

Proof. By reversibility, $\pi(x)\Pr[Y_s = y \mid Y_0 = x] = \pi(y)\Pr[Y_s = x \mid Y_0 = y] \le \pi(y)$.
Proof of Proposition 77.
Combining a union bound over the $s$ steps with Lemma 78, $\Pr[M_s \le a \mid Y_0 = z] \le s \cdot \frac{\pi_a}{\pi_z} = s \cdot \frac{\binom{n}{a} 3^a}{\binom{n}{z} 3^z}$. (257)

Now we show that the event $A = \cap_{1 \le x \le x(0)} \{N_x \le \beta \cdot x\}$ is very likely.
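The two facts used here, the reversibility bound of Lemma 78 and stationary values with ratio $\pi_a/\pi_z = \binom{n}{a}3^a / \left(\binom{n}{z}3^z\right)$, can be checked numerically on a small birth-and-death chain with these stationary ratios. The particular transition probabilities below (up with probability $(n-x)/n$, down with probability $x/(3n)$) are one such choice, used purely for illustration:

```python
import numpy as np
from math import comb

n = 8
# Illustrative birth-and-death chain on {0,...,n}: up with probability
# (n-x)/n, down with probability x/(3n).  Detailed balance then forces the
# stationary measure pi(x) proportional to C(n,x) * 3^x.
Q = np.zeros((n + 1, n + 1))
for x in range(n + 1):
    up, down = (n - x) / n, x / (3 * n)
    if x < n:
        Q[x, x + 1] = up
    if x > 0:
        Q[x, x - 1] = down
    Q[x, x] = 1.0 - up - down

pi = np.array([comb(n, x) * 3.0 ** x for x in range(n + 1)])
pi /= pi.sum()
assert np.allclose(pi @ Q, pi)        # stationarity

# Lemma 78 (reversibility bound): Pr[Y_s = y | Y_0 = x] <= pi(y)/pi(x),
# i.e. entrywise Q^s <= outer(1/pi, pi).
Qs = np.linalg.matrix_power(Q, 7)
assert (Qs <= np.outer(1.0 / pi, pi) + 1e-12).all()
```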
Proof of Proposition 76. The proof is very similar to the proof of Lemma 4.5 in Brown and Fawzi [15].
In the last line we have used the fact that $M_s \le Y_0$. Now we handle each term in (258) separately. When $x < M_s$, $N_x = 0$, so $\sum_{x < M_s} \Pr[N_x > \beta \cdot x] = 0$. Next, when $x \ge Y_0$, we can use Proposition 74 to bound $\Pr[N_x > \beta x] \le \binom{n}{Y_0} n^{-c}$. Finally, when $M_s \le x < Y_0$, we obtain (263).

Proof of Theorem 71. Recall that the initial position $Y_0$ on the chain is distributed according to a binomial centered around $n/2$. Hence it is enough to show that, starting from position $Y_0$ on the chain, the probability that the wait time exceeds the bound stated in the theorem is suitably small; if such a bound holds, then the probability of waiting too long is bounded as claimed. We achieve this in the following. Let $a$ be a constant. Consider the displayed bound on the wait-time random variable $T_{\mathrm{left}}(s) = T_{\mathrm{left}}(Y_0) + \cdots + T_{\mathrm{left}}(Y_s)$, using Propositions 77 and 76. Let $\rho_x = N_x - N_{x-1}$ be the number of times site $x$ has been visited during $s$ rounds of the accelerated walk. Recall from Section 4.5 that the individual wait times are geometrically distributed. Hence we need a concentration bound for sums of geometric random variables. Fortunately, the following Chernoff-type tail bounds on sums of geometric random variables are known.
Theorem 79 (Janson [30]). Let $G = \sum_{i=1}^n \mathrm{Geo}(p_i)$ be a sum of independent geometric random variables with parameters $p_1, \ldots, p_n$, and let $p_* = \min_i p_i$ and $\phi := \sum_{i=1}^n \frac{1}{p_i} = \mathbb{E}\,G$. Then for any $\lambda \ge 1$ the displayed tail bound holds.

The bound we need for our results is:

Corollary 80. Let $G$ be a sum of $s$ geometric random variables with parameters $p_* = p_1 \le \cdots \le p_s$. In particular, if $T > \mathbb{E}\,W$, then for any constant $c$ there exists a constant $c'$ such that the stated tail bound holds.

Proof. It is enough to show that if $\lambda > 3c \ln c$ then $\lambda - 1 - \ln \lambda > \lambda(1 - 1/c)$. Let $f(\lambda) := \frac{\lambda}{c} - \ln(e\lambda)$ for $c > 1$; the desired inequality is equivalent to $f(\lambda) > 0$, and $f$ is an increasing function for $\lambda > c$. We need to find a point $\lambda^*$ such that $f(\lambda^*) > 0$, and one can check that $\lambda^* = 3c \ln c$ works once $c$ is at least a moderate constant.
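The calculus claim at the end of this proof can be spot-checked numerically; the sketch below also illustrates that the choice $\lambda^* = 3c\ln c$ implicitly requires $c$ to be at least a moderate constant (it fails for $c = 2$ but works for $c \ge 3$):

```python
import math

# f(lam) = lam/c - ln(e*lam); f(lam) > 0 is equivalent to
# lam - 1 - ln(lam) > lam * (1 - 1/c), the inequality needed in Corollary 80.
def f(lam, c):
    return lam / c - math.log(math.e * lam)

# lam* = 3c ln c makes f positive for moderately large constants c...
for c in (3.0, 5.0, 10.0, 100.0):
    lam_star = 3 * c * math.log(c)
    assert f(lam_star, c) > 0
    # ...and since lam* > c, f is increasing beyond lam*, so the inequality
    # holds for all lam >= lam*; spot-check a few such points.
    for lam in (lam_star, 2 * lam_star, 10 * lam_star):
        assert lam - 1 - math.log(lam) > lam * (1 - 1 / c)

# For very small c the same lam* does not work, so some lower bound on the
# constant c is implicitly assumed in the proof:
assert f(3 * 2.0 * math.log(2.0), 2.0) < 0
```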
In order to employ Corollary 80 in the context of wait times (specifically (266)), we just need an upper bound on the expected wait time. Now we condition on $A$: hence, for $x \le x(0)$, $N_x \le \beta x$. Among all possibilities allowed by the event $A$, the wait time is maximized when the minimum visited site ($M_s$) is visited as often as possible. So it suffices to bound the wait time in the situation where $\rho_x = \beta$ for $x < x(0)$ and $\rho_{x(0)} = s - \beta x(0)$. In this case, the expected wait time (conditioned on any starting point) is bounded as displayed, and this completes the proof.

Proof of Proposition 75: Connection with the Ornstein-Uhlenbeck process
We first define a new Markov chain $S'_0, S'_1, S'_2, \ldots$ which is easier to analyze and gives useful bounds for the Markov chain $S_0, S_1, S_2, \ldots$.
Definition 81. $S'_0, S'_1, S'_2, \ldots$ is the following Markov chain. The state space is $\{0,1\}^n$. The initial string $S'_0$ is sampled uniformly at random from $\{0,1\}^n \setminus \{0^n\}$. At each step $t$, $S'_{t+1}$ results from $S'_t$ by picking a random position of $S'_t$: if it is a 0 we flip it; if it is a 1, with probability $1/3$ we flip it and with probability $2/3$ we leave it unchanged.
The Hamming weight of these strings corresponds to the position of a birth-and-death chain on the state space $\{0, 1, 2, \ldots, n\}$. Given a string $S' \in \{0,1\}^n$ of Hamming weight $x$, the probability that the Hamming weight increases by 1 is $1 - x/n$ and the probability that it decreases is $\frac{x}{3n}$. Let $Q'$ be the transition matrix describing the Hamming weight.
We now claim:

Proposition 82. Starting from a string of Hamming weight $\ge 1$, at any time $t$, $Y_t$ stochastically dominates $Y'_t$, in the displayed sense.

Proof. It is enough to observe that for $0 \le x \le n$, the probability of moving forward under $Q$ is at least the probability of moving forward under $Q'$, and the probability of moving backwards under $Q$ is at most the probability of moving backwards under $Q'$.

Now suppose that we simulate $Q'$ for $T$ steps. Let $f_l$ be the number of times that site $l$ is hit after $T$ steps; then $(f_1, \ldots, f_n) \sim \mathrm{Multi}(T, \frac{1}{n}, \ldots, \frac{1}{n})$, the multinomial distribution over $n$ items summing to $T$, each happening with probability $1/n$.
We can then consider $T$ in turn to be a random variable distributed according to $T \sim \mathrm{Pois}(\tau)$, where $\tau$ is some positive real number. It turns out that defining $T$ in this way makes $f_1, \ldots, f_n$ independent. Moreover, for any $l \in \{1, \ldots, n\}$, $f_l \sim \mathrm{Pois}(\tau/n)$. In other words, the number of times each site is hit is independently distributed according to a Poisson distribution. This technique is sometimes called Poissonization. Now suppose the $l$-th bit of $S'_0$ starts out at 0 and that $f_l = k$. We find that the probability of ending up with a 1 in this case is $\frac{3}{4}\left(1 - (-\tfrac{1}{3})^k\right)$, and the probability of reaching a 0 is $\frac{1}{4}\left(1 + 3(-\tfrac{1}{3})^k\right)$. Using these two probabilities and taking the expectation over the Poisson measure, we can compute the corresponding unconditional probabilities; note that the $T$ on the LHS is still a random variable distributed according to $\mathrm{Pois}(\tau)$. For the case when the $l$-th bit starts out equal to 1 and $f_l = k$, we find the probabilities in a similar way: the probability of ending up in 1 is $\frac{3}{4} + \frac{1}{4}(-\tfrac{1}{3})^k$ and the probability of ending up in 0 is $\frac{1}{4}\left(1 - (-\tfrac{1}{3})^k\right)$. We then compute the corresponding expectations. As a result, conditioned on $|S'_0| = z$, the Hamming weight at time $T$ has expectation equal to $\nu_\tau$, which was first introduced in (247). Next, using a simple Chernoff bound for sums of binomial random variables, we can show the displayed tail bound for all $x < \nu_\tau$. Combining this inequality with (284) and using Proposition 82, we conclude the claimed bound. If $\tau \ge \frac{3}{4} n \ln n$, then $\frac{3}{4}n \ge \nu_\tau \ge \frac{3}{4}n - 1$, and the displayed bound follows.
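The single-bit computation above can be verified mechanically. The closed forms below are our reading of the two-state chain induced by Definition 81 (hit a 0: flip it; hit a 1: flip with probability 1/3), combined with the Poisson identity $\mathbb{E}[s^K] = e^{\lambda(s-1)}$ for $K \sim \mathrm{Pois}(\lambda)$:

```python
import math
import numpy as np

# Two-state chain for a single bit, conditioned on the bit being hit:
# 0 -> 1 with prob 1; 1 -> 0 with prob 1/3, 1 -> 1 with prob 2/3.
M = np.array([[0.0, 1.0],
              [1/3, 2/3]])

# Eigenvalues are 1 and -1/3, giving the closed forms checked below.
for k in range(12):
    Mk = np.linalg.matrix_power(M, k)
    assert np.isclose(Mk[0, 1], 0.75 * (1 - (-1/3) ** k))   # start at 0
    assert np.isclose(Mk[1, 1], 0.75 + 0.25 * (-1/3) ** k)  # start at 1

# Poissonization: if the number of hits is K ~ Pois(tau/n), then
# E[(-1/3)^K] = exp(-(4/3) * tau/n), the decay factor appearing in nu_tau.
tau, n = 5.0, 8
lam = tau / n
avg = sum(math.exp(-lam) * lam ** k / math.factorial(k) * (-1/3) ** k
          for k in range(60))
assert np.isclose(avg, math.exp(-4 * lam / 3))
```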

Proof of Theorem 72: exact solution to the Markov chain Q
In this section we give an exact solution to the Markov chain Q defined in Section 4.5. Here, by giving an exact solution we mean we can find the eigenvalues and eigenvectors of the transition matrix explicitly and evaluate the norm Q t * . The construction follows nearly directly from a result of Kac [31].
Recall the transition probabilities of Markov chain Q according to Equation (227). In (227), Q is defined over the state space [n]. Without loss of generality and for convenience we can relabel the state space to {0, 1, 2, . . . , n − 1} and redefine the transition matrix according to: for i ∈ {0, 1, 2, 3, . . . , n − 1}. Now we consider the eigenvalue problem where x (λ) is a row vector with entries x (λ) (i), is the left eigenvector corresponding to the eigenvalue λ. For now we drop the superscript λ in x (λ) . Expanding this equation we have Notice that q 0 = p n−1 = 0. Define the generating function where for i ≥ n, we set x(i) = 0. It suffices to solve (294) subject to the boundary conditions x −1 = x n = 0. For i > 0 we can write assuming x −1 = 0. Using the coefficients of (293) we get For i = 0 the equation is Summing ( ∞ i=0 ) over the first term in the left-hand side of (297) we obtain Similarly for the second term we get and for the third term and for the term on the right-hand side Let λ = λ 3n−1 3(n−1) − 2 3(n−1) . Putting all of these together we obtain the following first order differential equation with the boundary conditions Assume n−1 is divisible by 4. Solving this differential equation and applying the first boundary condition (g λ (0) = x(0)) we get The second boundary condition basically says that g λ (z) should be a polynomial of degree at most n − 1. This implies that 3λ (n − 1)/4 should be an integer. Since the exponents of both the (1 + 3z) and the (1 − z) terms should be nonnegative, we can further constrain 3λ (n−1)/4 to lie in the interval [− n−1 4 , 3 n−1 4 ]. These constraints are enough to determine the n eigenvalues λ 0 , . . . , λ n−1 . They must (up to an irrelevant choice of ordering) satisfy Rearranging and solving for λ m we have The eigenvalue gap is exactly 4 3n − 1 . Note for m = 0 we get λ 0 = 1 and In the last equation we have introduced π(i), which is the stationary distribution. 
This is a binomial centered around $\frac{3}{4}(n-1)$ and shifted by 1. Its mean $\frac{3}{4}n + \frac{1}{4}$ differs from that of the non-accelerated chain by an offset of $\approx \frac{1}{4}$. We might expect a shift like this because the accelerated chain spends less time at lower values of $x$.
Since the stationary distribution has unit 1-norm, we can evaluate the normalization. The eigenvectors for each eigenvalue $\lambda$ can be read off (indirectly) from the generating function $g_\lambda(z)$. We use the notation $x^{(\lambda)}$ for the eigenvector corresponding to eigenvalue $\lambda$, and denote the $i$-th component of these vectors by $x^{(\lambda)}(i)$, for $i \in \{0, 1, 2, \ldots, n-1\}$.

Exact solution to the Markov chain Q implies a good upper bound on $\|Q_t\|_*$
We want to use the above exact solution to derive a bound on $\|Q_t\|_*$. We begin by stating some facts.
Since the $x^{(m)}$'s are the left eigenvectors of $Q$, they can be used to find the right eigenvectors $y^{(m)}$. Left and right eigenvectors are orthonormal with respect to each other, i.e., for any $l, m \in [n-1]$ the displayed relation holds. We define an inner product between functions according to which $\{x^{(m)} : m \in [n-1]\}$ forms an orthonormal basis. We denote the initial distribution by $Q_0(i) = \frac{1}{2^n - 1}\binom{n}{i+1}$. Also, we denote the eigenvector corresponding to eigenvalue 1 by $x^{(1)} = \pi$, which is the same as the stationary distribution. We write this initial vector as a combination of eigenvectors of the chain; therefore after $t$ steps the distribution is as displayed. We are interested in $\|Q_t\|_*$, which using Equation (316) can be evaluated in terms of overlaps. As a result, the problem reduces to evaluating the overlaps $\alpha_m = (x^{(m)}, Q_0)$.
Now we evaluate the integral $\int_0^{1/3} g_m(z)\,dz$. We consider two cases, one for $m = 0$ and one for $m > 0$.

1. $m = 0$: In this case $g_0(z) = (1+3z)^{n-1}$, so we can evaluate the integral using Equation (310).
2. $m > 0$: In this case we give an upper bound on the integral.

As a result, we conclude the displayed bound. The last step is to evaluate $x^{(m)}(0)$. For this we need some insight from a well-studied class of polynomials known as the Krawtchouk polynomials. It turns out that the Krawtchouk polynomials naturally appear in the expansion of $(1+3z)^{n-m-1}(1-z)^m$ as the coefficients of the monomials $z^i$. The degree-$t$ Krawtchouk polynomial is defined by the displayed sum. (Elsewhere in the literature the Krawtchouk polynomials are defined with the 3 above replaced by 1 or some other number.) Now we evaluate the coordinates of each vector $x^{(m)}$.
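The generating-function characterization of these (parameter-3) Krawtchouk polynomials can be checked numerically. The explicit sum formula below is the standard $q$-ary Krawtchouk definition with $q - 1 = 3$, which we assume matches the elided definition above:

```python
import numpy as np
from math import comb

# Assumed definition (standard q-ary Krawtchouk with q - 1 = 3):
#   K_t(i; n) = sum_j (-1)^j * 3^(t-j) * C(i, j) * C(n-i, t-j),
# with generating function sum_t K_t(i) z^t = (1+3z)^(n-i) * (1-z)^i,
# the polynomial appearing in the eigenvector generating functions above.
def krawtchouk(t, i, n):
    return sum((-1) ** j * 3 ** (t - j) * comb(i, j) * comb(n - i, t - j)
               for j in range(t + 1))

n = 7
for i in range(n + 1):
    # Coefficients (in increasing degree) of (1+3z)^(n-i) * (1-z)^i.
    p = np.polynomial.polynomial.polymul(
        np.polynomial.polynomial.polypow([1.0, 3.0], n - i),
        np.polynomial.polynomial.polypow([1.0, -1.0], i))
    for t in range(n + 1):
        assert np.isclose(p[t], krawtchouk(t, i, n))
```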
Hence these Krawtchouk polynomials define the eigenvectors, up to overall normalization. Moreover, using the orthogonality of the $x^{(m)}$'s, we obtain the displayed relation. In order to compute $x^{(m)}(0)$ we prove the following proposition.
Proof of Proposition 73. Let $\tau \sim \mathrm{Unif}(t_1, t_2)$. We use the notation $y^s = (y_1, \ldots, y_s)$, with each $y_j$ running over the state space. In the last step we have used the fact that $T_{\mathrm{left}}(y^s)$ is a nondecreasing function of $s$. To bound the contribution to the $\|\cdot\|_*$ norm, observe that $\|(1, 1, \ldots, 1)\|_* = \frac{1}{3} + \frac{1}{3^2} + \cdots \le \frac{1}{2}$; thus the contribution from the first term is at most proportional to $t_0$. There are two possibilities for the random variable $|[W_{s-1}, W_s]| = W_s - W_{s-1}$: one for $k < \frac{5}{6}n$ and one for $k \ge \frac{5}{6}n$. Using this in (361) and (358) completes the argument.

This result was already established in Theorem 8; here we give an alternative proof based on a reduction to a classical probabilistic process. This alternative approach may help with the analysis of random circuits on arbitrary graphs.
We use the following two statements. The first is proved by Brandão–Harrow–Horodecki in [10].
Proposition 88. Let $K_i$ be an approximate 2-design on row or column $i \in \{R, C\}$, in the stated sense. Then for any sequence of rows or columns $i_1, \ldots, i_t$ the displayed bound holds.

Proof. This proposition is proved in Section 3.5.1.
Proof. Using the Markov chain interpretation discussed in Section 4, the initial distribution on the chain is as displayed, and after the application of a large enough random quantum circuit the distribution converges to $V_* := \frac{1}{2^n}\,\sigma_0 \otimes \sigma_0 + \left(1 - \frac{1}{2^n}\right) \cdot \frac{1}{4^n - 1} \sum_{p \in \{0,1,2,3\}^n,\, p \ne 0} \sigma_p \otimes \sigma_p$; we want to see how fast this convergence happens. For clarity, throughout this proof we represent distributions along the full lattice by capital letters (such as $V$) and distributions on individual rows or columns by lowercase letters (such as $v_i$ for the distribution $v$ on row or column $i$). Also, for simplicity, we write 0 instead of $\sigma_0 \otimes \sigma_0$, and $\sigma^i_0$ for all zeros across row or column $i$. $V_0$ is separable across any subset of nodes, so the initial distribution along each row or column is exactly as displayed; after one application of $\Delta_R$, each such distribution becomes the displayed distribution on $\sqrt n$ qubits. Before getting to the analysis, we should first understand the main reason why $\mathrm{Coll}(\Delta_R)$ is large. After we apply $\Delta_R$, the collision probability across each row is exactly $\frac{2}{2^{\sqrt n}+1}$, so the collision probability across the whole lattice is $\approx \frac{2^{\sqrt n}}{2^n}$, which is much larger (by a factor of $2^{\sqrt n}$) than what we want. The crucial observation is that if in (387) we project out all the $\sigma_0$ terms across each row, then the bound becomes $\approx \frac{1}{2^n}$. So what really slows this process down are the $\sigma_0$ terms. The issue is that, after an application of $\Delta_R$, the all-zeros states get projected to themselves. However, if one applies $\Delta_C$, they get partially mixed with other rows. So the objective is to show that after applying $\Delta_C \Delta_R$ a constant number of times, these zeros disappear with large enough probability.
Let $V_s$ be the distribution along the full chain after we apply $(\Delta_C \Delta_R)^s$. Eventually we want to compute $[\ldots]$, where we have defined the map $\kappa : A \mapsto \frac{1}{2^n}\,\mathrm{Tr}(V_0 A)$.
As a result, $[\ldots]$. An important observation is that when $\kappa$ is applied to the summation in (391), it amounts to $[\ldots]$; in other words, each $\sigma_0$ term contributes a $1$ to the above summation. That means that if we had started with the distribution $[\ldots]$, then we would have obtained $\kappa(V) = \frac{2}{2^n + 1}\big(1 + \frac{1}{\mathrm{poly}(n)}\big)$, which is exactly what we want. The last relevant piece of information is that if $v_j$ is a distribution over row $j$ that with probability $1$ contains a nonzero item, then when $\Delta_j$ is applied to it, it instantly gets mapped to the stationary distribution $v_j^*$. This phenomenon is related to strong stationarity in Markov chain theory. We claim that already after the first application of $\Delta_C$, the expected collision probability matches the bound claimed in this theorem. To see this, we consider the distribution $V_1$ of (391), this time along each column. For any set of columns $j_1, \ldots, j_k$, let $E_{j_1,\ldots,j_k}$ be the event that these columns are all zeros and each of the remaining columns has at least one nonzero element. We use the notation $E_{j_1,\ldots,j_k} \equiv E_y$ for $y \in \{0,1\}^{\sqrt{n}}$ such that the $j_1, \ldots, j_k$ locations of $y$ are ones and the rest of its bits are zeros. Therefore $[\ldots]$, and this completes the proof. □
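The claim that one round of $\Delta_C$ removes the $\sigma_0$ terms can be illustrated with a toy model. This is not the actual Pauli chain: the per-site probabilities below (a row is all-zeros with probability $2^{-\sqrt{n}}$, and a site inside a nonzero row is $\sigma_0$ with probability $1/4$) are simplifying assumptions. Under them, a column can survive as all-zeros after $\Delta_C$ only if every one of its entries is zero, and the expected number of surviving zero columns is already small:

```python
from fractions import Fraction

m = 4                       # sqrt(n): number of rows and columns
q_row = Fraction(1, 2**m)   # toy: a row is all-zeros with prob 2^(-sqrt(n))

# Toy assumption: within a nonzero row each site is sigma_0 with prob 1/4.
p_site_zero = q_row + (1 - q_row) * Fraction(1, 4)

# A column stays all-zeros under Delta_C only if all m of its entries are
# zero; by strong stationarity, any other column is mixed immediately.
p_col_zero = p_site_zero ** m
expected_zero_cols = m * p_col_zero
print(float(expected_zero_cols))
```

Even at this tiny size the expectation is below $1/10$, and it decays like $m/4^m$ as the lattice grows, matching the intuition that the zeros wash out after a constant number of rounds.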

Generalization to the arbitrary D-dimensional case
See Section 2.1 for the definitions used in this section. In particular, we need the definitions of $\mathrm{Ch}[g_i]$, $K_i$ and $\Delta_i$ for each coordinate $i$ of the lattice, and $K^t = (\prod_i K_i)^t$. In this section we prove:

Theorem 90. $D$-dimensional $O(Dn^{1/D} + D\ln(D))$-depth random circuits on $n$ qubits have expected collision probability $\frac{2}{2^n+1}\big(1 + \frac{1}{\mathrm{poly}(n)}\big)$.
Proof. The proof is essentially a generalization of the proof of Theorem 86; here we sketch an outline and avoid repeating details. In particular, we need generalizations of Lemma 87 and Proposition 88. The generalization of Lemma 87 is simply that $K_i^t$ for $t = O(n^{1/D} + \ln D)$ is an $\varepsilon_d$-approximate 2-design. Proposition 88 naturally generalizes to: if for each coordinate $K_i$ is an $\varepsilon_D$-approximate 2-design, then $[\ldots]$. Our objective is then to show that
$$\mathrm{Coll}\Big(\prod_i \mathrm{Ch}[G_i]\Big) = \frac{2}{2^n + 1}\Big(1 + \frac{1}{\mathrm{poly}(n)}\Big).$$
This last step may be the most non-trivial part of the proof.
Here we just outline the proof; for detailed discussion see the proof of Proposition 89. We first separate out the all-zeros state of the chain, which contributes $1/2^n$ to the expected collision probability. After the application of $G_1$ on the first coordinate, each row in this coordinate will be the all-zeros vector with probability $1/2^{n^{1/D}}$ and $V$ with probability $1 - 1/2^{n^{1/D}}$. After the application of $G_2$, each plane in the directions $1, 2$ will be all zeros with probability $\approx 1/2^{2n^{1/D}}$ and $V$ with probability $\approx 1 - 1/2^{2n^{1/D}}$. After the application of $G_3$, each plane in the directions $1, 2, 3$ is all zeros with probability $\approx 1/2^{3n^{1/D}}$ and $V$ otherwise, and so on. Eventually, after the application of $G_D$, the distribution along the chain is all zeros with probability $\approx 1/2^{Dn^{1/D}}$ and $V$ otherwise. At this point the distribution along each individual row in each coordinate is $\approx \frac{1}{2^{Dn^{1/D}}}\, 0 + \big(1 - \frac{1}{2^{Dn^{1/D}}}\big) V$. So the collision probability across each such row is $[\ldots]$, and therefore the collision probability across the full chain is $[\ldots]$. □

Proof. Set $D = \ln n$ in Theorem 90. □
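The choice $D = \ln n$ can be sanity-checked numerically: since $n^{1/\ln n} = e$, the depth bound $O(Dn^{1/D} + D\ln D)$ becomes $O(\ln n \cdot \ln\ln n)$. In the sketch below, treating $D$ as a real-valued parameter (rather than an integer lattice dimension) is purely for illustration:

```python
import math

def depth_bound(n, D):
    # Theorem 90's depth, up to constants: D * n^(1/D) + D * ln(D)
    return D * n ** (1 / D) + D * math.log(D)

n = 2 ** 30
D = math.log(n)                  # illustrative: D treated as real-valued

# n^(1/ln n) = e exactly, so the first term is e * ln(n)
assert abs(n ** (1 / D) - math.e) < 1e-9

sublinear = depth_bound(n, D)    # ~ e*ln(n) + ln(n)*ln(ln(n))
print(sublinear, depth_bound(n, 2), depth_bound(n, 1))
```

For $n = 2^{30}$ the three printed depths are roughly $10^2$ (for $D = \ln n$), $10^4$ (for $D = 2$), and $10^9$ (for $D = 1$), showing how the bound interpolates from linear to polylogarithmic depth.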

Scrambling and decoupling with random quantum circuits
In this section we reconstruct some of the results of Brown and Fawzi [15, 14]. The paper [14] proves random-circuit depth bounds required for scrambling and some weak notions of decoupling; we are able to use our proof technique to reconstruct and improve on its results. [15], on the other hand, introduces a stronger notion of decoupling with random circuits; unfortunately, our method does not seem to yield any results about that model. We first define an approximate scrambler, following [14].
Definition 92 (Scramblers). $\mu$ is an $\varepsilon$-approximate scrambler if for any density matrix $\rho$ and any subset $S$ of qubits with $|S| \le n/3$, $[\ldots]$, where $\rho_S(C) = \mathrm{Tr}_{\backslash S}(C \rho C^\dagger)$ and $\mathrm{Tr}_{\backslash S}$ is the trace over the subset of qubits complementary to $S$.
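To make the object $\rho_S(C) = \mathrm{Tr}_{\backslash S}(C\rho C^\dagger)$ concrete, here is a minimal pure-Python sketch. Instead of a random circuit it uses the two-qubit maximally entangled state, whose one-qubit reduction is exactly the maximally mixed state $I/2$ that a scrambler must approximate on small subsets:

```python
import math

# Amplitudes of the maximally entangled state (|00> + |11>)/sqrt(2),
# indexed by the basis labels of the two qubits (all amplitudes real here).
amp = {(0, 0): 1 / math.sqrt(2), (1, 1): 1 / math.sqrt(2)}

def reduced_first(amp):
    """Trace out the second qubit: rho[i][j] = sum_k amp[i,k] * amp[j,k]."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                rho[i][j] += amp.get((i, k), 0.0) * amp.get((j, k), 0.0)
    return rho

rho_S = reduced_first(amp)
# rho_S equals I/2: the reduction of a maximally entangled state is
# maximally mixed, which is the target state in Definition 92
```

For complex amplitudes one would conjugate the second factor; the real Bell state keeps the example short.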
We show that small-depth circuits from $\mu_{\mathrm{lattice},n}$ are approximate scramblers. Brown and Fawzi show a circuit-depth bound of $O(\ln^2 n)$ for random circuits with long-range interactions; our result improves this to $O(\ln n \ln \ln n)$ depth. We believe that the right bound should be $O(\ln n)$. Moreover, their result did not give any bound for the case of $D$-dimensional lattices.
We first find an expression for $\mathrm{Tr}\big[\mathrm{Tr}_{\backslash S}(C\rho C^\dagger)^2\big]$:
$$\sum_{i,j,k,l} \sum_{g_1,h_1,g_2,h_2} \sum_{p,q} \rho_{i,j}\, \rho_{k,l}\, C_{i,g_1;p}\, C^*_{j,h_1;p}\, C_{k,g_2;q}\, C^*_{l,h_2;q}\, \delta_{h_1=g_2}\, \delta_{h_2=g_1}\,.$$
Both $\rho \otimes \rho$ and $A$ are PSD; therefore, using Lemma 32, $[\ldots]$. Next, using Equation 3 of [14], we reduce the computation of $\mathrm{Tr}\big(\rho \otimes \rho \prod_{1 \le i \le D} \mathrm{Ch}[G_i](A)\big)$ to the following probabilistic process: starting from the uniform distribution over $\{0,3\}^n \setminus \{0^n\}$, show that the probability that after the application of $\prod_{1 \le i \le D} \mathrm{Ch}[G_i]$ the string of the Markov chain $K$ defined in Section 4 has weight $\le n/3$ is $\mathrm{poly}(n)/2^n$; this reconstructs Theorem A.1 of [14].
The initial state on the chain is $\frac{1}{2^n} \sum_{p \in \{0,3\}^n \setminus \{0\cdots 0\}} \sigma_p \otimes \sigma_p$. We add the term $\frac{1}{2^n}\, \sigma_0 \otimes \sigma_0$; this can only slow the process down. With this modification, each site is initially independently $Z \otimes Z$ or $I \otimes I$, each with probability $1/2$.
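This classical process can be simulated directly. In the sketch below, the system size, gate count, and the use of uniformly random long-range pairs are illustrative choices: a gate applied to a pair of sites that is not both identity replaces the pair with a uniformly random non-identity two-site Pauli, and we estimate how often the final weight stays below $n/3$:

```python
import random

def apply_gate(s, rng):
    # pick a random pair of sites (long-range model); an all-identity pair
    # is fixed, otherwise it becomes a uniform non-identity two-site Pauli
    i, j = rng.sample(range(len(s)), 2)
    if s[i] == 0 and s[j] == 0:
        return
    while True:
        a, b = rng.randrange(4), rng.randrange(4)
        if (a, b) != (0, 0):
            s[i], s[j] = a, b
            return

def final_weight(n, gates, rng):
    # initial string: uniform over {I, Z}^n minus the all-identity string
    s = [rng.choice([0, 3]) for _ in range(n)]
    while not any(s):
        s = [rng.choice([0, 3]) for _ in range(n)]
    for _ in range(gates):
        apply_gate(s, rng)
    return sum(1 for x in s if x != 0)   # Pauli weight of the string

rng = random.Random(0)
n, gates, trials = 30, 600, 300
low = sum(final_weight(n, gates, rng) <= n // 3 for _ in range(trials))
# the stationary weight concentrates near 3n/4, so low-weight outcomes
# should be rare once the chain has mixed
print(low, trials)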
Following the proof of Theorem 90, after the application of $\prod_i \mathrm{Ch}[G_i]$ the distribution along each row is $\approx \frac{1}{2^{Dn^{1/D}}}\, \sigma_0 \otimes \sigma_0 + \big(1 - \frac{1}{2^{Dn^{1/D}}}\big) V$. Therefore the probability that any given site is zero is at most $1/4 + 1/2^{Dn^{1/D}} =: 1/4 + \delta =: p_0$. Hence the probability of having at most $n/3$ nonzero sites is at most $[\ldots]$; with a proper choice of constants, this value is exponentially small in $n$.
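The binomial tail in the last step can be evaluated exactly. In the sketch below, $n$, $\delta$, and the Chernoff exponent are illustrative; each site is non-identity with probability at least $3/4 - \delta$, and we check that $\Pr[\mathrm{Bin}(n, 3/4 - \delta) \le n/3]$ is exponentially small:

```python
from math import comb, exp

def binom_tail_leq(n, k, p):
    """Exact Pr[Bin(n, p) <= k], summed term by term via math.comb."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k + 1))

n = 90
delta = 2.0 ** -10             # stand-in for 1/2^(D n^(1/D)); tiny in practice
p_nonzero = 3 / 4 - delta      # per-site non-identity probability
tail = binom_tail_leq(n, n // 3, p_nonzero)
print(tail)
# tail <= exp(-c*n) for a constant c > 0, by the Chernoff bound
```

The exact sum avoids relying on any approximation; at $n = 90$ the tail is already far below $e^{-0.3 n}$.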
Next, we consider the following notion of decoupling defined in [14]. Consider a maximally entangled state $\Phi_{MM'}$ along equally sized systems $M$ and $M'$, each with $m$ qubits, and a pair of equally sized systems $A$ and $A'$. As in [14], we consider two models for $AA'$: (1) a pure state $|0\rangle_A\langle 0|$ on system $A$ with $n - m$ qubits, and (2) a maximally entangled state $\phi_{AA'}$. We then apply a random circuit to the systems $MA$, and we want that for a small subsystem $S$ of $M$, the final state $\rho_{M'S}(t)$ be decoupled, in the sense that $\rho_{M'S}(t) \approx I/2^{m+s}$.
Proof. Consider the generating function $[\ldots]$. For all real values $y$ and $z$, consider the overlap $(g_p(y), g_p(z))$. On the one hand,
$$(g_p(y), g_p(z)) = \sum_{x=0}^{N} \binom{N}{x}\, p^x q^{N-x}\, g_{p,x}(y)\, g_{p,x}(z)\,;$$
on the other hand, expanding in powers of $y$ and $z$ produces terms of the form $z^{t+s}\, (k^{(t)}, k^{(s)})$ $[\ldots]$.