Efficient Unitary Designs with a System-Size Independent Number of Non-Clifford Gates

Many quantum information protocols require the implementation of random unitaries. Because it takes exponential resources to produce Haar-random unitaries drawn from the full n-qubit group, one often resorts to t-designs. Unitary t-designs mimic the Haar measure up to t-th moments. It is known that Clifford operations can implement at most 3-designs. In this work, we quantify the non-Clifford resources required to break this barrier. We find that it suffices to inject O(t^4 log^2(t) log(1/ε)) many non-Clifford gates into a polynomial-depth random Clifford circuit to obtain an ε-approximate t-design. Strikingly, the number of non-Clifford gates required is independent of the system size: asymptotically, the density of non-Clifford gates is allowed to tend to zero. We also derive novel bounds on the convergence time of random Clifford circuits to the t-th moment of the uniform distribution on the Clifford group. Our proofs exploit a recently developed variant of Schur-Weyl duality for the Clifford group, as well as bounds on restricted spectral gaps of averaging operators.

The multitude of applications motivates the search for efficient constructions of unitary t-designs [24][25][26][27][28]. In particular, Brandao, Harrow and Horodecki [24] show that local random circuits on n qubits with O(n^2 t^10) many gates give rise to an approximate t-design. In practice, it is often desirable to find more structured implementations. Designs consisting of Clifford operations would be particularly attractive from various points of view: (i) Because the Clifford unitaries form a finite group, elements can be represented exactly using a small number (O(n^2)) of bits. (ii) The Gottesman-Knill Theorem ensures that there are efficient classical algorithms for simulating Clifford circuits. (iii) Most importantly, in fault-tolerant architectures [29,30], Clifford unitaries tend to have comparatively simple realizations, while the robust implementation of general gates (e.g. via magic-state distillation) carries a significant overhead. The difference is so stark that in this context, Clifford operations are often considered to be a free resource, and the complexity of a circuit is measured solely in terms of the number of non-Clifford gates [31,32].
The Clifford group is known to form a unitary t-design for t = 2 [9] and t = 3 [33][34][35], but fails to have this property for t > 3 [33][34][35][36][37]. In fact, the Clifford group is singled out among the finite subgroups of the unitary group by being a 3-design [38]. Moreover, Refs. [38,39] together imply that any local gate set that generates an exact unitary design of order t > 3 must necessarily be universal, cf. the discussion in Sect. 5. Hence, any efficient design construction for t > 3 can only be approximate, and the Clifford group seems to be a distinguished starting point.
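The statement that the Clifford group is a 3-design but not a 4-design can be checked numerically in the smallest case. The following is a minimal sketch for a single qubit only: it generates the 24 Clifford unitaries modulo global phase from H and S and compares the frame potential, the group average of |tr U|^(2t), with the corresponding Haar moments on U(2), which are the Catalan numbers 1, 2, 5, 14. The helper names `canonical`, `generate_group` and `frame_potential` are illustrative and not taken from the paper.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

def canonical(U):
    """Fix the global phase so that the first nonzero entry is real and positive."""
    flat = U.ravel()
    pivot = flat[np.flatnonzero(np.abs(flat) > 1e-9)[0]]
    V = U * (abs(pivot) / pivot)
    parts = np.concatenate([V.real.ravel(), V.imag.ravel()])
    key = tuple(np.round(parts * 1e6).astype(int))  # hashable fingerprint
    return V, key

def generate_group(generators):
    """Breadth-first closure of the generators, modulo global phases."""
    identity, key = canonical(np.eye(2, dtype=complex))
    group = {key: identity}
    frontier = [identity]
    while frontier:
        new = []
        for U in frontier:
            for g in generators:
                V, k = canonical(g @ U)
                if k not in group:
                    group[k] = V
                    new.append(V)
        frontier = new
    return list(group.values())

def frame_potential(group, t):
    """Average of |tr U|^(2t) over the group.

    For a group this equals the double average of |tr(U^dagger V)|^(2t).
    """
    return float(np.mean([abs(np.trace(U)) ** (2 * t) for U in group]))

cliffords = generate_group([H, S])
haar_moments = {1: 1, 2: 2, 3: 5, 4: 14}  # Haar values on U(2)
for t, haar in haar_moments.items():
    print(t, round(frame_potential(cliffords, t), 6), haar)
```

The two columns agree for t ≤ 3 and differ at t = 4, which is the single-qubit instance of the barrier discussed above.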
This leads us to the central question underlying this work: How many non-Clifford gates are required to generate an approximate unitary t-design? A direct application of the random circuit model of Ref. [24] yields an estimate of O(n^2 t^10) non-Clifford operations. In this paper we show that a polynomial-sized random Clifford circuit, together with a system-size-independent number of O(t^4 log^2(t)) non-Clifford gates (a "homeopathic dose"), is already sufficient.
We establish this main result for two different circuit models (Fig. 1). In Sect. 1.1, we consider circuits that alternate unitaries drawn uniformly from the Clifford group with a non-Clifford gate. This gives rise to an efficient quantum circuit, as there are classical algorithms for sampling uniformly from the Clifford group, and for producing an efficient gate decomposition of the resulting operation [40]. A somewhat simpler model is analyzed in Sect. 1.2. There, we assume that the Clifford layers are circuits consisting of gates drawn from a local Clifford gate set. These circuits will only approximate the uniform measure on the Clifford group. Theorem 2, which might be of independent interest, gives novel bounds on the convergence rate.
The key to this scaling lies in the structure of the commutant of the t-th tensor power of the Clifford group, described by a variant of Schur-Weyl duality developed in a sequence of recent works [36,41,42,43]. There, it has been shown that the dimension of this commutant, which measures the failure of the Clifford group to be a t-design from a representation-theoretical perspective, is independent of the system size. Refs. [36,42] have used this insight to provide a construction for exact spherical t-designs that consist of a system-size-independent number of Clifford orbits. It has been left as an open problem whether these ideas can be generalized from spherical designs to the more complex notion of unitary designs, and whether the construction can be made efficient [42]. The present work resolves this question in the affirmative. Finally, we note that in Ref. [44], it has been observed numerically that adding a single T gate to a random Clifford circuit has dramatic effects on the entanglement spectrum. A relation to t-designs was suspected. Our result provides a rigorous understanding of this observation.

Approximate t-designs with few non-Clifford gates.
To state our results precisely, we need to formalize the relevant notion of approximation, as well as the circuit model used. Let ν be a probability measure on the unitary group U(d). The measure ν gives rise to a quantum channel M_t(ν)(X) := ∫ U^{⊗t} X (U^†)^{⊗t} dν(U), which applies U^{⊗t} with U chosen according to ν. We will refer to M_t(ν) as the t-th moment operator associated with ν. Following Ref. [27], we quantify the degree to which a measure approximates a t-design by the diamond norm distance of its moment operator to the moment operator of the Haar measure μ_H on U(d).

Definition 1 (Approximate unitary design).
Let ν be a distribution on U(d). Then ν is an (additive) ε-approximate t-design if ‖M_t(ν) − M_t(μ_H)‖_◇ ≤ ε.
Denote the uniform measure on the multiqubit Clifford group Cl(2^n) by μ_Cl, and let K be some fixed single-qubit non-Clifford gate. The circuit model we are considering (Fig. 1) interleaves Clifford unitaries drawn from μ_Cl with random gates from {K, K^†, 1} acting on an arbitrary qubit. Note that the concatenation of two unitaries drawn from measures ν_1 and ν_2 is described by the convolution ν_1 * ν_2 of the respective measures. We thus arrive at this formal definition of the circuit model:
Definition 2 (K-interleaved Clifford circuits). Let K ∈ U(2). Consider the probability measure ξ_K that draws uniformly from the set {K ⊗ 1_{2^{n−1}}, K^† ⊗ 1_{2^{n−1}}, 1_{2^n}}. A K-interleaved Clifford circuit of depth k is the random circuit acting on n qubits described by the probability distribution σ_k := μ_Cl * ξ_K * ··· * μ_Cl * ξ_K, where the pair μ_Cl * ξ_K is repeated k times.
For convenience, we work with the logarithm of base 2: log(x) := log_2(x). We are now equipped to state the main result of this work in the form of a theorem:
Theorem 1 (Unitary designs with few non-Clifford gates). Let K ∈ U(2) be a non-Clifford unitary. There are constants C_1(K), C_2(K) such that for any k ≥ C_1(K) log^2(t) (t^4 + t log(1/ε)), a K-interleaved Clifford circuit with depth k acting on n qubits is an additive ε-approximate t-design for all n ≥ C_2(K) t^2.
We give the proof of this theorem in Sect. 3. In Theorem 1, we consider uniformly drawn multiqubit Clifford unitaries. This can be achieved with O(n^3) classical random bits [40] and then implemented with O(n^2 / log(n)) gates [45]. Combined with these results, Theorem 1 implies an overall gate count of O(n^2 t^4 log^2(t) / log(n)), improving the scaling compared to Ref. [24] in the dependence on both t and n. In this sense, our construction can be seen as a classical-quantum hybrid construction of unitary designs: The scaling is significantly improved by outsourcing as many tasks as possible to a classical computer. A construction in which all parts of the random unitary are local random circuits is considered in Corollary 2.
For designs generated from general random local circuits, numerical results suggest that convergence is much faster in practice than indicated by the proven bounds [46]. We expect that a similar effect occurs here, and that in fact very shallow K -interleaved Clifford circuits are sufficient to approximate t-designs. This intuition is supported by the numerical results of Ref. [44], which show that even a single T -gate has dramatic effects on the entanglement spectrum of a quantum circuit.
It is moreover noteworthy that circuits with few T -gates can be efficiently simulated [47][48][49][50][51]. The scaling of these algorithms is polynomial in the depth of the circuit, but exponential in the number of T -gates. Combined with our result, this implies that for fixed additive errors ε, there are families of ε-approximate unitary O(log(n))-designs simulable in quasi-polynomial time. For the general random quantum circuit model, it is conjectured that a depth of order O(nt) suffices to approximate t-designs [24,52]. If such a linear scaling is sufficient in our model, the quasi-polynomial time estimate for classical simulations would improve to polynomial.
For the proof of Theorem 1 we need to analyse the connection between the t-th moment operator of the Haar measure and the commutant of the diagonal action of the Clifford group. The latter was proven to be spanned by representations of so-called stochastic Lagrangian sub-spaces in Ref. [42]. In particular, we prove almost tight bounds on the overlap of the Haar operator with these basis vectors in Lemma 13 that might be of independent interest. This will allow us to invoke a powerful theorem by Varjú [53] on restricted spectral gaps of probability distributions on compact Lie groups to show that non-Clifford unitaries have a strong impact on representations of Lagrangian sub-spaces that are not also permutations. We combine this insight with a careful combinatorial argument about the Gram-Schmidt orthogonalization of the basis corresponding to stochastic Lagrangian sub-spaces to bound the difference to a unitary t-design in diamond norm.
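Moreover, the bound for Theorem 1 allows us to prove a corollary about the stronger notion of relative approximate designs: a probability measure ν on U(d) is a relative ε-approximate t-design if (1 − ε) M_t(μ_H) ≼ M_t(ν) ≼ (1 + ε) M_t(μ_H), where A ≼ B if and only if B − A is completely positive.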
Corollary 1 (K-interleaved Clifford circuits as relative ε-approximate t-designs). There are constants C_1(K),
Hence, if we drop the system-size independence, we can achieve a scaling of O(nt) at least until t ∼ √n. While we believe the setting of K-interleaved Clifford circuits to be the more relevant case, the same method of proof works for Haar-interleaved Clifford circuits. Here, we draw not from the gate set {K_i, K_i^†, 1}, but instead Haar-randomly from U(2). The advantage is that we obtain explicit constants for the depth, while the depth in the K-interleaved setting has to depend on a constant (as K might be arbitrarily close to the identity).
Similarly, variants of Corollary 1 for Haar-interleaved Clifford circuits can be obtained, here also without the log^2(t) dependence. Finally, we discuss an application to higher Rényi entropies in "Appendix D".

Local random Clifford circuits for Clifford and unitary designs.
The circuits considered in the previous section require one to find the gate decomposition of a random Clifford operation. In this section, we analyze the case where the Clifford layers are circuits consisting of gates drawn from a local set of generators.
As a first step, we establish that a 2-local random Clifford circuit on n qubits of depth O(n^2 t^9 log^{−2}(t) log(1/ε)) constitutes a relative ε-approximate Clifford t-design, i.e., reproduces the moment operator of the Clifford group up to the t-th order with a relative error of ε. We consider local random Clifford circuits that consist of 2-local quantum gates from a finite set G which is closed under taking the inverse and generates Cl(4). We refer to such a set as a closed, generating set. A canonical example for such a closed, generating set is {H ⊗ 1, S ⊗ 1, S^3 ⊗ 1, CX} where H is the Hadamard gate, S is the phase gate and CX is the CNOT gate [54]. Such a set G induces a set of multi-qubit Clifford unitaries Ĝ ⊂ Cl(n) by acting on any pair of adjacent qubits on a line, where we adopt periodic boundary conditions. We then define the corresponding random Clifford circuits.
Definition 4 (Local random Clifford circuit). Let G ⊂ Cl(4) be a closed, generating set containing the identity. Define the probability measure σ_G as the measure having uniform support on Ĝ ⊂ Cl(n) acting on n qubits. A local random Clifford circuit of depth m is the random circuit described by the probability measure σ_G^{*m}.
For technical reasons, we again assume that the identity is part of the generating set. This assumption can be avoided but simplifies the argumentation in the following. As for Definition 2 of K-interleaved Clifford circuits before, any upper bound on the depth of local random Clifford circuits with identity is a bound for those without.
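To make Definition 4 concrete, the following minimal sketch samples such a circuit for a small number of qubits and builds its unitary. It assumes the canonical gate set {H ⊗ 1, S ⊗ 1, S^3 ⊗ 1, CX} plus the identity, a ring of n ≥ 3 qubits, and, as a simplification, draws gate and position independently, so the identity at different positions is counted several times rather than once. The function names and the brute-force Pauli check are illustrative only.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
               [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
# Closed generating set from the text, plus the identity (Definition 4).
GATES = [np.eye(4, dtype=complex), np.kron(H, I2), np.kron(S, I2),
         np.kron(S.conj().T, I2), CX]
PAULIS = {"I": I2,
          "X": np.array([[0, 1], [1, 0]], dtype=complex),
          "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
          "Z": np.diag([1, -1]).astype(complex)}

def apply_two_qubit(U, gate, i, j, n):
    """Left-multiply the 2^n x 2^n matrix U by `gate` acting on qubits (i, j)."""
    tensor = U.reshape([2] * n + [-1])
    g = gate.reshape(2, 2, 2, 2)            # (out_i, out_j, in_i, in_j)
    tensor = np.tensordot(g, tensor, axes=([2, 3], [i, j]))
    tensor = np.moveaxis(tensor, [0, 1], [i, j])
    return tensor.reshape(2 ** n, -1)

def sample_local_clifford(n, depth, rng):
    """Unitary of a depth-`depth` local random Clifford circuit on a ring."""
    U = np.eye(2 ** n, dtype=complex)
    for _ in range(depth):
        gate = GATES[rng.integers(len(GATES))]
        i = int(rng.integers(n))
        U = apply_two_qubit(U, gate, i, (i + 1) % n, n)  # periodic boundary
    return U

def pauli_string(label):
    out = np.eye(1, dtype=complex)
    for c in label:
        out = np.kron(out, PAULIS[c])
    return out

def is_pauli_up_to_phase(M, n):
    """True if M is a unit-modulus multiple of exactly one n-qubit Pauli string."""
    coeffs = [np.trace(pauli_string(l).conj().T @ M) / 2 ** n
              for l in itertools.product("IXYZ", repeat=n)]
    big = [c for c in coeffs if abs(c) > 1e-9]
    return len(big) == 1 and abs(abs(big[0]) - 1) < 1e-9

rng = np.random.default_rng(0)
U = sample_local_clifford(3, 50, rng)
print(all(is_pauli_up_to_phase(U @ pauli_string(l) @ U.conj().T, 3)
          for l in ("XII", "ZII", "IXI", "IZI", "IIX", "IIZ")))
```

Dense matrices restrict this to a handful of qubits; a scalable implementation would instead track the action on Pauli generators, as in the stabilizer formalism.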
Our result on local random Clifford circuits even holds for a stronger notion for approximations of designs, namely relative approximate designs.
Definition 5 (Relative approximate Clifford t-designs). Let ν be a probability measure on Cl(2^n). Then, ν is a relative ε-approximate Clifford t-design if (1 − ε) M_t(μ_Cl) ≼ M_t(ν) ≼ (1 + ε) M_t(μ_Cl), where A ≼ B means that B − A is completely positive.
With this definition, our result reads as follows.
The proof of the theorem is given in Sect. 4. This result is a significant improvement over the scaling of O(n^8), which is implicit in Ref. [9]. We can combine this result with the bounds obtained in Sect. 3. To this end, consider a random circuit that k times alternatingly applies a local random Clifford circuit of depth m and a unitary drawn from the probability measure ξ_K. The corresponding probability measure is σ_{k,m} := (σ_G^{*m} * ξ_K)^{*k}. (6)
For these local random circuits we establish the following result:
Corollary 2 (Local random unitary design). Let K ∈ U(2) be a non-Clifford gate and let G ⊂ Cl(4) be a closed, generating set. There are constants the local random circuit σ_{k,m}, defined in (6), is an ε-approximate unitary t-design for all n ≥ C_3(K) t^2.
The complete argument for the corollary is given at the end of Sect. 4. After introducing technical preliminaries in Sect. 2, the remainder of the paper, Sect. 3 and Sect. 4, is devoted to the proofs of Theorem 1, Theorem 2 and Corollary 2. Finally, in Sect. 5 we elaborate on and formalize as Proposition 3 the observation that there exists no non-universal gate set generating exact 4-designs for arbitrary system size. This observation is an immediate consequence of the classification of finite unitary t-groups and a criterion for the universality of finite gate sets [38,39,55].

Operators and superoperators.
Given a (finite-dimensional) Hilbert space H, we denote with L(H) the space of linear operators on H with involution † mapping an operator to its adjoint with respect to the inner product on H. L(H) naturally inherits a Hermitian inner product, the Hilbert-Schmidt inner product (A|B) := tr(A^† B) ∀A, B ∈ L(H).
As this definition already suggests, we will use "operator kets and bras" whenever we think it simplifies the notation. Concretely, we write |B) = B and denote with (A| the linear form on L(H) given by B ↦ (A|B). Following common terminology in quantum information theory, we call linear maps φ : L(H) → L(H) on operators "superoperators". We use φ^† to denote the adjoint map with respect to the Hilbert-Schmidt inner product. Note that with the above notation, φ = |A)(B| defines a rank one superoperator with φ^† = |B)(A|. Moreover, we will denote by Ad_A := A · A^{−1} the superoperator given by the adjoint action of an invertible operator A ∈ GL(H) on L(H). For notational reasons, we sometimes write Ad(A) instead of Ad_A. We consistently reserve the notation ‖·‖_p for the Schatten p-norms ‖A‖_p := (tr |A|^p)^{1/p} = ‖σ(A)‖_{ℓ_p}, where σ(A) is the vector of singular values of A. In particular, we use the trace norm p = 1, the Frobenius or Hilbert-Schmidt norm p = 2 and the spectral norm p = ∞. Clearly, these norms can be defined for both operators and superoperators and we will use the same symbol in both cases. For the latter, however, there is also a family of induced operator norms ‖φ‖_{p→q} := sup_{X ≠ 0} ‖φ(X)‖_q / ‖X‖_p. Note that ‖·‖_{2→2} ≡ ‖·‖_∞. Finally, we are interested in "stabilized" versions of these induced norms, in particular the diamond norm ‖φ‖_◇ := sup_{d ∈ N} ‖φ ⊗ id_{L(C^d)}‖_{1→1}. The following norm inequality will be useful [56] φ

Commutant of the diagonal representation of the Clifford group.
In this section, we review some of the machinery developed in Ref. [42]. Recall that the n-qubit Clifford group Cl(n) is defined as the unitary normalizer of the Pauli group P_n as Cl(n) := {U ∈ U(2^n, Q[i]) | U P_n U^† ⊆ P_n}. Here, we followed the convention to restrict the matrix entries to rational complex numbers. This avoids the unnecessary complications from an infinite center U(1), yielding a finite group with minimal center Z(Cl(n)) = Z(P_n) ≅ Z_4. The Clifford group can equivalently be defined in a less conceptual but more constructive manner: It is the subgroup of U(2^n) generated by CX, the controlled-NOT gate, the Hadamard gate H and the phase gate S.
For this work, the t-th diagonal representation of the Clifford group, defined as τ^(t) : Cl(n) → U(2^{nt}), U ↦ U^{⊗t}, will be of major importance. It acts naturally on the Hilbert space ((C^2)^{⊗n})^{⊗t} which can be seen as t copies of an n-qubit system. However, it will turn out that the operators commuting with this representation naturally factorize with respect to a different tensor structure on this Hilbert space, namely ((C^2)^{⊗t})^{⊗n} ≅ ((C^2)^{⊗n})^{⊗t}. Because of the different exponents, it should be clear from the context which tensor structure is meant. We will make ubiquitous use of the description of the commutant of the diagonal representation in terms of stochastic Lagrangian sub-spaces [42]:
Definition 6 (Stochastic Lagrangian sub-spaces). Consider the quadratic form q : Z_2^{2t} → Z_4 defined as q(x, y) := x · x − y · y mod 4. The set Σ_{t,t} denotes the set of all sub-spaces T ⊆ Z_2^{2t} being subject to the following properties:
1. T is totally q-isotropic: x · x = y · y mod 4 for all (x, y) ∈ T.
2. T has dimension t (the maximum dimension compatible with total isotropicity).
3. T is stochastic: (1, . . . , 1) ∈ T.
We call elements in Σ_{t,t} stochastic Lagrangian sub-spaces. We have With this notion, we can now state the following key theorem from Ref. [42].
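For small t, the three conditions of Definition 6 can be checked by brute force. The sketch below enumerates all stochastic Lagrangian sub-spaces by growing totally isotropic sub-spaces one dimension at a time, starting from the span of the all-ones vector; it is meant as an illustration for t ≤ 4 and does not scale. The function names are illustrative. The resulting counts can be compared with the closed formula ∏_{k=0}^{t−2} (2^k + 1) for the number of such sub-spaces reported in Ref. [42].

```python
from itertools import product
from math import prod

def q(v, t):
    """Quadratic form q(x, y) = x.x - y.y mod 4 for v = (x, y) in Z_2^{2t}."""
    return (sum(v[:t]) - sum(v[t:])) % 4

def add(u, v):
    """Addition in Z_2^{2t}."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def stochastic_lagrangians(t):
    """All t-dimensional, totally q-isotropic sub-spaces containing (1, ..., 1).

    Each sub-space is returned as a frozenset of its 2^t elements.
    """
    zero, ones = (0,) * (2 * t), (1,) * (2 * t)
    isotropic = [v for v in product((0, 1), repeat=2 * t) if q(v, t) == 0]
    level = {frozenset({zero, ones})}        # dimension 1: span of all-ones
    for _ in range(t - 1):                   # raise the dimension up to t
        bigger = set()
        for T in level:
            for v in isotropic:
                if v in T:
                    continue
                coset = {add(u, v) for u in T}
                if all(q(w, t) == 0 for w in coset):
                    bigger.add(T | coset)
        level = bigger
    return level

for t in (2, 3, 4):
    count = len(stochastic_lagrangians(t))
    formula = prod(2 ** k + 1 for k in range(t - 1))
    print(t, count, formula)
```

For t = 3 the count equals 3!, consistent with the Clifford group being a 3-design, while for t = 4 it exceeds 4!.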
Since the representation in question is fixed throughout this paper, we will simplify the notation from now on and write Cl(n)′ ≡ τ^(t)(Cl(n))′ for its commutant. To make use of a more sophisticated characterization of the elements r(T) developed in Ref. [42,Section 4], we need the following definitions.
Definition 7 (Stochastic orthogonal group). Consider the quadratic form q : Z_2^t → Z_4 defined as q(x) := x · x mod 4. The stochastic orthogonal group O_t is defined as the group of t × t matrices O with entries in Z_2 such that q(Ox) = q(x) for all x ∈ Z_2^t.
The subspace T_O := {(Ox, x) | x ∈ Z_2^t} is a stochastic Lagrangian subspace. Moreover, the operator r(O) := r(T_O) is unitary. We will therefore canonically embed the stochastic orthogonal group O_t ⊂ Σ_{t,t}. Notice that the permutation group on t objects, referred to as S_t, may be embedded into O_t by acting on the standard basis of Z_2^t. Together with O_t, the following definition can be used to fully characterize the set of stochastic Lagrangian sub-spaces, Σ_{t,t}.

Definition 8 (Defect sub-spaces).
A defect subspace is a subspace N ⊆ Z_2^t which is isotropic with respect to q, that is, q(x) = 0 for all x ∈ N.
The quadratic form q is what is known as a generalized quadratic refinement of the bi-linear form defined by the inner product (x, y) → x · y mod 2 (see, e.g., Ref. [57,App. A] for a self-contained discussion). In the following, the ortho-complement N^⊥ of a subspace N ⊆ Z_2^t is taken with respect to the inner product modulo 2, N^⊥ := {y ∈ Z_2^t | x · y = 0 mod 2 for all x ∈ N}. Notice that q(x) = 0 implies that x · 1_t = 0 mod 2, where 1_t := (1, . . . , 1)^T is the all-ones vector. Thus, we do not need a separate clause requiring 1_t ∈ N^⊥ in the definition of defect sub-spaces (compare Ref. [42,Def. 4.16]). Moreover, one may verify that 2q(x) = 2x · 1_t mod 4. This implies, similarly, that if O preserves q, then O 1_t = 1_t. Borrowing the language of [42], all q-isometries are stochastic (compare the definition of the orthogonal stochastic group in that reference, [42,Def. 4.11]). The reason for these simplifications is that here we focus on the qubit case exclusively, while Ref. [42] works simultaneously for qubits and odd qudits. We use the names stochastic orthogonal group and defect subspace (rather than simply q-isometry group and isotropic subspace) to keep with the notation of that reference.
For any defect subspace N, it holds that N ⊆ N^⊥ (and thus dim N ≤ t/2). Because of this, defect sub-spaces N ⊆ Z_2^t define Calderbank-Shor-Steane (CSS) codes with stabilizer group CSS(N) := {Z(p)X(q) | p, q ∈ N}, where the action of the multi-qubit Pauli operators is Z(p)|x⟩ := (−1)^{p·x}|x⟩ and X(q)|x⟩ := |x + q⟩ for x ∈ Z_2^t. The corresponding projector is given by P_N := P_CSS(N) = 2^{−2 dim N} Σ_{p,q ∈ N} Z(p)X(q). Since the order of the stabilizer group is 2^{2 dim N}, P_N projects onto a 2^{t−2 dim N}-dimensional subspace of (C^2)^{⊗t}. For N = {0} we set P_CSS(N) := 1. We summarize the findings of Ref. [42,Sect. 4] in Thm. 4. We give a short proof to give an explicit relation between this theorem and the results of that work.

Theorem 4 ([42]). Consider T ∈ Σ_{t,t}. Then r(T) = 2^{dim N} r(O) P_CSS(N) = 2^{dim N′} P_CSS(N′) r(O′) for O, O′ ∈ O_t, where N, N′ are unique defect sub-spaces with dim N = dim N′.
Proof. Recall from Ref. [42] that the code space range P CSS(N ) has an orthonormal basis of coset state vectors given by . This way, Comparing this equation to [42,Lem. 4.23] we see that the set is equal to the set of r (T ) operators with right defect subspace given by N , i.e., with T R D = N in the notation of that reference. This way, varying over N we obtain the full set t,t . The existence of a decomposition 2 dim N P CSS(N ) r (O ) follows from the above by noting that

Lemma 1 (Norms of r(T)). Suppose r(T) = 2^{dim N} r(O) P_N as in Theorem 4. Then it holds: ‖r(T)‖_1 = 2^{t − dim N}, ‖r(T)‖_2 = 2^{t/2}, and ‖r(T)‖_∞ = 2^{dim N}.
Proof. Since any Schatten p-norm is unitarily invariant, we have r (T ) p = 2 dim N P N p . The statements follow from rank P N = 2 t−2 dim N .
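The argument in this proof can be checked numerically on a small instance. The sketch below assumes that the code projector is the normalized sum of the stabilizers Z(p)X(q) with p, q ∈ N, takes t = 4 with the defect subspace N spanned by (1, 1, 1, 1), and uses a transposition of two tensor factors as a stand-in for the unitary r(O). With rank P_N = 2^{t − 2 dim N}, the proof's identity ‖r(T)‖_p = 2^{dim N} ‖P_N‖_p gives 2^{t − dim N}, 2^{t/2} and 2^{dim N} for p = 1, 2, ∞. All function names are illustrative.

```python
import itertools
import numpy as np

def span_f2(generators, t):
    """All elements of the Z_2-span of the given length-t bit tuples."""
    vecs = {(0,) * t}
    for g in generators:
        vecs |= {tuple((a + b) % 2 for a, b in zip(v, g)) for v in vecs}
    return sorted(vecs)

def Z_op(p):
    """Z(p)|x> = (-1)^(p.x) |x> on t qubits."""
    signs = [(-1) ** sum(a * b for a, b in zip(p, x))
             for x in itertools.product((0, 1), repeat=len(p))]
    return np.diag(np.array(signs, dtype=float))

def X_op(q):
    """X(q)|x> = |x + q> on t qubits."""
    basis = list(itertools.product((0, 1), repeat=len(q)))
    index = {x: i for i, x in enumerate(basis)}
    M = np.zeros((len(basis), len(basis)))
    for x in basis:
        y = tuple((a + b) % 2 for a, b in zip(x, q))
        M[index[y], index[x]] = 1.0
    return M

def css_projector(N):
    """Normalized sum over the stabilizers Z(p) X(q), p, q in N."""
    return sum(Z_op(p) @ X_op(q) for p in N for q in N) / len(N) ** 2

t, dim_N = 4, 1
N = span_f2([(1, 1, 1, 1)], t)
P = css_projector(N)

# Stand-in for r(O): the unitary that swaps the first two tensor factors.
axes = [1, 0, 2, 3] + list(range(t, 2 * t))
swap = np.eye(2 ** t).reshape([2] * (2 * t)).transpose(axes).reshape(2 ** t, 2 ** t)
rT = 2 ** dim_N * swap @ P

print(np.linalg.matrix_rank(P), 2 ** (t - 2 * dim_N))
print(np.linalg.norm(rT, "nuc"), 2 ** (t - dim_N))
print(np.linalg.norm(rT, "fro"), 2 ** (t / 2))
print(np.linalg.norm(rT, 2), 2 ** dim_N)
```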
In the following, we will often work with a normalized version of the r (T ) operators which we define as

Approximate Unitary t-Designs
In this section, we give a bound on the number of non-Clifford gates needed to leverage the Clifford group to an approximate unitary t-design. This is made precise by the following two theorems which rely on two distinct proof strategies and come with different trade-offs.

Theorem 1 (Unitary designs with few non-Clifford gates). Let K ∈ U(2) be a non-Clifford unitary. There are constants C_1(K), C_2(K) such that for any k ≥ C_1(K) log^2(t) (t^4 + t log(1/ε)), a K-interleaved Clifford circuit with depth k acting on n qubits is an additive ε-approximate t-design for all n ≥ C_2(K) t^2.
Recall from Def. 2 that a K -interleaved Clifford circuit has an associated probability measure σ K := (μ Cl * ξ K ) * k where ξ K is the measure which draws uniformly from {K , K † , 1} on the first qubit. Let us introduce the notation Then, our goal is to bound the deviation of the moment operator from the Haar projector P H ≡ M t (μ H ) in diamond norm. Using that P H is invariant under left and right multiplication with unitaries, we have the identity for any mixed unitary channel A. Thus, we can rewrite the difference of moment operators as where we have introduced the shorthand notation P Cl := M t (μ Cl ).
Remark 1 (Non-vanishing probability of applying the identity). We apply K, K^† with equal probability in Theorem 1 such that R(K) is Hermitian. The non-vanishing probability of applying 1, i.e., of doing nothing, is necessary in the proof of Lemma 2, because we require the probability distribution ξ_K * ξ_K to have non-vanishing support on a non-Clifford gate. If ξ_K is the uniform measure on K and K^†, then ξ_K * ξ_K has support on K^2, (K^†)^2 and 1. We can hence drop this assumption for gates that do not square to a Clifford gate. The T-gate is not of this kind, since T^2 = S is a Clifford gate.
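The interplay of P_Cl, R(K) and P_H can be illustrated on a toy instance. The sketch below makes several assumptions that should be kept in mind: it takes n = 1 and t = 4, which lies outside the regime n ≥ C_2(K) t^2 of Theorem 1; it chooses K = T; it reads R(K) as the t-th moment operator of ξ_K; and it measures the deviation in spectral norm rather than in diamond norm. Superoperators are represented as matrices acting on row-major vectorized operators, and all names are illustrative.

```python
import itertools
import numpy as np

t = 4
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)
T = np.diag([1, np.exp(1j * np.pi / 4)])

def projective_group(generators):
    """Closure of 2x2 generators modulo global phase."""
    seen, frontier = {}, [np.eye(2, dtype=complex)]
    while frontier:
        U = frontier.pop()
        flat = U.ravel()
        pivot = flat[np.flatnonzero(np.abs(flat) > 1e-9)[0]]
        U = U * (abs(pivot) / pivot)          # fix the global phase
        parts = np.concatenate([U.real.ravel(), U.imag.ravel()])
        key = tuple(np.round(parts * 1e6).astype(int))
        if key not in seen:
            seen[key] = U
            frontier.extend(g @ U for g in generators)
    return list(seen.values())

def adjoint_power(U):
    """Matrix of X -> U^{(x)t} X (U^dagger)^{(x)t} on row-major vec(X)."""
    Ut = np.eye(1, dtype=complex)
    for _ in range(t):
        Ut = np.kron(Ut, U)
    return np.kron(Ut, Ut.conj())

# Moment operator of the uniform measure on the single-qubit Clifford group.
P_cl = np.mean([adjoint_power(U) for U in projective_group([H, S])], axis=0)
# Moment operator of the measure drawing uniformly from {T, T^dagger, 1}.
R = (np.eye(4 ** t) + adjoint_power(T) + adjoint_power(T.conj().T)) / 3

# Haar projector: orthogonal projector onto the span of the t! permutations.
perms = [np.eye(2 ** t).reshape([2] * (2 * t))
         .transpose(list(p) + list(range(t, 2 * t))).reshape(-1)
         for p in itertools.permutations(range(t))]
_, sing, Vh = np.linalg.svd(np.array(perms), full_matrices=False)
basis = Vh[sing > 1e-9]
P_haar = basis.conj().T @ basis

rank_cl = int(round(np.trace(P_cl).real))
rank_haar = int(round(np.trace(P_haar).real))
D = P_cl - P_haar                             # projector onto the "excess"
eta = np.trace(D @ R).real / np.trace(D).real
norms = [np.linalg.norm(np.linalg.matrix_power(P_cl @ R, k) - P_haar, 2)
         for k in (1, 2, 4, 8)]
print(rank_cl, rank_haar, eta, norms)
```

In this toy case the range of P_Cl − P_H is spanned by the part of the Clifford commutant that is orthogonal to the permutations, and `eta` is the average overlap of R(K) on that range; a value strictly below one is what makes the deviation shrink with the depth k.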
Our proof strategy for Theorem 1 makes use of the following two lemmas which are proven in Sects. 6.1 and 6.2. The first lemma is key to the derivations in this section. It is based on a bound (Lemma 13) on the overlap of stochastic Lagrangian sub-spaces with the Haar projector and Theorem 5, a special case of a theorem about restricted spectral gaps of random walks on compact Lie groups due to Varjú [53].

Lemma 2 (Overlap bound). Let K be a single qubit gate which is not contained in the Clifford group. Then, there is a constant c(K ) > 0 such that
The second lemma is of a more technical nature.

Lemma 3 (Diamond norm bound)
. Consider T 1 , T 2 ∈ t,t and denote with N 1 , N 2 their respective defect spaces. Then, it holds that The difficulty of using these results to bound the difference stems from the following reason: The range of the projector P Cl − P H is the orthocomplement of the space spanned by permutations Q ⊗n π for π ∈ S t within the commutant of the Clifford group spanned by the operators Q ⊗n T . Although this is a conveniently factorizing and well-studied basis, it is non-orthogonal. Thus, the projectors do not possess a natural expansion in this basis and we can not directly use the above bounds. However, we can write it explicitly in a suitable orthonormal basis of the commutant obtained by the Gram-Schmidt procedure from the basis {Q ⊗n T | T ∈ t,t }. We summarize the properties of this basis in the following lemma.

Lemma 4 (Properties of the constructed basis
be an enumeration of the elements of t,t such that the first t! spaces T j correspond to the elements of S t . Then, the {E j } constitutes an orthogonal (but not normalized) basis, where Denote by N i the defect space of T i . For n ≥ 1 2 (t 2 + 5t), we have Moreover, it holds that We believe that the explicit bounds in Lemma 4 might be of independent interest in applications of the Schur-Weyl duality of the Clifford group. For the sake of readibility, and as Theorem 1 holds up to an inexplicit constant, we will bound all polynomials in t by their leading order term in the following. Specifically, the bounds in Lemma 4 will be simplified by using the inequalities which hold for all positive integers t.
Proof of Theorem 1. Notice that from (25), we have the expression We now bound each of the factors in each term above. First, we compute the squared norm of E j , Using Eqs. (32) and (33), we thus bound and in the same way Now we use that n ≥ 16t 2 . Letting x : We now focus on the second factor, If for Q T r R(K ) Q T l one of the stochastic Lagrangian sub-spaces does not correspond to a permutation, Lemma 2 introduces a factor of η K ,t . If both correspond to a permutation, we redefine the factors in a way that leads to simpler expressions in the calculations used below. Namely, in this case we redefine A r,i and A l, j by multiplying it with 2. This is compensated by introducing a factor of 1 4 and lettinḡ We can do this as i and j do not correspond to permutations and hence A r, j and A l j are exponentially suppressed, which remains true after rescaling by 2. In this case, moreover, r < t!+1 ≤ i and l < t!+1 ≤ j, so the factor |A r,i A l, j | will be exponentially suppressed according to (32) and so this redefinition will not affect the asymptotic scaling in n. We provide two bounds for | (E i | R(K ) E j | that will be used later on. We will use repeatedly that the diamond norm is multiplicative under the tensor product of superoperators [58,Thm. 3.49]. First, using (31), (33) and (28), we obtain where we have used 2 | dim N l −dim N r | ≤ 2 t ≤ 2 t 3 , and the fact that for the rescaled A r,i , the inequality (31) implies for all r, i. Moreover, we have used the triangle inequality, in the inequality (49). The second bound follows from Eqs. (32) and (33), and we consider two cases. If i = j, then Otherwise, In inequality (54), we have bounded the term r = l = i using (33), and each of the other terms using (32). Moreover, in the inequalities (55) and (56) we use that i ≤ | t,t |, and Lastly, we obtain from (31) and (27) We now start piecing these expressions together to bound (40). Eqs. 
(59) and (44) give To bound (60), we will bunch together the contribution of all terms whose sequence { j 1 , . . . , j k } contains l changes. Moreover, we will treat differently the cases l ≤ t/2 and l > t/2 . In the former case, we use (50) to get In this case, the factor of 2 n(dim N j k −dim N j 1 ) coming from (59) is cancelled by the last factor of 2 −n| dim N j k −dim N j 1 | . In the latter case, we turn to (52) instead to obtain Here, the exponential factor coming from (59) is cancelled by 2 −ln since dim N j k − dim N j 1 ≤ t/2 . Counting the instances of sequences with l changes, we may put these considerations together to bound where we have used in ‡ that Finally, noting that 2 32t 4 + 2 18t 3 ≤ 2 33t 4 for all positive integers t, we obtain the bound whereη K ,t is bounded by Lemma 2. Taking the logarithm and using the inequality log(1 + x) ≤ x repeatedly, this implies Theorem 1.
With the above bound, we can also prove Corollary 1.
Proof of Corollary 1. Consider the self-adjoint superoperator A := P_Cl R(K) P_Cl. As P_Cl is a projector, we have with Eq. (24) Using the norm inequality between operator and diamond norm, Eq. (12), and the previous result Eq. (62), we find Taking the k-th root of the expression above, we obtain a sequence of infinitely many bounds for ‖A − P_H‖_∞ which converges as k → ∞. That limit gives Combined with Ref. [24,Lem. 4], Eq. (65) implies the result.
The bound in Eq. (62) also suffices to prove Proposition 1: Proof of Proposition 1. The proof follows exactly as the proof of Theorem 1, but with the factor 7/8 instead ofη K ,t (compare Lemma 13). Using log 2 (7/8) ≤ −0.19 the result can be checked.

Convergence to Higher Moments of the Clifford Group
In this section, we aim to prove Theorem 2. The proof follows a well-established strategy [24,59] in a sequence of lemmas. For the sake of readability, the proofs of these lemmas have been moved to Sect. 6.4. Given a measure ν on the Clifford group Cl(n), recall that its t-th moment operator was defined as M_t(ν) := ∫_{Cl(n)} Ad(U^{⊗t}) dν(U). The idea of the proof is that if M_t(ν) is close to the moment operator M_t(μ_Cl) ≡ P_Cl of the uniform (Haar) measure μ_Cl on the Clifford group, ν is an approximate Clifford design. However, we have seen that there are different notions of closeness. We define its deviation in (superoperator) spectral norm as g_Cl(ν, t) := ‖M_t(ν) − M_t(μ_Cl)‖_∞. Then, we prove the following lemma in Sect. 6.4.
Recall that we have defined the measure σ_G on the Clifford group Cl(n) in Def. 4 by randomly drawing from a 2-local Clifford gate set G and applying it to a random qubit i, or to a pair of adjacent qubits (i, i + 1), respectively. For this measure, we show that it fulfills the assumptions of Lemma 5:
Proposition 2 (Clifford expander bound). Let σ_G be as in Def. 4 and n ≥ 12t. Then, g_Cl(σ_G, t) ≤ 1 − c(G) n^{−1} log^2(t) t^{−8} for some constant c(G) > 0.
We will prove Proposition 2 at the end of this section. From this, Theorem 2 follows as a direct consequence:
Proof of Theorem 2. First, note that g_Cl(ν^{*k}, t) = g_Cl(ν, t)^k for all probability measures ν on the Clifford group. This can be easily verified using the observation M_t(ν^{*k}) − P_Cl = M_t(ν)^k − P_Cl = (M_t(ν) − P_Cl)^k. Hence, combining the bound given by Proposition 2 and Lemma 5, we find that the k-step random walk σ_G^{*k} is an ε-approximate Clifford t-design, if we choose k = O(n log^{−2}(t) t^8 (2nt + log(1/ε))).
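The depth estimate in this proof is a short calculation that can be spelled out. The sketch below is a rough calculator under explicit assumptions: the constant c(G) of Proposition 2 is not specified in the text, so `c` is a hypothetical placeholder, and the requirement g_Cl^k ≤ ε 2^{−2nt} is inferred from the form of the stated O-bound rather than quoted from Lemma 5.

```python
import math

def clifford_design_depth(n, t, eps, c=1.0):
    """Smallest k with (1 - gap)^k <= eps * 2^(-2nt).

    gap = c * log2(t)^2 / (n * t^8) mirrors the bound of Proposition 2;
    `c` is a hypothetical stand-in for the unspecified constant c(G).
    """
    gap = c * math.log2(t) ** 2 / (n * t ** 8)
    if not 0 < gap < 1:
        raise ValueError("gap must lie strictly between 0 and 1")
    target = 2 * n * t + math.log2(1 / eps)   # required number of bits
    return math.ceil(target / -math.log2(1 - gap))

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, clifford_design_depth(n=48, t=4, eps=eps))
```

Since −log_2(1 − gap) is at least gap / ln 2, the returned depth is at most of order (2nt + log(1/ε)) / gap, which reproduces the scaling n log^{−2}(t) t^8 (2nt + log(1/ε)).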
For the sake of readability, let us from now on drop the dependence on G and write σ ≡ σ G . In order to prove Proposition 2, we use a reformulation of g(σ, t) based on the following observation. Since G is closed under taking inverses, the moment operator M t (σ ) is self-adjoint with respect to the Hilbert-Schmidt inner product. Due to σ being a probability measure, its largest eigenvalue is 1 with eigenspace corresponding to the operator subspace which is fixed by the adjoint action Ad(g ⊗t ) of all generators [59]. Equivalently, this is the subspace of operators which commute with every generator g ⊗t . However, any operator commuting with all generators also commutes with every element in the Clifford group Cl(n) and vice versa. Hence, this subspace is nothing but the Clifford commutant Cl(n) with projector P Cl := M t (μ Cl ). Thus, the spectral decomposition is where λ r (X ) denotes the r -th largest eigenvalue of a normal operator X . Hence, we find where λ min (M t (σ )) is the smallest eigenvalue of M t (σ ). We continue by arguing that it is sufficient to consider the case when λ * (M t (σ )) = λ 2 (M t (σ )) > 0. To this end, consider the linear operator T σ : L 2 (Cl(n)) → L 2 (Cl(n)) given as This is the (Hermitian) averaging operator with respect to σ on the group algebra L 2 (Cl(n)). The largest eigenvalue of T σ is λ 1 (T σ ) = 1 and its eigenspace corresponds to the trivial representation. By Ref.
[60, Lem. 1], its smallest eigenvalue is lower bounded by where σ (1) ≡ σ ({1}) = 1/|G| is the probability of drawing the identity. According to the Peter-Weyl theorem, the spectrum of M t (σ ) is exactly the spectrum of the restriction of T σ to the irreducible representations that appear in the representation U → Ad ⊗t U . In particular, we find λ min (M t (σ )) ≥ −1 + 2/|G|. Let us assume that λ * (M t (σ )) = |λ min (M t (σ ))|. Then, g(σ, t) ≤ 1 − 2/|G| < 1 and hence we can argue as in the proof of Thm. 2 to show that local random Clifford circuits form relative ε-approximate Clifford t-designs in depth O(2nt + log(1/ε)). Therefore, in the following we consider the more relevant case λ * (M t (σ )) = λ 2 (M t (σ )) > 0, that is, Since M t (σ ) is self-adjoint, we can interpret it as a Hamiltonian on the Hilbert space L((C 2 ) ⊗nt ). In this light, it will turn out to be useful to recast Eq. (71) as the spectral gap of a suitable family of local Hamiltonians with vanishing ground state energy: Let us summarize these findings in the following lemmas. Lemma 6 (Spectral gap). Let σ be as in Def. 4 and H n,t the Hamiltonian from Eq. (72).
Lemma 7 (Ground spaces). The Hamiltonians H n,t are positive operators with ground state energy 0. The ground space is given by the Clifford commutant where t,t is the set of stochastic Lagrangian sub-spaces of Z t 2 ⊕ Z t 2 . In the remainder of this section, we will prove the existence of a uniform lower bound on the spectral gap of H n,t . In combination with Lemma 6 and Lemma 5 this will imply Theorem 2. While it is highly non-trivial to show spectral gaps in the thermodynamic limit, we can use the fact that H n,t is frustration-free (compare Lemma 7). This allows us to apply the powerful martingale method pioneered by Nachtergaele [61]. Lemma 8 (Lower bound to spectral gap). Let the Hamiltonian H n,t be as in Eq. (72) and assume that n ≥ 12t. Then, H n,t has a spectral gap satisfying Proof of Proposition 2. We can now combine the bound in (75) with any lower bound on the spectral gap independent of t. To this end, we again make use of the averaging operator T σ : where η is the probability of the least probable generator (here 1/(|G|n)) and d is the diameter of the associated Cayley graph (given in Ref. [62] as d = O(n 3 / log(n))).
Since the representation U → Ad ⊗t U contains a trivial component, the second largest eigenvalue of M t (σ ) can be at most λ 2 (T σ ). Thus, H n,t has a gap of at least η/d 2 . Finally, by Lemma 8 it follows that for a constant c(G). We note that the applicability of Ref. [60,Cor. 1] to random walks on the Clifford group has also been observed in Ref. [9].
We can combine Theorem 2 and Theorem 1 to obtain the following corollary: Corollary 2 (Local random unitary design). Let K ∈ U (2) be a non-Clifford gate and let G ⊂ Cl(4) be a closed, generating set. There are constants C 1 (K , G), C 2 (K ), C 3 (K ) such that whenever the local random circuit σ k,m , defined in (6), is an ε-approximate unitary t-design for all n ≥ C 3 (K )t 2 .
Proof. Consider the superoperator where σ * m denotes the probability measure of a depth m local random walk on the Clifford group (cp. Def. 4). We would like to bound the difference between the Haar random t-th moment operator M t (μ H ) =: P H and M t (σ k,m ). Notice the following standard properties of P H : for any probability measure ν on U (2 n ). In particular, we have that P H is an orthogonal projector. As in the last section, we make use of the spectral decomposition in Eq. (67) to decompose M t (σ * k ) as follows: Recall the shorthand notation P Cl := M t (μ Cl ). Using the triangle inequality and the inequality (12), this implies Note that we bounded the second largest eigenvalue λ 2 of M t (σ ) in Proposition 2. We can now combine Proposition 2 with (62) to obtain: M t (σ k,m ) − P H ≤ k2 2tn+1 λ m 2 + 2 33t 4 +t log(k) 1 + 2 32t 2 −n 5kη k K ,t .

Singling out the Clifford Group
There are a number of ways to motivate the construction of approximate unitary t-designs from random Clifford circuits. From a practical point of view, Clifford gates are often comparatively easy to implement, in particular in fault-tolerant architectures. In this section, we point out that Refs. [38,39] together imply that the Clifford groups are also mathematically distinguished. We formulate this observation as Proposition 3: The finite case follows from the recently obtained classification of finite unitary subgroups forming t-designs, so-called unitary t-groups, by [38], building on earlier results by [55]. The infinite case is a corollary of a theorem about universality of finitely generated subgroups by [39]. This section is independent of the rest of the paper and has the sole purpose of highlighting the results in Refs. [38,39,55] and explicitly formulating their combined implications for the generation of unitary t-designs. Moreover, it might serve as an intuitive justification for the usefulness and omnipresence of Clifford unitaries in random circuit constructions.
For any subgroup G ⊆ U(d), we let Notice that G is a unitary t-design if and only if G is. Proposition 3 refers to t-designs generated by finite gate sets, which we define now. The starting point is a Hilbert space (C q ) ⊗r for some r . A finite gate set is a finite subset We will denote by G n the subgroup of SU (C q ) ⊗n generated by elements of G acting on any r tensor factors (here r ≤ n). The number q is called the local dimension of G.
Proposition 3 (Singling out the Clifford group [38,39,55]). Let t ≥ 2, and let G be a finite gate set with local dimension q ≥ 2. Assume that (1) either all G n are finite or they are all infinite, and (2) there is an n 0 such that for all n ≥ n 0 , G n is a unitary t-design.
Then, one of the following cases applies: (i) If t = 2, then either q is prime and G n is isomorphic to a subgroup of the Clifford group Cl(q n ), or G n is dense in SU(q n ). (ii) If t = 3, then either q = 2 and G n is isomorphic to the full Clifford group Cl(2 n ), or G n is dense in SU(q n ). (iii) If t ≥ 4, then G n is dense in SU(q n ).
Note that a finitely generated infinite subgroup of SU(d) is always dense in some compact Lie subgroup (cp. [39, Fact 2.6]). In particular, it inherits a Haar measure from this Lie subgroup which allows for a definition of unitary t-design. a. Finite case. In the classification in Ref. [38], the non-existence of finite unitary t-groups was shown for t ≥ 4 (and dimension d > 2). Already the case t = 3 is very restrictive, since the authors arrive at the following result:

Lemma 9 (Ref. [38, Thm. 4]). Suppose d ≥ 5 and consider a finite subgroup H < SU(d) which is a unitary 3-design. Then, H is either one of finitely many exceptional cases or d = 2 n and H is isomorphic to the Clifford group Cl(2 n ).
This establishes the finite version of (ii), the t = 3 case. The classification of unitary 2-designs is, however, more involved: it includes certain irreducible representations of finite unitary and symplectic groups (compare [38, Thm. 3, Lie-type case]) and a finite set of exceptions. The exceptions can be ruled out in the same way as above.
The former, the Lie-type cases, happen in dimensions (3 n ± 1)/2 and (2 n + (−1) n )/3. There is no q for which there exists an n 0 such that for all n ≥ n 0 there exists an m ∈ N satisfying either Thus, the assumptions of Prop. 3 rule these out. This establishes the finite version of (i). b. Infinite case. Define the commutant for a set S ⊂ SU(d) of the adjoint action as We show that the second case can be reduced to Cor. 3.5 from Ref. [39] applied to the simple Lie group SU(d).

Lemma 10 ([39, Cor. 3.5]). Given a finite set G ⊂ SU(d) such that G = G is infinite. Then, the group G is dense in SU(d) if and only if
Recall that a subgroup G ⊆ U (d) is a unitary 2-group if and only if Comm(U ⊗U |U ∈ G) = Comm(U ⊗ U |U ∈ U(d)) = span(1, F), where F denotes the flip of two tensor copies (see also App. A). Let us denote the partial transpose on the second system of a linear operator A ∈ L(C d ⊗ C d ) by A . Then, one can easily verify that induces a vector space isomorphism between Comm(U ⊗ U |U ∈ G) and Comm(U ⊗ U |U ∈ G). The image of the basis {1, F} is readily computed as where |ii is the maximally entangled state vector. Next, we use that U ⊗ U = mat(Ad U ) is the matrix representation of Ad U = U · U † with respect to the basis E i, j = |i j | of L(C d ). Thus, we have Comm(Ad G ) Comm(U ⊗ U |U ∈ G) as algebras. Pulling the above basis of Comm(U ⊗ U |U ∈ G) back to Comm(Ad G ), we then find: Hence, we have shown that any element in Comm(Ad G ) is a linear combination of these two maps. However, by restricting to su(d), the second map becomes identically zero, thus we have By Lemma 10, this shows that any finitely generated infinite unitary 2-group G ≤ SU(d) is dense in SU(d). Since any unitary t-group is in particular a 2-group, this is also true for any t > 2.
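The action of the partial transpose on the basis {1, F} can be checked numerically for small d. The sketch below assumes the convention ⟨i,j|A^Γ|k,l⟩ = ⟨i,l|A|k,j⟩ for the partial transpose on the second system and uses hypothetical helper names. It confirms the standard fact that the identity is mapped to itself and the flip is mapped to the unnormalized projector onto Σ_i |ii⟩.

```python
from itertools import product


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[int(a == b) for b in range(n)] for a in range(n)]


def flip(d):
    """Flip operator F on C^d (x) C^d, defined by F|i,j> = |j,i>."""
    F = [[0] * (d * d) for _ in range(d * d)]
    for i, j in product(range(d), repeat=2):
        F[j * d + i][i * d + j] = 1
    return F


def partial_transpose_second(A, d):
    """Transpose the second tensor factor: <i,j|B|k,l> = <i,l|A|k,j>."""
    B = [[0] * (d * d) for _ in range(d * d)]
    for i, j, k, l in product(range(d), repeat=4):
        B[i * d + j][k * d + l] = A[i * d + l][k * d + j]
    return B


def unnormalized_bell_projector(d):
    """|W><W| with |W> = sum_i |i,i> (not normalized)."""
    w = [1 if a // d == a % d else 0 for a in range(d * d)]
    return [[x * y for y in w] for x in w]


if __name__ == "__main__":
    for d in (2, 3):
        assert partial_transpose_second(identity(d * d), d) == identity(d * d)
        assert partial_transpose_second(flip(d), d) == unnormalized_bell_projector(d)
    print("partial transpose maps {1, F} to {1, |W><W|} for d = 2, 3")
```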

Proof of overlap lemmas.
In this section, we prove three technical lemmas which are needed throughout this paper. These lemmas give bounds on the overlaps of the operators Q ⊗n T and hence quantify how far this basis is from an orthonormal basis of the commutant of the Clifford tensor power representation, i.e., for range P Cl .

Lemma 3 (Diamond norm bound). Consider T 1 , T 2 ∈ t,t and denote by N 1 , N 2 their respective defect spaces. Then, it holds that
Proof. First, recall that Q T := 2 −t/2 r (T ). Then, we make use of the following elementary bound on the diamond norm of rank one superoperator |A )(B |: Here, we have used in † that the partial trace is a contraction w.r.t. · 1 and in ‡ a version of the duality between trace and spectral norm [63]. Given stochastic Lagrangians T 1 and T 2 with defect spaces N 1 and N 2 , we thus find using Lem. 1: To prove 2., we use Ref. [42,Eq. (4.25)] and that the transpose does not change the dimension of the corresponding defect subspace. Moreover, we assume w.l.o.g. that dim N 2 ≥ dim N 1 . We have where r (T ) is described by a stochastic orthogonal and a defect space N ⊥ 1 ∩ N 2 + N 1 . Hence, we obtain (together with Hölder's inequality): Using N ⊆ N ⊥ for all defect spaces and the general identity dim( Next, we define a frame operator associated to the basis Q ⊗n T . If the basis was orthogonal, this frame operator would simply be the projector P Cl onto the Clifford commutant. Definition 9 (Clifford frame operator). We define the Clifford frame operator of the basis Q ⊗n T as Hence, a quantifier for the orthogonality of the Q ⊗n T basis is the distance of S Cl to the projector P Cl . As we prove in Lem. 12, we have P Cl ≈ S Cl in spectral norm and we will use this result later in the proof of Lem. 8. In order to show this, we first derive a result on the sum of overlaps in Lem. 11.
Interestingly, S Cl is not close to P Cl in diamond norm (see Ch. 15 in Ref. [64]). To derive our main result, we instead construct an orthogonalized basis from the Q ⊗n T . Some properties of the orthogonalized basis are proven in Lem. 4, which also makes use of Lem. 11.

Lemma 11 (Overlap of stochastic Lagrangian sub-spaces). We have Q T Q T ≥ 0 for all T, T ∈ t,t . Moreover, for all T ∈ t,t the sum of overlaps is
where $(-2^{-n}; 2)_{t-1} = \prod_{r=0}^{t-2} (1 + 2^{r-n})$ and the last inequality holds for n + 2 ≥ t + log 2 (t).
Proof. Denote by Stab(n) the set of stabilizer states on n qubits. Since the operators r (T ) are entry-wise non-negative, we have Q T Q T = 2 −t Tr(r (T ) † r (T )) ≥ 0. Note that r (T ) † = r (T ) for a suitableT ∈ t,t (cp. Thm. 4). We obtain where we have again used [42,Thm. 5.3] in † and in ‡ that s ⊗t r (T ) ⊗n s ⊗t = 1 for all T ∈ t,t and all s ∈ Stab(n) (compare Ref. [42, Eq. (4.10)]). Finally, in * we have used the "inverse Bernoulli inequality" (1 + x) r ≤ e r x which holds for all x ∈ R and r ≥ 0. By assumption, the following holds Thus, we can use the inequality e x ≤ 1 + 2x for 0 ≤ x ≤ 1 to obtain where (−2 −n ; 2) t−1 = t−2 r =0 (1 + 2 r −n ) and the last inequality holds for n + 2 ≥ t + log 2 (t).
Proof. Define the synthesis operator of the frame as the map where e T is the standard basis of the domain. Then, we have clearly = V † V and S Cl | Cl(n) = V V † . Since S Cl and P Cl are both identically zero on Cl(n) ⊥ , this part does not contribute to the spectral norm. From this it is clear that Moreover, we can compute where we have used that the spectral norm of Hermitian operators is bounded by the max-column norm and inserted the exact result of Lemma 11 in the last step. Finally, said lemma provides the desired bound for n + 2 ≥ t + log 2 t.
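The product (−2^{−n}; 2)_{t−1} = ∏_{r=0}^{t−2} (1 + 2^{r−n}) appearing in Lemmas 11 and 12 is straightforward to evaluate. As an illustration only, the sketch below computes it and checks the two elementary inequalities quoted in the proof of Lemma 11, (1 + x)^r ≤ e^{rx} and e^x ≤ 1 + 2x for 0 ≤ x ≤ 1, applied directly to this product. This chain is an assumption-level illustration and is not claimed to reproduce the exact bound of the lemma; the function names are hypothetical.

```python
import math


def q_pochhammer_term(n: int, t: int) -> float:
    """(-2^{-n}; 2)_{t-1} = prod_{r=0}^{t-2} (1 + 2^{r-n})."""
    return math.prod(1 + 2.0 ** (r - n) for r in range(t - 1))


def elementary_chain(n: int, t: int):
    """Return (product, exp(s), 1 + 2*s) with s = sum_{r=0}^{t-2} 2^{r-n}.

    For 0 <= s <= 1 (e.g. n >= t - 1) one expects
    product <= exp(s) <= 1 + 2*s.
    """
    s = sum(2.0 ** (r - n) for r in range(t - 1))
    return q_pochhammer_term(n, t), math.exp(s), 1 + 2 * s


if __name__ == "__main__":
    for n, t in [(10, 4), (20, 8), (30, 12)]:
        prod, upper_exp, upper_lin = elementary_chain(n, t)
        print(n, t, prod, upper_exp, upper_lin)
```

In the regime n ≥ t − 1 the product stays within 1 + 2^{t−n}, which illustrates why such factors are harmless once n is moderately larger than t.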

Lemma 2 (Overlap bound). Let K be a single qubit gate which is not contained in the Clifford group. Then, there is a constant c(K ) > 0 such that
The proof of Lemma 2 is based on two results. The first states that the basis elements r (T ) of the commutant of tensor powers of the Clifford group either belong to the commutant of the powers of the unitary group, or else are far away from it.

Lemma 13 (Haar symmetrization). For all t and for all T ∈ t,t \ S t , it holds that
where Q T is as in Eq. (21) and P H = M t (μ H ) is the t-th moment operator of the single-qubit unitary group U(2).
The proof is given in Sect. 6.3. In Appendix C, we show that the constant 7/8 cannot be improved below 7/10, by exhibiting a T that attains this bound.
The second ingredient to Lemma 2 is a powerful theorem by Varjú [53]. Here, we specialize this theorem to the unitary group: Theorem 5 ([53, Thm. 6]). Let ν be a probability measure on U(d). Consider the averaging operator Then there are numbers C(d) > 0 and r 0 > 0 such that where $|v|^2 = \sum_i v_i^2$. Proof of Lemma 2. Consider the probability measure ξ K that draws uniformly from the set {K , K † , 1}. Moreover, define ν K on U(2) as the average of the uniform measure on {H, S, S 3 } and ξ K * ξ K . Hence, the corresponding moment operator is As the Clifford group augmented with any non-Clifford gate is universal [65, Thm. 6.5], so is the probability measure ν K . It follows from the representation theory of the unitary group (see App. B) that the representation U → Ad ⊗t U does not contain irreducible representations W v with highest weight of length |v| > √ 2t. Thus, we can decompose into these irreducible representations as follows: Here, m v denotes the multiplicity of the irreducible representation W v (possibly zero).
In the second step we have used that P H has only support on the trivial irreducible representation v = 0, where both P H and M t (ν K ) act as identity and thus cancel. Hence, only non-trivial irreducible representations are contributing. To bound √ 2t (ν K ), we can invoke Theorem 5 combined with the fact that for any universal probability measure the restricted gap is non-zero: r (ν K ) > 0 for all r ≥ 1 (compare e.g. Ref. [27]). Hence, we obtain where c(K ) > 0. Therefore, we have Furthermore, consider the operator We obtain In the fourth step, we again used the properties of the Haar projector as in Eq. (79). Combining this with (107) and Lemma 13 we obtain We can use that (Q T | Ad ⊗t for all T ∈ t,t because Q T = 2 −t/2 r (T ) commutes with the t-th diagonal action of the single-qubit Clifford group (compare [42,Lem. 4.5]). We immediately obtain From the Cauchy-Schwarz inequality, we now get where we have used that c (K ) log −2 (t) ≤ √ 2t (ν K ) ≤ 1 such that we can use the inequality This shows the claimed statement.
Remark 2 (Quantum gates with algebraic entries). If we restrict to gates K that have only algebraic entries, we can apply the result from Ref. [66] and save the additional overhead of log 2 (t) in the scaling. This applies to the T -gate and for essentially all gates that might be used in practical implementations. Here, we have chosen the more general approach.
Remark 3 (Implications for quantum information processing). Theorem 5 has miscellaneous implications for quantum information processing. E.g. we can immediately combine this bound with the local-to-global lemma in Ref. [23,Lem. 16] to extend Ref. [24,Cor. 7] to gate sets with non-algebraic entries at the cost of an additional overhead of log 2 (t) in the scaling. The bottleneck to loosen the invertibility assumption as well is the local-to-global lemma which only works for Hermitian moment operators (symmetric distributions). Work to lessen the assumption of invertibility has been done in Ref. [67]. Extending this would be an interesting application which we, however, do not pursue in this work.
Lemma 4 (Properties of the constructed basis). Let {T j } | t,t | j=1 be an enumeration of the elements of t,t such that the first t! spaces T j correspond to the elements of S t . Then, the {E j } constitute an orthogonal (but not normalized) basis, where Denote by N i the defect space of T i . For $n \ge \tfrac{1}{2}(t^2 + 5t)$, we have Moreover, it holds that Proof. The form of (30) is, up to a constant, the determinant formulation of the Gram-Schmidt procedure. First, note that the number of permutations of n elements with no fixed points is known from Ref. [68] to be for n ≥ 1. Here, D stands for "derangement", as permutations without fixed points are sometimes called. Then, the number of permutations having exactly k fixed points is the number $\binom{n}{k}$ of choices of k points times the number D(n − k) of deranged permutations on the remaining n − k objects: The following estimate for certain sums involving p(n, k) will shortly become useful. Note that we have for any M, L ∈ N and m ∈ R such that 2 m > M − L and M ≥ L ≥ 1: Here, we have used in the second inequality that 2 mk /k! is monotonically increasing for k ≤ M − L < 2 m and a standard bound on binomial coefficients in the last step. We start by bounding the diagonal coefficients A j, j . The idea is to divide the set of permutations into sets of permutations with exactly k fixed points. For any such permutation, the product of overlaps collapses to only j −1−k non-trivial inner products. By assumption, $n \ge \tfrac{1}{2}(t^2 + 5t)$ ≥ t + log 2 t, thus we can bound any of those using Lemma 11 as Note that the trivial permutation (corresponding to k = j −1 fixed points) contributes exactly 1 to the sum. Thus, we find the following bound using Eq. (115) with M = j −1, L = 1 and m = n − t − log 2 t: where we have used Eq. (15) in the last step as j − 1 < j ≤ | t,t | ≤ $2^{\frac{1}{2}(t^2+5t)}$. Using the reverse triangle inequality, we get a lower bound in the same way: Next, we will bound the off-diagonal terms A i, j .
It is well known that every permutation ∈ S j can be written as a product of disjoint cycles. Given a ∈ S j with ( j) = i, consider the cycle j → i → i 1 → i 2 → . . . i r → j in . Then, we have the bound where we have used Lemma 3, the triangle inequality and a telescope sum. We set L := | dim N i − dim N j | and split the sum over permutations into those with more than or equal to j − L many fixed points and those with less. In the first case, we use Eq. (119) to bound the overlaps, in the second case we use Eq. (115) as before. This yields the following bound where we have used again j ≤ | t,t | and L ≤ t/2.
Note that we can alternatively bound A i, j for i ≠ j using that the identity is not an allowed permutation, i.e., only permutations with fewer than j − 2 fixed points can appear. With Eqs. (115) and (116), we get the following inequality
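The counting of permutations by their number of fixed points, used in the proof of Lemma 4, can be sketched as follows. The code computes D(n) via the standard recurrence D(n) = (n − 1)(D(n − 1) + D(n − 2)), which is one of several equivalent descriptions and is not necessarily the closed form cited from Ref. [68], and forms p(n, k) as the binomial coefficient times D(n − k). A brute-force enumeration serves as a cross-check. The function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def derangements(n: int) -> int:
    """Number D(n) of permutations of n elements without fixed points."""
    a, b = 1, 0  # D(0), D(1)
    if n == 0:
        return a
    for m in range(2, n + 1):
        a, b = b, (m - 1) * (a + b)
    return b


def p(n: int, k: int) -> int:
    """Number of permutations of n elements with exactly k fixed points."""
    return comb(n, k) * derangements(n - k)


def fixed_point_histogram(n: int) -> list:
    """Brute-force count of permutations of range(n) by number of fixed points."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[sum(1 for i, image in enumerate(perm) if i == image)] += 1
    return counts


if __name__ == "__main__":
    n = 6
    assert [p(n, k) for k in range(n + 1)] == fixed_point_histogram(n)
    assert sum(p(n, k) for k in range(n + 1)) == factorial(n)
    print([p(n, k) for k in range(n + 1)])
```

Note that p(n, n − 1) = 0, since a permutation cannot move exactly one element, and p(n, n) = 1 corresponds to the trivial permutation that contributes exactly 1 to the sum.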

Lemma 13 (Haar symmetrization). For all t and for all T ∈ t,t \ S t , it holds that
where Q T is as in Eq. (21) and P H = M t (μ H ) is the t-th moment operator of the single-qubit unitary group U(2).
For an analysis of the tightness of the bound, see "Appendix C". Recall that Let P D be the Haar averaging operator, restricted to the diagonal unitaries. As it averages over a subgroup, P D is a projection with range a super-set of P H . By applying P D to r (T ), we can turn the statement (101) from one involving Hilbert space geometry to one about the discrete geometry of stochastic Lagrangians. Indeed, i.e., the overlap is upper-bounded by the probability that a uniformly sampled element (x, y) of T has components of equal Hamming weight.
We will bound the probability in slightly different ways for spaces T with trivial (i.e., zero-dimensional) and non-trivial defect spaces. a. Case I: trivial defect sub-spaces In this case, T = {(Oy, y) | y ∈ F t 2 } for some orthogonal stochastic matrix O. The next proposition treats a slightly more general situation.

Proposition 4 (Hamming bound). Assume O has a column of Hamming weight r . Then the probability that O preserves the Hamming weight of a vector y chosen uniformly at random from F t 2 satisfies the bound The bound in Eq. (123) decreases monotonically in r . Orthogonal stochastic matrices O satisfy r = 1 mod 4, so the smallest non-trivial r that can appear is r = 5, for which the bound gives 0.81.
The proof idea is as follows: For each y ∈ F t 2 , the two vectors y, y + e 1 differ in Hamming weight by ±1. But, if h(e 1 ) = 1, then h(Oy) − h(O(y + e 1 )) tends not to be ±1. In such cases, O does not preserve weights for both y and y + e 1 . Applying this observation to randomly chosen vectors, we can show the existence of many vectors for which O changes the Hamming weight.
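The probability discussed here can be computed exhaustively for small t. The sketch below does so for one illustrative matrix, chosen as an assumption and not taken from the text above: the 6 × 6 matrix over F_2 with zeros on the diagonal and ones elsewhere, whose columns all have Hamming weight r = 5. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def apply_mod2(O, y):
    """Matrix-vector product over F_2."""
    return tuple(sum(a * b for a, b in zip(row, y)) % 2 for row in O)


def weight_preservation_probability(O):
    """Pr[h(Oy) = h(y)] for y uniform in F_2^t, by exhaustive enumeration."""
    t = len(O)
    hits = sum(sum(apply_mod2(O, y)) == sum(y)
               for y in product((0, 1), repeat=t))
    return Fraction(hits, 2 ** t)


# Illustrative example (assumption): zeros on the diagonal, ones elsewhere.
t = 6
anti_identity = [[int(i != j) for j in range(t)] for i in range(t)]
prob = weight_preservation_probability(anti_identity)
print(prob, float(prob))  # 13/16 0.8125
```

For this matrix, even-weight vectors are fixed and odd-weight vectors are mapped to their complements, so the weight is preserved for the 32 even-weight vectors and the 20 vectors of weight 3, giving 52/64.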
Proof of Proposition 4. Assume without loss of generality that the first r entries of Oe 1 are 1, and the remaining t − r entries are 0.
Let y be a uniformly distributed random vector on F t 2 . Notice that Oy and O(y + e 1 ) are then also uniformly distributed. Using the union bound, we find that What is more, let T be a stochastic Lagrangian with non-trivial defect sub-spaces. Then, for an element (x, y) drawn uniformly from T , we have Proof. Let d = dim N . Consider a t × d column-generator matrix for N . Permuting coordinates of F t 2 and adopting a suitable basis, there is no loss of generality in assuming that is of the form Note that is a row-generator matrix for N ⊥ . Indeed, the row-span has dimension t − d and the matrices fulfill i.e., the inner product between any column of and any row of γ vanishes. It follows that elements n ∈ N , x ∈ N ⊥ are exactly the vectors of respective form In particular, if x is drawn uniformly from N ⊥ , then the first t − d components are uniformly distributed in F t−d 2 . For now, we restrict to the case where G has a column, say the first, with r ≠ 1 non-zero entries. We then choose n = (Ge 1 , e 1 ) and argue as in Eq. (124) to obtain We are left with the case where all columns of G have Hamming weight 1. (If N is a defect subspace, then Def. 6.1 implies that every column of has Hamming weight at least 4. We treat the present case merely for completeness). As N is isotropic, the columns of have mutual inner product equal to 0: It follows that all columns have to be mutually orthogonal standard basis vectors e i ∈ F t−d 2 . Thus, by permuting the first t − d coordinates of F t 2 , we can assume that G is of the form where x̃| d denotes the restriction of x̃ to the first d components. Adding n := (e 1 ⊕0, e 1 ) to x = (x̃, x̃| d ), the Hamming weights of the two parts both change by ±1, giving We have proven the first advertised claim. It implies the second one, as argued next. Let N be the left defect subspace of T . By Ref. [42,Prop. 4.17], we find the following.
• The restriction {x | (x, y) ∈ T for some y} equals N ⊥ .
• The stochastic Lagrangian T contains N ⊕ 0.
Assume that (x, y) is distributed uniformly in T . By the first cited fact, x is distributed uniformly in N ⊥ . By the second fact, (x + n, y) follows the same distribution as (x, y), for each n ∈ N . Thus, repeating the argument in the proof of Proposition 4, we find that for any fixed n ∈ N :
Proof. This follows similar to Ref. [24,Lem. 4& Lem. 30]. Denote by | 2 n the maximally entangled state vector on C 2 n ⊗ C 2 n . The condition in (5) is equivalent to as an operator inequality, where We have a decomposition of (C 2 n ) ⊗t into irreducible representations of the Clifford group: where {C γ } is the set of all equivalence classes of irreducible representations of Cl(n) that appear in the t-th order diagonal representation, and L γ are the corresponding multiplicity spaces (which by the double commutant theorem are irreducible representations of the commutant algebra -we have chosen L for Lagrangian). This implies that where | L γ and | C γ denote maximally entangling state vectors on two copies of L γ and C γ , respectively. Indeed, observe that | 2 n ⊗t = 2 −nt/2 vec(1) and that the identity restricted to sub-spaces is just the identity on these sub-spaces. The prefactors then follow from normalizing the vectorized identity operators on the direct summands. Since Cl(n) acts via multiplication on the spaces C λ , this implies that where the second line follows from Schur's lemma and the fact that U ⊗t • (U † ) ⊗t is trace preserving. The support of this operator is on the symmetric subspace ∨ t (C 2 n ⊗C 2 n ) [24,Lem 30.1]. The minimal eigenvalue of this operator restricted to the symmetric subspace is which we now lower bound. Let γ * denote the optimizer. By Schur-Weyl duality, the diagonal action of U(2 n ) on (C 2 n ⊗ C 2 n ) ⊗t decomposes as ⊕ λ U λ ⊗ S λ where as usual U λ are Weyl modules and S λ are Specht modules. Restricting this action to the Clifford group, the U λ further decompose into irreducible representations where I λ is the spectrum of U λ as a Clifford representation. Let 0 be the set of all λ such that γ * ∈ I λ , then as a Clifford representation Thus, as a vector space, we have In particular, for any λ ∈ 0 we have that dim C γ * ≤ dim U λ and dim L γ * ≥ dim S λ . 
Thus we get the following bound for the minimal eigenvalue: The rest of the proof follows as in Ref. [24,Lem. 4], mutatis mutandis.
3. The third condition requires a calculation and a non-trivial choice of l. We have to bound the quantity for all q ≥ Q l = l. Here, G [ p,q] denotes the orthogonal projector onto the ground space of H [ p,q] . Note that this ground space is simply a suitable translation of the Clifford commutant Cl(k) for k = q − p + 1 as shown in Lemma 7. Recall that it comes with a non-orthogonal basis Q ⊗k T , where Moreover, the projector G [ p, q] is also simply a translation of the Clifford projector P Cl(k) projecting onto Cl(k) . From the discussion in Sect. 6.1, we know that the Clifford frame operator is a suitable approximation to P Cl(k) when k is large enough. Concretely, we have by Lem. 12: Defining the shorthand notation s t (k) = (−2 −k ; 2) t−1 , we in particular get the bound Let us introduce the shorthand notation G q := G [1,q] ≡ P Cl(q) , S q = S [1,q] ≡ S Cl(q) , and G q,l := G [q−l+2,q+1] , S q,l := S [q−l+2,q+1] for translations of the Clifford projector and frame operator, respectively. Notice that G q − G q+1 is an orthogonal projector as the support of G q+1 is by definition contained in that of G q . Therefore, restricted to the support of G q , the operator G q − G q+1 projects onto the orthogonal complement of the support of G q+1 . Combining this fact with the above inequalities, we find where the operator Y T can be straightforwardly computed as Invoking the synthesis operators introduced in Lemma 12, one can bound the above norm as Thus, we arrive at For l + 1 ≥ t + log 2 (t), we can use Lemma 11 to get: Finally choose any l ≥ 4t + 4 log 2 (t) + 6, then we find In particular, we can choose l = 12t, ε l = 1/2 √ l to get the desired bound in Lemma 14 ∀q ≥ l.
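The choice l = 12t can be checked against the two window-size conditions used above, l + 1 ≥ t + log 2 (t) and l ≥ 4t + 4 log 2 (t) + 6. The following lines are only an arithmetic sanity check, under the assumption that log 2 denotes the base-2 logarithm; the function name is hypothetical.

```python
import math


def window_conditions(t: int):
    """Check both conditions on the window size for the choice l = 12 t.

    Assumption: the logarithm in both conditions is taken to base 2.
    """
    l = 12 * t
    log_t = math.log2(t)
    return (l + 1 >= t + log_t, l >= 4 * t + 4 * log_t + 6)


if __name__ == "__main__":
    assert all(window_conditions(t) == (True, True) for t in range(1, 10_000))
    print("l = 12 t satisfies both conditions for 1 <= t < 10000")
```

Both conditions reduce to 8t ≥ 4 log 2 (t) + 6, which already holds at t = 1 and has increasing slack for larger t.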

Summary and Open Questions
We have found that a number of non-Clifford gates independent of the system size suffices to generate ε-approximate unitary t-designs. This is surprising, conceptually interesting and practically relevant: After all, it is the main objective in quantum gate synthesis to minimize the number of non-Clifford gates in a circuit implementation of a given unitary. There are multiple open questions and ways to continue this work:
• Similar to the result in Ref. [24], the scaling in n is near optimal, while the scaling in t can probably be improved.
• Another natural open question is whether the condition n = O(t 2 ) can be lifted.
Notably, this is reminiscent of the situation discussed in Ref. [69], where the improved scaling can be proven only in the regime $t = o(n^{1/2})$. In this work, the condition n = O(t 2 ) is related to the approximate orthogonality of the Lagrangian subspace. We use this fact repeatedly and in different flavours, but we can only prove it in this regime. In fact, in Lemma 12 we use the same technique that has been used in Ref. [24] to prove approximate orthogonality of permutations in the regime t ≤ 2 O(0.4n) . However, the commutant of the Clifford group is far larger than the span of permutations and we suspect that this bound is tight. Nevertheless, we cannot rule out that similar results can be proven without exploiting approximate orthogonality. This likely requires a detailed understanding of the representation theory of the Clifford group.
• Our result holds for additive errors in the diamond norm. For relative errors, our bounds can be used to obtain a quadratic advantage in the number of non-Clifford gates in Corollary 1. This still allows the density of non-Clifford gates to go to zero in the thermodynamic limit, but is no longer system-size independent. In fact, it has been proven in Ref. [70] that this scaling is optimal for relative errors. It would be interesting to delineate more precisely for which notions of approximation a system-size independent result holds.
• We strongly expect that the results can be generalized to qudits for arbitrary d, giving rise to analogous conclusions concerning an independence of the system size for additive errors in the diamond norm.
We hope the present work stimulates such endeavors.

Data Availability Statement
No data was generated in this work.

Appendix A: Unitary t-designs
In the following, we review the concept of a unitary t-design [5][6][7], giving different but equivalent definitions which prove to be useful in different contexts. They also serve as starting point to explore connections to other mathematical fields, e. g. representation theory. To this end, let us introduce some notation. Define μ H to be the (normalized) Haar measure on U(d) and let Hom (t,t) (U(d)) be the space of homogeneous polynomials of degree t in both the entries of U ∈ U(d) as well as U .

Definition 10 (Unitary t-design).
A probability measure $\nu$ on $U(d)$ is called a unitary t-design if the following holds for all $p \in \mathrm{Hom}_{(t,t)}(U(d))$:
$$\int_{U(d)} p(U)\, d\nu(U) = \int_{U(d)} p(U)\, d\mu_H(U).$$
A subset $D \subseteq U(d)$ is called a unitary t-design if it comes with a probability measure $\nu_D$ which, continued trivially to $U(d)$, is a unitary t-design. In particular, if $D$ is finite, $\nu_D$ is usually taken to be the (normalized) counting measure.
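Definition 10 can be probed numerically for a small finite group. The sketch below does not use the polynomial condition directly. It swaps in the frame potential, a standard equivalent criterion that is not stated in the text: for the counting measure on a finite set, the average of $|\mathrm{Tr}(U^\dagger V)|^{2t}$ over all pairs is never smaller than the Haar value, with equality exactly for t-designs. We assume the single-qubit Clifford group generated by H and S up to global phases, and we use the fact that the Haar value for d = 2 is the t-th Catalan number. All function names are illustrative.

```python
import math
import numpy as np

# Generators of the single-qubit Clifford group (up to global phases).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)


def phase_key(u):
    """Hashable label of u that ignores its global phase."""
    flat = u.flatten()
    pivot = next(z for z in flat if abs(z) > 1e-9)
    fixed = flat * (abs(pivot) / pivot)  # first non-zero entry becomes real, positive
    return tuple(np.round(fixed, 8).tolist())


def generate_group(generators):
    """Close the generators under multiplication, identifying global phases."""
    identity = np.eye(2, dtype=complex)
    elements = {phase_key(identity): identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for g in frontier:
            for h in generators:
                candidate = g @ h
                key = phase_key(candidate)
                if key not in elements:
                    elements[key] = candidate
                    next_frontier.append(candidate)
        frontier = next_frontier
    return list(elements.values())


def frame_potential(group, t):
    """Average of |Tr(U^dagger V)|^(2t) over all pairs of group elements."""
    values = [abs(np.trace(u.conj().T @ v)) ** (2 * t) for u in group for v in group]
    return float(np.mean(values))


def haar_frame_potential_qubit(t):
    """Haar value of the frame potential for d = 2: the t-th Catalan number."""
    return math.comb(2 * t, t) // (t + 1)


clifford = generate_group([H, S])
for t in range(1, 5):
    print(t, round(frame_potential(clifford, t), 6), haar_frame_potential_qubit(t))
```

For this 24-element group the two columns agree for t = 1, 2, 3 (values 1, 2, 5) and differ at t = 4 (15 versus 14), in line with the statement that Clifford operations yield at most 3-designs.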
It should come as no surprise that Def. 10 does not have to be checked for every polynomial individually. Any homogeneous polynomial $p \in \mathrm{Hom}_{(t,t)}(U(d))$ can be linearized as $p(U) = \mathrm{Tr}(A\, U^{\otimes t,t})$ with $U^{\otimes t,t} := U^{\otimes t} \otimes \bar{U}^{\otimes t}$ for a suitable matrix $A$, so it suffices to compare the averages of $U^{\otimes t,t}$ under $\nu$ and $\mu_H$.

A particularly fruitful theory of designs is possible in the case where the design $(G, \nu)$ itself constitutes a (locally compact) subgroup $G \subseteq U(d)$ and $\nu$ is the normalized Haar measure on $G$. Following Ref. [38], we call these unitary t-groups. In this case, Eq. (A3) implies that the trivial isotype of the representation $G \ni g \mapsto \mathrm{Ad}_g^{\otimes t}$ must agree with the trivial isotype of $U(d) \ni U \mapsto \mathrm{Ad}_U^{\otimes t}$. Since the trivial isotype exactly corresponds to the commutant of the respective diagonal representations $\tau_t : U \mapsto U^{\otimes t}$, this is equivalent to the statement that the commutant of the representation $\tau_t$ agrees with the commutant of the restriction $\tau_t|_G$. However, this is the case if and only if $\tau_t|_G$ decomposes into the same irreducible representations as $\tau_t$.

Likewise, both estimates in Proposition 5 are tight. The first bound is saturated for $N = \{0, (1, 1, 1, 1)\}$. Indeed, $N^\perp$ is the space of all even-weight elements of $\mathbb{F}_2^4$. The only non-trivial element of $N$ is $(1, 1, 1, 1)$, and adding it to an even-weight vector changes its weight if and only if the vector is in $N$ itself. But $|N|/|N^\perp| = 1/4$. In an exactly analogous way, the second bound is tight for the stochastic Lagrangian with left and right defect spaces equal to the same $N$. As detailed in Example 4.27 of Ref. [42], this stochastic Lagrangian is the one identified in Ref. [74] as the sole non-trivial one in the case of t = 4.
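The counting behind the example $N = \{0, (1,1,1,1)\}$ can be checked by brute force. The following sketch enumerates $\mathbb{F}_2^4$, builds $N^\perp$ with respect to the standard mod-2 inner product, and tests the two claims used above. The helper names are ours and serve only as illustration.

```python
from fractions import Fraction
from itertools import product

ONES = (1, 1, 1, 1)
N = [(0, 0, 0, 0), ONES]


def dot_mod2(v, w):
    """Standard inner product over F_2."""
    return sum(a * b for a, b in zip(v, w)) % 2


def add_mod2(v, w):
    """Component-wise addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(v, w))


space = list(product((0, 1), repeat=4))

# Orthogonal complement of N: all vectors orthogonal to every element of N.
N_perp = [v for v in space if all(dot_mod2(v, w) == 0 for w in N)]

# Ratio |N| / |N_perp| appearing in the tightness argument.
ratio = Fraction(len(N), len(N_perp))

# Vectors in N_perp whose Hamming weight changes when (1,1,1,1) is added.
weight_changes = [v for v in N_perp if sum(add_mod2(v, ONES)) != sum(v)]

print("N_perp has only even weights:", all(sum(v) % 2 == 0 for v in N_perp))
print("|N| / |N_perp| =", ratio)
print("weight changes exactly on N:", sorted(weight_changes) == sorted(N))
```

Adding the all-ones vector maps weight w to 4 - w, so the weight is preserved exactly for the six weight-2 vectors, which are the elements of $N^\perp$ outside $N$.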
In contrast, we do not know whether (but suspect that) we pay a price by restricting from the full Haar symmetrizer to the one over diagonal matrices in Eq. (123). For the two cases that saturate the bounds in Proposition 4 and Proposition 5, we can compute the full projection explicitly and show that, at least there, Eq. (123) indeed fails to be tight.
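One can expand the anti-identity $\bar{\mathbb{1}}$ in terms of Pauli operators [42]:
$$\bar{\mathbb{1}} = \frac{1}{2}\left(\mathbb{1}^{\otimes t} + X^{\otimes t} + Y^{\otimes t} + Z^{\otimes t}\right). \qquad \text{(C1)}$$
Writing $\sigma_0, \dots, \sigma_3$ for $\mathbb{1}, X, Y, Z$, we obtain
$$\begin{aligned}
2^{-t-2} \sum_{i,j=0}^{3} \int \mathrm{Tr}\left[\sigma_i^{\otimes t} U^{\otimes t} \sigma_j^{\otimes t} (U^\dagger)^{\otimes t}\right] d\mu_H(U)
&= 2^{-t-2} \sum_{i,j=0}^{3} \int \mathrm{Tr}\left[\sigma_i U \sigma_j U^\dagger\right]^t d\mu_H(U) \\
&= 2^{-2} + 2^{-2} \cdot 9 \cdot \frac{1}{4\pi} \int_{S^2} x_1^t \, dx \qquad \text{(C4)} \\
&= \frac{1}{4} + \frac{9}{4} \cdot \frac{1}{t+1},
\end{aligned}$$
where in (C4), we have interpreted the Haar integral over inner products of Paulis as an integral over the Bloch sphere and, in the next line, used the formula from Ref. [75]. For t = 2, Eq. (C1) is just the swap operator (i.e., a permutation), and the formula gives 1, as it should. The smallest non-trivial case is t = 6 [42], where we get roughly 0.571 < 0.65.

Next, we consider the CSS code $P_N$ for $N = (1, 1, 1, 1)$. We use the results in Sect. 3 of Ref. [74]. For a given partition $\lambda$, let $W_\lambda$ be the associated Weyl module and $S_\lambda$ the Schur module. As in Ref. [74], let $W^+_\lambda \subset W_\lambda$ be the subspace such that

For the projection operators onto the various spaces, we write $P_\lambda$ (Schur module), $Q_\lambda$ (Weyl module), and $Q^+_\lambda$ (the subspace defined above). Then [74]

By Schur's Lemma,

for suitable coefficients $c_\lambda$, which are seen to equal $c_\lambda = D^+_\lambda / D_\lambda$ by the fact that Haar averaging preserves the trace. Hence, using Table 1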
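Returning to the overlap of the anti-identity computed earlier in this appendix, the value $1/4 + (9/4)/(t+1)$ can be cross-checked numerically. The sketch below evaluates the closed form exactly, using that the normalized sphere average $\frac{1}{4\pi}\int_{S^2} x_1^t\,dx$ equals $1/(t+1)$ for even t, and compares it with a Monte Carlo estimate of $2^{-t-2}\sum_{i,j}\int \mathrm{Tr}[\sigma_i U \sigma_j U^\dagger]^t\, d\mu_H(U)$. As an assumption of this illustration, Haar-random elements of SU(2) are sampled from normalized Gaussian quaternions. Function names, sample sizes and seeds are arbitrary choices.

```python
from fractions import Fraction
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def overlap_closed_form(t):
    """Exact value 1/4 + (9/4) / (t + 1), valid for even t."""
    return Fraction(1, 4) + Fraction(9, 4) * Fraction(1, t + 1)


def random_su2(rng):
    """Haar-random SU(2) element from a uniformly random unit quaternion."""
    q = rng.normal(size=4)
    a, b, c, d = q / np.linalg.norm(q)
    return np.array([[a + 1j * b, c + 1j * d], [-c + 1j * d, a - 1j * b]])


def overlap_monte_carlo(t, samples, seed):
    """Estimate 2^(-t-2) * sum_ij E[ Tr(sigma_i U sigma_j U^dagger)^t ]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        u = random_su2(rng)
        u_dag = u.conj().T
        # The trace of a product of two Hermitian matrices is real.
        total += sum(
            np.trace(si @ u @ sj @ u_dag).real ** t
            for si in PAULIS
            for sj in PAULIS
        )
    return total / samples / 2 ** (t + 2)


for t in (2, 6):
    exact = overlap_closed_form(t)
    estimate = overlap_monte_carlo(t, samples=2000, seed=1)
    print(t, exact, float(exact), estimate)
```

The closed form evaluates to exactly 1 for t = 2 and to 4/7 ≈ 0.571 for t = 6, matching the values quoted in the text. The Monte Carlo column is only a statistical sanity check and fluctuates with the seed.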

Appendix D: Saturation of Higher Rényi Entropies in K-interleaved Clifford Circuits
Consider the Rényi entropies, which are defined as
$$S_\alpha(\rho) := \frac{1}{1-\alpha} \log \mathrm{Tr}[\rho^\alpha]$$
for $\alpha > 0$. For $\alpha \to 1$, the standard von Neumann entropy is recovered. Here, we are interested in the entanglement properties of random state vectors $|\psi\rangle$ on n qubits. We consider a bipartition of the n qubits into a set A consisting of a constant number $n_A$ of qubits and its complement B consisting of $n_B = n - n_A$ qubits. To derive concentration bounds on these quantities over random ensembles of states, we study the "higher purities" $\mathrm{Tr}[\rho_A^\alpha]$ of the reduced state $\rho_A$ for positive integer $\alpha$ in more detail. First, we compute the Haar average of this quantity. Let $\pi_{\mathrm{cyc}} \in S_\alpha$ be any full $\alpha$-cycle. We compute
$$\begin{aligned}
\mathbb{E}_{\psi \sim \mathrm{Haar}}\, \mathrm{Tr}[\rho_A^\alpha]
&= \binom{2^n + \alpha - 1}{\alpha}^{-1} \mathrm{Tr}\left[\left(r(\pi_{\mathrm{cyc}})_A \otimes \mathbb{1}_B\right) P_{\mathrm{sym},\alpha}\right] \\
&= \frac{1}{2^n (2^n + 1) \cdots (2^n + \alpha - 1)} \sum_{\sigma \in S_\alpha} 2^{n_A \#\mathrm{cyc}(\pi_{\mathrm{cyc}} \circ \sigma)}\, 2^{n_B \#\mathrm{cyc}(\sigma)} \\
&= \frac{2^{\alpha n_B}\, 2^{n_A}}{2^n (2^n + 1) \cdots (2^n + \alpha - 1)} \left(1 + O(2^{-n_B})\right) \\
&= 2^{-(\alpha - 1) n_A} \left(1 + O(2^{-n_B})\right),
\end{aligned}$$
where the implicit constant in $O(2^{-n_B})$ depends on $\alpha$. Therefore, up to an exponentially small correction, the average higher purity is minimal.

Next, we compute the same average over an additive $\varepsilon$-approximate unitary t-design. Recall that this is a probability distribution $\nu$ such that

By definition of the diamond norm, this also implies

It suffices to insert $C(K) \log^2(t)\,(t^4 + t \log(1/\varepsilon))$ non-Clifford gates into random Clifford circuits to generate an additive $\varepsilon$-approximate t-design. Therefore, we can choose $\varepsilon = 2^{-2(\alpha - 1) n_A}$ and $t = \alpha$ and find that a K-interleaved Clifford circuit with $k = C(K) \log^2(\alpha)\,(\alpha^4 + 2\alpha(\alpha - 1) n_A)$ satisfies

Therefore, for every constant $n_A$ and $\alpha$, there is a classically simulable ensemble of quantum circuits that generates essentially minimal higher purities on average.
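The Haar average of the higher purities can be evaluated exactly for small $\alpha$ by summing over the symmetric group, which makes the claim of near-minimality concrete. The sketch below implements the permutation sum $\sum_{\sigma \in S_\alpha} 2^{n_A \#\mathrm{cyc}(\pi_{\mathrm{cyc}} \circ \sigma)}\, 2^{n_B \#\mathrm{cyc}(\sigma)}$ divided by $2^n(2^n+1)\cdots(2^n+\alpha-1)$ and compares it with the minimal value $2^{-(\alpha-1) n_A}$. Function names and the chosen parameters are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles (including fixed points) of a permutation given as a tuple."""
    seen = set()
    count = 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count


def compose(p, q):
    """Composition (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def haar_average_purity(alpha, n_a, n_b):
    """Exact Haar average of Tr[rho_A^alpha] for n_a + n_b qubits."""
    n = n_a + n_b
    full_cycle = tuple((i + 1) % alpha for i in range(alpha))
    total = sum(
        2 ** (n_a * cycle_count(compose(full_cycle, sigma)) + n_b * cycle_count(sigma))
        for sigma in permutations(range(alpha))
    )
    norm = 1
    for j in range(alpha):
        norm *= 2 ** n + j
    return Fraction(total, norm)


def minimal_purity(alpha, n_a):
    """Smallest possible value of Tr[rho_A^alpha], attained by the maximally mixed state."""
    return Fraction(1, 2 ** ((alpha - 1) * n_a))


for n_b in (4, 8, 12):
    ratio = haar_average_purity(3, 1, n_b) / minimal_purity(3, 1)
    print(n_b, float(ratio))
```

For $\alpha = 2$ the function reproduces the familiar average purity $(2^{n_A} + 2^{n_B})/(2^n + 1)$, and the printed ratios approach 1 as $n_B$ grows, consistent with the $1 + O(2^{-n_B})$ correction.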