Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations

Schur-Weyl duality is a ubiquitous tool in quantum information. At its heart is the statement that the space of operators that commute with the tensor powers of all unitaries is spanned by the permutations of the tensor factors. In this work, we describe a similar duality theory for tensor powers of Clifford unitaries. The Clifford group is a central object in many subfields of quantum information, most prominently in the theory of fault-tolerance. The duality theory has a simple and clean description in terms of finite geometries. We demonstrate its effectiveness in several applications: (1) We resolve an open problem in quantum property testing by showing that"stabilizerness"is efficiently testable: There is a protocol that, given access to six copies of an unknown state, can determine whether it is a stabilizer state, or whether it is far away from the set of stabilizer states. We give a related membership test for the Clifford group. (2) We find that tensor powers of stabilizer states have an increased symmetry group. We provide corresponding de Finetti theorems, showing that the reductions of arbitrary states with this symmetry are well-approximated by mixtures of stabilizer tensor powers (in some cases, exponentially well). (3) We show that the distance of a pure state to the set of stabilizers can be lower-bounded in terms of the sum-negativity of its Wigner function. This gives a new quantitative meaning to the sum-negativity (and the related mana) -- a measure relevant to fault-tolerant quantum computation. The result constitutes a robust generalization of the discrete Hudson theorem. (4) We show that complex projective designs of arbitrary order can be obtained from a finite number (independent of the number of qudits) of Clifford orbits. To prove this result, we give explicit formulas for arbitrary moments of random stabilizer states.


Background
Schur-Weyl duality. To motivate the symmetry this work is based on, we start by considering two types of problems that have frequently appeared in quantum information theory. First, assume that we have access to t copies ρ ⊗t of an unknown quantum state ρ on C d , and that we are interested in some property of ρ's eigenvalues (for example its entropy). Clearly, then, the problem has a U ⊗t -symmetry in the sense that the inputs ρ ⊗t and U ⊗t (ρ ⊗t )U † ⊗t represent equivalent properties. It thus makes sense to design a procedure that shares the U ⊗tsymmetry, and indeed the resulting procedure has been shown to be optimal for estimating the eigenvalues [KW01,HM02,CM06,CHM07,OW15]. Moreover, consider quantum state tomography, the task of estimating the entire quantum state ρ. Essentially optimal estimators can be constructed by first estimating the eigenvalues and then the eigenbasis [OW16, OW17, HHJ + 17], crucially using the structure of U ⊗t in each step. There are many further problems in quantum information where this symmetry can be exploited -for example in quantum Shannon theory, where optimal rates mostly depend only on the eigenvalues of the quantum state [HM02,Har05].
Second, studying the properties of a Haar-random state vector |ψ has proven to be extremely fruitful [HLW06,Has08]. Instead of working with the full distribution, it is often sufficient to exploit information about the statistical moments of the random matrix |ψ ψ|. The t-th moment is described by the expected value of the t-th tensor power of the random matrix: (1.1) Again, M t is invariant under conjugation by U ⊗t , U ∈ U(C d ).
The importance of Schur-Weyl duality in quantum information stems from the fact that it allows one to characterize the set of U ⊗t -invariant operators on (C d ) ⊗t . Indeed, it implies that any such operator can be expressed as the linear combination of matrices r π , π ∈ S t that act by permuting the tensor factors: r π (|ψ 1 ⊗ · · · ⊗ |ψ t ) = |ψ π 1 ⊗ · · · ⊗ |ψ π t .
(1.2) Clifford group and stabilizer states. Arguably, the subgroup of the full unitary group that is most important to quantum information is the Clifford group. The Clifford group and the closely related concept of stabilizer states and stabilizer codes feature centrally in fault-tolerant quantum computing, quantum coding in general, randomized benchmarking, measurement-based quantum computing, and many other subfields of quantum information.
To introduce the Clifford group, we first recall the definition of the set of Pauli operators. For a qudit (d-dimensional system), they are defined by their action on a some basis {|q } d−1 q=0 via X |q = |q + 1 , Z |q = e 2πiq/d |q .
For n qudits, the Pauli group is defined as the finite group generated by the Pauli operators on each qudit. The Clifford group now is the natural symmetry group of the Pauli group. That is, a unitary U is Clifford if, for any Pauli operator P, UPU † is again in the Pauli group. Ignoring overall phases, the Clifford group is a finite group, which is intimately connected to the metaplectic representation

Schur-Weyl duality for the Clifford group
We start with an explicit description of the commutant of tensor powers of Clifford unitaries. While such a description has not yet appeared in the quantum information literature, we emphasize that some of the key results can already be deduced from work by Nebe, Rains, Sloane and colleagues on invariants of self-dual codes (see the excellent monograph [NRS06]). Also, in representation theory, there is a separate stream of closely related work regarding the structure of the oscillator representation and attempts to develop a Howe duality theory over finite fields, which is still an open problem (see, e.g., [How73,GH16] and references therein). We discovered the approach presented below independently, starting from our results in [NW16, App. C] for third tensor powers. Our proofs differ fundamentally from the preceding works in that they rely on the phase space formalism of finite-dimensional quantum mechanics, which offers additional insight.
To construct the commutant, start with the permutations r π on (C d ) ⊗t of Eq. (1.2). We assume for now that C d is the Hilbert space of a single qudit with "computational basis" {|x } x∈Z d labeled by elements in Z d = Z/dZ (this is anyway required for defining the Pauli and the Clifford group). Basis elements |x = |x 1 ⊗ · · · ⊗ |x t of (C d ) ⊗t are then labeled by vectors x ∈ Z t d . In this language: where T π = {(π(y), y) : y ∈ Z t d } and π permutes the components of y. Because the Clifford group is a subgroup of the unitaries, the commutant is in general strictly larger. We thus have to add further operators to the r π 's in order to find a complete set.
The central message of this section is that, surprisingly, a minor modification of (1.3) suffices! Indeed, for any subspace T of Z t d ⊕ Z t d define We also consider the n-fold tensor power R(T ) := r(T ) ⊗n , which is an operator on ((C d ) ⊗t ) ⊗n ∼ = (C d ) ⊗tn ∼ = ((C d ) ⊗n ) ⊗t . We now single out subspaces that satisfy certain geometric properties. Reflecting a well-known difference between even and odd dimensions in the stabilizer formalism, we define D = d if d is odd, and D = 2d if d is even.
the projection onto a stabilizer code. This way, one can e.g., recover the code that has been used to describe the irreps contained in the 4th tensor power of the Clifford group in [ZKGG16]. The set of invertible R(T )'s are associated with spaces T of the form (Ay, y), for A that are elements of a certain "stochastic orthogonal" group O t (d). This group is of interest to the formulation of modular Howe duality [GH16], and underlies several of our applications below.
Remarkably, the size of the commutant stabilizes as soon as n t − 1. That is, just like the symmetric group in Schur-Weyl duality, the set that parametrizes the commutant of the Clifford tensor powers is independent of the number n of qudits. The fact that the operators R(T ) = r(T ) ⊗n are themselves tensor powers facilitates possible physical implementations. This, once more, generalizes a property of the symmetric group in Schur-Weyl duality.
To find novel applications of this theory, it is helpful to identify a set of non-trivial T 's that afford an intuitive interpretation. Several of our multi-qubit results presented below are based on spaces with elements (πy, y), whereπ is what we refer to as an anti-permutation. An antipermutation is simply the binary complement of a permutation matrix. Formally,π = 1 t 1 T t − π, where 1 t = (1, . . . , 1) contains t ones, and π ∈ S t . Its operator representation is particularly straightforward. The n-qubit anti-identity, e.g., acts by R(1) = 2 −n I ⊗t + X ⊗t + Y ⊗t + Z ⊗t ⊗n , (1.4) which greatly facilitates the analysis (cf. Eq. (3.13) and Definition 4.29).

Quantum property testing: stabilizer testing
The theory of quantum property testing asks which properties of a "black box" many-body quantum system can be learned efficiently-in particular without having to resort to costly full tomography [BFNR03,BFNR08,MdW16]. A prototypical example of a testable property is purity. Indeed, given access to two copies ρ ⊗ ρ of an unknown quantum state ρ, the so-called swap test provides for a simple protocol that accepts with certainty if ρ = |ψ ψ| is pure, and rejects with probability Θ(1/ε 2 ) if ρ is ε-far away from the set of pure states in trace distance. The test is perfectly complete in the sense that it has a type-I error rate of zero (pure states are accepted with certainty); it requires a number of copies (two) that is independent of the dimension. It is also transversal in the sense that if ρ acts on n qubits, all operations are required to be coherent only across the two copies, and factorize w.r.t. the n qubits. An open problem in this theory was whether stabilizerness and Cliffordness are testable properties of, respectively, states and unitaries [MdW16]. Both properties are clearly Clifford-invariant-so by the arguments presented in the introduction, it makes sense to search for tests in the commutant of the Clifford group. It is known that 2nd and 3rd moments of random stabilizer states are identical to the moments of Haar-random states [Zhu15,KG15,Web16]. This implies that three copies of a state are not sufficient to test for stabilizerness, and the results of [ZKGG16] can be used to show that four copies are also insufficient for a dimension-independent theory.
Prior work. Prior to our results, the best known algorithms for stabilizer testing required a number of copies that scaled linearly with n, the number of qubits. Indeed, these algorithms proceeded by attempting to identify the stabilizer state, which necessarily requires Ω(n) copies by the Holevo bound [AG08,Mon17,ZPDF16,KR08]. However, the existence of tests that require only a constant number of copies has been an important open question [MdW16]. We note that the stabilizer testing Input: Six copies of an unknown multi-qubit quantum state (ψ ⊗6 ).
1. Perform Bell difference sampling: That is, Bell sample twice (on two independent copies of ψ ⊗2 ), with outcomes x, y, and set a = x − y.
2. Measure the Weyl operator W a twice (on two independent copies of ψ). Accept iff the outcomes agree.
Algorithm 1: Algorithm for testing whether an unknown multi-qubit state is a stabilizer state.
problem asks whether a given state is any stabilizer state -which is distinct from the problem of verifying whether it equals some fixed stabilizer state [HM15].
Our results. We show that for n qudits O(1) copies suffice to give an efficient, perfectly complete, dimension-independent, and transversal test. For example, for qubits (d = 2) our test requires only 6 copies of the state to achieve a power independent of n (Algorithm 1). It requires coherent operations on only two qubits at a time, which means in particular that it can be implemented given a source that creates two copies of a fixed state at a time (Fig. 2). First, we consider the problem for qubits. Here our protocol affords an intuitive description using a new primitive which we call Bell difference sampling. Then we proceed to the general case and discuss the connection to the commutant of the Clifford group described in Section 1.2.

Qubits: Bell difference sampling
We start with an intuitive motivation of the test. Let |ψ ψ| = 2 −n/2 a c a W a be its expansion w.r.t. the Weyl operators W a (which are just the Pauli operators labeled in the usual way by bitstrings a ∈ Z 2n 2 , cf. Section 2). Now measure two copies of ψ in the Bell basis |W x defined by applying the Weyl operators to the maximally entangled state, i.e., |W x = (W x ⊗ I) |Φ + where |Φ + = 2 −n/2 q |q, q . If ψ is real in the computational basis then it is not hard to see that the measurement outcome is distributed according to the probability distribution p ψ (a) = |c a | 2 . This is known as Bell sampling [Mon17,ZPDF16]. Now stabilizer states are distinguished by the fact that they are eigenvectors of all Weyl operators W a for which |c a | 2 = 0 (these are its stabilizer group). This suggests using Bell sampling to obtain some a, then measuring W a twice on two fresh copies, and accepting ψ as a stabilizer if the same eigenvalue is obtained twice.
While we show that this works for real state vectors, Bell sampling unfortunately does not extend to complex state vectors. To overcome this challenge, we introduce a new primitive: Definition 3.1 (Bell difference sampling). We define Bell difference sampling as performing Bell sampling twice and subtracting (adding) the results from each other (modulo two). In other words, it is the projective measurement on four copies of a state, ψ ⊗4 ∈ ((C 2 ) ⊗n ) ⊗4 , given by For stabilizer states (whether real or complex) it is easy to see that Bell difference sampling will always sample an element a corresponding to a Weyl operator W a in its stabilizer group. What Figure 1: The X-Z-plane of the Bloch sphere. The area shaded in red indicates the projection of those states ρ for which |tr Zρ| > sin π 4 = 1 √ 2 . Likewise, the blue area correspond to the states with |tr Xρ| > 1 √ 2 . As the two areas do not intersect, these two conditions cannot be simultaneously satisfied. This is a manifestation of the uncertainty principle.
is rather less obvious is that, even for arbitrary quantum states, Bell difference sampling still has a useful interpretation. The following theorem shows that this is indeed the case: it amounts to sampling from the probability distribution p ψ twice and taking the difference.
Theorem 3.2 (Bell difference sampling). Let ψ be an arbitrary pure state of n qubits. Then: If ψ is a stabilizer state, say |S S|, then this is equal to p S (a) from Eq. (3.3).
Using Bell difference sampling as a primitive, we obtain the natural Algorithm 1. Theorem 3.3 (Stabilizer testing for qubits). Let ψ be a pure state of n qubits. If ψ is a stabilizer state then Algorithm 1 accepts with certainty, p accept = 1. On the other hand, if max S | S|ψ | 2 1 − ε 2 then p accept 1 − ε 2 /4. Proof sketch. We want to show that if the success probability, p accept , is close to one then ψ has high overlap with a stabilizer state. The proof proceeds in two steps. First, we analyze the success probability and show that if p accept ≈ 1 then p ψ (a) is typically close to its maximum possible value 2 −n . Next, we use Markov's inequality to find a large set of a where p ψ (a) > 1 2 2 −n . Using a version of uncertainty principle (see Fig. 1), we show that the corresponding Weyl operators W a necessarily commute, and therefore form a stabilizer subgroup. This finally means that our initial state must have a large overlap with a corresponding stabilizer state. Theorem 3.3 solves the stabilizer testing conjecture for qubits. It also implies a number of interesting corollaries. E.g., it directly follows that one can also test Cliffordness of a unitary efficiently, without given black-box access to the inverse as in [Low09,Wan11]; this resolves another open problem from [MdW16]. From a structural point of view, it shows that the Clifford group is the solution, within U(2 n ), of a set of polynomial equations of order 6. Our result is optimal in the sense that there exist no perfectly complete tests for fewer than six copies that achieve statistical power independent of the number of qubits (see Section 5 and [Dam18]). Figure 2: Quantum circuit implementing Algorithm 1 for qubit stabilizer testing. Inside the blue blocks: The quantum gates denote the controlled-NOT and the Hadamard gate, respectively; the measurements are in the n-qubit computational basis. Outside the blue blocks: Double lines represent classical information. The "⊕"-operation is addition modulo two. The boxes labeled "Weyl" perform a two-outcome measurement with respect to the eigenspaces of W a , where a is determined by classical inputs. For n qubits, the circuit is fully transversal in the sense that all operations are required to be coherent only across two copies, and factorize with respect to the n qubits.

Qudits
A careful analysis of the measurement of Algorithm 1 shows that it is equivalent to a projective measurement of the form Π accept = 1 2 (I + V), where V is the following Hermitian unitary operator: (1.5) It can be readily seen that the operator Eq. (1.5) commutes with tensor powers of Clifford unitaries. In fact, as discussed earlier, it is natural to approach the stabilizer testing problem by measuring operators in the commutant of the Clifford group. Since the stabilizer states are a single orbit under the Clifford group, any such measurement by design will have the same level of significance on all stabilizer states. Equation (1.5) and corresponding measurement have a clear generalization to arbitrary qudits. Let d 2 and consider the operators One can see that if (d, s) = 1, V s is a Hermitian unitary and so Π s,accept is a projector. We now state our general stabilizer testing result: The proof proceeds similarly to the one of Theorem 3.3. Again, an uncertainty relation for Weyl operators plays an important role. We record it since it may be of independent interest: Lemma 3.10 (Uncertainty relation). Let δ = 1/2d and ψ a pure state such that |tr[ψW x ]| 2 > 1 − δ 2 and |tr[ψW y ]| 2 > 1 − δ 2 . Then W x and W y must commute.
We also study the minimal number of copies required to distinguish stabilizer states from non-stabilizer states in such a way that the power of the statistical test does not decrease with the number of qubits. Since the stabilizer states share the same second moments with uniformly random states (see Section 1.6 below for more detail), one can see that any such test requires at least three copies. Our next result shows that this is sufficient at least when d ≡ 1, 5 (mod 6). For this, consider the POVM element Theorem 8.6 (Stabilizer testing from three copies).
The operators A x are known as phase-space point operators [Gro06], which are defined by a (symplectic) Fourier transform of the Weyl operator basis W a (with respect to the index a). Again, the test corresponds to a particular element of the commutant, and to establish Theorem 8.6 we also need another uncertainty relation, this time for phase-space point operators. Lastly, we derive an explicit prescription for the minimal test that is perfectly complete, i.e., detects all stabilizer states with certainty. Here we use the full power of the algebraic theory. We assume that d is a prime.
We define O t (d) as the group of t × t-matrices O with entries in Z d that satisfy the following properties: We will refer to O t (d) as the stochastic orthogonal group; its elements will be called stochastic isometries.
Equivalently, O t (d) is the group of t × t-matrices O that are orthogonal in the ordinary sense (i.e., O T O = I mod d) and such that the sum of elements in each row is equal to 1 (mod D). See Remark 4.12 for more details.
Recall that for x ∈ Z d , x 2 is well-defined modulo D.
Note that the subspace T O := {(Oy, y) : y ∈ Z t d } is a stochastic Lagrangian subspace in Σ t,t (d) (as defined above in Definition 4.1), and so we obtain a corresponding operator in the commutant, which we abbreviate by R(O) = R(T O ). It is easy to see that the operators R(O) define a representation of the group O t (d), so is the projector onto the invariant subspace for this action. Remarkably, not only do the R(O) stabilize all stabilizer tensor powers |S ⊗t (Eq. (4.13)), but Π min t is in fact the minimal perfectly complete test for stabilizer states: Theorem 5.6 (Minimal stabilizer test with perfect completeness). Let d be a prime and n, t 1. Then the projector Π min t is the orthogonal projector onto span {|S ⊗t : |S S| ∈ Stab(n, d)}.
Are there any other tensor power states in the support of Π min t ? For every d 2, we have proved above there exists some t 3 such that stabilizer testing is possible using t copies. Since the accepting POVM element is in each case the projector onto the invariant subspace of an element in O t (d) (e.g., the anti-identity for d = 2 and t = 6), it follows that in this case the only tensor power states contained in the support of Π min t are tensor powers of stabilizer states!

De Finetti theorems for stabilizer symmetries
Quantum de Finetti theorems provide versatile tools for the study of correlations in quantum states with permutation symmetry. They have found many important applications, from quantifying the monogamy of entanglement to proving security for quantum key distribution protocols, where de Finetti theorems allow to reduce general attacks to collective attacks [Ren05]. By now, several variants and generalizations are known [Stø69, HM76, RW89, Pet90, CFS02, KR05, DOS07, CKMR07,  Ren07, NOP09, KM09, BCY11, BH13, BH17, BCHW16]. Generally speaking, de Finetti theorems state that when ρ is a quantum state on (C ) ⊗t that commutes with all permutations (i.e., [r π , ρ] = 0 for all π ∈ S t , where r π are the permutation operators defined in Eq. (1.2)) then its reduced density operators ρ 1...s = tr s+1...t [ρ] are well-approximated by convex mixtures of i.i.d. states in some suitable sense if s t. E.g., for any such ρ there exists a probability measure dµ on the space of mixed states on C such that [CKMR07], Using the techniques developed for stabilizer testing, we prove two new versions of the quantum de Finetti theorem adapted to the symmetries inherent in stabilizer states. A key insight from the preceding section was that any stabilizer tensor power |S ⊗t is stabilized not only by the permutations, but by the larger group O t (d). This group contains includes in general many more elements, for example, the anti-identity (1.4) for the case of qubits. Our de Finetti theorems for stabilizer states show that if we consider arbitrary states ρ on ((C d ) ⊗n ) ⊗t that show symmetries of this kind, then the conclusions of the de Finetti theorem can be strengthened. In this case, the reduced states can be well-approximated by convex mixtures of tensor powers of stabilizer states in Stab(n, d) (rather than of general pure states in (C d ) ⊗n ).
Our first de Finetti theorem shows that the enlarged symmetry provided by the stochastic orthogonal group ensures that the approximation is exponentially good in the number of traced-out subsystems. This is remarkable, since the ordinary permutation symmetry-based de Finetti theorem achieves exponential convergence only if the form of allowed states is relaxed to include "almost product states" [Ren07] or "high weight vectors" (as opposed to highest weight vectors) [KM09]. Such a relaxation is, in fact, already necessary for classical distributions [DF80]. In detail: Theorem 7.6 (Exponential stabilizer de Finetti theorem). Let d be a prime and ρ a quantum state on ((C d ) ⊗n ) ⊗t that commutes with the action of O t (d) ⊇ S t . Let 1 s t. Then there exists a probability distribution p on the (finite) set of mixed stabilizer states of n qudits, such that Our theorem can be understood as a stabilizer version of the Gaussian Finetti theorems established in [LC09,Lev16]; cf. [DEL92]. The latter have been successfully used to establish security of continuous-variable quantum key distribution (QKD) protocols which admit the required symmetries [LGPRC13,Lev17]. Since the input states of entanglement-based QKD schemes [Eke91], are usually taken to be powers of stabilizer states, they show the enlarged symmetry identified here-a fact that seems to have been overlooked so far. It is this natural to study applications of our de Finetti theorems to QKD security proofs-we will report results on this elsewhere.
We can also ask to which extent the conclusions of Theorem 7.6 hold if we only slightly enlarge the symmetry group. The following theorem shows that if we consider quantum states that commute with permutations as well as the anti-identity (but not necessarily other elements of O t (d)) then we still get an approximation by mixtures of stabilizer tensor powers -but now with a polynomially rather than exponentially small error: Theorem 7.7 (Stabilizer de Finetti theorem for the anti-identity). Let ρ be a quantum state on ((C 2 ) ⊗n ) ⊗t that commutes with all permutations as well as with the action of the anti-identity (1.4) on some (and hence every) subsystem consisting of six n-qubit blocks. Let s < t be a multiple of six. Then there exists a probability distribution p on the (finite) set of mixed stabilizer states of n qubits, such that While Theorem 7.7 is stated here only for qubits, we believe that a similar result can be established in any prime dimension.

Robust Hudson theorem
Similar techniques can also be applied to pure states with a small amount of negativity in their phase space representation. More precisely, recall that for odd d the Wigner function of a quantum state ψ is defined by w ψ (x) = d −n tr[A x ψ], where the operators A x are the phase-space point operators mentioned above. The Wigner function is a quasi-probability distribution, i.e., x w ψ (x) = 1, but it can be negative. This negativity plays an important role -e.g., it is an obstruction to efficient classical simulability [VFGE12,ME12] and witnesses the onset of contextuality [HWVE14].
A mixed stabilizer state is a maximally mixed state on a stabilizer code (see Section 2.4).
In fact, pure stabilizer states are characterized by having a nonnegative Wigner function-this is the discrete Hudson theorem [Gro06]. Our next result shows that this characterization is robust, and that the robustness is independent of the system size (number of qudits). The relevant quantity is the Wigner or sum-negativity sn(ψ) = w ψ (x)<0 |w ψ (x)|, i.e., the absolute sum of negative entries of the Wigner function.
Theorem 8.4 (Robust finite-dimensional Hudson theorem). Let d be odd and ψ a pure quantum state of n qudits. Then there exist a stabilizer state |S such that | S|ψ | 2 1 − 9d 2 sn(ψ).
Our theorem gives a new quantitative meaning to the sum-negativity, and thereby to the related mana, a monotone from the resource theory of stabilizer states [VMGE14] that has attracted increasing attention in the theory of fault-tolerant quantum computation.

Random stabilizers, higher moments and designs
As mentioned to above, randomized constructions based on the Haar measure are often near-optimal, yet have the drawback that generic quantum states cannot be efficiently prepared. In contrast, random stabilizer states can be efficiently implemented, and early on, it had been discovered that they reproduce the same second moments as the Haar measure! More recently, there had been significant progress on the third and fourth moments [ZKGG16,HWW16,NW16], opening up several many applications where random Clifford unitaries and stabilizer states have successfully replaced the Haar measure [MGE11, HWFW17, KZG16, HNQ + 16, NW16]. To go beyond, however, a general understanding of the statistical properties of random stabilizer states is required.
The theory presented in this paper implies general formulas for the t-th moments of stabilizer states. For qudits, where T ranges precisely over the maximal isotropic stochastic subspaces from Theorem 4.3! Recall that a (complex projective) t-design is an ensemble of states {p i , |ψ i } such that the average of any polynomial of degree t identically matches the average of the same polynomial with respect to the Haar measure. In other words, a t-design satisfies where the right-hand side is the familiar formula for the maximally mixed state on the symmetric subspace in terms of the symmetrizer-an easy consequence of Schur-Weyl duality. When the stabilizer states form a t-design (t 3 for qubits, t 2 otherwise), Eq. Importantly, however, our formula allows us to compute an arbitrary t-th moment even when the stabilizer states deviate significantly from being a t-design. In fact, we demonstrate the power of the formula by using it to establish that following remarkable fact: Even when the stabilizer states (a single Clifford orbit) fail to be a t-design, we can obtain t-designs by taking a finitely many Clifford orbits with appropriately chosen weights: Theorem 6.2. Let d be a prime and n t − 1. Then there exists an ensemble {p i , Ψ i } M t,d i=1 of fiducial states in (C d ) ⊗n such that: That is, the corresponding ensemble of Clifford orbits is a complex projective t-design. Importantly, the number of fiducial states does not depend on the number of qudits n.

Pauli and Clifford group
Let d 2 be an arbitrary integer. We first consider a single qudit with computational basis vectors |q , where q ∈ {0, . . . , d − 1} or q ∈ Z d = Z/dZ. We define unitary shift and boost operators where ω = e 2πiq/d .
The algebra of shift and boost operators differs slightly depending on whether d is even or odd. For uniform treatment, one introduces τ = (−1) d e iπ/d = e iπ(d 2 +1)/d . Note that τ 2 = ω. Let D denote the order of τ. Then D = 2d if d is even, but D = d if d is odd (indeed, in this case τ = ω 2 −1 , where 2 −1 denotes the multiplicative inverse of 2 mod d). Then Y := τX † Z † is such that XYZ = τI, generalizing the commutation relation of the usual Pauli operators for qubits (where τ = i). For a single qudit, the Pauli group is generated by X, Y, Z or, equivalently, by τI, X, Z.
For n qudits, the Hilbert space is H n = (C d ) ⊗n , with computational basis vectors |q = |q 1 , . . . , q n , and the Pauli group P n is the group generated by the tensor product of I, X, Y, Z acting on each of the n qudits.
The Clifford group Cliff(n, d) is defined as normalizer of the Pauli group in the unitary group, modulo phases. That is, it consists of all unitary operators U that UP n U † ⊆ P n , up to phases. For qubits, the Clifford group is generated by the phase gate P = 1 0 0 i , the Hadamard gate H = 1 √ 2 1 1 1 −1 , and the controlled-NOT gate.

Weyl operators and characteristic function
At this point it is useful to recall the phase space picture of finite-dimensional quantum mechanics developed in [Woo87, App05, Gro06, GE08, DB13], which is analogous to the phase space formalism for continuous-variable systems used, e.g., in quantum optics [Sch11]. For x = (p, q) ∈ Z 2n , define the Weyl operator W x = W p,q = τ −p·q (Z p 1 X q 1 ) ⊗ · · · ⊗ (Z p n X q n ). (2.1) Clearly, each Weyl operator is an element of the Pauli group. Conversely, each element of the Pauli group is equal to a Weyl operator up to a phase that is a power of τ. It is not hard to see that the Weyl operators themselves only depend on x modulo D (which we recall is 2d or d, depending on whether d is even or odd). Indeed, and we have Parseval's identity By definition, if U is a Clifford unitary then, for every x ∈ V n , UW x U † is proportional to a Weyl operator W x , where we can take x ∈ V n in view of Eq. (2.2). Since conjugation preserves the commutation relations, this action has substantially more structure. In particular, the mapping x → x is implemented by an element of the symplectic group Sp(2n, d), i.e., a linear transformation of Z 2n d that preserves the symplectic form (2.3). The following facts are well-known in the literature (e.g. [App05,Gro06,DB13,Zhu15]).
Lemma 2.1. For any prime d and any n ∈ N, the following holds: 3. The quotient of the Clifford group by Weyl operators and phases is isomorphic to Sp(2n, d).
Below, we will frequently assume that a correspondence Γ → U Γ has been fixed.

Wigner function and phase space point operators
It is also useful to consider the symplectic Fourier transform, which for any function f : (2.9) The transformation f →f is unitary, i.e., we have Parseval's identity: xf (x)ĝ(x) = y f(y)g(y). The Fourier transform of the characteristic function is (up to normalization) known as the Wigner function [Woo87] w B : V n → C, defined by (2.10) where we have introduced the phase-space point operators (2.11) The operators {A x } form an orthogonal basis of the space of all operators, tr[A † x A y ] = d n δ x,y , so the Wigner function can be seen as the set of coefficients of an operator as expanded in this basis, x . Moreover, the Wigner function of a quantum state is a quasiprobability distribution in the sense that x w ρ (x) = 1.
For odd d the Wigner function is particularly well-behaved. For one, the phase-space point operators are Hermitian (this is also true for qubits) and they square to the identity (so the eigenvalues are ±1 and in particular A x = 1). This means that the Wigner function of a quantum state is real and −d −n w ψ (x) d −n . The phase-space point operators satisfy the following important identity: Moreover, (only) for odd d does the Wigner function transforms covariantly with respect to the Clifford group. Here, the Clifford operators can (up to overall phase) be parametrized by an affine symplectic transformation, i.e., by a symplectic matrix Γ ∈ Sp(2n, d) and a vector b ∈ Z 2n d . Then U = W v µ Γ is in Cliff(n, d), where µ Γ is the so-called metaplectic representation of the symplectic group (see, e.g., [Gro06]), and the conjugation action of U on phase-space point operators is given by (2.13) In particular, the Weyl operators induce a translation in phase space.

Stabilizer groups, codes, and states
We now give uniform account of the stabilizer formalism [Got97,Got99] for qudits. Stabilizer states are commonly defined in terms of the Pauli group in the following way: Consider a subgroup of the Pauli group S ⊆ P n that does not contain any (nontrivial) multiple of the identity operator. Then the operator is an orthogonal projection onto a subspace V S ⊆ H n of dimension d n /|S|. We say that V S is the stabilizer code associated with the stabilizer group S. If |S| = d n then this code is spanned by a single state, called a (pure) stabilizer state and denoted by |S S|. It is given precisely by Eq. (2.14). In other words, a stabilizer state |S is the unique +1 eigenvector (up to scalars) of all the Pauli operators in S, In the following we will mostly be talking about stabilizer groups that determine a pure state. We denote the (finite) set of pure stabilizer states in (C d ) ⊗n by Stab(n, d).
In order to connect the stabilizer formalism to the phase space picture, we observe that the stabilizer group can be written in the form (2.15) for some subset M ⊆ V n and some function f : M → Z d . The two pieces of data determine the stabilizer state uniquely. Indeed, |S can be characterized by demanding that Moreover, it is not hard to verify that M is closed under addition (because S is a group) and that [x, y] = 0 for any two elements x, y ∈ M. Thus, M is a totally isotropic submodule of the phase space V n (which itself can be thought of as a Z d -module). For simplicity, we will usually say subspace instead of submodule, although the latter terminology is more appropriate for non-prime d.
Moreover, |M| = d n , which is the maximal possible cardinality of any such subspace -one often says that M is a Lagrangian subspace and it holds that See, e.g., [Gro06,GW13] for further detail on this symplectic point of view. Conversely, suppose that M is a Lagrangian subspace of V n . Then there always exist functions f such that {ω f(x) W x } x∈M is a stabilizer group; we will denote the corresponding stabilizer states by |M, f . Any other such function f can be obtained by replacing f by g = f + δ, where δ : M → Z d is a Z d -linear function. We can always write δ(x) = [z, x]; then |M, g = W z |M, f . In this way, M parametrizes an orthonormal basis of H n worth of stabilizer states. In particular, any state that is a simultaneous eigenvector of the {W x } x∈M is necessarily a stabilizer state. It is not hard to verify that the quantum channel that implements the projective measurement in this stabilizer basis {|M, f } f is given by The fact that any stabilizer state can be parametrized as |S = |M, f will be of fundamental importance to our investigations. As a first consequence, we note that Eqs. (2.14) and (2.15) imply This shows that the characteristic function is given by i.e., it is supported precisely on the set M.
For odd d the phase is a linear function, so it can be written as f(x) = [a, x] for some suitable vector a (e.g., [Gro06, App. C]). This means that the Wigner functions of stabilizer states have the following form [GW13]: (using that M = M ⊥ for a pure stabilizer state). In particular, the Wigner function is non-negative. The finite-dimensional Hudson theorem asserts that, for pure states, the converse is also true [Gro06]. In Section 8 we will prove a robust version of this result.

Testing stabilizer states
Given two copies of an unknown pure state ψ = |ψ ψ| on H n , it is easy to verify using phase estimation whether |ψ is an eigenvector of a given Weyl operator W x . In particular, if W x is Hermitian then we simply measure twice and compare the result. The probability of obtaining the same outcome is where we recall that c ψ denotes the characteristic function defined in Eq. (2.6).
To turn this idea into an algorithm for testing whether ψ is a stabilizer state we need a way of generating good candidate Weyl operators. For this we note that, since ψ is a pure quantum state, is a probability distribution on the phase space V n . This follows directly from Eq. (2.7). We call p ψ the characteristic distribution of ψ. Now, if |ψ = |S = |M, f is a stabilizer state then Eq. (2.17) implies that p ψ is simply the uniform distribution on the subset M ⊆ V n : Note that p S is maximally sparse in the case of pure stabilizer states, since it always holds true that 0 p ψ (x) d −n . Therefore, if we sample from the characteristic distribution of a stabilizer state then Eq. (3.3) shows that we would with certainty obtain the label of a Weyl operator for which |ψ is an eigenvector. Importantly, the converse of this statement is also true: Suppose that |ψ is an eigenvector of all Weyl operators W x for x in the support of the characteristic distribution (i.e., p ψ (x) > 0). Since p ψ (x) d −n , the support of p ψ contains at least d n points. Thus if |ψ is an eigenvector of all these Weyl operators then the support must be exactly of cardinality d n and so |ψ is a stabilizer state. This suggests the following algorithm: 1. Sample from the characteristic distribution of ψ. Denote the result x.
2. Measure the corresponding Weyl operator W x twice and accept if the result is the same.
By the preceding discussion, this test will accept if and only if the state is a stabilizer state. But how do we go about sampling from the characteristic distribution?

Qubit stabilizer testing and Bell difference sampling
When the wave function |ψ is real in the computational basis then sampling from the characteristic distribution can be achieved by Bell sampling, introduced for qubits in [Mon17] (cf. [ZPDF16]). Bell sampling amounts to performing a basis measurement in the basis obtained by applying the Weyl operators to a fixed maximally entangled state, |W x = (W x ⊗ I) |Φ + . Since the Weyl operators are orthogonal, |W x is an orthonormal basis of the doubled Hilbert space H n ⊗ H n . Using the transpose trick, In case the wave function is real, Eq. (3.4) is exactly equal to p ψ (x); Bell sampling therefore allows us to implement step (1) above given two copies of the unknown state ψ.
In general, however, the transformation ψ → ψ = ψ T cannot be implemented by a physical process, since the transpose map is well-known not to be completely positive. Thus we need a new idea to treat the general case where the wave function can be complex.
We start with the observation that if ψ is a stabilizer state then so is ψ.
on phase space (note that the phase is trivial when d is odd and so always well-defined mod d).
On the other hand, For qubits (d = 2), the involution J is trivial. This means that if ψ is a stabilizer state then ψ and ψ are characterized by the same subspace M, but possibly different phases. We saw above that (only) in this case there exists a Weyl operator W z such that |ψ = W z |ψ . As a consequence, if we perform Bell sampling on |ψ ⊗ |ψ then, from Eq. (3.4), Of course, z is an unknown vector that depends on the stabilizer state ψ. But since z depends only on the stabilizer state ψ, it is clear that we may Bell sample twice and take the difference of the result in order to obtain a uniform sample a from the subspace M. Formally: Definition 3.1 (Bell difference sampling). We define Bell difference sampling as performing Bell sampling twice and subtracting (adding) the results from each other (modulo two). In other words, it is the projective measurement on four copies of a state, ψ ⊗4 ∈ ((C 2 ) ⊗n ) ⊗4 , given by It is not obvious that Bell difference sampling should be meaningful for non-stabilizer quantum states ψ. The following theorem shows that it has a natural interpretation for general states: Theorem 3.2 (Bell difference sampling). Let ψ be an arbitrary pure state of n qubits. Then: If ψ is a stabilizer state, say |S S|, then this is equal to p S (a) from Eq. (3.3).
The proof of Theorem 3.2 uses the symplectic Fourier transform defined in Eq. (2.9). Remarkably, the characteristic distribution of any pure state is left invariant by the Fourier transform: where the third step is Eq. (2.7) (note that for qubits the characteristic function is real).
We now give the proof of Theorem 3.2: Proof of Theorem 3.2. We start with the observation that Π a = ( On the other hand, it is easy to verify that , it is the projection onto a stabilizer code of dimension 2 2n , which played an important role in [ZKGG16], and Bell difference sampling achieves precisely the syndrome measurement for this code). It follows that and so the third equality is the unitarity of the Fourier transform, which also maps modulations to translations, and in the last step we used Eq. (3.5), namely that the characteristic distribution of a pure state is left invariant by the Fourier transform.
Theorem 3.2 motivates Algorithm 1 as a natural algorithm for testing whether a multi-qubit state is a stabilizer state. The following theorem shows that stabilizer states are the only states that are accepted with certainty, and it quantifies this observation in a dimension-independent way: Theorem 3.3 (Stabilizer testing for qubits). Let ψ be a pure state of n qubits. If ψ is a stabilizer state then Algorithm 1 accepts with certainty, The converse bound of Theorem 3.3 can be stated equivalently as (3.8) Proof. According to Theorem 3.2, step 1 of the algorithm samples elements a with probability q(a) = x p ψ (x)p ψ (x + a).
Let us first discuss the case that ψ is a stabilizer state, say |ψ = |M, f . Since p ψ (x) is the uniform distribution over M, which is a subspace, it holds that q(a) = p ψ (a), since, for x ∈ M, x + a ∈ M if and only if a ∈ M. But this means that a ∈ M with certainty. Thus, |ψ is an eigenvector of the corresponding Weyl operator W a and step 2 of the test always accepts.
We now consider the case that ψ is a general pure state. Our goal will be to show that if Algorithm 1 succeeds with high probability then there must exist a stabilizer state with high overlap with ψ. According to Eq. (3.1), the probability of acceptance is given by where we recall that q(a) = x p ψ (x)p ψ (x + a). Thus, by the Cauchy-Schwarz inequality, where we have also used the fact that p ψ is a probability distribution. Intuitively, this bound shows that if our test accepts with high probability then p ψ (a) ≈ 2 −n with high probability. Indeed, let us consider Then Markov's inequality (which can be applied since it is always true that p ψ 2 −n ) asserts that The choice of threshold 1/2 in the definition of M 0 ensures that the Weyl operators corresponding to any two points a, b ∈ M 0 commute. To see, we use that any pair of anticommuting W a , W b can by a base change be mapped onto the Pauli operators X,Z; it can then verified on the Bloch sphere that there exists no qubit state ρ such that both tr[ρX] 2 > 1/2 and tr[ρZ] 2 > 1/2 (see Fig. 1 for a graphical proof).
Let us now extend the set M 0 to some maximal set M such that the corresponding Weyl operators commute. Then M is automatically a Lagrangian subspace, of dimension n. As discussed in Section 2.4, it determines a whole basis of stabilizer states, {|M, f } f . Thus: where we used Eq. (2.16) for the measurement Λ M in the stabilizer basis; the last bound is Eq. (3.10).
Our theorem has the following consequence for quantum property testing, resolving an open question first raised by Montanaro  Corollary 3.4. Let ψ be a pure state of n qubits and let ε > 0. Then there exists a quantum algorithm that, given O(1/ε 2 ) copies of ψ, accepts any stabilizer state (it is perfectly complete), while it rejects states such that max S | S|ψ | 2 1 − ε 2 with probability at least 2/3. Before our result, the best known algorithms required a number of copies that scaled linearly with n, the number of qubits. Indeed, these algorithms proceeded by attempting to identify the stabilizer state, which requires Ω(n) copies by the Holevo bound [AG08, Mon17,ZPDF16]. Moreover, our algorithm is manifestly efficient (see the circuit in Fig. 2).
Remark 3.5. For multi-qubit states ψ that are real in the computational basis, we can replace step 1 of the algorithm by a single Bell sampling, which in this case directly samples from the characteristic distribution p ψ (see Eq. (3.4)). The resulting algorithm operators on four copies of ψ and achieves the same guarantees as Theorem 3.2.
Remark 3.6. The scaling in Theorem 3.2 is optimal. Indeed, it is known that distinguishing any fixed pair of states |ψ , |φ with | ψ|φ | 2 = 1 − ε 2 requires Ω(1/ε 2 ) copies [MdW16]. In particular, this lower bound holds if we choose |ψ to be a stabilizer state and |φ a state that is ε-far away from being a stabilizer state, in which case our Algorithm 1 is applicable. It is instructive to write down the accepting POVM element for Algorithm 1. From Eqs. (3.1) and (3.7), we find that it is given by

Remark 3.7 (Clifford testing). It follows from Theorem 3.3 that we can also test whether a given unitary U is in the Clifford group or not (without given access to U † ). This resolves another open question in the survey of Montanaro and de
where we have introduced the unitary It is easy to verify that U = u ⊗n is a Clifford unitary acting on the space H ⊗6 n ∼ = H 6n of 6n qubits. For any pure state ψ, ψ ⊗n is in the symmetric subspace, and so invariant under left and right-multiplication by permutations. In particular, we obtain a test of the same goodness as 12)) denotes the operator that swaps (or flips) two blocks of n qubits. Since F = 2 −n b W ⊗2 b ,we obtain the formula (3.12) Thus, we recognize that the unitary V = v ⊗n is precisely the action of the anti-identity (1.4) described in the introduction (for t = 6): See also Remark 3.9. We discuss anti-permutations in more detail in Definition 4.29. Equation (3.12) allows us to express the acceptance probability of Algorithm 1 in an interesting way: (3.14) It is intuitive that the p -norms should appear, since stabilizer states can be characterized by having a maximally peaked characteristic function and distribution (Eqs. (2.17) and (3.3)).
In fact, the result of this calculation is plainly a strengthening of Eq. (3.9), since 2 n p ψ (x) 1. If we follow the rest of the proof of Theorem 3.3 then we obtain p accept 1 − 3ε 2 /8, a slight improvement. More importantly, though, this argument completely avoids the analysis of Bell difference sampling in Theorem 3.2. This leads us towards an approach for testing general qudit stabilizer states.

Qudit stabilizer testing
While Bell sampling can only be used for qubit systems, Eq. (3.12) has a clear generalization to arbitrary qudits. Let d 2 and consider the operator (3.15) (For qubits, the Weyl operators are Hermitian and so V 3 is precisely Eq. (3.12).) Suppose we choose s such that V s is a Hermitian unitary (we will momentarily see that this can always be done). Then is a projection. If we think of it as the accepting element of a binary POVM then If s is invertible modulo d then ω s[−,z] is a nontrivial character for all z, and so the inner sum simplifies to d 2n δ z,0 . It follows that V 2 s = I, as desired. We summarize: Indeed, V 2 = 2 n Π 0 where Π 0 is the projection from Eq. (3.6). Thus V 2 = 2 n and so we cannot interpret the associated Π 2 as a POVM element. This already partly explains why we had to resort to six copies to test stabilizerness.
The second ingredient used to establish Theorem 3.3 was an uncertainty principle for Weyl operators. The following lemma supplies this for general d: Lemma 3.10 (Uncertainty relation). Let δ = 1/2d and ψ a pure state such that |tr[ψW x ]| 2 > 1 − δ 2 and |tr[ψW y ]| 2 > 1 − δ 2 . Then W x and W y must commute.
Proof. Note that W x |ψ − |ψ ψ|W x |ψ < δ and likewise for W y . By the triangle inequality, If we combine this with another triangle inequality, we obtain that Now suppose that W x and W y do not commute. Then [x, y] = 0 and so Thus, 4/d < 4δ/(1 − δ 2 ), which plainly contradicts our choice of δ. This is the desired contradiction and we conclude that W x and W y commute.
We now show that stabilizer testing can be done in arbitrary local dimension: Proof. If ψ is a stabilizer state, say |ψ = |M, f , then p ψ (x) is the uniform distribution on M, which has d n elements. In view of Eq. (3.16), so the test accepts with certainty. Now suppose that ψ is a general state. Define By Lemma 3.10, the Weyl operators W x for x ∈ M 0 all commute. We can thus extend M 0 to a maximal set M with this property. As in the proof of Theorem 3.3, we can bound But this probability can be bounded as before using the Markov inequality (but now for a (s − 1)st moment): The last equality is Eq. (3.16). This yields the desired bound.
Remark 3.12. It is clear that s = d + 1 is always a valid choice in Theorem 3.11. This leads to C d,s ≈ 1/8d for large d, but the resulting test involves gates that act on 2d + 2 qudits at a time. However, this choice of s is in general rather pessimistic. E.g., if d is odd then we may always choose s = 2, meaning that our test acts on four copies at a time.
Corollary 3.13. Let d 2 and fix s as in Theorem 3.11. Let ψ be a pure state of n qudits and let ε > 0. Then there exists an quantum algorithm that, given O(1/C d,s ε 2 ) copies of ψ, accepts any stabilizer state (it is perfectly complete), while it rejects states such that max S | S|ψ | 2 1 − ε 2 with probability at least 2/3.
It is clear that the POVM measurement {Π s,accept , I − Π s,accept } can be implemented efficiently. Using phase estimation, it suffices to argue that the controlled version of V s can be implemented efficiently. But V s = v ⊗n s , so its controlled version is equal to a composition of n controlled versions of v s , each of which acts only on a constant (with respect to n) number of qudits. It follows that our stabilizer test for qudits is efficient.
It is instructive to compute the action of the unitary V s = v ⊗n s more explicitly: Let |x = |x 1 , . . . , x 2s denote a computational basis vector of H ⊗2s n . Then, using Eq. (2.1), wherex even = s −1 k even x k andx odd is defined analogously. If we re-order the tensor factors so that the odd systems come first, followed by the even ones, we find that a basis vector |x odd , x even is mapped to |x odd −x odd +x even , x even +x odd −x even . Thus, V s is a unitary that permutes the computational basis vectors by "swapping the mean" of the even and the odd sites of the 2s many blocks of n qudits.
Here is one last reformulation that will be useful to connect to our algebraic results. Let p 2s = (−1, 1, . . . , −1, 1) ∈ Z 2s d denote the 'parity vector' that is ±1 on even/odd sites, and consider the following 2s × 2s matrix with entries in Z d : Then we can write the action of V s as It is easy to verify that1 is a stochastic isometry (cf. Eq. (4.36) in Section 4.3). For qubits and s = 3,1 is just the matrix obtained by taking the 6 × 6 identity matrix and inverting each bit (the 'anti-identity'). This gives a pleasant and insightful interpretation of Eq. (3.12), as we will see in Section 5.2. Interestingly, the anti-identity has previously appeared in the classification of Clifford gates in [GS16] (their T 6 ).

Algebraic theory of Clifford tensor powers
In this section, we present a general framework for studying the algebraic structure of stabilizer states and Clifford operators. We start by describing the commutant of the tensor powers of the Clifford group, where we obtain results similar in flavor to the Schur-Weyl duality between the unitary group and the symmetric group. Next, we apply this machinery to compute arbitrary moments of qudit stabilizer states, and we describe how to construct t-designs of arbitrary order from weighted Clifford orbits. Lastly, we return to the stabilizer testing problem and explain how our solution from Section 3 can be understood more systematically and generalized. In particular, we find an optimal projection that characterizes the tensor powers of stabilizer states precisely. Throughout this section we assume that d is prime.

Commutant of Clifford tensor powers
Schur-Weyl duality in its most fundamental form asserts that any operator on (C D ) ⊗t that commutes with U ⊗t for all unitaries U ∈ U(D) is necessarily a linear combination of permutation operators.
Using the double commutant theorem, this implies at once that where the V U(D),λ and V S t ,λ are pairwise inequivalent irreducible representations of the unitary group U(D) and of the symmetric group S t , respectively. The main result of this section is that the commutant of the tensor powers of the Clifford group can be completely described in terms of a natural generalization of permutation operators (see Theorem 4.3 below). Mathematically, this generalization involves Lagrangian subspaces of a space equipped with a quadratic form. Since stabilizer states can be described in terms of Lagrangian subspaces with respect to a symplectic form (Section 2), this is reminiscent of Howe's classical duality between sympletic and orthogonal group actions.
To describe the result more precisely, let T denote a subspace of Z t d ⊕ Z t d . We define a corresponding operator r(T ) = (x,y)∈T |x y| on (C d ) ⊗t , where |x = |x 1 , x 2 , . . . , x t ∈ (C d ) ⊗t denotes the computational basis vector associated with some x ∈ Z t d . We also consider the n-fold tensor power Both r(T ) and R(T ) are represented by real matrices in the computational basis.
Definition 4.1 (Σ t,t ). Consider the quadratic form q : Z 2t d → Z D defined by q(x, y) := x · x − y · y. We denote by Σ t,t (d) the set of subspaces T ⊆ Z 2t d satisfying the following properties: 1. T is totally q-isotropic: i.e., x · x = y · y (mod D) for all (x, y) ∈ T .
2. T has dimension t (the maximal possible dimension).
We will summarize the first two conditions by saying that T is Lagrangian. Thus, we will call Σ t,t (d) the set of stochastic Lagrangian subspaces.
Note that for x ∈ Z d , x 2 is well-defined modulo D.
See [NW16, App. C] for a complete list of the subspaces Σ t,t (d) for t = 3, and Section 4.3 for examples. In Lemma 4.5, we will show that the operators R(T ) are indeed in the commutant of Cliff(n, d) ⊗t . The proof is straightforward and elucidates the role of the three conditions in Definition 4.1 as well as the difference between even and odd d.

Remark 4.2.
Recall that a subspace T is called totally isotropic with respect to a quadratic form q if q(v) = 0 for every v ∈ T . This explain our terminology in Definition 4.1.
We can also consider the Z d -valued bilinear form b ((x, y), (x , y )) := x · x − y · y ∈ Z d . By a straightforward calculation, for all v, w ∈ Z 2t d . Thus, q is a Z D -valued quadratic form associated to the Z d -bilinear form b in the sense of [Woo93]. Note that if T is totally isotropic with respect to q then Eq. (4.1) shows that T is self-orthogonal, i.e., T ⊆ T ⊥ , where If d is odd then q(v) = b(v, v), so any self-orthogonal subspace is automatically totally isotropic with respect to q. If d = 2 then Eq. (4.1) implies that, for a self-orthogonal subspace, the set of isotropic vectors forms a subspace -so we can check total isotropicity on a basis. Moreover, for d = 2, if T is Lagrangian then it is automatically stochastic; indeed, b(v, 1 2t ) = q(v) (mod 2), so 1 2t is contained in any maximal totally isotropic subspace.
Our goal of this section it to prove the following theorem: It is instructive to discuss a few key features of Theorem 4.3. First, we know that the permutation group on t elements, S t , is in the commutant of the Clifford group Cliff(n, d), because it is even in the commutant of the larger unitary group U(d n ). Indeed, let π · y = (y π −1 (1) , . . . , y π −1 (t) ) denote the permutation action of S t on Z t d . The one can see that, for any permutation π ∈ S t , the subspace T π = {(π·y, y) : y ∈ Z t d } is Lagrangian and stochastic. The corresponding operator R(T π ) = r(T π ) ⊗n agrees precisely with the usual permutation action of S t on ((C d ) ⊗n ) ⊗t . Accordingly, we may identify S t with a subset of Σ t,t (d). We will see below in Definition 4.11 that the set of subspaces T for which R(T ) is invertible forms a (in general, proper) subgroup that is (in general, strictly) larger than S t .
Remarkably, Theorem 4.3 shows that the size of the commutant stabilizes as soon as n t − 1. That is, just like for the symmetric group in Schur-Weyl duality of D, the set Σ t,t (d) that parametrizes the commutant of the Clifford tensor powers is independent of n, the number of qudits, provided that n t − 1. This stabilization, along with the fact that the operators R(T ) = r(T ) ⊗n are tensor powers, are highly useful properties in applications (e.g., [NW16] and Sections 5 and 5.2 below).
Remark 4.4. We believe that the results of Nebe et al [NRS06] show that the operators R(T ) span the commutant of Cliff(n, d) ⊗t for any value of n. But we caution that if n < t − 1 then the R(T ) are in general no longer linearly independent (e.g., [Zhu15,eqs. (9) and (10)  Proof. Up to global phases, the Clifford group is generated by the following three operators, which are allowed to act on arbitrary qudits or pairs of qudits [Got99, Far14, NBD + 02]: The Fourier transform (also known as the Hadamard gate for d = 2), the phase gate, which is defined as (here we use that for d = 2, a 2 is well-defined modulo four, while for odd d, 2 has a multiplicative inverse, denoted 2 −1 ), and the controlled addition (also known as the CNOT gate for d = 2) CADD = a,b∈Z d |a, a + b a, b| .
To establish the lemma we will prove the claim for each generator (cf. [NRS06]). The Fourier transform H is a one-qudit gate, so it suffices to show that [H ⊗t , r(T )] = 0 for every T ∈ Σ t,t (d). Indeed: In the second step and third steps, we used the notation b and T ⊥ from Remark 4.2, respectively, as well as that dim T = t. The last step holds since T = T ⊥ , as T is a Lagrangian subspace. Next, we consider the phase gate, which is likewise a single-qudit gate. For d = 2, we have that since T is totally isotropic. For odd d, we instead compute since T is totally isotropic and stochastic (so w = v − 1 2t ∈ T and b(v, w) = 0 for every v ∈ T ). Lastly, we consider the controlled addition gate, which is a two-qudit gate: where we only used that T is a subspace.
We now show that the operators R(T ) are linearly independent as soon as n t − 1. For this, we introduce the following useful notation:  Proof. For each T ∈ Σ t,t (d), consider the vectorization of r(T ), which we denote by |T := vec (r(T )) = v∈T |v ∈ (C d ) ⊗2t . Note that v|T = δ v∈T . Clearly, vec (R(T )) = vec (r(T )) ⊗n = |T ⊗n . Therefore, we want to show that the vectors |T ⊗n are linearly independent as soon as n t − 1. But each T is t-dimensional and contains the vector 1 2t . Extend it by v 1 , . . . , v t−1 to a basis of T . Then, if T is another subspace: This concludes the proof.
So far, we have accomplished the task of finding a large set of linearly independent operators in the commutant of Cliff(n, d) ⊗t , one for each element of Σ t,t (d). In the remainder of this section we will compute the dimension of the commutant as well as the cardinality of Σ t,t (d), and show that the two numbers agree precisely. We will use the Gaussian binomial coefficients, which are defined by It is well-known that n k d equals the number of k-dimensional subspaces in Z n d . The Gaussian binomial coefficients satisfy the following analogs of Pascal's rule, and of the binomial formula, We now compute the dimension of the commutant. This has previously been done for t 4 by Zhu [Zhu15] and before that for d = 2, n = 1 by van den Nest et al [vdNDdM05].
We start with the following result from [Zhu15], which reduces the dimension computation to a counting problem. Zhu arrived at this result by computing the frame potential of the Clifford group -essentially, the norm squared of the character of the representation U → U ⊗k . In contrast, we will follow the approach by van den Nest et al, who considered the action of the Clifford average (also known as the twirl operation or Reynolds operator) on the basis of Weyl operators, correcting a glitch in [vdNDdM05] along the way.
Lemma 4.8 ( [Zhu15,vdNDdM05]). The dimension of the commutant of Cliff(n, d) ⊗t is equal to the number of orbits for the diagonal action of the symplectic group Sp(2n, d) on t − 1 copies of the phase space Z 2n d , i.e., for the action where Γ ∈ Sp(2n, d) and (x 1 , . . . , x t−1 ) ∈ (Z 2n d ) t−1 . Proof. We will show that the dimension of the commutant is equal to the number of orbits for the diagonal action of Sp(2n, d) on which is plainly an equivalent statement.
We start by noting that the Weyl operators W x for x = (x 1 , . . . , x t ) ∈ (Z 2n d ) t form a basis of the space of operators on ((C d ) ⊗n ) ⊗t . We can thus obtain a generating set of the commutant by averaging each Weyl operator W x with respect to the tensor power action of the Clifford group. According to Lemma 2.1, we can for each symplectic matrix Γ fix a Clifford unitary U Γ such that the set of {U Γ W b } equals the Clifford group, up to global phases. Let us denote by f Γ the phase function corresponding to U Γ , as in Eq. (2.8). Thus, the average of the Weyl operator W x is, up to overall normalization, given by and Γ x := (Γ x 1 , . . . , Γ x t ). When d is odd, the phase function f can be chosen to vanish (Lemma 2.1). Thus, the averaged operator is equal to the sum of Weyl operators over the Sp(2n, d)-orbit of x, provided x ∈ W t , and zero otherwise. Since distinct orbits are disjoint, it is clear that we obtain a basis of the commutant by averaging one Weyl operator for each orbit of the diagonal action of Sp(2n, d) on W t . Now consider the case where d = 2. To each x ∈ W t , associate the phase φ x (a power of τ) such that In Ref. [vdNDdM05], the relative phases ω f Γ (x) that appear in our Eq. (4.5) are all taken to be trivial (for qubits). The origin seems to lie in their Section II, where it is stated-in their language-that α 1 α 2 α 3 = 1. But this holds only if their π is cyclic. Clifford operations inducing non-cyclic permutations do, however, exist. We thank Huangjun Zhu for identifying the root of the apparent contradiction.
Then, for each Γ ∈ Sp(2n, d), It follows that the phase function f Γ (x) depends only on x and Γ x (rather than directly on Γ ) and is given explicitly by the quotient In particular, if y is in the same Sp(2n, d)-orbit as x then Λ Cliff (W y ) = φ y φ x Λ Cliff (W x ), i.e., the two averaged operators only differ by a global phase. Thus, also for d = 2 we obtain a basis of the commutant by averaging one Weyl operator for each orbit of the diagonal action of Sp(2n, d) on W t .
We now derive an explicit formula for the dimension of the commutant. Proof. To count the number of orbits of the action (4.4), we will associate to any orbit O an invariant, the dimension, defined by dim(O) = dim span {x 1 , . . . , x t−1 }, where (x 1 , . . . , x t−1 ) is any point in the orbit. We write Ω t for the set of all orbits and Ω t for the set of orbits with dimension . We will establish and solve the following recursion relation: To see why this is true, suppose (x 1 , . . . , x t−1 ) ∈ Ω t . Then there are two cases: 1. x t−1 ∈ span {x 1 , . . . , x t−2 }: Then the orbit through (x 1 , . . . , x t−2 ) is in Ω t−1 , and there are d ways to choose x t−1 ∈ span{x 1 , . . . , x t−2 }. Together, this contributes |Ω t−1 |d many orbits to Ω t .
Next, we count the number of stochastic Lagrangian subspaces. To this end, define the "diagonal subspace" Theorem 4.10 (Cardinality of Σ t,t ). We have |Σ t,t (d) (4.8) which implies the claim by the same calculation as in Eq. (4.7). To start, consider a subspace T ∈ Σ t,t (d) and consider with X a (t − )-dimensional subspace that is uniquely determined by T . Since T is stochastic, we know that 1 t ∈ X. Fix a basis x 1 , . . . , x t− of X and extend it by vectors z 1 , . . . , z to a basis of Z t d . Denote the dual basis with respect to the ordinary dot product byx 1 , . . . ,x t− ,ẑ 1 , . . . ,ẑ . Now, any vector in Z 2t d , so particularly in T , can be written uniquely in the form (a + b, b). The subspace of vectors where b is a linear combination of z 1 , . . . , z forms a complement of T ∆ ⊆ T , which we shall denote by T N . Since T N ∩ ∆ = {0}, we know that a = 0 for any nonzero vector in T N . The condition that T is self-orthogonal implies that a · x i = 0 for all i = 1, . . . , t − , so that a is a linear combination ofẑ 1 , . . . ,ẑ . Since also dim T N = , this implies that T N has a unique basis of the form (ẑ 1 + w 1 , w 1 ), . . . , (ẑ + w , w ), where each w i is of the form w i = j=1 A ij z j . We still need to implement the condition that T N is self-orthogonal. In terms of the matrix A = (A ij ), this means that for any i, j. This means that the lower triangular part of A is uniquely determined by the upper triangular part. For d = 2, (4.9) furthermore implies that the diagonal entries of A are fixed, so there are in total d ( −1)/2 many options for A. We have thus implemented all conditions for T to be a subspace in Σ t,t (d) since, according to Remark 4.2, for d = 2, any self-orthogonal T is automatically totally isotropic. The set of (t − )-dimensional subspaces of Z t d that contain 1 t are in bĳection with the (t − − 1)-dimensional subspaces in Z t d /Z d 1 t , hence there are t−1 t− −1 d = t−1 d many choices for X. Together, we obtain (4.8).
For d = 2, (4.9) gives no constraint about the diagonal entries of A. Instead, it asserts that z i ·ẑ i = 0 or, equivalently, thatẑ i · 1 t = 0 for i = 1, . . . , , which is automatically satisfied since 1 t ∈ X. We will now show that there is a unique choice for the diagonal entries of A such that T is totally isotropic with respect to the Z 4 -valued quadratic form q. By the discussion in Remark 4.2, since T is self-orthogonal, it suffices to consider T ∆ and its complement T N separately. But the vectors in T ∆ are automatically isotropic, while for T N total isotropy amounts to the condition that which fixes the A ii uniquely. We thus obtain (4.8) by the same counting as above.
We finally obtain Theorem 4.3 as a consequence of the preceding results.
Proof of Theorem 4.3. By combining Lemma 4.5 and Theorems 4.9 and 4.10, we see that the operators R(T ) form a basis of the commutant of Cliff(n, d) ⊗t on ((C d ) ⊗n ) ⊗t .
It is interesting to note that all elements R(T ) of our basis of the commutant of Cliff(n, d) ⊗s have the property that S ⊗t |R(T )|S ⊗t = 1 for every stabilizer state |S . Indeed, if T ∈ Σ t,t (d) and |S = U |0 ⊗n for some Clifford unitary U, then where we used that 0 ∈ T (see also Eq. (4.13) below).

Structure of the commutant
Theorem 4.3 is in the spirit of Schur-Weyl duality in that it establishes a natural basis of the commutant of the tensor power action of the Clifford group (a subgroup of the unitary group), generalizing the permutation operators. Yet, in contrast to the permutation group, Σ t,t (d) is not in general a group and the operators R(T ) for T ∈ Σ t,t (d) are not always invertible. In this section we show that Σ t,t (d) has a rich algebraic structure.
We first observe that there is a maximal subset of Σ t,t (d) that carries a group structure such that the R(T ) form a (unitary) representation. The following definition and lemma identify these elements: Definition 4.11 (O t ). Consider the quadratic form q : Z t d → Z D defined by q(x) := x · x. We define O t (d) as the group of t × t-matrices O with entries in Z d that satisfy the following properties: We will refer to O t (d) as the stochastic orthogonal group; its elements will be called stochastic isometries.
To see that O t (d) forms a group we only need to observe that The following remark is completely analogous to Remark 4.2.

Remark 4.12. Recall that a linear map is an isometry with respect to a quadratic form q if q(Ox) = q(x)
for all x ∈ Z t d . This justifies our terminology in Definition 4.11. As before, we note that q is a Z D -valued quadratic form associated to the Z d -bilinear form x · y in the sense of [Woo93], namely, q(x + y) = q(x) + q(y) + 2x · y (mod D). (4.11) In particular, any O ∈ O t (d) is an orthogonal matrix in the ordinary sense that O T O = I (mod d), i.e., Ox · Oy = x · y for all x, y ∈ Z t d . If d is odd then q(x) = x · x, so any orthogonal matrix is automatically a q-isometry.
If d = 2 then Eq. (4.11) implies that an orthogonal matrix O is a q-isometry provided that q(x) = 1 (mod 4) for every column of O or, equivalently, for every row of O (since O T = O −1 ). In particular, any q-isometry is automatically stochastic.
The significance of Definition 4.11 is the following observation.
Proof. Only the converse needs justification. Note that in order for R(T ) to be invertible, both subspaces {x : (x, y) ∈ T } and {y : (x, y) ∈ T } of Z t d should be t-dimensional (corresponding to r(T ) having full row and column rank). The claim now follows easily.
We will often regard O t (d) as a subset of Σ t,t (d) via the assignment O → T O . Note that any permutation matrix satisfies the conditions of Definition 4.11, so we can consider S t as a subgroup of O t (d) for every value of d, and hence as a subset of Σ t,t (d). [Zhu15,Web16].

This identity always holds up to t = 2, and up to t = 3 precisely in the case of qubits (d = 2). Thus the multiqubit Clifford group is a 3-design (but not a 4-design), while in higher dimensions the Clifford group is only a 2-design (but not a 3-design), reproducing prior beautiful results
for every stabilizer state |S . That is, stochastic isometries stabilize any stabilizer tensor power. We will return to discussing the implications of this important fact in Section 5 below.
Next, we note that the group O t (d) naturally acts on the elements of Σ t,t (d) from left and right, suggesting that it is the natural symmetry group of Σ t,t (d).

Definition 4.15 (Left and right action on subspaces). Consider a subspace T ∈ Σ t,t (d) and a matrix O ∈ O t (d). We define the left action of O on T as follows:
OT = {(Ox, y) : (x, y) ∈ T }. (4.14) Similarly, the right action of O on T is defined as: It is easy to check that OT, T O ∈ Σ t,t (d).

Note that this action is consistent with the composition of the operators R(T ) and R(O): For all
We can therefore decompose Σ t,t (d) into a disjoint union of double cosets with respect to the left and right action: where T 1 , . . . , T k are choices of subspaces in Σ t,t (d) that represent the different cosets. We note that O t (d) is always one of the double cosets in Eq. (4.16), corresponding to, e.g., the choice T 1 = ∆.
We will now derive a complete classification of the double cosets. We start with the central definition. Recall the quadratic form q : Z t d → Z D , q(x) := x · x from Definition 4.11. Definition 4.16 (Defect subspaces). A defect subspace is a subspace N ⊆ Z t d with the following properties: 1. N is totally q-isotropic: i.e., q(x) = 0 (mod D) for all x ∈ N.
2. N is co-stochastic: 1 t ∈ N ⊥ , i.e., x · 1 t = 0 (mod d) for every x ∈ N. is an element in Σ t,t (d). Note that, necessarily, dim N = dim M (since J is invertible) and 1 t ∈ N if and only if 1 t ∈ M (since J is also stochastic). We now show that all elements of Σ t,t (d) can be obtained in this way. Proof. The first claim is clear from Definition 4.1. Since T is stochastic, so (1 t , 0) ∈ T if and only if (0, 1 t ) = 1 2t − (1 t , 0) ∈ T , which proves half of the second claim. Next, consider the maps Then T LD ∼ = ker π R and T RD ∼ = ker π L , while T L = ran π L and T R = ran π R . Note that T L ⊆ T ⊥ LD and T R ⊆ T ⊥ RD , since T is totally isotropic. Using the rank-nullity theorem, Adding the two inequalities we see that they must both be equalities, hence dim T LD = dim T RD as well as T L = T ⊥ LD , T R = T ⊥ RD . This establishes the second and third claim. For the fourth claim, first recall from above that T ⊥ LD = T L and T ⊥ RD = T R , which means that for any y ∈ T ⊥ RD there exists some x ∈ T ⊥ LD such that (x, y) ∈ T . Next, suppose that (x, y), (x , y ) ∈ T such that y − y ∈ T RD . Then (0, y − y ) ∈ T , so which means that x − x ∈ T LD . As a consequence, [y] → [x(y)] is well-defined as a map from T ⊥ RD /T RD to T ⊥ LD /T LD . Using Definition 4.1, it is not hard to see that it defines an element of Iso(T RD , T LD ).
The fifth claim is clear by construction of T LD and T RD and from the fact that [y] → [x(y)] is well-defined. And the last claim can be seen to hold since the right-hand side of (4.17) is clearly a subset of T for our choice of N, M, and J, but also of dimension t. Which elements T ∈ Σ t,t (d) can be obtained in this way? Clearly, the left-right action by O t (d) preserves the common dimension of the defect subspaces and whether the all-ones vector is contained. We will now show that these are the only two invariants. The assumption implies that dimÑ = dimM. Choose any linear isomorphismÕ :Ñ →M such thatÕ1 t = 1 t . Since both N and M are totally isotropic and co-stochastic,Õ is an isometry with respect to the symmetric bilinear form x · y.
If d > 2, we can directly apply the usual version of Witt's lemma for symmetric bilinear forms of odd characteristic [Wil09] to see thatÕ extends to an isometry map O which by construction is also stochastic, i.e., O ∈ O t (d).
For d = 2, we appeal to the version of Witt's lemma from [Woo93] for the Z 4 -valued quadratic form q(x) = x · x (mod 4) from Remark 4.12. Here we need to verify two conditions: (i)Õ should be an isometry with respect to q, i.e., q(Õx) = q(x) for every x ∈Ñ. Since we already know that O is orthogonal, it suffices to check this condition on a generating set. By construction,Õ1 t = 1 t , so the condition is clearly true for x = 1 t . On the other hand, both defect subspaces are totally isotropic, which means that q(Õx) = 0 = q(x) (mod 4) for every x ∈ N. Thus, the first condition is satisfied. (ii) We also need to check is thatÑ ∩ I ⊥ =M ∩ I ⊥ , where I := {y ∈ Z t 2 : y · y = 0 (mod 2)}. But I = 1 ⊥ t and hence I ⊥ = Z 2 1 t . By construction, 1 t ∈Ñ and 1 t ∈M, so the second condition is also satisfied. We conclude thatÕ extends to an isometry O with respect to the quadratic form q, which implies that O ∈ O t (2) (Remark 4.12).    (4.18)). Next, use Lemma 4.20 to obtain some O that induces the defect isomorphism T J (T J ) −1 : T ⊥ LD /T LD → T ⊥ LD /T LD . Then O T = T , concluding the proof. For the last remark, note that the dimension of a totally isotropic subspace is never larger than t/2. If the dimension is zero, then it cannot contain 1 t , while if the dimension is t/2 then it is Lagrangian, hence must contain 1 t . Hence the number of possible dimensions is at most 1 + (t/2 − 1)2 + 1 = t.
Remark 4.22. We can also restrict to either the left or the right action. In this case, the resulting cosets are classified by the right and left defect subspace, respectively, which is an arbitrary defect subspace in the sense of Definition 4.16.
Next, we give an explicit description of the operators R(T ) = r(T ) ⊗n in terms of the defect subspaces and the defect isomorphism. If N is a defect subspace, define the coset states  Thus, r(T ) is proportional to a partial isometry, and rank r( We obtain (4.20) directly from Proposition 4.17 and (4.17). Since the coset states form two orthonormal families and T J is a bĳection, the formula for the rank follows at once. Now consider the case when the left and right defect subspaces coincide and the defect isomorphism is trivial. That is, (4.21) where N := T LD = T RD is an arbitrary defect subspace. In view of Corollary 4.21, any double coset contains a subspace of this form. In this case, r(T ) and R(T ) are related to a well-known family of codes in quantum information theory. To state the result, define the Weyl operators of, respectively, shift and multiply type: Given any totally isotropic subspace N ⊆ Z t d , the set CSS(N) := {Z p X q : q, p ∈ N} forms a stabilizer group of cardinality |N| 2 (since N is self-orthogonal, the Weyl operators commute). Such codes are a simple variant of Calderbank-Shor-Sloane (CSS) codes [Ste96b,CS96,Ste96a]. The projection onto the code space can be written as In particular, all this applies in the situation of Eq. (4.21). It follows that r(T ) and R(T ) are proportional to orthogonal projections onto CSS codes associated with the defect subspaces: Theorem 4.24 (CSS codes). Suppose that T is of the form (4.21), i.e., its left and right defect subspaces coincide and that the defect isomorphism is trivial. Let N := T LD = T RD . Then, Conversely, if T ∈ Σ t,t (d) is such that r(T ) is an orthogonal projection, then T is of the form (4.21).
Proof. The formula for r(T ) follows directly by comparing Eqs. (4.20) and (4.23). Conversely, suppose that r(T ) is an orthogonal projection. We see from Eq. (4.20) that the range of r(T ) is spanned by the |T LD , [x] , so we must have Since the coset states |T LD , [x] form a basis, it follows that for all x ∈ T LD and y ∈ T RD . When Finally, we can equip the set of subspaces Σ t,t (d) with a semigroup structure, denoted by •, such that the assignment T → R(T ) becomes a representation, i.e., (4.24) First, if T 1 or T 2 are associated to a stochastic isometry in O t (d), then we can simply define T 1 • T 2 as in Definition 4.15. In particular, the diagonal subspace is the identity element. Next, consider the case that T 1 and T 2 are of the form (4.21), associated to defect subspaces N 1 and N 2 . Then we may define T 1 • T 2 as the Lagrangian stochastic subspace T with (4.25) where x is such that x − y ∈ N 1 + N 2 .
Next, we verify that T J is a well-defined defect space isomorphism. Note that which shows that for every y ∈ T ⊥ RD there exists x ∈ T ⊥ LD such that x − y ∈ N 1 + N 2 . The same holds vice versa, so the map is surjective provided it is well-defined. Assume that x, x ∈ T ⊥ LD are two vectors such that x − y, x − y ∈ N 1 + N 2 . Then, x − x ∈ T ⊥ LD ∩ (N 1 + N 2 ) = T LD , which shows that (4.26) is indeed well-defined. Note that its kernel is given by T ⊥ RD ∩ (N 1 + N 2 ) = T RD . Thus, the induced map, which is precisely T J from (4.25), is a well-defined invertible linear map. We still need to verify that T J is an isometry and stochastic. The latter is clear, since 1 t − 1 t = 0 ∈ N 1 + N 2 . For the former, consider [x] ∈ T ⊥ LD /T LD and [y] ∈ T ⊥ RD /T RD such that y − x = n 1 + n 2 , where n 1 ∈ N 1 and n 2 ∈ N 2 . Without loss of generality, we may assume that x, y ∈ N ⊥ 1 ∩ N ⊥ 2 , so in particular x − y ⊥ y and n 2 ∈ N ⊥ 1 . Since N 1 and N 2 are totally isotropic, it follows that q(x − y) = 2n 1 · n 2 = 0 (mod D) and q(x) = q(y) + q(x − y) + 2y · (x − y) = 0 (mod D).
Finally, we will establish that Eq. (4.24) holds with T 1 • T 2 = T . It sufices to prove the claim for n = 1: In the third step, we used that for x ∈ N ⊥ 1 and y ∈ N ⊥ 2 , the condition that y − x ∈ N 1 + N 2 implies that x ∈ T ⊥ LD and y ∈ T ⊥ RD , and in the fourth step we used the definition of T J .
Finally, if T 1 and T 2 are arbitrary subspaces in Σ t,t (d) then we can always left and right multiply T 1 and T 2 by suitable stochastic isometries, thereby reducing to the preceding two cases (cf. Eq. (4.18)). The semigroup structure is highly useful for calculations (see Section 4.3 below and [Dam18]). We believe that the projections exhibited by Theorem 4.24 and the semigroup structure of Eq. (4.24) will be instrumental in understanding the fine-grained decomposition of H ⊗t n into irreducible representations of Cliff(n, d) and O t (d), generalizing the results discussed below. First results in this direction will be reported in [MMG20], with a full analysis being a direction for future work.

Examples
It is instructive to compute the commutant for small values of t. One can verify that every subspace in Σ 2,2 (d), as well as in Σ 3,3 (2) corresponds to a permutation. That is, in this case, Σ t,t (d) = O t (d) = S t . This is consistent with the fact that the Clifford group is always a unitary 2-design, and even a 3-design in the case of qubits [Zhu15].
For certain larger values of d and t, it is still true that Σ t,t (d) = O t (d), e.g., for t = 3 and d ≡ 2 (mod 3) [NW16]. In this case, the double commutant theorem implies that we have a proper duality akin to Schur-Weyl duality: where the V Cliff(n,d),λ and V O t (d),λ are pairwise inequivalent irreducible representations of Cliff(n, d) and of O t (d), respectively. It would be interesting to identify these representations further. In fact, it would be more appropriate to call Eq. (4.27) a form of a Howe duality, of which an example are well-known dualities between metaplectic and orthogonal groups. In general, however, O t (d) is a proper subset of Σ t,t (d), and it is an open problem to obtain a complete duality theory in positive characteristic [How73,GH16]. We now discuss some explicit examples.
Example 4.26 (d = 3, t = 3). In this case, O 3 (3) = S 3 , and we have that [NW16] where we identify the matrix with its row space, a Lagrangian stochastic subspace T . The double coset of T contains only two elements, T and (12)T . In total, Σ 3,3 (3) contains 6 + 2 = 8 elements, which is in agreement with Theorem 4.10.
Next, we note that T corresponds to a CSS code as in Eq. x + z, z − x, z| (4.29) is a projector of rank 3 n (Eq. (4.23)).
It is now straightforward to derive the decomposition of ((C 3 ) ⊗n ) ⊗3 into irreducible representations of the Clifford group (for n 2). We start with Schur-Weyl duality, which asserts that where λ runs over all partitions of 3. By Eq. (4.28), the commutant is generated by S 3 and the projection P.
Next, we discuss some multi-qubit examples.
Example 4.27 (d = 2, t = 4). As before, we find that O 4 (2) = S 4 . In addition to the 4! = 24 permutation subspaces, there exist 6 more Lagrangian subspaces in Σ 4,4 (2) -making a total of 30, which is known to be the dimension of the commutant of the multi-qubit Clifford group for n 3 [Zhu15, (10)]. We can decompose Σ 4,4 (2) into two double cosets in a form that is completely analogous to Eq. (4.28): 0 1 1 0 0 1  0 1 0 1 0 1 0 1  0 0 0 0 1 1 1 1  1 1 1 1 0 0 The given matrix is the generator matrix of a Lagrangian subspace which we denote by T 4 . Similarly to above, the operator R(T 4 ) is proportional to a projector onto a CSS code, with defect subspace spanned by the all-ones vector 1 4 . This projector is given by Eq. (3.6), and it can be used to decompose ((C 2 ) ⊗n ) ⊗4 into irreducible representations of the Clifford group, as explained in [ZKGG16].
Example 4.28 (d = 2, t = 5). Likewise, for t = 5, it is not hard to see that (cf. [Dam18])  1 1 1 1 0  1 1 1 1 0 0 (4.34) The displayed matrix corresponds to a subspace T 5 of the form Eq. (4.21), with defect subspace spanned by the vector (1, 1, 1, 1, 0), and the operator R(T 5 ) is proportional to a projector onto a CSS code. Indeed, we have We now discuss some interesting elements in the groups O t (d). For qubits, we have the class of anti-permutations introduced previously in Eq. (1.4).
Definition 4.29 (Anti-permutation). Let π ∈ S t . We define the anti-permutationπ as the binary complement of the corresponding permutation matrix. Formally, it is the t × t-matrix π = 1 t 1 T t − π with entries in F 2 , where we identify π with the corresponding permutation matrix.
Proof. By Remark 4.12, it suffices to check thatπ is orthogonal and that q(x) = 1 for each column. The latter holds since each column ofπ contains t − 1 ≡ 1 (mod 4) ones. For the former, where we used that ordinary permutation matrices are orthogonal and stochastic, as well as that t is even.
Remark 4.31. More generally, the entrywise binary complement maps any A ∈ O t (2) to an element A ∈ O t (2) provided that the rows of A each have Hamming weight w such that t ≡ 2w (mod 4).
For t 6, the anti-permutations are distinct from the permutations, so in particular O t (2) S t .
(For t = 2, the two sets coincide.) Example 4.32 (d = 2, t = 6). The anti-identity1 ∈ O 6 (2) and the corresponding subspace T1 ⊆ Z 6 2 ⊕ Z 6 2 are given by (cf. Eqs. (1.4) and (3.13)). 1 1 1  1 0 1 1 1 1  1 1 0 1 1 1  1 1 1 0 1 1  1 1 1 1 0 1  1 1 1 1 1  The anti-permutations admit several possible generalizations to odd primes d. One class of generalizations is given as follows. For π ∈ S t with d t, definē where t −1 denotes the multiplicative inverse of t in F d . It is easy to verify thatπ ∈ O t (d). Moreover, π is the only nontrivial linear combination of 1 t 1 T t and π with this property. Another class of generalizations is given by the formula in Eq. (3.17). Let p ∈ F t d be vector with entries in {±1} that is 'balanced', i.e., p · 1 t = 0 (this requires that t is even). If d t and π ∈ S t is a permutation that stabilizes p up to a sign, i.e., πp = ±p, theñ is an element in O t (d). In particular, this yields a large family of 'anti-identities' for odd d.
Another non-trivial example of a stochastic isometry can be constructed from the adjacency matrix A of the edge-vertex graph of the icosahedron. The icosahedron has 12 vertices, so A is a 12 × 12 binary matrix. Any two vertices share either zero or two neighbors, which implies that A is orthogonal. Moreover, each vertex has 5 neighbors, which implies q(x) = 1 for each column x of A. By Remark 4.12, it follows that A ∈ O 12 (2). The space TĀ generated by the element-wise complement of A is just the extended Golay code G 24 . The latter plays an important role in the invariant theory of the Clifford group as detailed in [NRS06]. Note, however, that unlike A and T A , it is not the case thatĀ ∈ O 12 (2) or TĀ ∈ Σ 12,12 (2). Working out the precise connection between T A , the extended Golay code, and their respective roles in the representation theory of the Clifford group is an interesting problem we leave open. Likewise, we leave open the question of whether R(A) can be given a physical interpretation, as was the case for the anti-identity.

Statistical properties of stabilizer states
In this section we discuss the statistical properties of the stabilizer states. We use the techniques that we developed in the last section to prove an explicit formula for the t-th moment of random stabilizer states, which vastly generalizes previous results in the quantum information literature [Zhu15,KG15,Web16,ZKGG16,HWW16]. Throughout this section, d is assumed to be a prime.

Moments of random stabilizer states
We start by studying the operator-valued t-th moment of the uniform distribution over all stabilizer states in (C d ) ⊗n : Clearly this operator can be used to calculate the average value of any polynomial of degree t in the coefficients of the wavefunction of a random stabilizer state. Note that the operator E[|S S| ⊗t ] is invariant under conjugation by Clifford operators. This is because the set of stabilizer states is a single orbit of the Clifford group. Thus E[|S S| ⊗t ] is in the commutant of the Clifford group and, assuming n t − 1, can be written in terms of the basis R(T ) from Theorem 4.3, for certain coefficients γ T ∈ C. In this section, we will show that these coefficients are all equal and establish an explicit formula for the t-th moment of a random stabilizer state which holds for all values of t and n. We start with some useful lemmas.
Remark 5.1 (Sum of traces of the R(T )). Recall that in order to establish Theorem 4.10, we determined the cardinality of the set Σ t,t (d), whose elements are the subspaces T ∈ Σ t,t (d) with dim(T ∩ ∆) = t − . The significance of the parameter is that tr R(T ) = (tr r(T )) n = d (t− )n for every subspace T ∈ Σ t,t (d). Thus, we can e.g. compute the sum of the traces of all R(T ) by using Eq. (4.8) and the Gaussian binomial formula Eq. (4.3): This number can be expressed in terms of the q-Pochhammer symbol as For n = 0, we recover the cardinality of Σ t,t (d), in agreement with Theorem 4.10.
Next, we prove a formula that relates moments of stabilizer states for different numbers of qudits.
Lemma 5.2. Let N n > 0. Then: It is clear that the left-hand side operator is nonzero and invariant under conjugation by U ⊗t for any Clifford unitary U ∈ Cliff(n, d). Thus, we can replace the right-hand side by its Clifford average, which is plainly proportional to E Stab(n,d) |S S| ⊗t .
Theorem 5.3 (t-th moment). Let n, t 1. Then the t-th moment of a random stabilizer state in (C d ) ⊗n is given by the formula Proof. It suffices to argue that the left-hand side and right-hand side are proportional, since the formula for the proportionality constant Z n,d,t follows immediately from Remark 5.1 and comparing traces. Fixing d and t, we will proceed in two steps. First, we will argue that there exist α n ∈ C and β T ∈ C for each T ∈ Σ t,t (d) such that We will show this first for n t − 1 and then for all n. Afterwards, we will find that the β T are necessarily equal, which as just discussed implies the claim. Let us first assume that n t − 1. By Theorem 4.3, there exist coefficients γ n,T ∈ C such that γ n,T r(T ) ⊗n , (5.5) since the left-hand side commutes with arbitrary t-th tensor powers of Clifford unitaries. It follows that, for every N n, since 0 ∈ T for every subspace T . From Lemma 5.2 we know that Eqs. (5.5) and (5.6) are proportional and nonzero. Since the operators r(T ) ⊗n are also linearly independent for n t − 1, it follows that there exist α n and β T such that γ T,n = α n β T for all n t − 1 (e.g., we can choose β T := γ T,t−1 ). Thus, we have established Eq. (5.4) for n t − 1. To extend its validity to all values of n, we observe that Eq. (5.6) holds also when N t − 1 > n. Together with Lemma 5.2, we find that, indeed, which shows that there exist constants α n and β T such that Eq. (5.4) holds for all values of n.
We will now argue that, in this case, the β T are necessarily all equal. For this, we compute the expectation value of an operator R(T ) † = r(T ) ⊗n, † . On the one hand, by Eq. (4.10), On the other hand, in the limit of large n. Thus, all β T must be equal, and the statement of the theorem follows.
Remark 5.4. Theorem 5.3 is reminiscent of the following average formula for the tensor power of a Haar-random pure state ψ in C D , which follows from Schur's lemma and Schur-Weyl duality: (5.7) Remark 5.5. When d is an odd prime, Σ t,t (d) = S t for t 2, but not for t 3. Thus Eqs. (5.3) and (5.7) match for t 2 and deviate for t 3. Since the operators R(T ) are linearly independent for sufficiently large n, this shows that stabilizer states in odd prime dimension are 2-designs, but not 3-designs or higher (provided n 2). Similarly, for d = 2, Σ t,t (2) = S t for t 3, but not or t 4, which shows that multiqubit stabilizer states form 3-designs, but not 4-designs or higher [KG15] (provided n 3).
Remarkably, the theory developed in this section allows us to design complex projective t-designs for any order t from the Clifford group orbits of a finite number of fiducial states. We explain this in Section 6 below.

Minimal projections for stabilizer testing
We now return to the problem of stabilizer testing; we revisit our solution from Section 3 and characterize minimal stabilizer tests with perfect completeness.
In Section 3, we found that perfectly complete stabilizer tests were in any local dimension d given by the following accepting POVM element on t = 2s copies of (C d ) ⊗n , where V s is the Hermitian unitary defined in Eq. (3.15) and (d, s) = 1. This means that V s is a unitary operator with the property that 2s-th tensor powers of every pure stabilizer state |S are contained in its +1 eigenspace: for any pure stabilizer states |S . Our soundness result implies that, conversely, these are the only tensor power states with this property. Note that V s is an operator in the commutant of the Clifford action. This is immediate by comparing Eqs. (3.18) and (4.12), which also shows that V s is precisely the operator R(1) associated with the 'anti-identity'1 defined in Eq. We proved this in Eq. (4.13). As we just saw, V s is such an operator, so Eq. (5.10) generalizes Eq. (5.9). Note that, since1 squares to the identity, it generates a subgroup of O t (d) that contains two elements: {1,1}. We can thus interpret the projector (5.8) as the projector onto the invariant subspace for the action of this subgroup. This suggest that we look more generally at the invariant subspaces associated with subgroups of O t (d). Larger subgroups corresponds to projectors onto smaller invariant subspaces. In particular, the minimal projector corresponds to the full group O t (d), i.e., (5.11) By Eq. (5.10), Π min t accepts all stabilizer tensor powers. Remarkably, it is the minimal projector with this property, as follows from the following theorem. Proof. Note that the t-th moment ρ := E |S S| ⊗t defined in Eq. (5.1) is a density operator that is exactly supported on the span of the stabilizer tensor powers. By the preceding discussion, it remains to prove that the support of Π min t is contained in the support of ρ. We start with Eq. (5.3): Recall from Eq. (4.16) that we can decompose Σ t,t (d) into k double cosets, One of the double cosets is just O t (d), say the first, corresponding to T 1 = ∆ and R(T 1 ) = I. As a consequence of Corollary 4.21, we can choose each representative T i to be of the form (4.21). Then, Theorem 4.24 shows that R(T i ) is proportional to an orthogonal projection (by a positive proportionality constant) so in particular R(T i ) 0. On the other hand, we can compute the sum over each double coset by which shows that the support of ρ indeed contains the support of Π min t .
Note that there is no condition on t in Theorem 5.6. Indeed, while the theorem identifies the projector onto the span of stabilizer tensor powers precisely, it makes no assertion about whether this subspace contains other tensor power states than stabilizer states or not. It therefore complements our results on stabilizer testing, from which we can read off values of t such that the projector Π t and hence Π min t Π t contains only stabilizer tensor powers. It is also interesting to ask about the minimal number of copies necessary for there to exist a perfectly complete stabilizer test that is dimension-independent. For d = 2, it is possible to show that t = 4, 5 copies of a random stabilizer state become on average indistinguishable from a Haar-random pure state as n → ∞. This can be done by an explicit calculation of the 4th and 5th moments using our Theorem 5.3 and Eqs. (4.33) and (4.34) (for t = 4, this has been carried out in [Dam18]). Thus, t = 6 copies as in our Theorem 3.3 are indeed optimal for multiqubit stabilizer testing. For odd d (prime or not), we know from Theorem 3.11 that t = 4 copies always suffice. For d ≡ 1, 5 (mod 6) it follows from Theorem 8.6 below that even t = 3 copies suffice (and are optimal). For d ≡ 3 (mod 6), we leave the question of minimal t open.

Construction of designs
Next we describe a construction of projective t-designs for arbitrary t based on weighted Clifford orbits. As in Sections 4 and 5, we assume that d is prime.
First, we derive expressions for the average tensor powers of the Clifford orbits of arbitrary states. For any pure state |Ψ , the average E U∈Cliff(n,d) [(U |Ψ Ψ| U † ) ⊗t ] commutes with Cliff(n, d) ⊗t . By Theorem 4.3, and assuming that n t − 1, it can therefore be expressed as for some α T ∈ C. Not all of the α T are independent. This is because each (U |Ψ Ψ| U † ) ⊗t is invariant under the action of S t × S t (acting from the left and from right), and also under taking the conjugate transpose. This motivates the following definition: Definition 6.1 (Equivalence relation ∼ S ). We define an equivalence relation ∼ S on Σ t,t (d) in the following way: T ∼ S T if and only if there exist π, π ∈ S t such that T = πT π or T = πT t π , where the transposed subspace T t is defined by T t = {(y, x) : (x, y) ∈ T }.
We correspondingly decompose Σ t,t (d) into equivalence classes: For convenience, we choose F t,1 (d) to be the set of subspaces corresponding to the permution group S t (these form a single equivalence class). We also define We note that the operators R i are Hermitian and linearly independent.
Since the R(T ) are linearly independent and R(T ) † = R(T t ), it follows that the coefficients α T in Eq. (6.1) must be the same for the elements of each equivalence class. Thus, for some coefficients α i . Note that α i ∈ R because the R i are Hermitian.
Importantly, the number M t,d of Clifford orbit is independent of n, the number of qudits.
Proof. We start the proof by taking an arbitrary finite t-design given by an ensemble {p j , Ψ j } K j=1 . Such designs exist (see for example the early work [SZ84]), but K can be very large. If we replace each Ψ j by a random element in its Clifford orbit then the resulting ensemble still forms a projective t-design. This means that Thus, if we define α (j) i as the coefficient of R i in the Clifford average of the fiducial state Ψ j , Conversely, if {p j } is an arbitrary probability distribution that satisfies Eq. (6.3) then the ensemble obtained by first choosing a random fiducial state according to this distribution and then a random state in its Clifford orbit is a projective t-design. We will now explain how to modify the probabilities p j step by step, setting more and more probabilities to zero while ensuring that Eq. (6.3) continues to hold -until all but M t,d of them are zero. Without loss of generality, assume that p 1 > 0. Each step proceeds as follows: Suppose that there exist indices 2 j 1 < j 2 < · · · < j M t,d K such that p j m > 0 for all m = 1, . . which is always possible since p 1 > 0. Thus, we obtain a probability distribution {p j } with strictly smaller support satisfying Eq. (6.3) and p 1 > 0. We can repeat this process until there are at most M t,d − 1 nonzero probabilities among the {p j } K j=2 . By including p 1 , we arrive at an ensemble of at most M t,d fiducial vectors. The corresponding probabilities satisfy Eq. (6.3), which is necessary and sufficient for the ensemble of Clifford orbits to be a design. This completes the proof.

De Finetti theorems for stabilizer symmetries
In this section we establish a direct connection between our results on stabilizer testing and the celebrated quantum de Finetti theorems, which play an important role in characterizing entanglement and correlations in quantum states with permutation symmetry (cf. discussion in Section 1.4).
We first recall the finite quantum de Finetti theorem from [CKMR07]. Let ρ be a quantum state on (C ) ⊗t that commutes with all permutations (i.e., [r π , ρ] = 0 for all π ∈ S t ). Then there exists a probability measure µ on the space of mixed states on C such that 1 2 ρ 1...s − dµ(σ)σ ⊗s 1 2 2 s t . (7.1) Since any quantum state that commutes with permutations admits a purification on the symmetric subspace, Eq. (7.1) follows directly from a similar result for the symmetric subspace, namely, that for every |Ψ ∈ Sym t (C ) there exists a probability measure µ on pure states on C such that In this section, we prove de Finetti theorems adapted to stabilizer states. The key idea is to extend the permutation symmetry to invariance under a larger group: 1. the stochastic orthogonal group O t (d) (for qudits in any prime dimension d), or 2. the group generated by the permutations and the anti-identity (3.13) (for qubits).
These symmetries are natural since they are carried by the tensor powers of any stabilizer state, as we proved in Eq. (4.13).
In both cases, our theorems show that the reduced density matrices are close to convex combinations of tensor powers of stabilizer states. In the first case, we find that the reduced state is in fact exponentially (in the number of traced out systems) close to a state of this form, which is a much stronger guarantee than provided by the finite de Finetti theorems of Eqs. (7.1) and (7.2) (cf. [Ren07,KM09]). In the second case, we obtain power law convergence but the symmetry requirements are drastically reduced. We establish our results first for pure states (Sections 7.1 and 7.2) and then extend them by a standard purification argument to mixed states (Section 7.3).

Exponential stabilizer de Finetti theorem
Let d be an arbitrary prime. We start with the observation that, for any two distinct stabilizer states, (this can be seen from, e.g., Eq. (2.17)). It follows that, for fixed d and n, the stabilizer tensor powers |S ⊗t approach orthonormality as t → ∞. The following lemma makes this precise.
Lemma 7.1. Let d be a prime and n, t 1. Consider the Gram matrix G S,S = S|S t , where S, S ∈ Stab(n, d). If ε := d 1 2 ((n+2) 2 −t) < 1 2 then the following holds: 1. The Gram matrix is ε-close to the identity matrix in operator norm: G − I ∞ ε. In particular, the stabilizer tensor powers |S ⊗t are linearly independent.
2. The nonzero eigenvalues of Q := S |S ⊗t S| ⊗t and its pseudoinverse Q + lie in the interval 1 ± 2ε.
Proof. 1. The first claim follows directly from the element-wise bound (7.3): where we used the bound where |e S denotes an orthonormal basis labeled by the set of stabilizer states Stab(n, d). Then, and thus the nonzero eigenvalues of G and Q are both identical (to the squared singular values of H). By part 1, the eigenvalues of G lie in the interval 1 ± ε, hence the same is true for the nonzero eigenvalues of Q. Since we assumed that ε < 1/2, it follows that the nonzero eigenvalues of the pseudoinverse Q + are in the interval 1 ± 2ε. This establishes the second claim. 3. By the first claim, the stabilizer tensor powers are linearly independent. On the other hand, Thus, the linear independence implies that the vectors (Q + ) 1/2 |S ⊗t are orthonormal. for certain coefficients α S ∈ C. We now use the third and second claim of Lemma 7.1 to see that Here we have assumed that ε < 1/2, for otherwise the statement of the theorem is vacuous. We now compute the partial trace over all but s subsystems: The norm of the cross terms is small: the first inequality uses Eq. (7.3), the third inequality is Eq. (7.4), the fourth inequality is the upper bound in Eq. (7.5), and the last step uses that ε < 1/2. Finally, define p(S) := |α S | 2 / S |α S | 2 (the denominator is positive by the lower bound in Eq. (7.5) and ε < 1/2). Then:

Stabilizer de Finetti theorem for the anti-identity
We now prove a stabilizer de Finetti theorem with reduced symmetry requirements. For concreteness, we restrict to the multi-qubit case (d = 2) and to tensor powers that are multiples of six. Neither restriction is essential. The following theorem shows that the reduced states of an arbitrary permutation-symmetric quantum state that is invariant under the anti-identity operator V = R(1) from Eq. (3.13), but not necessarily under other stochastic isometries, are well-approximated by convex mixtures of tensor powers of stabilizer states.
Theorem 7.3 (Pure-state stabilizer de Finetti theorem for the anti-identity). Let |Ψ ∈ Sym t ((C 2 ) ⊗n ) be a quantum state that is left invariant by the action of the anti-identity (3.13) on some (and hence every) subsystem consisting of six n-qubit blocks. Let s < t be a multiple of six. Then there exists a probability distribution p on Stab(n, 2), the set of pure stabilizer states of n qubits, such that Proof. By the ordinary finite quantum de Finetti theorem (7.2), there exists a probability measure dµ(φ) on the set of pure states such that (7.6) Let Π n = (I + R(1))/2 denote the projector onto the +1-eigenspace of the 6 × 6-anti identity for n qubits. By assumption, (Π  12)) is the operator that swaps two blocks of n qubits (see discussion above Eq. (3.12)). Since tensor powers of pure states are permutation-symmetric, According to Theorem 3.3 and Eq. (3.8) and using Lemma 7.4 below, for each pure state φ there exists a pure stabilizer state S φ such that | S φ |φ | 2(s/6) 4 tr Π accept φ ⊗6 s/6 − 3.
It follows that replacing each pure state φ by the nearby stabilizer state S φ incurs only a small error: 24 · 2 n+1 s t , where we have used the triangle inequality, the relation between the trace distance and the fidelity between pure states, and the concavity of the square root. Together with Eq. (7.6), we obtain Theorem 7.6 (Exponential stabilizer de Finetti theorem). Let d be a prime and ρ a quantum state on ((C d ) ⊗n ) ⊗t that commutes with the action of O t (d) ⊇ S t . Let 1 s t. Then there exists a probability distribution p on the (finite) set of mixed stabilizer states of n qudits, such that Proof. Using Lemma 7.5, we can find a purification |Ψ ∈ ((C d ) ⊗2n )) ⊗t ∼ = (C d ) ⊗n ) ⊗t ⊗ (C d ) ⊗n ) ⊗t of ρ, which is invariant under the action of O ∈ O t (d). (In particular, |Ψ is an elemenrt of the symmetric subspace.) Here we crucially use that the operators R(O) for 2n qudits are just the second tensor powers of the corresponding operators for n qudits, as is clear from Eq. (4.12). Thus we can apply Theorem 7.2 to the state |Ψ . Since the local Hlibert space now contains 2n qudits, we obtain that there exists a probability distribution p over pure stabilizer states on (C d ) ⊗2n such that Taking the partial trace over the purifying systems does not increase the trace distance. Since reduced density matrices of pure stabilizer states are mixed stabilizer states, we obtain the result.
The very same argument yields the following version of Theorem 7.3 for mixed states: √ 2 · 2 n s t .

Robust Hudson theorem
The methods developed in Section 3 also allow us to prove a robust version of the finite-dimensional Hudson theorem. Recall that from Eq. (2.18) that, for odd d, the Wigner function of a pure stabilizer state is necessarily nonnegative. Hudson theorem states that, for pure states, this condition is also sufficient, i.e., the Wigner function of a pure quantum state is non-negative if and only if the state is a stabilizer state [Gro06]. We will show in Theorem 8.4 that if the Wigner or sum-negativity sn(ψ) := x:w ψ (x)<0 |w ψ (x)| = 1 2 x |w ψ (x)| − 1 .
is small then the state is close to a stabilizer state. The Wigner negativity is immediately related to the mana M(ψ) = log(2 sn(ψ) + 1), a monotone that plays an important role in the resource theory of stabilizer computation [Got97]. Throughout this section we assume that d is odd, so that the Wigner function is well-behaved (cf. Section 2.3).
A mixed stabilizer state is a maximally mixed state on a stabilizer code (see Section 2.4).

Exact Hudson theorem
We first give a new and succinct proof of the finite-dimensional Hudson theorem. For pure states, 1 = tr ψ 2 = x d n w ψ (x) 2 . Thus we can define a probability distribution based on the Wigner function, q ψ (x) = d n w ψ (x) 2 , similar to the p ψ distribution that we defined in Eq. (3.2) via the characteristic function. Note that 0 q ψ (x) d −n , since |w ψ (x)| d −n .
We now consider the sum of the absolute value of the Wigner function, It holds that ψ W x w ψ (x) = 1, with equality if and only if w ψ (x) 0 for all x. By the Hölder inequality (with p 1 = p 2 = p 3 = 3, so k 1/p k = 1): Thus we obtain the following fundamental bound: x q ψ (x) 2 1 d n ψ 2 W . (8.1) Crucially, we can interpret the left-hand side as the average of the function q ψ with respect to the same probability distribution, E x∼q ψ q ψ (x). Now suppose that the Wigner function is nowhere negative, so that the bound simplifies to x q ψ (x) 2 d −n . But q ψ (x) d −n for all x, so we conclude that the function q ψ must be equal to d −n on its support. In other words, q ψ (x) is the uniform distribution on a subset of cardinality d n . This gives a rather direct proof of the finite-dimensional Hudson theorem: Theorem 8.1 (Finite-dimensional Hudson theorem, [Gro06]). Let d be an odd integer and ψ a pure quantum state of n qudits. Then the Wigner function of ψ is everywhere nonnegative, w ψ (x) 0, if and only if ψ is a stabilizer state.
Proof. In view of Eq. (2.18) we only need to show that if w ψ (x) 0 for all x then ψ is a stabilizer state. By the preceding discussion, we know that w ψ (x) = d −n 1 T (x), where 1 T denotes the indicator function of some subset T ⊆ V n of cardinality d n . In other words, ψ|A x |ψ = 1 T (x) and so A x |ψ = |ψ for all x ∈ T .
It remains to show that T is of the form T = a + M, where M is a maximal isotropic subspace. For this, consider any three points x, y, z ∈ T and use Eq. (2.12), which asserts that A x A y A z = ω 2[z−x,y−x] A x−y+z . Because |ψ is an eigenvector of A x , A y , A z , with eigenvalue +1, we obtain 1 = ψ|A x A y A z |ψ = ω 2[z−x,y−x] ψ|A x−y+z |ψ .
But A x−y+z is Hermitian, so this is impossible unless [z − x, y − x] = 0. Therefore, T is the translate of an totally isotropic set M of cardinality d n . Since the maximal size of a totally isotropic subspace is also d n [Gro06, App. C], T is necessarily a maximal isotropic subspace. We conclude that ψ is a stabilizer state.

Robust Hudson theorem
To obtain a robust version of the Hudson theorem, we will, similarly as in our approach to stabilizer testing, combine Eq. (8.1) with an uncertainty relation that generalizes the proof of Theorem 8.1. Proof. Note that the assumption implies that and likewise for y and z. Thus we obtain the following inequalities: As a consequence of the triangle inequality, and using A z 1, along with Eq. (2.12), we obtain We can simply expand this relation and see that it is equivalent to ).
We now prove the main result of this section: Theorem 8.4 (Robust finite-dimensional Hudson theorem). Let d be odd and ψ a pure quantum state of n qudits. Then there exist a stabilizer state |S such that | S|ψ | 2 1 − 9d 2 sn(ψ).
Proof. Suppose that sn(ψ) ε. Then ψ W 1 + 2ε and we find from Eq. (8.1) that x q ψ (x) 2 1 d n (1 + 2ε) 2 , i.e., We would like to show that the probability of the set T from Corollary 8.3 with respect to the probability distribution q ψ is close to one. First, though, let us consider T = x : |w ψ (x)| > d −n 1 − 1/2d 2 = x : q ψ (x) > d −n 1 − 1/2d 2 which is defined just like T but for the absolute value of the Wigner function! Then, using Markov's inequality and Eq. (8.2), we have But then T is likewise a high-probability subset: where we used q ψ (x) = d n |w ψ (x)| 2 |w ψ (x)|. As a result of Corollary 8.3, T is a subset of some affine totally isotropic subspace a + M. If |S denotes the corresponding stabilizer state then In the fifth step we used that, for x ∈ T , w ψ (x) = |w ψ (x)| d n |w ψ (x)| 2 = q ψ (x).

Stabilizer testing revisited: minimal number of copies
We will now revisit stabilizer testing from the perspective of the Wigner function and show that for d ≡ 1, 5 (mod 6) it is in fact possible to perform stabilizer testing with just three copies of the state. This is clearly optimal, since the set of stabilizer states forms a projective 2-design.
We start with the phase space point operators A x from (2.11), Consider the operator y 1 +y 2 +y 3 =0 W y 1 ⊗ W y 2 ⊗ W y 3 .
We now consider the binary POVM measurement with accepting projector Π accept = 1 2 (I + V).
Theorem 8.6 (Stabilizer testing from three copies). Let d ≡ 1, 5 (mod 6) and ψ a pure state of n qudits. Denote by p accept = tr[ψ ⊗3 Π accept ] the probability that the POVM element Π accept accepts given three copies of ψ. If ψ is a stabilizer state then it accepts with certainty, p accept = 1. On the other hand, if max S | S|ψ | 2 1 − ε 2 then p accept 1 − ε 2 /16d 2 .
Proof. We first note that where w ψ denotes the Wigner function defined in Eq. (2.10). It is clear from Eq. (2.18) that if ψ is a stabilizer state then p accept = 1. Now assume that ψ is an arbitrary pure state. Since q(x) = d n w 2 ψ (x) is a probability distribution, we can rewrite the above as x q ψ (x) 1 − d n w ψ (x) = 2 1 − p accept .