1 Introduction

1.1 Background

Schur–Weyl duality. To motivate the symmetry this work is based on, we start by considering two types of problems that have frequently appeared in quantum information theory. First, assume that we have access to t copies \(\rho ^{\otimes t}\) of an unknown quantum state \(\rho \) on \({\mathbb {C}}^d\), and that we are interested in some property of \(\rho \)’s eigenvalues (for example its entropy). Clearly, then, the problem has a \(U^{\otimes t}\)-symmetry in the sense that the inputs \(\rho ^{\otimes t}\) and

$$\begin{aligned} U^{\otimes t} (\rho ^{\otimes t}) {U^\dagger }^{\otimes t} \end{aligned}$$

represent equivalent properties. It thus makes sense to design a procedure that shares the \(U^{\otimes t}\)-symmetry, and indeed the resulting procedure has been shown to be optimal for estimating the eigenvalues [KW01, HM02, CM06, CHM07, OW15]. Moreover, consider quantum state tomography, the task of estimating the entire quantum state \(\rho \). Essentially optimal estimators can be constructed by first estimating the eigenvalues and then the eigenbasis [OW16, OW17, HHJ+17], crucially using the structure of \(U^{\otimes t}\) in each step. There are many further problems in quantum information where this symmetry can be exploited—for example in quantum Shannon theory, where optimal rates mostly depend only on the eigenvalues of the quantum state [HM02, Har05].

Second, studying the properties of a Haar-random state vector \(\vert \psi \rangle \) has proven to be extremely fruitful [HLW06, Has08]. Instead of working with the full distribution, it is often sufficient to exploit information about the statistical moments of the random matrix \(\vert \psi \rangle \langle \psi \vert \). The t-th moment is described by the expected value of the t-th tensor power of the random matrix:

$$\begin{aligned} M_t = {\mathbb {E}}_{\psi \text { Haar}} [ (\vert \psi \rangle \langle \psi \vert )^{\otimes t} ]. \end{aligned}$$
(1.1)

Again, \(M_t\) is invariant under conjugation by \(U^{\otimes t}\), \(U\in U({\mathbb {C}}^d)\).

The importance of Schur–Weyl duality in quantum information stems from the fact that it allows one to characterize the set of \(U^{\otimes t}\)-invariant operators on \(({\mathbb {C}}^d)^{\otimes t}\). Indeed, it implies that any such operator can be expressed as the linear combination of matrices \(r_\pi \), \(\pi \in S_t\) that act by permuting the tensor factors:

$$\begin{aligned} r_\pi \, (\vert \psi _1\rangle \otimes \dots \otimes \vert \psi _t\rangle ) = \vert \psi _{\pi _1}\rangle \otimes \dots \otimes \vert \psi _{\pi _t}\rangle . \end{aligned}$$
(1.2)
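As a small numerical sanity check of the invariance behind Eq. (1.2), the following sketch builds the operators \(r_\pi \) for \(d=2\), \(t=3\) and confirms that each of them commutes with \(U^{\otimes t}\) for a randomly drawn unitary U. The helper names `perm_operator` and `random_unitary` and the use of NumPy are our own illustrative assumptions; the unitary is obtained from a QR decomposition and is not claimed to be exactly Haar-distributed.

```python
import itertools
from functools import reduce

import numpy as np


def perm_operator(perm, d):
    """Matrix of r_pi: sends |x_1 ... x_t> to |x_{pi_1} ... x_{pi_t}>."""
    t = len(perm)
    dim = d ** t
    r = np.zeros((dim, dim))
    for x in itertools.product(range(d), repeat=t):
        y = tuple(x[perm[k]] for k in range(t))
        col = np.ravel_multi_index(x, (d,) * t)
        row = np.ravel_multi_index(y, (d,) * t)
        r[row, col] = 1.0
    return r


def random_unitary(d, rng):
    """A unitary from the QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(g)
    return q


rng = np.random.default_rng(0)
d, t = 2, 3
U = random_unitary(d, rng)
Ut = reduce(np.kron, [U] * t)  # U tensored t times

# Largest entry of [r_pi, U^{(x)t}] over all permutations pi in S_t.
max_commutator = max(
    np.abs(perm_operator(p, d) @ Ut - Ut @ perm_operator(p, d)).max()
    for p in itertools.permutations(range(t))
)
```

Up to floating-point error, `max_commutator` vanishes, as expected from the fact that permuting tensor factors commutes with applying the same unitary to every factor.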

Clifford group and stabilizer states. Arguably, the subgroup of the full unitary group that is most important to quantum information is the Clifford group. The Clifford group and the closely related concept of stabilizer states and stabilizer codes feature centrally in fault-tolerant quantum computing, quantum coding in general, randomized benchmarking, measurement-based quantum computing, and many other subfields of quantum information.

To introduce the Clifford group, we first recall the definition of the set of Pauli operators. For a qudit (d-dimensional system), they are defined by their action on some basis \(\{\vert q\rangle \}_{q=0}^{d-1}\) via

$$\begin{aligned} X \vert q\rangle = \vert q+1\rangle , \quad Z \vert q\rangle = e^{2\pi i q/d} \vert q\rangle . \end{aligned}$$

For n qudits, the Pauli group is defined as the finite group generated by the Pauli operators on each qudit. The Clifford group now is the natural symmetry group of the Pauli group. That is, a unitary U is Clifford if, for any Pauli operator P, \(UPU^\dagger \) is again in the Pauli group. Ignoring overall phases, the Clifford group is a finite group, which is intimately connected to the metaplectic representation of the discrete symplectic group (see, e.g., [Gro06]). Closely related to the Clifford group is the set of stabilizer states. These are the states that can be obtained by acting on a basis vector \(\vert 0\dots 0\rangle \) by arbitrary Clifford unitaries.
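The defining property of a Clifford unitary can be checked directly for a single qudit. The sketch below is a minimal illustration under our own assumptions: the function names are hypothetical, "Pauli" is taken up to an overall phase (in line with ignoring overall phases above), and it is enough to test the generators X and Z because conjugation by U is multiplicative. The gates H, S, T and the discrete Fourier transform are standard examples and are not taken from the text.

```python
import numpy as np


def pauli_generators(d):
    """Shift X|q> = |q+1 mod d> and clock Z|q> = exp(2 pi i q/d)|q>."""
    omega = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(omega ** np.arange(d))
    return X, Z


def is_pauli_up_to_phase(M, d):
    """True if the unitary M is proportional to some Z^p X^q."""
    X, Z = pauli_generators(d)
    for p in range(d):
        for q in range(d):
            W = np.linalg.matrix_power(Z, p) @ np.linalg.matrix_power(X, q)
            overlap = np.trace(W.conj().T @ M) / d
            if np.isclose(abs(overlap), 1.0):
                return True
    return False


def is_clifford(U, d):
    """Check U X U^dagger and U Z U^dagger; the generators suffice."""
    X, Z = pauli_generators(d)
    return all(is_pauli_up_to_phase(U @ P @ U.conj().T, d) for P in (X, Z))


H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)       # Hadamard gate
S = np.diag([1, 1j])                               # phase gate
T = np.diag([1, np.exp(1j * np.pi / 4)])           # T gate
k = np.arange(3)
F3 = np.exp(2j * np.pi / 3) ** np.outer(k, k) / np.sqrt(3)  # qutrit Fourier transform
```

Here `is_clifford(H, 2)`, `is_clifford(S, 2)` and `is_clifford(F3, 3)` hold, whereas the T gate maps X to \((X+Y)/\sqrt{2}\) and therefore fails the test.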

As before, there are many natural problems that are invariant under \(U^{\otimes t}\), for U a Clifford unitary. Two examples we will discuss are: (1) Given access to \(\psi ^{\otimes t}\), decide whether \(\psi \) is a stabilizer state; (2) What are the t-th moments of a random stabilizer state \(\psi \)?

Randomized constructions. Another motivation arises from randomized constructions. Unitaries and states drawn from the Haar measure appear in many situations, including in quantum cryptography, coding, and data hiding [HLW06]. While randomized constructions are often near-optimal and frequently out-perform all known deterministic constructions, they have the drawback that generic quantum states cannot be efficiently prepared.

This contrasts with random Clifford unitaries and random stabilizer states, both of which can be efficiently realized (they require at most \(O(n^2)\) gates to implement in a quantum circuit) [AG04]. They have therefore repeatedly been suggested as “drop-in replacements” for their Haar-measure analogues. Examples include randomized benchmarking [MGE11, HWFW17], low-rank recovery [KZG16], and tensor networks in the context of holography [HNQ+16, NW16]. All these applications require information about the moments (in the sense of Eq. (1.1)) of random stabilizer states, which they all obtain from representation-theoretic data. To date, this representation theory and the associated stabilizer moments are understood only up to order \(t=4\) [ZKGG16, HWW16, NW16]. This contrasts with the Haar-random case, where Schur–Weyl duality gives this information for arbitrary orders t. Making analogous techniques available for the Clifford case was one important motivation for this work. Higher moments will generally lead to tighter performance bounds in randomized constructions, and are strictly required for some applications, like the stabilizer testing problem resolved here.

1.2 Schur–Weyl duality for the Clifford group

We start with an explicit description of the commutant of tensor powers of Clifford unitaries. While such a description has not yet appeared in the quantum information literature, we emphasize that some of the key results can already be deduced from work by Nebe, Rains, Sloane and colleagues on invariants of self-dual codes (see the excellent monograph [NRS06]). Also, in representation theory, there is a separate stream of closely related work regarding the structure of the oscillator representation and attempts to develop a Howe duality theory over finite fields, which is still an open problem (see, e.g., [How73, GH16] and references therein). We discovered the approach presented below independently, starting from our results in [NW16, App. C] for third tensor powers. Our proofs differ fundamentally from the preceding works in that they rely on the phase space formalism of finite-dimensional quantum mechanics, which offers additional insight.

To construct the commutant, start with the permutations \(r_\pi \) on \(({\mathbb {C}}^d)^{\otimes t}\) of Eq. (1.2). We assume for now that \({\mathbb {C}}^d\) is the Hilbert space of a single qudit with “computational basis” \(\{\vert x\rangle \}_{x\in {\mathbb {Z}}_d}\) labeled by elements in \({\mathbb {Z}}_d = {\mathbb {Z}}/d {\mathbb {Z}}\) (this is anyway required for defining the Pauli and the Clifford group). Basis elements \(\vert \mathbf {x}\rangle =\vert x_1\rangle \otimes \dots \otimes \vert x_t\rangle \) of \(({\mathbb {C}}^d)^{\otimes t}\) are then labeled by vectors \(\mathbf {x} \in {\mathbb {Z}}_d^t\). In this language:

$$\begin{aligned} r_\pi = \sum _{\mathbf {y}\in {\mathbb {Z}}_d^t} \vert \pi (\mathbf {y})\rangle \langle \mathbf {y}\vert = \sum _{(\mathbf {x}, \mathbf {y})\in T_\pi } \vert \mathbf {x}\rangle \langle \mathbf {y}\vert , \end{aligned}$$
(1.3)

where \(T_\pi =\{ (\pi (\mathbf {y}), \mathbf {y}) : \mathbf {y}\in {\mathbb {Z}}_d^t\}\) and \(\pi \) permutes the components of \(\mathbf {y}\). Because the Clifford group is a proper subgroup of the unitary group, its commutant is in general strictly larger than the span of the \(r_\pi \). We thus have to add further operators to the \(r_\pi \)’s in order to find a complete set.

The central message of this section is that, surprisingly, a minor modification of (1.3) suffices! Indeed, for any subspace T of \({\mathbb {Z}}_d^t \oplus {\mathbb {Z}}_d^t\) define

$$\begin{aligned} r(T) = \sum _{(\mathbf {x},\mathbf {y})\in T} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert . \end{aligned}$$

We also consider the n-fold tensor power \(R(T):=r(T)^{\otimes n}\), which is an operator on \( (({\mathbb {C}}^d)^{\otimes t})^{\otimes n} \cong ({\mathbb {C}}^d)^{\otimes t n} \cong (({\mathbb {C}}^d)^{\otimes n})^{\otimes t} \).

We now single out subspaces that satisfy certain geometric properties. Reflecting a well-known difference between even and odd dimensions in the stabilizer formalism, we define \(D=d\) if d is odd, and \(D=2d\) if d is even.

Definition 4.1

(\(\Sigma _{t,t}\)) Consider the quadratic form \({\mathfrak {q}}:{\mathbb {Z}}_d^{2t} \rightarrow {\mathbb {Z}}_D\) defined by \({\mathfrak {q}}(\mathbf {x},\mathbf {y}) :=\mathbf {x}\cdot \mathbf {x} - \mathbf {y}\cdot \mathbf {y}\). We denote by \(\Sigma _{t,t}(d)\) the set of subspaces \(T\subseteq {\mathbb {Z}}_d^{2t}\) satisfying the following properties:

  1. T is totally \({\mathfrak {q}}\)-isotropic: i.e., \(\mathbf {x}\cdot \mathbf {x} = \mathbf {y}\cdot \mathbf {y} \pmod D\) for all \((\mathbf {x},\mathbf {y})\in T\).

  2. T has dimension t (the maximal possible dimension).

  3. T is stochastic: \(\mathbf {1}_{2t} = (1,\dots ,1) \in T\).

We will summarize the first two conditions by saying that T is Lagrangian. Thus, we will call \(\Sigma _{t,t}(d)\) the set of stochastic Lagrangian subspaces.

Our first main result is the following theorem, which states that the operators R(T) obtained from these subspaces are a basis of the commutant:

Theorem 4.3

(Commutant of Clifford tensor powers). Let d be a prime and \(n\ge t-1\). Then the operators \(R(T)=r(T)^{\otimes n}\) for \(T \in \Sigma _{t,t}(d)\) are \(\prod _{k=0}^{t-2} (d^k+1)\) many linearly independent operators that span the commutant of the t-th tensor power action of the Clifford group for n qudits.

Proof sketch

We use the phase space formalism of finite-dimensional quantum mechanics developed in [Woo87, App05, Gro06, GE08, DB13]. In particular, Clifford unitaries have a simple description on phase space: they act by affine symplectic transformations.

We use this structure to give a concise proof that the operators R(T) commute with \(U^{\otimes t}\) for any Clifford unitary U. Linear independence is not hard to establish, so it remains to argue that the number of subspaces equals the dimension of the commutant. We show this by a careful counting argument. We first compute the number of stochastic Lagrangian subspaces. Employing Witt’s theorem, we find recursive relations for the dimension of the commutant of the Clifford group. We solve this recursion using Gaussian binomial identities (the result is a generalization of [Zhu15, (8)–(10)]) and find that the cardinalities match, concluding the proof. \(\square \)
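For very small parameters, the count in Theorem 4.3 can be reproduced by brute force. The sketch below enumerates the set \(\Sigma _{t,t}(d)\) of Definition 4.1 directly from its three defining conditions and compares its size with \(\prod _{k=0}^{t-2} (d^k+1)\). It is only an illustration: the function names are hypothetical, d is assumed prime, vectors in \({\mathbb {Z}}_d\) are lifted to the integers \(0,\dots ,d-1\) before evaluating the quadratic form modulo D, and the search is exponential in t.

```python
import itertools
import math

import numpy as np


def stochastic_lagrangians(d, t):
    """Enumerate Sigma_{t,t}(d) for a prime d by exhaustive search (small t only)."""
    D = d if d % 2 == 1 else 2 * d
    vectors = list(itertools.product(range(d), repeat=2 * t))
    ones = (1,) * (2 * t)
    coeffs = np.array(list(itertools.product(range(d), repeat=t)))
    found = set()
    # Every stochastic subspace of dimension t has a basis containing the all-ones vector.
    for extra in itertools.product(vectors, repeat=t - 1):
        basis = np.array((ones,) + extra)
        span = (coeffs @ basis) % d
        T = frozenset(tuple(row) for row in span.tolist())
        if len(T) != d ** t:  # the chosen vectors are linearly dependent
            continue
        x, y = span[:, :t], span[:, t:]
        # Totally isotropic: x.x = y.y (mod D) for every element of T.
        if np.all(((x * x).sum(axis=1) - (y * y).sum(axis=1)) % D == 0):
            found.add(T)
    return found


def predicted_count(d, t):
    """The product formula from Theorem 4.3."""
    return math.prod(d ** k + 1 for k in range(t - 1))


counts = {(d, t): len(stochastic_lagrangians(d, t)) for (d, t) in [(2, 2), (2, 3), (3, 2)]}
```

In these three cases the enumeration returns 2, 6 and 2 subspaces, respectively, so that only the permutation subspaces \(T_\pi \) occur; subspaces beyond the permutations require larger t.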

There is a rich structure associated with the objects appearing in this theorem: It is easy to see that the spaces \(T_\pi =\{(\pi (\mathbf {y}),\mathbf {y})\}\) that give rise to the commutant of U(d) appear as special cases above. For general d and t, not all R(T)’s are invertible. In particular, for some T’s, R(T) is proportional to the projection onto a stabilizer code. This way, one can, e.g., recover the code that has been used to describe the irreps contained in the 4th tensor power of the Clifford group in [ZKGG16]. The invertible R(T)’s are associated with spaces T of the form \(\{(A\mathbf {y}, \mathbf {y})\}\), where A is an element of a certain “stochastic orthogonal” group \(O_t(d)\). This group is of interest to the formulation of modular Howe duality [GH16], and underlies several of our applications below.

Remarkably, the size of the commutant stabilizes as soon as \(n\ge t-1\). That is, just like the symmetric group in Schur–Weyl duality, the set that parametrizes the commutant of the Clifford tensor powers is independent of the number n of qudits. The fact that the operators \(R(T)=r(T)^{\otimes n}\) are themselves tensor powers facilitates possible physical implementations. This, once more, generalizes a property of the symmetric group in Schur–Weyl duality.

To find novel applications of this theory, it is helpful to identify a set of non-trivial T’s that afford an intuitive interpretation. Several of our multi-qubit results presented below are based on spaces with elements \(({\bar{\pi }} \mathbf {y}, \mathbf {y})\), where \({\bar{\pi }}\) is what we refer to as an anti-permutation. An anti-permutation is simply the binary complement of a permutation matrix. Formally, \({\bar{\pi }} = \mathbf {1}_t \mathbf {1}_t^T - \pi \), where \(\mathbf {1}_t=(1,\dots , 1)\) contains t ones, and \(\pi \in S_t\). Its operator representation is particularly straightforward. The n-qubit anti-identity, e.g., acts by

$$\begin{aligned} R({\bar{\mathbb {1}}}) = 2^{-n} \left( I^{\otimes t} + X^{\otimes t} + Y^{\otimes t} + Z^{\otimes t} \right) ^{\otimes n}, \end{aligned}$$
(1.4)

which greatly facilitates the analysis (cf. Eq. (3.13) and Definition 4.29).
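Equation (1.4) can be verified numerically in the smallest relevant case. The following sketch takes \(n=1\) and \(t=6\), builds \(r(T)\) for \(T=\{({\bar{\mathbb {1}}}\mathbf {y},\mathbf {y})\}\) directly from the definition of the anti-identity, and compares it with the Pauli expression. The choice \(t=6\) and the variable names are assumptions made for this illustration only.

```python
import itertools
from functools import reduce

import numpy as np

t = 6
# Anti-identity: binary complement of the t x t identity matrix.
anti_identity = (np.ones((t, t), dtype=int) - np.eye(t, dtype=int)) % 2

# r(T) for T = {(anti_identity y, y)}: the map |y> -> |anti_identity y mod 2>.
dim = 2 ** t
r = np.zeros((dim, dim))
for y in itertools.product(range(2), repeat=t):
    x = tuple(int(v) for v in (anti_identity @ np.array(y)) % 2)
    row = np.ravel_multi_index(x, (2,) * t)
    col = np.ravel_multi_index(y, (2,) * t)
    r[row, col] = 1.0

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def tensor_power(P):
    """P tensored with itself t times."""
    return reduce(np.kron, [P] * t)


# Right-hand side of Eq. (1.4) for n = 1.
pauli_form = 0.5 * sum(tensor_power(P) for P in (I2, X, Y, Z))
matches = np.allclose(r, pauli_form)
```

The agreement reflects that \({\bar{\mathbb {1}}}\) fixes bit strings of even weight and complements those of odd weight, together with \(Y^{\otimes 6} = -X^{\otimes 6}Z^{\otimes 6}\).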

1.3 Quantum property testing: stabilizer testing

The theory of quantum property testing asks which properties of a “black box” many-body quantum system can be learned efficiently, in particular without having to resort to costly full tomography [BFNR03, BFNR08, MdW16]. A prototypical example of a testable property is purity. Indeed, given access to two copies \(\rho \otimes \rho \) of an unknown quantum state \(\rho \), the so-called swap test provides a simple protocol that accepts with certainty if \(\rho =\vert \psi \rangle \langle \psi \vert \) is pure, and rejects with probability \(\Theta (\varepsilon ^2)\) if \(\rho \) is \(\varepsilon \)-far away from the set of pure states in trace distance. The test is perfectly complete in the sense that it has a type-I error rate of zero (pure states are accepted with certainty); it requires a number of copies (two) that is independent of the dimension. It is also transversal in the sense that if \(\rho \) acts on n qubits, all operations are required to be coherent only across the two copies, and factorize w.r.t. the n qubits.
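To make the swap test concrete, the sketch below evaluates its acceptance probability \({{\,\mathrm{tr}\,}}[\frac{1}{2}(I+\mathrm {SWAP})\,\rho \otimes \rho ]\), which by the standard identity \({{\,\mathrm{tr}\,}}[\mathrm {SWAP}\,\rho \otimes \rho ]={{\,\mathrm{tr}\,}}\rho ^2\) equals \((1+{{\,\mathrm{tr}\,}}\rho ^2)/2\). The function name and the two example states are illustrative assumptions.

```python
import numpy as np


def swap_accept_probability(rho):
    """Acceptance probability of the swap test on two copies of rho."""
    d = rho.shape[0]
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[i * d + j, j * d + i] = 1.0  # SWAP |j, i> = |i, j>
    projector = (np.eye(d * d) + swap) / 2    # projector onto the symmetric subspace
    return float(np.real(np.trace(projector @ np.kron(rho, rho))))


pure = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
mixed = np.eye(2, dtype=complex) / 2              # maximally mixed qubit
p_pure = swap_accept_probability(pure)
p_mixed = swap_accept_probability(mixed)
```

The pure state is accepted with certainty, while the maximally mixed qubit is accepted with probability \((1+\frac{1}{2})/2 = \frac{3}{4}\).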

An open problem in this theory was whether stabilizerness and Cliffordness are testable properties of, respectively, states and unitaries [MdW16]. Both properties are clearly Clifford-invariant—so by the arguments presented in the introduction, it makes sense to search for tests in the commutant of the Clifford group. It is known that 2nd and 3rd moments of random stabilizer states are identical to the moments of Haar-random states [Zhu15, KG15, Web16]. This implies that three copies of a state are not sufficient to test for stabilizerness, and the results of [ZKGG16] can be used to show that four copies are also insufficient for a dimension-independent theory.

Prior work. Prior to our results, the best known algorithms for stabilizer testing required a number of copies that scaled linearly with n, the number of qubits. Indeed, these algorithms proceeded by attempting to identify the stabilizer state, which necessarily requires \(\Omega (n)\) copies by the Holevo bound [AG08, Mon17, ZPDF16, KR08]. However, the existence of tests that require only a constant number of copies has been an important open question [MdW16]. We note that the stabilizer testing problem asks whether a given state is any stabilizer state—which is distinct from the problem of verifying whether it equals some fixed stabilizer state [HM15].

Our results. We show that for n qudits O(1) copies suffice to give an efficient, perfectly complete, dimension-independent, and transversal test. For example, for qubits (\(d=2\)) our test requires only 6 copies of the state to achieve a power independent of n (Algorithm 1). It requires coherent operations on only two qubits at a time, which means in particular that it can be implemented given a source that creates two copies of a fixed state at a time (Fig. 2).

First, we consider the problem for qubits. Here our protocol affords an intuitive description using a new primitive which we call Bell difference sampling. Then we proceed to the general case and discuss the connection to the commutant of the Clifford group described in Sect. 1.2.

1.3.1 Qubits: Bell difference sampling

We start with an intuitive motivation of the test. Let \(\psi \) be a pure state of n qubits and let \(\vert \psi \rangle \langle \psi \vert =2^{-n/2} \sum _{\mathbf {a}} c_{\mathbf {a}} W_{\mathbf {a}}\) be its expansion w.r.t. the Weyl operators \(W_{\mathbf {a}}\) (which are just the Pauli operators labeled in the usual way by bitstrings \(\mathbf {a}\in {\mathbb {Z}}_2^{2n}\), cf. Sect. 2). Now measure two copies of \(\psi \) in the Bell basis \(\vert W_{\mathbf {x}}\rangle \) defined by applying the Weyl operators to the maximally entangled state, i.e., \(\vert W_{\mathbf {x}}\rangle = (W_{\mathbf {x}} \otimes I) \vert \Phi ^+\rangle \) where \(\vert \Phi ^+\rangle = 2^{-n/2}\sum _{\mathbf {q}} \vert \mathbf {q},\mathbf {q}\rangle \). If \(\psi \) is real in the computational basis then it is not hard to see that the measurement outcome is distributed according to the probability distribution \(p_\psi (\mathbf {a}) = |c_{\mathbf {a}}|^2\). This is known as Bell sampling [Mon17, ZPDF16]. Now stabilizer states are distinguished by the fact that they are eigenvectors of all Weyl operators \(W_{\mathbf {a}}\) for which \(|c_{\mathbf {a}}|^2\ne 0\) (these form its stabilizer group). This suggests using Bell sampling to obtain some \(\mathbf {a}\), then measuring \(W_{\mathbf {a}}\) twice on two fresh copies, and accepting \(\psi \) as a stabilizer state if the same eigenvalue is obtained twice.


While we show that this works for real state vectors, Bell sampling unfortunately does not extend to complex state vectors. To overcome this challenge, we introduce a new primitive:

Definition 3.1

(Bell difference sampling). We define Bell difference sampling as performing Bell sampling twice and subtracting (adding) the results from each other (modulo two). In other words, it is the projective measurement on four copies of a state, \(\psi ^{\otimes 4}\in (({\mathbb {C}}^2)^{\otimes n})^{\otimes 4}\), given by

$$\begin{aligned} \Pi _{\mathbf {a}} = \sum _{\mathbf {x}} \vert W_{\mathbf {x}}\rangle \langle W_{\mathbf {x}}\vert \otimes \vert W_{\mathbf {x}+\mathbf {a}}\rangle \langle W_{\mathbf {x}+\mathbf {a}}\vert . \end{aligned}$$

For stabilizer states (whether real or complex) it is easy to see that Bell difference sampling will always sample an element \(\mathbf {a}\) corresponding to a Weyl operator \(W_{\mathbf {a}}\) in its stabilizer group. What is rather less obvious is that, even for arbitrary quantum states, Bell difference sampling still has a useful interpretation. The following theorem shows that this is indeed the case: it amounts to sampling from the probability distribution \(p_\psi \) twice and taking the difference.

Theorem 3.2

(Bell difference sampling). Let \(\psi \) be an arbitrary pure state of n qubits. Then:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\left[ \Pi _{\mathbf {a}} \psi ^{\otimes 4}\right] = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}+\mathbf {a}). \end{aligned}$$

If \(\psi \) is a stabilizer state, say \(\vert S\rangle \langle S\vert \), then this is equal to \(p_S(\mathbf {a})\) from Eq. (3.3).
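Theorem 3.2 can be checked numerically for a single qubit. The sketch below constructs \(\Pi _{\mathbf {a}}\) explicitly from the Bell basis \(\vert W_{\mathbf {x}}\rangle = (W_{\mathbf {x}} \otimes I) \vert \Phi ^+\rangle \) and compares \({{\,\mathrm{tr}\,}}[\Pi _{\mathbf {a}} \psi ^{\otimes 4}]\) with the convolution of \(p_\psi \) with itself for a random complex state. The phase convention \(W_{(z,x)} = i^{zx} Z^z X^x\), the restriction to \(n=1\), and all function names are assumptions of this illustration.

```python
import itertools

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
LABELS = list(itertools.product(range(2), repeat=2))  # a = (z, x) in Z_2^2


def weyl(a):
    """Hermitian one-qubit Weyl operator i^{zx} Z^z X^x (assumed convention)."""
    z, x = a
    return (1j) ** (z * x) * np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)


def add(a, b):
    """Addition in Z_2^2."""
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)


PHI = np.eye(2).reshape(4) / np.sqrt(2)  # |Phi+> = (|00> + |11>)/sqrt(2)
BELL = {a: np.kron(weyl(a), I2) @ PHI for a in LABELS}


def characteristic_distribution(psi):
    """p_psi(a) = |c_a|^2 = 2^{-n} tr[psi W_a]^2 for n = 1."""
    return {a: 0.5 * np.real(psi.conj() @ weyl(a) @ psi) ** 2 for a in LABELS}


def bell_difference_probability(psi, a):
    """tr[Pi_a psi^{(x)4}], with Pi_a assembled from Bell-basis projectors."""
    Pi = sum(
        np.kron(np.outer(BELL[x], BELL[x].conj()),
                np.outer(BELL[add(x, a)], BELL[add(x, a)].conj()))
        for x in LABELS
    )
    psi4 = np.kron(np.kron(psi, psi), np.kron(psi, psi))
    return float(np.real(psi4.conj() @ Pi @ psi4))


rng = np.random.default_rng(1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

p = characteristic_distribution(psi)
lhs = [bell_difference_probability(psi, a) for a in LABELS]
rhs = [sum(p[x] * p[add(x, a)] for x in LABELS) for a in LABELS]
```

Both lists agree to numerical precision, even though for a complex state the outcome distribution of a single Bell measurement generally differs from \(p_\psi \).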

Using Bell difference sampling as a primitive, we obtain the natural Algorithm 1.

Theorem 3.3

(Stabilizer testing for qubits). Let \(\psi \) be a pure state of n qubits. If \(\psi \) is a stabilizer state then Algorithm 1 accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S |\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-\varepsilon ^2/4\).

Proof sketch

We want to show that if the success probability, \(p_{\text {accept}}\), is close to one then \(\psi \) has high overlap with a stabilizer state. The proof proceeds in two steps. First, we analyze the success probability and show that if \(p_{\text {accept}} \approx 1\) then \(p_\psi (\mathbf {a})\) is typically close to its maximum possible value \(2^{-n}\). Next, we use Markov’s inequality to find a large set of \(\mathbf {a}\) where \(p_\psi (\mathbf {a})>\frac{1}{2} 2^{-n}\). Using a version of the uncertainty principle (see Fig. 1), we show that the corresponding Weyl operators \(W_{\mathbf {a}}\) necessarily commute, and therefore form a stabilizer subgroup. This finally means that our initial state must have a large overlap with a corresponding stabilizer state. \(\square \)

Fig. 1

The X-Z-plane of the Bloch sphere. The area shaded in red indicates the projection of those states \(\rho \) for which \(|{{\,\mathrm{tr}\,}}Z \rho |>\sin \frac{\pi }{4}=\frac{1}{\sqrt{2}}\). Likewise, the blue area corresponds to the states with \(|{{\,\mathrm{tr}\,}}X \rho |>\frac{1}{\sqrt{2}}\). As the two areas do not intersect, these two conditions cannot be simultaneously satisfied. This is a manifestation of the uncertainty principle.

Theorem 3.3 solves the stabilizer testing conjecture for qubits. It also implies a number of interesting corollaries. E.g., it directly follows that one can also test Cliffordness of a unitary efficiently, without being given black-box access to the inverse as in [Low09, Wan11]; this resolves another open problem from [MdW16]. From a structural point of view, it shows that the Clifford group is the solution, within \(U(2^n)\), of a set of polynomial equations of order 6. Our result is optimal in the sense that there exist no perfectly complete tests for fewer than six copies that achieve statistical power independent of the number of qubits (see Sect. 5 and [Dam18]).

1.3.2 Qudits

A careful analysis of the measurement of Algorithm 1 shows that it is equivalent to a projective measurement of the form \(\Pi _{\text {accept}}=\frac{1}{2} \left( I + V \right) \), where V is the following Hermitian unitary operator:

$$\begin{aligned} V =2^{-n} \sum _{\mathbf {x}} W_{\mathbf {x}}^{\otimes 6}. \end{aligned}$$
(1.5)

It can be readily seen that the operator Eq. (1.5) commutes with tensor powers of Clifford unitaries.
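For a single qubit, the acceptance probability \({{\,\mathrm{tr}\,}}[\psi ^{\otimes 6}\,\frac{1}{2}(I+V)]\) with V from Eq. (1.5) reduces to \(\frac{1}{2}\bigl (1+\frac{1}{2}\sum _{\mathbf {x}} {{\,\mathrm{tr}\,}}[\psi W_{\mathbf {x}}]^6\bigr )\), since \({{\,\mathrm{tr}\,}}[\psi ^{\otimes 6} W^{\otimes 6}] = {{\,\mathrm{tr}\,}}[\psi W]^6\). The sketch below evaluates this expression for two stabilizer states and for a state whose Bloch vector is \((1,1,1)/\sqrt{3}\). The function names and the choice of example states are ours; the Weyl operators are identified with the Pauli matrices, which is harmless here because any sign drops out in the sixth power.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def accept_probability(psi):
    """tr[psi^{(x)6} (I + V)/2] for n = 1, with V as in Eq. (1.5)."""
    expectations = [np.real(psi.conj() @ P @ psi) for P in (I2, X, Y, Z)]
    return 0.5 * (1.0 + 0.5 * sum(e ** 6 for e in expectations))


def stabilizer_fidelity(psi):
    """max_S |<S|psi>|^2 over the six one-qubit stabilizer states."""
    return max(0.5 * (1 + abs(np.real(psi.conj() @ P @ psi))) for P in (X, Y, Z))


zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
# Eigenvector of (X + Y + Z)/sqrt(3) with eigenvalue +1: Bloch vector (1,1,1)/sqrt(3).
_, vecs = np.linalg.eigh((X + Y + Z) / np.sqrt(3))
magic = vecs[:, -1]

p_zero = accept_probability(zero)
p_plus = accept_probability(plus)
p_magic = accept_probability(magic)
eps_sq = 1 - stabilizer_fidelity(magic)
```

The two stabilizer states are accepted with certainty, while the third state is accepted with probability \(7/9\), which is consistent with the bound \(1-\varepsilon ^2/4\) of Theorem 3.3.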

In fact, as discussed earlier, it is natural to approach the stabilizer testing problem by measuring operators in the commutant of the Clifford group. Since the stabilizer states are a single orbit under the Clifford group, any such measurement by design will have the same level of significance on all stabilizer states.

Equation (1.5) and the corresponding measurement have a clear generalization to arbitrary qudits. Let \(d\ge 2\) and consider the operators

$$\begin{aligned} \Pi _{s,\text {accept}} = \frac{1}{2}(I+V_s)\quad \text {where} \quad V_s = d^{-n} \sum _{\mathbf {x}} (W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger )^{\otimes s}. \end{aligned}$$
(1.6)

One can see that if \((d,s)=1\), \(V_s\) is a Hermitian unitary and so \(\Pi _{s,\text {accept}}\) is a projector. We now state our general stabilizer testing result:

Theorem 3.11

(Stabilizer testing for qudits). Let \(d\ge 2\) and choose \(s\ge 2\) such that \((d,s)=1\). Let \(\psi \) be a pure state of n qudits and denote by \(p_\text {accept}={{\,\mathrm{tr}\,}}[\psi ^{\otimes 2s}\Pi _{s,\text {accept}}]\) the probability that the POVM element \(\Pi _{s,\text {accept}}\) accepts given 2s copies of \(\psi \). If \(\psi \) is a stabilizer state then it accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S|\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-C_{d,s}\varepsilon ^2\), where \(C_{d,s} = (1 - (1 - 1/4d^2)^{s-1})/2\).
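The role of the condition \((d,s)=1\) can be illustrated numerically for a single qudit. The sketch below assembles \(V_s\) from Eq. (1.6) with \(n=1\) and checks whether it is a Hermitian unitary. The function name is hypothetical, and the Weyl operators are taken as \(Z^pX^q\) without phase factors, which is sufficient here because any phase cancels in \(W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger \).

```python
from functools import reduce

import numpy as np


def v_operator(d, s):
    """V_s = d^{-1} sum_x (W_x (x) W_x^dagger)^{(x)s} for a single qudit (n = 1)."""
    omega = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)   # X|q> = |q+1 mod d>
    Z = np.diag(omega ** np.arange(d))  # Z|q> = omega^q |q>
    total = 0
    for p in range(d):
        for q in range(d):
            W = np.linalg.matrix_power(Z, p) @ np.linalg.matrix_power(X, q)
            pair = np.kron(W, W.conj().T)
            total = total + reduce(np.kron, [pair] * s)
    return total / d


def is_hermitian_unitary(V):
    dim = V.shape[0]
    return bool(np.allclose(V, V.conj().T) and np.allclose(V @ V, np.eye(dim)))


coprime_case = is_hermitian_unitary(v_operator(3, 2))      # gcd(3, 2) = 1
qubit_case = is_hermitian_unitary(v_operator(2, 3))        # gcd(2, 3) = 1, cf. Eq. (1.5)
non_coprime_case = is_hermitian_unitary(v_operator(2, 2))  # gcd(2, 2) = 2
```

In the first two cases \(V_s\) squares to the identity; for \(d=s=2\) one instead finds \(V_s^2 = 2V_s\), so that \(\frac{1}{2}(I+V_s)\) is no longer a projector.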

The proof proceeds similarly to the one of Theorem 3.3. Again, an uncertainty relation for Weyl operators plays an important role. We record it since it may be of independent interest:

Lemma 3.10

(Uncertainty relation). Let \(\delta = 1/2d\) and \(\psi \) a pure state such that \(|{{\,\mathrm{tr}\,}}[\psi W_{\mathbf {x}}]|^2 > 1-\delta ^2\) and \(|{{\,\mathrm{tr}\,}}[\psi W_{\mathbf {y}}]|^2 > 1-\delta ^2\). Then \(W_{\mathbf {x}}\) and \(W_{\mathbf {y}}\) must commute.

We also study the minimal number of copies required to distinguish stabilizer states from non-stabilizer states in such a way that the power of the statistical test does not decrease with the number of qubits. Since the stabilizer states share the same second moments with uniformly random states (see Sect. 1.6 below for more detail), one can see that any such test requires at least three copies. Our next result shows that this is sufficient at least when \(d\equiv 1,5\pmod 6\). For this, consider the POVM element

$$\begin{aligned} \Pi _{\text {accept}} = \frac{1}{2} (I+V) \quad \text {where} \quad V:=d^{-n} \sum _{\mathbf {x}} A_{\mathbf {x}}^{\otimes 3}. \end{aligned}$$

Theorem 8.6

(Stabilizer testing from three copies). Let \(d\equiv 1,5\pmod 6\) and \(\psi \) a pure state of n qudits. Denote by \(p_\text {accept}={{\,\mathrm{tr}\,}}[\psi ^{\otimes 3}\Pi _\text {accept}]\) the probability that the POVM element \(\Pi _\text {accept}\) accepts given three copies of \(\psi \). If \(\psi \) is a stabilizer state then it accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S|\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-\varepsilon ^2/16d^2\).

The operators \(A_{\mathbf {x}}\) are known as phase-space point operators [Gro06], which are defined by a (symplectic) Fourier transform of the Weyl operator basis \(W_{\mathbf {a}}\) (with respect to the index \(\mathbf {a}\)). Again, the test corresponds to a particular element of the commutant, and to establish Theorem 8.6 we also need another uncertainty relation, this time for phase-space point operators.

Lemma 8.2

Let d be an odd integer and \(\psi \) a pure state of n qudits. Suppose that \({{\,\mathrm{tr}\,}}[\psi A_{\mathbf {x}}],{{\,\mathrm{tr}\,}}[\psi A_{\mathbf {y}}]\), \({{\,\mathrm{tr}\,}}[\psi A_{\mathbf {z}}] > \sqrt{1-1/2d^2}\). Then \([\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]=0\), i.e., \(W_{\mathbf {z}-\mathbf {x}}\) and \(W_{\mathbf {y}-\mathbf {x}}\) must commute.

Lastly, we derive an explicit prescription for the minimal test that is perfectly complete, i.e., detects all stabilizer states with certainty. Here we use the full power of the algebraic theory. We assume that d is a prime.

Definition 4.11

(\(O_t\)) Consider the quadratic form \(q:{\mathbb {Z}}_d^t \rightarrow {\mathbb {Z}}_D\) defined by \(q(\mathbf {x}) :=\mathbf {x}\cdot \mathbf {x}\). We define \(O_t(d)\) as the group of \(t\times t\)-matrices O with entries in \({\mathbb {Z}}_d\) that satisfy the following properties:

  1. O is a q-isometry: i.e., \(O\mathbf {x}\cdot O\mathbf {x}=\mathbf {x}\cdot \mathbf {x}\pmod D\) for all \(\mathbf {x}\in {\mathbb {Z}}_d^t\).

  2. O is stochastic: \(O \mathbf {1}_t = \mathbf {1}_t \pmod d\).

We will refer to \(O_t(d)\) as the stochastic orthogonal group; its elements will be called stochastic isometries.

Equivalently, \(O_t(d)\) is the group of \(t\times t\)-matrices O that are orthogonal in the ordinary sense (i.e., \(O^T O = I \bmod d\)) and such that the sum of elements in each row is equal to 1 (mod D). See Remark 4.12 for more details.
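The two conditions of Definition 4.11 translate directly into a membership test. The sketch below implements such a test, uses it to enumerate \(O_t(d)\) by exhaustive search for two very small cases, and checks the anti-identity for \(d=2\). All function names are hypothetical, vectors are lifted to the integers \(0,\dots ,d-1\) before evaluating q modulo D, and the enumeration is only feasible for tiny t.

```python
import itertools

import numpy as np


def is_stochastic_isometry(O, d):
    """Check the two conditions of Definition 4.11 by brute force."""
    O = np.asarray(O) % d
    t = O.shape[0]
    D = d if d % 2 == 1 else 2 * d
    # Stochastic: O 1_t = 1_t (mod d), i.e. every row sums to 1 mod d.
    if np.any(O.sum(axis=1) % d != 1):
        return False
    # q-isometry: (Ox).(Ox) = x.x (mod D) for all x in Z_d^t.
    for x in itertools.product(range(d), repeat=t):
        x = np.array(x)
        Ox = (O @ x) % d
        if (Ox @ Ox - x @ x) % D != 0:
            return False
    return True


def stochastic_orthogonal_group(d, t):
    """All elements of O_t(d), found by exhaustive search (tiny d, t only)."""
    candidates = (
        np.array(entries).reshape(t, t)
        for entries in itertools.product(range(d), repeat=t * t)
    )
    return [O for O in candidates if is_stochastic_isometry(O, d)]


def anti_identity(t):
    """Binary complement of the t x t identity matrix."""
    return (np.ones((t, t), dtype=int) - np.eye(t, dtype=int)) % 2


size_qubit_t3 = len(stochastic_orthogonal_group(2, 3))
size_qutrit_t2 = len(stochastic_orthogonal_group(3, 2))
anti_identity_t6 = is_stochastic_isometry(anti_identity(6), 2)
anti_identity_t4 = is_stochastic_isometry(anti_identity(4), 2)
```

In both enumerated cases the group consists of the permutation matrices only (6 and 2 elements, respectively). The anti-identity passes the test for \(t=6\) but fails it for \(t=4\), where its columns have Hamming weight 3, which is not congruent to 1 modulo 4.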

Note that the subspace \(T_O:=\{(O\mathbf {y},\mathbf {y})\,:\, \mathbf {y} \in {\mathbb {Z}}_d^t\}\) is a stochastic Lagrangian subspace in \(\Sigma _{t,t}(d)\) (as defined above in Definition 4.1), and so we obtain a corresponding operator in the commutant, which we abbreviate by \(R(O)=R(T_O)\). It is easy to see that the operators R(O) define a representation of the group \(O_t(d)\), so

$$\begin{aligned} \Pi ^{\min }_t :=\frac{1}{|O_t(d) |}\sum _{O\in O_t(d)} R(O) \end{aligned}$$

is the projector onto the invariant subspace for this action. Remarkably, not only do the R(O) stabilize all stabilizer tensor powers \(\vert S\rangle ^{\otimes t}\) (Eq. (4.13)), but \(\Pi ^{\min }_t\) is in fact the minimal perfectly complete test for stabilizer states:

Theorem 5.6

(Minimal stabilizer test with perfect completeness). Let d be a prime and \(n,t\ge 1\). Then the projector \(\Pi ^{\min }_t\) is the orthogonal projector onto \({{\,\mathrm{span}\,}}~\{\vert S\rangle ^{\otimes t}\, : \, \vert S\rangle \langle S\vert \in {{\,\mathrm{Stab}\,}}(n,d) \}\).

Are there any other tensor power states in the support of \(\Pi ^{\min }_t\)? For every \(d\ge 2\), we have proved above there exists some \(t\ge 3\) such that stabilizer testing is possible using t copies. Since the accepting POVM element is in each case the projector onto the invariant subspace of an element in \(O_t(d)\) (e.g., the anti-identity for \(d=2\) and \(t=6\)), it follows that in this case the only tensor power states contained in the support of \(\Pi ^{\min }_t\) are tensor powers of stabilizer states!

1.4 De Finetti theorems for stabilizer symmetries

Quantum de Finetti theorems provide versatile tools for the study of correlations in quantum states with permutation symmetry. They have found many important applications, from quantifying the monogamy of entanglement to proving security for quantum key distribution protocols, where de Finetti theorems allow one to reduce general attacks to collective attacks [Ren05]. By now, several variants and generalizations are known [Stø69, HM76, RW89, Pet90, CFS02, KR05, DOS07, CKMR07, Ren07, NOP09, KM09, BCY11, BH13, BH17, BCHW16]. Generally speaking, de Finetti theorems state that when \(\rho \) is a quantum state on \(({\mathbb {C}}^\ell )^{\otimes t}\) that commutes with all permutations (i.e., \([r_\pi , \rho ]=0\) for all \(\pi \in S_t\), where \(r_\pi \) are the permutation operators defined in Eq. (1.2)) then its reduced density operators \(\rho _{1\dots {}s}={{\,\mathrm{tr}\,}}_{s+1\dots {}t}[\rho ]\) are well-approximated by convex mixtures of i.i.d. states in some suitable sense if \(s\ll t\). E.g., for any such \(\rho \) there exists a probability measure \(d\mu \) on the space of mixed states on \({\mathbb {C}}^\ell \) such that [CKMR07],

figure b

Using the techniques developed for stabilizer testing, we prove two new versions of the quantum de Finetti theorem adapted to the symmetries inherent in stabilizer states. A key insight from the preceding section was that any stabilizer tensor power \(\vert S\rangle ^{\otimes t}\) is stabilized not only by the permutations, but by the larger group \(O_t(d)\). This group in general contains many more elements, for example, the anti-identity (1.4) for the case of qubits. Our de Finetti theorems for stabilizer states show that if we consider arbitrary states \(\rho \) on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\) that show symmetries of this kind, then the conclusions of the de Finetti theorem can be strengthened. In this case, the reduced states can be well-approximated by convex mixtures of tensor powers of stabilizer states in \({{\,\mathrm{Stab}\,}}(n,d)\) (rather than of general pure states in \(({\mathbb {C}}^d)^{\otimes n}\)).

Our first de Finetti theorem shows that the enlarged symmetry provided by the stochastic orthogonal group ensures that the approximation is exponentially good in the number of traced-out subsystems. This is remarkable, since the ordinary permutation symmetry-based de Finetti theorem achieves exponential convergence only if the form of allowed states is relaxed to include “almost product states” [Ren07] or “high weight vectors” (as opposed to highest weight vectors) [KM09]. Such a relaxation is, in fact, already necessary for classical distributions [DF80]. In detail:

Theorem 7.6

(Exponential stabilizer de Finetti theorem). Let d be a prime and \(\rho \) a quantum state on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\) that commutes with the action of \(O_t(d) \supseteq S_t\). Let \(1\le s\le t\). Then there exists a probability distribution p on the (finite) set of mixed stabilizer states (see footnote 3) of n qudits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \rho _{1\dots {}s} - \sum _{\sigma _S} p(\sigma _S) \sigma _S^{\otimes s} \right\Vert _1 \le 2 d^{\frac{1}{2}(2n+2)^2} d^{-\frac{1}{2}(t-s)}. \end{aligned}$$

Our theorem can be understood as a stabilizer version of the Gaussian de Finetti theorems established in [LC09, Lev16]; cf. [DEL92]. The latter have been successfully used to establish security of continuous-variable quantum key distribution (QKD) protocols which admit the required symmetries [LGPRC13, Lev17]. Since the input states of entanglement-based QKD schemes [Eke91] are usually taken to be tensor powers of stabilizer states, they show the enlarged symmetry identified here, a fact that seems to have been overlooked so far. It is thus natural to study applications of our de Finetti theorems to QKD security proofs; we will report results on this elsewhere.

We can also ask to what extent the conclusions of Theorem 7.6 hold if we only slightly enlarge the symmetry group. The following theorem shows that if we consider quantum states that commute with permutations as well as the anti-identity (but not necessarily other elements of \(O_t(d)\)), then we still get an approximation by mixtures of stabilizer tensor powers, but now with a polynomially rather than exponentially small error:

Theorem 7.7

(Stabilizer de Finetti theorem for the anti-identity). Let \(\rho \) be a quantum state on \((({\mathbb {C}}^2)^{\otimes n})^{\otimes t}\) that commutes with all permutations as well as with the action of the anti-identity (1.4) on some (and hence every) subsystem consisting of six n-qubit blocks. Let \(s<t\) be a multiple of six. Then there exists a probability distribution p on the (finite) set of mixed stabilizer states of n qubits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \rho _{1\dots {}s} - \sum _{\sigma _S} p(\sigma _S) \sigma _S^{\otimes s} \right\Vert _1 \le 6 \sqrt{2} \cdot 2^n \sqrt{\frac{s}{t}}. \end{aligned}$$

While Theorem 7.7 is stated here only for qubits, we believe that a similar result can be established in any prime dimension.
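To get a feeling for the two error bounds, the following minimal sketch evaluates the right-hand sides of Theorems 7.6 and 7.7 and solves them for the number of copies needed to reach a target accuracy. The function names and the target accuracy `eps` are illustrative assumptions and do not appear in the theorems; the helper for Theorem 7.7 ignores the additional requirement that s be a multiple of six.

```python
import math

def exponential_bound(n, d, s, t):
    """Right-hand side of Theorem 7.6: 2 d^{(2n+2)^2/2} d^{-(t-s)/2}."""
    return 2.0 * d ** (0.5 * ((2 * n + 2) ** 2 - (t - s)))

def polynomial_bound(n, s, t):
    """Right-hand side of Theorem 7.7: 6 sqrt(2) 2^n sqrt(s/t)."""
    return 6.0 * math.sqrt(2.0) * 2 ** n * math.sqrt(s / t)

def traced_out_needed(n, d, eps):
    """Smallest integer t - s with exponential_bound <= eps (up to float rounding).

    Solving 2 d^{((2n+2)^2 - (t-s))/2} <= eps gives
    t - s >= (2n+2)^2 + 2 log_d(2/eps).
    """
    return math.ceil((2 * n + 2) ** 2 + 2 * math.log(2.0 / eps, d))

def copies_needed(n, s, eps):
    """Smallest integer t with polynomial_bound <= eps (up to float rounding).

    Solving 6 sqrt(2) 2^n sqrt(s/t) <= eps gives t >= 72 * 4^n * s / eps^2.
    """
    return math.ceil(72 * 4 ** n * s / eps ** 2)
```

The contrast between the two regimes is visible directly: in the exponential bound the number of traced-out systems grows only logarithmically in 1/eps (on top of an n-dependent offset), whereas in the polynomial bound t grows like 1/eps^2.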

1.5 Robust Hudson theorem

Similar techniques can also be applied to pure states with a small amount of negativity in their phase space representation. More precisely, recall that for odd d the Wigner function of a quantum state \(\psi \) is defined by \(w_\psi (\mathbf {x}) = d^{-n} {{\,\mathrm{tr}\,}}[A_{\mathbf {x}} \psi ]\), where the operators \(A_{\mathbf {x}}\) are the phase-space point operators mentioned above. The Wigner function is a quasi-probability distribution, i.e., \(\sum _{\mathbf {x}} w_\psi (\mathbf {x})=1\), but it can be negative. This negativity plays an important role—e.g., it is an obstruction to efficient classical simulability [VFGE12, ME12] and witnesses the onset of contextuality [HWVE14].

In fact, pure stabilizer states are characterized by having a nonnegative Wigner function—this is the discrete Hudson theorem [Gro06]. Our next result shows that this characterization is robust, and that the robustness is independent of the system size (number of qudits). The relevant quantity is the Wigner or sum-negativity \({{\,\mathrm{sn}\,}}(\psi ) = \sum _{w_\psi (\mathbf {x})<0} |w_\psi (\mathbf {x})|\), i.e., the absolute sum of negative entries of the Wigner function.

Theorem 8.4

(Robust finite-dimensional Hudson theorem). Let d be odd and \(\psi \) a pure quantum state of n qudits. Then there exists a stabilizer state \(\vert S\rangle \) such that \(|\langle S|\psi \rangle |^2 \ge 1 - 9 d^2 {{\,\mathrm{sn}\,}}(\psi )\).

Our theorem gives a new quantitative meaning to the sum-negativity, and thereby to the related mana, a monotone from the resource theory of stabilizer states [VMGE14] that has attracted increasing attention in the theory of fault-tolerant quantum computation.

1.6 Random stabilizers, higher moments and designs

As mentioned above, randomized constructions based on the Haar measure are often near-optimal, yet have the drawback that generic quantum states cannot be efficiently prepared. In contrast, random stabilizer states can be efficiently implemented, and it was discovered early on that they reproduce the second moments of the Haar measure. More recently, there has been significant progress on the third and fourth moments [ZKGG16, HWW16, NW16], opening up many applications where random Clifford unitaries and stabilizer states have successfully replaced the Haar measure [MGE11, HWFW17, KZG16, HNQ+16, NW16]. To go beyond, however, a general understanding of the statistical properties of random stabilizer states is required.

The theory presented in this paper implies general formulas for the t-th moments of stabilizer states. For qudits,

figure c

where T ranges precisely over the maximal isotropic stochastic subspaces from Theorem 4.3!

Recall that a (complex projective) t-design is an ensemble of states \(\{p_i,\vert \psi _i\rangle \}\) such that the average of any polynomial of degree t identically matches the average of the same polynomial with respect to the Haar measure. In other words, a t-design satisfies

$$\begin{aligned} {\mathbb {E}}_{i \sim p} \left[ (\vert \psi _i\rangle \langle \psi _i\vert )^{\otimes t} \right] = {\mathbb {E}}_{\psi \text { Haar}} \left[ (\vert \psi \rangle \langle \psi \vert )^{\otimes t} \right] = \frac{1}{N'_{d,t}} \sum _{\pi \in S_t} r_\pi , \end{aligned}$$
(1.7)

where the right-hand side is the familiar formula for the maximally mixed state on the symmetric subspace in terms of the symmetrizer—an easy consequence of Schur–Weyl duality. When the stabilizer states form a t-design (\(t\le 3\) for qubits, \(t\le 2\) otherwise), Eq. (5.3) reduces to Eq. (1.7). Equation (5.3) unifies and generalizes all previously known results [Zhu15, KG15, Web16, ZKGG16, HWW16, NW16].
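As a small numerical illustration of the design condition (1.7), the sketch below compares the t-th moment of the six single-qubit stabilizer states with the Haar moment. It assumes that the normalization \(N'_{d,t}\) is the constant that makes the right-hand side have unit trace, i.e. D(D+1)...(D+t-1) for Hilbert space dimension D; the function names are ours.

```python
import itertools
import math
from functools import reduce

import numpy as np

def haar_moment(dim, t):
    """Sum of all permutation operators on (C^dim)^{(x) t}, normalized to unit trace."""
    shape = (dim,) * t
    total = np.zeros((dim ** t, dim ** t))
    for perm in itertools.permutations(range(t)):
        for idx in itertools.product(range(dim), repeat=t):
            out = tuple(idx[k] for k in perm)  # permute the tensor factors
            total[np.ravel_multi_index(out, shape),
                  np.ravel_multi_index(idx, shape)] += 1
    return total / math.prod(dim + k for k in range(t))

def ensemble_moment(states, t):
    """Uniform average of (|psi><psi|)^{(x) t} over the given state vectors."""
    projs = [np.outer(v, v.conj()) for v in states]
    return sum(reduce(np.kron, [P] * t) for P in projs) / len(states)

def qubit_stabilizer_states():
    """The six single-qubit stabilizer states: eigenvectors of X, Y and Z."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    Y = 1j * X @ Z
    return [vec for P in (X, Y, Z) for vec in np.linalg.eigh(P)[1].T]
```

Comparing `ensemble_moment(qubit_stabilizer_states(), t)` with `haar_moment(2, t)` for t = 1, 2, 3, 4 shows agreement up to t = 3 and a deviation at t = 4, in line with the statement that qubit stabilizer states form a 3-design but no higher design.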

Importantly, however, our formula allows us to compute an arbitrary t-th moment even when the stabilizer states deviate significantly from being a t-design. In fact, we demonstrate the power of the formula by using it to establish the following remarkable fact: Even when the stabilizer states (a single Clifford orbit) fail to be a t-design, we can obtain t-designs by taking finitely many Clifford orbits with appropriately chosen weights:

Theorem 6.2

Let d be a prime and \(n\ge t-1\). Then there exists an ensemble \(\{p_i,\Psi _i\}_{i=1}^{M_{t,d}}\) of fiducial states in \(({\mathbb {C}}^d)^{\otimes n}\) such that:

$$\begin{aligned} {\mathbb {E}}_{i \sim p} {\mathbb {E}}_{U \text { Clifford}}\left[ \left( U\vert \Psi _i\rangle \langle \Psi _i\vert U^\dagger \right) ^{\otimes t}\right] = {\mathbb {E}}_{\Psi \text { Haar}}\left[ \vert \Psi \rangle \langle \Psi \vert ^{\otimes t}\right] \end{aligned}$$

That is, the corresponding ensemble of Clifford orbits is a complex projective t-design. Importantly, the number of fiducial states does not depend on the number of qudits n.

2 Preliminaries

2.1 Pauli and Clifford group

Let \(d\ge 2\) be an arbitrary integer. We first consider a single qudit with computational basis vectors \(\vert q\rangle \), where \(q\in \{0,\dots ,d-1\}\) or \(q\in {\mathbb {Z}}_d={\mathbb {Z}}/d{\mathbb {Z}}\). We define unitary shift and boost operators

$$\begin{aligned} X \vert q\rangle = \vert q+1\rangle , \quad Z \vert q\rangle = \omega ^q \vert q\rangle , \end{aligned}$$

where \(\omega =e^{2\pi i/d}\) is a primitive d-th root of unity.

The algebra of shift and boost operators differs slightly depending on whether d is even or odd. For uniform treatment, one introduces \(\tau =(-1)^d e^{i\pi /d}=e^{i\pi (d^2+1)/d}\). Note that \(\tau ^2=\omega \). Let D denote the order of \(\tau \). Then \(D=2d\) if d is even, but \(D=d\) if d is odd (indeed, in this case \(\tau =\omega ^{2^{-1}}\), where \(2^{-1}\) denotes the multiplicative inverse of 2 mod d). Then \(Y :=\tau X^\dagger Z^\dagger \) is such that \(XYZ=\tau I\), generalizing the commutation relation of the usual Pauli operators for qubits (where \(\tau =i\)). For a single qudit, the Pauli group is generated by X, Y, Z or, equivalently, by \(\tau I,X,Z\).

For n qudits, the Hilbert space is \({\mathcal {H}}_n = ({\mathbb {C}}^d)^{\otimes n}\), with computational basis vectors \(\vert \mathbf {q}\rangle =\vert q_1,\dots ,q_n\rangle \), and the Pauli group \({\mathcal {P}}_n\) is the group generated by the tensor products of I, X, Y, Z acting on each of the n qudits.
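The relations in this paragraph are easy to check numerically. The following sketch builds the shift and boost operators as matrices and determines the order of \(\tau \) by brute force; the function names are illustrative and the order is found up to a numerical tolerance.

```python
import numpy as np

def shift_boost(d):
    """Return (X, Z, omega, tau) for a single qudit of dimension d."""
    omega = np.exp(2j * np.pi / d)
    tau = (-1) ** d * np.exp(1j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)        # X|q> = |q+1 mod d>
    Z = np.diag(omega ** np.arange(d))       # Z|q> = omega^q |q>
    return X, Z, omega, tau

def order_of_tau(d, tol=1e-9):
    """Smallest D >= 1 with tau^D = 1, found numerically among 1, ..., 2d."""
    tau = shift_boost(d)[3]
    return next(D for D in range(1, 2 * d + 1) if abs(tau ** D - 1) < tol)

def pauli_y(d):
    """Y := tau X^dagger Z^dagger, so that X Y Z = tau I."""
    X, Z, _, tau = shift_boost(d)
    return tau * X.conj().T @ Z.conj().T
```

For example, `order_of_tau(d)` can be compared with 2d for even d and with d for odd d, and `X @ pauli_y(d) @ Z` with `tau * np.eye(d)`.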

The Clifford group \({{\,\mathrm{Cliff}\,}}(n,d)\) is defined as the normalizer of the Pauli group in the unitary group, modulo phases. That is, it consists of all unitary operators U such that \(U {\mathcal {P}}_n U^\dagger \subseteq {\mathcal {P}}_n\), up to phases. For qubits, the Clifford group is generated by the phase gate, the Hadamard gate, and the controlled-NOT gate.

2.2 Weyl operators and characteristic function

At this point it is useful to recall the phase space picture of finite-dimensional quantum mechanics developed in [Woo87, App05, Gro06, GE08, DB13], which is analogous to the phase space formalism for continuous-variable systems used, e.g., in quantum optics [Sch11]. For \(\mathbf {x}=(\mathbf {p},\mathbf {q})\in {\mathbb {Z}}^{2n}\), define the Weyl operator

$$\begin{aligned} W_{\mathbf {x}} = W_{\mathbf {p}, \mathbf {q}} = \tau ^{-\mathbf {p}\cdot \mathbf {q}} (Z^{p_1} X^{q_1})\otimes \cdots \otimes (Z^{p_n} X^{q_n}). \end{aligned}$$
(2.1)

Clearly, each Weyl operator is an element of the Pauli group. Conversely, each element of the Pauli group is equal to a Weyl operator up to a phase that is a power of \(\tau \). It is not hard to see that the Weyl operators themselves only depend on \(\mathbf {x}\) modulo D (which we recall is 2d or d, depending on whether d is even or odd). Indeed,

$$\begin{aligned} W_{\mathbf {x}+d\mathbf {z}} = (-1)^{(d+1)[\mathbf {x},\mathbf {z}]} W_{\mathbf {x}}, \end{aligned}$$
(2.2)

where we have introduced the \({\mathbb {Z}}\)-valued symplectic form on \({\mathbb {Z}}^{2n}\)

$$\begin{aligned}{}[\mathbf {x},\mathbf {y}] = [(\mathbf {p},\mathbf {q}),(\mathbf {p}',\mathbf {q}')] = \mathbf {p} \cdot \mathbf {q}' - \mathbf {q} \cdot \mathbf {p}'. \end{aligned}$$
(2.3)

We will often use the symplectic form in situations where \(\mathbf {x}\), \(\mathbf {y}\) are elements of \({\mathbb {Z}}_d^{2n}\) or \({\mathbb {Z}}_D^{2n}\), and interpret \([\mathbf {x},\mathbf {y}]\) accordingly. For example,

$$\begin{aligned} W_{\mathbf {x}} W_{\mathbf {y}} = \tau ^{[\mathbf {x},\mathbf {y}]} W_{\mathbf {x}+\mathbf {y}} \end{aligned}$$
(2.4)

for all \(\mathbf {x}\), \(\mathbf {y}\in {\mathbb {Z}}^{2n}_D\). This implies that in particular

$$\begin{aligned} W_{\mathbf {x}} W_{\mathbf {y}} = \omega ^{[\mathbf {x},\mathbf {y}]} W_{\mathbf {y}} W_{\mathbf {x}}. \end{aligned}$$
(2.5)

Thus the commutation relations between Weyl operators only depend on \(\mathbf {x}, \mathbf {y}\) mod d. In this sense, \({\mathcal {V}}_n=\{0,\dots ,d-1\}^{2n}\) is the natural classical phase space associated with the Hilbert space \({\mathcal {H}}_n=({\mathbb {C}}^d)^{\otimes n}\). We will often write \(W_{\mathbf {x}}\) for \(\mathbf {x}\in {\mathbb {Z}}_d^{2n}\), identifying \({\mathbb {Z}}_d^{2n} \cong {\mathcal {V}}_n\) in the standard way.
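The composition law (2.4), the commutation relation (2.5) and the periodicity rule (2.2) can be verified numerically for small systems. The sketch below is one way to do so; `weyl` and `symp` are our names, and the phase-space points are passed as integer vectors over \({\mathbb {Z}}\) (not reduced mod d) so that the phase \(\tau ^{-\mathbf {p}\cdot \mathbf {q}}\) is evaluated exactly as in Eq. (2.1).

```python
from functools import reduce

import numpy as np

def weyl(p, q, d):
    """Weyl operator W_{p,q} of Eq. (2.1); p, q are integer sequences of length n."""
    omega = np.exp(2j * np.pi / d)
    tau = (-1) ** d * np.exp(1j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(omega ** np.arange(d))
    mp = np.linalg.matrix_power
    # Z^d = X^d = I, so the matrix powers may be reduced mod d ...
    factors = [mp(Z, pk % d) @ mp(X, qk % d) for pk, qk in zip(p, q)]
    # ... but the phase uses the integer inner product p.q.
    phase = tau ** (-int(np.dot(p, q)))
    return phase * reduce(np.kron, factors)

def symp(x, y):
    """Integer-valued symplectic form of Eq. (2.3) for x = (p, q), y = (p', q')."""
    (p, q), (pp, qq) = x, y
    return int(np.dot(p, qq) - np.dot(q, pp))
```

With these helpers, `weyl(p, q, d) @ weyl(pp, qq, d)` can be compared against `tau ** symp(x, y) * weyl(p + pp, q + qq, d)` for NumPy integer arrays `p, q, pp, qq`.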

Note that \({{\,\mathrm{tr}\,}}[W_{\mathbf {x}}]\ne 0\) if and only if \(W_{\mathbf {x}}\) is a scalar multiple of the identity (necessarily \(\pm I\)), that is, if and only if \(\mathbf {x}\equiv \mathbf {0} \pmod d\). Together with Eq. (2.4), it follows that the re-scaled Weyl operators \(\{d^{-n/2} W_{\mathbf {x}}\}\) for \(\mathbf {x}\in {\mathcal {V}}_n\) form an orthonormal basis with respect to the Hilbert-Schmidt inner product \(\langle A,B\rangle = {{\,\mathrm{tr}\,}}[A^\dagger B]\). In particular, any operator B on \({\mathcal {H}}_n\) can be expanded in the form \(B = d^{-n/2} \sum _{\mathbf {x}\in {\mathcal {V}}_n} c_B(\mathbf {x}) W_{\mathbf {x}}\). The expansion coefficients \(c_B(\mathbf {x})\) together define the characteristic function \(c_B: {\mathcal {V}}_n\rightarrow {\mathbb {C}}\) of the operator B,

$$\begin{aligned} c_B(\mathbf {x}) = d^{-n/2} {{\,\mathrm{tr}\,}}[W_{\mathbf {x}}^\dagger B], \end{aligned}$$
(2.6)

and we have Parseval’s identity

$$\begin{aligned} {{\,\mathrm{tr}\,}}[A^\dagger B] = \sum _{\mathbf {x}\in {\mathcal {V}}_n} {\overline{c}}_A(\mathbf {x}) c_B(\mathbf {x}). \end{aligned}$$
(2.7)
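The expansion in Weyl operators and Parseval's identity (2.7) can be checked on random matrices. The sketch below does this for small n and d; the helper names `phase_space`, `char_fn` and `from_char_fn` are ours.

```python
import itertools
from functools import reduce

import numpy as np

def weyl(p, q, d):
    """Weyl operator W_{p,q} of Eq. (2.1) for integer sequences p, q of length n."""
    omega = np.exp(2j * np.pi / d)
    tau = (-1) ** d * np.exp(1j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(omega ** np.arange(d))
    mp = np.linalg.matrix_power
    factors = [mp(Z, pk % d) @ mp(X, qk % d) for pk, qk in zip(p, q)]
    return tau ** (-int(np.dot(p, q))) * reduce(np.kron, factors)

def phase_space(n, d):
    """All points x = (p, q) of V_n = {0, ..., d-1}^{2n}."""
    return [(v[:n], v[n:]) for v in itertools.product(range(d), repeat=2 * n)]

def char_fn(B, n, d):
    """Characteristic function c_B(x) = d^{-n/2} tr[W_x^dagger B], Eq. (2.6)."""
    return {x: d ** (-n / 2) * np.trace(weyl(*x, d).conj().T @ B)
            for x in phase_space(n, d)}

def from_char_fn(c, n, d):
    """Rebuild B = d^{-n/2} sum_x c_B(x) W_x from its characteristic function."""
    return d ** (-n / 2) * sum(c[x] * weyl(*x, d) for x in c)
```

Applying `from_char_fn(char_fn(B, n, d), n, d)` to any square matrix `B` of size d^n should return `B` up to rounding, which is the statement that the rescaled Weyl operators form an orthonormal basis.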

By definition, if U is a Clifford unitary then, for every \(\mathbf {x}\in {\mathcal {V}}_n\), \(U W_{\mathbf {x}} U^\dagger \) is proportional to a Weyl operator \(W_{\mathbf {x}'}\), where we can take \(\mathbf {x}'\in {\mathcal {V}}_n\) in view of Eq. (2.2). Since conjugation preserves the commutation relations, this action has substantially more structure. In particular, the mapping \(\mathbf {x} \mapsto \mathbf {x}'\) is implemented by an element of the symplectic group \({{\,\mathrm{Sp}\,}}(2n,d)\), i.e., a linear transformation of \({\mathbb {Z}}_d^{2n}\) that preserves the symplectic form (2.3). The following facts are well-known in the literature (e.g. [App05, Gro06, DB13, Zhu15]).

Lemma 2.1

For any prime d and any \(n\in {\mathbb {N}}\), the following holds:

  1. For each \(U\in {{\,\mathrm{Cliff}\,}}(n,d)\), there is a \(\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)\) and a function \(f:{\mathbb {Z}}_d^{2n} \rightarrow {\mathbb {Z}}_d\) such that

    $$\begin{aligned} U W_{\mathbf {x}} U^\dagger = \omega ^{f(\mathbf {x})} W_{\Gamma \mathbf {x}} \qquad \forall \,\mathbf {x} \in {\mathbb {Z}}_d^{2n}. \end{aligned}$$
    (2.8)

  2. Conversely, for each \(\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)\), there is a \(U\in {{\,\mathrm{Cliff}\,}}(n,d)\) and a phase function \(f:{\mathbb {Z}}_d^{2n} \rightarrow {\mathbb {Z}}_d\) such that Eq. (2.8) holds. If d is odd, one can choose U such that \(f\equiv 0\).

  3. The quotient of the Clifford group by Weyl operators and phases is isomorphic to \({{\,\mathrm{Sp}\,}}(2n,d)\).

Below, we will frequently assume that a correspondence \(\Gamma \mapsto U_\Gamma \) has been fixed.

2.3 Wigner function and phase space point operators

It is also useful to consider the symplectic Fourier transform, which for any function \(f:{\mathcal {V}}_n\rightarrow {\mathbb {C}}\) is defined as

$$\begin{aligned} {\hat{f}}(\mathbf {x}) = d^{-n} \sum _{\mathbf {y}} \omega ^{-[\mathbf {x},\mathbf {y}]} f(\mathbf {y}). \end{aligned}$$
(2.9)

The transformation \(f \mapsto {\hat{f}}\) is unitary, i.e., we have Parseval’s identity: \(\sum _{\mathbf {x}} \overline{{\hat{f}}(\mathbf {x})} {\hat{g}}(\mathbf {x}) = \sum _{\mathbf {y}} \overline{f(\mathbf {y})} g(\mathbf {y})\).

The Fourier transform of the characteristic function is (up to normalization) known as the Wigner function [Woo87] \(w_B:{\mathcal {V}}_n\rightarrow {\mathbb {C}}\), defined by

$$\begin{aligned} w_B(\mathbf {x})= & {} d^{-n/2} {\hat{c}}_B(\mathbf {x}) = d^{-3n/2} \sum _{\mathbf {y}} \omega ^{-[\mathbf {x},\mathbf {y}]} c_B(\mathbf {y}) \nonumber \\= & {} d^{-2n} \sum _{\mathbf {y}} \omega ^{-[\mathbf {x},\mathbf {y}]}{{\,\mathrm{tr}\,}}[W_{\mathbf {y}}^\dagger B] = d^{-n} {{\,\mathrm{tr}\,}}[A_{\mathbf {x}} B], \end{aligned}$$
(2.10)

where we have introduced the phase-space point operators

$$\begin{aligned} A_{\mathbf {x}} = d^{-n} \sum _{\mathbf {y}} \omega ^{-[\mathbf {x},\mathbf {y}]} W_{\mathbf {y}}^\dagger . \end{aligned}$$
(2.11)

The operators \(\{A_{\mathbf {x}}\}\) form an orthogonal basis of the space of all operators, \({{\,\mathrm{tr}\,}}[A_{\mathbf {x}}^\dagger A_{\mathbf {y}}] = d^n \delta _{\mathbf {x},\mathbf {y}}\), so the Wigner function can be seen as the set of coefficients of an operator as expanded in this basis, \(B = \sum _{\mathbf {x}} w_B (\mathbf {x}) A_{\mathbf {x}}^\dagger \). Moreover, the Wigner function of a quantum state is a quasiprobability distribution in the sense that \(\sum _{\mathbf {x}} w_\rho (\mathbf {x})=1\).

For odd d the Wigner function is particularly well-behaved. For one, the phase-space point operators are Hermitian (this is also true for qubits) and they square to the identity (so the eigenvalues are \(\pm 1\) and in particular \(\Vert A_{\mathbf {x}}\Vert =1\)). This means that the Wigner function of a quantum state is real and \(-d^{-n} \le w_\psi (\mathbf {x}) \le d^{-n}\). The phase-space point operators satisfy the following important identity:

$$\begin{aligned} A_{\mathbf {x}} A_{\mathbf {y}} A_{\mathbf {z}} = \omega ^{2 [\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]} A_{\mathbf {x}-\mathbf {y} +\mathbf {z}} \end{aligned}$$
(2.12)

Moreover, (only) for odd d does the Wigner function transform covariantly with respect to the Clifford group. Here, the Clifford operators can (up to overall phase) be parametrized by an affine symplectic transformation, i.e., by a symplectic matrix \(\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)\) and a vector \(\mathbf {b}\in {\mathbb {Z}}_d^{2n}\). Then \(U=W_{\mathbf {b}}\,\mu _\Gamma \) is in \({{\,\mathrm{Cliff}\,}}(n,d)\), where \(\mu _\Gamma \) is the so-called metaplectic representation of the symplectic group (see, e.g., [Gro06]), and the conjugation action of U on phase-space point operators is given by

$$\begin{aligned} U A_{\mathbf {x}} U^\dagger = A_{\Gamma \mathbf {x} + \mathbf {b}}. \end{aligned}$$
(2.13)

In particular, the Weyl operators induce a translation in phase space.
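The properties of the phase-space point operators listed in this subsection can be confirmed numerically for a single qudit of odd dimension. The sketch below restricts to n = 1 for brevity; the names `weyl1`, `point_op` and `wigner` are ours.

```python
import itertools

import numpy as np

def weyl1(p, q, d):
    """Single-qudit Weyl operator W_{p,q} = tau^{-pq} Z^p X^q (odd d assumed)."""
    omega = np.exp(2j * np.pi / d)
    tau = -np.exp(1j * np.pi / d)            # (-1)^d e^{i pi/d} for odd d
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(omega ** np.arange(d))
    mp = np.linalg.matrix_power
    return tau ** (-p * q) * mp(Z, p % d) @ mp(X, q % d)

def point_op(x, d):
    """A_x = d^{-1} sum_y omega^{-[x,y]} W_y^dagger, Eq. (2.11) with n = 1."""
    omega = np.exp(2j * np.pi / d)
    p, q = x
    return sum(omega ** (-(p * qq - q * pp)) * weyl1(pp, qq, d).conj().T
               for pp, qq in itertools.product(range(d), repeat=2)) / d

def wigner(B, d):
    """Wigner function w_B(x) = d^{-1} tr[A_x B], Eq. (2.10) with n = 1."""
    return {x: np.trace(point_op(x, d) @ B) / d
            for x in itertools.product(range(d), repeat=2)}
```

Hermiticity, \(A_{\mathbf {x}}^2=I\), the triple-product identity (2.12) and the translation rule \(W_{\mathbf {b}} A_{\mathbf {x}} W_{\mathbf {b}}^\dagger = A_{\mathbf {x}+\mathbf {b}}\) can then all be tested with `np.allclose`.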

2.4 Stabilizer groups, codes, and states

We now give a uniform account of the stabilizer formalism [Got97, Got99] for qudits. Stabilizer states are commonly defined in terms of the Pauli group in the following way: Consider a subgroup of the Pauli group \(S\subseteq {\mathcal {P}}_n\) that does not contain any (nontrivial) multiple of the identity operator. Then the operator

$$\begin{aligned} P_S = \frac{1}{|S|} \sum _{P \in S} P \end{aligned}$$
(2.14)

is an orthogonal projection onto a subspace \(V_S \subseteq {\mathcal {H}}_n\) of dimension \(d^n / |S|\). We say that \(V_S\) is the stabilizer code associated with the stabilizer group S. If \(|S|= d^n\) then this code is spanned by a single state, called a (pure) stabilizer state and denoted by \(\vert S\rangle \langle S\vert \). It is given precisely by Eq. (2.14). In other words, a stabilizer state \(\vert S\rangle \) is the unique \(+1\) eigenvector (up to scalars) of all the Pauli operators in S,

$$\begin{aligned} P \vert S\rangle = \vert S\rangle \quad (\forall P\in S). \end{aligned}$$

In the following we will mostly be talking about stabilizer groups that determine a pure state. We denote the (finite) set of pure stabilizer states in \(({\mathbb {C}}^d)^{\otimes n}\) by \({{\,\mathrm{Stab}\,}}(n,d)\).
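As a concrete instance of Eq. (2.14), consider two qubits and the stabilizer group generated by \(X\otimes X\) and \(Z\otimes Z\). The sketch below, with variable names of our choosing, averages the four group elements and compares the result with the projector onto the Bell state \((\vert 00\rangle +\vert 11\rangle )/\sqrt{2}\).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z

# Stabilizer group of the Bell state; note the sign of the Y (x) Y element,
# which arises because (X (x) X)(Z (x) Z) = -(Y (x) Y).
S = [np.kron(I2, I2), np.kron(X, X), np.kron(Z, Z), -np.kron(Y, Y)]

def stabilizer_projector(group):
    """P_S = |S|^{-1} sum_{P in S} P, Eq. (2.14)."""
    return sum(group) / len(group)

P_S = stabilizer_projector(S)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
```

Here \(|S| = d^n = 4\), so the code space is one-dimensional and `P_S` should coincide with `np.outer(bell, bell.conj())`.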

In order to connect the stabilizer formalism to the phase space picture, we observe that the stabilizer group can be written in the form

$$\begin{aligned} S = \{ \omega ^{f(\mathbf {x})} W_{\mathbf {x}} : \mathbf {x} \in M \}, \end{aligned}$$
(2.15)

for some subset \(M\subseteq {\mathcal {V}}_n\) and some function \(f:M\rightarrow {\mathbb {Z}}_d\). The two pieces of data determine the stabilizer state uniquely. Indeed, \(\vert S\rangle \) can be characterized by demanding that

$$\begin{aligned} W_{\mathbf {x}} \vert S\rangle = \omega ^{-f(\mathbf {x})} \vert S\rangle \quad (\forall \mathbf {x}\in M). \end{aligned}$$

Moreover, it is not hard to verify that M is closed under addition (because S is a group) and that \([\mathbf {x},\mathbf {y}]=0\) for any two elements \(\mathbf {x},\mathbf {y}\in M\). Thus, M is a totally isotropic submodule of the phase space \({\mathcal {V}}_n\) (which itself can be thought of as a \({\mathbb {Z}}_d\)-module). For simplicity, we will usually say subspace instead of submodule, although the latter terminology is more appropriate for non-prime d. Moreover, \(|M|=d^n\), which is the maximal possible cardinality of any such subspace—one often says that M is a Lagrangian subspace and it holds that \(M=M^\perp \), where \(M^\perp =\{\mathbf {y} \in {\mathcal {V}}_n | [\mathbf {x},\mathbf {y}]=0\;\forall \mathbf {x}\in M\}\). See, e.g., [Gro06, GW13] for further detail on this symplectic point of view.

Conversely, suppose that M is a Lagrangian subspace of \({\mathcal {V}}_n\). Then there always exist functions f such that \(\{ \omega ^{f(\mathbf {x})} W_{\mathbf {x}} \}_{\mathbf {x}\in M}\) is a stabilizer group; we will denote the corresponding stabilizer states by \(\vert M,f\rangle \). Any other such function g can be written as \(g=f+\delta \), where \(\delta :M\rightarrow {\mathbb {Z}}_d\) is a \({\mathbb {Z}}_d\)-linear function. We can always write \(\delta (\mathbf {x})=[\mathbf {z},\mathbf {x}]\); then \(\vert M,g\rangle = W_{\mathbf {z}} \vert M,f\rangle \). In this way, M parametrizes an orthonormal basis of \({\mathcal {H}}_n\) worth of stabilizer states. In particular, any state that is a simultaneous eigenvector of the \(\{W_{\mathbf {x}}\}_{\mathbf {x}\in M}\) is necessarily a stabilizer state. It is not hard to verify that the quantum channel that implements the projective measurement in this stabilizer basis \(\{\vert M,f\rangle \}_f\) is given by

$$\begin{aligned} \Lambda _M[\rho ] = \sum _f \vert M,f\rangle \langle M,f|\rho |M,f\rangle \langle M,f\vert = d^{-n} \sum _{\mathbf {x} \in M} W_{\mathbf {x}} \rho W_{\mathbf {x}}^\dagger . \end{aligned}$$
(2.16)

The fact that any stabilizer state can be parametrized as \(\vert S\rangle =\vert M,f\rangle \) will be of fundamental importance to our investigations. As a first consequence, we note that Eqs. (2.14) and (2.15) imply that \(\vert S\rangle \langle S\vert = d^{-n} \sum _{\mathbf {x}\in M} \omega ^{f(\mathbf {x})} W_{\mathbf {x}}\). This shows that the characteristic function is given by

$$\begin{aligned} c_S(\mathbf {x}) = {\left\{ \begin{array}{ll} d^{-n/2} \omega ^{f(\mathbf {x})} &{} \text { if } \mathbf {x} \in M, \\ 0 &{} \text { otherwise,} \end{array}\right. } \end{aligned}$$
(2.17)

i.e., it is supported precisely on the set M.

For odd d the phase is a linear function, so it can be written as \(f(\mathbf {x})=[\mathbf {a},\mathbf {x}]\) for some suitable vector \(\mathbf {a}\) (e.g., [Gro06, App. C]). This means that the Wigner functions of stabilizer states have the following form [GW13]:

$$\begin{aligned} w_S(\mathbf {x}) = d^{-3n/2} \sum _{\mathbf {y} \in M} \omega ^{-[\mathbf {x},\mathbf {y}]} d^{-n/2} \omega ^{[\mathbf {a},\mathbf {y}]} = {\left\{ \begin{array}{ll} d^{-n} &{} \text { if } \mathbf {x} \in \mathbf {a} + M, \\ 0 &{} \text { otherwise} \end{array}\right. } \end{aligned}$$
(2.18)

(using that \(M=M^\perp \) for a pure stabilizer state). In particular, the Wigner function is non-negative. The finite-dimensional Hudson theorem asserts that, for pure states, the converse is also true [Gro06]. In Sect. 8 we will prove a robust version of this result.
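The following sketch illustrates Eq. (2.18) and the sum-negativity for a single qutrit. It computes the Wigner function of the stabilizer state \(\vert 0\rangle \) and of the superposition \((\vert 0\rangle +\vert 1\rangle )/\sqrt{2}\), which we use as an example of a non-stabilizer state. The restriction to n = 1 and all function names are our choices.

```python
import itertools

import numpy as np

def weyl1(p, q, d):
    """Single-qudit Weyl operator W_{p,q} = tau^{-pq} Z^p X^q (odd d assumed)."""
    omega = np.exp(2j * np.pi / d)
    tau = -np.exp(1j * np.pi / d)            # (-1)^d e^{i pi/d} for odd d
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(omega ** np.arange(d))
    mp = np.linalg.matrix_power
    return tau ** (-p * q) * mp(Z, p % d) @ mp(X, q % d)

def wigner(psi, d):
    """Wigner function w_psi(x) = d^{-1} <psi|A_x|psi> of a pure single-qudit state."""
    omega = np.exp(2j * np.pi / d)
    pts = list(itertools.product(range(d), repeat=2))
    w = {}
    for p, q in pts:
        # Phase-space point operator A_x of Eq. (2.11) for x = (p, q).
        A = sum(omega ** (-(p * qq - q * pp)) * weyl1(pp, qq, d).conj().T
                for pp, qq in pts) / d
        w[(p, q)] = np.real(np.vdot(psi, A @ psi)) / d
    return w

def sum_negativity(w):
    """sn = sum of |w(x)| over all points with w(x) < 0."""
    return sum(-v for v in w.values() if v < 0)

stab = np.array([1, 0, 0], dtype=complex)                  # |0>
nonstab = np.array([1, 1, 0], dtype=complex) / np.sqrt(2)  # (|0> + |1>)/sqrt(2)
```

For `stab` the Wigner function is the uniform distribution on a line in phase space, while `sum_negativity(wigner(nonstab, 3))` is strictly positive, as the discrete Hudson theorem requires.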

3 Testing Stabilizer States

Given two copies of an unknown pure state \(\psi =\vert \psi \rangle \langle \psi \vert \) on \({\mathcal {H}}_n\), it is easy to verify using phase estimation whether \(\vert \psi \rangle \) is an eigenvector of a given Weyl operator \(W_{\mathbf {x}}\). In particular, if \(W_{\mathbf {x}}\) is Hermitian then we simply measure it on each copy and compare the two results. The probability of obtaining the same outcome is

$$\begin{aligned} {{\,\mathrm{tr}\,}}\left[ \psi ^{\otimes 2} \frac{I + W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger }{2}\right] = \frac{1}{2} \left( 1 + d^n |c_\psi (\mathbf {x})|^2 \right) , \end{aligned}$$
(3.1)

where we recall that \(c_\psi \) denotes the characteristic function defined in Eq. (2.6).

To turn this idea into an algorithm for testing whether \(\psi \) is a stabilizer state we need a way of generating good candidate Weyl operators. For this we note that, since \(\psi \) is a pure quantum state,

$$\begin{aligned} p_\psi (\mathbf {x}) = |c_\psi (\mathbf {x})|^2 = d^{-n} |\langle \psi |W_{\mathbf {x}}|\psi \rangle |^2 = d^{-n} {{\,\mathrm{tr}\,}}[\psi W_{\mathbf {x}} \psi W_{\mathbf {x}}^\dagger ] \end{aligned}$$
(3.2)

is a probability distribution on the phase space \({\mathcal {V}}_n\). This follows directly from Eq. (2.7). We call \(p_\psi \) the characteristic distribution of \(\psi \).

Now, if \(\vert \psi \rangle =\vert S\rangle =\vert M,f\rangle \) is a stabilizer state then Eq. (2.17) implies that \(p_\psi \) is simply the uniform distribution on the subset \(M\subseteq {\mathcal {V}}_n\):

$$\begin{aligned} p_S(\mathbf {x}) = {\left\{ \begin{array}{ll} d^{-n} &{} \text { if } \mathbf {x} \in M, \\ 0 &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(3.3)

Note that \(p_S\) is maximally sparse in the case of pure stabilizer states, since it always holds true that \(0\le p_\psi (x)\le d^{-n}\). Therefore, if we sample from the characteristic distribution of a stabilizer state then Eq. (3.3) shows that we would with certainty obtain the label of a Weyl operator for which \(\vert \psi \rangle \) is an eigenvector.

Importantly, the converse of this statement is also true: Suppose that \(\vert \psi \rangle \) is an eigenvector of all Weyl operators \(W_{\mathbf {x}}\) for \(\mathbf {x}\) in the support of the characteristic distribution (i.e., \(p_\psi (\mathbf {x})>0\)). Since \(p_\psi (x)\le d^{-n}\), the support of \(p_\psi \) contains at least \(d^n\) points. Thus if \(\vert \psi \rangle \) is an eigenvector of all these Weyl operators then the support must be exactly of cardinality \(d^n\) and so \(\vert \psi \rangle \) is a stabilizer state. This suggests the following algorithm:

  1. Sample from the characteristic distribution of \(\psi \). Denote the result \(\mathbf {x}\).

  2. Measure the corresponding Weyl operator \(W_{\mathbf {x}}\) twice and accept if the result is the same.
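For small numbers of qubits, the two steps above can be simulated classically. The sketch below computes the characteristic distribution (3.2) and the resulting acceptance probability, obtained by averaging Eq. (3.1) over \(\mathbf {x}\) drawn from \(p_\psi \). As a simplification of our own, a phase-space point is represented as a tuple of per-qubit pairs \((p_i,q_i)\) rather than as \((\mathbf {p},\mathbf {q})\), and the function names are illustrative.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
# Single-qubit Weyl operators W_{p,q}: W_{0,1} = X, W_{1,0} = Z, W_{1,1} = Y.
PAULI = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): Y}

def weyl(x):
    """n-qubit Weyl operator for x given as a tuple of n pairs (p_i, q_i)."""
    return reduce(np.kron, [PAULI[xi] for xi in x])

def char_dist(psi):
    """Characteristic distribution p_psi(x) = 2^{-n} |<psi|W_x|psi>|^2, Eq. (3.2)."""
    n = len(psi).bit_length() - 1
    return {x: abs(np.vdot(psi, weyl(x) @ psi)) ** 2 / 2 ** n
            for x in itertools.product(PAULI, repeat=n)}

def ideal_accept_prob(psi):
    """Draw x from p_psi, then accept with probability (1 + 2^n p_psi(x)) / 2."""
    n = len(psi).bit_length() - 1
    return sum(px * 0.5 * (1 + 2 ** n * px) for px in char_dist(psi).values())
```

For a stabilizer state such as the Bell state, `char_dist` is uniform on a set of \(2^n\) points and `ideal_accept_prob` returns one; for a non-stabilizer state the value is strictly smaller.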

By the preceding discussion, this test will accept with certainty if and only if the state is a stabilizer state. But how do we go about sampling from the characteristic distribution?

3.1 Qubit stabilizer testing and Bell difference sampling

When the wave function \(\vert \psi \rangle \) is real in the computational basis then sampling from the characteristic distribution can be achieved by Bell sampling, introduced for qubits in [Mon17] (cf. [ZPDF16]). Bell sampling amounts to performing a basis measurement in the basis obtained by applying the Weyl operators to a fixed maximally entangled state, \(\vert W_{\mathbf {x}}\rangle = (W_{\mathbf {x}} \otimes I) \vert \Phi ^+\rangle \). Since the Weyl operators are orthogonal, \(\vert W_{\mathbf {x}}\rangle \) is an orthonormal basis of the doubled Hilbert space \({\mathcal {H}}_n \otimes {\mathcal {H}}_n\). Using the transpose trick,

$$\begin{aligned} \left|\langle W_{\mathbf {x}} | \psi ^{\otimes 2}\rangle \right|^2 = d^{-n} |\langle \psi |W_{\mathbf {x}}|{\bar{\psi }}\rangle |^2 \end{aligned}$$
(3.4)

In case the wave function is real, Eq. (3.4) is exactly equal to \(p_\psi (\mathbf {x})\); Bell sampling therefore allows us to implement step (1) above given two copies of the unknown state \(\psi \).

In general, however, the transformation \(\psi \mapsto {\overline{\psi }}=\psi ^T\) cannot be implemented by a physical process, since the transpose map is well-known not to be completely positive. Thus we need a new idea to treat the general case where the wave function can be complex.

We start with the observation that if \(\psi \) is a stabilizer state then so is \({\overline{\psi }}\). Indeed, \(\overline{W_{\mathbf {p},\mathbf {q}}} = (-1)^{(d+1)(\mathbf {p}\cdot \mathbf {q})} W_{J(p,q)}\), where J is the involution [App05]

$$\begin{aligned} J :{\mathcal {V}}_n \rightarrow {\mathcal {V}}_n, \quad (p,q) \mapsto ((-p) \bmod d,q) \end{aligned}$$

on phase space (note that the phase is trivial when d is odd and so always well-defined mod d). On the other hand, \(\overline{\omega ^{f(\mathbf {x})}} = \omega ^{-f(\mathbf {x})}\). It follows that if \(\vert \psi \rangle =\vert M,f\rangle \) then \(\vert {\overline{\psi }}\rangle =\vert J(M),g\rangle \), where \(\omega ^{g(\mathbf {x})}=\omega ^{-f(\mathbf {x})} (-1)^{(d+1)(\mathbf {p} \cdot \mathbf {q})}\) (again, this is well-defined for any d).

For qubits (\(d=2\)), the involution J is trivial. This means that if \(\psi \) is a stabilizer state then \(\psi \) and \({\bar{\psi }}\) are characterized by the same subspace M, but possibly different phases. We saw above that (only) in this case there exists a Weyl operator \(W_{\mathbf {z}}\) such that \(\vert {\bar{\psi }}\rangle = W_{\mathbf {z}} \vert \psi \rangle \). As a consequence, if we perform Bell sampling on \(\vert \psi \rangle \otimes \vert \psi \rangle \) then, from Eq. (3.4),

$$\begin{aligned} \left|\langle W_{\mathbf {x}} | \psi ^{\otimes 2}\rangle \right|^2 = d^{-n} |\langle \psi |W_{\mathbf {x} + \mathbf {z}}|\psi \rangle |^2 = p_\psi (\mathbf {x}+\mathbf {z}). \end{aligned}$$

Of course, \(\mathbf {z}\) is an unknown vector that depends on the stabilizer state \(\psi \). But since \(\mathbf {z}\) is the same for every pair of copies of \(\psi \), we may Bell sample twice and take the difference of the two results in order to obtain a uniform sample \(\mathbf {a}\) from the subspace M. Formally:

Definition 3.1

(Bell difference sampling). We define Bell difference sampling as performing Bell sampling twice and subtracting the two results from each other (modulo two, this is the same as adding them). In other words, it is the projective measurement on four copies of a state, \(\psi ^{\otimes 4}\in (({\mathbb {C}}^2)^{\otimes n})^{\otimes 4}\), given by

$$\begin{aligned} \Pi _{\mathbf {a}} = \sum _{\mathbf {x}} \vert W_{\mathbf {x}}\rangle \langle W_{\mathbf {x}}\vert \otimes \vert W_{\mathbf {x}+\mathbf {a}}\rangle \langle W_{\mathbf {x}+\mathbf {a}}\vert . \end{aligned}$$

It is not obvious that Bell difference sampling should be meaningful for non-stabilizer quantum states \(\psi \). The following theorem shows that it has a natural interpretation for general states:

Theorem 3.2

(Bell difference sampling). Let \(\psi \) be an arbitrary pure state of n qubits. Then:

$$\begin{aligned} {{\,\mathrm{tr}\,}}\left[ \Pi _{\mathbf {a}} \psi ^{\otimes 4}\right] = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}+\mathbf {a}). \end{aligned}$$

If \(\psi \) is a stabilizer state, say \(\vert S\rangle \langle S\vert \), then this is equal to \(p_S(\mathbf {a})\) from Eq. (3.3).
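Theorem 3.2 can be checked by brute force for a single qubit, where the four-copy space has dimension 16. The sketch below constructs the projectors \(\Pi _{\mathbf {a}}\) explicitly from the Bell basis and evaluates both sides of the theorem; all names are ours, and the restriction to n = 1 is only for readability.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
W = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): Y}   # single-qubit W_{p,q}
POINTS = list(W)                                     # phase space F_2^2

def add(x, a):
    """Addition in F_2^2."""
    return ((x[0] + a[0]) % 2, (x[1] + a[1]) % 2)

def char_dist(psi):
    """p_psi(x) = 2^{-1} |<psi|W_x|psi>|^2, Eq. (3.2) with n = 1."""
    return {x: abs(np.vdot(psi, W[x] @ psi)) ** 2 / 2 for x in POINTS}

PHI = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # |Phi^+>
BELL = {x: np.kron(W[x], I2) @ PHI for x in POINTS}          # |W_x>

def bell_difference_projector(a):
    """Pi_a = sum_x |W_x><W_x| (x) |W_{x+a}><W_{x+a}| acting on four qubits."""
    proj = lambda v: np.outer(v, v.conj())
    return sum(np.kron(proj(BELL[x]), proj(BELL[add(x, a)])) for x in POINTS)

def bell_difference_probs(psi):
    """Outcome distribution tr[Pi_a psi^{(x) 4}] of Bell difference sampling."""
    psi4 = reduce(np.kron, [psi] * 4)
    return {a: np.real(np.vdot(psi4, bell_difference_projector(a) @ psi4))
            for a in POINTS}
```

For any normalized two-component vector `psi`, the dictionary returned by `bell_difference_probs(psi)` should agree entrywise with the self-convolution of `char_dist(psi)`, including for states with complex amplitudes where plain Bell sampling does not reproduce \(p_\psi \).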

The proof of Theorem 3.2 uses the symplectic Fourier transform defined in Eq. (2.9). Remarkably, the characteristic distribution of any pure state is left invariant by the Fourier transform:

$$\begin{aligned} \begin{aligned} {\widehat{p}}_\psi (\mathbf {a})&= 2^{-n} \sum _{\mathbf {x}} (-1)^{[\mathbf {a},\mathbf {x}]} c_\psi (\mathbf {x}) c_\psi (\mathbf {x}) = 2^{-n} \sum _{\mathbf {x}} c_\psi (\mathbf {x}) c_{W_{\mathbf {a}} \psi W_{\mathbf {a}}}(\mathbf {x}) \\&= 2^{-n} {{\,\mathrm{tr}\,}}[\psi W_{\mathbf {a}} \psi W_{\mathbf {a}}] = p_\psi (\mathbf {a}), \end{aligned}\nonumber \\ \end{aligned}$$
(3.5)

where the third step is Eq. (2.7) (note that for qubits the characteristic function is real).
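The Fourier invariance (3.5) is also easy to confirm numerically. The sketch below implements the symplectic Fourier transform (2.9) for qubits, where \(\omega =-1\), and applies it to the characteristic distribution of a pure state. As before, phase-space points are stored as tuples of per-qubit pairs \((p_i,q_i)\), which is a representation chosen for this example only.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
PAULI = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): Y}   # single-qubit W_{p,q}

def char_dist(psi):
    """Characteristic distribution p_psi(x) = 2^{-n} |<psi|W_x|psi>|^2."""
    n = len(psi).bit_length() - 1
    return {x: abs(np.vdot(psi, reduce(np.kron, [PAULI[xi] for xi in x]) @ psi)) ** 2 / 2 ** n
            for x in itertools.product(PAULI, repeat=n)}

def symp(a, x):
    """Symplectic form [a, x] mod 2 for points given as per-qubit pairs."""
    return sum(ap * xq - aq * xp for (ap, aq), (xp, xq) in zip(a, x)) % 2

def fourier(f):
    """Symplectic Fourier transform, Eq. (2.9), for d = 2."""
    n = len(next(iter(f)))
    return {a: sum((-1) ** symp(a, x) * fx for x, fx in f.items()) / 2 ** n
            for a in f}
```

Calling `fourier(char_dist(psi))` on a normalized state vector should return `char_dist(psi)` again, whereas a generic probability distribution on phase space is not a fixed point of the transform.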

We now give the proof of Theorem 3.2:

Proof of Theorem 3.2

We start with the observation that \(\Pi _{\mathbf {a}} = (I \otimes I \otimes I \otimes W_{\mathbf {a}}) \Pi _{\mathbf {0}} (I \otimes I \otimes I \otimes W_{\mathbf {a}})\). On the other hand, it is easy to verify that

$$\begin{aligned} \Pi _{\mathbf {0}} = \frac{1}{2^{2n}} \sum _{\mathbf {x}} W_{\mathbf {x}}^{\otimes 4} \end{aligned}$$
(3.6)

(i.e., it is the projection onto a stabilizer code of dimension \(2^{2n}\), which played an important role in [ZKGG16], and Bell difference sampling achieves precisely the syndrome measurement for this code). It follows that

$$\begin{aligned} \Pi _{\mathbf {a}} = \frac{1}{2^{2n}} \sum _{\mathbf {x}} (-1)^{[\mathbf {a},\mathbf {x}]} W_{\mathbf {x}}^{\otimes 4} \end{aligned}$$
(3.7)

and so

$$\begin{aligned} {{\,\mathrm{tr}\,}}\left[ \Pi _{\mathbf {a}} \psi ^{\otimes 4}\right]&= \frac{1}{2^{2n}} \sum _{\mathbf {x}} (-1)^{[\mathbf {a},\mathbf {x}]} {{\,\mathrm{tr}\,}}\left[ W_{\mathbf {x}}^{\otimes 4} \psi ^{\otimes 4} \right] = \sum _{\mathbf {x}} (-1)^{[\mathbf {a},\mathbf {x}]} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}) \\&= \sum _{\mathbf {x}} {\hat{p}}_\psi (\mathbf {x}) {\hat{p}}_\psi (\mathbf {x}+\mathbf {a}) = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}+\mathbf {a}); \end{aligned}$$

the third equality is the unitarity of the Fourier transform, which also maps modulations to translations, and in the last step we used Eq. (3.5), namely that the characteristic distribution of a pure state is left invariant by the Fourier transform. \(\square \)
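As a sanity check of Theorem 3.2, both sides of the identity can be evaluated from Pauli expectation values, using Eq. (3.7) for the left-hand side. The sketch below does this for a two-qubit Bell state and for a random state. It again assumes Hermitian Pauli strings as Weyl operators and the symplectic form \(a_p\cdot x_q + a_q\cdot x_p \bmod 2\); these conventions and all names are assumptions made for illustration.

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def weyl(point):
    """Hermitian Pauli string for point = (p_1..p_n, q_1..q_n) (assumed convention)."""
    n = len(point) // 2
    op = np.eye(1, dtype=complex)
    for pj, qj in zip(point[:n], point[n:]):
        single = 1j ** (pj * qj) * np.linalg.matrix_power(Z, pj) @ np.linalg.matrix_power(X, qj)
        op = np.kron(op, single)
    return op


def symplectic(a, x):
    n = len(a) // 2
    return sum(a[j] * x[n + j] + a[n + j] * x[j] for j in range(n)) % 2


def add(a, x):
    return tuple((ai + xi) % 2 for ai, xi in zip(a, x))


def bell_difference_sides(psi):
    """Return (p_psi, lhs, rhs) where lhs/rhs are the two sides of Theorem 3.2."""
    n = len(psi).bit_length() - 1
    points = list(itertools.product((0, 1), repeat=2 * n))
    ev = {x: np.vdot(psi, weyl(x) @ psi).real for x in points}
    p = {x: e ** 2 / 2 ** n for x, e in ev.items()}
    # tr[Pi_a psi^{(x)4}] = 2^{-2n} sum_x (-1)^{[a,x]} tr[W_x psi]^4, from Eq. (3.7)
    lhs = {a: sum((-1) ** symplectic(a, x) * e ** 4 for x, e in ev.items()) / 4 ** n
           for a in points}
    rhs = {a: sum(p[x] * p[add(x, a)] for x in points) for a in points}
    return p, lhs, rhs


bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # stabilizer state
rng = np.random.default_rng(1)
rand = rng.normal(size=4) + 1j * rng.normal(size=4)
rand /= np.linalg.norm(rand)

p_bell, lhs_bell, rhs_bell = bell_difference_sides(bell)
p_rand, lhs_rand, rhs_rand = bell_difference_sides(rand)
```

For the Bell state the sampled distribution coincides with \(p_\psi \) itself, in line with the stabilizer case of the theorem.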

Theorem 3.2 motivates Algorithm 1 as a natural algorithm for testing whether a multi-qubit state is a stabilizer state. The following theorem shows that stabilizer states are the only states that are accepted with certainty, and it quantifies this observation in a dimension-independent way:

Theorem 3.3

(Stabilizer testing for qubits). Let \(\psi \) be a pure state of n qubits. If \(\psi \) is a stabilizer state then Algorithm 1 accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S |\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-\varepsilon ^2/4\).

The converse bound of Theorem 3.3 can be stated equivalently as

$$\begin{aligned} \max _S |\langle S|\psi \rangle |^2 \ge 4p_{\text {accept}}-3. \end{aligned}$$
(3.8)

Proof

According to Theorem 3.2, step 1 of the algorithm samples elements \(\mathbf {a}\) with probability \(q(\mathbf {a})=\sum _{\mathbf {x}} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}+\mathbf {a})\).

Let us first discuss the case that \(\psi \) is a stabilizer state, say \(\vert \psi \rangle =\vert M,f\rangle \). Since \(p_\psi \) is the uniform distribution over M, which is a subspace, it holds that \(q(\mathbf {a}) = p_\psi (\mathbf {a})\): for \(\mathbf {x}\in M\), we have \(\mathbf {x}+\mathbf {a}\in M\) if and only if \(\mathbf {a}\in M\). But this means that \(\mathbf {a}\in M\) with certainty. Thus, \(\vert \psi \rangle \) is an eigenvector of the corresponding Weyl operator \(W_{\mathbf {a}}\) and step 2 of the test always accepts.

We now consider the case that \(\psi \) is a general pure state. Our goal will be to show that if Algorithm 1 succeeds with high probability then there must exist a stabilizer state with high overlap with \(\psi \). According to Eq. (3.1), the probability of acceptance is given by

$$\begin{aligned} p_\text {accept} = \frac{1}{2} \sum _{\mathbf {a}} q(\mathbf {a}) \left( 1 + 2^n p_\psi (\mathbf {a}) \right) \end{aligned}$$

where we recall that \(q(\mathbf {a})=\sum _{\mathbf {x}} p_\psi (\mathbf {x}) p_\psi (\mathbf {x}+\mathbf {a})\). Thus, by the Cauchy-Schwarz inequality,

$$\begin{aligned} p_\text {accept}= & {} \frac{1}{2} \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \left( 1 + 2^n \sum _{\mathbf {a}} p_\psi (\mathbf {x}+\mathbf {a}) p_\psi (\mathbf {a}) \right) \nonumber \\\le & {} \frac{1}{2} \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \left( 1 + 2^n \sum _{\mathbf {a}} p_\psi (\mathbf {a})^2 \right) \nonumber \\= & {} \frac{1}{2} \left( 1 + 2^n \sum _{\mathbf {a}} p_\psi (\mathbf {a})^2 \right) = \frac{1}{2} \sum _{\mathbf {a}} p_\psi (\mathbf {a}) \left( 1 + 2^n p_\psi (\mathbf {a}) \right) , \end{aligned}$$
(3.9)

where we have also used the fact that \(p_\psi \) is a probability distribution. Intuitively, this bound shows that if our test accepts with high probability then \(p_\psi (\mathbf {a}) \approx 2^{-n}\) with high probability. Indeed, let us consider

$$\begin{aligned} M_0 :=\{ \mathbf {a}\in {\mathcal {V}}_n : 2^n p_\psi (\mathbf {a}) > 1/2 \}. \end{aligned}$$

Then Markov’s inequality (which can be applied since it is always true that \(p_\psi \le 2^{-n}\)) asserts that

$$\begin{aligned} \sum _{\mathbf {a} \in M_0} p_\psi (\mathbf {a}) \ge 1 - 2 \sum _{\mathbf {a}} p_\psi (\mathbf {a}) \left( 1 - 2^n p_\psi (\mathbf {a}) \right) \ge 1 - 4\left( 1 - p_\text {accept}\right) . \end{aligned}$$
(3.10)

The choice of threshold 1/2 in the definition of \(M_0\) ensures that the Weyl operators corresponding to any two points \(\mathbf {a},\mathbf {b}\in M_0\) commute. To see this, we use that any pair of anticommuting \(W_{\mathbf {a}},W_{\mathbf {b}}\) can be mapped by a base change onto the Pauli operators X,Z; it can then be verified on the Bloch sphere that there exists no qubit state \(\rho \) such that both \({{\,\mathrm{tr}\,}}[\rho X]^2>1/2\) and \({{\,\mathrm{tr}\,}}[\rho Z]^2>1/2\), since \({{\,\mathrm{tr}\,}}[\rho X]^2+{{\,\mathrm{tr}\,}}[\rho Z]^2\le 1\) (see Fig. 1 for a graphical proof).

Let us now extend the set \(M_0\) to some maximal set M such that the corresponding Weyl operators commute. Then M is automatically a Lagrangian subspace, of dimension n (see footnote 4). As discussed in Sect. 2.4, it determines a whole basis of stabilizer states, \(\{\vert M,f\rangle \}_f\). Thus:

$$\begin{aligned} \max _S |\langle S|\psi \rangle |^2&\ge \max _f \langle M,f|\psi |M,f\rangle \ge \sum _f \langle M,f|\psi |M,f\rangle ^2 = {{\,\mathrm{tr}\,}}\left[ \Lambda _M[\psi ]^2\right] \\&= 2^{-2n} \sum _{\mathbf {x},\mathbf {y}\in M} {{\,\mathrm{tr}\,}}\left[ \psi W_{\mathbf {x}}^\dagger W_{\mathbf {y}} \psi (W_{\mathbf {x}}^\dagger W_{\mathbf {y}})^\dagger \right] = 2^{-n} \sum _{\mathbf {x}\in M} {{\,\mathrm{tr}\,}}\left[ \psi W_{\mathbf {x}} \psi W_{\mathbf {x}}^\dagger \right] \\&= \sum _{\mathbf {x}\in M} p_\psi (\mathbf {x}) \ge \sum _{\mathbf {x}\in M_0} p_\psi (\mathbf {x}) \ge 1 - 4\left( 1 - p_\text {accept}\right) \end{aligned}$$

where we used Eq. (2.16) for the measurement \(\Lambda _M\) in the stabilizer basis; the last bound is Eq. (3.10). In particular, if \(\max _S |\langle S|\psi \rangle |^2 \le 1-\varepsilon ^2\) then \(p_\text {accept} \le 1-\varepsilon ^2/4\). \(\square \)

Our theorem has the following consequence for quantum property testing, resolving an open question first raised by Montanaro and de Wolf [MdW16, Question 7].

Corollary 3.4

Let \(\psi \) be a pure state of n qubits and let \(\varepsilon >0\). Then there exists a quantum algorithm that, given \(O(1/\varepsilon ^2)\) copies of \(\psi \), accepts any stabilizer state (it is perfectly complete), while it rejects states such that \(\max _S |\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) with probability at least 2/3.
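To make the copy count concrete, one natural route from Theorem 3.3 to the corollary is to repeat Algorithm 1 independently and accept only if every run accepts. This is a sketch under that assumption; the function names are hypothetical. Each run consumes six copies, stabilizer states still pass with certainty, and a state with \(\max _S |\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) survives k runs with probability at most \((1-\varepsilon ^2/4)^k\).

```python
import math


def repetitions_needed(eps, target=2 / 3):
    """Smallest k such that 1 - (1 - eps^2/4)^k >= target."""
    reject_once = eps ** 2 / 4  # per-run rejection probability bound from Theorem 3.3
    return math.ceil(math.log(1 - target) / math.log(1 - reject_once))


def copies_needed(eps, target=2 / 3):
    """Total number of copies of psi, at six copies per run of Algorithm 1."""
    return 6 * repetitions_needed(eps, target)


k = repetitions_needed(0.5)
print(k, copies_needed(0.5))
```

For small \(\varepsilon \) the repetition count behaves like \(4\ln 3/\varepsilon ^2\), which is the \(O(1/\varepsilon ^2)\) scaling stated above.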

Before our result, the best known algorithms required a number of copies that scaled linearly with n, the number of qubits. Indeed, these algorithms proceeded by attempting to identify the stabilizer state, which requires \(\Omega (n)\) copies by the Holevo bound [AG08, Mon17, ZPDF16]. Moreover, our algorithm is manifestly efficient (see the circuit in Fig. 2).

Fig. 2

Quantum circuit implementing Algorithm 1 for qubit stabilizer testing. Inside the blue blocks: The quantum gates denote the controlled-NOT and the Hadamard gate, respectively; the measurements are in the n-qubit computational basis. Outside the blue blocks: Double lines represent classical information. The “\(\oplus \)”-operation is addition modulo two. The boxes labeled “Weyl” perform a two-outcome measurement with respect to the eigenspaces of \(W_{\mathbf {a}}\), where \(\mathbf {a}\) is determined by classical inputs. For n qubits, the circuit is fully transversal in the sense that all operations are required to be coherent only across two copies, and factorize with respect to the n qubits.

Remark 3.5

For multi-qubit states \(\psi \) that are real in the computational basis, we can replace step 1 of the algorithm by a single Bell sampling, which in this case directly samples from the characteristic distribution \(p_\psi \) (see Eq. (3.4)). The resulting algorithm operates on four copies of \(\psi \) and achieves the same guarantees as Theorem 3.2.

Remark 3.6

The scaling in Theorem 3.2 is optimal. Indeed, it is known that distinguishing any fixed pair of states \(\vert \psi \rangle ,\vert \phi \rangle \) with \(|\langle \psi |\phi \rangle |^2=1-\varepsilon ^2\) requires \(\Omega (1/\varepsilon ^2)\) copies [MdW16]. In particular, this lower bound holds if we choose \(\vert \psi \rangle \) to be a stabilizer state and \(\vert \phi \rangle \) a state that is \(\varepsilon \)-far away from being a stabilizer state, in which case our Algorithm 1 is applicable.

Remark 3.7

(Clifford testing). It follows from Theorem 3.3 that we can also test whether a given unitary U is in the Clifford group or not (without being given access to \(U^\dagger \)). This resolves another open question in the survey of Montanaro and de Wolf [MdW16, Question 9].

Indeed, given black-box access to U alone we can create the Choi state \(\vert U\rangle :=(U \otimes I) \vert \Phi ^+\rangle \), which is a stabilizer state if and only if U is a Clifford unitary. Moreover, the “average case” distance measure used in the literature for quantum property testing of unitaries is precisely equal to the trace distance between the corresponding Choi states [MdW16, Section 5.1.1]. Thus, by first creating the Choi state and then running our Algorithm 1 we can efficiently test whether a given unitary U is a Clifford unitary.

It is instructive to write down the accepting POVM element for Algorithm 1. From Eqs. (3.7) and (3.1), we find that it is given by

$$\begin{aligned} \Pi _\text {accept} = \sum _{\mathbf {a}} \Pi _{\mathbf {a}} \otimes \frac{I + W_{\mathbf {a}} \otimes W_{\mathbf {a}}^\dagger }{2} = \frac{1}{2} \left( I + U \right) , \end{aligned}$$
(3.11)

where we have introduced the unitary

$$\begin{aligned} U = \frac{1}{2^{2n}} \sum _{\mathbf {x},\mathbf {a}\in {\mathcal {V}}_n} (-1)^{[\mathbf {a},\mathbf {x}]} W_{\mathbf {x}}^{\otimes 4} \otimes W_{\mathbf {a}}^{\otimes 2} = \biggl ( \underbrace{\frac{1}{4} \sum _{\mathbf {x},\mathbf {a}\in {\mathcal {V}}_1} (-1)^{[\mathbf {a},\mathbf {x}]} W_{\mathbf {x}}^{\otimes 4} \otimes W_{\mathbf {a}}^{\otimes 2}}_{=: u} \biggr )^{\otimes n}. \end{aligned}$$

It is easy to verify that \(U=u^{\otimes n}\) is a Clifford unitary acting on the space \({\mathcal {H}}_n^{\otimes 6} \cong {\mathcal {H}}_{6n}\) of 6n qubits.

For any pure state \(\psi \), \(\psi ^{\otimes 6}\) is in the symmetric subspace, and is therefore invariant under left and right-multiplication by permutations. In particular, we obtain a test with the same guarantees as Theorem 3.3 if we replace U by \(V=U (I^{\otimes 4} \otimes {\mathbb {F}})\), where \({\mathbb {F}}=R((1 2))\) denotes the operator that swaps (or flips) two blocks of n qubits. Since \({\mathbb {F}}=2^{-n} \sum _{\mathbf {b}} W_{\mathbf {b}}^{\otimes 2}\), we obtain the formula

$$\begin{aligned} \begin{aligned} V&= 2^{-3n} \sum _{\mathbf {x},\mathbf {a},\mathbf {b}} (-1)^{[\mathbf {a},\mathbf {x}]} W_{\mathbf {x}}^{\otimes 4} \otimes (W_{\mathbf {a}} W_{\mathbf {b}})^{\otimes 2} = 2^{-3n} \sum _{\mathbf {x},\mathbf {a},\mathbf {b}} (-1)^{[\mathbf {a},\mathbf {x}+\mathbf {b}]} W_{\mathbf {x}}^{\otimes 4} \otimes W_{\mathbf {a}+\mathbf {b} \bmod 2}^{\otimes 2} \\&= 2^{-3n} \sum _{\mathbf {x},\mathbf {a},\mathbf {b}} (-1)^{[\mathbf {a},\mathbf {x}+\mathbf {b}]} W_{\mathbf {x}}^{\otimes 4} \otimes W_{\mathbf {b}}^{\otimes 2} = 2^{-n} \sum _{\mathbf {x}\in {\mathcal {V}}_n} W_{\mathbf {x}}^{\otimes 6} = \biggl ( \underbrace{\frac{1}{2} \sum _{\mathbf {x}\in {\mathcal {V}}_1} W_{\mathbf {x}}^{\otimes 6}}_{=: v} \biggr )^{\otimes n}. \end{aligned} \end{aligned}$$
(3.12)

Thus, we recognize that the unitary \(V=v^{\otimes n}\) is precisely the action of the anti-identity (1.4) described in the introduction (for \(t=6\)):

$$\begin{aligned} V = R({\bar{\mathbb {1}}}) = 2^{-n} \left( I^{\otimes 6} + X^{\otimes 6} + Y^{\otimes 6} + Z^{\otimes 6} \right) ^{\otimes n} \end{aligned}$$
(3.13)

See also Remark 3.9. We discuss anti-permutations in more detail in Definition 4.29.
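The single-qubit factor \(v\) of Eq. (3.13) is small enough to write down explicitly. The sketch below builds \(v = \frac{1}{2}(I^{\otimes 6} + X^{\otimes 6} + Y^{\otimes 6} + Z^{\otimes 6})\) as a \(64\times 64\) matrix and compares it with the permutation of bit strings that flips all six bits whenever their parity is odd, which is how we read the anti-identity here. The helper names are illustrative.

```python
import itertools
from functools import reduce
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def sixth_power(op):
    """op tensored with itself six times."""
    return reduce(np.kron, [op] * 6)


v = 0.5 * sum(sixth_power(P) for P in (I2, X, Y, Z))


def anti_identity(bits):
    """Apply the 6x6 identity-with-inverted-bits matrix over Z_2 to a bit vector."""
    parity = sum(bits) % 2
    return tuple((b + parity) % 2 for b in bits)


def basis_index(bits):
    """Index of |bits> with the first tensor factor as most significant bit."""
    return int("".join(map(str, bits)), 2)


perm = np.zeros((64, 64))
for bits in itertools.product((0, 1), repeat=6):
    perm[basis_index(anti_identity(bits)), basis_index(bits)] = 1

is_involution = np.allclose(v @ v, np.eye(64))
matches_anti_identity = np.allclose(v, perm)
print(is_involution, matches_anti_identity)
```

Since \(Y^{\otimes 6}\) is insensitive to the sign convention chosen for Y, the check does not depend on the phase convention of the Weyl operators.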

Equation (3.12) allows us to express the acceptance probability of Algorithm 1 in an interesting way:

$$\begin{aligned} p_\text {accept}&= {{\,\mathrm{tr}\,}}\left[ \psi ^{\otimes 6} \Pi _\text {accept}\right] = \frac{1}{2}\left( 1 + {{\,\mathrm{tr}\,}}\left[ \psi ^{\otimes 6} U\right] \right) = \frac{1}{2}\left( 1 + {{\,\mathrm{tr}\,}}\left[ \psi ^{\otimes 6} V\right] \right) \nonumber \\&= \frac{1}{2}\left( 1 + 2^{-n} \sum _{\mathbf {x}} {{\,\mathrm{tr}\,}}\left[ \psi ^{\otimes 6} W_{\mathbf {x}}^{\otimes 6} \right] \right) = \frac{1}{2}\left( 1 + 2^{2n} \sum _{\mathbf {x}} c_\psi (\mathbf {x})^6 \right) \nonumber \\&= \frac{1}{2}\left( 1 + 2^{2n} \Vert c_\psi \Vert _{\ell _6}^6 \right) = \frac{1}{2}\left( 1 + 2^{2n} \Vert p_\psi \Vert _{\ell _3}^3 \right) \nonumber \\&= \frac{1}{2}\left( 1 + 2^{2n} \sum _{\mathbf {x}} p_\psi (\mathbf {x})^3 \right) = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \frac{1}{2}\left( 1 + 2^{2n} p_\psi (\mathbf {x})^2 \right) . \end{aligned}$$
(3.14)

It is intuitive that the \(\ell _p\)-norms should appear, since stabilizer states can be characterized by having a maximally peaked characteristic function and distribution (Eqs. (2.17) and (3.3)).

In fact, the result of this calculation is plainly a strengthening of Eq. (3.9), since \(2^n p_\psi (\mathbf {x})\le 1\). If we follow the rest of the proof of Theorem 3.3 then we obtain \(p_\text {accept}\le 1-3\varepsilon ^2/8\), a slight improvement. More importantly, though, this argument completely avoids the analysis of Bell difference sampling in Theorem 3.2. This leads us towards an approach for testing general qudit stabilizer states.
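Equation (3.14) gives a direct recipe for computing \(p_\text {accept}\) from Pauli expectation values. The following sketch evaluates it for a stabilizer state and for the single-qubit state \((\vert 0\rangle + e^{i\pi /4}\vert 1\rangle )/\sqrt{2}\), and compares with the weaker bound of Eq. (3.9). It assumes that the Weyl operators are Pauli strings up to signs, which is all that enters \(p_\psi \); the function names are hypothetical.

```python
import itertools
from functools import reduce
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def characteristic_distribution(psi):
    """Array of p_psi(x) = 2^{-n} <psi|W_x|psi>^2 over all 4^n Pauli strings."""
    n = len(psi).bit_length() - 1
    values = []
    for combo in itertools.product(PAULIS, repeat=n):
        op = reduce(np.kron, combo)
        values.append(np.vdot(psi, op @ psi).real ** 2 / 2 ** n)
    return np.array(values), n


def p_accept(psi):
    """Acceptance probability from Eq. (3.14): (1 + 2^{2n} sum_x p(x)^3) / 2."""
    p, n = characteristic_distribution(psi)
    return 0.5 * (1 + 4 ** n * np.sum(p ** 3))


def bound_3_9(psi):
    """Right-hand side of Eq. (3.9): (1 + 2^n sum_x p(x)^2) / 2."""
    p, n = characteristic_distribution(psi)
    return 0.5 * (1 + 2 ** n * np.sum(p ** 2))


zero = np.array([1, 0], dtype=complex)
t_state = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)

print(p_accept(zero))     # 1 for a stabilizer state
print(p_accept(t_state))  # 13/16 = 0.8125
print(bound_3_9(t_state)) # 7/8, consistent with the weaker bound
```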

3.2 Qudit stabilizer testing

While Bell sampling can only be used for qubit systems, Eq. (3.12) has a clear generalization to arbitrary qudits. Let \(d\ge 2\) and consider the operator

$$\begin{aligned} V_s = d^{-n} \sum _{\mathbf {x}} (W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger )^{\otimes s}. \end{aligned}$$
(3.15)

(For qubits, the Weyl operators are Hermitian and so \(V_3\) is precisely Eq. (3.12).) Suppose we choose s such that \(V_s\) is a Hermitian unitary (we will momentarily see that this can always be done). Then

$$\begin{aligned} \Pi _{s,\text {accept}} = \frac{1}{2}(I+V_s) \end{aligned}$$

is a projection. If we think of it as the accepting element of a binary POVM then

$$\begin{aligned} \begin{aligned} p_\text {accept}&= {{\,\mathrm{tr}\,}}[\psi ^{\otimes 2s} \Pi _{s,\text {accept}}] = \frac{1}{2} \left( 1 + {{\,\mathrm{tr}\,}}[\psi ^{\otimes 2s} V_s] \right) = \frac{1}{2} \left( 1 + d^{-n} \sum _{\mathbf {x}} |{{\,\mathrm{tr}\,}}[\psi W_{\mathbf {x}}]|^{2s} \right) \\&= \frac{1}{2} \left( 1 + d^{(s-1)n} \sum _{\mathbf {x}} p^s_\psi (\mathbf {x}) \right) = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \frac{1}{2} \left( 1 + d^{(s-1)n} p^{s-1}_\psi (\mathbf {x}) \right) , \end{aligned}\nonumber \\ \end{aligned}$$
(3.16)

which generalizes Eq. (3.14).

When is \(V_s\) Hermitian and unitary? It is always Hermitian, since \(W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger \) only depends on \(\mathbf {x}\) modulo d. For unitarity we use Eq. (2.4) and calculate

$$\begin{aligned} V_s^2&= d^{-2n} \sum _{\mathbf {x},\mathbf {y}} (W_{\mathbf {x}} W_{\mathbf {y}} \otimes W_{\mathbf {x}}^\dagger W_{\mathbf {y}}^\dagger )^{\otimes s} = d^{-2n} \sum _{\mathbf {x},\mathbf {y}} \omega ^{s[\mathbf {x},\mathbf {y}]} (W_{\mathbf {x} + \mathbf {y}} \otimes W_{-(\mathbf {x}+\mathbf {y})})^{\otimes s} \\&= d^{-2n} \sum _{\mathbf {x},\mathbf {y}} \omega ^{s[\mathbf {x},\mathbf {y}]} (W_{\mathbf {x} + \mathbf {y} \bmod d} \otimes W_{-(\mathbf {x}+\mathbf {y} \bmod d)})^{\otimes s}\\&= d^{-2n} \sum _{\mathbf {z}} \left( \sum _{\mathbf {x}} \omega ^{s[\mathbf {x},\mathbf {z}]} \right) (W_{\mathbf {z}} \otimes W_{-\mathbf {z}})^{\otimes s}. \end{aligned}$$

If s is invertible modulo d then \(\omega ^{s[-,\mathbf {z}]}\) is a nontrivial character for all \(\mathbf {z}\ne \mathbf{0}\), and so the inner sum simplifies to \(d^{2n} \delta _{\mathbf {z},\mathbf{0}}\). It follows that \(V_s^2 = I\), as desired. We summarize:

Lemma 3.8

Let \(d\ge 2\) and s an integer that is invertible modulo d (i.e., \((s,d)=1\)). Then \(V_s\) is a Hermitian unitary.

Remark 3.9

(Qubits). For qubits, the operator \(V_s\) is a Hermitian unitary if and only if s is odd. E.g., for \(s=1\) it is the unitary swap operator \({\mathbb {F}}\) and for \(s=3\) it is precisely Eq. (3.12) (the anti-identity), while for \(s=2\) it is not unitary but in fact proportional to one of the POVM elements from Bell difference sampling. Indeed, \(V_2=2^n \Pi _0\) where \(\Pi _0\) is the projection from Eq. (3.6). Thus \(\Vert V_2\Vert =2^n\) and so we cannot interpret the associated \(\Pi _2\) as a POVM element. This already partly explains why we had to resort to six copies to test stabilizerness.
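Lemma 3.8 and Remark 3.9 can be checked directly for a single qudit (n = 1). The sketch below builds \(V_s\) from clock and shift matrices; because \(W_{\mathbf {x}} \otimes W_{\mathbf {x}}^\dagger \) is insensitive to the phase of \(W_{\mathbf {x}}\), we simply use \(Z^p X^q\) without the phase factor of Eq. (2.1). That choice and the function names are assumptions made for illustration.

```python
import itertools
from functools import reduce
import numpy as np


def shift_clock(d):
    """Generalized Pauli matrices: X|y> = |y+1 mod d>, Z|y> = omega^y |y>."""
    omega = np.exp(2j * np.pi / d)
    Xd = np.roll(np.eye(d), 1, axis=0)
    Zd = np.diag(omega ** np.arange(d))
    return Xd, Zd


def V(d, s):
    """V_s for one qudit: d^{-1} sum_x (W_x (x) W_x^dagger)^{(x) s}."""
    Xd, Zd = shift_clock(d)
    dim = d ** (2 * s)
    total = np.zeros((dim, dim), dtype=complex)
    for p, q in itertools.product(range(d), repeat=2):
        W = np.linalg.matrix_power(Zd, p) @ np.linalg.matrix_power(Xd, q)
        pair = np.kron(W, W.conj().T)
        total += reduce(np.kron, [pair] * s)
    return total / d


def is_hermitian_unitary(M):
    return np.allclose(M, M.conj().T) and np.allclose(M @ M, np.eye(len(M)))


qutrit_s2 = is_hermitian_unitary(V(3, 2))  # gcd(2, 3) = 1
qubit_s3 = is_hermitian_unitary(V(2, 3))   # s odd
qubit_s2 = is_hermitian_unitary(V(2, 2))   # s even: not unitary
norm_V2 = np.linalg.norm(V(2, 2), 2)       # operator norm, equals 2^n for n = 1
print(qutrit_s2, qubit_s3, qubit_s2, norm_V2)
```

For \(d=2\) and \(s=1\) the same construction returns the swap operator, as stated in Remark 3.9.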

The second ingredient used to establish Theorem 3.3 was an uncertainty principle for Weyl operators. The following lemma supplies this for general d:

Lemma 3.10

(Uncertainty relation). Let \(\delta {=} 1/2d\) and \(\psi \) a pure state such that \(|{{\,\mathrm{tr}\,}}[\psi W_{\mathbf {x}}]|^2 > 1-\delta ^2\) and \(|{{\,\mathrm{tr}\,}}[\psi W_{\mathbf {y}}]|^2 > 1-\delta ^2\). Then \(W_{\mathbf {x}}\) and \(W_{\mathbf {y}}\) must commute.

Proof

Note that

$$\begin{aligned} \Vert W_{\mathbf {x}} \vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \Vert < \delta \end{aligned}$$

and likewise for \(W_{\mathbf {y}}\). By the triangle inequality,

$$\begin{aligned}&\Vert W_{\mathbf {x}} W_{\mathbf {y}} \vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert \\&\quad \le \Vert W_{\mathbf {x}} W_{\mathbf {y}} \vert \psi \rangle - W_{\mathbf {x}}\vert \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert + \Vert W_{\mathbf {x}}\vert \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert \\&\quad \le \Vert W_{\mathbf {x}} \Vert \Vert W_{\mathbf {y}} \vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert + \Vert W_{\mathbf {x}}\vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \Vert \, |\langle \psi | W_{\mathbf {y}} | \psi \rangle | \\&\quad \le \Vert W_{\mathbf {y}} \vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert + \Vert W_{\mathbf {x}}\vert \psi \rangle - \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \Vert < 2\delta , \end{aligned}$$

but also

$$\begin{aligned}&\Vert W_{\mathbf {x}} W_{\mathbf {y}} \vert \psi \rangle - \omega ^{[x,y]} \vert \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \Vert \\&\quad = \Vert \omega ^{[x,y]} W_{\mathbf {y}} W_{\mathbf {x}} \vert \psi \rangle - \omega ^{[x,y]} \vert \psi \rangle \langle \psi | W_{\mathbf {y}} | \psi \rangle \langle \psi | W_{\mathbf {x}} | \psi \rangle \Vert < 2\delta . \end{aligned}$$

If we combine this with another triangle inequality, we obtain that

$$\begin{aligned} |1 - \omega ^{[x,y]} |= \Vert \omega ^{[x,y]} \vert \psi \rangle - \vert \psi \rangle \Vert< \frac{4\delta }{|\langle \psi | W_{\mathbf {x}} | \psi \rangle | \, |\langle \psi | W_{\mathbf {y}} | \psi \rangle |} < \frac{4\delta }{1 - \delta ^2}. \end{aligned}$$

Now suppose that \(W_{\mathbf {x}}\) and \(W_{\mathbf {y}}\) do not commute. Then \([x,y]\ne 0\) and so

$$\begin{aligned} |1 - \omega ^{[x,y]} |\ge |1 - \omega |= 2\sin (\pi /d) \ge \frac{4}{d}. \end{aligned}$$

Thus, \(4/d < 4\delta /(1-\delta ^2)\). But for \(\delta =1/2d\) and \(d\ge 2\) we have \(4\delta /(1-\delta ^2) = (2/d)/(1-1/4d^2) \le 32/15d < 4/d\), a contradiction. We conclude that \(W_{\mathbf {x}}\) and \(W_{\mathbf {y}}\) commute. \(\square \)
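The two elementary inequalities used at the end of the proof, \(2\sin (\pi /d)\ge 4/d\) and \(4\delta /(1-\delta ^2) < 4/d\) for \(\delta =1/2d\), can be confirmed numerically over a range of dimensions. This is only a sanity check, with illustrative names, and not a substitute for the argument.

```python
import math


def lemma_3_10_bounds(d):
    """Return (2 sin(pi/d), 4/d, 4 delta / (1 - delta^2)) for delta = 1/(2d)."""
    delta = 1 / (2 * d)
    lower = 2 * math.sin(math.pi / d)       # |1 - omega|
    middle = 4 / d
    upper = 4 * delta / (1 - delta ** 2)    # bound obtained if both overlaps are large
    return lower, middle, upper


checks = []
for d in range(2, 101):
    lower, middle, upper = lemma_3_10_bounds(d)
    checks.append(lower >= middle - 1e-12 and upper < middle)

print(all(checks))
```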

We now show that stabilizer testing can be done in arbitrary local dimension:

Theorem 3.11

(Stabilizer testing for qudits). Let \(d\ge 2\) and choose \(s\ge 2\) such that \((d,s)=1\). Let \(\psi \) be a pure state of n qudits and denote by \(p_\text {accept}={{\,\mathrm{tr}\,}}[\psi ^{\otimes 2s}\Pi _{s,\text {accept}}]\) the probability that the POVM element \(\Pi _{s,\text {accept}}\) accepts given 2s copies of \(\psi \). If \(\psi \) is a stabilizer state then it accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S|\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-C_{d,s}\varepsilon ^2\), where \(C_{d,s} = (1 - (1 - 1/4d^2)^{s-1})/2\).

Proof

If \(\psi \) is a stabilizer state, say \(\vert \psi \rangle =\vert M,f\rangle \), then \(p_\psi (\mathbf {x})\) is the uniform distribution on M, which has \(d^n\) elements. In view of Eq. (3.16),

$$\begin{aligned} p_\text {accept} = \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \frac{1}{2} \left( 1 + d^{(s-1)n} p^{s-1}_\psi (\mathbf {x}) \right) = 1, \end{aligned}$$

so the test accepts with certainty.

Now suppose that \(\psi \) is a general state. Define

$$\begin{aligned} M_0 :=\{ \mathbf {x} \in {\mathcal {V}}_n : d^n p_\psi (\mathbf {x}) > 1 - 1/4d^2 \}. \end{aligned}$$

By Lemma 3.10, the Weyl operators \(W_{\mathbf {x}}\) for \(\mathbf {x}\in M_0\) all commute. We can thus extend \(M_0\) to a maximal set M with this property. As in the proof of Theorem 3.3, we can bound

$$\begin{aligned} \max _S |\langle S|\psi \rangle |^2 \ge \sum _{\mathbf {x}\in M_0} p_\psi (\mathbf {x}). \end{aligned}$$

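But this probability can be bounded as before using the Markov inequality (but now for an \((s-1)\)st moment):

$$\begin{aligned} \sum _{\mathbf {x}\in M_0} p_\psi (\mathbf {x}) \ge 1 - \frac{1}{1 - (1 - 1/4d^2)^{s-1}} \sum _{\mathbf {x}} p_\psi (\mathbf {x}) \left( 1 - d^{(s-1)n} p^{s-1}_\psi (\mathbf {x}) \right) = 1 - \frac{2\left( 1 - p_\text {accept}\right) }{1 - (1 - 1/4d^2)^{s-1}}. \end{aligned}$$

The last equality is Eq. (3.16). This yields the desired bound. \(\square \)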

Remark 3.12

It is clear that \(s=d+1\) is always a valid choice in Theorem 3.11. This leads to \(C_{d,s} \approx 1/8d\) for large d, but the resulting test involves gates that act on \(2d+2\) qudits at a time. However, this choice of s is in general rather pessimistic. E.g., if d is odd then we may always choose \(s=2\), meaning that our test acts on four copies at a time.
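The constant \(C_{d,s}\) of Theorem 3.11 and the resulting copy count are straightforward to tabulate. The sketch below assumes, as one possible strategy, that the two-outcome test is repeated independently on fresh blocks of 2s copies and that we accept only if every run accepts; the function names are hypothetical.

```python
import math


def C(d, s):
    """C_{d,s} = (1 - (1 - 1/(4 d^2))^(s-1)) / 2, for s >= 2 coprime to d."""
    if s < 2 or math.gcd(d, s) != 1:
        raise ValueError("need s >= 2 with gcd(d, s) = 1")
    return (1 - (1 - 1 / (4 * d * d)) ** (s - 1)) / 2


def copies_needed(d, s, eps, target=2 / 3):
    """Copies used when repeating the test until the rejection bound reaches target."""
    reject_once = C(d, s) * eps ** 2
    runs = math.ceil(math.log(1 - target) / math.log(1 - reject_once))
    return 2 * s * runs


print(C(3, 2))                  # 1/72 for qutrits with s = 2
print(8 * 101 * C(101, 102))    # close to 1, illustrating C ~ 1/(8d) for s = d + 1
print(copies_needed(3, 2, 0.5))
```

Comparing `C(d, 2)` with `C(d, d + 1)` for odd d shows the trade-off mentioned above: a smaller s lowers the number of copies per run but also lowers the per-run rejection bound.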

Corollary 3.13

Let \(d\ge 2\) and fix s as in Theorem 3.11. Let \(\psi \) be a pure state of n qudits and let \(\varepsilon >0\). Then there exists a quantum algorithm that, given \(O(1/C_{d,s}\varepsilon ^2)\) copies of \(\psi \), accepts any stabilizer state (it is perfectly complete), while it rejects states such that \(\max _S |\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) with probability at least 2/3.

It is clear that the POVM measurement \(\{\Pi _{s,\text {accept}},I-\Pi _{s,\text {accept}}\}\) can be implemented efficiently. Using phase estimation, it suffices to argue that the controlled version of \(V_s\) can be implemented efficiently. But \(V_s = v_s^{\otimes n}\), so its controlled version is equal to a composition of n controlled versions of \(v_s\), each of which acts only on a constant (with respect to n) number of qudits. It follows that our stabilizer test for qudits is efficient.

It is instructive to compute the action of the unitary \(V_s=v_s^{\otimes n}\) more explicitly: Let \(\vert \mathbf {x}\rangle =\vert \mathbf {x}_1,\dots ,\mathbf {x}_{2s}\rangle \) denote a computational basis vector of \({\mathcal {H}}_n^{\otimes 2s}\). Then, using Eq. (2.1),

$$\begin{aligned} V_s \vert \mathbf {x}\rangle = \vert \mathbf {x}_1 - \bar{\mathbf {x}}_\text {odd} + \bar{\mathbf {x}}_\text {even}, \mathbf {x}_2 + \bar{\mathbf {x}}_\text {odd} - \bar{\mathbf {x}}_\text {even}, \dots , \mathbf {x}_{2s-1} - \bar{\mathbf {x}}_\text {odd} + \bar{\mathbf {x}}_\text {even}, \mathbf {x}_{2s} + \bar{\mathbf {x}}_\text {odd} - \bar{\mathbf {x}}_\text {even}\rangle , \end{aligned}$$

where \(\bar{\mathbf {x}}_\text {even} = s^{-1} \sum _{k\text { even}} \mathbf {x}_k\) and \(\bar{\mathbf {x}}_\text {odd}\) is defined analogously. If we re-order the tensor factors so that the odd systems come first, followed by the even ones, we find that a basis vector \(\vert \mathbf {x}_\text {odd},\mathbf {x}_\text {even}\rangle \) is mapped to \(\vert \mathbf {x}_\text {odd} - \bar{\mathbf {x}}_\text {odd} + \bar{\mathbf {x}}_\text {even},\mathbf {x}_\text {even} + \bar{\mathbf {x}}_\text {odd} - \bar{\mathbf {x}}_\text {even}\rangle \). Thus, \(V_s\) is a unitary that permutes the computational basis vectors by “swapping the mean” of the even and the odd sites of the 2s many blocks of n qudits.

Here is one last reformulation that will be useful to connect to our algebraic results. Let \(\mathbf {p}_{2s} = (-1,1,\dots ,-1,1) \in {\mathbb {Z}}_d^{2s}\) denote the ‘parity vector’ that is \(\pm 1\) on even/odd sites, and consider the following \(2s \times 2s\) matrix with entries in \({\mathbb {Z}}_d\):

$$\begin{aligned} {\tilde{\mathbb {1}}} =\mathbb {1}- s^{-1} \mathbf {p}_{2s} \mathbf {p}_{2s}^T \end{aligned}$$
(3.17)

Then we can write the action of \(V_s\) as

$$\begin{aligned} V_s \vert \mathbf {x}\rangle = \vert {\tilde{\mathbb {1}}} (\mathbf {x}_1,\dots ,\mathbf {x}_{2s})\rangle = \vert ({\tilde{\mathbb {1}}} \otimes I_n)\mathbf {x}\rangle . \end{aligned}$$
(3.18)

It is easy to verify that \({\tilde{\mathbb {1}}}\) is a stochastic isometry (cf. Eq. (4.36) in Sect. 4.3). For qubits and \(s=3\), \({\tilde{\mathbb {1}}}\) is just the matrix obtained by taking the \(6\times 6\) identity matrix and inverting each bit (the ‘anti-identity’). This gives a pleasant and insightful interpretation of Eq. (3.12), as we will see in Sect. 5.2. Interestingly, the anti-identity has previously appeared in the classification of Clifford gates in [GS16] (their \(T_6\)).
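The matrix \({\tilde{\mathbb {1}}}\) of Eq. (3.17) can be written down and tested in a few lines. The sketch below checks, for \(d=3\), \(s=2\) and a single qudit per block, that it reproduces the "swapping the mean" description, that it squares to the identity, that it is orthogonal modulo d, and that it fixes the all-ones vector. We take the last two properties as our reading of "stochastic isometry", since Eq. (4.36) is not reproduced here; the function names are illustrative and `pow(s, -1, d)` requires Python 3.8 or later.

```python
import itertools
import numpy as np


def tilde_identity(d, s):
    """1 - s^{-1} p p^T over Z_d, with parity vector p = (-1, 1, ..., -1, 1)."""
    s_inv = pow(s, -1, d)
    p = np.array([-1, 1] * s)
    return (np.eye(2 * s, dtype=int) - s_inv * np.outer(p, p)) % d


def mean_swap(x, d, s):
    """Swap the means of the odd and even sites (sites are numbered from 1)."""
    s_inv = pow(s, -1, d)
    x = np.array(x)
    odd_mean = s_inv * x[0::2].sum() % d
    even_mean = s_inv * x[1::2].sum() % d
    y = x.copy()
    y[0::2] = (x[0::2] - odd_mean + even_mean) % d
    y[1::2] = (x[1::2] + odd_mean - even_mean) % d
    return y


d, s = 3, 2
T = tilde_identity(d, s)
agrees = all(
    np.array_equal(T @ np.array(x) % d, mean_swap(x, d, s))
    for x in itertools.product(range(d), repeat=2 * s)
)
identity = np.eye(2 * s, dtype=int)
is_involution = np.array_equal(T @ T % d, identity)
is_orthogonal = np.array_equal(T.T @ T % d, identity)
fixes_ones = np.array_equal(T @ np.ones(2 * s, dtype=int) % d, np.ones(2 * s, dtype=int))
print(agrees, is_involution, is_orthogonal, fixes_ones)
```

For \(d=2\) and \(s=3\), `tilde_identity(2, 3)` is the \(6\times 6\) matrix with zeros on the diagonal and ones elsewhere, i.e. the anti-identity.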

4 Algebraic Theory of Clifford Tensor Powers

In this section, we present a general framework for studying the algebraic structure of stabilizer states and Clifford operators. We start by describing the commutant of the tensor powers of the Clifford group, where we obtain results similar in flavor to the Schur–Weyl duality between the unitary group and the symmetric group. Next, we apply this machinery to compute arbitrary moments of qudit stabilizer states, and we describe how to construct t-designs of arbitrary order from weighted Clifford orbits. Lastly, we return to the stabilizer testing problem and explain how our solution from Sect. 3 can be understood more systematically and generalized. In particular, we find an optimal projection that characterizes the tensor powers of stabilizer states precisely.

Throughout this section we assume that d is prime.

4.1 Commutant of Clifford tensor powers

Schur–Weyl duality in its most fundamental form asserts that any operator on \(({\mathbb {C}}^D)^{\otimes t}\) that commutes with \(U^{\otimes t}\) for all unitaries \(U\in U(D)\) is necessarily a linear combination of permutation operators. Using the double commutant theorem, this implies at once that \(({\mathbb {C}}^D)^{\otimes t} = \bigoplus _\lambda V_{U(D),\lambda } \otimes V_{S_t,\lambda }\), where the \(V_{U(D),\lambda }\) and \(V_{S_t,\lambda }\) are pairwise inequivalent irreducible representations of the unitary group U(D) and of the symmetric group \(S_t\), respectively.

The main result of this section is that the commutant of the tensor powers of the Clifford group can be completely described in terms of a natural generalization of permutation operators (see Theorem 4.3 below). Mathematically, this generalization involves Lagrangian subspaces of a space equipped with a quadratic form. Since stabilizer states can be described in terms of Lagrangian subspaces with respect to a symplectic form (Sect. 2), this is reminiscent of Howe’s classical duality between symplectic and orthogonal group actions.

To describe the result more precisely, let T denote a subspace of \({\mathbb {Z}}_d^t \oplus {\mathbb {Z}}_d^t\). We define a corresponding operator

$$\begin{aligned} r(T) = \sum _{(\mathbf {x},\mathbf {y})\in T} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert \end{aligned}$$

on \(({\mathbb {C}}^d)^{\otimes t}\), where \(\vert \mathbf {x}\rangle = \vert x_1,x_2,\dots ,x_t\rangle \in ({\mathbb {C}}^d)^{\otimes t}\) denotes the computational basis vector associated with some \(\mathbf {x}\in {\mathbb {Z}}_d^t\). We also consider the n-fold tensor power

$$\begin{aligned} R(T):=r(T)^{\otimes n}, \end{aligned}$$

which is an operator on \((({\mathbb {C}}^d)^{\otimes t})^{\otimes n} \cong ({\mathbb {C}}^d)^{\otimes t n} \cong (({\mathbb {C}}^d)^{\otimes n})^{\otimes t}.\) Both r(T) and R(T) are represented by real matrices in the computational basis.

Definition 4.1

(\(\Sigma _{t,t}\)) Consider the quadratic form \({\mathfrak {q}}:{\mathbb {Z}}_d^{2t} \rightarrow {\mathbb {Z}}_D\) defined by \({\mathfrak {q}}(\mathbf {x},\mathbf {y}) :=\mathbf {x}\cdot \mathbf {x} - \mathbf {y}\cdot \mathbf {y}\) (see footnote 5). We denote by \(\Sigma _{t,t}(d)\) the set of subspaces \(T\subseteq {\mathbb {Z}}_d^{2t}\) satisfying the following properties:

  1. T is totally \({\mathfrak {q}}\)-isotropic: i.e., \(\mathbf {x}\cdot \mathbf {x} = \mathbf {y}\cdot \mathbf {y} \pmod D\) for all \((\mathbf {x},\mathbf {y})\in T\).

  2. T has dimension t (the maximal possible dimension).

  3. T is stochastic: \(\mathbf {1}_{2t} = (1,\dots ,1) \in T\).

We will summarize the first two conditions by saying that T is Lagrangian. Thus, we will call \(\Sigma _{t,t}(d)\) the set of stochastic Lagrangian subspaces.

See [NW16, App. C] for a complete list of the subspaces \(\Sigma _{t,t}(d)\) for \(t=3\), and Sect. 4.3 for examples.

In Lemma 4.5, we will show that the operators R(T) are indeed in the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\). The proof is straightforward and elucidates the role of the three conditions in Definition 4.1 as well as the difference between even and odd d.

Remark 4.2

Recall that a subspace T is called totally isotropic with respect to a quadratic form \({\mathfrak {q}}\) if \({\mathfrak {q}}(\mathbf {v}) = 0\) for every \(\mathbf {v}\in T\). This explains our terminology in Definition 4.1.

We can also consider the \({\mathbb {Z}}_d\)-valued bilinear form \({\mathfrak {b}}((\mathbf {x}, \mathbf {y}),(\mathbf {x}',\mathbf {y}')) :=\mathbf {x}\cdot \mathbf {x}' - \mathbf {y}\cdot \mathbf {y}' \in {\mathbb {Z}}_d\). By a straightforward calculation,

$$\begin{aligned} {\mathfrak {q}}(\mathbf {v} + \mathbf {w}) = {\mathfrak {q}}(\mathbf {v}) + {\mathfrak {q}}(\mathbf {w}) + 2 {\mathfrak {b}}(\mathbf {v}, \mathbf {w}) \pmod D \end{aligned}$$
(4.1)

for all \(\mathbf {v}\), \(\mathbf {w}\in {\mathbb {Z}}_d^{2t}\). Thus, \({\mathfrak {q}}\) is a \({\mathbb {Z}}_D\)-valued quadratic form associated to the \({\mathbb {Z}}_d\)-bilinear form \({\mathfrak {b}}\) in the sense of [Woo93]. Note that if T is totally isotropic with respect to \({\mathfrak {q}}\) then Eq. (4.1) shows that T is self-orthogonal, i.e., \(T \subseteq T^\perp \), where

$$\begin{aligned} T^\perp :=\{ \mathbf {v} \in {\mathbb {Z}}_d^{2t} \;:\; {\mathfrak {b}}(\mathbf {v},\mathbf {w}) = 0 \quad \forall \mathbf {w}\in T \}. \end{aligned}$$

If d is odd then \({\mathfrak {q}}(\mathbf {v})={\mathfrak {b}}(\mathbf {v},\mathbf {v})\), so any self-orthogonal subspace is automatically totally isotropic with respect to \({\mathfrak {q}}\).

If \(d=2\) then Eq. (4.1) implies that, for a self-orthogonal subspace, the set of isotropic vectors forms a subspace—so we can check total isotropicity on a basis. Moreover, for \(d=2\), if T is Lagrangian then it is automatically stochastic; indeed, \({\mathfrak {b}}(\mathbf {v}, \mathbf {1}_{2t}) = {\mathfrak {q}}(\mathbf {v}) \pmod 2\), so \(\mathbf {1}_{2t}\) is contained in any maximal totally isotropic subspace.

Our goal in this section is to prove the following theorem:

Theorem 4.3

(Commutant of Clifford tensor powers). Let d be a prime and \(n\ge t-1\). Then the operators \(R(T)=r(T)^{\otimes n}\) for \(T \in \Sigma _{t,t}(d)\) are \(\prod _{k=0}^{t-2} (d^k+1)\) many linearly independent operators that span the commutant of the t-th tensor power action of the Clifford group for n qudits.
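For orientation, the count \(\prod _{k=0}^{t-2} (d^k+1)\) in Theorem 4.3 can be compared with \(t!\), the number of permutation operators. The short sketch below tabulates both; the function name is illustrative and `math.prod` requires Python 3.8 or later.

```python
from math import factorial, prod


def num_stochastic_lagrangians(d, t):
    """Number of operators R(T) in Theorem 4.3: prod_{k=0}^{t-2} (d^k + 1)."""
    return prod(d ** k + 1 for k in range(t - 1))


for d in (2, 3, 5):
    for t in (2, 3, 4):
        print(d, t, num_stochastic_lagrangians(d, t), factorial(t))
```

For qubits the count agrees with \(t!\) for \(t\le 3\) and first exceeds it at \(t=4\) (30 versus 24), while for qutrits it already exceeds \(3!\) at \(t=3\).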

It is instructive to discuss a few key features of Theorem 4.3. First, we know that the permutation group on t elements, \(S_t\), is in the commutant of the Clifford group \({{\,\mathrm{Cliff}\,}}(n,d)\), because it is even in the commutant of the larger unitary group \(U(d^n)\). Indeed, let \(\pi \cdot \mathbf {y} = (y_{\pi ^{-1}(1)},\dots ,y_{\pi ^{-1}(t)})\) denote the permutation action of \(S_t\) on \({\mathbb {Z}}_d^t\). Then one can see that, for any permutation \(\pi \in S_t\), the subspace \(T_\pi = \{(\pi \cdot \mathbf {y},\mathbf {y}) : \mathbf {y}\in {\mathbb {Z}}_d^t\}\) is Lagrangian and stochastic. The corresponding operator \(R(T_\pi )=r(T_\pi )^{\otimes n}\) agrees precisely with the usual permutation action of \(S_t\) on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\). Accordingly, we may identify \(S_t\) with a subset of \(\Sigma _{t,t}(d)\). We will see below in Definition 4.11 that the set of subspaces T for which R(T) is invertible forms a (in general, proper) subgroup that is (in general, strictly) larger than \(S_t\).
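The conditions of Definition 4.1 and the operator r(T) are easy to experiment with for small t. The sketch below represents a subspace T as a set of pairs \((\mathbf {x},\mathbf {y})\), checks the three conditions by brute force, and builds r(T). It assumes that \(D=4\) for \(d=2\) and \(D=d\) for odd d, which is our reading of the surrounding text rather than a definition quoted from Sect. 2; all names are hypothetical.

```python
import itertools
import numpy as np


def index(x, d):
    """Position of |x_1,...,x_t> in the computational basis."""
    i = 0
    for xi in x:
        i = i * d + xi
    return i


def r(T, d, t):
    """r(T) = sum over (x, y) in T of |x><y|."""
    op = np.zeros((d ** t, d ** t))
    for x, y in T:
        op[index(x, d), index(y, d)] = 1
    return op


def T_perm(pi, d):
    """T_pi = {(pi.y, y)} with (pi.y)_i = y_{pi^{-1}(i)}; pi[i] is the image of i (0-indexed)."""
    t = len(pi)
    inv = [pi.index(i) for i in range(t)]
    return {(tuple(y[inv[i]] for i in range(t)), y)
            for y in itertools.product(range(d), repeat=t)}


def is_stochastic_lagrangian(T, d, t):
    D = 4 if d == 2 else d  # assumption: D = 4 for qubits, D = d for odd d
    add = lambda v, w: tuple((a + b) % d for a, b in zip(v, w))
    closed = all((add(x1, x2), add(y1, y2)) in T for (x1, y1) in T for (x2, y2) in T)
    isotropic = all((sum(a * a for a in x) - sum(b * b for b in y)) % D == 0 for x, y in T)
    stochastic = ((1,) * t, (1,) * t) in T
    return closed and len(T) == d ** t and isotropic and stochastic


def anti_identity(y):
    """Flip all bits of y when its parity is odd (qubit anti-identity, t = 6)."""
    parity = sum(y) % 2
    return tuple((b + parity) % 2 for b in y)


swap = r(T_perm((1, 0), d=2), d=2, t=2)
cycle_ok = is_stochastic_lagrangian(T_perm((1, 2, 0), d=3), d=3, t=3)
T_anti = {(anti_identity(y), y) for y in itertools.product((0, 1), repeat=6)}
anti_ok = is_stochastic_lagrangian(T_anti, d=2, t=6)
print(swap, cycle_ok, anti_ok)
```

Under these assumptions the transposition on two qubits yields the familiar swap matrix, and the qubit anti-identity for \(t=6\) passes all three conditions even though it is not of the form \(T_\pi \).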

Remarkably, Theorem 4.3 shows that the size of the commutant stabilizes as soon as \(n\ge t-1\). That is, just like for the symmetric group in Schur–Weyl duality, the set \(\Sigma _{t,t}(d)\) that parametrizes the commutant of the Clifford tensor powers is independent of n, the number of qudits, provided that \(n\ge t-1\). This stabilization, along with the fact that the operators \(R(T)=r(T)^{\otimes n}\) are tensor powers, is highly useful in applications (e.g., [NW16] and Sects. 5 and 5.2 below).

Remark 4.4

We believe that the results of Nebe et al [NRS06] show that the operators R(T) span the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\) for any value of n. But we caution that if \(n<t-1\) then the R(T) are in general no longer linearly independent (e.g., [Zhu15, eqs. (9) and (10)]).

Theorem 4.3 will be established by combining a number of intermediate results of independent interest. We first show that the operators R(T) are indeed in the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\).

Lemma 4.5

For every \(T\in \Sigma _{t,t}(d)\) and for every \(U\in {{\,\mathrm{Cliff}\,}}(n,d)\), we have that \([R(T), U^{\otimes t}] = [r(T)^{\otimes n}, U^{\otimes t}] = 0\).

Proof

Up to global phases, the Clifford group is generated by the following three operators, which are allowed to act on arbitrary qudits or pairs of qudits [Got99, Far14, NBD+02]: The Fourier transform (also known as the Hadamard gate for \(d=2\)),

$$\begin{aligned} H=\frac{1}{\sqrt{d}}\sum _{a,b\in {\mathbb {Z}}_d} \omega ^{ab} \vert a\rangle \langle b\vert , \end{aligned}$$

the phase gate, which is defined as

$$\begin{aligned} P=\sum _{a\in {\mathbb {Z}}_2} i^{a^2} \vert a\rangle \langle a\vert \text { for }d=2, \qquad P=\sum _{a\in {\mathbb {Z}}_d} \omega ^{2^{-1}a(a-1)} \vert a\rangle \langle a\vert \text { for }d\ne 2, \end{aligned}$$

(here we use that for \(d=2\), \(a^2\) is well-defined modulo four, while for odd d, 2 has a multiplicative inverse, denoted \(2^{-1}\)), and the controlled addition (also known as the CNOT gate for \(d=2\))

$$\begin{aligned} {\text {CADD}}=\sum _{a,b\in {\mathbb {Z}}_d} \vert a,a+b\rangle \langle a,b\vert . \end{aligned}$$

To establish the lemma we will prove the claim for each generator (cf. [NRS06]).

The Fourier transform H is a one-qudit gate, so it suffices to show that \([H^{\otimes t},r(T)]=0\) for every \(T\in \Sigma _{t,t}(d)\). Indeed:

$$\begin{aligned} H^{\otimes t} r(T) H^{\dagger , \otimes t}&= d^{-t} \sum _{\mathbf {a}, \mathbf {b}\in {\mathbb {Z}}_d^t} \sum _{(\mathbf {x},\mathbf {y})\in T} \omega ^{\mathbf {a}\cdot \mathbf {x}-\mathbf {b}\cdot \mathbf {y}} \vert \mathbf {a}\rangle \langle \mathbf {b}\vert \\&= d^{-t} \sum _{\mathbf {a}, \mathbf {b}\in {\mathbb {Z}}_d^t} \sum _{(\mathbf {x},\mathbf {y})\in T} \omega ^{{\mathfrak {b}}((\mathbf {a},\mathbf {b}),(\mathbf {x},\mathbf {y}))} \vert \mathbf {a}\rangle \langle \mathbf {b}\vert \\&= \sum _{(\mathbf {a}, \mathbf {b})\in T^\perp } \vert \mathbf {a}\rangle \langle \mathbf {b}\vert = r(T). \end{aligned}$$

In the second and third steps, we used the notation \({\mathfrak {b}}\) and \(T^\perp \) from Remark 4.2, respectively, as well as that \(\dim T=t\). The last step holds since \(T=T^\perp \), as T is a Lagrangian subspace.

Next, we consider the phase gate, which is likewise a single-qudit gate. For \(d=2\), we have that

$$\begin{aligned} P^{\otimes t} r(T) P^{\dagger ,\otimes t} = \sum _{(\mathbf {x}, \mathbf {y})\in T} i^{\mathbf {x}\cdot \mathbf {x}-\mathbf {y}\cdot \mathbf {y}} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert = r(T) \end{aligned}$$

since T is totally isotropic. For odd d, we instead compute

$$\begin{aligned} P^{\otimes t} r(T) P^{\dagger ,\otimes t}&= \sum _{(\mathbf {x}, \mathbf {y})\in T} \omega ^{2^{-1} \sum _j x_j(x_j-1) - y_j(y_j-1)} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert \\&= \sum _{\mathbf {v}=(\mathbf {x}, \mathbf {y})\in T} \omega ^{2^{-1} {\mathfrak {b}}(\mathbf {v},\mathbf {v}-\mathbf {1}_{2t})} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert = r(T), \end{aligned}$$

since T is totally isotropic and stochastic (so \(\mathbf {w}=\mathbf {v}-\mathbf {1}_{2t}\in T\) and \({\mathfrak {b}}(\mathbf {v},\mathbf {w})=0\) for every \(\mathbf {v}\in T\)).

Lastly, we consider the controlled addition gate, which is a two-qudit gate:

$$\begin{aligned}&{\text {CADD}}^{\otimes t} r(T)^{\otimes 2} {\text {CADD}}^{\dagger ,\otimes t} = \sum _{(\mathbf {x},\mathbf {y})\in T} \sum _{(\mathbf {x}',\mathbf {y}')\in T} {\text {CADD}}^{\otimes t} \vert \mathbf {x},\mathbf {x}'\rangle \langle \mathbf {y},\mathbf {y}'\vert {\text {CADD}}^{\dagger ,\otimes t} \\&\quad = \sum _{(\mathbf {x},\mathbf {y})\in T} \sum _{(\mathbf {x}',\mathbf {y}')\in T} \vert \mathbf {x},\mathbf {x}+\mathbf {x}'\rangle \langle \mathbf {y},\mathbf {y}+\mathbf {y}'\vert = \sum _{(\mathbf {x},\mathbf {y})\in T} \sum _{(\mathbf {x}',\mathbf {y}')\in T} \vert \mathbf {x},\mathbf {x}'\rangle \langle \mathbf {y},\mathbf {y}'\vert \end{aligned}$$

where we only used that T is a subspace. \(\square \)
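The single-qudit part of this argument can be confirmed numerically. The sketch below is only an illustration: it takes \(d=t=3\) and the subspace \(T=\{(-\mathbf {y}+c\mathbf {1}_3,\mathbf {y}) : \mathbf {y}\cdot \mathbf {1}_3=0,\ c\in {\mathbb {Z}}_3\}\), which is our own choice of example and is not a permutation subspace, and checks that \(r(T)\) commutes with \(H^{\otimes t}\) and \(P^{\otimes t}\) up to floating-point error. All helper names are hypothetical.

```python
import cmath
from itertools import product

d, t = 3, 3
omega = cmath.exp(2j * cmath.pi / d)
vecs = list(product(range(d), repeat=t))   # computational basis labels
index = {v: i for i, v in enumerate(vecs)}
dim = d ** t

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Assumed example subspace: T = {(-y + c*1, y) : y.1 = 0 mod d, c in Z_d}.
T = {(tuple((c - yi) % d for yi in y), y)
     for y in vecs if sum(y) % d == 0 for c in range(d)}

def r_of(T):
    """r(T) = sum over (x, y) in T of |x><y| as a dense matrix."""
    M = [[0j] * dim for _ in range(dim)]
    for x, y in T:
        M[index[x]][index[y]] = 1 + 0j
    return M

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def max_commutator_entry(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return max(abs(AB[i][j] - BA[i][j]) for i in range(dim) for j in range(dim))

# H^{(x)t} has entries d^{-t/2} omega^{a.b}.
H_t = [[omega ** (dot(a, b) % d) / d ** (t / 2) for b in vecs] for a in vecs]

# P^{(x)t} is diagonal with entries omega^{2^{-1} sum_j a_j (a_j - 1)} for odd d.
inv2 = (d + 1) // 2  # inverse of 2 modulo odd d
P_t = [[omega ** ((inv2 * sum(aj * (aj - 1) for aj in a)) % d) if a == b else 0j
        for b in vecs] for a in vecs]

rT = r_of(T)
err_H = max_commutator_entry(H_t, rT)
err_P = max_commutator_entry(P_t, rT)
```

Both `err_H` and `err_P` should be at the level of rounding error. The controlled addition gate is omitted only because \(r(T)^{\otimes 2}\) is already a \(729\times 729\) matrix for these parameters.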

We now show that the operators R(T) are linearly independent as soon as \(n\ge t-1\). For this, we introduce the following useful notation:

Definition 4.6

(Vectorization). The vectorization operator \({{\,\mathrm{vec}\,}}\) is defined by its action in the computational basis via

$$\begin{aligned} {{\,\mathrm{vec}\,}}(\vert \mathbf {x}\rangle \langle \mathbf {y}\vert ) = \vert \mathbf {x}\rangle \otimes \vert \mathbf {y}\rangle = \vert \mathbf {x},\mathbf {y}\rangle . \end{aligned}$$

Lemma 4.7

If \(n\ge t-1\) then the operators R(T) are linearly independent.

Proof

For each \(T\in \Sigma _{t,t}(d)\), consider the vectorization of r(T), which we denote by \(\vert T\rangle :={{\,\mathrm{vec}\,}}{\left( r(T)\right) } = \sum _{\mathbf {v} \in T} \vert \mathbf {v}\rangle \in ({\mathbb {C}}^d)^{\otimes 2t}\). Note that \(\langle \mathbf {v} | T\rangle = \delta _{\mathbf {v} \in T}\). Clearly, \({{\,\mathrm{vec}\,}}{\left( R(T)\right) } = {{\,\mathrm{vec}\,}}{\left( r(T) \right) }^{\otimes n} = \vert T\rangle ^{\otimes n}\). Therefore, we want to show that the vectors \(\vert T\rangle ^{\otimes n}\) are linearly independent as soon as \(n\ge t-1\). But each T is t-dimensional and contains the vector \(\mathbf {1}_{2t}\). Extend it by \(\mathbf {v}_1,\dots ,\mathbf {v}_{t-1}\) to a basis of T. Then, if \(T'\) is another subspace:

$$\begin{aligned} \langle \mathbf {v}_1\vert \dots \langle \mathbf {v}_{t-1}\vert \langle 0\vert ^{\otimes n-(t-1)}\vert T'\rangle ^{\otimes n} = \langle \mathbf {v}_1\vert \dots \langle \mathbf {v}_{t-1}\vert \vert T'\rangle ^{\otimes (t-1)} = \delta _{\mathbf {v}_1,\dots ,\mathbf {v}_{t-1}\in T'} = \delta _{T,T'} \end{aligned}$$

This concludes the proof. \(\square \)

So far, we have accomplished the task of finding a large set of linearly independent operators in the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\), one for each element of \(\Sigma _{t,t}(d)\). In the remainder of this section we will compute the dimension of the commutant as well as the cardinality of \(\Sigma _{t,t}(d)\), and show that the two numbers agree precisely. We will use the Gaussian binomial coefficients, which are defined by

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) _d = \frac{[n]_d [n-1]_d \cdots [n-k+1]_d}{[k]_d [k-1]_d \cdots [1]_d}, \quad \text { where } [k]_d = \sum _{i=0}^{k-1} d^i. \end{aligned}$$

It is well-known that \(\left( {\begin{array}{c}n\\ k\end{array}}\right) _d\) equals the number of k-dimensional subspaces in \({\mathbb {Z}}_d^n\). The Gaussian binomial coefficients satisfy the following analogs of Pascal’s rule,

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) _d = d^k \left( {\begin{array}{c}n-1\\ k\end{array}}\right) _d + \left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) _d, \end{aligned}$$
(4.2)

and of the binomial formula,

$$\begin{aligned} \sum _{k=0}^n d^{k(k-1)/2} \left( {\begin{array}{c}n\\ k\end{array}}\right) _d t^k = \prod _{k=0}^{n-1} \left( d^k t + 1\right) . \end{aligned}$$
(4.3)
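Both identities are easy to test with exact integer arithmetic. The following Python sketch implements the Gaussian binomial coefficient directly from its definition and checks (4.2) and (4.3) on a small range of parameters; the function names are hypothetical.

```python
def q_int(k, d):
    """[k]_d = 1 + d + ... + d^(k-1)."""
    return sum(d ** i for i in range(k))

def gauss_binom(n, k, d):
    """Gaussian binomial coefficient; zero outside 0 <= k <= n."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q_int(n - i, d)
        den *= q_int(k - i, d)
    return num // den  # the quotient is always an exact integer

def pascal_holds(n, k, d):
    """Check Eq. (4.2)."""
    return gauss_binom(n, k, d) == (d ** k * gauss_binom(n - 1, k, d)
                                    + gauss_binom(n - 1, k - 1, d))

def binomial_formula_holds(n, d, x):
    """Check Eq. (4.3) at the integer point x (called t in the text)."""
    lhs = sum(d ** (k * (k - 1) // 2) * gauss_binom(n, k, d) * x ** k
              for k in range(n + 1))
    rhs = 1
    for k in range(n):
        rhs *= d ** k * x + 1
    return lhs == rhs
```

For instance, `gauss_binom(4, 2, 2)` counts the two-dimensional subspaces of \({\mathbb {Z}}_2^4\).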

We now compute the dimension of the commutant. This has previously been done for \(t\le 4\) by Zhu [Zhu15] and before that for \(d=2, n=1\) by van den Nest et al [vdNDdM05].

We start with the following result from [Zhu15], which reduces the dimension computation to a counting problem. Zhu arrived at this result by computing the frame potential of the Clifford group, which is essentially the norm squared of the character of the representation \(U\mapsto U^{\otimes t}\). In contrast, we will follow the approach by van den Nest et al, who considered the action of the Clifford average (also known as the twirl operation or Reynolds operator) on the basis of Weyl operators, correcting a glitch in [vdNDdM05] along the way (see Footnote 6).

Lemma 4.8

([Zhu15, vdNDdM05]). The dimension of the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\) is equal to the number of orbits for the diagonal action of the symplectic group \({{\,\mathrm{Sp}\,}}(2n,d)\) on \(t-1\) copies of the phase space \({\mathbb {Z}}_d^{2n}\), i.e., for the action

$$\begin{aligned} \Gamma \cdot (\mathbf {x}_1,\dots ,\mathbf {x}_{t-1}) = (\Gamma \mathbf {x}_1,\dots ,\Gamma \mathbf {x}_{t-1}), \end{aligned}$$
(4.4)

where \(\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)\) and \((\mathbf {x}_1,\dots ,\mathbf {x}_{t-1})\in ({\mathbb {Z}}_d^{2n})^{t-1}\).

Proof

We will show that the dimension of the commutant is equal to the number of orbits for the diagonal action of \({{\,\mathrm{Sp}\,}}(2n,d)\) on

$$\begin{aligned} {\mathcal {W}}_t :=\{ (\mathbf {x}_1,\dots ,\mathbf {x}_t) : \sum _{i=1}^t \mathbf {x}_i = 0 \}, \end{aligned}$$

which is plainly an equivalent statement.

We start by noting that the Weyl operators \(W_{\mathbf {x}}\) for \(\mathbf {x} = (\mathbf {x}_1,\dots ,\mathbf {x}_t)\in ({\mathbb {Z}}_d^{2n})^{t}\) form a basis of the space of operators on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\). We can thus obtain a generating set of the commutant by averaging each Weyl operator \(W_{\mathbf {x}}\) with respect to the tensor power action of the Clifford group. According to Lemma 2.1, we can for each symplectic matrix \(\Gamma \) fix a Clifford unitary \(U_\Gamma \) such that the set of \(\{U_\Gamma W_{\mathbf {b}}\}\) equals the Clifford group, up to global phases. Let us denote by \(f_\Gamma \) the phase function corresponding to \(U_\Gamma \), as in Eq. (2.8). Thus, the average of the Weyl operator \(W_{\mathbf {x}}\) is, up to overall normalization, given by

$$\begin{aligned} \Lambda _{{{\,\mathrm{Cliff}\,}}}(W_{\mathbf {x}})&:=d^{-2n} \sum _{\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)} \sum _{\mathbf {b}\in {\mathbb {Z}}_d^{2n}} \left( U_\Gamma W_{\mathbf {b}} \right) ^{\otimes t} W_{\mathbf {x}} \left( U_\Gamma W_{\mathbf {b}} \right) ^{\dagger ,\otimes t}\\&= d^{-2n} \sum _{\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)} \sum _{\mathbf {b}\in {\mathbb {Z}}_d^{2n}} \omega ^{[\mathbf {b},\mathbf {x}_1+\dots +\mathbf {x}_t]} U_\Gamma ^{\otimes t} W_{\mathbf {x}} U_\Gamma ^{\dagger ,\otimes t}\\&= \delta _{\mathbf {x} \in {\mathcal {W}}_t} \sum _{\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)} \omega ^{f_\Gamma (\mathbf {x})} W_{\Gamma \mathbf {x}}, \end{aligned}$$

where \(f_\Gamma (\mathbf {x}) = \sum _{i=1}^t f_\Gamma (\mathbf {x}_i)\) and \(\Gamma \mathbf {x}:=(\Gamma \mathbf {x}_1, \dots , \Gamma \mathbf {x}_t)\).

When d is odd, the phase function f can be chosen to vanish (Lemma 2.1). Thus, the averaged operator is equal to the sum of Weyl operators over the \({{\,\mathrm{Sp}\,}}(2n,d)\)-orbit of \(\mathbf {x}\), provided \(\mathbf {x}\in {\mathcal {W}}_t\), and zero otherwise. Since distinct orbits are disjoint, it is clear that we obtain a basis of the commutant by averaging one Weyl operator for each orbit of the diagonal action of \({{\,\mathrm{Sp}\,}}(2n,d)\) on \({\mathcal {W}}_t\).

Now consider the case where \(d=2\). To each \(\mathbf {x}\in {\mathcal {W}}_t\), associate the phase \(\phi _{\mathbf {x}}\) (a power of \(\tau \)) such that

$$\begin{aligned} W_{\mathbf {x}_1} \cdots W_{\mathbf {x}_t} = \phi _{\mathbf {x}}\, I. \end{aligned}$$

Then, for each \(\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)\),

$$\begin{aligned} \phi _{\mathbf {x}} \, I&= U_\Gamma W_{\mathbf {x}_1} \cdots W_{\mathbf {x}_t} U_\Gamma ^\dagger = (U_{\Gamma } W_{\mathbf {x}_1} U_{\Gamma }^\dagger ) \cdots (U_{\Gamma } W_{\mathbf {x}_t} U_{\Gamma }^\dagger ) \\&= \omega ^{f_\Gamma (\mathbf {x})} W_{\Gamma \mathbf {x}_1} \cdots W_{\Gamma \mathbf {x}_t} = \omega ^{f_\Gamma (\mathbf {x})} \phi _{\Gamma \mathbf {x}}. \end{aligned}$$

It follows that the phase function \(f_\Gamma (\mathbf {x})\) depends only on \(\mathbf {x}\) and \(\Gamma \mathbf {x}\) (rather than directly on \(\Gamma \)) and is given explicitly by the quotient

$$\begin{aligned} \omega ^{f_\Gamma (\mathbf {x})} = \frac{\phi _{\mathbf {x}}}{\phi _{\Gamma \mathbf {x}}}. \end{aligned}$$
(4.5)

Thus, for \(\mathbf {x} \in {\mathcal {W}}_t\),

$$\begin{aligned} \Lambda _{{{\,\mathrm{Cliff}\,}}}(W_{\mathbf {x}}) = \sum _{\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)} \frac{\phi _{\mathbf {x}}}{\phi _{\Gamma \mathbf {x}}} W_{\Gamma \mathbf {x}} = \phi _{\mathbf {x}} \sum _{\Gamma \in {{\,\mathrm{Sp}\,}}(2n,d)} \frac{W_{\Gamma \mathbf {x}}}{\phi _{\Gamma \mathbf {x}}}. \end{aligned}$$

In particular, if \(\mathbf {y}\) is in the same \({{\,\mathrm{Sp}\,}}(2n,d)\)-orbit as \(\mathbf {x}\) then \(\Lambda _{{{\,\mathrm{Cliff}\,}}}(W_{\mathbf {y}}) = \frac{\phi _{\mathbf {y}}}{\phi _{\mathbf {x}}} \Lambda _{{{\,\mathrm{Cliff}\,}}}(W_{\mathbf {x}})\), i.e., the two averaged operators only differ by a global phase. Thus, also for \(d=2\) we obtain a basis of the commutant by averaging one Weyl operator for each orbit of the diagonal action of \({{\,\mathrm{Sp}\,}}(2n,d)\) on \({\mathcal {W}}_t\). \(\square \)
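For very small parameters the orbit count of Lemma 4.8 can be obtained by brute force and compared with the product formula of Theorem 4.3. The Python sketch below assumes the symplectic form \([\mathbf {x},\mathbf {y}]=\sum _i (x_i y_{n+i}-x_{n+i}y_i) \pmod d\) on \({\mathbb {Z}}_d^{2n}\); a different coordinate ordering gives a conjugate group and hence the same number of orbits. All names are hypothetical.

```python
from itertools import product

def form(x, y, n, d):
    """Assumed symplectic form [x, y] = sum_i (x_i y_{n+i} - x_{n+i} y_i) mod d."""
    return sum(x[i] * y[n + i] - x[n + i] * y[i] for i in range(n)) % d

def symplectic_group(n, d):
    """All form-preserving 2n x 2n matrices over Z_d, stored as tuples of columns."""
    m = 2 * n
    vecs = list(product(range(d), repeat=m))
    basis = [tuple(int(i == j) for i in range(m)) for j in range(m)]
    group = []

    def extend(cols):
        k = len(cols)
        if k == m:
            group.append(tuple(cols))
            return
        for v in vecs:
            # Column k must pair with earlier columns as e_k pairs with e_i.
            if all(form(cols[i], v, n, d) == form(basis[i], basis[k], n, d)
                   for i in range(k)):
                extend(cols + [v])

    extend([])
    return group

def apply(cols, x, d):
    """Matrix-vector product Gamma x, with Gamma given by its columns."""
    m = len(cols)
    return tuple(sum(x[j] * cols[j][i] for j in range(m)) % d for i in range(m))

def count_orbits(n, d, t):
    """Number of orbits of the diagonal action (4.4) on t-1 copies of phase space."""
    group = symplectic_group(n, d)
    vecs = list(product(range(d), repeat=2 * n))
    seen, orbits = set(), 0
    for point in product(vecs, repeat=t - 1):
        if point in seen:
            continue
        orbits += 1
        for g in group:
            seen.add(tuple(apply(g, x, d) for x in point))
    return orbits
```

With these conventions, `count_orbits(2, 2, 3)` reproduces the value \(2\cdot 3=6\) for two qubits and \(t=3\). For \(n=1\), \(d=2\), \(t=3\), which lies outside the range \(n\ge t-1\), the same function returns 5, in line with the caveat of Remark 4.4.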

We now derive an explicit formula for the dimension of the commutant.

Theorem 4.9

(Dimension of commutant). Let \(n\ge t-1\). Then the dimension of the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\) is equal to \(\prod _{k=0}^{t-2} (d^k+1)\).

Proof

To count the number of orbits of the action (4.4), we will associate to any orbit O an invariant, the dimension, defined by \(\dim (O) = \dim {{\,\mathrm{span}\,}}~\{ \mathbf {x}_1,\dots ,\mathbf {x}_{t-1} \}\), where \((\mathbf {x}_1,\dots ,\mathbf {x}_{t-1})\) is any point in the orbit. We write \(\Omega _t\) for the set of all orbits and \(\Omega _t^\ell \) for the set of orbits with dimension \(\ell \). We will establish and solve the following recursion relation:

$$\begin{aligned} |\Omega _t^\ell | = |\Omega _{t-1}^\ell | d^\ell + |\Omega _{t-1}^{\ell -1}| d^{\ell -1} \end{aligned}$$
(4.6)

To see why this is true, suppose \((\mathbf {x}_1,\dots ,\mathbf {x}_{t-1})\in \Omega _t^\ell \). Then there are two cases:

  1. \(\mathbf {x}_{t-1} \in {{\,\mathrm{span}\,}}~\{\mathbf {x}_1,\dots ,\mathbf {x}_{t-2}\}\): Then the orbit through \((\mathbf {x}_1,\dots ,\mathbf {x}_{t-2})\) is in \(\Omega _{t-1}^\ell \), and there are \(d^\ell \) ways to choose \(\mathbf {x}_{t-1}\in {{\,\mathrm{span}\,}}\{\mathbf {x}_1,\dots ,\mathbf {x}_{t-2}\}\). Together, this contributes \(|\Omega _{t-1}^\ell | d^\ell \) many orbits to \(\Omega _t^\ell \).

  2. \(\mathbf {x}_{t-1} \not \in {{\,\mathrm{span}\,}}~\{\mathbf {x}_1,\dots ,\mathbf {x}_{t-2}\}\): Then the orbit through \((\mathbf {x}_1,\dots ,\mathbf {x}_{t-2})\) is in \(\Omega _{t-1}^{\ell -1}\), and we have to count the number of ways that we can add a new vector \(\mathbf {x}_{t-1}\) to \({{\,\mathrm{span}\,}}~\{\mathbf {x}_1,\dots ,\mathbf {x}_{t-2}\}\) such that we get different orbits. By Witt’s theorem, which also holds for alternating forms in characteristic two [Wil09], the only invariants are the inner products between \(\mathbf {x}_{t-1}\) and a basis of \({{\,\mathrm{span}\,}}~\{\mathbf {x}_1,\dots ,\mathbf {x}_{t-2}\}\). By assumption, the latter space has dimension \(\ell -1\le t-2<n\), so we have \(d^{\ell -1}\) options for \(\mathbf {x}_{t-1}\). Together, this contributes \(|\Omega _{t-1}^{\ell -1}| d^{\ell -1}\) many orbits to \(\Omega _t^\ell \).

We have thus established the recursion relation (4.6). Since \(\Omega _2^0 = \{ \{0\} \}\) and \(\Omega _2^1 = \{ \{\mathbf {x}_1 \ne 0\} \}\), we find the initial conditions \(|\Omega _2^0|=|\Omega _2^1|=1\). The solution to the recursion relation is

$$\begin{aligned} |\Omega _t^\ell | = d^{\ell (\ell -1)/2} \left( {\begin{array}{c}t-1\\ \ell \end{array}}\right) _d, \end{aligned}$$

as can be verified by using Pascal’s rule (4.2). Using the binomial formula (4.3), we conclude that

$$\begin{aligned} |\Omega _t| = \sum _{\ell =0}^{t-1} |\Omega _t^\ell | = \sum _{\ell =0}^{t-1} d^{\ell (\ell -1)/2} \left( {\begin{array}{c}t-1\\ \ell \end{array}}\right) _d = \prod _{k=0}^{t-2} (d^k+1). \end{aligned}$$
(4.7)

This establishes the desired formula for the dimension of the commutant. \(\square \)
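The recursion (4.6) and its closed-form solution can be compared numerically. The following Python sketch iterates (4.6) from the initial conditions at \(t=2\) and checks the result against \(d^{\ell (\ell -1)/2}\left( {\begin{array}{c}t-1\\ \ell \end{array}}\right) _d\) and against the product in (4.7); the function names are hypothetical.

```python
def gauss_binom(n, k, d):
    """Gaussian binomial coefficient, using [m]_d = (d^m - 1) / (d - 1)."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= d ** (n - i) - 1   # the k factors of (d - 1) cancel
        den *= d ** (i + 1) - 1
    return num // den

def orbit_counts(t, d):
    """Return {ell: |Omega_t^ell|} by iterating recursion (4.6) from t = 2."""
    counts = {0: 1, 1: 1}
    for _ in range(t - 2):
        top = max(counts) + 1
        counts = {ell: counts.get(ell, 0) * d ** ell
                  + (counts.get(ell - 1, 0) * d ** (ell - 1) if ell else 0)
                  for ell in range(top + 1)}
    return counts

def closed_form(t, d):
    return {ell: d ** (ell * (ell - 1) // 2) * gauss_binom(t - 1, ell, d)
            for ell in range(t)}

def product_formula(t, d):
    result = 1
    for k in range(t - 1):
        result *= d ** k + 1
    return result
```

For example, `orbit_counts(4, 2)` gives the counts 1, 7, 14, 8 for \(\ell =0,1,2,3\), which sum to 30.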

Next, we count the number of stochastic Lagrangian subspaces. To this end, define the “diagonal subspace”

$$\begin{aligned} \Delta = \{ (\mathbf {x}, \mathbf {x}) \,|\, \mathbf {x}\in {\mathbb {Z}}_d^t \}\subset {\mathbb {Z}}_d^{2t}. \end{aligned}$$

Theorem 4.10

(Cardinality of \(\Sigma _{t,t}\)). We have \(\left|\Sigma _{t,t}(d)\right|=\prod _{k=0}^{t-2} (d^k+1)\).

Proof

Let \(\Sigma _{t,t}^\ell (d)\) denote the set of subspaces \(T\in \Sigma _{t,t}(d)\) such that \(\dim (T\cap \Delta )=t-\ell \). We will show that

$$\begin{aligned} \left|\Sigma _{t,t}^\ell (d)\right|= d^{\ell (\ell -1)/2} \left( {\begin{array}{c}t-1\\ \ell \end{array}}\right) _d, \end{aligned}$$
(4.8)

which implies the claim by the same calculation as in Eq. (4.7). To start, consider a subspace \(T\in \Sigma _{t,t}^\ell (d)\) and consider

$$\begin{aligned} T_\Delta :=T \cap \Delta = \{ (\mathbf {x},\mathbf {x}) : \mathbf {x} \in X \}, \end{aligned}$$

with X a \((t-\ell )\)-dimensional subspace that is uniquely determined by T. Since T is stochastic, we know that \(\mathbf {1}_t\in X\). Fix a basis \(\mathbf {x}_1,\dots ,\mathbf {x}_{t-\ell }\) of X and extend it by vectors \(\mathbf {z}_1,\dots ,\mathbf {z}_\ell \) to a basis of \({\mathbb {Z}}_d^t\). Denote the dual basis with respect to the ordinary dot product by \(\hat{\mathbf {x}}_1,\dots ,\hat{\mathbf {x}}_{t-\ell }\), \(\hat{\mathbf {z}}_1,\dots ,\hat{\mathbf {z}}_\ell \). Now, any vector in \({\mathbb {Z}}_d^{2t}\), so particularly in T, can be written uniquely in the form \((\mathbf {a}+\mathbf {b},\mathbf {b})\). The subspace of vectors where \(\mathbf {b}\) is a linear combination of \(\mathbf {z}_1,\dots ,\mathbf {z}_\ell \) forms a complement of \(T_\Delta \subseteq T\), which we shall denote by \(T_N\). Since \(T_N \cap \Delta = \{0\}\), we know that \(\mathbf {a}\ne 0\) for any nonzero vector in \(T_N\). The condition that T is self-orthogonal implies that \(\mathbf {a} \cdot \mathbf {x}_i = 0\) for all \(i=1,\dots ,t-\ell \), so that \(\mathbf {a}\) is a linear combination of \(\hat{\mathbf {z}}_1,\dots ,\hat{\mathbf {z}}_\ell \). Since also \(\dim T_N=\ell \), this implies that \(T_N\) has a unique basis of the form \((\hat{\mathbf {z}}_1+\mathbf {w}_1,\mathbf {w}_1)\), ..., \((\hat{\mathbf {z}}_\ell +\mathbf {w}_\ell ,\mathbf {w}_\ell )\), where each \(\mathbf {w}_i\) is of the form \(\mathbf {w}_i = \sum _{j=1}^\ell A_{ij} \mathbf {z}_j\). We still need to implement the condition that \(T_N\) is self-orthogonal. In terms of the matrix \(A=(A_{ij})\), this means that

$$\begin{aligned} 0 = (\hat{\mathbf {z}}_i+\mathbf {w}_i) \cdot (\hat{\mathbf {z}}_j+\mathbf {w}_j) - \mathbf {w}_i \cdot \mathbf {w}_j = \hat{\mathbf {z}}_i\cdot \hat{\mathbf {z}}_j + A_{ij} + A_{ji} \pmod d \end{aligned}$$
(4.9)

for any \(i, j\). This means that the lower triangular part of A is uniquely determined by the upper triangular part.

For \(d\ne 2\), (4.9) furthermore implies that the diagonal entries of A are fixed, so there are in total \(d^{\ell (\ell -1)/2}\) many options for A. We have thus implemented all conditions for T to be a subspace in \(\Sigma _{t,t}^\ell (d)\) since, according to Remark 4.2, for \(d\ne 2\), any self-orthogonal T is automatically totally isotropic. The \((t-\ell )\)-dimensional subspaces of \({\mathbb {Z}}_d^t\) that contain \(\mathbf {1}_t\) are in bijection with the \((t-\ell -1)\)-dimensional subspaces in \({\mathbb {Z}}_d^t/{\mathbb {Z}}_d \mathbf {1}_t\), hence there are \(\left( {\begin{array}{c}t-1\\ t-\ell -1\end{array}}\right) _d=\left( {\begin{array}{c}t-1\\ \ell \end{array}}\right) _d\) many choices for X. Together, we obtain (4.8).

For \(d=2\), (4.9) gives no constraint about the diagonal entries of A. Instead, it asserts that \(\hat{\mathbf {z}}_i \cdot \hat{\mathbf {z}}_i = 0\) or, equivalently, that \(\hat{\mathbf {z}}_i \cdot \mathbf {1}_t = 0\) for \(i=1,\dots ,\ell \), which is automatically satisfied since \(\mathbf {1}_t\in X\). We will now show that there is a unique choice for the diagonal entries of A such that T is totally isotropic with respect to the \({\mathbb {Z}}_4\)-valued quadratic form \({\mathfrak {q}}\). By the discussion in Remark 4.2, since T is self-orthogonal, it suffices to consider \(T_\Delta \) and its complement \(T_N\) separately. But the vectors in \(T_\Delta \) are automatically isotropic, while for \(T_N\) total isotropy amounts to the condition that

$$\begin{aligned} 0 = (\hat{\mathbf {z}}_i+\mathbf {w}_i) \cdot (\hat{\mathbf {z}}_i+\mathbf {w}_i) - \mathbf {w}_i \cdot \mathbf {w}_i = \hat{\mathbf {z}}_i\cdot \hat{\mathbf {z}}_i + 2 A_{ii} \pmod 4, \end{aligned}$$

which fixes the \(A_{ii}\) uniquely. We thus obtain (4.8) by the same counting as above. \(\square \)
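The count in Theorem 4.10 can be cross-checked by exhaustive enumeration for small t and d. The Python sketch below grows totally isotropic subspaces one dimension at a time, starting from the span of \(\mathbf {1}_{2t}\). It assumes our reading of Definition 4.1: T has dimension t, contains \(\mathbf {1}_{2t}\), and satisfies \(\mathbf {x}\cdot \mathbf {x}=\mathbf {y}\cdot \mathbf {y} \pmod D\) for all \((\mathbf {x},\mathbf {y})\in T\), with \(D=4\) for \(d=2\) and \(D=d\) otherwise. The function name is hypothetical.

```python
from itertools import product

def enumerate_sigma(t, d):
    """Return all stochastic Lagrangian subspaces as frozensets of vectors."""
    D = 4 if d == 2 else d  # assumed modulus of the quadratic form

    def q(v):
        return (sum(a * a for a in v[:t]) - sum(a * a for a in v[t:])) % D

    def b(v, w):
        left = sum(v[i] * w[i] for i in range(t))
        right = sum(v[i] * w[i] for i in range(t, 2 * t))
        return (left - right) % d

    def add(v, w, c):
        return tuple((vi + c * wi) % d for vi, wi in zip(v, w))

    isotropic = [v for v in product(range(d), repeat=2 * t) if q(v) == 0]
    zero, ones = (0,) * (2 * t), (1,) * (2 * t)
    level = {frozenset(add(zero, ones, c) for c in range(d))}
    for _ in range(t - 1):
        nxt = set()
        for S in level:
            for v in isotropic:
                # Only extend by new vectors orthogonal to everything in S.
                if v in S or any(b(v, s) for s in S):
                    continue
                span = frozenset(add(s, v, c) for s in S for c in range(d))
                if all(q(u) == 0 for u in span):
                    nxt.add(span)
        level = nxt
    return level
```

Every element of \(\Sigma _{t,t}(d)\) is reached this way, since any basis of T that extends \(\mathbf {1}_{2t}\) yields a chain of totally isotropic subspaces. For instance, `len(enumerate_sigma(3, 3))` should equal \(2\cdot 4=8\) and `len(enumerate_sigma(4, 2))` should equal \(2\cdot 3\cdot 5=30\).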

We finally obtain Theorem 4.3 as a consequence of the preceding results.

Proof of Theorem 4.3

By combining Lemmas 4.5 and 4.7 with Theorems 4.9 and 4.10, we see that the operators R(T) form a basis of the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\) on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\): they lie in the commutant, they are linearly independent, and their number equals its dimension. \(\square \)

It is interesting to note that all elements R(T) of our basis of the commutant of \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\) have the property that \(\langle S^{\otimes t} | R(T) | S^{\otimes t}\rangle =1\) for every stabilizer state \(\vert S\rangle \). Indeed, if \(T\in \Sigma _{t,t}(d)\) and \(\vert S\rangle =U \vert 0\rangle ^{\otimes n}\) for some Clifford unitary U, then

$$\begin{aligned} \langle S^{\otimes t} | R(T) | S^{\otimes t}\rangle&=\langle S^{\otimes t} | R(T) U^{\otimes t} | 0^{\otimes tn}\rangle = \langle S^{\otimes t} | U^{\otimes t} R(T) | 0^{\otimes tn}\rangle \nonumber \\&= \langle 0^{\otimes tn} | R(T) | 0^{\otimes tn}\rangle = 1, \end{aligned}$$
(4.10)

where we used that \(\mathbf {0}\in T\) (see also Eq. (4.13) below).

4.2 Structure of the commutant

Theorem 4.3 is in the spirit of Schur–Weyl duality in that it establishes a natural basis of the commutant of the tensor power action of the Clifford group (a subgroup of the unitary group), generalizing the permutation operators. Yet, in contrast to the permutation group, \(\Sigma _{t,t}(d)\) is not in general a group and the operators R(T) for \(T\in \Sigma _{t,t}(d)\) are not always invertible. In this section we show that \(\Sigma _{t,t}(d)\) has a rich algebraic structure.

We first observe that there is a maximal subset of \(\Sigma _{t,t}(d)\) that carries a group structure such that the R(T) form a (unitary) representation. The following definition and lemma identify these elements:

Definition 4.11

(\(O_t\)) Consider the quadratic form \(q:{\mathbb {Z}}_d^t \rightarrow {\mathbb {Z}}_D\) defined by \(q(\mathbf {x}) :=\mathbf {x}\cdot \mathbf {x}\) (see Footnote 7). We define \(O_t(d)\) as the group of \(t\times t\)-matrices O with entries in \({\mathbb {Z}}_d\) that satisfy the following properties:

  1. O is a q-isometry: i.e., \(O\mathbf {x}\cdot O\mathbf {x}=\mathbf {x}\cdot \mathbf {x}\pmod D\) for all \(\mathbf {x}\in {\mathbb {Z}}_d^t\).

  2. O is stochastic: \(O \mathbf {1}_t = \mathbf {1}_t \pmod d\).

We will refer to \(O_t(d)\) as the stochastic orthogonal group; its elements will be called stochastic isometries.

To see that \(O_t(d)\) forms a group we only need to observe that \(O^{-1} = O^T\) is again in \(O_t(d)\). The following remark is completely analogous to Remark 4.2.

Remark 4.12

Recall that a linear map is an isometry with respect to a quadratic form q if \(q(O\mathbf {x})=q(\mathbf {x})\) for all \(\mathbf {x}\in {\mathbb {Z}}_d^t\). This justifies our terminology in Definition 4.11. As before, we note that q is a \({\mathbb {Z}}_D\)-valued quadratic form associated to the \({\mathbb {Z}}_d\)-bilinear form \(\mathbf {x}\cdot \mathbf {y}\) in the sense of [Woo93], namely,

$$\begin{aligned} q(\mathbf {x} + \mathbf {y}) = q(\mathbf {x}) + q(\mathbf {y}) + 2 \mathbf {x} \cdot \mathbf {y} \pmod D. \end{aligned}$$
(4.11)

In particular, any \(O\in O_t(d)\) is an orthogonal matrix in the ordinary sense that \(O^T O = I \pmod d\), i.e., \(O\mathbf {x} \cdot O\mathbf {y}=\mathbf {x}\cdot \mathbf {y}\) for all \(\mathbf {x}, \mathbf {y}\in {\mathbb {Z}}_d^t\). If d is odd then \(q(\mathbf {x})=\mathbf {x}\cdot \mathbf {x}\), so any orthogonal matrix is automatically a q-isometry.

If \(d=2\) then Eq. (4.11) implies that an orthogonal matrix O is a q-isometry provided that \(q(\mathbf {x})=1\pmod 4\) for every column of O or, equivalently, for every row of O (since \(O^T = O^{-1}\)). In particular, any q-isometry is automatically stochastic.

The significance of Definition 4.11 is the following observation.

Lemma 4.13

For every \(O\in O_t(d)\), the subspace

$$\begin{aligned} T_O:=\{(O\mathbf {x},\mathbf {x})\,:\, \mathbf {x} \in {\mathbb {Z}}_d^t\} \end{aligned}$$

is an element of \(\Sigma _{t,t}(d)\) and the operators

$$\begin{aligned} r(O) :=r(T_O) = \sum _{\mathbf {x}} \vert O\mathbf {x}\rangle \langle \mathbf {x}\vert , \quad R(O) :=r(O)^{\otimes n} = R(T_O) \end{aligned}$$
(4.12)

are unitary. Conversely, if R(T) is invertible then \(T=T_O\) for some \(O\in O_t(d)\). Moreover, the operators R(O) define a unitary representation of \(O_t(d)\) on \(({\mathbb {C}}^d)^{\otimes tn}\).

Proof

Only the converse needs justification. Note that in order for R(T) to be invertible, both subspaces \(\{ \mathbf {x} : (\mathbf {x},\mathbf {y})\in T \}\) and \(\{ \mathbf {y} : (\mathbf {x},\mathbf {y})\in T\}\) of \({\mathbb {Z}}_d^t\) should be t-dimensional (corresponding to r(T) having full row and column rank). The claim now follows easily. \(\square \)

We will often regard \(O_t(d)\) as a subset of \(\Sigma _{t,t}(d)\) via the assignment \(O \mapsto T_O\). Note that any permutation matrix satisfies the conditions of Definition 4.11, so we can consider \(S_t\) as a subgroup of \(O_t(d)\) for every value of d, and hence as a subset of \(\Sigma _{t,t}(d)\).
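For small t and d the group \(O_t(d)\) can be listed exhaustively from Definition 4.11. The following Python sketch enumerates all matrices whose rows sum to \(1 \pmod d\) (which is the stochasticity condition) and keeps those that preserve q on every vector. It assumes \(D=4\) for \(d=2\) and \(D=d\) otherwise, and the function name is hypothetical.

```python
from itertools import product

def stochastic_orthogonal_group(t, d):
    """Return all O in O_t(d) as tuples of rows, by brute force."""
    D = 4 if d == 2 else d  # assumed modulus of the quadratic form q
    vecs = list(product(range(d), repeat=t))

    def apply(O, x):
        return tuple(sum(O[i][j] * x[j] for j in range(t)) % d for i in range(t))

    def q(x):
        return sum(a * a for a in x) % D

    # O 1_t = 1_t holds exactly when every row sums to 1 modulo d.
    stochastic_rows = [r for r in vecs if sum(r) % d == 1]
    return [O for O in product(stochastic_rows, repeat=t)
            if all(q(apply(O, x)) == q(x) for x in vecs)]

def is_permutation_matrix(O):
    t = len(O)
    return (all(sorted(row) == [0] * (t - 1) + [1] for row in O)
            and all(sum(O[i][j] for i in range(t)) == 1 for j in range(t)))
```

For the three cases \((t,d)=(3,2)\), \((3,3)\) and \((4,2)\) this search returns only permutation matrices, so \(O_t(d)=S_t\) there. In the latter two cases this is strictly smaller than \(\Sigma _{t,t}(d)\), which has 8 and 30 elements, respectively.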

Remark 4.14

The Clifford group is a t-design (for \(n\ge t-1\)) if and only if \(S_t = O_t(d) = \Sigma _{t,t}(d)\), i.e., if and only if

$$\begin{aligned} t! = |S_t| = |\Sigma _{t,t}(d)| = \prod _{k=0}^{t-2} (d^k+1). \end{aligned}$$

This identity always holds up to \(t=2\), and up to \(t=3\) precisely in the case of qubits (\(d=2\)). Thus the multiqubit Clifford group is a 3-design (but not a 4-design), while in higher dimensions the Clifford group is only a 2-design (but not a 3-design), reproducing prior beautiful results [Zhu15, Web16].
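The counting criterion of Remark 4.14 is simple enough to tabulate. The short Python sketch below compares \(t!\) with \(\prod _{k=0}^{t-2}(d^k+1)\); the function names are hypothetical.

```python
from math import factorial

def sigma_size(t, d):
    """|Sigma_{t,t}(d)| = prod_{k=0}^{t-2} (d^k + 1), as in Theorem 4.10."""
    size = 1
    for k in range(t - 1):
        size *= d ** k + 1
    return size

def counting_criterion(t, d):
    """True exactly when t! equals |Sigma_{t,t}(d)|."""
    return factorial(t) == sigma_size(t, d)

# Largest t <= 6 satisfying the criterion, for a few prime dimensions.
largest_t = {d: max(t for t in range(1, 7) if counting_criterion(t, d))
             for d in (2, 3, 5, 7)}
```

For qubits the criterion fails at \(t=4\) because \(|\Sigma _{4,4}(2)|=30\) exceeds \(4!=24\).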

For \(O\in O_t(d)\), Eq. (4.10) implies that

$$\begin{aligned} R(O) \vert S\rangle ^{\otimes t} = \vert S\rangle ^{\otimes t} \end{aligned}$$
(4.13)

for every stabilizer state \(\vert S\rangle \). That is, stochastic isometries stabilize any stabilizer tensor power. We will return to discussing the implications of this important fact in Sect. 5 below.

Next, we note that the group \(O_t(d)\) naturally acts on the elements of \(\Sigma _{t,t}(d)\) from left and right, suggesting that it is the natural symmetry group of \(\Sigma _{t,t}(d)\).

Definition 4.15

(Left and right action on subspaces). Consider a subspace \(T\in \Sigma _{t,t}(d)\) and a matrix \(O \in O_t(d)\). We define the left action of O on T as follows:

$$\begin{aligned} OT=\{(O \mathbf {x},\mathbf {y})\, : \, (\mathbf {x},\mathbf {y}) \in T\}. \end{aligned}$$
(4.14)

Similarly, the right action of O on T is defined as:

$$\begin{aligned} TO=\{( \mathbf {x},O^T \mathbf {y})\, : \, (\mathbf {x},\mathbf {y}) \in T\}. \end{aligned}$$
(4.15)

It is easy to check that \(OT, TO \in \Sigma _{t,t}(d)\).

Note that this action is consistent with the composition of the operators R(T) and R(O): For all \(T\in \Sigma _{t,t}(d)\) and \(O,O'\in O_t(d)\) we have that

$$\begin{aligned} R(O) R(T) R(O')=R(OTO'). \end{aligned}$$

We can therefore decompose \(\Sigma _{t,t}(d)\) into a disjoint union of double cosets with respect to the left and right action:

$$\begin{aligned} \Sigma _{t,t}(d) = O_t(d) T_1 O_t(d) \cup \dots \cup O_t(d) T_k O_t(d), \end{aligned}$$
(4.16)

where \(T_1,\dots ,T_k\) are choices of subspaces in \(\Sigma _{t,t}(d)\) that represent the different double cosets. We note that \(O_t(d)\) is always one of the double cosets in Eq. (4.16), corresponding to, e.g., the choice \(T_1=\Delta \).

We will now derive a complete classification of the double cosets. We start with the central definition. Recall the quadratic form \(q:{\mathbb {Z}}_d^t \rightarrow {\mathbb {Z}}_D\), \(q(\mathbf {x}) :=\mathbf {x}\cdot \mathbf {x}\) from Definition 4.11.

Definition 4.16

(Defect subspaces). A defect subspace is a subspace \(N\subseteq {\mathbb {Z}}_d^t\) with the following properties:

  1. N is totally q-isotropic: i.e., \(q(\mathbf {x})=0\pmod D\) for all \(\mathbf {x}\in N\).

  2. N is co-stochastic: \(\mathbf {1}_t\in N^\perp \), i.e., \(\mathbf {x}\cdot \mathbf {1}_t=0\pmod d\) for every \(\mathbf {x}\in N\).

The quotient \(N^\perp /N\) inherits a \({\mathbb {Z}}_D\)-valued quadratic form, which we also denote by \(q([\mathbf {y}]) :=q(\mathbf {y})\).

Given two defect subspaces N and M, we write \({{\,\mathrm{Iso}\,}}(N,M)\) for the set of defect isomorphisms, by which we mean invertible linear maps \(J :N^\perp /N \rightarrow M^\perp /M\) with the following two properties:

  1. J is a q-isometry: i.e., \(q(J[\mathbf {y}]) = q([\mathbf {y}])\) for all \([\mathbf {y}]\in N^\perp /N\).

  2. J is stochastic: \(J[\mathbf {1}_t] = [\mathbf {1}_t]\).

The inverse of J is again a map in \({{\,\mathrm{Iso}\,}}(M,N)\).

This definition is central as it allows us to construct elements in \(\Sigma _{t,t}(d)\). If N, M are defect subspaces and \(J\in {{\,\mathrm{Iso}\,}}(M,N)\), then

$$\begin{aligned} \begin{aligned} T&= \{ (\mathbf {x} + \mathbf {z}, \mathbf {y} + \mathbf {w}) : [\mathbf {y}]\in M^\perp /M, \, [\mathbf {x}]=J[\mathbf {y}], \, \mathbf {z}\in N, \, \mathbf {w}\in M \} \\&= \{ (\mathbf {x}, \mathbf {y}) : \mathbf {y}\in M^\perp , \, \mathbf {x} \in J[\mathbf {y}] \} \end{aligned} \end{aligned}$$
(4.17)

is an element in \(\Sigma _{t,t}(d)\). Note that, necessarily, \(\dim N=\dim M\) (since J is invertible) and \(\mathbf {1}_t\in N\) if and only if \(\mathbf {1}_t\in M\) (since J is also stochastic). We now show that all elements of \(\Sigma _{t,t}(d)\) can be obtained in this way.
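As a concrete instance of the construction (4.17), take \(d=t=3\) and \(N=M={{\,\mathrm{span}\,}}\{\mathbf {1}_3\}\), which is a defect subspace because \(\mathbf {1}_3\cdot \mathbf {1}_3=3=0 \pmod 3\). The quotient \(N^\perp /N\) is one-dimensional, and \(J=\pm 1\) are defect isomorphisms. The Python sketch below builds the two resulting subspaces and checks that they are stochastic and Lagrangian; the example and all names are our own choice for illustration.

```python
from itertools import product

d, t = 3, 3
vecs = list(product(range(d), repeat=t))
ones = (1,) * t

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % d

# Defect subspace N = M = span{1_t} and its orthogonal complement.
N = {tuple(c * o % d for o in ones) for c in range(d)}
N_perp = [y for y in vecs if all(dot(y, z) == 0 for z in N)]

def build_T(sign):
    """T = {(x, y) : y in M^perp, x in J[y]} with J[y] = [sign * y]."""
    return {(tuple((sign * yi + zi) % d for yi, zi in zip(y, z)), y)
            for y in N_perp for z in N}

def in_sigma(T):
    """Size d^t, self-orthogonal for b = x.x' - y.y', and contains (1_t, 1_t)."""
    self_orthogonal = all((dot(x, x2) - dot(y, y2)) % d == 0
                          for x, y in T for x2, y2 in T)
    closed = all((tuple((a + c) % d for a, c in zip(x, x2)),
                  tuple((a + c) % d for a, c in zip(y, y2))) in T
                 for x, y in T for x2, y2 in T)
    return len(T) == d ** t and closed and self_orthogonal and (ones, ones) in T

T_plus, T_minus = build_T(1), build_T(-1)
left_defect = {x for x, y in T_plus if y == (0,) * t}
```

Since d is odd here, self-orthogonality already implies total isotropy. Neither subspace is of the form \(T_O\), because the left defect subspace `left_defect` equals N and is therefore nonzero.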

Proposition 4.17

Let \(T\in \Sigma _{t,t}(d)\).

  1. The subspaces \(T_{LD} :=\{\mathbf {x} : (\mathbf {x},0)\in T\}\) and \(T_{RD} :=\{\mathbf {y} : (0,\mathbf {y})\in T\}\) are defect subspaces. We call them the left and right defect subspaces of T, respectively.

  2. \(\dim T_{LD}=\dim T_{RD}\) and \(\mathbf {1}_t\in T_{LD}\) if and only if \(\mathbf {1}_t\in T_{RD}\).

  3. \(T_{LD}^\perp = T_L :=\{ \mathbf {x} : (\mathbf {x},\mathbf {y}) \in T\}\) and \(T_{RD}^\perp = T_R :=\{ \mathbf {y} : (\mathbf {x},\mathbf {y}) \in T\}\).

  4. For every \(\mathbf {y}\in T_{RD}^\perp \), choose some \(\mathbf {x}(\mathbf {y})\) such that \((\mathbf {x}(\mathbf {y}), \mathbf {y})\in T\). Then \(T_J:[\mathbf {y}]\mapsto [\mathbf {x}(\mathbf {y})]\) is a well-defined defect isomorphism, i.e., an element in \({{\,\mathrm{Iso}\,}}(T_{RD}, T_{LD})\).

  5. The data \((T_{LD}, T_{RD}, T_J)\) is uniquely determined by T.

  6. T is of the form (4.17), with \(T_{LD}=N\), \(T_{RD}=M\), and \(T_J=J\).

Proof

The first claim is clear from Definition 4.1. Since T is stochastic, \((\mathbf {1}_t,0)\in T\) if and only if \((0,\mathbf {1}_t) = \mathbf {1}_{2t}-(\mathbf {1}_t,0)\in T\), which proves half of the second claim. Next, consider the maps

$$\begin{aligned} \pi _L:T\rightarrow {\mathbb {Z}}_d^t, \; (\mathbf {x}, \mathbf {y})\mapsto \mathbf {x} \quad \text {and}\quad \pi _R:T\rightarrow {\mathbb {Z}}_d^t, \; (\mathbf {x}, \mathbf {y})\mapsto \mathbf {y}. \end{aligned}$$

Then \(T_{LD}\cong \ker \pi _R\) and \(T_{RD}\cong \ker \pi _L\), while \(T_L={{\,\mathrm{ran}\,}}\pi _L\) and \(T_R={{\,\mathrm{ran}\,}}\pi _R\). Note that \(T_L\subseteq T_{LD}^\perp \) and \(T_R\subseteq T_{RD}^\perp \), since T is totally isotropic. Using the rank-nullity theorem,

$$\begin{aligned} \dim T_{LD}^\perp&= \dim T - \dim T_{LD} = \dim T/\ker \pi _R = \dim {{\,\mathrm{ran}\,}}\pi _R = \dim T_R \le \dim T_{RD}^\perp , \\ \dim T_{RD}^\perp&= \dim T - \dim T_{RD} = \dim T/\ker \pi _L = \dim {{\,\mathrm{ran}\,}}\pi _L = \dim T_L \le \dim T_{LD}^\perp . \end{aligned}$$

Adding the two inequalities we see that they must both be equalities, hence \(\dim T_{LD} = \dim T_{RD}\) as well as \(T_L=T_{LD}^\perp \), \(T_R=T_{RD}^\perp \). This establishes the second and third claim.

For the fourth claim, first recall from above that \(T_{LD}^\perp =T_L\) and \(T_{RD}^\perp =T_R\), which means that for any \(\mathbf {y}\in T_{RD}^\perp \) there exists some \(\mathbf {x}\in T_{LD}^\perp \) such that \((\mathbf {x},\mathbf {y})\in T\). Next, suppose that \((\mathbf {x},\mathbf {y}),(\mathbf {x}',\mathbf {y}')\in T\) such that \(\mathbf {y}-\mathbf {y}'\in T_{RD}\). Then \((0,\mathbf {y}-\mathbf {y}')\in T\), so

$$\begin{aligned} (\mathbf {x}-\mathbf {x}',0) = (\mathbf {x},\mathbf {y}) - (0,\mathbf {y}-\mathbf {y}') - (\mathbf {x}',\mathbf {y}') \in T, \end{aligned}$$

which means that \(\mathbf {x}-\mathbf {x}'\in T_{LD}\). As a consequence, \([\mathbf {y}]\mapsto [\mathbf {x}(\mathbf {y})]\) is well-defined as a map from \(T_{RD}^\perp /T_{RD}\) to \(T_{LD}^\perp /T_{LD}\). Using Definition 4.1, it is not hard to see that it defines an element of \({{\,\mathrm{Iso}\,}}(T_{RD}, T_{LD})\).

The fifth claim is clear by construction of \(T_{LD}\) and \(T_{RD}\) and from the fact that \([\mathbf {y}]\mapsto [\mathbf {x}(\mathbf {y})]\) is well-defined. And the last claim can be seen to hold since the right-hand side of (4.17) is clearly a subset of T for our choice of N, M, and J, but also of dimension t. \(\square \)
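As a concrete check of Proposition 4.17, the following sketch extracts \(T_{LD}\), \(T_{RD}\), \(T_L\) and \(T_R\) by brute force from one small subspace, namely the row space over \({\mathbb {F}}_3\) of the matrix that appears in Example 4.26 below (with \(-1\) written as 2). The helper names `fibre` and `is_coset` are hypothetical; the sketch only illustrates items 1 to 4 on this instance and is not a proof.

```python
from itertools import product

d, t = 3, 3
rows = [(1, 2, 0, 1, 2, 0), (0, 0, 0, 1, 1, 1), (1, 1, 1, 0, 0, 0)]  # -1 = 2 mod 3

def span(gens):
    """All F_d-linear combinations of the generators (brute force)."""
    n = len(gens[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % d for i in range(n))
            for coeffs in product(range(d), repeat=len(gens))}

def perp(N):
    return {x for x in product(range(d), repeat=t)
            if all(sum(a * b for a, b in zip(x, z)) % d == 0 for z in N)}

T = span(rows)
zero = (0,) * t
T_LD = {v[:t] for v in T if v[t:] == zero}   # left defect subspace
T_RD = {v[t:] for v in T if v[:t] == zero}   # right defect subspace
T_L = {v[:t] for v in T}                     # range of pi_L
T_R = {v[t:] for v in T}                     # range of pi_R

def fibre(y):
    """All x with (x, y) in T; item 4 says this is a single coset of T_LD."""
    return {v[:t] for v in T if v[t:] == y}

def is_coset(S, N):
    x0 = next(iter(S))
    return S == {tuple((a + b) % d for a, b in zip(x0, z)) for z in N}
```

Here both defect subspaces come out as the span of \(\mathbf {1}_3\), the ranges equal the orthogonal complements, and every fibre is a single coset, which is what makes \([\mathbf {y}]\mapsto [\mathbf {x}(\mathbf {y})]\) well-defined.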

Thus we can cleanly decompose a subspace T into the two defect subspaces \(T_{LD}\) and \(T_{RD}\) as well as the defect isomorphism \(T_J:[\mathbf {y}]\mapsto [\mathbf {x}(\mathbf {y})]\). When T corresponds to a stochastic isometry \(O\in O_t(d)\), i.e., \(T = T_O= \{(O\mathbf {y},\mathbf {y})\}\), then both defect subspaces are trivial and the defect isomorphism \(T_J\) can be identified with O itself.

According to Proposition 4.17, for every \(T\in \Sigma _{t,t}(d)\), the left and right defect subspaces \(T_{LD}\) and \(T_{RD}\) necessarily have the same dimension and \(\mathbf {1}_t\in T_{LD}\) if and only if \(\mathbf {1}_t\in T_{RD}\). Note that if \(T\in \Sigma _{t,t}(d)\) and \(O, O'\in O_t(d)\), then \(T' = O T O'\) has defect subspaces

$$\begin{aligned} T'_{LD} = O T_{LD} \quad \text {and}\quad T'_{RD} = (O')^T T_{RD} \end{aligned}$$
(4.18)

and defect isomorphism

$$\begin{aligned} T'_J :[\mathbf {y}] \mapsto [O \mathbf {x}(O' \mathbf {y})]. \end{aligned}$$
(4.19)

Which elements \(T'\in \Sigma _{t,t}(d)\) can be obtained in this way? Clearly, the left-right action by \(O_t(d)\) preserves the common dimension of the defect subspaces and whether the all-ones vector is contained. We will now show that these are the only two invariants.

Lemma 4.18

Let \(N, M\subseteq {\mathbb {F}}_d^t\) be two defect subspaces with \(\dim N=\dim M\) and \(\mathbf {1}_t\in N\) if and only if \(\mathbf {1}_t\in M\). Then there exists \(O\in O_t(d)\) such that \(ON=M\).

Proof

Let \({\tilde{N}} :=N+{\mathbb {Z}}_d\mathbf {1}_t\) and \({\tilde{M}} :=M+{\mathbb {Z}}_d\mathbf {1}_t\). The assumption implies that \(\dim {\tilde{N}}=\dim {\tilde{M}}\). Choose any linear isomorphism \({\tilde{O}}:{\tilde{N}}\rightarrow {\tilde{M}}\) such that \({\tilde{O}} \mathbf {1}_t = \mathbf {1}_t\). Since both N and M are totally isotropic and co-stochastic, \({\tilde{O}}\) is an isometry with respect to the symmetric bilinear form \(\mathbf {x}\cdot \mathbf {y}\).

If \(d>2\), we can directly apply the usual version of Witt’s lemma for symmetric bilinear forms of odd characteristic [Wil09] to see that \({\tilde{O}}\) extends to an isometry map O which by construction is also stochastic, i.e., \(O\in O_t(d)\).

For \(d=2\), we appeal to the version of Witt’s lemma from [Woo93] for the \({\mathbb {Z}}_4\)-valued quadratic form \(q(\mathbf {x})=\mathbf {x}\cdot \mathbf {x}\pmod 4\) from Remark 4.12. Here we need to verify two conditions: (i) \({\tilde{O}}\) should be an isometry with respect to q, i.e., \(q({\tilde{O}}\mathbf {x}) = q(\mathbf {x})\) for every \(\mathbf {x}\in {\tilde{N}}\). Since we already know that \({\tilde{O}}\) is orthogonal, it suffices to check this condition on a generating set. By construction, \({\tilde{O}}\mathbf {1}_t=\mathbf {1}_t\), so the condition is clearly true for \(\mathbf {x}=\mathbf {1}_t\). On the other hand, both defect subspaces are totally isotropic, which means that \(q({\tilde{O}}\mathbf {x})=0=q(\mathbf {x})\pmod 4\) for every \(\mathbf {x}\in N\). Thus, the first condition is satisfied. (ii) We also need to check that \({\tilde{N}} \cap I^\perp = {\tilde{M}} \cap I^\perp \), where \(I :=\{ \mathbf {y} \in {\mathbb {Z}}_2^t : \mathbf {y}\cdot \mathbf {y}=0 \pmod 2 \}\). But \(I = \mathbf {1}_t^\perp \) and hence \(I^\perp ={\mathbb {Z}}_2 \mathbf {1}_t\). By construction, \(\mathbf {1}_t \in {\tilde{N}}\) and \(\mathbf {1}_t\in {\tilde{M}}\), so the second condition is also satisfied. We conclude that \({\tilde{O}}\) extends to an isometry O with respect to the quadratic form q, which implies that \(O\in O_t(2)\) (Remark 4.12). \(\square \)

Corollary 4.19

Let \(N, M\subseteq {\mathbb {F}}_d^t\) be two defect subspaces with \(\dim N=\dim M\) and \(\mathbf {1}_t\in N\) if and only if \(\mathbf {1}_t\in M\). Then there exists \(T\in \Sigma _{t,t}(d)\) such that \(T_{LD}=N\) and \(T_{RD}=M\).

Proof

Take \(O\in O_t(d)\) as in Lemma 4.18. Then, \(O^T M = N\), \(O^T M^\perp =N^\perp \), and hence \(J:[\mathbf {y}] \mapsto [O^T \mathbf {y}]\) is a defect isomorphism. Then the subspace (4.17) has the desired defect spaces. \(\square \)

Lemma 4.20

Let \(J:N^\perp /N \rightarrow M^\perp /M\) be a defect isomorphism. Then there exists an \(O\in O_t(d)\) inducing J, i.e., \(ON=M\) and \([O\mathbf {x}]=J[\mathbf {x}]\) for every \([\mathbf {x}]\in N^\perp /N\).

Proof

(Sketch) Note that the existence of J implies that \(\dim M=\dim N\) as well as \(\mathbf {1}_t\in M\) if and only if \(\mathbf {1}_t\in N\). This means that we can choose a linear isomorphism \({\tilde{J}}:N^\perp \rightarrow M^\perp \) that fixes \(\mathbf {1}_t\), sends N to M, and induces J on the quotients. As in the proof of Lemma 4.18, we can use the appropriate version of Witt’s lemma to obtain the existence of an extension \(O\in O_t(d)\). \(\square \)

Corollary 4.21

(Equivalence of double cosets). Let \(T, T'\in \Sigma _{t,t}(d)\). Then, \(T' \in O_t(d) T O_t(d)\) if and only if \(\dim T_{LD} = \dim T'_{LD}\) and \(\mathbf {1}_t\in T_{LD}\) if and only if \(\mathbf {1}_t\in T'_{LD}\). In particular, \(\Sigma _{t,t}(d)\) consists of no more than t double cosets.

Proof

It is clear that the two conditions are necessary. We will now argue that they are sufficient. First, use Lemma 4.18 to find O and \(O'\) such that \(O T'_{LD} = T_{LD}\) and \(O' T_{RD} = T'_{RD}\). Then \(T'' :=O T' O'\) is such that \(T''_{LD} = T_{LD}\) and \(T''_{RD} = T_{RD}\) (Eq. (4.18)). Next, use Lemma 4.20 to obtain some \(O''\) that induces the defect isomorphism \(T_J (T''_J)^{-1}:T_{LD}^\perp /T_{LD}\rightarrow T_{LD}^\perp /T_{LD}\). Then \(O'' T'' = T\), concluding the proof. For the last remark, note that the dimension of a totally isotropic subspace is never larger than t/2. If the dimension is zero, then it cannot contain \(\mathbf {1}_t\), while if the dimension is t/2 then it is Lagrangian, hence must contain \(\mathbf {1}_t\). Each intermediate dimension contributes at most two cases, depending on whether \(\mathbf {1}_t\) is contained. Hence the number of double cosets is at most \(1 + (t/2-1)2 + 1 = t\). \(\square \)

Remark 4.22

We can also restrict to either the left or the right action. In this case, the resulting cosets are classified by the right and left defect subspace, respectively, which is an arbitrary defect subspace in the sense of Definition 4.16.

Next, we give an explicit description of the operators \(R(T) = r(T)^{\otimes n}\) in terms of the defect subspaces and the defect isomorphism. If N is a defect subspace, define the coset states

$$\begin{aligned} \vert N,[\mathbf {x}]\rangle :=|N|^{-1/2} \sum _{\mathbf {z}\in N} \vert \mathbf {x} + \mathbf {z}\rangle , \end{aligned}$$

which form an orthonormal family for \([\mathbf {x}]\in N^\perp /N\).

Lemma 4.23

Let \(T\in \Sigma _{t,t}(d)\), with defect subspaces \(T_{LD}\), \(T_{RD}\) and defect isomorphism \(T_J:T_{RD}^\perp /T_{RD}\rightarrow T_{LD}^\perp /T_{LD}\). Then:

$$\begin{aligned} \frac{1}{|T_{LD}|} r(T) = \sum _{[\mathbf {y}]\in T_{RD}^\perp /T_{RD}}\vert T_{LD},T_J [\mathbf {y}]\rangle \langle T_{RD},[\mathbf {y}]\vert . \end{aligned}$$
(4.20)

Thus, r(T) is proportional to a partial isometry, and \({{\,\mathrm{rank}\,}}r(T)=|T_{LD}^\perp /T_{LD}|=|T_{RD}^\perp /T_{RD}|\).

Proof

We obtain (4.20) directly from Proposition 4.17 and (4.17). Since the coset states form two orthonormal families and \(T_J\) is a bijection, the formula for the rank follows at once. \(\square \)

Now consider the case when the left and right defect subspaces coincide and the defect isomorphism is trivial. That is,

$$\begin{aligned} \begin{aligned} T&= \{ (\mathbf {x}+\mathbf {z},\mathbf {x}+\mathbf {w}) \;:\; [\mathbf {x}]\in N^\perp /N, \, \mathbf {z}, \mathbf {w}\in N \} \\&= \{ (\mathbf {x}, \mathbf {y}) \;:\; \mathbf {y} \in N^\perp , \mathbf {x}\in [\mathbf {y}] \} = \{ (\mathbf {x}, \mathbf {y}) \;:\; \mathbf {x} \in N^\perp , \mathbf {y}\in [\mathbf {x}] \}, \end{aligned} \end{aligned}$$
(4.21)

where \(N :=T_{LD} = T_{RD}\) is an arbitrary defect subspace. In view of Corollary 4.21, any double coset contains a subspace of this form. In this case, r(T) and R(T) are related to a well-known family of codes in quantum information theory. To state the result, define the Weyl operators of, respectively, multiply and shift type:

$$\begin{aligned} Z_{\mathbf {p}} = W_{\mathbf {p},\mathbf {0}} \qquad \text { and }\qquad X_{\mathbf {q}} = W_{\mathbf {0},\mathbf {q}}. \end{aligned}$$

Given any totally isotropic subspace \(N\subseteq {\mathbb {Z}}_d^t\), the set

$$\begin{aligned} {{\,\mathrm{CSS}\,}}(N):=\{ Z_{\mathbf {p}} X_{\mathbf {q}} : \mathbf {q}, \mathbf {p} \in N \} \end{aligned}$$

forms a stabilizer group of cardinality \(|N|^2\) (since N is self-orthogonal, the Weyl operators commute). Such codes are a simple variant of Calderbank-Shor-Steane (CSS) codes [Ste96b, CS96, Ste96a]. The projection onto the code space can be written as

$$\begin{aligned} P_{{{\,\mathrm{CSS}\,}}(N)} = \frac{1}{|N|^2} \sum _{\mathbf {q},\mathbf {p}\in N} Z_{\mathbf {p}} X_{\mathbf {q}} \end{aligned}$$
(4.22)

By taking the trace of Eq. (4.22), one finds the dimension of the code is given by \(d^{t-2\dim N}=|N^\perp /N|\). One can readily confirm that the coset states \(\vert N,[\mathbf {z}]\rangle \) for \([\mathbf {z}]\in N^\perp /N\) form an orthonormal basis, so

$$\begin{aligned} P_{{{\,\mathrm{CSS}\,}}(N)} = \sum _{[\mathbf {x}]\in N^\perp /N} \vert N,[\mathbf {x}]\rangle \langle N,[\mathbf {x}]\vert , \end{aligned}$$
(4.23)

In particular, all this applies in the situation of Eq. (4.21). It follows that r(T) and R(T) are proportional to orthogonal projections onto CSS codes associated with the defect subspaces:

Theorem 4.24

(CSS codes). Suppose that T is of the form (4.21), i.e., its left and right defect subspaces coincide and that the defect isomorphism is trivial. Let \(N :=T_{LD} = T_{RD}\). Then,

$$\begin{aligned} r(T) = |N| \, P_{{{\,\mathrm{CSS}\,}}(N)} = d^{\dim N} P_{{{\,\mathrm{CSS}\,}}(N)}. \end{aligned}$$

Conversely, if \(T\in \Sigma _{t,t}(d)\) is such that r(T) is an orthogonal projection, then T is of the form (4.21).

Proof

The formula for r(T) follows directly by comparing Eqs. (4.20) and (4.23).

Conversely, suppose that r(T) is an orthogonal projection. We see from Eq. (4.20) that the range of r(T) is spanned by the \(\vert T_{LD},[\mathbf {x}]\rangle \), so we must have

$$\begin{aligned} \vert T_{LD},[\mathbf {x}]\rangle = r(T) \vert T_{LD},[\mathbf {x}]\rangle = \sum _{[\mathbf {y}]\in T_{RD}^\perp /T_{RD}} \vert T_{LD},T_J [\mathbf {y}]\rangle \langle T_{RD},[\mathbf {y}] \,|\, T_{LD},[\mathbf {x}]\rangle . \end{aligned}$$

Since the coset states \(\vert T_{LD},[\mathbf {x}]\rangle \) form a basis, it follows that

$$\begin{aligned} \langle T_{RD},[\mathbf {y}] \,|\, T_{LD},[\mathbf {x}]\rangle = \delta _{[\mathbf {x}],T_J[\mathbf {y}]} \end{aligned}$$

for all \(\mathbf {x}\in T_{LD}^\perp \) and \(\mathbf {y}\in T_{RD}^\perp \). When \([\mathbf {x}]=T_J[\mathbf {y}]\), then \(\langle T_{RD},[\mathbf {y}] \;|\; T_{LD},[\mathbf {x}]\rangle = 1\), which implies that \(T_{LD} = T_{RD}\) (the inner product is at most \(|T_{LD}\cap T_{RD}|/|T_{LD}|\) in absolute value). Denoting the common defect subspace by N, it follows that \(\delta _{[\mathbf {x}],[\mathbf {y}]} = \langle N,[\mathbf {y}] | N,[\mathbf {x}]\rangle = \delta _{[\mathbf {x}],T_J[\mathbf {y}]}\), so \(T_J\) is trivial. \(\square \)
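The first half of Theorem 4.24 can be checked numerically on a small instance without any linear-algebra library. The sketch below takes \(d=2\), \(t=4\) and N spanned by \(\mathbf {1}_4\), stores \(r(T)=\sum _{(\mathbf {x},\mathbf {y})\in T}\vert \mathbf {x}\rangle \langle \mathbf {y}\vert \) as the set of index pairs, and verifies that \(r(T)^2=|N|\,r(T)\) and that r(T) is symmetric, which is what \(r(T)=|N|\,P\) with P an orthogonal projection requires. The sparse `multiply` helper is a hypothetical convenience, not something taken from the text.

```python
from itertools import product
from collections import Counter

d, t = 2, 4
N = {(0,) * t, (1,) * t}                      # defect subspace spanned by 1_4

def dot(x, y):
    return sum(a * b for a, b in zip(x, y)) % d

def add(x, y):
    return tuple((a + b) % d for a, b in zip(x, y))

N_perp = {x for x in product(range(d), repeat=t)
          if all(dot(x, z) == 0 for z in N)}

# T of the form (4.21), stored as pairs (x, y) with y in N^perp and x in y + N
T = {(add(y, z), y) for y in N_perp for z in N}

def multiply(A, B):
    """Matrix entries of r(A) r(B) as a Counter keyed by (row, column)."""
    by_row = {}
    for w, y in B:
        by_row.setdefault(w, []).append(y)
    out = Counter()
    for x, w in A:
        for y in by_row.get(w, []):
            out[(x, y)] += 1
    return out

square = multiply(T, T)                       # entries of r(T)^2
trace = sum(1 for x, y in T if x == y)        # tr r(T) = |T ∩ Δ|
rank = len(N_perp) // len(N)                  # |N^perp / N|
```

In this instance the trace equals \(|N|\cdot |N^\perp /N|\), consistent with \(r(T)=|N|\,P_{{{\,\mathrm{CSS}\,}}(N)}\) and the rank formula of Lemma 4.23.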

Finally, we can equip the set of subspaces \(\Sigma _{t,t}(d)\) with a semigroup structure, denoted by \(\circ \), such that the assignment \(T \mapsto R(T)\) becomes a representation, i.e.,

$$\begin{aligned} R(T_1) R(T_2) = |N_1 \cap N_2|^n \, R(T_1\circ T_2) = d^{n\dim (N_1 \cap N_2)} \, R(T_1\circ T_2). \end{aligned}$$
(4.24)

First, if \(T_1\) or \(T_2\) is associated with a stochastic isometry in \(O_t(d)\), then we can simply define \(T_1 \circ T_2\) as in Definition 4.15. In particular, the diagonal subspace is the identity element.

Next, consider the case that \(T_1\) and \(T_2\) are of the form (4.21), associated to defect subspaces \(N_1\) and \(N_2\). Then we may define \(T_1 \circ T_2\) as the Lagrangian stochastic subspace T with

$$\begin{aligned} \begin{aligned} T_{LD}&= (N_1 + N_2) \cap N_1^\perp = N_1^\perp \cap N_2 + N_1, \\ T_{RD}&= (N_1 + N_2) \cap N_2^\perp = N_1\cap N_2^\perp + N_2, \\ T_J&:T_{RD}^\perp /T_{RD} \rightarrow T_{LD}^\perp /T_{LD}, \quad [\mathbf {y}]\mapsto [\mathbf {x}] \end{aligned} \end{aligned}$$
(4.25)

where \(\mathbf {x}\) is such that \(\mathbf {x} - \mathbf {y} \in N_1 + N_2\).

Lemma 4.25

The data in (4.25) defines a subspace \(T\in \Sigma _{t,t}(d)\) such that Eq. (4.24) holds.

Proof

We first verify that \(T_{LD}\) is a defect subspace. Thus, let \(\mathbf {n}_1+\mathbf {n}_2\in N_1^\perp \), where \(\mathbf {n}_1\in N_1\) and \(\mathbf {n}_2\in N_2\). Then,

$$\begin{aligned} q(\mathbf {n}_1+\mathbf {n}_2) = q(\mathbf {n}_1) + q(\mathbf {n}_2) + 2\mathbf {n}_1\cdot \mathbf {n}_2 = 2 \mathbf {n}_1\cdot \mathbf {n}_2 = 0\pmod D. \end{aligned}$$

The second step holds since \(N_1\) and \(N_2\) are defect subspaces, so \(q(\mathbf {n}_1) = q(\mathbf {n}_2) = 0\pmod D\), and the third step holds since \(\mathbf {n}_2 \in N_1 + N_1^\perp = N_1^\perp \), so \(\mathbf {n}_1\cdot \mathbf {n}_2=0\pmod d\). Moreover, \(\mathbf {1}_t \in N_1^\perp \cap N_2^\perp \subseteq T_{LD}^\perp \), so \(T_{LD}\) is also co-stochastic. Similarly, one can check that \(T_{RD}\) is a defect subspace.

Next, we verify that \(T_J\) is a well-defined defect isomorphism. Note that

$$\begin{aligned} T_{LD}^\perp&= (N_1 + N_2^\perp ) \cap N_1^\perp = N_1^\perp \cap N_2^\perp + N_1, \\ T_{RD}^\perp&= (N_1^\perp + N_2) \cap N_2^\perp = N_1^\perp \cap N_2^\perp + N_2, \end{aligned}$$

which shows that for every \(\mathbf {y}\in T_{RD}^\perp \) there exists \(\mathbf {x}\in T_{LD}^\perp \) such that \(\mathbf {x}-\mathbf {y}\in N_1 + N_2\). The same holds vice versa, so the map

$$\begin{aligned} T_{RD}^\perp \rightarrow T_{LD}^\perp /T_{LD}, \quad \mathbf {y} \mapsto [\mathbf {x}] \end{aligned}$$
(4.26)

is surjective provided it is well-defined. Assume that \(\mathbf {x},\mathbf {x}'\in T_{LD}^\perp \) are two vectors such that \(\mathbf {x}-\mathbf {y}, \mathbf {x}'-\mathbf {y}\in N_1 + N_2\). Then, \(\mathbf {x} - \mathbf {x}' \in T_{LD}^\perp \cap (N_1 + N_2) = T_{LD}\), which shows that (4.26) is indeed well-defined. Note that its kernel is given by \(T_{RD}^\perp \cap (N_1 + N_2) = T_{RD}\). Thus, the induced map, which is precisely \(T_J\) from (4.25), is a well-defined invertible linear map. We still need to verify that \(T_J\) is an isometry and stochastic. The latter is clear, since \(\mathbf {1}_t - \mathbf {1}_t = 0 \in N_1 + N_2\). For the former, consider \([\mathbf {x}]\in T_{LD}^\perp /T_{LD}\) and \([\mathbf {y}]\in T_{RD}^\perp /T_{RD}\) such that \(\mathbf {y} - \mathbf {x} = \mathbf {n}_1 + \mathbf {n}_2\), where \(\mathbf {n}_1\in N_1\) and \(\mathbf {n}_2\in N_2\). Without loss of generality, we may assume that \(\mathbf {x}, \mathbf {y} \in N_1^\perp \cap N_2^\perp \), so in particular \(\mathbf {x} - \mathbf {y} \perp \mathbf {y}\) and \(\mathbf {n}_2 \in N_1^\perp \). Since \(N_1\) and \(N_2\) are totally isotropic, it follows that \(q(\mathbf {x} - \mathbf {y}) = 2\mathbf {n}_1\cdot \mathbf {n}_2 = 0 \pmod D\) and hence \(q(\mathbf {x}) = q(\mathbf {y}) + q(\mathbf {x} - \mathbf {y}) + 2\mathbf {y}\cdot (\mathbf {x}-\mathbf {y})=q(\mathbf {y})\pmod D\).

Finally, we will establish that Eq. (4.24) holds with \(T_1 \circ T_2 = T\). It suffices to prove the claim for \(n=1\):

$$\begin{aligned} r(T_1) r(T_2)&= \sum _{\mathbf {x}\in N_1^\perp } \sum _{\mathbf {y}\in N_2^\perp } \bigl |[\mathbf {x}] \cap [\mathbf {y}]\bigr | \, \vert \mathbf {x}\rangle \langle \mathbf {y}\vert = |N_1 \cap N_2| \sum _{\mathbf {x}\in N_1^\perp } \sum _{\mathbf {y}\in N_2^\perp } \delta _{\mathbf {y}-\mathbf {x} \in N_1 + N_2} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert \\&= |N_1 \cap N_2| \sum _{\mathbf {y}\in T_{RD}^\perp } \sum _{\mathbf {x} \in T_{LD}^\perp } \delta _{\mathbf {y}-\mathbf {x} \in N_1 + N_2} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert = |N_1 \cap N_2| \sum _{\mathbf {y}\in T_{RD}^\perp } \sum _{\mathbf {x}\in T_J[\mathbf {y}]} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert \\&= |N_1 \cap N_2| \, r(T). \end{aligned}$$

In the third step, we used that for \(\mathbf {x}\in N_1^\perp \) and \(\mathbf {y}\in N_2^\perp \), the condition that \(\mathbf {y}-\mathbf {x}\in N_1+N_2\) implies that \(\mathbf {x}\in T_{LD}^\perp \) and \(\mathbf {y} \in T_{RD}^\perp \), and in the fourth step we used the definition of \(T_J\). \(\square \)
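The composition rule of Lemma 4.25 can be tested by brute force for \(n=1\). The sketch below works over \({\mathbb {F}}_3\) with \(t=6\) and uses defect subspaces chosen only for illustration: \(N_1\) spanned by \((1,1,1,0,0,0)\), optionally together with \(\mathbf {1}_6\), and \(N_2\) spanned by \((0,1,1,1,0,0)\) and \(\mathbf {1}_6\). It builds \(T_1\), \(T_2\) as in Eq. (4.21), builds T from the data in Eq. (4.25), and compares \(r(T_1)r(T_2)\) with \(|N_1\cap N_2|\,r(T)\) entry by entry. All function names are hypothetical.

```python
from itertools import product
from collections import Counter

d, t = 3, 6

def dot(x, y):
    return sum(a * b for a, b in zip(x, y)) % d

def add(x, y):
    return tuple((a + b) % d for a, b in zip(x, y))

def sub(x, y):
    return tuple((a - b) % d for a, b in zip(x, y))

def span(gens):
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % d for i in range(t))
            for coeffs in product(range(d), repeat=len(gens))}

def perp(N):
    return {x for x in product(range(d), repeat=t)
            if all(dot(x, z) == 0 for z in N)}

def css_subspace(N):
    """Pairs (x, y) of Eq. (4.21): y in N^perp and x in y + N."""
    return {(add(y, z), y) for y in perp(N) for z in N}

def multiply(A, B):
    """Matrix entries of r(A) r(B) as a Counter keyed by (row, column)."""
    by_row = {}
    for w, y in B:
        by_row.setdefault(w, []).append(y)
    out = Counter()
    for x, w in A:
        for y in by_row.get(w, []):
            out[(x, y)] += 1
    return out

def compose(N1, N2):
    """The subspace T described by Eq. (4.25), with its defect subspaces."""
    S = {add(a, b) for a in N1 for b in N2}          # N1 + N2
    T_LD, T_RD = S & perp(N1), S & perp(N2)
    P_L, P_R = perp(T_LD), perp(T_RD)
    T = {(x, y) for y in P_R for x in P_L if sub(x, y) in S}
    return T, T_LD, T_RD

def check(N1, N2):
    T, _, _ = compose(N1, N2)
    lhs = multiply(css_subspace(N1), css_subspace(N2))
    rhs = Counter({pair: len(N1 & N2) for pair in T})
    return lhs == rhs and len(T) == d ** t

n1, a, ones = (1, 1, 1, 0, 0, 0), (0, 1, 1, 1, 0, 0), (1,) * t
case_trivial_overlap = check(span([n1]), span([a, ones]))         # |N1 ∩ N2| = 1
case_shared_ones = check(span([n1, ones]), span([a, ones]))       # |N1 ∩ N2| = 3
```

In the first case the left defect subspace of the composition is strictly larger than \(N_1\) (it also contains \(\mathbf {1}_6\)), while in the second case the prefactor \(|N_1\cap N_2|=3\) is nontrivial.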

Finally, if \(T_1\) and \(T_2\) are arbitrary subspaces in \(\Sigma _{t,t}(d)\) then we can always left and right multiply \(T_1\) and \(T_2\) by suitable stochastic isometries, thereby reducing to the preceding two cases (cf. Eq. (4.18)). The semigroup structure is highly useful for calculations (see Sect. 4.3 below and [Dam18]). We believe that the projections exhibited by Theorem 4.24 and the semigroup structure of Eq. (4.24) will be instrumental in understanding the fine-grained decomposition of \({\mathcal {H}}_n^{\otimes t}\) into irreducible representations of \({{\,\mathrm{Cliff}\,}}(n,d)\) and \(O_t(d)\), generalizing the results discussed below. First results in this direction will be reported in [MMG20], with a full analysis being a direction for future work.

4.3 Examples

It is instructive to compute the commutant for small values of t. One can verify that every subspace in \(\Sigma _{2,2}(d)\), as well as in \(\Sigma _{3,3}(2)\), corresponds to a permutation. That is, in these cases, \(\Sigma _{t,t}(d) = O_t(d) = S_t\). This is consistent with the fact that the Clifford group is always a unitary 2-design, and even a 3-design in the case of qubits [Zhu15].

For certain larger values of d and t, it is still true that \(\Sigma _{t,t}(d) = O_t(d)\), e.g., for \(t=3\) and \(d\equiv 2\pmod 3\) [NW16]. In this case, the double commutant theorem implies that we have a proper duality akin to Schur–Weyl duality:

$$\begin{aligned} (({\mathbb {C}}^d)^{\otimes n})^{\otimes t} = \bigoplus _\lambda V_{{{\,\mathrm{Cliff}\,}}(n,d),\lambda } \otimes V_{O_t(d),\lambda }, \end{aligned}$$
(4.27)

where the \(V_{{{\,\mathrm{Cliff}\,}}(n,d),\lambda }\) and \(V_{O_t(d),\lambda }\) are pairwise inequivalent irreducible representations of \({{\,\mathrm{Cliff}\,}}(n,d)\) and of \(O_t(d)\), respectively. It would be interesting to identify these representations further. In fact, it would be more appropriate to call Eq. (4.27) a form of Howe duality, of which the well-known dualities between metaplectic and orthogonal groups are an example.

In general, however, \(O_t(d)\) is a proper subset of \(\Sigma _{t,t}(d)\), and it is an open problem to obtain a complete duality theory in positive characteristic [How73, GH16]. We now discuss some explicit examples.

Example 4.26

(\(d=3\), \(t=3\)) In this case, \({\mathcal {O}}_3(3) = S_3\), and we have that [NW16]

$$\begin{aligned} \Sigma _{3,3}(3) = S_3 \cup S_3 \left[ \begin{array}{ccc|ccc} 1&{}-1&{}0 &{} 1&{}-1&{}0 \\ \hline 0&{}0&{}0 &{} 1&{}1&{}1 \\ \hline 1&{}1&{}1 &{} 0&{}0&{}0 \end{array}\right] S_3 \end{aligned}$$
(4.28)

where we identify the matrix with its row space, a Lagrangian stochastic subspace T. The double coset of T contains only two elements, T and (12)T. In total, \(\Sigma _{3,3}(3)\) contains \(6+2=8\) elements, which is in agreement with Theorem 4.10.

Next, we note that T corresponds to a CSS code as in Eq. (4.21) and Theorem 4.24, with defect subspace N spanned by the all-ones vector \(\mathbf {1}_3\). Thus, \(R(T) = 3^n P\), where

$$\begin{aligned} P :=P_{{{\,\mathrm{CSS}\,}}(N)}^{\otimes n} = p^{\otimes n}, \quad p :=\sum _{x=0}^2 \left( \frac{1}{\sqrt{3}} \sum _{y=0}^2 \vert x+y,y{-}x,y\rangle \right) \left( \frac{1}{\sqrt{3}} \sum _{z=0}^2 \langle x{+}z,z{-}x,z\vert \right) \end{aligned}$$
(4.29)

is a projector of rank \(3^n\) (Eq. (4.23)).

It is now straightforward to derive the decomposition of \((({\mathbb {C}}^3)^{\otimes n})^{\otimes 3}\) into irreducible representations of the Clifford group (for \(n\ge 2\)). We start with Schur–Weyl duality, which asserts that

$$\begin{aligned} (({\mathbb {C}}^3)^{\otimes n})^{\otimes 3}&= \bigoplus _{\lambda \vdash 3} V_{U(3^n),\lambda } \otimes V_{S_3,\lambda } \end{aligned}$$
(4.30)
$$\begin{aligned}&= {{\,\mathrm{Sym}\,}}^3(({\mathbb {C}}^3)^{\otimes n}) \;\oplus \; V_{U(3^n),(2,1)} \otimes V_{S_3,(2,1)} \;\oplus \; {{\,\mathrm{Alt}\,}}^3(({\mathbb {C}}^3)^{\otimes n}), \end{aligned}$$
(4.31)

where \(\lambda \) runs over all partitions of 3. By Eq. (4.28), the commutant is generated by \(S_3\) and the projection P. Since P commutes with all permutations, it follows that

$$\begin{aligned}&P_+ :=\Pi ^{\text {sym}}_3 P \Pi ^{\text {sym}}_3 = \frac{3^{-n}}{2} \left( R(T) + R((1 2) T) \right) , \\&P_- :=\Pi ^{\text {alt}}_3 P \Pi ^{\text {alt}}_3 = \frac{3^{-n}}{2} \left( R(T) - R((1 2) T) \right) \end{aligned}$$

are orthogonal projections onto subrepresentations of the Clifford group. We can compute their dimensions readily by using the formula \({{\,\mathrm{tr}\,}}[R(S)] = d^{n \dim (S \cap \Delta )}\):

$$\begin{aligned} \dim W_\pm = {{\,\mathrm{tr}\,}}[P_\pm ] = \frac{3^n \pm 1}{2} \end{aligned}$$
(4.32)

Thus we can decompose the symmetric and anti-symmetric subspaces further into four subrepresentations:

$$\begin{aligned} {{\,\mathrm{Sym}\,}}^3(({\mathbb {C}}^3)^{\otimes n})&\cong W_+ \oplus W_+^\perp , \\ {{\,\mathrm{Alt}\,}}^3(({\mathbb {C}}^3)^{\otimes n})&\cong W_- \oplus W_-^\perp . \end{aligned}$$

Since the commutant has dimension \(|\Sigma _{3,3}(3)|=8\), these four representations along with \(V_{U(3^n),(2,1)}\) (which appears twice in (4.30)) are necessarily irreducible and pairwise inequivalent. We have thus fully decomposed \((({\mathbb {C}}^3)^{\otimes n})^{\otimes 3}\) into irreducible representations of \({{\,\mathrm{Cliff}\,}}(n,3) \times S_3\).
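The counting in Example 4.26 can be reproduced by brute force. The sketch below generates the double coset \(S_3\,T\,S_3\) by permuting the \(\mathbf {x}\) and \(\mathbf {y}\) coordinates independently, computes \(\dim (S\cap \Delta )\) for \(S=T\) and \(S=(12)T\), and evaluates \({{\,\mathrm{tr}\,}}[P_\pm ]\) from the trace formula \({{\,\mathrm{tr}\,}}[R(S)] = d^{n \dim (S \cap \Delta )}\). The function names (`act`, `dim_diag`, `dim_W`) are hypothetical, and \(-1\) is written as 2 modulo 3.

```python
from itertools import product, permutations

d, t = 3, 3
rows = [(1, 2, 0, 1, 2, 0), (0, 0, 0, 1, 1, 1), (1, 1, 1, 0, 0, 0)]

def span(gens):
    n = len(gens[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % d for i in range(n))
            for coeffs in product(range(d), repeat=len(gens))}

T = span(rows)

def act(S, sigma, tau):
    """Permute the coordinates of x by sigma and those of y by tau."""
    return frozenset(tuple(v[sigma[i]] for i in range(t)) +
                     tuple(v[t + tau[i]] for i in range(t)) for v in S)

S_t = list(permutations(range(t)))
identity = tuple(range(t))
double_coset = {act(T, s, u) for s in S_t for u in S_t}
swap_T = act(T, (1, 0, 2), identity)          # the subspace (12)T

def dim_diag(S):
    """dim(S ∩ Δ), read off from the number of elements of the form (x, x)."""
    count = sum(1 for v in S if v[:t] == v[t:])
    k = 0
    while d ** k < count:
        k += 1
    return k

def dim_W(n, sign):
    """tr P_± = 3^{-n} (tr R(T) ± tr R((12)T)) / 2."""
    total = d ** (n * dim_diag(T)) + sign * d ** (n * dim_diag(swap_T))
    return total // (2 * d ** n)
```

The double coset has two elements, \(\dim (T\cap \Delta )=2\) and \(\dim ((12)T\cap \Delta )=1\), so `dim_W(n, +1)` and `dim_W(n, -1)` evaluate to \((3^n\pm 1)/2\), in agreement with Eq. (4.32).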

Next, we discuss some multi-qubit examples.

Example 4.27

(\(d=2\), \(t=4\)). As before, we find that \({\mathcal {O}}_4(2)=S_4\). In addition to the \(4!=24\) permutation subspaces, there exist 6 more Lagrangian subspaces in \(\Sigma _{4,4}(2)\)—making a total of 30, which is known to be the dimension of the commutant of the multi-qubit Clifford group for \(n\ge 3\) [Zhu15, (10)]. We can decompose \(\Sigma _{4,4}(2)\) into two double cosets in a form that is completely analogous to Eq. (4.28):

$$\begin{aligned} \Sigma _{4,4}(2) = S_4 \cup S_4 \left[ \begin{array}{cccc|cccc} 1&{}0&{}0&{}1 &{} 1&{}0&{}0&{}1 \\ 0&{}1&{}0&{}1 &{} 0&{}1&{}0&{}1 \\ \hline 0&{}0&{}0&{}0 &{} 1&{}1&{}1&{}1 \\ \hline 1&{}1&{}1&{}1 &{} 0&{}0&{}0&{}0 \end{array}\right] S_4 \end{aligned}$$
(4.33)

The given matrix is the generator matrix of a Lagrangian subspace which we denote by \(T_4\). Similarly to above, the operator \(R(T_4)\) is proportional to a projector onto a CSS code, with defect subspace spanned by the all-ones vector \(\mathbf {1}_4\). This projector is given by Eq. (3.6), and it can be used to decompose \((({\mathbb {C}}^2)^{\otimes n})^{\otimes 4}\) into irreducible representations of the Clifford group, as explained in [ZKGG16].
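The count \(24+6=30\) in Example 4.27 can likewise be confirmed by enumeration. The following sketch builds \(T_4\) as the row space over \({\mathbb {F}}_2\) of the matrix in Eq. (4.33), generates its double coset under independent coordinate permutations of \(\mathbf {x}\) and \(\mathbf {y}\), and compares with the 24 permutation subspaces \(\{(\pi \mathbf {y},\mathbf {y})\}\). The isotropy test assumes the qubit condition \(\mathbf {x}\cdot \mathbf {x}=\mathbf {y}\cdot \mathbf {y} \pmod 4\), which for binary vectors compares Hamming weights; names are hypothetical.

```python
from itertools import product, permutations

d, t = 2, 4
rows = [(1, 0, 0, 1, 1, 0, 0, 1),
        (0, 1, 0, 1, 0, 1, 0, 1),
        (0, 0, 0, 0, 1, 1, 1, 1),
        (1, 1, 1, 1, 0, 0, 0, 0)]

def span(gens):
    n = len(gens[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % d for i in range(n))
            for coeffs in product(range(d), repeat=len(gens))}

def act(S, sigma, tau):
    """Permute the coordinates of x by sigma and those of y by tau."""
    return frozenset(tuple(v[sigma[i]] for i in range(t)) +
                     tuple(v[t + tau[i]] for i in range(t)) for v in S)

T4 = span(rows)
S_t = list(permutations(range(t)))
double_coset = {act(T4, s, u) for s in S_t for u in S_t}

all_vectors = list(product(range(d), repeat=t))
perm_subspaces = {frozenset(tuple(y[s[i]] for i in range(t)) + y for y in all_vectors)
                  for s in S_t}

def isotropic_mod4(S):
    """Hamming weights of x and y agree modulo 4 for every (x, y) in S."""
    return all((sum(v[:t]) - sum(v[t:])) % 4 == 0 for v in S)

total = len(double_coset | perm_subspaces)
```

The enumeration yields 6 subspaces in the double coset of \(T_4\), disjoint from the 24 permutation subspaces, for a total of 30.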

Example 4.28

(\(d=2\), \(t=5\)). Likewise, for \(t=5\), it is not hard to see that (cf. [Dam18])

$$\begin{aligned} \Sigma _{5,5}(2) = S_5 \cup S_5 \left[ \begin{array}{ccccc|ccccc} 1&{}0&{}0&{}1&{}0 &{} 1&{}0&{}0&{}1&{}0 \\ 0&{}1&{}0&{}1&{}0 &{} 0&{}1&{}0&{}1&{}0 \\ 0&{}0&{}1&{}1&{}0 &{} 0&{}0&{}1&{}1&{}0 \\ 0&{}0&{}0&{}0&{}1 &{} 0&{}0&{}0&{}0&{}1 \\ \hline 0&{}0&{}0&{}0&{}0 &{} 1&{}1&{}1&{}1&{}0 \\ \hline 1&{}1&{}1&{}1&{}0 &{} 0&{}0&{}0&{}0&{}0 \end{array}\right] S_5. \end{aligned}$$
(4.34)

The displayed matrix corresponds to a subspace \(T_5\) of the form Eq. (4.21), with defect subspace spanned by the vector (1, 1, 1, 1, 0), and the operator \(R(T_5)\) is proportional to a projector onto a CSS code. Indeed, we have

$$\begin{aligned} R(T_5) = R(T_4) \otimes I_2^{\otimes n}. \end{aligned}$$

We now discuss some interesting elements in the groups \(O_t(d)\). For qubits, we have the class of anti-permutations introduced previously in Eq. (1.4).

Definition 4.29

(Anti-permutation). Let \(\pi \in S_t\). We define the anti-permutation \({\bar{\pi }}\) as the binary complement of the corresponding permutation matrix. Formally, it is the \(t\times t\)-matrix

$$\begin{aligned} {\bar{\pi }} = \mathbf {1}_t \mathbf {1}_t^T - \pi \end{aligned}$$

with entries in \({\mathbb {F}}_2\), where we identify \(\pi \) with the corresponding permutation matrix.

Lemma 4.30

Let \(\pi \in S_t\). If \(t\equiv 2\pmod 4\) then \({\bar{\pi }} \in O_t(2)\).

Proof

By Remark 4.12, it suffices to check that \({\bar{\pi }}\) is orthogonal and that \(q(\mathbf {x})=1\) for each column. The latter holds since each column of \({\bar{\pi }}\) contains \(t-1\equiv 1\pmod 4\) ones. For the former,

$$\begin{aligned} {\bar{\pi }}^T {\bar{\pi }} = (\mathbf {1}_t \mathbf {1}_t^T - \pi ^T) (\mathbf {1}_t \mathbf {1}_t^T - \pi ) = (t-2) \mathbf {1}_t \mathbf {1}_t^T + I \equiv I \pmod 2, \end{aligned}$$

where we used that ordinary permutation matrices are orthogonal and stochastic, as well as that t is even. \(\square \)

Remark 4.31

More generally, the entrywise binary complement maps any \(A\in O_t(2)\) to an element \({\bar{A}} \in O_t(2)\) provided that the rows of A each have Hamming weight w such that \(t\equiv 2w\pmod 4\).

For \(t\ge 6\), the anti-permutations are distinct from the permutations, so in particular \(O_t(2) \supsetneq S_t\). (For \(t=2\), the two sets coincide.)
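Lemma 4.30 is easy to confirm exhaustively for small t. The sketch below uses the criterion quoted in the proof (orthogonality modulo 2 together with \(q(\mathbf {x})=1 \pmod 4\) for each column, which we take from Remark 4.12 as stated there) and checks all anti-permutations for \(t=6\), a failing case for \(t=4\), and the coincidence with permutations for \(t=2\). Function names are hypothetical.

```python
from itertools import permutations

def permutation_matrix(perm):
    """0/1 matrix whose column j has its single 1 in row perm[j]."""
    t = len(perm)
    return [[1 if perm[j] == i else 0 for j in range(t)] for i in range(t)]

def anti_permutation(perm):
    """Entrywise binary complement of the permutation matrix."""
    return [[1 - entry for entry in row] for row in permutation_matrix(perm)]

def in_O_t_2(A):
    """Criterion used in the proof: orthogonal mod 2, column weights 1 mod 4."""
    t = len(A)
    cols = list(zip(*A))
    weights_ok = all(sum(c) % 4 == 1 for c in cols)
    orthogonal = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) % 2 == (1 if i == j else 0)
        for i in range(t) for j in range(t))
    return weights_ok and orthogonal

def as_key(A):
    return tuple(tuple(row) for row in A)

all_ok_t6 = all(in_O_t_2(anti_permutation(p)) for p in permutations(range(6)))
fails_t4 = not in_O_t_2(anti_permutation(tuple(range(4))))
perms_t2 = {as_key(permutation_matrix(p)) for p in permutations(range(2))}
antis_t2 = {as_key(anti_permutation(p)) for p in permutations(range(2))}
```

For \(t=4\) the column weight is \(3 \pmod 4\), so the criterion fails even though the matrix is orthogonal modulo 2.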

Example 4.32

(\(d=2\), \(t=6\)). The anti-identity \({\bar{\mathbb {1}}}\in {\mathcal {O}}_6(2)\) and the corresponding subspace \(T_{{\bar{\mathbb {1}}}} \subseteq {\mathbb {Z}}_2^6\oplus {\mathbb {Z}}_2^6\) are given by (cf. Eqs. (1.4) and (3.13)).

$$\begin{aligned} {\bar{\mathbb {1}}} = \begin{pmatrix} 0&{}1&{}1&{}1&{}1&{}1 \\ 1&{}0&{}1&{}1&{}1&{}1 \\ 1&{}1&{}0&{}1&{}1&{}1 \\ 1&{}1&{}1&{}0&{}1&{}1 \\ 1&{}1&{}1&{}1&{}0&{}1 \\ 1&{}1&{}1&{}1&{}1&{}0 \end{pmatrix}, \quad T_{{\bar{\mathbb {1}}}} = \left[ \begin{array}{cccccc|cccccc} 0&{}1&{}1&{}1&{}1&{}1 &{} 1&{}0&{}0&{}0&{}0&{}0\\ 1&{}0&{}1&{}1&{}1&{}1 &{} 0&{}1&{}0&{}0&{}0&{}0 \\ 1&{}1&{}0&{}1&{}1&{}1 &{} 0&{}0&{}1&{}0&{}0&{}0 \\ 1&{}1&{}1&{}0&{}1&{}1 &{} 0&{}0&{}0&{}1&{}0&{}0 \\ 1&{}1&{}1&{}1&{}0&{}1 &{} 0&{}0&{}0&{}0&{}1&{}0 \\ 1&{}1&{}1&{}1&{}1&{}0 &{} 0&{}0&{}0&{}0&{}0&{}1 \end{array}\right] . \end{aligned}$$

The anti-permutations admit several possible generalizations to odd primes d. One class of generalizations is given as follows. For \(\pi \in S_t\) with \(d\not \mid t\), define

$$\begin{aligned} {\bar{\pi }} = 2 t^{-1} \mathbf {1}_t \mathbf {1}_t^T - \pi , \end{aligned}$$
(4.35)

where \(t^{-1}\) denotes the multiplicative inverse of t in \({\mathbb {F}}_d\). It is easy to verify that \({\bar{\pi }} \in O_t(d)\). Moreover, \({\bar{\pi }}\) is the only nontrivial linear combination of \(\mathbf {1}_t \mathbf {1}_t^T\) and \(\pi \) with this property.

Another class of generalizations is given by the formula in Eq. (3.17). Let \(\mathbf {p}\in {\mathbb {F}}_d^t\) be a vector with entries in \(\{\pm 1\}\) that is ‘balanced’, i.e., \(\mathbf {p} \cdot \mathbf {1}_t = 0\) (this requires that t is even). If \(d \not \mid t\) and \(\pi \in S_t\) is a permutation that stabilizes \(\mathbf {p}\) up to a sign, i.e., \(\pi \mathbf {p} = \pm \mathbf {p}\), then

$$\begin{aligned} {\tilde{\pi }} = \pi \mp 2 t^{-1} \mathbf {p} \mathbf {p}^T \end{aligned}$$
(4.36)

is an element in \(O_t(d)\). In particular, this yields a large family of ‘anti-identities’ for odd d.
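Both generalizations can be verified directly for small parameters. The sketch below takes \(d=5\), \(t=4\) and checks that the matrices of Eqs. (4.35) and (4.36) are orthogonal and stochastic modulo d, which is the characterization of \(O_t(d)\) for odd d that we assume here. The balanced vector \(\mathbf {p}=(1,1,-1,-1)\) and the permutations are illustrative choices, the function names are hypothetical, and `pow(t, -1, d)` requires Python 3.8 or later.

```python
from itertools import permutations

d, t = 5, 4
t_inv = pow(t, -1, d)            # multiplicative inverse of t in F_d

def perm_matrix(perm):
    return [[1 if perm[j] == i else 0 for j in range(t)] for i in range(t)]

def in_O_t_d(A):
    """Orthogonal and stochastic modulo d (assumed criterion for odd d)."""
    cols = list(zip(*A))
    orthogonal = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) % d == (1 if i == j else 0)
        for i in range(t) for j in range(t))
    stochastic = all(sum(row) % d == 1 for row in A)
    return orthogonal and stochastic

def bar(perm):
    """Eq. (4.35): 2 t^{-1} 1 1^T - pi."""
    P = perm_matrix(perm)
    return [[(2 * t_inv - P[i][j]) % d for j in range(t)] for i in range(t)]

def tilde(perm, p):
    """Eq. (4.36): pi -/+ 2 t^{-1} p p^T, sign chosen from pi p = +/- p."""
    P = perm_matrix(perm)
    image = [sum(P[i][j] * p[j] for j in range(t)) for i in range(t)]
    if image == list(p):
        sign = 1
    elif image == [-a for a in p]:
        sign = -1
    else:
        raise ValueError("permutation does not stabilize p up to a sign")
    return [[(P[i][j] - sign * 2 * t_inv * p[i] * p[j]) % d for j in range(t)]
            for i in range(t)]

p = (1, 1, -1, -1)                                   # balanced: p . 1_t = 0
all_bar_ok = all(in_O_t_d(bar(q)) for q in permutations(range(t)))
anti_identity_ok = in_O_t_d(tilde((0, 1, 2, 3), p))  # pi = identity, pi p = p
sign_flip_ok = in_O_t_d(tilde((2, 3, 0, 1), p))      # pi p = -p
```

For these parameters \(2t^{-1}=3\) in \({\mathbb {F}}_5\), so for instance the anti-identity of Eq. (4.35) has diagonal entries 2 and off-diagonal entries 3.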

Another non-trivial example of a stochastic isometry can be constructed from the adjacency matrix A of the edge-vertex graph of the icosahedron. The icosahedron has 12 vertices, so A is a \(12 \times 12\) binary matrix. Any two vertices share either zero or two neighbors, which implies that A is orthogonal. Moreover, each vertex has 5 neighbors, which implies \(q(\mathbf {x})=1\) for each column \(\mathbf {x}\) of A. By Remark 4.12, it follows that \(A\in O_{12}(2)\). The space \(T_{{\bar{A}}}\) generated by the element-wise complement of A is just the extended Golay code \({\mathcal {G}}_{24}\). The latter plays an important role in the invariant theory of the Clifford group as detailed in [NRS06]. Note, however, that unlike A and \(T_A\), it is not the case that \({\bar{A}} \in O_{12}(2)\) or \(T_{{\bar{A}}}\in \Sigma _{12,12}(2)\). Working out the precise connection between \(T_A\), the extended Golay code, and their respective roles in the representation theory of the Clifford group is an interesting problem we leave open. Likewise, we leave open the question of whether R(A) can be given a physical interpretation, as was the case for the anti-identity.
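The stated properties of the icosahedron matrix can be checked from coordinates. The sketch below places the 12 vertices at the cyclic permutations of \((0,\pm 1,\pm \varphi )\) with \(\varphi \) the golden ratio (a standard embedding with edge length 2, chosen here for convenience), builds A, and verifies the degree and common-neighbor counts as well as the weight of the rows of \({\bar{A}}\). It also checks that the rows of \([{\bar{A}}\,|\,I]\) have weight 8 and are pairwise orthogonal modulo 2, which is consistent with, but does not by itself prove, the identification with the extended Golay code.

```python
from itertools import product
from math import sqrt

phi = (1 + sqrt(5)) / 2
vertices = []
for s1, s2 in product((1, -1), repeat=2):
    vertices += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

n = len(vertices)
# Two vertices are adjacent iff their squared distance is 4 (edge length 2)
A = [[1 if abs(dist2(u, v) - 4) < 1e-9 else 0 for v in vertices] for u in vertices]

def gram(B):
    """Integer matrix B B^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in B] for r in B]

common = gram(A)        # diagonal: degrees; off-diagonal: common neighbors
A_bar = [[1 - a for a in row] for row in A]
golay_rows = [tuple(row) + tuple(1 if i == j else 0 for j in range(n))
              for i, row in enumerate(A_bar)]

A_is_orthogonal = all(common[i][j] % 2 == (1 if i == j else 0)
                      for i in range(n) for j in range(n))
A_columns_q1 = all(sum(row) % 4 == 1 for row in A)          # A is symmetric
A_bar_columns_q1 = all(sum(row) % 4 == 1 for row in A_bar)  # False: weight 7
```

Since each row of \({\bar{A}}\) has weight \(7\equiv 3 \pmod 4\), the column condition fails for \({\bar{A}}\), matching the remark that \({\bar{A}}\notin O_{12}(2)\).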

5 Statistical Properties of Stabilizer States

In this section we discuss the statistical properties of the stabilizer states. We use the techniques that we developed in the last section to prove an explicit formula for the t-th moment of random stabilizer states, which vastly generalizes previous results in the quantum information literature [Zhu15, KG15, Web16, ZKGG16, HWW16]. Throughout this section, d is assumed to be a prime.

5.1 Moments of random stabilizer states

We start by studying the operator-valued t-th moment of the uniform distribution over all stabilizer states in \(({\mathbb {C}}^d)^{\otimes n}\):

$$\begin{aligned} {\mathbb {E}}\left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] :=\frac{1}{|{{\,\mathrm{Stab}\,}}(n,d) |} \sum _{\vert S\rangle \langle S\vert \in {{\,\mathrm{Stab}\,}}(n,d)} \vert S\rangle \langle S\vert ^{\otimes t} \end{aligned}$$
(5.1)

Clearly this operator can be used to calculate the average value of any polynomial of degree t in the coefficients of the wavefunction of a random stabilizer state.

Note that the operator \({\mathbb {E}}[\vert S\rangle \langle S\vert ^{\otimes t}]\) is invariant under conjugation by Clifford operators. This is because the set of stabilizer states is a single orbit of the Clifford group. Thus \({\mathbb {E}}[\vert S\rangle \langle S\vert ^{\otimes t}]\) is in the commutant of the Clifford group and, assuming \(n\ge t-1\), can be written in terms of the basis R(T) from Theorem 4.3,

$$\begin{aligned} {\mathbb {E}}\left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] =\sum _{T\in \Sigma _{t,t}(d)}{\gamma _T \, R(T)}, \end{aligned}$$
(5.2)

for certain coefficients \(\gamma _T\in {\mathbb {C}}\). In this section, we will show that these coefficients are all equal and establish an explicit formula for the t-th moment of a random stabilizer state which holds for all values of t and n. We start with some useful lemmas.

Remark 5.1

(Sum of traces of the R(T)). Recall that in order to establish Theorem 4.10, we determined the cardinality of the set \(\Sigma ^\ell _{t,t}(d)\), whose elements are the subspaces \(T\in \Sigma _{t,t}(d)\) with \(\dim (T\cap \Delta )=t-\ell \). The significance of the parameter \(\ell \) is that

$$\begin{aligned} {{\,\mathrm{tr}\,}}R(T) = \left( {{\,\mathrm{tr}\,}}r(T)\right) ^n = d^{(t-\ell ) n} \end{aligned}$$

for every subspace \(T\in \Sigma _{t,t}^\ell (d)\). Thus, we can e.g. compute the sum of the traces of all R(T) by using Eq. (4.8) and the Gaussian binomial formula Eq. (4.3):

$$\begin{aligned} \sum _{T\in \Sigma _{t,t}(d)} {{\,\mathrm{tr}\,}}R(T) = \sum _{\ell =0}^{t-1} \bigl |\Sigma _{t,t}^\ell (d)\bigr |d^{(t-\ell )n} = d^{nt} \prod _{k=0}^{t-2} (1+d^{k-n}) = d^{n} \prod _{k=0}^{t-2} (d^k+d^{n}). \end{aligned}$$

This number can be expressed in terms of the q-Pochhammer symbol as

$$\begin{aligned} \sum _{T\in \Sigma _{t,t}(d)} {{\,\mathrm{tr}\,}}R(T) = d^{nt} (-d^{-n}; d)_{t-1}. \end{aligned}$$

For \(n=0\), we recover the cardinality of \(\Sigma _{t,t}(d)\), in agreement with Theorem 4.10.
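As a sanity check, the three closed forms in this remark can be compared in exact rational arithmetic. This is a small sketch; the helper names are ours, and the q-Pochhammer symbol is taken to be \((a;q)_m = \prod _{k=0}^{m-1} (1 - a q^k)\).

```python
from fractions import Fraction
from math import prod


def q_pochhammer(a, q, m):
    """(a; q)_m = prod_{k=0}^{m-1} (1 - a q^k)."""
    return prod(1 - a * q**k for k in range(m))


def trace_sum_forms(n, d, t):
    """The three expressions for sum_T tr R(T) from Remark 5.1."""
    first = d ** (n * t) * prod(1 + Fraction(d) ** (k - n) for k in range(t - 1))
    second = d**n * prod(d**k + d**n for k in range(t - 1))
    third = d ** (n * t) * q_pochhammer(-Fraction(1, d**n), Fraction(d), t - 1)
    return first, second, third


def sigma_cardinality(d, t):
    """The n = 0 case: prod_{k=0}^{t-2} (d^k + 1)."""
    return trace_sum_forms(0, d, t)[1]
```

For example, `sigma_cardinality(3, 3)` and `sigma_cardinality(3, 4)` evaluate to 8 and 80, the values of \(|\Sigma _{t,t}(d)|\) quoted in Remark 6.3.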

Next, we prove a formula that relates moments of stabilizer states for different numbers of qudits.

Lemma 5.2

Let \(N\ge n>0\). Then:

$$\begin{aligned}&\left( I^{\otimes nt} \otimes \langle 0\vert ^{\otimes (N-n)t} \right) {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(N,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \\ {}&\left( I^{\otimes nt} \otimes \vert 0\rangle ^{\otimes (N-n)t} \right) \;\propto \; {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] , \end{aligned}$$

and both operators are nonzero.

Proof

If \(\vert S\rangle \in {{\,\mathrm{Stab}\,}}(N,d)\) is a stabilizer state then the partial projection \(\left( I \otimes \langle 0\vert ^{\otimes (N-n)}\right) \vert S\rangle \) is either zero or proportional to a stabilizer state in \({{\,\mathrm{Stab}\,}}(n,d)\) (see, e.g., [HNQ+16, App. G]). Thus, there exist coefficients \(\alpha _{S'}\) for \(S'\in {{\,\mathrm{Stab}\,}}(n,d)\) such that

$$\begin{aligned}&\left( I^{\otimes nt} \otimes \langle 0\vert ^{\otimes (N-n)t} \right) {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(N,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \left( I^{\otimes nt} \otimes \vert 0\rangle ^{\otimes (N-n)t} \right) \\ {}&\quad = \sum _{S'\in {{\,\mathrm{Stab}\,}}(n,d)} \alpha _{S'} \vert S'\rangle \langle S'\vert ^{\otimes t} \end{aligned}$$

It is clear that the left-hand side operator is nonzero and invariant under conjugation by \(U^{\otimes t}\) for any Clifford unitary \(U\in {{\,\mathrm{Cliff}\,}}(n,d)\). Thus, we can replace the right-hand side by its Clifford average, which is plainly proportional to \({\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \). \(\square \)

Theorem 5.3

(t-th moment). Let \(n,t\ge 1\). Then the t-th moment of a random stabilizer state in \(({\mathbb {C}}^d)^{\otimes n}\) is given by the formula

$$\begin{aligned} {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] =\frac{1}{Z_{n,d,t}} \sum _{T\in \Sigma _{t,t}(d)} R(T), \end{aligned}$$
(5.3)

where \(Z_{n,d,t}=d^n\prod _{k=0}^{t-2} (d^k+d^n)=d^{nt} (-d^{-n}; d)_{t-1}\).

Proof

It suffices to argue that the left-hand side and right-hand side are proportional, since the formula for the proportionality constant \(Z_{n,d,t}\) follows immediately from Remark 5.1 and comparing traces. Fixing d and t, we will proceed in two steps. First, we will argue that there exist \(\alpha _n\in {\mathbb {C}}\) and \(\beta _T\in {\mathbb {C}}\) for each \(T\in \Sigma _{t,t}(d)\) such that

$$\begin{aligned} {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] = \alpha _n \sum _{T\in \Sigma _{t,t}(d)} \beta _T \, r(T)^{\otimes n}. \end{aligned}$$
(5.4)

We will show this first for \(n\ge t-1\) and then for all n. Afterwards, we will find that the \(\beta _T\) are necessarily equal, which as just discussed implies the claim.

Let us first assume that \(n\ge t-1\). By Theorem 4.3, there exist coefficients \(\gamma _{n,T}\in {\mathbb {C}}\) such that

$$\begin{aligned} {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] = \sum _{T\in \Sigma _{t,t}(d)} \gamma _{n,T} \, r(T)^{\otimes n}, \end{aligned}$$
(5.5)

since the left-hand side commutes with arbitrary t-th tensor powers of Clifford unitaries. It follows that, for every \(N\ge n\),

$$\begin{aligned} \begin{aligned}&\left( I^{\otimes nt} \otimes \langle 0\vert ^{\otimes (N-n)t} \right) {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(N,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \left( I^{\otimes nt} \otimes \vert 0\rangle ^{\otimes (N-n)t} \right) \\&\quad = \sum _{T\in \Sigma _{t,t}(d)} \gamma _{N,T} \, r(T)^{\otimes n} \langle 0^{\otimes t} | r(T) | 0^{\otimes t}\rangle ^{N-n} = \sum _{T\in \Sigma _{t,t}(d)} \gamma _{N,T} \, r(T)^{\otimes n}, \end{aligned} \end{aligned}$$
(5.6)

since \(\mathbf {0}\in T\) for every subspace T. From Lemma 5.2 we know that Eqs. (5.5) and (5.6) are proportional and nonzero. Since the operators \(r(T)^{\otimes n}\) are also linearly independent for \(n\ge t-1\), it follows that there exist \(\alpha _n\) and \(\beta _T\) such that \(\gamma _{n,T} = \alpha _n \beta _T\) for all \(n\ge t-1\) (e.g., we can choose \(\beta _T :=\gamma _{t-1,T}\)). Thus, we have established Eq. (5.4) for \(n\ge t-1\). To extend its validity to all values of n, we observe that Eq. (5.6) holds also when \(N\ge t-1>n\). Together with Lemma 5.2, we find that, indeed,

$$\begin{aligned} {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \propto \sum _{T\in \Sigma _{t,t}(d)} \gamma _{N,T} \, r(T)^{\otimes n} \propto \sum _{T\in \Sigma _{t,t}(d)} \beta _T \, r(T)^{\otimes n}, \end{aligned}$$

which shows that there exist constants \(\alpha _n\) and \(\beta _T\) such that Eq. (5.4) holds for all values of n.

We will now argue that, in this case, the \(\beta _T\) are necessarily all equal. For this, we compute the expectation value of an operator \(R(T)^\dagger = r(T)^{\otimes n,\dagger }\). On the one hand, by Eq. (4.10),

$$\begin{aligned} {{\,\mathrm{tr}\,}}\left[ R(T)^\dagger {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \right] = {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \langle S^{\otimes t} | R(T)^\dagger | S^{\otimes t}\rangle = 1. \end{aligned}$$

On the other hand,

$$\begin{aligned}&{{\,\mathrm{tr}\,}}\left[ R(T)^\dagger {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \right] \\&\quad = \alpha _n \sum _{T'\in \Sigma _{t,t}(d)} \beta _{T'} \, \left( {{\,\mathrm{tr}\,}}r(T)^\dagger r(T') \right) ^n\\&\quad = \alpha _n \sum _{T'\in \Sigma _{t,t}(d)} \beta _{T'} \, d^{n \dim (T \cap T')} = \alpha _n d^{nt} \left( \beta _T + O(d^{-n}) \right) \end{aligned}$$

in the limit of large n. Thus, all \(\beta _T\) must be equal, and the statement of the theorem follows. \(\square \)

Remark 5.4

Theorem 5.3 is reminiscent of the following average formula for the tensor power of a Haar-random pure state \(\psi \) in \({\mathbb {C}}^D\), which follows from Schur’s lemma and Schur–Weyl duality:

$$\begin{aligned} {\mathbb {E}}_{\psi \text { Haar}} \left[ \vert \psi \rangle \langle \psi \vert ^{\otimes t} \right] = \frac{1}{\prod _{k=0}^{t-1} (k+D)} \sum _{\pi \in S_t} R(\pi ). \end{aligned}$$
(5.7)

Remark 5.5

When d is an odd prime, \(\Sigma _{t,t}(d)=S_t\) for \(t\le 2\), but not for \(t\ge 3\). Thus Eqs. (5.3) and (5.7) match for \(t\le 2\) and deviate for \(t\ge 3\). Since the operators R(T) are linearly independent for sufficiently large n, this shows that stabilizer states in odd prime dimension are 2-designs, but not 3-designs or higher (provided \(n\ge 2\)). Similarly, for \(d=2\), \(\Sigma _{t,t}(2)=S_t\) for \(t\le 3\), but not for \(t\ge 4\), which shows that multiqubit stabilizer states form 3-designs, but not 4-designs or higher [KG15] (provided \(n\ge 3\)).

Remarkably, the theory developed in this section allows us to design complex projective t-designs for any order t from the Clifford group orbits of a finite number of fiducial states. We explain this in Sect. 6 below.
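The design orders in Remark 5.5 can be illustrated in the smallest qubit case. The sketch below uses the standard frame-potential criterion, which is not derived in this text: N equally weighted pure states in \({\mathbb {C}}^D\) form a projective t-design if and only if \(N^{-2}\sum _{a,b} |\langle a|b\rangle |^{2t} = \binom{D+t-1}{t}^{-1}\). It is applied to the six single-qubit stabilizer states; the case \(n=1\) lies outside the range \(n\ge 3\) for which the remark's linear-independence argument is stated, so this is only an illustration.

```python
from math import comb, sqrt

S = 1 / sqrt(2)
# The six single-qubit stabilizer states: |0>, |1>, |+>, |->, |+i>, |-i>.
SINGLE_QUBIT_STABILIZERS = [
    (1, 0), (0, 1),
    (S, S), (S, -S),
    (S, 1j * S), (S, -1j * S),
]


def frame_potential(states, t):
    """Average of |<a|b>|^(2t) over all ordered pairs of states."""
    total = 0.0
    for a in states:
        for b in states:
            overlap = sum(x.conjugate() * y for x, y in zip(a, b))
            total += abs(overlap) ** (2 * t)
    return total / len(states) ** 2


def haar_frame_potential(dim, t):
    """Value attained exactly by projective t-designs in dimension dim."""
    return 1 / comb(dim + t - 1, t)
```

For \(t=1,2,3\) the two quantities agree, while for \(t=4\) the frame potential equals \(5/24\), strictly larger than the Haar value \(1/5\).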

5.2 Minimal projections for stabilizer testing

We now return to the problem of stabilizer testing; we revisit our solution from Sect. 3 and characterize minimal stabilizer tests with perfect completeness.

In Sect. 3, we found that, in any local dimension d, perfectly complete stabilizer tests are given by the following accepting POVM element on \(t=2s\) copies of \(({\mathbb {C}}^d)^{\otimes n}\),

$$\begin{aligned} \Pi _{s,\text {accept}} = \frac{1}{2} \left( I + V_s \right) , \end{aligned}$$
(5.8)

where \(V_s\) is the Hermitian unitary defined in Eq. (3.15) and \((d,s)=1\). This means that \(V_s\) is a unitary operator with the property that 2s-th tensor powers of every pure stabilizer state \(\vert S\rangle \) are contained in its \(+1\) eigenspace:

$$\begin{aligned} V_s \vert S\rangle ^{\otimes 2s} =\vert S\rangle ^{\otimes 2s} \end{aligned}$$
(5.9)

for any pure stabilizer state \(\vert S\rangle \). Our soundness result implies that, conversely, these are the only tensor power states with this property.

Note that \(V_s\) is an operator in the commutant of the Clifford action. This is immediate by comparing Eqs. (3.18) and (4.12), which also shows that \(V_s\) is precisely the operator \(R({\tilde{\mathbb {1}}})\) associated with the ‘anti-identity’ \({\tilde{\mathbb {1}}}\) defined in Eq. (3.17)!

In fact, any R(O) stabilizes the t-th tensor powers of stabilizer states: For all \(O\in O_t(d)\),

$$\begin{aligned} R(O) \vert S\rangle ^{\otimes t} = \vert S\rangle ^{\otimes t}. \end{aligned}$$
(5.10)

We proved this in Eq. (4.13). As we just saw, \(V_s\) is such an operator, so Eq. (5.10) generalizes Eq. (5.9).

Note that, since \({\tilde{\mathbb {1}}}\) squares to the identity, it generates a subgroup of \(O_t(d)\) that contains two elements: \(\{\mathbb {1}, {\tilde{\mathbb {1}}}\}\). We can thus interpret the projector (5.8) as the projector onto the invariant subspace for the action of this subgroup. This suggests that we look more generally at the invariant subspaces associated with subgroups of \(O_t(d)\). Larger subgroups correspond to projectors onto smaller invariant subspaces. In particular, the minimal projector corresponds to the full group \(O_t(d)\), i.e.,

$$\begin{aligned} \Pi ^{\min }_t :=\frac{1}{|O_t(d) |}\sum _{O\in O_t(d)} R(O). \end{aligned}$$
(5.11)

By Eq. (5.10), \(\Pi ^{\min }_t\) accepts all stabilizer tensor powers. Remarkably, it is the minimal projector with this property, as the following theorem shows.

Theorem 5.6

(Minimal stabilizer test with perfect completeness). Let d be a prime and \(n,t\ge 1\). Then the projector \(\Pi ^{\min }_t\) is the orthogonal projector onto \({{\,\mathrm{span}\,}}~\{\vert S\rangle ^{\otimes t}\, : \, \vert S\rangle \langle S\vert \in {{\,\mathrm{Stab}\,}}(n,d) \}\).

Proof

Note that the t-th moment \(\rho :={\mathbb {E}}\left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] \) defined in Eq. (5.1) is a density operator that is exactly supported on the span of the stabilizer tensor powers. By the preceding discussion, it remains to prove that the support of \(\Pi ^{\min }_t\) is contained in the support of \(\rho \). We start with Eq. (5.3):

$$\begin{aligned} {\mathbb {E}}_{{{\,\mathrm{Stab}\,}}(n,d)} \left[ \vert S\rangle \langle S\vert ^{\otimes t} \right] =\frac{1}{Z_{n,d,t}} \sum _{T\in \Sigma _{t,t}(d)} R(T) \end{aligned}$$

Recall from Eq. (4.16) that we can decompose \(\Sigma _{t,t}(d)\) into k double cosets,

$$\begin{aligned} \Sigma _{t,t}(d) = O_t(d) T_1 O_t(d) \cup \dots \cup O_t(d) T_k O_t(d). \end{aligned}$$

One of the double cosets is just \(O_t(d)\), say the first, corresponding to \(T_1=\Delta \) and \(R(T_1)=I\). As a consequence of Corollary 4.21, we can choose each representative \(T_i\) to be of the form (4.21). Then, Theorem 4.24 shows that \(R(T_i)\) is proportional to an orthogonal projection (by a positive proportionality constant) so in particular \(R(T_i)\ge 0\). On the other hand, we can compute the sum over each double coset by

$$\begin{aligned} \sum _{T \in O_t(d) T_i O_t(d)} R(T) {=} \frac{|O_t(d) T_i O_t(d)|}{|O_t(d)| \times |O_t(d)|} \sum _{O, O' \in O_t(d)} R(O) R(T_i) R(O') {=} c_i \, \Pi ^{\min }_t R(T_i) \Pi ^{\min }_t \end{aligned}$$

where \(c_i :=|O_t(d) T_i O_t(d)| > 0\). Together, we obtain that

$$\begin{aligned} \rho&= \frac{1}{Z_{n,d,t}} \sum _{T\in \Sigma _{t,t}(d)} R(T) = \frac{1}{Z_{n,d,t}} \sum _{i=1}^k c_i \, \Pi ^{\min }_t R(T_i) \Pi ^{\min }_t \\&= \frac{c_1}{Z_{n,d,t}} \left( \Pi ^{\min }_t + \sum _{i=2}^k \frac{c_i}{c_1} \, \Pi ^{\min }_t R(T_i) \Pi ^{\min }_t \right) \ge \frac{c_1}{Z_{n,d,t}} \Pi ^{\min }_t, \end{aligned}$$

which shows that the support of \(\rho \) indeed contains the support of \(\Pi ^{\min }_t\). \(\square \)

Note that there is no condition on t in Theorem 5.6. Indeed, while the theorem identifies the projector onto the span of stabilizer tensor powers precisely, it makes no assertion about whether this subspace contains other tensor power states than stabilizer states or not. It therefore complements our results on stabilizer testing, from which we can read off values of t such that the projector \(\Pi _t\) and hence \(\Pi ^{\min }_t \le \Pi _t\) contains only stabilizer tensor powers.

It is also interesting to ask about the minimal number of copies necessary for there to exist a perfectly complete stabilizer test that is dimension-independent. For \(d=2\), it is possible to show that \(t=4,5\) copies of a random stabilizer state become on average indistinguishable from a Haar-random pure state as \(n\rightarrow \infty \). This can be done by an explicit calculation of the 4th and 5th moments using our Theorem 5.3 and Eqs. (4.33) and (4.34) (for \(t=4\), this has been carried out in [Dam18]). Thus, \(t=6\) copies as in our Theorem 3.3 are indeed optimal for multiqubit stabilizer testing. For odd d (prime or not), we know from Theorem 3.11 that \(t=4\) copies always suffice. For \(d\equiv 1,5\pmod 6\) it follows from Theorem 8.6 below that even \(t=3\) copies suffice (and are optimal). For \(d\equiv 3\pmod 6\), we leave the question of minimal t open.

6 Construction of Designs

Next we describe a construction of projective t-designs for arbitrary t based on weighted Clifford orbits. As in Sects. 4 and 5, we assume that d is prime.

First, we derive expressions for the average tensor powers of the Clifford orbits of arbitrary states. For any pure state \(\vert \Psi \rangle \), the average \({\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,d)}[(U\vert \Psi \rangle \langle \Psi \vert U^\dagger )^{\otimes t}]\) commutes with \({{\,\mathrm{Cliff}\,}}(n,d)^{\otimes t}\). By Theorem 4.3, and assuming that \(n\ge t-1\), it can therefore be expressed as

$$\begin{aligned} {\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,d)}\left[ \left( U\vert \Psi \rangle \langle \Psi \vert U^\dagger \right) ^{\otimes t}\right] = \sum _{T\in \Sigma _{t,t}(d)} \alpha '_T R(T) \end{aligned}$$
(6.1)

for some \(\alpha _T' \in {\mathbb {C}}\). Not all of the \(\alpha '_T\) are independent. This is because each \((U\vert \Psi \rangle \langle \Psi \vert U^\dagger )^{\otimes t}\) is invariant under the action of \(S_t \times S_t\) (acting from the left and from the right), and also under taking the conjugate transpose. This motivates the following definition:

Definition 6.1

(Equivalence relation \(\sim _S\)). We define an equivalence relation \(\sim _S\) on \(\Sigma _{t,t}(d)\) in the following way: \(T \sim _S T'\) if and only if there exist \(\pi \), \(\pi ' \in S_t\) such that \(T'=\pi T \pi '\) or \(T'=\pi T^t \pi '\), where the transposed subspace \(T^t\) is defined by \(T^t=\{(\mathbf {y},\mathbf {x})\,:\, (\mathbf {x},\mathbf {y})\in T \}\).

We correspondingly decompose \(\Sigma _{t,t}(d)\) into equivalence classes:

$$\begin{aligned} \Sigma _{t,t}(d) = \bigcup _{i=1}^{M_{t,d}}{\mathcal {F}}_{t,i}(d) \end{aligned}$$

For convenience, we choose \({\mathcal {F}}_{t,1}(d)\) to be the set of subspaces corresponding to the permutation group \(S_t\) (these form a single equivalence class). We also define

$$\begin{aligned} {\mathcal {R}}_i:=\sum _{T \in {\mathcal {F}}_{t,i}(d)} R(T). \end{aligned}$$

We note that the operators \({\mathcal {R}}_i\) are Hermitian and linearly independent.

Since the R(T) are linearly independent and \(R(T)^\dagger = R(T^t)\), it follows that the coefficients \(\alpha '_T\) in Eq. (6.1) must be the same for the elements of each equivalence class. Thus,

$$\begin{aligned} {\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,d)}\left[ \left( U\vert \Psi \rangle \langle \Psi \vert U^\dagger \right) ^{\otimes t}\right] = \sum _{i=1}^{M_{t,d}} \alpha _i{\mathcal {R}}_i \end{aligned}$$
(6.2)

for some coefficients \(\alpha _i\). Note that \(\alpha _i \in {\mathbb {R}}\) because the \({\mathcal {R}}_i\) are Hermitian.

Theorem 6.2

Let d be a prime and \(n\ge t-1\). Then there exists an ensemble \(\{p_i,\Psi _i\}_{i=1}^{M_{t,d}}\) of fiducial states in \(({\mathbb {C}}^d)^{\otimes n}\) such that:

$$\begin{aligned} {\mathbb {E}}_{i \sim p} {\mathbb {E}}_{U \text { Clifford}}\left[ \left( U\vert \Psi _i\rangle \langle \Psi _i\vert U^\dagger \right) ^{\otimes t}\right] = {\mathbb {E}}_{\Psi \text { Haar}}\left[ \vert \Psi \rangle \langle \Psi \vert ^{\otimes t}\right] \end{aligned}$$

Importantly, the number \(M_{t,d}\) of Clifford orbits is independent of n, the number of qudits.

Proof

We start the proof by taking an arbitrary finite t-design given by an ensemble \(\{p_j,\Psi _j\}_{j=1}^K\). Such designs exist (see for example the early work [SZ84]), but K can be very large. If we replace each \(\Psi _j\) by a random element in its Clifford orbit then the resulting ensemble still forms a projective t-design. This means that

$$\begin{aligned} {\mathbb {E}}_{j \sim p} {\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,d)}\left[ (U \vert \Psi _j\rangle \langle \Psi _j\vert U^\dagger )^{\otimes t} \right] \propto {\mathcal {R}}_1. \end{aligned}$$

Thus, if we define \(\alpha _i^{(j)}\) as the coefficient of \({\mathcal {R}}_i\) in the Clifford average of the fiducial state \(\Psi _j\),

$$\begin{aligned} {\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,d)}\left[ \left( U\vert \Psi _j\rangle \langle \Psi _j\vert U^\dagger \right) ^{\otimes t}\right] = \sum _{i=1}^{M_{t,d}} \alpha _i^{(j)} {\mathcal {R}}_i, \end{aligned}$$

then

$$\begin{aligned} {\mathbb {E}}_{j \sim p}\left[ \alpha _i^{(j)}\right] = 0 \quad \text { for } i=2,\dots ,M_{t,d}. \end{aligned}$$
(6.3)

Conversely, if \(\{p_j\}\) is an arbitrary probability distribution that satisfies Eq. (6.3) then the ensemble obtained by first choosing a random fiducial state according to this distribution and then a random state in its Clifford orbit is a projective t-design. We will now explain how to modify the probabilities \(p_j\) step by step, setting more and more probabilities to zero while ensuring that Eq. (6.3) continues to hold—until all but \(M_{t,d}\) of them are zero. Without loss of generality, assume that \(p_1>0\).

Each step proceeds as follows: Suppose that there exist indices \(2\le j_1< j_2< \dots < j_{M_{t,d}}\le K\) such that \(p_{j_m} > 0\) for all \(m=1,\dots ,M_{t,d}\). (If no such indices exist then we are done.) Consider the linear system

$$\begin{aligned} \sum _{m=1}^{M_{t,d}} q_m \alpha _i^{(j_m)} = 0 \quad \text { for } i=2,\dots ,M_{t,d} \end{aligned}$$

in the indeterminates \(\{q_m\}_{m=1}^{M_{t,d}}\). This system is real, homogeneous, and underconstrained, so there always exists a nontrivial real solution \(q \in {\mathbb {R}}^{M_{t,d}}\). We can also assume that some component of q is positive (otherwise replace q by \(-q\)). Now consider \(p_{j_m} - x q_m\) for \(x\in {\mathbb {R}}\). At \(x=0\), all \(p_{j_m}\) are strictly positive. At some critical \(x=x_c\), one of the values \(p_{j_m} - x_c q_m\) becomes zero, while all other ones are still non-negative. Thus, if we modify the probabilities \(p_{j_m}\) by the rule

$$\begin{aligned} p_{j_m} \mapsto p_{j_m} - x_c q_m \quad \text { for } m=1,\dots ,M_{t,d} \end{aligned}$$

then it still holds true that \(\sum _{j=1}^{K} p_j \alpha _i^{(j)} = 0\) for \(i=2,\dots ,M_{t,d}\), but there is now at least one additional zero among the \(p_{j_m}\). This continues to hold if we further normalize the \(\{p_j\}\) to be a probability distribution, i.e.,

$$\begin{aligned} p_j \mapsto \frac{p_j}{\sum _{j'} p_{j'}} \quad \text { for } j=1,\dots ,K, \end{aligned}$$

which is always possible since \(p_1>0\). Thus, we obtain a probability distribution \(\{p_j\}\) with strictly smaller support satisfying Eq. (6.3) and \(p_1>0\).

We can repeat this process until there are at most \(M_{t,d}-1\) nonzero probabilities among the \(\{p_j\}_{j=2}^K\). By including \(p_1\), we arrive at an ensemble of at most \(M_{t,d}\) fiducial vectors. The corresponding probabilities satisfy Eq. (6.3), which is necessary and sufficient for the ensemble of Clifford orbits to be a design. This completes the proof. \(\square \)
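The support-reduction step of this proof is a purely linear-algebraic procedure and can be sketched in code. The example below works with exact rationals and with made-up coefficient vectors \(\alpha ^{(j)}\); it does not compute Clifford averages. Index 0 plays the role of \(p_1\) and is never selected, and `null_vector` and `reduce_support` are hypothetical helper names.

```python
from fractions import Fraction


def null_vector(rows, ncols):
    """Nonzero rational q with rows @ q = 0 (requires len(rows) < ncols)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    q = [Fraction(0)] * ncols
    q[free] = Fraction(1)
    for row, pc in zip(m, pivots):
        q[pc] = -row[free]
    return q


def reduce_support(p, alpha):
    """Shrink the support of p while keeping sum_j p[j] * alpha[j] = 0.

    alpha[j] lists the M - 1 coefficients (i = 2..M) of fiducial state j.
    """
    p = [Fraction(x) for x in p]
    assert p[0] > 0
    M = len(alpha[0]) + 1
    while True:
        support = [j for j in range(1, len(p)) if p[j] > 0]
        if len(support) < M:
            return p
        idx = support[:M]
        rows = [[alpha[j][i] for j in idx] for i in range(M - 1)]
        q = null_vector(rows, M)
        if all(v <= 0 for v in q):
            q = [-v for v in q]            # ensure some component is positive
        x_c = min(p[j] / v for j, v in zip(idx, q) if v > 0)
        for j, v in zip(idx, q):
            p[j] -= x_c * v                # at least one entry becomes zero
        total = sum(p)
        p = [v / total for v in p]         # renormalize
```

Exact arithmetic guarantees that the entry attaining the critical value \(x_c\) becomes exactly zero, so the loop terminates after at most K steps.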

Remark 6.3

A simple upper bound for \(M_{t,d}\) is \(|\Sigma _{t,t}(d)|=\prod _{k=0}^{t-2} (d^k+1)\) from Theorem 4.10. However, in general this is a rather pessimistic estimate. For example, consider \(d=t=3\). Then, \(|\Sigma _{t,t}(d)|=8\), while there are just \(M_{t,d}=2\) equivalence classes, as follows from Eq. (4.28). One of them is the set of permutations \(S_3\), with 6 elements, and the other one has 2 elements.

For \(d=3\) and \(t=4\), \(|\Sigma _{t,t}(d)|=80\), while \(M_{t,d}=3\). Again, one of the equivalence classes is the permutation group with \(4!=24\) elements. The second equivalence class is the class of the anti-permutations as defined in Eq. (4.35), which is represented by the row space of the matrix

$$\begin{aligned} \left[ \begin{array}{cccc|cccc} 1&{}2&{}2&{}2 &{} 1&{}0&{}0&{}0 \\ 2&{}1&{}2&{}2 &{} 0&{}1&{}0&{}0 \\ 2&{}2&{}1&{}2 &{} 0&{}0&{}1&{}0 \\ 2&{}2&{}2&{}1 &{} 0&{}0&{}0&{}1 \end{array}\right] \end{aligned}$$

(the Lagrangian subspace corresponding to the qutrit anti-identity \({\bar{\mathbb {1}}}\)). This equivalence class again has 24 elements. The last equivalence class can be represented by

$$\begin{aligned} \left[ \begin{array}{cccc|cccc} 1&{}1&{}1&{}1 &{} 1&{}1&{}1&{}1 \\ 0&{}1&{}2&{}0 &{} 0&{}2&{}1&{}0 \\ \hline 0&{}0&{}0&{}0 &{} 1&{}1&{}1&{}0 \\ \hline 1&{}1&{}1&{}0 &{} 0&{}0&{}0&{}0 \end{array}\right] , \end{aligned}$$

and it has 32 elements.

Remark 6.4

The criterion used in the proof of Theorem 6.2 can also be used to determine fiducial states that generate a projective t-design. For example, the Clifford orbit through a single fiducial state \(\Psi \) forms a projective t-design if and only if the coefficients \(\alpha _i\) in Eq. (6.2) vanish for \(i\ne 1\).

Let us illustrate this strategy by showing that, for any \(n\ge 2\), there exists a qutrit state \(\vert \psi \rangle \in {\mathbb {C}}^3\) such that the Clifford orbit of \(\Psi = \psi ^{\otimes n}\) forms a projective 3-design. We note that this state cannot be a stabilizer state, since we know that the ensemble of qutrit stabilizer states does not form a 3-design (Remark 5.5)! Instead of working with \({\mathcal {R}}_1\) and \({\mathcal {R}}_2\), we will work with their multiples \(\Pi ^{\text {sym}}_3 \propto {\mathcal {R}}_1\) and \(P_+ :=\Pi ^{\text {sym}}_3 P \Pi ^{\text {sym}}_3 \propto {\mathcal {R}}_2\), where P is the projector defined in Eq. (4.29). Now consider the third moment

$$\begin{aligned} \rho _3 :={\mathbb {E}}_{U\in {{\,\mathrm{Cliff}\,}}(n,3)}\left[ (U \vert \Psi \rangle \langle \Psi \vert U^{\dagger })^{\otimes 3}\right] , \end{aligned}$$

and expand it as

$$\begin{aligned} \rho _3 = \alpha (\psi ) \, \Pi ^{\text {sym}}_3 + \beta (\psi ) \, P_+ \end{aligned}$$

for coefficients \(\alpha (\psi )\), \(\beta (\psi )\in {\mathbb {R}}\) which depend on the choice of fiducial state. We wish to argue that for every n there exists a single-qutrit state \(\vert \psi \rangle \) such that \(\beta (\psi ) = 0\). For this, we note that the coefficients can be computed as follows:

$$\begin{aligned} 1&= {{\,\mathrm{tr}\,}}[\rho _3] = \alpha (\psi ) {{\,\mathrm{tr}\,}}[\Pi ^{\text {sym}}_3] + \beta (\psi ) {{\,\mathrm{tr}\,}}[P_+] \\ \langle \psi ^{\otimes 3}|r(T)|\psi ^{\otimes 3}\rangle ^n&= {{\,\mathrm{tr}\,}}[R(T) \rho _3] = 3^n \alpha (\psi ) \, {{\,\mathrm{tr}\,}}[P_+] + 3^n \beta (\psi ) \, {{\,\mathrm{tr}\,}}[P_+]. \end{aligned}$$

It follows that \(\beta (\psi )=0\) if and only if

$$\begin{aligned} \langle \psi ^{\otimes 3}|r(T)|\psi ^{\otimes 3}\rangle ^n = 3^n \frac{{{\,\mathrm{tr}\,}}[P_+]}{{{\,\mathrm{tr}\,}}[\Pi ^{\text {sym}}_3]} = \frac{3}{3^n+2}, \end{aligned}$$

where we used Eq. (4.32). Thus, the Clifford orbit through \(\psi ^{\otimes n}\) forms a projective 3-design if and only if

$$\begin{aligned} \langle \psi ^{\otimes 3}|r(T)|\psi ^{\otimes 3}\rangle = \left( \frac{3}{3^n+2}\right) ^{1/n} \in \left[ \tfrac{1}{3},\tfrac{3}{5}\right] \end{aligned}$$
(6.4)

But the left-hand side is equal to one if \(\psi \) is a stabilizer state, e.g., \(\vert \psi \rangle =\vert 0\rangle \) (Eq. (4.10)), while it vanishes for, e.g., the non-stabilizer state \(\vert \psi \rangle = \frac{1}{\sqrt{2}}\left( \vert 0\rangle - \vert 1\rangle \right) \). By continuity it follows that there always exists a single-qutrit state \(\vert \psi \rangle \) satisfying Eq. (6.4). It is easy to find such a \(\vert \psi \rangle \) explicitly, e.g., by considering the one-parameter family of states \(\vert \psi (\theta )\rangle = \cos (\theta ) \vert 0\rangle - \sin (\theta ) \vert 1\rangle \) and solving Eq. (6.4) for \(\theta \in [0,\frac{\pi }{2}]\).
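Solving Eq. (6.4) numerically is straightforward once the left-hand side is known as a function of \(\theta \). The sketch below rests on an assumption that is not spelled out in this section: that \(r(T) = \sum _{(\mathbf {x},\mathbf {y})\in T} \vert \mathbf {x}\rangle \langle \mathbf {y}\vert \) with \(T = \{(\mathbf {x}, \mathbf {x} + a \mathbf {1}_3) : \mathbf {x}\cdot \mathbf {1}_3 = 0,\ a\in {\mathbb {F}}_3\}\). Under this assumption, the overlap for the family \(\vert \psi (\theta )\rangle \) reduces to \((\cos ^3\theta - \sin ^3\theta )^2\), which reproduces the two endpoint values stated above (one at \(\theta =0\), zero at \(\theta =\pi /4\)). The function names are illustrative.

```python
from math import cos, sin, pi


def target(n):
    """Right-hand side of Eq. (6.4): (3 / (3^n + 2))^(1/n)."""
    return (3 / (3**n + 2)) ** (1 / n)


def overlap(theta):
    """Assumed closed form of <psi^{x3}| r(T) |psi^{x3}> for psi(theta)."""
    return (cos(theta) ** 3 - sin(theta) ** 3) ** 2


def solve_theta(n, tol=1e-12):
    """Bisection on [0, pi/4], where the assumed overlap decreases from 1 to 0."""
    lo, hi = 0.0, pi / 4
    goal = target(n)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if overlap(mid) > goal:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Since the target value always lies strictly between 0 and 1, the bisection finds a solution in \((0, \pi /4)\) for every n.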

7 De Finetti Theorems for Stabilizer Symmetries

In this section we establish a direct connection between our results on stabilizer testing and the celebrated quantum de Finetti theorems, which play an important role in characterizing entanglement and correlations in quantum states with permutation symmetry (cf. discussion in Sect. 1.4).

We first recall the finite quantum de Finetti theorem from [CKMR07]. Let \(\rho \) be a quantum state on \(({\mathbb {C}}^\ell )^{\otimes t}\) that commutes with all permutations (i.e., \([r_\pi ,\rho ]=0\) for all \(\pi \in S_t\)). Then there exists a probability measure \(\mu \) on the space of mixed states on \({\mathbb {C}}^\ell \) such that

$$\begin{aligned} \frac{1}{2} \left\Vert \rho _{1\dots {}s} - \int d\mu (\sigma ) \sigma ^{\otimes s} \right\Vert _1 \le 2\ell ^2\frac{s}{t}. \end{aligned}$$
(7.1)

Since any quantum state that commutes with permutations admits a purification on the symmetric subspace, Eq. (7.1) follows directly from a similar result for the symmetric subspace, namely, that for every \(\vert \Psi \rangle \in {{\,\mathrm{Sym}\,}}^t({\mathbb {C}}^\ell )\) there exists a probability measure \(\mu \) on pure states on \({\mathbb {C}}^\ell \) such that

$$\begin{aligned} \frac{1}{2} \left\Vert \Psi _{1\dots {}s} - \int d\mu (\phi ) \phi ^{\otimes s} \right\Vert _1 \le 2\ell \frac{s}{t}. \end{aligned}$$
(7.2)

In this section, we prove de Finetti theorems adapted to stabilizer states. The key idea is to extend the permutation symmetry to invariance under a larger group:

  1. the stochastic orthogonal group \(O_t(d)\) (for qudits in any prime dimension d), or

  2. the group generated by the permutations and the anti-identity (3.13) (for qubits).

These symmetries are natural since they are carried by the tensor powers of any stabilizer state, as we proved in Eq. (4.13).

In both cases, our theorems show that the reduced density matrices are close to convex combinations of tensor powers of stabilizer states. In the first case, we find that the reduced state is in fact exponentially (in the number of traced out systems) close to a state of this form, which is a much stronger guarantee than provided by the finite de Finetti theorems of Eqs. (7.1) and (7.2) (cf. [Ren07, KM09]). In the second case, we obtain power law convergence but the symmetry requirements are drastically reduced. We establish our results first for pure states (Sects. 7.1 and 7.2) and then extend them by a standard purification argument to mixed states (Sect. 7.3).

7.1 Exponential stabilizer de Finetti theorem

Let d be an arbitrary prime. We start with the observation that, for any two distinct stabilizer states,

$$\begin{aligned} |\langle S|S'\rangle |^2\le \frac{1}{d} \end{aligned}$$
(7.3)

(this can be seen from, e.g., Eq. (2.17)). It follows that, for fixed d and n, the stabilizer tensor powers \(\vert S\rangle ^{\otimes t}\) approach orthonormality as \(t\rightarrow \infty \). The following lemma makes this precise.

Lemma 7.1

Let d be a prime and \(n,t\ge 1\). Consider the Gram matrix \(G_{S,S'} = \langle S|S'\rangle ^t\), where \(S,S'\in {{\,\mathrm{Stab}\,}}(n,d)\). If

$$\begin{aligned} \varepsilon :=d^{\frac{1}{2}((n+2)^2-t)} < \frac{1}{2} \end{aligned}$$

then the following holds:

  1. The Gram matrix is \(\varepsilon \)-close to the identity matrix in operator norm: \(\Vert G - I \Vert _\infty \le \varepsilon \). In particular, the stabilizer tensor powers \(\vert S\rangle ^{\otimes t}\) are linearly independent.

  2. The nonzero eigenvalues of \(Q :=\sum _S \vert S\rangle ^{\otimes t}\langle S\vert ^{\otimes t}\) and its pseudoinverse \(Q^{+}\) lie in the interval \(1\pm 2\varepsilon \).

  3. The vectors \((Q^+)^{1/2} \vert S\rangle ^{\otimes t}\) for \(S\in {{\,\mathrm{Stab}\,}}(n,d)\) are orthonormal.

Proof

1. The first claim follows directly from the element-wise bound (7.3):

$$\begin{aligned} \Vert G - I \Vert _\infty\le & {} \Vert G - I \Vert _{\ell _2} \le \Vert G - I \Vert _{\ell _\infty }\,|{{\,\mathrm{Stab}\,}}(n,d)|\\\le & {} \left( \max _{S\ne S'} |\langle S|S'\rangle |^t \right) \, |{{\,\mathrm{Stab}\,}}(n,d)|\le d^{\frac{1}{2} ((n+2)^2-t)}=\varepsilon , \end{aligned}$$

where we used the bound

$$\begin{aligned} |{{\,\mathrm{Stab}\,}}(n,d)|=d^n \prod _{i=1}^n (d^i+1) \le d^{(n+2)^2/2}. \end{aligned}$$
(7.4)

The cardinality of the set of stabilizer states has been computed in [AG04, Prop. 2] for \(d=2\) and in [Gro06, Cor. 21] for odd d.

Since \(\varepsilon <1\), the statement about the Gram matrix implies that the stabilizer tensor powers are linearly independent.
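The cardinality bound in Eq. (7.4) can be sanity-checked for small parameters. The following is a minimal sketch in exact integer arithmetic; the function names `stab_count` and `bound_holds` are illustrative and not part of the original text. Both sides are squared so that the half-integer exponent \((n+2)^2/2\) never has to be evaluated in floating point.

```python
def stab_count(n: int, d: int) -> int:
    """Number of pure stabilizer states of n qudits: d^n * prod_{i=1}^n (d^i + 1)."""
    count = d ** n
    for i in range(1, n + 1):
        count *= d ** i + 1
    return count


def bound_holds(n: int, d: int) -> bool:
    """Check |Stab(n,d)| <= d^{(n+2)^2/2} by comparing the squares of both sides."""
    return stab_count(n, d) ** 2 <= d ** ((n + 2) ** 2)


if __name__ == "__main__":
    for d in (2, 3, 5, 7):
        for n in range(1, 6):
            print(d, n, stab_count(n, d), bound_holds(n, d))
```

For example, `stab_count(1, 2)` gives the six single-qubit stabilizer states and `stab_count(2, 2)` gives 60.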

2. Now define

$$\begin{aligned} H = \sum _{S\in {{\,\mathrm{Stab}\,}}(n,d)} \vert S\rangle ^{\otimes t}\langle e_S\vert , \end{aligned}$$

where \(\vert e_S\rangle \) denotes an orthonormal basis labeled by the set of stabilizer states \({{\,\mathrm{Stab}\,}}(n,d)\). Then,

$$\begin{aligned} G = H^\dagger H \qquad \text { and }\qquad Q = H H^\dagger , \end{aligned}$$

and thus G and Q have the same nonzero eigenvalues (namely, the squared singular values of H). By part 1, the eigenvalues of G lie in the interval \(1\pm \varepsilon \), hence the same is true for the nonzero eigenvalues of Q. The nonzero eigenvalues of the pseudoinverse \(Q^+\) are the reciprocals of those of Q. Since we assumed that \(\varepsilon <1/2\), we have \(1/(1-\varepsilon )\le 1+2\varepsilon \) and \(1/(1+\varepsilon )\ge 1-2\varepsilon \), so they lie in the interval \(1\pm 2\varepsilon \). This establishes the second claim.

3. By the first claim, the stabilizer tensor powers are linearly independent. On the other hand,

$$\begin{aligned} \vert S\rangle ^{\otimes t} = Q Q^{+}\vert S\rangle ^{\otimes t} = \sum _{S'} \vert S'\rangle ^{\otimes t} \langle S'\vert ^{\otimes t} Q^{+} \vert S\rangle ^{\otimes t}. \end{aligned}$$

Thus, the linear independence implies that the vectors \((Q^+)^{1/2} \vert S\rangle ^{\otimes t}\) are orthonormal. \(\square \)
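To get a feeling for the hypothesis \(\varepsilon <1/2\) of Lemma 7.1, one can compute the smallest number of copies t for which it holds. The sketch below uses the equivalent integer condition \(d^{\,t-(n+2)^2}>4\); the helper name `min_copies` is an assumption made for illustration.

```python
def min_copies(n: int, d: int) -> int:
    """Smallest t with eps = d^{((n+2)^2 - t)/2} < 1/2.

    Squaring and inverting, eps < 1/2 is equivalent to d^(t - (n+2)^2) > 4,
    so we look for the smallest k = t - (n+2)^2 >= 1 with d^k > 4.
    """
    k = 1
    while d ** k <= 4:
        k += 1
    return (n + 2) ** 2 + k


if __name__ == "__main__":
    for n, d in [(1, 2), (2, 2), (1, 3), (1, 5)]:
        print(f"n={n}, d={d}: lemma applies for t >= {min_copies(n, d)}")
```

The threshold grows quadratically in n, in line with the exponent \((n+2)^2\) in the definition of \(\varepsilon \).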

Theorem 7.2

(Pure-state exponential stabilizer de Finetti theorem). Let d be a prime and \(\vert \Psi \rangle \in {{\,\mathrm{Sym}\,}}^t(({\mathbb {C}}^d)^{\otimes n})\) a pure quantum state that is left invariant by the action of \(O_t(d)\). Let \(1\le s\le t\). Then there exists a probability distribution p on \({{\,\mathrm{Stab}\,}}(n,d)\), the set of pure stabilizer states of n qudits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \Psi _{1\dots {}s} - \sum _S p(S) \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \right\Vert _1 \le 2 d^{\frac{1}{2}(n+2)^2} d^{-\frac{1}{2}(t-s)}. \end{aligned}$$

Proof

By assumption, \(\Pi ^{\min }_t \vert \Psi \rangle = \vert \Psi \rangle \), where \(\Pi ^{\min }_t\) is the minimal projector from Eq. (5.11). Theorem 5.6 shows that \(\vert \Psi \rangle \) is contained in the span of stabilizer tensor powers, i.e.,

$$\begin{aligned} \vert \Psi \rangle = \sum _{S\in {{\,\mathrm{Stab}\,}}(n,d)} \alpha _S \vert S\rangle ^{\otimes t} \end{aligned}$$

for certain coefficients \(\alpha _S\in {\mathbb {C}}\). We now use the third and second claim of Lemma 7.1 to see that

$$\begin{aligned} \sum _S |\alpha _S|^2 = \Vert (Q^+)^{1/2} \vert \Psi \rangle \Vert ^2 \in [1-2\varepsilon ,1+2\varepsilon ] \end{aligned}$$
(7.5)

where \(\varepsilon :=d^{\frac{1}{2}((n+2)^2-t)}\). Here we have assumed that \(\varepsilon <1/2\), for otherwise the statement of the theorem is vacuous. We now compute the partial trace over all but s subsystems:

$$\begin{aligned} \Psi _{1\dots {}s} = \sum _S |\alpha _S|^2 \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \;+\; \sum _{S\ne S'} \alpha _S {\overline{\alpha }}_{S'} \vert S\rangle ^{\otimes s} \langle S'\vert ^{\otimes s} \langle S'|S\rangle ^{t-s}. \end{aligned}$$

The norm of the cross terms is small:

$$\begin{aligned}&\quad \left\Vert \sum _{S\ne S'} \alpha _S {\overline{\alpha }}_{S'} \vert S\rangle ^{\otimes s} \langle S'\vert ^{\otimes s} \langle S'|S\rangle ^{t-s} \right\Vert _1 \\ {}&\le \sum _{S\ne S'} |\alpha _S||\alpha _{S'}|d^{-(t-s)/2} \le \left( \sum _S |\alpha _S|\right) ^2 d^{-(t-s)/2} \\&\le \left( \sum _S |\alpha _S|^2 \right) d^{(n+2)^2/2} d^{-(t-s)/2}\\ {}&\le \left( 1 + 2\varepsilon \right) d^{(n+2)^2/2} d^{-(t-s)/2} \le 2 d^{(n+2)^2/2} d^{-(t-s)/2}; \end{aligned}$$

the first inequality uses the triangle inequality and Eq. (7.3), the second adds the non-negative diagonal terms, the third is the Cauchy–Schwarz inequality combined with Eq. (7.4), the fourth inequality is the upper bound in Eq. (7.5), and the last step uses that \(\varepsilon < 1/2\). Finally, define \(p(S) :=|\alpha _S|^2 / \sum _{S'} |\alpha _{S'}|^2\) (the denominator is positive by the lower bound in Eq. (7.5) and \(\varepsilon <1/2\)). Then:

$$\begin{aligned}&\left\Vert \Psi _{1\dots {}s} - \sum _S p(S) \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \right\Vert _1 \\&\quad \le \left\Vert \Psi _{1\dots {}s} - \sum _S |\alpha _S|^2 \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \right\Vert _1 + \sum _S \left||\alpha _S |^2 - p(S) \right|\\&\quad \le 2 d^{(n+2)^2/2} d^{-(t-s)/2} + \left|1 - \sum _{S'} |\alpha _{S'}|^2 \right|\le 2 d^{(n+2)^2/2} d^{-(t-s)/2} + 2\varepsilon \\&\quad \le 4 d^{(n+2)^2/2} d^{-(t-s)/2}. \end{aligned}$$

\(\square \)
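The bound of Theorem 7.2 decays exponentially in the number \(t-s\) of traced-out systems. As a rough illustration, the following sketch evaluates the right-hand side \(2 d^{\frac{1}{2}(n+2)^2} d^{-\frac{1}{2}(t-s)}\) and searches for the smallest t that pushes it below a target accuracy. The names `exp_de_finetti_bound` and `copies_needed` are hypothetical helpers, not notation from the text.

```python
def exp_de_finetti_bound(n: int, d: int, t: int, s: int) -> float:
    """Right-hand side of Theorem 7.2: 2 * d^{(n+2)^2/2} * d^{-(t-s)/2}."""
    return 2.0 * d ** (0.5 * ((n + 2) ** 2 - (t - s)))


def copies_needed(n: int, d: int, s: int, delta: float) -> int:
    """Smallest t >= s for which the bound is at most delta (linear search)."""
    t = s
    while exp_de_finetti_bound(n, d, t, s) > delta:
        t += 1
    return t


if __name__ == "__main__":
    # One qubit per block, keep s = 2 copies, target trace distance 0.01.
    print(copies_needed(n=1, d=2, s=2, delta=0.01))
```

Each additional pair of traced-out copies improves the guarantee by a factor of d.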

7.2 Stabilizer de Finetti theorem for the anti-identity

We now prove a stabilizer de Finetti theorem with reduced symmetry requirements. For concreteness, we restrict to the multi-qubit case (\(d=2\)) and to tensor powers that are multiples of six. Neither restriction is essential. The following theorem shows that the reduced states of an arbitrary permutation-symmetric quantum state that is invariant under the anti-identity operator \(V=R({\bar{\mathbb {1}}})\) from Eq. (3.13), but not necessarily under other stochastic isometries, are well-approximated by convex mixtures of tensor powers of stabilizer states.

Theorem 7.3

(Pure-state stabilizer de Finetti theorem for the anti-identity) Let \(\vert \Psi \rangle \in {{\,\mathrm{Sym}\,}}^t(({\mathbb {C}}^2)^{\otimes n})\) be a quantum state that is left invariant by the action of the anti-identity (3.13) on some (and hence every) subsystem consisting of six n-qubit blocks. Let \(s<t\) be a multiple of six. Then there exists a probability distribution p on \({{\,\mathrm{Stab}\,}}(n,2)\), the set of pure stabilizer states of n qubits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \Psi _{1\dots {}s} - \sum _S p(S) \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \right\Vert _1 \le 6 \sqrt{2^{n+1}} \sqrt{\frac{s}{t}}. \end{aligned}$$

Proof

By the ordinary finite quantum de Finetti theorem (7.2), there exists a probability measure \(d\mu (\phi )\) on the set of pure states such that

$$\begin{aligned} \frac{1}{2} \left\Vert \Psi _{1\dots {}s} - \int d\mu (\phi ) \phi ^{\otimes s} \right\Vert _1 \le 2^{n+1} \frac{s}{t}. \end{aligned}$$
(7.6)

Let \(\Pi _n=(I+R({\bar{\mathbb {1}}}))/2\) denote the projector onto the \(+1\)-eigenspace of the \(6\times 6\) anti-identity for n qubits. By assumption, \((\Pi _n^{\otimes (s/6)} \otimes I^{\otimes (t-s)}) \vert \Psi \rangle =\vert \Psi \rangle \), and hence \({{\,\mathrm{tr}\,}}[\Psi _{1\dots {}s} \Pi _n^{\otimes (s/6)}]=1\). Since the trace distance satisfies \(\frac{1}{2}\Vert \rho - \sigma \Vert _1 = \max _{0\le Q\le I} {{\,\mathrm{tr}\,}}[(\rho -\sigma )Q]\), it follows that

$$\begin{aligned} \int d\mu (\phi ) \left( 1- {{\,\mathrm{tr}\,}}\left[ \Pi _n^{\otimes (s/6)} \phi ^{\otimes s}\right] \right) \le 2^{n+1} \frac{s}{t}. \end{aligned}$$

Now recall from Eq. (3.11) that the accepting POVM element for qubit stabilizer testing is given by \(\Pi _\text {accept} = \frac{1}{2} \left( I + U \right) \) with \(U = V(I^{\otimes 4} \otimes {\mathbb {F}})\), where \({\mathbb {F}}=R((1 2))\) is the operator that swaps two blocks of n qubits (see discussion above Eq. (3.12)). Since tensor powers of pure states are permutation-symmetric,

$$\begin{aligned} \int d\mu (\phi ) \left( 1- {{\,\mathrm{tr}\,}}\left[ \Pi _\text {accept} \phi ^{\otimes 6}\right] ^{s/6} \right) \le 2^{n+1} \frac{s}{t} \end{aligned}$$
(7.7)

According to Theorem 3.3 Eq. (3.8) and using Lemma 7.4 below, for each pure state \(\phi \) there exists a pure stabilizer state \(S_\phi \) such that

$$\begin{aligned} |\langle S_\phi |\phi \rangle |^{2(s/6)} \ge 4 {{\,\mathrm{tr}\,}}\left[ \Pi _\text {accept} \phi ^{\otimes 6}\right] ^{s/6} - 3. \end{aligned}$$

Using the estimate \(1-p^6 \le 6(1-p)\), which holds for all \(p \in [0,1]\), we obtain

$$\begin{aligned} 1 - |\langle S_\phi |\phi \rangle |^{2s} \le 6\left( 1 - |\langle S_\phi |\phi \rangle |^{2(s/6)} \right) \le 24 \left( 1 - {{\,\mathrm{tr}\,}}\left[ \Pi _\text {accept} \phi ^{\otimes 6}\right] ^{s/6} \right) . \end{aligned}$$

Combining this estimate with Eq. (7.7), we get

$$\begin{aligned} \int d\mu (\phi ) \left( 1- |\langle S_\phi |\phi \rangle |^{2s} \right) \le 24 \cdot 2^{n+1} \frac{s}{t}. \end{aligned}$$

It follows that replacing each pure state \(\phi \) by the nearby stabilizer state \(S_\phi \) incurs only a small error:

$$\begin{aligned}&\frac{1}{2}\left\Vert \int d\mu (\phi ) \phi ^{\otimes s} - \int d\mu (\phi ) S_\phi ^{\otimes s} \right\Vert _1 \\&\quad \le \int d\mu (\phi ) \frac{1}{2} \left\Vert \phi ^{\otimes s} - S_\phi ^{\otimes s} \right\Vert _1 \le \int d\mu (\phi ) \sqrt{1 - |\langle \phi |S_\phi \rangle |^{2s}} \\&\quad \le \sqrt{\int d\mu (\phi ) \left( 1 - |\langle \phi |S_\phi \rangle |^{2s} \right) } \le \sqrt{24 \cdot 2^{n+1} \frac{s}{t}}, \end{aligned}$$

where we have used the triangle inequality, the relation between the trace distance and the fidelity between pure states, and the concavity of the square root. Together with Eq. (7.6), we obtain

$$\begin{aligned} \frac{1}{2}\left\Vert \Psi _{1\dots {}s} - \int d\mu (\phi ) S_\phi ^{\otimes s} \right\Vert _1 \le 2^{n+1} \frac{s}{t} + \sqrt{24\cdot 2^{n+1} \frac{s}{t}} \le 6 \sqrt{2^{n+1}} \sqrt{\frac{s}{t}} \end{aligned}$$

where we have assumed without loss of generality that \(2^{n+1} \frac{s}{t}\le 1\) (otherwise, the right-hand side is larger than one so the resulting bound is trivially true). \(\square \)
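The last step of the proof combines a linear and a square-root term into a single square-root bound. Writing \(x = 2^{n+1} s/t\), it amounts to \(x + \sqrt{24x} \le 6\sqrt{x}\) for \(x\in [0,1]\). The sketch below checks this on a grid and evaluates the resulting power-law bound; the function names are illustrative assumptions.

```python
import math


def final_step_holds(x: float, tol: float = 1e-12) -> bool:
    """Check x + sqrt(24 x) <= 6 sqrt(x), used with x = 2^{n+1} s / t <= 1."""
    return x + math.sqrt(24.0 * x) <= 6.0 * math.sqrt(x) + tol


def anti_identity_bound(n: int, s: int, t: int) -> float:
    """Right-hand side of Theorem 7.3: 6 * sqrt(2^{n+1}) * sqrt(s / t)."""
    return 6.0 * math.sqrt(2 ** (n + 1)) * math.sqrt(s / t)


if __name__ == "__main__":
    print(all(final_step_holds(i / 1000) for i in range(1001)))
    # One qubit per block, s = 6 retained copies out of t = 60000.
    print(anti_identity_bound(n=1, s=6, t=60000))
```

In contrast to Theorem 7.2, the error here only decays like \(\sqrt{s/t}\), so many more copies are needed for the same accuracy.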

Lemma 7.4

The following bound holds for all \(k\ge 1\) and \(p\in [0,1]\) such that the right-hand side is non-negative:

$$\begin{aligned} (4p-3)^k \ge 4p^k-3. \end{aligned}$$

Proof

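We will prove the inequality for all \(k\ge 1\) and \(p\in [\frac{3}{4},1]\), which covers every p for which the right-hand side is non-negative. For this, note that the two expressions coincide for \(p=1\) and that the derivative of their difference is non-positive for all \(p\in [\frac{3}{4},1]\), so the difference is non-negative on this interval. Indeed,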

$$\begin{aligned} \partial _p \left( (4p-3)^k - (4p^k-3) \right) \le 0 \quad \Leftrightarrow \quad (4p-3)^{k-1} \le p^{k-1}. \end{aligned}$$

In the interval that we are considering, \(0 \le 4p-3 \le p\), so the right-hand side condition holds. \(\square \)
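Both elementary estimates used in the proof of Theorem 7.3, namely Lemma 7.4 and \(1-p^6\le 6(1-p)\), can be checked in exact rational arithmetic on a grid. This is only a numerical sanity check under the stated ranges, not a substitute for the proof; the function names are chosen for illustration.

```python
from fractions import Fraction


def lemma_7_4_holds(p: Fraction, k: int) -> bool:
    """Check (4p - 3)^k >= 4 p^k - 3 exactly."""
    return (4 * p - 3) ** k >= 4 * p ** k - 3


def sixth_power_estimate_holds(p: Fraction) -> bool:
    """Check 1 - p^6 <= 6 (1 - p) exactly."""
    return 1 - p ** 6 <= 6 * (1 - p)


if __name__ == "__main__":
    grid_lemma = [Fraction(3, 4) + Fraction(i, 400) for i in range(101)]  # [3/4, 1]
    grid_unit = [Fraction(i, 100) for i in range(101)]                    # [0, 1]
    print(all(lemma_7_4_holds(p, k) for p in grid_lemma for k in range(1, 13)))
    print(all(sixth_power_estimate_holds(p) for p in grid_unit))
```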

7.3 Extension to mixed states

In this section, we extend Theorems 7.2 and 7.3 to the case of mixed density matrices. This is done using a standard purification argument as used to derive the ordinary quantum de Finetti theorem for mixed states from the version for pure states (i.e., Eq. (7.1) from Eq. (7.2)). For the next lemma, recall the vectorization operation from Definition 4.6.

Lemma 7.5

(Purification and symmetries). Let \(\rho \) be positive semi-definite, \(\vert \Psi \rangle = {{\,\mathrm{vec}\,}}(\rho ^{1/2})\) its standard purification, and O a unitary with real matrix elements in the computational basis. Then the following conditions are equivalent:

  1. \((O\otimes O)\vert \Psi \rangle =\vert \Psi \rangle \).

  2. \([\rho ,O]=0\).

Proof

We observe that

$$\begin{aligned} (O \otimes O) \vert \Psi \rangle = \left( O \otimes (O^{-1})^T\right) \vert \Psi \rangle = {{\,\mathrm{vec}\,}}(O \rho ^{1/2} O^{-1} ). \end{aligned}$$

Thus, condition 1 is equivalent to \(O\rho ^{1/2}O^{-1} = \rho ^{1/2}\). It follows that condition 1 implies condition 2 by squaring. Conversely, assuming condition 2,

$$\begin{aligned} \rho = O \rho O^{-1} = \bigl (O \rho ^{1/2} O^{-1}\bigr ) \bigl ( O \rho ^{1/2} O^{-1}\bigr ). \end{aligned}$$

Hence \(O \rho ^{1/2} O^{-1}\) is a positive semi-definite square root of \(\rho \). Since such square roots are unique, this implies that \(O \rho ^{1/2} O^{-1} = \rho ^{1/2}\) and hence condition 1. \(\square \)
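The equivalence in Lemma 7.5 can be illustrated on a single qubit. The sketch below assumes the row-major convention \({{\,\mathrm{vec}\,}}(X)=\sum _{ij} X_{ij}\vert i\rangle \vert j\rangle \), which may differ from Definition 4.6 by a reordering of tensor factors, and restricts to diagonal \(\rho \) so that the square root is entrywise. All helper names are assumptions made for this example.

```python
import math


def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_sqrt_diag(diag):
    """Row-major vec of rho^{1/2} for the diagonal matrix rho = diag(diag)."""
    n = len(diag)
    return [math.sqrt(diag[i]) if i == j else 0.0
            for i in range(n) for j in range(n)]


def purification_invariant(diag, O, tol=1e-12):
    """Condition 1: (O tensor O) vec(rho^{1/2}) == vec(rho^{1/2})."""
    psi = vec_sqrt_diag(diag)
    out = matvec(kron(O, O), psi)
    return all(abs(a - b) < tol for a, b in zip(out, psi))


def commutes(diag, O, tol=1e-12):
    """Condition 2: [rho, O] = 0, using [rho, O]_{ij} = (rho_i - rho_j) O_{ij}."""
    n = len(diag)
    return all(abs((diag[i] - diag[j]) * O[i][j]) < tol
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    SWAP = [[0, 1], [1, 0]]  # real permutation matrix (bit flip)
    for diag in [(0.5, 0.5), (0.75, 0.25)]:
        print(diag, purification_invariant(diag, SWAP), commutes(diag, SWAP))
```

In both test cases the two conditions agree: the maximally mixed state satisfies both, while the biased diagonal state satisfies neither.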

Clearly, the operators R(O) for \(O\in O_t(d)\) have real matrix elements in the computational basis. In fact, they are given by permutation matrices in the computational basis, as can be seen from the formula given in Eq. (4.12). Thus they satisfy the conditions of Lemma 7.5. We use this now to extend our de Finetti theorems to mixed states.

Theorem 7.6

(Exponential stabilizer de Finetti theorem). Let d be a prime and \(\rho \) a quantum state on \((({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\) that commutes with the action of \(O_t(d) \supseteq S_t\). Let \(1\le s\le t\). Then there exists a probability distribution p on the (finite) set of mixed stabilizer states (see Footnote 8) of n qudits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \rho _{1\dots {}s} - \sum _{\sigma _S} p(\sigma _S) \sigma _S^{\otimes s} \right\Vert _1 \le 2 d^{\frac{1}{2}(2n+2)^2} d^{-\frac{1}{2}(t-s)}. \end{aligned}$$

Proof

Using Lemma 7.5, we can find a purification \(\vert \Psi \rangle \in (({\mathbb {C}}^d)^{\otimes 2n})^{\otimes t} \cong (({\mathbb {C}}^d)^{\otimes n})^{\otimes t} \otimes (({\mathbb {C}}^d)^{\otimes n})^{\otimes t}\) of \(\rho \), which is invariant under the action of \(O_t(d)\). (In particular, \(\vert \Psi \rangle \) is an element of the symmetric subspace.) Here we crucially use that the operators R(O) for 2n qudits are just the second tensor powers of the corresponding operators for n qudits, as is clear from Eq. (4.12). Thus we can apply Theorem 7.2 to the state \(\vert \Psi \rangle \). Since the local Hilbert space now contains 2n qudits, we obtain that there exists a probability distribution p over pure stabilizer states on \(({\mathbb {C}}^d)^{\otimes 2n}\) such that

$$\begin{aligned} \frac{1}{2}\left\Vert \Psi _{1\dots {}s} - \sum _S p(S) \, \vert S\rangle ^{\otimes s}\langle S\vert ^{\otimes s} \right\Vert _1 \le 2 d^{\frac{1}{2}(2n+2)^2} d^{-\frac{1}{2}(t-s)}. \end{aligned}$$

Taking the partial trace over the purifying systems does not increase the trace distance. Since reduced density matrices of pure stabilizer states are mixed stabilizer states, we obtain the result. \(\square \)

The very same argument yields the following version of Theorem 7.3 for mixed states:

Theorem 7.7

(Stabilizer de Finetti theorem for the anti-identity). Let \(\rho \) be a quantum state on \((({\mathbb {C}}^2)^{\otimes n})^{\otimes t}\) that commutes with all permutations as well as with the action of the anti-identity (1.4) on some (and hence every) subsystem consisting of six n-qubit blocks. Let \(s<t\) be a multiple of six. Then there exists a probability distribution p on the (finite) set of mixed stabilizer states of n qubits, such that

$$\begin{aligned} \frac{1}{2}\left\Vert \rho _{1\dots {}s} - \sum _{\sigma _S} p(\sigma _S) \sigma _S^{\otimes s} \right\Vert _1 \le 6 \sqrt{2} \cdot 2^n \sqrt{\frac{s}{t}}. \end{aligned}$$

8 Robust Hudson Theorem

The methods developed in Sect. 3 also allow us to prove a robust version of the finite-dimensional Hudson theorem. Recall from Eq. (2.18) that, for odd d, the Wigner function of a pure stabilizer state is necessarily nonnegative. The Hudson theorem states that, for pure states, this condition is also sufficient, i.e., the Wigner function of a pure quantum state is non-negative if and only if the state is a stabilizer state [Gro06]. We will show in Theorem 8.4 that if the Wigner or sum-negativity

$$\begin{aligned} {{\,\mathrm{sn}\,}}(\psi ) :=\sum _{\mathbf {x} : w_\psi (\mathbf {x})<0} |w_\psi (\mathbf {x}) |= \frac{1}{2} \left( \sum _{\mathbf {x}} |w_\psi (\mathbf {x}) |- 1 \right) \end{aligned}$$

is small then the state is close to a stabilizer state.

The Wigner negativity is immediately related to the mana \({\mathcal {M}}(\psi )=\log (2{{\,\mathrm{sn}\,}}(\psi )+1)\), a monotone that plays an important role in the resource theory of stabilizer computation [Got97]. Throughout this section we assume that d is odd, so that the Wigner function is well-behaved (cf. Sect. 2.3).
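As a concrete illustration, the sum-negativity and the mana can be computed for a single qutrit. The sketch below assumes the common convention \(w_\psi (q,p) = d^{-1}\sum _x \omega ^{2p(x-q)}\,\psi (x)\overline{\psi (2q-x)}\) for the discrete Wigner function of one qudit in odd dimension; this may differ from Eq. (2.10) by a relabeling of phase space points, which does not affect the negativity. The function names are illustrative.

```python
import cmath
import math


def wigner(psi):
    """Discrete Wigner function of a single qudit (odd d), as a dict {(q, p): w}."""
    d = len(psi)
    w = {}
    for q in range(d):
        for p in range(d):
            acc = 0j
            for x in range(d):
                y = (2 * q - x) % d
                phase = cmath.exp(2j * math.pi * ((2 * p * (x - q)) % d) / d)
                acc += phase * complex(psi[x]) * complex(psi[y]).conjugate()
            w[(q, p)] = acc.real / d
    return w


def sum_negativity(psi):
    """sn(psi): sum of |w(x)| over phase space points with w(x) < 0."""
    return sum(-v for v in wigner(psi).values() if v < 0)


def mana(psi):
    """M(psi) = log(2 sn(psi) + 1)."""
    return math.log(2 * sum_negativity(psi) + 1)


if __name__ == "__main__":
    r = 1 / math.sqrt(2)
    print(sum_negativity([1, 0, 0]))   # stabilizer state |0>
    print(sum_negativity([r, r, 0]))   # (|0> + |1>)/sqrt(2), not a stabilizer state
    print(mana([r, r, 0]))
```

The basis state has vanishing sum-negativity, while the superposition \((\vert 0\rangle +\vert 1\rangle )/\sqrt{2}\) has sum-negativity 1/3 under this convention.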

8.1 Exact Hudson theorem

We first give a new and succinct proof of the finite-dimensional Hudson theorem. For pure states, \(1={{\,\mathrm{tr}\,}}\psi ^2 =\sum _{\mathbf {x}} d^n w_\psi (\mathbf {x})^2\). Thus we can define a probability distribution based on the Wigner function,

$$\begin{aligned} q_\psi (\mathbf {x}) = d^n w_\psi (\mathbf {x})^2, \end{aligned}$$

similar to the \(p_\psi \) distribution that we defined in Eq. (3.2) via the characteristic function. Note that \(0\le q_\psi (\mathbf {x})\le d^{-n}\), since \(|w_\psi (\mathbf {x})|\le d^{-n}\).

We now consider the sum of the absolute value of the Wigner function,

$$\begin{aligned} \Vert \psi \Vert _W :=\sum _{\mathbf {x}} |w_\psi (\mathbf {x}) |= d^{-n/2} \sum _{\mathbf {x}} q_\psi (\mathbf {x})^{1/2}. \end{aligned}$$

It holds that \(\Vert \psi \Vert _W \ge \sum _{\mathbf {x}} w_\psi (\mathbf {x}) = 1\), with equality if and only if \(w_\psi (x)\ge 0\) for all x. By the Hölder inequality (with \(p_1=p_2=p_3=3\), so \(\sum _k 1/p_k=1\)):

$$\begin{aligned}&1 = \sum _{\mathbf {x}} q_\psi (\mathbf {x}) = \sum _{\mathbf {x}} q_\psi (\mathbf {x})^{1/6} q_\psi (\mathbf {x})^{1/6} q_\psi (\mathbf {x})^{2/3}\\ {}&\le \left( \sum _{\mathbf {x}} q_\psi ({\mathbf {x}})^{1/2} \right) ^{2/3} \left( \sum _{\mathbf {x}} q_\psi (\mathbf {x})^2 \right) ^{1/3}. \end{aligned}$$

Thus we obtain the following fundamental bound:

$$\begin{aligned} \sum _{\mathbf {x}} q_\psi (\mathbf {x})^2 \ge \frac{1}{d^n \Vert \psi \Vert _W^2}. \end{aligned}$$
(8.1)
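Since \(d^{n}\Vert \psi \Vert _W^2 = (\sum _{\mathbf {x}} q_\psi (\mathbf {x})^{1/2})^2\), the bound (8.1) is an instance of the inequality \(\sum _{\mathbf {x}} q(\mathbf {x})^2 \ge (\sum _{\mathbf {x}} q(\mathbf {x})^{1/2})^{-2}\), which the Hölder step establishes for every probability vector q. The following sketch checks it numerically; `holder_gap` is a hypothetical helper name.

```python
import math


def holder_gap(q):
    """Return sum(q^2) - 1 / (sum(sqrt(q)))^2, which should be non-negative."""
    lhs = sum(x * x for x in q)
    rhs = 1.0 / sum(math.sqrt(x) for x in q) ** 2
    return lhs - rhs


if __name__ == "__main__":
    uniform_on_subset = [1 / 3] * 3 + [0.0] * 6   # equality case
    skewed = [0.5, 0.3, 0.2]
    print(holder_gap(uniform_on_subset), holder_gap(skewed))
```

Equality holds for distributions that are uniform on their support, which is the case exploited in the proof of the Hudson theorem.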

Crucially, we can interpret the left-hand side as the average of the function \(q_\psi \) with respect to the same probability distribution, \(E_{\mathbf {x}\sim q_\psi } q_\psi (\mathbf {x})\). Now suppose that the Wigner function is nowhere negative, so that the bound simplifies to \(\sum _{\mathbf {x}} q_\psi (\mathbf {x})^2 \ge d^{-n}\). But \(q_\psi (\mathbf {x})\le d^{-n}\) for all \(\mathbf {x}\), so we conclude that the function \(q_\psi \) must be equal to \(d^{-n}\) on its support. In other words, \(q_\psi (\mathbf {x})\) is the uniform distribution on a subset of cardinality \(d^n\). This gives a rather direct proof of the finite-dimensional Hudson theorem:

Theorem 8.1

(Finite-dimensional Hudson theorem, [Gro06]). Let d be an odd integer and \(\psi \) a pure quantum state of n qudits. Then the Wigner function of \(\psi \) is everywhere nonnegative, \(w_\psi (\mathbf {x})\ge 0\), if and only if \(\psi \) is a stabilizer state.

Proof

In view of Eq. (2.18) we only need to show that if \(w_\psi (\mathbf {x})\ge 0\) for all \(\mathbf {x}\) then \(\psi \) is a stabilizer state. By the preceding discussion, we know that \(w_\psi (\mathbf {x}) = d^{-n} \mathbb {1}_T(\mathbf {x})\), where \(\mathbb {1}_T\) denotes the indicator function of some subset \(T\subseteq {\mathcal {V}}_n\) of cardinality \(d^n\). In other words, \(\langle \psi | A_{\mathbf {x}} | \psi \rangle = \mathbb {1}_T(\mathbf {x})\) and so \(A_{\mathbf {x}}\vert \psi \rangle =\vert \psi \rangle \) for all \(\mathbf {x}\in T\).

It remains to show that T is of the form \(T=\mathbf {a}+M\), where M is a maximal isotropic subspace. For this, consider any three points \(\mathbf {x},\mathbf {y},\mathbf {z}\in T\) and use Eq. (2.12), which asserts that \(A_{\mathbf {x}} A_{\mathbf {y}} A_{\mathbf {z}} = \omega ^{2 [\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]} A_{\mathbf {x}-\mathbf {y} +\mathbf {z}}\). Because \(\vert \psi \rangle \) is an eigenvector of \(A_{\mathbf {x}},A_{\mathbf {y}},A_{\mathbf {z}}\), with eigenvalue \(+1\), we obtain

$$\begin{aligned} 1 = \langle \psi |A_{\mathbf {x}} A_{\mathbf {y}} A_{\mathbf {z}}|\psi \rangle = \omega ^{2 [\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]} \langle \psi |A_{\mathbf {x}-\mathbf {y} +\mathbf {z}}|\psi \rangle . \end{aligned}$$

But \(A_{\mathbf {x}-\mathbf {y} +\mathbf {z}}\) is Hermitian, so this is impossible unless \([\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]=0\). Therefore, T is the translate of a totally isotropic set M of cardinality \(d^n\). Since the maximal size of a totally isotropic subspace is also \(d^n\) [Gro06, App. C], M is necessarily a maximal isotropic subspace. We conclude that \(\psi \) is a stabilizer state. \(\square \)

8.2 Robust Hudson theorem

To obtain a robust version of the Hudson theorem, we will, similarly to our approach to stabilizer testing, combine Eq. (8.1) with an uncertainty relation that generalizes the proof of Theorem 8.1.

Lemma 8.2

Let d be an odd integer and \(\psi \) a pure state of n qudits. Suppose that \({{\,\mathrm{tr}\,}}[\psi A_{\mathbf {x}}],{{\,\mathrm{tr}\,}}[\psi A_{\mathbf {y}}]\), \({{\,\mathrm{tr}\,}}[\psi A_{\mathbf {z}}] > \sqrt{1-1/2d^2}\). Then \([\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]=0\), i.e., \(W_{\mathbf {z}-\mathbf {x}}\) and \(W_{\mathbf {y}-\mathbf {x}}\) must commute.

Proof

Note that the assumption implies that

$$\begin{aligned} \Vert A_{\mathbf {x}} \vert \psi \rangle - \vert \psi \rangle \Vert < \sqrt{2\left( 1-\sqrt{1-\frac{1}{2d^2}}\right) } \le \frac{1}{d}, \end{aligned}$$

and likewise for \(\mathbf {y}\) and \(\mathbf {z}\). Thus we obtain the following inequalities:

$$\begin{aligned} \Vert A_{\mathbf {x}} \vert \psi \rangle - \vert \psi \rangle \Vert< \frac{1}{d}, \quad \Vert A_{\mathbf {y}} \vert \psi \rangle - \vert \psi \rangle \Vert< \frac{1}{d}, \quad \Vert A_{\mathbf {z}} \vert \psi \rangle - \vert \psi \rangle \Vert < \frac{1}{d} . \end{aligned}$$

As a consequence of the triangle inequality, and using \(\Vert A_{\mathbf {z}} \Vert \le 1\), along with Eq. (2.12), we obtain

$$\begin{aligned} \quad \Vert A_{\mathbf {x} - \mathbf {y}+\mathbf {z}} \vert \psi \rangle - \omega ^{-2[\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {x}]}\vert \psi \rangle \Vert = \Vert A_{\mathbf {x}} A_{\mathbf {y}} A_{\mathbf {z}} \vert \psi \rangle - \vert \psi \rangle \Vert < \frac{3}{d}. \end{aligned}$$

We can simply expand this relation and see that it is equivalent to

$$\begin{aligned} 1-\frac{9}{2d^2} < \langle \psi \vert A_{\mathbf {x} - \mathbf {y}+\mathbf {z}} \vert \psi \rangle \cos ( 2[\mathbf {z}-\mathbf {x}, \mathbf {y}-\mathbf {x}] \frac{2\pi }{d}). \end{aligned}$$

If \([\mathbf {z}-\mathbf {x}, \mathbf {y}-\mathbf {x}]\ne 0\), then

$$\begin{aligned} 1-\frac{9}{2d^2}&< \langle \psi \vert A_{\mathbf {x} - \mathbf {y}+\mathbf {z}} \vert \psi \rangle \cos ( 2[\mathbf {z}-\mathbf {x}, \mathbf {y}-\mathbf {x}] \frac{2\pi }{d})\\&\le - \cos (\frac{d-1}{2}\cdot \frac{2\pi }{d})= \cos (\frac{\pi }{d}). \end{aligned}$$

But one can verify that \(1-\frac{9}{2d^2} \ge \cos (\frac{\pi }{d})\) for all \(d\ge 3\), which is a contradiction. This shows that \([\mathbf {z}-\mathbf {x}, \mathbf {y}-\mathbf {x}]=0\). \(\square \)
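The two elementary estimates in this proof, \(\sqrt{2(1-\sqrt{1-1/2d^2})}\le 1/d\) and \(1-\frac{9}{2d^2}\ge \cos (\frac{\pi }{d})\), can be checked numerically for a range of odd d. This is a minimal sketch with illustrative function names. Note that the second estimate holds with equality at \(d=3\), so the comparison uses a small floating-point tolerance.

```python
import math


def norm_estimate_holds(d: int) -> bool:
    """Check sqrt(2 (1 - sqrt(1 - 1/(2 d^2)))) <= 1/d."""
    return math.sqrt(2 * (1 - math.sqrt(1 - 1 / (2 * d * d)))) <= 1 / d


def cosine_estimate_holds(d: int, tol: float = 1e-12) -> bool:
    """Check 1 - 9/(2 d^2) >= cos(pi/d) up to rounding (equality at d = 3)."""
    return 1 - 9 / (2 * d * d) >= math.cos(math.pi / d) - tol


if __name__ == "__main__":
    odd_d = range(3, 201, 2)
    print(all(norm_estimate_holds(d) for d in odd_d))
    print(all(cosine_estimate_holds(d) for d in odd_d))
```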

Corollary 8.3

Let d be an odd integer and \(\psi \) a pure state of n qudits. Then \(T:=\{ \mathbf {x} \in {\mathcal {V}}_n : w_\psi (\mathbf {x}) > d^{-n} \sqrt{1-1/2d^2}\}\) is a subset of an affine totally isotropic subspace.

We now prove the main result of this section:

Theorem 8.4

(Robust finite-dimensional Hudson theorem). Let d be odd and \(\psi \) a pure quantum state of n qudits. Then there exists a stabilizer state \(\vert S\rangle \) such that \(|\langle S|\psi \rangle |^2 \ge 1 - 9 d^2 {{\,\mathrm{sn}\,}}(\psi )\).

Proof

Suppose that \({{\,\mathrm{sn}\,}}(\psi )\le \varepsilon \). Then \(\Vert \psi \Vert _W\le 1+2\varepsilon \) and we find from Eq. (8.1) that

$$\begin{aligned} \sum _{\mathbf {x}} q_\psi (\mathbf {x})^2 \ge \frac{1}{d^n (1+2\varepsilon )^2}, \end{aligned}$$

i.e.,

$$\begin{aligned} \sum _{\mathbf {x}} q_\psi (\mathbf {x}) \left( d^{-n}-q_\psi (\mathbf {x}) \right) \le d^{-n} \left( 1 - \frac{1}{(1+2\varepsilon )^2}\right) \le 4\varepsilon d^{-n}. \end{aligned}$$
(8.2)

We would like to show that the probability of the set T from Corollary 8.3 with respect to the probability distribution \(q_\psi \) is close to one. First, though, let us consider

$$\begin{aligned} {\tilde{T}} = \left\{ \mathbf {x} : |w_\psi (\mathbf {x})|> d^{-n}\sqrt{1-1/{2d^2}} \right\} = \left\{ \mathbf {x} : q_\psi (\mathbf {x}) > d^{-n}\left( 1-1/{2d^2}\right) \right\} \end{aligned}$$

which is defined just like T, but in terms of the absolute value of the Wigner function. Then, using Markov’s inequality and Eq. (8.2), we have

$$\begin{aligned} \sum _{\mathbf {x} \in {\tilde{T}}} q_\psi (\mathbf {x}) \ge 1 - \frac{\sum _{\mathbf {x}} q_\psi (\mathbf {x}) \left( d^{-n}-q_\psi (\mathbf {x}) \right) }{d^{-n}\cdot 1/{2d^2}} \ge 1-8d^2 \varepsilon . \end{aligned}$$

But then T is likewise a high-probability subset:

$$\begin{aligned}&\sum _{\mathbf {x} \in T} q_\psi (\mathbf {x}) \ge \sum _{\mathbf {x} \in {\tilde{T}}} q_\psi (\mathbf {x}) - \sum _{w_\psi (\mathbf {x})< 0} q_\psi (\mathbf {x}) \ge \sum _{\mathbf {x} \in {\tilde{T}}} q_\psi (\mathbf {x})\\ {}&- \sum _{w_\psi (\mathbf {x}) < 0} |w_\psi (\mathbf {x})|\ge 1-8d^2 \varepsilon -\varepsilon , \end{aligned}$$

where we used \(q_\psi (\mathbf {x})=d^n |w_\psi (\mathbf {x})|^2\le |w_\psi (\mathbf {x})|\).

As a result of Corollary 8.3, T is a subset of some affine totally isotropic subspace \(\mathbf {a}+M\). If \(\vert S\rangle \) denotes the corresponding stabilizer state then

$$\begin{aligned} |\langle \psi |S\rangle |^2&= d^n \sum _{\mathbf {x}} w_\psi (\mathbf {x}) w_S(\mathbf {x}) = \sum _{\mathbf {x}\in \mathbf {a}+M} w_\psi (\mathbf {x}) = \sum _{\mathbf {x}\in T} w_\psi (\mathbf {x}) \\&\quad + \sum _{\mathbf {x}\in (\mathbf {a} + M) \setminus T} w_\psi (\mathbf {x}) \ge \sum _{\mathbf {x}\in T} w_\psi (\mathbf {x}) - \varepsilon \\&\ge \sum _{\mathbf {x}\in T} q_\psi (\mathbf {x}) - \varepsilon \ge 1-(8d^2+2)\varepsilon > 1-9d^2 \varepsilon . \end{aligned}$$

In the fifth step we used that, for \(\mathbf {x}\in T\), \(w_\psi (\mathbf {x})=|w_\psi (\mathbf {x})|\ge d^n |w_\psi (\mathbf {x})|^2= q_\psi (\mathbf {x})\). \(\square \)

8.3 Stabilizer testing revisited: minimal number of copies

We will now revisit stabilizer testing from the perspective of the Wigner function and show that for \(d\equiv 1,5\pmod 6\) it is in fact possible to perform stabilizer testing with just three copies of the state. This is clearly optimal, since the set of stabilizer states forms a projective 2-design.

We start with the phase space point operators \(A_{\mathbf {x}}\) from Eq. (2.11). Consider the operator

$$\begin{aligned} V :=d^{-n} \sum _{\mathbf {x}} A_{\mathbf {x}}^{\otimes 3} = d^{-2n} \sum _{\mathbf {y}_1+\mathbf {y}_2+\mathbf {y}_3=0} W_{\mathbf {y}_1} \otimes W_{\mathbf {y}_2} \otimes W_{\mathbf {y}_3}. \end{aligned}$$
(8.3)

We remark that it is clear from Eq. (2.13) that V is an element of the commutant. Moreover, we have the following analog of Lemma 3.8:

Lemma 8.5

For \(d\equiv 1,5\pmod 6\), the operator V defined in Eq. (8.3) is a Hermitian unitary.

Proof

Since the operators \(A_{\mathbf {x}}\) are Hermitian, V is Hermitian as well. Thus it remains to prove that V is unitary. For this we compute:

$$\begin{aligned} V^2&= d^{-4n} \sum _{\mathbf {y}_1+\mathbf {y}_2+\mathbf {y}_3=0 ,\,\mathbf {z}_1+\mathbf {z}_2+\mathbf {z}_3=0 }{ W_{\mathbf {y}_1} W_{\mathbf {z}_1}\otimes W_{\mathbf {y}_2} W_{\mathbf {z}_2} \otimes W_{\mathbf {y}_3}W_{\mathbf {z}_3}}\\&=d^{-4n} \sum _{\mathbf {a}_1{+}\mathbf {a}_2{+}\mathbf {a}_3{=}0 ,\,\mathbf {b}_1{+}\mathbf {b}_2{+}\mathbf {b}_3{=}0 }{W_{\mathbf {a}_1} \otimes W_{\mathbf {a}_2} \otimes W_{\mathbf {a}_3}}\omega ^{\frac{1}{8} [\mathbf {a}_1 {+}\mathbf {b}_1,\mathbf {a}_1{-}\mathbf {b}_1] {+}\frac{1}{8} [\mathbf {a}_2 {+}\mathbf {b}_2,\mathbf {a}_2{-}\mathbf {b}_2] {+}\frac{1}{8} [\mathbf {a}_3 {+}\mathbf {b}_3,\mathbf {a}_3{-}\mathbf {b}_3] }\\&=d^{-4n} \sum _{\mathbf {a}_1+\mathbf {a}_2+\mathbf {a}_3=0 ,\,\mathbf {b}_1,\mathbf {b}_2 }{W_{\mathbf {a}_1} \otimes W_{\mathbf {a}_2} \otimes W_{\mathbf {a}_3}}\omega ^{- \frac{1}{4}[\mathbf {a}_1 -\mathbf {a}_3,\mathbf {b}_1] -\frac{1}{4}[\mathbf {a}_2 -\mathbf {a}_3,\mathbf {b}_2] }\\&= \sum _{\mathbf {a}_1+\mathbf {a}_2+\mathbf {a}_3=0 }{\delta _{\mathbf {a}_1,\mathbf {a}_3}\delta _{\mathbf {a}_2,\mathbf {a}_3}\,\,W_{\mathbf {a}_1} \otimes W_{\mathbf {a}_2} \otimes W_{\mathbf {a}_3}}=I, \end{aligned}$$

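where, for the second equality, we used the change of variables \(\mathbf {a}_i=\mathbf {y}_i+\mathbf {z}_i\) and \(\mathbf {b}_i=\mathbf {y}_i-\mathbf {z}_i\), which is a bijection since d is odd. For the last equality, note that \(\mathbf {a}_1=\mathbf {a}_2=\mathbf {a}_3\) together with \(\mathbf {a}_1+\mathbf {a}_2+\mathbf {a}_3=0\) gives \(3\mathbf {a}_1=0\), which forces \(\mathbf {a}_1=0\) because 3 is invertible modulo d for \(d\equiv 1,5\pmod 6\). \(\square \)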

We now consider the binary POVM measurement with accepting projector

$$\begin{aligned} \Pi _\text {accept} = \frac{1}{2} (I+V). \end{aligned}$$

Theorem 8.6

(Stabilizer testing from three copies). Let \(d\equiv 1,5\pmod 6\) and \(\psi \) a pure state of n qudits. Denote by \(p_\text {accept}={{\,\mathrm{tr}\,}}[\psi ^{\otimes 3}\Pi _\text {accept}]\) the probability that the POVM element \(\Pi _\text {accept}\) accepts given three copies of \(\psi \). If \(\psi \) is a stabilizer state then it accepts with certainty, \(p_\text {accept}=1\). On the other hand, if \(\max _S|\langle S|\psi \rangle |^2\le 1-\varepsilon ^2\) then \(p_\text {accept}\le 1-\varepsilon ^2/16d^2\).

Proof

We first note that

$$\begin{aligned} p_\text {accept} ={{\,\mathrm{tr}\,}}\left[ \Pi _\text {accept} \psi ^{\otimes 3}\right] =\frac{1}{2} \left( 1 + d^{-n} \sum _{\mathbf {x}} {{\,\mathrm{tr}\,}}\left[ A_{\mathbf {x}}^{\otimes 3} \psi ^{\otimes 3}\right] \right) =\frac{1}{2} \left( 1 + d^{2n} \sum _{\mathbf {x}} w^3_\psi (\mathbf {x}) \right) , \end{aligned}$$

where \(w_\psi \) denotes the Wigner function defined in Eq. (2.10). It is clear from Eq. (2.18) that if \(\psi \) is a stabilizer state then \(p_\text {accept}=1\).

Now assume that \(\psi \) is an arbitrary pure state. Since \(q_\psi (\mathbf {x}) = d^n w_\psi ^2(\mathbf {x})\) is a probability distribution, we can rewrite the above as

$$\begin{aligned} \sum _{\mathbf {x}} q_\psi (\mathbf {x}) \left( 1 - d^n w_\psi (\mathbf {x}) \right) = 2 \left( 1 - p_\text {accept} \right) . \end{aligned}$$

Moreover, \(q_\psi (\mathbf {x})\le d^{-n}\) for all \(\mathbf {x}\), so we can use Markov’s inequality for the set

$$\begin{aligned} T = \left\{ \mathbf {x} \in {\mathcal {V}}_n : w_\psi (\mathbf {x}) > d^{-n} \sqrt{1 - 1/2d^2} \right\} \end{aligned}$$

to see that

$$\begin{aligned} \sum _{\mathbf {x} \in T} q_\psi (\mathbf {x}) \ge 1 - \frac{2 \left( 1 - p_\text {accept} \right) }{1 - \sqrt{1 - 1/2d^2}} \ge 1 - 8d^2 \left( 1 - p_\text {accept} \right) . \end{aligned}$$
(8.4)

We now argue similarly as in the proof of the robust Hudson theorem (Theorem 8.4). From Corollary 8.3 we know that there exists an affine Lagrangian subspace \(\mathbf {a}+M\) that contains T. Let \(\vert S\rangle \) denote the corresponding stabilizer state. Then,

$$\begin{aligned} |\langle \psi |S\rangle |^2 = d^n \sum _{\mathbf {x}} w_\psi (\mathbf {x}) w_S(\mathbf {x}) = \sum _{\mathbf {x}\in \mathbf {a}+M} w_\psi (\mathbf {x}) = \sum _{\mathbf {x}\in T} w_\psi (\mathbf {x}) + \sum _{\mathbf {x}\in (\mathbf {a}+M)\setminus T} w_\psi (\mathbf {x}) \end{aligned}$$

For \(\mathbf {x}\in T\), \(w_\psi (\mathbf {x})\ge 0\) and so \(w_\psi (\mathbf {x}) \ge q_\psi (\mathbf {x})\). Thus the first sum can be lower-bounded by using Eq. (8.4). For the second sum, we note that Eq. (8.4) also implies that \(1 - 8d^2 \left( 1 - p_\text {accept} \right) \le d^{-n} |T|\), since \(q_\psi (\mathbf {x})\le d^{-n}\), and so

$$\begin{aligned} \sum _{\mathbf {x}\in (\mathbf {a}+M)\setminus T} w_\psi (\mathbf {x}) \ge -d^{-n} |(\mathbf {a}+M) \setminus T|= -d^{-n} \left( d^n - |T|\right) \ge -8d^2 \left( 1 - p_\text {accept} \right) . \end{aligned}$$

Together, we obtain that \(|\langle \psi |S\rangle |^2 \ge 1 - 16d^2 \left( 1 - p_\text {accept} \right) \), or \(p_\text {accept} \le 1 - \varepsilon ^2/16d^2\), which is what we wanted to show. \(\square \)
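The formula \(p_\text {accept}=\frac{1}{2}(1+d^{2n}\sum _{\mathbf {x}} w_\psi ^3(\mathbf {x}))\) makes the three-copy test easy to evaluate for a single qudit. The sketch below takes \(n=1\), \(d=5\) and assumes the convention \(w_\psi (q,p) = d^{-1}\sum _x \omega ^{2p(x-q)}\,\psi (x)\overline{\psi (2q-x)}\) for the Wigner function, which may differ from Eq. (2.10) by a relabeling of phase space points that leaves \(\sum _{\mathbf {x}} w_\psi ^3\) unchanged. The function names are illustrative.

```python
import cmath
import math


def wigner(psi):
    """Discrete Wigner function of a single qudit (odd d), as a dict {(q, p): w}."""
    d = len(psi)
    w = {}
    for q in range(d):
        for p in range(d):
            acc = 0j
            for x in range(d):
                y = (2 * q - x) % d
                phase = cmath.exp(2j * math.pi * ((2 * p * (x - q)) % d) / d)
                acc += phase * complex(psi[x]) * complex(psi[y]).conjugate()
            w[(q, p)] = acc.real / d
    return w


def p_accept(psi):
    """Acceptance probability (1 + d^2 * sum_x w(x)^3) / 2 for one qudit (n = 1)."""
    d = len(psi)
    return 0.5 * (1 + d ** 2 * sum(v ** 3 for v in wigner(psi).values()))


if __name__ == "__main__":
    r = 1 / math.sqrt(2)
    f = 1 / math.sqrt(5)
    print(p_accept([1, 0, 0, 0, 0]))   # computational basis state
    print(p_accept([f] * 5))           # uniform superposition (Fourier basis state)
    print(p_accept([r, r, 0, 0, 0]))   # (|0> + |1>)/sqrt(2), not a stabilizer state
```

Both stabilizer states are accepted with probability one, whereas the superposition \((\vert 0\rangle +\vert 1\rangle )/\sqrt{2}\) is accepted with probability 0.625 under this convention.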