Random quantum circuits transform local noise into global white noise

We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime. We show that, for local noise that is sufficiently weak and unital, correlations (measured by the linear cross-entropy benchmark) between the output distribution $p_{\text{noisy}}$ of a generic noisy circuit instance and the output distribution $p_{\text{ideal}}$ of the corresponding noiseless instance shrink exponentially with the expected number of gate-level errors, as $F=\text{exp}(-2s\epsilon \pm O(s\epsilon^2))$, where $\epsilon$ is the probability of error per circuit location and $s$ is the number of two-qubit gates. Furthermore, if the noise is incoherent, the output distribution approaches the uniform distribution $p_{\text{unif}}$ at precisely the same rate and can be approximated as $p_{\text{noisy}} \approx Fp_{\text{ideal}} + (1-F)p_{\text{unif}}$, that is, local errors are scrambled by the random quantum circuit and contribute only white noise (uniform output). Importantly, we upper bound the total variation error (averaged over random circuit instances) in this approximation as $O(F\epsilon \sqrt{s})$, so the "white-noise approximation" is meaningful when $\epsilon \sqrt{s} \ll 1$, a quadratically weaker condition than the $\epsilon s\ll 1$ requirement to maintain high fidelity. The bound applies when the circuit size satisfies $s \geq \Omega(n\log(n))$ and the inverse error rate satisfies $\epsilon^{-1} \geq \tilde{\Omega}(n)$. The white-noise approximation is useful for salvaging the signal from a noisy quantum computation; it was an underlying assumption in complexity-theoretic arguments that low-fidelity random quantum circuits cannot be efficiently sampled classically. Our method is based on a map from second-moment quantities in random quantum circuits to expectation values of certain stochastic processes for which we compute upper and lower bounds.


Introduction
There is a fundamental trade-off in quantum computation between computation size and error rate. Naturally, the longer the computation, the lower the physical error rate must be to maintain a high probability of an errorless computation. Once the error rate is beneath a constant threshold, the theory of fault tolerance and quantum error correction [1,2] may be employed to push the probability of a logical error arbitrarily close to zero, despite the prevalence of many physical errors during the computation; however, error correction comes at the cost of additional qubits and gates. These overheads, while acceptable in an asymptotic sense, are likely to be overwhelming in the near and intermediate term. This inspires the idea of an upcoming Noisy Intermediate-Scale Quantum (NISQ) era [3], where hardware capabilities are good enough to perform non-trivial quantum tasks on dozens or hundreds of qubits, but quantum error correction, which might require thousands or millions of qubits, remains beyond reach.
In this paper, we study a model of NISQ devices performing random computations and prove a precise sense in which, for typical circuit instances, local errors are quickly scrambled and can be treated as white noise. For some applications, this phenomenon makes it possible for the signal of the noiseless computation to be extracted by repetition despite a large overall chance that at least one error occurs.
Our local error model assumes that each two-qubit gate in the quantum circuit is followed by a pair of gate-independent single-qubit unital noise channels acting on the two qubits involved in the gate. For simplicity and ease of analysis, we assume each of these noise channels is identical, but we fully expect the takeaways from our work to apply when the noise strength is allowed to vary from location to location. For concreteness in this introduction, we can consider the depolarizing channel with error probability ε. In this case, the fidelity of the noisy computation with respect to the ideal computation is expected to be roughly equal to the probability that no errors occur. We see that, for a circuit with s two-qubit gates, this quantity, denoted here by F = (1 − ε)^{2s}, is close to 1 only if the quantity 2εs, the average number of errors, satisfies 2εs ≪ 1.
However, this high-fidelity requirement is quite restrictive in practice. Already for circuits with 50 qubits at depth 20, the error rate must be on the order of 10 −4 for the whole computation to run without error at least 90% of the time; this error rate is more than an order of magnitude smaller than what has been achievable in recent experiments on superconducting qubit systems of that size [4][5][6]. Indeed, in their landmark 2019 quantum computational supremacy experiment [4], a group at Google performed random circuits on 53 qubits of depth 20, but the fidelity of the computation was F ≈ 0.002, meaning at least one error occurs in all but a tiny fraction of the trials. Similar experiments at the University of Science and Technology of China on 56 [5] and 60 [6] qubits reported even smaller fidelities of 0.0007 [5] and 0.0004 [6]. This would not be an issue if one could determine when a trial is errorless. (In this case, one could just repeat the experiment 1/F times.) However, error-detection requires overheads similar to error-correction.
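As a quick illustration of this trade-off, the snippet below evaluates the no-error probability F = (1 − ε)^{2s} for round numbers of our choosing (roughly matching a 50-qubit, depth-20 circuit), not figures from any specific experiment:

```python
import math

def no_error_probability(eps, s):
    """Probability that none of the 2s noise locations fires, for s two-qubit
    gates each followed by two depolarizing channels with error rate eps."""
    return (1.0 - eps) ** (2 * s)

s = 500        # ~50 qubits at depth 20, ~25 two-qubit gates per layer
F_high = no_error_probability(1e-4, s)   # ~0.905: mostly error-free runs
F_low = no_error_probability(5e-3, s)    # ~0.0067: almost every run has an error
print(F_high, F_low)
```

At an error rate closer to current hardware, the fidelity collapses by more than two orders of magnitude even though the circuit is unchanged.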
Rather, low-fidelity random circuit sampling experiments and their claim of quantum computational supremacy benefit from a key assumption [4,7]: when at least one error does occur, the output of the experiment is well approximated by white noise, that is, the output is random and uncorrelated with the ideal (noiseless) output. When this is the case, the signal of diminished size F can, at least for some applications, be extracted from the white noise using O(1/F²) trials, as we explain later. Specifically, for quantum computational supremacy, the white-noise assumption is that the distribution p_noisy over measurement outcomes of the noisy device is close to what we call the "white-noise distribution"

p_wn = F p_ideal + (1 − F) p_unif, (1)

with p_ideal the ideal distribution and p_unif the uniform distribution. In particular, for the approximation to be non-trivial, we demand that the total variation distance between p_noisy and p_wn be a small fraction of F, that is,

(1/2) ‖p_wn − p_noisy‖₁ ≪ F. (white-noise assumption) (2)

This demand is necessary because we expect that p_noisy also decays toward p_unif such that (1/2) ‖p_noisy − p_unif‖₁ = Θ(F), and thus p_unif is a trivial approximation for p_noisy with error Θ(F).
Prior to their experiment, the Google group provided numerical evidence [7] in favor of the white-noise assumption for randomly chosen circuits by showing that the output distribution of random circuits of depth 40 on 20 qubits (arranged in a 2D lattice) subject to a local Pauli error model approaches the uniform distribution, and that the fidelity of p_noisy with respect to p_ideal appears to decay exponentially, consistent with p_noisy ≈ p_wn. However, their analysis did not specifically estimate the distance between p_noisy and p_wn. The white-noise condition in Eq. (2) requires that the distance between p_noisy and p_wn decrease as the expected number of errors increases and F decays, so quantifying the differences between the distributions is vital for determining how well the white-noise approximation is obeyed.
Here we prove rigorous bounds on the error in the white-noise approximation, averaged over circuits with randomly chosen gates. Our results fully apply in two random quantum circuit architectures: first, the 1D architecture with periodic boundary conditions, where qubits are arranged in a ring and alternating layers of nearest-neighbor gates are applied; and second, the complete-graph architecture, where each gate is chosen to act on a pair of qubits chosen uniformly at random among all n(n − 1)/2 pairs. We show that, for Pauli noise channels, the error in the white-noise approximation is small as long as (1) ε²s ≪ 1, (2) s ≥ Ω(n log(n)), and (3) ε ≪ 1/(n log(n)). We believe that condition (3) could be relaxed to read ε < c/n for some universal constant c = O(1) (numerics suggest c = 0.3 for the complete-graph architecture). Condition (1) is a quadratic improvement over the condition εs ≪ 1 needed for high fidelity. For circuits with ε < 0.005, as is the case in recent experiments [4][5][6], thousands of gates could potentially be implemented before condition (1) fails. Note that our technical statements hold for general (non-Pauli) error channels as well, but we find that the error in the white-noise approximation is small only for incoherent noise channels. We complement this analysis with numerical results that confirm the picture presented by our theoretical proofs for the complete-graph architecture, and demonstrate that realistic NISQ-era values of the error rate and circuit size can lead to a good white-noise approximation.
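For concreteness, the three conditions can be evaluated mechanically for candidate parameters; the helper below is a hypothetical utility of our own, and the cutoff 0.1 standing in for "≪ 1" is an arbitrary illustrative choice, not a threshold from this work:

```python
import math

def white_noise_conditions(n, s, eps, small=0.1):
    """Evaluate the dimensionless combinations behind conditions (1)-(3):
    (1) eps^2 * s << 1, (2) s >= n log n (anti-concentration regime),
    (3) eps * n log n << 1.  Returns each raw value with a pass flag."""
    c1 = eps**2 * s
    c2 = s / (n * math.log(n))
    c3 = eps * n * math.log(n)
    return {
        "eps^2 s": (c1, c1 < small),
        "s / (n log n)": (c2, c2 >= 1.0),
        "eps n log n": (c3, c3 < small),
    }

report = white_noise_conditions(n=50, s=2000, eps=3e-4)
for name, (value, ok) in report.items():
    print(f"{name} = {value:.3g} -> {'ok' if ok else 'violated'}")
```

With these (hypothetical) parameters all three combinations are in the favorable regime, even though εs is of order 1 and the fidelity itself is far from 1.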
By putting the white-noise approximation for random quantum circuits on stronger theoretical footing, our work has several applications. First, the white-noise assumption is an ingredient in formal complexity-theoretic arguments that the task accomplished on noisy devices running random quantum circuits is hard for classical computers (allowing the declaration of quantum computational supremacy) [4]. We complement our main result by showing in Appendix C that classically sampling from the white-noise distribution within total variation distance ηF is, in a certain complexity-theoretic sense, equivalent up to a factor of F (which is optimal) to sampling from the ideal output distribution within total variation distance O(η). This makes low-fidelity experiments where errors are common nearly as defensible for quantum computational supremacy as high-fidelity experiments where errors are rare, at least in principle. Second, our result lends theoretical justification to the usage [4][5][6] of the linear cross-entropy metric proposed in Ref. [4] to benchmark noise in random circuit experiments and verify that hardware has correctly performed the quantum computational supremacy task. Indeed, as a side result, we show that, for both incoherent and coherent noise, the metric decays precisely as e^{−2sε ± O(sε²)} when ε is sufficiently small; this also suggests that the linear cross-entropy benchmark could be reliably used to accurately estimate the underlying local noise rate [9].
Beyond random circuit experiments for quantum computational supremacy, our work suggests that other scenarios where the white-noise assumption holds may be advantageous in the NISQ era, as one can eschew error-correction and nonetheless perform a fairly long quantum computation, as long as one is willing to repeat the experiment O(1/F 2 ) times. One example of a scenario where the assumption may hold is quantum simulation of fixed chaotic Hamiltonians, since they are also believed to be efficient at scrambling errors.
The remainder of the paper is structured as follows: in Section 2, we describe our setup and in particular our model for local noise within a random quantum circuit; in Section 3, we precisely state our results; in Section 4, we discuss further implications and how our results fit in with prior work; in Section 5, we give an overview of the intuition behind our result and the method we use in our proofs, which is based on a map from random quantum circuits to certain stochastic processes, which can also be interpreted as partition functions of statistical mechanical systems. This method might be regarded as an extension of the method in Ref. [8], where we studied anticoncentration in random quantum circuits. In Section 6, we present a numerical calculation of our bound for the realistic values of the circuit parameters informed by the experiments in Refs. [4][5][6] (although for the complete-graph architecture, rather than 2D). We conclude the main text with an outlook in Section 7. The rigorous proofs and details behind the map to stochastic processes then appear in the appendices.

A model of noisy random quantum circuits
Here we describe our model of noisy random quantum circuits. Let the circuit consist of s two-qudit gates acting on n qudits, each with local Hilbert space dimension q. We follow Ref. [8] in defining a random quantum circuit architecture as an efficient algorithm that takes the circuit specifications (n, s) as input and outputs a quantum circuit diagram with s two-qudit gates, that is, a length-s sequence of qudit pairs (without specifying the actual gates that populate the diagram). Our results fully apply for two specific architectures: the 1D architecture with periodic boundary conditions, and the complete-graph architecture, which were previously shown in Ref. [8] to have the anti-concentration property as long as s ≥ Ω(n log(n)), with a particular constant prefactor. Our results would also fully apply for standard architectures in D spatial dimensions (with periodic boundary conditions) if it could be proved that they also achieve anti-concentration whenever s ≥ Ω(n log(n)), as was conjectured in Ref. [8].
Given an architecture and parameters (n, s), we can generate a circuit instance by choosing the circuit diagram according to the architecture and then choosing each of the unitary gates in the diagram at random according to the Haar measure. Each instance is associated with an output probability distribution p ideal over q n possible computational basis measurement outcomes x ∈ [q] n (where [q] = {0, 1, . . . , q−1}) that would be sampled if the circuit were implemented noiselessly. Note that in the formal analysis we include a layer of n (also Haar-random) single-qudit gates at the beginning and end of the circuit without counting these 2n gates toward the circuit size; these might be regarded as fixing the local basis for the input product state and the measurement of the output.
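To make the sampling procedure concrete, here is a minimal statevector sketch (our own illustration, not code from the paper) that draws a complete-graph circuit instance with Haar-random two-qubit gates, using the standard QR trick to sample from the Haar measure:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    """Draw a d x d Haar-random unitary via QR of a complex Ginibre matrix,
    fixing the phase ambiguity of QR so the distribution is exactly Haar."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q_mat, r_mat = np.linalg.qr(z)
    return q_mat * (np.diag(r_mat) / np.abs(np.diag(r_mat)))

def apply_gate(state, gate, i, j, n):
    """Apply a 4x4 gate to qubits i and j of an n-qubit statevector."""
    psi = np.moveaxis(state.reshape([2] * n), (i, j), (0, 1)).reshape(4, -1)
    psi = (gate @ psi).reshape([2, 2] + [2] * (n - 2))
    return np.moveaxis(psi, (0, 1), (i, j)).reshape(-1)

# Complete-graph architecture on n = 3 qubits (q = 2): each of the s gates acts
# on a uniformly random pair.  (The initial and final layers of single-qudit
# gates from the formal setup are omitted here for brevity.)
n, s = 3, 12
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0
for _ in range(s):
    i, j = rng.choice(n, size=2, replace=False)
    state = apply_gate(state, haar_unitary(4), i, j, n)
p_ideal = np.abs(state) ** 2   # output distribution of this circuit instance
print(p_ideal.sum())           # ~1.0
```

Each run of this sketch produces one circuit instance together with its ideal output distribution p_ideal over the q^n = 8 computational basis outcomes.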

Local noise model
We augment this setup by inserting single-qudit noise channels into the circuit diagram, which act on qudits involved in a multi-qudit gate immediately following the gate, as shown in the example in Figure 1. In our model, the single-qudit gates remain noiseless and measurements are assumed to be perfect. Thus, the core assumption is that the noise is local, i.e. independent from qudit to qudit.

Figure 1: Example of a noisy quantum circuit diagram on n = 4 qudits with s = 5 two-qudit gates. A pair of single-qudit noise channels N follow each two-qudit gate. The circuit begins and ends with a layer of noiseless single-qudit gates.

We assume each noise channel N is a unital and completely positive trace-preserving map. For a given noise channel, there are only two parameters that matter for our analysis, the average infidelity and the unitarity of the channel. The average infidelity for a channel N is defined as

r = 1 − ∫ dV ⟨ψ| V† N(V |ψ⟩⟨ψ| V†) V |ψ⟩,

where the integral is over the Haar measure on q × q unitary matrices V and |ψ⟩⟨ψ| is any pure state. The average infidelity is one measure of the overall noise strength of the channel N. Following Refs. [10,11], the unitarity is defined for unital channels as

u = (q ∫ dV Tr[N(V |ψ⟩⟨ψ| V†)²] − 1) / (q − 1).

The unitarity is the expected purity of the output state under random choice of input state, scaled to have minimum value of 0 and maximum value of 1.
Examples: depolarizing, dephasing, and rotation channels. It is helpful to consider explicitly the following three channels. First, the depolarizing channel is

N_dep(ρ) = (1 − γ) ρ + γ I/q = (1 − ε) ρ + (ε/(q² − 1)) Σ_{P ≠ I} P ρ P†, with γ = q²ε/(q² − 1),

where the sum runs over the set of single-qudit Pauli matrices (appropriately generalized to higher q), and I is the q × q identity matrix. There are two ways to think of the channel: first, with probability 1 − γ doing nothing and with probability γ resetting the state to the maximally mixed state on that qudit; second, with probability 1 − ε doing nothing and with probability ε choosing a non-identity Pauli operator at random to apply to the qudit.
We can also consider the dephasing channel

N_deph(ρ) = (1 − qε/(q − 1)) ρ + (qε/(q − 1)) Σ_x |x⟩⟨x| ρ |x⟩⟨x|,

which represents doing nothing with probability 1 − qε/(q − 1) and performing a measurement in the computational basis with probability qε/(q − 1). Finally, we can consider a coherent noise channel, for example the rotation channel, which applies a small unitary rotation by angle θ to the state. The average infidelity and unitarity of these channels are given in Table 1.

Table 1: Average infidelity and unitarity for three different single-qudit noise channels, where q denotes the local dimension of the qudits (q = 2 for qubits).
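For qubits (q = 2), both quantities can be computed directly from a channel's Kraus operators; the sketch below uses Nielsen's average-gate-fidelity formula and the Pauli-transfer-matrix expression for the unitarity, standard formulas from the randomized-benchmarking literature rather than anything specific to this paper:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
PAULIS = [I2, X, Y, Z]

def apply_channel(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def average_infidelity(kraus, q=2):
    """r = 1 - F_avg with Nielsen's formula F_avg = (sum_k |Tr K_k|^2 + q) / (q^2 + q)."""
    return 1 - (sum(abs(np.trace(K)) ** 2 for K in kraus) + q) / (q**2 + q)

def unitarity(kraus, q=2):
    """Unitarity of a unital qubit channel: squared Frobenius norm of the
    traceless block of the Pauli transfer matrix, divided by q^2 - 1."""
    R = np.array([[np.real(np.trace(Pa @ apply_channel(kraus, Pb))) / q
                   for Pb in PAULIS] for Pa in PAULIS])
    return np.sum(R[1:, 1:] ** 2) / (q**2 - 1)

eps, theta = 0.01, 0.1
depolarizing = [np.sqrt(1 - eps) * I2] + [np.sqrt(eps / 3) * P for P in (X, Y, Z)]
rotation = [np.diag([np.exp(-1j * theta), np.exp(1j * theta)])]

r_dep, u_dep = average_infidelity(depolarizing), unitarity(depolarizing)
r_rot, u_rot = average_infidelity(rotation), unitarity(rotation)
print(r_dep, u_dep)   # 2*eps/3 and (1 - 4*eps/3)^2
print(r_rot, u_rot)   # O(theta^2) and exactly 1 (coherent noise)
```

The depolarizing values recover the relation ε = r(q + 1)/q used later in the text, and the rotation channel's unitarity of exactly 1 previews why coherent noise behaves differently in what follows.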

Output distributions of the quantum circuit
Suppose the locations of the s two-qudit gates have been fixed, with gate t acting on qudits {i t , j t }.
Then a circuit instance is specified by a sequence U = (U^(−n+1), . . . , U^(s+n)), where U^(t) is a q² × q² (two-qudit) unitary matrix if 1 ≤ t ≤ s and a q × q (single-qudit) unitary matrix otherwise. Accordingly, for each t, let U_t denote the unitary channel that acts as U^(t) on qudits i_t and j_t and as the identity channel (denoted by I) on the other qudits. To account for noise, let Ñ_t = (N_{i_t} ⊗ N_{j_t}) ∘ U_t be the channel that applies noise channels to the two qudits after applying the unitary gate. Now we can define the ideal and noisy output distributions by

p_ideal(x) = ⟨x| (U_{s+n} ∘ · · · ∘ U_{−n+1})(|0^n⟩⟨0^n|) |x⟩,
p_noisy(x) = ⟨x| (U_{s+n} ∘ · · · ∘ U_{s+1} ∘ Ñ_s ∘ · · · ∘ Ñ_1 ∘ U_0 ∘ · · · ∘ U_{−n+1})(|0^n⟩⟨0^n|) |x⟩.

Our work compares the distribution p_noisy to the white-noise distribution p_wn = F p_ideal + (1 − F) p_unif (defined in Eq. (1) and repeated here) for some choice of F. The white-noise distribution is a mixture of the ideal distribution and the uniform distribution. Note that p_ideal, p_noisy, and p_wn all depend implicitly on the circuit instance U. In the analysis we treat F as a free parameter, and we choose it such that our bound on the distance between p_noisy and p_wn is minimized. The total variation distance between two distributions p_1 and p_2 is defined as

(1/2) ‖p_1 − p_2‖₁ = (1/2) Σ_x |p_1(x) − p_2(x)|.

Comment on randomness in our setup. There are multiple types of randomness in our analysis, and in understanding our result it is important to keep track of how they interplay. First of all, the noiseless circuit instance U is generated randomly by choosing each gate to be Haar random. The choice of U determines an ideal pure output state. Second of all, for each fixed choice of U, the noise channels may introduce randomness that makes the noisy output state mixed. When the noise is depolarizing noise, this might be regarded as the insertion of a randomly chosen pattern of Pauli errors. Lastly, the measurement of the state in the computational basis gives rise to a random measurement outcome drawn from a certain classical probability distribution: p_ideal if we are considering the noiseless circuit, and p_noisy if we are considering the noisy circuit.
The important thing to remember is that we are primarily concerned with thinking about fixed instances U and the interplay between the resulting probability distributions p ideal , p noisy and p wn for that instance. Then, we make a statement about these distributions that holds in expectation over random choice of U . If desired, one could then use Markov's inequality to form bounds on the fraction of instances U for which the white-noise approximation must be good.
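As a sanity check on this setup, the following minimal density-matrix simulation (our own illustrative sketch, not the paper's numerics; the parameters n = 4, s = 24, ε = 0.03 are arbitrary) builds one complete-graph circuit instance, applies depolarizing noise after each gate as in Figure 1, and compares p_noisy against both p_wn and p_unif:

```python
import numpy as np

rng = np.random.default_rng(7)
n, s, eps = 4, 24, 0.03          # illustrative parameters, not from the paper
dim = 2**n
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])

def haar_unitary(d):
    """Haar-random d x d unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    qm, rm = np.linalg.qr(z)
    return qm * (np.diag(rm) / np.abs(np.diag(rm)))

def apply_left(t, M, axes):
    """Contract matrix M into the listed tensor axes of t (shape [2]*2n)."""
    k = len(axes)
    t = np.moveaxis(t, axes, range(k)).reshape(2**k, -1)
    t = (M @ t).reshape([2] * k + [2] * (2 * n - k))
    return np.moveaxis(t, range(k), axes)

def conjugate_by(t, M, qubits):
    """rho -> M rho M^dagger on the given qubits of a vectorized density matrix."""
    t = apply_left(t, M, list(qubits))
    return apply_left(t, M.conj(), [n + q for q in qubits])

def depolarize(t, k):
    """Single-qubit depolarizing channel with Pauli-error probability eps."""
    err = sum(conjugate_by(t, P, [k]) for P in (X, Y, Z))
    return (1 - eps) * t + (eps / 3) * err

rho_ideal = np.zeros([2] * (2 * n), dtype=complex)
rho_ideal[(0,) * (2 * n)] = 1.0              # |0...0><0...0|
rho_noisy = rho_ideal.copy()
for _ in range(s):                            # complete-graph architecture
    i, j = rng.choice(n, size=2, replace=False)
    gate = haar_unitary(4)
    rho_ideal = conjugate_by(rho_ideal, gate, [i, j])
    rho_noisy = depolarize(depolarize(conjugate_by(rho_noisy, gate, [i, j]), i), j)

p_ideal = np.real(np.diag(rho_ideal.reshape(dim, dim)))
p_noisy = np.real(np.diag(rho_noisy.reshape(dim, dim)))
p_unif = np.full(dim, 1 / dim)
# Choose F by the ratio of XEB-style scores (the F-tilde prescription).
F = np.sum(p_noisy * (dim * p_ideal - 1)) / np.sum(p_ideal * (dim * p_ideal - 1))
p_wn = F * p_ideal + (1 - F) * p_unif

def tvd(a, b):
    return 0.5 * np.abs(a - b).sum()

print(F, tvd(p_noisy, p_wn), tvd(p_noisy, p_unif))
```

At such small n, finite-size effects are significant, so this sketch is only meant to show how the three distributions are constructed and compared, not to reproduce the paper's quantitative bounds.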
Comment on more general (universal) gate sets We consider random quantum circuits built from local 2-site unitary gates drawn randomly with respect to the Haar measure. As our analysis involves only second moment quantities, our results therefore directly apply to any gate set (or distribution on the 2-site unitary group) that forms an exact unitary 2-design, e.g. random Clifford circuits with gates drawn from the Clifford group. Furthermore, circuits constructed with gates drawn randomly from universal gate sets should give rise to similar scrambling phenomena, and we expect that our results hold for such circuits, including the actual random circuit experiments performed in Refs. [4][5][6]. While our method does not directly generalize to other gate sets, we anticipate that if our analysis were extendable to such gate sets, the results would only change by constant factors. Some evidence for this is provided by the fact that the spectral gap of random circuits is independent of the choice of universal gate set [12]. This implies that the depth at which random quantum circuits scramble (and converge to approximate unitary designs) only changes by a constant factor when one considers circuits comprised of gates drawn randomly from any universal gate set [13].

Overview of contributions
The main result of this paper is a proof that, for typical random circuits, the output distribution p_noisy of the quantum circuit with local noise is very close to the white-noise distribution p_wn if the noise is sufficiently weak. Specifically, we prove an upper bound on the expectation value of the total variation distance between the two distributions. In proving that result, we also prove a statement about the expected fidelity of noisy random quantum circuits, and another statement about the speed at which p_noisy approaches the uniform distribution. For all statements, the notation E_U denotes expectation over the choice of Haar-random single-qudit and two-qudit gates.
In the rest of this section, we state our results for general noise channels, deferring the proofs to Appendix B, but first we summarize the contributions specifically applied to the depolarizing channel (Eq. (5)) with error parameter ε in Table 2. The quantity F̃, given in Eq. (14), is the expectation of the linear cross-entropy metric using noisy samples, normalized by its expectation using ideal samples. These statements apply for the 1D and complete-graph architectures when the circuit size is larger than Ω(n log(n)) (corresponding to the regime where the anti-concentration property has been achieved), and assuming that the quantity εn log(n) is small enough to be neglected. We believe that this condition can be relaxed to ε < c/n for some constant c.
Comment on architectures. The theorem statements below are expressed only for the 1D and complete-graph architectures, which are known to anti-concentrate after circuit size Θ(n log(n)).
In the appendix, we prove slightly more general statements that also hold for any architecture consisting of layers and satisfying a natural connectivity property (this includes standard architectures in D spatial dimensions with periodic boundary conditions). These statements depend on the anti-concentration size s_AC of these architectures, which is conjectured to be Θ(n log(n)) but for which the best known upper bound is O(n²) [8].

Fidelity decay
The quantity F̃ may be regarded as an estimate of the fidelity of the noisy quantum device with respect to the ideal computation; when p_noisy(x) and p_ideal(x) are viewed as random variables in the instance U, F̃ is equal to their covariance, normalized by the variance of p_ideal. Note also that the numerator of F̃ is the expected score on the linear cross-entropy benchmark (as proposed in Ref. [4]) using samples from the noisy device, and the denominator is the expected score using samples from the ideal output distribution. Refs. [9,14] studied a similar quantity, the difference being that the E_U appears outside the fraction in their case. Additionally, note that the denominator is given by q^n Z − 1, where Z is the collision probability studied in Refs. [8,15]. The results of Ref. [8] imply that the denominator comes within a small constant factor of (q^n − 1)/(q^n + 1) ≈ 1 (and can therefore be essentially ignored) after Θ(n log(n)) gates.
Theorem 1. Consider either the complete-graph architecture or the 1D architecture with periodic boundary conditions on n qudits of local Hilbert space dimension q, comprised of s gates. Let r be the average infidelity of the local noise channels. Then there exist constants c and n_0 such that, whenever r ≤ c/n and n ≥ n_0, the following holds (up to additive O(q^{−2n}) corrections):

e^{−2sr(q+1)/q} / Q_1 ≤ F̃ ≤ e^{−2sr(q+1)/q} · Q_1, (16)

where Q_1 = exp[O(sr²) + O(rn log(n)) + e^{O(log(n))−Ω(s/n)} + O(nr log(1/(nr)))].
Note that the relationship ε = r(q + 1)/q holds for the depolarizing channel as defined in Eq. (5), so, ignoring the O(q^{−2n}) corrections,

e^{−2sε − O(sε²)} ≤ F̃ ≤ e^{−2sε + O(sε²) + O(εn log(n)) + O(nε log(1/(nε)))}, (18)

indicating that the fidelity decreases exponentially with the expected number of Pauli errors 2sε, as long as the noise is sufficiently weak that the other terms can be ignored. In particular, three conditions must be met to approximate Q_1 by 1 in Eq. (16): (1) sε² ≪ 1, (2) anti-concentration has been reached, i.e. s ≥ Ω(n log(n)), and (3) ε ≪ 1/(n log(n)). One implication of Theorem 1 is that the same kind of decay extends to general noise channels and is observed even for coherent noise channels like the rotation channel.

Convergence to uniform
We show an upper bound on the expected total variation distance between the output of the noisy quantum device p_noisy and the uniform distribution. Our bound decays exponentially in the expected number of error locations, under certain circumstances. In particular, it decays exponentially in (1 − u)(1 − q^{−2})s, where u is the unitarity of the local noise channels.
Theorem 2. Consider either the complete-graph architecture or the 1D architecture with periodic boundary conditions on n qudits of local Hilbert space dimension q and s gates. Let u be the unitarity of the local noise channels (and define v = 1 − u). Then there exist constants c and n_0 such that, as long as v ≤ c/n and n ≥ n_0,

E_U (1/2) ‖p_noisy − p_unif‖₁ ≤ e^{−v(1−q^{−2})s} · Q_2, (19)

where p_unif is the uniform distribution, and Q_2 = exp[O(sv²) + O(vn log(n)) + e^{O(log(n))−Ω(s/n)} + O(nv log(1/(nv)))]. Note that Q_2 is small under a similar three conditions as in the fidelity decay result: (1) sv² ≪ 1, (2) anti-concentration has been reached, and (3) vn log(n) ≪ 1.
For the depolarizing channel, u = 1 − 2ε(1 − q^{−2})^{−1} up to first order in ε, so the distance to uniform decays like e^{−2εs}, which is identical to the rate of fidelity decay. On the other hand, the unitarity of the rotation channel is u = 1, so our upper bound does not decay with s, even though F̃ does decay for the rotation channel. This is expected because the rotation channel is coherent; indeed, unlike the other two examples, it sends pure states to pure states. The ideal pure state and the noisy pure state will become less and less correlated as more noise channels act, which explains why F̃ decays, but the output distribution for the noisy pure state will not converge to uniform.

Distance to white-noise distribution
We show a stronger statement that is meaningful when the noise is incoherent. Not only does the output distribution decay to uniform, it does so in a very particular way, preserving an uncorrupted signal from the ideal distribution. We show that p_noisy is close to p_wn by upper bounding the expected total variation distance between the two distributions.
Theorem 3. Consider either the complete-graph architecture or the 1D architecture with periodic boundary conditions on n qudits of local Hilbert space dimension q and s gates. Let r be the average infidelity and u the unitarity of the local noise channels (and define v = 1 − u). Let

δ = 2r(q + 1)/q − v(1 − q^{−2}) (21)

be the gap between the fidelity decay rate and the rate of convergence to uniform. Then, when we choose F = F̃ as in Eq. (14), there exist constants c_1, c_2, and n_0 such that, as long as v ≤ c_1/n, r ≤ c_2/n, and n ≥ n_0, the expected total variation distance E_U (1/2) ‖p_noisy − p_wn‖₁ obeys the bound in Eq. (22), whenever the right-hand side of Eq. (22) is less than F̃.
We make a couple of comments. First, we emphasize how small the right-hand side of Eq. (22) is. The quantity F̃ is decaying exponentially in the expected number of errors, as shown in Theorem 1. We showed in Theorem 2 that p_noisy converges to uniform at roughly the same rate. However, the distance between p_noisy and p_wn is much smaller than F̃ if the noise parameters are sufficiently weak, demonstrating that the noisy and white-noise distributions are much closer to each other than either is to uniform.
Second, let us examine the quantity δ. For the depolarizing channel and the dephasing channel, the leading term in δ cancels out, leaving δ = O(ε²), so the √δ term in Eq. (22) is on the same order as the other terms. This is a signature of incoherent noise. The coherent rotation channel, which has u = 1 and r = O(θ²), has δ = O(θ²), so √δ is large compared to the other terms in the expression. In this case, we would need sr ≪ 1 for the approximation to be good, but if this is true, then F̃ ≈ 1 and the white-noise approximation is trivial.
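To illustrate the cancellation, the snippet below takes δ to be the gap between the two decay rates discussed above (our reading of the definition, so treat the exact form as an assumption) and evaluates it for the depolarizing and rotation channels with q = 2:

```python
q = 2

def delta(r, u):
    """Gap between the fidelity decay rate 2r(q+1)/q and the rate
    (1-u)(1 - q^{-2}) at which the output approaches uniform (per gate)."""
    return 2 * r * (q + 1) / q - (1 - u) * (1 - q**-2)

eps = 1e-3
# Depolarizing (qubits): r = eps*q/(q+1) and u = (1 - q^2 eps/(q^2-1))^2,
# so the leading terms cancel and delta = 4*eps^2/3 = O(eps^2).
r_dep = eps * q / (q + 1)
u_dep = (1 - q**2 * eps / (q**2 - 1)) ** 2
print(delta(r_dep, u_dep))

# Coherent rotation: u = 1 exactly and r = O(theta^2); nothing cancels,
# so delta = 3r = O(theta^2) and sqrt(delta) dominates the other terms.
theta = 0.05
r_rot = (2 / 3) * theta**2   # illustrative O(theta^2) value
print(delta(r_rot, 1.0))
```

The two O(ε) rates cancel to second order for the depolarizing channel, while for the coherent rotation channel δ is as large as r itself, which is exactly the dichotomy described above.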
Relatedly, the parameter δ can be connected to the diamond distance D of the channel N, which is the maximum amount the action of N can change an input state (which might be entangled with an auxiliary system), as measured by the trace norm. If N is applied 2s times, the total deviation in trace norm from the ideal output can be as large as 2sD in the worst case. It was shown in Ref. [16] that the diamond distance can be as large as O(√r) for coherent channels, while for incoherent channels it is O(r). It is also known that r ≤ O(D) and 1 − u ≤ O(D). Thus, if we ignore the final three terms in Eq. (22), we can write our result as

E_U (1/2) ‖p_noisy − p_wn‖₁ ≤ O(D√s) · F̃. (23)

This emphasizes that the fundamental result is an improved trade-off between noise and circuit size: the strength of the signal decays exponentially, but the error on the signal (after renormalization) grows quadratically slower (as O(D√s)) in the case of random quantum circuits with incoherent noise than it does in the worst case (as O(Ds), for arbitrary circuits and arbitrary noise channels with diamond distance D).

Quantum computational supremacy
A central motivation for our work has been recent quantum computational supremacy experiments [4,5] that sampled from the output of noisy random quantum circuits on superconducting devices. In this context, the main claim is that no classical computer could have performed the same feat in any reasonable amount of time. While no efficient classical algorithms to simulate the quantum device performing this task are known, there is a lack of concrete theoretical evidence that no such algorithm exists.
Our work bolsters the theory behind these experiments in two ways, assuming noise in the device is sufficiently well described by our local noise model. First, our fidelity decay result validates using the linear cross-entropy metric to benchmark the overall noise rate in the device, and quantify the amount of signal from the ideal computation that survives the noise. Second, convergence to the white-noise distribution has theoretical benefits with respect to a potential proof that the random circuit sampling task accomplished by the device is actually hard for classical computers.

Linear cross-entropy benchmarking
Quantum computational supremacy experiments are complicated by the fact that, since (by definition) they cannot be replicated on a classical computer, it is non-trivial to classically verify that they actually performed the correct computational task. A partial solution to this issue has been the proposal of linear cross-entropy benchmarking, whereby a sample x is generated by the device according to the noisy output distribution p_noisy, and a classical supercomputer is used to compute p_ideal(x). When T samples {x_1, . . . , x_T} are chosen, the average

F = (1/T) Σ_{t=1}^{T} (q^n p_ideal(x_t) − 1)

is calculated, which is an empirical measure of the circuit fidelity. We can see that the expected value of F is precisely Σ_x p_noisy(x)(q^n p_ideal(x) − 1), which is the numerator of the quantity F̃ defined in Eq. (14). Meanwhile, the denominator of F̃ becomes close to 1, so long as the output is anticoncentrated. In Theorem 1, we show that if the depolarizing error rate satisfies ε ≪ 1/(n log(n)) and as long as sε² ≪ 1, then there are matching upper and lower bounds on the expected value of F, which decays with the circuit size like e^{−2εs}. Thus, assuming our local noise model, we prove that one can infer ε given F and s. The inferred value of ε can then be compared to the noise strength estimated when testing each circuit component individually, thus providing one method of verification that the components are behaving as expected during the experiment.
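The reason the XEB score isolates the signal can be checked exactly: in the identity below, the uniform part of the white-noise distribution contributes zero to the score, so the normalized XEB recovers F for any p_ideal (the Porter-Thomas-style p_ideal is a synthetic stand-in of our own, not real circuit output):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2**10
# Stand-in for p_ideal: Porter-Thomas-like (exponential) weights, normalized.
p_ideal = rng.exponential(size=dim)
p_ideal /= p_ideal.sum()

def xeb(p, p_ideal):
    """Expected linear XEB score sum_x p(x) * (dim * p_ideal(x) - 1) when
    samples are drawn from p."""
    return float(np.sum(p * (dim * p_ideal - 1)))

F = 0.002
p_wn = F * p_ideal + (1 - F) / dim    # white-noise distribution, Eq. (1)
# The uniform part scores exactly zero, so the normalized XEB recovers F.
print(xeb(p_wn, p_ideal) / xeb(p_ideal, p_ideal))   # = F up to roundoff
```

This is an algebraic identity of the white-noise mixture, independent of the choice of p_ideal; in an experiment, the same ratio is instead estimated from T samples, with the statistical cost discussed below.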
Indeed, the idea of using random circuit sampling as an alternative to randomized benchmarking was formally proposed in Ref. [9], a work that has certain similarities to ours. In particular, like us, they find that the condition 1/ε ≥ Ω(n) appears necessary for controlled decay of the fidelity. (Our result can be expressed as requiring 1/ε ≥ Ω̃(n), where the tilde hides log factors, and we believe those log factors are not necessary for our result.) They give analytical and numerical evidence that the fidelity decays as e^{−2εs}. Additionally, like us, they use a map from random quantum circuits to identity-swap configurations to motivate their results. However, they only analytically study the fidelity decay up to first order in the error rate for a 1D architecture; that is, they compute the expected fidelity due to contributions with an error at only one location (or a correlated set of locations at the same depth). On the other hand, their error model is more general than ours, as we do not consider correlated errors (their theoretical analysis handles Pauli errors of up to weight three); in the context of noise characterization, this is important, as correlated errors are often the most difficult to diagnose. On this point, we believe correlated errors could be handled by our method with a more intricate analysis, but we leave that for future work. Relatedly, exponential decay of fidelity in noisy systems has been proposed [17] as an experimentally detectable signature of quantum mechanics that distinguishes it from theories where quantum mechanics emerges from an underlying classical theory. Our work may help justify these proposals.
Note that as the fidelity decays, more samples must be generated to form a good estimate of the mean of $F$. Since $p_{\text{ideal}}(x)$ for uniformly random $x$ has standard deviation on the order of $q^{-n}$ (assuming anti-concentration), the standard deviation of $F$ is expected to decay with the number of samples like $1/\sqrt{T}$. Thus, resolving the mean of $F$ with enough precision to differentiate it from 0 requires $T = \Omega(1/F^2)$ samples.
We comment that while our analysis assumes that each noise location has the same value of $\epsilon$, this is not essential to our method. We expect it could be shown that the expected value of $F$ decays like $\exp(-\sum_i \epsilon_i)$, where $i$ runs over all possible noise locations. Moreover, our analysis works for any kind of local noise, not just depolarizing noise; the only relevant parameter is the average infidelity of the noise channels. This includes coherent noise; for example, the average fidelity of the coherent rotation channel given in Eq. (7) is less than 1 and thus leads to exponential decay of $F$. This is consistent with Ref. [9], which previously showed that from the perspective of fidelity decay, every channel is equivalent to an (incoherent) Pauli noise channel.

Classical hardness of sampling from the noisy output distribution
To claim to have achieved quantum computational supremacy, the low-fidelity random circuit sampling experiments in Refs. [4,5] must define a concrete computational problem that their device solved, but that a classical device could not also solve. Here there are a couple of options. One option is to rely directly on the linear cross-entropy benchmark and define the task to be generating a set of samples that scores at least $F \geq 1/\text{poly}(n)$. A related idea is the task of Heavy Output Generation (HOG) [18], which is to generate outputs $x$ for which $p_{\text{ideal}}(x)$ is large (i.e. "heavy" outputs) significantly more often than a uniform generator. The upshot of these definitions is that, in the regime where $p_{\text{ideal}}(x)$ can be calculated classically with an exponential-time algorithm, it can be verified that the quantum device successfully performed the task. Their main drawback is that it is not clear whether running a (noisy) quantum computation is the only way to perform these tasks. Perhaps a (yet-to-be-discovered) classical algorithm can score well on the linear cross-entropy benchmark without performing an actual random circuit simulation; for example, this was the goal in Ref. [19].
Another option is to define the task specifically in terms of the white-noise distribution. Namely, one must produce samples from a distribution $p_{\text{noisy}}$ for which $\frac{1}{2}\|p_{\text{noisy}} - p_{\text{wn}}\|_1 \leq \eta F$ for some choice of $F$ not too small (ideally at least inverse polynomial in $n$) and some small constant $\eta$. We refer to this task as "white-noise random circuit sampling (RCS)." A downside of this option is that even with unlimited computational power, an exponential number of samples from the device would be needed to definitively verify that the distribution is close to $p_{\text{wn}}$ in total variation distance. Our work provides a partial solution here, as we show that a local error model allows a device to accomplish the white-noise RCS task, as long as the error rate is sufficiently weak compared to the number of qubits. Thus, if the experimenters are sufficiently confident in the error model that describes their device, they can rely on our work to be confident they are performing the white-noise RCS task.
The major upside of the white-noise RCS task is that one can give stronger evidence that it is classically hard to perform. For example, in the Supplementary Material of Ref. [4], it was shown that exactly (i.e. $\eta = 0$) sampling from $p_{\text{wn}}$ (a task they called "unbiased noise F-approximate random circuit sampling") in the worst case is a hard computational task, in the sense that an efficient classical algorithm for it would cause the collapse of the polynomial hierarchy (PH), and further that its computational cost should be at most a factor of $F$ smaller than sampling exactly from $p_{\text{ideal}}$. In that spirit, we show in Theorem 4, in the appendix, that the more realistic task of sampling approximately from $p_{\text{wn}}$ is essentially just as hard as sampling approximately from $p_{\text{ideal}}$, up to a linear factor of $F$ in the classical computational cost. This is important because some mild progress has been made toward establishing that approximately sampling from $p_{\text{ideal}}$ is hard for the polynomial hierarchy, through a series of works that reduce the task of computing $p_{\text{ideal}}(x)$ in the worst case to the task of computing $p_{\text{ideal}}(x)$ in the average case up to some small error [20][21][22][23]. Weaknesses in this result as evidence for hardness of approximate sampling were discussed in more detail in Refs. [22,24], but it remains true that the white-noise-centered definition of the computational task is the likeliest route to a more robust version of quantum computational supremacy that can be grounded in well-studied complexity-theoretic principles.

Convergence to uniform with circuit size
It is widely understood that incoherent and uncorrected unital noise in quantum circuits should typically lead the output of a quantum circuit to lose all correlation with the ideal circuit and become nearly uniform. It is further asserted that the decay to uniform should scale with the circuit size; however, rigorous results have only shown a decay in total variation distance to uniform with the circuit depth $d$, following the form $e^{-\Omega(\epsilon d)}$. In particular, Ref. [25] showed that any (even non-random) circuit with interspersed local depolarizing noise approaches uniform at least this quickly. Later, Ref. [26] showed the same is true for any Pauli noise model, at least for most circuits chosen from a particular random ensemble. However, in Ref. [22], a stronger convergence at the rate of $e^{-\Omega(\epsilon s)}$ in random quantum circuits like ours was desired in order to show a barrier on further improvements of their worst-to-average-case reduction for computing entries of $p_{\text{ideal}}$. To that end, they showed that exponential convergence in circuit size occurs in a toy model where each layer of unitary evolution enacts an exact global unitary 2-design, and they conjectured the same is true in the local noise model we consider in this paper. Thus, our result in Theorem 2 gets close to providing the missing ingredient for their claim; for their application, we would need to extend our result to show $e^{-\Omega(\epsilon s)}$ decay even in the regime where $\epsilon = O(1)$, independent of $n$. Our result applies only for $\epsilon = O(1/n)$, but we believe the extension to $\epsilon = O(1)$ might also be provable with our method.

Signal extraction in noisy experiments
One implication of our work is that, in the parameter regime where our results apply, the signal from the noiseless random circuit experiment can be extracted by taking many samples. To illustrate this, suppose we are interested in some classical function $f(x)$ for $x \in [q]^n$ that takes values between $-1$ and $+1$. Choosing $x$ randomly from $p_{\text{ideal}}$ induces a probability distribution over the resulting values of $f(x)$. To understand this distribution empirically (e.g., estimate its mean or variance), samples $x_i$ might be generated on a quantum device, but if the device is noisy, these samples will be drawn from $p_{\text{noisy}}$ instead of $p_{\text{ideal}}$. However, if $p_{\text{noisy}} \approx p_{\text{wn}}$, then the sampled distribution over $f(x)$ will be a mixture of the ideal distribution with weight $F$, and the distribution that arises from uniform choice of $x$ with weight $1 - F$. Supposing the latter is well understood, inferences can be made about the former by repetition. For example, if $\sum_x p_{\text{ideal}}(x) f(x) = \mu = O(1)$ and $\sum_x f(x)/q^n = 0$ (see footnote 8), then the mean of $f$ under samples from $p_{\text{wn}}$ is $F\mu$. Meanwhile, the standard deviation of $f$ can be as large as $O(1)$, indicating that $O(1/F^2)$ samples from $p_{\text{wn}}$ are required to compute the mean $F\mu$ up to $O(F)$ precision. Generally, this procedure requires knowing the value of $F$.
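A toy numerical sketch of this extraction procedure (the cost function, fidelity value, and distributions below are all synthetic stand-ins, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 3, 2
d = q ** n
F = 0.1                                    # fidelity, assumed known
p_ideal = rng.exponential(size=d)          # toy stand-in for an ideal output distribution
p_ideal /= p_ideal.sum()
f = rng.uniform(-1.0, 1.0, size=d)         # toy cost function
f -= f.mean()                              # enforce zero mean under uniform x
mu = float(p_ideal @ f)                    # the ideal-signal mean we want to recover
p_wn = F * p_ideal + (1.0 - F) / d         # white-noise output distribution
samples = rng.choice(d, size=400_000, p=p_wn)
mu_hat = float(np.mean(f[samples]) / F)    # rescale by 1/F to undo the dilution
```

The uniform component contributes nothing to the mean after centering $f$, so dividing the empirical mean by $F$ recovers $\mu$, at the cost of the $O(1/F^2)$ sample blow-up described above.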
A concrete example of such a situation is the Quantum Approximate Optimization Algorithm (QAOA) [27], where samples $x$ from the output of a parameterized quantum circuit are used to estimate the expectation of a classical cost function $C(x)$. The parameters can then be varied to optimize the expected value of the cost function. Our work is for Haar-random local quantum circuits, which are, in a sense, very different from QAOA circuits. For example, the marginal of typical random circuits on any constant number of qubits is very close to maximally mixed, whereas QAOA circuits optimized for local cost functions will, by design, not have this property. Nevertheless, it is plausible that generic QAOA circuits might respond to local noise in a similar way as random quantum circuits. Indeed, in Refs. [28][29][30], numerical and analytic evidence was given for the conclusion that the expectation value of the cost function and its gradient with respect to the circuit parameters decay toward zero when local noise is inserted into a QAOA circuit. This behavior would be consistent with the stronger conclusion that the output is well described by $p_{\text{wn}}$.

Summary of method and intuition
In this section, we present a heuristic argument about why the technical statements above should hold. Then we give an overview of how we actually show it using our method, which analyzes certain Markov processes derived from the quantum circuits, extending our previous work in Ref. [8].

Intuition behind error scrambling and error in white-noise approximation
Our result that $p_{\text{noisy}}$ is very close to $p_{\text{wn}}$ requires three conditions to be satisfied: (1) $\epsilon^2 s \ll 1$; (2) anti-concentration has been achieved, i.e. $s \geq \Omega(n\log(n))$; and (3) $\epsilon n \log(n) \ll 1$. Here, we try to motivate why these conditions should be sufficient and speculate about whether they are also necessary. In particular, we believe condition (3) can be significantly relaxed.
For simplicity, let's restrict to qubits ($q = 2$). Let $U$ denote the unitary enacted by the noiseless quantum circuit instance, so the ideal output state is the pure state $\rho_{\text{ideal}} = U|0^n\rangle\langle 0^n|U^\dagger$. If a location somewhere in the middle of the circuit experiences a Pauli error, then we could write the output state as
$U_2 P U_1 |0^n\rangle\langle 0^n| U_1^\dagger P^\dagger U_2^\dagger,$
where $P$ is a Pauli operator with support on only one qubit, and $U = U_2 U_1$ is a decomposition of the unitary into gates that act before and after the error location. If we like, we can commute $P$ to act at the end of the circuit, giving $O_P U |0^n\rangle\langle 0^n| U^\dagger O_P^\dagger$, where $O_P = U_2 P U_2^\dagger$.

Footnote 8: In a sense, the white-noise assumption is overkill for this application; a similar signal extraction could be performed even if $p_{\text{noisy}} = Fp_{\text{ideal}} + (1-F)p_{\text{err}}$ for some non-uniform $p_{\text{err}}$, as long as drawing samples $x$ from $p_{\text{err}}$ leads to a mean for $f(x)$ that can be easily calculated in advance (when this is possible, one can subtract a constant from $f$ and assume the mean is zero). However, the white-noise assumption certainly makes this process easier, as it will typically be easy to calculate the mean of $f(x)$ under uniform choice of $x$.
Unlike $P$, the operator $O_P$ will likely have support over many qubits. Indeed, this is what we mean by scrambling: the portion of the circuit acting after the error location scrambles the local noise $P$ into more global noise $O_P$. We can handle error patterns $E$ with multiple Pauli errors similarly, by commuting each to the end one at a time and forming an associated global noise operator $O_E$.
Next, we expand the output quantum state ρ noisy of the noisy circuit as a sum over all possible Pauli error patterns, weighted by the probability that each pattern occurs. Assuming the local noise is depolarizing, the probability of a pattern E depends only on the number of non-identity Pauli operators in the error pattern, denoted by |E|.
The classical probability distribution $p_{\text{noisy}}$ is then given by $p_{\text{noisy}}(x) = \langle x|\rho_{\text{noisy}}|x\rangle$ for each measurement outcome $x$. Observe that for the error pattern with $|E| = 0$ (no errors), we have $\rho_E = \rho_{\text{ideal}}$. There can be other error patterns for which $O_E \rho_{\text{ideal}} O_E^\dagger = \rho_{\text{ideal}}$; for example, when a lone Pauli-$Z$ error acts prior to any non-trivial gates, the state is unchanged, since the initial state $|0^n\rangle$ is an eigenstate of all the Pauli-$Z$ operators. However, these error patterns are rare, and for the sake of intuition we ignore this possibility. In essence, the white-noise assumption is the claim that when we take the mixture over output states for all of the error patterns, we arrive at a state $\rho_{\text{err}}$ that produces measurement outcomes that are very close to uniform. (Note that in general $\rho_{\text{err}}$ need not be close to maximally mixed to yield uniformly random measurement outcomes.) Letting $F = (1-\epsilon)^{2s}$, we may write
$\rho_{\text{noisy}} = F\rho_{\text{ideal}} + (1-F)\frac{I}{2^n} + \Delta,$
where $I/2^n$ denotes the maximally mixed state. The final term $\Delta$ gives the deviations of the noisy output state $\rho_{\text{noisy}}$ from a linear combination of the ideal state and $I/2^n$. This allows us to state more clearly the intuition for our result. Since the circuit is randomly chosen and scrambles the local error patterns, the operators $O_E$ generally have large support and are essentially uncorrelated for different choices of error pattern $E$. Suppose we measure in the computational basis, and examine the probability of obtaining the outcome $x$. We can calculate the squared deviation between this value and the white-noise value under expectation over instance $U$; the result is a sum over pairs of error patterns $(E, E')$ of correlations between the quantities $p_E(x)$ and $p_{E'}(x)$, where $p_E(x) = \langle x| O_E \rho_{\text{ideal}} O_E^\dagger |x\rangle$. Suppose we now make the approximation that the quantities $p_E(x)$ and $p_{E'}(x)$, when considered as functions of the random instance $U$, are independently distributed unless $E = E'$. Their mean is $2^{-n}$ and, assuming anti-concentration (condition (2)), their standard deviation is $O(2^{-n})$. Then we have
$\mathbb{E}_U\left[\left(p_{\text{noisy}}(x) - p_{\text{wn}}(x)\right)^2\right] = O\!\left(F^2 \epsilon^2 s\, 2^{-2n}\right),$
where the final expression holds when $\epsilon^2 s \ll 1$.
This implies that the deviation of each entry in the probability distribution $p_{\text{noisy}}$ from the white-noise distribution is on the order of $F\epsilon\sqrt{s}\,2^{-n}$, and since there are $2^n$ entries, we have
$\mathbb{E}_U\left[\tfrac{1}{2}\left\|p_{\text{noisy}} - p_{\text{wn}}\right\|_1\right] = O(F\epsilon\sqrt{s}).$
In other words, the total variation distance is much smaller than $F$ when $\epsilon^2 s \ll 1$, giving an intuitive reason for condition (1). Moreover, without condition (2), the contribution of each term would be much larger than $O(2^{-2n})$, which illustrates why condition (2) is necessary.
The key step in this analysis was the assumption of independence between $p_E$ and $p_{E'}$ when $E \neq E'$. This is only approximately true; indeed, for a circuit that does not scramble errors, this will be a bad approximation, because it might be common to have different error patterns $E$, $E'$ that produce the same (or approximately the same) effective error $O_E = O_{E'}$. However, for random quantum circuits, this outcome is unlikely for the vast majority of error pairs. Our rigorous proof, later, might be regarded as a justification of this intuition.
Condition (3) is more subtle to motivate. In our analysis we require $\epsilon \ll 1/(n\log(n))$ so that the chance an error occurs while the circuit is still anti-concentrating (which takes $\Omega(n\log(n))$ gates) is small. This is helpful in the analysis because it allows us to essentially ignore the possibility that an error $P$ occurs near the beginning or end of the circuit, where there is insufficient time to scramble the error (either forward or backward in time). However, a finer-grained analysis might be able to handle these kinds of errors: we believe condition (3) can be improved from $\epsilon^{-1} \geq \Omega(n\log(n)) = \tilde{\Omega}(n)$ to simply $\epsilon^{-1} \geq n/c$ for some constant $c$ that depends only on the architecture (1D vs. complete-graph, etc.). However, we do not believe that improvement beyond this point would be possible; there is a fundamental barrier that requires $\epsilon$ to scale as $O(1/n)$.
The reason for this is essentially that, if the white-noise approximation is to hold, the errors need to be scrambled at least as fast as they appear. The fidelity $F$ decreases like $(1-\epsilon)^{2s} = \exp\left(-2s\epsilon - O(s\epsilon^2)\right)$, so each layer of $O(n)$ gates causes a decrease by a factor $\exp(-O(n\epsilon))$. Recall that we demand that the total variation distance between $p_{\text{noisy}}$ and $p_{\text{wn}}$ be much smaller than $F$, so as $F$ decreases, this condition becomes increasingly stringent. Meanwhile, scrambling is fundamentally happening at the rate of increasing circuit depth, not size. One way to see this is simply that local Pauli errors $P$ that appear at a certain circuit location are expected to be scrambled into larger operators that grow ballistically with the depth [31,32]; each layer of $O(n)$ gates yields a constant amount of operator growth. Another way to see this is to consider a pair of error patterns $E$ and $E'$, where $E$ consists of a single Pauli error on qudit $j$ at layer $d$ and $E'$ consists of a single Pauli error on qudit $j$ at layer $d + \Delta$. The correlation between $p_E(x)$ and $p_{E'}(x)$, as a function of the random instance $U$, which is roughly speaking the chance that the random circuit transforms the first error into something resembling the second error, will decay exponentially with $\Delta$, the separation in depth between the two errors. Yet a third way to see this fact is to notice that, after a circuit has initially reached anti-concentration, the convergence of the collision probability $Z$ toward its limiting value $Z_H$ proceeds at a rate set by depth [8]: each additional layer of $O(n)$ gates only decreases the deviation of $Z$ from $Z_H$ by a constant factor. The terms for $E \neq E'$ that were ignored above are expected to obey a similar kind of decay to the value 0 for most choices of $(E, E')$, but if $F$ is decaying too fast, we are not able to neglect these terms. Each layer of $O(n)$ gates must incur at most a constant-factor decay in fidelity to not exceed the rate of scrambling; equivalently, $\epsilon n < c$ must hold for some constant $c$.
In Ref. [8], we analyzed the collision probability $Z$, a second-moment quantity, using the stat-mech method, although we found it more useful to interpret the result as the expectation value of a certain stochastic process, rather than as a partition function. As we will see, this work is essentially an extension of the analysis in Ref. [8] to account for the action of the single-qudit noise channels $\mathcal{N}$ that act after two-qudit gates. We explain the steps in this analysis below, and leave the formal proofs for the appendices.
Expressing the total variation distance in terms of second-moment quantities

To apply this method, the first step is to express $\mathbb{E}_U\left[\frac{1}{2}\|p_{\text{noisy}} - p_{\text{wn}}\|_1\right]$ in terms of second-moment quantities. To do so, we use the general 1-norm to 2-norm bound: when $p_1$ and $p_2$ are vectors on a $q^n$-dimensional vector space, then
$\|p_1 - p_2\|_1 \leq q^{n/2}\,\|p_1 - p_2\|_2.$
Applying this identity with $p_1 = p_{\text{wn}}$ and $p_2 = p_{\text{noisy}}$ and invoking Jensen's inequality for the concave function $\sqrt{\cdot}$, we find
$\mathbb{E}_U\left[\tfrac{1}{2}\|p_{\text{noisy}} - p_{\text{wn}}\|_1\right] \leq \tfrac{1}{2}\sqrt{q^n\, \mathbb{E}_U\left[\|p_{\text{noisy}} - p_{\text{wn}}\|_2^2\right]}.$
Now we can expand
$q^n\, \mathbb{E}_U\left[\|p_{\text{noisy}} - p_{\text{wn}}\|_2^2\right] = F^2(Z_0 - 1) - 2F(Z_1 - 1) + (Z_2 - 1),$
where
$Z_0 = q^n\, \mathbb{E}_U\left[\sum_x p_{\text{ideal}}(x)^2\right], \qquad Z_1 = q^n\, \mathbb{E}_U\left[\sum_x p_{\text{noisy}}(x)\, p_{\text{ideal}}(x)\right], \qquad Z_2 = q^n\, \mathbb{E}_U\left[\sum_x p_{\text{noisy}}(x)^2\right]$
are second-moment quantities, with $Z_w$ containing $w$ copies of the noisy output and $2 - w$ copies of the ideal output for each $w \in \{0, 1, 2\}$. Note that $Z_0 = q^n Z$ with $Z$ the collision probability studied in Refs. [8,15]. Furthermore, note that $F$ is a free parameter, and we may choose it so that it minimizes the right-hand side of Eq. (38), which occurs when
$F = \frac{Z_1 - 1}{Z_0 - 1},$
matching the definition for $\hat{F}$ in Eq. (14). Plugging in $F = \hat{F}$ yields
$\mathbb{E}_U\left[\tfrac{1}{2}\|p_{\text{noisy}} - p_{\text{wn}}\|_1\right] \leq \tfrac{1}{2}\sqrt{(Z_2 - 1) - \frac{(Z_1 - 1)^2}{Z_0 - 1}}.$

Mapping second-moment quantities to stochastic processes

We bound the quantities $Z_0$, $Z_1$, and $Z_2$ by mapping them to stochastic processes. These stochastic processes are the same as the stochastic process we studied in Ref. [8], except that the noise channels introduce slightly modified transition rules, as we now discuss. Second-moment quantities include two copies of each random unitary gate in the circuit. The idea in Ref. [8] was to perform the expectation over the two copies of each gate independently, using Haar-integration techniques. For a density matrix $\rho$ on two copies of a Hilbert space of dimension $q$, let
$\mathcal{G}(\rho) = \mathbb{E}_V\left[(V \otimes V)\,\rho\,(V \otimes V)^\dagger\right],$
where $\mathbb{E}_V$ denotes expectation over choice of $V$ from the Haar measure over $q \times q$ unitary matrices. Then, we have the following well-known formula (for which a derivation is provided in Ref. [8]):
$\mathcal{G}(\rho) = \frac{\mathrm{Tr}[\rho] - \mathrm{Tr}[S\rho]/q}{q^2 - 1}\, I + \frac{\mathrm{Tr}[S\rho] - \mathrm{Tr}[\rho]/q}{q^2 - 1}\, S,$
where $I$ is the identity operation and $S$ is the swap operation on two copies of the single-qudit system.
The equation above states that, after Haar averaging, the state of the system is simply a linear combination of identity and swap, with certain coefficients that can be readily calculated.
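This identity/swap decomposition can be checked numerically by averaging over Haar-random $V$, sampled via the QR decomposition of a complex Ginibre matrix; a small sketch, in which the sample count and test state are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_unitary(q, rng):
    """Haar-random q x q unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))) / np.sqrt(2)
    Q, R = np.linalg.qr(z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))   # fix phases to get Haar measure

q = 2
S = np.zeros((q * q, q * q))                       # swap operator on two copies
for a in range(q):
    for b in range(q):
        S[a * q + b, b * q + a] = 1.0
A = rng.standard_normal((q * q, q * q))
rho = A @ A.T                                      # arbitrary test state
rho /= np.trace(rho)

K = 20_000                                         # Monte-Carlo sample count
avg = np.zeros((q * q, q * q), dtype=complex)
for _ in range(K):
    V = haar_unitary(q, rng)
    W = np.kron(V, V)
    avg += W @ rho @ W.conj().T
avg /= K

# predicted identity/swap decomposition with the coefficients from the formula
a_I = (np.trace(rho) - np.trace(S @ rho) / q) / (q * q - 1)
b_S = (np.trace(S @ rho) - np.trace(rho) / q) / (q * q - 1)
pred = a_I * np.eye(q * q) + b_S * S
```

The Monte-Carlo average `avg` should agree with `pred` entrywise up to $O(1/\sqrt{K})$ fluctuations.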
For an $n$-qudit system acted upon by a sequence of single and two-qudit gates, this formula can be applied sequentially to each gate. After $t$ gates have been applied, the Haar-averaged state of the system can be expressed as a linear combination of $n$-fold tensor products of $I$ and $S$ (e.g. for $n = 3$, the state would be given by a linear combination of the $2^3 = 8$ operators $I \otimes I \otimes I$, $I \otimes I \otimes S$, $\ldots$, $S \otimes S \otimes S$). The important takeaway from Ref. [8] was to interpret the coefficients of these $2^n$ terms as probabilities of a certain stochastic process over the set of length-$n$ bit strings $\{I, S\}^n$, which were called "configurations." The stochastic process generates a sequence of $s + 1$ configurations $\vec{\gamma} = (\gamma^{(0)}, \ldots, \gamma^{(s)})$, which was called a "trajectory," where the probabilistic transition from $\gamma^{(t-1)}$ to $\gamma^{(t)}$ depends only on the value of $\gamma^{(t-1)}$ (Markov property).
The transition rules of the stochastic process are calculated by computing the coefficients in Eq. (45); here we state the result of that calculation (see footnote 11); more details can be found in Appendix A.1. First of all, the initial configuration $\gamma^{(0)}$ is chosen at random by independently choosing each of the $n$ bits to be $I$ with probability $q/(q+1)$ and $S$ with probability $1/(q+1)$. Then, for each time step $t$, if the $t$th gate acts on qudits $i_t$ and $j_t$, then the transition from $\gamma^{(t-1)}$ to $\gamma^{(t)}$ can involve a bit flip at position $i_t$, at position $j_t$, or neither (but not at both), and no bit can flip at any other position. Moreover, if the bits of $\gamma^{(t-1)}$ at positions $i_t$ and $j_t$ disagree, then one of the two bits must be flipped; if they agree, neither is flipped. In this situation, when one bit is assigned $I$ and one is assigned $S$, the $S$ is flipped to $I$ with probability $q^2/(q^2+1)$, and the $I$ is flipped to $S$ with probability $1/(q^2+1)$. Thus, there is a bias toward making more of the assignments $I$. The quantity $Z_0$ is given exactly by the expectation value of the quantity $q^{|\gamma^{(s)}|}$ when trajectories $\vec{\gamma}$ are generated in this fashion, where $|\nu|$ denotes the Hamming weight of the bit string $\nu$, that is, the number of $S$ assignments out of $n$.
In symbols,
$Z_0 = \mathbb{E}_0\left[q^{|\gamma^{(s)}|}\right],$
where $\mathbb{E}_0$ denotes evolution by the stochastic process described above.
With the stochastic process now defined, a vital observation is that the process has two fixed points, the $I^n$ configuration and the $S^n$ configuration, since whenever all the bits agree, none can be flipped. In Ref. [8], we could precisely compute the fraction of the probability mass that eventually reaches each of these fixed points if the circuit is infinitely long. Specifically, $q^n/(q^n+1)$ of the probability mass converges to $I^n$ and $1/(q^n+1)$ converges to $S^n$ (see footnote 12). Then, since the $S^n$ fixed point receives a weighting of $q^n$ and the $I^n$ fixed point receives a weighting of 1 in Eq. (46), we find that $Z_0 \to 2q^n/(q^n+1)$.
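These fixed-point fractions can be reproduced in exact arithmetic from the absorption probabilities $Q(x)$ described in footnote 12; a short sketch (the function name is ours):

```python
from fractions import Fraction
from math import comb

def s_fixed_point_mass(n, q):
    """Exact probability mass eventually absorbed at the S^n fixed point."""
    # Gambler's-ruin solution of the footnote-12 recursion
    # Q(x) = (q^2 Q(x-1) + Q(x+1)) / (q^2 + 1),  with Q(0) = 0 and Q(n) = 1:
    Q = [Fraction(q ** (2 * x) - 1, q ** (2 * n) - 1) for x in range(n + 1)]
    for x in range(1, n):
        assert Q[x] == (q ** 2 * Q[x - 1] + Q[x + 1]) / (q ** 2 + 1)
    # initial configuration assigns S to each bit independently with prob 1/(q+1)
    return sum(Fraction(comb(n, x) * q ** (n - x), (q + 1) ** n) * Q[x]
               for x in range(n + 1))

print(s_fixed_point_mass(5, 2))   # -> 1/33, i.e. 1/(q^n + 1)
```

The binomially weighted sum collapses to exactly $1/(q^n+1)$ for any $n$ and $q$, matching the fraction quoted above.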
Noise introduces new rules into this stochastic process. Suppose the configuration immediately after the $t$th two-qudit gate is $\nu$, and a noise channel $\mathcal{N}$ acts on qudit $i_t$. Since the noise channel is unital, if $\nu_{i_t} = I$, representing the identity operator on a two-copy single-qudit system, then the configuration is left unchanged. However, if $\nu_{i_t} = S$, then the action of the noise may cause a flip from $S$ to $I$. For the calculation of $Z_0$, there is no noise, so this happens with probability 0. For the calculation of $Z_1$, where there is one copy of the noisy distribution and one copy of the ideal, we can again use the formula in Eq. (45) to compute the $S \to I$ transition probability to be $rq/(q-1)$, where $r$ is the average infidelity given in Eq. (3). This is explained in Appendix A.2. For $Z_2$, where there are two copies of the noisy distribution, the probability of an $S \to I$ transition is calculated to be $1 - u$, where $u$ is the unitarity of the noise channel given in Eq. (4). The values of $Z_1$ and $Z_2$ are thus given by
$Z_1 = \mathbb{E}_{rq/(q-1)}\left[q^{|\gamma^{(s)}|}\right], \qquad Z_2 = \mathbb{E}_{1-u}\left[q^{|\gamma^{(s)}|}\right],$
where $\mathbb{E}_\sigma$ denotes the stochastic process where $S \to I$ bit flips occur at each noise location with probability $\sigma$, generalizing Eq. (46). Since noise can flip an $S$ to an $I$ but not vice versa, $I^n$ is the only fixed point of the stochastic processes for $Z_1$ and $Z_2$; the $S^n$ fixed point is only metastable: eventually, the action of noise will flip one of the $S$ bits to an $I$, and the trajectory might re-equilibrate to the $I^n$ fixed point. Our analysis consists of a careful accounting of the leakage of probability mass away from the metastable $S^n$ fixed point.

Footnote 11: In Ref. [8], two equivalent stochastic processes were formulated, an "unbiased random walk" and a "biased random walk." In this paper we build from the formalism of the biased random walk.

Footnote 12: This can be straightforwardly derived by letting $Q(x)$ be the probability that a configuration with $x$ $S$ assignments eventually converges to the $S^n$ fixed point and noting that it satisfies the recursion relation $Q(x) = q^2 Q(x-1)/(q^2+1) + Q(x+1)/(q^2+1)$, for which the solution is $Q(x) = Aq^{2x} + B$ for constants $A$ and $B$ determined by enforcing the boundary conditions $Q(0) = 0$ and $Q(n) = 1$. The fraction of probability mass that begins at a configuration with $x$ $S$ assignments is $\binom{n}{x} q^{n-x}/(q+1)^n$, allowing the total amount of mass that reaches $S^n$ to be computed.
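These transition rules are concrete enough to simulate directly. Below is a minimal Monte-Carlo sketch of the biased walk for a complete-graph circuit, applying an $S \to I$ flip with probability $\sigma$ at the two noise locations following each gate; setting $\sigma = 0$ estimates $Z_0$. The pairing scheme, parameter values, and function name are illustrative choices:

```python
import numpy as np

def run_walk(n, q, s, sigma, trials, seed=0):
    """Monte-Carlo estimate of E_sigma[q^{|gamma^{(s)}|}] for the biased walk.

    Each gate acts on a uniformly random pair of qudits (complete graph); if
    the two bits disagree, S -> I occurs with probability q^2/(q^2+1) and
    I -> S otherwise. After each gate, an S -> I flip is applied with
    probability sigma at each of the two noise locations (sigma = 0 gives Z_0).
    """
    rng = np.random.default_rng(seed)
    conf = rng.random((trials, n)) < 1.0 / (q + 1)   # True = S, False = I
    rows = np.arange(trials)
    for _ in range(s):
        i = rng.integers(n, size=trials)
        j = (i + rng.integers(1, n, size=trials)) % n
        bi, bj = conf[rows, i], conf[rows, j]
        disagree = bi != bj
        to_I = rng.random(trials) < q * q / (q * q + 1.0)
        newbit = np.where(disagree, ~to_I, bi)       # the pair agrees afterwards
        conf[rows, i] = newbit
        conf[rows, j] = np.where(disagree, newbit, bj)
        for col in (i, j):                           # two noise locations per gate
            keep = rng.random(trials) >= sigma       # S survives with prob 1-sigma
            conf[rows, col] = conf[rows, col] & keep
    return float(np.mean(float(q) ** conf.sum(axis=1)))

z0 = run_walk(4, 2, 200, 0.0, 50_000, seed=1)    # approaches 2*q^n/(q^n+1)
z1 = run_walk(4, 2, 200, 0.01, 50_000, seed=2)
fhat = (z1 - 1.0) / (z0 - 1.0)                   # fidelity-like ratio in (0, 1)
```

For small $n$ the noiseless estimate converges to the limit $2q^n/(q^n+1)$, while the ratio $(\hat{Z}_1 - 1)/(\hat{Z}_0 - 1)$ decays with the number of noise locations, mirroring the fidelity decay discussed below.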
Analyzing the stochastic processes for a toy example

Now, we consider a toy example which captures the essence of our analysis. Suppose a circuit consists of alternating rounds of (1) a global Haar-random transformation and (2) a depolarizing noise channel on a single qudit, as depicted in Figure 2. Step (1) can be approximately accomplished by performing a very large number of two-qudit gates. This model is similar to the toy model considered in Ref. [22] (the difference being that they considered single-qudit noise channels on all $n$ qudits in step (2)), which they analyzed using the Pauli string method of Refs. [41,42].

Figure 2: Toy example where global Haar-random gates $U^{(t)}$ act in between depolarizing noise channels on a single qudit. In this model we can exactly compute the quantities $Z_0$, $Z_1$, and $Z_2$ because the global Haar-random gates cause the probability mass in the stochastic process to fully re-equilibrate to one of the fixed points, $I^n$ or $S^n$.
The initial global Haar-random transformation induces perfect equilibration to the two fixed points, with $q^n/(q^n+1)$ mass reaching the $I^n$ fixed point and $1/(q^n+1)$ mass reaching the (metastable) $S^n$ fixed point. This is already sufficient to compute $Z_0 - 1 = (q^n-1)/(q^n+1)$, which is not sensitive to the noise.
Now suppose we want to calculate $Z_1$. Consider a piece of probability mass that is part of the $1/(q^n+1)$ fraction at the $S^n$ fixed point. The single-qudit depolarizing noise channel will flip one of the $S$ assignments to an $I$ assignment with probability $rq/(q-1) = \epsilon(1-q^{-2})^{-1}$. If this happens, there are $n-1$ $S$ assignments and 1 $I$ assignment. While it may seem that this new configuration is still close to the $S^n$ fixed point, we must remember that the random walk is biased in the $I$ direction. When we perform the next global Haar-random transformation, we get perfect re-equilibration back to the two fixed points: with probability $\frac{1-q^{-2}}{1-q^{-2n}}$ we end at the $I^n$ fixed point, and with probability $\frac{q^{-2}-q^{-2n}}{1-q^{-2n}}$ we end at the $S^n$ fixed point. These probabilities were derived in Ref. [8], and are a basic consequence of Eq. (45). Now, the total mass that remains at the $S^n$ fixed point is the mass that never left plus the mass that left and returned, which comes out to $\frac{1}{q^n+1}\left(1 - \frac{\epsilon}{1-q^{-2n}}\right)$. After $2s$ single-qudit error channels have been applied, the probability mass remaining at the $S^n$ fixed point is precisely
$\text{probability mass at } S^n \text{ after } 2s \text{ noise locations} = \frac{1}{q^n+1}\left(1 - \frac{\epsilon}{1-q^{-2n}}\right)^{2s}.$
This mass receives a weighting of $q^n$ toward $Z_1$. Meanwhile the rest of the mass is at the $I^n$ fixed point and receives a weighting of 1. This tells us that
$Z_1 - 1 = \frac{q^n - 1}{q^n + 1}\left(1 - \frac{\epsilon}{1-q^{-2n}}\right)^{2s}.$
We see that in this toy model, the quantity $\hat{F} = (Z_1 - 1)/(Z_0 - 1)$ is precisely given by the fraction of probability mass originally destined for the $S^n$ fixed point that remains at the $S^n$ fixed point even after the noise locations have acted. Thus, the leakage of probability mass from $S^n$ to $I^n$ in the calculation of $Z_1$ corresponds exactly to the decay of fidelity. Calculating $Z_2 - 1$ is just as easy. Here transitions due to noise occur with probability $1 - u$, where $u$ is the unitarity of the noise channel, and the same argument yields
$Z_2 - 1 = \frac{q^n - 1}{q^n + 1}\left(1 - \frac{(1-u)(1-q^{-2})}{1-q^{-2n}}\right)^{2s}.$
We can plug these calculations into Eq. (43) to find that, for the toy model, the error in the white-noise approximation is bounded as $O(\hat{F}\epsilon\sqrt{s})$.

Extending the analysis to a full proof

In the proofs of our theorems, the difficulty is that the probability mass does not fully equilibrate to a fixed point before the next error location acts. Nonetheless, we manage to calculate tight bounds on $Z_1$ and $Z_2$ by keeping track of the amount of probability mass that would re-equilibrate back to $S^n$ and $I^n$ if the rest of the gates were noiseless, which we refer to as $S$-destined and $I$-destined probability mass. We show that, as long as $\epsilon < c/n$ for some constant $c$, the $S$-destined probability mass is exponentially clustered near the $S^n$ fixed point, in the sense that the probability of being $x$ bit flips away from $S^n$, conditioned on being $S$-destined, decays exponentially in $x$. Thus, for a piece of $S$-destined probability mass, nearly all the bits will be assigned $S$, and the action of a noise channel reduces the amount of $S$-destined mass by a factor of roughly $1 - \epsilon$. If it were the case that a constant fraction of bits were assigned $I$, then the noise would cause a flip from $S \to I$ less frequently, and the fraction of the $S$-destined mass that stays $S$-destined after each noise channel would be larger than $1 - \epsilon$ by an $O(\epsilon)$ amount, which would ruin the analysis. The reason $\epsilon < c/n$ is required for the exponential clustering effect is that errors need to be rare enough for the $S$-destined mass to mostly re-equilibrate back to $S^n$ before new errors pop up; to say it another way, the errors must get scrambled at a faster rate than they appear. If a configuration has $n-1$ $S$ assignments and 1 $I$ assignment, it will take $O(n)$ gates before the single $I$-assigned qudit participates in a gate. Thus, if errors occur at a slower rate than one per $O(n)$ gates, full re-equilibration will happen before a new error pops up most of the time. It is not clear if this condition is truly necessary for the clustering statement to hold, but we show at the very least that it is sufficient.
However, we need $\epsilon < c/n$ to hold for another (related) reason: the leakage from $S^n$ to $I^n$ must occur more slowly than the anti-concentration rate, which corresponds to the speed at which the probability mass initially equilibrates to $I^n$ and $S^n$. After all, even though the stochastic process is $I$-biased, the $I$-destined mass does not make it to the $I^n$ fixed point instantaneously. After $s$ gates, there will be some residual contribution from the not-yet-equilibrated $I$-destined mass to the calculation of the quantities $Z_0 - 1$, $Z_1 - 1$, and $Z_2 - 1$; this contribution decays by a constant factor with every additional $O(n)$ gates. If $\epsilon = \Theta(1/n)$, a constant fraction of the $S$-destined mass will leak away with each set of $O(n)$ gates, and if the constant prefactor on this leakage is too large, the $I$-destined mass will contribute more than the $S$-destined mass to the expectation values; as a result, the right-hand side of Eq. (43) will not exhibit the same kind of cancellations observed for the toy example.
In our formal analysis, we actually assume something even stronger: we require that $\epsilon \ll 1/(n\log(n))$, which essentially means that very few errors occur during the initial anti-concentration period. However, this is done to make the analysis easier, and we do not believe this condition is necessary.

Numerical estimates of error in white-noise approximation
In principle, it would be possible to determine the constant factors under the big-O notation in our proofs, but the result of this exercise would likely yield extremely unfavorable numbers due to our lack of optimization throughout, and the fact that it might be possible to eliminate some of the terms in our error expression altogether with a more fine-grained analysis. The goal of this section is to provide a numerical assessment of the bound on the error in the white-noise approximation for realistic values of the circuit parameters. We find that realistic NISQ-era values of the circuit parameters can lead to a small upper bound on the white-noise approximation error, even for circuits with several thousand gates, but we confirm that the noise rate needs to decrease like O(1/n) as the system size scales up for our upper bound to be meaningful.

Numerical method
The numerics we present are for the complete-graph architecture. In general, the stochastic process underlying our method (described in Section 5.2 and presented formally in the appendix) is a random walk over 2^n possible configurations of a length-n bit string. However, for the complete-graph architecture there is an equivalence between all configurations with the same Hamming weight. Thus, the state space for the stochastic process is reduced to n + 1 distinct groups of configurations (associated with Hamming weights 0, 1, . . . , n). The quantities Z_0, Z_1, and Z_2, as defined in Eqs. (39), (40), and (41), can then be precisely computed by multiplying the (sparse) (n + 1) × (n + 1) transition matrices for the stochastic process. This allows us to compute the right-hand side of Eq. (43) for substantially large n, giving a bound on E_U[(1/2)‖p_noisy − p_wn‖_1]. In our analysis below, we suppose all noise locations are subject to depolarizing noise with error probability ε, given as in Eq. (5). We also restrict to q = 2 (qubits). We do not model readout errors, which are a large source of error in the actual experiments of Refs. [4][5][6]. We plug in specifications (n, ε, s) and exactly compute the quantity which gives the ratio of the bound in Eq. (43) to the fidelity F̂.
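As a concrete illustration of this reduction, the following sketch (our own construction, not the authors' code; function names are ours) builds the (n + 1) × (n + 1) noiseless transition matrix for the complete-graph Hamming-weight walk, using the per-gate transition rule stated in the appendix, and evolves the initial distribution implied by the first layer of single-qudit gates:

```python
import numpy as np
from math import comb

# Sketch (our construction; names are ours): the noiseless biased random walk
# over Hamming weights 0..n for the complete-graph architecture. Per gate, a
# mixed {I,S} pair becomes II with prob q^2/(q^2+1) and SS with prob
# 1/(q^2+1); II and SS pairs are left unchanged.

def transition_matrix(n, q=2):
    T = np.zeros((n + 1, n + 1))  # T[k_new, k_old]; columns are stochastic
    for k in range(n + 1):
        m = 2 * k * (n - k) / (n * (n - 1))  # prob the gate hits one I and one S
        T[k, k] += 1 - m
        if k >= 1:
            T[k - 1, k] += m * q**2 / (q**2 + 1)  # pair -> II: weight k-1
        if k <= n - 1:
            T[k + 1, k] += m / (q**2 + 1)         # pair -> SS: weight k+1
    return T

def initial_distribution(n, q=2):
    # after the first layer, each site is S with prob 1/(q+1), I with prob q/(q+1)
    pS = 1 / (q + 1)
    return np.array([comb(n, k) * pS**k * (1 - pS)**(n - k) for k in range(n + 1)])

n, q, s = 10, 2, 500
T = transition_matrix(n, q)
p = initial_distribution(n, q)
for _ in range(s):
    p = T @ p  # apply one averaged two-qudit gate
# after many gates, nearly all mass is absorbed at the I-biased endpoint k = 0
```

The quantities Z_0, Z_1, and Z_2 used in the paper additionally weight configurations and insert per-noise-location factors into this product of matrices; those details are omitted from the sketch.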

Numerical bound for realistic circuit parameters
We first examine the bound using the circuit parameters of existing experimental setups. The Google experiment [4] ran s = 430 gates on their n = 53 qubit processor called Sycamore, and their error rate per cycle, which is the analogous quantity to the total error in a two-qubit gate in our setup, was reported to be 0.9%. This corresponds to ε ≈ 0.0045 in our model, where separate noise channels act on each of the two qubits. Meanwhile, the largest experiment from USTC [6] ran s = 594 gates on their n = 60 qubit processor called Zuchongzhi, with a similar overall error rate per cycle. In Figure 3, we plot the numerically calculated bound on (1/F̂)E_U[(1/2)‖p_noisy − p_wn‖_1] as a function of circuit size for complete-graph circuits with n = 53 and n = 60 at ε = 0.0045. The circuit sizes s = 430 and s = 594 appear as large dots. We find that, as expected, the bound is bad if the circuit size is too small. There is an initial spike in the bound due to the first few layers of noisy gates, which subsides quickly as those initial errors are scrambled. The behavior that follows reflects the race between fidelity decay and anti-concentration. For these values of the error rate, the fidelity decay is happening at a slower rate than anti-concentration, but it has a head start, since it takes Θ(n log(n)) gates for anti-concentration to initially be reached [8]; this explains why the bound is decreasing (relative to F̂) even as the circuit size passes 1000. For large s, both curves approach the function 2ε√s/3. This indicates that the constant factor underneath the O(ε√s) is less than 1, at least for depolarizing noise in the complete-graph architecture. The point at which we expect the O(ε√s) behavior to take over will generally be Θ(n log(n)) + Θ(n), where the first term corresponds to the initial anti-concentration period, and the second term corresponds to the additional time needed for anti-concentration to catch up to the fidelity.
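For orientation, the fidelity scale in these comparisons follows from the formula F = exp(−2sε) stated in the abstract; a minimal sketch (our arithmetic, evaluated at the stand-in parameters quoted in the text, not the experimentally reported XEB values):

```python
import math

# Rough fidelity scale F = exp(-2*s*eps), from the paper's formula, at the
# complete-graph stand-in parameters quoted in the text (eps = 0.0045 per
# noise location, two locations per two-qubit gate). Illustrative only;
# these are not the experimentally reported fidelities.

def fidelity(s, eps):
    return math.exp(-2 * s * eps)

F_google = fidelity(430, 0.0045)  # n = 53, Sycamore-like circuit size
F_ustc = fidelity(594, 0.0045)    # n = 60, Zuchongzhi-like circuit size
```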
The constant prefactor under the second term will be larger when ε is larger and the fidelity decays more rapidly.
Figure 3: Plot of the numerically calculated upper bound on the expected total variation distance between p_noisy and p_wn, divided by F̂, for a complete-graph version of recent random quantum circuit experiments by Google (53 qubits) [4] and USTC (60 qubits) [6]. The large dots represent the circuit sizes (number of two-qubit gates) implemented in those experiments. The dotted black line is the function 2ε√s/3 for each experiment.

Interestingly, the circuit size actually implemented in both of the experiments falls in a region where the bound on the approximation error relative to the fidelity is decreasing with circuit size, suggesting the white-noise approximation would become more meaningful if more gates were applied (at the expense of smaller fidelity). In fact, for Google's experiment, the upper bound yields a value close to 1, and for USTC, it yields a value larger than 1, indicating that, in this idealized complete-graph version of their experiments, the white-noise assumption may not hold (we would need a lower bound to know for sure).
There are a few caveats to these conclusions. First, what we plot is only an upper bound, and it is not clear whether this upper bound is tight. Second, this is for the complete-graph architecture, but the experiments of Refs. [4][5][6] had a 2D architecture (although one might speculate that a 2D architecture would only scramble less efficiently than the complete-graph architecture). Third, we have not modeled readout errors in the device. Fourth, we have an idealized error model of depolarizing single-qubit noise. As has been mentioned in footnotes throughout this paper, the goal of our work is not to justify the claims of quantum computational supremacy by specific noisy random quantum circuit experiments. Rather, we aim to show that the white-noise phenomenon is possible and can be proved analytically, and that this adds some justification to claims that a low-fidelity random quantum circuit experiment could in principle accomplish quantum computational supremacy.

Threshold error rate for good white-noise bound
A key feature we observed in our theoretical analysis was the need for the error rate ε to decrease with n. For each value of n, we observe a threshold error rate such that, if ε is beneath the threshold, our upper bound on the total variation distance follows O(Fε√s), whereas if ε is above the threshold, our bound becomes (empirically) O(Fe^{Θ(s)}). Without a lower bound, we cannot be sure if this is the actual behavior of the approximation error.
In Figure 4, we present a log plot of the numerically calculated bound on the approximation error (relative to F̂) for different values of ε at system sizes n = 53, 106, 159, 212 (corresponding to integer multiples of the size of Google's 53-qubit experiment). For n = 53, we see that choices of ε beneath roughly 0.0057 appear to approach O(ε√s) scaling at large s, while choices of ε above that threshold increase exponentially with s. For n = 106, n = 159, and n = 212, the apparent threshold decreases to roughly ε = 0.0028, ε = 0.0019, and ε = 0.0014, respectively. This is consistent with a general threshold of roughly ε = 0.3/n. We expect the ε = O(1/n) threshold to exist in other architectures as well, but with a modified constant prefactor. Architectures with a faster anti-concentration rate should have larger thresholds.
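The quoted thresholds can be checked directly against the ε ≈ 0.3/n rule (our arithmetic):

```python
# Consistency check (our arithmetic) of the empirical threshold rule
# eps ~ 0.3/n against the values read off from Figure 4.
observed = {53: 0.0057, 106: 0.0028, 159: 0.0019, 212: 0.0014}
predicted = {n: 0.3 / n for n in observed}
# each prediction agrees with the observed threshold to within ~2%
```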

Outlook
We have presented a comprehensive picture of how the output distribution of typical random quantum circuits behaves under a weak incoherent local noise model. As more gates are applied, the output distribution decays toward the uniform distribution in total variation distance like e^{−2εs}, where ε is the local noise strength in a Pauli error model (for non-Pauli models, this can be expressed in terms of the average infidelity r) and s is the number of gates. Moreover, we show that the convergence to uniform happens in a very special way: the residual non-uniform component of the noisy distribution is approximately in the direction of the ideal distribution. The random quantum circuits scramble the errors that occur locally during the evolution so that they can ultimately be treated as global white noise, allowing some signal of the ideal computation to be extracted even from a noisy device. While this property had previously been conjectured (it was an underlying assumption of quantum computational supremacy experiments [4,5]), it had not received rigorous analytical study. Basic questions, like how the error in the white-noise approximation scales with ε and s, had not been investigated.
Our theorem statements are given for general, possibly coherent, noise channels. While we show that local coherent noise channels lead the output distribution to exhibit exponential decay in the linear cross-entropy benchmark for the fidelity, there is not generally also a decay toward the uniform distribution. As a result, the white-noise approximation is not good for coherent noise channels. Moreover, even for incoherent noise channels, our technical statements are only applicable if the Pauli noise strength ε (or, for non-Pauli noise channels, the average infidelity) is beneath a threshold that shrinks with system size like O(1/n) and if the circuit size is at least Ω(n log(n)). Furthermore, our bound on the error in the white-noise approximation is only meaningful if ε ≪ 1/(n log(n)). We believe the ε ≪ 1/(n log(n)) requirement is merely a result of suboptimal analysis, but that the assumption ε ≤ O(1/n) is fundamentally necessary for the approximation to be good: errors must be scrambled faster than the fidelity F ≈ e^{−2εs} decays.
One implication of our result is to put low-fidelity random-circuit-based quantum computational supremacy experiments on stronger theoretical footing by showing that, as long as our local noise model is a reasonable approximation of noise in actual devices, the device produces samples from a well-understood output distribution, which can subsequently be argued to be hard to sample classically. Indeed, in Appendix C, we combine observations from previous work to show that the task of classically sampling from the white-noise distribution with fidelity F up to ηF error is essentially just as hard, in a certain complexity-theoretic sense, as the task of classically sampling from the ideal distribution up to O(η) error. This is important because the latter task (and variants of it in other computational models [43,44]) has previously garnered significant theoretical scrutiny [20][21][22], although it is still not known whether it is hard in a formal complexity-theoretic sense.
These results are good news for the utility of NISQ devices more broadly. In order to perform a larger and more interesting computation, noise rates must become smaller; our work shows that, in some applications, for circuits with s gates, noise rates need only decrease like 1/√s, rather than 1/s, as long as one is willing to repeat the experiment many times to extract the signal from the global white noise. A natural next question is: when, besides the case of random quantum circuits, do we expect a similar white-noise phenomenon to occur? Our result shows that convergence to white noise is a generic property, occurring for a large fraction of randomly chosen circuits. Heuristically, this is because random quantum circuits are known to be good scramblers. However, most interesting quantum circuits are non-generic in some way. An extreme example is quantum error-correcting circuits, which are specifically designed not to scramble errors (so that they can be corrected). The output of these circuits will not be close to the white-noise distribution. A fascinating follow-up question is whether other computations proposed for NISQ devices appear to scramble errors well enough that a similar approximation can be made. One leading candidate with relevance for many-body physics is circuits that simulate evolution by fixed chaotic Hamiltonians, since these systems are thought to scramble information efficiently. Indeed, a central motivation for studying random quantum circuits in the first place has been to model the scrambling properties of chaotic many-body systems [31,32,45].

A.1 Mapping noiseless circuits to stochastic processes
Let A^(t) ⊂ [n] denote the set of qudits acted upon by gate t (|A^(t)| is either 1 or 2), and let

M^(t)(ρ) = E_{U^(t)}[ (U^(t)_{A^(t)} ⊗ U^(t)_{A^(t)}) ρ (U^(t)_{A^(t)} ⊗ U^(t)_{A^(t)})† ],

where the average is over the Haar-random choice of U^(t), and U^(t)_{A^(t)} denotes the operation that acts as U^(t) on qudits in region A^(t) and as identity on all other qudits.
Application of the first layer of n single-qudit gates in Figure 1 corresponds to application of M^(−n+1) ∘ · · · ∘ M^(0) to the initial state (|0^n⟩⟨0^n|)^{⊗2}. Applying the Haar integration formula in Eq. (45) to each qudit, we find

E_U[(U|0⟩⟨0|U†)^{⊗2}] = (I + S)/(q(q + 1)) = (q/(q + 1)) I/q^2 + (1/(q + 1)) S/q,

where the second equality expresses the formula as a linear combination of I/q^2 and S/q, both of which have trace one. The coefficients q/(q + 1) and 1/(q + 1) are interpreted as probabilities that each bit of the initial configuration γ^(0), as described in Section 5.2, is I or S, respectively. Since the averaged state is a linear combination of tensor products of I and S already after the first layer, we need only compute the action of an averaged two-qudit gate on I ⊗ I, I ⊗ S, S ⊗ I, and S ⊗ S, properly normalized. Suppose gate t acts on qudits {i_t, j_t}. Then M^(t) acts trivially on all qudits outside of {i_t, j_t}, and its action on {i_t, j_t} is computed using the Haar integration formula in Eq. (45) (note that since the gates are q^2 × q^2 matrices, we replace q by q^2, I by I ⊗ I, and S by S ⊗ S), yielding

M^(t)(I/q^2 ⊗ I/q^2) = I/q^2 ⊗ I/q^2, (57)
M^(t)(S/q ⊗ S/q) = S/q ⊗ S/q, (58)
M^(t)(I/q^2 ⊗ S/q) = M^(t)(S/q ⊗ I/q^2) = (q^2/(q^2 + 1)) I/q^2 ⊗ I/q^2 + (1/(q^2 + 1)) S/q ⊗ S/q. (59)

The above equations correspond to the transition rules for the noiseless stochastic process mentioned in Section 5.2: if both bits are I or both are S, then there is no change, but if one is I and one is S, they are both set to I with probability q^2/(q^2 + 1) and both set to S with probability 1/(q^2 + 1). This illustrates that sequential application of M^(t) on the state will map linear combinations of tensor products of I/q^2 and S/q to other linear combinations of tensor products of I/q^2 and S/q. The coefficients of these linear combinations transform linearly. When written in terms of the trace-one operators I/q^2 and S/q, this linear transformation will be stochastic, i.e., the sum of the coefficients of the linear combination over tensor products will be conserved (note that the sum of coefficients in Eqs. (57), (58), and (59) is one).
Now, let us associate to each configuration ν ∈ {I, S}^n the tensor product operator whose a-th factor is I/q^2 if ν_a = I and S/q if ν_a = S; this operator corresponds to a basis state |ν⟩ for the vector space acted upon by M^(t).
For configurations ν, γ ∈ {I, S}^n, denote the matrix elements of this (stochastic) transformation by M^(t)_{νγ}. The matrix elements follow from Eqs. (57), (58), and (59); here |ν| denotes the Hamming weight of the bit string ν, that is, the number of S assignments. Working from the definition of Z_0 in Eq. (39) and p_ideal in Eq. (10), we obtain a matrix equation for Z_0, in which the factor q^{n−|γ^(0)|}/(q + 1)^n is the probability of starting in configuration γ^(0). Thus, Z_0 can be re-expressed as an expectation E_0 over the stochastic process that generates the trajectory γ = (γ^(0), . . . , γ^(s)), as described above and as concluded in Eq. (46) of Section 5.2. In Ref. [8], this stochastic process was termed the "biased random walk."
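The biased random walk can also be simulated directly at the level of configurations; here is a minimal Monte Carlo sketch (our own code, names ours) of the noiseless transition rule for an arbitrary gate sequence:

```python
import random

# Monte Carlo sketch (ours) of one trajectory of the noiseless "biased random
# walk" over configurations in {I, S}^n. Sites are booleans: True = S, False = I.

def step(config, i, j, q=2, rng=random):
    """Averaged two-qudit gate on sites i and j: a mixed {I,S} pair is
    resampled to II with prob q^2/(q^2+1), SS with prob 1/(q^2+1)."""
    if config[i] != config[j]:
        both_I = rng.random() < q**2 / (q**2 + 1)
        config[i] = config[j] = not both_I
    return config

def trajectory(n, gates, q=2, rng=random):
    # initial configuration: each site is S with prob 1/(q+1)
    config = [rng.random() < 1 / (q + 1) for _ in range(n)]
    for i, j in gates:
        step(config, i, j, q, rng)
    return config

rng = random.Random(7)
n = 12
gates = [tuple(rng.sample(range(n), 2)) for _ in range(600)]  # complete-graph
final = trajectory(n, gates, rng=rng)  # typically absorbed at all-I or all-S
```

The all-I and all-S configurations are fixed points of the rule, matching the behavior of Eqs. (57) and (58).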

A.2 Action of averaged noise channel on identity and swap
Since every single-qudit noise channel is followed by a Haar-random (either single-qudit or two-qudit) gate in the circuit diagram, we are free to add a single-qudit Haar-random gate immediately after every noise channel without changing the overall circuit ensemble (the Haar measure is invariant under multiplication by any unitary). Denote this single-qudit Haar-random matrix by V. There will be a difference in the analysis between the calculation of Z_0, Z_1, and Z_2, where Z_w contains w copies of the noisy output as defined in Eqs. (39), (40), and (41). Define

N_0 = I ⊗ I, (67)

with I denoting the single-qudit identity channel. Let ρ be a state on two copies of a single-qudit Hilbert space. Then, for w ∈ {0, 1, 2}, let N_w be the Haar-averaged noise channel. We will only need to compute the action of N_w on the input states ρ = I/q^2 (here I is the two-qudit identity operator) and ρ = S/q since, as shown above, the random gates turn the initial state |0^n⟩⟨0^n|^{⊗2} into a linear combination of tensor products of I/q^2 and S/q on each qudit. Note that since N is assumed to be unital, we have N_w(I/q^2) = I/q^2 for all w ∈ {0, 1, 2}. However, computing the action on S/q is not as simple. Let

Y_w = tr(S N_w(S)).

(Note that Y_0 = q^2, since N_0 is the identity channel.) Then, use Eq. (45) and the fact that N is trace-preserving to show

N_w(S/q) = ((q^2 − Y_w)/(q^2 − 1)) I/q^2 + ((Y_w − 1)/(q^2 − 1)) S/q.

Now we relate the quantities Y_1 and Y_2 to the average infidelity and the unitarity, respectively. Recall that tr(AB) = tr(S(A ⊗ B)). Using this trick and Eq. (45), the average infidelity r from Eq. (3) can be evaluated, and the unitarity u from Eq. (4) can be evaluated in a similar way.
Plugging these relations back into Eq. (73) expresses the action of N_w on S/q in terms of r and u. For weak noise channels, r is close to 0 and u is close to 1. In this case, we see that the noise causes some small amount of leakage from the S state to the I state, but no leakage from the I state to the S state, introducing an asymmetry into the problem that did not exist in the noiseless analysis. For t outside the range 1 ≤ t ≤ s, let N^(t)_w be the identity channel. If ρ is a linear combination of tensor products of I/q^2 and S/q, then N^(t)_w(ρ) and N̄^(t)_w(ρ) will be as well, with coefficients that transform linearly (and stochastically). For configurations γ, ν ∈ {I, S}^n, let N^(t)_{w,νγ} denote the matrix elements of this transformation; for 1 ≤ t ≤ s, they are

N^(t)_{w,νγ} = { 1 if γ_{i_t} = I and γ = ν;  u if γ_{i_t} = S, ν_{i_t} = S, and γ = ν;  1 − u if γ_{i_t} = S, ν_{i_t} = I, and γ_a = ν_a for all a ≠ i_t;  0 otherwise },

and the matrix elements of N̄^(t)_w are given by the same equations, with j_t replacing i_t.

A.3 Mapping noisy circuits to stochastic processes
Define, for w ∈ {0, 1, 2}, the channels U^(t)_w combining the gate at step t with w copies of its associated noise, where U^(t) and Ũ^(t) are given in Eqs. (8) and (9). Then we may write Z_w as an expectation over the circuit ensemble. Since each U^(t) is chosen independently, we are free to perform the expectation value individually over each U^(t)_w channel. The noiseless channel U^(t)_0 = U^(t)⊗2 averages to M^(t), where M^(t) is given in Eq. (55). The action of the noise may also be averaged since, as discussed in Appendix A.2, we may pull out a single-qudit Haar-random gate to act after each noise location. Thus, the noiseless single-qudit gates at the end of the circuit may be dropped, as they are absorbed into the noise. Following the noiseless analysis of Appendix A.1, we may now write Z_w as a product of matrices generalizing Eq. (65). In the notation of Section 5.2, for w = 1 this can be expressed as an expectation over the stochastic process that generates a trajectory with 3s + 1 configurations (at time values t = 0, 1/3, 2/3, 1, . . . , s), and similarly for w = 2. The expressions for Z_w as weighted sums over trajectories can alternatively be interpreted as partition functions of an Ising-like statistical mechanics model, where each γ^(t)_a is an Ising variable in {+1, −1}. There are interactions between adjacent Ising variables whenever a gate or noise location acts between them; the associated interaction strengths can be calculated from the matrix elements listed above.

A.4 Bra-ket notation for the stochastic process
We now write the above insights in a notation that offers slightly more flexibility, which we will utilize in our proofs. The reader need only read this section to verify the proofs that appear later. Consider a 2^n-dimensional vector space, where orthonormal basis states |ν⟩ are labeled by configurations ν ∈ {I, S}^n. Define the vectors |Λ⟩ and ⟨q|. Then we may define 2^n × 2^n transition matrices P^(t), which enact the tth step of the noiseless stochastic process, as well as matrices Q^(t)_σ and Q̄^(t)_σ, which enact the S → I transition with probability σ on qudits i_t and j_t, respectively; the subscripts on the right-hand side of these definitions denote which bits are acted upon by which operators. Note that P is a stochastic 4 × 4 matrix. Then, define Z_σ as the corresponding product of these matrices sandwiched between ⟨q| and |Λ⟩. If the circuit diagram is generated randomly, as is the case for the complete-graph architecture, then Z_σ is defined instead as the mean of the above expression over the choice of circuit diagram. For the specific case of the complete-graph architecture (where the pair of qudits acted upon by each gate is chosen independently from all other gates), the average of Z_σ over different circuit diagrams can be accomplished by averaging the matrix Q̄^(t)_σ Q^(t)_σ P^(t) over all choices of {i_t, j_t}. This is the convention we follow when analyzing the complete-graph architecture.
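The small stochastic building blocks can be written out explicitly; the following is our reconstruction (matrix conventions and names ours) of the single-site noise step and the two-site gate step in the basis ordering (II, IS, SI, SS):

```python
import numpy as np

# Our reconstruction of the elementary stochastic matrices of this section.
# Columns index the source configuration and sum to one.
q = 2

def Q(sigma):
    # single-site noise step in basis (I, S): S -> I with probability sigma
    return np.array([[1.0, sigma],
                     [0.0, 1.0 - sigma]])

# two-site gate step P in basis (II, IS, SI, SS): II and SS are fixed; a
# mixed pair goes to II w.p. q^2/(q^2+1) and to SS w.p. 1/(q^2+1)
a, b = q**2 / (q**2 + 1), 1 / (q**2 + 1)
P = np.array([[1.0, a, a, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, b, b, 1.0]])
```

Both matrices are column-stochastic, consistent with the claim that P is a stochastic 4 × 4 matrix, and Q(0) reduces to the identity (no noise).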
The |Λ⟩ in the equation above represents the distribution over the initial configuration γ^(0), and the ⟨q| represents the weighting given to the final configuration γ^(s). Thus, the equation for Z_w in Eq. (98) implies that Z_0 = Z_{σ=0}. (109)

B Detailed proofs
The statements of our main theorems in the appendix are slightly more general than in the main text: we consider a general class of architectures that are both "layered" and "regularly connected," which we define below. The theorem statements are in terms of the anti-concentration size s_AC of the architecture, which is defined [8] to be the minimum circuit size s such that Z_0 ≤ 4q^n/(q^n + 1). The 1D architecture and complete-graph architecture are the only architectures known to have s_AC = Θ(n log(n)), so for clarity, we previously restricted our statements to those architectures. First, in Appendix B.1, we present definitions and our main lemmas, which are themselves dependent on more minor lemmas. Then, in Appendix B.2, we prove a slightly generalized version of our theorems from the main text, based on the main lemmas. Afterward, in Appendix B.3, we develop some more machinery and state the minor lemmas, deferring their proofs to Appendix B.8.

B.1 Definitions and main lemmas
Our proofs apply to architectures that are layered and h-regularly connected for some constant h = O(1). The regularly connected property was defined in Ref. [8], where it was conjectured to imply anti-concentration after Θ(n log(n)) gates, and we repeat its definition here.
First, define an architecture as in Ref. [8] to be an efficient (possibly randomized) algorithm that takes as input circuit parameters (n, s) and outputs a length-s sequence of size-2 subsets (A (1) , . . . , A (s) ), where A (t) ⊂ [n] and |A (t) | = 2 for each t. The subsets A (t) correspond to the pair of qudits acted upon by a gate at time step t.
Definition 1 (Regularly connected [8]). We say a random quantum circuit architecture is h-regularly connected if for any n, any t, any subsequence A = (A^(1), . . . , A^(t)), and any proper subset R ⊂ [n] of qudit indices, there is at least a 1/2 probability that, conditioned on the first t gates in the gate sequence being A, there exists some index t' for which t < t' ≤ t + hn, A^(t') ∩ R ≠ ∅, and A^(t') ⊄ R.
If h = O(1), we often simply call the architecture regularly connected, without specifying h. This property is a precise way of saying that the circuit does not break into multiple distinct parts that rarely interact with each other (a feature that would prevent scrambling): for any bipartition, there is usually a gate that couples one qudit from each half at least once every O(n) time steps. Nearly all natural architectures are regularly connected (a notable exception being the hypercube architecture [8]).
Next, we define layered, which simply means that the gates can always be neatly arranged into layers of n/2 non-overlapping gates.

Definition 2. An architecture is layered if any sequence of gates (A^(1), . . . , A^(s)) it generates with nonzero probability has the property that, for any integer d ≥ 0, any pair of gates in the same "layer" (gates with time indices in the range dn/2 < t ≤ (d + 1)n/2) acts on disjoint pairs of qudits. Thus, all n qudits are acted upon by exactly one gate out of every n/2 gates.
For layered architectures, we can speak clearly about the depth d = 2s/n. The anti-concentration depth is then defined as d_AC = 2s_AC/n. We will generally require that s be a multiple of n/2 so that there is an integer number of layers. Regular lattice architectures in D spatial dimensions are typically layered, although adhering strictly to the definition would require applying periodic boundary conditions. We do not expect this condition is actually necessary for our results, but it is analytically convenient. The only place we need it is in Lemma 12.
Our theorems are corollaries of the following lemmas. Recall the definition of Z_σ from Eq. (108). Note that in these proofs, all constants depend on q as well as h (the regular-connectedness parameter), but are independent of n and the noise parameters.

Lemma 1. If the random quantum circuit architecture is h-regularly connected and layered with anti-concentration depth d_AC, then there exist constants c_0, c_1, c_2, c_3, c_4, c_5, and n_0 that depend on h and q but not on n or σ, such that, as long as σ ≤ c_5/n and n ≥ n_0, Z_σ obeys matching upper and lower bounds for any value of the circuit depth d.

Proof. The lower bound is an immediate consequence of two lemmas that appear later, Lemma 11 and Lemma 12. The upper bound is also an immediate consequence, with the constant c_1 absorbing an O(nσ) term, since d_AC = 2s_AC/n ≥ Ω(log(n)) by the results of Ref. [8].
We show the analogous statement for the complete-graph architecture.
Lemma 2. If the random quantum circuit architecture is the complete-graph architecture, then there exist constants c_0, c_1, c_2, c_3, c_4, c_5, and n_0 that depend on q but not on n or σ, such that, as long as σ ≤ c_5/n and n ≥ n_0, Z_σ obeys analogous upper and lower bounds for any value of the circuit size s, where s_AC = Θ(n log(n)) is the anti-concentration size for the complete-graph architecture.

Proof. The proof is the same as for Lemma 1, except using Lemma 13 in place of Lemma 12.

In the regime where these lemmas apply, the upper bounds in Eqs. (119) and (120) yield a bound on the expected distance between p_noisy and p_unif, where F̂ is given in Eq. (14), p_unif is the uniform distribution, and

Q_2 = exp(O(sv^2) + O(s_AC v)) + e^{O(s_AC/n)} e^{−Ω(s/n)} + O(nv log(1/(nv))). (125)
Proof. We can use the 1-norm to 2-norm inequality in Eq. (35), along with Jensen's inequality for the concave function √·, to bound the expected total variation distance. Then, the theorem follows from the upper bound in Lemma 1 for layered architectures and Lemma 2 for the complete-graph architecture, with σ = v, combined with the observation in Eqs. (119) and (120). Note also that nd = 2s.

B.2.3 Proof of Theorem 3: approximation by white noise
Theorem 3 (generalized and restated). Consider either the complete-graph architecture or a regularly connected, layered random quantum circuit architecture with n qudits of local Hilbert space dimension q and s gates, where the anti-concentration size is given by s_AC. Let r be the average infidelity and u the unitarity of the local noise channels (and define v = 1 − u). Let δ be as defined in Eq. (129). Then, when we choose F = F̂ as in Eq. (14), there exist constants c_1, c_2, and n_0 such that, as long as v ≤ c_1/n, r ≤ c_2/n, and n ≥ n_0, the bound in Eq. (130) holds whenever its right-hand side is less than F̂.
Proof. Following Section 5.2, we first use the 1-norm to 2-norm bound and Jensen's inequality, and then we optimize the value of F. The bound on the distance between p_wn and p_noisy is minimized when we choose F = F̂ = (Z_1 − 1)/(Z_0 − 1). When this value is chosen, the bound can be expressed in terms of the quantity (Z_2 − 1)(Z_0 − 1)^2/(Z_1 − 1)^2. Note that after the anti-concentration size has been surpassed, the quantity Z_0 − 1 rapidly approaches (q^n − 1)/(q^n + 1) ≈ 1 from above. To evaluate Z_0, Z_1, and Z_2, we use the correspondence Z_0 = Z_{σ=0}, Z_1 = Z_{σ=rq/(q−1)}, and Z_2 = Z_{σ=v}. The bounds from Lemma 1 for layered architectures and Lemma 2 for the complete-graph architecture then allow us to upper bound (Z_2 − 1)(Z_0 − 1)^2/(Z_1 − 1)^2, arriving at an expression in which Q_2 is given in Eq. (125) and δ is given in Eq. (129). Now, working back from Eq. (131), and noting that e^x − 1 < 2x for all x ≤ 1, we obtain the claimed bound when the quantity under the square root is less than 1.

B.3 Machinery for proof
We now develop some more notation, and we precisely state some of our lemmas. We defer the proofs of these lemmas to Appendix B.8. As we state them, we attempt to give some commentary about the meaning and purpose of the different objects that we define and the related lemmas.

B.3.1 Coupling a noiseless and noisy copy of the dynamics
We have a fairly good understanding of the noiseless stochastic process from Ref. [8]. Our strategy here is to examine how introducing noise perturbs that process. To that end, we consider two copies of the random walk, where one is noiseless and one is noisy, but where they are correlated so that we can isolate the impact of the noise. Recall that we have reduced the calculation of Z_σ to the expectation value of a random variable (the configuration) that evolves according to the stochastic transition matrix P^(t) (representing the noiseless gate) followed by transition matrices Q^(t)_σ and Q̄^(t)_σ, which represent the impact of noise.
Let X denote the 2 n -dimensional vector space for the first "noiseless" copy and Y for the second "noisy" copy. To define the dynamics formally, recall the definition of D and T from Eqs. (105) and (106), and define the following matrix that acts on four bits.
The matrix R is stochastic. It should be understood as a correlated bit flip where, if the first and third bits are equal and the second and fourth bits are equal, they are sent to a state where that is still true. Its marginal on either the first two bits or the last two bits is precisely P from Eq. (107). Refer to the ith bit of the first random variable as X_i and the ith bit of the second random variable as Y_i, and define R^(t)_σ accordingly. In words, what R^(t)_σ does is first generate a correlated noiseless transition among the bits involved in the gate, {X_{i_t} X_{j_t}, Y_{i_t} Y_{j_t}}, for both the first "noiseless" X copy and the second "noisy" Y copy, and then apply the noise transitions only to the Y copy, which results in a bit flip from S to I independently on each location with probability σ. Since the marginal dynamics of the matrix R restricted either to the first two bits or to the last two bits is the matrix P, the marginal dynamics of R^(t)_σ are P^(t) on the X copy and the noisy dynamics on the Y copy. The system W captures the difference between the X and Y copies; it is assigned S wherever they agree and I wherever they disagree. This formalism allows us to isolate the impact of the noise on a trajectory of the stochastic process, compared to what "would have" happened had there been no noise. The action of R^(t)_σ on an example configuration is illustrated in Figure 5; in that example, only the i_t assignment is flipped.
An additional property of R^(t)_σ is that it preserves a certain subspace of the joint X ⊗ Y vector space. If we define the projector π_i = (|II⟩⟨II| + |SS⟩⟨SS| + |SI⟩⟨SI|)_{X_i Y_i}, then the support of ⊗_{i=0}^{n−1} π_i is not coupled to its orthogonal complement by the matrix R^(t)_σ. Let us refer to this subspace as the accessible subspace. This corresponds to the fact that the noise can send S → I but not vice versa.
We define the initial state to be the correlated version of |Λ⟩, which lies in the accessible subspace, so evolution by R^(t)_σ is guaranteed to remain within the accessible subspace for the entire evolution.
In terms of R^(t)_σ, we can rewrite Eq. (108), where |a, b⟩ is shorthand for |a⟩_X ⊗ |b⟩_Y. Taking the inner product with ⟨1| in that equation simply marginalizes over the noiseless X copy (since the vector is normalized in the 1-norm), and in our proofs we will use this notation often. Note also that since the marginal dynamics of the X copy is the noiseless dynamics, we can marginalize over the Y copy and conclude that the X-marginal of the joint distribution is the noiseless distribution, for any σ. In our proof, we find it convenient to define the vector |v^(t)⟩, which represents the joint probability distribution over the 2^n configurations after t gates (and their associated noise channels) have been applied. Note that for circuit architectures where the circuit diagram is chosen randomly, such as the complete-graph architecture, |v^(t)⟩ is defined as the above expression averaged over all circuit diagrams. Finally, let W refer to a third copy of the 2^n-dimensional Hilbert space, and define a mapping from the ith bits of X and Y to the ith bit of W as follows: it maps a bit pair to |S⟩ if they agree and to |I⟩ if they disagree. Let Δ be the induced map from X ⊗ Y to W. Note that Δ|ΛΛ⟩ = |S^n⟩.

B.3.2 I-destined and S-destined probability mass
We view $|v^{(t)}\rangle$ as the probability vector for the correlated stochastic process. Suppose that, starting at time step $t+1$, we begin running noiseless dynamics on both copies, i.e. we apply $R^{(t)}_0$, and we continue for an infinite number of gates. Then we will get full convergence to the fixed points $|I_n\rangle \otimes |I_n\rangle$, $|S_n\rangle \otimes |S_n\rangle$, and $|S_n\rangle \otimes |I_n\rangle$. The fourth fixed point $|I_n\rangle \otimes |S_n\rangle$ is not in the accessible subspace. We can compute precisely the probability of each of these outcomes.

Figure 6: Schematic of the concept of I-destined and S-destined probability mass in an $n = 6$ example. Each of the $2^n$ configurations corresponds to a Hamming weight between 0 and $n$, that is, the number of $S$ assignments out of $n$. For a given configuration, the mass can be broken into an I-destined and an S-destined portion, corresponding to the fraction that would end at Hamming weight 0 and Hamming weight $n$, respectively, if an infinite number of noiseless gates were applied. In the diagram, this corresponds to a division of the mass into the blue and red circles within each Hamming-weight bucket. For each $x$, the ratio of I-destined to S-destined mass at Hamming weight $x$ is always precisely $(1 - q^{-2n+2x})/(q^{-2n+2x} - q^{-2n})$. A portion of probability mass that is conditioned on being I-destined or S-destined obeys effective transition dynamics (given by transition matrices $P^{(t)}_I$ and $P^{(t)}_S$, respectively) that preserve which fixed point the portion of probability mass is destined for. The allowed transitions of these conditional noiseless dynamics are given by blue and red arrows in the diagram. The allowed transitions associated with the action of a noise location are given by yellow lines: a portion of S-destined mass that experiences an $S \to I$ flip due to noise can remain S-destined, or it can become I-destined, but I-destined mass can never become S-destined. The proof decomposes the I-destined mass according to the time step at which it first became I-destined.

In Ref. [8], the authors arrived at an expression for these probabilities by solving a certain recursion relation. Here, we need only the result of that calculation to inform how we define the diagonal matrices $L_I$ and $L_S$:
$$L_I = \sum_{\vec\nu} \frac{1 - q^{-2n+2|\vec\nu|}}{1 - q^{-2n}}\,|\vec\nu\rangle\langle\vec\nu|\,, \qquad L_S = \sum_{\vec\nu} \frac{q^{-2n+2|\vec\nu|} - q^{-2n}}{1 - q^{-2n}}\,|\vec\nu\rangle\langle\vec\nu|\,.$$
Note that $L_I + L_S$ is the identity matrix $I$. The coefficient of $|\vec\nu\rangle\langle\vec\nu|$ in $L_I$ gives the probability that a configuration that starts at $|\vec\nu\rangle$ ends at the $I_n$ fixed point if it undergoes completely noiseless dynamics, and the coefficient in $L_S$ gives the probability of ending at the $S_n$ fixed point [8].
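As a quick numerical sanity check (ours, not from the paper), the destination probabilities implied by the ratio in Figure 6 can be tabulated. The helper below assumes the normalized forms $(1-q^{-2n+2x})/(1-q^{-2n})$ and $(q^{-2n+2x}-q^{-2n})/(1-q^{-2n})$, which sum to one and reproduce the stated I-to-S ratio.

```python
from fractions import Fraction

def destination_weights(n, q, x):
    """I-destined and S-destined fractions for a configuration of
    Hamming weight x (number of S assignments out of n), assuming the
    forms implied by the ratio (1 - q^(-2n+2x))/(q^(-2n+2x) - q^(-2n))
    stated in the Figure 6 caption, normalized to sum to 1."""
    qi = Fraction(1, q)
    num_I = 1 - qi ** (2 * n - 2 * x)
    num_S = qi ** (2 * n - 2 * x) - qi ** (2 * n)
    norm = num_I + num_S          # equals 1 - q^(-2n)
    return num_I / norm, num_S / norm

n, q = 6, 2
for x in range(n + 1):
    wI, wS = destination_weights(n, q, x)
    assert wI + wS == 1           # L_I + L_S = identity, entrywise
    if 0 < x < n:
        # ratio matches the Figure 6 caption exactly
        qi = Fraction(1, q)
        assert wI / wS == (1 - qi ** (2 * n - 2 * x)) / (qi ** (2 * n - 2 * x) - qi ** (2 * n))
```

At the boundaries the weights reduce to the obvious values: a configuration of weight 0 is surely I-destined, and one of weight $n$ is surely S-destined.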
Then define
$$L_{II} = L_I \otimes L_I\,, \qquad L_{SS} = L_S \otimes L_S\,, \qquad L_{SI} = L_S \otimes L_I\,,$$
which are the analogous matrices for the joint dynamics to end at $|I_n\rangle \otimes |I_n\rangle$, $|S_n\rangle \otimes |S_n\rangle$, and $|S_n\rangle \otimes |I_n\rangle$, respectively. Now we may define
$$P^{(t)}_I = L_I\, P^{(t)}\, L_I^{-1}\,, \qquad P^{(t)}_S = L_S\, P^{(t)}\, L_S^{-1}$$
and
$$R^{(t)}_{II} = L_{II}\, R^{(t)}_0\, L_{II}^{-1}\,, \qquad R^{(t)}_{SS} = L_{SS}\, R^{(t)}_0\, L_{SS}^{-1}\,, \qquad R^{(t)}_{SI} = L_{SI}\, R^{(t)}_0\, L_{SI}^{-1}\,,$$
where in each case $O^{-1}$ denotes the Moore-Penrose pseudo-inverse of $O$. We interpret these matrices as the transition operators for probability mass that has been conditioned to end up at a certain fixed point. For example, $P^{(t)}_S$ is the transition operator for a single copy conditioned on eventually ending up at the $S_n$ fixed point. Even though the walk is generally biased toward $I$, it will be biased toward $S$ when conditioned on ending at the $S_n$ fixed point. The following lemma asserts that these are indeed stochastic matrices. All lemmas stated here are proved in Appendix B.8.

Lemma 3. The matrices $P^{(t)}_I$, $P^{(t)}_S$, $R^{(t)}_{II}$, $R^{(t)}_{SS}$, and $R^{(t)}_{SI}$, restricted to their support, are stochastic matrices.
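Lemma 3 has the flavor of a Doob $h$-transform from the theory of Markov chains. The sketch below (a toy single-copy chain with illustrative parameters, not the paper's full construction) checks that conjugating a biased-walk transition matrix by the diagonal matrix of absorption probabilities, as in $P_S = L_S P L_S^{-1}$, yields a stochastic matrix on its support; the closed-form absorption probability used here matches the S-destined weights implied by the Figure 6 ratio.

```python
import numpy as np

def conditioned_kernel(n=8, q=2):
    """Toy single-copy chain on Hamming weights 0..n (illustrative, not
    the paper's full dynamics): the walk moves toward I (down) with
    probability q^2/(q^2+1) and toward S (up) with probability
    1/(q^2+1); weights 0 and n are absorbing fixed points."""
    P = np.zeros((n + 1, n + 1))       # column-stochastic convention
    P[0, 0] = P[n, n] = 1.0
    up, down = 1 / (q ** 2 + 1), q ** 2 / (q ** 2 + 1)
    for x in range(1, n):
        P[x - 1, x] = down
        P[x + 1, x] = up
    # h(x): probability of absorbing at n (the S-destined fraction);
    # closed form for a biased gambler's-ruin walk
    h = np.array([(q ** (2 * x) - 1) / (q ** (2 * n) - 1) for x in range(n + 1)])
    # Doob h-transform, the analogue of P_S = L_S P L_S^{-1}:
    # (PS)_{yx} = h(y) P_{yx} / h(x) on the support h > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        PS = np.where(h[None, :] > 0, h[:, None] * P / h[None, :], 0.0)
    return P, h, PS

P, h, PS = conditioned_kernel()
assert np.allclose(PS[:, h > 0].sum(axis=0), 1.0)  # stochastic on support
assert (PS >= 0).all()
```

The column sums equal one precisely because $h$ is harmonic for the walk, which mirrors why the lemma holds; the conditioned chain is up-biased even though the unconditioned chain is down-biased.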
The next lemma asserts that if the $X \otimes Y$ system undergoes dynamics under $R^{(t)}_{SI}$, then the $W$ system undergoes dynamics under $P^{(t)}_I$. This makes sense, since conditioning $X$ to go to $S_n$ and $Y$ to go to $I_n$ should be equivalent to conditioning the $W$ system to go to $I_n$.
Lemma 4. Within the accessible subspace, the following holds: $\Delta\, R^{(t)}_{SI} = P^{(t)}_I\, \Delta$.
We now introduce some more notation. For any vector $|x\rangle$ on a single copy of the vector space, let
$$|x_I\rangle = L_I\,|x\rangle\,, \qquad |x_S\rangle = L_S\,|x\rangle\,,$$
and for any vector $|v\rangle$ on two copies of the vector space, let
$$|v_{II}\rangle = L_{II}\,|v\rangle\,, \qquad |v_{SS}\rangle = L_{SS}\,|v\rangle\,, \qquad |v_{SI}\rangle = L_{SI}\,|v\rangle\,.$$
Thus, if $|x\rangle$ represents a probability distribution over the $2^n$ basis states on a single copy of the Hilbert space, then the vector $|x_I\rangle$ is the portion of $|x\rangle$ that is destined to end at the fixed point $I_n$, and $|x_S\rangle$ is the portion destined to end at $S_n$ (if all future gates are noiseless). The division of probability mass into separate I-destined and S-destined parts is depicted schematically in Figure 6. The amount of probability mass for which the noisy copy is destined for the $S_n$ fixed point cannot decay too quickly with the number of noise locations (note that if the noisy copy ends at $S_n$, the noiseless copy must also end at $S_n$). In Figure 6, this is depicted by the fact that the only way to transition from the S-destined to the I-destined division of probability mass is through the action of a noise location, which induces an $S \to I$ transition with probability $\sigma$.

Lemma 5. The S-destined probability mass obeys the following inequality, for any $t' \geq t$:

$$\langle 1,1|v^{(t')}_{SS}\rangle \;\geq\; (1-\sigma)^{2(t'-t)}\,\langle 1,1|v^{(t)}_{SS}\rangle\,. \qquad (160)$$

Proof idea. Recall that the inner product with $\langle 1,1|$ gives the sum of the entries of the vector. We interpret $|v^{(t)}_{SS}\rangle$ as the probability vector of mass destined to reach the $S_n$ fixed point on both copies. Each time a noise location acts, it can affect at most a $\sigma$ fraction of the mass, so even after the two noise locations of a gate act, at least a $(1-\sigma)^2$ fraction of the mass that was S-destined before will still be S-destined.

B.3.3 Decomposing the I-destined probability mass
The final piece of machinery we need is an accounting of which error leads to each piece of I-destined probability mass. To do this, for each $t \geq 1$ define
$$|v^{(t,t)}_{SI}\rangle = L_{SI}\,\big(I \otimes Q^{(t)}_\sigma\big)\, R^{(t)}_0\, |v^{(t-1)}_{SS}\rangle$$
and define the evolution rule
$$|v^{(t'+1,t)}_{SI}\rangle = R^{(t'+1)}_{SI}\,|v^{(t',t)}_{SI}\rangle \qquad \text{for } t' \geq t\,.$$
The vector $|v^{(t',t)}_{SI}\rangle$ represents the probability mass that would have gone to the $S_n$ fixed point, but the noise at time step $t$ caused it to be redirected to the $I_n$ fixed point, and we have subsequently evolved it forward to time step $t'$. Importantly, we can verify from the definition that
$$|v^{(t')}_{SI}\rangle = \sum_{t=1}^{t'} |v^{(t',t)}_{SI}\rangle\,,$$
indicating that all of the mass at time step $t'$ is accounted for as having originated at some previous time step $t$.
Lemma 6. For all $t$ and $t' \geq t$,
$$\langle 1,1|v^{(t',t)}_{SI}\rangle \;\leq\; \big(1 - (1-\sigma)^2\big)\,\langle 1,1|v^{(t-1)}_{SS}\rangle\,.$$

Proof idea. The vector $|v^{(t,t)}_{SI}\rangle$ represents the mass that satisfies two conditions: (1) it was destined for the $|S_n\rangle \otimes |S_n\rangle$ fixed point at time step $t-1$, and (2) the noise at time step $t$ caused it to be destined for the $|S_n\rangle \otimes |I_n\rangle$ fixed point at time step $t$. At most $\langle 1,1|v^{(t-1)}_{SS}\rangle$ mass qualifies under condition (1). Among that mass, each of the two noise locations can only impact a $\sigma$ fraction of the mass, so the fraction of mass that can be redirected is at most $1 - (1-\sigma)^2$.

B.4 Consequences of anti-concentration
In all of our rigorous proofs, we assume we have a random quantum circuit architecture that is $h$-regularly connected for some constant $h = O(1)$ and has anti-concentration size equal to $s_{AC}$. Recall that this means that $Z_0$ reaches twice its limiting value at circuit size $s_{AC}$. When this is the case, we have the following lemmas. All constants depend on $q$ and $h$, but not on $n$ or any noise parameters.
Lemma 7. Suppose the random quantum circuit architecture is regularly connected. There exist constants $\chi_1$ and $\chi_2$ such that for all $t \geq s_{AC}$,
$$\langle q,1|v^{(t)}\rangle \;\leq\; \frac{2q^n}{q^n+1} + \chi_2\, e^{-\chi_1 (t - s_{AC})/n}\,.$$

Proof idea. The left-hand side is precisely $Z_0$ for a circuit with size $t$. The regularly connected property indicates that for any configuration not at a fixed point, there will be a gate that couples an $I$ with an $S$ roughly once every $O(n)$ gates. When this happens, the difference between $Z_0$ and its infinite-size limit is reduced by a constant factor, leading to the scaling in the lemma.
Lemma 8. Suppose the random quantum circuit architecture is regularly connected. There exist constants $\chi_3$ and $\chi_4$ such that, for all $t$, the fraction $\eta_t$ of S-destined mass that has not yet reached the $S_n$ fixed point satisfies
$$\eta_t \;\leq\; \chi_4\, e^{-\chi_3 (t - s_{AC})/n}\,.$$

Proof idea. Anti-concentration happens because most of the probability mass makes it to one of the fixed points. This lemma states that after the anti-concentration size, most of the mass destined for the $S_n$ fixed point has already reached it. The fraction that has not yet reached it is $\eta_t$, which decays exponentially with $t/n$. We show that if this were not the case, then the bound in Lemma 7 could not hold.
Lemma 9. Suppose the random quantum circuit architecture is regularly connected. There exist constants $\chi_5$ and $\chi_6$ such that for any non-negative vector $|v\rangle$ that is normalized (i.e. $\langle 1,1|v\rangle = 1$), the following holds for any $t_0$ and any $t_1 \geq t_0$, even if the product of transition operators is interleaved with noise operators $Q^{(t)}_\sigma$:
$$\big(\langle q| - \langle 1|\big)\,\Delta\; R^{(t_1)}_{SI} \cdots R^{(t_0+1)}_{SI}\, |v\rangle \;\leq\; \chi_6\, q^n\, e^{-\chi_5 (t_1 - t_0)/n}\,.$$

Proof idea. Recall from Lemma 4 that if $|v\rangle$ evolves by $R^{(t)}_{SI}$, then $\Delta|v\rangle$ evolves by $P^{(t)}_I$. The matrix $P^{(t)}_I$ conditions on sending the vector to the $I_n$ fixed point, so it is even more I-biased than the transition matrix $P^{(t)}$. Thus, each time a bit is flipped, the Hamming weight is likely to decrease, and the inner product with $\langle q| - \langle 1|$ will be reduced by a constant factor. This will (usually) happen once every $O(n)$ gates if the architecture is regularly connected. The insertion of the $Q^{(t)}_\sigma$ operators will only make the Hamming weight smaller, since they can only flip $S \to I$.

B.5 Exponential clustering of S-destined probability mass
A key step in our analysis is that the S-destined mass stays close to the $S_n$ fixed point, as long as $\sigma = O(1/n)$. In fact, the probability of deviating from the fixed point by $x$ bit flips decays exponentially in $x$. Intuitively, this is because the S-destined mass is biased to move upward in Hamming weight, and when $\sigma$ is small enough, this upward pressure will be greater than the downward pressure coming from the noise itself.
We prove this for the $W$ system, which captures the difference between the (noiseless) $X$ and (noisy) $Y$ systems. We cannot directly analyze the $Y$ system because at time step 0 the statement is definitively not true: it takes $s_{AC}$ gates for the S-destined mass in the $Y$ system to initially converge. Meanwhile, the $W$ system begins at the $S_n$ fixed point. This is the main reason we introduced the $W$ system in the first place.
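The clustering intuition can be illustrated with a toy birth-death chain (illustrative parameters of our choosing, not the actual $W$-system dynamics): an upward-biased walk plus a weak downward "noise" kick of strength $\sigma = O(1/n)$ has a stationary distribution that decays exponentially away from the top weight.

```python
import numpy as np

# Toy birth-death sketch of S-destined mass in the W system: Hamming
# weight x in {0,...,n}, an up-biased walk (prob u = q^2/(q^2+1),
# mimicking the S-conditioned bias), and a noise step that kicks the
# weight down with probability sigma.
n, q = 20, 2
sigma = 0.5 / n                       # sigma = O(1/n)
u = q ** 2 / (q ** 2 + 1)

def step_matrix(n, u, sigma):
    W = np.zeros((n + 1, n + 1))      # column-stochastic walk step
    for x in range(n + 1):
        W[min(x + 1, n), x] += u
        W[max(x - 1, 0), x] += 1 - u
    Nz = np.eye(n + 1)                # noise step: down w.p. sigma
    for x in range(1, n + 1):
        Nz[x, x] -= sigma
        Nz[x - 1, x] += sigma
    return W @ Nz

M = step_matrix(n, u, sigma)
pi = np.full(n + 1, 1.0 / (n + 1))
for _ in range(5000):                 # iterate to (near) stationarity
    pi = M @ pi
# the mass clusters exponentially near the fixed point x = n
for w in range(1, 6):
    assert pi[n - w] <= pi[n] * 0.6 ** w
```

The per-step downward-over-upward ratio is roughly $1/q^2$ plus an $O(\sigma)$ correction, which is why the exponential clustering survives as long as the noise rate is small compared to the bias.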
Define the projector $\Pi_w$ onto the set of configurations with Hamming weight $w$.

Lemma 10. There exist constants $\chi_7$, $\chi_8$, $\chi_9$, and $n_0$ such that, as long as $\sigma \leq \chi_7/n$ and $n \geq n_0$, the following holds for any $t$ and any integer $w$ with $1 \leq w < n$: the S-destined mass for which the $W$ system lies at Hamming weight $w$ is at most $\chi_9\, n\sigma\, e^{-\chi_8 (n-w)}$.
Proof idea. The S-destined portion of the mass within the $W$ system starts at the $S_n$ fixed point. When noise acts at time step $t$, some of the mass moves to Hamming weight $n-1$ but continues to be S-destined, and some of it is "redirected" to become I-destined, which is captured in the $|v^{(t',t)}_{SI}\rangle$ vectors. The total amount of redirected mass cannot be too large, as we see in Lemma 6. Moreover, the redirected mass must steadily move downward in Hamming weight (after all, it is I-destined), which we quantify with Lemma 9. This is important because for each value of the Hamming weight $w$, the amount of S-destined mass divided by the amount of I-destined mass at that Hamming weight is precisely $\frac{q^{-2n+2w} - q^{-2n}}{1 - q^{-2n+2w}} \approx q^{-2(n-w)}$, so as the I-destined mass moves down in Hamming weight, the S-destined mass that corresponds to it decreases exponentially. After accounting for each bit of I-destined mass by summing over all $|v^{(t',t)}_{SI}\rangle$, we can prove the lemma.

B.6 Relating Z σ to the amount of S-destined probability mass
The following lemma states that keeping track of the amount of S-destined mass is sufficient to get good upper and lower bounds on the quantity Z σ .
Proof idea. For each $w$, we know the ratio of the I-destined and S-destined mass at Hamming weight $w$: for each portion of S-destined probability mass, there is roughly $q^{2(n-w)}$ as much I-destined probability mass. This decreases with $w$ like $q^{-2w}$. The contribution of mass at Hamming weight $w$ to $Z_\sigma$ increases with $w$, but at the slower rate of $q^w$. Thus, for a fixed amount of S-destined mass, $Z_\sigma$ is minimized when all of it is at the $S_n$ fixed point, leading to our lower bound. On the other hand, we know that the S-destined mass is exponentially clustered near the $S_n$ fixed point (Lemma 10), so this lower bound cannot be too loose, which we leverage into an upper bound.
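A small numeric check of this trade-off (our illustration, using the I/S ratio from the Figure 6 caption): the contribution to $Z_\sigma$ per unit of S-destined mass at weight $w$ behaves like $q^w \cdot q^{2(n-w)} = q^{2n-w}$, which is minimized at $w = n$.

```python
# Per unit of S-destined mass at Hamming weight w there is
# (1 - q^{-2n+2w})/(q^{-2n+2w} - q^{-2n}) as much I-destined mass
# (the Figure 6 ratio), and weight-w mass contributes ~ q^w per unit
# of total mass to Z_sigma.
def z_per_unit_s(n, q, w):
    ratio_i_over_s = (1 - q ** (-2.0 * n + 2 * w)) / (
        q ** (-2.0 * n + 2 * w) - q ** (-2.0 * n)
    )
    return q ** w * (1 + ratio_i_over_s)

n, q = 10, 2
vals = [z_per_unit_s(n, q, w) for w in range(1, n + 1)]
# contribution per unit of S-destined mass ~ q^{2n-w}: strictly
# decreasing in w, so for a fixed amount of S-destined mass Z_sigma is
# smallest when it all sits at the fixed point w = n
assert all(a > b for a, b in zip(vals, vals[1:]))
assert abs(vals[-1] - q ** n) < 1e-9   # exactly q^n at the fixed point
```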

B.7 Bounding the S-destined mass
Now, all that remains is to compute the amount of S-destined mass. Here we show upper and lower bounds on this quantity for layered architectures and for the complete-graph architecture.
Lemma 12. Suppose the random quantum circuit architecture is regularly connected and layered, and let $d_{AC}$ be its anti-concentration depth. Then, for any depth $d$, the quantity $\langle 1,1|v^{(s)}_{SS}\rangle$ admits a lower bound given by recursively applying a per-layer retention factor $E_n$ (defined in the proof). Moreover, there exist constants $a_0$, $a_1$, $a_2$, $a_3$, and $n_0$ such that, as long as $\sigma \leq a_3/n$ and $n \geq n_0$, a matching upper bound holds with a correction factor of the same form as in Eq. (179) below, with constants $a_0$, $a_1$, $a_2$ in place of $b_0$, $b_1$, $b_2$.
Lemma 13. Suppose the random quantum circuit architecture is the complete-graph architecture, and let $s_{AC}$ be its anti-concentration size. Then, for any $s$,
$$\langle 1,1|v^{(s)}_{SS}\rangle \;\geq\; \frac{J_n^{\,s}}{q^n+1}\,,$$
where $J_n$ is the per-gate retention factor defined in the proof. Moreover, there exist constants $b_0$, $b_1$, $b_2$, $b_3$, and $n_0$ such that, as long as $\sigma \leq b_3/n$ and $n \geq n_0$,
$$\langle 1,1|v^{(s)}_{SS}\rangle \;\leq\; \frac{2\,J_n^{\,s}}{q^n+1}\; e^{\,b_0\sigma^2 s + b_1\sigma s_{AC} + b_2 n\sigma\log(1/(n\sigma))}\,. \qquad (179)$$

Proof idea for Lemma 12 and Lemma 13. When a portion of S-destined mass is at the $S_n$ fixed point, and noise acts to move it to Hamming weight $n-1$, we have a good understanding of what fraction remains S-destined. Specifically, there is a $\frac{q^{-2} - q^{-2n}}{1 - q^{-2n}}$ chance that it re-equilibrates to $S_n$. We also know the chance that it will make the transition in the first place; the transition from $S \to I$ happens with probability precisely $\sigma$. This scenario gives the maximum amount of lost S-destined mass, and gives rise to our lower bound. However, if the portion of S-destined mass is not at the $S_n$ fixed point, then this is complicated in two ways. First, the probability of re-equilibrating back to $S_n$ is a slightly different expression, and, more importantly, the noise will not cause a transition as often, as there is a chance it acts on a bit that is already $I$. If the configuration has Hamming weight $w$ and the noise acts on a random bit, the chance of a transition is only $\frac{w}{n}\sigma$, so a smaller amount of S-destined mass is lost at each step. Luckily, we know that the S-destined mass is exponentially clustered near $w = n$ (Lemma 10), so the corrections are small, which gives rise to the upper bound.
We utilize the layered-architecture property to be able to say that every qudit is acted upon by noise after each layer, and thus, from the perspective of the amount of S-destined mass, all that matters is the Hamming weight of the configuration prior to the noise. The same is true for the complete-graph case because the gates are chosen randomly and each qudit is equally likely to participate. However, we do not believe this property is necessary for our result to be true.

B.8 Proofs of lemmas

B.8.1 Proof of Lemma 3

Proof. By construction, $P^{(t)}_I$ has non-negative matrix elements. The support of $P^{(t)}_I$ is the entire vector space except for the span of $|S_n\rangle$. Consider a basis state $|\vec\nu\rangle$. Since gate $t$ acts on qudits $\{i_t, j_t\}$: if $\nu_{i_t} = \nu_{j_t}$, then $|\vec\nu\rangle$ is a $+1$ eigenvector of $P^{(t)}_I$, and its column trivially sums to one. If $\nu_{i_t} \neq \nu_{j_t}$, then $P^{(t)}_I$ sends $|\vec\nu\rangle$ to a basis state with Hamming weight reduced by 1 with probability $q^2/(q^2+1)$, and to Hamming weight increased by 1 with probability $1/(q^2+1)$, so its column also sums to one. This demonstrates that $P^{(t)}_I$ is a stochastic matrix when restricted to its support.

B.8.2 Proof of Lemma 4
Proof. We consider the action of both sides of the equation on an input state $|\vec\nu, \vec\mu\rangle$. Let $a$ and $b$ be the number of 1 entries in $\vec\nu$ and $\vec\mu$, respectively, excluding the positions $\{i_t, j_t\}$, and let $c$ be the number of entries on which $\vec\nu$ and $\vec\mu$ agree; we restrict to the accessible subspace. Since $\Delta$ is a tensor product across all bits $i \in \{0, \ldots, n-1\}$, and both $P^{(t)}_I$ and $R^{(t)}_{SI}$ modify only bits $i_t$ and $j_t$, it is sufficient to consider the transitions among just bits $i_t$ and $j_t$. Ordering the four relevant bits as $X_{i_t} X_{j_t}, Y_{i_t} Y_{j_t}$, one tracks the effect of each side, where on the right-hand side the first arrow is application of $\Delta$ and the second is application of $P^{(t)}_I$; comparing the resulting transitions case by case verifies that the left-hand and right-hand sides are equal.

B.8.4 Proof of Lemma 6
Proof. Recall that $L_{SI} = I \otimes L_I - L_I \otimes I$ (an identity valid on the accessible subspace); the second term commutes with $I \otimes Q_\sigma$, so we may ignore it in the following calculation.
If $\vec\mu = \vec\nu$, the factor gives 0. For each $\vec\nu$ there are at most three possible $\vec\mu \neq \vec\nu$ for which the matrix element $\langle\vec\mu|Q_\sigma|\vec\nu\rangle \neq 0$, corresponding to a single error on either qudit or an error on both at once. In those cases, the matrix element is $\sigma(1-\sigma)$ (for a single error) or $\sigma^2$ (for a double error). The double error is only possible if $|\vec\nu| \geq 2$, but note that we may assume $|\vec\nu| \neq 1$, since action by $R^{(t)}_0$ leaves the two bits it acts on equal and cannot lead to a configuration with Hamming weight 1. Combining these observations bounds $\langle 1,1|v^{(t,t)}_{SI}\rangle$ by $\big(1-(1-\sigma)^2\big)\,\langle 1,1|v^{(t-1)}_{SS}\rangle$, using in the last step that $R_{SS}$ is stochastic. The fact that this also holds for $|v^{(t',t)}_{SI}\rangle$ with $t' > t$ follows from the fact that $|v^{(t',t)}_{SI}\rangle$ is related to $|v^{(t,t)}_{SI}\rangle$ by a sequence of stochastic matrices, which preserves the left-hand side of the lemma statement.

B.8.5 Proof of Lemma 7
Proof. This proof is similar to the proof of the general upper bound on the collision probability in Ref. [8]. Define $Z^{(t')} = \langle q,1|v^{(t')}\rangle$. If the anti-concentration size is $s_{AC}$, this means that
$$Z^{(s_{AC})} = 2\, q^n Z_H = \frac{4q^n}{q^n+1}\,,$$
where $Z_H = 2/(q^n+1)$ is the limiting value of the collision probability studied in Ref. [8]. Note that $Z^{(t')}$ is monotonically non-increasing with $t'$ (i.e., the collision probability only decreases as more gates are applied). Recall that for architectures where the circuit diagram is random, $|v^{(t')}\rangle$ represents an average over the choice of circuit diagram. The $h$-regularly connected property says that, no matter what the circuit diagram has looked like up to time step $t'$, given any partition of the qudits into two parts, there is at least a $1/2$ probability that the next $hn$ gates in the circuit diagram will include at least one gate that couples qudits from opposite parts. Conditioned on coupling the two parts, the portion of the collision probability associated with configurations not already at a fixed point will decrease by a factor $2q/(q^2+1)$, as was seen in the general upper bound on the collision probability in Ref. [8]. Thus for all $t'$,
$$Z^{(t'+hn)} - \frac{2q^n}{q^n+1} \;\leq\; \frac{(q+1)^2}{2(q^2+1)}\left(Z^{(t')} - \frac{2q^n}{q^n+1}\right).$$
Applying the above recursively, we have
$$Z^{(s_{AC}+zhn)} - \frac{2q^n}{q^n+1} \;\leq\; \left(\frac{(q+1)^2}{2(q^2+1)}\right)^{z} \frac{2q^n}{q^n+1} \;\leq\; 2\left(\frac{(q+1)^2}{2(q^2+1)}\right)^{z}.$$
Now we ensure something similar holds for every value of $t$ and not just $t = s_{AC} + zhn$ for integers $z$. Let $t_0 = s_{AC} + z_0 hn$, with $z_0$ an integer, be the maximum such value for which $t_0 \leq t$. Then $t - t_0 \leq hn$ and $z_0 \geq (t - s_{AC})/(hn) - 1$. Moreover, by monotonicity, we have $Z^{(t)} \leq Z^{(t_0)}$. Together, this implies
$$Z^{(t)} \;\leq\; \frac{2q^n}{q^n+1} + 2\left(\frac{(q+1)^2}{2(q^2+1)}\right)^{z_0} \;\leq\; \frac{2q^n}{q^n+1} + \chi_2\, e^{-\chi_1 (t - s_{AC})/n}\,,$$
where $\chi_2 = 4(q^2+1)/(q+1)^2$ and $\chi_1 = \frac{1}{h}\log\big(2(q^2+1)/(q+1)^2\big)$.
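The final constant-chasing step can be verified numerically. The check below (ours) confirms that with $\chi_2 = 4(q^2+1)/(q+1)^2$ and $\chi_1 = \frac{1}{h}\log\big(2(q^2+1)/(q+1)^2\big)$, the quantity $2r^{z_0}$ with $r = (q+1)^2/(2(q^2+1))$ is indeed dominated by $\chi_2\, e^{-\chi_1 (t-s_{AC})/n}$ whenever $z_0 \geq (t-s_{AC})/(hn) - 1$.

```python
import math

# With r = (q+1)^2 / (2(q^2+1)) < 1, chi_2 = 4(q^2+1)/(q+1)^2 = 2/r,
# and chi_1 = log(1/r)/h, the inequality
#     2 r^{z0} <= chi_2 * exp(-chi_1 (t - s_AC)/n)
# follows from z0 >= (t - s_AC)/(hn) - 1. We spot-check it on a grid.
for q in (2, 3, 5):
    for h in (1, 2):
        r = (q + 1) ** 2 / (2 * (q ** 2 + 1))
        chi1 = math.log(1 / r) / h
        chi2 = 4 * (q ** 2 + 1) / (q + 1) ** 2
        assert r < 1 and abs(chi2 - 2 / r) < 1e-12
        for n in (4, 16, 64):
            for excess in (0, 1, 7, 100):          # excess = t - s_AC
                z0 = math.floor(excess / (h * n))  # z0 >= excess/(hn) - 1
                lhs = 2 * r ** z0
                rhs = chi2 * math.exp(-chi1 * excess / n)
                assert lhs <= rhs + 1e-12
```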
B.8.6 Proof of Lemma 8

Proof. The amount of S-destined mass that has not yet reached the $S_n$ fixed point can be bounded using $\langle q,1|v^{(t)}\rangle$, together with the fact that the total amount of S-destined mass for the noiseless copy is exactly $1/(q^n+1)$. From Lemma 7, we have an upper bound on $\langle q,1|v^{(t)}\rangle - 2q^n/(q^n+1)$. Combining the above yields the bound on $\eta_t$ claimed in the lemma, via an inequality that is true for all $n \geq 1$ and $q \geq 2$. We choose $\chi_4 = 6\chi_2$ and $\chi_3 = \chi_1$, and the lemma is proved.

B.8.7 Proof of Lemma 9
Proof. The gate at time step $t$ acts on bits $i_t$ and $j_t$. Suppose for some configuration $\vec\nu$ these bits disagree, i.e. $\nu_{i_t} \neq \nu_{j_t}$. Consider a state $|\vec\eta, \vec\eta'\rangle$ for which $\Delta|\vec\eta, \vec\eta'\rangle = |\vec\nu\rangle$, and consider the quantity $(\langle q| - \langle 1|)\,P^{(t)}_I\,|\vec\nu\rangle$. The action of $P^{(t)}_I$ on $|\vec\nu\rangle$ forces a bit flip, so there are only two possible $\vec\mu$ that lead to a nonzero contribution, one for which $|\vec\mu| = |\vec\nu| + 1$ and one for which $|\vec\mu| = |\vec\nu| - 1$. The matrix element (probability) of the former is $1/(q^2+1)$ and the matrix element for the latter is $q^2/(q^2+1)$. Thus, we have
$$(\langle q| - \langle 1|)\,P^{(t)}_I\,|\vec\nu\rangle \;\leq\; \frac{2q}{q^2+1}\,(\langle q| - \langle 1|)\,|\vec\nu\rangle\,.$$
The above is true for all such $\vec\nu$, and demonstrates that each time disagreeing bits are coupled, the total contribution under the inner product with $(\langle q| - \langle 1|)\Delta$ decreases by a constant factor. Now consider the sequence of operators $R^{(t)}_{SI}$ (with possible insertions of $Q^{(t)}_\sigma$) acting on $|\vec\eta, \vec\eta'\rangle$. Since the architecture is $h$-regularly connected, for any $t'$ there is at least a $1/2$ chance that there will be some pair $(i_{t''}, j_{t''})$ with $t' < t'' \leq t' + hn$ for which $\nu_{i_{t''}} \neq \nu_{j_{t''}}$ (assuming $\vec\nu$ is not a fixed point). The first time this happens, it will lead to a decrease of the inner product with $(\langle q| - \langle 1|)\Delta$ by the factor $2q/(q^2+1)$. The only way this would not happen is if one of the bits $\nu_{i_{t''}}$ or $\nu_{j_{t''}}$ had already been flipped by the action of one of the $Q^{(t')}_\sigma$ operators. However, since the $Q^{(t)}_\sigma$ operators act only on the noisy $Y$ copy, they can only flip a bit of $\vec\eta'$ from a 1 to a 0, which would also induce a bit flip in $\vec\nu$ from a 1 to a 0. In this case, the Hamming weight decreases by 1 and the inner product with $(\langle q| - \langle 1|)\Delta$ decreases by a factor of $\frac{q^{|\vec\nu|-1} - 1}{q^{|\vec\nu|} - 1}$, which is less than $2q/(q^2+1)$. Thus, if $z_0$ is the largest integer such that $t_0 + z_0 hn \leq t_1$, then the claimed bound follows for an appropriate choice of $\chi_5$ and $\chi_6$.

B.8.8 Proof of Lemma 10
Proof. When probability mass is redirected from S-destined at time step $t-1$ to I-destined at time step $t$, it may begin with Hamming weight as large as $n-1$. But since it is I-destined, it will quickly move down in Hamming weight. We wish to quantify this phenomenon. First of all, note that $|v^{(t',t)}_{SI}\rangle$ evolves under the $R^{(t)}_{SI}$ dynamics, so we can invoke Lemma 9.
where the second line follows because $q^n$ is the maximum entry in $\langle q|$, and the quantity $\langle 1,1|v^{(t',t)}_{SI}\rangle$ does not change as $t'$ increases (it evolves by stochastic transformations).
We now invoke Lemma 6 (in the first line) and Lemma 5 (in the second line), where an extra factor of 2 comes from the very crude bound $(q^n - 1)/(q^w - 1) \leq 2q^{n-w}$. As long as $\chi_5/n$ is greater than $2\log(1/(1-\sigma))$, the above is exponentially decaying in $t$. This will be the case whenever $\sigma \leq 1 - e^{-\chi_5/(2n)}$; there exist $n_0$ and $\chi_7$ such that the stronger condition $\sigma \leq \chi_7/n$ with $n \geq n_0$ implies it. Alternatively, we could make a simpler bound by invoking Lemma 6 and Lemma 5, but not Lemma 9.
Both Eq. (253) and Eq. (255) will be useful. Now, we connect the mass at a given Hamming weight to the redirected vectors $|v^{(t,t')}_{SI}\rangle$. This allows us to use Eq. (164) and assert a decomposition as a sum over origin time steps $t'$. Let $t_w = t - n(n-w)\log(q)/\chi_5$. For $t' > t_w$, we will bound $|v^{(t,t')}_{SI}\rangle$ with Eq. (255), and for $t' \leq t_w$, we will use Eq. (253). Let us examine these sums separately. For the $t' > t_w$ portion, we make the substitution $a = t - t_w - 1$, and we obtain a bound with a constant $\chi'_5$ slightly larger than $4\log(q)/\chi_5$ to account for dropping the ceiling in the last line. (In the third-to-last line, an extra factor of 2 comes from the bound $2\sigma/(2\sigma - \sigma^2) \leq 2$.)
For the $t' \leq t_w$ portion, we use the substitution $a = t_w - t'$ and find a similar bound with some constant $\chi'_6$. Plugging the bounds on the two parts of the sum into Eq. (261), we obtain the claimed inequality for some constants $\chi_9$ and $c'$, with $c' < 1$ whenever $\sigma \leq \chi_7/n$ and $n \geq n_0$ hold. Thus we may define $\chi_8 = (1 - c')\log(q)$ and the lemma is proved.

B.8.9 Proof of Lemma 11
Proof. Recall that $Z_\sigma = \langle 1,q|v^{(s)}\rangle$ and that $|v^{(t)}_{SS}\rangle = L_{SS}|v^{(t)}\rangle$. The matrix $L_{SS}^{-1}$ is defined to be the Moore-Penrose pseudo-inverse of $L_{SS}$; note that the null space of $L_{SS}$ is the space spanned by $|\vec\nu, I_n\rangle$ for all $\vec\nu$. Splitting $|v^{(s)}\rangle$ into its component in this null space and the rest, the lower bound follows by keeping only the contribution of the S-destined mass at the $S_n$ fixed point. Now, we will show the upper bound, using $Z_0 = \sum_{\vec\nu}\sum_{\vec\mu} q^{|\vec\mu|}\,\langle\vec\mu, \vec\nu|v^{(s)}\rangle$. We invoke Lemma 8, and then Lemma 7 to bound $Z_0 - 2q^n/(q^n+1)$, and denote $\bar\eta_s = \eta_s + \eta'_s$.
In the second-to-last line of the resulting chain of inequalities we invoke Lemma 10, which requires $\sigma \leq \chi_7/n$ and $n \geq n_0$ (leading to our requirements in this lemma that $\sigma \leq \chi_{13}/n$ and $n \geq n_0$). Now, we choose $\chi_{10}$ appropriately in terms of $\chi_9$, which yields the final bound, Eq. (306); in its second line, we invoke Lemma 5. The third-to-last line of that derivation is true for all $q \geq 2$ and $n \geq 1$, and the second-to-last line plugs in the expressions for $\eta_s$ and $\eta'_s$, chooses constants $\chi_{11}$ and $\chi_{12}$ appropriately, and asserts $(1-\sigma)^{-2s} \leq e^{4\sigma s}$, which is true whenever $\sigma \leq 0.79$, so it is certainly true under the assumption $\sigma \leq \chi_7/n$ for sufficiently large $n$.

B.8.10 Proof of Lemma 12
Proof. Recall the per-layer recursion for $\langle 1,1|v^{(t)}_{SS}\rangle$. Collecting the relevant observations, each layer retains at least a factor $E_n$ of the S-destined mass, i.e. $\langle 1,1|v^{(t)}_{SS}\rangle \geq E_n\,\langle 1,1|v^{(t-1)}_{SS}\rangle$. Hence, the lower bound in the lemma statement follows by recursively applying this conclusion for increasing $d$.

To show the upper bound, we return to Eq. (322). Note that $E_w \leq 1$. We can restate what we know and divide the mass according to whether or not the noiseless copy has reached the $S_n$ fixed point, and, if it has, which Hamming weight $w$ the noisy copy ends up at. Let $A_w$ denote the portion of mass at Hamming weight $w$ in this decomposition. Using the substitution $a = n - w$, and the fact that for any $c$ there is a constant $c'$ such that $\sum_{a \geq a_0} a e^{-ca} \leq c' e^{-ca_0}$, Lemma 10 gives
$$A_w \;\leq\; \langle 1,1|v^{(t_0)}_{SS}\rangle\; n\sigma\,\chi_9\, c'\, e^{-(\chi_8 + \log(q))\frac{n-w}{2}}\,, \qquad (334)$$
with the definitions $f = \chi_9 c' = O(1)$ and $f' = \chi_8 + \log(q) = O(1)$. Note also that by construction $\sum_{w=1}^n A_w \leq \langle 1,1|v^{(t_0)}_{SS}\rangle$, which we can insert into Eq. (327), along with the bounds on $A_w$. We also use an auxiliary inequality that can be verified by observing that the relevant quantity achieves its maximum with respect to $\sigma$ at $\sigma = 0$, where it equals 1. The quantity in parentheses in Eq. (337) is then bounded as follows: first, we bound $e^{-x\log(1-\sigma)} - 1$ by $\tau\sigma x$ for some constant $\tau$, which holds for $x$ sufficiently small, as is the case when $\sigma \leq O(1/n)$ with $n$ sufficiently large; second, we choose an appropriate constant as a bound for the sum $f\tau\sum_{a=1}^{n-1} a e^{-f'a}$. This yields a recursion relation for $\langle 1,1|v^{(t)}_{SS}\rangle$. For the first few layers, before anti-concentration has been reached and $\eta_{t_0}$ has become small, we instead use a simpler naive bound: referring back to the definition of $E_n$, choosing $\chi'_4$ slightly larger than $\chi_4$, and using Lemma 5, we set
$$d^* = d_{AC}\,\chi_3/(2\chi'_3) + f'' + \log(1/(n\sigma))/\chi'_3 \qquad (346)$$
for some constant $f''$ that is $O(1)$ whenever $n\log(1/(1-\sigma))$ is $O(1)$. Note that this also requires $n\log(1/(1-\sigma)) \leq \chi_3$; we can choose the constant $a_3$ such that the condition $\sigma \leq a_3/n$ implies these requirements hold. Note we also must accept a weaker exponential decay constant $\chi'_3$. Thus our recursion relation is
$$\langle 1,1|v^{(d')}_{SS}\rangle \;\leq\; E_n\,\langle 1,1|v^{(d'-1)}_{SS}\rangle\,\big(1 + f'' n\sigma^2 + n\sigma\, e^{-\chi'_3(d' - d^*)}\big) \qquad (348)$$
for some choice of $\chi'_3$ (the exponentially decaying sum is bounded). Now, we note from the definition of $E_n$ that as long as $\sigma \leq O(1/n)$, there is a constant $g$ (slightly larger than 1) such that $E_n \geq \exp(-gn\sigma)$, which, recalling the definition of $d^*$ in Eq. (346), implies the lemma statement for appropriate choices of $a_0$, $a_1$, and $a_2$.

B.8.11 Proof of Lemma 13
Proof. In the layered case (proof of Lemma 12), we considered the action of all $n/2$ gates in a layer at once. For the complete-graph architecture, we can treat each gate individually. Following the layered derivation to Eq. (314), for the complete graph we obtain the analogous per-gate recursion for $\langle 1,1|v^{(t)}_{SS}\rangle$. Here the $t$-th gate acts on two qudits $i_t$ and $j_t$, but in forming $|v^{(t)}\rangle$, we take the average over all possible choices of $\{i_t, j_t\}$, as the complete-graph architecture chooses the pair of qudits to act on uniformly at random. After action by $R^{(t)}_{SS}$, the values assigned at positions $i_t$ and $j_t$ must be equal. If they are assigned $S$, then errors can send the new configuration to one of four possible configurations, corresponding to errors on none, one, or both qudits. If they are assigned $I$, then no errors are possible. If we assume $\nu_{i_t} = \nu_{j_t} = S$, then zero errors occur with probability $(1-\sigma)^2$, one error with probability $2\sigma(1-\sigma)$, and two errors with probability $\sigma^2$. Thus, we obtain an expression in terms of $\sigma' = \sigma(1 - q^{-2})$; define the final expression as $J_w$. The quantity $J_w$ is monotonically increasing in $w$ and satisfies $J_w \leq J_n$ for all $w$. Meanwhile, if $\nu_{i_t} = \nu_{j_t} = I$, then no S-destined mass is lost. Suppose the noisy copy starts at a configuration $|\vec\eta\rangle$. If $|\vec\eta| = w$, then let $\phi_{SS,w}$ be the probability that qudits $i_t$ and $j_t$ are both assigned $S$, $\phi_{IS,w}$ the probability that one is assigned $S$ and one is assigned $I$, and $\phi_{II,w}$ the probability that both are assigned $I$.
Note that $\phi_{SS,w} + \phi_{IS,w} + \phi_{II,w} = 1$. In the case where one is $I$ and one is $S$, the $I$ is flipped to $S$ by $P^{(t)}_S$ with probability $P_{\uparrow,w}$ and the $S$ is flipped to $I$ with probability $P_{\downarrow,w} = 1 - P_{\uparrow,w}$, where
$$P_{\uparrow,w} = \frac{1}{q^2+1}\cdot\frac{q^{-2n+2w+2} - q^{-2n}}{q^{-2n+2w} - q^{-2n}}\,, \qquad (358)$$
which increases or decreases the Hamming weight $w$ by 1. Among these quantities we have several equalities and inequalities, the last of which follows because, when $w \geq n/2$, $\phi_{IS,w} \geq \frac{n-w}{n-1}$, and when $w < n/2$, $\phi_{II,w} \geq \frac{1}{4}$. We may now define $G_w$ by the following equation, where $|\vec\eta| = w$:
$$G_w = \phi_{SS,w}\,J_w + \phi_{IS,w}\big(P_{\uparrow,w}\,J_{w+1} + P_{\downarrow,w}\big) + \phi_{II,w}\,.$$
We want to lower bound this quantity. If $n = 2$, then $G_1 = G_2 = J_2$. If $n > 2$, we have
$$G_w \;\geq\; \phi_{SS,w}\,J_w + \phi_{IS,w}\big(P_{\uparrow,w}\,J_w + P_{\downarrow,w}\big) + \phi_{II,w} \;=\; J_n + (1 - J_w)\big(\phi_{II,w} + P_{\downarrow,w}\,\phi_{IS,w}\big) - (J_n - J_w)\,. \qquad (367)$$
By inspection of the final equation, we see that $G_w \geq J_n$ for every combination $n > 2$, $w \geq 1$ (since $q \geq 2$), except when $w = n$; but for $w = n$, $G_w = J_n$ by definition, so $G_w \geq J_n$ also holds. This immediately gives us $\langle 1,1|v^{(t)}_{SS}\rangle \geq J_n\,\langle 1,1|v^{(t-1)}_{SS}\rangle$, which proves the lower bound by recursion on increasing $t$ and the fact that $\langle 1,1|v^{(0)}_{SS}\rangle = 1/(q^n+1)$. To show the upper bound, we first observe
$$G_w \;\leq\; J_n + (1 - J_n)\big(\phi_{II,w} + P_{\downarrow,w}\,\phi_{IS,w}\big)\,.$$
Moreover, there exists a constant $b$ such that $J_n \geq 1/b$ as long as $n \geq 2$ and $\sigma \leq 0.5$, and thus
$$G_w \;\leq\; J_n\left(1 + 2b\sigma\,\frac{n-w}{n}\right).$$
Similar to the proof of Lemma 12, we can split the initial weight into parts for which the noiseless copy has reached the S n fixed point, and a part that has not.

with $A_{\text{not}}$ denoting the contribution from configurations for which the noiseless copy has not reached the $S_n$ fixed point.

[…] very small error tolerance [20-23, 43, 47], but these results are multiple steps away from proving Conjecture 1 because they concern computing probabilities (strong simulation) as opposed to sampling (weak simulation), and furthermore they cannot tolerate errors of size $O(1)$ in total variation distance. However, another issue with applying the conjecture in practice is that actual devices are unlikely to be able to sample from a distribution with such small total variation distance from the ideal, as doing so requires error rates to be exceedingly small. Sampling from a distribution $p_{\text{noisy}}$ that is close in total variation distance to $p_{\text{wn}}$ (for some non-negligible choice of $F$) is potentially much more tractable in the near term; indeed, the experiments from Refs. [4-6] claim to have performed this task (although note that their random circuits were not Haar random, but rather chosen from some other discrete random ensemble). We refer to this task as white-noise RCS.
Conjecture 2 (White-noise RCS is PH-hard). There exists a choice of $\epsilon = O(1)$ and $\delta \geq 1/\text{poly}(n)$ such that, whenever the fidelity $F$ satisfies $F \geq 1/\text{poly}(n)$, the task of sampling from a distribution $p_{\text{noisy}}$ for which $\frac{1}{2}\|p_{\text{wn}} - p_{\text{noisy}}\|_1 \leq \epsilon F$ for at least a $1-\delta$ fraction of random quantum circuit instances is PH-hard.
Note that exact worst-case white-noise sampling is PH-hard (as long as $F$ is at least inverse polynomial). A version of this statement, which further claims that the exact worst-case white-noise task can be at most a factor of $F$ easier for classical computers than the exact worst-case noiseless task, appears in the Supplementary Material of Ref. [4]. However, allowing error of size $\epsilon F$ was not explicitly considered. Here we show that this is not an issue, and that approximate white-noise RCS and approximate RCS are essentially equivalent in this context, up to a linear factor in $F$, whenever the underlying random quantum circuits have the anti-concentration property.
Theorem 4. Consider a random quantum circuit architecture that has the anti-concentration property; that is, there is a constant $z$ such that $\mathbb{E}_U\big[\sum_x p_{\text{ideal}}(x)^2\big] \leq z q^{-n}$. Define an oracle $\mathcal{O}$ as follows. On input $(U, b)$, where $U$ is a description of an $n$-qudit circuit with $\text{poly}(n)$ gates drawn randomly from the architecture, and $b$ is a string of $\text{poly}(n)$ uniformly random bits, $\mathcal{O}$ produces an output $x$ from a distribution $p_{\text{noisy}}$ for which $\frac{1}{2}\|p_{\text{noisy}} - p_{\text{wn}}\|_1 \leq \epsilon F$ holds for a certain (known) constant $F$ on at least a $1-\delta$ fraction of random circuit instances $U$.
Then, given access to $\mathcal{O}$ and an NP oracle, there is an algorithm with runtime $F^{-1}\,\text{poly}(n)$ that produces samples from a distribution $p$ for which $\frac{1}{2}\|p - p_{\text{ideal}}\|_1 \leq \epsilon'$ on at least a $1-\delta$ fraction of circuit instances, with
$$\epsilon' = 4\epsilon + 1/\text{poly}(n)\,. \qquad (392)$$

Proof of Corollary 1. It is straightforward to show that Conjecture 2 implies Conjecture 1 simply by reduction from the white-noise RCS task to the approximate RCS task: suppose one could efficiently classically produce samples from a distribution $p_{\text{noisy}}$ for which $\frac{1}{2}\|p_{\text{noisy}} - p_{\text{ideal}}\|_1 \leq \epsilon$. Then, for any choice of $F$, one can design another algorithm that samples from a distribution $p'_{\text{noisy}}$ by producing a uniformly random output with probability $1-F$ and an output drawn from $p_{\text{noisy}}$ with probability $F$. Then we have $\frac{1}{2}\|p'_{\text{noisy}} - p_{\text{wn}}\|_1 \leq \epsilon F$. Thus, whenever approximate RCS can be performed efficiently, white-noise RCS can also be performed efficiently with the same $(\epsilon, \delta)$ parameters, and if the latter is PH-hard then the former is also PH-hard. The fact that Conjecture 1 implies Conjecture 2 is a direct implication of Theorem 4: given a target $(\epsilon', \delta')$ pair for which approximate RCS is hard, we can choose $\epsilon = O(1)$ and $\delta \geq 1/\text{poly}(n)$ such that if a white-noise sampler exists with those parameters, there is also an approximate sampler with parameters $(\epsilon', \delta')$ that runs in $\text{poly}(n)$ time and requires access to an NP oracle. However, since NP lies within the PH, this would still imply a collapse of the PH to one of its levels.
The part of the proof of Corollary 1 that shows Conjecture 2 implies Conjecture 1 also illustrates why a linear factor of $F$ is optimal. To simulate a white-noise output, one need only produce an output from $p_{\text{ideal}}$ an $F$ fraction of the time, so producing $T$ samples requires only $FT$ queries to a sampler for $p_{\text{ideal}}$. If sampling from $p_{\text{ideal}}$ is a hard classical task, sampling from $p_{\text{wn}}$ is thus at least a factor of $F$ easier. Theorem 4 shows that, in a sense, it is also at most a factor of $F$ easier.
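The reduction in this direction is simple enough to sketch in code (a toy illustration with a made-up "ideal" distribution; the names are ours, not the paper's): a sampler for $p_{\text{wn}} = F\,p_{\text{ideal}} + (1-F)\,p_{\text{unif}}$ needs a costly ideal sample only an $F$ fraction of the time.

```python
import collections
import random

def white_noise_sampler(sample_ideal, n_qubits, F, rng):
    """Produce one sample from F * p_ideal + (1 - F) * p_unif,
    querying the ideal sampler only with probability F."""
    if rng.random() < F:
        return sample_ideal()            # costly query to p_ideal
    return rng.randrange(2 ** n_qubits)  # free white-noise output

rng = random.Random(0)
# toy "ideal" distribution on 3 bits: outcome 5 boosted, else uniform
sample_ideal = lambda: 5 if rng.random() < 0.5 else rng.randrange(8)
F, N = 0.25, 200_000
counts = collections.Counter(
    white_noise_sampler(sample_ideal, 3, F, rng) for _ in range(N)
)
# p_ideal(5) = 0.5 + 0.5/8 = 0.5625, so p_wn(5) = F*0.5625 + (1-F)/8
expected = F * 0.5625 + (1 - F) / 8
assert abs(counts[5] / N - expected) < 0.01
```

By construction the output distribution is exactly $F\,p_{\text{ideal}} + (1-F)\,p_{\text{unif}}$, so producing $T$ white-noise samples costs only about $FT$ ideal queries, which is the source of the "factor of $F$ easier" direction.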
This observation essentially puts the low-fidelity and high-fidelity regimes on the same theoretical footing when it comes to hardness of sampling, as long as the fidelity is at least inverse polynomial in $n$. One might object that $F \geq 1/\text{poly}(n)$ is unrealistic in an asymptotic sense, and in many cases this may be true. However, one way to achieve $F \geq 1/\text{poly}(n)$ is to run circuits with Pauli error rate $\epsilon = \Theta(1/n)$ and circuit size $s = \Theta(n\log(n))$, which, conveniently, is precisely the size required to achieve the anti-concentration property, as shown in Ref. [8]. Moreover, when the fidelity is inverse exponential in $n$ (but larger than $2^{-n}$), there is still a sense in which the low-fidelity regime can be at most a factor of $F$ easier for a classical computer than the high-fidelity regime.
Proof of Theorem 4. The idea behind our reduction is to combine approximate rejection sampling with the ability to efficiently estimate $p_{\text{noisy}}(x)$ up to $1/\text{poly}(n)$ relative error for any fixed instance $U$ using an NP oracle (Stockmeyer's approximate counting algorithm [48]). To be precise, for any $\nu$, any $\mu$, and any $x$, there is a randomized algorithm (with access to an NP oracle) that produces a number, denoted $\tilde{p}_{\text{noisy}}(x)$, such that
$$\big|\tilde{p}_{\text{noisy}}(x) - p_{\text{noisy}}(x)\big| \;\leq\; \nu\, p_{\text{noisy}}(x)$$
with probability at least $1-\mu$, and the algorithm runs in time $\nu^{-1}\,\text{poly}(n, \log(1/\mu))$. For the linear dependence on $\nu^{-1}$, see the Supplementary Material of Ref. [4] or the lecture notes in Ref. [49]. For a fixed $\nu$ and $\mu$, we may take $\mu' = q^{-n}\mu$ and note that $\log(1/\mu') = \text{poly}(n) + \log(1/\mu)$. Now fix a set of random bits $\omega$ to feed into the randomized algorithm above. If we feed the same bits $\omega$ for every choice of $x$ with parameters $\nu$ and $\mu'$, then we have a fixed set of outputs $\tilde{p}_{\text{noisy}}(x)$ for each possible $x$, and by the union bound, these values satisfy the relative-error guarantee for every $x$ simultaneously with probability at least $1-\mu$ over the choice of $\omega$. On any particular $x$, the algorithm still runs in time $\nu^{-1}\,\text{poly}(n, \log(1/\mu))$. When this is the case, […] Also, let […]