1 Introduction

A promising application of future quantum computers is to simulate properties of physical systems [3, 12, 16, 20, 29, 33, 39, 59]. As a fundamental quantum algorithmic subroutine, Hamiltonian simulation seeks to efficiently approximate the time evolution operator \( \textrm{e}^{\textrm{i}\varvec{H}t} \) using elementary building blocks, such as a universal gate set or whatever operations are experimentally available. Despite the simplicity of the problem statement, developing quantum algorithms that minimize the required resources (e.g., the gate complexity) has drawn tremendous effort [7, 9, 18, 19, 35], especially given the currently limited experimental capability of quantum simulators.

The main Hamiltonian simulation method we study is product formulas, or Trotterization. This old idea simply approximates the exponential of a sum by a product of individual exponentials

$$\begin{aligned} \textrm{e}^{\textrm{i}(\varvec{H}_1+\varvec{H}_2 )t }= \textrm{e}^{\textrm{i}\varvec{H}_1t } \textrm{e}^{\textrm{i}\varvec{H}_2 t } +\mathcal {O}(t^2). \end{aligned}$$
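The \(\mathcal {O}(t^2)\) defect above is easy to probe numerically. A minimal sketch (with hypothetical random \(4\times 4\) Hermitian terms standing in for \(\varvec{H}_1, \varvec{H}_2\)), checking that halving t quarters the first-order error:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(d):
    # A hypothetical dense Hermitian "Hamiltonian term" for illustration.
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

H1, H2 = rand_herm(4), rand_herm(4)

def U(h, t):
    # exp(i h t) via eigendecomposition of the Hermitian generator h.
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w * t)) @ v.conj().T

def trotter_error(t):
    # Spectral-norm distance between exact evolution and the product formula.
    return np.linalg.norm(U(H1 + H2, t) - U(H1, t) @ U(H2, t), 2)

# An O(t^2) defect means the error drops by ~4 when t is halved.
ratio = trotter_error(1e-3) / trotter_error(5e-4)
```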

Constructions such as the Lie-Trotter-Suzuki formulas [33, 53] generalize to Hamiltonians with many terms and to higher-order approximations \(\mathcal {O}( t^{\ell +1} )\). However, the Trotter error, hidden in the \(\mathcal {O}(t^2)\) term, had been challenging to analyze, and for a while product formulas were in the shadow of more advanced quantum algorithms based on quantum walks and quantum signal processing [34, 35].

Nevertheless, product formulas have recently resurfaced as a strong candidate for Hamiltonian simulation for experimental, numerical, and theoretical reasons. In the near-term or early-fault-tolerant regime, with severe restrictions on the number of qubits, the depth, and the connectivity, their simple prescription without controlled ancillas appears attractive. Despite this simplicity, numerical case studies [16] suggest product formulas may outperform more advanced methods. These reasons further fueled theoretical analysis of Trotter errors, where ever-sharper theoretical guarantees continue to reduce the cost by exploiting the structure of the problem, such as initial state knowledge [48, 56] and spatial locality of the model [22].

In particular, the seminal work [18] puts together an analytic framework that exploits commutation relations. Consider the general class of k-local Hamiltonians \(\varvec{H}= \sum _{\gamma =1}^{\Gamma } \varvec{H}_{\gamma }\) (i.e., a sum over few-body Pauli strings \(\varvec{\sigma }^x_1, \varvec{\sigma }^y_1 \varvec{\sigma }^y_2,\ldots \)). It was shown that using higher-order Suzuki formulas, the gate complexity

$$\begin{aligned} G\approx \Gamma \left\| {\varvec{H}} \right\| _{(local),1} t \quad \text { where}\quad \left\| {\varvec{H}} \right\| _{(local),1}&:= \max _{\text {site } i} \sum _{\gamma : i \in \gamma }\Vert {\varvec{H}_\gamma } \Vert \end{aligned}$$
(1.1)

suffices to approximate the unitary evolution for any input state. The bound depends on the number of terms \(\Gamma \) in the Hamiltonian and a local energy estimate \(\left\| {\varvec{H}} \right\| _{(local),1}\) (Fig. 2). This local quantity sums over terms \(\varvec{H}_\gamma \) overlapping with a site i and takes the maximum over sites; it tends to be much smaller than the global sum \(\sum _{\gamma } \Vert {\varvec{H}_{\gamma }} \Vert \). This theoretical guarantee renders product formulas among the strongest candidates for simulating physical systems (Table 1).
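The local 1-norm is straightforward to compute from a term list. A minimal sketch, assuming a hypothetical list of term supports and strengths \(\Vert \varvec{H}_\gamma \Vert \):

```python
# Hypothetical term list for illustration: (support sites, ||H_gamma||).
terms = [({0, 1}, 0.5), ({1, 2}, 1.0), ({0, 2}, 0.25), ({2, 3}, 0.75)]

def local_1_norm(terms):
    # max over sites i of the summed strengths of terms acting on i.
    sites = set().union(*(support for support, _ in terms))
    return max(sum(w for support, w in terms if i in support) for i in sites)

global_1_norm = sum(w for _, w in terms)  # the (often much larger) global sum
```

Here the local quantity is 2.0 (attained at site 2), while the global sum is 2.5; for models with bounded connectivity, this gap grows with the system size.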

In light of these developments, we may ask: what remains to be known about Trotter error? In some other contexts, folklore [59] suggests that errors in quantum computing might, in practice, add up incoherently, which can be significantly smaller than coherent noise [8, 23, 54]. Intuitively, different scalings occur depending on whether the noises are “pointing in the same direction”. For a minimal example, consider a sum over m numbers taking values \(\pm 1\). In the worst possible scenario, they could all share the same sign and add up coherently. However, if the numbers have random signs independent of each other, the total strength is usually much smaller

$$\begin{aligned} 1+1+\cdots 1&= m \quad \text {(coherent error)} \\ 1-1+\cdots -1&\sim 0 \pm \mathcal {O}(\sqrt{m}) \quad \text {(incoherent error)}. \end{aligned}$$
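A quick numerical sketch of this scaling gap:

```python
import random

random.seed(1)
m = 10_000

coherent = sum(1 for _ in range(m))                          # all signs aligned
incoherent = sum(random.choice((-1, 1)) for _ in range(m))   # random signs

# The coherent sum is exactly m, while the incoherent sum
# typically fluctuates only at the O(sqrt(m)) scale.
```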

Curiously, the existing gate complexity, as a manifestation of the Trotter error, exhibits the coherent scaling where terms are added up linearly (1.1). Could Trotter error and the gate complexity, in practice, enjoy the much milder incoherent scaling?

Fig. 1

Concentration of the gate complexity distribution for product formulas, for states drawn from any fixed orthogonal basis. The vast majority of states are controlled by our typical-case results (Theorem I.1), while extremal states may require the worst-case guarantees (1.1) [18]. The two gate complexities coexist and differ because the Trotter error is a high-dimensional object

This work presents the incoherent aspects of Trotter error that exhibit qualitatively different scaling from the state-of-the-art estimates [18]. Pictorially, the Trotter error is a high-dimensional object that cannot be summarized in a single bound. Instead, there is a distribution of Trotter error over input states (Fig. 1). The existing estimate (1.1) accounts for the worst-case inputs that may not be practically relevant; the vast majority of inputs enjoy a much better scaling. More precisely, we show that, with high probability, the gate complexity exhibits a root-mean-square, or 2-norm scaling for inputs drawn from any orthogonal basis

$$\begin{aligned} G\approx \Gamma \left\| {\varvec{H}} \right\| _{(local),2} t \quad \text { where}\quad \left\| {\varvec{H}} \right\| _{(local),2}&:= \max _{\text {site } i} \sqrt{\sum _{S: i \subset S } b^2_S} \quad \text {and}\quad b_{S}:=\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert . \end{aligned}$$
(1.2)

The local quantity \(\left\| {\varvec{H}} \right\| _{(local),2}\) is now a sum-of-squares over sets S overlapping with a site i (the scalar \(b_S\) sums over all terms \(\varvec{H}_{\gamma }\) with support being the set S). Our estimate yields substantial improvements over (1.1) when the Hamiltonian has large connectivity (such as with long-range interactions, see Table 1), which directly leads to resource reduction for various quantum simulation tasks. Further, motivated by quantum chaos and the SYK models [38, 47], we show that when the Hamiltonian itself has random coefficients, even the worst input states enjoy a 2-norm scaling for Trotter error.
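The local 2-norm (1.2) can likewise be computed from a term list by first aggregating strengths into \(b_S\) by support. A minimal sketch, with hypothetical inputs:

```python
from collections import defaultdict

# Hypothetical term list: (support sites, ||H_gamma||). Two terms share the
# support {0, 1}, so their strengths add into a single b_S.
terms = [({0, 1}, 0.5), ({0, 1}, 0.5), ({1, 2}, 1.0), ({2, 3}, 0.75)]

def local_2_norm(terms):
    b = defaultdict(float)
    for support, w in terms:
        b[frozenset(support)] += w      # b_S = sum_{gamma ~ S} ||H_gamma||
    sites = set().union(*b)
    # max over sites i of the root-sum-of-squares of b_S over sets S with i in S.
    return max(sum(v * v for S, v in b.items() if i in S) ** 0.5
               for i in sites)
```

For this toy input, the maximum \(\sqrt{1^2+1^2}=\sqrt{2}\) is attained at site 1, compared with a local 1-norm of 2 there.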

To reiterate, our results give evidence that, in practice, product formulas may generically work even better than expected. This improvement comes from framing a qualitatively different question than the existing worst-case results. Indeed, we provide analytic and numerical evidence that the average-case (1.2) and worst-case (1.1) estimates can be simultaneously tight in their respective contexts. More broadly, our findings open doors to the average-case study of quantum algorithms, which is relatively unexplored yet could greatly improve the feasibility of quantum computing applications.

To derive our average-case results, we combine matrix concentration inequalities (uniform smoothness and hypercontractivity) with the commutator expansion of exponential products [18]. The matrix analysis framework is simple and robust, and we expect further applications in quantum information (See, e.g., [13,14,15]).

Upon completion of this work, we became aware of the work [61], which also studies Hamiltonian simulation for random inputs, and we briefly highlight the differences. First, [61] studies only the variance of the Trotter error, while we show a stronger sense of typicality where the 2-norm scaling holds for all but exponentially rare inputs. This utilizes matrix concentration inequalities for the higher moments. Second, our gate complexity is asymptotically tighter for non-spatially-local models and is accompanied by analytic and numerical evidence for optimality. This stems from diving deeply into the combinatorics of nested commutators. Third, in addition to random inputs, we also study random Hamiltonians and show the corresponding typical-case results.

The main text is organized as follows: we summarize results for arbitrary k-local Hamiltonians in Sect. 1.1.1 and random Hamiltonians in Sect. 1.1.2. The gate complexities are compared in Table 1. We then introduce the proof ingredients in Sect. 1.2.

Fig. 2

The local energy estimates \(\Vert {\varvec{H}} \Vert _{(local),1}\) and \(\Vert {\varvec{H}} \Vert _{(local),2}\) sum over terms overlapping with a site i, then maximize over the sites. These quantities are usually much smaller than the corresponding global sums

1.1 Summary of results

In this section, we present our main results regarding the performance of product formulas. In particular, consider the first-order Lie-Trotter formula and the second-order Suzuki formula

$$\begin{aligned} \varvec{S}_1(\tau ) := \prod ^\Gamma _{\gamma =1}\exp (\textrm{i}\tau \varvec{H}_\gamma )\quad \text {and}\quad \varvec{S}_2(\tau ) := \prod ^1_{\gamma =\Gamma }\exp (\textrm{i}(\tau /2) \varvec{H}_\gamma )\cdot \prod ^\Gamma _{\gamma =1}\exp (\textrm{i}(\tau /2) \varvec{H}_\gamma ), \end{aligned}$$

and the higher-order (\(\ell = 4, 6,\dots , 2p,\dots \)) Suzuki [53] formulas constructed recursively

$$\begin{aligned} \varvec{S}_{2p}(\tau ) := \varvec{S}_{2p-2}(q_p\tau )^2 \cdot \varvec{S}_{2p-2}((1-4q_p)\tau )\cdot \varvec{S}_{2p-2}(q_p\tau )^2 \quad \text {where}\nonumber \\\quad q_p:=1/(4-4^{1/(2p-1)}). \end{aligned}$$
(1.3)
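The recursion (1.3) is simple to implement and test. The sketch below (a hypothetical random two-term Hamiltonian on a 4-dimensional space; not the models studied later) builds \(\varvec{S}_1\), \(\varvec{S}_2\), and higher orders, and checks that the second-order error scales as \(t^{3}\):

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_herm(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

terms = [rand_herm(4), rand_herm(4)]   # hypothetical H_1, H_2
H = sum(terms)

def U(h, t):
    # exp(i h t) for Hermitian h via eigendecomposition.
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w * t)) @ v.conj().T

def S(order, tau):
    if order == 1:       # forward sweep over the terms
        out = np.eye(4, dtype=complex)
        for h in terms:
            out = U(h, tau) @ out
        return out
    if order == 2:       # forward half-step sweep, then reversed half-steps
        fwd = np.eye(4, dtype=complex)
        for h in terms:
            fwd = U(h, tau / 2) @ fwd
        rev = np.eye(4, dtype=complex)
        for h in reversed(terms):
            rev = U(h, tau / 2) @ rev
        return rev @ fwd
    # higher (even) orders via the Suzuki recursion (1.3)
    p = order // 2
    q = 1 / (4 - 4 ** (1 / (2 * p - 1)))
    Sq = S(order - 2, q * tau)
    return Sq @ Sq @ S(order - 2, (1 - 4 * q) * tau) @ Sq @ Sq

def err(order, t):
    return np.linalg.norm(U(H, t) - S(order, t), 2)

# ell-th order error ~ t^(ell+1): halving t divides the error by ~2^(ell+1).
ratio2 = err(2, 1e-2) / err(2, 5e-3)   # expect ~ 8 for the second order
```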

1.1.1 Non-random Hamiltonians

Here, we consider a k-local (i.e., a sum of Pauli strings of length k) Hamiltonian on n qubits with \(\Gamma \) terms \(\varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma .\) To present our main results, define the normalized Schatten p-norms \( \left\| {\varvec{O}} \right\| _{\bar{p}}:= \frac{\left\| {\varvec{O}} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}} \), the vector 2-norm for a fixed input state, and a global energy estimate in 2-norm

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(global), 2} := \sqrt{\sum _{S} b_S^2} \quad \text {and}\quad b_{S}:=\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert . \end{aligned}$$

Theorem I.1

(Trotter error in k-local models). To simulate a k-local Hamiltonian using the \(\ell \)-th order Suzuki formula, the gate count

$$\begin{aligned} G =\Omega \left( \left( \frac{p^{k/2}\left\| {\varvec{H}} \right\| _{(global),2} t}{\epsilon } \right) ^{1/\ell } \Gamma p^{(k-1)/2}\left\| {\varvec{H}} \right\| _{(local),2} t \right)\ \ {}&\text {ensures}\ \nonumber \\ \left\| {\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}_{\ell }(t/r)^r} \right\| _{\bar{p}} \le \epsilon . \end{aligned}$$
(1.4)

The p-norm estimate implies concentration for typical input states via Markov’s inequality.

Corollary I.1.1

Draw a state \(\left| \psi \right\rangle \) from a 1-design ensemble, i.e., such that \(\mathbb {E}[\left| \psi \right\rangle \!\left\langle \psi \right| ] = \varvec{I}/2^n\) (e.g., an orthonormal basis); then, with high probability, the gate count

(1.5)

See Table 1 for the gate counts in various models and Sect. 3 for the explicit dependence on the failure probability hidden in (1.5). When the Hamiltonian contains Fermionic terms or the input is restricted to a low-particle-number subspace, see Proposition II.5.1 and Proposition II.5.2 for analogous results.

Regarding optimality (Sect. 4), we construct a Hamiltonian that demonstrates a separation between the worst case and the typical case bounds: its Schatten p-norm saturates our estimates, while the operator norm saturates the state-of-the-art bound [18]. Namely, our 1-norm to 2-norm improvement is due to asking a qualitatively different question (Fig. 1).

Proposition I.1.1

(A model with different p-norms and spectral norm). Consider a 2-local Hamiltonian on three subsystems of qubits \(\mathcal {H}=\mathcal {H}_{S_1}\otimes \mathcal {H}_{S_2}\otimes \mathcal {H}_{S_3}\)

$$\begin{aligned} \varvec{H}= \sum _{s_1\in S_1, s_2\in S_2} \varvec{\sigma }^z_{s_1}\varvec{\sigma }^x_{s_2} + \sum _{s_2\in S_2, s_3\in S_3} \varvec{\sigma }^y_{s_2}\varvec{\sigma }^z_{s_3}. \end{aligned}$$

Then, at large subsystem sizes \(\left| {S_1} \right| =\left| {S_2} \right| =\left| {S_3} \right| \rightarrow \infty \), the first- and second-order Trotter errors at short enough times match the p-norm estimates in Theorem II.1 and also the spectral norm estimates [18] (up to constant factors).

Note that the dependence on the number of terms \(\Gamma \) is not optimal when the terms in the Hamiltonian have non-uniform strengths; we can use a truncation argument [18] to improve the gate complexity at early times (Appendix A). Interestingly, the error due to truncation also enjoys concentration (using Hypercontractivity directly).

Table 1 Comparison of gate complexities for non-random Hamiltonians (Corollary I.1.1) for simulation time t and system size n with new results in brown. For the higher-order formulas, we drop asymptotically vanishing dependence \(o(1/\ell )\). The spatially local models at higher orders have no average-case speedup because the model has constant connectivity, and the local 1-norm and 2-norm are both independent of system size \(\Vert {\varvec{H}} \Vert _{(1),1} = const.\Vert {\varvec{H}} \Vert _{(1),2}\).

Lastly, we present numerics complementing our Trotter error bounds. In particular, we study the Trotter error for 2-body Hamiltonians with on-site disorder, with all-to-all connectivity (Figs. 3, 4, 5) or nearest-neighbor interactions (Fig. 6). These models may capture many-body localization and glassy physics. The Trotter error is averaged over realizations of disorder to extract a smooth curve. The disorder also illustrates the robustness of our bounds. Our numerics appear to match the theoretical predictions regarding the dependence on the system size n (Figs. 3, 6), the evolution time t (Fig. 4), and the product formula order \(\ell \) (Fig. 5).

Fig. 3

Trotter error for the all-to-all interacting Heisenberg model for second-order Suzuki formulas \(\varvec{S}_2(t/r)\). We fix the time \(t = 10\) and the number of repeats \(r = 20{,}000\), and vary the system size \(n = 5,\ldots , 13\). Each Trotter error is estimated by median-of-means: take the median over 27 bins, where each bin is an average over 32 independent disorder realizations. The fit \(a(n+c)^b\) gives the system size dependence b. For average inputs (2-norm), the empirical exponent reads \(b=2.03\pm 0.03\), which matches the theoretical bound (Theorem I.1, \(b = 2\)). For worst inputs (operator norm), the empirical exponent is much larger, \(b=4.07\pm 0.13\), which matches the theoretical bound ([18], \(b = 4\))

Fig. 4

Time dependence of the 2-norm Trotter error in Fig. 3. We fix the repetition \(r = 20{,}000\) and the system size \(n = 12\), and vary the time \(t = 0, \ldots, 17.5\). The fit \(a t^b+c\) gives the time dependence exponent \(b= 2.74 \pm 0.04\) (variance calculated by independent runs), which deviates slightly from the theoretical upper bound (Theorem I.1, \(b = 3\))

Fig. 5

Different orders of Suzuki formulas for the all-to-all interacting Heisenberg model. For the first-order Lie-Trotter-Suzuki formula, the parameters are \(t = 5, r = 200{,}000, n = 5,\ldots , 14\). We take the median over 8 bins, each averaging over 12 runs. The fit \(a(n+c)^b\) gives the empirical system size dependence \(b=1.46\pm 0.03\), matching the theoretical bound (Theorem I.1, \(b = 1.5\)). The parameters for the 4-th order formula are \(t=10, r =1000, n = 5,\ldots , 13\). We take the median over 32 bins, each averaging over 15 runs. The empirical exponent reads \(b=2.98\pm 0.03\), matching the theoretical bound (Theorem I.1, \(b = 3\))

Fig. 6

Trotter error for the spatially-local Heisenberg model for second-order Suzuki formulas \(\varvec{S}_2(t/r)\). We fix the time \(t = 50\) and the number of repeats \(r = 40{,}000\), and vary the system size \(n = 5,\ldots , 13\). We take the median over 15 bins, where each bin is an average over 32 independent disorder realizations. The fit \(a(n+c)^b\) gives the system size dependence b. For average inputs (2-norm), the empirical exponent reads \(b=0.46\pm 0.01\), which matches the theoretical bound (Theorem I.1, \(b = 1\)). For worst inputs (operator norm), the empirical exponent is much larger, \(b=1.18\pm 0.02\), which is consistent with the theoretical bound ([18], \(b = 1\))

Fig. 7

Trotter error for the random all-to-all Heisenberg model for second-order Suzuki formulas \(\varvec{S}_2(t/r)\). We fix the time \(t = 10\) and the number of repeats \(r = 20{,}000\), and vary the system size \(n = 5,\ldots , 13\). We take the median over 15 bins, each averaging over 32 independent disorder realizations. The fit \(a(n+c)^b\) gives the system size dependence b. For worst inputs (operator norm), the empirical exponent is \(b=2.56\pm 0.1\), which is smaller than the theoretical bound for random Hamiltonians (Theorem VIII.1, \(b = 3.5\); the \(b=5\) part should be suppressed at this value of repeats \(r=20{,}000\)) and for non-random Hamiltonians ([18], \(b = 4\)). We are unable to numerically optimize the fixed input state for the norm \(\Vert {\cdot } \Vert _{fix,2}\); we only present the numerics for average inputs (2-norm) for comparison

1.1.2 Random Hamiltonians

Sometimes, we are interested in an ensemble of Hamiltonians, most notably the Sachdev-Ye-Kitaev models [38, 47] with random coefficients. The intrinsic randomness of the Hamiltonian allows us to obtain similar but stronger results. More precisely, we consider random Hamiltonians \( \varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma = \sum _{\gamma =1}^\Gamma g_\gamma \varvec{K}_\gamma , \) where the coefficients \(g_\gamma \) are i.i.d. standard Gaussians \(\mathbb {E}[g_\gamma ^2]=1\), and the matrices \(\varvec{K}_\gamma \) are deterministic. The local quantities here are defined by dropping the Gaussians

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(global),2} := \sqrt{\sum _{\gamma } b_\gamma ^2} \quad \text {and}\quad \left\| {\varvec{H}} \right\| _{(local),2}&:= \max _{\text {site }i} \sqrt{\sum _{\gamma : i \subset \gamma } b^2_\gamma }, \quad \text {where}\\\quad b_{\gamma }:= \Vert {\varvec{K}_{\gamma }} \Vert . \end{aligned}$$

Theorem I.2

((Informal) Trotter error in random models). Simulating random k-local models with Gaussian coefficients via higher-order (\(\ell \rightarrow \infty \)) Suzuki formulas, the asymptotic gate count

(1.6)

suffices with high probability over the draw of the random Hamiltonian. The fixed input state can be arbitrary.

See Theorem VIII.1 for the complete dependence on the finite order \(\ell \ne \infty \) and the failure probability, and Theorem VI.1 for a precise gate count for the first-order Trotter formula. In other words, when the Hamiltonian is random, an arbitrary fixed input state exhibits 2-norm scaling of the Trotter error. A slightly higher gate count (by a factor of \(\sqrt{n}\)) would control the performance for the worst inputs that may correlate with the Hamiltonian (e.g., the Gibbs state or the ground state of the model).

Proposition I.2.1

(Distinct Hamiltonians). There exists a set of k-local Hamiltonians \(\{\varvec{H}^{(i)}\}\) with cardinality \(\textrm{e}^{\Omega (\Gamma )}\) such that each of them satisfies

$$\begin{aligned} \varvec{H}^{(i)} = \sum _{\gamma =1}^\Gamma \varvec{H}^{(i)}_\gamma \quad \text {where} \quad \Vert {\varvec{H}^{(i)}_\gamma } \Vert \le \mathcal {O}(1/n^{\frac{k-1}{2}}), \end{aligned}$$

but for early times \(t=\Omega (1)\) they are pairwise distinct

$$\begin{aligned} \left\| {\varvec{H}^{(i)}- \varvec{H}^{(j)}} \right\| _{\infty }t \ge \Omega (\sqrt{n}) \quad \text {for each pair} \quad i\ne j. \end{aligned}$$

If we further assume the matrix exponentials are also distinct (which is believable but harder to prove), \( \left\| {\textrm{e}^{\varvec{H}^{(i)}t}- \textrm{e}^{\varvec{H}^{(j)}t}} \right\| _{\infty } {\mathop {\ge }\limits ^{?}} \Omega (1), \) this implies a counting circuit complexity lower bound \(G=\Omega (\Gamma )=\Omega (n^k)\), which matches our gate complexity for fixed inputs (1.6) and typical inputs (1.4) at early times \(t=\Theta (1)\). See Sect. 9.2 for the proof.

The general optimality of our bounds for random Hamiltonians is less understood numerically. We present qualitative evidence (Fig. 7) suggesting that the Trotter error for random Hamiltonians, in the operator norm, could be much smaller than that of non-random Hamiltonians [18]. At the scale of our numerics, the error seems even smaller than our theoretical estimates. Unfortunately, we are not able to numerically estimate the norm \(\Vert {\cdot } \Vert _{fix,2}\) for fixed inputs.

1.2 Proof ingredients

The Trotter error is a complicated function of matrices. The leading order Trotter error is a commutator; for example, in the first-order product formula

$$\begin{aligned} \varvec{S}_1(t)-\textrm{e}^{\sum ^{\Gamma }_{\gamma =1}\textrm{i}\varvec{H}_\gamma t}&= \frac{t^2}{2}\sum ^\Gamma _{\gamma '>\gamma \ge 1}[\textrm{i}\varvec{H}_{\gamma '},\textrm{i}\varvec{H}_\gamma ] +O(t^3). \end{aligned}$$

Analogously, the \(\ell \)-th order product formulas have leading order errors as degree \(\ell +1\) polynomials of commutators [18].
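The leading commutator error above can be verified numerically. A minimal sketch with hypothetical random terms: subtracting the commutator sum from the first-order Trotter defect should leave an \(\mathcal {O}(t^3)\) remainder.

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_herm(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

Hs = [rand_herm(4) for _ in range(3)]   # hypothetical H_1, H_2, H_3
H = sum(Hs)

def U(h, t):
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w * t)) @ v.conj().T

def S1(t):
    # Product with the gamma = 1 factor applied first (rightmost).
    out = np.eye(4, dtype=complex)
    for h in Hs:
        out = U(h, t) @ out
    return out

def leading(t):
    # (t^2/2) * sum_{gamma' > gamma} [iH_{gamma'}, iH_gamma]
    acc = np.zeros((4, 4), dtype=complex)
    for g in range(len(Hs)):
        for gp in range(g + 1, len(Hs)):
            a, b = 1j * Hs[gp], 1j * Hs[g]
            acc += a @ b - b @ a
    return t * t / 2 * acc

def remainder(t):
    return np.linalg.norm(S1(t) - U(H, t) - leading(t), 2)

ratio = remainder(1e-2) / remainder(5e-3)   # expect ~ 2^3 = 8
```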

There are two main technical steps: first, how do we take care of the infinite series of higher-order terms? Second, how do we deliver concentration bounds for the commutators?

1.2.1 A good presentation of error

The Trotter error has a rather nasty higher-order dependence on time, and a good expansion simplifies the proof. Here we build upon the framework from [18]. Denote the target Hamiltonian \( \varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma , \) with labels \(\gamma \) for the summands. We specify a product formula \(\textrm{e}^{\textrm{i}a_J\varvec{H}_{\gamma (J)} t}\cdots \textrm{e}^{\textrm{i}a_1 \varvec{H}_{\gamma (1)} t}\) with J exponentials by choosing an ordering \(\gamma (j)\) and weights \(a_j\). In particular, we will focus on the Suzuki formulas (1.3), which can be rewritten as

$$\begin{aligned} \varvec{S}_{\ell }(t) = \prod _{j=1}^{J} \textrm{e}^{\textrm{i}a_{j}\varvec{H}_{\gamma (j)} t} = \prod _{\nu = 1}^{\Upsilon } \prod _{i=1}^{\Gamma } \textrm{e}^{\textrm{i}a_{\nu ,i} \varvec{H}_{\gamma (\nu , i)} t} \quad \text {where} \quad \Upsilon = 2\cdot 5^{\ell /2-1} \quad \text {and} \quad \left| {a_{\nu ,i}} \right| \le 1. \end{aligned}$$

For the first-order Lie-Trotter formula, each term appears once, so there is only one stage \(\Upsilon =1\); the higher-order Suzuki formula has a total of \(J = \Gamma \cdot \Upsilon \) exponentials and decomposes into \(\Upsilon \) stages, where each stage goes through each Hamiltonian term \(\varvec{H}_{\gamma }\) exactly once.
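The stage count can be cross-checked against the recursion (1.3): the second-order formula has one forward and one backward sweep, and each recursion level uses five copies of the lower-order formula. A minimal sketch:

```python
def num_stages(order):
    # Number of sweeps ("stages") through all Gamma Hamiltonian terms.
    if order == 1:
        return 1
    if order == 2:
        return 2                        # forward + backward half-step sweeps
    return 5 * num_stages(order - 2)    # recursion (1.3) uses five copies

# For even orders ell, this reproduces Upsilon = 2 * 5^(ell/2 - 1),
# and the total number of exponentials is J = Gamma * Upsilon.
```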

Following [18], the Trotter error can be captured in the time-ordered exponential form

$$\begin{aligned} \prod _{j=1}^{J} \textrm{e}^{\textrm{i}a_{j}\varvec{H}_{\gamma (j)} t} = \exp _{\mathcal {T}}\left(\textrm{i}\int \left(\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma , t)+ \varvec{H}\right)dt\right). \end{aligned}$$

The error is now represented as a sum of nested commutators

$$\begin{aligned} \varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma , t) :&= \sum ^{J}_{j=1}\left( \prod _{k=j+1}^{J} \textrm{e}^{a_k\mathcal {L}_{\gamma (k)} t} [a_j\varvec{H}_{\gamma (j)}] -a_j\varvec{H}_{\gamma (j)}\right) \quad \text {where}\\\quad \mathcal {L}_{\gamma }[O]:=\textrm{i}[\varvec{H}_\gamma ,O]. \end{aligned}$$

In our proof, we will “beat the nested commutator \(\varvec{\mathcal {E}}\) to death”: do a Taylor expansion in time t of the nested commutators, so that each order is a polynomial of matrices. (Fortunately, we will not need the details of the particular orderings of product formulas.) We then apply our matrix concentration tools and go through a complicated combinatorial bound (which is much more involved than obtaining the 1-norm quantity \(\Vert {\varvec{H}} \Vert _{(1),1}\) in [18]).

1.2.2 Uniform smoothness, matrix martingales, and hypercontractivity

To obtain quantitative control of complicated matrix functions, let us begin with an instructive example that captures the different perspectives. Consider a Hamiltonian as a sum of 1-local Pauli-Zs,

$$\begin{aligned} \varvec{H}= \varvec{\sigma }^z_1 + \cdots +\varvec{\sigma }^z_n, \end{aligned}$$

where each Pauli \(\varvec{\sigma }^z_i\) is supported on qubit i. How “big” is the sum?

Fig. 8

For two scalar random variables satisfying the martingale condition, the variable b has zero mean conditioned on variable a. In the non-commutative generalization, the matrix \(\varvec{B}\) is partially traceless (“zero-mean”) on a subsystem where the matrix \(\varvec{A}\) is trivial (“conditioned on \(\varvec{A}\)”)

(1) Take the spectral norm for the largest eigenvalue in magnitude

$$\begin{aligned} \left\| {\varvec{\sigma }^z_1 + \cdots +\varvec{\sigma }^z_n} \right\| = n. \end{aligned}$$

(2) Interpret the trace as an expectation; then the eigenvalue distribution is that of a sum of independent random variables \( S_n:= x_1+\cdots +x_n\), each drawn from the Rademacher distribution \(\Pr (x_i=1)=\Pr (x_i=-1)=1/2\). Now, we can use a concentration inequality to describe how rarely the random variable deviates from its expectation

$$\begin{aligned} \Pr ( \lambda _i \ge \epsilon ) \equiv \Pr ( S_n \ge \epsilon ) \le \textrm{e}^{-\epsilon ^2/2n} \ \ \ (\text {Hoeffding's inequality}). \end{aligned}$$

In other words, the typical magnitude of eigenvalues \( \left| {\lambda } \right| = \mathcal {O}(\sqrt{n}) \ll n\) is much smaller than the largest eigenvalue. This simple example captures the overarching theme of this work: Concentration is ubiquitous but often unspoken in the high dimensional setting.
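This gap is visible in a small sketch that samples eigenvalues (sign configurations) rather than building the \(2^n\)-dimensional matrix:

```python
import random

random.seed(0)
n, samples = 100, 2000

# Each eigenvalue of sigma^z_1 + ... + sigma^z_n corresponds to a sign
# configuration; sample configurations uniformly at random.
eigs = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(samples)]

worst = n                                                # spectral norm
typical = (sum(e * e for e in eigs) / samples) ** 0.5    # root-mean-square

# typical ~ sqrt(n) << n = worst; extreme eigenvalues are exponentially rare,
# so uniform sampling essentially never finds them.
```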

To go beyond the above example, we rely on a family of recursive inequalities for p-norms, which leads to concentration via Markov’s inequality. We begin by reviewing the ancestral scalar version, often called the two-point inequality or Bonami’s inequality (See, e.g., [21]).

Fact I.3

(The two-point inequality). For real numbers \(a, b\) and \(p \ge 2\),

$$\begin{aligned} \left(\frac{\left| {a+b} \right| ^p+\left| {a-b} \right| ^p}{2} \right)^{2/p} \le a^2+(p-1)b^2. \end{aligned}$$

This can be seen by expanding the binomial. This seemingly trivial inequality turns out to have far-reaching consequences, and its simplicity becomes its strength (See, e.g., Boolean analysis [43]). The same form of inequality has an exact matrix analog, often called uniform smoothness.
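The inequality is also simple to probe numerically; a minimal sketch over random real \(a, b\) and exponents \(p \ge 2\) (using absolute values so that non-integer p is well-defined):

```python
import random

random.seed(4)

def two_point_gap(a, b, p):
    # rhs - lhs of the two-point inequality; should never be negative.
    lhs = ((abs(a + b) ** p + abs(a - b) ** p) / 2) ** (2 / p)
    return a * a + (p - 1) * b * b - lhs

worst_gap = min(
    two_point_gap(random.uniform(-5, 5), random.uniform(-5, 5),
                  random.uniform(2, 12))
    for _ in range(10_000)
)
```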

Fact I.4

(Uniform smoothness for Schatten classes [55]). For matrices \(\varvec{X}\) and \(\varvec{Y}\) of the same size and \(p \ge 2\),

$$\begin{aligned} \left[ \frac{1}{2}(\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}^p+\left\| {\varvec{X}-\varvec{Y}} \right\| _{p}^p)\right] ^{2/p} \le \left\| {\varvec{X}} \right\| _{p}^2+ (p-1) \left\| {\varvec{Y}} \right\| _{p}^2. \end{aligned}$$

The above form is not directly applicable, but its alternative forms with a martingale flavor streamline most of our proofs. For k-local operators (which are, in fact, closely related to non-commutative martingales; see Fig. 8), we derive and make heavy use of the following:

Proposition I.4.1

(Uniform smoothness for subsystems). Consider matrices \(\varvec{X}, \varvec{Y}\in \mathcal {B}(\mathcal {H}_i\otimes \mathcal {H}_j)\) that satisfy the non-commutative martingale condition \(\textrm{Tr}_i(\varvec{Y}) = 0\) and \(\varvec{X}= \varvec{X}_j\otimes \varvec{I}_i\). For \(p \ge 2\),

$$\begin{aligned} \Vert \varvec{X}+ \varvec{Y}\Vert _{p}^2\le \Vert \varvec{X}\Vert _{p}^2 + (p-1)\Vert \varvec{Y}\Vert _{p}^2. \end{aligned}$$

In other words, uniform smoothness delivers sum-of-squares behavior that contrasts with the triangle inequality, which is linear

$$\begin{aligned} \Vert {\varvec{X}+\varvec{Y}} \Vert \le \Vert {\varvec{X}} \Vert +\Vert {\varvec{Y}} \Vert . \end{aligned}$$

This difference highlights the qualitative distinction between the worst case and the typical case, which is the starting point of all arguments in this work.

To illustrate its power, we apply it to the 2-local operator (Fig. 9)

$$\begin{aligned} \left\| {\sum _{j<i}\varvec{\sigma }^x_i\varvec{\sigma }^y_j } \right\| _{p}^2&\le (p-1)\sum _{j} \left\| {\sum _{i:j<i}\varvec{\sigma }^x_i\varvec{\sigma }^y_j } \right\| _{p}^2 \\&\le (p-1)^2\sum _{j<i}\left\| { \varvec{\sigma }^x_i\varvec{\sigma }^y_j } \right\| _{p}^2, \end{aligned}$$

and more generally this gives concentration of k-local operators, or Hypercontractivity (Sect. 2).
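A numeric sketch of this two-step bound, for a hypothetical \(n=5\) qubit instance: the normalized Schatten p-norm of the 2-local sum is controlled by \((p-1)^2\) times the number of pairs, in contrast with the triangle inequality's \((\text {pairs})^2\); at \(p=2\) the sum-of-squares count is exact because distinct Pauli strings are trace-orthogonal.

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

n = 5

def pauli_xy(i, j):
    # sigma^x on qubit i and sigma^y on qubit j, identity elsewhere.
    ops = [I2] * n
    ops[i], ops[j] = sx, sy
    return reduce(np.kron, ops)

O = sum(pauli_xy(i, j) for i in range(n) for j in range(i))
pairs = n * (n - 1) // 2

def schatten_bar(a, p):
    # Normalized Schatten p-norm: (Tr|A|^p / dim)^(1/p).
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.mean(s ** p)) ** (1 / p)

# Sum-of-squares (uniform smoothness) vs. linear (triangle) bounds;
# each Pauli string has normalized p-norm exactly 1.
p = 4
lhs = schatten_bar(O, p) ** 2
sos_bound = (p - 1) ** 2 * pairs
triangle_bound = pairs ** 2
```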

Fig. 9

Reorganizing the sum \(\sum _{j<i}\varvec{\sigma }^x_i \varvec{\sigma }^y_j\) into a martingale, w.r.t. the index j

For random Hamiltonians, the flavor of the problem changes slightly; we can think of adding Gaussian coefficients to our guiding example

$$\begin{aligned} \varvec{H}= g_1 \varvec{\sigma }^z_1 + \cdots + g_n \varvec{\sigma }^z_n. \end{aligned}$$

The Gaussian coefficients (i.e., external randomness) require the following version of uniform smoothness for the expected p-norm \({\vert \!\vert \!\vert \varvec{X} \vert \!\vert \!\vert }_{p}:= (\mathbb {E}[ \Vert {\varvec{X}} \Vert _p^p ] )^{1/p}\), which will allow us to control the spectral norm, i.e., the worst input states. This inequality initially featured in simple derivations of matrix concentration for martingales [24, 42].

Fact I.5

(Uniform smoothness for expected p-norm [24, Proposition 4.3]). Consider random matrices \(\varvec{X}, \varvec{Y}\) of the same size that satisfy \(\mathbb {E}[\varvec{Y}|\varvec{X}] = 0\). When \(2 \le p\),

$$\begin{aligned} {\vert \!\vert \!\vert \varvec{X}+\varvec{Y} \vert \!\vert \!\vert }_{p}^2 \le {\vert \!\vert \!\vert \varvec{X} \vert \!\vert \!\vert }_{p}^2 + (p-1){\vert \!\vert \!\vert \varvec{Y} \vert \!\vert \!\vert }_{p}^2 . \end{aligned}$$

See Sect. 5 for the relevant background and an alternative norm for arbitrary fixed input states. Beyond the scope of this work, we emphasize that these robust and straightforward martingale inequalities should find applications in a variety of quantum information settings, whether by exploiting the tensor product structure of the Hilbert space or the randomness in the matrix summands. See, e.g., [13, 15] for applications in operator growth and [14] in randomized quantum simulation.

1.3 Discussion

For many physical systems of interest (i.e., non-random k-local Hamiltonians), we present an average-case gate complexity that is qualitatively better than the worst-case. Without even changing the product formula, our analysis leads to a direct reduction of resources for quantum simulation applications. It is natural to hope that states appearing in practice (such as the ground state in quantum chemistry applications) behave like the typical states rather than the worst states, which would make product formulas very appealing for quantum simulation. It would be very interesting to carry out small-scale numerics. Our result holds with high probability for inputs drawn from any orthonormal basis. Unfortunately, our current argument is probabilistic and does not label the exceptional states. Heuristically, the Trotter error is another k-local Hamiltonian that does not resemble the original Hamiltonian, so the states at low energy need not be “aligned” with the extremal states maximizing the Trotter error. We leave this as an open problem.

Another natural question is whether other quantum simulation methods (such as qubitization) or even other quantum algorithms enjoy an average-case speed-up. If true, it would greatly improve the feasibility of many quantum computing applications.

2 Notations

This section recapitulates notations; the reader may skim through and return as needed. We write scalars, functions, and vectors in normal font, matrices in bold font \(\varvec{O}\), and superoperators in curly font \(\mathcal {L}\). Natural constants \(\textrm{e}, \textrm{i}, \pi \) are denoted in Roman font.

$$\begin{aligned} n&:&\text {the system size (number of qubits) of the Hamiltonian }\varvec{H}\\ t&\in \mathbb {R}&\text {the targeted Hamiltonian simulation time}\\ \varvec{\rho }&:&\text {the density matrix}\\ k&:&\text {the locality of operators}\\ \varvec{\sigma }^{x},\varvec{\sigma }^{y}, \varvec{\sigma }^{z}&:&\text {Pauli operators}\\ \varvec{I}&:&\text {the identity operator}\\ \mathbb {E}&:&\text {the expectation (over classical random variables)}\\ \tilde{\mathcal {O}}(\cdot ),\tilde{\Omega }(\cdot )&:&\text {the asymptotic scaling ignoring (poly)logarithmic factors}\\ C_p&:= p-1&\text {the uniform smoothness constant} \end{aligned}$$

Hamiltonian:

$$\begin{aligned} \varvec{H}&= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma&\text {few-body Hamiltonians}\\ \gamma&\in \{1,\cdots , \Gamma \}&\text {Hamiltonian term labels}\\ S&\subseteq \{1,\cdots , n\}&\text {a subset of qubits}\\ b_{S}&:=\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert&\text {the sum over terms }\gamma \text { with support }S\\ \left\| {\varvec{H}} \right\| _{(global), 2}&:= \sqrt{\sum _{S} b_S^2}&\text {the global sum}\\ \left\| {\varvec{H}} \right\| _{(local),2}&:= \max _{\text {site } i} \sqrt{\sum _{S: i \in S } b^2_S}&\text {the local sum over sets containing site } i \end{aligned}$$

Random Hamiltonian ensemble:

$$\begin{aligned} \varvec{H}&= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma = \sum _{\gamma =1}^\Gamma g_\gamma \varvec{K}_\gamma ,&\text {few-body Hamiltonians}\\&g_{\gamma }:&\text {standard i.i.d. Gaussians, }\mathbb {E}[g^2_\gamma ] =1\\&b_{\gamma }:= \Vert {\varvec{K}_{\gamma }} \Vert&\text {the strength of a term }\gamma \\&\left\| {\varvec{H}} \right\| _{(global),2} := \sqrt{\sum _{\gamma } b_\gamma ^2}&\text {the global sum}\\&\left\| {\varvec{H}} \right\| _{(local),2} := \max _{\text {site }i} \sqrt{\sum _{\gamma : i \in \gamma } b^2_\gamma }&\text {the local sum over terms overlapping with site }i. \end{aligned}$$

Product formula:

$$\begin{aligned} \varvec{S}_{\ell }(t)&\approx \textrm{e}^{\textrm{i}\varvec{H}t}&\text {the }\ell \text {-th order product formula for time }t\\ r&:&\text {the number of repetitions (Trotter steps)}\\ \Upsilon&:&\text {the number of stages} \end{aligned}$$

Norms:

3 Preliminary: k-Locality, Uniform Smoothness, and Hypercontractivity

In quantum information, we often encounter k-local operators: operators whose Pauli strings \(\varvec{F}_{S}\) are supported on sets S of size at most k. This is the quantum analog of a low-degree Boolean function [41].

$$\begin{aligned} \varvec{F}&= \varvec{I}+ \sum _{\left| {S} \right| =1}\sum _{\alpha =1}^3 b^\alpha _{s_1} \varvec{\sigma }^\alpha _{s_1}+ \sum _{\left| {S} \right| =2} \sum _{\alpha =1}^3\sum _{\beta =1}^3 c^{\alpha \beta }_{s_1s_2} \varvec{\sigma }^{\alpha }_{s_1}\varvec{\sigma }^{\beta }_{s_2}+ \cdots := \sum _{\left| {S} \right| \le k} \varvec{F}_S. \end{aligned}$$
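As a concrete illustration (a minimal numpy sketch of our own, not from the text), the following builds a 2-local operator on three qubits and recovers its Pauli coefficients from the normalized Hilbert-Schmidt inner product, reflecting the orthogonality of Pauli strings used throughout this section.

```python
import numpy as np

# Minimal sketch (our own construction): a 2-local operator on n = 3 qubits.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)

def kron(*ops):
    out = np.array([[1.0 + 0j]])
    for o in ops:
        out = np.kron(out, o)
    return out

# F = 0.5 * sz_1 + 0.3 * sx_2 sx_3
F = 0.5 * kron(sz, I2, I2) + 0.3 * kron(I2, sx, sx)

# Pauli strings are orthogonal in the normalized Hilbert-Schmidt inner
# product <A, B> = Tr[A^dag B] / 2^n, so each coefficient is recovered by it.
d = 2 ** 3
c1 = (np.trace(kron(sz, I2, I2).conj().T @ F) / d).real
c2 = (np.trace(kron(I2, sx, sx).conj().T @ F) / d).real
```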

Given such a k-local operator, let us quantify its “strength” acting on states. One cautious choice is to worry about the worst possible state via the operator norm

which maximizes the vector \(\ell _2\)-norm.

Nevertheless, the input state is often unstructured (as opposed to adversarially chosen to extremize the operator). A simple and common case is to draw the inputs randomly from the computational basis. Nicely, this input ensemble can be efficiently generated and, at the same time, corresponds to the maximally mixed state and infinite-temperature physics. Motivated by this, we model the “typical” states in this work as an ensemble of states

which includes any orthonormal basis states, any t-design ensemble, and Haar random states. Of course, adding more structure to the problem would more directly address the particular question at hand, but the statement would become case-dependent and lose its simplicity. In fact, our argument below also applies to other ensembles, but applying it requires additional knowledge of the ensemble.

Now that we have specified the ensemble, how large is the typical strength? This question can be succinctly phrased in terms of a concentration inequality that controls the probability of an undesirably large strength.

Proposition II.0.1

(Typical states and Schatten p-norms). For a pure state drawn randomly from an ensemble, we have

In particular, for the maximally mixed state, we recover the normalized Schatten p-norm

$$\begin{aligned} \left\| {\varvec{F}\varvec{\rho }^{1/p}} \right\| _{p}= \left\| {\varvec{F}} \right\| _{\bar{p}}\quad \text {for}\quad \varvec{\rho }= \varvec{I}/\textrm{Tr}[\varvec{I}]. \end{aligned}$$
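As a quick numerical sanity check of this identity (an illustrative sketch with an arbitrary random Hermitian \(\varvec{F}\); the helper name `schatten` is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def schatten(A, p):
    # Schatten p-norm from singular values
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

d, p = 8, 6
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
F = (G + G.conj().T) / 2                        # random Hermitian operator

# rho = I/d, so rho^{1/p} = d^{-1/p} * I
lhs = schatten(F * d ** (-1.0 / p), p)          # ||F rho^{1/p}||_p
rhs = schatten(F, p) / schatten(np.eye(d), p)   # normalized ||F||_{bar p}
```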

Proof of Proposition II.0.1

For illustration, we start with Chebyshev’s inequality and the variance \(p=2\)

The last equality evaluates the expectation over states. To obtain sharper tail bounds via Markov’s inequality, consider the p-th moment

(2.1)

The inequality applies a certain form of concavity (Fact II.7). \(\quad \square \)

In other words, the (weighted) 2-norm coincides with the variance of the typical strength; the (weighted) p-norm governs the tail bounds. Indeed, the p-norms can be expressed in terms of the eigenvalues of the operator \(\varvec{F}\); (2.1) will be an equality if we draw the state from the eigenbasis of the operator \(\varvec{F}\). Conveniently, with other choices of basis, the concavity tells us we still retain an inequality in (2.1). The rest of our discussion boils down to estimating the p-norm of k-local operators. This is the content of Hypercontractivity.

Fact II.1

(Hypercontractivity for Paulis [41, Theorem 46]). For an operator acting on qubits \(\varvec{F}\in \mathcal {B}(\mathcal {H}(2^n))\), \(p \ge 2\), and \(C_p:=p-1\),

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{\bar{p}}^2 \le \left\| {\sum _S \sqrt{C_p}^{\left| {S} \right| }\varvec{F}_{S}} \right\| _{\bar{2}}^2 = \sum _S C_p^{\left| {S} \right| } \left\| {\varvec{F}_{S}} \right\| _{\bar{2}}^2. \end{aligned}$$

Indeed, for \(p=2\), it holds with equality, and the 2-norm gives an incoherent sum over subsets S. (The Pauli strings are orthogonal in the Hilbert-Schmidt inner product.) We take squares to emphasize this sum-of-square nature. For general \(p\ge 2\), Hypercontractivity gives an analogous incoherent sum over subsets but with enlarged coefficients \(C_p^{\left| {S} \right| }\). In other words, so long as the sets S are small (e.g., the operator is k-local for a fixed k), we obtain all p-norm estimates from a 2-norm calculation. See also [40] for applications of Hypercontractivity, and note the equivalent inverted form also appears

$$\begin{aligned} \left\| {\sum _S \frac{1}{\sqrt{C_p}^{\left| {S} \right| }} \varvec{F}_{S}} \right\| _{\bar{p}}^2 \le \left\| {\varvec{F}} \right\| _{\bar{2}}^2. \end{aligned}$$

Historically, this zoo of closely related ideas started from the Boolean case (see, e.g., [43] and Sect. 7.1) and extends to non-commutative settings, including Paulis [41], Fermions [10], and abstract von Neumann algebras [46]. In various contexts, Hypercontractivity has been repeatedly revisited and has found applications in classical [1, 6, 43] and quantum computing [40]. The goal of our discussion here is to put together a coherent and accessible review that illustrates the rather common phenomena with some problem-driven adaptations. We begin with the intuitive qudit case (with the maximally mixed state as the ensemble) and later swiftly generalize to several settings arising from quantum simulation.

To understand Hypercontractivity, our main approach is the following recursive inequality called uniform smoothness.

Proposition II.1.1

(Uniform smoothness for subsystems). Consider matrices \(\varvec{X}, \varvec{Y}\in \mathcal {B}(\mathcal {H}_i\otimes \mathcal {H}_j)\) that satisfy \(\textrm{Tr}_i(\varvec{Y}) = 0\) and \(\varvec{X}= \varvec{X}_j\otimes \varvec{I}_i\). For \(p \ge 2\), \(C_p=p-1\),

$$\begin{aligned} \Vert \varvec{X}+ \varvec{Y}\Vert _{p}^2\le \Vert \varvec{X}\Vert _{p}^2 + C_p\Vert \varvec{Y}\Vert _{p}^2. \end{aligned}$$

Technically, the partially traceless assumption \(\textrm{Tr}_i(\varvec{Y}) = 0\) makes it a non-commutative martingale where taking partial trace is a conditional expectation

$$\begin{aligned} E_i[\cdot ]:= \varvec{I}_i\otimes \bar{\textrm{Tr}}_i[\cdot ] \quad \text {where} \quad \bar{\textrm{Tr}}_i[\cdot ]: = \frac{\textrm{Tr}_i[\cdot ]}{\textrm{Tr}_i[\varvec{I}_i]}. \end{aligned}$$

We can rewrite Proposition II.1.1 more formallyFootnote 5. For any matrix \(\varvec{K}\),

$$\begin{aligned} \left\| { \varvec{K}} \right\| _{p}^2\le \left\| { E_i[\varvec{K}] } \right\| _{p}^2 + C_p\left\| {(1-E_i)[\varvec{K}] } \right\| _{p}^2. \end{aligned}$$

This martingale condition naturally appears in subroutines of quantum information applications, while Hypercontractivity as a black box is more “rigid”. Although these two ideas are intimately related, we emphasize that uniform smoothness is a versatileFootnote 6 and transparent workhorse, which implies, among other consequences, Hypercontractivity (Corollary II.4.1).
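Before turning to the proof, a quick numerical sanity check of Proposition II.1.1 (an illustrative sketch; the dimensions, seed, and helper names `schatten`, `ptrace_i` are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def schatten(A, p):
    # Schatten p-norm from singular values
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

def ptrace_i(A, di, dj):
    # partial trace over the first tensor factor of H_i (x) H_j
    return np.trace(A.reshape(di, dj, di, dj), axis1=0, axis2=2)

di, dj, p = 3, 4, 5
Cp = p - 1

Xj = rng.standard_normal((dj, dj)) + 1j * rng.standard_normal((dj, dj))
X = np.kron(np.eye(di), Xj)                          # X = X_j tensored with I_i

D = di * dj
Y = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
Y -= np.kron(np.eye(di), ptrace_i(Y, di, dj)) / di   # enforce Tr_i Y = 0

lhs = schatten(X + Y, p) ** 2
rhs = schatten(X, p) ** 2 + Cp * schatten(Y, p) ** 2
```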

3.1 Uniform smoothness for subsystems

Proposition II.1.1 is a special case of [46]. Here, we present an elementary proof by adapting the argument in [24, Prop 4.3].Footnote 7 We start with the primitive form of uniform smoothness as a black box.

Fact II.2

(Uniform smoothness for Schatten Classes, recap [55]). For matrices \(\varvec{X}\) and \(\varvec{Y}\),

$$\begin{aligned} \left[ \frac{1}{2}\left(\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}^p+\left\| {\varvec{X}-\varvec{Y}} \right\| _{p}^p\right)\right] ^{2/p} \le \left\| {\varvec{X}} \right\| _{p}^2+ C_p \left\| {\varvec{Y}} \right\| _{p}^2. \end{aligned}$$
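Unlike Proposition II.1.1, this primitive form requires no structure on \(\varvec{X}\) and \(\varvec{Y}\); a numerical check with arbitrary random matrices (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(3)

def schatten(A, p):
    # Schatten p-norm from singular values
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

d, p = 6, 4
Cp = p - 1
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# two-point uniform smoothness for Schatten classes
lhs = (0.5 * (schatten(X + Y, p) ** p + schatten(X - Y, p) ** p)) ** (2.0 / p)
rhs = schatten(X, p) ** 2 + Cp * schatten(Y, p) ** 2
```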

We also need the following fact.

Fact II.3

(Monotonicity of p-norm w.r.t partial trace). For matrices \(\varvec{X}\) and \(\varvec{Y}\) satisfying \(\textrm{Tr}_i(\varvec{Y}) = 0\) and \(\varvec{X}= \varvec{X}_j\otimes \varvec{I}_i\), \(p \ge 2\),

$$\begin{aligned} \left\| {\varvec{X}} \right\| _{p}\le \left\| {\varvec{X}+\varvec{Y}} \right\| _{p}. \end{aligned}$$

This can be understood as the non-commutative analog of convexity \( \left\| {\varvec{X}+\mathbb {E}_{\varvec{Y}}\varvec{Y}} \right\| _{p} \le \mathbb {E}_{\varvec{Y}}\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}. \)

Proof of Fact II.3

Recall the variational expression [60, Sec 12.2.1] for Schatten p-norms

$$\begin{aligned} \left\| {\varvec{X}_j} \right\| _{p} = \sup _{\left\| {\varvec{B}_j} \right\| _{q}\le 1} \textrm{Tr}(\varvec{X}_j\varvec{B}_j^\dagger )\quad \text {for} \quad 1/p+1/q=1. \end{aligned}$$

Then,

$$\begin{aligned} \left\| {\varvec{X}+\varvec{Y}} \right\| _{p} = \sup _{\left\| {\varvec{B}} \right\| _{q}\le 1} \textrm{Tr}[(\varvec{X}+\varvec{Y})\varvec{B}^\dagger ] \ge \textrm{Tr}\left[ (\varvec{X}+\varvec{Y}) \varvec{B}_j^\dagger \otimes \frac{\varvec{I}_i}{\left\| {\varvec{I}_i} \right\| _{q}}\right] = \left\| {\varvec{X}_j\otimes \varvec{I}_i} \right\| _{p}. \end{aligned}$$

The last equality uses the partially traceless condition \(\textrm{Tr}_i \varvec{Y}=0\) and that

$$\begin{aligned} \varvec{X}_j\otimes \varvec{I}_i \quad \text {has maximizer} \quad \varvec{B}_j\otimes \varvec{I}_i/{\left\| {\varvec{I}_i} \right\| _{q}}. \end{aligned}$$

An alternative proof is by averaging over Haar unitary on subsystem i

$$\begin{aligned} \left\| {\varvec{X}} \right\| _{p}=\left\| {\varvec{X}+\mathbb {E}_{\varvec{U}}[ \varvec{U}_i\varvec{Y}\varvec{U}_i^\dagger ]} \right\| _{p}\le \mathbb {E}_{\varvec{U}_i}\left\| {\varvec{X}+ \varvec{U}_i\varvec{Y}\varvec{U}_i^\dagger } \right\| _{p} = \left\| {\varvec{X}+\varvec{Y}} \right\| _{p} . \end{aligned}$$

The first equality is Schur’s lemma, the inequality is convexity, and the last equality uses the unitary invariance of p-norms. \(\quad \square \)

We can almost prove Proposition II.1.1.

$$\begin{aligned} \frac{\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}^2 + \left\| {\varvec{X}} \right\| _{p}^2}{2}&\le \frac{\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}^2 + \left\| {\varvec{X}-\varvec{Y}} \right\| _{p}^2}{2} \\&\le \left( \frac{\left\| {\varvec{X}+\varvec{Y}} \right\| _{p}^p+\left\| {\varvec{X}-\varvec{Y}} \right\| _{p}^p}{2} \right) ^{2/p} \le \left\| {\varvec{X}} \right\| _{p}^2+ C_p \left\| {\varvec{Y}} \right\| _{p}^2. \end{aligned}$$

The first inequality is Fact II.3 (applied to \(-\varvec{Y}\)), the second is Lyapunov’s, and the last is Fact I.4. Rearranging terms yields a slightly worse constant \(2C_p\). The advertised constant can be obtained via another elementary but insightful trick [24, Lemma A.1], which we reproduce as follows.

Proof of Proposition II.1.1

The proof considers a rescaling argument. Let \(\varvec{Z}:=\frac{1}{n}\varvec{Y}\). We have just obtained

$$\begin{aligned} \Vert \varvec{X}+ \varvec{Z}\Vert _{p}^2- \Vert \varvec{X}\Vert _{p}^2 \le 2C_p\Vert \varvec{Z}\Vert _{p}^2. \end{aligned}$$
(2.2)

Rearranging Fact I.4,

$$\begin{aligned} \Vert \varvec{X}+ k\varvec{Z}\Vert _{p}^2-\Vert \varvec{X}+ (k-1)\varvec{Z}\Vert _{p}^2&\le \bigg (\Vert \varvec{X}+ (k-1)\varvec{Z}\Vert _{p}^2-\Vert \varvec{X}+ (k-2)\varvec{Z}\Vert _{p}^2 \bigg )\\&\quad + 2C_p\Vert \varvec{Z}\Vert _{p}^2 \le 2C_pk\Vert \varvec{Z}\Vert _{p}^2. \end{aligned}$$

The last inequality recursively applies the first line for \(n\ge k\ge 2\) and (2.2) at base caseFootnote 8\(k=1\). Therefore,

$$\begin{aligned} \Vert \varvec{X}+ \varvec{Y}\Vert _{p}^2 - \Vert \varvec{X}\Vert _{p}^2 = \sum _{k=1}^n \bigg (\Vert \varvec{X}+ k\varvec{Z}\Vert _{p}^2-\Vert \varvec{X}+ (k-1)\varvec{Z}\Vert _{p}^2\bigg )&\le \sum _{k=1}^n 2C_pk \Vert \varvec{Z}\Vert _{p}^2 \\&=C_p\frac{n+1}{n} \Vert \varvec{Y}\Vert _{p}^2. \end{aligned}$$

Take \(n\rightarrow \infty \) to obtain the sharp constant. \(\square \)

3.1.1 Subalgebras

Let us work out the analogous elementary derivation for a subalgebra \(\mathcal {N}\subset \mathcal {M}\), which captures non-commutative martingales in full generality. This also provides a unifying perspective for the manipulations we are doing. For subalgebras \(\mathcal {N}\subset \mathcal {M}\subset B(\mathcal {H})\), let \(E:\mathcal {M}\rightarrow \mathcal {N}\) be the projection to subalgebra \(\mathcal {N}\) (or the trace-preserving conditional expectation), with the defining properties:

$$\begin{aligned} E^\dagger [\varvec{I}]=\varvec{I}\quad \text {and}\quad \textrm{Tr}[\varvec{Z}\varvec{X}] = \textrm{Tr}[E[\varvec{Z}] \varvec{X}]\quad \text {for any} \quad \varvec{X}\in \mathcal {N}, \varvec{Z}\in \mathcal {M}. \end{aligned}$$

Intuitively, E is the analog of normalized partial trace \(\varvec{I}_j \frac{\textrm{Tr}_j[\cdot ]}{\textrm{Tr}[\varvec{I}_j]}\). Using the notation natural in this setting, we reproduce the monotonicity.

Fact II.4

(Monotonicity of p-norm w.r.t projection to subalgebra). Consider finite dimensional subalgebras \(\mathcal {N}\subset \mathcal {M}\subset B(\mathcal {H})\) and the corresponding projection to subalgebra \(E:\mathcal {M}\rightarrow \mathcal {N}\). Then, for any \(\varvec{Z}\in \mathcal {M}\) and \(p \ge 2\),

$$\begin{aligned} \left\| {E[\varvec{Z}]} \right\| _{p}\le \left\| {\varvec{Z}} \right\| _{p}. \end{aligned}$$

Proof of Fact II.4

Again, consider the variational expression

$$\begin{aligned} \left\| {\varvec{X}} \right\| _{p} = \sup _{\left\| {\varvec{B}} \right\| _{q}\le 1, \varvec{B}\in \mathcal {N}} \textrm{Tr}(\varvec{X}\varvec{B}^\dagger ) \quad \text {where}\quad 1/p+1/q=1. \end{aligned}$$

Note that the maximum is attained within the same algebra \(\varvec{B}\in \mathcal {N}\) (this follows from the structure theorem for finite-dimensional von Neumann algebras: \(\mathcal {N}\) is a direct sum of subsystems). Then

$$\begin{aligned} \left\| {\varvec{Z}} \right\| _{p} = \sup _{\left\| {\varvec{B}'} \right\| _{q}\le 1} \textrm{Tr}[\varvec{Z}\varvec{B}'^\dagger ] \ge \textrm{Tr}\left[ \varvec{Z}\varvec{B}^\dagger \right] =\textrm{Tr}\left[ E[\varvec{Z}] \varvec{B}^\dagger \right] = \left\| {E[\varvec{Z}]} \right\| _{p}, \end{aligned}$$

which is the advertised result. \(\quad \square \)

Through the same arguments, we conclude the discussion for subalgebras by the following.

Proposition II.4.1

(Uniform smoothness for subalgebras). Consider finite dimensional subalgebras \(\mathcal {N}\subset \mathcal {M}\subset \mathcal {B}(\mathcal {H})\) and the corresponding projection to subalgebra \(E:\mathcal {M}\rightarrow \mathcal {N}\). Then, for any \(\varvec{Z}\in \mathcal {M}\),

$$\begin{aligned} \left\| {\varvec{Z}} \right\| _{p}^2 \le \left\| {E[\varvec{Z}]} \right\| _{p}^2 +C_p \left\| {(1-E)[\varvec{Z}]} \right\| _{p}^2. \end{aligned}$$

This result was first obtained in [46] in a more technical setting. We hope the discussion here provides a simple interpretation.

3.2 Deriving hypercontractivity

Uniform smoothness, through a recursion, implies Hypercontractivity-like global formulas.Footnote 9

Proposition II.4.2

(Moment estimates for local operator). For an operator \(\varvec{F}\in \mathcal {B}(\mathbb {C}^{d\otimes n})\) on n-qudits, and \(p\ge 2\), \(C_p:= p-1\),

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p}^2&\le \sum _{S\subset \{n,\cdots ,1\}} (C_p)^{|S|} \left\| {\varvec{F}_S} \right\| _{p}^2 \quad \text {where} \quad \varvec{F}_{S}:=\prod _{s\in S}(1-E_s)\prod _{s'\in S^c} E_{s'}[\varvec{F}]. \end{aligned}$$

The super-operator \(E_s[\cdot ]:=\varvec{I}_{s}\otimes \bar{\textrm{Tr}}_s[\cdot ] \) is the conditional expectation associated with the partial trace, and the set \(S^c\) is the complement of set S.

Intuitively, for each subset S, the product of conditional expectations selects the component \(\varvec{F}_S\) that is non-trivial on set S and trivial on the complement \(S^c\). Indeed, for qubits, the summand \(\varvec{F}_S\) corresponds to the Pauli strings non-trivial on set S.
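This selection rule can be verified directly on two qubits (a minimal sketch; the operator and its coefficients are ours, and `E(s, ·)` implements the conditional expectation \(E_s\) above):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)

def E(s, A):
    # conditional expectation on site s of a 2-qubit operator:
    # replace site s by the identity times the normalized partial trace
    A4 = A.reshape(2, 2, 2, 2)                   # axes: (site0, site1, site0', site1')
    if s == 0:
        M = np.trace(A4, axis1=0, axis2=2) / 2   # remaining operator on site 1
        return np.kron(I2, M)
    M = np.trace(A4, axis1=1, axis2=3) / 2       # remaining operator on site 0
    return np.kron(M, I2)

# F = 2 I + 0.7 sz_0 + 0.4 sx_1 + 0.9 sz_0 sx_1  (coefficients are ours)
F = (2 * np.eye(4) + 0.7 * np.kron(sz, I2) + 0.4 * np.kron(I2, sx)
     + 0.9 * np.kron(sz, sx))

# F_S for S = {0}: (1 - E_0) E_1 [F] selects exactly the 0.7 sz_0 term
F_0 = E(1, F) - E(0, E(1, F))
```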

Let us grasp this formula with some examples. For single-site Paulis, it resembles the usual concentration inequality for bounded independent summands (e.g., Hoeffding’s inequality).

Example II.4.1

(1-local Paulis). For \(\varvec{F}:=\sum _i \alpha _i\varvec{\sigma }^z_i\),

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{\bar{p}}^2 \le C_p \sum _i \alpha _i^2\left\| { \varvec{\sigma }^z_i } \right\| _{\bar{p}}^2=(p-1) \sum _i \left| {\alpha _i} \right| ^2. \end{aligned}$$

By Markov’s inequality, we obtain concentration for pure states drawn from a fixed orthonormal basis .

In other words, the strength is typically bounded by the variance \(\sum _i \left| {\alpha _i} \right| ^2\). If we take the states to be the computational basis, the tail bound applies to its eigenvalue distribution.
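Since this \(\varvec{F}\) is diagonal, its \(2^n\) eigenvalues can be enumerated directly, which makes the bound easy to test numerically (our own sketch; the \(p=2\) case is an exact variance identity):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 6
alpha = rng.standard_normal(n)

# F = sum_i alpha_i sz_i is diagonal; its 2^n eigenvalues are sum_i (+-alpha_i)
signs = np.array(np.meshgrid(*[[1, -1]] * n)).reshape(n, -1)
evals = alpha @ signs

lhs = (np.abs(evals) ** p).mean() ** (2.0 / p)   # ||F||_{bar p}^2
rhs = (p - 1) * (alpha ** 2).sum()               # C_p * sum_i alpha_i^2
var = (evals ** 2).mean()                        # p = 2: equality with sum alpha_i^2
```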

Moreover, we obtain a similar sum-of-squares behavior for 4-local Paulis, albeit with heavier tails.

Example II.4.2

(4-local Paulis). For \(\varvec{F}:=\sum _{i>j>k>\ell } \alpha _{ijk\ell }\varvec{\sigma }^x_i\varvec{\sigma }^z_j \varvec{\sigma }^y_k\varvec{\sigma }^x_\ell \),

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{\bar{p}}^2 \le (C_p)^4 \sum _{i>j>k>\ell } \left| {\alpha _{ijk\ell }} \right| ^2\left\| {\varvec{\sigma }^x_i\varvec{\sigma }^z_j \varvec{\sigma }^y_k\varvec{\sigma }^x_\ell } \right\| ^2 \end{aligned}$$

By Markov’s inequality, we obtain

which does not have a Gaussian tail anymore but still decays super-polynomially.

Let us now present the elementary proof.

Proof of Proposition II.4.2

Apply uniform smoothness (Proposition II.1.1) for \(s=1,\ldots , n\) recursively.

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p}^2&= \left\| {\prod _{s=1}^n\left((1-E_s)+E_s \right)[\varvec{F}]} \right\| _{p}^2 \\&\le \sum _{S\subset \{n,\ldots ,1\}} (C_p)^{|S|} \left\| {\prod _{s\in S}(1-E_s)\prod _{s'\in S^c} E_{s'}[\varvec{F}]} \right\| _{p}^2 \end{aligned}$$

Each application produces two branches, and the total \(2^n\) branches are labeled by the subsets \(S\subset \{n,\ldots , 1\}\). We may regroup the conditional expectations since they are just taking partial traces of disjoint subsystems. The power \((C_p)^{|S|}\) counts the number of times the branch \((1-E_s)\) appears. \(\quad \square \)

To compare with the existing Hypercontractivity for qubits, it is worth bringing Proposition II.4.2 to the following form.

Corollary II.4.1

(Non-commutative hypercontractivity). In the setting of Proposition II.4.2,

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{\bar{p}}^2&\le \sum _{S\subset \{n,\ldots ,1\}} (3C_p)^{|S|} \left\| {\varvec{F}_S} \right\| _{\bar{2}}^2 = \left\| {\sum _S \sqrt{3C_p}^{\left| {S} \right| }\varvec{F}_{S}} \right\| _{\bar{2}}^2. \end{aligned}$$

This is equivalent to the existing bound (Fact II.1) up to slightly worse constants. However, the martingale formulation streamlines a simple proof (Proposition II.1.1) and, more importantly, allows us to adapt to different settings in the subsequent sections.

Proof of Corollary II.4.1

Bound the normalized p-norm \(\left\| {\varvec{F}} \right\| _{\bar{p}}:=\frac{\left\| {\varvec{F}} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}}\) by Pauli expansion \(\sigma _S:=\{\varvec{\sigma }^x,\varvec{\sigma }^y,\varvec{\sigma }^z\}^{\left| {S} \right| }\)

$$\begin{aligned} \left\| {\varvec{F}_S} \right\| _{\bar{p}}^2\le \left( \sum _{\sigma _S} \left\| {\varvec{F}_{\sigma _S}} \right\| _{\bar{p}} \right) ^2= \left( \sum _{\sigma _S} \left\| {\varvec{F}_{\sigma _S}} \right\| _{\bar{2}} \right) ^2 \le \Big (\sum _{\sigma _S} 1\Big )\cdot \sum _{\sigma _S} \left\| {\varvec{F}_{\sigma _S}} \right\| _{\bar{2}}^2 = 3^{\left| {S} \right| }\cdot \left\| {\varvec{F}_{S}} \right\| _{\bar{2}}^2, \end{aligned}$$

which is the advertised result. Intuitively, the extra factor we pay is the number of distinct Pauli strings \(3^{\left| {S} \right| }\). \(\quad \square \)

3.3 Product background states

Our previous discussion focused on the unweighted p-norm. In this section, we discuss the weighted p-norms. For \(0\le s\le 1\), define

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p,\varvec{\rho },s}&: =\left\| {\varvec{\rho }^{\frac{1-s}{p} }\varvec{F}\varvec{\rho }^{\frac{s}{p}}} \right\| _{p}, \end{aligned}$$

where \(s=1/2\) [5] and \(s=1\) are the notable cases

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p,\varvec{\rho },\frac{1}{2}}&: =\left\| {\varvec{\rho }^{\frac{1}{2p}}\varvec{F}\varvec{\rho }^{\frac{1}{2p}}} \right\| _{p}\quad \text {and}\quad \left\| {\varvec{F}} \right\| _{p,\varvec{\rho },1}: =\left\| {\varvec{F}\varvec{\rho }^{\frac{1}{p}}} \right\| _{p}. \end{aligned}$$

The latter feeds into the concentration for typical input states drawn from an ensemble whose average is \(\varvec{\rho }\) (Proposition II.0.1). Even though not applied elsewhere in this paper, we keep the general \(0\le s\le 1\) expression in the following arguments. Uniform smoothness generalizes to the \(\varvec{\rho }\)-weighted p-norm for factorized state \(\varvec{\rho }=\otimes _i \varvec{\rho }_i\). The martingale condition now depends on the state \(\varvec{\rho }_i\).

Proposition II.4.3

(Uniform smoothness for subsystem, weighted). Consider a product state \(\varvec{\rho }= \varvec{\rho }_j\otimes \varvec{\rho }_i\) and matrices \(\varvec{X}, \varvec{Y}\in \mathcal {B}(\mathcal {H}_i\otimes \mathcal {H}_j)\) that satisfy the martingale condition \(\textrm{Tr}_i(\varvec{\rho }_i\varvec{Y}) = 0\) and \(\varvec{X}= \varvec{X}_j\otimes \varvec{I}_i\). For \(p \ge 2\), \(C_p=p-1\),

$$\begin{aligned} \Vert \varvec{X}+ \varvec{Y}\Vert _{p,\varvec{\rho },s}^2\le \Vert \varvec{X}\Vert _{p,\varvec{\rho },s}^2 + C_p\Vert \varvec{Y}\Vert _{p,\varvec{\rho },s}^2. \end{aligned}$$
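As a numerical sanity check of this weighted statement (our own sketch; we use the definition \(\left\| {\varvec{F}} \right\| _{p,\varvec{\rho },s} = \Vert \varvec{\rho }^{\frac{1-s}{p}}\varvec{F}\varvec{\rho }^{\frac{s}{p}}\Vert _p\) from above, with a diagonal product state so that fractional powers are elementary):

```python
import numpy as np

rng = np.random.default_rng(6)

def schatten(A, p):
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

di, dj, p, s = 2, 3, 4, 0.5
Cp = p - 1

def rand_state(dim):
    # diagonal of a random full-rank density matrix
    w = rng.random(dim) + 0.1
    return w / w.sum()

ri, rj = rand_state(di), rand_state(dj)
rho = np.kron(ri, rj)                    # diagonal of the product state rho_i (x) rho_j

def wnorm(A, p, s):
    # weighted norm || rho^{(1-s)/p} A rho^{s/p} ||_p for diagonal rho
    L = np.diag(rho ** ((1 - s) / p))
    R = np.diag(rho ** (s / p))
    return schatten(L @ A @ R, p)

Xj = rng.standard_normal((dj, dj)) + 1j * rng.standard_normal((dj, dj))
X = np.kron(np.eye(di), Xj)

D = di * dj
Y = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
# enforce the weighted martingale condition Tr_i[rho_i Y] = 0
M = np.trace((np.kron(np.diag(ri), np.eye(dj)) @ Y).reshape(di, dj, di, dj),
             axis1=0, axis2=2)
Y -= np.kron(np.eye(di), M)
resid = np.trace((np.kron(np.diag(ri), np.eye(dj)) @ Y).reshape(di, dj, di, dj),
                 axis1=0, axis2=2)

lhs = wnorm(X + Y, p, s) ** 2
rhs = wnorm(X, p, s) ** 2 + Cp * wnorm(Y, p, s) ** 2
```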

The proof is similar; all we need is to modify the monotonicity.

Fact II.5

(Monotonicity w.r.t partial trace). For matrices \(\varvec{X}\) and \(\varvec{Y}\) satisfying \(\textrm{Tr}_i(\varvec{\rho }_i\varvec{Y}) = 0\) and \(\varvec{X}= \varvec{X}_j\otimes \varvec{I}_i\), \(p \ge 2\),

$$\begin{aligned} \left\| {\varvec{X}} \right\| _{p,\varvec{\rho },s}\le \left\| {\varvec{X}+\varvec{Y}} \right\| _{p,\varvec{\rho },s}. \end{aligned}$$

Proof

Once again, plug in the variational expression

$$\begin{aligned} \left\| {\varvec{\rho }^{\frac{1-s}{2p}}_j\varvec{X}_j \varvec{\rho }^{\frac{s}{2p}}_j} \right\| _{p} = \sup _{\left\| {\varvec{B}_j} \right\| _{q}\le 1} \textrm{Tr}\left[\varvec{\rho }^{\frac{1-s}{2p}}_j\varvec{X}_j \varvec{\rho }^{\frac{s}{2p}}_j \varvec{B}^\dagger _j\right] \quad \text {for} \quad 1/p+1/q=1. \end{aligned}$$

Suppose the maximum is attained at some \(\varvec{B}_j\). Then by Proposition II.3,

$$\begin{aligned} \left\| {\varvec{X}+\varvec{Y}} \right\| _{p,\varvec{\rho },s}&= \sup _{\left\| {\varvec{B}} \right\| _{q}\le 1} \textrm{Tr}\left(\varvec{\rho }^{\frac{1-s}{2p}}(\varvec{X}+\varvec{Y}) \varvec{\rho }^{\frac{s}{2p}} \varvec{B}^\dagger \right) \\&\ge \textrm{Tr}\left[ \varvec{\rho }^{\frac{1-s}{2p}}(\varvec{X}+\varvec{Y}) \varvec{\rho }^{\frac{s}{2p}}\cdot \varvec{B}^\dagger _j\otimes \frac{\varvec{\rho }_i^{\frac{1}{q}}}{\Vert {\varvec{\rho }_i^{\frac{1}{q}}} \Vert _{q}}\right] = \left\| {\varvec{X}_j\otimes \varvec{I}_i} \right\| _{p,\varvec{\rho },s}. \end{aligned}$$

In the last inequality, we used the partially traceless assumption \(\textrm{Tr}_i [\varvec{\rho }_i\varvec{Y}] =0\) and that

$$\begin{aligned} \varvec{\rho }^{\frac{1-s}{2p}} (\varvec{X}_j\otimes \varvec{I}_i )\varvec{\rho }^{\frac{s}{2p}} \quad \text {has maximizer} \quad \varvec{B}_j\otimes \frac{\varvec{\rho }_i^{{1}/{q}}}{\Vert {\varvec{\rho }_i^{{1}/{q}}} \Vert _{q}}. \end{aligned}$$

\(\square \)

Combining the monotonicity with Fact I.4, we obtain uniform smoothness (Proposition II.4.3). Moreover, we automatically get a weighted version of a Hypercontractivity-like formula. Let us first define the appropriate operator re-centered w.r.t. the background

as a “shifted” Pauli \(\varvec{\sigma }^{z}\). Accordingly, we shift the conditional expectation

$$\begin{aligned} E_s[\cdot ] := \varvec{I}_s \otimes \textrm{Tr}[\varvec{\rho }_s \cdot ]. \end{aligned}$$

Proposition II.5.1

(Moment estimates for local operator, \(\varvec{\rho }\)-weighted). For an operator \(\varvec{F}\in \mathcal {B}(\mathbb {C}^{d\otimes n})\) on n-qudits, product state \(\varvec{\rho }= \otimes _i \varvec{\rho }_i\), and \(p\ge 2\), \(C_p:= p-1\),

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p,\varvec{\rho },s}^2&\le \sum _{S\subset \{n,\cdots ,1\}} (C_p)^{|S|} \left\| {\varvec{F}_S} \right\| _{p,\varvec{\rho },s}^2 \quad \text {where} \quad \varvec{F}_{S}:=\prod _{s\in S}(1-E_s)\prod _{s'\in S^c} E_{s'}[\varvec{F}]. \end{aligned}$$

The set \(S^c\) is the complement of set S.

3.3.1 Low-particle number subspace

Why did we study product states as the background? Interestingly, they will tell us about concentration when restricting to low-particle-number subspaces. Consider the following two operators: the projector \(\varvec{P}_m\) onto the m-particle subspace of the n-qubit Hilbert space

and the product state

We can control the weighted p-norm for the low-particle subspace, which we care about, by that for the product state, which we can calculate.

Proposition II.5.2

For any operator \(\varvec{F}\),

$$\begin{aligned} \displaystyle \left\| {\varvec{F}} \right\| _{p,\bar{\varvec{P}},s } \le \left\| {\varvec{F}} \right\| _{p,\varvec{\rho }_{\eta },s} \cdot \left( \textrm{Poly}(n,m)\right) ^{1/p}. \end{aligned}$$

Note the factor \(\textrm{Poly}(n,m)\) is mild since it is suppressed as long as \(p\gtrsim \log (\textrm{Poly}(n,m))\).

Proof of Proposition II.5.2

By Stirling’s approximation, the operators obey positive semi-definite order

$$\begin{aligned} \bar{\varvec{P}}_m:= \frac{\varvec{P}_m}{\textrm{Tr}[\varvec{P}_m]} \le \varvec{\rho }_{\eta }\cdot b(n,m)\quad \text {where} \quad b(n,m) = \left( \eta ^m(1-\eta )^{n-m}\left( {\begin{array}{c}n\\ m\end{array}}\right) \right) ^{-1} =\mathcal {O}(\textrm{Poly}(n,m)). \end{aligned}$$

This gives the advertised result by Fact II.6. \(\square \)
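The positive semi-definite ordering is easy to verify numerically (an illustrative sketch; we assume, as the surrounding text suggests, that \(\varvec{\rho }_{\eta }\) is the i.i.d. Bernoulli product state with excitation density \(\eta = m/n\), and we take \(b(n,m)\) to be the reciprocal binomial weight, which is exactly what the ordering requires on the weight-m strings):

```python
import numpy as np
from math import comb

n, m = 10, 3
eta = m / n                                   # assumed excitation density

# diagonal entries indexed by the Hamming weight w of each basis string
w = np.array([bin(x).count("1") for x in range(2 ** n)])
P_bar = (w == m) / comb(n, m)                 # normalized projector P_m / Tr[P_m]
rho_eta = eta ** w * (1 - eta) ** (n - w)     # Bernoulli product state diagonal

b = 1.0 / (eta ** m * (1 - eta) ** (n - m) * comb(n, m))
dominated = bool(np.all(P_bar <= b * rho_eta + 1e-12))
```

Both operators are diagonal in the computational basis, so the comparison is entrywise; by Stirling, the factor b stays polynomially small in n.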

Fact II.6, proved below, is that weighted norms are monotone w.r.t the state. In our application for Trotter error, the Hamiltonian is often particle number preserving, and the following becomes trivial. But for potential applications in other contexts, we include a quick proof for the case when the operator \(\varvec{F}\) and the state \(\varvec{\rho }\) do not commute.

Fact II.6

(Monotonicity of weight). For positive semi-definite operators \(\varvec{\rho }\ge \varvec{\sigma }\ge 0\) (not necessarily normalized),

$$\begin{aligned} \left\| {\varvec{O}} \right\| _{p,\varvec{\rho },s}\ge \left\| {\varvec{O}} \right\| _{p,\varvec{\sigma },s}. \end{aligned}$$

This is closely related to a polynomial version of Lieb’s concavity.

Fact II.7

([11, Theorem 1.1]). For operator \(\varvec{A}\ge 0\), and \(q\ge 1\), \(r\le 1\), the function

$$\begin{aligned} f(\varvec{A}):=\textrm{Tr}[(\varvec{B}^\dagger \varvec{A}^{\frac{1}{q}}\varvec{B})^{rq}] \end{aligned}$$

is concave (and hence monotone) in \(\varvec{A}\).

We can now quickly adapt to our settings to present a proof.

Proof of Fact II.6

$$\begin{aligned} \left\| {\varvec{O}} \right\| _{p,\varvec{\rho }}^p = \textrm{Tr}\left[\left( \varvec{\rho }^{\frac{s}{2p}}\varvec{O}^\dagger \varvec{\rho }^{\frac{1-s}{p}}\varvec{O}\varvec{\rho }^{\frac{s}{2p}}\right) ^{\frac{p}{2}}\right]&\ge \textrm{Tr}\left[\left( \varvec{\rho }^{\frac{s}{2p}}\varvec{O}^\dagger \varvec{\sigma }^{\frac{1-s}{p}}\varvec{O}\varvec{\rho }^{\frac{s}{2p}}\right) ^{\frac{p}{2}}\right] \\&=\textrm{Tr}\left[\left( \varvec{\sigma }^{\frac{1-s}{2p}}\varvec{O}\varvec{\rho }^{\frac{s}{p}}\varvec{O}^\dagger \varvec{\sigma }^{\frac{1-s}{2p}}\right) ^{\frac{p}{2}}\right] \\&\quad \ge \textrm{Tr}\left[\left( \varvec{\sigma }^{\frac{1-s}{2p}}\varvec{O}\varvec{\sigma }^{\frac{s}{p}}\varvec{O}^\dagger \varvec{\sigma }^{\frac{1-s}{2p}}\right) ^{\frac{p}{2}}\right] = \left\| {\varvec{O}} \right\| _{p,\varvec{\sigma },s}^p. \end{aligned}$$

Both inequalities use Fact II.7 for \(q=\frac{p}{1-s}\ge 1\) and \(r = \frac{1-s}{2} \le 1\) and for \(q=\frac{p}{s}\ge 1\) and \(r = \frac{s}{2} \le 1\). The second equality is \(\left\| {\varvec{X}^\dagger \varvec{X}} \right\| _{\frac{p}{2}}= \left\| {\varvec{X}\varvec{X}^\dagger } \right\| _{\frac{p}{2}}\). This is the advertised result. \(\quad \square \)

3.4 Fermionic operators

Uniform smoothness and Hypercontractivity apply to Fermions. Consider the Jordan-Wigner transform

These operators also linearly span the full algebra on n-qubits \(\mathcal {B}(\mathcal {H}(2^n))\) by products \(\prod _s(\varvec{a}_s,\varvec{a}^\dagger _s,\varvec{a}_s\varvec{a}^\dagger _s,\varvec{I}_s)\). In this form, Fermions are not local operators due to the Pauli-Z strings. Fortunately, all we need for uniform smoothness is the martingale property (conditionally zero mean). We derive an analogous 2-norm-like bound with a minor tweak due to the Jordan-Wigner strings. The following result was known in [10, Theorem 4],Footnote 10 but we hope the presented derivation is more transparent. We will also extend it in Corollary II.7.2.

Corollary II.7.1

(Hypercontractivity for Fermions). On n-qubits, consider an operator without terms \(\varvec{a}_i\varvec{a}^\dagger _i\). Expand it \(\varvec{A}= \sum _{S\subset \{n,\cdots ,1\}} \varvec{A}_{S}\) by subsets S indicated by Fermionic operators \(\{\varvec{a}^\dagger ,\varvec{a}\}\). Then, for \(p\ge 2\), \(C_p=p-1\),

$$\begin{aligned} \left\| {\varvec{A}} \right\| _{p}^2&\le \sum _{S\subset \{n,\cdots ,1\}} (C_p)^{|S|} \left\| {\varvec{A}_S} \right\| _{p}^2. \end{aligned}$$

Proof

WLOG, assume the Fermionic operators are ordered such that the larger index appears on the right (e.g., \(\varvec{a}_1\varvec{a}_3\varvec{a}_n\)).

$$\begin{aligned} \left\| {\varvec{A}} \right\| _{p}^2= \left\| {\varvec{a}_1\varvec{A}_{>1}+\varvec{a}^\dagger _1\varvec{A}'_{>1}+\varvec{I}_1\otimes \varvec{B}_{>1}} \right\| _{p}^2&\le \left\| {\varvec{I}_1\otimes \varvec{B}_{>1}} \right\| _{p}^2 +C_p\left\| {\varvec{a}_1\varvec{A}_{>1}+\varvec{a}^\dagger _1\varvec{A}'_{>1}} \right\| _{p}^2. \end{aligned}$$

To complete the induction as in Proposition II.4.2, apply a gauge transformation to change the Jordan-Wigner string so that only \(\varvec{a}_2\) is nontrivial on site 2. Then we can repeat the above inequality. Note that the background \(\varvec{\rho }_{\eta }\) is invariant under gauge transformations, and the Pauli strings of \(\varvec{\sigma }^z\) do not blow up the weighted p-norm. \(\square \)

Example II.7.1

(2-local Fermionic operators).

$$\begin{aligned} \left\| {\sum _{i<j} \alpha _{ij} \varvec{a}_j\varvec{a}_i} \right\| _{p}^2&\le \sum _{i<j}(C_p)^{2} \left| {\alpha _{ij}} \right| ^2\left\| {\varvec{a}_j\varvec{a}_i} \right\| _{p}^2\le \sum _{i<j}(C_p)^{2} \left| {\alpha _{ij}} \right| ^2\Vert {\varvec{a}_i\varvec{a}_j} \Vert ^2 \left\| {\varvec{I}} \right\| _{p}^2. \end{aligned}$$

However, when multiplying Fermionic operators we may get even powers \(\varvec{a}^\dagger _i \varvec{a}_i= (\varvec{I}+\varvec{\sigma }^z_i)/2\), \(\varvec{a}_i\varvec{a}^\dagger _i = (\varvec{I}-\varvec{\sigma }^z_i)/2\), where the Pauli string \(\varvec{\sigma }^z\) cancels. Let us quickly extend to the cases where \(\varvec{\sigma }^z_i\) terms are present (perhaps with a weighted background). Let us formally define the conditional expectation

$$\begin{aligned}&E_s: \mathcal {B}(\{\varvec{a}^\dagger _i,\varvec{a}_i\}_{i=1,\cdots ,n}) \rightarrow \mathcal {B}(\{\varvec{a}^\dagger _i,\varvec{a}_i\}_{i=1,\cdots ,n, i\ne s})=:\mathcal {N}\\&\text {such that}\quad E_s[\varvec{O}_{-s}\varvec{a}^\dagger _s] = E_s[\varvec{O}_{-s}\varvec{a}_s] = E_s[\varvec{O}_{-s}\varvec{O}^{\eta }] = 0\quad \text {and}\quad E_s[\varvec{O}_{-s}] = \varvec{O}_{-s}\\ \quad&\text {for}\quad \varvec{O}_{-s} \in \mathcal {N}. \end{aligned}$$

The conditional expectation maps the full algebra to the subalgebra generated by all but one of the fermions. Intuitively, it removes terms that act non-trivially (as \(\varvec{a}_s,\varvec{a}_s^\dagger \), or \(\varvec{O}^{\eta }_s\)) on site s.

Corollary II.7.2

(Hypercontractivity for Fermions and \(\varvec{O}^{\eta }\)). On n qubits, consider a product state \(\varvec{\rho }_\eta \) diagonal in the computational basis. Then, for \(p\ge 2\), \(C_p =p-1\),

$$\begin{aligned} \left\| {\varvec{A}} \right\| _{p,\varvec{\rho }_\eta }^2&\le \sum _{ S \subset \{n,\cdots ,1\}} (C_p)^{\left| {S} \right| } \left\| {\varvec{A}_S} \right\| _{p,\varvec{\rho }_\eta }^2 \quad \text {where}\quad \varvec{A}_{S}:=\prod _{s\in S}(1-E_s)\prod _{s'\in S^c} E_{s'}[\varvec{A}]. \end{aligned}$$

The proof is also elementary.

Proof

$$\begin{aligned} \left\| {\varvec{A}} \right\| _{p,\varvec{\rho }_\eta }^2&= \left\| {\varvec{a}_1\varvec{A}_{>1}+\varvec{a}^\dagger _1\varvec{A}'_{>1}+\varvec{O}^\eta _1\otimes \varvec{C}_{>1} +\varvec{I}_1\otimes \varvec{B}_{>1}} \right\| _{p,\varvec{\rho }_\eta }^2 \\&\le \left\| {\varvec{I}_1\otimes \varvec{B}_{>1}} \right\| _{p,\varvec{\rho }_\eta }^2 +C_p\left\| {\varvec{a}_1\varvec{A}_{>1}+\varvec{a}^\dagger _1\varvec{A}'_{>1}+\varvec{O}^\eta _1\otimes \varvec{C}_{>1}} \right\| _{p,\varvec{\rho }_\eta }^2. \end{aligned}$$

The remaining gauge-transformation argument follows as in Corollary II.7.1. Note that \(\varvec{O}^{\eta }\) is invariant under gauge transformations. Alternatively, we can take the formal route by manipulating the conditional expectations as in Proposition II.4.2. \(\quad \square \)

4 Non-Random k-Local Hamiltonians

This section presents the main result of this work. We evaluate Hypercontractivity (Sect. 2) for the Trotter error of non-random Hamiltonians.

Theorem II.1

(Trotter error in k-local models). To simulate a k-local Hamiltonian using the \(\ell \)-th order Suzuki formula, a gate complexity of

$$\begin{aligned} G =\Omega \left( \left( \frac{p^{\frac{k}{2}} \left\| {\varvec{H}} \right\| _{(global),2} t}{\epsilon } \right) ^{1/\ell } \Gamma p^{\frac{k-1}{2}} \left\| {\varvec{H}} \right\| _{(local),2} t\right) \ \ {}&\text {ensures}\ \\ \left\| {\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}_{\ell }(t/r)^r} \right\| _{\bar{p}} \le \epsilon . \end{aligned}$$

The p-norm estimate and Proposition II.0.1 imply concentration for typical input states via Markov’s inequality.

Corollary II.1.1

Draw from an orthonormal basis; then

This quickly converts to the trace distance between the pure states

We begin with an instructive example that illustrates the combinatorics (Sect. 3.1). We sketch the proof in Sect. 3.2. In Sects. 3.3 and 3.4, we combine the estimates and conclude the proof with explicit constants in Sect. 3.5. See Sect. 3.7 for the analogous result for Fermions.

4.1 An instructive example

Consider a 2-local Hamiltonian on three subsystems of qubits \(\mathcal {H}= \mathcal {H}_{I_1}\otimes \mathcal {H}_{I_2}\otimes \mathcal {H}_{I_3}\), each of size n/3.

$$\begin{aligned} \varvec{H}= \sum _{\gamma = 1}^{\Gamma } \varvec{H}_{\gamma } = \sum _{i_1\in I_1, i_2\in I_2} \varvec{\sigma }^x_{i_1}\varvec{\sigma }^z_{i_2} + \sum _{i_1\in I_1, i_2\in I_2} \varvec{\sigma }^x_{i_1}\varvec{\sigma }^x_{i_2} + \sum _{i_2\in I_2, i_3\in I_3} \varvec{\sigma }^x_{i_2}\varvec{\sigma }^z_{i_3}. \end{aligned}$$

Let us play around with the first-order product formula. Recall

$$\begin{aligned} \textrm{e}^{\textrm{i}\varvec{H}_\Gamma t} \cdots \textrm{e}^{\textrm{i}\varvec{H}_1 t}- \textrm{e}^{\textrm{i}\sum _{\gamma =1}^\Gamma \varvec{H}_\gamma t} = -\frac{t^2}{2}\sum _{\gamma _2>\gamma _1}[\varvec{H}_{\gamma _2},\varvec{H}_{\gamma _1}]+\mathcal {O}(t^3). \end{aligned}$$
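This leading commutator term can be checked numerically on random Hermitian matrices. Below is a minimal sketch (numpy only; `U` is a hypothetical helper computing \(\textrm{e}^{\textrm{i}\varvec{H}t}\)); it confirms that after adding the \(t^2/2\) commutator correction, the remainder is third order in t, with the sign fixed by the convention \(\textrm{e}^{A}\textrm{e}^{B}-\textrm{e}^{A+B}=[A,B]/2+\cdots \):

```python
import numpy as np

def U(H, t):
    # e^{iHt} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w * t)) @ V.conj().T

rng = np.random.default_rng(1)
d = 8
def rand_herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

H1, H2 = rand_herm(), rand_herm()
C = H2 @ H1 - H1 @ H2                       # [H_2, H_1]

def resid(t):
    # e^{iH_2 t} e^{iH_1 t} - e^{i(H_1+H_2)t} = -(t^2/2)[H_2,H_1] + O(t^3),
    # so adding back (t^2/2)[H_2,H_1] leaves a third-order remainder
    E = U(H2, t) @ U(H1, t) - U(H1 + H2, t)
    return np.linalg.norm(E + (t**2 / 2) * C, 2)

r1, r2 = resid(1e-2), resid(5e-3)
# a third-order remainder shrinks by ~2^3 = 8 when t is halved
assert 6 < r1 / r2 < 10
```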

The leading order \(\mathcal {O}(t^2)\) Trotter error will be a sum of 3-local terms and 1-local terms

$$\begin{aligned} \sum _{\gamma _2>\gamma _1}[\varvec{H}_{\gamma _2},\varvec{H}_{\gamma _1}]&= \sum _{\left| {S_3} \right| =3}\varvec{F}_{S_3} + \sum _{\left| {S_1} \right| =1}\varvec{F}_{S_1}. \end{aligned}$$

The 3-local terms are the “greediest” way to produce long Pauli strings

$$\begin{aligned} \sum _{\left| {S_3} \right| =3} \varvec{F}_{S_3} = - 2\sum _{i_1,i_2,i_3}\varvec{\sigma }^x_{i_1}\varvec{\sigma }^y_{i_2}\varvec{\sigma }^z_{i_3} + 2\sum _{i_1,i'_1,i_2} \varvec{\sigma }^x_{i_1}\varvec{\sigma }^x_{i'_1}\varvec{\sigma }^y_{i_2} \quad \text {and}\quad \left\| {\sum _{\left| {S_3} \right| =3}\varvec{F}_{S_3}} \right\| _{\bar{2}} = \Theta ( n^{3/2}). \quad \text {(greedy)} \end{aligned}$$

No cancellation or collision occurs, and each term is supported on a distinct subset \(\{i_1,i_2,i_3\}\) or \(\{i_1,i_1',i_2\}\). These operators add incoherently (in the Hilbert-Schmidt norm for simplicity). The 1-local terms are more peculiar but turn out to be equally important. They come from terms that overlap on both sites

$$\begin{aligned} \sum _{\left| {S_1} \right| =1}\varvec{F}_{S_1} = \sum _{i_1, i_2} [\varvec{\sigma }^x_{i_1}\varvec{\sigma }^z_{i_2}, \varvec{\sigma }^x_{i_1}\varvec{\sigma }^x_{i_2}]=2\Big (\sum _{i_1}1\Big )\cdot \sum _{i_2}\varvec{\sigma }^y_{i_2} \quad \text {and} \quad \left\| {\sum _{\left| {S_1} \right| =1}\varvec{F}_{S_1}} \right\| _{\bar{2}} = \Theta ( n^{3/2}). \quad \text {(colliding)} \end{aligned}$$

The collision of the same Pauli \(\varvec{\sigma }^x_{i_1}\) leads to a “constructive interference” over site \(i_1\). Consequently, it gives a comparable contribution to the Trotter error, although it has a single sum over \(i_2\). This is not a coincidence; both terms are formally controlled by the advertised quantity

$$\begin{aligned} \left\| {\sum _{\left| {S_1} \right| =1}\varvec{F}_{S_1}} \right\| _{\bar{2}}, \left\| {\sum _{\left| {S_3} \right| =3}\varvec{F}_{S_3}} \right\| _{\bar{2}}=\Theta (\sqrt{n}\cdot n)=\Theta \left( \Vert {\varvec{H}} \Vert _{(1),2}\cdot \Vert {\varvec{H}} \Vert _{(0),2}\right). \end{aligned}$$

From this example, we can anticipate that a formal proof will require (1) extracting the local quantities \(\Vert {\varvec{H}} \Vert _{(1),2}\) and \(\Vert {\varvec{H}} \Vert _{(0),2}\) from the nested commutators via Hypercontractivity and (2) dealing with the higher-order time dependence.
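The matching \(\Theta (n^{3/2})\) scalings of the greedy and colliding families can be reproduced at the level of Pauli coefficients alone. A minimal sketch follows, keeping only the first greedy family and dropping phase factors; since distinct Pauli strings are orthonormal in the normalized Hilbert-Schmidt inner product, \(\bar{2}\)-norms reduce to \(\ell _2\)-norms of the coefficient vectors:

```python
import math

def bar2_norm(coeffs):
    # distinct Pauli strings are orthonormal in the normalized
    # Hilbert-Schmidt inner product: the bar-2 norm is the l2 norm
    # of the coefficient vector
    return math.sqrt(sum(c * c for c in coeffs.values()))

def greedy_and_colliding(n):
    m = n // 3   # size of each subsystem I_1, I_2, I_3
    # greedy family: distinct strings sigma^x_{i1} sigma^y_{i2} sigma^z_{i3},
    # each with coefficient 2
    greedy = {(i1, i2, i3): 2.0
              for i1 in range(m) for i2 in range(m) for i3 in range(m)}
    # colliding family: single-site strings sigma^y_{i2}; the sum over i_1
    # piles up into a single coefficient 2m per string
    colliding = {i2: 2.0 * m for i2 in range(m)}
    return bar2_norm(greedy), bar2_norm(colliding)

ratios = []
for n in (30, 60, 120):
    g, c = greedy_and_colliding(n)
    ratios.append((g / n**1.5, c / n**1.5))   # both constant: Theta(n^{3/2})
```

Despite having a single free index, the colliding family's piled-up coefficients give it the same \(n^{3/2}\) growth as the greedy family.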

4.2 Proof outline

With the above example in mind, we sketch the proof strategy as follows. Recall that for any product formula with ordering \(\gamma (j)\), weights \(a_j\), and a total of J exponentials, the general Trotter error can be represented as a time-ordered exponential

$$\begin{aligned} \prod _{j=1}^{J} \textrm{e}^{\textrm{i}a_j\varvec{H}_{\gamma (j)} t} = \textrm{e}^{\textrm{i}a_J\varvec{H}_{\gamma (J)} t}\cdots \textrm{e}^{\textrm{i}a_1 \varvec{H}_{\gamma (1)} t}= \exp _{\mathcal {T}}(\textrm{i}\int (\varvec{\mathcal {E}} + \varvec{H})dt). \end{aligned}$$

The error \(\varvec{\mathcal {E}}\) is time-dependent and takes a commutator form

$$\begin{aligned} \varvec{\mathcal {E}}=\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma , t) :&= \sum ^{J}_{j=1}\left( \prod _{k=j+1}^{J} \textrm{e}^{a_k\mathcal {L}_{\gamma (k)} t} [a_j\varvec{H}_{\gamma (j)}] -a_j\varvec{H}_{\gamma (j)}\right) \quad \text {where}\quad \nonumber \\ \mathcal {L}_{\gamma }[O]:=\textrm{i}[\varvec{H}_\gamma ,O]. \end{aligned}$$
(3.1)

The particular form depends on the choice of ordering and weights, but fortunately, the precise values of the coefficients \(a_j\) will not matter. For the \(\ell \)-th order Suzuki formulas that we focus on, all we need is the crude uniform bound \(\left| {a_k} \right| \le 1\) and the fact that the total formula consists of \(\Upsilon =2\cdot 5^{\ell /2-1}\) stages, so that \(J= \Upsilon \cdot \Gamma \). Our combinatorial argument takes norms everywhere and does not rely on delicate cancellations. Our proof will “beat the error \(\varvec{\mathcal {E}}\) to death” by Taylor expansion (from right to left).

Fact III.2

(Taylor expansion [18, Theorem 10]). For any order \(g'\),

$$\begin{aligned} \textrm{e}^{\mathcal {L}_J t}\cdots \textrm{e}^{\mathcal {L}_{j+1} t}&= \sum ^{g'-1}_{g=1} \sum _{g_J+\cdots +g_{j+1}=g-1}\mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1} \frac{t^{g-1}}{g_J!\cdots g_{j+1}!}\\&+\sum _{m=j+1}^{J} \textrm{e}^{\mathcal {L}_J t}\cdots \textrm{e}^{\mathcal {L}_{m+1} t} \int _0^t dt_1 \sum _{g_m+\cdots +g_{j+1}=g'-1,g_m\ge 1}\textrm{e}^{\mathcal {L}_{m} t_1} \mathcal {L}^{g_m}_m\cdots \mathcal {L}^{g_{j+1}}_{j+1} \frac{(t-t_1)^{g_m-1}t^{g'-g_m-1}}{(g_m-1)!\cdots g_{j+1}!}. \end{aligned}$$

The exponent \(g-1\) will be used consistently in the following. Setting \(\mathcal {L}_j:=a_j\mathcal {L}_{\gamma (j)}\), the Taylor expansion gives a formal expansion of the error in powers of the time t

$$\begin{aligned} \varvec{\mathcal {E}}= \sum _{g=\ell +1}^{g'-1} \varvec{\mathcal {E}}_g + \varvec{\mathcal {E}}_{\ge g'} \quad \text {where}\quad \varvec{\mathcal {E}}_g = \mathcal {O}(t^{g-1})\quad \text {and}\quad \varvec{\mathcal {E}}_{\ge g'} = \mathcal {O}(t^{g'-1}). \end{aligned}$$

Each g-th order term \(\varvec{\mathcal {E}}_g\) is a sum of nested commutators, and we control its p-norm (Sect. 3.3). We will evaluate Hypercontractivity through rather involved combinatorics to extract the local quantities \(\Vert {\varvec{H}} \Vert _{(1),2}\) and \(\Vert {\varvec{H}} \Vert _{(0),2}\). Note that we will use the version we derived (Proposition II.4.2)

$$\begin{aligned} \left\| {\varvec{F}} \right\| _{p}^2&\le \sum _{S\subset \{m,\cdots ,1\}} (C_p)^{|S|} \left\| {\varvec{F}_S} \right\| _{p}^2. \end{aligned}$$

This will straightforwardly generalize to the case of Fermions (Sect. 3.7) and is not restricted to the case of qubits. See Sect. 3.5 for comments on how much constant overhead improvement is possible using the other Hypercontractivity \(\left\| {\varvec{F}} \right\| _{\bar{p}}^2 \le \sum _{S} C_p^{\left| {S} \right| } \left\| {\varvec{F}_S} \right\| _{\bar{2}}^2\) (Proposition II.1).

We handle the edge case, the \(g'\)-th order term \(\varvec{\mathcal {E}}_{\ge g'}\), in Sect. 3.4. Indeed, bounding the infinite series term by term would give a divergent result, so we must halt the expansion at an appropriate order \(g'\). We combine the estimates and apply Markov’s inequality in Sect. 3.5.

4.3 Bounds on the g-th order

We proceed by controlling each g-th order polynomial in (3.1) by Hypercontractivity (Proposition II.1.1). We write \(\mathcal {L}_j:=a_j\mathcal {L}_{\gamma (j)}\) to ease notation

$$\begin{aligned} \left\| {\varvec{\mathcal {E}}_g} \right\| _{p}^2&= \left\| {\sum ^{J}_{j=1} \sum _{g_J+\cdots +g_{j+1}=g-1}\mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1} [\varvec{H}_{j}]\frac{t^{g-1}}{g_J!\cdots g_{j+1}!}} \right\| _{p}^2 \nonumber \\&\le \sum _{S\subset \{n,\cdots , 1\}} (C_p)^{|S|} \left\| {\left[ \sum ^{J}_{j=1} \sum _{g_J+\cdots +g_{j+1}=g-1}\mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1} [\varvec{H}_{j}]\frac{t^{g-1}}{g_J!\cdots g_{j+1}!}\right] _S} \right\| _{p}^2 \nonumber \\&\le (C_p)^{g(k-1)+1}\Upsilon (t\Upsilon )^{2(g-1)} \nonumber \\ {}&\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\gamma _{g-1}}^\Gamma \cdots \sum _{\gamma _0}^\Gamma \left\| {\left[ \mathcal {L}_{\gamma _{g-1}}\mathcal {L}_{\gamma _{g-2}}\cdots \mathcal {L}_{\gamma _1} [\varvec{H}_{\gamma _0}]\right] _S} \right\| _{p}\right) ^2. \end{aligned}$$
(3.2)

The first inequality is Hypercontractivity. The second inequality uses a uniform bound on the locality, \(\left| {S} \right| \le g(k-1)+1\), and applies a brutal triangle inequality; it also expresses \(\mathcal {L}_j\) as \(a_j\mathcal {L}_{\gamma (j)}\) and \(\varvec{H}_j\) as \(a_j\varvec{H}_{\gamma (j)}\) and uses \(\left| {a_j} \right| \le 1\). We also symmetrize the sum over the terms \(\mathcal {L}_{\gamma }\) by throwing in extra terms. This costs an extra factor of \((g-1)!\) (which cancels the factor \(1/(g-1)!\) in the exponential) due to the possible permutations of a \((g-1)\)-th order term. For example, consider a particular term

$$\begin{aligned} \textrm{e}^{\mathcal {L}_3t}\textrm{e}^{\mathcal {L}_2t}\textrm{e}^{\mathcal {L}_1t}&= \cdots + \mathcal {L}_3 \mathcal {L}_2 \mathcal {L}_1 t^3+\cdots \\ \textrm{e}^{(\mathcal {L}_3+\mathcal {L}_2+\mathcal {L}_1)t}&= \cdots + \mathcal {L}_3 \mathcal {L}_2 \mathcal {L}_1 \frac{t^3}{3!}+\cdots . \end{aligned}$$

The number of stages \(\Upsilon \) arises because each term \(\mathcal {L}_{\gamma }\) or \(\varvec{H}_{\gamma }\) appears \(\Upsilon \) times.
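The \((g-1)!\) bookkeeping in the example above can be checked symbolically by expanding both sides as formal word series. A small sketch in pure Python, where words are tuples of letters and each letter carries one power of t:

```python
import math
from fractions import Fraction
from itertools import product

ORDER = 3            # truncate every series at total degree t^3
LETTERS = (1, 2, 3)  # formal symbols for L_1, L_2, L_3

def multiply(p, q):
    # product of two truncated word series {word: coefficient}
    out = {}
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            if len(w1) + len(w2) <= ORDER:
                w = w1 + w2
                out[w] = out.get(w, Fraction(0)) + c1 * c2
    return out

def exp_single(j):
    # e^{L_j t}: the word j^g appears with coefficient 1/g!
    return {(j,) * g: Fraction(1, math.factorial(g)) for g in range(ORDER + 1)}

def exp_sum():
    # e^{(L_1+L_2+L_3)t}: every length-g word appears with coefficient 1/g!
    out = {}
    for g in range(ORDER + 1):
        for w in product(LETTERS, repeat=g):
            out[w] = Fraction(1, math.factorial(g))
    return out

lhs = multiply(multiply(exp_single(3), exp_single(2)), exp_single(1))
# the word L_3 L_2 L_1 at order t^3: coefficient 1 in the product of
# exponentials, but only 1/3! in the exponential of the sum
assert lhs[(3, 2, 1)] == 1
assert exp_sum()[(3, 2, 1)] == Fraction(1, 6)
```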

The main lemma of this section is the following recursive estimate for one layer of commutators \(\sum _{\gamma } \mathcal {L}_\gamma \). This is effectively calculating a certain “2–2 norm” for the commutator \(\sum _{S_2} \mathcal {L}_{S_2}\), where the “2-norm” is \( \sum _{S_1} \left( \sum _\alpha \left\| {[\varvec{O}]^{\alpha }_{S_1}} \right\| _{p}\right)^2. \) We will keep this at the level of an analogy to avoid introducing extra notation.

Lemma II.3

(Effective 2–2 norm of the commutator). For any set of operators \(\{\varvec{O}^{\alpha }\}_{\alpha }\),

$$\begin{aligned} \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\gamma }\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha ]\right] _S} \right\| _{p}\right) ^2&\le \lambda (k)^2(\left| {S} \right| _{max})^{2k} \cdot \sum _{S_1\subset \{n,\cdots , 1\}} \left( \sum _{\alpha }\left\| {[\varvec{O}^\alpha ]_{S_1}} \right\| _{p}\right) ^2 \end{aligned}$$

where \(\mathcal {L}_{\gamma }[\varvec{O}^\alpha ]\) is at most \(\left| {S} \right| _{max}\)-local and

$$\begin{aligned} \lambda (k):=\frac{2^{k/2+1}}{(k-1)!} \sum _{k_{f}=1}^k \frac{2^{k_{f}/2}}{{(k-k_{f})!}}\left\| {\varvec{H}} \right\| _{(k_{f}),2}. \end{aligned}$$

Assuming Lemma II.3, iterating it \((g-1)\) times gives the estimate

$$\begin{aligned} (cont.)\quad \left\| {\varvec{\mathcal {E}}_g} \right\| _{p}^2&\le (C_p)^{g(k-1)+1} (\Upsilon t)^{2(g-1)} \bigg ((g(k-1)+1)\cdots (2(k-1)-1)\bigg )^{2k}\nonumber \\ {}&\lambda (k)^{2(g-1)}\cdot \left\| {\varvec{H}} \right\| _{(global),2}^2\left\| {\varvec{I}} \right\| _{p}^2 \nonumber \\&\le (C_p)^{g(k-1)+1} g^{2gk}\left( c(k)\Upsilon t\right) ^{2(g-1)}\cdot \left\| {\varvec{H}} \right\| _{(global),2}^2\left\| {\varvec{I}} \right\| _{p}^2. \end{aligned}$$
(3.3)

The first inequality also evaluates the last sum over the Hamiltonian terms \(\varvec{H}_{\gamma _0}\) by

$$\begin{aligned} \sum _{S\subset \{n,\cdots , 1\}} \left\| {\left[ \varvec{H}_{\gamma _0}\right] _S} \right\| _{p}^2 \le \left\| {\varvec{H}} \right\| _{(global),2}^2\left\| {\varvec{I}} \right\| _{p}^2. \end{aligned}$$

The last inequality uses \(g(k-1)+1\le g k\) and hides constants depending only on k in the value c(k)

$$\begin{aligned} c(k):=k^{k}\lambda (k). \end{aligned}$$

The expression (3.3) yields the desired estimate for the g-th order error term. Unfortunately, the power series is not summable due to the super-exponential factor \(g^{2gk}\). We will later truncate the expansion at a properly chosen order \(g'\) (Sect. 3.4).

What remains in this section is to prove Lemma II.3. As hinted at in the example (Sect. 3.1), we need to systematically handle both the cases that grow greedily and those with collisions. Let us identify how taking commutators may produce other sets S (Fig. 10).

$$\begin{aligned} \mathcal {L}_{\gamma }[\varvec{O}_{S_1}] = \sum _S \left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S. \end{aligned}$$

Let \(S_2(\gamma )\) be the support of \(\gamma \).

  • If the sets \(S_1\) and \(S_2(\gamma )\) are disjoint, the commutator vanishes.

  • (I) If they overlap on a single site, there is no cancellation. The resulting set is the union \(S=S_2(\gamma )\cup S_1\). This was the “greedy” term in the example.

  • (II) If they overlap on more than one site, we may lose all but one site. The resulting set S is a subset of the union \(S\subset S_2(\gamma )\cup S_1\).

To account for the above, we rewrite the sets \(S,S_2,S_1\) in terms of the components

$$\begin{aligned} S&= S_0\perp \hspace{0.8cm} S_{f} \perp S_+, \\ S_2&= \hspace{0.3cm} S_-\perp S_{f}\perp S_+, \\ S_1&= S_0\perp S_-\perp S_{f}. \end{aligned}$$

where

  • \(S_0:= S_1/S_2\) are the “untouched” sites.

  • \(S_-\subset S_1\cap S_2\) are the sites that got canceled due to collision.

  • \(S_f\) are the sites that stayed in all the sets \(S, S_1,S_2\). We must have \(\left| {S_f} \right| \ge 1\).

  • \(S_+:=S_2/S_1\) are the new sites.

We will use this decomposition back and forth throughout the proof.

4.3.1 “Greedy growth”: overlapping at 1 site

To get familiar with the manipulations and notation, we work out the simpler case where the sets overlap on a single site

$$\begin{aligned} \left| {S_1\cap \gamma } \right| =1\quad \text {for}\quad \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]. \end{aligned}$$

We will see that the growth due to the commutator \(\sum _{\gamma } \mathcal {L}_{\gamma }\) is controlled by the succinct norm \(\left\| {\varvec{H}} \right\| _{(local),2}^2\), multiplied by some function of the locality \(\left| {S} \right| \). To ease the notation, we will also write \(\gamma \) for the set \(S_2(\gamma )\).

$$\begin{aligned}&\sum _{S\subset \{n,\cdots , 1\}} \left(\sum _{\gamma } \sum _{\begin{array}{c} \left| {S_1\cap \gamma } \right| =1 \\ S_1\cup \gamma =S \end{array}} \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \right\| _{p}\right)^2 \\&\quad = \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\begin{array}{c} \left| {S_1\cap S_2} \right| =1 \\ S_1\cup S_2=S \end{array}} \sum _{\gamma \sim S_2} \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \right\| _{p}\right)^2 \\&\quad \le \sum _{S\subset \{n,\cdots , 1\}} \sum _{\begin{array}{c} \left| {S_1\cap S_2} \right| =1 \\ S_1\cup S_2=S \end{array}} \left(\sum _{\gamma \sim S_2} \left\| {\mathcal {L}_{\gamma }[\varvec{O}_{S_1}]} \right\| _{p}\right)^2\cdot \left(\sum _{\begin{array}{c} \left| {S'_1\cap S'_2} \right| =1 \\ S'_1\cup S'_2=S \end{array}} 1\right) \\&\quad \le \sum _{S_1} \sum _{S_2: \begin{array}{c} \left| {S_1\cap S_2} \right| =1 \end{array}} \left( \sum _{\gamma \sim S_2}2\Vert {\varvec{H}_{\gamma }} \Vert \right)^2\left\| {\varvec{O}_{S_1}} \right\| _{p}^2\cdot \left(\left( {\begin{array}{c}\left| {S} \right| \\ \left| {S_1} \right| \end{array}}\right) \left| {S_1} \right| \right) \\&\quad \le 4 \cdot \left\| {\varvec{H}} \right\| _{(local),2}^2 \cdot \max _{\left| {S} \right| ,\left| {S_1} \right| }\left( {\begin{array}{c}\left| {S} \right| \\ \left| {S_{1}} \right| \end{array}}\right) \left| {S_{1}} \right| ^2\cdot \left( \sum _{S_1} \left\| {\varvec{O}_{S_1}} \right\| _{p}^2\right) . \end{aligned}$$

The first inequality is Cauchy-Schwarz. The second inequality rearranges the sum over \(S_1,S_2\), uses Hölder’s inequality, and then counts the combinations through which the two sets \(S_1,S_2\) can give rise to the set S. In the last inequality, we make the 2-norm \(\left\| {\varvec{H}} \right\| _{(local),2}\) explicit by

$$\begin{aligned} \sum _{S_2:\left| {S_1\cap S_2} \right| =1} \Vert {\varvec{H}_{S_2}} \Vert ^2&= \sum _{s_1\in S_1}\sum _{S_+\cap S_1 = \emptyset }\Vert {\varvec{H}_{S_+\cup \{s_1\}}} \Vert ^2 \\&\le \left| {S_1} \right| \cdot \max _{s_1}\sum _{S_+\cap S_1 = \emptyset }\Vert {\varvec{H}_{S_+\cup \{s_1\}}} \Vert ^2= \left| {S_1} \right| \cdot \left\| {\varvec{H}} \right\| _{(local),2}^2. \end{aligned}$$

We also use a uniform upper bound for the combinatorial function of the set sizes \(\left| {S} \right| ,\left| {S_1} \right| \).
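For concreteness, the norms \(\left\| {\varvec{H}} \right\| _{(0),2}\) and \(\left\| {\varvec{H}} \right\| _{(1),2}\) can be computed by brute force from the interaction weights \(b_S:=\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert \). Below is a minimal sketch; `interaction_norm` is a hypothetical helper following the definition \(\left\| {\varvec{H}} \right\| _{(k_f),2} = \sqrt{\max _{|S_f|=k_f}\sum _{S\supset S_f} b_{S}^2}\) stated in Proposition II.3.1 below:

```python
from itertools import combinations

def interaction_norm(b, k_f):
    # b: dict mapping frozenset support S -> b_S = sum_{gamma ~ S} ||H_gamma||;
    # k_f = 0 recovers the global norm, k_f = 1 the local norm
    if k_f == 0:
        return sum(v * v for v in b.values()) ** 0.5
    sites = sorted(set().union(*b))
    return max(sum(v * v for S, v in b.items() if set(S_f) <= S) ** 0.5
               for S_f in combinations(sites, k_f))

# nearest-neighbour chain of 10 sites with unit-norm bond terms
b = {frozenset({i, i + 1}): 1.0 for i in range(9)}
global_norm = interaction_norm(b, 0)   # sqrt(9): extensive in system size
local_norm = interaction_norm(b, 1)    # sqrt(2): each site meets two bonds
```

The gap between the two (extensive vs. O(1)) is exactly what the local estimate exploits.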

It is instructive to compare with the \(1-1\)-norm calculation without invoking Hypercontractivity.

$$\begin{aligned} \sum _{S\subset \{n,\cdots , 1\}} \sum _{\gamma } \sum _{\begin{array}{c} \left| {S_1\cap \gamma } \right| =1 \\ S_1\cup \gamma =S \end{array}} \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \right\|&\le \sum _{S_1} \sum _{S_2: \begin{array}{c} \left| {S_1\cap S_2} \right| =1 \end{array}} \sum _{\gamma \sim S_2}2\Vert {\varvec{H}_{\gamma }} \Vert \left\| {\varvec{O}_{S_1}} \right\| \\&\le 2 \left\| {\varvec{H}} \right\| _{(local),1} \max _{S_1} \left| {S_1} \right| \cdot \sum _{S_1}\left\| {\varvec{O}_{S_1}} \right\| . \end{aligned}$$

This is the 1-norm local quantity featured in the worst-case Trotter error analysis [18].

Fig. 10

(Left) Intuitively, commuting with another operator produces new occupancy. (Middle) Unfortunately, occasional cancellations complicate the calculation. (Right) For bookkeeping, we label the possible sets that can be produced by acting with the commutator \(\mathcal {L}_{S_2}\) on some operator \(\varvec{O}_{S_1}\). In their intersection, some subset \(S_-\) becomes the identity, and some subset \(S_{f}\) remains occupied. For Pauli strings, the fixed subset \(S_{f}\) must be non-empty; in the Fermionic case, the fixed subset \(S_{f}\) may be empty

4.3.2 Cancellation and collision due to larger overlap

The case with cancellation requires delicate notation. Suppose we lose some set \( S_- \subset S_1\cap S_2\) in the overlap due to collision and gain a new set \(S_2/S_1=:S_+\) (Fig. 10). The combinatorics will be organized by the size of the fixed set \(k_{f}:=\left| {S_{f}} \right| = 1,\ldots , k\).

Proposition II.3.1

(Fixed \(k_f\)). For a value of \(\left| {S_{f}} \right| =k_{f} \in \{ 1,\ldots , k\}\),

$$\begin{aligned}&\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\gamma } \sum _{\begin{array}{c} \left| {S_{f}} \right| =k_{f} \\ S_0\perp S_{f} \perp S_+ = S \end{array}} \mathbb {1}(S_2\sim \gamma ) \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \right\| _{p}\right) ^2 \\&\quad \le 2^{k+k_{f}+2} \cdot \left( \frac{(\left| {S} \right| _{max})^k}{(k-k_{f})!(k-1)!}\right) ^2 \cdot \Vert {\varvec{H}} \Vert _{(k_{f}),2}^2 \cdot \left( \sum _{S_1}\left\| {\varvec{O}_{S_1}} \right\| _{p}^2\right) , \end{aligned}$$

where the indicator \(\mathbb {1}(S_2\sim \gamma )\) checks if the set \(S_2\) coincides with the support of \(\gamma \) and

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(k_f),2} := \sqrt{ \max _{|S_f|=k_f } \sum _{S\supset S_f} b_{S}^2} \quad \text {where}\quad b_{S}:=\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert . \end{aligned}$$

To connect to our notation in the main text, for \(k_f= 0\), this is what we defined as the global norm \(\left\| {\varvec{H}} \right\| _{(0),2} = \left\| {\varvec{H}} \right\| _{(global),2}\); for \(k_f= 1\), this is what we defined as the local norm \(\left\| {\varvec{H}} \right\| _{(1),2} = \left\| {\varvec{H}} \right\| _{(local),2}\). The norms for \(k_f\ge 2\) are more of a proof artifact. To be careful with the distinction between an operator \(\varvec{O}\) and its local component \(\varvec{O}_{S}\), we first note the following bound.

Fact II.4

For any set S and operator \(\varvec{O}\), we have

$$\begin{aligned} \left\| { \prod _{s \in S} (1-E_s) [\varvec{O}]} \right\| _{p} \le 2^{\left| {S} \right| }\left\| {\varvec{O}} \right\| _{p} \quad \text {where}\quad E_s :=\varvec{I}_{s}\frac{\textrm{Tr}_{s}[\cdot ]}{\textrm{Tr}[\varvec{I}_{s}]}. \end{aligned}$$

Proof of Fact II.4

Use the monotonicity of the partial trace (Fact II.3), i.e., the conditional expectation \(E_s\) is norm non-increasing. The factor \(2^{\left| {S} \right| }\) is due to a brutal triangle inequality: each factor \(1-E_s\) contributes two terms. \(\quad \square \)
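Both claims of the proof (norm monotonicity of \(E_s\) and the factor 2 per site) admit a quick numeric check. A sketch with a single site \(s=1\) of dimension 2 and unnormalized Schatten 4-norms (both claims are normalization-independent):

```python
import numpy as np

def schatten(M, p):
    # unnormalized Schatten p-norm via singular values
    return float((np.linalg.svd(M, compute_uv=False) ** p).sum() ** (1.0 / p))

def E1(O, d1=2, d2=4):
    # conditional expectation on site 1: E_1[O] = I_1 (x) Tr_1[O] / Tr[I_1]
    red = np.trace(O.reshape(d1, d2, d1, d2), axis1=0, axis2=2) / d1
    return np.kron(np.eye(d1), red)

rng = np.random.default_rng(2)
p = 4
worst_E, worst_1mE = 0.0, 0.0
for _ in range(100):
    O = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
    nO = schatten(O, p)
    worst_E = max(worst_E, schatten(E1(O), p) / nO)          # expect <= 1
    worst_1mE = max(worst_1mE, schatten(O - E1(O), p) / nO)  # expect <= 2

assert worst_E <= 1 + 1e-9     # E_s is norm non-increasing
assert worst_1mE <= 2 + 1e-9   # hence ||(1-E_s)[O]||_p <= 2 ||O||_p
```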

Proof of Proposition II.3.1

$$\begin{aligned}&\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\begin{array}{c} \left| {S_{f}} \right| =k_{f} \\ S = S_0\perp S_{f} \perp S_+ \end{array}} \sum _{\gamma \sim S_2}\left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \right\| _{p}\right) ^2 \\&\quad \le \sum _{S\subset \{n,\cdots , 1\}} \left( 2^{\left| {S_{f}} \right| } \sum _{S_{f},S_+}\sum _{S_-}\sum _{\gamma \sim S_2} 2\Vert {\varvec{H}_{\gamma }} \Vert \left\| {\varvec{O}_{S_{f}S_-S_0}} \right\| _{p}\right) ^2 \\&\quad \le (4^{k_f+1})\cdot \sum _{S\subset \{n,\cdots , 1\}} \sum _{S_{f},S_+}\left( \sum _{S_-}b_{S_2} \left\| {\varvec{O}_{S_{f}S_-S_0}} \right\| _{p}\right) ^2\cdot \left(\sum _{S'_0\perp S'_{f} \perp S'_+=S} 1\right) \\&\quad \le (\cdot )(\cdot )\sum _{S\subset \{n,\cdots , 1\}} \sum _{S_{f},S_+}\left( \sum _{S_-}b_{S_fS_-S_+}^2\right) \left( \sum _{S'_-} \left\| {\varvec{O}_{S_{f}S'_-S_0}} \right\| _{p}^2\right) \\&\quad = (\cdot )(\cdot ) \sum _{S_{f}}\left( \sum _{S_-,S_+}b_{S_fS_-S_+}^2\right) \left( \sum _{S'_-,S_0}\left\| {\varvec{O}_{S_{f}S'_-S_0}} \right\| _{p}^2\right) \\&\quad \le 2^{k+k_{f}+2} \cdot \left( \frac{(\left| {S} \right| _{max})^k}{(k-k_{f})!(k-1)!}\right) ^2 \cdot \Vert {\varvec{H}} \Vert _{(k_{f}),2}^2 \cdot \left( \sum _{S_1}\left\| {\varvec{O}_{S_1}} \right\| _{p}^2\right) . \end{aligned}$$

The first inequality uses that \(\Vert {\left[ \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]\right] _S} \Vert _{p}= \Vert {\prod _{s \in S_f} (1-E_s) \mathcal {L}_{\gamma }[\varvec{O}_{S_1}]} \Vert _{p}\le 2^{\left| {S_f} \right| }\Vert {\mathcal {L}_{\gamma }[\varvec{O}_{S_1}]} \Vert _{p}\) via Fact II.4. The second inequality evaluates \(\sum _{\gamma \sim S} \Vert {\varvec{H}_{\gamma }} \Vert = b_{S}\) and uses Cauchy-Schwarz w.r.t. the sum over the sets \(S_{f}, S_+, S_0\) associated with a given set S. The third inequality uses Cauchy-Schwarz w.r.t. the sum over the sets \(S_-\). We also evaluate the elementary sum (the last inequality here uses that the largest term is attained at \(\left| {S_+} \right| = k- \left| {S_f} \right| \))

$$\begin{aligned} \sum _{S'_0\perp S'_{f} \perp S'_+=S} 1 = \sum _{\left| {S_+} \right| =0}^{k-\left| {S_{f}} \right| } \left( {\begin{array}{c}\left| {S} \right| \\ \left| {S_{f}} \right| \end{array}}\right) \left( {\begin{array}{c}\left| {S} \right| -\left| {S_{f}} \right| \\ \left| {S_+} \right| \end{array}}\right)&= \sum _{\left| {S_+} \right| =0}^{k-\left| {S_{f}} \right| } \frac{\left| {S} \right| !}{\left| {S_{f}} \right| !\left| {S_+} \right| !(\left| {S} \right| -\left| {S_+} \right| -\left| {S_{f}} \right| )!} \\&\le \sum _{\left| {S_+} \right| =0}^{k-\left| {S_{f}} \right| } \frac{\left| {S} \right| ^{\left| {S_+} \right| }}{\left| {S_{+}} \right| !}\frac{\left| {S} \right| ^{\left| {S_{f}} \right| }}{\left| {S_{f}} \right| !}\\&\le (k-k_{f}+1) \frac{\left| {S} \right| ^{k}}{(k-k_{f})!k!}. \end{aligned}$$

The equality rearranges the sum. The last inequality of the main chain uses the following estimates

$$\begin{aligned} \max _{S_{f}} \left( \sum _{S_-,S_+} b_{S_fS_-S_+}^2\right)&=\max _{S_{f}} \left( \sum _{S'}\sum _{S_-\cup S_+=S'}b_{S_fS_-S_+}^2\right) \\&\le 2^{k-k_{f}} \cdot \Vert {\varvec{H}} \Vert _{(k_{f}),2}^2, \end{aligned}$$

and

$$\begin{aligned} \sum _{S_{f},S'_-,S_0}\left\| {\varvec{O}_{S_{f}S'_-S_0}} \right\| _{p}^2\le \sum _{S_1} \sum _{S_{f}\perp S'_-\perp S_0=S_1}\left\| {\varvec{O}_{S_1}} \right\| _{p}^2&\le \sum _{\left| {S'_-} \right| =0}^{k-k_{f}} \left( {\begin{array}{c}\left| {S_1} \right| \\ \left| {S_{f}} \right| \end{array}}\right) \left( {\begin{array}{c}\left| {S_1} \right| -\left| {S_{f}} \right| \\ \left| {S'_-} \right| \end{array}}\right) \cdot \sum _{S_1}\left\| {\varvec{O}_{S_1}} \right\| _{p}^2 \\&\le (k-k_{f}+1) \frac{(\left| {S} \right| _{max})^{k}}{(k-k_{f})!k!}\cdot \sum _{S_1}\left\| {\varvec{O}_{S_1}} \right\| _{p}^2. \end{aligned}$$

These, together with the hidden constants \((\cdot )(\cdot )\), give the final prefactors. \(\square \)
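The counting identity at the start of the elementary sum can be confirmed by brute force. A small sketch enumerating the ordered splittings \(S = S_0\perp S_f\perp S_+\) directly:

```python
from itertools import combinations
from math import comb

def count_partitions(n_sites, k, k_f):
    # brute-force count of ordered splittings S = S_0 | S_f | S_+ of an
    # n_sites-element set, with |S_f| = k_f and |S_+| ranging over 0..k-k_f
    total = 0
    for S_f in combinations(range(n_sites), k_f):
        rest = [s for s in range(n_sites) if s not in S_f]
        for m in range(k - k_f + 1):
            total += sum(1 for _ in combinations(rest, m))
    return total

def closed_form(n_sites, k, k_f):
    # sum_{|S_+|=0}^{k-k_f} C(|S|, k_f) * C(|S|-k_f, |S_+|)
    return sum(comb(n_sites, k_f) * comb(n_sites - k_f, m)
               for m in range(k - k_f + 1))

checks = [(n, k, k_f)
          for n in (5, 7) for k in (2, 3) for k_f in range(1, k + 1)]
assert all(count_partitions(*t) == closed_form(*t) for t in checks)
```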

We can now prove the main lemma by summing over the set sizes \(k_f = 1,\ldots , k\).

Proof of Lemma II.3

$$\begin{aligned}&\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\gamma }\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha ] \right] _S} \right\| _{p}\right) ^2\\&\quad = \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{S_2,S_1} \sum _{\gamma }\mathbb {1}(\gamma \sim S_2)\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_1}]\right] _S} \right\| _{p}\right) ^2 \\&\quad = \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{k_f=1}^k\sum _{\begin{array}{c} \left| {S_{f}} \right| =k_{f} \\ S = S_0\perp S_{f} \perp S_+ \end{array}}\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_{f}S_-S_0}]\right] _S} \right\| _{p}\right) ^2 \\&\quad \le \left( \sum _{k_f=1}^k \sqrt{\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\begin{array}{c} \left| {S_{f}} \right| =k_{f} \\ S = S_0\perp S_{f} \perp S_+ \end{array}}\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_{f}S_-S_0}] \right] _S} \right\| _{p}\right) ^2} \right) ^2. \end{aligned}$$

The second equality expresses the sets \(S_1,S_2\) through the decomposition \(S_{f},S_-,S_+,S_0\) and isolates the sum over \(k_f\). The last inequality might look intimidating, but it is simply a triangle inequality (over the values \(\left| {S_{f}} \right| \)) for a certain 2-norm

$$\begin{aligned} \sqrt{\sum _{S} (\sum _{k_f} f(k_f,S))^2} \le \sum _{k_f} \sqrt{\sum _{S} f(k_f,S)^2} \quad \text {for any function} \quad f(k_f,S). \end{aligned}$$

We may now use a variant of Proposition II.3.1 with an additional sum over an abstract index \(\sum _\alpha \). The derivation is analogous, keeping this sum at the innermost layer (sticking to the operator \(\varvec{O}^\alpha \)), with the replacement

$$\begin{aligned} \left\| {\varvec{O}_{S_{f}S_-S_0}} \right\| _{p}\rightarrow \left( \sum _{\alpha }\left\| {\varvec{O}^\alpha _{S_{f}S_-S_0}} \right\| _{p}\right) . \end{aligned}$$

This is the advertised result. \(\quad \square \)
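The 2-norm triangle inequality invoked in the proof is just Minkowski's inequality; a one-line numeric check on a random non-negative table standing in for \(f(k_f,S)\):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.random((4, 50))     # f(k_f, S) >= 0: rows index k_f, columns index S

lhs = np.sqrt((f.sum(axis=0) ** 2).sum())    # sqrt( sum_S (sum_{k_f} f)^2 )
rhs = np.sqrt((f ** 2).sum(axis=1)).sum()    # sum_{k_f} sqrt( sum_S f^2 )
assert lhs <= rhs + 1e-12                    # Minkowski's inequality in l2
```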

4.4 Bounds for \(g'\)-th order and beyond

The previous section evaluated the g-th order terms \(\varvec{\mathcal {E}}_g\). This section takes care of the last term in the Taylor expansion, \(\varvec{\mathcal {E}}_{\ge g'}\). To ease notation, we rename the dummy variable \(g'\rightarrow g\). This term has infinite-order dependence on time, so we have to tweak the calculations. Recall (3.1),

$$\begin{aligned} \left\| {\varvec{\mathcal {E}}_{\ge g}} \right\| _{p}&= \left\| \sum ^{J}_{j=1} \sum _{m=j+1}^{J} \textrm{e}^{\mathcal {L}_J t}\cdots \textrm{e}^{\mathcal {L}_{m+1} t} \int _0^t dt_1\right. \\&\quad \left. \sum _{g_m+\cdots +g_{j+1}=g-1,g_m\ge 1}\textrm{e}^{\mathcal {L}_{m} t_1}\mathcal {L}^{g_m}_m\cdots \mathcal {L}^{g_{j+1}}_{j+1}[\varvec{H}_{j}] \frac{(t-t_1)^{g_m-1}t^{g-1-g_m}}{(g_m-1)!\cdots g_{j+1}!}\right\| _{p} \\&\le \sum _{m=2}^{J} \left\| { \sum ^{m-1}_{j=1} \sum _{g_m+\cdots +g_{j+1}=g-1,g_m\ge 1}\mathcal {L}^{g_m}_m\cdots \mathcal {L}^{g_{j+1}}_{j+1}[\varvec{H}_{j}] \frac{t^{g-1}}{g_m!\cdots g_{j+1}!}} \right\| _{p} \\&\le \sqrt{C_p}^{g(k-1)+1}\Upsilon (t\Upsilon )^{g-1} \sum _{\gamma _{g-1}}^\Gamma \sqrt{ \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\gamma _{g-2}}^\Gamma \cdots \sum _{\gamma _0}^\Gamma \left\| {\mathcal {L}_{\gamma _{g-1}}\left[ \mathcal {L}_{\gamma _{g-2}}\cdots \mathcal {L}_{\gamma _1} [\varvec{H}_{\gamma _0}]\right] _S} \right\| _{p}\right) ^2}. \end{aligned}$$

The first inequality exchanges the summation order, applies the triangle inequality, integrates over time, and removes the unitary conjugations by the unitary invariance of p-norms. The second inequality is a calculation similar to (3.2). We use Hypercontractivity, pull the p-norm inside the sum, and symmetrize the sum by completing the exponential for \(\gamma _{g-2}\cdots \gamma _0\).

The only difference from (3.2) is the outer-most sum outside the square root.

Lemma II.5

(Sum outside the square-root).

$$\begin{aligned} \sum _{\gamma } \sqrt{\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha ]\right] _S} \right\| _{p}\right) ^2 }&\le \lambda '(k) \cdot \sqrt{\sum _{S_1\subset \{n,\cdots , 1\}} \left( \sum _{\alpha }\left\| {[\varvec{O}^\alpha ]_{S_1}} \right\| _{p}\right) ^2} \end{aligned}$$

where

$$\begin{aligned} \lambda '(k)= 2\cdot \sum _{k'=1}^k \left( {\begin{array}{c}k\\ k'\end{array}}\right) {\sqrt{20}}^{k'} \sqrt{\frac{1}{k'!}\left\| {\varvec{H}} \right\| _{(k'),1}\left\| {\varvec{H}} \right\| _{(global),1}}. \end{aligned}$$

We can evaluate the bound using Lemma II.5 for the outer-most sum \(\sum _{\gamma _{g-1}}^{\Gamma }\) and Lemma II.3 for \(\gamma _{g-2},\cdots ,\gamma _1\):

$$\begin{aligned} (cont.)&\le \Upsilon (\Upsilon t)^{g-1} \sqrt{C_p}^{g(k-1)+1} (g(k-1)+1)^{k/2} \lambda '(k) \cdot \nonumber \\ {}&\bigg ( ((g-1)(k-1)+1)\cdots (2(k-1)-1)\bigg )^{k} \lambda (k)^{g-2}\cdot \left\| {\varvec{H}} \right\| _{(global),2}\left\| {\varvec{I}} \right\| _{p}\nonumber \\&\le \sqrt{C_p}^{g(k-1)+1} c'(k) \cdot g^{gk}\Upsilon \left( c(k)\Upsilon t\right) ^{g-1}\left\| {\varvec{I}} \right\| _{p}. \end{aligned}$$
(3.4)

The last inequality absorbs constants into \(c'(k)\):

$$\begin{aligned} c'(k)&= \frac{\lambda '(k)}{\lambda (k)} \frac{1}{\sqrt{k}^k}\left\| {\varvec{H}} \right\| _{(global),2}. \end{aligned}$$

In other words, the higher-order time dependence forces us to apply the triangle inequality for the outer layer sum; fortunately, we can still use Lemma II.3 for the inner sums. This gives the different prefactor \(c'(k)\).

Proof of Lemma II.5

The calculation is analogous to Lemma II.3. We define a slightly different quantity

$$\begin{aligned} \left| {S_-} \right| +\left| {S_{f}} \right| := k' \end{aligned}$$

that will organize the combinatorics (the analog of the number \(k_f\) in Lemma II.3). We first rearrange the expression in terms of the subsets \(S_+,S_-,S_0,S_f\).

$$\begin{aligned}&\sum _{\gamma } \sqrt{\sum _{S\subset \{n,\cdots , 1\}} \left( \sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha ]\right] _S} \right\| _{p}\right) ^2 }\\&\quad \le \sum _{S_2}\sum _{\gamma \sim S_2} \sqrt{ \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{S_1}\sum _\alpha \left\| {\left[ \mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_1}] \right] _S} \right\| _{p}\right) ^2 }\\&\quad \le \sum _{S_2}\sum _{\gamma \sim S_2} \sqrt{ \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{k'=1}^k\sum _{\begin{array}{c} S_-,S_{f}\subset S_2\\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}}2^{\left| {S_{f}} \right| } \sum _\alpha \left\| {\mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_{f}S_-S_0}]} \right\| _{p}\right) ^2} \\&\quad \le \sum _{k'=1}^k \sum _{S_2} \sum _{\gamma \sim S_2}\sqrt{ \sum _{S\subset \{n,\cdots , 1\}} \left( \sum _{\begin{array}{c} S_-,S_{f}\subset S_2\\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}}2^{\left| {S_{f}} \right| } \sum _\alpha \left\| {\mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_{f}S_-S_0}]} \right\| _{p}\right) ^2}. \end{aligned}$$

The first inequality parameterizes the sets \(S_1 = S_{f}S_-S_0\) that could give rise to S after taking the commutator \(\mathcal {L}_{\gamma }\). The factor \(2^{\left| {S_{f}} \right| }\) is due to Fact II.4. The last inequality is a triangle inequality to postpone the sum over \(k' = \left| {S_{f}} \right| +\left| {S_-} \right| \).

Next, we use Cauchy-Schwarz to break the non-linear expression into individual pieces. This costs multiplicative constant overheads that depend only on k.

$$\begin{aligned} (cont.)&\le \sum _{k'=1}^k\sum _{S_2} \sum _{\gamma \sim S_2} \sqrt{ \sum _{S_0} \sum _{\begin{array}{c} S_-,S_{f}\subset S_2\\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}} \left( \sum _\alpha \left\| {\mathcal {L}_{\gamma }[\varvec{O}^\alpha _{S_{f}S_-S_0}]} \right\| _{p}\right) ^2 (\sum _{S'_{f},S'_-} 2^{2\left| {S'_{f}} \right| }) } \\&\le \sum _{k'=1}^k\sum _{\begin{array}{c} S_+,S_-,S_{f}\\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}} 2b_{S_+S_-S_{f}} \sqrt{ \sum _{S_0} \left( \sum _\alpha \left\| {\varvec{O}^\alpha _{S_{f}S_-S_0}} \right\| _{p}\right) ^2 (\cdot )} \\&\le \sum _{k'=1}^k 2\sqrt{\sum _{\begin{array}{c} S_-,S_{f}\\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}} \left( \sum _{S_+} b_{S_+S_-S_{f}} \right)^2 } \sqrt{ \sum _{\begin{array}{c} S_-,S_{f},S_0 \\ \left| {S_-} \right| +\left| {S_{f}} \right| = k' \end{array}} \left( \sum _\alpha \left\| {\varvec{O}^\alpha _{S_{f}S_-S_0}} \right\| _{p}\right) ^2 (\cdot ) }. \end{aligned}$$

The first inequality is Cauchy-Schwarz, where the sum evaluates to

$$\begin{aligned} (\cdot )=\sum _{S'_{f},S'_-} 2^{2\left| {S'_{f}} \right| } = \left( {\begin{array}{c}k\\ k'\end{array}}\right) \cdot \sum ^{k'}_{k_{f}=0} \left( {\begin{array}{c}k'\\ k_{f}\end{array}}\right) 4^{k_{f}} = \left( {\begin{array}{c}k\\ k'\end{array}}\right) 5^{k'}. \end{aligned}$$
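The closed form is the binomial theorem: summing \(4^{k_f}\) over all subsets \(S'_f\) of a \(k'\)-element set (including the empty set) gives \((1+4)^{k'}=5^{k'}\). A quick standalone check:

```python
from math import comb

# Binomial-theorem identity behind the combinatorial factor:
# sum over subset sizes k_f of C(k', k_f) * 4^{k_f} equals 5^{k'}.
for k_prime in range(1, 8):
    total = sum(comb(k_prime, k_f) * 4 ** k_f for k_f in range(k_prime + 1))
    assert total == 5 ** k_prime
```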

The second inequality is a triangle inequality for the sum over subsets \(S_-,S_{f}\subset S_2\), which then combines with the sum over \(S_2\). The third inequality is again Cauchy-Schwarz. Finally, we evaluate the combinatorial factors for each term

$$\begin{aligned} \sum _{S_-,S_{f}} \left( \sum _{S_+} b_{S_+S_-S_{f}}\right)^2&\le \sum _{S_-,S_{f}} \sum _{S_+} b_{S_+S_-S_{f}} \cdot \max _{S'_-,S'_{f}} \sum _{S'_+} b_{S'_+S'_-S'_{f}} \\&= \left( {\begin{array}{c}k\\ k'\end{array}}\right) 2^{k'}\left\| {\varvec{H}} \right\| _{(global),1}\cdot \Vert {\varvec{H}} \Vert _{(k'),1}, \end{aligned}$$

and

$$\begin{aligned} \sum _{S_-,S_{f},S_0} \left( \sum _\alpha \left\| {\varvec{O}^\alpha _{S_{f}S_-S_0}} \right\| _{p}\right) ^2 = \left( {\begin{array}{c}\left| {S_{max}} \right| \\ k'\end{array}}\right) 2^{k'}\cdot \sum _{S} \left( \sum _\alpha \left\| {\varvec{O}^\alpha _{S}} \right\| _{p}\right) ^2. \end{aligned}$$

These give the advertised result. \(\quad \square \)

4.5 Proof of Theorem II.1

Proof

For a short time \(\tau \), we rearrange and perform the last integral using the estimate \(\int _0^\tau (\tau ')^{g-1} d\tau '\le \tau ^{g}\)

$$\begin{aligned} \frac{\left\| {e^{\textrm{i}\varvec{H}\tau }- \varvec{S}_\ell (\tau )} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}}&\le \int ^\tau _0 \frac{\left\| {\varvec{\mathcal {E}}(\tau ')} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}} d\tau ' \\&\le \frac{\sqrt{C_p}}{c(k)} \left\| {\varvec{H}} \right\| _{(global),2} \cdot \sum _{g=\ell +1}^{g'-1} \left( g^{k}\sqrt{C_p}^{k-1}c(k)\Upsilon \tau \right) ^{g} \\&\quad + \sqrt{C_p} \frac{c'(k)}{c(k)} \cdot \left( g'^{k}\sqrt{C_p}^{k-1}c(k)\Upsilon \tau \right) ^{g'} \\&:= c'_{1,p} \sum _{g=\ell +1}^{g'-1} \left( g^{k}b_p\tau \right) ^{g} + c'_{2,p} \left( g'^{k}b_p\tau \right) ^{g'} \\&\le \frac{c'_{1,p}}{1-1/\textrm{e}} \left( (\ell +1)^{k} b_p\tau \right)^{\ell +1} + c'_{2,p} \exp \left(-\frac{1}{\textrm{e}(b_p\tau )^{1/k}} +1\right) \\&:= c_{1,p} ( b_p\tau )^{\ell +1} + c_{2,p} \exp \left( -\frac{1}{\textrm{e}(b_p\tau )^{1/k}}\right). \end{aligned}$$

In the second inequality we invoke the bounds for each \(g\)-th order (3.3) and for the \(g'\)-th order (3.4), choosing a good value of

$$\begin{aligned} g' = \left\lfloor \frac{1}{\textrm{e}(b_p\tau )^{1/k}} \right\rfloor . \end{aligned}$$

This is possible as long as the following holds.

Constraint II.5.1

\( (\frac{1}{b_p\tau })^{1/k}\ge \textrm{e}(\ell +3). \)

Then, the total Trotter error at a long time \(t = r \cdot \tau \) is bounded by a telescoping sum

$$\begin{aligned}&\frac{\left\| {\varvec{\mathcal {E}}_{tot} } \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}} := \frac{\left\| {e^{\textrm{i}\varvec{H}t}- \varvec{S}_\ell (t/r)^r} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}} \le r \cdot \frac{\left\| {e^{\textrm{i}\varvec{H}t/r}- \varvec{S}_\ell (t/r)} \right\| _{p}}{\left\| {\varvec{I}} \right\| _{p}} \\&\quad \le c_{1,p} \frac{(b_pt)^{\ell +1}}{r^{\ell }} + r c_{2,p} \exp \left( -\frac{1}{\textrm{e}}(\frac{r}{b_pt})^{1/k} \right) \\&\quad \le 2c_{1,p} \frac{(b_pt)^{\ell +1}}{r^{\ell }}\le p^\eta 2c_{1} \frac{(bt)^{\ell +1}}{r^{\ell }}.\\&\quad \text {where}\quad \eta :=\frac{(\ell +1)(k-1)+1}{2}. \end{aligned}$$

In the second line, we restrict to values of r large enough that the first term dominates.

Constraint II.5.2

\((\frac{1}{b_p\tau })^{1/k} \ge \textrm{e}\ln \left( \frac{c_2}{c_1} (\frac{1}{b_p\tau })^{\ell +1} \right)\).

The last inequality isolates the p-dependence and we use \(C_p = p-1\le p\).

Next, for each value of r,

$$\begin{aligned} \text {choose}\quad p= \left( \frac{\epsilon r^\ell }{2c_1(bt)^{\ell +1}}\right)^{1/\eta } \frac{1}{\textrm{e}}. \end{aligned}$$

Via Markov’s inequality, this gives concentration for its singular values (or over any 1-design inputs)

$$\begin{aligned} \left(\frac{ \left\| {\varvec{\mathcal {E}}_{tot}} \right\| _{p}}{\epsilon \Vert {\varvec{I}} \Vert _{p}} \right)^p \le \exp \left( - \frac{\eta }{\textrm{e}} ( \frac{\epsilon r^{\ell }}{2c_1(bt)^{\ell +1}} )^{1/\eta } \right) = \delta . \end{aligned}$$

Choose

$$\begin{aligned} r\quad \text {such that}\quad p\ge \max \left( 2, \log (1/\delta )/ \eta \right), \end{aligned}$$

which explicitly evaluates to

$$\begin{aligned} r\ge \left( \frac{2\sqrt{\textrm{e}(2+\log (1/\delta ) / \eta )}}{\textrm{e}-1} \left((\ell +1)\sqrt{\textrm{e}(2+\log (1/\delta ) / \eta ) }\right)^{(\ell +1)(k-1)} \cdot \frac{\left\| {\varvec{H}} \right\| _{(global),2}\Upsilon t }{\epsilon } \right)^{1/\ell } \cdot c(k)\Upsilon t. \end{aligned}$$

We also need to comply with both Constraint II.5.1 and Constraint II.5.2, which summarize to

$$\begin{aligned} \frac{1}{b_p\tau } \ge a \quad \text {where} \quad a := \max \left[ \left(\textrm{e}(\ell +3) \right)^{k}, \left(2\textrm{e}\ln \left( \frac{c_2}{c_1} \right) \right)^{k}, x \right]. \end{aligned}$$

The constant x is the unique solution to the transcendental equation

$$\begin{aligned} x\quad \text {such that}\quad x = (2\textrm{e}(\ell +1))^{k}\cdot \ln ^{k}(x). \end{aligned}$$
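Numerically, the large solution can be found by fixed-point iteration, since for large x the map \(x \mapsto c\ln ^{k}(x)\) is a contraction. A minimal sketch (the values \(k=2\), \(\ell =1\) are illustrative, not taken from the text):

```python
import math

def solve_threshold(k, ell, x0=1e6, iters=200):
    """Fixed-point iteration for x = (2e(ell+1))^k * ln(x)^k.

    Starting from a large x0 converges to the large solution, where
    the map x -> c * ln(x)^k is a contraction.
    """
    c = (2 * math.e * (ell + 1)) ** k
    x = x0
    for _ in range(iters):
        x = c * math.log(x) ** k
    return x

x = solve_threshold(k=2, ell=1)
# The returned value satisfies the transcendental equation to high accuracy.
assert abs(x - (4 * math.e) ** 2 * math.log(x) ** 2) < 1e-6 * x
```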

Rearrange to obtain

$$\begin{aligned} r \ge a^{2\eta /k} \left( c(k)(\Upsilon t)^{1/k} \right) \left(\frac{1-1/\textrm{e}}{2 \textrm{e}^{\eta } (\ell +1)^{(\ell +1)k} } \frac{\epsilon }{\left\| {\varvec{H}} \right\| _{(global),2}}\right)^{\frac{k-1}{k}}, \end{aligned}$$

Recall the explicit values

$$\begin{aligned} c_1&= \frac{\Vert {\varvec{H}} \Vert _{(0),2}}{c(k)(1-1/\textrm{e})}\\ c_2&=\textrm{e}\frac{c'(k)}{c(k)}\\ c(k)&=ak^{k}\lambda (k)\\ c'(k)&= \frac{\lambda '(k)}{\lambda (k)} \frac{1}{\sqrt{k}^k}\left\| {\varvec{H}} \right\| _{(global),2} \end{aligned}$$

and

$$\begin{aligned} \lambda '(k)&= 2\cdot \sum _{k'=1}^k \left( {\begin{array}{c}k\\ k'\end{array}}\right) {\sqrt{20}}^{k'} \sqrt{\frac{1}{k'!}\left\| {\varvec{H}} \right\| _{(k'),1}\left\| {\varvec{H}} \right\| _{(global),1}}\\ \lambda (k)&=\frac{2^{k/2+1}}{(k-1)!} \sum _{k'=1}^k \frac{2^{k'/2}}{{(k-k')!}}\left\| {\varvec{H}} \right\| _{(k'),2}. \end{aligned}$$

The above expressions for the gate count are for numerical evaluation; for comprehension, use \(\Omega (\cdot )\) to suppress functions of \(k,\ell \) (such as the number of stages \(\Upsilon \)) and note that the local norms are decreasing in \(k'\).

$$\begin{aligned} r&= \Omega \bigg [ \ln (\delta )^{\eta /\ell }\left(\frac{ \left\| {\varvec{H}} \right\| _{(global),2}}{\epsilon \left\| {\varvec{H}} \right\| _{(local),2}} \right)^{\frac{1}{\ell }} \left( \left\| {\varvec{H}} \right\| _{(local),2} t\right)^{1+\frac{1}{\ell }} \\&\quad +(\left\| {\varvec{H}} \right\| _{(local),2}t)^{\frac{1}{k}} \left(\frac{\epsilon \left\| {\varvec{H}} \right\| _{(local),2}}{\left\| {\varvec{H}} \right\| _{(global),2}}\right)^{\frac{k-1}{k}} \left( \ln \left( \frac{\sqrt{\left\| {\varvec{H}} \right\| _{(local),1}\left\| {\varvec{H}} \right\| _{(global),1}}}{\left\| {\varvec{H}} \right\| _{(local),2}}\right) \right)^{2\eta } \bigg ]\\&=\Omega \left[ \ln (\delta )^{\frac{k-1}{2}} \left\| {\varvec{H}} \right\| _{(local),2} t \cdot \left( \ln (\delta )^{k/2}\frac{ \left\| {\varvec{H}} \right\| _{(global),2} t}{\epsilon } \right)^{\frac{1}{\ell }} \right]. \end{aligned}$$

The first term dominates for large time, system size, and error (fixing the value of failure probability \(\delta \)). The gate complexity is given by \(G = r\cdot \Upsilon \cdot \Gamma \). This is the advertised result. \(\quad \square \)

4.5.1 Constant overhead improvement from another hypercontractivity

One may consider directly applying the existing Hypercontractivity \(\left\| {\varvec{F}} \right\| _{\bar{p}}^2 \le \sum _{S} C_p^{\left| {S} \right| } \left\| {\varvec{F}_S} \right\| _{\bar{2}}^2\) (Proposition II.1). However, one needs to go through the same combinatorial estimates, with minor constant-overhead improvements from replacing \(\left\| {\varvec{O}_{S_1}} \right\| _{p}^2\rightarrow \left\| {\varvec{O}_{S_1}} \right\| _{2}^2\) and discarding Fact II.4. Unfortunately, what enters the ultimate quantity \(\left\| {\varvec{H}} \right\| _{(local),2}\) is the spectral norm \(\Vert {\varvec{H}_{\gamma }} \Vert \) coming from Hölder’s inequality

$$\begin{aligned} \left\| {\mathcal {L}_{\gamma } [\varvec{O}_{S_1}]} \right\| _{p} \le 2 \Vert {\varvec{H}_{\gamma }} \Vert \left\| {\varvec{O}_{S_1}} \right\| _{p}, \end{aligned}$$

and it requires more accounting to get better estimates.

4.6 Spin models at a low particle number

In many Hamiltonians, each term \(\varvec{H}_{\gamma }\) preserves the particle number and the total Hilbert space decomposes into a direct sum of subspaces labeled by their particle number. The input state may have a known particle number.

In this section, we will present an appropriate notion of concentration for input states drawn randomly from a fixed particle number subspace. Formally, denote by \(\varvec{P}_m\) the orthogonal projector onto the m-particle subspace; then particle number preservation means

$$\begin{aligned} {[}\varvec{P}_{m'}, \varvec{H}_{\gamma }] = 0 \quad \text {for each} \quad m',\gamma . \end{aligned}$$
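As a concrete instance, a single hopping term moves a particle but preserves the total number, and hence commutes with every \(\varvec{P}_{m'}\). A minimal numerical sketch on three qubits (the hopping term is an illustrative choice, not from the text):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
sp = np.array([[0., 0.], [1., 0.]])   # |1><0|: creates a particle
sm = sp.T                              # |0><1|: annihilates one

def op(site_ops, n=3):
    """Tensor product with single-site operators at chosen sites."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

n = 3
# Hopping between sites 0 and 1: moves a particle, preserves the total.
H_hop = op({0: sp, 1: sm}) + op({0: sm, 1: sp})

# The total-number operator is diagonal; P_m projects onto eigenvalue m.
N_diag = np.diag(sum(op({i: np.diag([0., 1.])}) for i in range(n)))
for m in range(n + 1):
    P_m = np.diag((N_diag == m).astype(float))
    assert np.allclose(P_m @ H_hop, H_hop @ P_m)   # [P_m, H] = 0
```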

We first need to define the appropriate k-locality in this case by expanding the Hamiltonian in the basis

$$\begin{aligned} \varvec{H}= \sum _{S_-\perp S_+\perp S_z\subset \{n,\cdots ,1\}} b_{S_+S_-S_z} \prod _{s_+\in S_+ } \varvec{\sigma }^+_{s_+} \prod _{s_-\in S_- } \varvec{\sigma }^-_{s_-}\prod _{s_z\in S_z }\varvec{O}^\eta _{s_z}. \end{aligned}$$
(3.5)

k-locality in this basis is defined by

$$\begin{aligned} k =\left| {S_-} \right| +\left| {S_+} \right| +\left| {S_z} \right| . \end{aligned}$$

Note that particle number preservation enforces that the numbers of raising and lowering operators match, \(\left| {S_+} \right| =\left| {S_-} \right| \). This expansion is motivated by an auxiliary product state \(\varvec{\rho }_{\frac{m}{n}}\) that closely relates to the normalized subspace projector \(\bar{\varvec{P}}_m = \varvec{P}_m/\textrm{Tr}[\varvec{P}_m]\). Intuitively, the operator \(\varvec{O}^{\eta }\) is the analog of the Pauli \(\varvec{\sigma }^z\) in a biased background.

See Sect. 2.3 for the details on the construction. Here, we present the concentration result for Trotter error.

Proposition II.5.1

(Trotter error in k-local models). To simulate a number preserving k-local Hamiltonian using the \(\ell \)-th order Suzuki formula on the m-particle subspace \(\varvec{P}_m\), the gate complexity

$$\begin{aligned}&G =\Omega \left( \left( \textrm{Poly}(n,m)^{1/p}\frac{p^{\frac{k}{2}} \left\| {\varvec{H}} \right\| _{(global),2} t}{\epsilon } \right) ^{1/\ell } \Gamma p^{\frac{k-1}{2}} \left\| {\varvec{H}} \right\| _{(local),2} t\right) \\&\quad \text {ensures}\ \ \left\| {\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}_{\ell }(t/r)^r} \right\| _{p,\bar{\varvec{P}}_m} \le \epsilon , \end{aligned}$$

where the quantities \(\left\| {\varvec{H}} \right\| _{(global),2}\) and \(\left\| {\varvec{H}} \right\| _{(local),2} \) are defined w.r.t. (3.5).

Note that we have dropped the parameter s in \(\left\| {\cdot } \right\| _{p,\bar{\varvec{P}}_m,s}\) since every term commutes with \(\varvec{P}_m\) (and the auxiliary state \(\varvec{\rho }_{\frac{m}{n}}\)).

Proof

The result quickly follows by converting to the p-norm w.r.t. the auxiliary product state defined by the filling ratio \(\eta = \frac{m}{n}\). For \(\varvec{F}= [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma , t)]_{g}\),

$$\begin{aligned} \displaystyle \left\| {\varvec{F}} \right\| _{p,\bar{\varvec{P}}_m} \le \left\| {\varvec{F}} \right\| _{p,\varvec{\rho }_{\eta }} \cdot \left( \textrm{Poly}(n,m)\right) ^{1/p} \le \sqrt{\sum _{S\subset \{n,\cdots ,1\}} (C_p)^{|S|} \left\| {\varvec{F}_S} \right\| _{p,\varvec{\rho }_{\eta }}^2 }\cdot \left( \textrm{Poly}(n,m)\right) ^{1/p}. \end{aligned}$$

Some technical notes: Hölder’s inequality still works for the \(\varvec{\rho }_{\eta }\)-weighted norms, \(\left\| {\varvec{H}_{\gamma } \varvec{O}} \right\| _{p,\varvec{\rho }_{\eta }}\le \left\| {\varvec{O}} \right\| _{p,\varvec{\rho }_{\eta }} \Vert {\varvec{H}_{\gamma }} \Vert \) (which need not be true for general \(\varvec{\rho }\)); if \(\varvec{O}\) is particle number preserving, then \([\varvec{O}]_{S}\) is also particle number preserving.\(\quad \square \)

Via Markov’s inequality (plug \(\varvec{\rho }=\bar{\varvec{P}}_m \) into Proposition II.0.1), we obtain concentration.

Corollary II.5.1

Draw the input state from an \(m = \eta n\)-particle subspace; then

4.7 k-locality for fermions

Analogously, we generalize to Hamiltonians with Fermionic terms. We begin by defining k-locality for Fermionic systems. Suppose the particle-number preserving Fermionic Hamiltonian can be written as

$$\begin{aligned} \varvec{H}= \sum _{S_-\perp S_+\perp S_z\subset \{m,\cdots ,1\}} b_{S_+S_-S_z} \prod _{s_+\in S_+ } \varvec{a}^\dagger _{s_+} \prod _{s_-\in S_- } \varvec{a}_{s_-}\prod _{s_z\in S_z }\varvec{O}^\eta _{s_z}. \end{aligned}$$
(3.6)

Again, particle number preservation enforces \(\left| {S_+} \right| =\left| {S_-} \right| \). Recall the second quantization commutation relations (following [18])

$$\begin{aligned} {[}\varvec{a}^\dagger ,\varvec{O}^\eta ]&= -\varvec{a}^\dagger ,\\ [\varvec{a},\varvec{O}^\eta ]&= \varvec{a}, \end{aligned}$$

and

$$\begin{aligned} {[}\varvec{a}^\dagger _j\varvec{a}_k,\varvec{a}^\dagger _\ell \varvec{a}_m ]&= \delta _{kl} \varvec{a}^\dagger _j\varvec{a}_m- \delta _{jm} \varvec{a}^\dagger _{\ell }\varvec{a}_k,\nonumber \\ {[}\varvec{a}^\dagger _j\varvec{a}_k,\varvec{O}^{\eta }_\ell ]&= \delta _{kl} \varvec{a}^\dagger _j\varvec{a}_\ell - \delta _{j\ell } \varvec{a}^\dagger _{\ell }\varvec{a}_k,\nonumber \\ {[}\varvec{a}^\dagger _j\varvec{a}_k,\varvec{O}^{\eta }_\ell \varvec{O}^{\eta }_m ]&= \left(\delta _{kl} \varvec{a}^\dagger _j\varvec{a}_\ell - \delta _{j\ell } \varvec{a}^\dagger _{\ell }\varvec{a}_k \right)\varvec{O}^{\eta }_m+ \varvec{O}^{\eta }_{\ell }\left(\delta _{km} \varvec{a}^\dagger _j\varvec{a}_m- \delta _{jm} \varvec{a}^\dagger _{m}\varvec{a}_k \right). \end{aligned}$$
(3.7)
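These relations can be verified in a concrete matrix representation, e.g. via the Jordan-Wigner encoding on a few modes. An illustrative check of the first line of (3.7):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2); Z = np.diag([1., -1.])
sm = np.array([[0., 1.], [0., 0.]])   # |0><1|, with |1> the occupied state

def a(j, n=3):
    """Jordan-Wigner annihilation operator for mode j out of n."""
    return reduce(np.kron, [Z] * j + [sm] + [I2] * (n - j - 1))

n = 3
A = [a(j, n) for j in range(n)]
Ad = [m.conj().T for m in A]

def comm(u, v): return u @ v - v @ u

# First line of (3.7):
# [a^dag_j a_k, a^dag_l a_m] = delta_{kl} a^dag_j a_m - delta_{jm} a^dag_l a_k
for j in range(n):
    for k in range(n):
        for l in range(n):
            for m in range(n):
                lhs = comm(Ad[j] @ A[k], Ad[l] @ A[m])
                rhs = (k == l) * (Ad[j] @ A[m]) - (j == m) * (Ad[l] @ A[k])
                assert np.allclose(lhs, rhs)
```

Since Jordan-Wigner is a faithful representation of the canonical anticommutation relations, the quadratic identity holds with the sign strings included automatically.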

Compared with k-local Paulis, the only difference for the Fermionic case is (3.7): commuting two Fermionic operators on the same site \(\ell \) can produce an identity \(\varvec{I}_{\ell }\). This would add an extra term in our effective 2–2 norm calculation (Corollary II.3)

$$\begin{aligned} \lambda _{ferm}(k):=\frac{2^{k/2+1}}{(k-1)!} \sum _{k_{f}=1}^k \frac{2^{k_{f}/2}}{{(k-k_{f})!}}\left\| {\varvec{H}} \right\| _{(k_{f}),2} + \frac{2^{k/2+1}}{(k-1)!} \frac{1}{{k!}}\left\| {\varvec{H}_{ferm}} \right\| _{(0),2}, \end{aligned}$$

where the “global” 2-norm \(\left\| {\cdot } \right\| _{(0),2}\) only contains Fermionic operators

$$\begin{aligned} \varvec{H}_{ferm}:= \sum _{\left| {S_-} \right| +\left| {S_+} \right| \ne 0, S_-\perp S_+\perp S_z\subset \{m,\cdots ,1\}} b_{S_+S_-S_z} \prod _{s_+\in S_+ } \varvec{a}^\dagger _{s_+} \prod _{s_-\in S_- } \varvec{a}_{s_-}\prod _{s_z\in S_z }\varvec{O}^\eta _{s_z}. \end{aligned}$$

Intuitively, when identity is produced at the overlapping site, more terms may collide, i.e., add coherently. See Sect. 4.2 for an example where this term is necessary. Otherwise, the rest of the calculation is identical (\(\lambda '(k)\) remains the same). Note that we would use a Fermionic version of Fact II.4, which can be shown by a gauge transformation argument.

Proposition II.5.2

(k-local Fermionic Hamiltonians). To simulate a k-local, particle number preserving Fermionic Hamiltonian using \(\ell \)-th order Suzuki formula on m-particle subspace \(\varvec{P}_m\), the gate complexity

$$\begin{aligned}&G =\Omega \left(\left(\frac{\textrm{Poly}(n,m)^{1/p} p^{k/2}\left\| {\varvec{H}} \right\| _{(global),2} t}{\epsilon }\right)^{1/\ell } \Gamma p^{k/2}(\left\| {\varvec{H}} \right\| _{(local),2}+\left\| {\varvec{H}_{ferm}} \right\| _{(0),2}) t\right) \ \ \\&\quad \text {ensures}\ \ \left\| {\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}(t/r)^r} \right\| _{\bar{p},\varvec{P}_{m}} \le \epsilon , \end{aligned}$$

where \(\left\| {\varvec{H}} \right\| _{(global),2}\) and \(\left\| {\varvec{H}} \right\| _{(local),2} \) are defined w.r.t. (3.6).

Corollary II.5.2

Draw the input state from an \(m = \eta n\)-particle subspace; then

5 Optimality for First-order and Second-order Formulas

We demonstrate the optimality of our p-norm estimates for a particular 2-local Hamiltonian, at short times, for the first and second-order Lie-Trotter-Suzuki formulas. The \(k\ge 2\) cases can also be constructed analogously. Consider the Hamiltonian

$$\begin{aligned} \varvec{H}= \sum _{i>j} \alpha _{ij} \varvec{\sigma }^z_i\varvec{\sigma }^z_j + \sum _{i>j} \alpha _{ij} \varvec{\sigma }^x_i\varvec{\sigma }^x_j =: \varvec{A}+\varvec{B}\end{aligned}$$

for the first-order Trotter formula

$$\begin{aligned} \textrm{e}^{\textrm{i}(\varvec{A}+\varvec{B})t} - \textrm{e}^{\textrm{i}\varvec{A}t} \textrm{e}^{ \textrm{i}\varvec{B}t}&= \frac{1}{2}[\varvec{A},\varvec{B}]t^2 +\mathcal {O}(t^3). \end{aligned}$$
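As a sanity check of the leading-order term, the Trotter defect minus \(\frac{1}{2}[\varvec{A},\varvec{B}]t^2\) should vanish at third order in t. A minimal numerical sketch with random Hermitian matrices (the size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(d):
    """Random Hermitian matrix (illustrative, not from the text)."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (g + g.conj().T) / 2

def expi(H, t):
    """e^{iHt} for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(1j * w * t)) @ v.conj().T

A, B = rand_herm(8), rand_herm(8)
comm = A @ B - B @ A

# Subtracting the claimed leading term (1/2)[A,B] t^2 leaves O(t^3):
# shrinking t by 10 should shrink the residual by roughly 1000.
residuals = []
for t in (1e-2, 1e-3):
    defect = expi(A + B, t) - expi(A, t) @ expi(B, t)
    residuals.append(np.linalg.norm(defect - 0.5 * comm * t ** 2))
assert residuals[1] < 1e-3
assert residuals[1] < residuals[0] / 100   # decays faster than t^2
```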

We can exactly compute its 2-norm due to the orthogonality of Paulis

$$\begin{aligned} \left\| { [\varvec{A},\varvec{B}] } \right\| _{2}^2&= \left\| { \left[\sum _{k>\ell } \alpha _{k\ell } \varvec{\sigma }^z_k\varvec{\sigma }^z_{\ell }, \sum _{i>j} \alpha _{ij} \varvec{\sigma }^x_i\varvec{\sigma }^x_j\right]} \right\| _{2}^2\\&= \sum _{\{i,j,k\}} \left\| {\alpha _{ij}\alpha _{jk} \varvec{\sigma }^z_i \varvec{\sigma }^y_j \varvec{\sigma }^x_k } \right\| _{2}^2 = \sum _{\{i,j,k\}} \alpha _{ij}^2\alpha _{jk}^2. \end{aligned}$$
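The orthogonality computation can be checked numerically. Keeping explicit constants (each single-site overlap contributes a Pauli-commutator factor \(2\textrm{i}\), and both orientations of each triple appear; these constants are absorbed into the \(\theta (\cdot )\) statements), a sketch on five qubits with random couplings:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2); Z = np.diag([1., -1.]); X = np.array([[0., 1.], [1., 0.]])

def pauli(site_ops, n):
    """Tensor product with single-qubit operators at chosen sites."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

n = 5
rng = np.random.default_rng(1)
alpha = rng.standard_normal((n, n))
alpha = (alpha + alpha.T) / 2          # symmetric couplings
np.fill_diagonal(alpha, 0.0)

A = sum(alpha[i, j] * pauli({i: Z, j: Z}, n) for i in range(n) for j in range(i))
B = sum(alpha[i, j] * pauli({i: X, j: X}, n) for i in range(n) for j in range(i))
C = A @ B - B @ A

# Normalized 2-norm squared: Tr[C^dag C] / 2^n. Each ordered distinct
# triple (a, s, b) contributes an orthogonal string 2i * Z_a Y_s X_b.
norm2_sq = np.trace(C.conj().T @ C).real / 2 ** n
predicted = 4 * sum(alpha[a, s] ** 2 * alpha[s, b] ** 2
                    for a in range(n) for s in range(n) for b in range(n)
                    if len({a, s, b}) == 3)
assert np.isclose(norm2_sq, predicted)
```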

For our upper bounds (3.3),

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(global),2}^2 =\sum _{ij} 4\alpha _{ij}^2 \quad \text {and}\quad \left\| {\varvec{H}} \right\| _{(local),2}^2=\max _i \sum _{j} 4\alpha _{ij}^2, \end{aligned}$$

which means that for equal strengths \(\alpha _{ij}=1\),

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(global),2}^2\left\| {\varvec{H}} \right\| _{(local),2}^2 = \theta \left( \left\| { [\varvec{A},\varvec{B}] } \right\| _{2}^2\right). \end{aligned}$$

It is less obvious how to calculate its p-norm or operator norm.

To obtain tight p-norm and spectral norm estimates, we construct another Hamiltonian on three sets of qubits \(\mathcal {H}= \mathcal {H}_{S_1}\otimes \mathcal {H}_{S_2}\otimes \mathcal {H}_{S_3}\)

$$\begin{aligned} \varvec{H}= \sum _{s_1\in S_1, s_2\in S_2} \varvec{\sigma }^z_{s_1}\varvec{\sigma }^x_{s_2} + \sum _{s_2\in S_2, s_3\in S_3} \varvec{\sigma }^y_{s_2}\varvec{\sigma }^z_{s_3}:= \varvec{A}+\varvec{B}. \end{aligned}$$
(4.1)

The commutator evaluates to a factorized commuting sum

$$\begin{aligned} {[}\varvec{A},\varvec{B}]&= \left[\sum _{s_1\in S_1, s_2\in S_2} \varvec{\sigma }^z_{s_1}\varvec{\sigma }^x_{s_2}, \sum _{s_2\in S_2, s_3\in S_3} \varvec{\sigma }^y_{s_2}\varvec{\sigma }^z_{s_3} \right]\\&= 2\sum _{s_1\in S_1, s_2\in S_2, s_3\in S_3} \varvec{\sigma }^z_{s_1}\varvec{\sigma }^z_{s_2} \varvec{\sigma }^z_{s_3} = 2\left(\sum _{s_1\in S_1} \varvec{\sigma }^z_{s_1}\right) \cdot \left(\sum _{s_2\in S_2} \varvec{\sigma }^z_{s_2} \right)\cdot \left(\sum _{s_3\in S_3} \varvec{\sigma }^z_{s_3}\right) . \end{aligned}$$
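A direct check of the factorization on six qubits (in this representation the commutator carries an overall phase \(2\textrm{i}\); the phase is irrelevant for norms):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2); Z = np.diag([1., -1.])
X = np.array([[0., 1.], [1., 0.]]); Y = np.array([[0., -1j], [1j, 0.]])

def op(site_ops, n=6):
    """Tensor product with single-qubit operators at chosen sites."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

S1, S2, S3 = [0, 1], [2, 3], [4, 5]
A = sum(op({a: Z, b: X}) for a in S1 for b in S2)
B = sum(op({b: Y, c: Z}) for b in S2 for c in S3)

# [A, B] factorizes into a product of commuting single-site Z sums.
lhs = A @ B - B @ A
rhs = 2j * (sum(op({a: Z}) for a in S1)
            @ sum(op({b: Z}) for b in S2)
            @ sum(op({c: Z}) for c in S3))
assert np.allclose(lhs, rhs)
```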

Its p-norms can be obtained by the central limit theorem at large \(\left| {S_1} \right| , \left| {S_2} \right| , \left| {S_3} \right| \)

$$\begin{aligned} \left\| { \sum _{s_1\in S_1} \varvec{\sigma }^z_{s_1} } \right\| _{p}= \Omega ( \sqrt{p\left| {S_1} \right| } ) \left\| {\varvec{I}} \right\| _{p}, \end{aligned}$$

where we recall that the p-th moment of a standard Gaussian satisfies \(\left| {g} \right| _p=\theta (\sqrt{ p } )\). Now, let \(\left| { S_1} \right| =\left| { S_2} \right| =\left| { S_3} \right| =\theta (n)\); then it saturates our first-order p-norm upper bound (3.3).

$$\begin{aligned} \left\| { [\varvec{A},\varvec{B}]} \right\| _{p}&=\Omega (\sqrt{pn})^3 \left\| {\varvec{I}} \right\| _{p}\\ \sqrt{C_p}^3\left\| {\varvec{H}} \right\| _{(global),2}\left\| {\varvec{H}} \right\| _{(local),2} \left\| {\varvec{I}} \right\| _{p}&= \mathcal {O}\left( \sqrt{p}^3 \cdot \sqrt{ n^2} \cdot \sqrt{n}\right) \left\| {\varvec{I}} \right\| _{p}. \end{aligned}$$
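The Gaussian-moment scaling can be checked exactly: the spectrum of \(\sum _{s_1} \varvec{\sigma }^z_{s_1}\) is that of a sum of \(\left| {S_1} \right| \) independent signs, so its normalized Schatten p-norm is a binomial moment. A check for \(p=4\), where \(\mathbb {E}|g|^4 = 3\):

```python
from math import comb

def rademacher_moment(n, p):
    """E[(X_1 + ... + X_n)^p] for i.i.d. signs, via the binomial law."""
    return sum(comb(n, k) * (n - 2 * k) ** p for k in range(n + 1)) / 2 ** n

n = 200
m4 = rademacher_moment(n, 4)
assert m4 == 3 * n ** 2 - 2 * n           # exact: E S^4 = 3n^2 - 2n
assert abs(m4 / (3 * n ** 2) - 1) < 0.01  # within 1% of E|sqrt(n) g|^4 = 3n^2
```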

At the same time, its spectral norm

$$\begin{aligned} \Vert { [\varvec{A},\varvec{B}] } \Vert&= \theta ( n^3 ) = \left\| {\varvec{H}} \right\| _{(global),1}\left\| {\varvec{H}} \right\| _{(local),1} \end{aligned}$$

matches the triangle inequality bound in [18].

5.1 Second-order Suzuki formulas

For the second-order Trotter error, recall the expansion [18, Appendix L],

$$\begin{aligned} \textrm{e}^{\textrm{i}(\varvec{A}+\varvec{B})t} - \textrm{e}^{\textrm{i}\varvec{A}t/2} \textrm{e}^{ \textrm{i}\varvec{B}t} \textrm{e}^{\textrm{i}\varvec{A}t/2}= -\frac{\textrm{i}}{12} \left([\varvec{B},[\varvec{B},\varvec{A}]] - \frac{1}{2}[\varvec{A},[\varvec{A},\varvec{B}]]\right)t^3 +\mathcal {O}(t^4) \end{aligned}$$

with the same Hamiltonian (4.1). Due to the symmetry, we know \([\varvec{B},[\varvec{B},\varvec{A}]]\) has the same p-norm as \([\varvec{A},[\varvec{A},\varvec{B}]]\). Conveniently, the factor \(\frac{1}{2}\) allows us to consider only one term (at most losing a constant overhead \(\frac{1}{2}\))

$$\begin{aligned} {[}\varvec{B},[\varvec{B},\varvec{A}]]&= -4\sum _{s_1\in S_1, s_2\in S_2, s_3, s'_3\in S_3} \varvec{\sigma }^z_{s_1}\varvec{\sigma }^x_{s_2} \varvec{\sigma }^z_{s_3} \varvec{\sigma }^z_{s'_3}. \end{aligned}$$
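A numerical check of this identity on six qubits (with the phase conventions restored, the prefactor appears as +4 in this representation; only the magnitude matters for the norms, and the \(s_3 = s'_3\) terms contribute \((\varvec{\sigma }^z)^2 = \varvec{I}\) automatically):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2); Z = np.diag([1., -1.])
X = np.array([[0., 1.], [1., 0.]]); Y = np.array([[0., -1j], [1j, 0.]])

def op(site_ops, n=6):
    """Tensor product with single-qubit operators at chosen sites."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

S1, S2, S3 = [0, 1], [2, 3], [4, 5]
A = sum(op({a: Z, b: X}) for a in S1 for b in S2)
B = sum(op({b: Y, c: Z}) for b in S2 for c in S3)

def comm(u, v): return u @ v - v @ u

ZS1 = sum(op({a: Z}) for a in S1)
XS2 = sum(op({b: X}) for b in S2)
ZS3 = sum(op({c: Z}) for c in S3)

# [B,[B,A]] = 4 * (sum_{S1} Z)(sum_{S2} X)(sum_{S3} Z)^2
lhs = comm(B, comm(B, A))
rhs = 4 * ZS1 @ XS2 @ ZS3 @ ZS3
assert np.allclose(lhs, rhs)
```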

This converges to a function of three independent Gaussians (note that \(s_3\), \(s'_3\) are two dummy indices in the same set \(S_3\))

$$\begin{aligned} \left\| {[\varvec{B},[\varvec{B},\varvec{A}]]} \right\| _{p}&= \Omega ( \left| {g_1g_2g_3^2} \right| _{p} ) \left\| {\varvec{I}} \right\| _{p} = \Omega (\sqrt{pn})^4 \left\| {\varvec{I}} \right\| _{p},\\ \sqrt{C_p}^4\left\| {\varvec{H}} \right\| _{(global),2}\left\| {\varvec{H}} \right\| _{(local),2}^2 \left\| {\varvec{I}} \right\| _{p}&= \mathcal {O}\left( \sqrt{p}^4 \cdot \sqrt{ n^2} \cdot \sqrt{n}^2\right) \left\| {\varvec{I}} \right\| _{p}, \end{aligned}$$

matching our p-norm bound. The spectral norm

$$\begin{aligned} \Vert { [\varvec{B},[\varvec{B},\varvec{A}]]} \Vert&= \theta ( n^4 ) = \left\| {\varvec{H}} \right\| _{(global),1}\left\| {\varvec{H}} \right\| _{(local),1}^2 \end{aligned}$$

again matches the triangle inequality bounds in [18].

5.2 Fermionic Hamiltonians

To demonstrate the need for the extra term \(\left\| {\varvec{H}_{ferm}} \right\| _{(0),2}\) for Fermionic Hamiltonians, consider a Hamiltonian of the form

$$\begin{aligned} \varvec{H}= \sum _{s_1\in S_1, s_2\in S_2} \left(\varvec{a}_{s_1}\varvec{a}^\dagger _{s_2}+ \varvec{a}^\dagger _{s_1}\varvec{a}_{s_2}\right) + \sum _{s_2\in S_2, s_3\in S_3} \left(\varvec{a}_{s_2}\varvec{a}^\dagger _{s_3} +\varvec{a}^\dagger _{s_2}\varvec{a}_{s_3}\right):= \varvec{A}+\varvec{B}. \end{aligned}$$

The commutator evaluates to

$$\begin{aligned}{}[\varvec{B},\varvec{A}]&= \sum _{s_1\in S_1, s_2\in S_2, s_3\in S_3} \varvec{a}_{s_1} \varvec{a}^\dagger _{s_3} -\varvec{a}^\dagger _{s_1} \varvec{a}_{s_3}\\ \left\| {[\varvec{A},\varvec{B}]} \right\| _{2}&= 2 \sqrt{\left| {S_1} \right| } \left| {S_2} \right| \sqrt{\left| {S_3} \right| } = \theta ( \left\| {\varvec{H}} \right\| _{(global),2}\left\| {\varvec{H}} \right\| _{(global),2}). \end{aligned}$$

For the second-order Suzuki formula,

$$\begin{aligned} {[}\varvec{B},[\varvec{B},\varvec{A}] ]&= -\left| {S_2} \right| \cdot \sum _{s_1\in S_1, s_2\in S_2, s_3\in S_3} \varvec{a}^\dagger _{s_2} \varvec{a}_{s_1} + \varvec{a}_{s_1}\varvec{a}^\dagger _{s_2}\\ \left\| {[ \varvec{B},[\varvec{B},\varvec{A}]]} \right\| _{2}&= 2 \sqrt{\left| {S_1} \right| } \left| {S_2} \right| ^2 \sqrt{\left| {S_3} \right| } = \theta ( \left\| {\varvec{H}} \right\| _{(global),2}^2\left\| {\varvec{H}} \right\| _{(global),2}). \end{aligned}$$

6 Preliminary: Matrix-Valued Martingales

Concentration inequalities are well known for i.i.d. sums of random numbers. Unfortunately, phenomena in the wild are rarely a sum of identical, independent pieces, yet they nonetheless concentrate around the mean. Among the zoo of extensions that attempt to capture realistic randomness, a (scalar-valued) martingale describes a random process whose future has zero mean conditioned on the past. Martingales constitute a class more flexible than i.i.d. sums that will serve our purpose.

For a minimal technical introduction (following Tropp [57] and Huang et al. [24]), consider a filtration of the master sigma algebra \(\mathcal {F}_0\subset \mathcal {F}_1 \subset \mathcal {F}_2 \subset \cdots \subset \mathcal {F}_t \subset \cdots \subset \mathcal {F}\), where for each filtration \(\mathcal {F}_j\) we denote the conditional expectation by \(\mathbb {E}_j\). Intuitively, we can think of the index t as the ’time’, where the associated filtration \(\mathcal {F}_t\) hosts the possible events happening before time t. More precisely, a martingale is a sequence of random variables \(Y_t\) adapted to the filtration \(\mathcal {F}_t\) such that

$$\begin{aligned} \sigma (Y_t)&\subset \mathcal {F}_t&\text {(causality)},\\ \mathbb {E}_{t-1} Y_t&= Y_{t-1}&\text {(status quo)}. \end{aligned}$$

In other words, the present depends on the past (’causality’), and tomorrow has the same expectation as today (’status quo’). For simplicity, we often subtract the mean to obtain a martingale difference sequence

$$\begin{aligned} \mathbb {E}_{t-1} D_t = 0 \quad \text {where}\quad D_t:=Y_t-Y_{t-1}. \end{aligned}$$
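Both defining properties can be verified exhaustively for the prototypical example, a simple random walk, by conditioning on every possible history (a minimal sketch):

```python
from itertools import product

# Y_t = sum of the first t signs is the simple-random-walk martingale.
# Enumerate every history and check the conditional expectations exactly.
T = 5
for t in range(1, T + 1):
    for history in product([-1, 1], repeat=t - 1):
        y = sum(history)
        # "status quo": E_{t-1} Y_t = Y_{t-1}
        assert sum(y + e for e in (-1, 1)) / 2 == y
        # a second martingale on the same filtration: Y_t^2 - t
        assert sum((y + e) ** 2 - t for e in (-1, 1)) / 2 == y * y - (t - 1)
```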

6.1 Useful norms and recursive bounds for matrices

Our goal is to quantify the error between the ideal unitary \(\varvec{U}=\textrm{e}^{\textrm{i}\varvec{H}t}\) and the product formula \(\varvec{S}\) when the Hamiltonian is drawn randomly. This can be framed as a matrix-valued martingale

$$\begin{aligned} \sigma (\varvec{Y}_t)&\subset \mathcal {F}_t,\\ \mathbb {E}_{t-1} \varvec{Y}_t&= \varvec{Y}_{t-1}, \end{aligned}$$

where the conditional expectation \(\mathbb {E}_{t-1}\) acts entrywise. In other words, the randomness here has both classical (the expectation \(\mathbb {E}\)) and quantum (the trace \(\textrm{Tr}[\cdot ]\)) sources. In comparison, our previous discussion on Pauli strings does not have this extra layer of classical randomness. This will give the results a slightly different flavor.

Historically, the earliest general results on matrix-valued martingales were established in [36, 37, 45], and more recent works and applications include [13, 14, 24, 26, 44, 57]. Throughout this work, our main workhorse is again uniform smoothness (in a slightly different format from uniform smoothness for subsystems (Proposition II.1.1)). It is not the tightest kind of martingale inequality, but it is arguably the simplest and most robust when the matrices are bounded (or have Gaussian coefficients via the central limit theorem). Analogously to Proposition II.1.1, these inequalities deliver sum-of-squares (“incoherent”) estimates sharper than the triangle inequality, which is linear (“coherent”).

To study concentration of matrices, we first pick a suitable norm. The error between the ideal unitary and the product formula can be quantified in two ways with different operational meanings. For both norms, uniform smoothness streamlines our concentration results (Sects. 6, 7).

6.1.1 The operator norm

The operator norm quantifies the error for the worst input state

$$\begin{aligned} \Vert \varvec{U}-\varvec{S}\Vert = \sup _{|\psi \rangle } \Vert (\varvec{U}-\varvec{S})|\psi \rangle \Vert _{\ell _2}. \end{aligned}$$

If we are interested in concentration of the operator norm, it suffices to control its moments by the expected Schatten p-norm

$$\begin{aligned} (\mathbb {E}\Vert \varvec{Y}\Vert ^p )^{1/p}\le (\mathbb {E}\Vert \varvec{Y}\Vert ^p_p)^{1/p}=:{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_p. \end{aligned}$$

To bound the RHS, the workhorse is the following bound, which requires only a martingale condition (“conditionally zero mean”).

Fact V.1

(Uniform smoothness for Schatten classes [24, Proposition 4.3]). Consider random matrices \(\varvec{X}, \varvec{Y}\) of the same size that satisfy \(\mathbb {E}[\varvec{Y}|\varvec{X}] = 0\). When \(2 \le p\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2 \le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2 + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2. \end{aligned}$$

The constant \(C_p = p - 1\) is the best possible.
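Fact V.1 is easy to sanity-check numerically. The sketch below (our illustration, not from the cited references) takes a deterministic \(\varvec{X}\) and \(\varvec{Y}=\epsilon \varvec{K}\) with a Rademacher sign \(\epsilon \), so the expectation over \(\epsilon \) is exactly an average of two terms:

```python
import numpy as np

rng = np.random.default_rng(1)

def schatten(a, p):
    """Schatten p-norm: the l_p norm of the singular values."""
    return np.linalg.norm(np.linalg.svd(a, compute_uv=False), ord=p)

# Take X deterministic and Y = eps * K with a Rademacher sign eps, so
# E[Y|X] = 0 and the expectation over eps averages the two sign choices.
d, p = 8, 4
X = rng.standard_normal((d, d))
K = rng.standard_normal((d, d))

lhs = (0.5 * (schatten(X + K, p) ** p + schatten(X - K, p) ** p)) ** (2 / p)
rhs = schatten(X, p) ** 2 + (p - 1) * schatten(K, p) ** 2
print(lhs <= rhs)   # True: uniform smoothness with C_p = p - 1
```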

Uniform smoothness for Schatten classes in another form (Fact I.4) was proven by [55], with optimal constants determined by [4]. The above martingale form is due to [42, 46] and [24, Proposition 4.3]. It can alternatively be seen as a special case of Proposition II.1.1 by interpreting the classical expectation as a trace.

6.1.2 Fixed input state

Sometimes we only care about a fixed but arbitrary input state \(\varvec{\rho }\). This deserves another error metric (following [14]) that differs from the spectral norm by an order of quantifiers

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}:= \sup _{rank(\varvec{P})=1}(\mathbb {E}[\Vert \varvec{X}\varvec{P}\Vert _p^p])^{1/p}. \end{aligned}$$

Uniform smoothness for this norm follows.

Corollary V.1.1

(Uniform smoothness, fixed input [13]). Consider random matrices \(\varvec{X}, \varvec{Y}\) of the same size that satisfy \(\mathbb {E}[\varvec{Y}|\varvec{X}] = 0\). When \(2 \le p\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}^2&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}^2+C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}^2 \end{aligned}$$

with constant \(C_p = p - 1\).

Proof

This can be seen by rewriting the \(\ell _2\)-norm as a p-norm

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}^2&= \sup _{rank(\varvec{P})=1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}\varvec{P}+\varvec{Y}\varvec{P} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2\\&\le \sup _{rank(\varvec{P})=1}\left( {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}\varvec{P} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2 + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y}\varvec{P} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2\right) \\&\le \sup _{rank(\varvec{P})=1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}\varvec{P} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2 + C_p\sup _{rank(\varvec{P})=1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y}\varvec{P} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}^2. \end{aligned}$$

\(\square \)

Note that the pure inputs capture general mixed inputs \(\varvec{\rho }\) by convexity.

The second inequality is a telescoping sum. The third equality uses that the operator norm equals the 1-norm \(\Vert {\cdot } \Vert =\left\| {\cdot } \right\| _{1}\) for rank-1 matrices.

6.2 Reminders of useful facts

Before we turn to the proofs, let us recall the useful properties of the underlying norms \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}:={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p,q}, {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}\) for \(p,q\ge 2\). They are largely inherited from the (non-random) Schatten p-norm. Following [13],

Fact V.2

(Non-commutative Minkowski). Each of the expected moments satisfies the triangle inequality and thus is a valid norm. For any random matrices \(\varvec{X}, \varvec{Y}\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*} \le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}+{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}. \end{aligned}$$

Fact V.3

(Operator ideal norms). For operators \(\varvec{A}\) deterministic and \(\varvec{X}\) random

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{A}\varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}, {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}\varvec{A} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*} \le \Vert \varvec{A}\Vert \cdot {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}. \end{aligned}$$

Fact V.4

(Unitarily invariant norms). For \(\varvec{U}, \varvec{V}\) deterministic unitaries and random operator \(\varvec{X}\)

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{U}\varvec{X}\varvec{V} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*} = {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}. \end{aligned}$$

Being operator ideal already implies unitary invariance, but we state both regardless. As the norm \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}\) defined via rank-1 inputs is somewhat non-standard, we include a proof.

Proof of Fact V.3 for fixed inputs

The case \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{A}\varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}\) follows from the fact that p-norms are operator ideal. For the other ordering,

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}\varvec{A} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}&=\sup _{rank(\varvec{P})=1}(\mathbb {E}[\Vert \varvec{X}\varvec{A}\varvec{P}\Vert _p^p])^{1/p}\\&=\sup _{rank(\varvec{P})=1}(\mathbb {E}[\Vert \varvec{X}\varvec{P}'\varvec{A}'\Vert _p^p])^{1/p}\\&\le \sup _{rank(\varvec{P}')=1}(\mathbb {E}[\Vert \varvec{X}\varvec{P}'\Vert _p^p])^{1/p} \Vert \varvec{A}\Vert \\&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p} \Vert \varvec{A}\Vert . \end{aligned}$$

In the second line, we use the singular value decomposition

$$\begin{aligned} \varvec{A}\varvec{P} = \varvec{U}\varvec{S}\varvec{V}= \varvec{U}\varvec{S}_1 \varvec{S}_{\varvec{A}'}\varvec{V} = \varvec{U}\varvec{S}_{1}\varvec{U}^\dagger \cdot \varvec{U}\varvec{S}_{\varvec{A}'}\varvec{V} := \varvec{P}'\varvec{A}', \end{aligned}$$

where we rewrite the diagonal matrix as a product \(\varvec{S}=\varvec{S}_{1}\varvec{S}_{\varvec{A}'}\), where \(\varvec{S}_1\) is a rank-1 projector and \(\Vert \varvec{S}_{\varvec{A}'}\Vert \le \Vert \varvec{S}\Vert \le \Vert \varvec{A}\Vert \). This is possible since \(\varvec{S}\) is bounded by \(\Vert \varvec{S}\Vert \le \Vert \varvec{A}\varvec{P}\Vert \le \Vert \varvec{A}\Vert \). This is the advertised result. \(\quad \square \)

7 First-Order Trotter for Random Hamiltonians

In this section, we employ matrix martingale techniques on the first-order Lie-Trotter formula for random Hamiltonians. Recall that it suffices to control the Trotter error represented in the exponentiated form [18]

$$\begin{aligned} \textrm{e}^{\textrm{i}\varvec{H}_\Gamma t} \cdots \textrm{e}^{\textrm{i}\varvec{H}_1 t}&= \exp _{\mathcal {T}}\left(\textrm{i}\int (\varvec{\mathcal {E}} (t)+ \varvec{H})dt\right)\quad \text {where}\quad \varvec{\mathcal {E}} (t)\\&:= \sum ^{\Gamma }_{k=2}\left( \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k\right) . \end{aligned}$$

Theorem VI.1

(First-order Trotter for random Hamiltonians). Consider a random Hamiltonian \( \varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma \) on n qubits, where each term \(\varvec{H}_\gamma \) is independent, zero mean, and almost surely bounded

$$\begin{aligned} \mathbb {E}\varvec{H}_\gamma =0\quad \text {and}\quad \Vert {\varvec{H}_\gamma } \Vert \le b_\gamma . \end{aligned}$$

Then, the gate count

$$\begin{aligned}&G = 2\sqrt{2}\left(n\ln (2)+\log (\textrm{e}^2/\delta )\right) \Gamma \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2}\frac{ t^2}{\epsilon } \ \ \text {ensures}\ \ \\&\quad \Pr \left(\Vert {e^{\textrm{i}\varvec{H}t}- \varvec{S}_1(t/r)^r} \Vert \ge \epsilon \right) \le \delta . \end{aligned}$$

For arbitrary fixed input state \(\varvec{\rho }\), the gate count

$$\begin{aligned}&G = 2\sqrt{2}\log (\textrm{e}^2/\delta ) \Gamma \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2}\frac{t^2}{\epsilon } \ \ \text {ensures}\ \ \\&\quad \Pr \left( \frac{1}{2}\left\| {(\textrm{e}^{-\textrm{i}\varvec{H}t}\varvec{\rho }\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}_1(t/r)^{\dagger r}\varvec{\rho }\varvec{S}_1(t/r)^r)} \right\| _{1} \ge \epsilon \right) \le \delta . \end{aligned}$$

We see that the gate counts depend on the 2-norm quantities \( \left\| {\varvec{H}} \right\| _{(global),2}:=\sqrt{\sum _{\gamma } b_\gamma ^2}\) and \(\left\| {\varvec{H}} \right\| _{(local),2}:=\max _{i} \sqrt{\sum _{\gamma : i \subset \gamma } b^2_\gamma }\), but differ by the logarithm of the dimension \(\log (d^n)\). Often, the Hamiltonian we encounter has Gaussian coefficients. By the central limit theorem, we may quickly obtain an analogous result.
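For intuition, the two 2-norm quantities are cheap to compute from a list of terms. A minimal sketch for a hypothetical nearest-neighbour chain (the term list and the bounds \(b_\gamma \) are illustrative assumptions, not from the text):

```python
import numpy as np

# Hypothetical term list for a nearest-neighbour chain on 6 qubits:
# each term gamma is (support, b_gamma), with b_gamma the a.s. bound.
terms = [({i, i + 1}, 0.5) for i in range(5)]

def global_2norm(terms):
    """||H||_(global),2 = sqrt(sum_gamma b_gamma^2)."""
    return np.sqrt(sum(b ** 2 for _, b in terms))

def local_2norm(terms, n):
    """||H||_(local),2 = max_i sqrt(sum of b^2 over terms acting on qubit i)."""
    return max(
        np.sqrt(sum(b ** 2 for supp, b in terms if i in supp))
        for i in range(n)
    )

print(global_2norm(terms))    # sqrt(5 * 0.25)
print(local_2norm(terms, 6))  # bulk qubits touch two terms: sqrt(2 * 0.25)
```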

Corollary VI.1.1

(Gaussian coefficients). Theorem VI.1 also holds for random Hamiltonians where each term \(\varvec{H}_{\gamma }\) is a deterministic bounded matrix multiplied by an i.i.d. standard Gaussian coefficient

$$\begin{aligned} \varvec{H}_\gamma = g_\gamma \varvec{K}_\gamma \quad \text {and}\quad \Vert {\varvec{K}_\gamma } \Vert \le b_\gamma . \end{aligned}$$

For a concrete gate complexity, we evaluate Theorem VI.1 on all-to-all interacting (SYK-like) models on n qudits,

$$\begin{aligned} \varvec{H}= \sum _{\gamma } g_{\gamma } \varvec{K}_{\gamma } \quad \text {where}\quad \Vert {\varvec{K}_{\gamma }} \Vert \le J\sqrt{\frac{(k-1)!}{kn^{k-1}} },\quad \left\| {\varvec{H}} \right\| _{(global),2}^2 \le \frac{J^2n}{k^2} \quad \text {and}\\\quad \left\| {\varvec{H}} \right\| _{(local),2}^2 \le J^2 \end{aligned}$$

with \(\Gamma \le n^k/k!.\)

Corollary VI.1.2

(First-order Trotter for SYK models).

$$\begin{aligned} G&= \frac{2\sqrt{2}}{k\cdot k!}\left(n\ln (d)+\log (\textrm{e}^2/\delta )\right) \frac{n^{k+1/2} (Jt)^2}{\epsilon } \quad&\text {(worst inputs)}\\ G&= \frac{2\sqrt{2}}{k\cdot k!}\log (\textrm{e}^2/\delta ) \frac{n^{k+1/2} (Jt)^2}{\epsilon }\quad&\text {(fixed input)}. \end{aligned}$$
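Plugging numbers into these expressions is straightforward. The sketch below evaluates the two gate counts of Corollary VI.1.2 for hypothetical parameter values; the function name and the chosen inputs are ours:

```python
import math

def trotter_gates_syk(n, k, J, t, eps, delta, d=2, fixed_input=False):
    """Gate count from Corollary VI.1.2; worst inputs pay an extra n*log(d)."""
    log_term = math.log(math.e ** 2 / delta)
    if not fixed_input:
        log_term += n * math.log(d)          # worst-input case
    prefactor = 2 * math.sqrt(2) / (k * math.factorial(k))
    return prefactor * log_term * n ** (k + 0.5) * (J * t) ** 2 / eps

# Fixed inputs drop the n*log(d) factor, a polynomial-in-n saving.
worst = trotter_gates_syk(n=20, k=4, J=1.0, t=1.0, eps=1e-2, delta=1e-3)
fixed = trotter_gates_syk(n=20, k=4, J=1.0, t=1.0, eps=1e-2, delta=1e-3,
                          fixed_input=True)
print(worst / fixed)   # worst-input count is larger by the log-dimension factor
```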

The proof of Theorem VI.1 mainly consists of controlling the integrand \(\varvec{\mathcal {E}}(t)\) via matrix martingale techniques, summarized in the following lemma.

Lemma VI.2

For both p-norms \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_p\) and \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\text {fix},p}\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}}(t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}^2&\le 2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*C_p\sum ^{\Gamma }_{k=2}\left[ 4C_pt^2\sum ^{k-1}_{n=1}\left| {\big \Vert [\varvec{H}_n,\varvec{H}_k]\big \Vert _\infty } \right| _p^2\right. \\&\quad \left. + \left( \sum ^{k-1}_{n=1} \frac{t^2}{2}\left| {\left\Vert [\varvec{H}_n,[\varvec{H}_n,\varvec{H}_k]]\right\Vert _\infty } \right| _p \right) ^2 \right] . \end{aligned}$$

Given such a bound, we may quickly convert to the advertised estimates.

Proof of Theorem VI.1

For a total evolution time t, repeat the Trotter formula for r rounds with individual duration \(\tau = t/r\). Assuming Lemma VI.2, each round has an error

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}} (\tau ) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2&\le 2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*C_p \sum ^{\Gamma }_{k=2}\left[ 16C_p b^2_k\tau ^2\left\| {\varvec{H}} \right\| _{(local),2}^2+ \left( b_k\frac{\tau ^2}{2} \left\| {\varvec{H}} \right\| _{(local),2}^2 \right) ^2 \right] \nonumber \\&\le 32{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*p^2 \tau ^2 \left\| {\varvec{H}} \right\| _{(global),2}^2 \left\| {\varvec{H}} \right\| _{(local),2}^2. \end{aligned}$$
(6.1)

The last inequality simplifies the subleading term by the crude estimate \(\frac{\tau }{4} \left\| {\varvec{H}} \right\| _{(local),2}\le 1\), which we verify at the end of the proof. To control the total Trotter error, integrate along time \(\tau \) and invoke a telescoping sum,

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| e^{\textrm{i}\varvec{H}t}- \varvec{S}_1(t/r)^r \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_* ={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}}_{tot}(t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*&\le 2\sqrt{2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*p \frac{t^2}{r} \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2}\\=:\lambda {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*p. \end{aligned}$$

To obtain concentration, it remains to optimize the moment p for Markov’s inequality.

(i) For the spectral norm, set \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*}= (\mathbb {E}\Vert {\cdot } \Vert _p^p)^{1/p}\)

$$\begin{aligned} \Pr (\Vert {\varvec{\mathcal {E}}_{tot}} \Vert \ge \epsilon )\le \frac{\mathbb {E}\Vert {\varvec{\mathcal {E}}_{tot}} \Vert ^p}{\epsilon ^p}&\le \frac{\mathbb {E}\left\| {\varvec{\mathcal {E}}_{tot}} \right\| _{p}^p}{\epsilon ^p} \\&\le D \left( p\frac{\lambda }{\epsilon }\right) ^p\\&\le D \exp (-\frac{\epsilon }{\lambda }+2). \end{aligned}$$

The factor of dimension \(D=\Vert {\varvec{I}} \Vert _p^p= 2^n\) is due to the trace, and the offset \(+2\) accounts for the constraint \(p\ge 2\). To ensure the Trotter error is at most \(\epsilon \) with failure probability \(\delta \), we demand \(\lambda \le \tfrac{\epsilon }{\log (\textrm{e}^2 D/\delta )}\), which yields the gate count

$$\begin{aligned} G = \Gamma r = 2\sqrt{2} \log (\textrm{e}^2 D/\delta ) \Gamma \frac{t^2}{\epsilon } \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2}. \end{aligned}$$

(ii) For arbitrary fixed inputs, the factor \(\log (D)\) disappears since \(\Vert {\varvec{I}} \Vert _{\text {fix},p}^p=\sup _{rank(\varvec{P})=1}\Vert {\varvec{P}} \Vert _{p}^p=1\). We arrive at the gate count

$$\begin{aligned} G =\Gamma r = 2\sqrt{2} \log (\textrm{e}^2/\delta ) \Gamma \frac{t^2}{\epsilon } \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2} \end{aligned}$$

which already improves over qDRIFT [9]. Lastly, for a consistency check, the choices of r in both calculations (i) and (ii) guarantee that

$$\begin{aligned} \tau ^2 \left\| {\varvec{H}} \right\| _{(global),2} \left\| {\varvec{H}} \right\| _{(local),2} \le \frac{\epsilon }{2\sqrt{2} \log (\textrm{e}^2/\delta )} \le 16, \end{aligned}$$

which is what we needed for (6.1). \(\quad \square \)
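The \(t^2/r\) decay of the total first-order Trotter error can be observed directly in small dimensions. A minimal numerical sketch with two random Hermitian terms (an illustration of the scaling only, not the random model of Theorem VI.1):

```python
import numpy as np

rng = np.random.default_rng(2)

def expm_herm(H, t):
    """e^{iHt} for Hermitian H via the eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w * t)) @ V.conj().T

# Two random Hermitian terms on 3 qubits (dimension 8), scaled so that
# the leading Trotter error term dominates.
d = 8
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H1 = 0.1 * (A + A.conj().T)
H2 = 0.1 * (B + B.conj().T)

t = 1.0
U = expm_herm(H1 + H2, t)
errors = []
for r in (1, 2, 4, 8, 16):
    step = expm_herm(H1, t / r) @ expm_herm(H2, t / r)   # one step S_1(t/r)
    S = np.linalg.matrix_power(step, r)
    errors.append(np.linalg.norm(U - S, ord=2))

# The spectral-norm error shrinks roughly like t^2 / r.
print(errors)
```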

The above result for bounded random matrices quickly extends to those with Gaussian coefficients by the central limit theorem.

Proof of Corollary VI.1.1

Representing the Gaussian as a sum of i.i.d. Rademachers

$$\begin{aligned} \varvec{H}_\gamma =g_\gamma \varvec{K}_\gamma = (\lim _{N\rightarrow \infty }\sum _j^N \frac{\epsilon _{\gamma ,j} }{\sqrt{N}}) \varvec{K}_\gamma :=\lim _{N\rightarrow \infty }\sum _j^N \varvec{Y}_{\gamma ,j}, \end{aligned}$$

we obtain a Hamiltonian as sum over bounded, zero mean summands

$$\begin{aligned} \varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma = \sum _{j=1}^N \sum _{\gamma =1}^\Gamma \varvec{Y}_{\gamma ,j}= \sum _{\gamma '=1}^{\Gamma '} \varvec{Y}_{\gamma '}:= \varvec{H}' \end{aligned}$$

where \(\varvec{Y}_{\gamma '}\) relabels the summands \(\varvec{Y}_{\gamma ,j}\) with a single index \(\gamma '\). Plug into Theorem VI.1 and evaluate the 2-norm quantities

$$\begin{aligned} \Vert { \varvec{H}'} \Vert _{(global),2}&=\sqrt{\sum _{\gamma } b_\gamma ^2} \\ \Vert { \varvec{H}'} \Vert _{(local),2}&= \max _{i} \sqrt{\sum _{\gamma : i \subset \gamma } b^2_\gamma }. \end{aligned}$$

This is the advertised result. \(\quad \square \)
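The Rademacher representation used above is easy to test empirically: the normalized sum reproduces the Gaussian moments up to \(\mathcal {O}(1/N)\) corrections (the fourth moment is exactly \(3-2/N\)). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# A normalized sum of N Rademachers converges to a standard Gaussian;
# its fourth moment is exactly 3 - 2/N, approaching the Gaussian value 3.
N, samples = 100, 50_000
eps = rng.choice([-1, 1], size=(samples, N))
g_approx = eps.sum(axis=1) / np.sqrt(N)

print(g_approx.var())            # close to 1
print(np.mean(g_approx ** 4))    # close to 3
```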

7.1 Proof of Lemma VI.2

It remains to prove Lemma VI.2 for random Hamiltonians with bounded summands. We will use the martingale structure twice.

Proof

Recall

$$\begin{aligned} \varvec{\mathcal {E}} (t) = \sum ^{\Gamma }_{k=2}\left( \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k\right) \end{aligned}$$

and observe the martingale property for each summand

$$\begin{aligned} \mathbb {E}_{k-1}\left[ \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k\right] =0 \quad \text {for each}\quad k =2, \ldots , \Gamma . \end{aligned}$$

Indeed, the terms \(\gamma =k-1,\ldots , 1\) in the exponential are independent of \(\varvec{H}_k\). By uniform smoothness, the martingale difference sequence is bounded by a sum-of-squares

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}} (t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{\Gamma -1}_{k=2}\left( \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k \right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{\gamma =\Gamma -1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_\Gamma ] -\varvec{H}_\Gamma \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_{*}\\&\le C_p\sum ^{\Gamma }_{k=2} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

Next, we further massage each term to identify (yet another) martingale difference apart from the 'bias.' For each k, consider a telescoping sum

$$\begin{aligned} \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k&= \sum ^{k-1}_{n=1} \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k]\\&= \sum ^{k-1}_{n=1}\Bigg ( \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k] \\ {}&- \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}\mathbb {E}_{n-1}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k]\Bigg ) \quad&\text {(the difference)}\\&+\sum ^{k-1}_{n=1} \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}\mathbb {E}_{n-1}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k]\quad&\text {(the bias)}\\&:= \sum ^{k-1}_{n=1} \varvec{D}_n + \varvec{B}_n\quad \text {where} \quad \mathbb {E}_{n-1} \varvec{D}_n =0. \end{aligned}$$

The dominant source of error comes from the martingale difference sequence \(\varvec{D}_n\), which features the desired sum-of-squares behavior. The bias term is treated later.

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{\gamma =k-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t} [\varvec{H}_k] -\varvec{H}_k \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* = {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \varvec{D}_n + \varvec{B}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&\le 2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \varvec{D}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* +2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \varvec{B}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&\le 2 C_p \sum ^{k-1}_{n=1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{D}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* +2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \varvec{B}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

The first inequality is elementary \((a+b)^2 \le 2a^2+2b^2\) and the last inequality uses uniform smoothness. It remains to evaluate both terms. Compute the summand of the first term

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{D}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&= {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k] - \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}\mathbb {E}_{n-1}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k] \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\\&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left( (\textrm{e}^{\mathcal {L}_{n} t}-I)- \mathbb {E}_{n-1}[e^{\mathcal {L}_{n} t}-I]\right) [\varvec{H}_k] \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\\&\le 4{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left( \textrm{e}^{\mathcal {L}_{n} t}-I\right) [\varvec{H}_k] \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\\&\le 4t^2 \left| {\big \Vert [\varvec{H}_n,\varvec{H}_k]\big \Vert _\infty } \right| _p^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 \end{aligned}$$

where the factor of \(2^2\) is due to convexity of \( {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\). The bias term cannot be treated as a martingale; we apply a crude triangle inequality

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \varvec{B}_n \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&= {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{k-1}_{n=1} \prod _{\gamma =n-1}^{1} \textrm{e}^{\mathcal {L}_\gamma t}\mathbb {E}_{n-1}(\textrm{e}^{\mathcal {L}_{n} t}-I) [\varvec{H}_k] \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&\le \left( \sum ^{k-1}_{n=1} \frac{t^2}{2}\left| {\left\Vert [\varvec{H}_n,[\varvec{H}_n,\varvec{H}_k]]\right\Vert _\infty } \right| _p \right) ^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

Fortunately, it is of higher order, \(\mathcal {O}(t^2)\), and thus subleading. Combining the two terms yields the advertised result. \(\quad \square \)

8 Preliminary: Concentration for Multivariate Polynomials

This section develops concentration inequalities for multivariate polynomials of independent random matrices. This prepares us for the proof of the higher-order Trotter error for random Hamiltonians (Sect. 9).

8.1 Scalars

For a polynomial of independent scalars, the general results are relatively new and multifaceted [27, 31, 49]. The problem is better understood for Rademachers and Gaussians, captured in the form of Hypercontractivity [25, 43]. As in Sect. 2, it relates the p-norm \(\left| {f} \right| _{p}:= ( \mathbb {E}[ \left| {f} \right| ^p] )^{1/p}\) to the 2-norm, i.e., the typical fluctuation is well-captured by the variance.

Fact VII.1

(Hypercontractivity for Rademacher polynomials [43]). Consider a degree-r polynomial of Rademachers

$$\begin{aligned} f(z_m,\ldots ,z_1)= \sum _{S\subset \{1,\ldots ,m\},\ \left| {S} \right| \le r} f_{S} \prod _{s\in S} z_s \quad \text {where} \quad z_s =\pm 1. \end{aligned}$$

For \(p \ge 2\),

$$\begin{aligned} \left| {f} \right| _{p} \le \left| {\sum _{S} \sqrt{C_p}^{\left| {S} \right| } f_{S} \prod _{s\in S} z_s} \right| _{2} \le \sqrt{C_p}^{r}\left| {f} \right| _{2}. \end{aligned}$$
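Fact VII.1 can be verified exactly in small cases by enumerating all sign patterns. A sketch for a degree-2 multilinear polynomial (the coefficients are chosen at random purely for illustration):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

# Exact check of |f|_p <= sqrt(C_p)^r |f|_2 for a degree-2 multilinear
# Rademacher polynomial, enumerating all 2^m sign patterns.
m, p, r = 8, 4, 2
F = np.triu(rng.standard_normal((m, m)), k=1)    # coefficients f_ij for i < j

zs = np.array(list(product([-1, 1], repeat=m)))  # all 256 sign patterns
vals = np.einsum('ki,ij,kj->k', zs, F, zs)       # f(z) = sum_{i<j} f_ij z_i z_j

norm_p = np.mean(np.abs(vals) ** p) ** (1 / p)
norm_2 = np.mean(vals ** 2) ** 0.5
print(norm_p <= (p - 1) ** (r / 2) * norm_2)     # True, with C_p = p - 1
```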

Fact VII.2

(Hypercontractivity for a polynomial of independent Gaussians [25, Theorem 6.12]). Consider a degree-r polynomial of i.i.d. Gaussian variables \(f(g_m,\ldots ,g_1)\). For \(p \ge 2\),

$$\begin{aligned} \left| {f} \right| _{p} \le \sqrt{C_p}^{r}\left| {f} \right| _{2}. \end{aligned}$$

We do not present the intermediate bound for the Gaussian case because it requires an expansion in the orthogonal Hermite polynomials, which complicates the picture. Note that we can WLOG assume the above polynomials to have zero mean.

8.2 Matrices

Unlike the scalar case, concentration for multivariate polynomials of matrices is relatively unexplored; even the i.i.d. case is fairly modern (see, e.g., [58]), and there it remains unclear what the appropriate matrix analog of quantities such as the variance is. For multivariate polynomials, the problem seems too general in terms of how the matrices may interact with each other and how randomness enters.

Nevertheless, we will derive concentration results that arguably match the best-known scalar results. What enables this is that we specialize to polynomials of bounded matrices with Gaussian coefficients, motivated by concrete applications in physics and quantum information (e.g., Hamiltonians of Pauli strings with Gaussian coefficients).

As we discussed in Sect. 2, we use the “local” uniform smoothness inequality recursively to derive “global” concentration for multivariate matrix polynomials. However, the external classical randomness will require slightly different arguments and presentation. We begin with a result that is essentially the analog of the hypercontractivity bound we showed (Proposition II.4.2). Unless otherwise noted, the norms in this section will be overloaded

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*} = {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p} \quad \text {or}\quad {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{fix,p}. \end{aligned}$$

Uniform smoothness holds for both norms (Fact V.1 and Corollary V.1.1).

Proposition VII.2.1

(Concentration for matrix function). For a matrix-valued function \(\varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1)\), with matrix-valued variables \(\varvec{X}_i\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \le \sum _{S\subset \{m,\ldots ,1\}} (C_p)^{|S|}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{s\in S}(1-\mathbb {E}_s)\prod _{s'\in S^c}(\mathbb {E}_{s'})\varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

The expectation \(\mathbb {E}_s\) is associated with random matrix \(\varvec{X}_s\) and \(S^c\) denotes the complement of set S.

The proof is identical to that of Proposition II.4.2. Note that the expectation \(\mathbb {E}_s\) should not be confused with the conditional expectation.

To give a concrete example, we take \(\varvec{F}\) to be a multi-linear function.

Corollary VII.2.1

(Multi-linear function of bounded matrices). Consider a degree r multi-linear polynomial

$$\begin{aligned} \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1)= \sum _{i_r,\ldots , i_1}T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} = \sum _{S\subset \{m,\ldots ,1\}} \sum _{\varvec{i}\sim S} T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \end{aligned}$$

where \(\varvec{i}=i_r,\ldots ,i_1\) denotes the tuple and \(\varvec{i}\sim S\) indicates that the indices coincide (up to relabeling) with the set \(S=\{s_r,\ldots ,s_1\}\). Suppose each argument \(\varvec{X}_i\) is an independent random matrix with zero mean \(\mathbb {E}\varvec{X}_i=0\) and bounded operator norm \(\Vert \varvec{X}_i\Vert \le b_i\). Then

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&\le (C_p)^{r}\sum _{S\subset \{m,\cdots ,1\}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}\sim S} T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\le (C_p)^r \sum _{S\subset \{m,\ldots ,1\}} \left( b_{s_r}\ldots b_{s_1}\sum _{\varvec{i}\sim S}\left| {T_{\varvec{i}}} \right| \right) ^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

Intuitively, the sum over different sets S exhibits a sum-of-squares behavior. Within each set S, the reordering of the polynomial is summed via a triangle inequality (\(\sum _{\varvec{i}\sim S}\left| {T_{\varvec{i}}} \right| \)), reflecting the fact that we are bounding the matrices \(\varvec{X}_i\) by their scalar absolute bound \(b_i\). This may seem wasteful to matrix concentration specialists but is a mild overhead for our applications.

Proof

By Proposition VII.2.1,

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&\le \sum _{S\subset \{m,\ldots ,1\}} (C_p)^{|S|}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{s\in S}(1-\mathbb {E}_s)\prod _{s'\in S^c}(\mathbb {E}_{s'})\sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&=\sum _{S\subset \{m,\cdots ,1\}} (C_p)^{r}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}}\mathbb {1}(\{i_r,\ldots , i_1\}=S)\ T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

The second line uses multi-linearity and the fact that the expectation vanishes for indices \(i\in S^c\), converting the projections into the indicator; we also bound \((C_p)^{|S|}\le (C_p)^{r}\), since \(C_p\ge 1\) and only sets with \(|S|\le r\) contribute. Lastly, we use that the norm \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\) is operator ideal (Fact V.3) to convert to the advertised result. \(\quad \square \)

8.2.1 Deterministic matrix with Gaussian coefficients

Thus far, we have derived bounds for polynomials of bounded, zero-mean random matrices. In physics (such as the SYK model), randomness often enters by attaching Gaussian coefficients to deterministic matrices.

Proposition VII.2.2

Consider random matrices \(\varvec{X}, \varvec{Y}\) of the same size and a standard Gaussian g independent of the matrices \(\varvec{Y},\varvec{X}\). For \(p\ge 2\),

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+g\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 \le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 . \end{aligned}$$
(7.1)

Proof

By the central limit theorem, we may represent the Gaussian as a limit of sums of i.i.d. Rademacher variables

$$\begin{aligned} g = \lim _{N\rightarrow \infty }\sum _i^N \frac{\epsilon _i }{\sqrt{N}}\quad \text {where}\quad \epsilon _i = \pm 1. \end{aligned}$$

Then, applying uniform smoothness (Fact V.1, Fact V.1.1) to the terms \(\epsilon _i \varvec{Y}\) yields

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+g\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2&= {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+(\lim _{N\rightarrow \infty }\sum _i^N \frac{\epsilon _i }{\sqrt{N}})\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 = \lim _{N\rightarrow \infty }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+(\sum _i^N \frac{\epsilon _i }{\sqrt{N}})\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 \\&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 + \lim _{N\rightarrow \infty }\sum _i^N\frac{1}{N}C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2. \end{aligned}$$

This is better than directly applying uniform smoothness

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X}+g\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 + C_p{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| g\varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{X} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 + C_p\mathcal {O}(p){\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{Y} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 \end{aligned}$$

where the Gaussian moments appear \(\Vert g\Vert ^2_p=\mathcal {O}(p)\). \(\quad \square \)
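The stated moment growth can be checked numerically (a side sketch, not part of the argument), using the closed-form absolute moments \(\mathbb {E}|g|^p = 2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi }\) of a standard Gaussian:

```python
import math

def gaussian_p_norm_sq(p: float) -> float:
    """Return ||g||_p^2 = (E|g|^p)^(2/p) for a standard Gaussian g."""
    abs_moment = 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)
    return abs_moment ** (2 / p)

# ||g||_p^2 grows linearly in p: the ratio ||g||_p^2 / p stays bounded
# (it decreases from 1/2 at p = 2 toward 1/e ~ 0.37 for large p).
for p in [2, 4, 8, 16, 64, 256]:
    print(p, gaussian_p_norm_sq(p) / p)
```

This confirms \(\Vert g\Vert _p^2 = \Theta (p)\), the extra factor of p that the Rademacher route above avoids.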

It is tempting to guess that the coefficient g only needs to be subgaussian, but this is not evident from the proof. At least, one still obtains comparable results if one is willing to sacrifice factors of p, i.e., to accept heavier tails. Returning to the discussion, as a corollary we can upgrade the premise to allow Gaussian coefficients.
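The Rademacher representation used above can also be sanity-checked by exact enumeration (a small sketch with illustrative N): the fourth moment of \(\sum _j \epsilon _j/\sqrt{N}\) equals \(3-2/N\), converging to the Gaussian value \(\mathbb {E}[g^4]=3\).

```python
from itertools import product

def rademacher_sum_moment(N: int, k: int) -> float:
    """Exact k-th moment of S = sum_j eps_j / sqrt(N), averaging over
    all 2^N sign patterns (eps_j = +/- 1 uniformly)."""
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=N):
        s = sum(signs) / N ** 0.5
        total += s ** k
    return total / 2 ** N

for N in [2, 5, 10]:
    print(N, rademacher_sum_moment(N, 4))  # equals 3 - 2/N (up to float rounding)
```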

Corollary VII.2.2

(Multi-linear function of matrices with Gaussian coefficients). Consider a degree r multi-linear polynomial

$$\begin{aligned} \varvec{F}(\varvec{X}_m,\cdots , \varvec{X}_1) = \sum _{S\subset \{m,\cdots ,1\}} \sum _{\varvec{i}\sim S} T_{\varvec{i}}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1}\quad \text {where}\quad \varvec{X}_i=g_i\varvec{K}_i. \end{aligned}$$

The deterministic matrices \(\Vert \varvec{K}_i\Vert \le \sigma _i\) are bounded, and the coefficients are i.i.d. standard Gaussians. Then,

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*&\le (C_p)^{r}\sum _{S\subset \{m,\cdots ,1\}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}\sim S} T_{\varvec{i}}\varvec{K}_{i_r}\cdots \varvec{K}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\le (C_p)^r \sum _{S\subset \{m,\cdots ,1\}} \left( \sigma _{s_r}\cdots \sigma _{s_1}\sum _{\varvec{i}\sim S}\left| {T_{\varvec{i}}} \right| \right) ^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

This is immediate from Proposition VII.2.2. For our later development, let us present another proof via the central limit theorem.

Proof

We can employ the central limit theorem mindset from the ground up. For each argument \(\varvec{X}_i\), represent the Gaussian via i.i.d. Rademachers \(\epsilon _{i,j}\)

$$\begin{aligned} \varvec{X}_i=g_i\varvec{K}_i = \left(\lim _{N\rightarrow \infty }\sum _j^N \frac{\epsilon _{i,j} }{\sqrt{N}}\right) \varvec{K}_i:=\lim _{N\rightarrow \infty }\sum _j^N \varvec{Y}_{i,j}. \end{aligned}$$

Then, the function

$$\begin{aligned} \sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} = \sum _{j_r,\cdots ,j_1}^N \sum _{\varvec{i}}T_{\varvec{i}} \varvec{Y}_{i_r,j_r}\ldots \varvec{Y}_{i_1,j_1} =h\left(\varvec{Y}_{m,N},\ldots ,\varvec{Y}_{m,1},\ldots ,\varvec{Y}_{1,N},\ldots ,\varvec{Y}_{1,1} \right) \end{aligned}$$

is again multi-linear. By Corollary VII.2.1,

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{j_r,\cdots ,j_1}^N \sum _{\varvec{i}}T_{\varvec{i}} \varvec{Y}_{i_r,j_r}\cdots \varvec{Y}_{i_1,j_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\ {}&\le (C_p)^{r}\sum _{S'\subset \{mN,\cdots ,1\}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{j}\varvec{i}}\mathbb {1}((\varvec{i,j})\sim S')T_{\varvec{i}} \varvec{Y}_{i_r,j_r}\cdots \varvec{Y}_{i_1,j_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\le (C_p)^r \sum _{S\subset \{m,\cdots ,1\}}\sum _{j_{s_r}}\cdots \sum _{j_{s_1}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}\sim S}T_{\varvec{i}}\varvec{Y}_{i_r,j_{i_r}}\cdots \varvec{Y}_{i_1,j_{i_1}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\le (C_p)^r \sum _{S\subset \{m,\cdots ,1\}} \left( \sigma _{s_r}\cdots \sigma _{s_1}\sum _{\varvec{i}\sim S}\left| {T_{\varvec{i}}} \right| \right) ^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

The second inequality relabels the subset \(S'\) by \(S\subset \{m,\ldots ,1\}\) together with a \(j_{s_r}\) for each element \(s_r\). Once the pairs \((s_r,j_{s_r}),\ldots ,(s_1, j_{s_1})\) are fixed, the index j is a function of the index i, and hence we only need to consider reorderings of the indices \(i_r,\ldots ,i_1\). Also note that the coefficients \(T_{\varvec{i}}\) do not depend on the indices \(\varvec{j}\). \(\quad \square \)

8.2.2 Beyond multi-linear function

The story was clean and straightforward for multi-linear functions, but the function arising from the Trotter error may well fail to be multi-linear. With careful accounting, we will derive more general results for bounded matrices (Corollary VII.2.3) and matrices with Gaussian coefficients (Theorem VII.3). The bound is qualitatively the same as in the multi-linear case but will require heavier notation to account for repeated terms (i.e., terms that appear more than once). The bounds are analogous to the best-known scalar results [49, Theorem 1.4].

Corollary VII.2.3

(Polynomial of bounded matrices). Consider a multivariate polynomial (with potentially repeated indices \(i_a=i_{a'}\)) with zero mean, bounded, independent matrix arguments

$$\begin{aligned} \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1)= \sum _{i_r,\ldots , i_1}T_{i_r,\ldots , i_1}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \quad \text {where} \quad \mathbb {E}\varvec{X}_i=0, \Vert \varvec{X}_i\Vert \le b_i. \end{aligned}$$

For \(p \ge 2\),

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\quad \le \sum _{S\subset \{m,\ldots ,1\}} (4C_p)^{|S|}\left( \sum _{\textrm{Supp}(\varvec{u})=S}b_{1}^{u_1}\cdots b_{m}^{u_m}\sum _{\begin{array}{c} \textrm{Supp}(\varvec{v})=S^c, \\ v_i\ne 1 \end{array}} b_{1}^{v_1}\cdots b_{m}^{v_m}\sum _{\pi }|T_{\pi (\varvec{u},\varvec{v})}| \right) ^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \end{aligned}$$

where \(\pi \) enumerates the reorderings of the polynomial \(\varvec{X}^{u_1+v_1}_{1}\cdots \varvec{X}^{u_m+v_m}_{m}\).

In other words, as usual, we have sum-of-squares behavior across different sets S. For each set S,

1. select the powers \(u_1,\ldots, u_m = 0,1,\ldots \) such that their support gives the set S,
2. select the powers \(v_1,\ldots , v_m = 0,2,\ldots \) such that their support gives the complement \(S^c\), and
3. enumerate the reorderings \(\pi \) of the non-commutative polynomial.

The takeaway for this calculation is that (1) a larger set S corresponds to a heavier tail \(C_p^{|S|}\), and (2) the dominating contribution often comes from larger sets S (if we fix the total degree and grow the number of summands). There, unevenly distributed values of the powers \(u_i\gg 1\) and \(v_i\gg 2\) suppress the combinations of other possible \(u_i,v_i\).
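The enumeration of the power assignments \((\varvec{u},\varvec{v})\) described above can be sketched by brute force for small m and total degree r (a sketch; the function name and the truncation are my own choices):

```python
from itertools import product

def admissible_powers(m: int, r: int):
    """Enumerate (u, v) with supp(u) = S, supp(v) = S^c for some S,
    v_i != 1, and total degree sum(u) + sum(v) = r, as in Corollary VII.2.3."""
    out = []
    for u in product(range(r + 1), repeat=m):
        for v in product(range(r + 1), repeat=m):
            if any(vi == 1 for vi in v):
                continue  # expectations kill single powers, so v_i != 1
            # supp(u) and supp(v) must partition {1..m}:
            # exactly one of u_i, v_i is nonzero for every i
            if any((ui > 0) == (vi > 0) for ui, vi in zip(u, v)):
                continue
            if sum(u) + sum(v) == r:
                out.append((u, v))
    return out

print(admissible_powers(2, 2))  # only u = (1, 1), v = (0, 0) survives
print(len(admissible_powers(2, 3)))
```

For two matrices and degree 2, only the fully multi-linear assignment survives; repeated indices start contributing at degree 3.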

Proof

By Proposition VII.2.1,

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \le \sum _{S\subset \{m,\ldots ,1\}} (C_p)^{|S|}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \prod _{s\in S}(1-\mathbb {E}_s)\prod _{s'\in S^c}(\mathbb {E}_{s'})\sum _{\varvec{i}}T_{\varvec{i}}\varvec{X}_{i_r}\ldots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&= \sum _{S\subset \{m,\ldots ,1\}} (C_p)^{|S|}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\varvec{i}}T_{\varvec{i}}\left( \prod _{s\in S}(1-\mathbb {E}_s)\sum _{\textrm{Supp}(\varvec{u})=S}\right) \left( \prod _{s'\in S^c}(\mathbb {E}_{s'})\sum _{\begin{array}{c} \textrm{Supp}(\varvec{v})=S^c,\\ v_i\ne 1 \end{array}} \right) \mathbb {1}(\varvec{i}\sim (\varvec{u},\varvec{v}) )\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&=\sum _{S\subset \{m,\ldots ,1\}} (C_p)^{|S|}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\textrm{Supp}(\varvec{u})=S} \sum _{\begin{array}{c} \textrm{Supp}(\varvec{v})=S^c,\\ v_i\ne 1 \end{array}}\sum _{\pi } T_{\pi (\varvec{u},\varvec{v})}\pi \bigg ((\varvec{X}_{1}^{u_1}-\mathbb {E}\varvec{X}_{1}^{u_1})\cdots (\varvec{X}_{m}^{u_m}-\mathbb {E}\varvec{X}_{m}^{u_m})\mathbb {E}\varvec{X}_{1}^{v_1}\cdots \mathbb {E}\varvec{X}_{m}^{v_m} \bigg ) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

The second equality inserts indicators for the powers in the polynomial (up to reordering) \(\varvec{X}^{u_1}_{1}\cdots \varvec{X}^{u_m}_{m}\varvec{X}^{v_1}_{1}\cdots \varvec{X}^{v_m}_{m}\), with the array \(\varvec{u}\) for the powers of elements in S and the array \(\varvec{v}\) for elements in the complement \(S^c\). The third equality evaluates the expectations, which gives the constraint \(v_i \ne 1\), and denotes by \(\pi \) the reordering of the non-commutative polynomial. Lastly, we use \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\) being operator ideal (Fact V.3) to convert to bounds on the individual spectral norms. The factor \(4^{|S|}\) is due to the crude estimate \(\Vert {\varvec{X}^u-\mathbb {E}\varvec{X}^u} \Vert \le 2 \Vert {\varvec{X}^u} \Vert \). This is the advertised result. \(\square \)

Next, we will use the central limit theorem to upgrade to Gaussian variables.

Theorem VII.3

(Polynomial of matrices with Gaussian coefficients). Consider a multivariate polynomial (with potentially repeated indices \(i_a=i_{a'}\)) where each argument is a bounded matrix with an i.i.d. standard Gaussian coefficient

$$\begin{aligned} \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1)= \sum _{i_r,\ldots , i_1}T_{i_r,\ldots , i_1}\varvec{X}_{i_r}\cdots \varvec{X}_{i_1} \quad \text {where} \quad \varvec{X}_i=g_i\varvec{K}_i\quad \text {and}\quad \Vert \varvec{K}_i\Vert \le \sigma _i. \end{aligned}$$

For \(p \ge 2\),

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{F}(\varvec{X}_m,\ldots , \varvec{X}_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \\&\quad \le \sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \left( \sum _{\varvec{v}} \sigma _{1}^{u_1+2v_1} \cdots \sigma _{m}^{u_m+2v_m} \sum _{\pi } |T_{\pi (\varvec{u},\varvec{v})}|w(\varvec{u},\varvec{v})\right) ^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_* \quad \\ {}&\text {where} \quad \left| {\varvec{u}} \right| := \sum _{i=1}^{m} u_i. \end{aligned}$$

The \(\pi (\varvec{u},\varvec{v})\) enumerates the reorderings of polynomial \(\varvec{X}^{u_1+2v_1}_1\cdots \varvec{X}^{u_m+2v_m}_m\) and

$$\begin{aligned} w(\varvec{u},\varvec{v})= \prod _{i=1}^{m} \frac{(u_i+2v_i-1)!!}{(u_i-1)!!} \le \prod _{i=1}^{m} (u_i+2v_i)^{v_i}. \end{aligned}$$

Let us parse the expression.

1. The powers \(u_1,\ldots, u_m = 0,1,\ldots \) are summed incoherently; the sum \(\left| {\varvec{u}} \right| =\sum _i u_i\) determines the power \(C_p^{\left| {\varvec{u}} \right| }\).
2. In the square, fill in the remaining powers \(v_1,\ldots , v_m = 0,2,\ldots \) and sum over them.
3. Enumerate the reorderings \(\pi (\varvec{u},\varvec{v})\) of the non-commutative polynomial. The prefactor \(w(\varvec{u},\varvec{v})\) comes from Wick contractions.

We see a sum of squares \(\sum _{\varvec{u}}\), but unlike the multi-linear case, we also allow the same term to carry a higher power \(\sigma _1^{u_1}\); this is because of the Gaussian tails. One may be concerned about the sum \(\sum _{\varvec{v}}\) inside the square, but looking more carefully, each term is already squared \(\sigma _i^{2v_i}\).

Intuitively, the RHS (without the coefficients \(C_p\)) can be interpreted as a variance proxy.

Proposition VII.3.1

(Variance Proxy).

$$\begin{aligned} \sum _{\varvec{u}} \left( \sum _{\varvec{v}} \sigma _{1}^{u_1+2v_1} \cdots \sigma _{m}^{u_m+2v_m} \sum _{\pi } |T_{\pi (\varvec{u},\varvec{v})}|w(\varvec{u},\varvec{v})\right) ^2 = \mathbb {E}\left[ \left(\sum _{\varvec{i}}\left| {T_{\varvec{i}}} \right| x_{i_r}\cdots x_{i_1}\right)^2 \right] \end{aligned}$$
(7.2)

where \(x_i (= g_i\sigma _i)\) are scalar Gaussians with variance \(\sigma _i^2\).

This essentially matches the scalar results (Fact VII.2) up to the reordering and the absolute values \(\left| {T_{\varvec{i}}} \right| \). Roughly speaking, passing from Gaussian scalars to matrices (for a bounded matrix \(\Vert {\varvec{K}_i} \Vert \le \sigma _i\))

$$\begin{aligned} x_i=g_i\sigma _i \rightarrow \varvec{X}_i=g_i\varvec{K}_i, \end{aligned}$$

uniform smoothness tells us that the analogous bound holds; we only need to estimate the variance proxy.
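To make the variance proxy concrete, here is a sketch that evaluates the right-hand side of (7.2) for a toy polynomial of my choosing, using the independent-Gaussian moment formula \(\mathbb {E}[x_i^{k}]=(k-1)!!\,\sigma _i^{k}\) for even k (odd moments vanish):

```python
from collections import Counter
from math import prod

def double_fact(n: int) -> int:
    """(n)!! with the convention (-1)!! = 0!! = 1."""
    return 1 if n <= 0 else n * double_fact(n - 2)

def variance_proxy(terms, sigmas):
    """E[(sum_i |T_i| x_{i_r}...x_{i_1})^2] for independent centered Gaussians
    x_i with variance sigmas[i]^2. `terms` maps index tuples to coefficients."""
    total = 0.0
    for idx1, t1 in terms.items():
        for idx2, t2 in terms.items():
            counts = Counter(idx1) + Counter(idx2)
            if any(k % 2 for k in counts.values()):
                continue  # any odd Gaussian moment vanishes
            moment = prod(double_fact(k - 1) * sigmas[i] ** k
                          for i, k in counts.items())
            total += abs(t1) * abs(t2) * moment
    return total

# Toy polynomial 2*x1*x2 + 1*x1^2 with unit variances:
# E[(2 x1 x2 + x1^2)^2] = 4*E[x1^2 x2^2] + 0 + E[x1^4] = 4 + 3
print(variance_proxy({(1, 2): 2.0, (1, 1): 1.0}, {1: 1.0, 2: 1.0}))  # 7.0
```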

Proof of Theorem VII.3

Let us painfully employ the central limit theorem as in the proof of Proposition VII.2.2. Recall

$$\begin{aligned} \varvec{X}_i=g_i\varvec{K}_i = \left(\lim _{N\rightarrow \infty }\sum _j^N \frac{\epsilon _{i,j} }{\sqrt{N}}\right) \varvec{K}_i:=\lim _{N\rightarrow \infty }\sum _j^N \varvec{Y}_{i,j}. \end{aligned}$$

By Corollary VII.2.3,

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{j_r,\cdots ,j_1}^N \sum _{\varvec{i}}T_{\varvec{i}} \varvec{Y}_{i_r,j_r}\cdots \varvec{Y}_{i_1,j_1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\nonumber \\&{\mathop {\le }\limits ^{N\rightarrow \infty }} \sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \sum _{\varvec{j}_1}\cdots \sum _{\varvec{j}_m}\bigg \vert \hspace{-1.0625pt}\bigg \vert \hspace{-1.0625pt}\bigg \vert \sum _{\pi ,\varvec{v}} \sum _{\varvec{k}_1}\cdots \sum _{\varvec{k}_m} T_{\pi ((\varvec{u},\varvec{j}),(\varvec{v},\varvec{k}))}\pi \bigg ((\varvec{Y}_{1,j_{1,1}}\cdots \varvec{Y}_{1,j_{1,u_1}}) \cdots (\varvec{Y}_{m,j_{m,1}}\cdots \varvec{Y}_{m,j_{m,u_m}})\nonumber \\&\quad (\mathbb {E}\varvec{Y}_{1,k_{1,1}}^{2}\cdots \mathbb {E}\varvec{Y}_{1,k_{1,v_1}}^{2})\cdots (\mathbb {E}\varvec{Y}_{m,k_{m,1}}^{2}\cdots \mathbb {E}\varvec{Y}_{m,k_{m,v_m}}^{2}) \bigg )\bigg \vert \hspace{-1.0625pt}\bigg \vert \hspace{-1.0625pt}\bigg \vert ^2_* \\&=: \sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \sum _{\varvec{j}_1}\cdots \sum _{\varvec{j}_m} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\pi ,\varvec{v}} \sum _{\varvec{k}_1}\cdots \sum _{\varvec{k}_m} T_{\pi ((\varvec{u},\varvec{j}),(\varvec{v},\varvec{k}))}\pi \left( \varvec{Y}_{1}^{(u_1)} \cdots \varvec{Y}_{m}^{(u_m)} (\mathbb {E}\varvec{Y}_{1}^{2})^{(v_1)}\cdots (\mathbb {E}\varvec{Y}_{m}^{2})^{(v_m)} \right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&=\sum _{\varvec{u}} (C_p)^{|\varvec{u}|} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum _{\pi ,\varvec{v}} T_{\pi (\varvec{u},\varvec{v})}w(\varvec{u},\varvec{v})\pi \left( \varvec{K}_{1}^{u_1+2v_1} \cdots \varvec{K}_{m}^{u_m+2v_m} \right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*\\&\le \sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \left( \sum _{\varvec{v}}\sigma _{1}^{u_1+2v_1} \cdots \sigma _{m}^{u_m+2v_m} \sum _{\pi } |T_{\pi (\varvec{u},\varvec{v})}|w(\varvec{u},\varvec{v})\right) ^2 {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$

Importantly, in the large N limit for the central limit theorem, the only possible contribution would be the linear terms (e.g., \(\varvec{Y}_{1,j_{1,1}}\)) and the expected squares (e.g., \(\mathbb {E}\varvec{Y}_{1,j_{1,1}} ^2\)); any cubic term (e.g., \(\varvec{Y}_{1,j_{1,1}}^3\)) is subleading in 1/N. The array \(\varvec{u}=u_1,\ldots ,u_m\) collects the number of duplicates \(u_i\) of each argument \(\varvec{Y}_1,\ldots , \varvec{Y}_m\) and contributes to \((C_p)^{\left| {\varvec{u}} \right| }\). Given the array \(\varvec{u}\), the sum \(\sum _{\varvec{j_1}}:=\sum _{j_{1,1}}\ldots \sum _{j_{1,u_1}}\) runs through the duplicates \(\varvec{Y}_{1,j_{1,1}}\ldots \varvec{Y}_{1,j_{1,u_1}}\) of the term \(\varvec{Y}_1\).

Second, we compress the notation by grouping duplicates into exponents. Third, we get rid of duplicates by summing over the indices \(\varvec{j},\varvec{k}\) and also drop the Rademachers. The function \(w(\varvec{u},\varvec{v})\) counts the number of orderings of \(\varvec{Y}_1^{(u)}(\mathbb {E}\varvec{Y}_1^2)^{(v)}\) arising from \(\varvec{X}^{u+2v}_1\)

$$\begin{aligned} w(u,v)&:= \left( {\begin{array}{c}u+2v\\ u\end{array}}\right) \cdot (2v)!! \le 2^{u+2v}(2v)^{v}\\ w(\varvec{u},\varvec{v})&:= w(u_1,v_1)\cdots w(u_m,v_m). \end{aligned}$$

The binomial coefficient \(\left( {\begin{array}{c}u+2v\\ u\end{array}}\right) \) counts the locations of \(\varvec{Y}_1^{(u)}\), and (2v)!! are precisely the Wick contractions for \((\mathbb {E}\varvec{Y}_1^2)^{(v)}\). This is the advertised result. \(\quad \square \)
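Both crude counting bounds appearing above — \((u+2v-1)!!/(u-1)!!\le (u+2v)^v\) from the theorem statement and \(\binom{u+2v}{u}(2v)!!\le 2^{u+2v}(2v)^v\) from the proof — can be verified by brute force (a quick sketch; `dfact` is my helper for the double factorial):

```python
from math import comb

def dfact(n: int) -> int:
    """Double factorial n!! with the convention (-1)!! = 0!! = 1."""
    return 1 if n <= 0 else n * dfact(n - 2)

for u in range(8):
    for v in range(8):
        # Theorem statement: (u+2v-1)!!/(u-1)!! <= (u+2v)^v
        assert dfact(u + 2 * v - 1) <= dfact(u - 1) * (u + 2 * v) ** v
        # Proof's counting function and its crude bound:
        assert comb(u + 2 * v, u) * dfact(2 * v) <= 2 ** (u + 2 * v) * (2 * v) ** v
print("bounds verified for all u, v < 8")
```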

Proof of Proposition VII.3.1

To see why the RHS is the advertised variance proxy (7.2), we apply the two-point inequality (Fact I.3) to the variance proxy. At \(p=2\) (which gives \(C_p=1\)) the two-point inequality becomes an equality, which implies equality for a scalar version of Proposition VII.2.1. Following the above chain of equations yields the desired expression. \(\quad \square \)

9 Random k-Local Hamiltonians

In this section, we consider k-local Hamiltonians drawn from an ensemble (such as the SYK models) where the terms are bounded matrices with standard Gaussian coefficients \(g_{\gamma }\)

$$\begin{aligned} \varvec{H}= \sum _{\gamma =1}^\Gamma \varvec{H}_\gamma = \sum _{\gamma =1}^\Gamma g_\gamma \varvec{K}_\gamma \quad \text {where} \quad \mathbb {E}[g_\gamma ^2]=1 \quad \text {and} \quad \Vert {\varvec{K}_\gamma } \Vert \le b_\gamma . \end{aligned}$$

We apply matrix concentration inequalities to its Trotter error (Sect. 7). Recall the global and local quantities

$$\begin{aligned} \left\| {\varvec{H}} \right\| _{(global),2} := \sqrt{\sum _{\gamma } b_\gamma ^2} \quad \text {and}\quad \left\| {\varvec{H}} \right\| _{(local),2}&:= \max _{i} \sqrt{\sum _{\gamma : i \subset \gamma } b^2_\gamma } \quad \text {where}\quad b_{\gamma }:= \Vert {\varvec{K}_{\gamma }} \Vert . \end{aligned}$$
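As a concrete illustration (a sketch; the 4-site chain below is my own toy example), these two quantities are straightforward to compute from a list of term supports and bounds \(b_\gamma \):

```python
import math

def trotter_norms(terms):
    """terms: list of (support, b) pairs, where support is an iterable of
    sites and b = ||K_gamma||. Returns (||H||_(global,2), ||H||_(local,2))."""
    global2 = math.sqrt(sum(b ** 2 for _, b in terms))
    sites = {i for supp, _ in terms for i in supp}
    local2 = max(math.sqrt(sum(b ** 2 for supp, b in terms if i in supp))
                 for i in sites)
    return global2, local2

# Toy 2-local model on a 4-site chain with unit-norm terms:
terms = [((0, 1), 1.0), ((1, 2), 1.0), ((2, 3), 1.0)]
# 3 terms globally -> sqrt(3); sites 1 and 2 each touch 2 terms -> sqrt(2)
print(trotter_norms(terms))
```

The gap between the two norms is what separates the worst-case 1-norm counting from the concentration-based bounds below.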

Theorem VIII.1

(Trotter error for random Hamiltonians). Simulating random k-local models with Gaussian coefficients via \(2\ell \)-th order Suzuki formulas, the gate count

$$\begin{aligned} G&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{n+\log (1/\delta ) }t \cdot \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{n+\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell }} \right] \quad \!\!\!\text {for}\quad \! \ell \text { even}, \\ G&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{n+\log (1/\delta ) }t \cdot \Bigg ( \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}\sqrt{n+\log (1/\delta ) } t}{ \epsilon } \right)^{\frac{1}{\ell }}\right. \\&\left. +\left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{n+\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell +1}}\Bigg ) \right] \quad \text {for}\quad \ell \text { odd}\\&\hspace{2cm}\text {ensures}\quad \Pr \left(\Vert {e^{\textrm{i}\varvec{H}t}- \varvec{S}(t/r)^r} \Vert \ge \epsilon \right) \le \delta \quad (\text {worst inputs}). \end{aligned}$$

The probability \(\Pr (\cdot )\) arises from the random Hamiltonian ensemble. For fixed but arbitrary input state \(\varvec{\rho }\), the gate count

$$\begin{aligned} G&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{\log (1/\delta ) }t \cdot \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell }} \right] \quad \text {for}\quad \ell \text { even}, \\ G&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{\log (1/\delta ) }t \cdot \Bigg ( \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}\sqrt{\log (1/\delta ) } t}{ \epsilon } \right)^{\frac{1}{\ell }}\right. \\&\left. +\left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell +1}}\Bigg ) \right] \quad \text {for}\quad \ell \text { odd}\\&\hspace{2cm}\text {ensures}\ \ \Pr \left( \frac{1}{2}\left\| {\textrm{e}^{-\textrm{i}\varvec{H}t}\varvec{\rho }\textrm{e}^{\textrm{i}\varvec{H}t}- \varvec{S}(t/r)^{\dagger r}\varvec{\rho }\varvec{S}(t/r)^r} \right\| _{1} \ge \epsilon \right) \le \delta \qquad (\text {fixed inputs}). \end{aligned}$$

This is similar to, but different from, the non-random k-local results (Theorem II.1): when the Hamiltonian is random, an arbitrary fixed input \(\varvec{\rho }\) already displays a 2-norm scaling; even the worst input states that may correlate with the Hamiltonian (the Gibbs state or the ground state) enjoy concentration at the price of a factor \(\sqrt{n}\). More carefully, the concentration here is stronger: the dependence on the failure probability \(\log (1/\delta )\) has a lower power because of the many independent Gaussians. However, the factor in \((\cdot )^{1/\ell }\) is slightly worse (which will be suppressed at large \(\ell \), anyway).

The proof strategy is the same Taylor expansion as in Sect. 3.2 but with different norms. Recall the error

$$\begin{aligned} \varvec{\mathcal {E}}= \sum _{g=\ell +1}^{g'-1} \varvec{\mathcal {E}}_g + \varvec{\mathcal {E}}_{\ge g'}. \end{aligned}$$

The two error terms are combined in Sect. 8.3. We also present an argument for lower bounds at short times (Sect. 9.2).

9.1 Bounds on the g-th order

We proceed by controlling each g-th order polynomial (for \( \ell< g < g'\)) by Theorem VII.3 with \(\sigma _\gamma := \Vert {\varvec{K}_{\gamma }} \Vert \)

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}}_g \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2 = {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{J}_{j=1} \sum _{g_J+\cdots +g_{j+1}=g-1}\mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1} [\varvec{H}_{j+1}]\frac{t^{g-1}}{g_J!\cdots g_{j+1}!} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*^2\nonumber \\&\quad \le (2t)^{2(g-1)} \sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \left( \sum _{\varvec{v}} \sigma _{1}^{u_1+2v_1} \cdots \sigma _{\Gamma }^{u_\Gamma +2v_\Gamma }w(\varvec{u},\varvec{v}) \sum _{j=1}^{J}\sum _{\varvec{g}} \frac{\mathbb {1}(\varvec{g}\sim \varvec{u}+2\varvec{v})}{g_J!\cdots g_{j+1}!} \right) ^2\nonumber \\ {}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*. \end{aligned}$$
(8.1)

The indicator \(\mathbb {1}(\varvec{g}\sim \varvec{u}+2\varvec{v})\) (1) keeps track of the occurrences \(\varvec{u},\varvec{v}\) of the Hamiltonian terms \(\varvec{H}_\gamma \) in any ordering \(\mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1} [\varvec{H}_{j+1}]\) and (2) enforces the commutation constraint, i.e., it returns zero if some term \(\mathcal {L}\) commutes through all terms on its right. The factor of 2 is due to the commutator with coefficient \(a_j\mathcal {L}_j\) and the uniform bound \(\left| {a_j} \right| \le 1\).

This scalar sum (8.1) can be numerically evaluated to get explicit gate counts for particular systems. To get analytic estimates, we proceed with the combinatorics. First, as in (3.2), we throw in extra terms to count the symmetrized sum over \(\varvec{\gamma }\) instead of the product-formula-dependent \(\varvec{g}\)

$$\begin{aligned} \sum _{\varvec{g}}\frac{1}{g_J!\cdots g_{j+1}!} \mathcal {L}^{g_J}_J\cdots \mathcal {L}^{g_{j+1}}_{j+1}[\varvec{H}_{j+1}] \rightarrow \sum _{\gamma _{g-1}=1}^\Gamma \cdots \sum _{\gamma _0=1}^\Gamma \mathcal {L}_{\gamma _{g-1}} \cdots \mathcal {L}_{\gamma _1}[\varvec{H}_{\gamma _0}]. \end{aligned}$$

This yields

$$\begin{aligned} (cont.)\ {}&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2_*(2\Upsilon )^2 (2\Upsilon t)^{2(g-1)}\nonumber \\ {}&\sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \left( \sum _{\varvec{v}} \sigma _{1}^{u_1+2v_1} \cdots \sigma _{\Gamma }^{u_\Gamma +2v_\Gamma }w(\varvec{u},\varvec{v}) \sum _{\gamma _{g-1}=1}^\Gamma \cdots \sum _{\gamma _0=1}^\Gamma \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v}) \right) ^2 \nonumber \\&\le (\cdot ) \sum _{|\varvec{u}|=0}^g \bigg [\sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \sigma _{1}^{2u_1} \cdots \sigma _{\Gamma }^{2u_\Gamma }\sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } w(\varvec{u},\varvec{v})\nonumber \\ {}&\sum _{\varvec{\gamma }} \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v}) \end{aligned}$$
(8.2)
$$\begin{aligned}&\quad \cdot \max _{\varvec{u}} \left( \sum _{\varvec{v'}} \sigma _{1}^{2v'_1} \cdots \sigma _{\Gamma }^{2v'_\Gamma } w(\varvec{u},\varvec{v'})\sum _{\varvec{\gamma }} \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v'}) \right) \bigg ]. \end{aligned}$$
(8.3)

The second inequality is Hölder’s for the sum over \(\varvec{u}\). For the first term (8.2),

$$\begin{aligned}&\sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \sigma _{1}^{2u_1} \cdots \sigma _{\Gamma }^{2u_\Gamma }\sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } w(\varvec{u},\varvec{v}) \sum _{\varvec{\gamma }} \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v}) \nonumber \\&\hspace{2cm}\le g^{|\varvec{v}|}\cdot g^{|\varvec{v}|}\cdot (C_p)^{g} \cdot \sum _{\varvec{u}} \sigma _{1}^{2u_1} \cdots \sigma _{\Gamma }^{2u_\Gamma }\sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } \nonumber \\ {}&\sum _{\varvec{\gamma '}} \mathbb {1}(\varvec{\gamma '}\sim \varvec{u}+\varvec{v}) \nonumber \\&\hspace{2cm}\le g^{g}\cdot (C_p)^{g} \left( gk \left\| {\varvec{H}} \right\| _{(local),2}^2\right) ^{g-1-|\varvec{v}|} \left\| {\varvec{H}} \right\| _{(global),2}^2. \end{aligned}$$
(8.4)

The first inequality passes \(\varvec{\gamma }\) to \(\varvec{\gamma '}\) by assigning pairings of \(\varvec{v}\) (which is bounded by \(g^{|\varvec{v}|}\)); this halves the number of \(\varvec{v}\) occurrences. The other factor comes from the crude uniform estimate \(w(\varvec{u},\varvec{v}) \le 2^g g^{|\varvec{v}|}\). The second inequality uses the bound \(2\left| {\varvec{v}} \right| \le g\) to combine the sums over \(\varvec{u}\) and \(\varvec{v}\) and evaluates this chain of commutators (as in [18], but here we use the 2-norm \(\left\| {\varvec{H}} \right\| _{(local),2}\) instead of the 1-norm \(\left\| {\varvec{H}} \right\| _{(local),1}\)). We used a crude bound \(( g(k-1)+1 )\le gk\) for the locality of commutators.

For the second term (8.3),

$$\begin{aligned}&\max _{\varvec{u}} \left( \sum _{\varvec{v'}} \sigma _{1}^{2v'_1} \cdots \sigma _{\Gamma }^{2v'_\Gamma } w(\varvec{u},\varvec{v'})\sum _{\varvec{\gamma }} \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v'}) \right) \\&\quad \le 2^g g^{|\varvec{v}|}\cdot g^{|\varvec{u}|} g^{|\varvec{v}|}\cdot \max _{\varvec{u}} \left( \sum _{\varvec{v'}} \sigma _{1}^{2v'_1} \cdots \sigma _{\Gamma }^{2v'_\Gamma }\sum _{\varvec{\gamma ''}} \mathbb {1}(\varvec{\gamma ''}\sim \varvec{v'}) \right) \\&\quad \le (2g)^g\cdot \left( gk\left\| {\varvec{H}} \right\| _{(local),2}^2 \right) ^{|\varvec{v}|} \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2}{gk\left\| {\varvec{H}} \right\| _{(local),2}^2}\right) ^{\mathbb {1}(|\varvec{u}|=0)}. \end{aligned}$$

Again, in passing \(\varvec{\gamma }\) to \(\varvec{\gamma ''}\) we first select the locations \(\varvec{u}\), and then the pairings over \(\varvec{v}\). Given a non-empty set of terms in \(\varvec{u}\), each term in \(\varvec{v}\) needs to chain together, giving the factor \(gk\left\| {\varvec{H}} \right\| _{(local),2}^2\). For the edge case \(|\varvec{u}|=0\), we obtain one additional factor \(\left\| {\varvec{H}} \right\| _{(global),2}^2\) since we lose the chaining constraints from \(\varvec{u}\). Altogether, summing over \(\sum _{\left| {u} \right| }\) gives an additional factor \(g+1\le 2g\)

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma ,t)]_g \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\le \sqrt{C_p}^g {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*2g^{3g/2}\cdot \left(4\sqrt{k} \Upsilon \left\| {\varvec{H}} \right\| _{(local),2} t \right)^{g} \cdot \nonumber \\&\quad {\left\{ \begin{array}{ll} \displaystyle \frac{\left\| {\varvec{H}} \right\| _{(global),2}^2}{tk\left\| {\varvec{H}} \right\| _{(local),2}^2} \quad g\text { even}\\ \displaystyle \sqrt{g}\frac{\left\| {\varvec{H}} \right\| _{(global),2}}{t\sqrt{k}\left\| {\varvec{H}} \right\| _{(local),2}} \quad g\text { odd} \end{array}\right. } \end{aligned}$$
(8.5)

for order \(g\ge \ell +1 \ge 3\). On the other hand, for the first-order Trotter formula (\(\ell =1\)) we have a sharper bound (Theorem VI.1).

9.2 Bounds for \(g'\)-th order and beyond

In this section, we handle the edge case. Its presentation suffers from ad hoc prefactors, but fortunately they do not impact the ultimate performance. The expression contains an integral, so we first remove it via the unitary invariance of the \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\) norm. Then, we may plug the polynomial into Theorem VII.3.

$$\begin{aligned}&{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma ,t)]_{\ge g} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_* \\&= {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{J}_{j=1} \sum _{m=j+1}^{J} \textrm{e}^{\mathcal {L}_J t}\cdots \textrm{e}^{\mathcal {L}_{m+1} t} \int _0^t dt_1 \sum _{g_m+\cdots +g_{j+1}=g-1,g_m\ge 1}e^{\mathcal {L}_{m} t_1} \mathcal {L}^{g_m}_m\cdots \mathcal {L}^{g_{j+1}}_{j+1}[\varvec{H}_{j}] \frac{(t-t_1)^{g_m-1}t^{g'-g_m-1}}{(g_m-1)!\cdots g_{j+1}!} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\\&\le \sum _{m=2}^{J} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \sum ^{J}_{j=m-1} \sum _{g_m+\cdots +g_{j+1}=g-1,g_m\ge 1}\mathcal {L}^{g_m}_m \cdots \mathcal {L}^{g_{j+1}}_{j+1}[\varvec{H}_{j}] \frac{t^{g-1}}{g_m!\cdots g_{j+1}!} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_* \\&\le {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*2\Upsilon (2\Upsilon t)^{g-1} \sum _{m=2}^{J}\\ {}&\sqrt{\sum _{\varvec{u}=0}^{g} (C_p)^{|\varvec{u}|} \left( \sum _{\varvec{v}} \sigma _{1}^{u_1+2v_1} \cdots \sigma _{\Gamma }^{u_\Gamma +2v_\Gamma }w(\varvec{u},\varvec{v}) \sum _{\gamma _{g-1}=1}^\Gamma \cdots \sum _{\gamma _0=1}^\Gamma \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v}, \gamma _{g-1}=\gamma (m)) \right) ^2 }. \end{aligned}$$

We have a similar expression except for the constraint \(\gamma _{g-1}=\gamma (m)\) coming from \(g_m\ge 1\) and the sum \(\sum _{m=2}^{J}\) outside the square root. This amounts to minor tweaks in the calculation. For the first term,

$$\begin{aligned}&\sum _{\varvec{u}} (C_p)^{|\varvec{u}|} \sigma _{1}^{2u_1} \cdots \sigma _{\Gamma }^{2u_\Gamma } \sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } w(\varvec{u},\varvec{v}) \sum _{\varvec{\gamma }} \mathbb {1}(\varvec{\gamma }\sim \varvec{u}+2\varvec{v},\gamma _{g-1}=\gamma (m)) \\&\hspace{2cm}\le g^{|\varvec{v}|}\cdot g^{|\varvec{v}|} \cdot (C_p)^{g} \cdot \sum _{\varvec{u}} \sigma _{1}^{2u_1} \cdots \sigma _{\Gamma }^{2u_\Gamma }\sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } \\&\quad \max _{0\le j\le g-1}\sum _{\varvec{\gamma '}} \mathbb {1}(\varvec{\gamma '}\sim \varvec{u}+\varvec{v}, \gamma '_{j}=\gamma (m)) \\&\hspace{2cm}\le g^{g}\cdot (C_p)^{g} \left( gk \left\| {\varvec{H}} \right\| _{(local),2}^2\right) ^{g-|\varvec{v}|-1} \Vert {\varvec{H}_{\gamma (m)}} \Vert ^2. \end{aligned}$$

The only difference from (8.4) is that once a pairing of \(\varvec{v}\) is chosen, we lose one choice of \(\gamma '\), but this could happen at any \(\gamma '_j\).

For the second term,

$$\begin{aligned}&\max _{\varvec{u}} \sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } w(\varvec{u},\varvec{v}) \sum _{\varvec{\gamma }} \mathbb {1}\left(\varvec{\gamma } \sim \varvec{u}+2\varvec{v},\gamma _{g-1}=\gamma (m)\right)\\&\le 2^g g^{|\varvec{v}|}g^{|\varvec{u}|} \cdot g^{|\varvec{v}|}\cdot \sum _{\varvec{v}} \sigma _{1}^{2v_1} \cdots \sigma _{\Gamma }^{2v_\Gamma } \sum _{\varvec{\gamma ''}} \mathbb {1}(\varvec{\gamma ''}\sim \varvec{v}) \\&\le (2g)^{g}\left( gk \left\| {\varvec{H}} \right\| _{(local),2}^2\right) ^{|\varvec{v}|}. \end{aligned}$$

For the first inequality, since the term \(\gamma _{g-1}\) may or may not be in \(\varvec{v}\), we bound by the latter case. Finally, summing over the index m, we arrive at

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma ,t)]_{\ge g} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*&\le (C_p)^{g/2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*\left( 4\sqrt{k} \Upsilon \left\| {\varvec{H}} \right\| _{(local),2}t\right) ^{g} 2g^{3g/2}\nonumber \\&\quad \cdot \frac{\left\| {\varvec{H}} \right\| _{(global),1}}{t\sqrt{k} \left\| {\varvec{H}} \right\| _{(1,2)}}. \end{aligned}$$
(8.6)

9.3 Proof of Theorem VIII.1

The proof is analogous to Sect. 3.5, with minor changes. We only highlight the differences and hide the numerical factors.

Proof

From the bounds for each \(g\)-th order (8.5) and the \(g'\)-th order (8.6), define

$$\begin{aligned} c(k)&:= 4\sqrt{k} \left\| {\varvec{H}} \right\| _{(local),2}. \end{aligned}$$

For a short time step \(\tau = t/r\), we rearrange and perform the last integral using the estimate \(\int _0^\tau (\tau ')^{g-1} d\tau '\le \tau ^{g}/g\):

$$\begin{aligned} \frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| e^{\textrm{i}\varvec{H}\tau }- \varvec{S}_\ell (\tau ) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*,p}}{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*,p}}&\le \int ^\tau _0 \frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}}(\tau ') \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*,p}}{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{*,p}} d\tau ' \\&\le \frac{2\left\| {\varvec{H}} \right\| _{(global),2}^2}{k\left\| {\varvec{H}} \right\| _{(local),2}^2} \cdot \sum _{g=\ell +1, even}^{g'-1} \left( g^{3/2}\sqrt{C_p}c(k)\Upsilon \tau \right) ^{g} \\&\quad +\frac{2\left\| {\varvec{H}} \right\| _{(global),2}}{\sqrt{k} \left\| {\varvec{H}} \right\| _{(local),2}} \cdot \sum _{g=\ell +1, odd}^{g'-1} \left( g^{3/2}\sqrt{C_p}c(k)\Upsilon \tau \right) ^{g} \\&\quad + \frac{\left\| {\varvec{H}} \right\| _{(global),1}}{t\sqrt{k} \left\| {\varvec{H}} \right\| _{(1,2)}} \cdot \left( g'^{3/2}\sqrt{C_p}c(k)\Upsilon \tau \right) ^{g'}. \end{aligned}$$

Now, we apply Markov’s inequality to obtain concentration.

(I) For a fixed input (it suffices to consider pure states),

Altogether, we obtain the required number of rounds (dropping the \(k,\ell \) dependence)

$$\begin{aligned} r&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{\log (1/\delta ) }t \cdot \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell }} \right] \quad \text {for}\quad \ell \text { even}, \\ r&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{\log (1/\delta ) }t \cdot \left( \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}\sqrt{\log (1/\delta ) } t}{ \epsilon } \right)^{\frac{1}{\ell }}\right. \right. \\&\quad \left. \left. +\left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell +1}}\right) \right] \quad \text {for}\quad \ell \text { odd}. \end{aligned}$$

Note that when the order \(\ell \) is odd (i.e., \(g= \ell +1\) is even), a larger cost may be incurred at order \(g+1\).
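For intuition on where the \(1/\ell \) exponents come from: with a per-step error of order \(A\,(t/r)^{\ell +1}\) (lumping all norm and \(\log (1/\delta )\) prefactors into a single constant \(A\)), summing over \(r\) steps and solving for \(r\) reproduces the pattern \(r\sim (A t^{\ell +1}/\epsilon )^{1/\ell }\). The following is our own schematic check of this arithmetic, ignoring the even/odd case distinction and all prefactors:

```python
def trotter_steps(A, t, ell, eps):
    """Solve r * A * (t/r)^(ell+1) = eps for r: the generic arithmetic
    behind the r = Omega(...) bounds.  The per-step error A*(t/r)^(ell+1),
    summed over r steps, yields the characteristic 1/ell exponent.
    (A is a stand-in lumping together norm and log(1/delta) prefactors.)"""
    return (A * t ** (ell + 1) / eps) ** (1.0 / ell)

A, t, ell, eps = 2.0, 3.0, 4, 1e-3
r = trotter_steps(A, t, ell, eps)
total_error = r * A * (t / r) ** (ell + 1)
print(r, total_error)  # the accumulated error lands exactly on eps
```

Doubling \(t\) scales \(r\) by \(2^{(\ell +1)/\ell }\), matching the \(t^{1+1/\ell }\) scaling visible in the displayed bounds.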

(II) For the spectral norm,

$$\begin{aligned} \hat{\Pr }( \Vert { \varvec{\mathcal {E}}_{tot} } \Vert \ge \epsilon )\le d\cdot \left(\frac{ {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{\mathcal {E}}_{tot} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p}}{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \varvec{I} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{p} \epsilon } \right)^p \le \delta . \end{aligned}$$

The factor of the Hilbert space dimension d comes from the trace in the Schatten p-norm. This requires a higher cost (\(\sqrt{\log (1/\delta )}\rightarrow \sqrt{n+\log (1/\delta )}\)) to ensure the failure probability remains small:

$$\begin{aligned} r&= \Omega \left[ \! \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{n\!+\!\log (1/\delta ) }t \cdot \left(\!\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{n\!+\!\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell }} \right] \quad \text {for}\quad \ell \text { even}, \\ r&= \Omega \left[ \left\| {\varvec{H}} \right\| _{(local),2} \sqrt{n+\log (1/\delta ) }t \cdot \left( \left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}\sqrt{n+\log (1/\delta ) } t}{ \epsilon } \right)^{\frac{1}{\ell }}\right. \right. \\&\quad \left. \left. +\left(\frac{\left\| {\varvec{H}} \right\| _{(global),2}^2\sqrt{n+\log (1/\delta ) } t}{\left\| {\varvec{H}} \right\| _{(local),2} \epsilon } \right)^{\frac{1}{\ell +1}}\right) \right] \quad \text {else}. \end{aligned}$$

This is the advertised result. \(\square \)

10 Arguments for Optimality

10.1 Comparing the random estimates with non-random estimates

In this section, we compare the Trotter error of random Hamiltonians with that of non-random Hamiltonians. Recall the leading-order expansions for both:

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \int _0^t [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma ,t)]_gdt \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_*&\le f(k,g)\cdot \sqrt{p}^g \cdot \left(\left\| {\varvec{H}} \right\| _{(local),2}\right)^{g-2} \left\| {\varvec{H}} \right\| _{(global),2}^2 t^g \\ {}&\quad \text {(random)}\\ \left\| {\int _0^t [\varvec{\mathcal {E}} (\varvec{H}_1,\ldots ,\varvec{H}_\Gamma ,t)]_gdt} \right\| _{p}&\le f(k,g)\cdot \sqrt{p}^{g(k-1)+1} \cdot \\ {}&\left(\left\| {\varvec{H}} \right\| _{(local),2}\right)^{g-1} \left\| {\varvec{H}} \right\| _{(global),2} t^g \quad \text {(non-random)}. \end{aligned}$$

Indeed, we see similarities between the expressions, as the derivations are qualitatively analogous. The random Hamiltonian has better p-dependence (i.e., better concentration) because of the “external” randomness from the Gaussian coefficients; the non-random Hamiltonian relies solely on the intrinsic randomness of the k-local structure. However, counter-intuitively, the random case has a worse dependence on the system size. Shouldn’t the random case be more “incoherent” than the non-random case? More precisely, can we not ignore the Gaussian coefficients and plug in the non-random estimates?

Of course, both bounds are consistent because we use different norms. The norms \(\Vert {\cdot } \Vert _{*}\) capture more stringent errors. For illustration, consider \(p=2\) for the norm \(\Vert {\cdot } \Vert _{fix,p}\)

where the vectors are drawn from any orthonormal basis. In other words, the 2-norm captures the sizes of typical inputs; the fixed norm may optimize over an arbitrary fixed input. To illustrate the subtle distinction, consider the following Hamiltonian

$$\begin{aligned} \varvec{H}= \sum _{i>j} g_{ij} (\varvec{\sigma }^y_i\varvec{\sigma }^z_j + \varvec{\sigma }^z_i\varvec{\sigma }^z_j) + g'_{ij}\varvec{\sigma }^x_i\varvec{\sigma }^y_j \end{aligned}$$

and a term in the commutator at order \(\mathcal {O}(t^4)\)

$$\begin{aligned} t^4\sum _{k>i>j} g_{ij}^2 g^{'2}_{ki} \left[\varvec{\sigma }^y_i\varvec{\sigma }^z_j + \varvec{\sigma }^z_i\varvec{\sigma }^z_j, \left[\varvec{\sigma }^x_k\varvec{\sigma }^y_i,\left[\varvec{\sigma }^x_k\varvec{\sigma }^y_i, \varvec{\sigma }^y_i\varvec{\sigma }^z_j + \varvec{\sigma }^z_i\varvec{\sigma }^z_j\right]\right]\right] \propto t^4 \sum _{k>i>j} g_{ij}^2 g^{'2}_{ki} \varvec{\sigma }^x_i. \end{aligned}$$
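This proportionality can be verified directly on the three qubits \(k,i,j\). The following is our own minimal numerical check (illustrative, not part of the derivation); it also fixes the proportionality constant to \(8\textrm{i}\).

```python
import numpy as np

# Single-qubit Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    """Tensor product over the three qubits, ordered (k, i, j)."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def comm(a, b):
    return a @ b - b @ a

A = kron(I2, Y, Z) + kron(I2, Z, Z)  # sigma^y_i sigma^z_j + sigma^z_i sigma^z_j
B = kron(X, Y, I2)                   # sigma^x_k sigma^y_i

# The nested commutator [A, [B, [B, A]]] collapses to a multiple of sigma^x_i.
nested = comm(A, comm(B, comm(B, A)))
target = kron(I2, X, I2)             # sigma^x_i

ratio = nested[np.nonzero(target)][0]  # proportionality constant (= 8i)
assert np.allclose(nested, ratio * target)
print(ratio)
```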

Importantly, the two terms \(g_{ij}(\varvec{\sigma }^y_i\varvec{\sigma }^z_j + \varvec{\sigma }^z_i\varvec{\sigma }^z_j)\) and \(g'_{ki} \varvec{\sigma }^x_k\varvec{\sigma }^y_i\) each show up twice, which squares the Gaussian coefficients and makes them “coherent”. We evaluate both norms

$$\begin{aligned} \left\| { \sum _{k>i>j} g_{ij}^2 g^{'2}_{ki} \varvec{\sigma }^x_i } \right\| _{\bar{2}}&= \theta (n^2 \sqrt{n}) =\theta \left((\left\| {\varvec{H}} \right\| _{(local),2})^{3} \left\| {\varvec{H}} \right\| _{(global),2} \right) \\ \left\| { \sum _{k>i>j} g_{ij}^2 g^{'2}_{ki} \varvec{\sigma }^x_i } \right\| _{fix,\bar{2}}&= \theta (n^2 n) = \theta \left((\left\| {\varvec{H}} \right\| _{(local),2})^{2} \left\| {\varvec{H}} \right\| _{(global),2}^2 \right) \end{aligned}$$

and observe the latter is larger by a factor of \(\sqrt{n}\) because of the optimization over the fixed input.
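The \(\sqrt{n}\) separation can be seen without building \(2^n\)-dimensional matrices: since \(\textrm{Tr}(\varvec{\sigma }^x_i\varvec{\sigma }^x_{i'})=0\) for \(i\ne i'\), the normalized 2-norm of \(\sum _i c_i \varvec{\sigma }^x_i\) reduces to \(\sqrt{\sum _i c_i^2}\) with \(c_i = \sum _{k>i>j} g_{ij}^2 g^{'2}_{ki}\), while a product of \(\varvec{\sigma }^x\)-eigenstates attains \(\sum _i c_i\) (all \(c_i\ge 0\)). A Monte Carlo sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_scaling(n, trials=20):
    """Estimate the two norms of M = sum_i c_i sigma^x_i, where
    c_i = sum_{k>i>j} g_ij^2 (g'_ki)^2 and g, g' are standard Gaussians.
    Normalized 2-norm: sqrt(sum_i c_i^2); fixed-input value: sum_i c_i."""
    two_norm, fixed_norm = 0.0, 0.0
    for _ in range(trials):
        g = rng.standard_normal((n, n))
        gp = rng.standard_normal((n, n))
        # c_i factorizes: (sum_{j<i} g_ij^2) * (sum_{k>i} g'_ki^2)
        c = np.array([(g[i, :i] ** 2).sum() * (gp[i + 1:, i] ** 2).sum()
                      for i in range(n)])
        two_norm += np.sqrt((c ** 2).sum()) / trials   # ~ n^2 sqrt(n)
        fixed_norm += c.sum() / trials                 # ~ n^3
    return two_norm, fixed_norm

ratios = {}
for n in (20, 40, 80):
    t2, tf = norm_scaling(n)
    ratios[n] = tf / t2  # grows like sqrt(n)
print(ratios)
```

Doubling \(n\) twice (from 20 to 80) roughly doubles the ratio, consistent with \(\sqrt{n}\) growth.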

Nevertheless, the extra gate complexity cost due to the term \(\left\| {\varvec{H}} \right\| _{(global),2}^2\) vanishes asymptotically for higher-order formulas. On the other hand, the optimality of the operator norm bounds (with extra factors of \(\sqrt{n}\)) is less understood. For example, a commutator term at order \(\mathcal {O}(t^2)\) reads

$$\begin{aligned} \sum _{k>i>j}g'_{ki}g_{ij}\left[\varvec{\sigma }^x_k\varvec{\sigma }^y_i, \varvec{\sigma }^y_i\varvec{\sigma }^z_j + \varvec{\sigma }^z_i\varvec{\sigma }^z_j\right] \propto \sum _{k>i>j} g'_{ki}g_{ij} \varvec{\sigma }^x_k\varvec{\sigma }^x_i \varvec{\sigma }^z_j. \end{aligned}$$

The expression resembles spin-glass Hamiltonians (e.g., the Sherrington-Kirkpatrick model [51]), whose optimization is an entire research field. Even worse, the coefficients here are correlated, so we leave it as an open problem.

10.2 Counting lower bounds at early times

In this section, we give a counting argument suggesting that our gate complexity for random k-local Hamiltonians with fixed inputs is optimal at early times. For a particular unitary \(e^{\textrm{i}\varvec{H}t}\), it is generally hard to rule out the existence of a shorter circuit. Fortunately, lower bounds do exist via a counting argument over a set of unitaries.

We begin by reviewing the gate complexity for 1d-spatially-local models, where the gate complexity nt is known to be tight.

Fact IX.1

(Upper bounds, analog to digital [18, 22]). For every piece-wise constant Hamiltonian

$$\begin{aligned} \varvec{H}([T,T+1]) = \varvec{H}_i(T), \Vert \varvec{H}_i \Vert \le 1, \end{aligned}$$

the product formula approximates it well using \(\tilde{\mathcal {O}}(nt)\) gates.

Fact IX.2

(Lower bounds, digital to analog [22]). Within the family of piece-wise constant Hamiltonians, one can embed \(\tilde{\mathcal {O}}(nt)\)-sized Boolean circuits as distinct instances, and hence simulation requires a circuit of size \(\tilde{\Omega }(nt)\).

Now, for k-local random Hamiltonians, we have shown that the unitary evolution \(\textrm{e}^{-\textrm{i}\varvec{H}t}\) can be simulated with gate complexity \(\tilde{\mathcal {O}}(n^k \left\| {\varvec{H}} \right\| _{(local),2} t)\) for a fixed input state. Is the factor of \(\Gamma =n^k\), the number of Hamiltonian terms, a feature or a bug? We conjecture it is the former.

Hypothesis IX.1

Simulation of a typical sample of a random k-local Hamiltonian (SYK normalization) for time t requires \(\tilde{\Omega }(n^kt)\) gates.

We present a supportive early-time argument. Consider a Hamiltonian drawn randomly, where the k-local matrices are orthonormal

$$\begin{aligned} \varvec{H}^{}_{k,n} := \sum _{i_1< \ldots<i_k \le n} \varvec{H}^{}_{i_1\cdots i_k} =\sum _{i_1< \ldots <i_k \le n} J_{i_1\cdots i_k} \varvec{K}^{}_{i_1\cdots i_k}\quad \text {where}\quad \textrm{Tr}(\varvec{K}_{i_1\cdots i_k}\varvec{K}_{i'_1\cdots i'_k})=d\, \delta _{\varvec{i},\varvec{i'}} \end{aligned}$$

and d is the dimension of the Hilbert space. Recall that the number of terms is \(\Gamma =\left( {\begin{array}{c}n\\ k\end{array}}\right) =\mathcal {O}(n^k)\). The coefficients are i.i.d. Gaussian with variance

$$\begin{aligned} \mathbb {E}[J^2_{\varvec{i}}]= \mathcal {O}(\frac{1}{n^{k-1}}). \end{aligned}$$
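With this normalization, \(\mathbb {E}\sum _{\varvec{i}} J^2_{\varvec{i}} = \Gamma \cdot \mathcal {O}(1/n^{k-1})=\mathcal {O}(n)\), and the sum concentrates sharply around its mean, which drives the collision estimate below. The following is our own numerical sketch of this concentration (the parameters \(n=12\), \(k=3\) are arbitrary illustrative choices):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

n, k = 12, 3
gamma = math.comb(n, k)                  # number of terms, C(n, k)
sigma = 1.0 / math.sqrt(n ** (k - 1))    # so that E[J^2] = 1/n^(k-1)

# Sample S = sum_i J_i^2 repeatedly; S concentrates at gamma/n^(k-1) = Theta(n).
trials = 2000
J = rng.normal(0.0, sigma, size=(trials, gamma))
S = (J ** 2).sum(axis=1)

mean_expected = gamma / n ** (k - 1)
# Even the smallest of 2000 samples stays far above a small constant eps^2/2,
# consistent with the exp(-Omega(Gamma)) Bernstein tail bound used below.
print(mean_expected, S.mean(), S.min())
```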

For a counting argument, we lower-bound the number of distinguishable Hamiltonians by the size \(N(\epsilon )\) of an \(\epsilon \)-net via the collision probability. Draw N i.i.d. samples from the random Hamiltonian ensemble and take a union bound over the chance that any pair of samples collides

$$\begin{aligned} \Pr \left( \exists \varvec{H},\varvec{H}': \left\| { \varvec{H}- \varvec{H}'} \right\| _{\infty }<\epsilon \right) \le \left( {\begin{array}{c}N\\ 2\end{array}}\right) \Pr (\left\| { \varvec{H}- \varvec{H}'} \right\| _{\infty }<\epsilon ). \end{aligned}$$

So long as the RHS is \(< 1\), there must exist an \(\epsilon \)-net of size N; i.e., \(N(\epsilon )\) can be as large as

$$\begin{aligned} N(\epsilon ) = \lfloor {\sqrt{2/\Pr (\left\| { \varvec{H}- \varvec{H}'} \right\| _{\infty }<\epsilon )}}\rfloor . \end{aligned}$$
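The union-bound arithmetic behind this choice of \(N(\epsilon )\) is elementary; the following is our own sketch for concreteness:

```python
import math

def net_size(collision_prob):
    """Largest N for which the union bound (N choose 2) * p < 1 holds,
    so that N pairwise epsilon-separated samples must exist."""
    n = math.floor(math.sqrt(2.0 / collision_prob))
    # N <= sqrt(2/p) implies N*(N-1)/2 * p < N^2/2 * p <= 1.
    assert n * (n - 1) / 2 * collision_prob < 1.0
    return n

# A collision probability exp(-Gamma) yields a net of size exp(Omega(Gamma)).
for g in (10, 20, 40):
    print(g, net_size(math.exp(-g)))
```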

To bound the RHS, we reduce to controlling the 2-norm

$$\begin{aligned} \Pr (\left\| { \varvec{H}- \varvec{H}'} \right\| _{\infty }<\epsilon ) \le \Pr (\left\| { \varvec{H}- \varvec{H}'} \right\| _{2}<\epsilon \sqrt{d}), \end{aligned}$$

where the dimension of Hilbert space d will be canceled. The 2-norm calculation is a scalar concentration bound

$$\begin{aligned} \left\| { \varvec{H}- \varvec{H}'} \right\| _{2}^2 = \left\| {\sum _{\varvec{i}}(J_{\varvec{i}}-J'_{\varvec{i}})\varvec{K}_{\varvec{i}}} \right\| _{2}^2 = \sum _{\varvec{i}}(J_{\varvec{i}}-J'_{\varvec{i}})^2 d \simeq 2\sum _{\varvec{i}}J_{\varvec{i}}^2 d. \end{aligned}$$

The last step uses that the difference of two i.i.d. Gaussians is another Gaussian (with twice the variance). We can use Bernstein’s inequality for the variables \(x_{\varvec{i}}:= J^2_{\varvec{i}}\)

Fact IX.3

(Bernstein’s inequality). For independent variables \(x_i\) with total variance \(\sum _i \mathbb {E}[(x_i-\mathbb {E}[x_i])^2] = v\) and almost-sure bound \(\left| {x_i-\mathbb {E}[x_i]} \right| {\mathop {\le }\limits ^{a.s.}} L\),

$$\begin{aligned} \Pr \left( |\sum _{i} x_{i} -\mathbb {E}\sum _{i} x_{i} |\ge \delta \right)&\le 2\exp \left(\frac{-\delta ^2/2}{v +L\delta /3}\right). \end{aligned}$$

For our parameters,

$$\begin{aligned} \Pr (\left\| { \varvec{H}\!-\! \varvec{H}'} \right\| _{2}<\epsilon \sqrt{d})\!\le \! \Pr (\sum _{\varvec{i}} J_{\varvec{i}}^2\!\le \! \epsilon ^2/2 )&\!\le \! \Pr \left( |\sum _{\varvec{i}} J_{\varvec{i}}^2 \!-\Omega (n) |\!\ge \! \Omega (n)\!-\epsilon ^2/2 \right) \\&\lesssim \exp (-\Omega (n^k)) = \exp (-\Omega (\Gamma )). \end{aligned}$$

where we plugged in \(\mathbb {E}\sum _{\varvec{i}} x_{\varvec{i}} = \mathcal {O}(n)\), \(\delta = \Omega ( n)-\epsilon ^2/2=\mathcal {O}( n)\), \(L=\mathcal {O}( 1/n^{k-1})\), and \(v= \mathcal {O}(n^k/n^{2(k-1)} )=\mathcal {O}(1/n^{k-2})\). To estimate the circuit lower bound, consider the unitary evolution up to a short time (say \(t_*\sim \theta (1)\)).

$$\begin{aligned} \Pr (\left\| {e^{\textrm{i}\varvec{H}t_*}-e^{\textrm{i}\varvec{H}'t_*}} \right\| _{\infty }\le \epsilon ) {\mathop {\lesssim }\limits ^{?}} \Pr (\left\| {( \varvec{H}- \varvec{H}')t_*} \right\| _{\infty }<\epsilon ) \le \exp (-\Omega (\Gamma )). \end{aligned}$$

Unfortunately, there is still a missing step from the Hamiltonian to the unitary evolution: the first inequality (marked by “?”) remains conjectural, while the second inequality is rigorous. If this line holds, then a circuit of size

$$\begin{aligned} \Omega (\Gamma )=\Omega (n^k) \end{aligned}$$

is needed at early times \(t=\theta (1)\), matching our Trotter bounds for non-random and random Hamiltonians (Theorem II.1, Theorem VIII.1).