Introduction

A primary goal of emerging quantum computing technologies is to enable the simulation of quantum many-body systems that are challenging for classical computers1,2,3. Early experimental demonstrations of quantum simulation algorithms have focused on computing ground- and excited-state energies of small molecules or few-site spin and fermionic models4,5,6,7,8. More recently, the scale of quantum simulation experiments has increased in terms of numbers of qubits, diversity of gate sets, and complexity of algorithms, as manifested in the simulation of models based on real molecules and materials9,10, various phases of matter such as thermal11, topological12,13, and many-body localized states14,15, as well as holographic quantum simulation using quantum tensor networks16,17. As quantum advantages in random sampling have been established on quantum hardware18,19, focus has turned to the experimental demonstration of quantum advantages in problems of physical significance20.

For applications in chemistry and physics, the calculation of the response properties of molecules and materials is of substantial interest21. Investigating response properties in the electronic structure theory framework involves calculating quantities such as the one-particle Green’s function22 and density-density response functions23, which provide insight into interpreting experimental spectroscopic measurements24. Response properties of molecules and materials can be determined either in the time domain or in the frequency domain. Due to the natural ability of quantum computers to simulate time evolution1,2, near-term algorithms to compute time-domain response properties have been carried out on quantum hardware25,26,27. However, computing the frequency-domain response from the time-domain response using the typical gate set requires a time duration that exceeds the circuit depth limitations of near-term quantum computers.

An alternative approach to determine these response properties is by computing them directly in the frequency domain. Frequency-domain algorithms generally involve obtaining the ground- and excited-state energies, as well as the transition amplitudes between the ground state and the excited states. Although there are established methods to obtain ground- and excited-state energies on quantum computers28,29,30, calculating transition amplitudes is less straightforward. Various schemes including variational quantum simulation31,32,33, quantum subspace expansion34, and quantum linear algebra35 to determine frequency-domain response properties have been proposed. While variational quantum methods to compute frequency-domain response properties have been demonstrated36, the accuracy of variational methods generally depends on the quality of the ansatz. Moreover, quantum subspace expansion is susceptible to numerical instabilities from basis linear dependence, and quantum linear algebra is out of reach for near-term quantum hardware. Recently, a non-variational scheme amenable to near-term hardware implementation has been proposed37,38. This scheme constructs the electron-added and electron-removed states simultaneously by exploiting the probabilistic nature of the linear combination of unitaries (LCU) algorithm39. High-fidelity multi-qubit gates that would facilitate the execution of these algorithms have recently been reported40,41,42,43,44. Implementation of frequency-domain response property calculations on quantum hardware with such gates would allow for a demonstration of their effectiveness on a representative problem of scientific relevance within the constraints of qubit number and circuit depth. However, such gates have yet to be integrated into quantum simulation circuits or demonstrated to yield improved accuracy of observable properties.

In this work, we experimentally demonstrate the application of a high-fidelity three-qubit iToffoli gate40 on a superconducting quantum processor to the calculation of frequency-domain response properties of diatomic molecules using LCU circuits. The use of the iToffoli gate leads to substantial reductions in the circuit depth by ~50% and in the circuit execution time by ~40%. The transition amplitudes between the ground state and the N-electron or (N ± 1)-electron states of NaH and KH molecules are computed on the quantum hardware and used to construct spectral functions and density-density response functions. We apply error mitigation techniques including randomized compiling (RC)45,46 during circuit construction, and McWeeny purification47 during post-processing, both of which result in marked improvement of the experimental observables. The molecular response properties obtained from the reduced-depth circuits with iToffoli decomposition show comparable or better agreement with theory compared to those from circuits with CZ decomposition, despite incomplete Pauli twirling in the RC procedure applied to the iToffoli gate. Our results advance the general application of multi-qubit gates to quantum chemistry and related quantum simulation protocols on near-term quantum hardware.

Results

Quantum algorithm for transition amplitudes of diatomic molecules

We consider the highest occupied molecular orbital-lowest unoccupied molecular orbital (HOMO-LUMO) models of the diatomic molecules NaH and KH as shown in Fig. 1A (see “Methods” for parameters of the molecular models). Such molecular models with reduced active space have been used in benchmarking quantum chemistry methods on quantum computers48. The HOMO-LUMO model generates two spatial orbitals or equivalently four spin orbitals, which correspond to four qubits after the Jordan-Wigner transformation49. To reduce quantum resources, we exploit the number symmetry in each spin sector to reduce the number of qubits from four to two using a qubit-tapering technique50 (details given in Supplementary Note 1).

Fig. 1: Schematic of the diatomic molecules and diagrams of the LCU circuits for computing transition amplitudes.

A Schematic of the diatomic molecules NaH and KH. The active space consists of only the HOMO and the LUMO. B The circuits to calculate diagonal transition amplitudes, where a0 is the ancilla qubit and s0 and s1 are the system qubits. For the spectral functions, the target unitaries are \({\tilde{X}}_{p\sigma }\) and \({{{i}}}{\tilde{Y}}_{p\sigma }\), while for the response function, the target unitaries are I and \({\tilde{Z}}_{p\sigma }\). C The circuit to calculate off-diagonal transition amplitudes in the response functions, where a0 and a1 are the ancilla qubits, and s0 and s1 are the system qubits. The double-controlled-\(\tilde{Z}\) gates are decomposed with either iToffoli gates or CZ gates. In both B and C, quantum state tomography (QST) is performed on the system qubits.

The observables we aim to determine are the spectral function and density-density response function. Suppose that the molecular Hamiltonian with reduced active space has ground state \(\left\vert {\Psi }_{0}\right\rangle\) with energy E0, and (N ± 1)-electron eigenstates \(\left\vert {\Psi }_{\lambda }^{N\pm 1}\right\rangle\) with energies \({E}_{\lambda }^{N\pm 1}\). Let \({\hat{a}}_{p\sigma }^{{\dagger} }\) and \({\hat{a}}_{p\sigma }\) be the creation and annihilation operators on orbital p with spin σ, respectively. The one-particle Green’s function has the expression22:

$$\begin{array}{ll}{G}_{pq}(\omega )\,=\,\mathop{\sum}\limits_{\lambda \sigma }\frac{\langle {\Psi }_{0}| {\hat{a}}_{p\sigma }| {\Psi }_{\lambda }^{N+1}\rangle \langle {\Psi }_{\lambda }^{N+1}| {\hat{a}}_{q\sigma }^{{\dagger} }| {\Psi }_{0}\rangle }{\omega +{E}_{0}-{E}_{\lambda }^{N+1}+{{{i}}}\eta }\\ \qquad\qquad\quad+\,\mathop{\sum}\limits_{\lambda \sigma }\frac{\langle {\Psi }_{0}| {\hat{a}}_{q\sigma }^{{\dagger} }| {\Psi }_{\lambda }^{N-1}\rangle \langle {\Psi }_{\lambda }^{N-1}| {\hat{a}}_{p\sigma }| {\Psi }_{0}\rangle }{\omega -{E}_{0}+{E}_{\lambda }^{N-1}+{{{i}}}\eta }\end{array}$$
(1)

where ω is the frequency and η is a small broadening factor. The spectral function A(ω) is related to the Green’s function by \(A(\omega )=-{{{{\rm{\pi }}}}}^{-1}{{{\rm{Im}}}}\,{{{\rm{Tr}}}}\,G(\omega )\).
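As an illustration of how Eq. (1) turns energies and transition amplitudes into a spectrum, the following is a minimal NumPy sketch rather than the implementation used in this work. The function name `spectral_function`, its argument names, and the toy energies and weights are all hypothetical. Each weight is assumed to be the squared transition amplitude summed over orbitals and spins, which is what the trace of G(ω) requires.

```python
import numpy as np

def spectral_function(omegas, e0, e_plus, b_plus, e_minus, b_minus, eta):
    """Evaluate A(w) = -(1/pi) Im Tr G(w) using Eq. (1).

    b_plus[lam]  : sum over p, sigma of |<Psi_lam^{N+1}| a^dag_{p sigma} |Psi_0>|^2
    b_minus[lam] : sum over p, sigma of |<Psi_lam^{N-1}| a_{p sigma} |Psi_0>|^2
    (hypothetical inputs; in the experiment they come from the measured amplitudes)
    """
    w = np.asarray(omegas, dtype=float)[:, None]
    e_plus = np.asarray(e_plus, dtype=float)
    b_plus = np.asarray(b_plus, dtype=float)
    e_minus = np.asarray(e_minus, dtype=float)
    b_minus = np.asarray(b_minus, dtype=float)
    # Electron-addition term, poles at w = E_lam^{N+1} - E_0
    trace_g = (b_plus / (w + e0 - e_plus + 1j * eta)).sum(axis=1)
    # Electron-removal term, poles at w = E_0 - E_lam^{N-1}
    trace_g = trace_g + (b_minus / (w - e0 + e_minus + 1j * eta)).sum(axis=1)
    return -trace_g.imag / np.pi

# Toy numbers (not molecular data): one addition peak and one removal peak.
omegas = np.linspace(-20.0, 20.0, 4001)
spectrum = spectral_function(
    omegas, e0=-1.0, e_plus=[0.5], b_plus=[1.2],
    e_minus=[-0.2], b_minus=[0.8], eta=0.1,
)
```

Each pole contributes a Lorentzian of width η whose area equals its weight, so a larger η lowers and widens every peak without changing the integrated spectral weight.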

For the density-density response function, we consider the charge-neutral N-electron excited states \(\left\vert {\Psi }_{\lambda }^{N}\right\rangle\) with energies \({E}_{\lambda }^{N}\) and the number operator \({\hat{n}}_{p\sigma }\) on the orbital p with spin σ. The density-density response function has the expression23:

$${\chi }_{pq}(\omega )=\mathop{\sum}\limits_{\lambda }\frac{{\sum }_{\sigma {\sigma }^{{\prime} }}\langle {\Psi }_{0}| {\hat{n}}_{p\sigma }| {\Psi }_{\lambda }^{N}\rangle \langle {\Psi }_{\lambda }^{N}| {\hat{n}}_{q{\sigma }^{{\prime} }}| {\Psi }_{0}\rangle }{\omega +{E}_{0}-{E}_{\lambda }^{N}+{{{i}}}\eta }.$$
(2)

The operators \({\hat{a}}_{p\sigma }^{{\dagger} },{\hat{a}}_{p\sigma }\) and \({\hat{n}}_{p\sigma }\) are not unitary, but they can be written as linear combinations of unitary operators as

$${\hat{a}}_{p\sigma }^{{\dagger} }=({\bar{X}}_{p\sigma }-{{{i}}}{\bar{Y}}_{p\sigma })/2,$$
(3)
$${\hat{a}}_{p\sigma }=({\bar{X}}_{p\sigma }+{{{i}}}{\bar{Y}}_{p\sigma })/2,$$
(4)
$${\hat{n}}_{p\sigma }=(I-{Z}_{p\sigma })/2,$$
(5)

where I is the identity operator, Zpσ is the Pauli Z operator on orbital p with spin σ, and \({\bar{X}}_{p\sigma }\) and \({\bar{Y}}_{p\sigma }\) are the Jordan-Wigner transformed Pauli X and Y operators on orbital p with spin σ with a string of Z operators included to account for the anticommutation relation49. The Pauli strings \({\bar{X}}_{p\sigma },{\bar{Y}}_{p\sigma }\) and Zpσ undergo the same transformation and qubit tapering process as the Hamiltonian (details given in Supplementary Note 1). Except for the identity operator which does not change under the transformation, we label the transformed \({\bar{X}}_{p\sigma },{\bar{Y}}_{p\sigma },{Z}_{p\sigma }\) as \({\tilde{X}}_{p\sigma },{\tilde{Y}}_{p\sigma }\) and \({\tilde{Z}}_{p\sigma }\).
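The decompositions in Eqs. (3)-(5) can be checked numerically before any tapering is applied. The sketch below is illustrative only: it assumes a three-qubit register, the convention that qubit state 1 means an occupied orbital, and a Z string on all qubits preceding the target. The helper names (`kron_all`, `jw_pauli`, `create`, `annihilate`, `number`) are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def jw_pauli(pauli, j, n):
    """Jordan-Wigner string: Z on qubits 0..j-1, `pauli` on qubit j."""
    return kron_all([Z] * j + [pauli] + [I2] * (n - j - 1))

n = 3  # assumed register size for the check

def create(j):      # Eq. (3)
    return (jw_pauli(X, j, n) - 1j * jw_pauli(Y, j, n)) / 2

def annihilate(j):  # Eq. (4)
    return (jw_pauli(X, j, n) + 1j * jw_pauli(Y, j, n)) / 2

def number(j):      # Eq. (5): plain Z on qubit j, no string
    z_j = kron_all([I2] * j + [Z] + [I2] * (n - j - 1))
    return (np.eye(2 ** n) - z_j) / 2

# a_j^dag a_j reproduces the number operator of Eq. (5)
number_ok = all(np.allclose(create(j) @ annihilate(j), number(j)) for j in range(n))
```

The Z strings are what make operators on different orbitals anticommute, which is why they are needed for the X and Y terms but drop out of the number operator.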

The LCU circuits to calculate diagonal and off-diagonal transition amplitudes are given in Fig. 1B, C, respectively. Each circuit has two system qubits s0 and s1, and one ancilla qubit a0 or two ancilla qubits a0 and a1. The unitary U0 prepares the ground state \(\left\vert {\Psi }_{0}\right\rangle\) on the system qubits from the all-zero initial state. The operators \({\tilde{X}}_{p\sigma }\) and \({\tilde{Y}}_{p\sigma }\) are only present in the diagonal circuit in Fig. 1B since the calculation of the spectral function only requires diagonal transition amplitudes. The operators I and \({\tilde{Z}}_{p\sigma }\) are present in both the diagonal circuit in Fig. 1B and the off-diagonal circuit in Fig. 1C, since the density-density response function requires both the diagonal and the off-diagonal transition amplitudes. The remaining two double-controlled identity gates that would complete the LCU circuit, which correspond to the first double-controlled gate (controlled on \(\left\vert 0\right\rangle\) of both a0 and a1) and the third double-controlled gate (controlled on \(\left\vert 0\right\rangle\) of a0 and \(\left\vert 1\right\rangle\) of a1) in Fig. 3 of ref. 37, are not shown because they are equivalent to identity gates on the whole circuit. We note that the original algorithm37,38 proposed performing quantum phase estimation on the system qubits. Due to quantum resource constraints, we need to encode our physical state onto a two-qubit subset of the four qubits available on the quantum device. Therefore, in contrast to the original algorithm, we apply quantum state tomography51 to the system qubits while measuring the ancilla qubits in the Z basis.

In the diagonal circuits, we obtain the (unnormalized) system-qubit states \(\frac{1}{2}({\tilde{X}}_{p\sigma }\pm {{{i}}}{\tilde{Y}}_{p\sigma })\left\vert {\Psi }_{0}\right\rangle\) or \(\frac{1}{2}(I\pm {\tilde{Z}}_{p\sigma })\left\vert {\Psi }_{0}\right\rangle\) with probabilities p±, where the probabilities are specified by the ancilla measurement outcome as \({p}_{+}={p}_{{a}_{0} = 0}\) and \({p}_{-}={p}_{{a}_{0} = 1}\); in the off-diagonal circuits, we obtain the (unnormalized) system-qubit states \(\frac{1}{4}[(I-{\tilde{Z}}_{p\sigma })\pm {{{{e}}}}^{{{{i}}}{{{\pi }}}/4}(I-{\tilde{Z}}_{q{\sigma }^{{\prime} }})]\left\vert {\Psi }_{0}\right\rangle\) with probabilities p±, where \({p}_{+}={p}_{({a}_{0},{a}_{1}) = (1,0)}\) and \({p}_{-}={p}_{({a}_{0},{a}_{1}) = (1,1)}\). We take the overlaps of the tomographed system-qubit states with the exact eigenstates, and these overlaps are then post-processed according to Eq. (18) in ref. 37 or Eq. (25) in ref. 38 to yield the transition amplitudes (see Supplementary Note 2 for a detailed derivation). The transition amplitudes are then used to construct the spectral function and density-density response function according to Eqs. (1) and (2).
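The action of the diagonal circuit can be reproduced with a small state-vector simulation. The sketch below is a simplified model, not the experimental circuit: it assumes a single untapered system qubit, an ancilla stored as the most significant qubit, Hadamard gates before and after a select operation, and target unitaries X and iY. The name `lcu_branches` and the toy state are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
P0 = np.diag([1.0, 0.0]).astype(complex)
P1 = np.diag([0.0, 1.0]).astype(complex)

def lcu_branches(A, B, psi):
    """Apply H, then A (ancilla 0) or B (ancilla 1), then H.

    Returns the unnormalized system states (A + B)/2 |psi> and (A - B)/2 |psi>
    for ancilla outcomes 0 and 1, plus the outcome probabilities.
    """
    dim = len(psi)
    state = np.kron(np.array([1, 0], dtype=complex), psi)
    had = np.kron(H, np.eye(dim))
    select = np.kron(P0, A) + np.kron(P1, B)
    state = had @ select @ had @ state
    branches = state.reshape(2, dim)          # row index = ancilla outcome
    probs = (np.abs(branches) ** 2).sum(axis=1)
    return branches, probs

theta = 0.3
psi = np.array([np.cos(theta), np.sin(theta)], dtype=complex)  # toy state
branches, probs = lcu_branches(X, 1j * Y, psi)
```

With these targets, ancilla outcome 0 leaves (X + iY)/2 applied to the state and outcome 1 leaves (X − iY)/2, so both the electron-removed and electron-added states are obtained from one circuit, each with its own probability.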

In the following sections, for simplicity, we will denote the diagonal circuit that applies the operator \({\hat{a}}_{p\sigma }^{({\dagger} )}\) or \({\hat{n}}_{p\sigma }\) to the initial ground state as the pσ-circuit, and the off-diagonal circuit that applies the operators \({\hat{n}}_{p\sigma }\) and \({\hat{n}}_{q{\sigma }^{{\prime} }}\) to the initial ground state as the \((p\sigma ,q{\sigma }^{{\prime} })\)-circuit.

iToffoli vs CZ decompositions in LCU circuits

The transformed and tapered operators are two-qubit Pauli strings with multiplicative factors of ±1 or \(\pm {{{i}}}\). To apply the single- or double-controlled gates, we follow the standard multi-qubit Pauli gate decomposition52 (see Supplementary Note 4 for details) with the base gate as CZ or CCZ and use CNOT gate equivalents, which consist of native CZ gates dressed by Hadamard gates, to extend the weights of the Pauli strings. The multiplicative factor −1 or \(\pm {{{i}}}\) can be applied as a single-qubit phase gate on the ancilla in the diagonal circuits, or as the native CZ, CS, or CS† gate on the two ancillae in the off-diagonal circuits. Additionally, X gates are wrapped around the ancilla qubits controlled on \(\left\vert 0\right\rangle\). Figure 2A shows how a double-controlled gate with ancilla a0 controlled on \(\left\vert 1\right\rangle\), ancilla a1 controlled on \(\left\vert 0\right\rangle\), and the target operator −ZZ is applied on the device.
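The weight-extension step can be verified with explicit matrices. The sketch below treats the single-controlled case on an assumed qubit ordering (a0, s0, s1); the helper `kron3` and all variable names are hypothetical, and the specific gate placement is one valid choice rather than necessarily the one compiled on the device.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
P0 = np.diag([1.0, 0.0]).astype(complex)
P1 = np.diag([0.0, 1.0]).astype(complex)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# Base gate: CZ between the ancilla a0 and system qubit s0.
cz_a0_s0 = kron3(P0, I2, I2) + kron3(P1, Z, I2)

# CNOT equivalent (control s1, target s0): native CZ dressed by Hadamards on s0.
cz_s0_s1 = kron3(I2, P0, I2) + kron3(I2, P1, Z)
h_s0 = kron3(I2, H, I2)
cnot_s1_s0 = h_s0 @ cz_s0_s1 @ h_s0

# Conjugating the base gate by the CNOT extends Z on s0 to ZZ on (s0, s1).
controlled_zz = cnot_s1_s0 @ cz_a0_s0 @ cnot_s1_s0

# A factor of -1 on the target becomes a Z gate on the ancilla.
controlled_minus_zz = kron3(Z, I2, I2) @ controlled_zz

target_zz = kron3(P0, I2, I2) + kron3(P1, Z, Z)
target_minus_zz = kron3(P0, I2, I2) - kron3(P1, Z, Z)
```

The same conjugation pattern applies with CCZ as the base gate, in which case the phase factor is carried by a two-qubit phase gate on the two ancillae instead of a single-qubit gate.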

Fig. 2: Decomposition of the double-controlled composite gates in the LCU circuits.

A Example of the decomposition of a double-controlled −ZZ gate, which is controlled on \(\left\vert 1\right\rangle\) of a0 and \(\left\vert 0\right\rangle\) of a1, into CCZ (blue) along with other single- and two-qubit gates. The X gates (green) are used to adjust the control states; the CZ gate on a0 and a1 (purple) is used to adjust the overall multiplicative factor, which is −1 in this case; the CNOT gate equivalents (orange) are used to extend the weights of the Pauli string as in ref. 52. B Decomposition of the CCZ gates with the iToffoli gate, which is a \({\rm{CC}}{\hbox{-}}i{\rm{X}}\) gate with both control qubits controlled on \(\left\vert 0\right\rangle\). The decomposition includes the equivalent of a \({\rm{CC}}{\hbox{-}}i{\rm{Z}}\) gate (light blue) and the equivalent of a long-range CS gate (yellow). The SWAP gates are simplified in the transpilation stage or further decomposed with CZ gates according to ref. 53.

We decompose the CCZ gate either with the three-qubit iToffoli gate as shown in Fig. 2B or with the native CZ gates. The iToffoli decomposition starts with a double-controlled \(i{\rm{Z}}\) component, followed by a long-range CS gate to cancel the phase factor \({i}\). The SWAP gates in the long-range CS part of the circuit are further simplified in the transpilation stage or decomposed into three CZ gates and additional single-qubit gates according to a recent work on the same quantum device53. For the CZ decomposition of CCZ, we use the topology-aware quantum circuit synthesis package BQSKit54 to obtain the optimal decomposition as eight CZs under linear qubit connectivity, as opposed to the six-CZ decomposition that requires all-to-all qubit connectivity55.
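The algebra behind the iToffoli route to CCZ can be checked with 8 × 8 matrices. The sketch below is an idealized check under assumed conventions: qubit ordering (control, control, target), connectivity and SWAP gates ignored, and the phase correction written as a controlled phase of −i on the two controls. Whether the correction appears as CS or its inverse depends on the phase convention and gate ordering, which may differ from Fig. 2B. All variable names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# iToffoli: apply iX to the target when both controls are in |0>.
P00 = np.diag([1.0, 0.0, 0.0, 0.0]).astype(complex)
itoffoli = np.kron(P00, 1j * X) + np.kron(np.eye(4) - P00, I2)

# X gates on the controls move the control condition from |00> to |11>;
# Hadamards on the target turn iX into iZ.
flip_controls = kron3(X, X, I2)
had_target = kron3(I2, I2, H)
cc_iz = had_target @ flip_controls @ itoffoli @ flip_controls @ had_target

# A controlled phase of -i on the controls cancels the factor i.
phase_fix = np.kron(np.diag([1, 1, 1, -1j]), I2)
ccz = phase_fix @ cc_iz

ccz_expected = np.diag([1, 1, 1, 1, 1, 1, 1, -1]).astype(complex)
```

On hardware with linear connectivity the two control qubits may not be adjacent, which is why the phase correction is described as a long-range gate that requires SWAPs.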

The spectral function only requires the four diagonal circuits 0↑, 0↓, 1↑, 1↓. The density-density response function requires the four diagonal circuits 0↑, 0↓, 1↑, 1↓ and six off-diagonal circuits (0↑, 0↓), (0↑, 1↑), (0↑, 1↓), (0↓, 1↑), (0↓, 1↓), (1↑, 1↓). We use the same transpilation procedure to optimize the circuits constructed from the iToffoli decomposition and the CZ decomposition (details given in “Methods” section).

The diagonal circuits after transpilation are relatively shallow circuits with maximum circuit depth (excluding virtual Z gates) of 19, maximum two-qubit gate count of 7 and no iToffoli gates. In the off-diagonal circuits, the circuit depths range from 24 to 29 for iToffoli decomposition and from 54 to 59 for CZ decomposition. As for the two- and multi-qubit gate counts, each iToffoli-decomposed circuit contains two iToffoli gates and 9 to 12 native two-qubit gates, while each CZ-decomposed circuit contains 19 to 21 native two-qubit gates. The iToffoli decomposition thus results in ~50% reduction in the circuit depth and the number of two-qubit gates compared to the CZ decomposition.

We also compare the durations of the circuits that result from the iToffoli decomposition and the CZ decomposition. The duration of each CZ gate is 201 ns53, while the duration of each iToffoli gate is 413 ns40. Combined with other gate execution times, the durations of the iToffoli- (CZ-) decomposed circuits are 2.9–3.6 μs (4.9–5.5 μs), corresponding to a reduction in circuit execution time of approximately 40% from using iToffoli gates. This reduction in duration is expected to have a more pronounced effect on deeper circuits with execution times comparable to qubit coherence times, which are on the order of 30–50 μs53 (a complete set of gate durations and qubit coherence times is given in Supplementary Note 3).
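The quoted reductions follow from the ranges above by simple arithmetic. The sketch below compares the lower ends and the upper ends of the reported ranges; that pairing is an assumption made here for illustration, since the circuit-by-circuit values are not listed in the text. The names `relative_reduction`, `duration_us`, and `depth` are hypothetical.

```python
def relative_reduction(new, old):
    """Fractional reduction when going from `old` to `new`."""
    return 1.0 - new / old

# (min, max) over the off-diagonal circuits, as quoted in the text
duration_us = {"iToffoli": (2.9, 3.6), "CZ": (4.9, 5.5)}
depth = {"iToffoli": (24, 29), "CZ": (54, 59)}

duration_cut = [
    relative_reduction(new, old)
    for new, old in zip(duration_us["iToffoli"], duration_us["CZ"])
]
depth_cut = [
    relative_reduction(new, old)
    for new, old in zip(depth["iToffoli"], depth["CZ"])
]

print([f"{100 * x:.0f}%" for x in duration_cut])
print([f"{100 * x:.0f}%" for x in depth_cut])
```

The duration savings are smaller than the depth savings because a single iToffoli gate takes roughly twice as long as a CZ gate.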

Spectral function and response function on quantum hardware

The spectral functions of NaH and KH are shown in Fig. 3. The density matrices are obtained from quantum state tomography and post-processed with McWeeny purification. RC is not employed in constructing the circuits for obtaining these results. A broadening factor of η = 0.75 eV is used to generate both the exact and experimental spectra. As the peak frequencies are determined classically, we use the peak heights in the spectral functions as the primary metric for comparison56. The experimental spectral functions show good agreement with the exact ones with a maximum peak height deviation of 10.6%, indicating the high fidelity of circuit execution on the quantum device.

Fig. 3: Spectral function of diatomic molecules.

Spectral function of A NaH, B KH. The circuits to obtain the spectral function are shallow three-qubit circuits that do not require the iToffoli gates. A broadening factor of η = 0.75 eV is used to generate both the exact and the experimental spectra. The experimental spectral functions are in quantitative agreement with the exact ones, with a maximum peak height deviation of 10.6%.

We next turn to the density-density response function, which is more challenging to compute than the spectral function because it requires deeper off-diagonal circuits containing three-qubit iToffoli gates. We begin by considering a specific off-diagonal circuit needed for the density-density response function, the (0↑, 0↓)-circuit. To understand the influence of the iToffoli gate on the accuracy of the executed circuit, we compute the fidelity of the whole qubit register obtained by quantum state tomography versus circuit depth. The same quantity is computed for a circuit using only CZ gates to decompose the double-controlled gates. The resulting circuit fidelities are shown in Fig. 4. Although the iToffoli decomposition shows a steeper decrease in fidelity compared to the CZ decomposition, the fidelity at the end of the circuit is higher due to lower circuit depth. The noisy simulation in the inset of Fig. 4 shows a similar trend. The iToffoli gate reported in ref. 40 does not consider spectator errors on neighboring qubits, which are canceled out in the gate calibration in this work (details given in Supplementary Note 3). The cycle benchmarking fidelity of the iToffoli gate accounting for the spectator qubit is 96.6%, lower than the single-qubit gate fidelities, which are above 99.5%, and the two-qubit gate fidelities, which are between 98.0% and 98.7%. This lower fidelity may explain the steeper decay in fidelity with circuit depth in the iToffoli circuit compared to the CZ circuit.

Fig. 4: Fidelity vs circuit depth of the (0↑, 0↓)-circuit for NaH.

Fidelity for the iToffoli decomposition (blue), which has a circuit depth of 24, and the CZ decomposition (yellow), which has a circuit depth of 54. The locations of the iToffoli gates are marked by red crosses. The CZ decomposition results in lower overall fidelity compared to iToffoli decomposition due to higher circuit depth. The inset is the corresponding data from noisy simulation and shows a similar trend. All results in this figure are raw experimental or simulated data without any error mitigation.

Next, we examine the fidelity of the final state in each iToffoli-decomposed circuit used in the calculation of response functions. Figure 5 shows the system-qubit state fidelities on each response function circuit for NaH, where McWeeny purification is applied to the system-qubit density matrix after restricting the full density matrix to each ancilla bitstring sector. Comparing the values in Fig. 5A with those in Fig. 5B, we can see that RC itself only results in a moderate improvement in the fidelities, with the average diagonal fidelities changing from 84.6% to 85.5% and average off-diagonal fidelities changing from 45.2% to 54.8%. However, the results between Fig. 5B, D show that RC combined with purification yields an average diagonal fidelity of 99.9% and an average off-diagonal fidelity of 96.0%, even though purification without RC only leads to a limited improvement in the average diagonal fidelity from 84.6% to 95.7%, and in the average off-diagonal fidelity from 45.2% to 67.4% in Fig. 5A, C.
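The post-processing described above can be sketched in a few lines of NumPy. This is an illustrative model under stated assumptions, not the analysis code of this work: the ancilla is taken as the most significant qubit, purification uses the standard McWeeny iteration ρ → 3ρ² − 2ρ³, and the noisy input is a toy globally depolarized state. The function names and the toy state are hypothetical.

```python
import numpy as np

def restrict_to_ancilla(rho_full, ancilla_bits, n_system):
    """Keep the system block for one ancilla bitstring and renormalize."""
    dim = 2 ** n_system
    k = int(ancilla_bits, 2)
    block = rho_full[k * dim:(k + 1) * dim, k * dim:(k + 1) * dim]
    return block / np.trace(block).real

def mcweeny_purify(rho, n_iter=50):
    """Iterate rho -> 3 rho^2 - 2 rho^3, driving eigenvalues to 0 or 1."""
    for _ in range(n_iter):
        rho2 = rho @ rho
        rho = 3 * rho2 - 2 * rho2 @ rho
    return rho

def fidelity_with_pure(rho, psi):
    """<psi| rho |psi> for a normalized target state psi."""
    return float(np.real(np.vdot(psi, rho @ psi)))

# Toy data: one ancilla, two system qubits, depolarizing strength p (assumed).
psi = np.array([0.6, 0.0, 0.0, 0.8], dtype=complex)      # target in sector a0 = 1
phi0 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)     # state in sector a0 = 0
full = np.concatenate([np.sqrt(0.4) * phi0, np.sqrt(0.6) * psi])
p = 0.3
rho_full = (1 - p) * np.outer(full, full.conj()) + p * np.eye(8) / 8

rho_sector = restrict_to_ancilla(rho_full, "1", n_system=2)
raw_fidelity = fidelity_with_pure(rho_sector, psi)
purified_fidelity = fidelity_with_pure(mcweeny_purify(rho_sector), psi)
```

The iteration converges to the projector onto the dominant eigenvector only when that eigenvalue exceeds 1/2, and it recovers the ideal state only when the noise leaves that eigenvector unchanged, as depolarizing noise does. Coherent errors rotate the dominant eigenvector, which purification alone cannot undo.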

Fig. 5: System-qubit state fidelities in the response function calculation of NaH.

A, B Fidelities between the raw experimental and exact system-qubit density matrices without (A) and with RC (B). The diagonal elements correspond to system-qubit density matrices in the diagonal circuits after taking the ancilla state a0 = 1, and the off-diagonal elements correspond to the system-qubit density matrices in the off-diagonal circuits after taking the ancilla states either as (a0, a1) = (1, 0) (upper diagonal) or as (a0, a1) = (1, 1) (lower diagonal). C, D Fidelities between the purified experimental and exact system-qubit density matrices without (C) and with RC (D). The layout of the tiles is the same as in panels (A, B). Without RC, purification raises the average off-diagonal fidelity from 45.2% to 67.4%, but with both RC and purification, the average off-diagonal fidelity increases to 96.0%.

We now show the imaginary parts of the density-density response functions χ00 and χ01 of NaH in Fig. 6. Here χ00 is obtained from two diagonal circuits 0↑, 0↓ and one off-diagonal circuit (0↑, 0↓), while χ01 is obtained from four off-diagonal circuits (0↑, 1↑), (0↑, 1↓), (0↓, 1↑), (0↓, 1↓). All experimental results are post-processed with purification after constraining the ancilla qubits to each bitstring subspace. A broadening factor of η = 1.5 eV is used to produce the response functions.

Fig. 6: Density-density response function of NaH.

A \({{{\rm{Im}}}}\,{\chi }_{00}\) without RC. B \({{{\rm{Im}}}}\,{\chi }_{00}\) with RC. C \({{{\rm{Im}}}}\,{\chi }_{01}\) without RC. D \({{{\rm{Im}}}}\,{\chi }_{01}\) with RC. All experimental results are post-processed with McWeeny purification on the system-qubit states after constraining to the ancilla bitstring subspace. A broadening factor of η = 1.5 eV is used to generate the spectra. Without RC, the iToffoli decomposition yields qualitatively better results compared to the CZ decomposition. After RC is applied, the two decompositions yield comparable results.

Overall, the iToffoli decomposition yields better results compared to the CZ decomposition in the absence of RC, while both decompositions yield comparable results when RC is applied. Examining the response functions in Fig. 6A, C, we observe that the peak at 24.0 eV is not present in χ00 and displays the wrong sign in χ01 under the CZ decomposition. Although the iToffoli decomposition also produces the peak at 24.0 eV with the wrong sign in χ01, it exhibits a peak with a deviation of 6.1% from the exact peak in χ00. The same trend occurs for the peak at 1.4 eV. Both decompositions result in similar deviations of the peak height at 1.4 eV in χ01, where the deviation is 45.3% for the CZ decomposition and 52.5% for the iToffoli decomposition. However, in χ00, the iToffoli decomposition yields a 26.6% deviation from the exact peak, whereas the CZ decomposition produces a peak more than twice the exact value.

The results for circuits constructed with RC are shown in Fig. 6B, D. In χ00, deviations from the exact peak height at 24.0 eV and 1.4 eV are 34.8% and 4.7% for the CZ decomposition, and 11.8% and 24.0% for the iToffoli decomposition. In χ01, deviations from the exact peak at 24.0 eV and 1.4 eV are 5.7% and 28.2% for the CZ decomposition, but are 39.2% and 32.2% for the iToffoli decomposition. Since the iToffoli gate is non-Clifford, our implementation of RC results in incomplete Pauli twirling compared to applying RC to the CZ-decomposed circuits (see Supplementary Note 4). The incompleteness of RC on the iToffoli-decomposed circuits may explain why the two decompositions have comparable peak height deviations when RC is applied, despite the initial advantage for the iToffoli decomposition without RC due to its lower circuit depth.

Discussion

We have carried out an LCU-based algorithm to compute the spectral functions and density-density response functions of diatomic molecules from the transition amplitudes determined on a superconducting quantum processor. Using a native high-fidelity iToffoli gate40 has enabled the required circuit depth to be reduced by ~50% and the circuit execution time to be reduced by ~40%. The resulting circuits produced better agreement with the exact results than the circuits constructed only from single- and two-qubit gates when RC is not employed in circuit construction. We also developed an RC protocol for the non-Clifford iToffoli gate, and have shown that in the absence of complete Pauli twirling on the iToffoli gate, the circuits constructed from iToffoli gates gave results comparable to those of the circuits constructed only from single- and two-qubit gates when RC is applied in circuit construction. Our work also indicates that to obtain more accurate observables of the simulated physical systems, quantum hardware needs to improve in terms of two- and three-qubit gate fidelities.

The quality of the computed observables was substantially improved by the use of several error mitigation techniques. Specifically, our results highlight the significance of combining RC45,46 with McWeeny purification47 for quantum simulation. McWeeny purification has been widely used in quantum chemistry57 and has started to be exploited in quantum computing for constraining the purity of the output state9,58. Our results have shown that RC or McWeeny purification individually improves the experimental results only to a limited extent, as observed in the change of the average off-diagonal fidelities from 45.2% to 54.8% with only RC, and to 67.4% with only purification in Fig. 5. However, the combination of RC and purification results in a substantial improvement in the quality of the results, with the off-diagonal system-qubit state fidelities being 96.0% on average. The large improvement when combining RC and McWeeny purification is explained by the fact that RC tailors coherent errors into stochastic Pauli errors46. If the rates of various stochastic Pauli errors are similar, the errors are largely depolarizing and are corrected by McWeeny purification, yielding the high fidelities in Fig. 5D (see Supplementary Note 4 for further discussion). Moreover, previous works applied purification to the whole qubit register, but we have shown here that the purification scheme can be applied when there is a purity constraint on a subset of qubits. Additionally, our work is the first to apply RC to the non-Clifford iToffoli gate. As more native non-Clifford two-qubit and multi-qubit gates become available, our findings may guide future application of RC to non-Clifford gates. Other error mitigation techniques such as zero noise extrapolation and probabilistic error cancellation59 may also be generalized to non-Clifford gates with appropriate modifications60.
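The statement that twirling tailors coherent errors into stochastic Pauli errors can be seen in a single-qubit toy model. The sketch below is not the RC protocol used in the experiment, which randomizes gates around each multi-qubit cycle and compiles the corrections into neighboring single-qubit gates. It only averages an assumed coherent over-rotation about X over the four Pauli conjugations and compares Pauli transfer matrices. All names and the rotation angle are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
paulis = [I2, X, Y, Z]

def ptm(channel):
    """Pauli transfer matrix R_ij = Tr[P_i channel(P_j)] / 2."""
    return np.array([
        [0.5 * np.trace(Pi @ channel(Pj)).real for Pj in paulis]
        for Pi in paulis
    ])

theta = 0.2  # assumed over-rotation angle
U = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def coherent(rho):
    return U @ rho @ U.conj().T

def twirled(rho):
    # Average of P^dag E(P rho P^dag) P over the four Paulis
    return sum(
        P.conj().T @ coherent(P @ rho @ P.conj().T) @ P for P in paulis
    ) / 4

ptm_coherent = ptm(coherent)
ptm_twirled = ptm(twirled)

# Equivalent stochastic channel: X error with probability sin^2(theta / 2)
p_x = np.sin(theta / 2) ** 2
```

The coherent error has off-diagonal entries of size sin θ mixing Y and Z, while the twirled channel is diagonal with entries (1, 1, cos θ, cos θ), which is a stochastic X error with probability sin²(θ/2). When a gate is not Clifford, conjugating a Pauli through it does not generally give another Pauli, which is why the twirling group has to be restricted for a gate such as iToffoli.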

Our work is also among the first to demonstrate the practical use of a native multi-qubit gate in quantum simulation. The particular algorithm in this work is amenable to larger-scale implementation if quantum processors with higher qubit count and lower two- and three-qubit gate errors are available. Due to the nature of the LCU algorithm, the number of system qubits increases but the number of ancilla qubits stays constant while scaling up to larger system sizes. In particular, the circuit depths remain constant if long-range gates are available. Therefore, the compilation advantages of the iToffoli gates, as well as the fidelities of the circuits, are not expected to change as the system size increases. Additionally, LCU as a general algorithmic framework is not limited to determining transition amplitudes in frequency-domain response properties but has broader applications in areas such as solving linear systems61, simulating non-Hermitian dynamics62, and preparing quantum Gibbs states63. Besides the LCU algorithm, quantum algorithms such as Shor’s algorithm64 and Grover’s search algorithm65 can benefit from native three-qubit gates with a reduction in circuit depths and gate counts. Quantum algorithm design and implementation thus far have been mostly restricted to single- and two-qubit gates due to their ease of implementation and demonstrated high fidelities. Meanwhile, early implementations of three-qubit gates66,67,68 were generally slower and more prone to leakage and decoherence compared to the iToffoli gate employed here due to populating higher levels outside the qubit computational space. However, more recent implementations of three-qubit gates40,41,42,43,44 have begun to address these challenges, yielding fidelities approaching those of two-qubit gates. Further, they have been carried out on quantum devices with tens of qubits, suggesting their utility for larger-scale quantum devices. As such native multi-qubit gates become more prevalent, our work paves the way for using them as native gate components in future quantum algorithm design and implementation.

Methods

Molecular models

The molecular models studied in this work are HOMO-LUMO models of NaH at a bond distance of 3.7 Å and KH at a bond distance of 3.9 Å in the STO-3G basis. The bond distances are chosen to ensure sufficient population in the excited states to facilitate comparisons of the spectral peaks. Molecular integrals are determined from the quantum chemistry software package PySCF69. Since our work focuses on comparing the transition amplitudes, the ground- and excited-state energies are determined classically, as has been performed in other quantum simulation demonstrations70. OpenFermion71, a software library for designing and analyzing quantum algorithms, is used to map the second-quantized Hamiltonians to qubit operators.

Quantum circuit construction

The ground-state preparation gate on the system qubits is determined classically by constructing a unitary that maps the all-zero initial state to the ground state. This unitary is then decomposed into three CZ gates and single-qubit gates using the KAK decomposition72. The LCU circuits are then constructed by applying the gates shown in Fig. 1B, C, where the SWAP gates are decomposed according to the scheme in ref. 53, and the circuits are transpiled by the functions \(\texttt{MergeInteractions}\), \(\texttt{MergeSingleQubitGates}\), and \(\texttt{DropEmptyMoments}\) in the quantum simulation software library Cirq73. The transition amplitudes are combined with the classically determined ground- and excited-state energies to calculate the spectral functions and response functions (see Supplementary Note 2).
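One way to build a unitary that maps the all-zero state to a given ground state is to complete the state vector to an orthonormal basis. The sketch below uses a QR factorization for this purpose; it is one possible construction under that assumption and is not necessarily the one used in this work. The subsequent KAK decomposition into CZ and single-qubit gates is not shown. The function name `state_prep_unitary` and the toy two-qubit state are hypothetical.

```python
import numpy as np

def state_prep_unitary(psi):
    """Return a unitary whose first column is the normalized state psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    # Put psi in the first column and let QR orthonormalize the rest.
    m = np.eye(len(psi), dtype=complex)
    m[:, 0] = psi
    q, r = np.linalg.qr(m)
    # QR fixes the first column only up to the phase r[0, 0]; restore it
    # with a global phase so that the first column equals psi exactly.
    return q * r[0, 0]

# Toy two-qubit state (not a molecular ground state)
ground_state = np.array([0.1, 0.0, 0.0, -0.9], dtype=complex)
ground_state = ground_state / np.linalg.norm(ground_state)
u0 = state_prep_unitary(ground_state)

zero_state = np.zeros(4, dtype=complex)
zero_state[0] = 1.0
prepared = u0 @ zero_state
```

Only the first column of the unitary matters for state preparation, so the remaining columns are free, and different completions can lead to different gate counts after decomposition.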

Quantum device

The quantum device used in this work is a superconducting quantum processor with eight transmon qubits46,74. The algorithm is performed on a four-qubit subset of the device with linear connectivity. Single-qubit gates are performed with resonant microwave pulses. Multiplexed dispersive readout allows for simultaneous state discrimination on all four qubits. CZ gates between all nearest neighbors are performed according to the method in ref. 75. The same method allows for a native CS gate on a particular pair of qubits according to the requirements of the algorithm. While single-qubit gates are applied simultaneously, microwave crosstalk requires that all two- and three-qubit gates are applied in separate cycles from each other, as well as from any single-qubit gates. TrueQ76, a software framework for gate-level optimization, is used for circuit manipulations in the implementation of RC, as well as gate benchmarking. Internal software is used to map the circuits to hardware pulses for implementing the circuits with the native gate set.