Introduction

Combinatorial optimization problems are ubiquitous in modern science, engineering, and medicine. These problems are often NP-hard, so the runtime of classical algorithms for solving them is expected to scale exponentially with the problem size. One approach for tackling such hard optimization problems is to map them to the Ising spin glass model1,

$${{{\mathcal{H}}}}=-\mathop{\sum}\limits_{i < j}{J}_{ij}{S}_{i}{S}_{j}-\mathop{\sum}\limits_{i}{h}_{i}{S}_{i}\,.$$

Here, each Si represents a classical Ising spin attaining a value of ±1, [Jij] is an Ising coupling matrix, and [hi] is a vector of local field biases on the spin sites. When all hi are zero, the Ising model is equivalent to a (weighted) MaxCut problem on a graph with vertices corresponding to the spin sites and edge weights corresponding to the Ising couplings between the spin sites. Various mathematical programming problems, such as partitioning problems, binary integer linear programming, covering and packing problems, satisfiability problems, coloring problems, Hamiltonian cycles, tree problems, and graph isomorphisms, can be formulated in the Ising model, with the required number of spins scaling at most cubically with respect to the problem size2. This has been a primary motivation for the recent extensive study of various Ising solvers. Several potential areas of industrial application of Ising solvers include drug discovery and bio-catalyst development (e.g., in lead optimization or virtual screening), compressed sensing, deep learning (e.g., in the synaptic pruning of deep neural networks), scheduling (e.g., resource allocation and traffic control), computational finance, and social networks (e.g., community detection).
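
To make this correspondence concrete, the following Python sketch (our illustration, assuming all hi = 0 and edge weights wij = −Jij) verifies the identity cut(S) = (W − H)/2, where W is the total edge weight; minimizing the Ising energy H therefore maximizes the cut.

```python
import itertools
import numpy as np

def ising_energy(J, s):
    """H = -sum_{i<j} J_ij s_i s_j for spins s_i = +/-1 (all h_i = 0).
    J is symmetric with a zero diagonal, hence the factor of 1/2."""
    return -0.5 * s @ J @ s

def cut_weight(J, s):
    """Weight of the cut induced by the spin signs, with w_ij = -J_ij."""
    n = len(s)
    return sum(-J[i, k] for i, k in itertools.combinations(range(n), 2)
               if s[i] != s[k])

rng = np.random.default_rng(0)
n = 8
J = np.triu(rng.choice(np.arange(-10, 11) / 10.0, size=(n, n)), k=1)
J = J + J.T                      # symmetric coupling matrix, zero diagonal
s = rng.choice([-1, 1], size=n)  # an arbitrary spin configuration
W = -np.triu(J, 1).sum()         # total edge weight
assert np.isclose(cut_weight(J, s), 0.5 * (W - ising_energy(J, s)))
```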

Approximate algorithms and heuristics, such as semi-definite programming (SDP)3, simulated annealing (SA)4,5 and its variants6,7, and breakout local search (BLS)8 have been widely used as practical tools for solving MaxCut problems. However, even problem instances of moderate size require substantial computation time and, in the worst cases, solutions cannot be found with such approximate algorithms and heuristics. To overcome these shortcomings, a search for alternative solutions using various forms of quantum computing has been actively pursued. Adiabatic quantum computation9, quantum annealing10,11, and the quantum approximate optimization algorithm (QAOA)12 using circuit model quantum computers have been proposed. A coherent Ising machine (CIM) using networks of quantum optical oscillators has also been studied and implemented13,14.

Given that the present circuit model quantum computers suffer from short coherence times, gate errors, and limited connectivity among qubits, a fair comparison between them and modern heuristics is not yet possible15,16,17. This situation raises the important question of whether quantum devices can, even in principle, provide sensible solutions to combinatorial optimization problems, assuming all sources of noise and imperfections can be overcome and ideal quantum processors are built in the future. In order to address this pressing question, we perform a comparative numerical study on three distinct quantum approaches, ignoring the effects of noise, gate errors, and decoherence, that is, we compare the ultimate theoretical limits of three quantum approaches.

The first approach is based on the effects of constructive and destructive quantum interference of amplitudes in a circuit model quantum computer that utilizes only unitary evolution of pure states and projective (exact) measurement of qubits. The approach uses Grover’s search algorithm18,19 as a key computational primitive. We call this approach “DH-QMF” in reference to Dürr and Høyer’s “quantum minimum finding” algorithm20. Our scaling analysis of DH-QMF is presented in the Results section; additional details are provided in the Methods section. A review of related literature and a discussion of how our analysis differs from previous work are given in the Supplementary Materials.

The second approach is based on adiabatic quantum state preparation implemented on a circuit model quantum computer. The underlying concept, the quantum adiabatic theorem, goes as far back as the seminal work of Born and Fock21. Its application to quantum computing and solving optimization problems was introduced by Farhi et al.9. A Trotterized approximation to adiabatic evolution gives rise to a discrete implementation suitable for the circuit model. We refer to this approach as “discrete adiabatic quantum computation” (DAQC). This algorithm uses an iterative unitary evolution of pure states in a quantum circuit according to a mixing Hamiltonian and a problem Hamiltonian, which in the framework of adiabatic quantum computation correspond to the initial and final Hamiltonians of evolution, respectively. The coefficients in the exponents form the gate parameters, which can be treated as hyperparameters that follow a tuned schedule, and the overall number of Trotter steps directly pertains to the circuit depth of the algorithm. To attain the ultimate theoretical performance limit, we use pre-tuned DAQC schedules and allow for quantum circuits of arbitrary depth. Our scaling analysis of DAQC is presented in the Results section; additional details are provided in the Methods section. In the presence of noise, the closely related NISQ-type quantum approximate optimization algorithm (QAOA)12,22 (see the Supplementary Materials) deviates from DAQC in its use of (a) shallow (i.e., short-depth) quantum circuits (hence, attempting to perform ground-state preparation diabatically as opposed to adiabatically) and (b) an outer classical optimization routine to variationally optimize the diabatic evolution. We do not include the QAOA in this study in view of its poor and unstable scaling, which we empirically observed in comparison to that of DAQC. This poor performance is further exacerbated when the overhead of the classical optimizer is taken into account. Our observations are consistent with the challenges of variational quantum algorithms in overcoming the barren plateau problem22,23. Further details are provided in the Methods section.

The third approach is based on a measurement-feedback coherent Ising machine (MFB-CIM)24,25. This algorithm utilizes a quantum-to-classical transition in an open-dissipative, non-equilibrium network of quantum oscillators. A critical phenomenon known as pitchfork bifurcation realizes the transition of squeezed vacuum states to coherent states in the optical parametric oscillator. The measurement-feedback circuit plays several important roles. It continually reduces entropy and sustains a quasi-pure state in the quantum oscillator network in a controlled manner using repeated approximate measurements. Additionally, it implements the Ising coupling matrix [Jij] and local field vector [hi] in an iterative fashion. Finally, it removes the amplitude heterogeneity among the oscillators and destabilizes the machine state out of local minima. Table 1 summarizes the differences among the three approaches studied in this paper.

Table 1 Three approaches studied for MaxCut problems: the Dürr–Høyer algorithm for quantum minimum finding (DH-QMF) based on Grover’s search, the discretized adiabatic quantum computation algorithm (DAQC), and the measurement-feedback coherent Ising machine (MFB-CIM)

When studying quantum algorithms, it is important to consider the effects of noise and control errors, and the overhead needed to overcome them. Several previous studies have investigated these effects on the performance of the QAOA (here viewed as a NISQ-type, diabatic counterpart to DAQC). Some studies26,27 consider the effects of various Pauli noise channels, namely, the dephasing, bit-flip, and depolarizing noise channels; these studies report on the fidelity of the state prepared by a noisy QAOA circuit to the state prepared by an ideal QAOA circuit, for varying amounts of physical noise affecting the circuit. In contrast, another study28 models noise via single-qubit rotations by an angle chosen from a Gaussian distribution with variance values of TG/T2, where TG is the gate time and T2 is the decoherence time of the qubits. All three papers provide insight into how noise affects the expected energy of the prepared state. Note that arbitrary-depth circuits are permitted in our study of DAQC, and optimal circuit depths resulting in the best algorithmic performance can be substantially larger than the size of circuits suitable for NISQ devices.

DH-QMF circuits are much deeper than typical DAQC circuits; thus, their performance is significantly hampered by various sources of noise unless the algorithm is run on a fault-tolerant quantum computer with quantum error correction29,30,31,32,33,34,35,36. A variety of different noise models have been used to study the sensitivity of Grover’s search by simulating small quantum circuits that apply it to simple functions. Prior work on this research topic includes studies based on the following approaches: introducing random Gaussian noise at each step of Grover’s search29; analyzing the effect of gate imperfections on the probability of success of the algorithm30; examining the effect of unbiased and isotropic unitary noise resulting from small perturbations of Hadamard gates33; modeling the effect of decoherence by introducing phase errors in each qubit and at each time step and using a perturbative method31; and conducting a numerical analysis on the effects of single-qubit and two-qubit gate errors and memory errors, using a depolarizing channel to model decoherence34. The impact of using a noisy oracle has also been examined32, wherein noise is modeled by introducing random phase errors. Another recent study has investigated the effects of localized dephasing36. Finally, the effects of various noise channels have also been investigated more systematically by using trace-preserving, completely positive maps applied to density matrices35.

In our benchmark study, by “solving” an optimization problem we mean finding an actual optimal solution with high probability (as opposed to an approximate, suboptimal solution). For a fair comparison, this notion pertains to all three algorithms considered in this work. As a practical measure of the algorithms’ performance, we use the time-to-solution (TTS) metric, which refers to the time required to find an optimal solution with high confidence. For the MFB-CIM and DAQC, the TTS is computed as the number of “shots” (i.e., trials) that must be performed to ensure a high probability (specified by a target probability of success, often taken to be 0.99) of observing an optimal solution at least once, multiplied by the time required for the execution of a single shot. Similarly, for DH-QMF, the TTS is computed as the overall number of Grover iterations required to ensure a target probability of success of observing an actual optimal solution, multiplied by the time required to implement a single Grover iteration.
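
This bookkeeping is straightforward to make precise; the sketch below (an illustration of the metric, not our benchmarking code) computes R99 and the resulting TTS from a per-shot success probability.

```python
import math

def time_to_solution(p_success, t_shot, target=0.99):
    """TTS = R99 * t_shot, where R99 = log(1 - target) / log(1 - p_success)
    is the number of independent shots needed to observe an optimal
    solution at least once with probability `target`."""
    if p_success >= target:
        return t_shot  # a single shot already suffices (our convention)
    r99 = math.log(1.0 - target) / math.log(1.0 - p_success)
    return r99 * t_shot

# Example: a 1-microsecond shot succeeding 5% of the time needs
# R99 ~ 90 shots, i.e., a TTS of roughly 9e-5 seconds.
print(time_to_solution(0.05, 1e-6))
```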

We have evaluated the wall-clock TTS of the three algorithms introduced above for solving MaxCut problems, and empirically found exponential scaling laws for them already in the relatively small problem size range of 4 to 800 spins. In order to elucidate the ultimate performance limits of these solvers, we assume no extrinsic noise, gate errors, or connectivity limitations exist in the hardware. That is, we assume that phase decoherence (T2) and energy dissipation (T1) times are infinite and gate errors are absent. Consequently, the overheads associated with performing quantum error correction and building fault-tolerant architectures and protocols are not included in our benchmarking study, as they would make the comparison less favorable for circuit model quantum algorithms against the MFB-CIM. We also assume that all spins (represented by qubits in the circuit model) can be coupled to each other via (non-local) spin–spin interaction with a universal gate time of 10 nanoseconds. Therefore, there is no need to implement expensive sequences of swap gates or other bus techniques for transferring quantum information across the hardware. However, since energy dissipation and stochastic noise both constitute important computational resources for the MFB-CIM, we allow a finite energy dissipation time T1, as well as a finite gate error limited by vacuum noise, for the MFB-CIM.

We emphasize that we compare optimistic lower bounds on the TTS for the circuit-model quantum algorithms considered in this paper. It is for this reason that we do not include the overhead costs associated with quantum error correction and the realization of fault-tolerant quantum computation schemes that become necessary for deep circuits of DH-QMF and DAQC. The impact of such overhead costs, for instance, when using a topological surface code built from error-prone physical qubits and gates for encoding logical qubits and logical operations, is estimated more precisely in other recent works37,38,39. The asymptotic overhead introduced by fault-tolerant architectures can be inferred as follows. For DAQC, the circuit depth of each Trotter layer scales linearly with the problem size n (see the Results section). Therefore, the error rate of each logical gate must scale inversely with n, necessitating a code distance logarithmic in n. Fault-tolerant operations on an encoding scheme of distance d introduce at least a factor of d in physical gate time overhead. Hence, we can expect the TTS for the DAQC algorithm to increase by an \(\Omega (\log n)\) factor. Similarly, for DH-QMF, which is based on Grover’s search requiring circuits of depth \(\widetilde{\Theta }\left(\sqrt{{2}^{n}}\right)\) (see the Results section), the incurred overhead results in an increase in the TTS by a factor of Ω(n). This rough estimate does not account for compilation overhead, which would typically further increase the TTS. Nor does it account for the overheads caused by decoding and active error correction.

From a fundamental viewpoint, such a comparative study is of interest but the outcome is difficult to predict, because the three algorithms are based on completely different computational principles, as shown in Table 1. The DH-QMF algorithm iteratively deploys Grover’s search, which uses a unitary evolution of a superposition of computation basis states in order to amplify the amplitude of a target state by successive constructive interference, while the amplitudes of all the other states are attenuated by destructive interference. The DAQC algorithm attempts to prepare a pure state that has a large overlap with the ground state of the optimization problem through an approximation of the adiabatic quantum evolution. Finally, the ground state search mechanism of the MFB-CIM employs a collective phase transition at the threshold of an optical parametric oscillator (OPO) network. The correlations formed among the squeezed vacuum states in OPOs below the threshold guide the network toward oscillating at a ground state.

It is worth noting that all the algorithms in our study in various ways rely on hybrid quantum–classical architectures for computation. In an MFB-CIM with self-diagnosis and dynamical feedback control, a classical processor plays an important role by detecting when the OPO network is trapped in local minima and destabilizing it out of those states. The DH-QMF algorithm also relies on comparing the values of an objective function with a (classical) threshold value. This threshold value is updated in a classical coprocessor as DH-QMF proceeds. Finally, DAQC relies on tuning a set of parameters (e.g., the rotation angles of quantum gates). These parameters can be treated as hyperparameters of a predefined approximate adiabatic evolution and tuned for the problem type solved by the algorithm. Alternatively, the quantum circuit can be viewed as a variational ansatz, in which case the gate parameters are optimized using a classical optimizer. In the latter case, the algorithm can be considered as a variational quantum algorithm40. The QAOA is commonly viewed as such an algorithm. In previous studies, the contribution of the variational optimization of DAQC parameters to the TTS has often been ignored. In fact, while both approaches (i.e., hyperparameter tuning and variational optimization) have been adopted for solving MaxCut problems using the QAOA41,42, our investigation makes it clear that the variational approach hurts the TTS scaling significantly. The optimization landscape for such a variational quantum algorithm is ill-behaved, which results in a poor and unstable scaling for TTS with respect to the size of the MaxCut instances (refer to the Methods section). As a result, the TTS scalings reported in this paper rely on pre-tuned DAQC schedules rather than variational optimization.

Results

Scaling of the MFB-CIM

A CIM is a non-equilibrium, open-dissipative computing system based on a network of degenerate OPOs to find a ground state of Ising problems13,43,44,45,46. The Ising Hamiltonian is mapped to the loss landscape of the OPO network formed by the dissipative coupling rather than the standard Hamiltonian coupling. By providing a sufficient gain to compensate for the overall network loss, a ground state of the target Hamiltonian is expected to build up spontaneously as a single oscillation mode14. However, the mapping of the cost function to the OPO network loss landscape often fails in the case of a frustrated spin problem due to the OPO amplitude inhomogeneity13,24. In addition, with an increasing number of local minima occurring as problem sizes become larger, the machine state is trapped in those minima for a substantial amount of time, thereby causing the machine to report suboptimal solutions14,25. Recently, self-diagnosis and dynamical feedback mechanisms have been introduced in the measurement-feedback CIM (MFB-CIM) to overcome these problems24,25. This is achieved by a mutual coupling field dynamically modulated for each OPO to suppress the amplitude inhomogeneity and simultaneously to destabilize the machine’s state out of local minima.

We first describe the principles of the MFB-CIM’s operation. A schematic diagram of two MFB-CIMs with predefined feedback control (hereafter referred to as “open-loop CIM”) and with self-diagnosis and dynamical feedback control (hereafter referred to as “closed-loop CIM”) is shown in Fig. 1a. If the fiber ring resonator has high finesse, both CIMs are modeled via the Gaussian quantum theory47,48. The dynamics captured by the master equation for the density operator (i.e., the Liouville–von Neumann equation) is driven by the parametric interaction Hamiltonian, \(\hat{\mathcal{H}}=i\hslash \frac{S}{2}{\sum}_{i}\big({\hat{a}}_{i}^{{\dagger} 2}-{\hat{a}}_{i}^{2}\big)\), the measurement-induced state reduction (the third term on the right-hand side in Eq. (1)), the coherent injection (the fourth term on the right-hand side in Eq. (1)), as well as three Liouvillians. The Liouvillians pertain to the linear loss due to measurement and injection couplings, \({\hat{\mathcal{L}}}_{{\rm{c}}}^{(i)}=\sqrt{J}{\hat{a}}_{i}\), two-photon absorption loss (i.e., parametric back conversion) in a degenerate parametric amplifying device, \({\hat{\mathcal{L}}}_{2}^{(i)}=\sqrt{B/2}\,{\hat{a}}_{i}^{2}\), and background linear losses, \({\hat{\mathcal{L}}}_{1}^{(i)}=\sqrt{{\gamma}_{{\rm{s}}}}{\hat{a}}_{i}\), respectively48. The master equation is thus given by

$$\begin{array}{rcl}{\displaystyle{\frac{d}{dt}}}\hat{\rho }=-{\displaystyle{\frac{i}{\hslash }}}\left[\hat{{{{\mathcal{H}}}}},\hat{\rho }\right]&+&\mathop{\sum}\limits_{i=1}^{n}\mathop{\sum}\limits_{k=1,2,{{{\rm{c}}}}}\left(\left[{\hat{{{{\mathcal{L}}}}}}_{k}^{(i)},\hat{\rho}{\hat{{{{\mathcal{L}}}}}}_{k}^{(i){\dagger}}\right]+\text{H.c.}\right)\\ &+&\sqrt{J}\mathop{\sum }\limits_{i=1}^{n}\left({\hat{a}}_{i}\hat{\rho }+\hat{\rho }{\hat{a}}_{i}^{{\dagger}}-\langle {\hat{a}}_{i}+{\hat{a}}_{i}^{{\dagger}}\rangle \hat{\rho }\right){w}_{i}+\displaystyle{\frac{J}{2}}\mathop{\sum}\limits_{i,k=1}^{n}{e}_{i}(t){J}_{ik}\left(\langle {\hat{a}}_{k}+{\hat{a}}_{k}^{{\dagger}}\rangle +\frac{{w}_{k}}{\sqrt{J}}\right)[{\hat{a}}_{i}^{{\dagger}}-{\hat{a}}_{i},\hat{\rho}].\end{array}$$
(1)

In general, the numerical integration of Eq. (1) requires exponentially growing resources as the problem size n (i.e., the number of spins) increases. Specifically, the size of the density matrix scales as \({{{\mathcal{O}}}}({{n}_{0}}^{n}\times {{n}_{0}}^{n})\), where n0 ≫ 1 is the maximum number of photons possible for each OPO pulse. In MFB-CIMs, however, there is no entanglement between the OPO pulses, that is, the OPO states are separable. Therefore, the simulation’s memory requirements reduce to \({{{\mathcal{O}}}}(n\times {{n}_{0}}^{2})\). However, this reduction still yields too many c-number differential equations due to the large upper bounds on the number of photons n0 ≲ 107 and the number of spins n ≤ 1000. The Gaussian quantum model has been introduced to overcome this difficulty25,48.

Fig. 1: Principles of the closed- and open-loop MFB-CIMs' operation.

a Schematic diagram of the measurement-feedback coupling CIMs with and without the self-diagnosis and dynamic feedback control (closed-loop and open-loop CIMs) indicated using dashed blue and orange lines, respectively. b, c Dynamical behavior of the closed-loop and open-loop CIMs, respectively. (b1) and (c1) Inferred Ising energy (the dashed horizontal lines are the lowest three Ising eigenenergies). (b2) and (c2) Mean-field amplitude μi(t). (b3) and (c3) Feedback-field amplitude ei(t). (b4) Target squared amplitude a(t). (c4) Pump rate p(t).

In the case of a small saturation parameter, g2 = B/γs ≪ 1, we can split the i-th OPO’s pulse amplitude operator, \({\hat{a}}_{i}=\frac{1}{\sqrt{2}}({\hat{X}}_{i}+i{\hat{P}}_{i})\), into the mean field and small fluctuation operators, \({\hat{X}}_{i}=\langle {\hat{X}}_{i}\rangle +\Delta {\hat{X}}_{i}\) and \({\hat{P}}_{i}=\langle {\hat{P}}_{i}\rangle +\Delta {\hat{P}}_{i}\). The saturation parameter g2 corresponds to the inverse photon number at twice the threshold pump rate of a solitary OPO. With an appropriate choice of the pump phase, each OPO’s mean field is generated only in an \(\hat{X}\)-quadrature, that is, \(\langle {\hat{P}}_{i}\rangle =0\). The equation of motion for the mean field \({\mu }_{i}=\langle {\hat{X}}_{i}\rangle /\sqrt{2}\) and the variances \({\sigma }_{i}=\langle \Delta {{\hat{X}}_{i}}^{2}\rangle\) and \({\eta }_{i}=\langle \Delta {{\hat{P}}_{i}}^{2}\rangle\) obey the following equations48:

$$\frac{d}{dt}{\mu }_{i}=\left[-\left(1+j\right)+p-{g}^{2}{\mu }_{i}^{2}\right]{\mu }_{i}+j\xi {e}_{i}(t)\mathop{\sum}\limits_{k}{J}_{ik}{\tilde{\mu }}_{k}+\sqrt{j}\left({\sigma }_{i}-1/2\right){w}_{i}\,,$$
(2)
$$\frac{d}{dt}{\sigma }_{i}=2\left[-\left(1+j\right)+p-3{g}^{2}{\mu }_{i}^{2}\right]{\sigma }_{i}-2j{\left({\sigma }_{i}-1/2\right)}^{2}+\left[\left(1+j\right)+2{g}^{2}{\mu }_{i}^{2}\right],$$
(3)
$$\frac{d}{dt}{\eta }_{i}=2\left[-\left(1+j\right)-p-{g}^{2}{\mu }_{i}^{2}\right]{\eta }_{i}+\left[\left(1+j\right)+2{g}^{2}{\mu }_{i}^{2}\right].$$
(4)

Here, t = γsT refers to normalized and dimensionless time, where T is physical (or wall-clock) time, and γs is the background loss rate of the cavity. The time t is normalized such that the background linear loss rate (i.e., the inverse of the 1/e signal-amplitude decay time) equals 1. The term −(1 + j) in Eqs. (2) to (4) represents a background linear loss (−1) and an out-coupling loss (−j) for optical homodyne measurement and feedback injection, where j = J/γs is a normalized out-coupling rate (see Fig. 1a). The parameter p = S/γs is a normalized linear gain coefficient provided by the parametric device. The term \({g}^{2}{\mu }_{i}^{2}\) represents two-photon absorption loss (i.e., back conversion from signal to pump fields). The second and third terms on the right-hand side of Eq. (2), respectively, represent the Ising coupling term and the measurement-induced shift of the mean field μi. The inferred mean-field amplitude, \({\tilde{\mu }}_{k}={\mu }_{k}+\sqrt{\frac{1}{4j}}{w}_{k}\), deviates from the internal mean-field amplitude μk by a finite measurement uncertainty in the optical homodyne detection. The random variable \({w}_{k}\sqrt{\Delta t}\) attains values drawn from the standard normal distribution, where Δt is the time step for the numerical integration of Eqs. (2) to (4). The k-th Ising spin Sk = ±1 is determined by the sign of the inferred mean-field amplitude, \({S}_{k}={\tilde{\mu }}_{k}/| {\tilde{\mu }}_{k}|\). Jik is the Ising coupling coefficient and ei(t) is a dynamically modulated feedback-field amplitude, while \(\xi =1/\sqrt{\frac{1}{n}{\sum }_{i,j}\left\vert {J}_{ij}\right\vert }\) is a feedback-gain parameter. The second term on the right-hand side of Eq. (3) represents the measurement-induced partial state reduction of the OPO field. The last terms of Eqs. (3) and (4), respectively, represent the variance increase caused by the incident (fresh) vacuum field fluctuations via linear loss and by the pump noise coupled to the OPO field via gain saturation.
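
For concreteness, the sketch below advances Eqs. (2) to (4) by a single Euler–Maruyama step. It is a minimal reading of the equations above under our naming conventions (with w drawn such that w√Δt is standard normal), not the simulation code used to produce the results reported here.

```python
import numpy as np

def cim_step(mu, sigma, eta, e, J, p, j, g2, xi, dt, rng):
    """One Euler-Maruyama step of Eqs. (2)-(4) for all n OPO pulses at once.
    mu, sigma, eta, e are length-n vectors; J is the Ising coupling matrix."""
    w = rng.standard_normal(len(mu)) / np.sqrt(dt)  # w * sqrt(dt) ~ N(0, 1)
    mu_inf = mu + np.sqrt(1.0 / (4.0 * j)) * w      # inferred amplitudes mu-tilde
    dmu = ((-(1 + j) + p - g2 * mu**2) * mu         # loss, gain, saturation
           + j * xi * e * (J @ mu_inf)              # Ising coupling term
           + np.sqrt(j) * (sigma - 0.5) * w)        # measurement-induced shift
    dsigma = (2 * (-(1 + j) + p - 3 * g2 * mu**2) * sigma
              - 2 * j * (sigma - 0.5)**2            # measurement-induced state reduction
              + (1 + j) + 2 * g2 * mu**2)           # vacuum and pump noise
    deta = (2 * (-(1 + j) - p - g2 * mu**2) * eta
            + (1 + j) + 2 * g2 * mu**2)
    spins = np.where(mu_inf >= 0, 1, -1)            # S_k = sign of mu-tilde_k
    return mu + dt * dmu, sigma + dt * dsigma, eta + dt * deta, spins
```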

The dynamically modulated feedback-field amplitude ei(t), whose evolution is governed by the inferred signal amplitude \({\tilde{\mu }}_{i}\), is introduced to reduce the amplitude inhomogeneity24:

$$\frac{d}{dt}{e}_{i}(t)=-\beta \left[{g}^{2}{{\tilde{\mu }}_{i}}^{2}-a\right]{e}_{i}(t).$$
(5)

Here, β is a positive constant representing the rate of change of the exponentially growing or attenuating feedback amplitude ei(t), and a is a target squared amplitude. Both a and the pump rate p are dynamically determined by the difference between the current Ising energy \({{{\mathcal{E}}}}(t)=-{\sum }_{i < k}{J}_{ik}{S}_{i}{S}_{k}\) and the lowest Ising energy \({{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\) visited previously:

$$a(t)=\alpha +{\rho }_{a}\tanh \left(\frac{{{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}}{\Delta }\right),$$
(6)
$$p(t)=\pi -{\rho }_{p}\tanh \left(\frac{{{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}}{\Delta }\right).$$
(7)

Here, π, α, ρa, ρp, and Δ are predetermined positive parameters which characterize the self-diagnosis and dynamic feedback control.

The machine can distinguish the following three modes of operation from the energy measurements. When \({{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\, < \,-\Delta\), the machine is in a gradient descent mode and moving toward a local minimum, in which case the pump is set to a positive value of π + ρp (leading to parametric amplification). When \(| {{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}|\, \ll\, \Delta\), the machine is close to, or trapped in, a local minimum, in which case the pump is switched off (i.e., there is no parametric amplification) so as to destabilize the current spin configuration. When \({{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\, > \,\Delta\), the machine is attempting to escape from a previously visited local minimum, in which case the pump is set to a negative value of π − ρp (i.e., there is parametric de-amplification) to increase the rate of spin flips.
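
The feedback logic of Eqs. (5) to (7), together with the three operating modes just described, can be summarized in a few lines of code (again an illustrative sketch under our naming, not the controller used in the experiments):

```python
import numpy as np

def feedback_update(e, mu_inf, E, E_opt, dt, g2,
                    alpha, pi, rho_a, rho_p, delta, beta):
    """Self-diagnosis feedback, Eqs. (5)-(7). The sign of tanh((E - E_opt)/delta)
    selects the mode: ~ -1, gradient descent (p ~ pi + rho_p, amplification);
    ~ 0, trapped near a minimum (p ~ pi, pump nearly off for small pi);
    ~ +1, escaping a visited minimum (p ~ pi - rho_p < 0, de-amplification)."""
    drive = np.tanh((E - E_opt) / delta)
    a = alpha + rho_a * drive                      # target squared amplitude, Eq. (6)
    p = pi - rho_p * drive                         # pump rate, Eq. (7)
    e = e - beta * (g2 * mu_inf**2 - a) * e * dt   # feedback amplitudes, Eq. (5)
    return e, a, p
```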

Figure 1b shows the time evolution of a closed-loop CIM to demonstrate its inherent exploratory behavior from one local minimum to another. We solve a MaxCut problem with randomly generated discrete edge-weights Jij ∈ {−1, −0.9, …, 0.9, 1} over n = 30 vertices, for which an exact solution is obtained by performing an exhaustive search. The dynamical behavior of the inferred Ising energy measured from the ground state energy, \(\Delta {{{\mathcal{E}}}}(t)={{{\mathcal{E}}}}(t)-{{{{\mathcal{E}}}}}_{{{{\rm{G}}}}}\), the mean amplitude, μ(t), the feedback-field amplitude, e(t), and the target squared amplitude, a(t), is shown in Fig. 1b, c. The results shown in Fig. 1b are taken from a single trial for one particular problem instance and a particular set of noise amplitudes \({w}_{i}\sqrt{\Delta t}\). The feedback parameters are set to α = 1.0, π = 0.2, ρa = ρp = 1.0, Δ = 1/5, and β = 1.025. The saturation parameter and the out-coupling loss are chosen as g2 = 10−4 and j = 1, respectively. The time step Δt for the numerical integration of Eqs. (2) to (4) is identical to the normalized round-trip time Δtc = γsΔTc = 0.025. This means the signal-field lifetime 1/γs is 40 times greater than the round-trip time.

As shown in Fig. 1b1, the inferred Ising energy \({{{\mathcal{E}}}}(t)\) fluctuates up and down during the search for a solution even after the machine finds one of the degenerate ground states. As shown in Fig. 1b2, the measured squared amplitude \({g}^{2}{\tilde{\mu} }_{i}^{2}\) is stabilized to the target squared amplitude a through the dynamically modulated feedback mean-field ei(t). Several OPO amplitudes, however, flip their signs, accompanied by an exponential increase in ei(t), while most other OPOs maintain the target amplitude. During this spin-flip process, the feedback-field amplitude ei(t) increases exponentially and then decreases exponentially after the OPO’s squared amplitude \({g}^{2}{\tilde{\mu }}_{i}^{2}\) exceeds the target squared amplitude a(t). The mutual coupling strength \({\sum }_{k}{J}_{ik}{\tilde{\mu }}_{k}\) is adjusted in order to decrease the energy continuously by flipping the “wrong” spins and preserving the “correct” ones. If the machine reaches local minima, which may also include global minima (in which case there are degenerate ground states), the current Ising energy \({{{\mathcal{E}}}}(t)=-{\sum }_{i < k}{J}_{ik}{S}_{i}{S}_{k}\) is roughly equal to the minimum Ising energy \({{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\) previously visited (\({{{\mathcal{E}}}}(t)\simeq {{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\)). The machine then decreases the target squared amplitude a, which helps it to escape from the local minimum. During this escape, the current Ising energy \({{{\mathcal{E}}}}(t)\) becomes greater than the minimum Ising energy \({{{{\mathcal{E}}}}}_{{{{\rm{opt}}}}}\). The machine then switches the pump rate p to a negative value and deamplifies the signal amplitude, which results in further destabilization of the local minimum. As a consequence of such dynamical modulation of the pump rate p and the target squared amplitude a, the machine continually escapes local minima, migrating from one local minimum to another as the computation carries on. Figure 1c shows the time evolution of an open-loop CIM, in which both the pump rate p and the feedback-field amplitude ei(t) are predetermined constants.

As shown in Fig. 2a, b, the quantum states of the OPO fields satisfy the minimum uncertainty product, \(\langle \Delta {\hat{X}}^{2}\rangle \langle \Delta {\hat{P}}^{2}\rangle =1/4\), with a small excess factor of ~30% despite the open-dissipative nature of the machine. We note that each OPO state is in a quantum domain (\(\langle \Delta {\hat{X}}^{2}\rangle \,<\, 1/2\) or \(\langle \Delta {\hat{P}}^{2}\rangle \,<\, 1/2\)), which is shown by the shaded area in Fig. 2. This is a consequence of the repeated homodyne measurements performed during the computation, which iteratively reduce the entropy in the machine and partially collapse the OPO state such that it comes close to being a minimum-uncertainty state. In a closed-loop CIM, parametric amplification with a positive pump rate (p > 0) is employed only in the initial stage, whereas parametric deamplification with a negative pump rate (p < 0) is used later on. The resulting squeezing (\(\langle \Delta {\hat{X}}^{2}\rangle \,<\, 1/2\)) rather than anti-squeezing (\(\langle \Delta {\hat{X}}^{2}\rangle \,>\, 1/2\)) is favorable for exploration when using repetitive spin flips. In contrast, parametric amplification with a positive pump rate is used in an open-loop CIM throughout the computation.

Fig. 2: Uncertainty regimes of the quadratures associated with an OPO during the operation of the closed- and open-loop MFB-CIMs.

Variances \(\langle \Delta {\hat{X}}^{2}\rangle\) and \(\langle \Delta {\hat{P}}^{2}\rangle\) for a a closed-loop CIM and b an open-loop CIM. The shaded areas show the quantum domains (\(\langle \Delta {\hat{X}}^{2}\rangle \,<\, 1/2\) or \(\langle \Delta {\hat{P}}^{2}\rangle < 1/2\)). Note that these are the results for one particular OPO, i.e., for one of the trajectories shown in Fig. 1b, c.

We now discuss our numerical findings for the TTS scaling of the two MFB-CIM schemes. Fig. 3a, b show the median of the success probability Ps and the TTS ts of the closed-loop CIM as a function of problem size n = 4, 5, …, 30 with varying runtime \({t}_{\max }\). We perform 1000 trials, with a trial considered successful if the machine finds an exact solution within \({t}_{\max }\). The success probability Ps decreases exponentially with respect to n, especially for \({t}_{\max }\le 5\). For a greater value of \({t}_{\max }\), the slope of the decay improves as shown in Fig. 3a. The TTS is defined as the expected computation time required to find a ground state for a particular problem instance with 99% confidence. As such, it is defined via

$${t}_{{{{\rm{s}}}}}={R}_{99}\cdot {t}_{\max }\,,$$
(8)

where \({R}_{99}=\frac{\log (0.01)}{\log (1-{P}_{{{{\rm{s}}}}})}\) is the number of trials required to achieve a 99% probability of success. We solve 1000 instances for each problem size (n = 4, …, 30) to evaluate the median Ps and TTS. Note that ts refers to the normalized and dimensionless TTS, while the actual wall-clock TTS (in seconds) is denoted by T. These two notions of TTS are related via the equation ts = γsT. The wall-clock time T is estimated by assuming a cavity round-trip time of ΔTc = 10 nanoseconds (all-to-all spin coupling is implemented in 10 nanoseconds), and a 1/e signal amplitude decay time of 400 nanoseconds (γsΔTc = 0.025). An important observation from Fig. 3b is that the optimal median TTS scales as an exponential function of the square root of the problem size, that is, an exponential of \(\sqrt{n}\) rather than n. This unique trend has been noticed before17.

Fig. 3: Performance of the closed-loop MFB-CIM, as a function of problem size, for various machine runtimes.

a Success probability Ps and b time-to-solution (in units of signal field decay time 1/γs) as a function of problem size n for various runtimes \({t}_{\max }\). The black dotted line shows the best-fit TTS curve of the form \(A{B}^{\sqrt{n}}\).

Figure 4a, b show the optimum TTS of the closed-loop CIM and the open-loop CIM with respect to the problem size n. We solve two types of MaxCut problems. The first type consists of randomly generated instances with edge weights Jij ∈ {−1, −0.9, …, 0.9, 1}. We refer to these instances as 21-weight MaxCut problem instances. The second type consists of randomly generated Sherrington–Kirkpatrick (SK) spin glass instances with Jij = ±1. We study the open-loop CIM with the same Gaussian quantum model without dynamical modulation of ei(t), a(t), and p(t), but with measurement-induced state reduction (the third term of Eq. (2) and the second term of Eq. (3))48. We set the feedback parameters β = 0, ρa = 0, and ρp = 0 for the open-loop CIM in order to have a constant feedback field strength ei(t) = ei(0) = 1.0. The pump rate p is linearly increased from p = 0.5 at t = 0 (below threshold) to p = 1.0 at \({t}_{\max }\) (above threshold). As shown in Fig. 4a, b, the performance of the closed-loop CIM is superior to that of the open-loop CIM for both types of MaxCut problems.

Fig. 4: Scaling of the MFB-CIM in solving MaxCut problems.

The optimal (median) time-to-solution of the closed-loop CIM and open-loop CIM on a 21-weight randomly generated instances and b binary-weight randomly generated instances (Jij = ±1, SK model). The shaded regions represent the interquartile range (IQR), showing the region between the 25th and 75th percentiles obtained from the 1000 instances. The dashed blue and red lines are fitted curves of the form \(A{B}^{\sqrt{n}}\).

Table 2 summarizes the best-fitting parameters A and B for a function of the form \({t}_{{{{\rm{s}}}}}=A{B}^{\sqrt{n}}\) for both the closed-loop and open-loop CIMs. The smaller values of the coefficient B for the closed-loop CIM highlight its superior scaling compared to the open-loop variant. We note that A is expressed in units of the normalized time ts = γsT, where T is the wall-clock time. It is worth noting that this sub-exponential scaling law is not necessarily the optimal fit to the data within the problem size range n ≤ 30. In Fig. 13, we present results for a much wider range for the SK model.
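
The fits reported in Table 2 amount to a linear regression in log space. For reference, the sketch below shows how such a fit can be carried out; the data here are synthetic placeholders, not the values underlying Table 2.

```python
import numpy as np

def fit_sqrt_exponential(n, t_s):
    """Fit t_s = A * B**sqrt(n) via least squares on
    log(t_s) = log(A) + sqrt(n) * log(B)."""
    slope, intercept = np.polyfit(np.sqrt(n), np.log(t_s), 1)
    return np.exp(intercept), np.exp(slope)  # (A, B)

# Synthetic example with A = 2.0 and B = 1.8 plus multiplicative noise:
rng = np.random.default_rng(1)
n = np.arange(4, 31)
t_s = 2.0 * 1.8 ** np.sqrt(n) * np.exp(0.05 * rng.standard_normal(len(n)))
A, B = fit_sqrt_exponential(n, t_s)  # recovers A ~ 2.0, B ~ 1.8
```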

Table 2 Parameters A and B found by regression of a function of the form \(A{B}^{\sqrt{n}}\) to the TTS curves of the closed-loop and open-loop CIMs for the two types of MaxCut instances

In what follows, we describe the impact of increasing the normalized total cavity loss rate on the TTS, which can be inferred by using a discrete-time model. Thus far, we have presented the results of our study of the performance of closed-loop and open-loop CIMs with a high-finesse cavity. However, a low-finesse cavity with a larger signal decay rate γs is favorable in terms of the runtime of the algorithm, because the wall-clock time T scales as T = ts/γs. The continuous-time Gaussian quantum theory based on the master equation [Eq. (1)], however, breaks down in the case of a low-finesse cavity. We now briefly describe a new discrete-time Gaussian quantum model49 that we used to find the optimum normalized loss rate. A more detailed description of this model is provided in the Supplementary Materials.

We treat the MFB-CIM as an n-mode bosonic system with 2n quadrature operators, \({\hat{X}}_{1},{\hat{P}}_{1},\ldots ,{\hat{X}}_{n},{\hat{P}}_{n}\), satisfying \([{\hat{X}}_{k},{\hat{P}}_{{k}^{{\prime} }}]=i{\delta }_{k{k}^{{\prime} }}\). If the system is in a Gaussian state, it is fully characterized by a mean-field vector μ and a covariance matrix Σ. In other words, the density operator of each OPO pulse can be written as \({\hat{\rho }}_{i}({\mu }_{i},{\Sigma }_{i})\), where

$${\mu }_{i}=\left(\langle {\hat{X}}_{i}\rangle ,\langle {\hat{P}}_{i}\rangle \right),$$
(9)
$${\Sigma }_{i}=\left(\begin{array}{cc}\langle {{\hat{X}}_{i}}^{2}\rangle &{\displaystyle{\frac{1}{2}}}\langle \Delta {\hat{X}}_{i}\Delta {\hat{P}}_{i}+\Delta {\hat{P}}_{i}\Delta {\hat{X}}_{i}\rangle \\ {\displaystyle{\frac{1}{2}}}\langle \Delta {\hat{X}}_{i}\Delta {\hat{P}}_{i}+\Delta {\hat{P}}_{i}\Delta {\hat{X}}_{i}\rangle &\langle {{\hat{P}}_{i}}^{2}\rangle \end{array}\right).$$
(10)

We let \(\hat{\rho }\left({\mu }_{i}(\ell ),{\Sigma }_{i}(\ell )\right)\) denote the state of the i-th OPO pulse just before it starts its ℓ-th round trip through the cavity. To propagate the state of the i-th signal pulse from \(\hat{\rho }({\mu }_{i}(\ell ),{\Sigma }_{i}(\ell ))\) to \(\hat{\rho }({\mu }_{i}(\ell +1),{\Sigma }_{i}(\ell +1))\), we perform the following five discrete maps iteratively: the background linear-loss map \({{{\mathcal{B}}}}\), the OPO crystal propagation map χ, the out-coupling loss map \({{{{\mathcal{B}}}}}_{{{{\rm{out}}}}}\), the homodyne detection map H, and the feedback injection map \({{{\mathcal{D}}}}\). These discrete maps are defined in the Supplementary Materials.
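
Although the five maps are defined in the Supplementary Materials, their general form is standard for Gaussian states. As an example, a linear-loss channel acts on (μi, Σi) as sketched below; this uses our parametrization, with a vacuum quadrature variance of 1/2, and the exact convention in the Supplementary Materials may differ.

```python
import numpy as np

def linear_loss_map(mu, Sigma, loss):
    """Gaussian linear-loss channel with power transmissivity 1 - loss:
    the mean field shrinks by sqrt(1 - loss) and the lost fraction is
    replaced by vacuum (variance 1/2 per quadrature)."""
    return (np.sqrt(1.0 - loss) * mu,
            (1.0 - loss) * Sigma + loss * 0.5 * np.eye(2))

# One round trip composes the five maps, schematically:
#   mu, Sigma = linear_loss_map(mu, Sigma, background_loss)  # map B
#   mu, Sigma = crystal_map(mu, Sigma, pump)                 # map chi (hypothetical signature)
#   mu, Sigma = linear_loss_map(mu, Sigma, out_coupling)     # map B_out
#   ... followed by the homodyne map H and the feedback injection map D.
```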

In order to see how the wall-clock TTS of the closed-loop and open-loop CIMs is decreased by increasing the total cavity loss rate γs(1 + j), we solve the 21-weight MaxCut instances and the SK model instances for n = 30 to explore the TTS as a function of the normalized total loss rate γsΔTc(1 + j). The results are shown in Fig. 5. The saturation parameter and the out-coupling loss are chosen as g2 = 10−4 and j = 1, respectively. The feedback parameters are set to α = 0.5, π = 0.2, ρa = ρp = 0 (while we keep a and p constant), and β = 0.2.

Fig. 5: Performance of the MFB-CIM as a function of the total cavity loss rate.

Median TTS expressed in units of round trips (left y-axis) and the corresponding wall-clock time (right y-axis) of the closed-loop CIM and the open-loop CIM versus the normalized total loss rate γsΔTc(1 + j) for a 21-weight problem instances and b SK model instances, for size n = 30 and j kept constant at the value 1.

As expected, the TTS (expressed in terms of the number of round trips) decreases monotonically for both problem types and for both the closed-loop and open-loop CIMs as long as γsΔTc(1 + j) ≲ 0.1 (i.e., in the case of a high-finesse cavity). However, if γsΔTc(1 + j) ≳ 1 (i.e., in the case of a very-low-finesse cavity), the TTS increases for both the closed-loop and the open-loop CIMs. This is because a single homodyne measurement per round trip does not provide sufficiently accurate information about the internal OPO pulse state and, therefore, the measurement-feedback circuit fails to implement the Ising Hamiltonian and self-diagnosis feedback properly. At n = 30, the optimum normalized loss rate is γsΔTc(1 + j) ≈ 1 for both the closed-loop and the open-loop CIMs. Additional details on how we find the optimal loss parameters are discussed in the Methods section.

Scaling of DAQC

We now analyze the efficacy of the DAQC algorithm in solving MaxCut problems. In this paper, DAQC is associated with the first-order Suzuki–Trotter expansion of the adiabatic Hamiltonian evolution. This algorithm attempts to prepare the ground state of a target Hamiltonian HP. A typical circuit for DAQC is shown in Fig. 6. The state \({\left\vert +\right\rangle }^{\otimes n}\) is prepared on n qubits, and is evolved through a sequence of p “layers”. Each layer consists of an evolution according to HP along a computational basis, here chosen to be the Pauli-Z eigenbasis, followed by an evolution under a mixing Hamiltonian HM = ∑iXi. A vector of tunable parameters γ = (γ1, …, γp) is chosen, where each entry γi corresponds to the angle of rotation along HP in the i-th layer. Similarly, a vector β = (β1, …, βp) is chosen for the HM evolutions. Finally, the qubits undergo projective measurements in the computational basis, and the measurement results are used to compute the energy values of HP.

Fig. 6: DAQC circuit with p layers.

The rotation parameters satisfy γi ∈ [0, π) and βi ∈ [0, π/2). This circuit ansatz results from Hamiltonian simulation implementing a discretized adiabatic evolution in terms of a first-order Suzuki–Trotter expansion.

A “shot” of the circuit with the parameters (γ, β) is defined as a single execution of the circuit from preparation to measurement, and returns a single energy measurement. Multiple shots performed with the same parameters (γ, β) can return different results, as they are taken from independent copies of the same prepared state \(\left\vert \psi (\gamma ,\beta )\right\rangle\). For the weighted MaxCut problem, we use the target Hamiltonian HP = ∑i,jJijZiZj, which is diagonal in the computational basis and whose ground states correspond to the largest cuts of the complete n-vertex graph with edge weights Jij.
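
Because HP is diagonal in the computational basis, its spectrum can be tabulated directly for small n. The sketch below (illustrative; it sums over i < k and uses the convention that bit k of the basis-state index encodes the Zk eigenvalue) makes the correspondence between ground states and maximum cuts explicit.

```python
import numpy as np

def hp_diagonal(J):
    """Diagonal of H_P = sum_{i<k} J_ik Z_i Z_k over the 2**n computational
    basis states; bit k of the basis-state index y encodes the Z_k
    eigenvalue (+1 for bit value 0, -1 for bit value 1)."""
    n = len(J)
    diag = np.zeros(2**n)
    for y in range(2**n):
        z = np.array([1 - 2 * ((y >> k) & 1) for k in range(n)])
        diag[y] = sum(J[i, k] * z[i] * z[k]
                      for i in range(n) for k in range(i + 1, n))
    return diag

# Indices attaining diag.min() encode the spin partitions of the largest cuts.
```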

We study two schemes for optimizing the gate parameters of the DAQC algorithm. The first scheme treats gate parameters as hyperparameters that follow a tuned DAQC schedule. The second scheme uses a variational hybrid quantum–classical protocol to optimize the gate parameters, similar to the method typically used for the QAOA. In our numerical experiments, we observed a better TTS scaling for the first scheme compared to the second scheme (see the Methods section); therefore, we use the first scheme to conduct our scaling analysis.

To study the time-to-solution of the DAQC algorithm in solving MaxCut problems, we analyze the algorithm using pre-tuned Trotterized adiabatic scheduling. We use randomly generated graphs of size n ∈ {10, …, 20}. Our test set consists of 1000 graphs of each size, with edge weights Jij = ±0.1k, where k ∈ {0, 1, …, 10} (the same 21-weight set as before).

Given a parameter vector (γ, β), we evaluate the TTS of DAQC as a product of two terms6,

$${{{\rm{TTS}}}}(\gamma ,\beta )={R}_{99}(\gamma ,\beta )\cdot {t}_{{{{\rm{ss}}}}}\,,$$
(11)

where tss is the time taken for a single shot.

R99 is the number of shots that must be performed to ensure a 99% probability of observing the ground state of HP. It is a metric commonly used to benchmark the success of heuristic optimization algorithms. If the state \(\left\vert \psi (\gamma ,\beta )\right\rangle\) has a probability \({p}_{{{{\rm{gs}}}}}\) of being projected onto the ground state (written \({p}_{{{{\rm{gs}}}}}\) to avoid confusion with the number of layers p), then

$${R}_{99}(\gamma ,\beta )=\frac{\log (0.01)}{\log (1-{p}_{{{{\rm{gs}}}}})}.$$
(12)

We estimated the time required for a single shot using the following assumptions for an ideal, highly performant quantum computer with access to arbitrary-angle, single-qubit X-rotations and two-qubit ZZ-rotations. The preparation and measurements of qubits collectively take 1.0 microseconds. The processor performs any single-qubit or two-qubit gate operations in 10 nanoseconds. Gate operations may be performed simultaneously if they do not act on the same qubit. In addition, all components of the circuit are noise-free and, therefore, there is no overhead for quantum error correction or fault-tolerant quantum computation.

For each problem size varying from 10 to 20 vertices, Fig. 7 shows a plot of the median TTS, suggesting that the TTS scales exponentially with respect to problem size. With more layers, DAQC has a lower potential R99, but a single shot takes more time. We found the best scaling was achieved with p ≈ 20 layers. However, near-term hardware will suffer from various sources of noise, such as decoherence and control noise, which will restrict us to employing shallow DAQC circuits with only a few layers, for example, p = 4.

Fig. 7: Scaling of the DAQC algorithm in solving MaxCut problems.

The TTS results are obtained by simulating DAQC, using pre-tuned adiabatic scheduling rather than optimizing its parameters variationally. The number of qubits required to implement the algorithm is n. a TTS scaling for a 4-, 10-, 20-, and 50-layer DAQC algorithm as the problem size grows from 10 to 20 vertices. A best-fit line (dashed) is drawn to the median of the TTSs of the 1000 instances of each size, whose IQR ranges are represented using colored bars. The equation of this linear regression is given by \(\ln ({{{\rm{TTS}}}})=mn+b\), where n is the problem size. In the Supplementary Materials, we present the results of additional regression analysis for more-general scaling laws of the form \(\log ({{{\rm{TTS}}}})=m{n}^{c}+b\). The highest confidence with respect to the quality of the regression fit is indeed obtained at an exponent value close to c = 1, which supports our conjecture that DAQC scales exponentially. In actuality, the scaling is found to be slightly sub-exponential, at the value c ≈ 0.9. b Slope of the linear regression for a range of layers. The best scaling for DAQC on these 21-weight MaxCut instances is observed at 20 layers. c TTS scaling for the SK model, when using a 20-layer DAQC. A best-fit linear regression is drawn to the median of the TTSs of the 1000 instances for each problem size.

The DAQC parameters (γ, β) used in Fig. 7 were produced using the formula explained in what follows. Recall the setup for quantum adiabatic evolution50. Given an initial Hamiltonian H0 and a target Hamiltonian H1, we consider the time-dependent Hamiltonian

$$H(t)=s(t){H}_{1}+(1-s(t)){H}_{0},\qquad t\in [0,T]$$

over a total annealing time T, where the function s(t) is an increasing schedule satisfying s(0) = 0 and s(T) = 1. The time-dependent Hamiltonian H(t) is then applied to the ground state of H0. Let ψ(t) denote the wavefunction at time t, so that ψ(0) is the ground state of H0 and ψ evolves according to the Schrödinger equation

$$\dot{\psi }=-i\left(s(t){H}_{1}+(1-s(t)){H}_{0}\right)\psi .$$

We use Trotterization to approximate the prepared state ψ(T). Let

$${c}_{k}:=\int\nolimits_{(k-1)T/p}^{kT/p}s(t)\,dt\quad \,{{{\rm{and}}}}\quad \,{b}_{k}:=\int\nolimits_{(k-1)T/p}^{kT/p}(1-s(t))\,dt.$$

Then,

$$\psi (T)\,\approx \,{e}^{-i{b}_{p}{H}_{0}}{e}^{-i{c}_{p}{H}_{1}}\cdots {e}^{-i{b}_{1}{H}_{0}}{e}^{-i{c}_{1}{H}_{1}}\psi (0),$$
(13)

and this approximation becomes exact in the limit as p → ∞.

The Hamiltonians H0 and H1 are both chosen to have a Frobenius norm equal to 1. We divide both HM and HP by their corresponding norms, which are easily calculated because each Hamiltonian is a sum of mutually orthogonal Pauli terms:

$${H}_{0}=\frac{1}{\parallel {H}_{{{{\rm{M}}}}}\parallel }{H}_{{{{\rm{M}}}}}=-\frac{1}{\sqrt{n}}\mathop{\sum}\limits_{i}{X}_{i}$$

and

$${H}_{1}=\frac{1}{\parallel {H}_{{{{\rm{P}}}}}\parallel }{H}_{{{{\rm{P}}}}}=\frac{1}{\sqrt{{\sum }_{i,j}{J}_{ij}^{2}}}\mathop{\sum}\limits_{i,j}{J}_{ij}{Z}_{i}{Z}_{j}.$$

Thus,

$${\gamma }_{k}=\int\nolimits_{(k-1)T/p}^{kT/p}\frac{s(t)}{\parallel {H}_{{{{\rm{P}}}}}\parallel }\,dt\,\,\,{{{\rm{and}}}}\,\,\,{\beta }_{k}=\int\nolimits_{(k-1)T/p}^{kT/p}\frac{1-s(t)}{\parallel {H}_{{{{\rm{M}}}}}\parallel }\,dt.$$

Empirically, we found that enforcing this Frobenius normalization yields a well-performing schedule for DAQC across multiple problem types. The theoretical basis for this observation is yet to be fully understood.

The schedule s(t) should have an “inverted S” shape51,52, as illustrated in Fig. 8, in order to handle the narrowing of the energy gap in the middle of the evolution. We take s(t) to be a cubic function with the general form

$$s(t)=\frac{t}{T}+a\cdot \frac{t}{T}\left(\frac{t}{T}-\frac{1}{2}\right)\left(\frac{t}{T}-1\right)$$
(14)

for a free hyperparameter a. When a = 0, s(t) is a straight linear path. When a = 4, s(t) is a curved path with a slope of 0 at t = T/2. Empirically, we found a = 4 and T = p(1.6 + 0.1n) to be the best-performing hyperparameters. See the Methods section for more details.
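
Putting the pieces together, the pre-tuned angles (γk, βk) follow from the cubic schedule of Eq. (14) and the Frobenius normalization above. A minimal sketch of this computation is given below, assuming ∥HM∥ = √n and taking ∥HP∥ as an input; it illustrates the schedule construction rather than reproducing our tuning pipeline.

```python
import numpy as np
from scipy.integrate import quad

def daqc_schedule(n, p, norm_hp, a=4.0):
    """Pre-tuned DAQC angles from Eq. (14) with T = p * (1.6 + 0.1 * n).
    norm_hp is ||H_P|| = sqrt(sum_ij J_ij**2); ||H_M|| = sqrt(n)."""
    T = p * (1.6 + 0.1 * n)
    s = lambda t: t / T + a * (t / T) * (t / T - 0.5) * (t / T - 1.0)
    gammas, betas = [], []
    for k in range(1, p + 1):
        lo, hi = (k - 1) * T / p, k * T / p
        gammas.append(quad(lambda t: s(t) / norm_hp, lo, hi)[0])          # gamma_k
        betas.append(quad(lambda t: (1 - s(t)) / np.sqrt(n), lo, hi)[0])  # beta_k
    return np.array(gammas), np.array(betas)

# Example: a 20-layer schedule for a 12-vertex instance (norm_hp is
# instance-dependent; 4.2 is a placeholder value).
gammas, betas = daqc_schedule(n=12, p=20, norm_hp=4.2)
```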

Fig. 8: Trotterization of adiabatic evolution into p = 6 layers.

The integrals computing bk and ck yield the coefficients for H0 and H1, respectively.

We also compare the TTS for DAQC to the TTS for breakout local search (BLS), a classical search algorithm. For each graph instance, 20 runs of BLS were performed, and runtimes were averaged to obtain the TTS. The algorithm’s runtime for each run was capped at 0.1 seconds, although the minimum value was almost always found within that time. Figure 9 demonstrates that the TTS for DAQC shows no significant correlation with the TTS for BLS.

We now summarize the challenges encountered when using the variational quantum–classical protocol that we also extensively explored in our initial studies. This protocol is typical for the approach known as the QAOA. It includes an optimization loop which learns better parameters (γ, β) by using the data from already-performed shots. However, we found that including an optimization step did not improve the total TTS for the following reasons, and therefore did not include the step in our analysis. R99 is impossible to measure without knowledge of the ground state, so any optimization routine must instead rely on energy measurements. A common approach is to use the expected energy, \(\left\langle \psi (\gamma ,\beta )\right| {H}_{{{{\rm{P}}}}}\left| \psi (\gamma ,\beta )\right\rangle\), which is estimated by averaging over the multiple shots taken with the parameters (γ, β). This approach suffers from two limitations. First, we must use a large number of shots to accurately estimate the expected energy, which makes the optimization step costly. This is consistent with the challenges encountered in overcoming the barren plateau phenomenon22,23. Second, the expected energy is an imperfect stand-in for R99, and therefore optimization typically offers little to no improvement upon the annealing-inspired parameter schedule. See the Methods section for more details.

Fig. 9

Scatter plot of DAQC-TTS versus BLS-TTS, indicating that there is no significant correlation between the difficulty of an instance for DAQC and its difficulty for breakout local search.

Scaling of DH-QMF

We now consider using Dürr and Høyer’s algorithm for quantum minimum finding (DH-QMF)20 to find the ground state of an Ising Hamiltonian corresponding to a MaxCut problem. Given a real-valued function \(E\!:\,S\to {\mathbb{R}}\) on a discrete domain S of size \({N}=\vert{S}\vert\), DH-QMF finds a minimizer of E (out of possibly many) using \({{{\mathcal{O}}}}(\sqrt{N})\) queries to E. In our case, the domain S is the set of all spin configurations of a classical Ising Hamiltonian on n sites (N = 2n), and the function E maps each spin configuration to its energy. The DH-QMF algorithm is a randomized algorithm, that is, it succeeds in finding an optimal solution only with (high) probability. The probability of failure of DH-QMF can be made arbitrarily small without changing the stated query complexity. A schematic illustration of DH-QMF is shown in Fig. 10, and additional technical details can be found in the Methods section and the Supplementary Materials.

Fig. 10: Schematic illustration of the Dürr–Høyer algorithm for quantum minimum finding (DH-QMF) applied to searching for a spin configuration corresponding to the energy minimum (ground state).

The possible spin configurations are labeled by the indices \(y\in \left\{0,\ldots ,{2}^{n}-1\right\}\). The algorithm starts by choosing uniformly at random an initial guess for the “threshold index” y, whose energy E(y) serves as a threshold: solutions to the problem cannot have an energy value larger than this threshold. The main step of the algorithm is a loop consisting of Grover’s search for a spin configuration with an energy value strictly smaller than the threshold energy, followed by a threshold-index update. This loop needs to be repeated many times until the threshold index eventually holds the solution with a probability of success higher than a given target lower bound, say psucc = 0.99. The final step returns the threshold index as output. A key element of the Grover’s search subroutine is an oracle which marks all states whose energies are strictly smaller than the threshold energy. Note that Grover’s search may fail to output a marked state.

Given an n-spin Ising Hamiltonian

$$H=-\mathop{\sum}\limits_{0\le i < j\le n-1}{J}_{ij}{Z}_{i}{Z}_{j}$$
(15)

corresponding to an undirected weighted graph of size n, its N = 2n energy eigenstates can be labeled by the integer indices 0 ≤ y ≤ N − 1, with the corresponding energy eigenvalues E(y). The index y associated with a computational basis state \(\left| y\right\rangle =\left| {\eta }_{0}\right\rangle \otimes \cdots \otimes \left| {\eta }_{n-1}\right\rangle\) represented by the classical bits ηj ∈ {0, 1} is the binary representation \(y=\sum\nolimits_{j = 0}^{n-1}{\eta }_{j}{2}^{j}\) of the bit string (η0, …, ηn−1).

The algorithm starts by choosing uniformly at random an index y ∈ {0, …, N − 1} as the initial “threshold index”. The threshold index is used to initiate a Grover’s search19,53. The Grover subroutine searches for a label y′ whose energy is strictly smaller than the threshold value E(y). We measure the output of Grover’s search and (classically) ascertain whether the search has been successful, that is, whether E(y′) < E(y), in which case we (classically) update the threshold index from y to y′, and then continue by performing the next Grover’s search using the new threshold. The threshold is not updated if Grover’s search fails to find a better threshold.

In this paper, we assume a priori knowledge of a hyperparameter we call the number of “Grover iterations” (see the Methods section) inside every Grover’s search subroutine that guarantees a sufficiently small failure probability. However, the practical scheme for using DH-QMF consists of multiple trials of Grover’s search and iterative updates to the threshold index. We terminate this loop when the Grover subroutine repeatedly fails to provide any further improvement to y and the probability of the existence of undetected improvements drops below a sufficiently small value. Finally, we return the last threshold index as the solution. As shown by Dürr and Høyer20, the overall number of Grover iterations needed to find the ground state with sufficiently high probability, say 1/2, is in \({{{\mathcal{O}}}}(\sqrt{N})\).
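
The classical control flow of DH-QMF can be emulated without simulating any quantum dynamics by replacing Grover's search with a stub that, with some success probability, returns a uniformly random index with energy below the current threshold. The sketch below does exactly this; the termination heuristic and the value of p_grover are our illustrative choices, not parameters from the analysis.

```python
import random

def dh_qmf_outer_loop(energies, p_grover=0.95, max_failures=5, seed=0):
    """Classical emulation of the DH-QMF threshold-update loop.
    `energies[y]` holds E(y) for y in {0, ..., N-1}. The Grover stub
    returns a uniformly random index with energy strictly below the
    threshold (the output distribution of an ideal Grover's search)
    with probability `p_grover`, and fails otherwise."""
    rng = random.Random(seed)
    N = len(energies)
    y = rng.randrange(N)            # random initial threshold index
    failures = 0
    while failures < max_failures:
        better = [z for z in range(N) if energies[z] < energies[y]]
        if better and rng.random() < p_grover:
            y = rng.choice(better)  # success: update the threshold index
            failures = 0
        else:
            failures += 1           # failure: keep the current threshold
    return y                        # a minimizer, with high probability
```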

We now discuss our TTS benchmarking analysis for DH-QMF. We investigate the scaling of the time required by DH-QMF to find a solution of weighted MaxCut instances with a 0.99 success probability, assuming an optimistic scenario that is explained in the Methods section. This runtime is analogous to the TTS measure defined in previous sections for the heuristic algorithms of the MFB-CIM and DAQC; we therefore call this runtime a TTS as well. For each problem instance, we estimated an optimistic lower bound on the runtime of the quantum algorithm, with the number of Grover iterations in DH-QMF set (ahead of any trials) to achieve a success probability of at least 0.99. As this optimal number of Grover iterations depends on the specific MaxCut instance, we consider this an optimistic bound on the performance of DH-QMF. We use the same test set of randomly generated 21-weight MaxCut instances as in previous sections.

Our results are illustrated in Fig. 11. The optimistic values for the TTS range from about 1.0 millisecond to 1.0 second for the considered range of the number of vertices, 10 ≤ n ≤ 20. These results are based on the same set of assumptions for the quantum processor as used for DAQC (see the paragraph following Eq. (12)).

Fig. 11: Scaling of Dürr and Høyer’s algorithm for quantum minimum finding (DH-QMF) in solving MaxCut.
figure 11

a Time-to-solution (TTS) for 21-weight problem instances. b TTS for the SK model instances. In both cases, for each value of the number of vertices in the range 10 ≤ n ≤ 20, DH-QMF has been emulated for 1000 (dark blue data) MaxCut instances (see main text). A non-linear least-squares regression (orange curve) has been performed to fit the expected runtime scaling in Eq. (16), resulting in a sum of squared residuals of approximately \(1.2\times 1{0}^{-4}\,{{{{\rm{s}}}}}^{2}\) for the 21-weight instances and \(3.30\times 1{0}^{-3}\,{{{{\rm{s}}}}}^{2}\) for the SK model instances. A logarithmic scale has been used to display the data and the regression fits. Note that the contributions from the logarithmic factors become more (less) significant for smaller (larger) problem sizes.

Our estimates for the runtime of the quantum algorithm are obtained as follows. DH-QMF consists of a sequence of Grover’s search algorithms. The total runtime of DH-QMF is therefore the sum of the runtimes of the quantum circuits, each of which corresponds to one Grover’s search. The runtime of each such circuit is calculated using the depth of that circuit, which is the length of the longest sequence of native operations on the quantum processor (i.e., qubit preparations, single-qubit and two-qubit gates, and qubit measurements) in that circuit, assuming maximum parallelism between independent operations; this longest sequence is also known as the “critical path” of the circuit. The runtime of the circuit is therefore the sum of the runtimes of the operations along the critical path, with a contribution of 1.0 millisecond in total for both qubit initialization and measurement, and 10 nanoseconds for each quantum gate operation along the critical path.
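As an illustration of this accounting, the following sketch (with the stated timing assumptions hard-coded) converts critical-path depths into a total runtime; the function names and inputs are ours.

```python
INIT_AND_MEASURE_S = 1.0e-3   # 1.0 ms total for qubit preparation and readout
GATE_TIME_S = 10e-9           # 10 ns per gate on the critical path

def circuit_runtime(critical_path_gates: int) -> float:
    """Wall-clock runtime of one Grover-search circuit from the number
    of gates along its critical path."""
    return INIT_AND_MEASURE_S + critical_path_gates * GATE_TIME_S

def dh_qmf_runtime(depths) -> float:
    """Total DH-QMF runtime: sum over all Grover-search circuits executed,
    each contributing its own initialization/measurement overhead."""
    return sum(circuit_runtime(d) for d in depths)
```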

The asymptotic scaling of the TTS is identical to the scaling of the circuit depth, which is

$$\Theta \left(\sqrt{{2}^{n}}\left({n}^{2}\log \log n+{(\log n)}^{2}+n\right)\right),$$
(16)

as shown in the Methods section. Here the \(\Theta \left(\sqrt{{2}^{n}}\right)\) contribution is that of the number of Grover iterations (identical to the query complexity of Grover’s search), while the \({{{\rm{poly}}}}(n,\log n,\log \log n)\) factors are the contribution of each single Grover iteration consisting of an oracle query with implementation cost \(\Theta \left({n}^{2}\log \log n+{(\log n)}^{2}\right)\) and the Grover diffusion with cost \(\Theta \left(n\right)\). A nonlinear least-squares regression toward this scaling is shown in Fig. 11 for both the 21-weight and the SK model problem instances. Note that the contributions of logarithmic terms are significant only for small problem sizes.
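A regression of this form can be reproduced in a few lines of Python; the sketch below fits the model of Eq. (16) with scipy.optimize.curve_fit, using synthetic stand-in data in place of the benchmark TTS values behind Fig. 11.

```python
import numpy as np
from scipy.optimize import curve_fit

def tts_model(n, A, C, D):
    """Runtime scaling of Eq. (16):
    TTS(n) ~ sqrt(2^n) * (A n^2 loglog(n) + C (log n)^2 + D n)."""
    return np.sqrt(2.0**n) * (A * n**2 * np.log(np.log(n))
                              + C * np.log(n)**2
                              + D * n)

# Problem sizes and median TTS values; the data here are synthetic
# placeholders standing in for the benchmark results of Fig. 11.
n_vals = np.arange(10, 21, dtype=float)
tts_vals = tts_model(n_vals, 1e-9, 1e-7, -5e-8)
(A, C, D), _ = curve_fit(tts_model, n_vals, tts_vals, p0=[1e-9, 1e-7, -1e-8])
```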

Alongside the optimistic runtime, we have also computed lower bounds on the number of quantum gates, including concrete counts for the overall number of single-qubit gates, two-qubit CNOT gates, and T gates (see Fig. 12). Our circuit analysis, presented in the Methods section, yields the gate complexity

$$\Theta \left(\sqrt{{2}^{n}}\left({n}^{2}\log n\log \log n+{(\log n)}^{2}+n\right)\right).$$
(17)

Our concrete resource estimates have been generated using ProjectQ54.

Fig. 12: Optimistic gate counts for DH-QMF in solving the MaxCut problem.
figure 12

For each value of the number of vertices in the range 10 ≤ n ≤ 20, the DH-QMF algorithm was emulated for 1000 (blue data) 21-weight MaxCut instances (see main text). Counts are shown for a the overall number of single-qubit gates and b the number of two-qubit CNOT gates. A non-linear least-squares regression (orange curve) has been performed to fit the expected gate complexity given in Eq. (17). A logarithmic scale has been used to display the data and the regression fits. The number of qubits required to implement the algorithm scales as \({{{\mathcal{O}}}}\left(n+\log n\right)\).

Comparison of the three algorithms

A direct comparison of the three algorithms for solving MaxCut problems is illustrated in Fig. 13. In Fig. 13a, the median wall-clock TTS of DH-QMF, DAQC, and the closed-loop MFB-CIM are plotted as a function of the problem size n for randomly generated 21-weight MaxCut instances. The solid blue line indicates a best-fitting curve, \({f}_{{{{\rm{CIM}}}}}(n)=A{B}^{\sqrt{n}}\), for the closed-loop MFB-CIM, where A = 121 nanoseconds and B = 2.21; the solid orange line represents a best-fitting curve, \({f}_{{{{\rm{DAQC}}}}}(n)={A}^{{\prime} }{B}^{{\prime} {n}^{0.9}}\), for a 20-layer DAQC, where \({A}^{{\prime} }=3.56\) microseconds and \({B}^{{\prime} }=1.26\); and the solid green curve represents a best-fitting curve, \({f}_{{{{\rm{QMF}}}}}(n)=\left(\tilde{A}{n}^{2}\log \log n+\tilde{C}{(\log n)}^{2}+\tilde{D}n\right){\tilde{B}}^{n}\), for DH-QMF, where \(\tilde{B}=\sqrt{2}\), and \(\tilde{A}\), \(\tilde{C}\), and \(\tilde{D}\) are equal to 3.9, 5.25 × 102, and −2.97 × 102 milliseconds, respectively.

Fig. 13: Comparison of the time-to-solution (TTS) scalings for the MFB-CIM, DAQC, and DH-QMF in solving MaxCut problems.
figure 13

a Wall-clock time of a closed-loop CIM with a high-finesse cavity (γsΔTc = 0.1), DAQC with an optimum number of layers (p = 20), and DH-QMF with an a priori known number of optimum iterations versus problem size n for fully connected 21-weight graphs. b TTS of the closed-loop CIM on the fully connected SK model for problem sizes from n = 100 to n = 800, in steps of 100. For each problem size, the minimum TTS with respect to the optimization over \({t}_{\max }\) is plotted. In comparison, the SK model TTSs are shown for 20-layer DAQC and DH-QMF for problem sizes ranging from n = 10 to n = 20. The straight, lighter-blue line (a linear regression) for the CIM demonstrates a scaling according to \(A{B}^{\sqrt{n}}\). The lighter-orange and lighter-green best-fit curves for DAQC and DH-QMF are extrapolated to larger problem instances, illustrating a scaling that is exponential in n rather than in \(\sqrt{n}\). In both figures, the shaded regions show the IQRs.

In order to see how the performance of a closed-loop MFB-CIM scales with increasing problem size, we solved MaxCut problems with SK instances of problem sizes n = 100, 200, …, 800. A total of 100 instances of the SK model for each problem size were randomly generated. Using a closed-loop MFB-CIM, we solved each instance 100 times to evaluate the success probability Ps of finding a ground state and compute a wall-clock time to achieve a success probability of ≥0.99. It is assumed that all-to-all spin coupling is implemented in 10 nanoseconds, which corresponds to a cavity round-trip time. The signal field lifetime is 100 nanoseconds, that is, γsΔTc = 0.1. We use the continuous-time Gaussian model as described in the Results section. The results are shown in Fig. 13b, along with the predicted performance of DAQC and DH-QMF for the SK model instances. The minimum wall-clock TTS for the closed-loop MFB-CIM at the optimized runtime \({t}_{\max }\) scales as an exponential function of \(\sqrt{n}\), while those for DH-QMF and DAQC scale as exponential functions of n. At a problem size of n = 800, the wall-clock TTS for the closed-loop MFB-CIM is ~10 milliseconds, while those for DH-QMF and DAQC are \(\sim 1{0}^{120}\) seconds and \(\sim 1{0}^{50}\) seconds, respectively.
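These orders of magnitude follow directly from the fitted scaling laws reported in this paper (TTS ≈ 4.32 × 1.34^√n μs for the closed-loop MFB-CIM, TTS ≈ 4.6 × 1.17^n μs for DAQC, and TTS ≈ 17.3 × 2^{n/2} n² log log n μs for DH-QMF; see the Discussion). The sketch below evaluates them in log10 space to avoid floating-point overflow, recovering the quoted magnitudes up to rounding of the reported coefficients.

```python
import math

# Fitted scaling laws (wall-clock TTS converted to seconds).
def log10_tts_cim(n):   # closed-loop MFB-CIM: 4.32 * 1.34**sqrt(n) microseconds
    return math.log10(4.32e-6) + math.sqrt(n) * math.log10(1.34)

def log10_tts_daqc(n):  # DAQC: 4.6 * 1.17**n microseconds
    return math.log10(4.6e-6) + n * math.log10(1.17)

def log10_tts_qmf(n):   # DH-QMF: 17.3 * 2**(n/2) * n**2 * loglog(n) microseconds
    return (math.log10(17.3e-6) + 0.5 * n * math.log10(2)
            + 2 * math.log10(n) + math.log10(math.log(math.log(n))))

n = 800
print(f"CIM : 10^{log10_tts_cim(n):.1f} s")   # about 10^-1.8 s, i.e., ~10 ms
print(f"DAQC: 10^{log10_tts_daqc(n):.1f} s")  # roughly 10^49 s
print(f"QMF : 10^{log10_tts_qmf(n):.1f} s")   # roughly 10^122 s
```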

For the bimodal SK model, which is known to be “easy” for many algorithms, and for a limited problem-size range of 100 ≤ n ≤ 500, we empirically observe a sub-exponential scaling of \(\Theta ({2}^{\sqrt{n}})\) for the closed-loop MFB-CIM’s TTS. Such a sub-exponential scaling in solving the SK model instances using CIM-based algorithms has also been reported in other recent studies17,55. For the 21-weight problem instances, due to the limited problem-size range 5 ≤ n ≤ 30 of the data available, we cannot reliably infer the actual asymptotic scaling. While our results for the MFB-CIM seem to agree well with a sub-exponential scaling (with the same exponent \(\sqrt{n}\)), extrapolations from numerical findings based on small-sized problem instances can potentially be misleading.

In contrast, the scaling of DAQC appears to be exponential. In the absence of empirical data for large problem sizes (even for the SK instances), we perform a careful regression analysis on our data, which we report in the Supplementary Materials. Our analysis suggests an exponential scaling for solving the SK model problem instances and a slightly sub-exponential scaling with the exponent n0.9 for the 21-weight problem instances. Nevertheless, we remain reluctant to extrapolate any exponential scaling laws from this investigation.

As for the TTS scaling of the DH-QMF algorithm, an exponential law of \(\widetilde{{{{\mathcal{O}}}}}\left(\sqrt{{2}^{n}}\right)\) for the query complexity is supported by rigorous proofs20,53. Our benchmarking study reveals that this exponential scaling is not improved for problem instances based on the SK model. In addition, the query complexity does not account for the cost of a single query to the oracle. Our benchmarking results, shown in Fig. 11, are based on a regression towards the scaling given in Eq. (16), which includes an additional \({{{\rm{poly}}}}(n,\log n)\) factor to account for the scaling of the circuit depth of our oracle implementation.

Discussion

In this paper, we have presented the results of our study of the scaling of two types of measurement-feedback coherent Ising machines (MFB-CIM) and compared this scaling to that of discrete adiabatic quantum computation (DAQC) and the Dürr–Høyer algorithm for quantum minimum finding (DH-QMF). We performed this comparative study by testing numerical simulations of these algorithms on 21-weight MaxCut problems, that is, weighted MaxCut problems with randomly generated edge weights attaining 21 equidistant values from − 1 to 1. We emphasize that our study was a numerical analysis; its results depend on the choices we made, to the best of our abilities, in setting up the numerical experiments.

The MFB-CIM of the first type is an open-loop MFB-CIM with predefined feedback control parameters and the second is a closed-loop MFB-CIM with self-diagnosis and dynamically modulated feedback control parameters. The open-loop MFB-CIM utilizes the anti-squeezed \(\hat{X}\) amplitude near threshold under a positive pump amplitude for finding a ground state but at larger problem sizes the machine is often trapped in local minima. The closed-loop MFB-CIM employs the squeezed \(\hat{X}\) amplitude under a negative pump amplitude, in which a finite internal energy is sustained through an external feedback injection signal rather than through parametric amplification. This second machine self-diagnoses its current state by performing Ising energy measurement and comparison with the previously attained minimum energy. The machine continues to explore local minima without getting trapped even in a ground state. We observed that for both the 21-weight MaxCut problems and the SK Ising model, the closed-loop MFB-CIM outperforms the open-loop MFB-CIM. One remarkable result is that a low-finesse cavity machine realizes a shorter TTS than a high-finesse one. This fact clearly demonstrates that the dissipative coupling of the machine to external reservoirs is a crucial computational resource for MFB-CIMs. The wall-clock TTS of the closed-loop MFB-CIM closely follows \({{{\rm{TTS}}}}\,\approx \,4.32\times {(1.34)}^{\sqrt{n}}\) microseconds for the SK model instances of size n ranging from 100 to 800, assuming a cavity round-trip time of 10 nanoseconds and a 1/e signal amplitude decay time of 100 nanoseconds (γsΔTc = 0.1). The performance of the MFB-CIM shown in Fig. 13 is already competitive against various heuristic solvers implemented on advanced digital platforms such as CPUs, GPUs, and FPGAs, in which massive parallel computation is performed over many billions of transistors6,7,55,56,57,58. Note that the results shown in Fig. 13 are based on the assumption of an MFB-CIM architecture that employs only a single OPO as an active element (i.e., it involves only a single optical resonator, along with a nonlinear optical crystal, pumped by a laser) for processing information encoded in time-multiplexed oscillations of the resonator. It is anticipated that advanced on-chip coherent network computing technologies (based, e.g., on chip-scale integrated lithium niobate second-order nonlinear photonic circuits59) will allow the design of highly parallelized MFB-CIM architectures involving multiple OPO components operated in parallel, with the potential for massively parallel computation that would further enhance performance.

We have also studied the scaling of the DAQC algorithm in solving 21-weight and SK model MaxCut problem instances. We considered two schemes for optimizing the quantum gate parameters of DAQC, denoted in the paper as (γ, β). In the first scheme, we treat γ and β as hyperparameters that follow a schedule inspired by the adiabatic theorem. In this case, DAQC can be viewed as a Trotterization of an adiabatic evolution from the ground state of a mixing Hamiltonian to the ground state of a problem Hamiltonian. The second scheme is a variational hybrid quantum–classical algorithm (similar to the QAOA approach) wherein a classical optimizer is tasked with optimizing the gate parameters γ and β. The variational scheme must perform repeated state preparation and projection measurements to estimate the ensemble averaged energy, which makes the optimization step not only costly but also vulnerable to the shot noise of these measurements. Another disadvantage of the variational scheme is that optimizing the ensemble average energy does not necessarily improve the TTS, which is the more practical measure of performance for the algorithm (see the Methods section for more details). As shown later in Fig. 16, the adiabatic schedules achieve very low R99 values, suggesting a challenging bound on the allowed number of shots for the variational scheme to outperform the adiabatic scheme for this problem. Given these considerations, we used a pre-tuned adiabatic scheme to assess the performance limits of DAQC. In contrast, we note that the quantum state in an MFB-CIM survives through repeated measurements, as the measurements performed on the OPO pulses are not direct projective measurements but indirect approximate measurements. These measurements perturb the internal quantum state of the OPO network but do not completely destroy it. As a result, the above drawback of a variational scheme for DAQC does not apply to the closed-loop MFB-CIM. The wall-clock TTS of DAQC with hypertuned adiabatic schedules is well-represented by \({{{\rm{TTS}}}}\,\approx \,4.6\times {(1.17)}^{n}\) microseconds. As shown in Fig. 13, extrapolating this trend suggests that DAQC will perform poorly compared to the MFB-CIM as the problem size increases, due to an exponential dependence on the number, n, of vertices in the MaxCut problem, compared to exponential growth with a \(\sqrt{n}\) exponent in the case of the MFB-CIM.

Finally, we have also studied the scaling of DH-QMF for solving 21-weight and SK model MaxCut problems. As this algorithm is based on Grover’s search, it performs \(\widetilde{{{{\mathcal{O}}}}}(\sqrt{{2}^{n}})\) Grover iterations, implying it makes a number of queries, of the same order, to its oracle. The algorithm also iterates on multiple values of a classical threshold index; however, this does not change the dominating factors in the scaling of the algorithm. We have shown that the wall-clock TTS of DH-QMF is well-approximated by \({{{\rm{TTS}}}}\,\approx \,17.3\times {2}^{n/2}{n}^{2}\log \log n\) microseconds when extrapolated to larger problem sizes. As shown in Fig. 13, DH-QMF requires a computation time that is many orders of magnitude larger than that for either DAQC or the MFB-CIM. This comparatively poor performance of DH-QMF can be traced back to the linear amplitude amplification in the Grover iteration, in contrast to the exponential amplitude amplification at the threshold of the OPO network. Our study thus leaves open the question of whether there exist optimization tasks for which Grover-type speedups are of practical significance.

Methods

Optimal loss parameters for the MFB-CIM

The performance of the MFB-CIM critically depends on the machine’s total loss rate. Here we discuss how the optimal loss parameters were found. Fig. 5 shows the effect of changing the total loss rate γsΔTc(1 + j) on the TTS, computed using the discrete-time model of the MFB-CIM. There are various ways the total loss rate can be varied. For the results displayed in Fig. 5, we kept j constant at the value 1 (recall that j is a parameter that corresponds to the escape efficiency49, which is the ratio of the out-coupling loss associated with the optical homodyne measurement to the total cavity loss) and varied γsΔTc. There is a sweet spot around γsΔTc(1 + j) ≈ 1.

In Fig. 14a, b, heat maps of the TTS for a problem instance of size n = 30 are shown. Here, the x-axis represents the total loss rate γsΔTc(1 + j) and the y-axis represents the out-coupling loss j. In these plots, j = 1 on the y-axis corresponds to the TTS curves plotted in Fig. 5. The green contour lines correspond to fixed values of J = jγs. As evident from these plots, at least in the case of the open-loop CIM, an increase in the value of the total loss rate, moving along the horizontal axis, results in the optimal region becoming larger, while moving along a green contour line, the optimal region becomes sharper. In the case of the closed-loop CIM, there appear to be two optimal regions. We believe that the more meaningful optimal region in this case is the one along the vertical line given by γsΔTc(1 + j) ≈ 0.5 (or Ndecay = 2), even though the TTS is longer there: once the total loss rate becomes sufficiently large, the nonlinearity grows so strong that the error-correction mechanism can no longer stabilize the amplitude to the desired target amplitude. The reason a short TTS nevertheless appears in this region for n = 30 is that such problems are small enough to be solved despite the unstable behavior of the solver. However, for problem size n = 100, as shown in Fig. 14c, this second region no longer exhibits a short TTS, and the optimal TTS occurs in the region around the vertical line defined by γsΔTc(1 + j) ≈ 0.3.

Fig. 14: Heat maps of the TTS for the Sherrington–Kirkpatrick model with the x-axis representing the total loss rate γsΔTc(1 + j) and the y-axis representing the out-coupling loss j.
figure 14

a, b Heat maps for the closed-loop and open-loop CIMs for n = 30. c, d Heat maps for the closed-loop and open-loop CIMs for n = 100. The colors indicate the value of the TTS in terms of the number of round trips, where a darker color represents a shorter TTS. The green contour lines correspond to fixed values for J = jγs.

Hyperparameter tuning for DAQC parameter schedules

We now present our method for generating DAQC parameter schedules for any problem Hamiltonian HP and number of layers p. We consider two hyperparameters for these schedules:

  • The number L = T/p is the evolution time in each Trotterized layer of the associated annealing schedule. A larger value of L corresponds to a slower and therefore better associated annealing schedule, but also brings along a greater Trotterization error;

  • The number a is the coefficient of the cubic term in the adiabatic schedule. When a = 0 the schedule is linear, and when a = 4 the schedule is cubic, with \({f}^{{\prime} }(T/2)=0\). We therefore only consider a ∈ [0, 4], because for a > 4 the schedule would be decreasing at t = T/2.
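One cubic parameterization consistent with these requirements (f(0) = 0, f(1) = 1, linearity at a = 0, and f′(1/2) = 0 exactly at a = 4) is sketched below, together with a Trotterized mapping of the schedule to per-layer DAQC angles; the specific polynomial and the angle mapping are our reconstruction, not necessarily the exact form used in the benchmarks.

```python
def schedule(s: float, a: float) -> float:
    """Annealing fraction f(s) for s = t/T in [0, 1].
    a = 0 gives the linear schedule f(s) = s; a = 4 gives a cubic
    schedule with f'(1/2) = 0; for a > 4 the schedule would be
    decreasing at the midpoint."""
    return a * s**3 - 1.5 * a * s**2 + (1.0 + 0.5 * a) * s

def daqc_angles(p: int, a: float, L: float):
    """Trotterized DAQC parameters for p layers with per-layer time L
    (so T = L * p): a standard assignment sets gamma_k ~ L * f(s_k)
    and beta_k ~ L * (1 - f(s_k)) at the layer midpoints s_k."""
    angles = []
    for k in range(p):
        s_k = (k + 0.5) / p
        f = schedule(s_k, a)
        angles.append((L * f, L * (1.0 - f)))  # (gamma_k, beta_k)
    return angles
```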

Here, we compile our results on the performance of DAQC with cubic schedules for various values of the hyperparameters a, L, and p. In Figs. 15–17, the horizontal axis displays the number of vertices for the problem instance, and the vertical axis displays the R99 or TTS (in logarithmic scale). Each blue dot represents a single problem instance. All plots depict a total of 11,000 problem instances varying from 10 to 20 nodes in size. Each black point represents the geometric mean of all values of R99 or TTS for problem instances of a given size. Finally, the red line indicates the best linear fit to the black points. The equation corresponding to the best-fit line is written in each subplot, where n is the number of vertices.

Fig. 15
figure 15

R99 of the good initial DAQC parameters at p = 4 layers for various values of a and L, on all 1000 graph instances of each size ranging from 10 to 20.

We empirically found that a value of L between 2.6 and 3.6 worked best. In Fig. 15, we plot the R99 values of the good parameter schedule with hyperparameters a ∈ {0, 2, 4} and L ∈ {2.8, 3.0, 3.2, 3.4, 3.6}. Note that a = 4 (a cubic schedule with a derivative of 0 at the inflection point) outperforms a = 0 (a linear schedule). We observed that, as the number of vertices n increases, the optimal value of the scaling constant L increases. Therefore, our tuned hyperparameter value used in Figs. 16 and 17 is L = 1.6 + 0.1n.

Fig. 16
figure 16

R99 and TTS of a linear schedule for 10 ≤ n ≤ 20, p ∈ {4, 10, 20, 50}, with hyperparameters a = 0.0 and L = 1.6 + 0.1n.

Fig. 17: R99 and TTS of a cubic schedule for 10 ≤ n ≤ 20, p ∈ {4, 10, 20, 50}, with hyperparameters a = 4.0 and L = 1.6 + 0.1n.
figure 17

The performance is better than that of the linear schedule for shallow circuits, but stops improving as the number of layers becomes larger.

In Figs. 16 and 17, we present the scaling of a linear schedule alongside that of a cubic schedule. As the number of layers increases, performance as measured by R99 improves, as expected. However, with more layers, more time is required to perform a single circuit shot; therefore, the scaling of the TTS is actually worse at 50 layers than at 20 layers. For large numbers of layers, the linear and cubic schedules perform similarly, which is expected because both are Trotterizations of a very slow adiabatic schedule.

Challenges encountered with the variational DAQC protocol

Our initial investigations also included the variational quantum-classical protocol for optimizing the DAQC gate parameters. Here we briefly outline the methods we used for this approach and the challenges we encountered in applying it, which led us to exclude it from our benchmarking study. When DAQC parameter schedules are tuned variationally, the energy measurements from the quantum device are used to decide, via a hybrid quantum–classical process, which parameters to try next. A single “shot” with the parameters (γ, β) consists of running the DAQC circuit once with those parameters and measuring the energy of the prepared state \(\left\vert \psi (\gamma ,\beta )\right\rangle\), which destroys the prepared state and returns a single measurement outcome. We perform a large number of shots using (γ, β), and the results are averaged to estimate the expected energy

$$EE(\gamma ,\beta ):= \langle \psi (\gamma ,\beta )| {H}_{{{{\rm{P}}}}}| \psi (\gamma ,\beta )\rangle .$$
(18)

This expected energy is treated as a loss function which is minimized by a classical optimizer. This approach suffers from two major challenges.

Firstly, we want the parameters (γ, β) which minimize the R99, rather than the expected energy. Although these two loss functions are related, they are not perfectly correlated, and this difference becomes more apparent as we move closer to the parameters which minimize R99. Unfortunately, it is impossible to optimize the ansatz with respect to R99, as this would require knowledge of the ground state.

Secondly, because projective measurements are stochastic, our estimate of the expected energy is approximate, and this makes parameter optimization difficult. To overcome this issue, we would need to use a large number of shots per point (γ, β), which makes the variational algorithm costly.
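Both challenges are visible in even a minimal implementation of the variational loop. In the sketch below, sample_energy and initial_schedule are hypothetical placeholders (a simulator routine returning a single projective-measurement outcome, and a starting parameter vector), and each loss evaluation costs shots circuit executions.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_expected_energy(params, sample_energy, shots=1000):
    """Monte Carlo estimate of EE(gamma, beta): average of `shots`
    single projective-measurement energies. The standard error decreases
    only as 1/sqrt(shots), which is the second challenge above."""
    half = len(params) // 2
    gamma, beta = params[:half], params[half:]
    return np.mean([sample_energy(gamma, beta) for _ in range(shots)])

# Nelder-Mead on this noisy loss surface; every function evaluation
# costs `shots` circuit executions, which makes the loop expensive.
# result = minimize(estimate_expected_energy, x0=initial_schedule,
#                   args=(sample_energy,), method="Nelder-Mead",
#                   options={"maxfev": 100})
```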

In Fig. 18, we illustrate the implications of the first challenge. We consider a four-layer DAQC circuit on graphs of size 10, 15, and 20. For each graph G, the following analysis is performed. First, the cubic schedule θG (see the Results section) is found and its R99 is calculated. The Nelder–Mead method is then used to optimize the expected energy, with its parameter schedule initialized as θG and given access to 100 perfect evaluations of expected energy (which ordinarily can only be approximated). The R99 of the result is divided by the R99 of the cubic schedule, and these ratios have been plotted in red. Finally, the Nelder–Mead method is used to optimize R99, with a schedule initialized with θG and access to 100 perfect evaluations of R99 (which is ordinarily impossible to calculate). The R99 of the result is divided by the R99 of the cubic schedule, and these ratios have been plotted in blue. For better visibility, the graph instances along the x-axis have been sorted by the y-values of the red points. We observe that even with perfect estimation of the expected energy, optimization results in a worse final R99 in 15 to 40 percent of graph instances. This is the case even though the cost (in shots) of performing this optimization has been ignored; accounting for it would have made the comparison substantially less favorable.

Fig. 18: Plot depicting the fraction of the baseline R99 achieved when optimizing for expected energy with no shot noise (red) versus optimizing for R99 (blue).
figure 18

Baseline R99 (black) is given by the cubic parameter schedule, as described in the Results section. Even when shot noise is absent, optimizing for expected energy can increase the R99 about a third of the time, as evidenced by the fact that a third of the red points lie above the black line. We performed this optimization using 100 function evaluations of the Nelder–Mead method; due to imperfect optimization, a few blue points landed above the red points. The x-axis is the graph instance number from 0 to 199, where graphs have been sorted according to the y-value of the red point.

Optimal number of Grover iterations in DH-QMF

In what follows, we explain how the DH-QMF algorithm can always be designed such that the output is indeed a ground state with a probability higher than any target lower bound for the probability of success, for example, 0.99.

A key component of Grover’s search as part of QMF is an oracle that marks every input state \(\left\vert x\right\rangle\) whose energy is strictly smaller than the energy corresponding to the threshold index y (see Fig. 10). We call it the “QMF oracle” and denote it by OQMF to distinguish it from the “energy oracle” OE which computes the energy of a state under the problem Hamiltonian. The oracle OQMF uses an ancilla qubit initialized in the state \(\left\vert z\right\rangle\) to store its outcome

$${O}_{{{{\rm{QMF}}}}}{{{\rm{:}}}}\,\left\vert x\right\rangle \left\vert z\right\rangle \longmapsto \left\vert x\right\rangle \left\vert z\oplus f(x)\right\rangle ,$$
(19)

where f(x) = 1 if, and only if, E(x) < E(y), and f(x) = 0 otherwise. Here, ⊕ represents a bitwise XOR. The QMF oracle is constructed from multiple uses of the energy oracle OE and an operation that compares the values held by two registers. Details of this construction are provided in the next Methods subsection. The combined effect of querying OQMF followed by the Grover diffusion (together forming the Grover iteration to be repeated \({{{\mathcal{O}}}}\left(\sqrt{{2}^{n}}\right)\) times) results in constructively amplifying the amplitudes of the marked items while diminishing the amplitudes of the unmarked ones.

When there are multiple solutions to a search problem, as is frequently the case in the Grover subroutine of QMF, the optimal number of Grover iterations needed to maximize the success probability depends on the number of marked items as well. Indeed, suppose we had knowledge of the number of marked items t ahead of time. Then, the optimal number of Grover iterations could be obtained from the closed formulae provided in ref. 53:

$$\begin{array}{l}{\wp }_{{{{\rm{succ}}}}}={\sin }^{2}\left((2m+1)\theta \right),\\ {\wp }_{{{{\rm{fail}}}}}={\cos }^{2}\left((2m+1)\theta \right).\end{array}$$
(20)

Here, m is the number of Grover iterations, and θ is defined by \({\sin }^{2}\theta =t/N\). Hence, the success probability is maximized for the optimal number of Grover iterations \({m}_{{{{\rm{opt}}}}}=\left\lfloor \pi /(4\theta )\right\rfloor\). We also observe that after exactly mopt iterations the failure probability obeys

$${\wp }_{{{{\rm{fail}}}}}\le {\sin }^{2}\theta =t/N,$$

which is negligible when t ≪ N.
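For reference, these expressions translate directly into code; the following sketch computes mopt and the resulting success and failure probabilities from t and N.

```python
import math

def grover_stats(t: int, N: int):
    """Optimal iteration count and success/failure probabilities from
    Eq. (20), for t marked items among N, with sin^2(theta) = t/N."""
    theta = math.asin(math.sqrt(t / N))
    m_opt = math.floor(math.pi / (4 * theta))
    p_succ = math.sin((2 * m_opt + 1) * theta) ** 2
    return m_opt, p_succ, 1.0 - p_succ

# Example: 3 marked items among N = 2**10 states.
m_opt, p_succ, p_fail = grover_stats(t=3, N=2**10)
```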

In practice, t and, consequently, mopt are often unknown. Nevertheless, Section 4 and Theorem 3 in the analysis by Boyer et al.53 present a method to find a marked item with query complexity \({{{\mathcal{O}}}}\big(\sqrt{N/t}\big)\) even when no knowledge of the number of solutions is assumed.

To simplify the analysis for our benchmark in this paper, we examine each MaxCut instance and assume t is known every time Grover’s search is invoked. This assumption provides an optimistic lower bound on the runtime of DH-QMF. In view of the previous discussion, having knowledge of t allows us to compute mopt, \({\wp }_{{{{\rm{succ}}}}}\), and \({\wp }_{{{{\rm{fail}}}}}\).

We then boost the overall success probability of Grover’s search to any target success probability pG by repeating it K times, where K satisfies

$${p}_{{{{\rm{G}}}}}\le 1-{\wp }_{{{{\rm{fail}}}}}^{\,K}\,.$$
(21)

Moreover, if DH-QMF requires J non-trivial threshold index updates in total, we must succeed in every boosted Grover search (each including K Grover searches). The probability of this event is thus at least \({p}_{{{{\rm{G}}}}}^{\,J}\). Finally, let us denote the target lower bound for the probability of success of the overall DH-QMF algorithm by psucc. We then must have

$${p}_{{{{\rm{succ}}}}}\,\le \,{p}_{{{{\rm{G}}}}}^{\,J}.$$
(22)

Combining Eqs. (21) and (22) yields a lower bound for K,

$$K\ge \frac{\log \left(1-{p}_{{{{\rm{succ}}}}}^{\ \,\frac{1}{J}}\right)}{\log {\wp }_{{{{\rm{fail}}}}}}\ .$$
(23)

Note that this number still depends on the optimal number mopt of Grover iterations. The remainder of this section explains how the latter number is sampled for each MaxCut instance via Monte Carlo simulation.

Given a weighted graph, we first generate the histogram of the sizes of all cuts in the graph. Examples of such histograms are provided in Fig. 19. This cut-size histogram allows us to perform a Monte Carlo simulation of the progression of DH-QMF as follows. The DH-QMF algorithm starts by choosing uniformly at random an initial cut C as the threshold index. The resulting energy threshold is therefore sampled according to the cut-size histogram. Grover search then attempts to find a larger cut. The number of these cuts is t in the notation above, and can be found if the cut-size histogram is known. Using Eq. (20), we can also compute the optimal number mopt of Grover iterations needed to achieve the highest possible success rate succ in that search. We furthermore can now use Eq. (23) to predict the number K of Grover searches needed to boost the success probability to at least pG. The cut C is now replaced with a larger cut also selected at random using the cut-size histogram, and this simulation is repeated for the next iteration in DH-QMF.

Fig. 19: Typical cut-size histograms of undirected random weighted graphs with weights wk = ±0.1j, where j ∈ {0, 1, …, 10}.
figure 19

Two instances are shown for random graphs with n = 15 (left) and n = 20 (right) vertices. Note that, for a fully connected graph with n vertices, the overall number of edges is n(n − 1)/2.

We repeatedly sample and update the threshold until we find a maximum cut (i.e., at an iteration where t = 0). At this point, we stop our Monte Carlo simulation (even though in practice it will not be known that t has become zero). For each sampling step j, we count the total number tj of states contributing to strictly greater cuts and use it to calculate the optimal number \({m}_{{{{\rm{opt}}}}}^{[j]}\) of Grover iterations as well as the number of boosting iterations Kj via Eq. (23).
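A compact version of this Monte Carlo procedure, assuming the full list of cut values for one instance is available, might read as follows; the uniform sampling over strictly larger cuts realizes the histogram-based threshold update, and the guard against a vanishing failure probability is ours.

```python
import math
import random

def simulate_dh_qmf(cut_values, p_succ_target=0.99):
    """Monte Carlo emulation of DH-QMF on one problem instance, given
    the list of all 2^n cut values (equivalently, the cut-size
    histogram). Returns the total number of Grover iterations summed
    over all boosted searches; multiplied by RUNTIME, this gives the
    TTS of Eq. (24)."""
    N = len(cut_values)
    threshold = random.choice(cut_values)            # random initial threshold
    records = []                                     # (m_opt_j, p_fail_j) per update
    while True:
        better = [c for c in cut_values if c > threshold]
        t = len(better)                              # number of marked items t_j
        if t == 0:                                   # maximum cut reached: stop
            break
        theta = math.asin(math.sqrt(t / N))          # sin^2(theta) = t/N
        m_opt = math.floor(math.pi / (4 * theta))    # optimal Grover iterations
        p_fail = math.cos((2 * m_opt + 1) * theta) ** 2
        records.append((m_opt, max(p_fail, 1e-15)))  # guard against log(0)
        threshold = random.choice(better)            # histogram-based update
    J = len(records)                                 # number of threshold updates
    total_iterations = 0
    for m_opt, p_fail in records:
        K = math.ceil(math.log(1.0 - p_succ_target ** (1.0 / J))
                      / math.log(p_fail))            # boosting count, Eq. (23)
        total_iterations += K * m_opt
    return total_iterations
```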

We now obtain an optimistic TTS as well as an optimistic gate count estimate using the formulae

$${{{\rm{TTS}}}}=\mathop{\sum}\limits_{j=1}^{J}{K}_{j}{m}_{{{{\rm{opt}}}}}^{[j]}\times {{{\rm{RUNTIME}}}},$$
(24)
$$\#\,{{{\rm{gates}}}}=\mathop{\sum}\limits_{j=1}^{J}{K}_{j}{m}_{{{{\rm{opt}}}}}^{[j]}\times {{{\rm{GATECOUNT}}}}.$$
(25)

Here, RUNTIME denotes the running time and GATECOUNT indicates the gate count for a single Grover iteration. In the Results section, we provide optimistic estimates for the number of single-qubit gates, CNOT gates, and T gates. The quantum circuit implementation of a single Grover iteration is presented in the Supplementary Materials.

The QMF oracle as part of DH-QMF

In this section, we expand on our analysis of the quantum circuits used to implement the QMF oracle, which is required when Grover’s search is employed as a subroutine of DH-QMF, and explain the computational contributions to the oracle’s resource requirements. A brief review of how Grover’s search algorithm works is provided in the Supplementary Materials.

For DH-QMF, the search for a ground state of an Ising Hamiltonian \(H=-{\sum }_{i < \ell }{J}_{i\ell }{Z}_{i}{Z}_{\ell }\) (corresponding to an undirected weighted graph with weights \({w}_{i\ell }=-{J}_{i\ell }\)) requires an oracle which marks all states whose energies are strictly smaller than the energy corresponding to the latest updated threshold index, which we refer to as the “QMF oracle” in this paper. Its quantum-circuit implementation is shown in Fig. 20. Note that here, instead of using the weights \({w}_{k\ell }=\pm\! 0.1j\in \left[-1,1\right]\) for j ∈ {0, 1, …, 10}, we take the weights to be the integers \(-10\le {w}_{k\ell }\le 10\); this rescaling facilitates the quantum-circuit implementation of arithmetic operations without altering the underlying MaxCut problem.

Fig. 20: Quantum oracle as a key component of the Grover step as part of DH-QMF.
figure 20

The oracle marks every state whose energy is strictly smaller than the threshold value E(y), which is computed given the latest threshold index y. The result is recorded in a single-qubit flag: given its input state \(\left\vert z\right\rangle\) (where z ∈ {0, 1}), the oracle outputs \(\left\vert z\oplus f(x)\right\rangle\), where f(x) = 1 if, and only if, E(x) < E(y), and f(x) = 0 otherwise. a The circuit consists of several queries to the energy oracle OE, which reversibly computes the energy corresponding to a given input state, and applications of a unitary module called “Compare”, which compares the values held by two registers and records the result (0 or 1) in a single-qubit ancilla. To infer whether E(x) < E(y) for a given input \(\left\vert x\right\rangle\), we prepare the quantum state \(\left\vert y\right\rangle\) corresponding to the known threshold index y, independently compute E(x) and E(y) by separately employing OE, and compare their values using “Compare”. The computational registers for holding the energy values are initialized in \(\left\vert {\tilde{E}}_{0}\right\rangle\), where \({\tilde{E}}_{0}\) is a constant energy shift chosen so as to avoid negative energies. If E(x) < E(y) is TRUE, a 1 is recorded in an ancilla qubit that was initialized in \(\left\vert 0\right\rangle\); the ancilla remains unaltered otherwise. Using a CNOT gate, we copy out the result of the comparison to the single-qubit flag and reverse the whole circuit producing this result. b OE is implemented by serially executing the shown circuit template for every vertex pair (i, ℓ). Depending on whether vertex[i] and vertex[ℓ] carry the same or different values, we respectively subtract or add the value \({J}_{i\ell }\) in the data(H) register.

In addition to the n-qubit register vertex for encoding the possible spin configurations and any superpositions of them, and a single-qubit register flag for holding the result of the oracle, several other computational registers as well as ancillae are required to reversibly compute the energies E(x) and E(y) and compare their values. More concretely, we need another n-qubit register to encode the value y of the threshold index as a quantum state \(\vert y\rangle\). Furthermore, we need two computational registers to store the computed values E(x) and E(y); we call these registers “data(H)” to indicate that they hold the computed data related to the Hamiltonian. Both are initialized to hold an integer \({\tilde{E}}_{0}\) that is an upper bound on the maximum possible absolute value of an energy eigenvalue, \({\tilde{E}}_{0}\ge {\max }_{x}\left\vert E(x)\right\vert\). This energy shift by a constant value allows us to have a nonnegative energy spectrum, which facilitates the implementation of the energy comparison. The maximum possible absolute energy eigenvalue, \({\max }_{x}\left\vert E(x)\right\vert\), is bounded by the total number of edges in the graph times the maximum absolute edge weight in the weighted graph. The registers data(H) must thus be able to store a value twice as large as this bound. Since generic weighted graphs have full connectivity, the total number of edges in such graphs is \({{n}\choose{2}}=n(n-1)/2\), where n is the number of vertices, while the maximum absolute edge weight in our analysis is \({\max }_{(i,\ell )}\left\vert {w}_{i\ell }\right\vert =10\). Hence, we may use \({\tilde{E}}_{0}:= 10{{n}\choose{2}}=5n(n-1)\) and choose the registers data(H) to be of size \(\lceil {\log }_{2}\left(10n(n-1)\right)\rceil \in {{{\mathcal{O}}}}(\log n)\).
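As a quick check of these register sizes, the following sketch computes the shift \({\tilde{E}}_{0}\) and the resulting width of each data(H) register for a fully connected graph with integer weights of magnitude at most 10.

```python
import math

def data_register_width(n: int, w_max: int = 10) -> int:
    """Width of each data(H) register: it must hold values up to twice
    the energy-shift bound E0 = w_max * n(n-1)/2 for a fully connected
    graph with integer weights |w| <= w_max."""
    E0 = w_max * n * (n - 1) // 2        # upper bound on |E(x)|
    return math.ceil(math.log2(2 * E0))  # values lie in [0, 2*E0]

# Example: n = 20 vertices gives E0 = 1900, so ceil(log2(3800)) = 12
# qubits per register, consistent with the O(log n) scaling above.
```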

The energy values E(x) and E(y) are computed using two separate energy oracles, whose quantum circuit implementation is provided in Fig. 20b. For a given input \(\left| x\right\rangle =\left| {\xi}_{0}\right\rangle \otimes \cdots \otimes \left| {\xi}_{n-1}\right\rangle\) held by the vertex register, we serially execute the shown circuit template for every vertex pair (i, ℓ) in the graph whose edge weight \({e}_{i\ell }\) is nonzero. Each such circuit subtracts or adds the value \({J}_{i\ell }\) in the data(H) register, depending on whether \({\xi }_{i}={\xi }_{\ell }\) or \({\xi }_{i}\,\ne\, {\xi }_{\ell }\), respectively, effectively contributing the term \({(-1)}^{{\xi}_{i}}{(-1)}^{{\xi}_{\ell}}\left(-{J}_{i\ell}\right)\) to the overall energy. The series over all pairs of vertices accumulates the sum \({\sum}_{i\ell}{(-1)}^{{\xi}_{i}}{(-1)}^{{\xi}_{\ell}}\left(-{J}_{i\ell}\right)\), which together with the initial value \({\tilde{E}}_{0}\) results in the value \(E(x)={\tilde{E}}_{0}-{\sum}_{i\ell}{(-1)}^{{\xi}_{i}}{(-1)}^{{\xi}_{\ell}}{J}_{i\ell}\) held by the data(H) register as output of the energy oracle OE. Similarly, we obtain the value \(E(y)={\tilde{E}}_{0}-{\sum}_{i\ell}{(-1)}^{{\eta}_{i}}{(-1)}^{{\eta}_{\ell}}{J}_{i\ell}\) for the quantum state \(\left| y\right\rangle =\left| {\eta}_{0}\right\rangle \otimes \cdots \otimes \left| {\eta}_{n-1}\right\rangle\) corresponding to the threshold index y. For generic weighted graphs with full connectivity, this serial implementation contributes a factor \({\mathcal{O}}({n}^{2})\) to the overall circuit depth scaling. Moreover, there is an additional contribution from the arithmetic operations needed to implement addition and subtraction of the constant integer \({J}_{i\ell }\) within the data(H) register. Our circuit implementations and resource estimates have been obtained using ProjectQ60. The implementation of addition or subtraction of a constant c, that is, \(\left\vert E\right\rangle \mapsto \left\vert E\pm c\right\rangle\), in ProjectQ54 is based on Draper’s addition in Fourier space61, which allows for optimizations when several additions are executed in sequence, as is the case in our circuits. Due to cancellations of the quantum Fourier transform (QFT) and its inverse, \({{{\rm{QFT}}}}\,{{{{\rm{QFT}}}}}^{-1}={\mathbb{1}}\), between consecutive additions or subtractions within the serial execution of the circuits shown in Fig. 20b, the overall sequence contributes a multiplicative factor scaling only as \({{{\mathcal{O}}}}(\log \log n)\) to the depth, and a multiplicative factor in \({{{\mathcal{O}}}}(\log n\,\log \log n)\) to the gate complexity. To understand these contributions, recall that the registers data(H) are of size \({{{\mathcal{O}}}}(\log n)\). The remaining initial QFT and the final inverse QFT, which transform into and out of the Fourier space in that scheme, contribute only an additional additive term \({{{\mathcal{O}}}}\left({(\log n)}^{2}\right)\) to both the depth and the gate complexity of the overall sequence. Hence, the implementation of the energy oracle OE contributes the factors \({\mathcal{O}}\left({n}^{2}\log \log n+{(\log n)}^{2}\right)\) to the overall circuit depth and \({\mathcal{O}}\left({n}^{2}\log n\log \log n+{(\log n)}^{2}\right)\) to the overall gate complexity.

The energy computation is followed by a unitary operation called “Compare”, which compares the energies E(x) and E(y). Using methods developed by one of the authors of this paper in a previous work62, we can implement this comparison by a circuit with a depth only log-logarithmic in the number of qubits, that is, with a depth in \({\mathcal{O}}(\log \log n)\), while its gate complexity is \({{{\mathcal{O}}}}(\log n)\). An additional single-qubit ancilla is used to store the result of the comparison. Concretely, initialized in state \(\left\vert 0\right\rangle\), the ancilla is output in the state \(\left\vert f(x,y)\right\rangle\), where

$$f(x,y)=\left\{\begin{array}{ll}0,\quad &{{{\rm{if}}}}\,E(x)\ge E(y)\\ 1,\quad &{{{\rm{if}}}}\,E(x)\, < \,E(y)\,.\end{array}\right.$$
(26)

Using a CNOT gate, we copy out this result to the single-qubit flag (bottom wire) and reverse the whole circuit used to compute the result so as to uncompute the entanglement with the garbage generated along the way.

In summary, the QMF oracle is a quantum circuit of depth \({\mathcal{O}}\left({n}^{2}\log \log n+{(\log n)}^{2}\right)\) and gate complexity \({\mathcal{O}}\left({n}^{2}\log n\log \log n\,+\,{(\log n)}^{2}\right)\). The Grover diffusion requires an n-controlled NOT gate to implement the reflection, which is a circuit of depth and gate complexity both scaling as \({\mathcal{O}}(n)\) in terms of elementary gates. Putting all contributions together, a single Grover iteration in our implementation has a circuit of depth in \({\mathcal{O}}\left({n}^{2}\log \log n+{(\log n)}^{2}+n\right)\), while its gate complexity is \({\mathcal{O}}\left({n}^{2}\log n\log \log n+{(\log n)}^{2}+n\right)\). While we have not explicitly shown it, we note that the growth rates of circuit depth and gate counts are lower-bounded by the same scalings, meaning that in the above expressions we may replace the \({\mathcal{O}}(\cdot)\) notation by Θ(⋅).

As an additional final remark, we note that it is possible to achieve a slightly better circuit depth scaling for the Grover iteration, namely as \({\mathcal{O}}\left(n+{(\log n)}^{3}+\log \log n\right)\), by a parallel (instead of serial) execution of the circuit components shown in Fig. 20b pertaining to each vertex pair (i, ℓ) in the graph. However, this parallelization would come at an unreasonably high additional space cost, as it would necessitate the use of n(n − 1) computational registers of size \({\mathcal{O}}\left(\log n\right)\) instead of only two. The number of qubits required would scale as \({\mathcal{O}}\left(n+{n}^{2}\log n\right)\). In contrast, our serial implementation above requires only \({\mathcal{O}}\left(n+\log n\right)\) qubits.