1 Introduction

Theoretically, quantum computers are superior to classical computers when quantum mechanical laws are carefully exploited in deterministic algorithms [1, 2]. However, in real-world deployment, empirical performance can take precedence over theoretical superiority. In fact, for most classical algorithms that perform well empirically on practical tasks, there is no proof bounding or even estimating their performance. For this reason, empirical experimentation with quantum heuristics is critical to understanding the breadth of quantum computing’s impact.

Quantum annealing is one of the most prominent quantum metaheuristics for optimization [3,4,5,6,7,8,9]. Quantum annealing has become accessible to experimentation in recent years, and the company D-Wave has produced and commercialized multiple devices in the last decade. Starting with the 128-qubit D-Wave One, each generation of annealer has aimed to improve performance on optimization problems. The D-Wave Two, 2X, 2000Q, and Advantage systems (the four we use in this study) have 512, 1152, 2048, and 5640 qubits, respectively. The D-Wave 2000Q introduced several features, such as a shorter minimum anneal time, increased control of the annealing schedule, and a wider range of ferromagnetic coupling strengths. Advantage, the newest device, uses a denser native graph structure with 15 instead of 6 couplers per qubit. In addition to the increased number of qubits and new features, the newer generations of annealers also have reduced noise, better calibration, and fewer analog control errors [10, 11].

This report evaluates how the various updates made to these machines over the last few years impact their effectiveness in solving hard problems of practical interest. While the comparative performance of Advantage and 2000Q was discussed in [12] for exact cover problems, to our knowledge, this is the first investigation of its kind that compares the performance of four different generations of quantum annealers with respect to an identical set of applied problems. Our results can be summarized as follows:

  • Under the default operating conditions and fixed anneal times, 2X outperformed Two, and Advantage outperformed all of its predecessors. While 2000Q was able to solve larger problems than 2X, there was no statistically significant difference in the average TTS between the two machines. The improvement from Two to 2X is likely due to reduced specification errors and other hardware-level improvements. For Advantage, a large part of the performance increase can be attributed to increased connectivity.

  • The shortest possible anneal time gives the shortest TTS for most problem sizes, with the effect being most pronounced for small-sized problems. However, a longer anneal time is necessary to obtain solutions for the largest problem sizes that are addressable only via the Advantage platform. Crucially, the scaling of the TTS with respect to size increases as we decrease the anneal time. Overall, in future architectures with lower anneal times, problems of different sizes might benefit from size-appropriate anneal times.

  • In general, optimizing the ferromagnetic coupling beyond the default settings lowers the TTS at each problem size. However, the magnitude of this improvement is device-dependent. On 2X and 2000Q, the change in TTS due to such optimization is nearly negligible. However, on Advantage, there is a substantial reduction in the scaling exponent after ferromagnetic coupling optimization. While our brute-force optimization is informative, we suspect that as the devices and the problems increase in size, a more sophisticated method of optimizing the coupling will have to be employed.

The structure of the paper is as follows. Section 2 briefly reviews quantum annealing. Section 3 describes the scheduling problem and the generation of problem instances. Section 4 discusses the methodological details of our experiments. In Sect. 5 we discuss our results, and we summarize their implications in Sect. 6.

2 Review of quantum annealing

In quantum annealing, the device starts by implementing an initial Hamiltonian \(H_I\), and the overall Hamiltonian H is parametrically changed until it becomes the final Hamiltonian \(H_F\). This process corresponds to the following Hamiltonian evolution

Table 1 Hardware details for all four generations of D-Wave annealers; the first three are hosted at NASA Ames Research Center, and the last one is accessed through the D-Wave Leap cloud platform
$$\begin{aligned} H(s) = A(s) H_I + B(s) H_F \ , \ s \in [0,1], \end{aligned}$$
(1)

where A(s) is a monotonically decreasing function and B(s) is a monotonically increasing function such that \(A(1)=B(0)=0\). \(H_F\) encodes the cost-function landscape of a combinatorial optimization problem in its eigenvalues, and its ground state maps to the optimal solution to the problem. Quantum annealing utilizes both thermal and quantum fluctuations when exploring the cost-function landscape, so, in principle, it is more expressive and possibly more effective than simulated annealing or parallel tempering methods, where quantum effects like tunneling are not possible [13]. However, due to practical aspects such as finite anneal time, temperature, noise, and sub-optimal parameter setting, the final output of an annealing run might not necessarily be a low-energy state of the problem Hamiltonian. Consequently, quantum annealers are usually treated as black-box optimizers that need to be run iteratively in conjunction with a parameter-setting strategy. In benchmarking studies, once a problem instance is programmed into the annealer and a set of parameter values is chosen, a run consisting of a large number of anneals is typically performed to obtain statistics (see Sect. 4 and in particular Table 1 for details). Two complementary metrics are commonly used to report results: the first is the probability of the ground state solution

$$\begin{aligned} P_{\text {gs}} = \frac{\text {Number of ground state solutions}}{\text {Total anneals}}, \end{aligned}$$
(2)

where “Total anneals” refers to the number of anneals in one run as described above. Its counterpart, the time-to-solution (TTS), is defined as the expected time to obtain the ground state solution with 0.99 success probability for a specific anneal time, and is computed as

$$\begin{aligned} \text {TTS} = \frac{\ln (1-0.99)}{\ln (1-P_{\text {gs}})} t, \end{aligned}$$
(3)

where t is the anneal time for a single annealing repetition. The results presented here focus on TTS since it provides a more direct representation of the resources (i.e., time) required to solve optimization problems, which is the quantity of practical interest. Given that comparisons are presented across devices and performed with the same annealing times when possible (limited by the annealing time range of older devices), the conclusions also translate to a \(P_{\text {gs}}\) comparison. In practice, the total time needed to solve a problem is also affected by processes outside of the actual annealing, such as programming time (which takes the longest but is needed only once before each run), readout time, etc. (see [14] for QPU timing details). These are not included in the total time used for calculating TTS, in accordance with standard practice in the benchmarking literature. It is also worth noting that these additional time requirements are common to all the devices considered in this study, making them far less relevant than they would be in a comparison between quantum and classical devices.
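As a concrete illustration of Eq. (3), the following Python sketch (with hypothetical function and variable names, not code from this study) converts a measured success probability and a single-anneal duration into a TTS, returning infinity when no ground-state hit is observed:

```python
import math

def time_to_solution(p_gs: float, anneal_time_us: float, target: float = 0.99) -> float:
    """Expected total anneal time (in microseconds) to see the ground state at
    least once with probability `target`, given the per-anneal success
    probability `p_gs` and the single-anneal duration, following Eq. (3)."""
    if p_gs <= 0.0:
        return math.inf           # no ground-state hits observed: TTS diverges
    if p_gs >= 1.0:
        return anneal_time_us     # every anneal succeeds: one repetition suffices
    repetitions = math.log(1.0 - target) / math.log(1.0 - p_gs)
    return repetitions * anneal_time_us

# Example: 10 ground-state hits out of 10,000 anneals at t = 20 microseconds
print(time_to_solution(10 / 10_000, 20.0))   # roughly 9.2e4 microseconds
```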

The D-Wave quantum annealers have 2-local architectures, meaning that there are pairwise couplings between the qubits, i.e., the Hamiltonian \(H_F\) is an Ising Hamiltonian. Consequently, the optimization problem is formulated as a quadratic unconstrained binary optimization (QUBO) problem [15,16,17], where the cost function is a quadratic function of binary variables. We will refer to obtaining the QUBO for a problem as mapping the problem to its QUBO. As each qubit in the quantum annealer is connected to only a small subset of the other qubits, representing a single binary variable usually requires multiple physical qubits to reproduce the QUBO’s connectivity faithfully. This set of physical qubits representing a single logical variable is called a vertex model, and the process of finding vertex models for each logical variable in the problem is called embedding the QUBO in the hardware chip. The physical qubits within a vertex model are subjected to a ferromagnetic coupling \(J_F\) in the final Hamiltonian. Finding the value of \(J_F\) that minimizes the TTS is called the parameter-setting problem [18]. For a more detailed review of quantum annealing on D-Wave machines, we refer the reader to Refs. [19,20,21].

3 Parameterized families of scheduling problems

Scheduling problems deal with the allocation of time and resources under certain constraints. While these NP-complete problems are important on their own, they also have many applications to planning problems in artificial intelligence [22]. Here, we consider the task of assigning k time-slots to n tasks while avoiding any possible time conflicts (double-scheduling). These problems are equivalent to vertex-coloring problems, which are also NP-complete. In particular, when we map the time-slots to colors and the tasks to vertices, the edges between the vertices correspond to constraints between the tasks (see Fig. 1). Thus, the colorability of the graph corresponds to the solvability of the scheduling problem, and a valid k-coloring gives the desired conflict-free schedule.

The coloring task can be represented using doubly-indexed binary variables \(x_{ic}\), where \(x_{ic}=1(0)\) means that the ith vertex is colored (not colored) with color c. Each vertex must be colored exactly once, and adjacent vertices must have different colors. Encoding these two conditions as penalties, we obtain the QUBO:

$$\begin{aligned} H(\textbf{x})=\sum \limits _{i=1}^{n} \left( 1- \sum \limits _{c=1}^{k} x_{ic} \right) ^2 + \sum \limits _{(i,j)\in E} \sum \limits _{c=1}^{k} x_{ic} x_{jc} \end{aligned}$$
(4)

For a valid coloring, this objective function will give \(H(\textbf{x})=0\).
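To make Eq. (4) concrete, the following Python sketch (with illustrative, hypothetical helper names) expands the squared one-color-per-vertex penalty into QUBO coefficients and evaluates the resulting cost function; a valid coloring evaluates to zero:

```python
from itertools import combinations

def coloring_qubo(n, edges, k=3):
    """Build Eq. (4) as a QUBO dict {(var, var): coefficient} plus a constant
    offset, where variable (i, c) = 1 means vertex i gets color c.
    Expanding (1 - sum_c x_ic)^2 with x^2 = x gives a constant of 1 per vertex,
    a linear coefficient of -1 per x_ic, and +2 for each same-vertex color pair;
    the conflict term adds +1 for each edge (i, j) sharing a color c."""
    Q = {}
    for i in range(n):
        for c in range(k):
            Q[((i, c), (i, c))] = -1.0
        for c1, c2 in combinations(range(k), 2):
            Q[((i, c1), (i, c2))] = 2.0
    for (i, j) in edges:
        for c in range(k):
            Q[((i, c), (j, c))] = Q.get(((i, c), (j, c)), 0.0) + 1.0
    return Q, float(n)   # offset n is the constant from the squared penalty

def coloring_energy(Q, offset, x):
    """Evaluate H(x) of Eq. (4); returns 0 for a valid coloring."""
    return offset + sum(w * x.get(u, 0) * x.get(v, 0) for (u, v), w in Q.items())

# Triangle graph with 3 colors: a proper coloring has energy 0
Q, off = coloring_qubo(3, [(0, 1), (1, 2), (0, 2)])
print(coloring_energy(Q, off, {(0, 0): 1, (1, 1): 1, (2, 2): 1}))   # 0.0
```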

Fig. 1 Example of a schedule for \({n}=8\) with a valid 3-coloring. Each scheduling problem can be mapped to a graph-coloring problem by identifying tasks with nodes and task conflicts with edges

We consider an ensemble of problems that are neither easily colorable nor trivially uncolorable. To be more specific, we consider the 3-colorability of a specific set of Erdős–Rényi graphs \(G_{n,p}\). \(G_{n,p}\) are graphs with n vertices such that the probability of having an edge between each pair of vertices is p. A combinatorial phase transition for k-colorability problems is known in terms of the parameter \(d = \frac{e}{n}\), where e is the number of edges [23]; around this transition, the difficulty of finding a coloring goes from easy to hard to easy. This easy-hard-easy pattern of colorability is well studied, and there are upper and lower bounds on the transition parameter d; however, finding the exact location of the phase transition is still an open problem [24]. We use \(d = 4.5\) for the generation of our graphs, following the study in [25]. We generate these graphs using a C++ program, which is an extension of the graph-generation method used by Culberson et al. [26]. The graphs used in this paper are identical to the set used by Rieffel et al. [27] in their case study on the performance of quantum annealers in solving planning-type problems.
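The problem instances themselves come from the C++ generator cited above; purely for illustration, the Python sketch below (hypothetical names) samples a random graph with the same edge density \(d = e/n = 4.5\) used in this study:

```python
import random

def random_graph_fixed_density(n, d=4.5, seed=None):
    """Sample a random graph on n vertices with e = round(d * n) edges, i.e.
    edge density d = e / n as defined in the text. This is only an illustrative
    stand-in for the Culberson-style C++ generator used in the paper."""
    rng = random.Random(seed)
    num_edges = round(d * n)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return rng.sample(all_pairs, num_edges)

edges = random_graph_fixed_density(20, seed=0)   # 90 edges on 20 vertices
```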

4 Methods

Quantum annealing runs were performed on the three generations of D-Wave quantum annealers (Two, 2X, and 2000Q) housed at NASA Ames Research Center and on the latest generation (Advantage), accessed through the D-Wave Leap cloud platform. The number of qubits, minimum anneal times, and base operating temperatures for all four annealers are listed in Table 1. We used 100 problem instances at each problem size, restricting the instances to those with at least one valid schedule (i.e., coloring). We found the embeddings for the QUBO instances using D-Wave’s native heuristic find_embedding [28].
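A minimal sketch of this embedding step is shown below; it assumes the Ocean packages minorminer (which provides D-Wave's find_embedding heuristic) and dwave_networkx, and uses a small Chimera graph as a stand-in for the actual hardware graph:

```python
import dwave_networkx as dnx
import minorminer

# Logical problem graph: couplings between QUBO variables (toy labels here)
logical_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

# Hardware graph: a 2x2-cell Chimera layout (32 qubits) standing in for the chip
target = dnx.chimera_graph(2)

# Each logical variable is mapped to a chain of physical qubits (a vertex model)
embedding = minorminer.find_embedding(logical_edges, target.edges())
for variable, chain in embedding.items():
    print(variable, "->", sorted(chain))
```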

To mitigate biases toward certain solutions caused by asymmetries in the processors, we used random gauges: for each qubit, the bit values 0 and 1 are randomly mapped to the spins either as (-1, +1) or as (+1, -1). Each gauge corresponds to a specific choice of this bit-to-spin mapping, and due to spin-reversal symmetry, the problem Hamiltonian is unaffected by the gauge choice. Table 1 outlines the number of gauges used for different problem sizes. We check the number of valid schedules (i.e., colorings) obtained using Eq. (4) and report the corresponding median TTS. We bootstrap the median TTS over the 100 problem instances to obtain 5000 resampled medians; the reported TTS is the mean of this ensemble, and the error bars correspond to 95% confidence intervals. Scheduling problems are NP-complete, so we expect the TTS to scale exponentially with size in the asymptotic limit. This exponential scaling is expected for both quantum and classical algorithms, with the scaling varying between algorithms [25]. We therefore fit \(\text {TTS}= T_0 e^{n \alpha }\) [29], where n is the size of the problem, and report the scaling exponent \(\alpha \). We plot the median, 35th, and 65th percentiles of the TTS across instances; the error bars for the bootstrapped results are not plotted but are used when computing the fits.
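The statistical treatment described above can be sketched as follows (Python, with hypothetical function names): a bootstrap of the median TTS over instances and a least-squares fit of \(\log \text{TTS}\) against n to extract the scaling exponent \(\alpha\):

```python
import numpy as np

def bootstrap_median(tts_values, n_boot=5000, ci=0.95, seed=0):
    """Resample the per-instance TTS values with replacement, take the median
    of each resample, and return (mean of medians, lower CI, upper CI)."""
    rng = np.random.default_rng(seed)
    resamples = rng.choice(tts_values, size=(n_boot, len(tts_values)), replace=True)
    medians = np.median(resamples, axis=1)
    lo, hi = np.quantile(medians, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return medians.mean(), lo, hi

def fit_scaling(sizes, median_tts):
    """Fit TTS = T0 * exp(alpha * n) by least squares on log(TTS);
    returns (T0, alpha)."""
    alpha, log_t0 = np.polyfit(sizes, np.log(median_tts), 1)
    return np.exp(log_t0), alpha

# Toy data growing roughly as exp(0.35 * n)
sizes = np.array([12, 16, 20, 24])
print(fit_scaling(sizes, 50 * np.exp(0.35 * sizes)))   # approximately (50.0, 0.35)
```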

Each generation of D-Wave annealers has a larger chip layout than its predecessor, increasing the size of problems that can be embedded on each device. Moreover, the Advantage platform transitioned the topology from a Chimera graph to a Pegasus graph [10, 30] (see Fig. 2), which has higher connectivity. We were able to embed problems with up to 40 tasks on D-Wave Advantage and D-Wave 2000Q, and up to 32 tasks on D-Wave 2X. However, problems of sizes larger than 16, 20, and 24 on Two, 2X, and 2000Q, respectively, had \(P_{\textrm{gs}} = 0\). For each machine, the last plotted point in the curves of Figs. 3, 4, 5, 6 indicates the largest problem size for which the statistical analysis returned a finite median. Working with larger problem sizes would require millions of runs per instance to obtain sufficient statistics.
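The connectivity difference between the two topologies can be inspected directly with the dwave_networkx graph generators (an assumption about the Ocean SDK; the exact qubit counts of the physical chips differ slightly from the ideal graphs because of inactive qubits):

```python
import dwave_networkx as dnx

chimera_c16 = dnx.chimera_graph(16)   # ideal C16 layout (2000Q-class topology)
pegasus_p16 = dnx.pegasus_graph(16)   # ideal P16 layout (Advantage-class topology)

for name, g in [("Chimera C16", chimera_c16), ("Pegasus P16", pegasus_p16)]:
    max_degree = max(d for _, d in g.degree())
    print(f"{name}: {g.number_of_nodes()} qubits, max degree {max_degree}")
# Expect a maximum degree of 6 for Chimera and 15 for Pegasus
```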

In line with the typical practice for applied problems [7, 18], we optimized the ferromagnetic couplings. For this optimization, we set the anneal time to 5, 1, and 20 \(\mu \)s for 2X, 2000Q, and Advantage, respectively; that is, for each machine, we chose the anneal time that gave the shortest TTS for the largest problem size considered before \(J_F\) optimization. When optimizing the ferromagnetic coupling \(J_F\), we chose \(J_F \in [-2.0, -0.5]\) for 2X and \(J_F \in [-1.375, -0.625]\) for 2000Q, in steps of 0.125. For both machines, relatively few problem sizes had their optimal coupling at \(|J_{F}|>1.0\), and \(J_{F}=-2.0\) and \(J_{F}=-1.375\) were not the optimal coupling for any problem size on 2X and 2000Q, respectively. In our plots, ‘i.opt’ refers to \(J_F\) optimized for each problem instance separately. A similar optimization of \(J_F\) was done for D-Wave Two by Rieffel et al. in [27], so we do not repeat those experiments here. For the 2000Q data, we restrict \(J_F\) optimization to \(n<20\) due to limited computational resources. Moreover, for 2X and 2000Q, we performed coupling optimization at various anneal times and found that the optimal coupling did not depend on the anneal time. Optimization of \(J_F\) on Advantage has recently been studied in detail by some of the same authors [31]. Here we explored the range \(J_F \in [-1, -0.4]\) for \(n=24\) and \(t=20\,\mu \)s and found \(J_F = -0.5\) to be optimal (labeled ‘opt’). Unless specified otherwise, the default problem sizes, gauges, numbers of anneals, and anneal times used are detailed in Table 1. We discuss the results in the next section.
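For completeness, a hedged sketch of such a brute-force \(J_F\) scan is given below; it assumes the Ocean packages dimod and dwave.system, assumes a coloring QUBO (Q, offset) built as in the Eq. (4) sketch and a minor-embedding of its variables found with find_embedding, and uses an illustrative grid and read count rather than the exact settings of our runs. Because the anneal time is fixed in such a scan, maximizing \(P_{\text{gs}}\) is equivalent to minimizing the TTS:

```python
import numpy as np
import dimod
from dwave.system import DWaveSampler, FixedEmbeddingComposite

# Q, offset: a coloring QUBO as in the Eq. (4) sketch;
# embedding: a minor-embedding of its variables found with find_embedding
bqm = dimod.BinaryQuadraticModel.from_qubo(Q, offset)
sampler = FixedEmbeddingComposite(DWaveSampler(), embedding)

num_reads = 1000
best_jf, best_pgs = None, -1.0
for jf in np.arange(0.5, 2.0 + 1e-9, 0.125):          # grid of |J_F| values
    result = sampler.sample(bqm, num_reads=num_reads,
                            annealing_time=5,          # microseconds
                            chain_strength=jf)
    # Valid colorings have energy 0 under Eq. (4)
    hits = sum(int(occ) for energy, occ in zip(result.record.energy,
                                               result.record.num_occurrences)
               if np.isclose(energy, 0.0))
    p_gs = hits / num_reads
    if p_gs > best_pgs:
        best_jf, best_pgs = jf, p_gs
print("best |J_F|:", best_jf, "with P_gs =", best_pgs)
```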

Fig. 2 Left: structure of the Chimera graph used by D-Wave Two, 2X, and 2000Q. 8-qubit, bipartite cells are arranged in a square pattern. Shown is a C3 graph, with three cells per side, for a total of 9 cells and 72 qubits. The pattern repeats to attain larger numbers of qubits: C8 for Two, C12 for 2X, and C16 for 2000Q. Right: Pegasus graph used by D-Wave Advantage. Shown is a P3 graph, with 12 full cells and several partial ones. As with Chimera, the pattern repeats to create the larger P16 graph featured by Advantage. Notice that Chimera is a subgraph of Pegasus; any problem native to Chimera is also native to Pegasus

Fig. 3 TTS as a function of problem size across generations at various anneal times. The ferromagnetic coupling is set to the default value. The scaling exponent is labeled as \(\alpha _{\text {machine, anneal time, }J_F \text { optimization}}\)

Fig. 4 TTS as a function of problem size across generations. \(J_F\) is optimized at the best performing anneal time. Optimizing \(J_F\) reduces both the TTS at each size and the exponent \(\alpha \). The scaling exponent is labeled as \(\alpha _{\text {machine, anneal time, }J_F \text { optimization}}\)

Fig. 5 Left: TTS as a function of problem size across generations, with the anneal time and \(J_F\) set to their default values. Right: the anneal time and \(J_F\) are optimized for each machine. The scaling exponent is labeled as \(\alpha _{\text {machine, anneal time, } J_F \text { optimization}}\). Note that we are optimizing the TTS across anneal times and \(J_F\); optimizing over anneal times is equivalent to setting the shortest allowed anneal time on all annealers other than Advantage, which has no single optimal anneal time

5 Results

D-Wave’s quantum annealers allow the user to vary the anneal time while keeping the weight functions in Eq. (1) fixed. Using the shortest possible anneal time increases the number of solutions obtained in a fixed period, but it also affects the quality of the solutions [27]. We evaluate this tradeoff by considering TTS as a function of the anneal time and problem size. Figure 3 shows TTS as a function of problem size at various anneal times for D-Wave 2X, 2000Q, and Advantage. Generally, when varying t at a fixed problem size n, the lowest possible anneal time gives the lowest TTS for most graph sizes. However, the improvement is less significant for larger problem sizes, and the error bars on the TTS also increase with n; in other words, the improvement offered by a shorter anneal time becomes negligible for larger problem sizes. The scaling exponent \(\alpha \) increases slightly for shorter anneal times. This effect is most pronounced on Advantage, where the exponent goes from \(\alpha _{\text {A,20 }\mu \text {s,def}} = 0.351 \pm 0.014\) at 20 \(\mu \)s to \(\alpha _{\text {A,1 }\mu \text {s,def}} = 0.508 \pm 0.022\) at 1 \(\mu \)s. However, note that if we limit ourselves to \(n \le 22\) (as for 2000Q), Advantage’s optimal annealing time would also be 1 \(\mu \)s, and it would outperform 2000Q for the smaller sizes. Overall, our results suggest that lowering the anneal time further in future architectures could benefit small-sized scheduling problems. Experiments using future annealers featuring lower anneal times and capable of solving larger problem sizes might benefit from assigning different anneal times to problems of different sizes.

Generally, \(J_F\), the ferromagnetic coupling applied to each vertex model, is set to be larger than the coupling between the logical qubits \(J_{ij}\) and the local field \(h_{i}\). However, if \(J_F\) is arbitrarily large, the coupled qubits may have the same spin, but the solution might not correspond to the ground state of the QUBO [32, 33]. By default, on D-Wave annealers, \(J_F = -1.0\) (where \(J_{ij}, h_i \in [-1.0,+1.0]\), with this range expanded for the latest generations). We considered optimizing \(J_F\) across problem sizes and problem instances (except for Advantage, where we only optimized for \(n=24\)).

In general, we found that setting \(J_F = -1.0\), as D-Wave’s native algorithm does, is not optimal. Rieffel et al. [27] argued for the non-optimality of this default setting and further observed that, for D-Wave Two, the ratio of the optimal ferromagnetic coupling \(|J_F|\) to the maximum internal coupling \(|J_{ij}|\) decreases with problem size for scheduling-type problems. We do not observe this trend on the newer machines. For 2X and 2000Q, \(|J_{F}|=0.875\) and 0.75, respectively, generally give the shortest median TTS (see Appendix B for further details). For Advantage, we found \(-0.5\) to be the optimal value after exploring the range \(J_F \in [-1, -0.4]\) for \(n=24\) and \(t=20\,\mu \)s.

As shown in Fig. 4, the optimization of \(J_F\) leads to a slight, statistically insignificant reduction in TTS for 2X and 2000Q. Optimizing \(J_F\) further, separately for each vertex model, using standard optimization methods such as Nelder-Mead and gradient descent proved unwieldy: due to the noisiness of the solution-probability landscape, these search methods could not improve upon instance-level optimization. We suspect that as quantum annealers become less noisy, similar or more sophisticated optimization methods might succeed.

The effect of \(J_F\) optimization is most pronounced for Advantage. Here, for all sizes and annealing times, \(J_F=-0.5\) performs better than the default setting. For example, the median TTS for the \(n=40\) instances is infinite for all annealing times (1, 5, and 20 \(\mu \)s) with \(|J_F| = 1\), while it is finite for \(|J_F| = 0.5\), and as low as \(2.8 \times 10^{4}\) \(\mu \)s when the annealing time is 20 \(\mu \)s. Moreover, the scaling exponent decreases by a factor of two after \(J_F\) optimization. We suspect that this change has to do with the improved connectivity of the Pegasus architecture: as the vertex models are much smaller on Advantage compared to the older annealers, a strong \(|J_F|\) is no longer necessary to keep these vertex models from breaking. Overall, our observations make a strong case for the non-optimality of the default \(J_F\) setting.

Lastly, we compare the performance of each subsequent generation of annealers by solving identical problems at default and optimized settings (see Fig. 5). In order to avoid artificial flatness in the scaling due to inefficiency at smaller problem sizes, when we consider the optimized setting we report the lowest TTS across anneal times and \(J_F\) optimization. The results in both configurations confirm the intuition that performance improved as the machines were upgraded. Note that the effect of hardware improvements between 2X and 2000Q was minimal. While optimizing the anneal time and \(J_F\) was shown to reduce the TTS only slightly, these minor improvements add up: using the optimized \(J_F\) at the lowest possible anneal time of 1 \(\mu \)s, we do observe shorter TTS on 2000Q than on 2X. Besides the obvious opportunity to test larger problem sizes, we observe a change in the slope of the median TTS, which is very appreciable between generations and especially striking on Advantage. This performance improvement is primarily, but not entirely, due to the smaller vertex-model sizes required for embedding a densely connected Ising problem in the Advantage chip, owing to the increased connectivity of the Pegasus graph (see Appendix A for more details).

6 Conclusion

We analyzed and quantified the comparative performance of four generations of quantum annealers, D-Wave Two, 2X, 2000Q, and Advantage, in solving a parameterized family of hard scheduling problems. By solving an identical ensemble of problems on each machine, we highlight how hardware updates, lower anneal times, and the size and connectivity of the architectures affect the time-to-solution for these problems. Under the default settings, we found a noticeable improvement in performance from Two to 2X, but not from 2X to 2000Q. Advantage outperforms all of its predecessors.

Using the shortest possible anneal time gave the lowest TTS at most problem sizes, but this improvement became less pronounced as the problem size increased. For the largest problem sizes, using a longer anneal time yielded the best results. Optimizing the ferromagnetic coupling for each problem instance also reduced the TTS, but the improvement was not pronounced for 2X and 2000Q. On Advantage, we found a substantial reduction in TTS when using an optimized ferromagnetic coupling, both in absolute terms and in the scaling exponent. Overall, the exponent improved by nearly a factor of four, from \(1.006 \pm 0.01\) on Two to \(0.259 \pm 0.008\) on Advantage.

Hardware upgrades and the optimization of operational parameters like anneal times and ferromagnetic couplings are crucial for improving quantum annealing performance. Using sophisticated annealing schedules, as demonstrated in [6, 7], is another way to improve hardware performance that we did not consider in this report; quantifying performance on scheduling problems using these advanced annealing schedules would be a natural extension of this work. As the problems used here can be generated systematically for larger problem sizes [25], they serve as a valuable and fair method of benchmarking future quantum annealers, both against other annealers and against state-of-the-art classical heuristics.