1 Introduction

In the 1990s, an optimization algorithm called quantum annealing (QA) was proposed with the aim of providing a fast heuristic for solving combinatorial optimization problems (Kadowaki and Nishimori 1998; Ray et al. 1989; Finnila et al. 1994; Farhi et al. 2001; Santoro et al. 2002). At a high level, QA is an analog quantum algorithm that leverages the non-classical properties of quantum systems and continuous time evolution to minimize a discrete function. Annealing is the process that steers the dynamics of the quantum system into an a priori unknown minimizing variable assignment of that function. Under suitable conditions, theoretical results have shown that QA can arrive at a global optimum of the desired function (Born and Fock 1928; Kato 1950; Jansen et al. 2007). These results have motivated the study of using this algorithm for combinatorial optimization over the past thirty years.

Due to the computational difficulty of simulating quantum systems (Feynman 1982), the study of QA remained a theoretical pursuit until 2011, when D-Wave Systems produced a quantum hardware implementation of the QA algorithm (Berkley et al. 2010; Johnson et al. 2010; Harris et al. 2010; Johnson et al. 2011). This represented the first time that QA could be studied on optimization problems with more than a few dozen decision variables and spurred significant interest in developing a better understanding of the QA computing model (Job and Lidar 2018; Hauke et al. 2020; Crosson and Lidar 2021).

The release of D-Wave Systems’ QA hardware platform also generated expectations that this new technology would quickly outperform state-of-the-art classical methods for solving well-suited combinatorial optimization problems (Farhi et al. 2001, 2002; Santoro et al. 2002). The initial interest from the operations research community was significant. However, through careful comparison with both complete search solvers (McGeoch and Wang 2013; Puget 2013; Dash 2013) and specialized heuristics (Selby 2014; Boixo et al. 2014; Selby 2013; Mandrà et al. 2016; Mandrà and Katzgraber 2018; Rønnow et al. 2014; Hen et al. 2015; Albash and Lidar 2018), it was determined that the available QA hardware was a far cry from state-of-the-art optimization methods. These results tempered the excitement around the QA computing model and reduced interest from the operations research community. Since then, QA hardware has steadily improved: it now features better noise characteristics (Vuffray et al. 2022; Zaborniak and de Sousa 2021; King et al. 2022) and can solve optimization problems more than fifty times larger than what was possible in 2013 (McGeoch and Farre 2020).

Since 2017, we have been using the benchmarking practices of operations research to track the performance of QA hardware platforms and compare the results with established optimization algorithms (Coffrin et al. 2019; Pang et al. 2021). In previous studies of this type, this benchmarking approach ruled out any potential performance benefit for using available QA hardware platforms in hybrid optimization algorithms and practical applications, as established algorithms outperformed or were competitive with the QA hardware in both solution quality and computation time. However, in this work, we report that with the release of D-Wave Systems’ Advantage Performance Update computer in 2021, our benchmarking approach can no longer rule out a potential run time performance benefit for this hardware. In particular, we show that there exist classes of combinatorial optimization problems where this QA hardware finds high-quality solutions around fifty times faster than a wide range of heuristic algorithms under best-case QA communication overheads and around fifteen times faster under real-world QA communication overheads. This work thus provides compelling evidence that quantum computing technology has the potential for accelerating certain combinatorial optimization tasks. This represents an important and necessary first condition for demonstrating that QA hardware can have an impact on solving practical optimization problems.

Although this work demonstrates encouraging results for the QA computing model, we also emphasize that it does not provide evidence of a fundamental or irrefutable performance benefit for this technology. Indeed, it is quite possible that dramatically different heuristic algorithms (Dunning et al. 2018; Mohseni et al. 2021) or alternative hardware technologies (McMahon et al. 2016; Goto et al. 2019; Matsubara et al. 2020; Kowalsky et al. 2022) can reduce the run time performance benefits observed in this work. We look forward to and encourage ongoing research into benchmarking the QA computing model, as closing the performance gap presented in this work would provide significant algorithmic insights into heuristic optimization methods, benefiting a variety of practical optimization tasks.

This work begins with a brief introduction to the types of combinatorial optimization problems that can be solved with QA hardware and the established benchmarking methodology in Sect. 2. It then presents a summary of the key outcomes from a large-scale benchmarking study in Sect. 3, which required hundreds of hours of compute time. In Sect. 4, the paper concludes with some discussion of the limitations of our results and future opportunities for QA hardware in combinatorial optimization. Additional details regarding the experimental design, as well as further analyses of computational results, are provided in the appendices.

2 Quantum annealing for combinatorial optimization

Available QA hardware is designed to perform optimization of a class of problems known as Ising models, which have historically been used as fundamental modeling tools in statistical mechanics (Gallavotti 2013). Ising models are characterized by the following quadratic energy (or objective) function of \(\mathcal {N} = \{1, 2, \dots , n\}\) discrete spin variables, \(\sigma _{i} \in \{-1, 1\}, \; \forall i \in \mathcal {N}\):

$$\begin{aligned} E(\sigma ) = \sum _{{(i, j) \in \mathcal {E}}} {J}_{ij} \sigma _{i} \sigma _{j} + \sum _{{i \in \mathcal {N}}} {h}_{i} \sigma _{i} , \end{aligned}$$
(1)

where the parameters, \({J}_{ij}\) and \({h}_{i}\), define the quadratic and linear coefficients of this function, respectively. The edge set, \(\mathcal {E} \subseteq \mathcal {N} \times \mathcal {N}\), is used to encode a specific sparsity pattern in the Ising model, which is determined by the physical system being considered. The optimization task of interest is to find the lowest energy configuration(s) of the Ising model, i.e.,

$$\begin{aligned} \begin{aligned}&\underset{\sigma }{\text {minimize}}{} & {} E(\sigma ) \\&\text {subject to}{} & {} \sigma _{i} \in \{-1, 1\}, \, \forall i \in \mathcal {N}. \end{aligned} \end{aligned}$$
(2)

At first glance, the lack of constraints and limited types of variables make this optimization task appear distant from real-world applications. However, the optimization literature on quadratic unconstrained binary optimization (QUBO), which is equivalent to minimization of an Ising model’s energy function, indicates how this model can encode a wide range of practical optimization problems (Kochenberger et al. 2014; Lucas 2014).
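To make the optimization task concrete, the following sketch evaluates the energy function of Eq. (1) and solves Problem (2) by exhaustive enumeration. The helper names and the 3-spin instance coefficients are illustrative, and enumeration is, of course, tractable only for very small \(n\):

```python
import itertools

def ising_energy(h, J, sigma):
    """Evaluate the Ising objective E(sigma) from Eq. (1)."""
    energy = sum(Jij * sigma[i] * sigma[j] for (i, j), Jij in J.items())
    energy += sum(hi * sigma[i] for i, hi in h.items())
    return energy

def brute_force_minimum(h, J, n):
    """Enumerate all 2^n spin configurations; only feasible for small n."""
    best_sigma, best_energy = None, float("inf")
    for bits in itertools.product([-1, 1], repeat=n):
        sigma = dict(enumerate(bits))
        e = ising_energy(h, J, sigma)
        if e < best_energy:
            best_sigma, best_energy = sigma, e
    return best_sigma, best_energy

# A 3-spin toy chain: ferromagnetic couplings favor aligned spins,
# and the field h_0 = -1 favors sigma_0 = +1.
h = {0: -1.0, 1: 0.0, 2: 0.0}
J = {(0, 1): -1.0, (1, 2): -1.0}
sigma, energy = brute_force_minimum(h, J, 3)
print(sigma, energy)  # {0: 1, 1: 1, 2: 1} -3.0
```

The exponential cost of this enumeration is precisely what motivates both the heuristics benchmarked later in this work and the quantum lifting described next.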

2.1 Foundations of quantum annealing

The central idea of QA is to leverage the properties of quantum systems to minimize discrete-valued functions, e.g., finding optimal solutions to Problem (2). The mathematics of QA is comprised of two key elements: (i) leveraging quantum states to lift the minimization problem into an exponentially larger space and (ii) slowly interpolating (i.e., annealing) between an initial easy problem and the target problem to find high-quality solutions to the target problem. The quantum lifting begins by introducing, for each spin, \(\sigma _i \in \{-1,1\}\), a \(2^{|\mathcal N |} \times 2^{|\mathcal N |}\) dimensional matrix, \(\widehat{\sigma }_i\), expressible as a Kronecker product of \({|\mathcal N |}\) \(2 \times 2\) matrices,

$$\begin{aligned} \widehat{\sigma }_i = \underbrace{\begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix} \mathop {\otimes } \cdots \mathop {\otimes } \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix}}_{1\,\textrm{to}\, i-1} \mathop {\otimes } \underbrace{\begin{pmatrix} 1 &{} 0 \\ 0 &{} -1 \end{pmatrix}}_{i} \mathop {\otimes } \underbrace{\begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix} \mathop {\otimes } \cdots \mathop {\otimes } \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix}}_{i+1\,\textrm{to}\,{|\mathcal N |}} . \end{aligned}$$
(3)

In this lifted representation, the value of a spin, \(\sigma _i\), is identified with the two possible eigenvalues, 1 and \(-1\), of the matrix \(\widehat{\sigma }_i\). The quantum counterpart of the energy function defined in Eq. (1) is the \(2^{|\mathcal N |} \times 2^{|\mathcal N |}\) matrix obtained by substituting spins, \(\sigma _{i}\), with the \(\widehat{\sigma }_{i}\) matrices, defined in Eq. (3), within the algebraic expression for the energy. That is,

$$\begin{aligned} \widehat{E} = \sum _{{(i,j) \in \mathcal{E}}} J_{ij} \widehat{\sigma }_i \widehat{\sigma }_j + \sum _{i \in \mathcal{N}} h_i \widehat{\sigma }_i . \end{aligned}$$
(4)

Notice that the eigenvalues of the matrix \(\widehat{E}\) are the \(2^{|\mathcal N |}\) possible energies obtained by evaluating \(E(\sigma )\) from Eq. (1) for all possible configurations of spins. This implies that finding the minimum eigenvalue of \(\widehat{E}\) is equivalent to solving Problem (2). This lifting is clearly impractical in the classical computing context, as it transforms a minimization problem over \(2^{|\mathcal N |}\) configurations into computing the minimum eigenvalue of a \(2^{|\mathcal N |} \times 2^{|\mathcal N |}\) matrix. The key motivation for the QA computational approach is that it is possible to model \(\widehat{E}\) with only \(|\mathcal{N} |\) quantum bits (qubits), so it is feasible to compute over this exponentially large matrix.
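The lifting of Eqs. (3) and (4) can be checked numerically on a toy instance. The sketch below (with hypothetical helper names) builds each \(\widehat{\sigma }_i\) as a Kronecker product and confirms that the smallest eigenvalue of \(\widehat{E}\) equals the brute-force minimum of \(E(\sigma )\):

```python
import numpy as np
from functools import reduce

I = np.eye(2)
Z = np.array([[1.0, 0.0], [0.0, -1.0]])  # the 2x2 factor at position i in Eq. (3)

def lifted_sigma(i, n):
    """Build sigma_hat_i from Eq. (3): identity on every site except Z at site i."""
    return reduce(np.kron, [Z if k == i else I for k in range(n)])

def lifted_energy_matrix(h, J, n):
    """Build E_hat from Eq. (4) as a dense 2^n x 2^n matrix."""
    E = np.zeros((2**n, 2**n))
    for (i, j), Jij in J.items():
        E += Jij * lifted_sigma(i, n) @ lifted_sigma(j, n)
    for i, hi in h.items():
        E += hi * lifted_sigma(i, n)
    return E

n = 3
h = {0: -1.0}
J = {(0, 1): -1.0, (1, 2): -1.0}
E_hat = lifted_energy_matrix(h, J, n)
# E_hat is diagonal, and its diagonal entries are the 2^n classical energies;
# its smallest eigenvalue is the optimum of Problem (2), here -3.
print(np.min(np.diag(E_hat)))
```

The dense construction above requires \(O(4^n)\) memory, which illustrates why this lifting is only useful when \(\widehat{E}\) is realized physically with \(|\mathcal{N} |\) qubits rather than stored classically.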

The annealing process in QA provides a method for steering a quantum system into the a priori unknown eigenvector that minimizes Eq. (4) (Kadowaki and Nishimori 1998; Farhi et al. 2000). First, the system is initialized at an a priori known minimizing eigenvector of a simple (“easy”) energy matrix, \(\widehat{E}_0\). After the system has been initialized, the energy matrix is interpolated from the easy problem to the target problem slowly over time. Specifically, the energy matrix at a point during the anneal is \(\widehat{E}_a(\Gamma ) = (1-\Gamma )\widehat{E}_0 + \Gamma \widehat{E}\), with \(\Gamma \) varying from zero to one. The annealing time is the physical time taken by the system to evolve from \(\Gamma =0\) to \(\Gamma =1\). When the anneal is complete (\(\Gamma =1\)), the interactions in the quantum system are described by the target energy matrix. For suitable starting energy matrices, \(\widehat{E}_0\), and a sufficiently slow annealing time, the adiabatic theorem demonstrates that a quantum system remains at the minimal eigenvector of the interpolating matrix, \(\widehat{E}_a(\Gamma )\) (Born and Fock 1928; Kato 1950; Jansen et al. 2007), and therefore achieves the minimum energy of the target problem.
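The interpolation \(\widehat{E}_a(\Gamma )\) can likewise be illustrated numerically. The sketch below assumes a common choice of starting matrix, \(\widehat{E}_0 = -\sum _i \widehat{\sigma }^x_i\) (a transverse field, whose minimizing eigenvector is the uniform superposition over all spin configurations); the 3-spin target instance is illustrative:

```python
import numpy as np
from functools import reduce

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])  # Pauli-X
Z = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli-Z

def site_operator(op, i, n):
    """Place a 2x2 operator at site i of an n-site Kronecker product."""
    return reduce(np.kron, [op if k == i else I for k in range(n)])

n = 3
# "Easy" starting matrix: its minimizing eigenvector is known a priori.
E0 = -sum(site_operator(X, i, n) for i in range(n))
# Target matrix E_hat for a toy chain with J_01 = J_12 = -1 and h_0 = -1.
E_target = (-site_operator(Z, 0, n) @ site_operator(Z, 1, n)
            - site_operator(Z, 1, n) @ site_operator(Z, 2, n)
            - site_operator(Z, 0, n))

# Track the minimum eigenvalue of E_a(Gamma) along the interpolation.
for gamma in (0.0, 0.5, 1.0):
    Ea = (1 - gamma) * E0 + gamma * E_target
    ground = np.linalg.eigvalsh(Ea)[0]
    print(f"Gamma={gamma:.1f}: minimum eigenvalue {ground:.3f}")
# At Gamma=1 the minimum eigenvalue equals the target problem's optimum, -3.
```

This classical eigenvalue computation only tracks the endpoint of the anneal; the adiabatic theorem is the statement about the system's continuous-time dynamics remaining in the minimal eigenvector throughout.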

2.2 Quantum annealing hardware

The computers developed by D-Wave Systems realize the QA computational model in hardware with more than 5000 qubits. However, the engineering challenges of building real-world quantum computers are significant and have an impact on the previously discussed theoretical model of QA. In particular, QA hardware is an open quantum system, meaning that it is affected by environmental noise and decoherence. The coefficients in Eq. (1) are constrained to the ranges, \(-4 \le {h_{i}} \le 4\), \(-1 \le {J_{ij}} \le 1\), and nonzero \({J_{ij}}\) values are restricted to a specific sparse lattice structure (i.e., \(\mathcal {E}^H \subseteq \mathcal {E}\)), which is determined by the hardware’s implementation. (See “Appendix A” for details.) The D-Wave hardware documentation also highlights five sources of deviation from ideal system operations called integrated control errors, which include background susceptibility, flux noise, digital-to-analog conversion quantization, input/output system effects, and variable scale across qubits (D-Wave Systems 2020). These implementation details impact the performance of QA hardware (Nelson et al. 2021). Consequently, QA hardware often does not find globally optimal solutions but instead finds near-optimal solutions, e.g., within 1% of the best-known solutions (Coffrin et al. 2019; Pang et al. 2021). All of these deviations from the ideal QA setting present notable challenges for encoding and benchmarking combinatorial optimization problems with available QA hardware platforms.

2.3 Benchmarking quantum annealing hardware

Due to the challenges associated with mapping established optimization test cases to specific QA hardware (Coffrin et al. 2019), the QA benchmarking community has adopted the practice of building instance generation algorithms that are tailored to specific quantum processing units (QPUs) (King et al. 2015; Hen et al. 2015; King et al. 2017; Denchev et al. 2016; Albash and Lidar 2018; Pang et al. 2021). The majority of the proposed problem generation algorithms build Ising model instances that are defined over a specific QPU’s hardware graph, i.e., \((\mathcal {N}, \mathcal {E}^H)\), or subsets of this graph, which are typically referred to as hardware-native problems.

In this work, we build upon an earlier class of hardware-native instances termed corrupted biased ferromagnets, or CBFMs, as proposed by Pang et al. (2021). Given the QPU graph, \((\mathcal {N}, \mathcal {E}^H)\), the CBFM model adopts the following distributions for hardware-native instances:

$$\begin{aligned} \begin{aligned} P({J}_{ij} = 0) = 0, \, P({J}_{ij} = -1) = 0.625, \, P({J}_{ij} = 0.2) = 0.375, \, \forall (i, j) \in \mathcal {E}^H \\ P({h}_i = 0) = 0.97, \, P({h}_i = -1) = 0.02, \, P({h}_i = 1) = 0.01, \, \forall i \in \mathcal {N}. \end{aligned} \end{aligned}$$
(CBFM)

This instance model is characterized by ten parameters, which define the probabilities that the h and J terms in the Ising model take on zero, positive, or negative values, as well as the magnitudes of those values. Benchmarking these instances on the previous generation of D-Wave’s QPU architecture (i.e., the 2000Q platform using the Chimera hardware graph) showed promising performance against state-of-the-art classical alternatives, although a clear wall-clock run time benefit was not achieved (Pang et al. 2021).

In this work, we design a variant of the CBFM problem class called CBFM-P, which is tailored to D-Wave’s first Advantage QPU platform. The model parameters for this problem class are

$$\begin{aligned} \begin{aligned} P({J}_{ij} = 0) = 0.35, \, P({J}_{ij} = -1) = 0.10, \, P({J}_{ij} = 1) = 0.55, \, \forall (i, j) \in \mathcal {E}^H \\ P({h}_i = 0) = 0.15, \, P({h}_i = -1) = 0.85, \, P({h}_i = 1) = 0, \, \forall i \in \mathcal {N}. \end{aligned} \end{aligned}$$
(CBFM-P)

The CBFM-P parameters differ from CBFM, as the Advantage QPU architecture features a different and denser hardware graph called Pegasus, whose topology is detailed in “Appendix A”. These new parameters were discovered using a metaheuristic approach that explored different combinations of the ten parameters in this model and sought to maximize the problem’s difficulty. In each evaluation of the metaheuristic, a combination of parameters was selected, one random instance was generated following this parameterization, and a variety of classical solution methods were executed on the instance. The instance difficulty was determined by comparing the lower and upper bounds of solutions found by these classical solution methods. Although this approach is naive, we found that it was sufficient for the objectives of this study. We expect that there exist classes of more challenging hardware-native instances on the Pegasus graph, but identifying these classes is left for future work.
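The instance-generation step can be sketched as follows. In practice the hardware graph \((\mathcal {N}, \mathcal {E}^H)\) must be queried from the target QPU; the toy graph below is a stand-in, and the helper name is hypothetical:

```python
import random

def generate_cbfm_p(nodes, edges, seed=0):
    """Sample a CBFM-P instance over a given hardware graph (N, E^H)."""
    rng = random.Random(seed)
    # Quadratic terms: P(J=0)=0.35, P(J=-1)=0.10, P(J=1)=0.55.
    J = {e: rng.choices([0.0, -1.0, 1.0], weights=[0.35, 0.10, 0.55])[0]
         for e in edges}
    # Linear terms: P(h=0)=0.15, P(h=-1)=0.85, P(h=1)=0.
    h = {i: rng.choices([0.0, -1.0, 1.0], weights=[0.15, 0.85, 0.0])[0]
         for i in nodes}
    return h, J

# A small cycle graph standing in for a Pegasus subgraph.
nodes = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
h, J = generate_cbfm_p(nodes, edges)
```

Because each coefficient is sampled independently, generating one instance is linear in \(|\mathcal {N} |+ |\mathcal {E}^H |\); the expensive part of the metaheuristic search described above is evaluating each candidate instance's difficulty with classical solvers.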

3 Optimization performance analysis

In this section, we compare the performance of the D-Wave Advantage QPU and a variety of classical algorithms for optimization of CBFM-P Ising models. Specifically, we consider the following established classical algorithms:

  • A greedy algorithm based on steepest coordinate descent (SCD) (Pang et al. 2021);

  • An integer quadratic programming (IQP) model formulation solved using the commercial mathematical programming solver Gurobi (Billionnet and Elloumi 2007);

  • Simulated annealing (SA) (van Laarhoven and Aarts 1987; D-Wave Systems 2022);

  • A spin-vector Monte Carlo (SVMC) algorithm, which was proposed to approximate the behavior of QA (Shin et al. 2014);

  • Parallel tempering with iso-energetic clustering moves (PT-ICM) (Zhu et al. 2015).

SCD and IQP are general optimization approaches, intended to serve as strawman comparisons to understand solution quality, while SA, SVMC, and PT-ICM reflect high-performance classical competitors, which provide different tradeoffs in run time and solution quality. Details of these methods and others that were considered are discussed in “Appendix B”. All of these classical optimization algorithms were executed on a system with two Intel Xeon E5-2695 v4 processors, each with 18 cores at 2.10 GHz, and 125 GB of memory. The parameterizations used by each algorithm in this work are also detailed in “Appendix B”.
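As one illustration of the simplest baseline, the sketch below implements the greedy SCD idea: repeatedly flip the single spin whose flip most decreases Eq. (1), stopping at a local minimum. The cited implementation may differ in details such as tie-breaking and random restarts:

```python
def scd(h, J, sigma):
    """Steepest coordinate descent on an Ising model: repeatedly flip the
    spin giving the largest energy decrease until no flip improves."""
    # Precompute each spin's neighborhood from the edge set.
    neighbors = {i: [] for i in h}
    for (i, j), Jij in J.items():
        neighbors[i].append((j, Jij))
        neighbors[j].append((i, Jij))
    while True:
        best_i, best_delta = None, 0.0
        for i in h:
            # Flipping sigma_i changes the energy of Eq. (1) by
            # -2 * sigma_i * (h_i + sum_j J_ij sigma_j).
            field = h[i] + sum(Jij * sigma[j] for j, Jij in neighbors[i])
            delta = -2.0 * sigma[i] * field
            if delta < best_delta:
                best_i, best_delta = i, delta
        if best_i is None:
            return sigma  # local minimum: no strictly improving flip
        sigma[best_i] = -sigma[best_i]

# Toy 3-spin chain; from this start, one flip reaches the global optimum.
h = {0: -1.0, 1: 0.0, 2: 0.0}
J = {(0, 1): -1.0, (1, 2): -1.0}
sigma = scd(h, J, {0: 1, 1: 1, 2: -1})
print(sigma)  # {0: 1, 1: 1, 2: 1}
```

Each iteration costs \(O(|\mathcal {N} |+ |\mathcal {E}^H |)\) as written; maintaining incremental local fields is the usual optimization, which this sketch omits for clarity.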

For the QA hardware comparison, we use the Advantage_system4.1 QPU accessed through D-Wave Systems’ LEAP cloud platform. The largest system we consider features \(|\mathcal {N} |= 5{,}387\) discrete variables and \(|\mathcal {E}^{H} |= 25{,}324\) quadratic coefficients in the Pegasus topology. Solving a hardware-native optimization problem on this platform consists of (i) programming an Ising model, (ii) repeating the annealing and read-out process a number of times, and (iii) returning the highest quality solution found over all replicates. In this analysis, we hold the annealing time constant at \(62.5 \upmu \)s, which is justified in “Appendix C”. The number of anneal-read cycles is varied between 10 and \(5{,}\!120\) to produce different total run times. We also leverage the spin reversal transforms feature, provided by the LEAP platform, after every 100 anneal-read cycles to mitigate the undesirable impacts of the aforementioned integrated control errors. For each Ising instance, this protocol typically requires less than two seconds of QPU compute time and less than 10 seconds of total wall-clock time.

3.1 A characteristic example

Here, we present an evaluation of the above optimization techniques on a characteristic problem instance of the largest CBFM-P Ising models that we considered on the Advantage_system4.1 QPU, with 5,387 variables. For each solution technique, parameters that control the execution time of the algorithm (e.g., the number of sweeps in SA or the wall-clock time limit of the IQP method) were varied to understand their effects on solution quality. These parameters are detailed in “Appendix B”. All other parameters remained fixed.

Fig. 1 Evaluation of solution quality for a characteristic example of the CBFM-P instance class with 5,387 decision variables. Although the Advantage_system4.1 QPU does not find the best-known solution, it consistently and quickly finds solutions within \(0.5\%\) of the best-known solution. Here, the dashed line corresponds to the achieved solution quality from QA when using 2,560 anneal-read cycles, as used in subsequent analyses. For comparison, the dotted line corresponds to a traditional optimization tolerance of \(0.01\%\), as typically used by mathematical programming solvers as a termination criterion

Benchmarking results for the CBFM-P instance “16” are shown in Fig. 1. Here, the horizontal axis measures the execution time of each algorithm, where each point indicates the best solution at the end of an independent algorithm execution with some set termination criterion. The vertical axis measures the solution quality as the relative difference from the best-known solution. Specifically, each solution’s relative difference is computed as

$$\begin{aligned} \% \text {Relative Difference} = 100\% \left( \frac{|\bar{E} - E^{*} |}{|E^{*} |}\right) , \end{aligned}$$
(5)

where \(E^{*}\) is the best-known objective value, i.e., the energy of Eq. (1) for the best-known solution, and \(\bar{E}\) is the objective value obtained for a specific solver and execution time.
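Eq. (5) is straightforward to compute. For example, a solver energy of \(-99.5\) against a best-known energy of \(-100\) yields a \(0.5\%\) relative difference:

```python
def relative_difference(E_bar, E_star):
    """Percent relative difference of Eq. (5) from the best-known energy."""
    return 100.0 * abs(E_bar - E_star) / abs(E_star)

print(relative_difference(-99.5, -100.0))  # 0.5
```

Note that the absolute value in the numerator means this measure does not distinguish a solver that slightly improves on the best-known energy from one that falls equally short of it; in this study \(E^{*}\) is the best energy found by any method, so \(\bar{E} \ge E^{*}\).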

In Fig. 1, we first observe that the QPU (i.e., the solid black line) is shown to find high-quality, but not optimal, solutions at very fast timescales (between 0.01 and 2.40 seconds), with relative quality differences between \(0.2\%\) and \(0.5\%\) of the best-known solution. Note that each execution time comprising the solid black line reflects a setting where the classical computer is colocated with and has exclusive access to the QPU. In practice, QPU access is managed by D-Wave’s remote cloud service, LEAP, which has overheads in both communication and job scheduling. These solve times are reflected by the open points in Fig. 1, which add between one and five seconds of overhead to the total idealized solve times. Impressively, even accounting for these significant overheads, the QPU is still able to obtain high-quality solutions well before all other classical methods that are considered.

Although the Advantage_system4.1 QPU is capable of quickly obtaining high-quality solutions in short amounts of time, it appears to reach a solution quality limit around \(0.2\%\). This relative difference is over an order of magnitude larger than the standard termination criterion used by mathematical programming solvers, i.e., an optimality gap of \(0.01\%\) or less, delineated by the dotted line in Fig. 1. To facilitate a comparison of the run time performance gained by the Advantage_system4.1 QPU, we thus propose a measurement that evaluates the ability of classical algorithms to match the solution quality found by the QPU after 2,560 anneal-read cycles. The measurement we use is similar to determining the intersection with the dashed line in Fig. 1, albeit on a linear instead of logarithmic scale.

In this instance example, the best solution obtained by the QPU after 2,560 anneal-read cycles is discovered after around 1.2 seconds when neglecting overheads and 4.3 seconds when including overheads. Most solution techniques (i.e., SVMC, IQP, and SCD) do not reach this solution quality after one hour of computation. Simulated annealing matches this quality after around 132 seconds and, linearly interpolating between the two nearest points before and after the intersection, PT-ICM is estimated to match this quality after around 77 seconds. Thus, the best-case performance of the QPU in this experiment, which assumes colocation with and direct access to the QPU, provides a 64 times improvement in run time, from 77 seconds with PT-ICM to 1.2 seconds. A similar comparison using the wall-clock run time yields an improvement of around 18 times. That is, even when including the overhead of communicating with D-Wave’s LEAP cloud service, the QPU is capable of providing a high-quality solution over an order of magnitude faster than all tested classical methods.

3.2 Problem scaling run time trends

In this subsection, we investigate how the run time performance of the QPU is impacted by the size of the problem that is considered. Unlike the previous section, here, we consider solution statistics that are aggregated over 50 distinct CBFM-P instances per problem size. Similar to the run time ratios discussed in Sect. 3.1, we estimate the amount of time required for the classical algorithms to match, on average, the solution quality reported by the QPU using an annealing time of \(62.5 \upmu \)s and 2,560 anneal-read cycles. This experiment is performed for Pegasus lattice sizes ranging from two to sixteen, yielding problems with 40 to 5,387 decision variables. For each instance, if a classical algorithm exactly matches the best solution (objective) found by the QPU, this time-to-match measurement is the earliest solve time at which that solution is obtained. If a classical algorithm finds a solution that does not strictly match but is better than the solution found using the QPU, the time-to-match measurement is estimated via a linear interpolation between the time at which the better solution is obtained and the time at which the worse solution, preceding it, is obtained.
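The time-to-match estimate described above can be sketched as follows. A solver trajectory is assumed to be a sequence of (solve time, best energy) pairs with non-increasing energies, and the trajectory values below are hypothetical:

```python
def time_to_match(times, energies, target):
    """Estimate the earliest time a solver's trajectory reaches the QPU's
    target energy (lower is better). If the target is crossed between two
    recorded points, linearly interpolate between them."""
    for k, (t, e) in enumerate(zip(times, energies)):
        if e <= target:
            if k == 0:
                return t  # matched at the first recorded point
            t0, e0 = times[k - 1], energies[k - 1]
            # Here e0 > target >= e, so the interpolation is well defined.
            frac = (e0 - target) / (e0 - e)
            return t0 + frac * (t - t0)
    return None  # never matched the target quality

# Hypothetical trajectory: -98 at 10 s, then -102 at 20 s. A QPU target
# of -100 is matched at an interpolated 15 s.
print(time_to_match([10.0, 20.0], [-98.0, -102.0], -100.0))  # 15.0
```

Interpolating on energy rather than simply taking the later time gives the classical solver the benefit of the doubt, so the run time ratios reported below are, if anything, conservative.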

Fig. 2 Estimated relative computation times (“run time ratios”) required for classical algorithms to match the solution quality of the Advantage_system4.1 QPU as a function of problem size. Each point represents a mean computed over 50 random CBFM-P instances per problem size. Error bars correspond to standard errors of each mean run time ratio, and points are plotted only if the solver could match QA objectives for all 50 instances. The computational benefits of the QPU begin to become apparent for problems with around 1,000 decision variables, and these run time benefits increase steadily with problem size

The results of conducting these scaling experiments and analyses are summarized in Fig. 2. Figure 2a illustrates the idealized run time improvements (i.e., without including communication overheads) as a function of the number of variables in the Ising model. It is clear that the problem scale has a large impact on the usefulness of QA hardware. For small problem sizes (e.g., less than 500 variables), the QPU run time is greater than most of the classical algorithms, as indicated by a run time ratio less than \(10^{0}\). For instances containing roughly 1,000 variables or more, the QA hardware begins to have run time performance benefits, and only the best classical heuristics, i.e., PT-ICM and SA, are capable of matching the QA hardware’s solution quality, albeit sometimes after a significant amount of time. Note that points are excluded from Fig. 2a if a solver did not match the QA solution quality for all 50 instances. For example, SA matched the solution quality for only 49 of 50 instances for Pegasus lattice sizes of 9 and 10, and hence these points are excluded from this plot.

For the two most competitive classical methods, PT-ICM and SA, Fig. 2b shows that the run time benefits of the QA hardware increase steadily with problem size. This trend holds when considering both the idealized computation setting (solid lines) and the real-world setting that includes communication and scheduling overheads (dashed lines). In particular, the estimated \(15 \times \) run time ratio for the largest problem size is encouraging, as this suggests that the solutions identified by the QPU, even when accessed via a cloud computing service, can be obtained quickly enough to accelerate the performance of classical heuristic methods.

4 Limitations and opportunities

Sections 3.1 and 3.2 provide evidence that there exist classes of Ising models where available QA hardware can provide run time performance improvements over classical alternatives. This is an encouraging result, but it is also important to recognize some limitations of this study and available QA hardware.

Limitations of This Study: The foremost limitation of this work is that it considers Ising models that are hardware-native. Such models provide best-case scenarios for QA hardware and, thus far, have not reflected sparsity patterns of realistic combinatorial optimization tasks. Although this work demonstrates an important necessary condition for having a performance benefit on practical problems, it is not a sufficient condition. Benchmarking real-world problems is required to show that these benefits can also be realized in that context.

We also note that most of the classical algorithms employed in this work did not effectively exploit parallelism, and all except SVMC and PT-ICM used their single-threaded variants. Parallelism of classical algorithms may reduce or eliminate the performance benefits presented in this work. Further, the benchmarks considered in this study did not evaluate other novel computing technologies or special-purpose hardware (e.g., McMahon et al. 2016; Goto et al. 2019; Matsubara et al. 2020; Honjo et al. 2021; Kowalsky et al. 2022), which could provide improved performance on CBFM-P instances. Both of these avenues should be explored in future work to improve heuristic algorithms and better exploit computational resources.

Finally, we also recognize that this study does not attempt to demonstrate nor assert the much sought-after scaling advantage from quantum annealing (Rønnow et al. 2014), even for the contrived class of CBFM-P instances that are considered. This work has provided encouraging initial evidence of a class of Ising models where QA hardware can provide a practical, constant factor performance improvement over available classical algorithms.

Limitations of Current QA Hardware: The primary limitation of the QA hardware identified in this study is that it appears to approach a limit on solution quality for the largest CBFM-P instance class, i.e., around \(0.20\%\) from the best-known solution. More evidence for this behavior is provided in “Appendix C”. As such, this work adopted a time-to-match measurement of performance, which is atypical for an optimization benchmarking study. Additional research is required to develop extensions of the simple QA optimization protocols used in this work to understand if the hardware can achieve solutions that are within 0.01% of global optimality, which would make this hardware’s performance consistent with standard optimality tolerances used by commercial optimization tools. QA hardware improvements to reduce noise and integrated control errors would also serve to further close this gap.

Future Opportunities: Despite the limitations of this work and current QA hardware, our results provide encouraging evidence that QA hardware is reaching a point where existing classical optimization algorithms can be practically outperformed, especially over very short timescales. If QA hardware continues to increase in the number of qubits and hardware graph connectivity while also reducing noise properties, it is reasonable to expect that the performance gap on hardware-native problem instances will continue to increase, as suggested by the results of Sect. 3.2. Recently, D-Wave Systems announced their plans to develop an Advantage 2 QPU, which will support over 7,000 decision variables and a denser hardware graph (D-Wave Systems 2021). If the trends observed in this work continue on this new platform, identifying even more dramatic performance gains should be possible. Acknowledging these anticipated hardware improvements, as well as the empirical findings of this study, revisiting the topic of demonstrating a QA scaling advantage (as in Denchev et al. 2016; Albash and Lidar 2018) is a natural next step to establish a stronger case for a fundamental performance benefit of the QA computing model for combinatorial optimization.

5 Conclusion

After roughly twenty years of research and development and ten years of focused commercial development, we believe quantum annealing technology has reached a level of technical maturity and performance that warrants serious consideration by the operations research community. This work has shown, for the first time, an order-of-magnitude run time performance benefit for quantum annealing over a wide range of classical alternatives, even when accounting for the substantial overheads involved in the practical usage of commercial quantum annealing services. Nonetheless, significant open challenges remain in translating these performance results into benefits for practical optimization tasks. There may be significant unrealized opportunities to hybridize this new computing technology into existing mathematical programming algorithms and impact real-world optimization challenges. We sincerely hope that this work will inspire the operations research community to increase its consideration of the quantum annealing computing model and continue exploring how it can potentially benefit mathematical optimization algorithms and practical applications. To support follow-on works along these lines, all of the test cases and runtime results used in the production of this article are made available at https://github.com/lanl-ansi/arXiv-2210.04291.