1 Introduction

In recent years, the field of Quantum Computing has gained significant popularity, thanks to remarkable advancements that have led to the development of several quantum computers, of different architectures and technologies, that can be used to tackle numerous problems. Although quantum computers are still limited both by their relatively small size and by the noise that limits the precision of the computation, the field is rapidly moving forward. Among the existing Quantum Computing paradigms, Quantum Annealing (QA) is a meta-heuristic that can be used to solve Quadratic Unconstrained Binary Optimization (QUBO) problems, a family of NP-hard optimization problems. The key idea of QA is to represent a QUBO problem as an energy minimization problem of a real and configurable quantum device. To do so, the problem variables are mapped onto physical quantum bits, or qubits. The quantum device is then steered toward a state of minimal energy, called the ground state, with a controlled evolution. The ground state corresponds to the optimal solution of the original QUBO problem. The devices that implement the QA process are called Quantum Annealers.

The ability of QA to tackle NP-hard optimization problems and its applicability across heterogeneous domains are what make it an interesting technology for industries and researchers. Many applications of QA have been proposed in the fields of machine learning (Neukart et al. 2018; Mott et al. 2017; Mandrà et al. 2016; Ferrari Dacrema et al. 2022; Neven et al. 2009; Willsch et al. 2020; Kumar et al. 2018; O’Malley et al. 2017; Ottaviani and Amendola 2018; Nembrini et al. 2021), chemistry (Micheletti et al. 2021; Hernandez and Aramon 2017; Streif et al. 2019; Xia et al. 2018) and logistics (Ikeda et al. 2019; Rieffel et al. 2015; Ohzeki 2020; Stollenwerk et al. 2017), but the results are not always competitive against classical heuristic solvers.

An important issue is that the quality of the solutions found by QA is limited by multiple factors. First of all, Quantum Annealers are physical devices that have a limited number of qubits and connections between them. This limits the size of the problems that they can tackle and requires processing the QUBO problem to adapt it to the physical structure of the Quantum Annealer. A second important aspect is that the quality of the solutions found by QA depends on the behavior of the underlying physical quantum system, which is very difficult to study. It is known that some problems appear to be more difficult to solve with QA (Yarkoni et al. 2022; Jiang and Chu 2023; Huang et al. 2023), but understanding why is not a trivial task and remains an open research question.

Most of the previous studies on QA compare its performance, in terms of the computation time it requires with respect to other heuristic solvers, rather than the quality of the solutions it finds, i.e., its effectiveness. There are two ways in which one can study the effectiveness of QA: one is to analytically describe the underlying quantum behavior, the other is to perform empirical experiments. A theoretical analysis has been performed for very small QUBO instances (Stella et al. 2005), which, however, are too simple to assess the effectiveness of QA when compared to other classical solvers. Furthermore, analytically analyzing such a quantum system rapidly becomes very expensive and is generally impossible for problems of interesting size. On the other hand, the existing empirical studies on the effectiveness of QA have explored much larger problems but focus mainly on specific tasks such as feature selection (Ferrari Dacrema et al. 2022), clustering (Kumar et al. 2018; Neukart et al. 2018) and classification (Mott et al. 2017; Willsch et al. 2020; Neven et al. 2009), and therefore lack generality.

To the best of our knowledge, there is no published research that has extensively investigated how the characteristics of the problem impact the effectiveness of QA. For this reason, in this study, we propose a novel empirical methodology for the analysis of the effectiveness of QA, based on the study of the characteristics of QUBO problems with a meta-learning approach. The general idea consists of generating many QUBO instances, defining a set of features that can describe them, and training meta-models to predict whether QA would solve a given problem effectively or not. Our key contributions are as follows:

  • The design of an experimental methodology that can be applied to study the effectiveness of QA. This methodology can be used also for other quantum algorithms, such as QAOA (Farhi et al. 2014) or VQE (Fedorov et al. 2022);

  • The selection of ten classes of optimization problems, each one with specific characteristics, from which we generate approximately five thousand QUBO instances;

  • The design and the generation of a meta-learning dataset, which contains for each of the five thousand instances a selection of a hundred features based on probability theory, statistics, and graph theory. We show that using them it is possible to effectively predict whether QA would solve a problem instance effectively or not. We share the meta-learning dataset online for further research;

  • The analysis of the features of a QUBO problem with the strongest impact on the effectiveness of QA;

2 Background

2.1 QUBO and Ising models

In order to use Quantum Annealing (QA) to tackle optimization problems, they must be represented in one of two equivalent formulations, called QUBO and Ising, which are suitable for NP-Complete and some NP-Hard optimization problems (Glover et al. 2022; Lucas 2014). While the two are equivalent, the QUBO formulation is closer to traditional Operations Research, whereas the Ising formulation is closer to Physics.

The objective function in the QUBO model is given by Eq. 1, where \(x \in \{0,1\}^n\) is a column vector representing the assignment of the binary variables \(x_1, x_2,..., x_n\), n is the number of problem variables, y the cost, and \(Q \in \mathbb {R}^{n \times n}\) is a real square matrix, either symmetric or upper triangular.

$$\begin{aligned} \min _x y = x^TQx \end{aligned}$$
(1)

We will refer to combinatorial optimization problems written in the QUBO formulation as QUBO problems. Note that the QUBO formulation does not allow for hard constraints. An optimization problem with constraints can be transformed into a QUBO problem by introducing a quadratic penalty term multiplied by a penalty coefficient p. The idea is that the hard constraints are transformed into soft constraints, such that, if they are violated, a positive penalty is added to the cost function, making the cost of that variable assignment worse. Note that when using soft constraints we have no guarantee that the optimal solution will satisfy the constraints, and violations may happen frequently if the penalty coefficient p has a value that is too low. In general, a quadratic binary optimization problem with equality constraints formulated as \(Ax - d = 0\), where \(d \in \mathbb {R}^m\) and \(A \in \mathbb {R}^{m \times n}\), can be transformed into the following QUBO problem:

$$\begin{aligned} \min _x y = x^TQx + p \cdot (Ax - d)^T(Ax - d) \end{aligned}$$
(2)
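This transformation can be sketched in code (an illustrative NumPy implementation of ours; the function name is an assumption, not from the paper). Expanding \((Ax - d)^T(Ax - d)\) and using the identity \(x_i^2 = x_i\) for binary variables folds the penalty directly into the QUBO matrix, while the constant \(d^Td\) can be dropped since it does not change the minimizer:

```python
import numpy as np

def add_equality_penalty(Q, A, d, p):
    """Fold the penalty p * (Ax - d)^T (Ax - d) into the QUBO matrix Q.

    Uses x_i^2 = x_i for binary variables to move linear terms onto the
    diagonal; the constant d^T d is dropped as it does not change the
    minimizer."""
    Q = np.asarray(Q, dtype=float).copy()
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    Q += p * (A.T @ A)                               # quadratic part x^T A^T A x
    Q[np.diag_indices_from(Q)] -= 2 * p * (A.T @ d)  # linear part -2 d^T A x
    return Q
```

For example, with the one-hot constraint \(x_1 + x_2 = 1\) and \(p = 1\), the penalized matrix assigns cost \(-1\) to the two feasible assignments and 0 to the infeasible ones (up to the dropped constant).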

If the quadratic binary optimization problem also has inequality constraints, those need to be transformed first into equality constraints using binary slack variables. For example, if we have the following constraint:

$$\begin{aligned} x_1 + 2x_2 + 4x_3 \le 3 \end{aligned}$$

we can transform it into an equality constraint by introducing the binary slack variables \(x_4\) and \(x_5\):

$$\begin{aligned} x_1 + 2x_2 + 4x_3 + x_4 + 2x_5 = 3 \end{aligned}$$

There exists no general rule for choosing the best number of slack variables, so multiple strategies can be followed.
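One common strategy, sketched below (the helper name is our own), uses a binary expansion: powers of two plus one final adjusted coefficient, so that subset sums cover exactly the range \([0, u]\) of the slack:

```python
import math

def slack_coefficients(u):
    """Binary-expansion coefficients for a slack value in [0, u].

    Returns coefficients such that sums of their subsets cover exactly
    the integers 0..u (one common strategy; others exist)."""
    if u <= 0:
        return []
    k = int(math.floor(math.log2(u)))
    coeffs = [2**i for i in range(k)]   # 1, 2, 4, ...
    coeffs.append(u - 2**k + 1)         # final adjusted coefficient
    return coeffs
```

For the inequality above, `slack_coefficients(3)` returns `[1, 2]`, matching the slack terms \(x_4\) and \(2x_5\) used in the example.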

A second useful formulation is the Ising model, which was developed to describe an energy minimization problem for a system of particles (Glover et al. 2022; Lucas 2014). The objective function of the Ising model is given by Eq. 3, where \(s \in \{-1,1\}^n\) is the column vector representing the assignment of the n problem variables \(s_1, s_2,..., s_n\), also called spin variables, \(J \in \mathbb {R}^{n \times n}\) is the coupling matrix, which describes the quadratic terms of the objective function and has zero diagonal, and \(b \in \mathbb {R}^n\) is the bias vector, which contains the linear terms of the objective function. The constant term \(c \in \mathbb {R}\) is called the offset.

$$\begin{aligned} \min _s y = s^TJs + b^Ts + c \end{aligned}$$
(3)

A QUBO problem can be transformed into an Ising problem through a linear mapping of the variables. In particular, a binary variable \(x_i\) is transformed into a spin variable \(s_i\) according to the following conversion:

$$\begin{aligned} x_i = \frac{1-s_i}{2} \end{aligned}$$
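The conversion can be sketched as follows (an illustrative implementation of ours, assuming a symmetric or upper-triangular Q; the function name is an assumption). Substituting \(x_i = (1-s_i)/2\) into \(x^TQx\) and collecting terms yields the coupling matrix J, the bias b, and the offset c:

```python
import numpy as np

def qubo_to_ising(Q):
    """Convert x^T Q x (x in {0,1}^n) into s^T J s + b^T s + c
    (s in {-1,1}^n) via the substitution x_i = (1 - s_i) / 2."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    J = np.zeros((n, n))
    b = np.zeros(n)
    c = 0.0
    for i in range(n):
        for j in range(n):
            q = Q[i, j]
            if q == 0:
                continue
            if i == j:
                # x_i^2 = x_i = (1 - s_i) / 2
                c += q / 2
                b[i] -= q / 2
            else:
                # x_i x_j = (1 - s_i)(1 - s_j) / 4
                c += q / 4
                b[i] -= q / 4
                b[j] -= q / 4
                J[min(i, j), max(i, j)] += q / 4  # keep J upper triangular
    return J, b, c
```

Applied to the two-variable example discussed in Section 2.2 (\(y = x_1 + x_2 - 2x_1x_2\)), this yields \(J_{12} = -\frac{1}{2}\), \(b = 0\) and \(c = \frac{1}{2}\).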

2.2 Quantum Annealing and Quantum Annealers

Quantum Annealing (QA) is a meta-heuristic solver for QUBO problems. It is based on the Adiabatic Quantum Computation (AQC) paradigm, with some relaxations (Yarkoni et al. 2022; Morita and Nishimori 2008; Farhi et al. 2000; Albash and Lidar 2018; Hauke et al. 2020). The idea is to represent the optimization problem as an energy minimization one, and then use a configurable device that exhibits the needed quantum behavior to minimize it. Such a device, the Quantum Annealer, is composed of multiple qubits connected to each other. QA is based on a time evolution of the quantum system. The initial state of the system is a default one, easy to prepare, in which the qubits are in a state of minimal energy, i.e., the ground state. Then, the physical system evolves gradually over a short amount of time by introducing a dependency on the Ising coefficients of the problem one wishes to solve. This means, for example, slowly changing the magnetic fields the qubits are subject to. At the end of the evolution, the physical system will depend only on the problem and, if the evolution was careful enough, it will still be in the ground state. Since the state of minimal energy is also the solution to the optimization problem, measuring the state of the qubits will yield the values that the problem variables should have.

The evolution of the system in QA occurs in a noisy environment and is subject to quantum fluctuations, i.e., quantum tunneling, which helps it explore the solution space. The noise of the system and the duration of the evolution influence the results of QA: if the evolution is too fast, the system will likely escape its ground state and end in a worse solution, while if the evolution is too slow, noise may build up and push the system out of the ground state again. Due to its stochastic nature, QA acts as a device sampling low-cost solutions, in a similar way as classical solvers such as Simulated Annealing do. For this reason, QA is repeated multiple times in order to obtain samples of the final state of the quantum system.

The physical devices that implement QA are called Quantum Annealers. Currently, D-Wave Systems Inc. is the company that provides the Quantum Annealers with the largest number of qubits. For example, the D-Wave Advantage has more than 5000 qubits with a topology called Pegasus, where each qubit is connected to 15 other qubits.

Solving a QUBO problem with a Quantum Annealer requires the following steps:

  1. Formulate the problem as a QUBO or an Ising problem: the coefficients that are needed to configure the Quantum Annealer are those of the Ising formulation, so the problem needs to be in this form. If the problem has a simpler formulation as a QUBO, the transformation is straightforward. Note that some problems can be formulated as QUBO or Ising easily, while others require more expensive processing.

  2. Embed the problem on the topology of the device: since the Quantum Annealer is a physical object, we must fit the problem we want to solve on it, accounting for the limited number of qubits and of the connections between them. This procedure is called minor embedding (Carmesin 2022) and maps each problem variable to one or more qubits. If multiple qubits are needed to represent a single problem variable, they are called a qubit chain. If the problem has a large number of quadratic terms, a substantial number of qubits may be needed to create all the physical connections. Figure 1 shows an example of how a simple problem can be mapped on a Quantum Annealer. Minor embedding is an NP-Hard problem, but polynomial-time heuristic algorithms are available (Choi 2008; Cai et al. 2014; Boothby et al. 2020).

  3. Evolution of the system and sampling of the solutions: once the minor embedding is done, the problem is transferred to the Quantum Annealer. First, the device is programmed with the problem coefficients, then we can perform a sequence of multiple evolutions to obtain the desired number of samples \(n_s\). Each sample requires three steps: (i) the evolution is run for the desired duration, called annealing time \(t_a\), (ii) the final state of the system is measured, which requires a read-out time \(t_r\) dependent on the number of qubits used, and (iii) the device pauses shortly for cooling.
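On real hardware these steps are handled by the vendor's SDK. As a purely conceptual stand-in for the sampling step, the sketch below (our own illustrative code, not from the paper) exhaustively evaluates an Ising objective, which is feasible only for tiny instances; a Quantum Annealer instead returns a stochastic sample of low-energy assignments:

```python
import itertools
import numpy as np

def enumerate_ising(J, b, c=0.0):
    """Exhaustively evaluate the Ising objective s^T J s + b^T s + c.

    A classical stand-in for the sampling step: where a Quantum Annealer
    stochastically samples low-energy assignments, this enumeration
    (feasible only for very small n) deterministically lists all of them,
    sorted by energy."""
    n = len(b)
    results = []
    for bits in itertools.product([-1, 1], repeat=n):
        s = np.array(bits)
        y = s @ J @ s + b @ s + c
        results.append((bits, float(y)))
    return sorted(results, key=lambda r: r[1])
```

For the two-variable instance \(y = \frac{1}{2} - \frac{1}{2}s_1s_2\), the lowest-energy assignments returned are \((-1,-1)\) and \((1,1)\), both with cost 0.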

More formally, the energy of a system can be modeled with a Hamiltonian \(H \in \mathbb {R}^{2^n \times 2^n}\), and the evolution that occurs in QA is described by the time-dependent Hamiltonian H(t), which models the transition from the initial default Hamiltonian \(H_i\) to the Hamiltonian describing the problem \(H_p\):

$$\begin{aligned} H(t) = A(t)H_i + B(t)H_p \end{aligned}$$
(4)

The coefficient A(t) decreases as the evolution progresses, while B(t) increases, introducing the dependency on the characteristics of the problem; their exact values depend on the hardware. At the beginning of the evolution B(t) is zero, while at the end A(t) is zero. Note that this is just a description of the underlying physical system and there is no need to compute this representation to use QA.

In the ideal Adiabatic Quantum Computing setting, it is possible to compute the exact annealing time needed to ensure the system remains in the ground state and finds the global optimum, a result that dates back a century (Born and Fock 1928). This optimal annealing time is inversely proportional to the smallest difference between the two smallest eigenvalues \(\lambda _1(t), \lambda _2(t)\) of H(t). This difference is called the minimum gap. Although this result may be useful to understand the behavior of QA, it is not directly applicable because the evolution is subject to noise. Furthermore, computing the eigenvalues of H(t) is prohibitive for all but the smallest problems.
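To illustrate why this computation is prohibitive, the sketch below computes the spectral gap of \(H(t)\) along a linear schedule for a toy two-qubit instance (the coefficients and the linear schedule are our own assumptions; real hardware schedules differ). It works with dense \(2^n \times 2^n\) matrices, which grow exponentially with n:

```python
import numpy as np
from functools import reduce

def op_on(op, i, n):
    """Tensor product placing a single-qubit operator at position i."""
    mats = [np.eye(2)] * n
    mats[i] = op
    return reduce(np.kron, mats)

sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])

n = 2
# Transverse-field initial Hamiltonian and a toy problem Hamiltonian
# (illustrative coefficients, not taken from the paper).
H_i = -sum(op_on(sigma_x, i, n) for i in range(n))
H_p = (-0.5 * op_on(sigma_z, 0, n) @ op_on(sigma_z, 1, n)
       + 0.1 * op_on(sigma_z, 0, n))

# Linear schedule A(t) = 1 - t, B(t) = t over a normalized time grid.
gaps = []
for t in np.linspace(0, 1, 101):
    H = (1 - t) * H_i + t * H_p
    eigenvalues = np.linalg.eigvalsh(H)   # ascending order
    gaps.append(eigenvalues[1] - eigenvalues[0])

min_gap = min(gaps)
```

At t = 0 the gap of the transverse-field Hamiltonian is 2, and at t = 1 it is the gap between the two lowest energies of the problem; the minimum over the schedule is the quantity that governs the optimal annealing time.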

Fig. 1

Embedding of a simple problem with six variables on a portion of a D-Wave Quantum Annealer using the Chimera topology. Each node represents a qubit and each edge is a physical connection between them. Nodes of the same color indicate the chain of qubits used to represent a single problem variable. Note how, while the original problem had six variables, the embedded one requires 14 qubits

To exemplify how this representation works, assume we have an Ising problem of n variables with coupling J and bias b; \(H_p\) is then a \(2^n \times 2^n\) matrix computed as follows:

$$\begin{aligned} H_p = \sum _{i=1}^n \sum _{j=i}^nJ_{ij}\sigma _z^{(i)}\sigma _z^{(j)} + \sum _{i=1}^n b_i \sigma _z^{(i)} \end{aligned}$$
(5)

The matrix \(\sigma _z^{(i)}\) is the Z-Pauli operator \(\sigma _z\) acting on qubit i:

$$\begin{aligned} \sigma _z = \begin{pmatrix} 1 & \,\,\,\,0 \\ 0 & -1 \end{pmatrix} \end{aligned}$$
(6)
$$\begin{aligned} \sigma _z^{(i)} = \bigotimes _{k=1}^{i-1}I \otimes \sigma _z \bigotimes _{k=1}^{n-i}I \end{aligned}$$
(7)

with \(\otimes \) being the tensor product and I the identity matrix. A useful property of \(H_p\) is that it is a diagonal matrix that contains all the cost values for all possible variable assignments of the problem. Since it is diagonal, these values are also its eigenvalues and the corresponding eigenvectors encode the variable assignment that has that cost. The minimum eigenvalue of \(H_p\) corresponds to the minimal cost and the corresponding eigenvector to the optimal variable assignment.

As an example, consider the following QUBO problem, which is minimized when \(x_1 = x_2\):

$$\begin{aligned} \min _{x_1, x_2} y = x_{1} + x_{2} - 2x_1 x_2 \end{aligned}$$

The equivalent Ising formulation is as follows:

$$\begin{aligned} \min _{s_1, s_2} y = \frac{1}{2} - \frac{1}{2}s_1 s_2 \end{aligned}$$

For this small instance, we can compute \(H_p\) easily. The matrices \(\sigma _z^{(1)}\) and \(\sigma _z^{(2)}\) are:

$$\begin{aligned} \sigma _z^{(1)} = \sigma _z \otimes I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ \end{pmatrix} \quad \sigma _z^{(2)} = I \otimes \sigma _z = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ \end{pmatrix} \end{aligned}$$

\(H_p\) is then equal to:

$$\begin{aligned} H_{p} = \frac{1}{2}\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ \end{pmatrix} \end{aligned}$$

The smallest eigenvalue of \(H_p\) is \(\lambda _{\min } = -\frac{1}{2}\) and has a multiplicity of two, corresponding to the first and last diagonal entries. Indeed, both \(x_1 = 0, x_2=0\) and \(x_1 = 1, x_2=1\) are optimal solutions to the problem. If we sum \(\lambda _{\min }\) with the offset of the Ising problem, \(\frac{1}{2}\), we obtain 0, which is the value of the QUBO cost function when \(x_1 = x_2\).
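The worked example can be checked numerically by building \(H_p\) from Eq. 5 (a small verification sketch of ours):

```python
import numpy as np

sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])
I = np.eye(2)

# Ising coefficients of the worked example: y = 1/2 - (1/2) s1 s2
J12, offset = -0.5, 0.5

sz1 = np.kron(sigma_z, I)   # sigma_z acting on qubit 1
sz2 = np.kron(I, sigma_z)   # sigma_z acting on qubit 2
H_p = J12 * (sz1 @ sz2)     # no bias terms in this instance

eigenvalues = np.diag(H_p)  # H_p is diagonal, so these are its eigenvalues
lambda_min = eigenvalues.min()
multiplicity = int((eigenvalues == lambda_min).sum())
```

Running this reproduces the values derived by hand: \(\lambda _{\min } = -\frac{1}{2}\) with multiplicity two, and \(\lambda _{\min }\) plus the offset equals 0.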

2.3 Studies on the effectiveness of QA

Most of the previous studies on QA focus on its performance, by measuring the time required to solve a problem and comparing it to that of classical solvers. To the best of our knowledge, there is no consensus on whether QA provides a general and consistent speedup compared to other traditional solvers for QUBO problems (Hauke et al. 2020; Yarkoni et al. 2022; Katzgraber et al. 2014), while a recent paper claims substantial speedup for a quantum simulation task (King et al. 2024). Although some papers may claim a speedup, this is often based on measurements that only account for part of the process. Indeed, one should consider the time required by all phases: (i) formulating the optimization problem as QUBO or Ising, (ii) embedding the problem on the QA, (iii) sampling the solutions on the device and (iv) postprocessing the results if needed (for example by checking if the constraints are satisfied). Frequently, the efficiency of QA is measured by only accounting for the usage of the quantum device itself (programming time and the repeated annealing and read-out) while ignoring the time needed for minor embedding and for creating the QUBO formulation. This gives an incomplete picture of the technology that does not account for two significant bottlenecks. For example, it may be that in a certain situation, QA is faster than other traditional methods in solving a specific QUBO problem, but that may not be the case anymore if one includes the minor embedding phase. Furthermore, if it is very computationally expensive to formulate the problem as QUBO, it may be more efficient to use other traditional methods that do not need a QUBO formulation at all.

When comparing the quality of the solutions found by QA and classical solvers, i.e., their effectiveness, the published literature usually focuses on problems related to specific fields or even to very specific instances of those problems. Due to this, there is still a limited understanding of how QA would compare in a more general setting. For example, the effectiveness of QA has been analyzed for feature selection (Ferrari Dacrema et al. 2022), classification (Mott et al. 2017; Willsch et al. 2020; Neven et al. 2009) and clustering (Neukart et al. 2018; Kumar et al. 2018), which are typical machine learning tasks. In the field of chemistry, QA has been applied and analyzed to find the equilibria of polymer mixtures (Micheletti et al. 2021), to find similarities between molecules (Hernandez and Aramon 2017) and to find their ground state (Streif et al. 2019). The effectiveness of QA in solving problems related to logistics has been analyzed too, for example in solving the Nurse Scheduling Problem (Ikeda et al. 2019) and in optimizing the assignment of airport gates (Stollenwerk et al. 2017).

Previous research also studied the effectiveness of QA from a theoretical perspective, by representing analytically the evolution of the time-dependent Hamiltonian H(t) (see Eq. 4) and computing the probability of escaping the ground state (Stella et al. 2005). This approach is, however, limited by the fact that the size of the Hamiltonian grows exponentially with the number of QUBO problem variables n, and the analytical analysis of the Hamiltonian rapidly becomes impractical for all but the smallest problems. An alternative way is to adopt an empirical approach, by using the outcome of multiple experiments to probe the underlying physical system (Irsigler and Grass 2022). The idea is to allow the evolution to progress up to a certain intermediate stage and then drastically accelerate it (i.e., a quench, in D-Wave terminology), observing how the effectiveness changes based on when the evolution was accelerated. While this approach allows tackling large problem instances, applying the acceleration at different stages of the evolution requires repeating the experiment a large number of times, and therefore this approach too is very resource-intensive.

To overcome the limitations of the methods adopted in the literature, this paper proposes a new empirical approach to study how the quality of the solutions found by QA is impacted by the characteristics of the problem. To achieve this, we first collect a dataset of problem instances belonging to 10 selected problem classes and solve them using both QA and three classical solvers. Then, for each problem instance, we compute a set of features describing various characteristics, from the distribution of the bias coefficients of its QUBO formulation to the topology of the graph that describes the instance once it has been embedded in the Quantum Annealer. Using this dataset, we train a machine learning classifier to identify whether QA was able to find a good solution for that instance and, finally, use it to assess which problem features are the most important.

3 Meta-learning dataset generation

In this section, we present the methodology used to generate our meta-learning dataset, on which we train the meta-models to predict the effectiveness of QA. We publish this dataset online for future research. First, we describe how we select the ten classes of problems we want to solve with QA and the strategies we use to generate the five thousand instances. Then, we describe how we evaluate the effectiveness of QA, in terms of closeness to the optimal solution of the problem and by comparing QA with other classical methods (Simulated Annealing, Tabu Search, and Steepest Descent). Third, we describe the representations of the QUBO problem we used to compute the approximately one hundred features used to train the meta-models. Finally, we describe how we solve the instances with QA and with the classical solvers, with a particular focus on the choice of the optimal hyperparameters of the solvers.

3.1 Selection of problems and instances

We identify a selection of ten different optimization problems that exhibit different characteristics: some have constraints, others do not; some have linear terms, others do not; some have a large number of quadratic terms while others do not, etc. The details on their formulations are reported in Appendix 1 and the details on the generation of the instances are in Appendix 2.

The first group contains five classes of optimization problems defined over a graph: Max-Cut, Minimum Vertex Cover, Maximum Independent Set, Max-Clique, and Community Detection. They were selected for the following reasons. Both the Max-Cut and Community Detection problems have a straightforward QUBO formulation that does not require penalties to represent constraints. The Max-Cut, Maximum Independent Set, and Minimum Vertex Cover problems share the same quadratic terms in their QUBO matrix, but not the diagonal (i.e., the linear terms or bias). The Max-Clique problem is formulated as a Maximum Independent Set problem but is defined on the complement graph. The Community Detection problem has a very dense QUBO matrix, as there are quadratic terms between all variables, and is a relevant problem in Machine Learning (Nembrini et al. 2022). Since these problems are formulated on a graph, we apply them to four different graph topologies: Erdős–Rényi, Cyclic, Star, and 2d-grid. Note that, in order to have a diversified set of instances, we introduce small random perturbations to each topology, consisting of a few edge insertions and deletions. The number of insertions and deletions depends on the number of nodes in the graph. More details are reported in Appendix 2.1.

The second group of five optimization problems contains Number Partitioning, Quadratic Knapsack, Set Packing, Feature Selection, and \(4 \times 4\)-Sudoku. These are a more heterogeneous set than the previous graph-based problems and so require ad-hoc strategies to generate their instances which we detail in Appendix 2.2. Similarly to the Max-Cut and Community Detection problems, the Number Partitioning problem has a straightforward QUBO formulation with no penalty terms to represent constraints. Similarly to the Community Detection problem, the Feature Selection problem has a dense QUBO matrix with quadratic terms between all variables. Finally, the Quadratic Knapsack, Set Packing, and \(4 \times 4\)-Sudoku problems are all Constraint Satisfaction Problems, each with different types of constraints. In particular, the Quadratic Knapsack problem has inequality constraints that need to be converted into equality constraints using slack variables.

We generate multiple instances of all the problem classes we selected. Concerning the size of the problem instances, measured in the number of problem variables, there are two constraints to take into account. First, the D-Wave Quantum Annealer that we use has more than 5000 qubits but, due to their limited connectivity, it is generally possible to tackle problem instances of only up to 100 to 200 variables, depending on the structure of the QUBO problem. This is due to the minor-embedding phase. Second, we want the instances to be representative of problems that are not trivial and whose Hamiltonian cannot be analyzed analytically. In order to provide a more complete picture, we are also interested in assessing the impact of the distribution of the solution space of the problem. Formally, this corresponds to the set of eigenvalues and eigenvectors of the Hamiltonian of the problem \(H_p\) (see Section 2.2). Unfortunately, it is impractical to compute them for instances of more than 32 problem variables, which may be too small and easy to allow a comparison of the effectiveness of different solvers. For these reasons, we decided to create two separate sets of instances:

  • One set of large instances, with 5114 instances of between 69 and 99 variables, the upper range of what can be tackled with QA;

  • One set of small instances, with 246 instances of between 27 and 32 variables. With this set of instances, we can do a more complete analysis which includes also the distribution of the solution space.

The number of problem instances for each optimization problem is summarized in Table 1. Notice that the instances of the \(4 \times 4\)-Sudoku problem are included only in the small instances set, since the largest possible instance is unique and has at most 64 variables. All the generated instances are satisfiable and, when needed, the penalty term coefficient p used in the QUBO formulation is optimized with a Bayesian Search (Victoria and Maragatham 2021; Snoek et al. 2012), in order to maximize the number of feasible solutions for Simulated Annealing.

Table 1 Each row of the table gives the number of small and large instances related to each problem class

3.2 Evaluating the effectiveness of a solver

In this section, we describe how to evaluate the effectiveness of a solver and, in particular, of QA. Both QA and the traditional solvers we compare it to are stochastic, and they are executed multiple times to obtain a set of variable assignments that aim to minimize the cost function, which we call a set of samples. A sample is represented by an assignment of the decision variables x and by the related cost value y.

We solve all instances with QA, Simulated Annealing (SA), Tabu Search (TS), and Steepest Descent (SD). Our definition of how much a solver is effective is based on whether it finds samples that meet some quality constraints. While for the small instance set it is possible to compute the global optimum, for the large ones it is not feasible to do so, and therefore, we define the effectiveness in relative terms with respect to the other solvers.

In particular, we evaluate the effectiveness of QA on the large instances set by comparing its samples with those of the traditional heuristic solvers. We denote the best (lowest) cost value found by a solver S as \(y^S_{\text {min}}\). An instance I is QA-over-all if the best solution found by QA is at least as good as the best one found by SA, SD, and TS combined. More formally, if \(y^{QA}_{\text {min}} \le \min {\{y^{SA}_{\text {min}}, y^{TS}_{\text {min}}, y^{SD}_{\text {min}}\}}\). Comparing QA with a pool of multiple solvers results in a stricter evaluation of its effectiveness, but the condition that QA has to be at least as good as all the other solvers combined may be too strict. For this reason, we also compare QA with each individual solver. An instance I is QA-over-S if the best solution found by QA is at least as good as the one found by solver S, hence \(y^{QA}_{\text {min}} \le y^S_{\text {min}}\).
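These labels can be computed directly from the best cost found by each solver; the sketch below (function name and input format are our own assumptions) derives the QA-over-S and QA-over-all labels for one instance:

```python
def qa_labels(y_min):
    """Compute QA-over-S and QA-over-all labels for one instance.

    `y_min` maps each solver name to its best (lowest) sample cost,
    e.g. {"QA": -10.0, "SA": -10.0, "TS": -9.5, "SD": -8.0}."""
    classical = {s: y for s, y in y_min.items() if s != "QA"}
    labels = {f"QA-over-{s}": y_min["QA"] <= y for s, y in classical.items()}
    # QA-over-all: at least as good as the best of the pool combined.
    labels["QA-over-all"] = y_min["QA"] <= min(classical.values())
    return labels
```

For example, with the hypothetical costs above, QA ties the best classical solver, so every label is true; if SA alone found a strictly better solution, QA-over-SA and QA-over-all would both be false while QA-over-TS and QA-over-SD could still hold.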

For the small instances, we can perform a deeper analysis of the effectiveness because we can explore the full solution space and find the global optimum. In practice, this is done by computing the Hamiltonian of the problem, \(H_p\), a diagonal matrix whose diagonal enumerates the eigenvalues \(\lambda \), sometimes called energies, of all the variable assignments. The eigenvalue is equivalent to the cost function y but does not include possible constant offsets c from the Ising formulation, therefore \(y = \lambda + c\). The variable assignment x associated with an eigenvalue \(\lambda \) can be computed starting from the corresponding eigenvector of \(H_p\). The global optimum of an instance is the assignment \(x_{\min }\) corresponding to the minimal eigenvalue of \(H_p\), \(\lambda _{\min }\). We will refer to the maximum eigenvalue as \(\lambda _{\max }\).

We define a sample with energy \(\lambda \) as \(\epsilon \)-Optimal if the following condition holds:

$$\begin{aligned} \lambda \le \lambda _{\min } + \epsilon \cdot (\lambda _{\max }- \lambda _{\min }) \end{aligned}$$
(8)

The \(\epsilon \)-Optimality condition describes how close the eigenvalue of a sample is to the optimal one. The coefficient \(\epsilon \in [0,1]\) allows restricting the interval within which \(\lambda \) is considered close enough to the optimal eigenvalue \(\lambda _{\min }\). Notice that if \(\epsilon =0\), only the global optimum of the instance meets the constraint in Eq. 8.

We also define a sample x as Hamming-Optimal (h-Optimal) if it differs from some solution \(x_{\text {min}}\) in at most one decision variable. This corresponds to checking the Hamming distance between a sample and a solution of the instance:

$$\begin{aligned} \left| \left| x - x_{\min }\right| \right| _{\text {Hamming}} \le 1 \end{aligned}$$
(9)
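The h-Optimality check in Eq. 9 can be sketched as follows (the assignments are hypothetical):

```python
import numpy as np

def is_h_optimal(x, optima, max_dist=1):
    """Eq. 9: sample differs from some optimal assignment in at most one variable."""
    x = np.asarray(x)
    return any(np.sum(x != np.asarray(x_min)) <= max_dist for x_min in optima)

optima = [(-1, 1, 1)]                        # hypothetical set of global optima
assert is_h_optimal((-1, 1, -1), optima)     # Hamming distance 1
assert not is_h_optimal((1, -1, 1), optima)  # Hamming distance 2
```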

3.3 Meta-learning features

In this section, we introduce the features we define to describe a problem instance. We rely on a selection of metrics used in statistics and probability theory, such as the Gini coefficient (Damgaard and Weiner 2000), the Herfindahl-Hirschman index (Brezina et al. 2016) and the Shannon entropy (Shannon 1948), as well as metrics used in graph theory, such as the spectral gap, the radius of a graph, its diameter, and its connectivity. In total, we compute 107 features, which we describe in detail in Appendices 3 and 4.
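As a sketch, minimal implementations of three of these statistical metrics, applied to a vector of values (not necessarily the exact variants used for our features), could look as follows:

```python
import numpy as np

def gini(v):
    """Gini coefficient of a value vector (0 = uniform, close to 1 = concentrated)."""
    v = np.sort(np.abs(np.asarray(v, dtype=float)))
    n = len(v)
    cum = np.cumsum(v)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def hhi(v):
    """Herfindahl-Hirschman index: sum of the squared shares of each value."""
    shares = np.abs(v) / np.sum(np.abs(v))
    return float(np.sum(shares ** 2))

def shannon_entropy(v):
    """Shannon entropy of the value distribution, in bits."""
    p = np.abs(v) / np.sum(np.abs(v))
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

v = [1.0, 1.0, 1.0, 1.0]
print(gini(v), hhi(v), shannon_entropy(v))  # 0.0 0.25 2.0
```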

The features we compute can be grouped into multiple domains of analysis. Overall, we identify seven domains, of which we describe the three most relevant ones:

  • Logical Ising Graph (LogIsing): This domain uses the Ising formulation of a QUBO problem. It is represented as a graph having one node per problem variable, associated with the corresponding bias b, and an adjacency matrix that corresponds to the coupling matrix J;

  • Embedded Ising Graph (EmbIsing): This domain uses the Ising formulation of a QUBO problem obtained after its minor embedding on the QA. The target architecture is D-Wave Advantage with the Pegasus topology. Therefore, this formulation represents the actual problem solved by the Quantum Annealer, in which multiple qubits may be used to represent one problem variable. This formulation is represented as a graph in the same way as LogIsing;

  • Solution Space (SolSpace): This domain uses the eigenvalues, or energy values, of all possible variable assignments, which can be computed only for the small instances, and aims to describe how they are distributed.

Other domains we identify are (i) Normalized Multiplicity (NorMul), whose features are related to the multiplicity of the eigenvalues of \(H_p\); (ii) Matrix Structure (MatStruct), which contains features related to the distribution of the values of the matrix Q of a QUBO problem; (iii) 25%-SolSpace and 25%-NorMul, which contain the same features of SolSpace and NorMul, but computed by considering only the 25% lowest eigenvalues of \(H_p\), i.e., the energies of the 25% best solutions.

For the LogIsing and EmbIsing domains, we compute several features on different mathematical objects, such as the coupling matrix J, the Laplacian matrix of the corresponding graph, and the bias vector. We call such objects components.

We can also identify sets of features that refer to the same mathematical object but are computed on different domains. For example, both the LogIsing and EmbIsing domains include features computed on the bias. We refer to them as component sets; they allow us to analyze the importance of those mathematical objects in a way that is orthogonal to the analysis by domain. We identified the following component sets: Coupling, Bias, Laplacian, Structural Adjacency (StructAdj), and Structural Laplacian (StructLap), where StructAdj and StructLap gather features related to the binarized versions of the coupling and Laplacian matrices.

3.4 Hyperparameter optimization of the solvers

Since the goal of this study is to compare the effectiveness of different solvers, it is essential to ensure that each solver is using the best hyperparameters. Indeed, it is well known in many fields that comparing methods that are not consistently optimized leads to inconsistent results that cannot be used to draw reliable conclusions (Shehzad and Jannach 2023; Ferrari Dacrema et al. 2021). The same applies in our case.

We optimize the hyperparameters of each solver (QA, SA, TS, SD) on the instance with the largest minor embedding on the D-Wave Quantum Annealer for each optimization problem class. The goal is to identify the hyperparameters that lead the solver to find the variable assignment with the lowest cost y. Once the optimal hyperparameters have been found, they are used to solve all instances of the corresponding problem class. We optimize the hyperparameters separately for the large and small instance sets. To optimize the hyperparameters of the classical solvers, we use the standard QUBO formulation, while for QA we use the embedded QUBO formulation, since the embedding is required only for QA. In this way, we have a fair comparison between solvers: for each of them, we take into account only the steps necessary to solve a QUBO instance.

Optimal hyperparameters of Quantum Annealing

QA has several hyperparameters that can be optimized,Footnote 6 some of which refer to the evolution process as a whole, while others allow fine-tuning it at the level of each individual qubit. Access to the D-Wave Quantum Annealers is limited, so for such a large set of instances we devised a methodology to optimize the hyperparameters we believe are the most important: the annealing time \(t_a\) and the number of samples \(n_{s}\). In order to perform an efficient optimization within the available resources, we define a fixed computational budget T for each instance. Using the default annealing time of \(20\; \mu s\) and drawing 100 samples requires, in the worst case, at most \(37\;ms\). In our experiments, we allocated \(T=70\;ms\) and \(T=300\;ms\) for each problem instance.

The optimization is performed by iterating over 10 values for \(t_a\), approximately equidistant from each other, between \(5\;\mu s\) and \(200\;\mu s\). Given \(t_a\), the number of samples \(n_{s}\) is computed as the maximum value allowed within the computational budget T, according to Eq. 10. For \(T =70\;ms\), \(n_s\) is between 145 and 537 while, for \(T =300\;ms\), \(n_s\) is between 766 and 2826.

$$\begin{aligned} n_{s} = \left\lfloor \frac{T - t_{p}}{t_a + t_{r} + \Delta } \right\rfloor \end{aligned}$$
(10)

The term \(t_p \simeq 15\; ms\) is the time needed to program the instances on the Quantum Annealer, \(t_{r}\) is the read-out time, needed to read the results of the annealing process, and \(\Delta \simeq 20\; \mu s\) is the delay applied after each read-out operation. The read-out time \(t_{r}\) is unknown a priori because it depends on the size of the embedded problem. Based on empirical observation, we use \(t_{r}=75 \;\mu s\) for small instances and \(t_{r} = 150\;\mu s\) for large instances. We choose the \(t_a\) and the related \(n_s\) which provide the sample with the lowest energy. If QA finds samples with the lowest energy for multiple pairs (\(t_a\), \(n_s\)), we choose the pair with the smallest \(t_a\).
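Eq. 10 can be computed as in the following sketch; the timing values are the approximate ones reported above, so the resulting bounds differ slightly from the exact sample counts we report:

```python
import math

def max_samples(T, t_a, t_r, t_p=15_000.0, delta=20.0):
    """Eq. 10: largest number of samples that fits the budget T.

    All times are expressed in microseconds; t_p and delta default to the
    approximate values reported in the text.
    """
    return math.floor((T - t_p) / (t_a + t_r + delta))

# Illustration with the small-instance read-out time (t_r = 75 us) and the
# T = 70 ms budget; annealing times span 5 us to 200 us.
for t_a in (5.0, 200.0):
    print(t_a, max_samples(T=70_000.0, t_a=t_a, t_r=75.0))
```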

Since the results we obtained with the two computational budgets are very similar, we report those for \(T=70\;ms\). The selected hyperparameters are reported in Table 2. Notice that the annealing time \(t_a\) for the large instances is often smaller than for the small instances. This highlights that, for large instances, a larger \(t_a\) does not improve the effectiveness of QA, at least in the range of values we considered and for the number of samples it allows to draw. This result suggests that QA may require additional optimization, for example of the annealing schedule, which is not straightforward and goes beyond the scope of this study.

Table 2 Optimal hyperparameters for QA for each problem class

Optimal hyperparameters of the classical solvers

The hyperparameters of Simulated Annealing (SA), Tabu Search (TS), and Steepest Descent (SD) are optimized with the following procedure. For these methods, we do not use a fixed computational budget because the technology is fundamentally different and, due to the various stages required by QA, it is not trivial to define such a comparison in a fair way. First, we fix the number of samples to \(n_s = 200\), a value comparable to that used for QA: for half of the large instances QA uses more samples than the classical solvers, while for the remaining half the opposite is true. We optimize the hyperparameters with a Bayesian Search of 100 iterations (Victoria and Maragatham 2021; Snoek et al. 2012). The results are available in Appendix 5. For TS, we optimize the number of restarts of the algorithm and the initialization strategy. For SD, there are no hyperparameters to optimize, except for the number of samples, which we have already set. For SA, we optimized the number of sweeps,Footnote 7 the scheduleFootnote 8 and the initial state generator. We noticed, however, that the hyperparameters we found for SA produced worse results than the default ones in our subsequent analysis, which may be due to the sensitivity of SA to some of them. For this reason, we retain the default hyperparameters: 1000 sweeps, a geometric beta schedule, and a random initial state generator.

4 Meta-model training and optimization

In this study, we aim to identify which characteristics of a problem impact the effectiveness of QA. We do this by first training a classification model on the dataset we have created to predict, based on its features, whether QA will solve an instance well or not. Since the classifier is trained to predict the outcome of another experiment, it is called a meta-model. Once the meta-model is trained, we can use it to probe how important the various features are.

We train the meta-models with Random Forest, AdaBoost, XGBoost and Logistic Regression, using as input data either a specific domain (e.g., LogIsing, EmbIsing, SolSpace) or a specific component set (e.g., Bias, Coupling, Laplacian), which are described in Section 3.3. The target labels are described in Section 3.2 (i.e., Optimal, \(\epsilon \)-Optimal, h-Optimal) and they are binary, according to whether the solver meets that effectiveness condition or not.

The first step is to train the meta-model and optimize its hyperparameters to ensure it is effective in predicting the label. To measure the effectiveness of the meta-models, we have to account for the significant class imbalance of the labels toward the negative class, i.e., instances that are not solved well by QA (see Section 5). We use Balanced Accuracy (BA) to evaluate the meta-models because it is robust to class imbalance. Given the number of true positives TP and true negatives TN, and the number of positive labels P and negative labels N in the data, the Balanced Accuracy BA is computed as follows:

$$\begin{aligned} BA = \frac{1}{2}\left( \frac{TP}{P} + \frac{TN}{N}\right) \end{aligned}$$
(11)
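A direct implementation of Eq. 11 can be sketched as follows; on imbalanced data, a trivial all-negative classifier obtains a BA of 0.5 rather than a misleadingly high plain accuracy:

```python
def balanced_accuracy(y_true, y_pred):
    """Eq. 11: mean of the recalls on the positive and negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    P = sum(1 for t in y_true if t == 1)
    N = len(y_true) - P
    return 0.5 * (tp / P + tn / N)

# Imbalanced toy labels: predicting always 0 yields BA = 0.5
y_true = [1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```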

The training and optimization of the meta-models is performed with 5-fold Nested Cross-Validation. First, we create a 5-fold testing split, in which each fold pairs a training portion with a testing one, used to train and evaluate the meta-model. To find the optimal hyperparameters for the meta-model, we further split each training fold with a 5-fold optimization split. This results in 5 optimization splits for each of the 5 training folds of the testing split and is aimed at preventing overfitting of the meta-models. The splits are all stratified with respect to the problem class of the instances, ensuring that every split has an equal distribution of the problem classes. All meta-models are trained on the same data splits, and we perform different splits for the large and small instances. The hyperparameters of the meta-models are optimized with a Bayesian Search (Victoria and Maragatham 2021; Snoek et al. 2012) exploring 50 configurations; we select those that provide the best Balanced Accuracy on the optimization split.
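The nested cross-validation procedure can be sketched with scikit-learn as follows; the feature matrix and labels are synthetic placeholders, the inner Bayesian Search is replaced by default hyperparameters, and, unlike in our setup, stratification here uses the labels rather than the problem classes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))         # placeholder instance features
y = rng.integers(0, 2, size=100)      # placeholder binary labels (e.g., QA-over-all)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in outer.split(X, y):
    # Inner 5-fold optimization split: in the paper, a Bayesian Search over
    # 50 configurations runs here; this sketch keeps default hyperparameters.
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    model = RandomForestClassifier(random_state=0)
    _ = cross_val_score(model, X[train_idx], y[train_idx],
                        cv=inner, scoring="balanced_accuracy")
    # Refit on the whole training fold and evaluate on the held-out test fold
    model.fit(X[train_idx], y[train_idx])
    scores.append(balanced_accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(np.mean(scores))
```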

Once the meta-models have been optimized, we use them to assess which problem characteristic, i.e., feature, is most important. We use Permutation Feature Importance (PFI), which evaluates how much the accuracy of a model drops when the values of a certain feature are shuffled. The idea is that the more important a feature is, the larger the drop will be when its values are shuffled. For each feature, the process is repeated multiple times, and the corresponding importance is the mean of the observed drops in accuracy.
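A minimal sketch of PFI with scikit-learn's permutation_importance on synthetic data, where by construction only the first feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # only feature 0 determines the label

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature n_repeats times and record the mean accuracy drop
result = permutation_importance(clf, X, y, n_repeats=10,
                                scoring="balanced_accuracy", random_state=0)
print(result.importances_mean)  # feature 0 should show by far the largest drop
```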

5 Results and analysis

In this section, we provide the most relevant insights of our analysis regarding the effectiveness of QA. We have three goals: (i) determine which classes of problems are more difficult to solve with QA; (ii) understand whether it is possible to predict the effectiveness of QA based on the features we have identified; and (iii) discover the domains, the component sets, and the features that impact the effectiveness of QA. To do so, we describe the results obtained by solving the instances with QA and with the other classical solvers. Then, we describe the results of the validation of the meta-models and of the Permutation Feature Importance performed on their features. We publish online a dataset with all the instances we generated, the features we computed and the samples obtained for each solver.Footnote 9

Table 3 Comparison on the percentage of problem instances in which QA is at least as effective as a specific solver (QA-over-SA, QA-over-TS and QA-over-SD) or as all of them combined (QA-over-all)

5.1 Effectiveness of QA for large instances

In this section, we discuss the effectiveness of QA compared to the other classical solvers (SA, TS and SD) on the large problem instances. Table 3 reports the results on each problem class according to the labels we defined in Section 3.2, i.e., whether the best sample found by QA is at least as good as that found by a specific solver (QA-over-SA, QA-over-TS and QA-over-SD) or by all of them combined (QA-over-all).

As a general comment, we can observe that for fewer than half of the problem classes (4 out of 9) QA effectively solves more instances than at least one classical solver, while for most of the problem classes (7 out of 9) QA is more effective than at least one of the classical solvers on some particular instances. However, if we combine all classical solvers, QA is more effective only in three problem classes, and mostly to a limited extent. Only for Max-Cut does QA show consistently high effectiveness. These results confirm that the effectiveness of QA depends on the problem class, as is the case for classical solvers, which is consistent with what was observed in previous studies (Yarkoni et al. 2022; Jiang and Chu 2023; Huang et al. 2023). If we compare QA and the classical solvers across the problem classes, we can see that QA is very frequently more effective than SA, while it is more effective than TS or SD only on some specific problem classes. As a result, we conclude that comparing QA only with SA, without considering other solvers, is not a good practice to evaluate the effectiveness of QA.

Regarding the characteristics of the problem classes, a first observation is that QA is more effective on problems that do not require penalties to represent constraints: Max-Cut, Community Detection, and Number Partitioning. This suggests that the presence of constraints is a factor that makes a problem more difficult to solve with QA. The reason may lie in the type of quadratic terms introduced by the penalties, which opens new research directions on whether one could use a different formulation of the same constraint that is more suitable for QA (Mirkarimi et al. 2024). Furthermore, recall that Max-Cut, Maximum Independent Set, and Minimum Vertex Cover share the same Ising coupling matrix J, up to a multiplicative factor, but have different bias vectors b. Max-Cut is the only one among them that is solved effectively by QA, suggesting that the bias structure plays an important role as well.

Lastly, a high number of quadratic terms (i.e., a dense coupling matrix J) does not always negatively affect QA. In particular, both Community Detection and Number Partitioning have a dense coupling matrix, but still \(13\%\) of Community Detection instances and \(30\%\) of Number Partitioning instances are solved effectively with QA.

5.2 Effectiveness of QA for small instances

In this section, we discuss the effectiveness of both QA and the other classical solvers (SA, TS, and SD) on the small problem instances. For these instances, we can compute the cost associated with all variable assignments and the global optimum. We do this by computing the Hamiltonian of the problem \(H_p\) and using its eigenvalues (i.e., its diagonal). We also compute the maximum energy values needed to assess the \(\epsilon \)-Optimality for the samples of QA.

Table 4 Fraction of the instances that are solved well according to a certain effectiveness condition (see Section 3.2)

Table 4 compares the effectiveness of the solvers according to the labels defined in Section 3.2, i.e., if the solver finds the global optimum (Optimal), if the energy of the best sample is close to that of the global optimum (for \(\epsilon \)-Optimal, we use \(\epsilon = 10^{-5}\)), if the variable assignment of the best sample has a Hamming Distance of at most 1 with any of the global optimum solutions (h-Optimal).

Consistently with what was observed for the large instances, QA is less effective than the classical solvers on all the effectiveness conditions. If we compare \(10^{-5}\)-Optimal and h-Optimal, we can see that QA finds more h-Optimal samples than \(10^{-5}\)-Optimal samples, as opposed to the other solvers. This indicates that QA more easily finds samples that are close to the optimal ones in terms of Hamming distance rather than energy. Notice that in this experiment SD is the most effective solver; this may be related to the small size of the instances, which may make them relatively easy to solve with simple strategies.

Table 5 Fraction of the instances in which the solvers are able to find the global optimum (i.e., Optimal)

As done for the large instances, we compare the effectiveness of the solvers on the problem classes. Table 5 shows the fraction of problem instances in which the solver finds the global optimum (i.e., Optimal). Overall, as opposed to what we observed for the large instances, on the small ones QA is never more effective than the classical solvers. QA seems to be more effective for problems that are defined over graphs (Max-Cut, Maximum Clique, Community Detection) compared to the ones that are not. Note, however, that on two graph problems, Minimum Vertex Cover and Maximum Independent Set, QA performs significantly worse than the classical solvers. In the analysis of the large instances (see Section 5.1), we observed that the bias of the problem seems to play a role in affecting the effectiveness of QA. This observation is confirmed here as well, since QA is much more effective in solving Max-Cut problems than Maximum Independent Set and Minimum Vertex Cover ones. We can also observe that the effectiveness of QA is quite poor for Set Packing and Feature Selection, being significantly behind the classical solvers.

Interestingly, on Max-Cut and \(4 \times 4\)-Sudoku problems almost all the solvers find the global optimum, while for the Quadratic Knapsack hardly any instance can be solved optimally at all, with the most effective solver being TS with 4% of the instances solved optimally.

As a second analysis, we study the effectiveness of QA in sampling solutions with a variable assignment that is close to one of the optimal ones in terms of Hamming distance (i.e., h-Optimality). The results reported in Table 6 are consistent with the previous ones, in which we assessed the ability of the solvers to find the global optimum (see Table 5). We should note that QA exhibits much better effectiveness when measured in this way, being able to sample solutions close to the optimal ones in the majority of cases. For example, the effectiveness on the Number Partitioning problem goes up from 33 to 96%, while for Minimum Vertex Cover it goes from 17 to 58%. The results also confirm that QA is quite effective for problems that do not require penalties to model constraints (Max-Cut, Community Detection, and Number Partitioning). On the other hand, QA is still ineffective for the Feature Selection problem. Quadratic Knapsack remains very challenging for all the solvers.

Table 6 Fraction of the instances in which the solvers are able to find a variable assignment having a Hamming distance of at most 1 with respect to any optimal solution (i.e., h-Optimal)
Fig. 2
figure 2

Bar plot showing the Balanced Accuracy of the meta-models which predict whether QA is at least as good as all the classical solvers combined (QA-over-all) on the large instances. The domains or component sets the model is trained on are listed on the x-axis. Domains and component sets are ordered according to the Balanced Accuracy of the best related meta-model, in descending order. The vertical black segments on the top of each bar represent the standard deviation of the meta-models

Fig. 3
figure 3

Bar plot showing the Balanced Accuracy of the meta-models which predict whether QA will find the global optimum (Optimal) on small problem instances. The domains or components the model is trained on are listed on the x-axis. Domains and component sets are ordered according to the Balanced Accuracy of the best related meta-model, in descending order. The vertical black segments on the top of each bar represent the standard deviation of the meta-models

5.3 Meta-models and feature importance analysis

The goal of our analysis is to identify the domains and the component sets whose features allow the meta-models to predict well the effectiveness of QA. We limit our analysis to the meta-models trained to predict whether QA will be at least as effective compared to all the classical solvers combined (target label QA-over-all) for the large instances. We also analyze the meta-models predicting whether QA will find the global optimum on the small instances (QA-Optimal). The full results are available in the online appendix.

The first important question is whether it is possible to train meta-models able to predict the effectiveness of QA. The results in terms of Balanced Accuracy are shown in Fig. 2 (target QA-over-all) and in Fig. 3 (target QA-Optimal) for the two most effective classifiers. We can immediately see that for several domains or components of the large instances the Balanced Accuracy is approximately 85%, while for the small instances it often exceeds 90% or even 95%. This, combined with the fact that we selected a heterogeneous set of problem classes and instances that are solved by QA with different degrees of success, allows us to conclude that it is indeed possible to build accurate meta-models to predict the effectiveness of QA. These meta-models can then be used for many purposes, among which studying the behavior of this technology.

Table 7 Best five features, ordered according to feature importance, of AdaBoost and XGBoost meta-models trained with LogIsing and EmbIsing domains and with Bias and Coupling component sets

We now analyze which domains or components produce the best meta-models. Concerning the domains, the most informative ones are those related to the graph structure of an Ising problem (LogIsing and EmbIsing): based on the high accuracy of the corresponding meta-models, we conclude that they are very informative on the effectiveness of QA. Among the domains, the distribution of the values in the Q matrix of the QUBO problem (MatStruct) is less informative; this can be explained by the fact that the problem actually solved on the quantum device is represented as Ising and not as QUBO.

Secondly, if we consider the domains related to the distribution of the energies of an Ising problem, which are only available for the small instances (SolSpace, NorMul, 25%-SolSpace, 25%-NorMul in Fig. 3), it is again possible to build meta-models that predict well the effectiveness of QA, with a Balanced Accuracy well above 90%. This result shows that the effectiveness of QA also depends on the distribution of the energies of the problem, i.e., on how the cost y of the QUBO problem is distributed. Notably, features based on the solution space achieve a Balanced Accuracy comparable to that of the other domains, indicating that both are very informative. This is a particularly good result because the features of the other domains are relatively easy to compute once the problem has been formulated as QUBO.

If we look at the orthogonal grouping of the features, by component sets, we notice that with the Bias, Coupling, and Laplacian component sets it is possible to train at least one meta-model with good Balanced Accuracy (higher than \(80\%\)). On the other hand, Structural Adjacency (StructAdj) and Structural Laplacian (StructLap) are the least informative and, in particular for large instances, do not allow building a meta-model better than random guessing. Since both StructAdj and StructLap are computed on the binarized problem structure, they only account for how the problem variables are connected and not for the coefficient values; this means that the structure of the problem alone is not informative at all. The bias and the coupling of an Ising problem, together with the Laplacian matrix of the graph of the Ising problem, are the most informative on the effectiveness of QA.

In general, we can confirm that the characteristics of the problem are important to determine the effectiveness of QA, but one must account for the actual coefficient values of the problem and, preferably, use features derived from the Ising formulation. This could open new research questions on whether one can change the formulation of a problem so that its coefficients have a different distribution that is more adequate for QA. Furthermore, since the coefficients are a function of the problem class and the data, it may be possible to identify which types of graph topologies may be more or less difficult to tackle based on the distributions of the coefficients that they would produce.

We now move to analyzing which specific features are the most important among the ones we identified. We limit our feature importance analysis to the XGBoost and AdaBoost meta-models trained on two domains (LogIsing and EmbIsing) and on two component sets (Bias and Coupling). For the meta-models related to small instances, we also include the SolSpace domain in the analysis. The best five features of each of these domains and component sets are listed in Table 7 (target QA-over-all, large instances) and in Table 8 (target Optimal, small instances).

Table 8 Best five features, ordered according to feature importance, of AdaBoost and XGBoost meta-models trained on given domains and component sets

Domains feature importance

We consider in particular the domains LogIsing and EmbIsing. The majority of the most important features are related to the bias and to the coupling of the problem, which is consistent with our previous analysis. Some features are related to the distribution of the values of the bias and the coupling (Gini index, Shannon entropy, Herfindahl-Hirschman index), while other features are related to precise values of these mathematical objects (minimum value, maximum value).

In particular, notice that Bias gini index (related to the distribution of the bias), Bias condition number (related to the values of the bias) and Coupling max eigval (related to the eigenvalues of the coupling) are among the best features in the majority of the meta-models. We deduce that both the distribution and the values of the bias are important to study the effectiveness of QA, together with the eigenvalues of the coupling. This is again an interesting observation because it would allow us to identify in advance whether a problem could be well-suited for QA.

Notice also that the number of qubits needed to embed the problem on the Quantum Annealer (Graph Structure qubits) is important, for one meta-model, to predict the effectiveness of QA for the large instances, but not for the small instances. This difference may be linked to the fact that, for the small instances, the number of qubits required after the minor-embedding process is limited and therefore has a lower impact.

We also analyze, for these two domains, the least important features. The majority of them are related to the structural adjacency and to the structural Laplacian matrix. This confirms that the structure of a problem alone is not sufficient to determine the effectiveness of QA; we must also consider the coefficient values between the variables.

For what concerns the SolSpace domain (see small instances in Table 8), notice that both the meta-models have the same top three features, although in a different order: such features are mostly related to the distribution of the eigenvalues of the problem (gini index and grouped hhi), which plays, therefore, a role in determining the effectiveness of QA.

Component sets feature importance

If we consider the Bias component set, the majority of the most important features are related to the distribution of the values of the bias, in both the LogIsing and the EmbIsing domain. In particular, observe that the Gini index and the condition number of the Bias, computed in both domains, are among the five most important features, as they were for the meta-models trained on the LogIsing and EmbIsing domains. This corroborates our statement that the distribution and the values of the bias are important to determine the effectiveness of QA.

If we instead consider the Coupling component set, we observe that most of the important features are related to the values and to the eigenvalues of the coupling. Features computed in both the LogIsing and EmbIsing domains are important, and the most important ones are related to the eigenvalues of the coupling. Notice that the spectral gap and the Gini index of the coupling, computed in both the LogIsing and EmbIsing domains, appear among the most important features of almost all meta-models. We conclude that the eigenvalues of the coupling and their distribution are important to analyze the effectiveness of QA.

Feature correlation with target label

In the previous analysis, we identified the features that are the most important for the meta-models. Some of these features, furthermore, are important also when computed in different domains and for different meta-models. These features are, for the LogIsing and EmbIsing domains: Bias gini index, Bias condition number, Coupling max eigval; and for the SolSpace domain: gini index, grouped hhi, and third quartile. We now want to give an intuition of which values of these features determine a low or a high effectiveness of QA. For each feature we identified, we compute the Spearman rank correlation with the targets of the meta-models (QA-over-all and QA-Optimal).

For most of these features, the Spearman rank correlation does not highlight a strong relation with the target (in general, its absolute value is below 0.40). This suggests that the complexity of the underlying behavior might require more powerful tools. Two exceptions are the gini index and grouped hhi in the SolSpace domain, which have a Spearman rank correlation of \(-0.596\) and 0.537, respectively. This means that as the Gini index of the eigenvalues increases, the Ising problem becomes less difficult to solve. On the other hand, high values of the grouped hhi of the eigenvalues imply that the Ising problem is difficult to solve with QA.
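The Spearman rank correlation between a feature and a binary target can be computed with SciPy as in the following sketch (synthetic data, for illustration only):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Hypothetical data: a feature that is monotonically related to a binary target
feature = rng.normal(size=50)
target = (feature + rng.normal(scale=0.5, size=50) > 0).astype(int)

# spearmanr is rank-based, so it captures any monotonic relation,
# not only linear ones
rho, pvalue = spearmanr(feature, target)
print(round(rho, 3))
```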

6 Conclusions

In this paper, we have studied the effectiveness of QA with an empirical approach based on meta-learning models.

First, we select a pool of ten optimization problems that can be formulated as QUBO. Then, we generate more than five thousand instances with different problem sizes and structures. In particular, we create two sets, one containing large instances and another with small ones, for which we can also study the properties of the whole solution space.

As a second step, we defined a set of more than a hundred features to describe each problem instance. The features are heterogeneous, based on graph theory or on metrics widely used in statistics, probability theory, and economics, and they account for the structure of the problem, its coefficients, and its solution space. We gathered all the features into a meta-learning dataset, which we share on GitHub for further research.

Third, we compared the effectiveness of QA against three classical solvers: Simulated Annealing (SA), Tabu Search (TS), and Steepest Descent (SD). We observed that QA is frequently less effective than the classical solvers, for both the large and the small instances, except for specific problems. In particular, QA more effectively solves problems whose formulation contains no constraints (Max-Cut, Number Partitioning, and Community Detection).

Lastly, we trained different classification algorithms to predict whether QA will solve an instance effectively, and showed that it is possible to do so accurately. We then used the meta-models to probe the behavior of QA. In particular, by analyzing the feature importance of the meta-models, we observed that the distributions of the bias and coupling values of a problem play a key role in determining whether QA will be effective in solving it.

In conclusion, we successfully applied an empirical analysis of the effectiveness of QA based on meta-learning. Possible future directions include analyzing how different distributions of the coupling and bias values relate to the effectiveness of Quantum Annealing. Such results could be correlated with specific kinds of problems. For example, problems characterized by graphs with a power-law degree distribution (e.g., problems involving social networks) may be more or less difficult to tackle than those characterized by regular graphs. This can be done, for example, by defining new features that quantify how much the distribution of the bias and coupling values differs from a reference distribution, e.g., a Gaussian or a uniform distribution. Thanks to its generality, the methodology can easily be extended to other heuristic solvers of QUBO problems, such as the Variational Quantum Eigensolver (VQE) (Fedorov et al. 2022) and the Quantum Approximate Optimization Algorithm (QAOA) (Farhi et al. 2014), providing a useful tool to further our understanding of how to use these quantum algorithms effectively.
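One way such distribution-distance features could be defined is via the Kolmogorov-Smirnov statistic against a reference distribution. This is a hedged sketch of the future direction mentioned above, not part of the paper's released feature set; the helper name is hypothetical.

```python
import numpy as np
from scipy import stats

def distribution_distance_features(values: np.ndarray):
    """Hypothetical features: KS distance of bias or coupling values
    from a Gaussian and from a uniform reference distribution."""
    # Standardize, then compare against the standard normal CDF.
    z = (values - values.mean()) / values.std()
    ks_gauss = stats.kstest(z, "norm").statistic
    # Min-max rescale to [0, 1], then compare against the uniform CDF.
    lo, hi = values.min(), values.max()
    u = (values - lo) / (hi - lo)
    ks_unif = stats.kstest(u, "uniform").statistic
    return ks_gauss, ks_unif
```

Both statistics lie in [0, 1]; smaller values mean the empirical distribution is closer to the corresponding reference distribution.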

Supplementary information. The meta-learning dataset with all the problem instances, the corresponding graphs and features, as well as the samples obtained with each solver, can be accessed here: https://github.com/qcpolimi/QA-MetaLearning