Statistical quality assessment of Ising-based annealer outputs

The ability to evaluate the outcomes of quantum annealers is essential for such devices to be used in complex computational tasks. We introduce a statistical test of the quality of Ising-based annealers' output based on the data only, assessing the ground state's probability of being sampled. A higher probability value implies that at least the lower part of the spectrum is a part of the sample. Assuming a plausible model of the univariate energy distribution of the sample, we express the ground-state energy and temperature as a function of cumulants up to the third order. Using the annealer samples, we evaluate this multiple times using Bootstrap resampling, resulting in an estimated histogram of ground-state energies and deduce the desired parameter on this basis. The approach provides an easily implementable method for the primary validation of Ising-based annealers' output. We demonstrate its behavior through experiments made with actual samples originating from quantum annealer devices.


Introduction
Optimization problems have increasing importance in many fields [1], which is driven by several factors, including the demand for competitiveness, better use of resources, and the increasing complexity and interconnectivity in the contemporary world. However, many problems of practical relevance are computationally hard [2] [3]. Quantum computational devices offer a promising perspective in handling such difficulties [4]. These include quantum annealers, such as the D-Wave machines [5]. In principle, such machines could solve a variety of (hard) optimization problems "naturally" by finding low energy eigenstates encoding the solution [6,7]. Therefore the development of quantum technology has the potential to efficiently solve complicated discrete optimization problems by encoding them into the energy of a physical system. Consequently, offering the optimal solution as one of the ground states (since there exist Ising formulations of NP-hard problems [8]). Indeed, during adiabatic evolution [9], such a system reaches a ground state "naturally". Therefore in principle, an optimal solution of the encoded problem can be read out from the state deterministically.
Quantum annealers such as the D-Wave machine are approximate physical realizations of an adiabatic quantum computer, i.e., they are based on a real physical system. They realize a fixed topology of couplings, e.g., the Chimera or Pegasus graphs. Thus the optimization problem must be embedded into this topology either as an induced subgraph or in a redundant manner using multiple physical quantum bits to represent a logical one. This procedure is called minor embedding and even though it is often doable relatively simply, finding an optimal embedding is a hard computational problem itself.
No real quantum system can be entirely separated from its environment: the phenomena such as the heat exchange [10] or decoherence [11] cannot be wholly neglected in the case of a physical quantum annealer. This leads to a noisy version of the adiabatic evolution [12]. Moreover, the measurements performed at the last stage of computations are not perfect either. As the dimensionality of the underlying Hilbert space grows exponentially with the number of qubits [13], the aforementioned issues affect the results to an even greater extent. To tackle these problems in a quantum annealing device, the adiabatic evolution is run repeatedly, each run followed by a readout. The results form a statistical sample of configurations and objective values, possibly containing optimal solutions, i.e., the minimum energy states. The quantum annealer implements probabilistic heuristics, which are still potentially valuable for addressing specific hard computational problems.
The setting of the annealing procedure is challenging, even in the case of an ideal system. The parameters of the process depend on the minimum gap between the instantaneous ground state and the rest of the spectrum during the evolution, which in general, cannot be determined [14]. Therefore, in the observed result, which is the output from a physical quantum annealer, the elements of the output sample can have significantly higher energy than the ground states. Thus, whether any ground state has been sampled is an essential question in practical applications. In many cases, only the ground state is useful in physical applications, while in optimization problems, low-energy excited states are often also valuable. Therefore, obtaining states close to the ground state in the sample also bears significant relevance. The estimation of the success probability of having a ground state (or states close to it in energy) in the sample, has also been addressed in benchmarking literature, see e.g. [15,16,17]. In the state of the art benchmark scenarios, the addressed optimization problems are specifically designed, i.e. the ground state is usually known in advance, and the size of a problem is typically small. The success probability is then empirically investigated based on solving various benchmark problems and comparing the result with known solutions. Our idea is different: we propose the complementary method of estimating the ground-state energy and testing the quality of the solution against containing any ground state solely using statistical analysis of the output of the Ising machine. We build on certain generic assumptions coming from the statistical description of the system, and therefore our method is more suitable for larger problems. This is in line with our intention to use it in application-driven scenarios. We will, however, use the minimum energy known from other calculations to validate our method.
Our goal is to use the entire sample resulting from a quantum annealer (or a similar device) to estimate the likelihood of having a ground state or at least states from the low-energy part of the spectrum in the sample. We remark here that as quadratic binary optimization problems are NP-hard, polynomialtime bounds on the optimal solution constitute important results in the classical literature on the problem, see e.g., the works of Nesterov [18] or Ye [19]. Those results also provide hints to finding the distance between a particular solution and the optimal one. In a similar spirit, we can consider probabilistic solvers.
In [20] it has been proposed that in future quantum annealer designs, to improve the convergence of solutions into the ground state with the increase of the number of spins, the temperature needs to be scaled down. For high enough temperatures, [20] assumes power-law scaling of the heat capacity with temperature. Such scaling is typically a near-phase-transition behavior, similar to that described by critical exponents. In [20] the relation between the first three cumulants of the Ising energy output and the heat capacity has been calculated using the Boltzmann distribution and the fluctuation-dissipation theorem. We will use this relation to estimate the ground state energy from the spectrum.
Our research is tied to Extreme Value Theory [21] where limits for low values are estimated from a particular probabilistic model of the sample. However, instead of estimating the extreme-value distribution, we assume its form from the underlying Ising model and use it to estimate the minimal values.
The paper is organized as follows. In Section 2 a theoretical introduction of our model is presented. In Section 3 the results of our experiments using data both from simulators and real D-Wave machines are discussed. In Section 4 the results are summarised and conclusions are drawn. Appendix A contains more details on performing the Metropolis-Hastings simulations.

Theoretical model
Quantum annealers are based on the Ising model defined by the following Hamiltonian: where V is a set of spins (vertices), and E describes the topology of the processor. Furthermore, s i ∈ {−1, 1} is the spin value at the i'th vertex of E, J i,j is the coupling between spin at vertex i and spin at vertex j, and h i is the local field acting on spin at vertex i. In certain cases it is more convenient to deal with quadratic unconstrained binary optimization problems (QUBOs): where x is a binary vector of decision variables, and Q is an arbitrary matrix which can be, without loss of generality, symmetric or upper triangular. There is a one-to-one relation [22] between Ising problems of finding the ground state of the Hamiltonian in Eq. (1) and the QUBO problems in Eq. (2), as their objective function's values depend linearly on each other. The physical quantum annealers are expected to reach a ground state of the classical Ising model, so even for a QUBO model, the energy samples will reflect Ising spectra. Therefore, our results will also apply to the output of the QUBO formulations.

Background
Let us now briefly recapitulate the considerations in [20], where the authors, using the techniques of statistical physics, analyze the energy spectrum of the Ising model under the assumption of the Boltzmann distribution. Concerning the effect of finite temperature, the analysis concludes that the probability of sampling the ground state goes to zero exponentially with the number of spins N . Thus, scaling down the temperature can help in regaining the success probability.
In the considerations of [20] there are two types of scaling of the specific heat with the temperature assumed: power-law for low values of β = 1 T and exponential for high values (albeit it is claimed as a general assumption that the system is not tuned to a phase transition point). We focus on the lower range, i.e. relatively high temperatures, as we expect our D-Wave samples to fall into this region. Hence, we will adopt the assumption that specific heat behaves as where A is the coefficient of the particular instance, and α is a parameter of the model. The spread of energies is measured as a standard deviation of the sample of energies Following [20] the mean of the energy distribution of the Ising system can be expressed as: where E 0 is the ground state energy. Following [20] asymmetry can be written as: Our model is based on assumptions expressed in Eqs. (3), (4), (5), and (6). Eq. (3) describes the scaling, which is independent from the underlying distribution. Remaining assumptions stem from our assumption of the Boltzmann distribution. In the case when a different distribution is assumed (as quantum annealer may also operate in the non-equilibrium region or even in the coherent one [23]), these assumptions should be tested separately. In this way a different model version can be obtained, which may outperform the original one if the assumed distribution is closer to the real one. The exact distribution is, however, unknown. Therefore, we assume the Boltzmann distribution and as shown later, with this assumption, the performance of our method is acceptable.

Results
From the assumptions presented in Section 2.1, we can derive relations, which will be used to asses validity of the Ising-based output. Using Eq. (4) (see also [20]), we are given the formula for the variance: Next, with the use of Eq. (7) and solving Eq. (5), we obtain: Analogously, by solving Eq. (6), we get Finally combining Eq. (8) and Eq. (9) we obtain the formula for the ground state energy, which can be used as an estimator In the similar manner the parameter β can be estimated The energy of the ground state and the parameter β, under our assumptions, can be expressed as a function of cumulants σ, η and H , which will be later estimated from samples. Both in Eq. (10) and Eq. (11) the asymmetry η(H) can be computed as normalised 3rd cumulant c 3 of the sample of energies, i.e.:

Estimation error analysis
The output of the quantum annealer (or its simulator) is an n-sample of energies and configurations. Our method uses only the energies as input. The goal is to estimate the ground state energy using methods of moments, i.e., via cumulants computed from the data. Next, we will compare the estimate with the minimum value from the sample in order to assess the likelihood of the event that the sample contains the ground state energy.
To construct the distribution of estimated ground-state energies E 0 in order to obtain a significance threshold, we use Bootstrap resampling [24]. In details, let H 1 , H 2 , . . . , H n be a sample, and E 0 (H 1 , H 2 , . . . , H n ) be the estimate of the ground-state energy via Eq. (10). Then from H 1 , H 2 , . . . , H n we sample n items with replacements, i.e. repeating some of the elements optionally. Let us denote the resulting samples by H Repeating this procedure S times we obtain the desired estimated distribution of E 0 -s.
To validate the Bootstrap approach, we compute the standard deviation of E 0 by k-statistics approximation and standard error calculus. In order to do so, it is convenient to combine Eq. (10) with Eq. (12), then: Let c k be the non-normalised k-th order cumulant; we will omit the argument H of all cumulants in what follows. The estimation error of H can be neglected in comparison to the estimation error of σ 2 and c 3 , as the estimation error of moments (and cumulants) tends to increase with their degree. To estimate the standard deviation of cumulants' estimation, we approximate the cumulants with k-statistics, which is valid for large n [25]. The standard deviation of the cumulants in the argument is: and The contributions of the particular cumulants to the standard errors of E 0 are: Finally, assuming that estimators are independent, the standard deviation of E 0 can be estimated as

Physical assessment of validity
The results in [20] serving as the basis for our considerations were obtained by considering and analyzing D-Wave solutions for the random instance with couplings J = ±1 as well as 3-regular 3-XORSAT instances and planted (droplet) instances on the Chimera graph. In this context the assumption in Eq. (3) results from the scaling c(β) ∼ T α which is appropriate for β = β c (such a scaling is a phase-transition-like behaviour in β), recall also that β = 1/T . Here β c is the quasi phase transition point. An actual phase transition would take place in a system of infinite volume. However, as discussed in [26], for finite volume a phase-transition-like behaviour occurs instead. The larger the system size, the sharper the dependence of c(β) on β [27,28]. We expect a real annealer to be in the β < β c regime as the thermal noise significantly impacts the machine's output.
Consider a general probabilistic approach to the Ising model with couplings J i,j -s and local fields h i -s. However a phase-transition-like behavior can also be expected under such circumstances, in a less rigorous form. A simple model of variable coupling can be [29] where 's are drawn randomly according to a chosen probability distribution.
In [29], authors show, that the presence of the disturbance in such a form, indeed results in a flattening of the exponential scaling. One can claim that for small systems and high variation of couplings, c(β) depends weekly on β. Such a phenomenon can possibly affect, for example, the parameter α, making it more instance-dependent in such a case. Let us also remark that the heat capacity per node depends strongly on the number of connections of the node. In [20] authors assumed that the number of couplings scales linearly with N and also, there were additional assumptions on the coupling strength. These do not necessarily hold for our problem instances. As both α and β c may vary with the degree of connectivity of the graph, the model may behave worse for a problem graph with a highly variable degree of connectivity.
The more detailed analysis of the parameter α will be a subject of further research: a more thorough investigation should include α parameter fitting for a particular type of instance and consider α's estimation error. Here, for demonstration, we will use the α = 0.19 -0.38 parameter values, already proposed in [20], as well as some lower value of α for wider sensitivity analysis.

Experiments
In what follows, we demonstrate the procedure introduced in the previous Section on particular examples. The inputs will be Ising annealer samples, and we shall test whether they contain the ground-state energy.
The described method leads to the following algorithm: Input: • Parameter α.
For each H (j) estimate the minimal value E where X is a random variable with the distribution µ (H) ; Output: The above procedure provides an estimate of the probability of having a ground state in the sample. A higher parameter value also implies probabilistically that at least a lower part of the energy spectrum has been sampled. The estimation in Step 3 assumes the theoretical model from Section 2 and depends on the original sample (H) and on the result of the draw from Step 2. This estimated probability p val of having sampled a ground state will be referred to as the "pvalue" in what follows. The algorithm was implemented in Julia programming language, and the source code is publicly available [30].

Artificial data
Our first experiments were performed on a problem instance of 198 logical bits from the field of railway operations research, described in detail as a case 1 example in [31] (consult Section IV A therein for a problem description and Section III C for its QUBO formulation).
The samples have been generated using the Metropolis-Hastings algorithm. We refer to these data as artificial as they do not come from a physical solver. The Metropolis-Hastings algorithm has β MH as a parameter playing a similar role as the β in our model in the range of model validity. It is tied to the temperature of the simulated system and thus affects the quality of the solution. We expect β MH ≈ β; the latter estimated in the way described in Section 3.3. The results on β estimation and the scaling of η(H) vs. β M H , see Eq. (9), are presented in Fig. 1. As it was expected in Section 2.4, we have a limit on β MH , below which scaling η ∝ β α/2 MH , and Eq. (3) hold. To find the threshold, we can analyze the scaling of η(H) and determine the threshold value of β M H above which it stops following the model in Eq. (9). It has been done manually in the case of Fig. 1 (left panel), but it could be done automatically.
In the case of this problem instance, the optimum is known. Hence, we can plot both the difference of the minimum energy state in the sample H min from the ground state E 0 , and also the p-value as a function of β MH . To assess the quality of solutions, we use the relative difference between the best solution Figure 2: The minimal energy and the p-value for the problem instance addressed in Section 3.1. We have used S = 1000 for bootstrapping. In the model validity region β M H < 1.375, we can observe 9 solutions that are far from the ground state; those have an almost zero p-value and one solution that is near the ground state with the p-value of ≈ 0.3. This is as expected. The choice of the α values is discussed at the end of Section 2.4. Observe that the particular choice of the value of parameter α does not significantly affect the results. Note also that there was no minor embedding in this case. We have used n = 1000 samples from the Metropolis-Hastings algorithm's output.
H min and the ground state E 0 : which is equal to zero only if the ground state energy value is in the sample. (The division by |E 0 | can be omitted if it is more convenient to check the absolute differences.) Fig. 2 clearly demonstrates that for β MH -s within the model validity region, the p-value can be used to distinguish between better and worse solutions. Finally, in Fig. 3 we have compared the means and the standard deviations of the Bootstrap histograms and found them to coincide with those obtained by the direct calculation (with no bootstrapping) via Eq. (10) and Eq. (17).

Analysis on D-Wave data
We have tested our algorithm on energy spectra returned by a D-Wave quantum annealer. The addressed optimization problems were the aforementioned set of practical instances as well as the a set of droplet instances. As an additional set of examples, we present results on randomly generated exact set cover problems which we will be described later. The first set contains instances from the field of railway operations research [31], see Section IV therein for a more detailed description. The second set contains droplet instances characterized by artificially "planted" ground states [32], designed to be difficult for an annealer. Droplets are specially designed to benchmark various annealers; they are not motivated by other optimization problems. Note that the two types of instances differ in the sense of the variability of couplings and local fields (this issue is important as discussed in Section 2.4). In the case of practical instances, there are many couplings with the same value reflecting a particular set of constraints in the actual problem, which does not hold for the droplet instances. In the case of the practical instances coming from [31] we, therefore, plot our p-values along with the difference of the energy from the true ground state, see Eq. (19), as a function of the annealing time. This is done in order to demonstrate the potential usefulness of the p-value in the estimation of whether the right annealing time had been chosen. (We will return to the estimation of the β parameter later.) The figures Fig. 4a, Fig. 4b, and Fig. 5 all confirm that the p-value has the expected behavior: in most cases it reflects whether the best solution from the sample is close to the ground-state energy, as it can be expected from a probabilistic discriminator. The useful solutions: those near the ground state (i.e. whose energy is within 10% reach of the minimum, see Fig. 4b) give high p-values (i.e. above 0.5), whereas solutions far from the ground state yield low p-values. For the set of our practical instances solved on D-Wave, the usefulness of the method has thus been demonstrated.
The results for the droplet instances of 2048 physical quantum bits are presented in Tabs. 1a, 1b, the results for small droplet instances of 128 physical quantum bits are presented in Tabs. 2a, 2b. As one can observe, for some droplet instances and experimental settings, the model works well (black numbers), and  [31] (Section IV C therein). The dependence of the p value and ∆H, the relative difference between the best D-Wave solution and the actual minimum (c.f. Eq. (19)) is plotted against the annealing time. The problem sizes are left panel: 198 logical bits, right panel: 48 logical bits. Chimera minor embedding was used, as described in [31], with coupling strength css = 2; n = 1000 D-Wave samples were involved. In most cases, the higher p-value reflects the lower ∆H. for some of them (red numbers) it does not. In Tab. 1a there are points where p-value based test concludes positively: the high value suggests that the ground state should have been sampled, even though it is not the case. Therefore we consider these as "false positives". In Tab. 2a we have "false negatives", these are points where p-value based test falsely gives a negative results; it incorrectly indicates the absence of the ground state from the sample. It will be shown in Section 3.3 that most of the false (red) results can be filtered out by analysing the estimated temperature of the annealer, as such an estimate will not be physical for most of problematic data.     Table 1, the β estimation fails in some of these cases, which indicates the model validity issue.
From the above results, one can conclude that, in general, if the output is far from the ground state (e.g., see the worst results on each figure or table), we have almost zero p-value. Hence, a high p-value can be a valid indicator indicating that the solution is in the low energy part of the spectrum. (Recall, however that in the case of droplet instances, we are interested only in the ground state, whereas low excited states can also be of interest in some other problems.) We can consider such a procedure as a primary valuation of the solution.
As we have seen in the examples, our discriminator will yield both false positives and false negatives. Their presence can be due to various reasons, including the deviation from the Boltzmann distribution, failure of the scaling in Eq. (3) for some instances, the error of the estimation of the third-order momenta, the use of inappropriate α parameter. Also, as most problems need to be embedded in the annealer's graph, our method could be best applied to the embedded problem's raw data. The minimal postprocessing to get the solutions to the original problem from the embedded data (including also the fixing of chain breaks with majority voting) can also affect the accuracy of our method. As in practice, one would prefer working with the original (and not the embedded) problems, we have tested our method in this way, and it appears to work also in this setting.
As an additional test, we have addressed set partitioning problems. Set partitioning and cover problems constitute a relevant class of hard optimization problems, with significant application in staff scheduling problems [33]. Their large instances require specialised algorithms (see [34] for a recent study). Even though the small ones can be solved using a linear constrained 0-1 program formulation easily, they have been used to benchmark quantum annealers recently [16], as they are well-known and controllable problems. We have generated random exact cover problems of small sizes and have solved them in the linear 0-1 problem formulation. Then we have converted them into a QUBO form using penalties as described, e.g. in [22], choosing a large enough penalty coefficient. This procedure resembles a typical practical application. We have solved the so arising QUBO using its standard linearization [35], resulting in a relatively large mixed-integer program that we have solved with GLPK [36]. This way, we were aware of the minimum of the QUBO. We have verified that the solution of the QUBO coincides with the classical solution, so our choice of the penalty was correct. From this point on, we are only interested in analysing QUBOs: if the minimum will get actually sampled when solved on a physical annealer.
We have solved the problem instances on the D-Wave Advantage4.1 quantum computer. To keep ourselves in a situation similar to practice, we have used the default settings of autoscaling, embedding, and minimal postprocessing. We have applied our method to the so-obtained samples. The results are presented in Figure 6. The true minimum was sampled only in the case of the smallest instance, and we are getting farther in the case of the bigger problem instances. This illustrates further that our method can be useful, at least to some extent, even though, in the case of these calculations, our original assumptions on the distribution do not probably hold exactly because of the use of QUBO formulation, the embedding, and the auto-scaling and minimal postprocessing.

Beta estimation
If the ground state energy is known with certainty (e.g., from some other consideration), we can estimate the parameter β by means of Eq. (11). This calculation of β can be used to validate the effectiveness of our method on a particular set of instances. The β calculation requires the ground state energy for a few instances within the set of problems calculated under the same circumstances. To demonstrate such an approach conceptually, we compare β computed for droplet instances and practical instances. Alternatively, Eq. (11) can be reformulated to express β in terms of α rather than E 0 , as a result, more instance-wise validation will be possible.
The β parameter reflects the effective temperature of the Ising system realised by the annealer. Therefore this parameter carries information about the extent of noisiness to expect: the higher the β, the closer the annealer is to an ideal adiabatic quantum computer. (Meanwhile, as seen in Fig. 1, the estimation of β facilitates the comparison with the Metropolis-Hastings approach.) The results of the β computation are presented in Tabs. 3, 4 (droplet) and Fig. 7 (practical instances). There is a large spread in β estimation for the droplet instances. Furthermore, some values of β are non-physical (i.e. negative). The observed behavior can be caused by the fact that droplets are complicated instances with highly variable couplings. Hence our model may not work correctly for some (especially smaller) droplet instances. For such smaller instances, the physical validity of the model may be weaker, see Section 2.4). Here, however, a very important conclusion appears. If we have a series of instances (e.g. droplets), and we know the ground state exactly for a few, we can check if the β values are physical for these few. Based on this, we can conclude whether the method works well or not for the whole series.
In the case of practical instances, where the instance itself is less complicated, estimated β values appear to be physical for all data, see Fig. 7. Furthermore, for the particular Chimera chip, β oscillates in the range 0.2 -0.65. In contrast, for the particular Pegasus chip, it is closer to 0.2, which may suggest a slightly higher temperature on the Pegasus chip.

Conclusions
In this paper, we have introduced an easily implementable method that uses univariate cumulants of order 1 -3 to assess the quality of the annealer's output. The method results in a parameter, the p-value, which under assumptions about the physical model, indicates that a lower part of the energy spectrum has been sampled. The model depends on the scaling parameter α. We use an ad-hoc value which is plausible according to results in the literature for demonstration. Our sensitivity analysis suggests that the model can be moderately sensitive to α in some cases. Yet, its accurate estimation for the given instance (or set of instances) could improve it. We have applied Bootstrap resampling, a heuristic method to compute the significance interval. The method could also be further   Fig. 4 and Fig. 5). Observe that the β values appear to be physical (i.e. positive) for all experimental points. However, in these cases, the temperature was computed from the QUBO energy spectrum (not Ising one), and the auto-scale was on. Hence it is re-scaled in comparison with the actual annealer temperature.
improved, e.g., by rescaling the significance interval to fit the variance from the error calculus. We have demonstrated the potential of statistical analysis in estimating the quality of Ising annealers' output on particular examples. At least based on the particular examples of annealer outputs we have studied so far, we argue that the introduced analysis can serve as a useful tool in evaluating the solution. To make a stronger statement, the limitations of our model have to be studied further, both analytically (by elaborating, e.g. on the more precise determination of α) and empirically by applying the method to many samples. We plan to use it for additional problems in the field of logistics and operations research, similarly to those in [37,38].
As for the possible further steps of this research, recall that in the case of non-Gaussian distributions (like that of the Ising annealer output), information about a probabilistic model can also appear in cumulants of order higher than 3. Hence, applying such cumulants may improve our model. Further, as the annealer output is multivariate (containing energy and spin configurations), the multivariate cumulant analysis can give a further clue in analyzing, qualifying, and perhaps correcting the annealer's output. As of the further research, one can also perform quantum annealing with anneal schedule option of D-Wave, c.f. [39]. Another interesting question is the systematic study of the effect of the distribution. Systematic testing using artificial data generated assuming non-Boltzmann distribution and assessing the performance of the method would enable the testing of the assumptions in Section 2.1 as well as the characterization of the exact limitations. This will be the subject of further research. spin flipped at position i. Then the probability to move is: where the Gibbs distribution with β MH parameter is used -this parameter models the temperature of the Ising system under simulation. If β MH is low, jumps to "better" solutions are favorable, while if the β is large, a more extensive search of the solution space is favorable. The term β MH (H(y) − H(x)) in the first line sets a certain unit to β MH . As we have no Boltzmann constant, the unit of β MH is the inverse of the unit of energy in H.