Modernizing quantum annealing II: genetic algorithms with the inference primitive formalism

Quantum annealing, a method of computing where optimization and machine learning problems are mapped to physically implemented energy landscapes subject to quantum fluctuations, allows for these fluctuations to be used to assist in finding the solution to some of the world’s most challenging computational problems. Recently, this field has attracted much interest because of the construction of large-scale flux-qubit based quantum annealing devices. These devices have since implemented a technique known as reverse annealing which allows the solution space to be searched locally, and algorithms based on these techniques have been tested. In this paper, I develop a formalism for algorithmic design in quantum annealers, which I call the ‘inference primitive’ formalism. This formalism naturally lends itself to expressing algorithms which are structurally similar to genetic algorithms, but where the annealing processor performs a combined crossover/mutation step. I demonstrate how these methods can be used to understand the algorithms which have already been implemented and the compatibility of such controls with a wide variety of other current efforts to improve the performance of quantum annealers.

Quantum annealing allows for quantum fluctuations to be used to assist in finding the solution to some of the world's most challenging computational problems. Recently, this field has attracted much interest because of the construction of large-scale flux-qubit based quantum annealing devices. There has been recent work [Chancellor NJP 19(2):023024, 2017] on how the control protocols of these devices can be modified so that individual annealer calls on real devices can take initial conditions. Development is being undertaken to implement such protocols in the quantum annealing devices designed by D-Wave Systems Inc. and these features will be available to customers soon. In this paper, I develop a formalism for algorithmic design in quantum annealers, which I call the 'inference primitive' formalism. This formalism allows for a natural description of calls to quantum annealers with a general control structure. This more generalized control structure includes not only the ability to include initial conditions in an annealer run, but also to control the annealing schedules of qubits or clusters of qubits independently, thereby representing relative certainty values of different parts of a candidate solution. I discuss the compatibility of such controls with a wide variety of other current efforts to improve the performance of annealers, such as non-stoquastic drivers, synchronizing freeze times for the qubits, and belief propagation techniques. To demonstrate the power of the formalism I present here, I discuss how it can be used to represent annealer implementations of genetic algorithms, and can represent the addition of genetic components to currently used algorithms. The new tools I develop will allow a more complete understanding of the algorithmic space available to quantum annealers, and thereby make the field more competitive.
* email: nicholas.chancellor@durham.ac.uk

Introduction
The quantum annealing algorithm (QAA) [1,2,3,4,5] has been demonstrated to be a promising candidate for a vast number of real-world problems. The potential applications are too numerous to list here, but include fields as diverse as aerospace [6], computational biology [7], neural networks [8,9,10,11], pure computer science [12], and economics [13]. In this manuscript, I discuss a formalism which can represent general control of quantum annealers. I demonstrate how this formalism can be used to design new algorithms based on multiple calls to a quantum annealer. More generally, this formalism represents hybrid analog-digital computation, but I restrict the discussion in this paper to quantum annealing applications.
The QAA as it is usually structured starts from a superposition state representing all possible solutions. The system is then annealed and quantum fluctuations are introduced through competition between a problem Hamiltonian and a 'driver' Hamiltonian which does not commute with the problem Hamiltonian,
$$H(s) = A(s(t))\,H_{\mathrm{driver}} + B(s(t))\,H_{\mathrm{problem}}, \qquad (1)$$
where $0 \le s \le 1$ is the annealing parameter which controls the annealing schedule, and $A(s(t))$, $B(s(t))$ are chosen such that $A(0)/B(0) \gg 1$ and $B(1)/A(1) \gg 1$, and both behave monotonically with $s$. In traditionally formulated quantum annealing, $s$ is also a monotonic function of $t$, but to construct the protocols here, I will consider cases where $s$ is a non-monotonic function of $t$, as was done in [14]. The problem Hamiltonian is usually chosen to be an Ising model, [...] system quantum annealing, where tunneling mediated by these fluctuations is driven by a low temperature thermal bath. One example of a driver Hamiltonian is the transverse field driver which is currently implemented on the annealers produced by D-Wave Systems Inc. [15].
I also consider more general multi-body driver Hamiltonians of the form [...] where $c_i$ is a positive real number which determines the strength of the coupling, $R_i$ is a set of qubits, and where $a = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is a lowering operator such that $\sigma^x = a + a^\dagger$. The reason such drivers are of interest is that they are able to introduce a sign problem in quantum Monte Carlo simulations if no basis exists for which all off diagonal terms are negative [16,17]. No other method is known for large scale low temperature simulations of these so-called non-stoquastic Hamiltonians [18]. Because of this increased difficulty in simulation, it is widely suspected that quantum annealing with non-stoquastic drivers is more powerful than quantum annealing with stoquastic drivers.
Recall that the QAA as it is usually formulated starts from an equal superposition of all classical solutions, meaning that there is no way to incorporate existing knowledge about the solution, either from previous annealing runs or from different algorithms. One way around this deficiency is to use algorithms based on local searches [14,19] around a candidate solution rather than global searches which start from a superposition of all classical solutions. In particular, [14] includes proof-of-principle numerical experiments which demonstrated how such techniques may assist in a search. It has recently been announced that reverse annealing features capable of performing these protocols will be added to D-Wave 2000Q devices [20].
There is also an alternate formulation which predates the proposals in [14,19] which allows an initial guess [21] to be incorporated into a closed system adiabatic quantum protocol. While protocols based on these techniques can also be represented with the inference primitive formalism, for this paper I will restrict the discussion to the local search formulation in [14]. It also may be fruitful to explore connections to recent work exploring the use of a reinforcement algorithm [22] in quantum optimisation, although such a study is beyond the scope of this work.
In addition to representing the protocols in [14], I show that the formalism demonstrated here represents a more generalized control strategy which includes annealing the qubits independently. Such additional freedom allows for the annealer to accept individual uncertainty values for each bit, or cluster of bits in the case of multi-body drivers.
This formalism can be used to demonstrate a new way in which a directed mutation engine for genetic algorithms [23,24,25] can be constructed using these individual uncertainty values. The idea of using an annealer for genetic algorithms is not new: Coxson, Hill, and Russo [6] experimentally demonstrated that a D-Wave device can successfully aid these algorithms in finding optimal radar waveforms. The method I propose, however, is completely general, and only requires that an annealer be able to realize a problem Hamiltonian, rather than a potentially more complex directed mutation Hamiltonian.
The structure of this paper is as follows. In Sec. 2 I discuss the inference primitive formalism, how it relates to quantum annealers, and demonstrate how previously known algorithms such as the traditional QAA and those proposed in [14] may be represented using inference primitives. In Sec. 3 I discuss how annealer based genetic algorithms may be represented in this formalism and how it may be used to add genetic components to the algorithms proposed in [14]. This is followed by a discussion in Sec. 4 about how the control represented in the inference primitive formalism is compatible with many other recent advances in the field, including synchronization of freezing (Sec. 4.1), higher order drivers, including non-stoquastic drivers (Sec. 4.2), and belief propagation methods used to represent graphs larger than the hardware (Sec. 4.3). Finally, in Sec. 5 I conclude with some overall discussion.

Inference Primitive
Consider a high level description of a subroutine Φ which performs a guided search of an energy landscape based on known information about likely solutions. I will call such a subroutine an inference primitive, as it will try to infer the correct solution based on input information. The inference primitive will be supplemented by information processing which determines the parameters to give each call to the primitive; I will call this the processing function F. I will demonstrate later in this section that Φ can be a high level description of a call to a quantum annealer, with F representing classical information processing used within a hybrid algorithm. I will also formally define both Φ and F.

Figure 1: Annealing schedule for inference primitive protocol. This is the same as in [14] except that individual qubits are annealed back to different values of $s$. Qubits are annealed first with a simple Hamiltonian to program an initial state, then the Hamiltonian is reprogrammed to the problem Hamiltonian and each qubit (or multi-qubit driver) is annealed back to $s'(P_i)$, where $P_i$ is a measure of the uncertainty of a qubit value. The qubits are then annealed back toward $s = 1$, each starting its anneal when the other bits reach the same value of $s$. For $s'_i = 0$ (red), setting the initial value is unnecessary, as no information about the qubit value is known.
Before discussing the formalism further, I will motivate the use of this formalism to represent control of quantum annealers. It has recently been demonstrated in [14] that global transverse fields can be used to control the range of local search in solution space. Building on this idea, application of different transverse fields locally will cause an algorithm to search different ranges in different directions in solution space. In this way, the strength of local transverse fields can encode bitwise certainty of a solution. In fact, algorithms based on an extreme version of this have already been implemented [26,27], in which, based on previous solution statistics, qubits are either treated as taking fixed values (absolute certainty), or annealed using a traditional protocol (absolute uncertainty). To implement a protocol which incorporates local uncertainty, I generalize the methods given in [14] to allow different qubits to be annealed to different points $s'_i$, as depicted in Fig. 1. In this paper, I will not focus on how to construct heuristics which relate uncertainty to transverse field strengths, but rather examine how algorithms can be designed and represented, assuming a suitable heuristic has been developed. I provide an example of a very simple heuristic in appendix 1. This heuristic is only intended as an example of how these quantities can be related, and may be too simplified to perform well in the real world. Alternative heuristics could be based on experimental local temperature estimates using the methods of [28], or on adaptations of the methods to estimate a global effective temperature used in [9]. For the remainder of this work, I will assume that a suitable heuristic, $s'_i(\{P\})$, exists, where the set notation has been used to emphasize that in general this parameter may also depend on the uncertainty $P_i \in [0, 0.5]$ of neighbouring qubits.
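As a minimal sketch of the per-qubit schedule described in Fig. 1, the following Python code lets every qubit follow a common V-shaped profile, hold at its own turning point $s'_i$, and rejoin the forward anneal once the common profile climbs back to $s'_i$. The linear map from $P_i$ to $s'_i$, the function names, and the time units are assumptions made purely for illustration; this is not the heuristic of appendix 1, and the initial-state programming stage is omitted.

```python
def s_prime_from_uncertainty(p):
    """Hypothetical linear map from uncertainty P_i in [0, 0.5] to s'_i in [0, 1].

    P_i = 0.5 (nothing known) gives s' = 0; P_i = 0 (fully certain) gives s' = 1.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("uncertainty must lie in [0, 0.5]")
    return 1.0 - 2.0 * p


def global_profile(t, t_turn, t_end):
    """V-shaped profile: s goes 1 -> 0 on [0, t_turn], then 0 -> 1 on [t_turn, t_end]."""
    if t <= t_turn:
        return 1.0 - t / t_turn
    return min(1.0, (t - t_turn) / (t_end - t_turn))


def qubit_schedule(t, p, t_turn=1.0, t_end=2.0):
    """Annealing parameter s_i(t) for a qubit with uncertainty p.

    The qubit follows the global profile down to s'_i, waits there, and
    starts its forward anneal when the global profile reaches s'_i again.
    """
    return max(s_prime_from_uncertainty(p), global_profile(t, t_turn, t_end))
```

With these assumed defaults, a qubit with $P_i = 0.25$ is held at $s = 0.5$ between $t = 0.5$ and $t = 1.5$, while a qubit with $P_i = 0$ never leaves $s = 1$.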
I have motivated the high level description of a quantum annealer as an inference primitive Φ. Now I must further motivate that suitably chosen processing functions F will be able to appropriately extract uncertainty information from the output data of a quantum annealer. To do this, I consider the problem of finding the ground state of a Sherrington-Kirkpatrick like spin glass [29]: [...] where each $J_{ij}$ is selected uniformly at random from the range $[-1, 1]$. All energy eigenstates of such Hamiltonians will be at least two fold degenerate because of total spin inversion symmetry. To break this symmetry I fix the last spin to be in the down orientation. This transformation results in the following effective $(n - 1)$-spin Hamiltonian: [...]
where $h_i = J_{in}$. For the proof-of-principle I generate 1500 such Hamiltonians with $n = 17$. I then run Path Integral Quantum Annealing (PIQA) 1001 times for each such Hamiltonian, following the methods used in [14], which were adapted from those in [30], but with $T = 0.8246$, $\tau = 20$ and $P = 30$. For each spin within each Hamiltonian, I compare the average value of the annealer output to a simple certainty value $P_i$ calculated using [...] where $G$ consists of the list of the 1001 solutions returned by PIQA ($G_i \in \{1, -1\}$). I then break these spins up into two categories, those where $S_i$ found by Eq. (7) agrees with the true solution found by exhaustive classical search, and those where it does not. As Fig. 2 clearly shows, the larger the value of $P_i$ becomes, the more likely it is that the bit value is incorrect. Therefore the statistics of our simulated quantum annealer outputs provide not only information about the probable value of a bit in a given solution, but also about the relative certainty of different bit values. How effectively this information is used depends on the heuristic used in F; I discuss a few examples of how F could be constructed in Sec. 3.1.

Figure 2: [...] using the same numerical methods as the proof of principle in [14]. Blue bars are cases where $S_i$ found by Eq. (7) agrees with the true ground state, red are cases where it does not, and unfilled bars are total counts.
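The kind of statistic used in this proof-of-principle can be sketched in a few lines of Python. The sketch assumes that $S_i$ is the majority value of bit $i$ over the returned solutions and that $P_i$ is the fraction of solutions disagreeing with $S_i$, which keeps $P_i \in [0, 0.5]$. Both definitions, the tie-breaking rule, and the function name are illustrative assumptions rather than a transcription of Eqs. (7) and (8).

```python
def bit_statistics(samples):
    """Return (S, P) from a list of samples whose entries lie in {+1, -1}.

    S[i] is the majority value of bit i (ties resolved to +1) and P[i] is
    the fraction of samples that disagree with S[i], so 0 <= P[i] <= 0.5.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n_bits = len(samples[0])
    n_samples = len(samples)
    S, P = [], []
    for i in range(n_bits):
        n_up = sum(1 for g in samples if g[i] == 1)
        frac_up = n_up / n_samples
        S.append(1 if frac_up >= 0.5 else -1)
        P.append(min(frac_up, 1.0 - frac_up))
    return S, P
```

A bit on which every sample agrees receives $P_i = 0$, and a bit that is split evenly receives $P_i = 0.5$.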

Definitions
I now define a mathematical representation of the computational subroutine I have described earlier.
Firstly I consider a system of $N_{\mathrm{bits}}$ bits. To simplify some mathematical definitions which I will give later and for consistency with spin Hamiltonian definitions, I allow these bits to take values $\{1, -1\}$, rather than $\{1, 0\}$. I further define clusters $R_i$ which each consist of a unique, non-empty set of these bits, as represented in Fig. 3(a). I also define an inference primitive Φ, which takes as inputs a list of guesses for the value of the bits, $S$, as well as uncertainty values $P$ for each cluster in $R$. An inference primitive in turn outputs a list of solution candidates $G$, and a list of associated energies for each candidate $E$. Each solution candidate consists of $N_{\mathrm{bits}}$ numbers, each corresponding to a bit value of $\{1, -1\}$. The energy value $E_i = \langle G_i | H_{\mathrm{problem}} | G_i \rangle$ tells how optimal each solution candidate is, where lower values indicate a higher level of optimality. Lists $G$ and $E$ must have the same length, which I refer to as $N_{\mathrm{out}}$. Fig. 3(b) represents an inference primitive visually. In practice, the role of Φ will be played by a call to an analog computational element, in the case of this paper, a quantum annealer.
In the absence of multi-bit clusters, $S$ and $P$ could be defined as a single 'mean' bit value for each bit which could be written as [...]. However, this notation does not easily generalize to include multi-bit clusters, and therefore I represent $S$ and $P$ as distinct quantities where $|S| \le |P|$. Parametrizing in terms of $S$ and $P$ is natural as these two quantities map to different control parameters within an annealing protocol.

In addition to the inference primitive, I also define a mathematical function which I call the processing function F. This function takes as its input a list of lists $\mathcal{G}$, each element of which is a list $G$ of solution candidates. This function likewise takes $\mathcal{E}$ as an input, which is a list of lists, each element of which is a list $E$ of the associated energies for each solution candidate. The lists $\mathcal{G}$ and $\mathcal{E}$ must have the same length, which I call $N_{\mathrm{inputs}}$. Generally, $\mathcal{G}$ and $\mathcal{E}$ will be allowed to be empty ($N_{\mathrm{inputs}} = 0$). This function outputs a list of guesses for the values of each of the bits $S$, and an uncertainty value $P$ for each cluster in $R$. A processing function is represented visually in Fig. 3(c).
I have now defined an inference primitive Φ, the outputs of which can be used to construct the inputs of a processing function F; in turn, the outputs of F can be used as the inputs of Φ. The mathematical functions and their associated inputs and outputs define the basic structure of the inference primitive framework. These mathematical functions can be expressed diagrammatically as depicted in Fig. 3, and this diagrammatic representation can be used to express sophisticated protocols as discussed in Sec. 2.2 and 3.2.
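To make the division of labour between Φ and F concrete, the following Python sketch mirrors the input and output structure defined above for single-bit clusters only. The classical stand-in for Φ (which simply flips each guessed bit with probability $P_i$), the Ising sign convention, and all function names are assumptions for illustration; in the setting of this paper the role of Φ would be played by a call to a quantum annealer.

```python
import random
from typing import List, Tuple

Bits = List[int]                                  # entries in {+1, -1}
PrimitiveOutput = Tuple[List[Bits], List[float]]  # (G, E)


def ising_energy(bits, h, J):
    """Energy sum_i h_i s_i + sum_(i,j) J_ij s_i s_j (convention assumed here)."""
    e = sum(h[i] * bits[i] for i in range(len(bits)))
    e += sum(Jij * bits[i] * bits[j] for (i, j), Jij in J.items())
    return e


def make_toy_primitive(h, J, n_out=20, seed=0):
    """Classical stand-in for Phi: flip each guessed bit with probability P_i."""
    rng = random.Random(seed)

    def phi(S, P):
        G = [[-s if rng.random() < p else s for s, p in zip(S, P)]
             for _ in range(n_out)]
        E = [ising_energy(g, h, J) for g in G]
        return G, E

    return phi


def f_init(n_bits):
    """Processing function with no inputs: nothing is known, so P_i = 0.5."""
    return [1] * n_bits, [0.5] * n_bits


def f_best(outputs: List[PrimitiveOutput], p=0.1):
    """Toy processing function: keep the lowest-energy candidate, global P_i = p."""
    best = min(((e, g) for G, E in outputs for g, e in zip(G, E)),
               key=lambda pair: pair[0])
    return list(best[1]), [p] * len(best[1])
```

A single round trip then reads `S, P = f_init(n)`, `G, E = phi(S, P)`, `S, P = f_best([(G, E)])`, and further calls to `phi` can be chained in the same way.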
It is useful to give a few more definitions of mathematical quantities which will become important in specific examples which I will give later in this paper. In particular, I define ways in which $\mathcal{G}$ and $\mathcal{E}$ can be reduced to lists, rather than lists of lists. I first consider 'flattened' versions of the lists $\mathcal{G}$ and $\mathcal{E}$, $\tilde{G} = G_1 \cup G_2 \cup \ldots$ and $\tilde{E} = E_1 \cup E_2 \cup \ldots$; both will have length $N_{\mathrm{flat}} = N_{\mathrm{inputs}} N_{\mathrm{out}}$. These flattened versions contain all of the information within the original lists $\mathcal{G}$ and $\mathcal{E}$ except for information about where each solution candidate came from. As I will discuss later, many processing functions may be constructed for which information about where each solution candidate originated is not important. A second pair of useful quantities is the list of unique solution candidates in $\tilde{G}$, and their associated energies. I label these quantities $\tilde{G}^{(u)}$ and $\tilde{E}^{(u)}$.

As a convention, for $\tilde{G}$ and $\tilde{G}^{(u)}$, which are both solution candidate lists, I use a subscript to refer to the solution number and put the list of bits to be considered as a functional argument. For instance, $\tilde{G}_j(i)$ is the value of the $i$th bit in solution candidate number $j$. Alternatively, $\tilde{G}_j[R_i]$ is the list of all of the bit values over the cluster $R_i$ in solution candidate number $j$. For $S$, which only has a bit index, I use the subscript to refer to the bit cluster, so for instance $S_i$ refers to the inferred bit value of bit $i$, while $S_{R_i}$ refers to the list of inferred bit values on the cluster $R_i$, expressed as [...].

For single bit clusters, the solution candidates can be divided into two groups based on the value of the bit. For multi-bit clusters the picture is more complicated. One quantity which I will demonstrate later is convenient to define is a weighting factor, $W(\tilde{E}_j, \tilde{G}_j[R_i], S_{R_i})$, which weights the importance of each state in calculating $P$ for the cluster. Based on these weighting factors, I define $P_i$ [...] (9), where the minimum value is taken to guarantee that $P_i \in [0, 0.5]$.
For simplicity, one can further restrict this study to functions $W$ which can be decomposed into two parts, one which depends purely on $\tilde{E}$, and one which depends purely on $S$, such that [...] (10). As a further matter of notation, I use vertical bars $|\star|$ to refer to the length of a list, so for instance $|R|$ means the number of elements in the list $R$.

Examples with Existing protocols
Let us now discuss in more detail how to construct algorithms based on inference primitives from quantum annealers. As an example, I will first explicitly demonstrate how both the traditional QAA and the simplest local search method of [14] can be re-expressed in terms of inference primitives.
Because the traditionally formulated QAA is not biased toward a particular state, we formulate a processing function $F_{\mathrm{init}}$ which takes no inputs and returns $P_i = 0.5\ \forall i$. For these values of $P$, the values of $S$ do not matter, so we set them to be all 1 without loss of generality,
$$F_{\mathrm{init}} = \{S_i = 1,\ P_i = 0.5\}\ \forall i. \qquad (11)$$
In general, the traditional QAA can be augmented by sophisticated post processing [31,32,33,34], and therefore after the inference primitive, we should include a second processing function $F_{\mathrm{post}}(G, E, R)$ to include all of these possibilities. This representation is depicted on the left of Fig. 4. The hybrid methods used in [32,33,34] actually use multiple runs with changing problem definitions to solve a problem, and therefore constitute many repeated runs of the protocol depicted on the left of Fig. 4. I discuss in Sec. 4.3 how such existing hybrid techniques may be combined with more sophisticated inference primitive protocols.
For the local search protocols considered in [14], the results of previous calls to the inference primitive are used sequentially, with the result of a previous call being fed into the next iteration of the protocol, as depicted on the right of Fig. 4. In this case, however, there is only one global value of $P_i = p\ \forall i$ which defines the uncertainty; the processing function which is run at each step can therefore be written as [...] (12), where $p'$ is the global value of $P$ to be used for the next local search, and the protocol is run iteratively with $p \leftarrow p'$ and $S \leftarrow S'$ at each step.

This formalism can further be generalized to represent another class of hybrid annealer based algorithms, which can be used without any reverse annealing capabilities. These algorithms, which have been shown to be successful in [26,27], work by 'fixing' some qubits by removing them from the problem description and replacing them with appropriate field terms to match the state which they are assumed to take. This kind of process allows an annealer without reverse annealing to be represented by an inference primitive where $p_i$ is restricted to only take values of either 0, indicating that a spin is to be 'fixed', or 0.5 for those which are not removed and will be annealed normally. The representation of these algorithms in the inference primitive formalism is therefore exactly the same as the one for the local search given in Fig. 4, but with [...] where $P'_i \in \{0, 0.5\}$.

Figure 5: Structure of the population annealing inspired protocols from [14] expressed in the inference primitive formalism.

Going beyond simple local search, [14] also considers protocols incorporating local search that are inspired by the state-of-the-art optimization techniques of parallel tempering [35,36] and population annealing [37,38,39,40]; these algorithms can also be represented in this framework. The processing function and inference primitives will still have the general local search structure in Eq. 12, but generally allow $\{G, E\}$ to be copied (in the case of population annealing) or exchanged between sets of inference primitives with different $p$ values. The structure of the population annealing inspired protocol is depicted in Fig. 5, while the structure of a parallel tempering inspired algorithm is depicted in Fig. 6.

Figure 6: Structure of the parallel tempering inspired protocols from [14] expressed in the inference primitive formalism.
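The qubit 'fixing' step described above, in which spins with $P_i = 0$ are removed and replaced by field terms, can be sketched as follows. The code assumes the energy convention $E = \sum_i h_i s_i + \sum_{(i,j)} J_{ij} s_i s_j$ and a dictionary representation of the problem; both choices, and the function name, are illustrative assumptions and are not taken from [26,27].

```python
def fix_spins(h, J, fixed):
    """Remove fixed spins from an Ising problem.

    h: dict {i: field}, J: dict {(i, j): coupling}, fixed: dict {i: +1 or -1}.
    Energy convention (an assumption): E = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j.
    Returns (h_reduced, J_reduced, offset) defined on the remaining free spins,
    such that the reduced energy plus offset equals the original energy.
    """
    h_red = {i: hi for i, hi in h.items() if i not in fixed}
    J_red = {}
    # Fields acting on fixed spins only contribute a constant.
    offset = sum(h[i] * s for i, s in fixed.items() if i in h)
    for (i, j), Jij in J.items():
        if i in fixed and j in fixed:
            offset += Jij * fixed[i] * fixed[j]
        elif i in fixed:
            # Coupling to a fixed spin becomes a field on the free spin.
            h_red[j] = h_red.get(j, 0.0) + Jij * fixed[i]
        elif j in fixed:
            h_red[i] = h_red.get(i, 0.0) + Jij * fixed[j]
        else:
            J_red[(i, j)] = Jij
    return h_red, J_red, offset
```

The reduced problem can then be passed to an annealer in the usual way, with the fixed values reinserted into each returned candidate afterwards.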

Algorithmic Design
As well as being a powerful tool for expressing currently proposed algorithms, the inference primitive formalism is also a powerful tool in designing new algorithms. This formalism depicts the different possible ways for information to flow between classical processing and a quantum 'inference primitive' subroutine in a high level way, and therefore can be used to express different algorithmic possibilities in terms of information flow. Thus far, we have only considered processing functions which take outputs from a single call to an inference primitive; however, processing functions can be constructed which take information from multiple inference primitive calls. Using processing functions in this way represents a breeding hybridization step in a genetic algorithm. While the focus of this paper is on developing the inference primitive formalism for design of annealer algorithms, rather than on designing specific heuristics, it is still useful to discuss examples of how different processing function heuristics can be constructed, which I do in the next subsection.

Processing Function Heuristics
Although the primary purpose of this paper is not to design algorithms, it is worth briefly discussing what form the heuristics in the processing function could take, including some examples which are direct extensions of work which has already been done. While testing these heuristics would be useful, doing it properly would be quite an involved task, and therefore beyond the scope of the current work. The focus of this work is to examine how new algorithms can be designed for a quantum annealer with generalized controls, not to study relative algorithm performance.
Recall that I have already discussed heuristics to convert probability values for each qubit into the actual $s'$ values which will be supplied to the annealer. In the inference primitive formalism, details of the exact experimental implementation are contained within the inference primitive Φ itself, rather than the processing function F. In this subsection, however, I focus on the processing function F, which provides uncertainty information which can then be converted to experimental parameters in the inference primitive.
For simplicity, let us start with cases where the processing function F only has a single stream of input values from the inference primitive {G, E}. In this case, the simplest thing to do is just to take statistics over the raw data, calculating the probability that a bit will take a certain value directly by averaging over G with no regard for E, as was done in Eq. 7 and 8. Such a simplistic approach relies on the ability of the inference primitive, for instance a quantum annealer, to always find highly optimal states. However, in practice real devices may not do this.
One approach to mitigate the fact that some solutions in $G$ may not be very optimal is to only consider candidates which have an energy below an 'elite threshold'. This approach has already proved useful in hybrid algorithms used in [26,27] which do not require an initial state to be seeded. Those papers, however, were based on annealers which did not have reverse annealing capabilities. With reverse annealing capabilities (and independent annealing controls of individual qubits), their method can be extended to include the possibility where the direction of a state of a qubit is suspected but should not be assigned with 100% certainty. A simple generalized processing function in this case could take the form [...] where $\Theta$ is the Heaviside step function defined so that $\Theta(a) = 1$ if $a > 0$ and $\Theta(a) = 0$ otherwise, and $E_{\mathrm{elite}}$ is the elite energy threshold, as assigned in [26,27]. Note that, as was previously done in this algorithm, any qubit with $P_i = 0$ can be excluded from the actual annealer run and replaced with field terms.
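A processing function with an elite threshold might look like the following Python sketch for single-bit clusters. Each candidate $j$ is given weight $\Theta(E_{\mathrm{elite}} - E_j)$, and $S$ and $P$ are then taken from the surviving candidates. Using the majority value for $S_i$ and the minority fraction for $P_i$ is an assumption made for illustration, as are the function names; it is not a transcription of the form used in the text.

```python
def elite_processing(G, E, e_elite):
    """Estimate (S, P) using only candidates with energy below e_elite.

    Candidate j contributes weight Theta(e_elite - E[j]), which is 1 if
    E[j] < e_elite and 0 otherwise.
    """
    elite = [g for g, e in zip(G, E) if e_elite - e > 0]
    if not elite:
        raise ValueError("no candidate lies below the elite threshold")
    n = len(elite)
    S, P = [], []
    for i in range(len(elite[0])):
        frac_up = sum(1 for g in elite if g[i] == 1) / n
        S.append(1 if frac_up >= 0.5 else -1)
        P.append(min(frac_up, 1.0 - frac_up))
    return S, P


def free_bits(P):
    """Indices with P_i > 0; bits with P_i == 0 can be replaced by field terms."""
    return [i for i, p in enumerate(P) if p > 0]
```

Bits on which all elite candidates agree come out with $P_i = 0$ and can be fixed, recovering the behaviour of an annealer without reverse annealing as a special case.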
Rather than using a hard cutoff, another way to give preference to low energy solution candidates when calculating $S$ and $P$ is to thermally reweight each of the unique candidates [...] where the $(u)$ superscript indicates a set of solution candidates and energies where duplicate candidates in $G_i$ have been removed. In this case, $T$ can be thought of as a meta-parameter which controls the effective range of the search that will be performed by the inference primitive. This suggests that one algorithmic possibility could be to run a series of inference primitive calls as depicted in Fig. 4 (right), but with successively decreasing $T$ as a simulated annealing analogue.

Thus far we have only considered processing functions F which take a single $\{G, E\}$; however, for genetic algorithms, we need to define processing functions which take sets of inference primitive outputs $\{\mathcal{G}, \mathcal{E}\}$. One way to construct such processing function heuristics is to create flattened lists, which treat all solution candidates as if they came from a single inference primitive. These flattened data $\{\tilde{G}, \tilde{E}\}$ can then be used directly in heuristics such as those discussed earlier in this section. Not all processing functions can be represented in this way, however. For example, a processing function F could take the lowest energy solution candidate from two different $G \in \mathcal{G}$ and assign $P_i = 0.5$ to bits which disagree between the two and $P_i = p$, where $0 < p < 0.5$, to those which agree.
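Both ideas in this passage can be sketched in Python for single-bit clusters. The first function reweights unique candidates by a Boltzmann factor $\exp(-E/T)$ and derives $S$ and $P$ from the weighted vote; the normalization and the majority rule are assumptions for illustration. The second function implements the two-parent rule stated above, taking the guess $S$ from the first parent, which is an arbitrary choice since $S_i$ is irrelevant wherever $P_i = 0.5$.

```python
import math


def thermal_processing(G, E, T):
    """Boltzmann-reweighted (S, P) over unique candidates (single-bit clusters)."""
    unique = {}
    for g, e in zip(G, E):
        unique[tuple(g)] = e  # duplicate candidates share the same energy
    e_min = min(unique.values())  # shift energies for numerical stability
    weights = {g: math.exp(-(e - e_min) / T) for g, e in unique.items()}
    z = sum(weights.values())
    n_bits = len(next(iter(unique)))
    S, P = [], []
    for i in range(n_bits):
        w_up = sum(w for g, w in weights.items() if g[i] == 1) / z
        S.append(1 if w_up >= 0.5 else -1)
        P.append(min(w_up, 1.0 - w_up))
    return S, P


def crossover_processing(outputs, p):
    """Two-parent rule: P_i = 0.5 where the parents' best candidates
    disagree, and P_i = p where they agree."""
    (G1, E1), (G2, E2) = outputs
    best1 = G1[E1.index(min(E1))]
    best2 = G2[E2.index(min(E2))]
    S = list(best1)  # the guess is irrelevant on bits with P_i = 0.5
    P = [p if a == b else 0.5 for a, b in zip(best1, best2)]
    return S, P
```

Lowering `T` over successive calls concentrates the weight on the lowest energy candidates, which is the simulated annealing analogue mentioned above.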

Algorithm Structure
Now that I have given examples of how processing function heuristics can be constructed, it is worth briefly considering how the inference primitive formalism can be used as a graphical tool to design new algorithms. For instance, a genetic component can be added to the population annealing algorithm depicted in Fig. 5 by allowing multiple edges to be incident on each processing function, as depicted in Fig. 7. Because of the way the total population is controlled in these algorithms (see [37]), adding a fixed number of extra processing functions which accept two or more inputs to produce offspring will not cause the population to grow (or shrink) uncontrollably. In this example, which inference primitive outputs get to produce extra offspring could be chosen, for instance, by drawing two or more from a Boltzmann probability distribution constructed from the lowest energy given by each inference primitive call (as was suggested in [14]).

The inference primitive formalism can also demonstrate how we can add a genetic component to a parallel tempering inspired algorithm. In such an algorithm one can replace each single call to an inference primitive at an effective temperature with a pair of calls, and then combine these outputs in a 'hybridization pool' consisting of inference primitive calls based on pairs of inference primitive outputs, as depicted in Fig. 8. These hybridization results could then be reinserted into the main pool of inference primitive calls probabilistically. One way to accomplish this is to use the process outlined below:
1. Produce a 'genetic pool' of inference primitive outputs, for instance using some of the methods discussed in the previous subsection.
2. For each set of inference primitive outputs in the genetic pool, $\{G_{\mathrm{hyb}}, E_{\mathrm{hyb}}\}$, starting from the lowest $T_{\mathrm{eff}}$ and increasing, have this set of outputs replace a set in the standard inference primitive pool probabilistically with a probability determined by [...] where $T_{\mathrm{eff}}$ is the effective temperature which has been used on the inference primitive in the parallel tempering pool. If either a replacement has been performed, or all inference primitive outputs in the regular parallel tempering pool have been tested and none have been replaced, move on to the next set of hybridization outputs. In the case where a replacement has been successfully performed, discard the inference primitive outputs which have been replaced; otherwise, discard the outputs in the genetic pool. Once all outputs in the genetic pool have been either discarded or used as replacements, move on to the next step.
3. Perform parallel tempering inspired swaps using the standard update rules as described in [14].
There are also many other algorithms which can be discovered using the inference primitive formalism. The two ideas here are included to give examples of how the inference primitive formalism can be used as a tool to visualize information flow in annealer based algorithm design.
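The parent selection mentioned for the population annealing example, drawing two or more inference primitive outputs from a Boltzmann distribution built from the lowest energy of each call, can be sketched as follows. The selection temperature `T`, the choice to draw parents without replacement, and the function names are assumptions made for illustration.

```python
import math
import random


def boltzmann_parent_weights(outputs, T):
    """Normalized Boltzmann weights from the lowest energy of each call."""
    e_best = [min(E) for _, E in outputs]
    e_ref = min(e_best)  # shift energies for numerical stability
    w = [math.exp(-(e - e_ref) / T) for e in e_best]
    z = sum(w)
    return [x / z for x in w]


def choose_parents(outputs, T, n_parents=2, rng=None):
    """Draw distinct parent indices, weighted by the Boltzmann factors."""
    rng = rng or random.Random()
    weights = boltzmann_parent_weights(outputs, T)
    candidates = list(range(len(outputs)))
    parents = []
    for _ in range(min(n_parents, len(candidates))):
        pick = rng.choices(candidates,
                           weights=[weights[c] for c in candidates])[0]
        parents.append(pick)
        candidates.remove(pick)  # no output is chosen twice
    return parents
```

The selected outputs can then be handed to a multi-input processing function to produce offspring. For very large energy gaps relative to `T` the weights of poor outputs can underflow to zero, so a practical implementation may need to guard against that case.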

Compatibility with Other Methods
Now that I have demonstrated the power of the inference primitive formalism in terms of designing algorithms based on quantum annealers with generalized classical controls, I turn my attention to how these methods are compatible with many methods which currently represent the state of the art, as well as techniques which are now on the horizon. This section is not supposed to be an exhaustive list, but rather to give the reader an idea of the versatility of inference primitive based annealer computation.

Protocol Modifications
The first techniques which I discuss are those developed by D-Wave Systems Inc. to advance or retard individual qubits to synchronize freezing [41], using an effective local temperature estimated using the methods in [28]. These methods apply to the relative values of the annealing parameter $s$ during the final forward anneal, which are not fixed by the inference primitive protocol described in Sec. 2. Freezing can therefore be synchronized by advancing or retarding the point at which one qubit begins its final forward anneal relative to the other qubits, as depicted in Fig. 9.
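As a rough illustration of the timing idea, the sketch below computes start times for the final forward anneal so that every qubit reaches its freeze point at the same instant. It assumes a linear ramp with a shared rate and known per-qubit freeze points; the names `s_freeze` and `rate` are hypothetical and do not correspond to parameters of any real device interface.

```python
def synchronized_start_times(s_start, s_freeze, rate=1.0):
    """Start times so that all qubits hit their freeze point simultaneously.

    s_start[i]:  value s' from which qubit i begins its final forward anneal.
    s_freeze[i]: ASSUMED value of s at which qubit i freezes.
    rate:        ds/dt of the assumed linear forward ramp (same for all qubits).
    """
    durations = [(f - s0) / rate for s0, f in zip(s_start, s_freeze)]
    if any(d < 0 for d in durations):
        raise ValueError("each freeze point must lie at or after the start point")
    t_freeze = max(durations)
    # Qubits needing less time start later (retarded); the slowest starts at t = 0.
    return [t_freeze - d for d in durations]
```

A qubit which starts close to its freeze point is delayed, while one which has further to travel begins immediately.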

Higher Order Drivers
Figure 9 (axis labels: $s$ with ticks 0 and 1, $t$; marked values $s'(P_i)$ and $s'(P_j)$): Depiction of how the time at which annealing from $s'$ is started can be used to advance or retard individual qubit annealing schedules to synchronize freezing.
Let us now consider generalizations of inference primitive protocols for multi-body drivers, which are necessary to realize non-stoquastic drivers, for example. Previously, $R$ has just been a list of every qubit, but it will now also include some clusters of qubits which are flipped simultaneously by multi-body drivers. To determine the strength at which multi-body drivers are applied, one should consider statistics over the overlap $M_j$ of each of the members of $G_j$ with the solution candidate $S$ over the relevant cluster, normalized by $|R_j|$, where $|R_j|$ is the number of elements in $R_j$. When $M_j = 1$, the cluster agrees exactly between the candidate solution and $G_j[R_i]$. The value $M_j = -1$ indicates perfect disagreement. The uncertainty value $P_i$ for the cluster $R_i$ corresponds to the probability that $S_{R_j}$ is closer in Hamming distance to the correct solution than $\neg S_{R_j}$. Positive $M_j$ indicates that $S_{R_j}$ is the closer of the two, whereas negative $M_j$ indicates that $\neg S_{R_j}$ is closer.
For each cluster, we formulate a weighted sum to determine the probability that $S^{(R_i)}$ is closer. To achieve this, I define $P$ in terms of a weighting factor $W$ using Eq. (9). For simplicity, I assume that $W$ can be decomposed into two terms: an energy dependent part $\hat{W}(\tilde{E}_i)$ and a part $W(G_j[R_i], S^{(R_i)})$ which depends on $G_j[R_i]$ and $S^{(R_i)}$. For the energy dependent part, one could for example define $\hat{W}(\tilde{E}_i) = \exp(-\tilde{E}_i/T)$, corresponding to the thermal weighting as in Eq. (17), $\hat{W}(\tilde{E}_i) = 1$ for unweighted averages as in Eq. (8), or finally $\hat{W}(\tilde{E}_i) = \Theta(\tilde{E}_{\mathrm{elite}} - E_i)$ for a multi-bit analogue of the elite averages used in Eq. (15). As for $W(G_j[R_i], S^{(R_i)})$, it should be weighted to favor $|M_j|$ close to 1, as these are the values for which cluster flips will make the largest difference. A logical choice is therefore to choose weights which are inversely proportional to the number of states within the same Hamming distance from either, as given in Eq. (18), where $|R_j|$ indicates the number of elements in the set, and $D(S^{(R_i)}, G_j[R_i])$ indicates the Hamming distance between the two lists.
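To make the cluster quantities concrete, the sketch below computes a normalized overlap and a Hamming-distance based weight for spins stored as lists of +1/-1. Both definitions are assumptions chosen to match the verbal description (overlap of +1 for exact agreement and -1 for perfect disagreement, and a weight inversely proportional to the number of states at the same Hamming distance); they are not taken from the equations of the text, and the function names are hypothetical.

```python
from math import comb


def hamming_distance(a, b):
    """Number of positions at which two equal-length lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_overlap(candidate, output, cluster):
    """ASSUMED normalized overlap in [-1, 1] over the qubit indices in cluster."""
    return sum(candidate[k] * output[k] for k in cluster) / len(cluster)


def hamming_weight_factor(candidate, output, cluster):
    """ASSUMED weight: 1 / (number of cluster states at the same Hamming distance)."""
    sub_c = [candidate[k] for k in cluster]
    sub_o = [output[k] for k in cluster]
    d = hamming_distance(sub_c, sub_o)
    # comb(n, d) == comb(n, n - d), so the weight is symmetric between the
    # candidate and its bitwise inverse and is largest when |overlap| is 1.
    return 1.0 / comb(len(cluster), d)
```

With this choice the weight equals 1 for exact agreement or exact disagreement on the cluster and is smallest when half of the bits differ.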

Belief Propagation
For the current generation of annealers, with hardware graphs which are relatively small compared to the size of many relevant problems, it is important to be able to solve problems which are larger than the available hardware graph. The general method to do this is to solve problems on modified subgraphs of the hardware graph in an algorithmic way [33,34,32], eventually converging on a single consistent solution. In this paper I will focus on one particular method, the generalized belief propagation method proposed in [34] based on earlier work in [42]. Although only exact for tree graphs, belief propagation has proven to be an important tool for solving a host of important real world problems, most notably decoding Low Density Parity Check Codes (LDPC) [43,44]. The belief propagation method described in [34] performs belief propagation between hardware-sized subgraphs to obtain an approximate thermal distribution.
Because this method obtains a distribution, rather than a single state, it can be used effectively as an inference primitive and therefore can be used as a subroutine in all of the previously discussed algorithms, using the same $\{R, S, P\}$ throughout the protocol until either convergence is found or a timeout occurs. However, the marginals which are calculated throughout the protocol carry beliefs about the likely value of a bit and its uncertainty. The protocol can be made more efficient by using this information to update $\{S, P\}$ whenever the beliefs are updated. With fixed $\{S, P\}$, new information about bit values is wasted. If one of the bit values $S_i$ with a low value of $P_i$ became inconsistent with the others during the course of this protocol, the protocol would likely not be able to correct for this inconsistency and may either fail to converge or return a low quality solution.
In the algorithm proposed in [34], each bit has an associated marginal, $b_i(z_i)$, which contains information about the relative likelihood of a bit having a value of 1 or −1. Based on a normalized version of this marginal, we can find approximate values for $S_i$ and $P_i$ which are dynamically updated at each step of the protocol.
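One plausible way to turn a marginal into an updated bit value and uncertainty is sketched below. The mapping is an assumption made for illustration and is not taken from [34] or from the expressions used in the text: the bit value is set to the more likely of the two values, and $P_i$ is taken to be the probability, under the normalized belief, that this value is wrong. The function name is hypothetical.

```python
def belief_to_candidate(b_plus, b_minus):
    """Map an unnormalized marginal (b(+1), b(-1)) to (S_i, P_i).

    ASSUMED convention: P_i is the probability that S_i is incorrect,
    so P_i lies in [0, 1/2]; use 1 - P_i for the opposite convention.
    """
    total = b_plus + b_minus
    if total <= 0:
        raise ValueError("marginal must have positive total weight")
    p_plus = b_plus / total          # normalized belief that the bit is +1
    s_i = 1 if p_plus >= 0.5 else -1
    p_i = min(p_plus, 1.0 - p_plus)  # belief mass on the value not chosen
    return s_i, p_i
```

Calling this after every belief update would refresh $\{S, P\}$ rather than holding them fixed throughout the protocol.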

Conclusions
In this paper I have proposed a new way of thinking about algorithms based on a quantum annealer with generalized classical controls. I have given examples both of how existing quantum annealer based algorithms can be represented in this formalism and how this formalism can be used to design new algorithms, including algorithms with genetic components. While the algorithms proposed here will not in general obey detailed balance, they could allow for a more complete accounting of the low energy local minima of an energy landscape, and therefore may be useful for calculating thermal distributions if used with appropriate post processing.
To motivate this formalism I have given a proof-of-principle demonstration that the output of annealer runs contains information not only about the likely solution to a problem Hamiltonian, but also about the relative bitwise uncertainty. Although a full analysis is beyond the scope of this paper, it would likely be interesting to explore the connection between the methods proposed here and quantum inspired diffusion Monte Carlo algorithms as discussed in [45,46], which show similar structure in the methods with which they solve problems. It would likewise be interesting to develop inference primitives based on other physical mechanisms, such as closed system adiabatic quantum computing, or quantum walks. It would also be interesting to run comparisons of algorithms designed with this formalism on real devices to determine their performance, and to design more algorithms. The algorithms given in this paper are only intended as examples of how the design techniques I have developed can be used; this paper has only scratched the surface of the algorithmic possibilities for this functionality of a quantum annealer.
There are many potential heuristics which could be used to relate the probabilities $P$ which are passed to an inference primitive to the annealing parameter $s'$ which is used in a reverse annealing protocol. While the focus of this paper is not on how to actually relate these two parameters, it is instructive to give a simple example of what one such heuristic could look like. Whether or not this heuristic works well in practice is beyond the scope of this current work, and more sophisticated heuristics, for instance based on the local temperature estimates given in [28], are almost certain to perform better.
To start with, I make use of the fact that it has been numerically demonstrated that quantum fluctuations moderated by a transverse field can be used as a proxy for thermal fluctuations for inference problems [47]. In this spirit I define an approximate effective temperature related to a transverse field strength, which is set by a chosen value of $s$ in Eq. (1) which I denote as $s'$. This can be done using the method suggested in [14], by analytically diagonalizing the Hamiltonian at the appropriate point in the annealing schedule with a "problem" Hamiltonian consisting of a single bit Hamiltonian with a longitudinal field of unit strength, $H_1(s') = -A(s')\,\sigma^x + B(s')\,\sigma^z$. The ratio of the probabilities of the two bit values in the resulting ground state is then compared to a Boltzmann distribution, and the equation inverted to solve for temperature. This approach yields an approximate expression for the effective temperature $T'$ in terms of $A(s')$ and $B(s')$.
In situations where coupling is present, rather than the single qubit case examined here, the effective picture becomes more complicated. To correctly determine the effect of a coupler on a single qubit, one must take into account the fact that all other qubits within the coupler are also fluctuating in a way which is generally complicated and correlated both with each other and with the qubit we are examining. The results in [47] suggest, however, that these complicated effects will be very similar for both quantum and thermal fluctuations. Based on these similarities, a simple first approximation is to apply relationships between temperature and driver strength which are derived in the single qubit case to larger multi-qubit systems, based on the reasoning that the effects of correlations with neighbors may be qualitatively similar in both cases. While this is a rather crude approximation, the heuristic given here is only intended as a minimal example, and single qubit dynamics provide one of the simplest ways to relate temperature to transverse field. 
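The single qubit construction can be worked through numerically as a check on the reasoning. The sketch below assumes that the classical reference is a bit with energies $\pm B$ (unit longitudinal field scaled by $B$) and that temperature is measured in the same units as $A$ and $B$; these conventions and the function names are assumptions, so the result may differ by constant factors from the expression obtained in the text.

```python
import math


def ground_state_ratio(a, b):
    """p(sigma_z = -1) / p(sigma_z = +1) in the ground state of H = -a*sx + b*sz.

    In the sigma_z basis H = [[b, -a], [-a, -b]] with ground energy -r,
    r = sqrt(a^2 + b^2). Solving (H + r) v = 0 gives v_down / v_up = (b + r) / a.
    Requires a > 0.
    """
    r = math.hypot(a, b)
    amplitude_ratio = (b + r) / a
    return amplitude_ratio ** 2


def effective_temperature(a, b):
    """Temperature at which a classical bit with energies +b / -b has the same ratio.

    Boltzmann: p_down / p_up = exp(2 b / T), hence T = 2 b / ln(ratio).
    Requires a > 0 and b > 0.
    """
    return 2.0 * b / math.log(ground_state_ratio(a, b))
```

Under these assumptions the result simplifies to $B/\operatorname{arcsinh}(B/A)$, which grows with the transverse field strength $A$, as expected if stronger quantum fluctuations are to mimic a higher temperature.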
Alternatively, a local temperature could be estimated experimentally using the methods of [28], or by adapting the methods to estimate a global effective temperature used in [9].
Now I use the seminal result by Nishimori [48,49,31] that a temperature can be related to an error probability via the Nishimori temperature, $T_N$. This relationship is mathematically rigorous and is the underlying principle behind maximum entropy inference, which has many practical applications [50,51,52,53,54,55,56]. In these applications, the Nishimori temperature serves to match a temperature to an effective uncertainty, expressed as a probability $P$. The quantity could be, for instance, an error rate in the context of decoding of communications as in [56]. In the context of inference primitive protocols, $P$ should be taken as $P_i$ for a given bit or cluster of bits. A simple approximate heuristic to relate the probabilities to the effective temperature $T'$ is to set it to be proportional to the Nishimori temperature. By plugging in the approximate formula in Eq. 21 and inverting the equation, I obtain the approximate uncertainty value. The relationship I have just derived allows a direct definition of the uncertainty values $\{P\}$ defined in Sec. 2 in terms of real device parameters. Expressed in these terms, the algorithms in [14] assign the same probability of being incorrect to every bit value.
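The relationship between a temperature and a probability can be illustrated with the standard Nishimori condition for the $\pm J$ model, $\exp(2J/T_N) = p/(1-p)$. The sketch below uses this condition together with a hypothetical proportionality constant `alpha` between $T'$ and $T_N$; the constant, the function names, and the convention that `p_correct` is the probability of a value being correct (so that the error probability is `1 - p_correct`) are all assumptions for illustration.

```python
import math


def nishimori_temperature(p_correct, coupling=1.0):
    """T_N from exp(2 J / T_N) = p / (1 - p), valid for 1/2 < p < 1."""
    if not 0.5 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 1/2 and 1")
    return 2.0 * coupling / math.log(p_correct / (1.0 - p_correct))


def probability_from_temperature(t_eff, alpha=1.0, coupling=1.0):
    """Invert T' = alpha * T_N for p; alpha is a HYPOTHETICAL constant."""
    t_n = t_eff / alpha
    return 1.0 / (1.0 + math.exp(-2.0 * coupling / t_n))
```

A low effective temperature maps to a probability close to 1 (high confidence in the candidate value), while a high effective temperature maps to a probability close to 1/2.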
Thus far, I have assumed that the annealer is exposed to a bath with a temperature which is low compared with the relevant energy scales $A(s')$ and $B(s')$. However, this may not be the case in a real annealer. In this case we can make the approximation that the thermal and quantum fluctuations act in a statistically independent way and add them in quadrature, as in Eq. (25).
It is worth discussing briefly a special subclass of problem Hamiltonians for which $h_i = 0\ \forall i$ in Eq. (2). For the quantum annealing algorithm applied to such a problem Hamiltonian, the mean orientation of a bit is zero, $\langle\sigma^z_i\rangle = 0$, and similarly for any cluster of bits, $\langle\sum_{j \in R_i}\sigma^z_j\rangle = 0$, by the fact that these Hamiltonians have a $Z_2$ symmetry with respect to flipping all of the qubits (global bit inversion). However, the candidate solution breaks this symmetry, meaning that solution refinement will still work. If multiple sets of annealer outputs are being combined (i.e. $|G| > 1$) for such a problem Hamiltonian, then we should consider the possibility of performing global spin inversions on some of the sets of outputs before applying the processing function. Ideally, the set of inversions chosen should be the one which yields the highest possible bitwise correlation between all of the candidates.
Because the space of possible global spin inversions of candidate solutions will be of size $2^{N_{\mathrm{inputs}}}$, performing an exhaustive search over all possible inversions may not be possible if $N_{\mathrm{inputs}}$ is moderately large. However, a heuristic search method such as simulated annealing could be used to find choices which yield high correlations. Alternatively, one could break the spin inversion symmetry by taking a 'majority vote', and performing a global bit inversion on all solution candidates in $G_k$ if more bits are in the −1 state than the 1 state.
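The majority vote rule is simple enough to state directly in code. The sketch below is a minimal version which treats each candidate separately, as a list of +1/-1 values; the function name is hypothetical, and ties are left unflipped as an arbitrary choice.

```python
def majority_vote_align(candidates):
    """Globally invert every candidate which has more -1 than +1 entries."""
    aligned = []
    for spins in candidates:
        if sum(spins) < 0:
            # More bits are -1 than +1: apply a global bit inversion.
            aligned.append([-s for s in spins])
        else:
            aligned.append(list(spins))
    return aligned
```

This costs time linear in the total number of bits, in contrast to the exponential cost of searching over all combinations of inversions.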
A simple alternative approach for problems where $h_i = 0\ \forall i$ is to effectively fix a single spin arbitrarily, and replace coupling to that spin with fields. While mathematically correct, this approach has the disadvantage that it gives one spin a 'privileged' role, in that quantum fluctuations damp out the effect of couplers much more strongly than they do fields. This is because the effect of a coupler is moderated by the fluctuations of two qubits, while the effect of a field is moderated only by the fluctuations of the single qubit it is coupled to.
The methods which I have derived in this section to relate the local annealing parameter on the real device, $s'$, to the uncertainty value $P_i$ are not necessarily unique; there will be other suitable mathematical ways to relate these quantities. For real applications the preferred method may actually be to try different heuristics until one is found which works well, or to try to work out this relationship directly experimentally, for instance by adapting the bisection methods used to find the range of local searches proposed in [14].