Solving larger maximum clique problems using parallel quantum annealing

Quantum annealing has the potential to find low energy solutions of NP-hard problems that can be expressed as quadratic unconstrained binary optimization problems. However, the hardware of the quantum annealer manufactured by D-Wave Systems, which we consider in this work, is sparsely connected and moderately sized (on the order of thousands of qubits), thus necessitating a minor-embedding of a logical problem onto the physical qubit hardware. The combination of relatively small hardware sizes and the necessity of a minor-embedding can mean that solving large optimization problems is not possible on current quantum annealers. In this research, we show that a hybrid approach combining parallel quantum annealing with graph decomposition allows one to solve larger optimization problem accurately. We apply the approach to the Maximum Clique problem on graphs with up to 120 nodes and 6395 edges.


Introduction
Quantum annealers are devices that aim to use quantum mechanical fluctuations to search for low energy solutions to combinatorial optimization problems [1][2][3][4].Quantum annealers have been physically manufactured, such as those manufactured by D-Wave Systems, Inc. [5][6][7][8][9], and allow approximate solutions of NP-hard problems to be computed that are hard to solve classically.D-Wave quantum annealers are designed to find high quality solutions of so-called quadratic unconstrained binary optimization (QUBO) and Ising problems by minimizing the function where the linear weights a i ∈ R, i ∈ {1, . . ., n}, and the quadratic couplers a ij ∈ R for i < j are chosen by the user to define the problem under investigation.The variables x i , i ∈ {1, . . ., n}, are binary and unknown.The function of eq. ( 1) is called a QUBO if x i ∈ {0, 1}, and an Ising problem if x i ∈ {−1, +1} for all i ∈ {1, . . ., n}.Both QUBO and Ising formulations are equivalent [10,11].After mapping eq. ( 1) to the physical quantum system, the D-Wave quantum annealer aims to find the values of x 1 , . . ., x n that minimize H by trying to obtain a minimum-energy state of the quantum system.To this end, the coefficients of the linear terms in eq. ( 1) are mapped onto the corresponding qubits' physical parameters, and the coefficients for the quadratic terms are mapped onto parameters of the links connecting the corresponding qubits.However, this process is subject to a variety of constraints in practice, which limit the applicability of the quantum annealer.First, the number of available hardware qubits is relatively small, which restricts the admissible problem size of eq.(1).Second, the connectivity of the hardware qubits on the D-Wave quantum chip is limited and local.Therefore, for two logical qubits x i and x j with a ij = 0 in eq.(1), it is not guaranteed that a physical link between qubits i and j exists on the quantum hardware.To alleviate this issue, a minor embedding of the connectivity structure of eq. ( 1) onto the D-Wave qubit hardware graph is needed, where one logical qubit is represented by a connected set of physical qubits, called a chain.However, the existence of chains often results in a severe reduction in the number of available qubits [11,12].Third, excessive chain lengths may cause the solution

Literature review
Exact algorithms for solving NP-hard problems on real-life problems of practical importance have received continuous attention in the literature [23][24][25].One can broadly distinguish between three flavors of such methods: exact exponential-time algorithms to decompose NP-hard problems, (polynomial-time) algorithms for special cases of NP instances, and hybrid algorithms relying on both decomposition and quantum annealing.
The idea of an exact decomposition for NP-hard problems, which also lays at the heart of the present contribution, dates back to at least [26].Similar techniques for other NP-hard problems such as graph coloring are known [27].For MC, the exact decomposition algorithm that serves as the basis of the present work has been proposed in [11].In [16], the authors consider several techniques to speed up the decomposition by pruning subproblems that cannot contribute to the optimal solution, for instance by computing bounds on clique sizes in subproblems.Although bounding and pruning subproblems can considerably speed up the computations and allow one to solve problems of practical importance exactly, there is no guarantee that exact decomposition techniques terminate in reasonable time, and they do not asymptotically lower the exponential runtime complexity.
For the Maximum Independent Set problem, the complimentary problem of MC, a variety of exact algorithms are available [28][29][30][31].Additionally, special cases can be solved in polynomial-time [32][33][34].Some known algorithms for such special cases rely on graph decomposition [35,36].Similarly to before, the aforementioned exact techniques still have an exponential runtime complexity, and the special cases that can be solved in polynomial time are rather specific, meaning they usually do not apply to problems of practical importance.
The algorithm of [16] has been generalized in [37] to decompose large Minimum Vertex Cover (MVC) problems, a related problem to MC, and turned into a framework for decomposing certain NP-hard graph problems that does not explicitly consider quantum annealing [19].Further exponential-time algorithms for MVC have been proposed [38][39][40][41], including some aiming to reduce the size of an MVC instance [42,43], and some considering the weighted MVC problem [44,45].
The related problem of enumerating all maximum cliques of a graph has been addressed in the literature with the algorithms of [46] and [47], which both do not rely on graph decomposition.A parallel version of the algorithm of [46] can be found in [48].
The present work falls under the area of branch-and-prune algorithms, specifically those that employ a different solver once the generated subproblems of the decomposition are sufficiently small.A survey of such algorithms can be found in [49] and [50].Recent works that specifically employ the D-Wave quantum annealer in connection with a classical decomposition include [51], who use the heuristic D-Wave tool "QBsolv".
Although having their merits for practical use, the aforementioned algorithms cannot solve arbitrary instances in time that is asymptotically lower than exponential.Moreover, techniques using decomposition in connection with quantum annealing do not give a guarantee of optimality, as solutions of subproblems solved via quantum annealing are of a heuristic nature.
Finally, a survey of classical simplification techniques for QUBO problems, including problem-agnostic decomposition algorithms for QUBOs that are applicable in special cases, can be found in [52] and [53].

Methods
This section introduces the MC problem that we aim to solve (Section 2.1) and the DBK algorithm, which is able to decompose an MC instance recursively into smaller instances of prespecified size (Section 2.2).Moreover, we introduce the D-Wave Advantage System 4.1 and its hardware (Section 2.3), as well as the idea of parallel quantum annealing (Section 2.4), which allows one to solve several problem instances in a single call to the quantum annealer.A generalization of the time-to-solution (TTS) metric for measuring the performance of the DBK algorithm is introduced in Section 2.5.

The Maximum Clique problem
Let G = (V, E) be an undirected graph with vertex set V and edge set Many NP-hard problems can be easily reformulated as QUBO or Ising problems of the form of eq. ( 1).This includes, for instance, graph coloring, graph partitioning, or the MC problem [10].As given in [16], the QUBO formulation for the MC problem on a graph G = (V, E) is given by where E denotes the edge set of the complement graph of G and the constants A > 0 and B > 0 need to satisfy A < B. Without loss of generality, we fix A = 1 and B = 2 in the remainder of this work.Each binary variable x v ∈ {0, 1} for v ∈ V indicates if vertex v belongs to the maximum clique (x v = 1) or not (x v = 0).

The DBK algorithm
The DBK algorithm [16,19,37] was designed to decompose an input graph recursively into smaller subgraphs such that, in each recursion level, (a) each generated subgraph is strictly smaller than the graph that was split, and (b) the maximum clique of the input graph can be reconstructed exactly from the maximum cliques determined for the leaf graphs, given all subgraphs can be solved exactly.The name of the algorithm stands for decomposition, bounds, and k-core.These three components work as follows.
First, the decomposition of an input graph G is illustrated in Figure 1.We choose a random vertex v ∈ G and use it to split G into two graphs, the subgraph G v induced by the neighbor vertices of v, and the subgraph G obtained by removing v from G including all its incident edges.If v is part of a maximum clique then that maximum clique, minus vertex v, will be contained in G v .Hence, the size of the maximum clique of G v will be one less than the size of the maximum clique of G. On the other hand, if v is not part of a maximum clique, then v can be removed from G, resulting in a new graph G that contains the same maximum clique as G. Therefore, the maximum clique can be reconstructed after determining the maximum clique in both G v and G , with each of these graphs being strictly smaller than G.The decomposition is valid irrespective of the choice of the vertex v used for splitting the graph, though some choices might be advantageous in that they make the algorithm terminate faster.As shown in [16,19], choosing v as the vertex of lowest degree yields the fastest solutions.Second, before recursing into any of the generated subgraphs, the size of the clique contained in a subgraph can be lower and upper bounded using various techniques.For instance, the chromatic number computed with a fast heuristic (we use the Python module NetworkX of [54]), and the upper bound of [55] provide efficiently computable upper bounds.Likewise, a trivial lower bound on any subproblem in the decomposition tree is obtained by simply solving the other subbranch.These upper and lower bounds are employed in the DBK algorithm.These bounds were selected due to their low runtime complexity and good performance compared to other bounds (thus allowing one to prune many subproblems).
Third, the DBK algorithm attempts to simplify any newly generated subproblem in the decomposition tree before solving it further in a recursive fashion.After exploring several techniques, the k-core algorithm was selected in [16].This algorithm works as follows.The k-code of a graph is the subgraph consisting of all vertices having degree at least k.Clearly, at any point during the decomposition, one can simplify a subgraph by replacing it with its k-core, where k is set to the largest clique number found so far, thus potentially reducing the size of some of the subproblems before continuing the recursion.
The complete algorithm is summarized in Algorithm 1, which uses a stack called subgraphs instead of a recursion.Starting from the input graph G, Algorithm 1 first queries a lower bound on the maximum clique and then simplifies G by replacing it with its k-core.Next, G is split at the lowest degree vertex of G.Both resulting subgraphs, denoted g and g , are then examined recursively.After replacing them with their k-core as done before, the maximum clique size is lower bounded using the MCs found in g and g and k is updated if necessary.If the size of a subgraph is at most L, the size cutoff at which we attempt to solve the maximum clique problem, we query the provided solver method, otherwise a subgraph is added to the stack subgraph for further decomposition.Though proven to be exact, the DBK algorithm has a worst-case exponential runtime (which is to be expected since the MC problem is NP-hard).
The DBK algorithm can be specifically defined based on what Maximum Clique solver method is used.For example a heuristic Maximum Clique solver could be used, in which case, despite the decomposition being exact, the solution may not be optimal.In this case we use two different versions of DBK.
The first we call DBK-pQA; here the Maximum Clique problem is sampled using the D-Wave Advantage System 4.1 quantum annealer with parallel quantum annealing.In this case the solver method that is used in the DBK algorithm (line 16 of Algorithm 1) is querying the quantum annealer for 1, 000 samples, which are then post processed into solutions for the maximum clique problem.The largest clique found in those 1, 000 samples is returned to the main DBK algorithm.The DBK-pQA method is a hybrid quantum-classical algorithm.
The second variant of DBK we call DBK-fmc.In this case the Maximum Clique solver method (line 16 of Algorithm 1) is the Fast Maximum Clique solver of [56], which is an exact classical solver.Therefore the DBK-fmc algorithm is guaranteed to find the Maximum Clique of the given graph.
g ← subgraphs.pop() v ← lowest degree vertex of g 7: Partition g at vertex v into g and g 8: for g * ∈ {g , g } do 9: if lower bound on maximum clique of g * exceeds k then if size of g * exceeds L then 14: subgraphs.push(g* ) 15: else if upper bound on maximum clique of g * exceeds k then 16: Call solver method on g * 17: Update k while taking into account previously removed vertices end for 20: end while 21: return k

The D-Wave Advantage System 4.1
The hardware connectivity of the D-Wave Advantage System systems is referred to as Pegasus [57], and the connectivity graphs of the D-Wave 2000Q systems are referred to as Chimera [58,59].Both of these connectivity graphs are relatively sparse, and they do not allow for the direct mapping of problem QUBO or Ising formulations with arbitrary connectivity onto the hardware.As an alternative to the direct embedding, a minor embedding allows a problem with arbitrary connectivity (up to the hard limit of the size of the hardware) to be embedded [12].In a minor embedding, a representation of the logical problem is created in which sets of physical qubits are linked together into chains [60], where each chain represents a single logical qubit.Usually, the programmed weight of the logical variable is uniformly distributed across the chain qubits, although alternative weight distributions are possible [61,62].Computing a minor embedding with minimum chain length is NP-hard, but there are heuristics that can be used in order to efficiently generate minor embeddings [15].Another viable method to circumvent repeatedly generating minor-embeddings is to generate a fixed clique minor-embedding [58], which then allows one to embed an arbitrary graph of any connectivity so long as its size is at most the one of the clique embedding.
One of the disadvantages of minor embeddings is that at large chain lengths, the chains may begin to disagree on the logical variable state when the final state of the qubits is read out.We call these instances broken chains.Broken chains typically indicate poor solution quality [63,64], and the solutions with broken chains also need to either be repaired in some way or discarded entirely.Repairing broken chains is referred to as unembedding [65].A simple method for resolving broken chains, and thus to form a logical variable state, is to simply take the majority vote on the qubit states of all physical qubits in a chain.Note that in the edge case of a broken chain being evenly split between states then a random choice with p = 0.5 is used to resolve the chain.In this research we always apply the majority vote unembedding method.

Parallel Quantum Annealing
One of the other disadvantages of minor embeddings is that a fixed clique embedding does not typically make maximal use of the hardware available (i.e., many qubits and couplers available on the hardware stay idle as nothing is mapped onto them).This problem, in conjunction with poorer solution quality at larger embedding sizes, gives rise to the natural idea of embedding multiple smaller (disjoint) cliques onto the quantum annealing hardware and thus solving multiple minor-embedded problems on the quantum annealer during the same anneal [17].This idea is referred to as parallel quantum annealing, which makes better use of the available hardware compared to the sequentially solving of all individual problems.Parallel quantum annealing can be applied in order to solve a set of different QUBOs on a quantum annealer, or it can be used to solve the same QUBO multiple times on the quantum annealer.In this research we apply the method of solving the same maximum clique QUBO on as many embeddings (this is determined by the size of the QUBOs) as can be fit using the heuristic minor-embedding tool minorminer [15].
The quantum annealing backend used is the D-Wave Advantage System 4.1, hereafter referred to as D-Wave.Figure 2 shows the disjoint clique minor embeddings on the Pegasus connectivity graph of sizes ranging from two 100-vertex clique embeddings to twelve 50-vertex clique embeddings.Each of the disjoint embeddings can be used to solve either the same QUBO repeatedly in the same anneal, or to solve different QUBOs.
For usage in conjunction with the DBK algorithm, we solve each subproblem that DBK generates by computing the QUBO of the MC problem for that graph and embedding the QUBO as many times as possible given the fixed disjoint clique embeddings displayed in Figure 2. The size of the clique embeddings being used is always equal to the DBK cutoff value used.This increases the probability that a maximum clique will be found during a single D-Wave call [17].This is different from previous DBK approaches [16,19] which solved subproblems using sequential quantum annealing.
While running the experiments in the context of the DBK algorithm, there is a potential inefficiency in the embedding usage, which is due to the fact that the sizes of the subgraphs generated by the DBK algorithm can be less than the DBK cutoff value.In such a case, the utilized embedding is larger than required, thus resulting in inefficient usage of the available qubits and unnecessarily long chains, which in turn potentially decreases the solution quality.However, for the purposes of being able to solve multiple problems simultaneously, this is a reasonable cost to take on.

The time-to-solution (TTS) metrics
Since the D-Wave quantum annealer is a probabilistic solver and because of its imperfect hardware, it only samples a ground state solution for a QUBO or Ising problem with a certain probability.Therefore, many samples are usually necessary to obtain an optimal, or at least a sufficiently good, solution.In order to estimate the time it takes for D-Wave to find an optimal solution, one can use the so-called time-to-solution (TTS) metric [66,67].TTS is a measure of the time it takes to reach an optimal solution with a 99 percent probability.
When using parallel quantum computing, i.e., when solving N ∈ N problem instances simultaneously on the D-Wave hardware, the standard method to compute the TTS does not apply.Let A i ∈ N be the number of samples for problem i ∈ {1, . . ., N }, let T QPU i be the QPU access time (in seconds), and T unembedi be the classical processing time (in seconds) to unembed the solution vectors across all disjoint minor-embeddings.Since the same problem is embedded multiple times on the hardware chip, it is possible that a single anneal might have found the ground state solution multiple times.However, for the purposes of computing the ground state probability p i , we solely count the number of samples (among the total A i samples) that found the ground state solution at least once.If p i = 0 we can not compute TTS opt .The quantity p i is also called the ground state probability (GSP) for subgraph i ∈ {1, . . ., N }.Lastly, we record the DBK processing time required to carry out the DBK decomposition (i.e., excluding the time required to solve each subgraph), denoted as T DBK proc time .We obtain the formula if 0 < p i < 1.The coefficient log(0.01)log(1−pi) corresponds to the number of times the algorithm should be run for the corresponding subgraph to ensure 99% probability of finding an optimal solution.In the special case where p i = 1 for some i ∈ {1, . . ., N }, we set the TTS for that subgraph to be (T QPU i + T unembedi )/A i .Also, since we use a fixed minor-embedding, we can set the minor embedding computation time to zero because the embedding can be pre-computed.
Measuring the time-to-solution of a probabilistic algorithm is a useful way of comparing it to other algorithms.However, in computing the TTS for an algorithm that uses a quantum annealer, there are some important assumptions to note.First, when estimating the time to run an anneal (and the time to unembed that solution), we are assuming that the QPU time (and unembedding time) are proportional to the number of samples.This assumption is not necessarily true, in particular because one can gain efficiency by batching samples together into the same job.Second, the TTS formula aims to compute the minimum amount of time that is required to reach the optimal solution with 99% confidence.However this does not take into account that determining the optimal number of samples to take in order to reach this ideal TTS in general is not known before actually running the algorithm.Therefore, quantifying TTS in this manner gives a lower bound on the ideal computation time.
When using a quantum annealer in order to solve optimization problems in practice, it is easier to set the number of samples per problem to some fixed number.However, this means that when the quantum annealer is used as a subsolver in a larger hybrid algorithm (for instance DBK), then the optimal TTS given by formula (3) will not be achieved.In this case, we can consider computing the TTS of the DBK algorithm, where the success rate p is the number of trials where DBK succeeded in finding the maximum clique and T QPU i and T unembedi will be the same for all problems (since they only depend on the number of samples A i ).Importantly in this case the number of samples to use when querying the quantum annealer A i = A can be selected by the user.The resulting TTS formula for fixed number of samples is Here, T classical is the classical time used by D-Wave, e.g., mapping the problem onto the D-Wave hardware and postprocessing the samples.In the case p = 1, we set TTS fixed = T QPU + T classical .If p = 0, then TTS fixed is not defined.

Experiments
This section presents some experimental results.After introducing the experimental setup in Section 3.1, we apply the DBK-fmc algorithm of Section 2.2 to a set of test graphs.We study how the number of subgraphs as well as their size and density behaves during the decomposition (Section 3.2), and we investigate the probability of finding an optimal solution at the leaf level of the decomposition tree, when using parallel quantum annealing, as a function of the DBK-fmc cutoff value (Section 3.3).Moreover, we consider a comparison of the DBK-fmc algorithm where the sub-problems are solved using parallel quantum annealing and the classical FMC solver using a modified TTS measure that emulates an optimal usage of quantum annealing in conjunction with DBK (Section 3.4).The experiments conclude with an application of the DBK-pQA algorithm where a quantum annealer is used as a real time subsolver with a fixed number of samples (Section 3.5).

Experimental set-up
All experiments are performed on a series of 60 Erdős-Rényi random graphs [18].Each of these 60 graphs has 120 vertices and density sampled from a continuous uniform distribution in [0.1, 0.9].Additionally, we ensure that each of the test graphs is connected, meaning there is a path between any pair of vertices.We apply the DBK algorithm to each of the 60 graphs using different cutoff sizes for the subgraphs.To be precise, we study the behavior of the DBK algorithm for the cutoff values {110, 100, 90, 80, 70, 60, 50}, where the solver used is the classical Fast Maximum Clique solver of [56].Then, we investigate under what parameters the D-Wave quantum annealer would be able to solve the generated subgraph problems (as well as what the precise Time-to-Solution characteristics are).In this way, similarly to [16,19,37], we can emulate the expected quantum annealer behavior in the DBK algorithm.Next, using the information gained from the the previous step we run DBK using the quantum annealer as the real-time subsolver.
The D-Wave settings we use are annealing time 50 microseconds, 1000 samples per backend call, readout thermalization 0 microseconds, programming thermalization 0 microseconds, and the reduce intersample correlation Boolean flag is set to True.The chain strength value is dynamically computed based on the individual QUBO properties using the D-Wave Ocean SDK function uniform torque compensation [68] (with a user-specified prefactor of 0.2).The uniform torque compensation method attempts to provide a chain strength that reduces the number of broken chains, chosen as the square root of the mean of the quadratic couplers of the QUBO.We found empirically that for the large Maximum Clique minor embedded problems, a uniform torque compensation prefactor less than 0.5 and greater than 0.1 gave the overall best results; we chose to use 0.2, which resulted in favorable approximation ratios overall (see Figure 4).Because the settings across these experiments are constant, the resulting QPU time (specifically qpu-access-time) is relatively constant in these experiments at about 1 second per backend call.
In Figures 3, 5, and 7 we color code lower input graph densities with dark blue, intermediate densities (0.5) are colored orange and yellow corresponds to higher input graph densities (before decomposition).

Number, density, and size of subgraphs with DBK-fmc
Figure 3 shows the number of subgraphs, the average subgraph density, as well as the average size of the subgraphs generated at each cutoff level of the DBK-fmc algorithm.Note that the number of generated subgraphs can be zero (in particular, if a generated subgraph is a clique itself then it is not necessary to solve it, and therefore it not counted as a subgraph to be solved).
First, Figure 3 (top left) shows that the number of generated subgraphs generally increases as the DBK cutoff decreases, as one should expect, especially for higher density graphs.At lower densities, the number of subgraphs is generally quite small (sometimes even zero) and does not greatly increase as the cutoff level decreases.An interesting case occurs for some high density graphs, whereby the trend of increasing generated subgraph counts actually reverses at a cutoff value of 60.The precise reason why the reversal occurs at a cutoff value of 60 is not clear, but the existence of a reversal is somewhat expected.This is due to the fact that two phenomena work against each other.A lower cutoff value causes the decomposition to run longer, thus producing more subgraphs.At the same time, an increase in the number of subgraphs results in all bounds working better, thus leading to more pruning.
Second, Figure 3 (top right) shows that on average, as the cutoff decreases, the subgraph density increases in comparison to the density of the input graph.This is again a result of the design of the DBK algorithm, as the DBK algorithm preferentially removes lower degree nodes, both with the help of the k-core reduction and the low degree vertex removal when partitioning.
Third, Figure 3 (bottom) shows the average subgraph size at each cutoff level.By construction of the DBK algorithm, the size of the subgraphs will be either equal to or smaller than the cutoff level.The latter case happens if either the bounds or the k-core reduction worked particularly well.As can be seen, the size of most subgraphs is indeed roughly equal to the cutoff for high densities, while subgraph sizes for lower densities vary widely.

Solving the Maximum Clique problem on the subgraphs
We are interested in exploring the accuracy of the solutions found by parallel quantum annealing on the leaf level of the DBK-fmc decomposition.First, for the test graphs of Section 3.1, Figure 2 shows the disjoint minorembeddings for different all-to-all problem graphs embedded on the Advantage System 4.1 Pegasus connectivity graph.We observe that a single anneal can find the maximum clique multiple times in a single anneal when using parallel embeddings.However, when calculating the ground state probability p in eq. ( 3) over multiple samples, we only consider a binary indicator (maximum clique found at least once vs. not found) per sample.Note that, typically, single set of samples will contain a small number of optimal solutions (e.g., once or twice), if any.
Figure 4 shows the approximation ratios (specifically the approximation ratio of the largest clique found among the 1, 000 anneals) across all subgraphs generated during decomposition across the varying DBK cutoff values.Importantly, the approximation ratios consistently decrease as cutoff gets larger, and at a DBK cutoff of 50, all of the subgraphs can be solved exactly.The difficult task is that for the original large graphs there is not a consistent way to know a-priori if the quantum annealer will find the maximum clique or not; this approximation ratio plot shows that decomposing the graph into smaller subgraphs increases the approximation ratio of the solutions found, up to being able to always find the maximum clique (at least for these 60 random graphs).The approximation ratios in Figure 4 at a DBK cutoff of 120 correspond to the non-parallel QA results from embedding a single 120 node clique onto the hardware and running the MC QUBO for each of the 60 graphs on the hardware.Once the DBK decomposition is applied we see a gradual decrease in the failure rate to reach the optimal solution.
Next, Figure 5 looks at the failure rate for finding ground state solutions over all 60 test graphs as a function of the DBK cutoff value.We observe that for higher graph densities, ground state solutions are almost never found when using a DBK cutoff of 90 or more, while for graphs decomposed down to sizes 70 or less, the ground state of the subproblems is almost always found.As already observed in Section 3.2, the size and density of the subgraphs for lower input densities varies widely, resulting in either very easy or very hard MC problems for the quantum annealer.This causes the failure rate to be very volatile for lower input densities.
Last, we visualize the number of subgraphs stratified by the proportion of samples that correctly found the  optimal solution, referred to as the ground state probability (GSP) of the Maximum Clique QUBO(s), as a histogram.Figure 6 shows that, as seen before, hardly any subproblem can be solved for a DBK cutoff of 110.The lower the cutoff, the most subproblems can be solved to optimality by D-Wave.As seen in Figure 5, the GSP for a DBK cutoff of 50 is always nonzero, meaning that when decomposing down to a subproblem size of 50, all problems could be solved at least once to optimality in the 1000 samples from the quantum annealer.

Comparison of the DBK algorithm and FMC with the TTS measure
We compare the full DBK algorithm to the classical FMC solver using the TTS measure introduced in eq. ( 3).
We note that the current noisy intermediate-scale quantum (NISQ) technology [69], including D-Wave's quantum annealers, is not advanced enough and, hence, not competitive with classical computers on solving NP-hard optimization problems when dedicated algorithms are used.Regardless, we compare our algorithm against a highly optimized MC solver, the Fast Maximum Clique (FMC) solver from [56] in order to determine in which cases the quantum algorithm performance gets closer to the classical one.
Figure 7 shows several aspects of the comparison.First, Figure 7 (left) shows the the TTS time of eq. ( 3) for the full DBK algorithm (including time for decomposition and unembedding) as a function of the DBK cutoff.We observe that the lower the density, the faster a test graph can be solved by the DBK algorithm.Moreover, high density graphs can only be solved when decomposing them to relatively low DBK cutoffs of 50 to 60, whereas lower density graphs can be decomposed at higher cutoffs as well.Importantly, at larger DBK cutoffs we have a lack of TTS values in the figure (especially at higher densities); this is because those TTS values could not be computed because for at least one subgraph p = 0.This means that none of the 1000 quantum annealing samples found the maximum clique at least once, which causes p i = 0 in eq. ( 3) which means a TTS value can not be computed.Thus another important aspect of Figure 7 is showing at what densities and DBK cutoff values can all subgraphs, that were generated by DBK-fmc, be solved by the quantum annealer in 1000 samples.
Second, Figure 7 (right) shows the ratio of the classical FMC process time over the TTS time for the full DBK algorithm.Here, values above one indicate that FMC is slower than the quantum annealer sampling all subgraphs generated by the DBK-fmc algorithm.Indeed, we observe that the DBK-fmc algorithm with quantum annealing is superior to the entirely classical approach for high densities at low cutoff values of around 50 or 60, and computes maximum cliques up to two orders of magnitude faster than FMC.For either higher densities, or for higher cutoff  values than 60, the classical FMC algorithm is the better choice when solving the MC problem.

Using D-Wave as a real-time subsolver for DBK
In the previous experiments, we have emulated the procedure of executing DBK where the D-Wave quantum annealer is the subsolver.This was accomplished by exactly solving the problem graphs using FMC, and then determining what parameters (for example, annealing parameter and the DBK cutoff level) to use in order to consistently find the Maximum Clique.
In this section, we present results when using the quantum annealer, with the fixed disjoint clique embeddings for parallel quantum annealing, in the real time execution of DBK-pQA (with no classical solver).Based on our previous experiments (see Figure 6), we would expect that with a DBK-pQA cutoff level of 50 and using 1, 000 samples per subgraph, the Maximum Clique solution would be consistently found.
In order to perform this experiment we ran the first 20 random graphs of the 60 generated for consistent experiments a total of 5 times, while using the quantum annealer (with majority vote unembedding) as the subsolver.Using eq. ( 4) in order to compute the T T S fixed , we can use the 5 different runs for each graph in order to compute p.However, in all 100 experiments, the algorithm found the maximum clique upon termination.Thus, p = 1 in all cases, and therefore the time-to-solution was simply the average sum of the the used QPU time plus the used classical processing time.Figure 8 shows the Time-to-Solution results.The T T S fixed metric quantifies the success rate of the DBK-pQA algorithm, as opposed to the success rate of the individual samples from the quantum annealer.Therefore, for a large number of anneal samples (i.e., 1000), we would expect the T T S fixed metric to be larger than the T T S opt metric.We observe this to be the case in Figure 8. Figure 8 shows that the DBK-pQA TTS when using the quantum annealer as a real-time subsolver is exponential with respect to the graph density.
Most importantly, for all 20 random graphs used in this experiment, DBK-pQA found an optimal solution to each original graph.This is despite the fact that D-Wave is a probabilistic sampler and DBK may generate up to 10, 000 subgraphs for each input graph that are each sent to D-Wave for finding a MC.

Discussion
In this work, we consider solving the Maximum Clique problem using a combination of graph decomposition and parallel quantum annealing.To be precise, we base our solution on the DBK algorithm [16] to decompose an arbitrary input graph to subgraphs of any pre-specified cutoff level sizes.We then employ the D-Wave Advantage System 4.1, a quantum annealer manufactured by D-Wave Systems, Inc., to solve the subproblems generated during the decomposition.In order to best leverage the capabilities of the D-Wave annealer, we embed and solve several of the generated subproblems simultaneously, an idea previously introduced under the name of parallel quantum annealing in the literature [17].Therefore, this work shows the end-to-end process required to solve large maximum clique problems to (near) optimality using a quantum annealer.
Using the DBK algorithm in connection with D-Wave Advantage System 4.1, we are able to sample ground state solutions of MC problems of up to 120 vertices, and additionally employ parallel quantum annealing to speed up the computations.We demonstrate that in some cases, the resulting algorithm can compute maximum cliques in high density graphs up to around two to three orders of magnitude faster than a classical solver.Current NISQ annealing devices are not necessarily expected to outperform classical algorithms, although a scaling advantage using quantum annealing has been demonstrated in [70].The experiment reported in [70] considers the simulation of geometrically frustrated magnets (which reduces to a 0-1 integer programming problem on a given geometric lattice) using quantum annealing, and demonstrates that a quantum annealing processor can provide a computational advantage over the classical counterpart, path-integral Monte Carlo (PIMC).
This work leaves scope for further research avenues: 1.The methodology of this article (the exact graph decomposition in connection with multiple embeddings) can be applied to other NP-hard problems such as the minimum vertex cover problem.
2. The performance of the quantum annealing backend can be improved by using further tunable features such as h-gain schedules, anneal offsets, flux bias offsets, and spin reversals.
3. It remains to be analyzed how different minor embeddings solve the same QUBO when they are all used in the same annealing cycle.In particular, it is unclear if some embeddings contribute more than others to finding the ground state solutions.If so, what characteristics of those embeddings cause them to perform better than the other embeddings?Additionally, the spatial and temporal correlations with regard to which embeddings are finding maximum cliques remain to be investigated.
4. Do minor embeddings that have the same connectivity perform the same or differently during the same anneal(s)?In other words, if all of the disjoint embeddings used in the parallel quantum annealing method are exactly the same, just acting on different qubits, are the ground states found across the different embeddings with (nearly) equal probabilities?
5. Utilizing an efficient algorithm to compute the minimum number of samples required in order to obtain an optimal TTS (as opposed to using a fixed number of samples) would significantly reduce the real time computation required to run DBK with a quantum annealer.

Data availability
Python implementation and data for parallel quantum annealing and the DBK graph decomposition algorithm can be found on Github [71].

Figure 1 :
Figure 1: Illustration of the DBK vertex splitting applied to vertex v, resulting in the induced subgraph G v of v, and a graph G = (V , E ) with V = V \ {v} and all edges incident to v removed from E. Figure taken from [16].

Figure 2 :
Figure 2: Disjoint minor embeddings for parallel problem solving on the quantum annealer.The chip topology is for the D-Wave Advantage System 4.1.The minor-embeddings are of cliques of sizes N ∈ {100, 90, 80, 70, 60, 50} from top left to bottom.Red and Blue coloring is used (randomly) in order to help to visually differentiate neighboring minor embeddings.

Figure 3 :
Figure 3: Number of subgraphs (top left), average subgraph density (top right), and average subgraph size (bottom) at each cutoff level of the DBK-fmc algorithm.Log scale on the y-axis of the top left plot.Input graph densities ranging from 0.1 to 0.9 (see color legend).

Figure 4 :
Figure 4: Scatter plot of the Maximum Clique approximation ratios across all subgraphs that were solved as a function of the DBK cutoff.Computed by taking the best Maximum Clique solution (unembedded with majority vote) out of the 1, 000 samples used for each subgraph.The data at a DBK cutoff of 120 corresponds to the original input graphs.

Figure 5 :Figure 6 :
Figure 5: Left: Failure rate (failure to reach a ground state solution) as a function of the DBK cutoff values (after majority vote unembedding).Graph densities ranging from 0.1 to 0.9 (see color legend).Right: Averaged results for four groups of input graph densities ; 0.1 − 0.3, 0.3 − 0.5, 0.5 − 0.7, and 0.7 − 0.9 (the coloring corresponds to the median graph density of those ranges).

Figure 7 :
Figure 7: TTS as a function of the DBK cutoff for solving each of the 60 test graphs.Each line represents a single graph being decomposed over different DBK cutoff values.Log scale on the y-axis.Graph densities ranging from 0.1 to 0.9 (see color legend).TTS time for the full DBK algorithm (including time for decomposition and unembedding) according to eq. 3 (left), and FMC time divided by TTS time for the full DBK algorithm (right).

Figure 8 :
Figure 8: Scatter plot showing the DBK-pQA T T S fixed (y-axis) when using D-Wave as a real time solver in the DBK-pQA algorithm as a function of graph density.