# Quantum adiabatic machine learning

## Authors

K. L. Pudenz and D. A. Lidar

DOI: 10.1007/s11128-012-0506-4

Cite this article as: Pudenz, K.L. & Lidar, D.A. Quantum Inf Process (2013) 12: 2027. doi:10.1007/s11128-012-0506-4

## Abstract

We develop an approach to machine learning and anomaly detection via quantum adiabatic evolution. This approach consists of two quantum phases, with some amount of classical preprocessing to set up the quantum problems. In the training phase we identify an optimal set of weak classifiers to form a single strong classifier. In the testing phase we adiabatically evolve one or more strong classifiers on a superposition of inputs in order to find certain anomalous elements in the classification space. Both the training and testing phases are executed via quantum adiabatic evolution. All quantum processing is strictly limited to two-qubit interactions so as to ensure physical feasibility. We apply and illustrate this approach in detail to the problem of software verification and validation, with a specific example of the learning phase applied to a problem of interest in flight control systems. Beyond this example, the algorithm can be used to attack a broad class of anomaly detection problems.

### Keywords

Adiabatic quantum computation · Quantum algorithms · Software verification and validation · Anomaly detection

## 1 Introduction

Machine learning is a field of computational research with broad applications, ranging from image processing to analysis of complex systems such as the stock market. There is abundant literature concerning learning theory in the classical domain, addressing speed and accuracy of the learning process for different classes of concepts [1]. Groundwork for machine learning using quantum computers has also been laid, showing that quantum machine learning, while requiring as much input information as classical machine learning, may be faster and is capable of handling concepts beyond the reach of any classical learner [2, 3].

We consider the machine learning problem of binary classification: assigning a data vector to one of two groups based on criteria derived from a set of training examples provided to the algorithm beforehand. The learning method we use is boosting, whereby multiple *weak classifiers* are combined to create a *strong classifier* that is more accurate than any of its components alone [4, 5]. This method can be applied to any problem where the separation of two groups of data is required, whether it is distinguishing two species of plants based on their measurements or picking out the letter “a” from all other letters of the alphabet in scanned text. Our approach to classification is based on recent efforts in boosting using adiabatic quantum optimization (AQO), which showed advantages over classical boosting in the sparsity of the classifiers achieved and their accuracy (for certain problems) [6, 7].
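The core idea of boosting described above can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: the weak classifiers, weights, and data point below are all invented, and the strong classifier is simply the sign of the weighted vote.

```python
# Toy sketch of boosting's core idea: a strong classifier formed as the
# sign of a weighted vote over weak classifiers (all names here invented).

def strong_classify(x, weak_classifiers, weights):
    """Return +1 or -1 by weighted majority vote of the weak classifiers."""
    vote = sum(w * h(x) for h, w in zip(weak_classifiers, weights))
    return 1 if vote > 0 else -1

# Three toy weak classifiers on a 2D point, each barely better than chance.
weak = [
    lambda x: 1 if x[0] > 0 else -1,
    lambda x: 1 if x[1] > 0 else -1,
    lambda x: 1 if x[0] + x[1] > 0 else -1,
]
weights = [1, 1, 1]

print(strong_classify((0.5, -0.2), weak, weights))  # votes +1, -1, +1 -> +1
```

Even when each weak classifier is wrong on a sizable fraction of inputs, the vote can be much more accurate, provided the weak classifiers err on different inputs.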

As a natural outgrowth of the classification problem, we also formulate a scheme for anomaly detection using quantum computation. Anomaly detection has myriad uses, some examples of which are detection of insider trading, finding faults in mechanical systems, and highlighting changes in time-lapsed satellite imagery [8]. Specifically, we pursue the verification and validation (V&V) of classical software, with programming errors as the anomalies to be detected. This is one of the more challenging potential applications of quantum anomaly detection, because programs are large, complex, and highly irregular in their structure. However, it is also an important and currently intractable problem for which even small gains are likely to yield benefits for the software development and testing community.

The complexity of the V&V problem is easily understood by considering the number of operations necessary for an exhaustive test of a piece of software. Covering every possible set of inputs that could be given to the software requires a number of tests that is exponential in the number of input variables, independent of the complexity of each individual test [9]. Exhaustive testing is thus infeasible, and the cost of that infeasibility is large—in 2002, NIST estimated that tens of billions of dollars were lost due to inadequate testing [10].

The subject of how to best implement software testing given limited resources has been widely studied. Within this field, efforts focused on combinatorial testing are particularly relevant to our new approach. Combinatorial testing uses the available test attempts to cover all combinations of up to a small number, \(t\), of variables, on the premise that errors are usually caused by the interaction of only a few parameters [11, 12]. This approach has found considerable success [13, 14], with scaling that is logarithmic in \(n\), the number of software parameters, and exponential in \(t\).
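The \(t\)-way coverage requirement can be made concrete with a small enumeration. The parameters and values below are hypothetical, and this only counts the pair combinations that a \(t=2\) ("pairwise") suite must cover; constructing a minimal covering suite is the harder problem addressed by the combinatorial-testing literature cited above.

```python
from itertools import combinations, product

# Sketch of t-way coverage for t=2: every pair of parameters must have every
# combination of its values exercised by some test. Parameters are invented.
params = {"mode": ["auto", "manual"], "alt": ["low", "high"], "gear": ["up", "down"]}

def pairs_covered(tests):
    """Set of (parameter-pair, value-pair) combinations exercised by a suite."""
    covered = set()
    for test in tests:
        for p1, p2 in combinations(sorted(params), 2):
            covered.add(((p1, p2), (test[p1], test[p2])))
    return covered

# Exhaustive testing needs 2*2*2 = 8 tests; pairwise coverage needs only that
# all 3 parameter pairs x 4 value combinations = 12 pairs appear somewhere.
all_tests = [dict(zip(params, vals)) for vals in product(*params.values())]
required = pairs_covered(all_tests)
print(len(all_tests), len(required))  # 8 12
```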

Currently, the use of formal methods in the coding and verification phases of software development is the only way to guarantee absolute correctness of software without implementing exhaustive testing. However, formal methods are also expensive and time-consuming to implement. Model checking, a method of software analysis which aims to ensure the validity of all reachable program states, solves \(n\)-bit satisfiability problems (which are NP-complete), with \(n\) as a function of the number of reachable states of the program [15]. Theorem proving, where a program is developed alongside a proof of its own correctness, requires repeated interaction and correction from the developer as the proof is formed, with the intermediate machine-provable lemmas checked with a satisfiability solver [16].

We propose a new approach to verification and validation of software which makes use of quantum information processing. The approach consists of a quantum learning step and a quantum testing step. In the learning step, our strategy uses quantum optimization to learn the characteristics of the program being tested and the specification to which it is being tested. This learning technique is known as quantum boosting and has been previously applied to other problems, in particular image recognition [6, 17–19]. Boosting consists of building up a formula to accurately sort inputs into one of two groups by combining simple rules that sort less accurately, and in its classical forms has been frequently addressed in the machine learning literature [4, 5, 20].

The testing step is novel, and involves turning the classifying formulas generated by the learning step into a function that generates a lower energy when it is more likely that its input represents a software error. This function is translated into the problem Hamiltonian of an adiabatic quantum computation (AQC). The AQC allows all potential software errors (indeed, as we will see, all possible operations of the software) to be examined in quantum-parallel, returning only the best candidates for errors which correspond to the lowest values of the classification function.

Both the learning and testing steps make use of AQC. An adiabatic quantum algorithm encodes the desired result in the ground state of some problem Hamiltonian. The computation is then performed by initializing a physical system in the easily prepared ground state of a simpler Hamiltonian, then slowly changing the control parameters of the system so that the system undergoes an adiabatic evolution to the ground state of the difficult-to-solve problem Hamiltonian [21, 22]. The adiabatic model of quantum computation is known to be universal and equivalent to the circuit model with a polynomial conversion overhead [23, 24]. While it is not known at this time how to make AQC fault tolerant, several error correction and prevention protocols have been proposed for AQC [25, 26], and it is known to exhibit a certain degree of natural robustness [27, 28].

In this article, Sect. 2 will begin by establishing the framework through which the quantum V&V problem is attacked, and by defining the programming errors we seek to eliminate. As we proceed with the development of a method for V&V using quantum resources, Sect. 3 will establish an implementation of the learning step as an adiabatic quantum algorithm. We develop conditions for ideal boosting and an alternate quantum learning algorithm in Sect. 4. The testing step will be detailed in Sect. 5. We present simulated results of the learning step on a sample problem in Sect. 6, and finish with our conclusions and suggestions for future work in Sect. 7.

## 2 Formalization

In this section we formalize the problem of software error detection by first introducing the relevant vector spaces and then giving a criterion for the occurrence of an error.

### 2.1 Input and output spaces

### 2.2 Recognizing software errors

#### 2.2.1 Validity domain and range

The implemented program \(P\) should ideally compute the exact same function. In reality it may not. With this in mind, the simplest way to identify a software error is to find an input vector \(\mathbf{x}_\mathrm{in}\) such that

#### 2.2.2 Specification and implementation sets

As stated, (6) is impractical since it requires knowledge of the complete structure of the intended input and output spaces. Instead, we can also use the specification and implementation sets to give a correctness criterion for a given input–output pair:

**Definition 1**

Input–output vectors satisfying (7) are the manifestation of software errors (“bugs”) and their identification is the main problem we are concerned with here. Conversely, we have

**Definition 2**

Input–output vectors satisfying (8) belong to the “don’t-worry” class. The two other possibilities belong to the “don’t-care” class:

**Definition 3**

**Definition 4**

Note that Eq. (5) implies that the vector is erroneous and implemented, i.e., Definition 1. Indeed, let \(\mathbf{x}_\mathrm{out} = P(\mathbf{x}_\mathrm{in})\), i.e., \(\mathbf{x} = (\mathbf{x}_\mathrm{in},\mathbf{x}_\mathrm{out}) \in S\), but assume that \(\mathbf{x}_\mathrm{out} \ne \hat{\mathbf{x}}_\mathrm{out}\), where \(\hat{\mathbf{x}}_\mathrm{out} = \hat{P}(\mathbf{x}_\mathrm{in})\). Then \(\mathbf{x} \notin \hat{S}\), since \(\mathbf{x}_\mathrm{in}\) pairs up with \(\hat{\mathbf{x}}_\mathrm{out}\) in \(\hat{S}\). Conversely, Definition 1 implies Eq. (5). To see this, assume that \(\mathbf{x} = (\mathbf{x}_\mathrm{in},\mathbf{x}_\mathrm{out}) \in S\) but \(\mathbf{x} = (\mathbf{x}_\mathrm{in},\mathbf{x}_\mathrm{out}) \notin \hat{S}\). This must mean that \(\mathbf{x}_\mathrm{out} \ne \hat{\mathbf{x}}_\mathrm{out}\), again because \(\mathbf{x}_\mathrm{in}\) pairs up with \(\hat{\mathbf{x}}_\mathrm{out}\) in \(\hat{S}\). Thus Eq. (5) is in fact equivalent to Definition 1, but does not capture the other three possibilities captured by Definitions 2–4.

Definitions 1–4 will play a central role in our approach to quantum V&V.

#### 2.2.3 Generalizations

Formally, this would mean that Eq. (4) is replaced by

As a final general comment, we reiterate that a solution of the problem we have defined has implications beyond V&V. Namely, Definitions 1–4 capture a broad class of anomaly (or outlier) detection problems [8]. From this perspective the approach we detail in what follows can be described as “quantum anomaly detection”, and could be pursued in any application which requires the batch processing of a large data space to find a few anomalous elements.

## 3 Training a quantum software error classifier

In this section we discuss how to identify whether a given set of input–output pairs is erroneous or correct, and implemented or unimplemented, as per Definitions 1–4. To this end we shall require so-called *weak classifiers*, a *strong classifier*, a methodology to efficiently train the strong classifier, and a way to efficiently apply the trained strong classifier on all possible input–output pairs. Both the training step and the application step will potentially benefit from a quantum speedup.

### 3.1 Weak classifiers

We can now formally associate a weak classification with each vector in the input–output space.

**Definition 5**

Weak classification of \(\mathbf{x}\in \mathcal{V }\).

Weakly classified correct (WCC): a vector \(\mathbf{x}\) is WCC if \(h_{i}(\mathbf{x})>0\).

Weakly classified erroneous (WCE): a vector \(\mathbf{x}\) is WCE if \(h_{i}(\mathbf{x})<0\).

Clearly, there is an advantage to finding “smart” weak classifiers, so as to minimize \(N\). This can be done by invoking heuristics, or via a systematic approach such as the one we present below.

For each input–output pair \(\mathbf{x}\) we have a vector \(\mathbf{h}(\mathbf{x})=\left( h_{1}(\mathbf{x}),...,h_{N}(\mathbf{x})\right) \in \mathbb{R}^{N}\). Such vectors can be used to construct geometric representations of the learning problem, e.g., a convex hull encompassing the weak classifier vectors of clustered correct input–output pairs. Such a computational geometry approach was pursued in [29].

We are free to normalize each weak classifier so that \(h_{i}\in [-1/N,1/N]\) (the reason for this will become clear below). Given Definition 5 we choose the sign of each weak classifier so that \(h_{i}(\mathbf{x}_{s})<0\) for all erroneous training data, while \(h_{i}(\mathbf{x}_{s})>0\) for all correct training data. Each point \(\mathbf{h}(\mathbf{x}_{s})\in [-1/N,1/N]^{N}\) (a hypercube) has associated with it a label \(y_{s}\) which indicates whether the point is correct or erroneous. The convex hull approach to V&V [29] assumes that correct training points \(\mathbf{h}(\mathbf{x}_{s})\) cluster. Such an assumption is not required in our approach.
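The normalization and sign convention just described can be sketched as follows. This is a minimal illustration under stated assumptions: the raw classifier outputs are invented, and the sign is chosen by the simple average-agreement rule, which is one way (not necessarily the paper's) to satisfy the stated convention when the classifier is consistent on the training data.

```python
import numpy as np

# Sketch: scale a weak classifier's raw outputs into [-1/N, 1/N] and flip its
# sign so it is negative on erroneous training examples (labels = -1) and
# positive on correct ones (labels = +1). Raw outputs are invented.

def normalize_weak(raw_outputs, labels, N):
    """raw_outputs: array of h_i(x_s); labels: +1 correct / -1 erroneous."""
    h = raw_outputs / (N * np.max(np.abs(raw_outputs)))  # into [-1/N, 1/N]
    # Choose the overall sign so the classifier agrees with the labels.
    if np.sum(np.sign(h) * labels) < 0:
        h = -h
    return h

raw = np.array([3.0, -1.5, 2.0, -4.0])
labels = np.array([1, -1, 1, -1])
h = normalize_weak(raw, labels, N=8)
print(h)  # values lie in [-1/8, 1/8], positive exactly on correct examples
```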

### 3.2 Strong classifier

We would like to combine all the weak classifiers into a single “strong classifier” which, given an input–output pair, will determine that pair’s correctness or erroneousness. The problem is that we do not know in advance how to rank the weak classifiers by relative importance. We can formally solve this problem by associating a weight \(w_{i}\in \mathbb{R }\) with each weak classifier \(h_{i}\). The problem then becomes how to find the optimal set of weights, given the training set.

The process of creating a high-performance strong classifier from many less accurate weak classifiers is known as boosting in the machine learning literature. Boosting is a known method for enhancing to arbitrary levels the performance of known sets of classifiers that exhibit weak learnability for a problem, i.e., they are accurate on more than half of the training set [20, 31]. Finding the most efficient method to combine weak classifiers into a strong classifier of a given accuracy remains an open question, and there are many competing algorithms available for this purpose [32, 33]. Issues commonly considered in the development of such algorithms include identification of the data features that are relevant to the classification problem at hand [34, 35] and whether or not provisions need to be taken to avoid overfitting to the training set (causing poor performance on the general problem space) [36, 37]. We use an approach inspired by recent quantum boosting results on image recognition [6, 17–19]. This approach has been shown to outperform classical boosting algorithms in terms of accuracy (but not speed) on selected problems, and has the advantage of being implementable on existing quantum optimization hardware [38–41].

Since we shall map the \(w_{i}\) to qubits we use binary weights \(w_{i} \in \{0,1\}\). It should be straightforward to generalize our approach to a higher resolution version of real-valued \(w_{i}\) using multiple qubits per weight.
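The multiple-qubits-per-weight generalization mentioned above can be sketched with a binary expansion. This encoding is an assumption for illustration (one common choice), not a construction given in the paper:

```python
# Sketch: k binary variables (qubits) encode one real-valued weight with 2^k
# levels via a binary expansion. This particular encoding is assumed here.

def weight_from_bits(bits):
    """Map qubit values b_1..b_k to the weight sum_j b_j * 2^-j in [0, 1)."""
    return sum(b * 2.0 ** -(j + 1) for j, b in enumerate(bits))

print(weight_from_bits([1]))        # 0.5   single-qubit case
print(weight_from_bits([1, 0, 1]))  # 0.625 three qubits, 8 levels
```

With \(k=1\) this reduces to (a rescaling of) the binary weights \(w_i \in \{0,1\}\) used in the text; larger \(k\) trades qubits for weight resolution.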

**Definition 6**

Strong classification of \(\mathbf{x}\in \mathcal{V }\).

Strongly classified correct (SCC): a vector \(\mathbf{x}\) is SCC if \(Q_{\mathbf{w}}(\mathbf{x})=+1\).

Strongly classified erroneous (SCE): a vector \(\mathbf{x}\) is SCE if \(Q_{\mathbf{w}}(\mathbf{x})=-1\).

### 3.3 The formal weight optimization problem

Let \(\mathbf{H}\left[ z\right] \) denote the Heaviside step function, i.e., \(\mathbf{H}\left[ z\right] =0\) if \(z<0\) and \(\mathbf{H}\left[ z\right] =1\) if \(z\ge 0\). Thus \(\mathbf{H}\left[ -y_{s}Q_{\mathbf{w}}(\mathbf{x}_{s})\right] =1\) if the classification of \(\mathbf{x}_{s}\) is wrong, but \(\mathbf{H}\left[ -y_{s}Q_{\mathbf{w}}(\mathbf{x}_{s})\right] =0\) if the classification of \(\mathbf{x}_{s}\) is correct. In this manner \(\mathbf{H}\left[ -y_{s}Q_{\mathbf{w}}(\mathbf{x}_{s})\right] \) assigns a penalty of one unit for each incorrectly classified input–output pair.
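The penalty-counting objective just described is the standard 0–1 training loss; a direct transcription (with invented labels and classifier outputs) looks like this:

```python
# Sketch of the formal objective: H[-y_s * Q_w(x_s)] adds one penalty unit per
# misclassified training example. Labels and outputs here are invented.

def heaviside(z):
    """H[z] = 0 for z < 0, 1 for z >= 0."""
    return 1 if z >= 0 else 0

def training_error(y, Q):
    """y: labels in {-1,+1}; Q: strong-classifier outputs in {-1,+1}."""
    return sum(heaviside(-ys * Qs) for ys, Qs in zip(y, Q))

y = [1, -1, 1, 1]
Q = [1, -1, -1, 1]   # third example misclassified
print(training_error(y, Q))  # 1
```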

### 3.4 Relaxed weight optimization problem

Unfortunately, the formulation of (18) is unsuitable for adiabatic quantum computation because of its discrete nature. In particular, the evaluation of the Heaviside function is not amenable to a straightforward implementation in AQC. Therefore, following [6], we now relax it by introducing a quadratic error measure, which will be implementable in AQC.

Let \(\mathbf{y}=(y_{1},...,y_{|\mathcal T |})\in \{-1,1\}^{|\mathcal T |}\) and \(\mathbf{R}_{\mathbf{w}} =(R_{\mathbf{w}}(\mathbf{x}_{1}),...,R_{\mathbf{w}}(\mathbf{x}_{|\mathcal T |}))\in [-1,1]^{|\mathcal T |}\). The vector \(\mathbf{y}\) is the ordered label set of correct/erroneous input–output pairs. The components \(R_{\mathbf{w}}(\mathbf{x})\) of the vector \(\mathbf{R} _{\mathbf{w}}\) already appeared in the strong classifier (15). There we were interested only in their signs and in Eq. (16) we observed that if \(y_{s}R_{\mathbf{w}}(\mathbf{x}_{s})<0\) then \(\mathbf{x}_{s}\) was incorrectly classified, while if \(y_{s}R_{\mathbf{w}}(\mathbf{x}_{s})>0\) then \(\mathbf{x}_{s}\) was correctly classified.

We can consider a relaxation of the formal optimization problem (18) by replacing the counting of incorrect classifications by a sum of the values of \(y_{s}R_{\mathbf{w}}(\mathbf{x}_{s})\) over the training set. This seems reasonable since we have normalized the weak classifiers so that \(R_{\mathbf{w}}(\mathbf{x})\in [-1,1]\), while each label \(y_{s}\in \{-1,1\}\), so that all the terms \(y_{s}R_{\mathbf{w}}(\mathbf{x}_{s})\) are in principle equally important. In other words, the inner product \(\mathbf{y}\cdot \mathbf{R}_{\mathbf{w}}=\sum _{s=1}^{|\mathcal T |}y_{s}R_{\mathbf{w}}(\mathbf{x}_{s})\) is also a measure of the success of the classification, and maximizing it (making \(\mathbf{y}\) and \(\mathbf{R}_{\mathbf{w}}\) as parallel as possible) should result in a good training set.
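The relaxed objective above, maximizing \(\mathbf{y}\cdot \mathbf{R}_{\mathbf{w}}\) over binary weight vectors, can be checked classically by brute force on a toy instance. The data below are randomly generated stand-ins (not from the paper), and enumeration replaces the quantum optimization, which is only viable for small \(N\):

```python
import numpy as np

# Brute-force the relaxed objective y . R_w over all binary weight vectors w,
# where R_w(x_s) = sum_i w_i h_i(x_s). Weak-classifier outputs are invented.
rng = np.random.default_rng(0)
N, S = 4, 10                              # weak classifiers, training examples
h = rng.uniform(-1 / N, 1 / N, (N, S))    # h[i, s] = h_i(x_s), normalized
y = rng.choice([-1, 1], S)                # training labels

best_w, best_val = None, -np.inf
for bits in range(2 ** N):                # enumerate all 2^N weight vectors
    w = np.array([(bits >> i) & 1 for i in range(N)])
    val = y @ (w @ h)                     # inner product y . R_w
    if val > best_val:
        best_w, best_val = w, val
print(best_w, round(best_val, 3))
```

The AQC formulation performs this same maximization (as a minimization of \(-\mathbf{y}\cdot \mathbf{R}_{\mathbf{w}}\) plus a sparsity penalty) without enumerating the \(2^N\) candidates explicitly.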

### 3.5 From QUBO to the Ising Hamiltonian

In this final form (Eq. 25), involving only one- and two-qubit \(Z_{i}\) terms, the problem is now suitable for implementation on devices such as D-Wave’s adiabatic quantum optimization processor [19, 39].

In Sect. 4.4 we shall formulate an alternative weight optimization problem, based on a methodology we develop in Sect. 4 for pairing weak classifiers to guarantee the correctness of the strong classifier.

### 3.6 Adiabatic quantum computation

It should be noted that while the number of weak classifiers that can be selected from using this algorithm may appear to be limited by the number of qubits available for processing, this is not in fact the case. By performing multiple rounds of optimization, each time filling in the spaces left by classifiers that were assigned weight \(0\) in the previous round, an optimized group of \(N\) weak classifiers can be assembled. If the performance of the strong classifier is unsatisfactory with \(N\) weak classifiers, multiple groups of \(N\) found in this manner may be used together.
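The multi-round filling strategy just described can be sketched as a loop over a candidate pool. The "optimization round" below is a stand-in predicate (it simply keeps candidates passing an invented test), not the quantum annealing step the text assumes; the point is only the control flow of refilling zero-weight slots:

```python
# Sketch of multi-round selection: classifiers assigned weight 0 in a round
# free their slot for fresh candidates, until N slots hold accepted ones.

def fill_rounds(pool, N, keep):
    """pool: iterable of candidate classifiers; keep: stand-in for one
    optimization round (True = weight 1). Returns up to N accepted items."""
    selected = []
    for candidate in pool:
        if len(selected) == N:
            break
        if keep(candidate):          # assigned weight 1 this round
            selected.append(candidate)
        # weight 0: the slot stays open for the next candidate
    return selected

pool = iter(range(20))               # hypothetical classifier ids
chosen = fill_rounds(pool, N=4, keep=lambda c: c % 3 == 0)
print(chosen)  # [0, 3, 6, 9]
```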

The adiabatic theorem bounds the distance between the *actual* state \(|\psi (t_F)\rangle \), obtained under quantum evolution subject to \(H(t)\), and the *desired* final ground state \(|\phi (t_F)\rangle \). More precisely, \(|\psi (t)\rangle \) is the solution of the time-dependent Schrödinger equation \(\partial |\psi (t)\rangle /\partial t = -iH(t)|\psi (t)\rangle \) (in \(\hbar \equiv 1\) units), and \(|\phi (t)\rangle \) is the instantaneous ground state of \(H(t)\), i.e., the solution of \(H(t)|\phi (t)\rangle = E_0(t)|\phi (t)\rangle \), where \(E_0(t)\) is the instantaneous ground state energy [the smallest eigenvalue of \(H(t)\)]. The parameter \(\epsilon \), \(0\le \epsilon \le 1\), measures the quality of the overlap between \(|\psi (t_F)\rangle \) and \(|\phi (t_F)\rangle \); \(\dot{H}\) is the derivative with respect to the dimensionless time \(t/t_F\); and \(\varDelta \) is the minimum energy gap between the ground state \(|\phi (t)\rangle \) and the first excited state of \(H(t)\) (i.e., the smallest difference between the two lowest equal-time eigenvalues of \(H(t)\) over \(t\in [0,t_F]\)). The values of the integers \(\alpha \) and \(\beta \) depend on the assumptions made about the boundary conditions and differentiability of \(H(t)\) [43–45]; typically \(\alpha \in \{0,1,2\}\), while \(\beta \) can be tuned between \(1\) and arbitrarily large values, depending on boundary conditions determining the smoothness of \(H(t)\) (see, e.g., Theorem 1 in Ref. [45]). The crucial point is that the gap \(\varDelta \) depends on \(N\), typically shrinking as \(N\) grows, while the numerator \(\Vert \dot{H}\Vert \) typically has a mild \(N\)-dependence (bounded in most cases by a function growing as \(N^2\) [45]). Consequently, a problem has an efficient, polynomial-time solution under AQC if \(\varDelta \) scales as \(1/\mathrm{poly}(N)\). However, note that an inverse exponential gap dependence on \(N\) can still result in a speedup, as is the case, e.g., in the adiabatic implementation of Grover’s search problem [46, 47], where the speedup relative to classical computation is quadratic.
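The paragraph above discusses an adiabatic condition without displaying it. A schematic reconstruction from the quantities defined there (\(\epsilon\), \(\dot H\), \(\varDelta\), \(\alpha\), \(\beta\)) is the following; this is a hedged sketch of the generic form, and the exact statement depends on the boundary-condition and smoothness assumptions cited in Refs. [43–45]:

```latex
t_F \;\gg\; \frac{\max_{t\in[0,t_F]} \bigl\Vert \dot{H}(t)\bigr\Vert^{\alpha}}{\epsilon\,\varDelta^{\beta}}
```

Read this way, a gap closing only polynomially in \(N\) keeps the required runtime \(t_F\) polynomial, while an exponentially small gap forces exponential runtime, though possibly still faster than the best classical algorithm.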

As for the problem we are concerned with here, finding the ground state of \(H_F\) as prescribed in Eq. (25) in order to find the optimal weight set for the (relaxed version of the) problem of training a software error classifier, it is not known whether it is amenable to a quantum speedup. A study of the gap dependence of our Hamiltonian \(H(t)\) on \(N\), which is beyond the scope of the present work, will help to determine whether such a speedup is to be expected in the problem at hand. A related image processing problem has been shown numerically to require fewer weak classifiers than comparable classical algorithms, which gives the strong classifier a lower Vapnik–Chervonenkis dimension and therefore a lower generalization error [7, 18]. Quantum boosting applied to a different task, \(30\)-dimensional clustering, demonstrated accuracy that exceeded that of the classical AdaBoost algorithm by an increasing margin as the overlap between the two clusters grew [6]. More generally, numerical simulations of quantum adiabatic implementations of related hard optimization problems (such as Exact Cover) have shown promising scaling results for \(N\) values of up to 128 [22, 40, 48]. We shall thus proceed here with the requisite cautious optimism.

## 4 Achievable strong classifier accuracy

We shall show in this section that it is theoretically possible to construct a perfect, 100 % accurate majority-vote strong classifier from a set of weak classifiers that are more than 50 % accurate—if those weak classifiers relate to each other in exactly the right way. Our construction in this section is analytical and exact; we shall specify a set of conditions weak classifiers should satisfy for perfect accuracy of the strong classifier they comprise. We shall also show how to construct an imperfect strong classifier, with bounded error probability, by a relaxation of the conditions we shall impose on the weak classifiers. We expect the quantum algorithm to find a close approximation to this result.

Consider a strong classifier with a general binary weight vector \(\mathbf{w}\in \{0,1\}^N\), as defined in Eq. (14). Our approach will be to show that the strong classifier in Eq. (14) is completely accurate if a set of three conditions is met. The conditions work by using pairs of weak classifiers which both classify some \(\mathbf{x}\) correctly and which disagree for all other \(\mathbf{x}\). An accurate strong classifier can be constructed by covering the entire space \(\mathcal{V }\) with the correctly classifying portions of such weak classifier pairs.

### 4.1 Conditions for complete classification accuracy

**Condition 1**

\(P[\omega ]\) denotes the probability of event \(\omega \). We use a probabilistic formulation for our conditions since we imagine the input–output space \(\mathcal{V }\) to be very large and accessed by random sampling.

**Condition 2**

**Condition 3**

It is possible to substitute a similar Condition 3a for the above Condition 3 to create a different, yet also sufficient set of conditions for a completely accurate strong classifier. The number of weak classifiers required to satisfy the alternate set of conditions is expected to be smaller than the number required to satisfy the original three conditions. This is because the modified conditions make use of one standalone weak classifier to cover a larger portion of the space correctly than is possible with a pair of weak classifiers.

**Condition 3a**

### 4.2 Perfect strong classifier theorem

We will now prove that any strong classifier satisfying Conditions 1–3, or Conditions 1, 2, and 3a, is completely accurate.

**Lemma 1**

*Proof*

Recall that if the majority of the votes given by the weak classifiers comprising a given strong classifier is true then the input–output vector being voted on receives a strong classification that is true (Eq. 34), and that if this is the case for all input–output vectors then the strong classifier is perfect (Eq. 31). We are now in a position to state that this is the case with certainty provided the weak classifiers belong to the set \(\mathcal{J }\) defined by the conditions given above.

**Theorem 1**

A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1–3 is perfect.

*Proof*

^{3}because the probability of an event cannot be greater than \(1\).

**Theorem 2**

A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1, 2, and 3a is perfect.

*Proof*

### 4.3 Imperfect strong classifier theorem

Because the three conditions on the set \(\mathcal{J }\) of weak classifiers guarantee a completely accurate strong classifier, errors in the strong classifier must mean that the conditions are violated in some way. For instance, Condition 2 could be replaced by a weaker condition which allows for more than minimum overlap of vectors \(\mathbf{x}\) categorized correctly by both weak classifiers in a pair.

**Condition 2a**

The quantity \(\epsilon _{jj^{\prime }}\) is a measure of the “overlap error”. We can use it to prove relaxed versions of Lemma 1 and Theorem 1.

**Lemma 1a**

*Proof*

We can now replace Theorem 1 by a lower bound on the success probability when Condition 2 is replaced by the weaker Condition 2a. Let us first define an imperfect strong classifier as follows:

**Definition 7**

A strong classifier is \(\epsilon \)-perfect if, for \(\mathbf{x}\in \mathcal{V }\) chosen uniformly at random, it correctly classifies \(\mathbf{x}\) [i.e., \(Q_{\mathbf{w}}(\mathbf{x}) = y(\mathbf{x})\)] with probability at least \(1-\epsilon \).

**Theorem 3**

A strong classifier comprised solely of a set of weak classifiers satisfying Conditions 1, 2a and 3 is \(\epsilon \)-perfect, where \(\epsilon = \sum _{(j,j^{\prime })\in \mathcal{J }}\epsilon _{jj^{\prime }}\).

*Proof*

### 4.4 An alternate weight optimization problem

*pairs* of weak classifiers, rather than singles, which can be constructed using elements of the set \({\mathcal{A }}{\times }{\mathcal{A }}\), with \(\mathcal{A }\) as defined in Condition 1. We define the

*ideal* pair weight as

*a priori*, we shall define a QUBO whose solutions \(w_{ij}\in \{0,1\}\), with \((i,j)\in {\mathcal{A }}{\times }{\mathcal{A }}\), will be an approximation to the ideal pair weights \(\tilde{w}_{ij}\). In the process, we shall map the pair weight bits \(w_{ij}\) to qubits. Each \(w_{ij}\) determines whether its corresponding pair of weak classifiers, \(h_i\) and \(h_j\), will be included in the new strong classifier, which can thus be written as:

*a priori*; they are found in our approach via the solution of a QUBO, which we set up as follows:

## 5 Using strong classifiers in quantum-parallel

Now let us suppose that we have already trained our strong classifier and found the optimal weight vector \(\mathbf{w}^\mathrm{opt}\) or \(\mathbf{w}^\mathrm{opt}_\mathrm{pair}\). For simplicity we shall henceforth limit our discussion to \(\mathbf{w}^\mathrm{opt}\). We can use the trained classifier to classify new input–output pairs \(\mathbf{x}\notin \mathcal T \) to decide whether they are correct or erroneous. In this section we shall address the question of how we can further obtain a quantum speedup in exhaustively testing *all* exponentially many (\(2^{N_\mathrm{in}+N_\mathrm{out}}\)) input–output pairs \(\mathbf{x}\). The key observation in this regard is that if we can formulate software error testing as a minimization problem over the space \(\mathcal{V }\) of all input–output pairs \(\mathbf{x}\), then an AQC algorithm will indeed perform a quantum-parallel search over this entire space, returning as the ground state an erroneous state.
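The key observation above, that testing reduces to minimizing a cost over all \(2^{N_\mathrm{in}+N_\mathrm{out}}\) bit strings, has a direct classical analog by enumeration. The cost function below is a toy stand-in (Hamming distance to a planted "error" pattern), not the trained classifier energy; AQC would search the same space in quantum-parallel rather than by looping:

```python
from itertools import product

# Classical analog of the testing step: minimize a cost over all bit strings
# of length N_in + N_out. The toy cost plants a single lowest-cost "error".

def exhaustive_min(cost, n_bits):
    """Return (best_x, best_cost) over all bit strings of length n_bits."""
    return min(((x, cost(x)) for x in product((0, 1), repeat=n_bits)),
               key=lambda pair: pair[1])

planted = (1, 0, 1, 1)                               # hypothetical error vector
cost = lambda x: sum(a != b for a, b in zip(x, planted))
best_x, best_c = exhaustive_min(cost, 4)
print(best_x, best_c)  # (1, 0, 1, 1) 0
```

The classical loop scales as \(2^{n}\); the promise of the adiabatic formulation is that the ground state of the corresponding problem Hamiltonian is reached without visiting each candidate individually.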

### 5.1 Using two strong binary classifiers to detect errors

*specification classifier* is the binary classifier developed in Sect. 3. Ideally, it behaves as follows:

*implementation classifier*, determines whether or not an input–output vector is in the program as implemented. It is constructed in the same way as \(Q_{\mathbf{w}}(\mathbf{x})\), but with its own appropriate training set. Ideally, it behaves as follows:

### 5.2 Formal criterion

Given the results of the classifiers \(Q^\mathrm{opt}(\mathbf{x})\) and \(T^\mathrm{opt}(\mathbf{x})\) for any vector \(\mathbf{x}\), the V&V task of identifying whether or not \(\mathbf{x}\in (S\cap \lnot \hat{S})\) reduces to the following. Any vector \(\mathbf{x}\) is flagged as erroneous and implemented if \(Q^\mathrm{opt}(\mathbf{x})+T^\mathrm{opt}(\mathbf{x})=-2\). We stress once more that, due to our use of the relaxed optimization to solve for \(\mathbf{w}^\mathrm{opt}\) and \(\mathbf{z}^\mathrm{opt}\), a flagged \(\mathbf{x}\) may in fact be neither erroneous nor implemented, i.e., our procedure is susceptible to both false positives and false negatives.
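The formal criterion reduces to a one-line check on the two classifier outputs. A minimal transcription (classifier values here are invented inputs, not computed from a trained model):

```python
# Flag x as an implemented error iff both strong classifiers vote -1,
# i.e. Q(x) + T(x) == -2, per the formal criterion above.

def flag_error(Q_val, T_val):
    """True iff x is classified erroneous (Q=-1) and implemented (T=-1)."""
    return Q_val + T_val == -2

print(flag_error(-1, -1))  # True
print(flag_error(-1, +1))  # False (erroneous but unimplemented)
```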

### 5.3 Relaxed criterion

*Case 1:* \(\mathbf{x}\notin \hat{S}\) and \(\mathbf{x}\in S\)

The vector \(\mathbf{x}\) is an error implemented in the program and manifests a software error. These vectors gain negative weight from both classifiers \(R_{\mathbf{w}^\mathrm{opt}}\) and \(U_{\mathbf{z}^\mathrm{opt}}\). Vectors falling under this definition should receive the lowest values of \(C^\mathrm{opt}\), if any such vectors exist.

*Case 2:* \(\mathbf{x}\in \hat{S}\) and \(\mathbf{x}\in S\)

The vector \(\mathbf{x}\) satisfies the don’t-worry condition; that is, it is a correct input–output string, part of the ideal program \(\hat{P}\). In this case, \(R_{\mathbf{w}^\mathrm{opt}}>0\) and \(U_{\mathbf{z}^\mathrm{opt}}<0\). In the programs quantum V&V is likely to be used for, with very infrequent, elusive errors, the specification and implementation will be similar, and the negative weight of \(U_{\mathbf{z}^\mathrm{opt}}<0\) should be moderated enough by the positive influence of \(R_{\mathbf{w}^\mathrm{opt}}>0\) that don’t-worry vectors do not populate the lowest-lying states.

*Case 3:* \(\mathbf{x}\in \hat{S}\) and \(\mathbf{x}\notin S\)

The input portion of the vector \(\mathbf{x}\) is a don’t-care condition. It does not violate any program specifications, but is not important enough to be specifically addressed in the implementation. This vector will gain positive weight from both \(R_{\mathbf{w}^\mathrm{opt}}\) and \(U_{\mathbf{z}^\mathrm{opt}}\) and should therefore never be misidentified as an error.

*Case 4:*\(\mathbf{x}\notin \hat{S}\) and \(\mathbf{x}\notin S\)

The vectors \(\mathbf{x}\) in this category would be seen as erroneous by the program specification, if they ever occurred. Because they fall outside the program implementation \(S\), they are not the errors we are trying to find. This case is similar to the don’t-worry situation in that the two strong classifiers will have opposite signs, in this case \(R_{\mathbf{w}^\mathrm{opt}}<0\) and \(U_{\mathbf{z}^\mathrm{opt}}>0\). By the same argument as for Case 2, these vectors should not receive more negative values of \(C^\mathrm{opt}\) than the targeted errors.
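The four cases can be summarized by the sign pattern of the two strong classifiers. A minimal sketch (the \(\pm 1\) values are illustrative stand-ins for the signs of the specification and implementation classifiers):

```python
# Sign patterns of the two strong classifiers in the four cases above.
# q is -1 when x violates the specification (x not in S_hat), and
# u is -1 when x is implemented (x in S); only Case 1 (an implemented
# error) reaches the flagging condition q + u == -2.
cases = {
    "Case 1: x not in S_hat, x in S": (-1, -1),      # implemented error
    "Case 2: x in S_hat, x in S": (+1, -1),          # don't-worry
    "Case 3: x in S_hat, x not in S": (+1, +1),      # don't-care
    "Case 4: x not in S_hat, x not in S": (-1, +1),  # unimplemented error
}

flagged = [name for name, (q, u) in cases.items() if q + u == -2]
print(flagged)  # only Case 1 is flagged
```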

### 5.4 Adiabatic implementation of the relaxed criterion

Because the adiabatic evolution always returns *some* error candidate, our procedure never generates false negatives. However, Cases 2 and 4 would correspond to false positives, if an input–output vector satisfying either one of these cases is found as the AQC output.

To ensure that errors which are members of the training set are never identified as ground states, we construct the training set \(\mathcal T \) so that it only includes correct states, i.e., \(y_s=+1\;\forall s\). This has the potential drawback that the classifier never trains directly on errors. It is in principle possible to include errors in the training set (\(y_s =-1\)) by adding another penalty term to the strong classifier which directly penalizes such training set members, but whether this can be done without introducing many-body interactions in \(H_F\) is a problem that is beyond the scope of this work.

Also beyond the scope of this work is an analysis of the gap for the quantum testing step of the algorithm. The problems are simply too large to solve analytically, and too dependent on the outcome of the learning step to be easily attacked numerically. However, verification and validation of software is such an important problem that quantum formulations must be explored. Even if the scaling of the algorithm proves unfavorable for the V&V problem, other anomaly detection applications may exhibit better scaling due to different outcomes from the learning step.

### 5.5 Choosing the weak classifiers

Written in the form \(\sum_{i=1}^{N}(w_{i}^\mathrm{opt}+z_{i}^\mathrm{opt})h_{i}(\mathbf{x})\), the energy function \(C^\mathrm{opt}(\mathbf{x})\) is too general, since we haven’t yet specified the weak classifiers \(h_{i}(\mathbf{x})\). However, we are free to choose these so as to mold \(C^\mathrm{opt}(\mathbf{x})\) into a Hamiltonian that is physically implementable in AQC.

In the example of the classifier (77) there are \(N_\mathrm{in}(N_\mathrm{in}-1)\) input bit combinations for each of the \(N_\mathrm{out}\) output bits. The number of different Boolean functions in this example, where \(\ell =2\), is \(2^{2^{2}}=16\).^{4} Thus the dimension of the “dictionary” of weak classifiers is \(N = 2^{2^{2}}N_\mathrm{in}(N_\mathrm{in}-1)N_\mathrm{out}\).

^{4} Much more efficient representations are possible under reasonable assumptions [50], but for the time being we shall not concern ourselves with these.
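The counting of Boolean functions can be checked directly; the values \(N_\mathrm{in}=9\) and \(N_\mathrm{out}=3\) below are hypothetical sizes for illustration:

```python
from itertools import product

# A Boolean function of l variables is determined by its truth table over
# the 2^l input assignments, so there are 2^(2^l) distinct functions.
l = 2
inputs = list(product([0, 1], repeat=l))               # 2^l = 4 assignments
truth_tables = set(product([0, 1], repeat=len(inputs)))
print(len(truth_tables))  # 16 for l = 2

# Dictionary dimension: one function choice per ordered input-bit pair and
# per output bit (N_in and N_out here are hypothetical example sizes).
N_in, N_out = 9, 3
N = len(truth_tables) * N_in * (N_in - 1) * N_out
print(N)
```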

We wish to find a two-local quantum implementation for each \(h_i(\mathbf{x})\) in the dictionary. It is possible to find a two-local implementation for any three-local Hamiltonian using so-called “perturbation gadgets”, or three ancilla bits for each three-local term included [51], but rather than using the general method we rely on a special case which will allow us to use only one ancilla bit per three-local term. We first devise an *intermediate form* function using products of the same bits \(x_i\in \{0,1\}\) used to define the logical behavior of each weak classifier. This function will have a value of \(1\) when the Boolean relationship specified for \(h_i(\mathbf{x})\) is true, and \(-1\) otherwise. For example, consider function number 8, \(x_{i_3}==x_{i_1}\wedge x_{i_2}\), the AND function. Its intermediate form is \(4x_{i_1}x_{i_2}x_{i_3}-2\left(x_{i_3}+x_{i_1}x_{i_2}\right)+1\). For the bit values \((x_{i_1},x_{i_2},x_{i_3})=(0,0,0)\), the value of the intermediate function is \(1\), and the Boolean form is true: \(0\) AND \(0\) yields \(0\). If instead we had the bit values \((x_{i_1},x_{i_2},x_{i_3})=(0,0,1)\), the intermediate form would yield \(-1\), and the Boolean form would be false, because the value for \(x_{i_3}\) does not follow from the values for \(x_{i_1}\) and \(x_{i_2}\).
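The claimed behavior of the intermediate form can be verified exhaustively for the AND function over all eight bit assignments:

```python
from itertools import product

# Check the intermediate form of function 8 (AND): it evaluates to +1
# exactly when the Boolean relation x3 == (x1 AND x2) holds, and -1 otherwise.
def and_intermediate(x1, x2, x3):
    return 4 * x1 * x2 * x3 - 2 * (x3 + x1 * x2) + 1

for x1, x2, x3 in product([0, 1], repeat=3):
    expected = 1 if x3 == (x1 & x2) else -1
    assert and_intermediate(x1, x2, x3) == expected
print("intermediate form matches the Boolean AND relation on all 8 inputs")
```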

All \(16\) Boolean functions \(f_i\) of two binary variables, and their implementation form in terms of the Pauli matrices \(Z_{i_j}\) acting on single qubits or pairs of qubits \(j\in \{1,2,3\}\):

| Function # | Boolean logic | Intermediate form | Implementation form |
|---|---|---|---|
| \(i=0\) | \(x_{i_{3}}==0\) | Not applicable | \(-Z_{i_3}\) |
| \(i=1\) | \(x_{i_3}==\overline{\left( x_{i_1}\vee x_{i_2}\right) }\) | \(4\left(x_{i_1}x_{i_2}x_{i_3}-x_{i_1}x_{i_3}-x_{i_2}x_{i_3}\right)-2\left(x_{i_1}x_{i_2}-x_{i_1}-x_{i_2}-x_{i_3}\right)-1\) | \(Z_a\otimes Z_{i_3} -Z_{i_1}\otimes Z_{i_3} -Z_{i_2}\otimes Z_{i_3}\) |
| \(i=2\) | \(x_{i_3}==\overline{x_{i_1}}\wedge x_{i_2}\) | \(4(-x_{i_1}x_{i_2}x_{i_3}+x_{i_2}x_{i_3})+2(-x_{i_3}+x_{i_1}x_{i_2}-x_{i_2})+1\) | \(-Z_a\otimes Z_{i_3}+Z_{i_2}\otimes Z_{i_3} - Z_{i_3}\) |
| \(i=3\) | \(x_{i_3}==\overline{x_{i_1}}\) | Not applicable | \(-Z_{i_3}\otimes Z_{i_1}\) |
| \(i=4\) | \(x_{i_3}==x_{i_1}\wedge \overline{x_{i_2}}\) | \(4(x_{i_1}x_{i_3}-x_{i_1}x_{i_2}x_{i_3})-2(x_{i_1}-x_{i_1}x_{i_2}+x_{i_3})+1\) | \(Z_{i_1}\otimes Z_{i_3}-Z_{a}\otimes Z_{i_3}-Z_{i_3}\) |
| \(i=5\) | \(x_{i_3}==\overline{x_{i_2}}\) | Not applicable | \(-Z_{i_3}\otimes Z_{i_2}\) |
| \(i=6\) | \(x_{i_3}==x_{i_1}\oplus x_{i_2}\) | \(-8x_{i_1}x_{i_2}x_{i_3} + 4(x_{i_1}x_{i_3}+x_{i_2}x_{i_3}+x_{i_1}x_{i_2})-2(x_{i_1}+x_{i_2}+x_{i_3})+1\) | \(-2Z_a \otimes Z_{i_3}+Z_{i_1}\otimes Z_{i_3}+Z_{i_2}\otimes Z_{i_3}-Z_{i_3}\) |
| \(i=7\) | \(x_{i_3}==\overline{\left( x_{i_1}\wedge x_{i_2}\right) }\) | \(-4x_{i_1}x_{i_2}x_{i_3}+2(x_{i_3}+x_{i_1}x_{i_2})-1\) | \(-Z_a\otimes Z_{i_3}\) |
| \(i=8\) | \(x_{i_3}==x_{i_1}\wedge x_{i_2}\) | \(4x_{i_1}x_{i_2}x_{i_3}-2(x_{i_3}+x_{i_1}x_{i_2})+1\) | \(Z_a\otimes Z_{i_3}\) |
| \(i=9\) | \(x_{i_3}==\overline{\left( x_{i_1}\oplus x_{i_2}\right) }\) | \(8x_{i_1}x_{i_2}x_{i_3} -4(x_{i_1}x_{i_3}+x_{i_2}x_{i_3}+x_{i_1}x_{i_2})+2(x_{i_1}+x_{i_2}+x_{i_3})-1\) | \(2Z_a \otimes Z_{i_3}-Z_{i_1}\otimes Z_{i_3}-Z_{i_2}\otimes Z_{i_3}+Z_{i_3}\) |
| \(i=10\) | \(x_{i_3}==x_{i_2}\) | Not applicable | \(Z_{i_3}\otimes Z_{i_2}\) |
| \(i=11\) | \(x_{i_3}==\overline{x_{i_1}}\vee x_{i_2}\) | \(-4(x_{i_1}x_{i_3}-x_{i_1}x_{i_2}x_{i_3})+2(x_{i_1}-x_{i_1}x_{i_2}+x_{i_3})-1\) | \(-Z_{i_1}\otimes Z_{i_3}+Z_{a}\otimes Z_{i_3}+Z_{i_3}\) |
| \(i=12\) | \(x_{i_3}==x_{i_1}\) | Not applicable | \(Z_{i_3}\otimes Z_{i_1}\) |
| \(i=13\) | \(x_{i_3}==x_{i_1}\vee \overline{x_{i_2}}\) | \(-4(-x_{i_1}x_{i_2}x_{i_3}+x_{i_2}x_{i_3})-2(-x_{i_3}+x_{i_1}x_{i_2}-x_{i_2})-1\) | \(Z_a\otimes Z_{i_3}-Z_{i_2}\otimes Z_{i_3} + Z_{i_3}\) |
| \(i=14\) | \(x_{i_3}==x_{i_1}\vee x_{i_2}\) | \(-4\left(x_{i_1}x_{i_2}x_{i_3}-x_{i_1}x_{i_3}-x_{i_2}x_{i_3}\right)+2\left(x_{i_1}x_{i_2}-x_{i_1}-x_{i_2}-x_{i_3}\right)+1\) | \(-Z_a\otimes Z_{i_3} +Z_{i_1}\otimes Z_{i_3} +Z_{i_2}\otimes Z_{i_3}\) |
| \(i=15\) | \(x_{i_3}==1\) | Not applicable | \(Z_{i_3}\) |

We have reduced the dictionary functions from three-bit to two-bit interactions by adding an ancilla bit to represent the product of the two input bits involved in the function. Therefore, the maximum number of qubits needed to implement this set of weak classifiers on a quantum processor is \(Q=N_\mathrm{in}+N_\mathrm{out}+N_\mathrm{in}^2\). In practice, it is likely to be significantly less because not every three-bit correlation will be relevant to a given classification problem.

Let us now discuss how the penalty function is introduced. For example, consider again the implementation of weak classifier function \(i=8\), whose intermediate form involves three-qubit products, which we reduced to two-qubit interactions by including \(x_a\).

The inclusion of ancilla qubits tied to products of other qubits, together with their associated penalties, need not interfere with the solution of the V&V problem, although the ancilla penalty terms must appear in the same final Hamiltonian as the optimization itself. If the ancilla penalty terms are made reasonably large, they will put any states in which the ancillas do not represent their intended products (states which are in fact outside of \(\mathcal{V }\)) far above the levels at which errors are found. For instance, consider an efficient, nearly optimal strong classifier closely approximating the conditions set forth in Sect. 4. Such a classifier makes its decision on the strength of two simultaneously true votes. If two such classifiers are added together, as in the verification problem, the lowest energy levels will have an energy near \(-4\). If the penalty on a forbidden ancilla state exceeds \(4\) units, such a state will lie well clear of the region where errors are found.
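The paper’s exact penalty term is not reproduced in this excerpt; a standard two-local QUBO gadget that forces an ancilla bit \(x_a\) to equal the product \(x_{i_1}x_{i_2}\) (a common choice in the QUBO literature, shown here as an assumption) is \(P(x_1,x_2,x_a)=3x_a+x_1x_2-2x_a(x_1+x_2)\):

```python
from itertools import product

# Standard QUBO gadget penalizing ancilla states with xa != x1*x2 (assumed
# form, not necessarily the paper's): it is 0 on all valid states and >= 1
# on forbidden ones, so scaling it by a constant > 4 lifts forbidden states
# above the error region discussed in the text.
def penalty(x1, x2, xa):
    return 3 * xa + x1 * x2 - 2 * xa * (x1 + x2)

for x1, x2, xa in product([0, 1], repeat=3):
    if xa == x1 * x2:
        assert penalty(x1, x2, xa) == 0   # valid ancilla assignment
    else:
        assert penalty(x1, x2, xa) >= 1   # forbidden assignment is penalized
print("penalty vanishes exactly on states with xa == x1*x2")
```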

This varied yet correlation-limited set of weak classifiers fits nicely with the idea of tracking intermediate spaces (Eq. 11), where we can use an intermediate space \(\mathcal{I }_{j}\) to construct a set of weak classifiers feeding into the next intermediate space \(\mathcal{I }_{j+1}\). This is further related to an obvious objection to the above classifiers: they ignore any correlations involving four or more bits that are not already captured by one-, two-, or three-bit correlations. By building a hierarchy of weak classifiers over intermediate spaces, such higher-order correlations can hopefully be accounted for as they build up, by keeping track of one-, two-, and three-bit terms as the program runs.

### 5.6 QUBO-AQC quantum parallel testing

How do we construct the AQC such that all input–output pairs \(\mathbf{x}\) are tested in parallel? Parallel testing is a consequence of the adiabatic interpolation Hamiltonian (26), and in particular of the initial Hamiltonian \(H_I\) of the type given in Eq. (27). The ground state of this positive semi-definite \(H_I\) is an equal superposition over all input–output vectors, i.e., \(H_I\sum _{\mathbf{x}\in \mathcal{V }}|\mathbf{x}\rangle = 0\), and hence when we implement the AQC every possible \(\mathbf{x}\) starts out as a candidate for the ground state. The final (Boltzmann) distribution of observed states strongly favors the manifold of low-energy states, and by design these will be implemented erroneous states, if they exist.
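This property can be checked numerically for a small system; the transverse-field form \(H_I=\sum_i \tfrac{1}{2}(I - X_i)\) is assumed here as one common instance of Eq. (27):

```python
import numpy as np

# Verify that the uniform superposition is a zero-energy ground state of
# H_I = sum_i (I - X_i)/2 for a small number of qubits, and that H_I is
# positive semi-definite (so 0 really is the ground energy).
n = 3
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def op_on_qubit(op, i, n):
    """Tensor `op` acting on qubit i with identities on the other qubits."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == i else I2)
    return out

H_I = sum(0.5 * (np.eye(2**n) - op_on_qubit(X, i, n)) for i in range(n))
uniform = np.ones(2**n) / np.sqrt(2**n)

print(np.allclose(H_I @ uniform, 0))                    # annihilated by H_I
print(bool(np.linalg.eigvalsh(H_I).min() >= -1e-9))     # no negative eigenvalues
```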

## 6 Sample problem implementation

In order to explore the practicality of our two-step adiabatic quantum approach to finding software errors, we have applied the algorithm to a program of limited size containing a logical error. We did this by calculating the results of the algorithm assuming perfect adiabatic quantum optimization steps on a processor with few \((N<30)\) available qubits. Preliminary characterizations of the accuracy achievable using such an algorithm given a set of weak classifiers with certain characteristics are also presented.

### 6.1 The triplex monitor miscompare problem

The problem we chose to implement is a toy model of program design practices used in mission critical software systems.^{5} This program monitors a set of three redundant variables \(\{A_t,B_t,C_t\}\) for internal consistency. The variables could represent, e.g., sensor inputs, control signals, or particularly important internal program values. If one value is different from the other two over a predetermined number of snapshots in time \(t\), a problem in the system is indicated and the value of the two consistent redundant variables is propagated as correct. Thus the program is supposed to implement a simple majority-vote error-detection code.

We consider only the simplest case of two time snapshots, i.e., \(t=1,2\). As just explained, a correct implementation of the monitoring routine should fail a redundant variable \(A\), \(B\), or \(C\) if that *same* variable miscompares with both of the other variables in each of the two time frames. The erroneous implemented program we shall consider has the logical error that, due to a mishandled internal implementation of the miscompare tracking over multiple time frames, it fails a redundant variable any time there has been a miscompare in both time frames, even if the miscompare implicated a *different* variable in each time frame.

Logical bits and their significance in terms of variable comparison in the Triplex Miscompare problem:

| Bit | Significance |
|---|---|
| \(x_1\) | \(A_1 \ne B_1\) |
| \(x_2\) | \(B_1 \ne C_1\) |
| \(x_3\) | \(A_1 \ne C_1\) |
| \(x_4\) | \(A_2 \ne B_2\) |
| \(x_5\) | \(B_2 \ne C_2\) |
| \(x_6\) | \(A_2 \ne C_2\) |
| \(x_7\) | \(A\) failed |
| \(x_8\) | \(B\) failed |
| \(x_9\) | \(C\) failed |

In terms of Boolean logic, the two behaviors are as follows:
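The Boolean expressions themselves are elided in this excerpt. As an illustration only, one plausible reading of the two behaviors described above (the exact formulas are an assumption, not the paper’s) can be simulated with the bit definitions from the table:

```python
# Illustrative (assumed) reading of the two behaviors. A variable
# "miscompares" in a frame when it differs from both other variables;
# frame 1 uses bits (x1, x2, x3) and frame 2 uses (x4, x5, x6) as in the
# table above.
def miscompares(ab, bc, ac):
    return {"A": ab and ac, "B": ab and bc, "C": bc and ac}

def spec_fails(x1, x2, x3, x4, x5, x6):
    """Specification: fail a variable only if it miscompares in BOTH frames."""
    m1, m2 = miscompares(x1, x2, x3), miscompares(x4, x5, x6)
    return {v for v in "ABC" if m1[v] and m2[v]}

def buggy_fails(x1, x2, x3, x4, x5, x6):
    """Bug: any miscompare in both frames fails every implicated variable."""
    m1, m2 = miscompares(x1, x2, x3), miscompares(x4, x5, x6)
    if any(m1.values()) and any(m2.values()):
        return {v for v in "ABC" if m1[v] or m2[v]}
    return set()

# A miscompares in frame 1 only, B in frame 2 only: the specification fails
# no variable, but the buggy implementation fails both A and B.
print(sorted(spec_fails(1, 0, 1, 1, 1, 0)), sorted(buggy_fails(1, 0, 1, 1, 1, 0)))
```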

### 6.2 Implemented algorithm

The challenges before us are to train classifiers to recognize the behavior of both the program specification and the erroneous implementation, and then to use those classifiers to find the errors. These objectives have been programmed into a hybrid quantum-classical algorithm using the quantum techniques described in Sects. 3 and 5 and classical strategy refinements based on characteristics of available resources (for example, the accuracy of the set of available weak classifiers). The performance of this algorithm has been tested through computational studies using a classical optimization routine in place of adiabatic quantum optimization calls.

The algorithm takes as its inputs two training sets, one for the specification classifier and one for the implementation classifier. The two strong classifiers are constructed using the same method, one after the other, consulting the appropriate training set.

When constructing a strong classifier, the algorithm first evaluates the performance of each weak classifier in the dictionary over the training set. Weak classifiers with poor performance, typically those with over 40 % error, are discarded. The resulting, more accurate dictionary is fed piecewise into the quantum optimization algorithm.

Ideally, the adiabatic quantum optimization using the final Hamiltonian (25) would take place over the set of all weak classifiers in the modified, more accurate dictionary. However, the reality of quantum computation for some time to come is that the number of qubits available for processing will be smaller than the number of weak classifiers in the accurate dictionary. This problem is addressed by selecting random groups of \(Q\) classifiers (the number of available qubits) to be optimized together. An initial random group of \(Q\) classifiers is selected, the optimal weight vector \(\mathbf{q}^\mathrm{opt}\) is calculated by classically finding the ground state of \(H_F\), and the weak classifiers which receive weight \(0\) are discarded. The resulting spaces are filled in with weak classifiers randomly selected from the set of those which have not yet been considered, until all \(Q\) classifiers included in the optimization return a weight of \(1\). This procedure is repeated until all weak classifiers in the accurate dictionary have been considered, at which time the most accurate group of \(Q\) generated in this manner is accepted as the strong classifier for the training set in question. Clearly, alternative strategies for combining subsets of \(Q\) weak classifiers could be considered, such as genetic algorithms, but this was not attempted here.
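The selection loop described above can be sketched classically. In this toy version a brute-force search over binary weight vectors stands in for the AQC call, the training data is random, and \(\lambda=0\); the accuracy threshold is relaxed from the text’s 40 % to 50 % so the random toy dictionary retains enough classifiers:

```python
import random
from itertools import product

# Toy sketch of the group-of-Q weak classifier selection loop. Each weak
# classifier is a +/-1 output vector over the training set.
random.seed(0)
S = 32                                           # training set size
labels = [random.choice([-1, 1]) for _ in range(S)]
dictionary = [[random.choice([-1, 1]) for _ in range(S)] for _ in range(40)]

def error_fraction(group):
    """Fraction of the training set misclassified by the majority vote."""
    wrong = sum(sum(h[s] for h in group) * labels[s] <= 0 for s in range(S))
    return wrong / S

def optimize(group):
    """Stand-in for the AQC call: best binary weight vector for this group."""
    return min((w for w in product([0, 1], repeat=len(group)) if any(w)),
               key=lambda w: error_fraction([h for h, wi in zip(group, w) if wi]))

Q = 8
# Discard weak classifiers with poor individual performance.
accurate = [h for h in dictionary
            if sum(h[s] != labels[s] for s in range(S)) / S <= 0.5]
random.shuffle(accurate)
pool, groups = accurate[:], []
while pool:
    group, pool = pool[:Q], pool[Q:]             # initial random group of Q
    while True:
        w = optimize(group)
        kept = [h for h, wi in zip(group, w) if wi]
        if len(kept) == len(group) or not pool:  # all weights 1, or no refills
            group = kept
            break
        group = kept + pool[:Q - len(kept)]      # refill discarded slots
        pool = pool[Q - len(kept):]
    groups.append(group)

best_err = min(error_fraction(g) for g in groups)
print(f"best strong-classifier training error: {best_err:.3f}")
```

The most accurate group generated this way plays the role of the strong classifier; as the text notes, other combination strategies (e.g., genetic algorithms) could replace this greedy refill loop.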

The two strong classifiers of Eqs. (87) and (88) are summed as in Eq. (72) to create a final energy function that will push errors to the bottom part of the spectrum. This is translated to a final Hamiltonian \(H_F\) as in Eq. (84) and the result of the optimization (i.e., the ground state of this \(H_F\)) is returned as the error candidate. This portion of the algorithm makes it crucial to employ intelligent classical preprocessing in order to keep the length of the input and output vectors as small as possible, because each bit in the input–output vector corresponds to a qubit, and the classical cost of finding the ground state of \(H_F\) grows exponentially with the number of qubits.

### 6.3 Simulation results

Our simulation efforts have focused on achieving better accuracy from the two strong classifiers. If the strong classifiers are not highly accurate, the second part of the algorithm, the quantum-parallel use of the classifiers, will not produce useful results because input–output vectors the classifiers do not handle correctly could occupy the low-lying spectrum.

Both the average and minimum error for the specification and implementation classifiers are plotted in Figs. 6 and 7, respectively, as a function of \(\lambda \).

As shown in Figs. 6 and 7, while the average percent error for both classifiers hovered around 25 %, the best percent error was consistently just below 16 % for both the specification and implementation classifiers. The consistency suggests two things: that the randomness of the algorithm can be tamed by looking for the best outcome over a limited number of iterations, and that the sparsity parameter, \(\lambda \), did not have much effect on classifier accuracy.

For \(Q=16\) through \(Q=23\), the error fraction shown is for the best-performing classifier, selected from \(26\) iterations of the algorithm that were calculated using different values of \(\lambda \). The consistently observed lack of dependence on \(\lambda \) in these and other simulations (such as the \(50\)-iteration result presented above) justifies this choice. For \(Q=24\) to \(Q=26\), it was too computationally intensive to run the algorithm multiple times, even on a high performance computing cluster, so the values plotted are from a single iteration with \(\lambda \) set to zero. This was still deemed useful data given the uniformity of the rest of the simulation results with respect to \(\lambda \). The dependence on the parity of the number of qubits is a result of the potential for the strong classifier to return \(0\) when the number of weak classifiers in the majority vote is even. Zero is not technically a misclassification, in that the classifier does not place the vector \(\mathbf{x}\) in the wrong class, but neither does it give the correct class for \(\mathbf{x}\). Rather, we obtain a “don’t-know” answer from the classifier, which we do not group with the misclassifications because it is not an outright error in classification. It is a different, less conclusive piece of information about the proper classification of \(\mathbf{x}\) which may in fact be useful for other applications of such classifiers.

The important conclusion to be drawn from the data quantifying strong classifier errors as a function of the number of available qubits is that performance seems to be improving only slightly as the number of available qubits increases. This may indicate that even with only \(16\) qubits, if the algorithm is iterated a sufficient number of times to compensate for its random nature, the accuracy achieved is close to the limit of what can be done with the current set of weak classifiers. This is encouraging in the context of strong classifier generation and sets a challenge for improving the performance of weak classifiers or breaking the problem into intermediate stages.

### 6.4 Comparison of results with theory

The problematic aspect of Fig. 9 is the vertical bars of white and black exhibited by some of the more accurate classifiers. The method detailed above for constructing a completely accurate strong classifier relies on pairs of classifiers which are correct where others fall short, and which do not both classify the same input–output vector incorrectly. This is impossible to find in the most accurate group of weak classifiers alone, given that there are black bars of erroneous classifications spanning the entire height of the set.

For a pair of weak classifiers with individual error fractions \(\eta_j\) and \(\eta_{j^{\prime}}\), let \(\phi_{jj^{\prime}}\) denote the *minimum possible* overlap of correctly classified vectors for the pair over \(\mathcal{V }\), \(\phi_{jj^{\prime}} = 1-\eta_j-\eta_{j^{\prime}}\), and let \(\gamma_{jj^{\prime}}\) denote the *actual* overlap of correct classifications, with \(\epsilon_{jj^{\prime}} = \gamma_{jj^{\prime}}-\phi_{jj^{\prime}}\).

If the minimum possible and actual overlaps are the same, i.e., \(\epsilon _{jj^{\prime }}=0\), then Condition 2 holds, and the weak classifier pair has minimum *correctness overlap*. Otherwise, if \(\phi _{jj^{\prime }}\ne \gamma _{jj^{\prime }}\), only the weaker Condition 2a is satisfied, so the weak classifier pair has a greater than minimal correctness overlap and a forced overlap of incorrect classifications \(\epsilon _{jj^{\prime }}>0\) (see Fig. 2) that could cancel out the correct votes of a different weak classifier pair and cause the strong classifier to be either incorrect or inconclusive.

Our numerical analysis of the weak classifiers satisfying Condition 1 (having \(\eta _j<0.5\)) showed that the average correctness overlap \(\gamma _{jj^{\prime }}\) between any two weak classifiers was \(0.3194\). The maximum correctness overlap for any pair of weak classifiers was \(\gamma _{jj^{\prime }}=0.6094\). The minimum was \(\gamma _{jj^{\prime }}=0.1563\), between two weak classifiers with respective error fractions (amount of the training set misclassified by each individual weak classifier) of \(\eta _j=0.4844\) and \(\eta _{j^{\prime }}=0.4531\). Compare this to the minimum possible overlap with two such classifiers, \(\phi _{jj^{\prime }}=0.0625\), and it becomes apparent that this set of weak classifiers falls short of ideal, given that \(\epsilon _{jj^{\prime }}=0.0938\) for the weak classifier pair with minimum overlap.

When only the most accurate weak classifiers (\(\eta _j=0.3906\); above the top red horizontal line in Fig. 9) were included, the average correctness overlap was \(\gamma _{jj^{\prime }}=0.4389\), the maximum was \(\gamma _{jj^{\prime }}=0.6094\), and the minimum was \(\gamma _{jj^{\prime }}=0.3594\). In order to come up with a generous estimate for the accuracy achievable with this group of weak classifiers, we focus on the minimum observed correctness overlap. The minimum possible correctness overlap for two classifiers with \(\eta _j=0.3906\) is \(\phi _{jj^{\prime }}=0.2188\). With an ideal set of weak classifiers of error \(\eta _{j}=0.3906\) and correctness overlap \(\phi _{jj^{\prime }}=0.2188\), it would take seven weak classifiers to construct a completely accurate strong classifier: three pairs of two classifiers each to cover a fraction \(0.6564\) of the solution space with a correctness overlap from one of the pairs, and one more weak classifier to provide the extra correct vote on the remaining \(0.3436\) fraction of the space. Assuming that three pairs of weak classifiers with minimum overlap and optimal relationships to the other weak classifier pairs could be found, there will still be a significant error due to the overlap fractions of the pairs being larger than ideal. In fact, each pair of weak classifiers yields an error contribution of \(\epsilon _{jj^{\prime }}=0.1406\), guaranteeing that a fraction \(3\epsilon _{jj^{\prime }}=0.4218\) of the input–output vectors will be classified incorrectly by the resulting strong classifier. This is not far from the simulation results for odd-qubit strong classifiers (Fig. 8, left), which suggests that the algorithm currently in use is producing near-optimal results for the dictionary of weak classifiers it has access to.
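The overlap arithmetic quoted in the last two paragraphs is easy to verify directly (the `max(0, ...)` clamp below is an added safeguard for pairs whose error fractions sum above one, not something stated in the text):

```python
# Check the quoted overlap numbers: phi = 1 - eta_j - eta_j' is the minimum
# possible correctness overlap, and eps = gamma - phi is the forced overlap
# of incorrect classifications.
def phi(eta_j, eta_jp):
    return max(0.0, 1.0 - eta_j - eta_jp)

# Minimum-overlap pair from the full accurate dictionary:
assert abs(phi(0.4844, 0.4531) - 0.0625) < 1e-4
assert abs(0.1563 - phi(0.4844, 0.4531) - 0.0938) < 1e-4

# Most accurate classifiers (eta_j = 0.3906):
assert abs(phi(0.3906, 0.3906) - 0.2188) < 1e-4
eps = 0.3594 - phi(0.3906, 0.3906)
assert abs(eps - 0.1406) < 1e-4
assert abs(3 * eps - 0.4218) < 1e-4   # error forced by three such pairs
print("overlap arithmetic is consistent with the quoted figures")
```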

## 7 Conclusions

We have developed a quantum adiabatic machine learning approach and applied it to the problem of training a quantum software error classifier. We have also shown how to use this classifier in quantum-parallel on the space of all possible input–output pairs of a given implemented software program \(P\). The training procedure involves selecting a set of weak classifiers, which are linearly combined, with binary weights, into two strong classifiers.

The first quantum aspect of our approach is an adiabatic quantum algorithm which finds the optimal set of binary weights as the ground state of a certain Hamiltonian. We presented two alternatives for this algorithm. The first, inspired by [6, 17], gives weight to single weak classifiers to find an optimal set. The second algorithm for weak classifier selection chooses pairs of weak classifiers to form the optimal set and is based on a set of sufficient conditions for a completely accurate strong classifier that we have developed.

The second quantum aspect of our approach is an explicit procedure for using the optimal strong classifiers in order to search the entire space of input–output pairs in quantum-parallel for the existence of an error in \(P\). Such an error is identified by performing an adiabatic quantum evolution, whose manifold of low-energy final states favors erroneous states.

A possible improvement of our approach involves adding intermediate training spaces, which track intermediate program execution states. This has the potential to fine-tune the weak classifiers, and overcome a limitation imposed by the desire to restrict our Hamiltonians to low-order interactions, yet still account for high-order correlations between bits in the input–output states.

An additional improvement involves finding optimal interpolation paths \(s(t)\) (26) from the initial to the final Hamiltonian [52, 53], for both the classifier training and classifier implementation problems.

We have applied our quantum adiabatic machine learning approach to a problem with real-world applications in flight control systems, which has facilitated both algorithmic development and characterization of the success of training strong classifiers using a set of weak classifiers involving minimal bit correlations.

Random number generation may appear to be a counterexample, as it is multi-valued, but only over different calls to the random-number generator.

One important consideration is that, as we shall see below, for practical reasons we may only be able to track errors at the level of one-bit errors and correlations between bit-pairs. Such limited tracking can be alleviated to some extent by using intermediate spaces, where higher order correlations between bits appearing at the level of the output space may not yet have had time to develop.

This inequality reflects the fact that for \(n\) overlapping sets, \(P\left[\bigcup _{i=1}^{n}s_i\right]=\sum _{i=1}^{n}P[s_i]-\sum _{i\ne j}P[s_i\cap s_j] + \sum _{i\ne j \ne k}P[s_i\cap s_j \cap s_k] - \sum _{i\ne j\ne k \ne m}P[s_i\cap s_j \cap s_k \cap s_m]+\dots\). Each term is larger than the next in the series; \(n+1\) sets cannot intersect where \(n\) sets do not. Our truncation of the series is greater than or equal to the full value because we stop after a subtracted term.

Any Boolean function of \(\ell \) variables can be uniquely expanded in the form \(f_i(x_1,\dots ,x_\ell ) = \sum _{\alpha =0}^{2^\ell -1} \epsilon _{i\alpha } s_{\alpha }\), where \(\epsilon _{i\alpha }\in \{0,1\}\) and \(s_\alpha \) are the \(2^\ell \) “simple” Boolean functions \(s_0 = x_1 x_2 \cdots x_\ell \), \(s_1 = x_1 x_2 \cdots \overline{x_\ell }\), \(\dots \), \(s_{2^\ell -1} = \overline{x_1}\, \overline{x_2} \cdots \overline{x_\ell }\), where \(\overline{x}\) denotes the negation of the bit \(x\). Since each \(\epsilon _{i\alpha }\) can assume one of two values, there are \(2^{2^\ell }\) different Boolean functions.

We are grateful to Greg Tallant from the Lockheed Martin Corporation for providing us with this problem as an example of interest in flight control systems.

## Acknowledgments

The authors are grateful to the Lockheed Martin Corporation for financial support under the URI program. KP is also supported by the NSF under a graduate research fellowship. DAL acknowledges support from the NASA Ames Research Center.