Encyclopedia of Machine Learning and Data Mining

Living Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Quantum Machine Learning

  • Maria Schuld
  • Francesco Petruccione
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7502-7_913-1

Abstract

Quantum machine learning is a young research area investigating the consequences that the emerging technology of quantum computing has for machine learning. This article introduces basic concepts of quantum information and summarises some major strategies for implementing machine learning algorithms on a quantum computer.

Keywords

Quantum system; Quantum algorithm; Boltzmann machine; Qubit system; Probabilistic description

Definition

Quantum machine learning (QML) is a subdiscipline of quantum information processing research, with the goal of developing quantum algorithms that learn from data in order to improve existing methods in machine learning. A quantum algorithm is a routine that can be implemented on a quantum computer, a device that exploits the laws of quantum theory in order to process information.

A number of quantum algorithms have been proposed for various machine learning models such as neural networks, support vector machines, and graphical models, some of which claim runtimes that, under certain conditions, grow only logarithmically with the size of the input space and/or the dataset, in contrast to conventional methods. A crucial point for runtime considerations is to find a procedure that efficiently encodes classical data into the properties of a quantum system. QML algorithms are often based on well-known quantum subroutines (such as quantum phase estimation or Grover search) or exploit fast annealing techniques through quantum tunneling, and they can make use of an exponentially compact representation of data through the probabilistic description of quantum systems.

Besides finding quantum algorithms for pattern recognition and data mining, QML also investigates more fundamental questions about the concept of learning from the perspective of quantum theory. Sometimes the definition of QML is extended to research that applies machine learning to quantum information, as is frequently done when the full evolution or state of a quantum system has to be reconstructed from limited experimental data.

Motivation and Background

The accurate solution of many learning problems is known to be NP-hard, such as the training of Boltzmann machines or inference in graphical models. But even methods for which tractable algorithms are known suffer from the ever-increasing size of the datasets available in today's applications. The idea behind QML is to approach these problems from the perspective of quantum information and to harness the power of quantum computers for applications in artificial intelligence and data mining.

The motivation to find quantum analogues for "classical" machine learning algorithms derives from the success of the dynamic research field of quantum information. Some speedups compared to the best or best-known classical algorithms have already been shown, the most prominent being Shor's factorization algorithm (Shor 1997), which provides an exponential speedup compared to the best classical algorithm known, and Grover's search algorithm for unsorted databases (Grover 1996), which provides a quadratic speedup over the best possible classical algorithm. Although it is still an open question whether "true" exponential speedups are possible, the number of quantum algorithms is constantly growing. The technological implementation of large-scale universal quantum computers is also making steady progress, and many proof-of-principle experiments have confirmed the theoretical predictions. (The reader can get a first impression of the current progress from Wikipedia's "timeline of quantum computing": https://en.wikipedia.org/wiki/Timeline_of_quantum_computing.) The first realizations of quantum annealing devices, which solve a very specific type of optimization problem and are thus not universal, are already commercially available (e.g., http://www.dwavesys.com/).

Proposals that apply quantum computing to data mining in general and learning tasks in particular have been put forward sporadically since quantum computing became a well-established research area in the 1990s. An especially large share of attention has been devoted to so-called quantum neural network models, which simulate the behavior of artificial neural networks based on quantum information. They were initially motivated by the question of whether quantum mechanics can help to explain the functioning of our brain (Kak 1995) and vary in the degree to which they rigorously apply quantum theory (Schuld et al. 2015). Since around 2012, there has been a rapid increase in other contributions to QML, consisting of proposals for quantum versions of hidden Markov models (Barry et al. 2014), Boltzmann machines (Wiebe et al. 2014; Adachi and Henderson 2015), belief nets (Low et al. 2014), support vector machines (Rebentrost et al. 2014), linear regression (Schuld et al. 2016), Gaussian processes (Zhao et al. 2015), and many more. Several collaborations between IT companies and academic institutions have been created and promise to advance the field of QML in the future. For example, Google and NASA founded the Quantum Artificial Intelligence Lab in 2013, the University of Oxford and Nokia set up a Quantum Optimisation and Machine Learning program in 2015, and the University of Southern California collaborates with Lockheed Martin on machine learning applications through the Quantum Computation Center.

Quantum Computing

In order to present the major approaches to QML research below, it is necessary to introduce some basic concepts of quantum information. The interested reader is referred to the excellent introduction by Nielsen and Chuang (2010).

In conventional computers, the state of a physical system represents bits of information and is manipulated by the classical laws of physics (e.g., the presence or absence of a current in a circuit represents 1 or 0 and is manipulated by the laws of electrodynamics). A quantum computer follows a very similar concept, only that the underlying physical system is governed by the laws of quantum theory and is therefore called a quantum system.

Quantum theory is a mathematical apparatus describing physical objects on very small scales (i.e., electrons, atoms, photons). More precisely, it is a probabilistic description of the results of physical measurements on quantum systems, and although it has been confirmed in many experiments, it shows a number of features distinct from classical or Newtonian mechanics. Quantum computers exploit these features through information processing based on the rules of quantum theory. Although a number of exciting results have been achieved, it is still unknown whether BQP, the class of decision problems solvable by a quantum computer in polynomial time, is larger than BPP, its classical analogue. In short, quantum computing is a very dynamic research area with many promising results and open questions.

The quantum information community uses a variety of computational models that have been shown to be equivalent but that constitute different building blocks of universal quantum computation. The following gives a short introduction to the most influential model, the circuit model, in order to clarify important concepts on which QML algorithms are based.

The Concept of a Qubit

A central concept in the major quantum computational models is the qubit, an abstraction of a quantum system that has two possible configurations or states. As long as certain properties are fulfilled (DiVincenzo 2000), such a two-level system can have many possible physical realizations (just as bits may be encoded in the currents of circuits or the pits and lands of CDs), for example, a hydrogen atom in the energetic ground or first excited state, the current in a superconducting circuit, or the path a photon takes through a semitransparent mirror.

Qubits are often introduced as "bits that can be in states 0 and 1 at the same time," which mystifies rather than explains the concept. In fact, qubits can be compared to a probabilistic description of a classical physical system with two different states, say a coin with the states "heads" and "tails." As illustrated in Table 1, the probabilities \(p_{00}\), \(p_{01}\), \(p_{10}\), and \(p_{11}\) with \(p_{00} + p_{01} + p_{10} + p_{11} = 1\) describe our expectation of getting the respective results "heads and heads," "heads and tails," "tails and heads," and "tails and tails" after tossing two coins. Note that the coin tosses do not necessarily need to be statistically independent events.
Quantum Machine Learning, Table 1

Probabilistic description of a classical system of two coins. Each of the four possible outcomes or configurations after tossing both coins is associated with a probability.

The probabilistic description of a qubit shows a significant difference (see Table 2). The four configurations "00," "01," "10," and "11" of a two-qubit system such as two simplified atoms are each associated with a complex number called an amplitude, and the probability of observing the two qubits in one of the four possible joint states is given by the absolute square of the amplitude. The absolute squares of the amplitudes \(a_i\), \(i = 1,\ldots,2^n\), of an n-qubit system consequently have to add up to one, \(\sum_{i}\vert a_{i}\vert^{2} = 1\). In both the classical and the quantum case, once the coins or atoms are observed in one of the joint configurations, their state is fully determined, and repeated observations will only confirm the result. As will be explained below, this concept of complex amplitudes is central to quantum information and has, 100 years after the beginning of quantum theory, still not found a satisfying interpretation for our everyday intuition.
Quantum Machine Learning, Table 2

Important elements in the description of a two-qubit system. An example is two atoms that can each be in the ground or first excited state, so that the system has four possible abstract configurations. Quantum theory associates each configuration (or potential measurement outcome) with an amplitude, and the absolute square of the amplitude is the probability of measuring this state. In the mathematical notation, each configuration corresponds to a unit basis vector or, in Dirac notation, a Dirac basis state.
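
The probabilistic reading of amplitudes can be made concrete in a few lines of numpy. The following minimal sketch uses hypothetical example amplitudes (not the entries of Table 2): their absolute squares form a probability distribution over the four joint outcomes, and a measurement draws a sample from it.

    import numpy as np

    # Example amplitudes for the configurations 00, 01, 10, 11 (illustrative values).
    a = np.array([0.5 + 0.5j, 0.5, 0.0, 0.5])

    # The absolute squares of the amplitudes must sum to one.
    probs = np.abs(a) ** 2
    assert np.isclose(probs.sum(), 1.0)

    # "Measuring" the two qubits samples one joint configuration.
    outcomes = ["00", "01", "10", "11"]
    rng = np.random.default_rng(0)
    print(rng.choice(outcomes, p=probs))  # e.g., "00" with probability 0.5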

Algorithmic Manipulations of Qubits

Information processing is about the manipulation of bits by elementary logic gates such as AND or XOR, and quantum information processing likewise needs to define elementary operations on qubit systems (derived, of course, from the laws of quantum theory) from which algorithms with a well-defined output can be constructed.

In a probabilistic description, manipulating information corresponds to a transformation of the system's probability distribution. For example, in the case of the two coins, this could mean painting a "heads" symbol over the "tails" symbol, causing the coin to always show "heads." Using the mathematical language of Markov chains, changes of a classical probability distribution can be expressed by a linear transformation applied to the vector of probabilities, written as a stochastic matrix \(S = (s_{ij})\) multiplied from the left. The stochastic matrix has the properties that its entries are nonnegative and all columns sum up to one, in order to guarantee that the resulting vector on the right side is again a probability distribution. In our two-coin example, this reads
$$ S\left(\begin{array}{c} p_{00}\\ p_{01}\\ p_{10}\\ p_{11} \end{array}\right) = \left(\begin{array}{c} p_{00}^{\prime}\\ p_{01}^{\prime}\\ p_{10}^{\prime}\\ p_{11}^{\prime} \end{array}\right), \qquad s_{ij} \geq 0, \quad \sum_{i} s_{ij} = 1. $$
(1)
For quantum systems, any physically possible evolution can be mathematically represented by a unitary matrix \(U = (u_{ij})\) applied to the vector of amplitudes, which in the two-qubit example reads
$$ U\left(\begin{array}{c} a_{00}\\ a_{01}\\ a_{10}\\ a_{11} \end{array}\right) = \left(\begin{array}{c} a_{00}^{\prime}\\ a_{01}^{\prime}\\ a_{10}^{\prime}\\ a_{11}^{\prime} \end{array}\right), \qquad u_{ij} \in \mathbb{C}, \quad U^{\dagger}U = \mathbb{1}. $$
(2)
A unitary matrix has orthonormal column vectors, guaranteeing that the resulting vector on the right side is again a quantum amplitude vector. Equation (2) in fact describes any possible closed evolution of a two-qubit system in quantum theory.
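As a side-by-side illustration of Eqs. (1) and (2), the following numpy sketch (the matrices are arbitrary examples, not taken from the text) checks the defining properties of a stochastic and a unitary matrix and applies each to a state vector.

    import numpy as np

    # Classical case, Eq. (1): nonnegative entries, columns summing to one.
    S = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
    p = np.array([1.0, 0.0])
    assert np.all(S >= 0) and np.allclose(S.sum(axis=0), 1.0)
    print(S @ p)  # [0.9, 0.1], again a probability distribution

    # Quantum case, Eq. (2): a unitary (here a real rotation) preserves the norm.
    theta = 0.3
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    a = np.array([1.0, 0.0])
    assert np.allclose(U.conj().T @ U, np.eye(2))
    print(np.sum(np.abs(U @ a) ** 2))  # 1.0, the new amplitude vector is normalized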
Quantum algorithms (as well as QML algorithms) are usually formulated using the Dirac notation, in which one decomposes the amplitude vector into a linear combination of unit vectors and rewrites the unit vectors as Dirac vectors:
$$ \mathbf{a} = a_{1}\left(\begin{array}{c} 1\\ 0\\ \vdots \\ 0 \end{array}\right) + \ldots + a_{2^{n}}\left(\begin{array}{c} 0\\ 0\\ \vdots \\ 1 \end{array}\right) $$
(3)
$$ \Updownarrow $$
(4)
$$ \left\vert \psi \right> = a_{1}\left\vert 0\ldots 0\right> + \ldots + a_{2^{n}}\left\vert 1\ldots 1\right>. $$
(5)
Dirac notation is very handy as it makes explicit which measurement result of the n qubits each amplitude corresponds to.
Similar to elementary logic gates, the circuit model defines elementary unitary transformations as building blocks to manipulate the quantum state of a qubit system. For example, consider a single qubit described by the complex amplitude vector \((a_1, a_2)^T\). If the quantum system is in state \((1, 0)^T\), we know with certainty that a measurement will produce the state 0 (since the probability of measuring the 0 state is given by \(p_{0} = \vert a_{1}\vert^{2} = 1.0\), while \(p_{1} = \vert a_{2}\vert^{2} = 0.0\)). The unitary transformation
$$\displaystyle{U_{x} = \left (\begin{array}{*{10}c} 0&1\\ 1 &0 \end{array} \right )}$$
then transforms this state into \((0, 1)^T\), which will certainly result in a measurement of state 1. \(U_x\) hence effectively performs a bit flip or NOT gate on the state of the qubit. In a similar fashion, other quantum gates can be defined that together form a set of universal gates for quantum computation.

Why Is Quantum Computing Different?

Returning to the question why complex amplitudes change the rules of classical information processing, consider another elementary quantum gate that has no classical equivalent since it cannot be expressed as a stochastic matrix with positive entries. The Hadamard gate
$$\displaystyle{U_{H} = \frac{1} {\sqrt{2}}\left (\begin{array}{*{10}c} 1& 1\\ 1 &-1 \end{array} \right ),}$$
will, applied to the state \((1, 0)^T\), produce \((\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})^T\), which is called a superposition of states 0 and 1. A classical equivalent would be a state of maximum uncertainty, as the probability of measuring the qubit in state 0 or 1 is \(\vert\frac{1}{\sqrt{2}}\vert^{2} = \frac{1}{2}\) each. However, the difference between a superposition and mere classical uncertainty becomes apparent when applying \(U_H\) once more, which transforms the state back into \((1, 0)^T\): the minus sign in \(U_H\) cancels the two amplitudes against each other when the second entry of the resulting amplitude vector is calculated. In other words, amplitudes can annihilate each other, a phenomenon called interference, which is often mentioned as the crucial resource of quantum computing. Beyond this illustration, the elegant theory of quantum Turing machines allows a more sophisticated comparison between quantum and classical computing (Deutsch 1985), but goes beyond our scope here.
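
The interference argument is easy to reproduce numerically; the short sketch below applies the Hadamard gate once (uniform measurement probabilities) and twice (the amplitudes cancel and the initial state returns).

    import numpy as np

    H = np.array([[1, 1],
                  [1, -1]]) / np.sqrt(2)
    state = np.array([1.0, 0.0])

    once = H @ state           # superposition with amplitudes (1/sqrt2, 1/sqrt2)
    print(np.abs(once) ** 2)   # [0.5, 0.5]

    twice = H @ once           # the minus sign cancels one amplitude
    print(np.abs(twice) ** 2)  # [1.0, 0.0]; no stochastic matrix behaves this way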

Quantum Machine Learning Algorithms

Most existing QML algorithms solve problems of supervised or unsupervised pattern classification and regression, although first advancements toward reinforcement learning have been made (e.g., Paparo et al. 2014). Given a (classical) dataset \(\mathcal{D}\) and a new instance \(\tilde{x}\) for which we would like to make a prediction, a QML algorithm usually consists of three parts: First, the input has to be encoded into a quantum system through a state preparation routine. Second, the quantum algorithm is executed through unitary transformations. (Note that nonunitary evolutions are possible in so-called open quantum systems, but they correspond to a unitary evolution of a larger system.) Third, the result is read out by measuring the quantum system (see Fig. 1). The encoding and readout steps are often the bottlenecks of a QML algorithm; for example, reading out an amplitude of a quantum state that is in a uniform superposition of all possibilities will on average take a number of measurements that is exponential in the number of qubits. In particular, claims of quantum algorithms that run in time logarithmic in the size of the dataset and input vectors often ignore the resources it takes to perform the crucial step of encoding the information carried by a dataset into a quantum system. Such algorithms can still be valuable for pure quantum information processing, i.e., if the "quantum data" is generated by previous routines or experiments.
Quantum Machine Learning, Fig. 1

Comparison of the basic scheme of classical (left) and quantum (center) machine learning algorithms for pattern classification, together with the operations on the quantum system (right). In order to solve machine learning tasks based on classical datasets, the quantum algorithm requires an information encoding and a readout step, which are in general highly nontrivial procedures, and it is important to include them in runtime considerations
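
A toy classical simulation can make the three-step scheme concrete. In the sketch below, all choices (the input bit, the single Hadamard gate standing in for the core algorithm) are illustrative placeholders, not a prescribed QML routine.

    import numpy as np

    def encode(bit):                       # state preparation: bit -> basis state
        return np.array([1.0, 0.0]) if bit == 0 else np.array([0.0, 1.0])

    def evolve(state):                     # core algorithm: here one unitary gate
        H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
        return H @ state

    def measure(state, rng):               # readout: sample from |amplitude|^2
        return rng.choice([0, 1], p=np.abs(state) ** 2)

    rng = np.random.default_rng(0)
    print(measure(evolve(encode(0)), rng)) # 0 or 1, each with probability 1/2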

The QML algorithm and the readout step depend heavily on the way information is encoded into the quantum system; one can distinguish three ways of encoding information into an n-qubit system:

  1. Interpreting the possible measurement outcomes of a qubit system as a bit sequence.

  2. Interpreting the amplitude vector as (i) a \(2^n\)-dimensional classical real vector or (ii) a probability distribution over n binary variables.

  3. Encoding the result of an optimization problem into the ground state (the state of lowest energy) of a quantum system.

These strategies help to distinguish different approaches to developing QML algorithms.

Associating Qubits with Bits

The most straightforward method of encoding information into quantum systems is to associate bits with qubits. For example, the two-qubit state \((1, 0, 0, 0)^T\) from the example in Table 2 represents the bit string "00," since the system has unit probability of being measured in the "00" state.

To encode a full dataset in this fashion, it needs to be given in binary form, meaning that every feature vector (and, if applicable, its label) has been translated into an n-bit binary sequence. For example, the dataset \(\mathcal{D} = \left\{ \left(\begin{smallmatrix} 0\\ 0\\ 1 \end{smallmatrix}\right), \left(\begin{smallmatrix} 0\\ 1\\ 1 \end{smallmatrix}\right), \left(\begin{smallmatrix} 1\\ 1\\ 0 \end{smallmatrix}\right) \right\}\) can be encoded into the quantum state \(\mathbf{a}_{\mathcal{D}} = \frac{1}{\sqrt{3}}(0,1,0,1,0,0,1,0)^{T}\).

In this case, the Dirac notation introduced above is helpful as it explicitly contains the encoded feature vectors:
$$\displaystyle{\left \vert \mathcal{D}\right > = \frac{1} {\sqrt{3}}(\left \vert 001\right > + \left \vert 011\right > + \left \vert 110\right >).}$$
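
A small numpy sketch of this basis encoding (using only the example dataset above): each binary feature vector marks one entry of the \(2^3\)-dimensional amplitude vector, and normalization gives every pattern the amplitude \(1/\sqrt{3}\).

    import numpy as np

    data = ["001", "011", "110"]           # the binary dataset D
    a_D = np.zeros(2 ** 3)
    for bits in data:
        a_D[int(bits, 2)] = 1.0            # mark the basis state of each pattern
    a_D /= np.linalg.norm(a_D)             # normalize to amplitude 1/sqrt(3) each

    print(a_D)  # [0, 0.577, 0, 0.577, 0, 0, 0.577, 0]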

An early example of a QML algorithm based on such a "quantum dataset" has been developed for pattern completion (finding feature vectors containing a given bit sequence) by an associative memory mechanism known from Hopfield models (Ventura and Martinez 2000). The authors suggest a routine to construct the state \(\mathbf{a}_{\mathcal{D}}\) efficiently and use a modified Grover search algorithm, in which the amplitudes corresponding to the desired measurement outcomes are marked in one single step, after which the amplitudes of the marked states are amplified. The resulting quantum state has a high probability of being measured in one of the basis states containing the desired bit sequence.

An example of a QML algorithm for supervised pattern classification is a quantum version of k-nearest neighbors (Schuld et al. 2014b). Beginning with a superposition of training vectors like \(\left\vert \mathcal{D}\right>\) above, in which some selected qubits encode the class labels, the idea is to weigh the amplitudes by the Hamming distance between each corresponding training vector and the new input. Only the "class-label qubits" get measured, so that close inputs contribute more to the probability of measuring their class label than distant ones. An alternative is presented by Wiebe et al. (2015), who also prepare a quantum state with distance-weighted amplitudes and then perform a subroutine based on Grover search to find the basis state representing the closest neighbor.
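
The following classical sketch mimics the distance weighting behind this idea; the cosine weighting function is an illustrative choice and not necessarily the exact scheme of the cited papers.

    import numpy as np

    train = [("001", 0), ("011", 0), ("110", 1)]   # (binary features, class label)
    new = "011"

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    # Weights decrease with Hamming distance to the new input (illustrative choice).
    weights = np.array([np.cos(np.pi * hamming(x, new) / (2 * len(new))) ** 2
                        for x, _ in train])
    probs = weights / weights.sum()
    for (x, label), p in zip(train, probs):
        print(x, label, round(p, 3))       # close patterns dominate the label statistics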

Encoding Information into Amplitudes

Another way to encode information is to associate the quantum amplitude vector with a real classical vector:
$$ \left(\begin{array}{c} a_{1}\\ \vdots \\ a_{2^{n}} \end{array}\right) \leftrightarrow \left(\begin{array}{c} x_{1}\\ \vdots \\ x_{2^{n}} \end{array}\right), \qquad \sum_{i}\vert x_{i}\vert^{2} = 1, \quad x_{i} \in \mathbb{R}. $$
Note that since amplitude vectors are normalized, the classical vector has to be preprocessed accordingly. A quantum system of n qubits can therefore in principle encode \(2^n\) real numbers, which is an exponentially compact representation. There are some vectors for which state preparation can be done in time that grows only linearly with the number of qubits, and if the QML algorithm and the readout step have the same property, the result is an algorithm that is logarithmic in the input dimension.
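
A minimal sketch of amplitude encoding with an arbitrary example vector: the classical vector of length \(2^n\) is rescaled to unit norm and interpreted as the amplitude vector of an n-qubit state.

    import numpy as np

    x = np.array([3.0, 1.0, 2.0, 1.0])      # classical feature vector of length 2^n
    a = x / np.linalg.norm(x)                # preprocessing: rescale to unit norm
    assert np.isclose(np.sum(np.abs(a) ** 2), 1.0)

    n_qubits = int(np.log2(len(a)))
    print(n_qubits, a)                       # 2 qubits suffice to store 4 real numbers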

Two different strategies to use this encoding for QML can be distinguished: one that associates the amplitude vector with one or all feature vectors in order to use the power of eigenvalue decompositions inherent in the formalism of quantum theory, and another in which amplitudes are used to encode classical probability distributions.

Quantum Eigenvalue Decompositions

An important branch of QML research is based on the intrinsic capability of quantum theory to evaluate eigenvalues of operators, which has been exploited in an important quantum algorithm for the solution of systems of linear equations (Harrow et al. 2009). The routine takes a quantum state described by the amplitude vector b, which corresponds to the (normalized) right side of a classical linear system of equations Ax = b. Through a set of involved operations (including the Hamiltonian simulation of an operator corresponding to A, a quantum phase estimation algorithm, and a selective measurement that has to be repeated until a certain result is obtained), the quantum state is transformed into \(\sum_{j}\lambda_{j}^{-1}(\mathbf{u}_{j}^{T}\mathbf{b})\,\mathbf{u}_{j}\) with eigenvalues \(\lambda_{j}\) and eigenvectors \(\mathbf{u}_{j}\) of A, which equals the correct solution x. Due to the exponentially compact representation of information, the complexity of the algorithm depends only logarithmically on the size of b when the encoding and readout steps are ignored. However, its running time depends sensitively on other parameters, such as the condition number and sparsity of A, as well as the desired accuracy of the result. This makes the linear systems algorithm applicable only to very special problems (Aaronson 2015). QML researchers have tried to find such applications in different areas of machine learning that rely on matrix inversion.
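
The identity exploited by the routine can be checked classically. The numpy sketch below (with a random symmetric example matrix, purely for illustration) verifies that \(\sum_{j}\lambda_{j}^{-1}(\mathbf{u}_{j}^{T}\mathbf{b})\,\mathbf{u}_{j}\) coincides with the solution of Ax = b.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    A = (A + A.T) / 2                  # symmetric, so a real eigendecomposition exists
    b = rng.standard_normal(4)
    b /= np.linalg.norm(b)             # amplitude vectors are normalized

    lam, U = np.linalg.eigh(A)         # eigenvalues lambda_j, eigenvectors u_j
    x = sum((U[:, j] @ b) / lam[j] * U[:, j] for j in range(4))

    assert np.allclose(x, np.linalg.solve(A, b))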

The first full QML example exploiting the ideas of the linear systems algorithm was the quantum support vector machine (Rebentrost et al. 2014). The main idea is to take the dual formulation of support vector machines written as a least squares problem, in which a linear system of equations with the kernel matrix \(K_{ij} = \mathbf{x}_{i}^{T}\mathbf{x}_{j}\), \(\mathbf{x}_{i},\mathbf{x}_{j} \in \mathcal{D}\), has to be solved, and to apply the above quantum routine. By making use of a trick, the linear systems algorithm can take a quantum state encoding \(K_{ij}\) (instead of a quantum operator, as in the original version). Creating a quantum version of \(K_{ij}\) is surprisingly elegant if one can prepare a quantum state:

$$ (x_{1}^{1},\ldots,x_{N}^{1},\ldots,x_{1}^{M},\ldots,x_{N}^{M})^{T} $$
(6)

whose amplitudes encode the MN features of all training vectors \(\mathbf{x}^{m} = (x_{1}^{m},\ldots,x_{N}^{m})^{T}\), \(m = 1,\ldots,M\). The statistics of a specific subset of the qubits in the state of Eq. (6) include a covariance matrix (in quantum theory known as a density matrix) that is entrywise equivalent to the kernel and can be accessed by further processing.

Data fitting by linear regression has been approached by means of the quantum linear systems algorithm by Wiebe et al. (2012) to obtain the well-known least squares solution:
$$\displaystyle{\mathbf{w} = \mathbf{X}^{+}\mathbf{y}}$$
for the linear regression parameters w, with the pseudoinverse \(\mathbf{X}^{+} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\), where the rows of \(\mathbf{X}\) are the training inputs. Schuld et al. (2016) propose another version of the quantum algorithm that is suited for prediction. The algorithm is based on a quantum computation of the singular value decomposition of \(\mathbf{X}^{+}\), which in the end encodes the prediction for a new input in the measurement result of a single qubit.
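
Classically, this least squares solution is obtained through the singular value decomposition of X. The following numpy sketch (with random example data, not from the cited papers) mirrors the computation that the quantum routines perform on amplitudes.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((10, 3))       # 10 training inputs with 3 features each
    y = rng.standard_normal(10)            # target values

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = Vt.T @ np.diag(1.0 / s) @ U.T @ y  # pseudoinverse applied through the SVD

    assert np.allclose(w, np.linalg.pinv(X) @ y)
    print(X @ w)                            # least squares predictions on the inputs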

Other QML algorithms based on the principle of matrix inversion and eigenvalue estimation on a quantum computer have been proposed for Gaussian processes (Zhao et al. 2015) as well as for finding topological and geometric features of data (Lloyd et al. 2016). The routines discussed here specify the core algorithm as well as the readout step in the scheme of Fig. 1 and are logarithmic in the dimension of the feature vectors. However, they leave the crucial encoding step open, which might merely "hide" the complexity for all but a few selected problems, as has been critically remarked by Aaronson (2015).

Quantum Probability Distributions

Since quantum theory defines probability distributions over measurement results, probability distributions over binary variables can be represented very naturally by the amplitudes of a qubit system.

More precisely, given n random binary variables, an amplitude vector can be used to encode the square roots of the \(2^n\) probabilities of the different realizations of these variables. For example, the probability distribution over the possible results of the two-coin toss in Table 1 could be encoded into an amplitude vector \(\left(\sqrt{p_{00}},\sqrt{p_{01}},\sqrt{p_{10}},\sqrt{p_{11}}\right)^{T}\) of the two-qubit system in Table 2. Beyond the efficient representation of probability distributions, the marginalization over variables, which is in general intractable in classical models, corresponds to the simple step of excluding the qubits representing these variables from the measurement and considering the resulting statistics.
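
A minimal numpy sketch of this encoding, using the two-coin example with arbitrary probabilities: the amplitudes are square roots of probabilities, and marginalizing over the second variable amounts to summing the outcome statistics of the ignored qubit.

    import numpy as np

    p = np.array([0.1, 0.2, 0.3, 0.4])         # p00, p01, p10, p11 (example values)
    a = np.sqrt(p)                              # amplitude vector of the two-qubit state
    assert np.isclose(np.sum(np.abs(a) ** 2), 1.0)

    probs = np.abs(a) ** 2                      # measurement statistics
    p_first = probs.reshape(2, 2).sum(axis=1)   # ignore the second qubit
    print(p_first)                              # [p(x1=0), p(x1=1)] = [0.3, 0.7]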

While these advantages sound impressive, it turns out that the problem of statistical inference remains prohibitive: Conditioning the qubit probability distribution on the state of all but one qubit, \(p(x_{1},\ldots,x_{N}) \rightarrow p(x_{N}\vert x_{1},\ldots,x_{N-1})\), requires measuring these qubits in exactly the desired state, which in general happens with exponentially small probability. Measuring the state can be understood as sampling from the probability distribution, and one would have to perform an unfeasibly large number of measurements to obtain the conditional statistics, with a new quantum state prepared after each measurement. It has in fact been shown that the related problem of Bayesian updating through a quantum distribution is intractable (Wiebe and Granade 2015), as it corresponds to a Grover search, which can only be quadratically faster than what is classically possible.

Even without the ability for efficient inference, quantum systems can still be interesting for probabilistic machine learning models. Low et al. (2014) exploit the quadratic speedup for a problem of inference with "quantum Bayesian nets." Hidden Markov models have been shown to have an elegant formal generalization in the language of open quantum systems (Barry et al. 2014). Wiebe et al. (2014) show how quantum states that approximate Boltzmann distributions can be prepared in order to obtain samples for the training of Boltzmann machines through contrastive divergence. A semi-classical routine for Bayesian updating has also been proposed (Wiebe and Granade 2015). These contributions suggest that much potential lies in approaches that exploit the genuinely stochastic structure of quantum theory for probabilistic machine learning methods.

Optimization and Quantum Annealing

Another branch of QML research is based on techniques of quantum annealing, which can be understood as an analogue version of quantum computing (Das and Chakrabarti 2008). Similar to the metaheuristic of simulated annealing, the idea is to drive a physical system into its energetic ground state, which encodes the desired result of an optimization problem. To associate each basis state of a qubit system with an energy, one has to introduce externally controllable physical interactions between the qubits.

The main difference between classical and quantum annealing is that "thermal fluctuations" are replaced by quantum fluctuations, which enable the system to tunnel through high and thin energy barriers (the probability of quantum tunneling decreases exponentially with the barrier width but is largely independent of its height). That makes quantum annealing especially suited to problems with a "sharply ragged" objective function (see Fig. 2). Quantum annealing can be understood as a heuristic version of the famous computational model of quantum adiabatic computation, which is why some authors speak of adiabatic quantum machine learning.
Quantum Machine Learning, Fig. 2

Illustration of quantum annealing in an energy landscape over (here continuous) states or configurations x. The ground state is the configuration of lowest energy (black dot). Quantum tunneling allows the system state to pass through high and thin energy barriers (gray dot on the left), while in classical annealing techniques, stochastic fluctuations have to be large enough to allow for jumps over peaks (gray dot on the right)

The significance of quantum annealing lies in its relatively simple technological implementation, and quantum annealing devices are available commercially. Current machines are limited to solving quadratic unconstrained binary optimization (QUBO) problems:
$$ \mathop{\mathrm{argmin}}\limits_{(x_{1},\ldots,x_{N})}\sum\limits_{ij} w_{ij}\, x_{i}x_{j} \quad \mathrm{with} \quad x_{i},x_{j} \in \{0,1\}. $$
(7)
An important step is therefore to translate the problem into QUBO form, which has been done for simple binary classifiers or perceptrons (Denchev et al. 2012), image matching problems (Neven et al. 2008), and Bayesian network structure learning (O'Gorman et al. 2015). Other machine learning models naturally relate to the form of Eq. (7). For example, a number of contributions investigate quantum annealing for the sampling step required in the training of Boltzmann machines via contrastive divergence (Adachi and Henderson 2015; Amin et al. 2016). Another example is the Hopfield model for pattern recognition via associative memory, which has been investigated from the perspective of adiabatic quantum computation with nuclear magnetic resonance systems (Neigovzen et al. 2009).
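
For small N, the QUBO problem of Eq. (7) can be solved by brute force. The sketch below (with arbitrary example couplings \(w_{ij}\)) enumerates all binary assignments and returns the minimizing configuration, i.e., the "ground state."

    import itertools
    import numpy as np

    w = np.array([[0.0, 1.0, -2.0],
                  [0.0, 0.0,  1.5],
                  [0.0, 0.0, -1.0]])        # illustrative coupling matrix

    def qubo_value(x):
        x = np.asarray(x)
        return x @ w @ x                    # sum_ij w_ij x_i x_j

    best = min(itertools.product([0, 1], repeat=3), key=qubo_value)
    print(best, qubo_value(best))           # ground state (1, 0, 1) with energy -3.0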

Measuring the performance of quantum annealing compared to classical annealing schemes is a non-trivial problem, and although advantages of the quantum schemes have been demonstrated in the literature mentioned above, general statements about speedups are still controversial.

Experimental Realizations

The reason why one rarely finds classical computer simulations of quantum machine learning algorithms in the literature is that the description of quantum systems is classically intractable due to the exponential size of the amplitude vectors. Until a large-scale universal quantum computer is built, only QML algorithms based on quantum annealing can be tested on real devices and benchmarked against classical machine learning algorithms. Some proof-of-principle experiments have nevertheless implemented few-qubit examples of proposed QML algorithms in the lab. Among those are experimental realizations of the quantum support vector machine (Li et al. 2015) as well as quantum clustering and pattern recognition algorithms (Cai et al. 2015; Neigovzen et al. 2009).

Further Reading

The interested reader may be referred to existing reviews on quantum machine learning research (Schuld et al. 2014a, 2015; Adcock et al. 2015).

Recommended Reading

  1. Aaronson S (2015) Read the fine print. Nat Phys 11(4):291–293
  2. Adachi SH, Henderson MP (2015) Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356
  3. Adcock J, Allen E, Day M, Frick S, Hinchliff J, Johnson M, Morley-Short S, Pallister S, Price A, Stanisic S (2015) Advances in quantum machine learning. arXiv preprint arXiv:1512.02900
  4. Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2016) Quantum Boltzmann machine. arXiv preprint arXiv:1601.02036
  5. Barry J, Barry DT, Aaronson S (2014) Quantum partially observable Markov decision processes. Phys Rev A 90:032311
  6. Cai X-D, Wu D, Su Z-E, Chen M-C, Wang X-L, Li L, Liu N-L, Lu C-Y, Pan J-W (2015) Entanglement-based machine learning on a quantum computer. Phys Rev Lett 114(11):110504
  7. Das A, Chakrabarti BK (2008) Colloquium: quantum annealing and analog quantum computation. Rev Mod Phys 80(3):1061
  8. Denchev V, Ding N, Neven H, Vishwanathan S (2012) Robust classification with adiabatic quantum optimization. In: Proceedings of the 29th international conference on machine learning (ICML-12), Edinburgh, pp 863–870
  9. Deutsch D (1985) Quantum theory, the Church-Turing principle and the universal quantum computer. Proc R Soc Lond A: Math Phys Eng Sci 400:97–117
  10. DiVincenzo DP (2000) The physical implementation of quantum computation. Fortschritte der Physik 48(9–11):771–783
  11. Grover LK (1996) A fast quantum mechanical algorithm for database search. In: Proceedings of the twenty-eighth annual ACM symposium on theory of computing. ACM, New York, pp 212–219
  12. Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103(15):150502
  13. Kak SC (1995) Quantum neural computing. Adv Imaging Electron Phys 94:259–313
  14. Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114(14):140504
  15. Lloyd S, Garnerone S, Zanardi P (2016) Quantum algorithms for topological and geometric analysis of data. Nat Commun 7:10138
  16. Low GH, Yoder TJ, Chuang IL (2014) Quantum inference on Bayesian networks. Phys Rev A 89:062315
  17. Neigovzen R, Neves JL, Sollacher R, Glaser SJ (2009) Quantum pattern recognition with liquid-state nuclear magnetic resonance. Phys Rev A 79(4):042321
  18. Neven H, Rose G, Macready WG (2008) Image recognition with an adiabatic quantum computer I. Mapping to quadratic unconstrained binary optimization. arXiv preprint arXiv:0804.4457
  19. Nielsen MA, Chuang IL (2010) Quantum computation and quantum information. Cambridge University Press, Cambridge
  20. O'Gorman B, Babbush R, Perdomo-Ortiz A, Aspuru-Guzik A, Smelyanskiy V (2015) Bayesian network structure learning using quantum annealing. Eur Phys J Spec Top 224(1):163–188
  21. Paparo GD, Dunjko V, Makmal A, Martin-Delgado MA, Briegel HJ (2014) Quantum speedup for active learning agents. Phys Rev X 4(3):031002
  22. Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113:130503
  23. Schuld M, Sinayskiy I, Petruccione F (2014a) The quest for a quantum neural network. Quantum Inf Process 13(11):2567–2586
  24. Schuld M, Sinayskiy I, Petruccione F (2014b) Quantum computing for pattern classification. In: Pham D-N, Park S-B (eds) Lecture notes in computer science, vol 8862. Springer, pp 208–220
  25. Schuld M, Sinayskiy I, Petruccione F (2015) Introduction to quantum machine learning. Contemp Phys 56(2):172–185
  26. Schuld M, Sinayskiy I, Petruccione F (2016) Prediction by linear regression on a quantum computer. Phys Rev A 94(2):022342
  27. Shor PW (1997) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput 26(5):1484–1509
  28. Ventura D, Martinez T (2000) Quantum associative memory. Inf Sci 124(1):273–296
  29. Wiebe N, Granade C (2015) Can small quantum systems learn? arXiv preprint arXiv:1512.03145
  30. Wiebe N, Braun D, Lloyd S (2012) Quantum algorithm for data fitting. Phys Rev Lett 109(5):050505
  31. Wiebe N, Kapoor A, Svore K (2014) Quantum deep learning. arXiv preprint arXiv:1412.3489
  32. Wiebe N, Kapoor A, Svore K (2015) Quantum nearest-neighbor algorithms for machine learning. Quantum Inf Comput 15:0318–0358
  33. Zhao Z, Fitzsimons JK, Fitzsimons JF (2015) Quantum assisted Gaussian process regression. arXiv preprint arXiv:1512.03929

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Quantum Research Group, School of Chemistry & Physics, University of KwaZulu-Natal, Durban, South Africa
  2. National Institute of Theoretical Physics (NITheP), KwaZulu-Natal, South Africa