MAQA: a quantum framework for supervised learning

Quantum machine learning has the potential to improve traditional machine learning methods and overcome some of the main limitations imposed by the classical computing paradigm. However, the practical advantages of using quantum resources to solve pattern recognition tasks are still to be demonstrated. This work proposes a universal, efficient framework that can reproduce the output of a plethora of classical supervised machine learning algorithms while exploiting the advantages of quantum computation. The proposed framework is named Multiple Aggregator Quantum Algorithm (MAQA) due to its capability to combine multiple and diverse functions to solve typical supervised learning problems. In its general formulation, MAQA can potentially be adopted as the quantum counterpart of all those models falling into the scheme of aggregation of multiple functions, such as ensemble algorithms and neural networks. From a computational point of view, the proposed framework allows generating an exponentially large number of different transformations of the input at the cost of increasing the depth of the corresponding quantum circuit only linearly. Thus, MAQA produces a model with substantial descriptive power that broadens the horizon of possible applications of quantum machine learning with a computational advantage over classical methods. As a second meaningful addition, we discuss the adoption of the proposed framework as a hybrid quantum–classical and as a fault-tolerant quantum algorithm.


Introduction
Quantum computers are machines that leverage the properties of quantum mechanics to store and process information. Although a potential quantum advantage has already been shown in different fields, such as quantum chemistry [1] and multi-agent systems [2,3], it is still unclear whether quantum computing can be efficiently used for machine learning (ML) tasks.
The intersection between ML and quantum computing (QC) is known as Quantum Machine Learning (QML). There are two ways in which ML and QC can be combined: one approach is to run the learning process predominantly on a quantum computer so that the expensive subroutines can be executed efficiently. For this purpose, a rich collection of quantum algorithms for basic linear algebra subroutines has been proposed in the literature [4,5,6]. Some popular examples of this approach are QSVM [7] and QSplines [8], which obtain an exponential speed-up with respect to their classical counterparts. However, the protocols within this category usually assume the availability of a fault-tolerant quantum computer.
Alternatively, variational quantum algorithms can be considered machine learning models that can be trained using hybrid quantum-classical optimisation. In this case, a quantum algorithm is used to evaluate a function that estimates the target variable of interest given the input data and a set of rotation parameters [9,10]. This approach requires a parametrised quantum circuit and a classical optimisation procedure to find the optimal set of parameters for a sequence of quantum gates. Although these techniques represent the most promising attempt to leverage near-term quantum technology, it is still unclear whether they can outperform classical algorithms.
Despite the remarkable success of ML in numerous real-world applications, the ever-increasing size of datasets and the high computational requirements of modern algorithms indicate that the current computational tools will no longer be sufficient in the future. In this work, we propose a novel and efficient quantum framework to reproduce a plethora of machine learning models using quantum computational advantages. The framework is called Multiple Aggregator Quantum Algorithm (MAQA) due to its capability to combine multiple and diverse functions to solve typical supervised learning tasks. Thanks to superposition, entanglement and interference, the MAQA framework can compute the weighted average of an exponentially large number of functions while increasing the depth of the corresponding quantum circuit only linearly. This allows for building quantum models with substantial descriptive power that might be a credible alternative to classical methods in the future.

Preliminaries
The objective of a supervised model is to find a useful approximation to the function f(x; θ) that underlies the predictive relationship between the input x and the output y for a fixed set of parameters θ. Assuming for simplicity an additive error, the model of interest can be expressed as follows:

y = f(x; θ) + ε,  (1)

where ε is a random variable whose conditional probability distribution given x is centred in zero. Although Eq. (1) provides a general mathematical formulation for supervised learning, several methods do not estimate a single function but explicitly calculate multiple and diverse functions. These functions belong to the same family but differ in either a set of parameters or the training data. In all these cases, the final output results from the weighted average of the estimated functions:

f(x; θ) = Σ_{h=1}^{H} β_h g(x; θ_h),  (2)

where f(x; θ) is the final output and g(x; ·) describes the generic function component.
The calculation of g(x; ·) corresponds to a specific transformation of the data x based on θ_h, whose contribution to the final output is weighted by β_h. Estimating a collection of function components produces an extremely flexible model, which is able to approximate the behaviour of complex patterns. Different choices for β_h, g(x; ·) and θ_h determine different supervised models commonly adopted in real-world applications.
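As an illustrative sketch (not part of the original formulation), the aggregation scheme of Eq. (2) can be written in a few lines; the sigmoid component family and all numbers below are assumptions chosen for the example.

```python
import numpy as np

def aggregate(x, thetas, betas, g):
    """Eq. (2): weighted average of H component functions g(x; theta_h)."""
    return sum(b * g(x, t) for b, t in zip(betas, thetas))

# Toy component family: sigmoids shifted by theta_h (illustrative choice).
def g(x, theta):
    return 1.0 / (1.0 + np.exp(-(x - theta)))

thetas = [0.0, 1.0, -1.0]     # one parameter set per component
betas = [0.5, 0.3, 0.2]       # aggregation weights beta_h
y = aggregate(0.7, thetas, betas, g)
```

Swapping `g` and the weights reproduces the different models discussed next.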
For instance, a single-layer neural network (or Single Layer Perceptron, SLP) assumes as function component g(x; ·) the activation function σ_hidden applied to a linear combination L(x; θ_h) of the input vector x. In fact, an SLP with H hidden neurons is a two-stage model that takes as input the training data x and H sets of linear coefficients and estimates the target variable as follows:

f(x) = σ_output( Σ_{h=1}^{H} β_h σ_hidden(L(x; θ_h)) ),  (3)

where σ_output is the identity function when the task is function approximation. Another classical supervised learning approach that falls into the scheme of function aggregation is ensemble learning. In practice, ensemble methods reduce to computing several predictions g_1(x), g_2(x), ..., g_H(x) using H different training sets, which are then averaged to obtain a single model:

f(x) = Σ_{h=1}^{H} β_h g_h(x).  (4)

In this case, the component functions g(x; ·) are weak classification/regression models, and the choice of the weights depends on the type of ensemble in use (boosting, bagging, randomisation).
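A minimal numpy sketch of the SLP case: each hidden neuron computes σ_hidden(L(x; θ_h)) and the output layer takes the weighted sum with identity σ_output. The tanh activation, random weights, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, p = 4, 3                          # hidden neurons, input dimension
x = rng.normal(size=p)               # input vector
Theta = rng.normal(size=(H, p))      # H sets of linear coefficients theta_h
beta = rng.normal(size=H)            # output-layer weights beta_h

hidden = np.tanh(Theta @ x)          # sigma_hidden applied to L(x; theta_h)
y = beta @ hidden                    # identity sigma_output: weighted sum
```

The ensemble case of Eq. (4) is the same aggregation with `hidden` replaced by the predictions of H weak models.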
Other models that fit the idea of multiple aggregations are Generalised Additive Models [11], Support Vector Machines, and decision trees [12].
Contribution. In this work, we propose a novel, efficient quantum framework that reproduces the idea of machine learning models as function aggregators. The proposed architecture, named Multiple Aggregator Quantum Algorithm (MAQA), can potentially reproduce some of the most important classical supervised learning algorithms while introducing relevant computational advantages. In particular, MAQA propagates an input state through multiple quantum trajectories in superposition, where each trajectory describes a specific function g(x; ·) that acts as a component function of the final model. The entanglement between the two quantum registers involved (data and control) allows for efficient averaging of those transformations, and the final result can be accessed by measuring only a subset of qubits. The proposed approach has two main advantages: from a classical perspective, it introduces an exponential scaling in the number of aggregated functions while increasing the time complexity of the corresponding quantum algorithm only linearly. From a quantum perspective, the framework opens the possibility of implementing a plethora of models not yet proposed in the literature. Finally, we discuss the adoption of MAQA to generalise some existing QML algorithms, considering both fault-tolerant settings and hybrid quantum-classical algorithms.
Multiple Aggregator Quantum Algorithm (MAQA)

In this section, we describe the MAQA framework, which is able to reproduce the classical model expressed in Eq. (2). The algorithm leverages the three main properties of quantum computing (superposition, entanglement and interference) to encode in a quantum state the sum of different transformations of the input, accessible by measuring a single quantum register.
The proposed algorithm can potentially reproduce all those models that refer to the idea of function aggregation and provides attractive computational advantages with respect to their classical counterparts.
The quantum algorithm adopts two quantum registers: data and control. The data register encodes the model's input data, and the control register is used to generate multiple trajectories in superposition, where each trajectory represents a different transformation of the data.
Starting from an n-qubit data register and a d-qubit control register, the Multiple Aggregator Quantum Algorithm (MAQA) involves four main steps: state preparation, multiple trajectories in superposition, transformation via interference, and measurement.

(Step 1) State Preparation
State preparation consists of encoding the input into the data register and initialising the control register, whose amplitudes depend on a set of parameters β = {β*_i}_{i=1,...,2^d}:

(S_β ⊗ S_x) |0⟩^⊗d |0⟩^⊗n = ( Σ_{i=1}^{2^d} β*_i |i⟩ ) ⊗ |x⟩.  (5)

We refer to S_x as a quantum routine that encodes the data into a quantum state, and to S_β as a routine that transforms a d-qubit register from the all-zero state into a quantum state that depends on the set of parameters β. Importantly, the computational cost of this step has no classical counterpart, since any classical algorithm assumes the input x to be directly accessible.
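A statevector sketch of Step 1, with the normalisation made explicit. Loading the weights as square-rooted amplitudes and amplitude-encoding x are illustrative choices for S_β and S_x, not the paper's prescribed routines.

```python
import numpy as np

d, n = 2, 1                                   # control and data register sizes
beta = np.array([0.5, 0.3, 0.15, 0.05])       # 2^d classical weights
beta_star = np.sqrt(beta / beta.sum())        # S_beta: amplitudes beta*_i

x = np.array([3.0, 4.0])
x_state = x / np.linalg.norm(x)               # S_x: amplitude encoding of x

phi0 = np.kron(beta_star, x_state)            # (sum_i beta*_i |i>) tensor |x>
```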

(Step 2) Multiple Trajectories in Superposition
The second step generates 2^d different transformations of the input data in superposition, each entangled with a possible state of the control register. Each quantum state of the superposition encodes a specific transformation of the data that depends on a set of parameters Θ_h. To this end, a unitary G(θ_1, ..., θ_{2^d}) is assumed to perform the following operation:

G(θ_1, ..., θ_{2^d}) [ ( Σ_{i=1}^{2^d} β*_i |i⟩ ) ⊗ |x⟩ ] = Σ_{h=1}^{2^d} β*_h |h⟩ ⊗ G(Θ_h) |x⟩,  (6)

where the implementation of G(θ_1, ..., θ_{2^d}) can be accomplished in only d steps. Each step consists in entangling the i-th (i = 1, ..., d) control qubit with two transformations G(θ_{i,1}) and G(θ_{i,2}) of |x⟩, based on two sets of parameters, θ_{i,1} and θ_{i,2}. Let us consider a unitary G(θ_{i,j}) that implements the transformation l(x; θ_{i,j}). The most straightforward way to obtain the quantum state in Eq. (6) is to apply G(θ_{i,j}) through controlled operations, using as control states the two basis states of the current control qubit. In particular, the generic i-th step involves the following two transformations. First, the controlled unitary C^(1)G(θ_{i,1}) is executed to entangle the transformation G(θ_{i,1})|x⟩ with the excited state |1⟩ of the i-th control qubit:

C^(1)G(θ_{i,1}) (a_i |0⟩ + b_i |1⟩) ⊗ |x⟩ = a_i |0⟩ ⊗ |x⟩ + b_i |1⟩ ⊗ G(θ_{i,1}) |x⟩,  (7)

where a_i and b_i are the amplitudes of the i-th control qubit and C^(1)G(θ_{i,1}) is a controlled operation that entangles the excited state of the control qubit |c_i⟩ with the transformation of the data register according to the unitary G(θ_{i,1}).
Then, a second controlled unitary C^(0)G(θ_{i,2}) is executed, this time using the |0⟩ basis state as control:

C^(0)G(θ_{i,2}) [ a_i |0⟩ ⊗ |x⟩ + b_i |1⟩ ⊗ G(θ_{i,1}) |x⟩ ] = a_i |0⟩ ⊗ G(θ_{i,2}) |x⟩ + b_i |1⟩ ⊗ G(θ_{i,1}) |x⟩.  (8)

These two transformations are repeated for each qubit in the control register, applying two different unitaries G(θ_{i,1}) and G(θ_{i,2}) at each iteration. After d steps, the control and data registers are fully entangled and 2^d different quantum trajectories in superposition are generated. The output of this procedure can be expressed as follows:

Σ_{h=1}^{2^d} β*_h |h⟩ ⊗ G(Θ_h) |x⟩,  (9)

where G(Θ_h) results from the product of d unitary matrices G(θ_{i,j}) and represents a single quantum trajectory. Each trajectory differs from the others in at least one unitary G(θ_{i,j}).
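Step 2 can be simulated directly for d = 2 control qubits and a one-qubit data register. Using RY rotations as the G(θ_{i,j}) and the specific angles below are assumptions for the sketch; each control basis state then selects one rotation per step, yielding 2^d = 4 distinct trajectories from only 2·d controlled operations.

```python
import numpy as np

def ry(t):
    """Single-qubit rotation about the Y axis, our illustrative G(theta)."""
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

d = 2
beta_star = np.full(2 ** d, 0.5)               # uniform amplitudes beta*_h
x = np.array([1.0, 0.0])                       # data register in |0>

# State tensor with one axis per qubit: (control_1, control_2, data).
state = beta_star.reshape(2, 2)[..., None] * x

# (qubit i, control value v): angle theta_{i,1} if v = 1, theta_{i,2} if v = 0.
thetas = {(0, 1): 0.3, (0, 0): 0.7,
          (1, 1): 1.1, (1, 0): 0.2}

for i in range(d):                             # the d steps of Step 2
    for v in (1, 0):                           # C^(1)G then C^(0)G
        G = ry(thetas[(i, v)])
        idx = [slice(None)] * (d + 1)
        idx[i] = v                             # branches where control qubit i equals v
        state[tuple(idx)] = state[tuple(idx)] @ G.T
```

Since RY angles add, branch (c_1, c_2) = (1, 0) carries G(Θ_h) = RY(0.3 + 0.2) applied to |0⟩, and similarly for the other three trajectories.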
When discussing specific implementations of QML algorithms (Sections 4.1 and 4.2), we will see that, from a computational point of view, the possibility of generating 2^d different transformations in only d steps potentially leads to an exponential scaling in the number of component functions with respect to classical methods, assuming an efficient implementation of the C^(j)G(θ_{i,j}).

(Step 3) Transformation via Interference
Once multiple transformations l(x; Θ_h) of the input have been generated in superposition, the third step consists of transforming the data register through a generic quantum gate F that works via interference:

(1 ⊗ F) Σ_{h=1}^{H} β*_h |h⟩ ⊗ G(Θ_h) |x⟩ = Σ_{h=1}^{H} β*_h |h⟩ ⊗ F G(Θ_h) |x⟩,  (10)

where H = 2^d. In Eq. (10) the assumption is that the sequential application of G(Θ_h) and F on the quantum state |x⟩ is equivalent to applying the function g*_h to the input x. At this point, different values of the function g*_h are entangled with different states of the control register.
It is important to notice that a single execution of F computes the function g*_h for all the quantum trajectories in superposition. This is extremely useful when the same operation needs to be applied to multiple inputs during the computation (e.g., when the activation function is applied to a huge number of neurons, or in ensemble learning, where the same classifier has to be executed on different sub-samples of the training set).

(Step 4) Measurement
The last step consists of measuring the data register, leaving the control register untouched:

⟨M⟩ = Σ_{h=1}^{2^d} |β*_h|² g(x; Θ_h),

where the expectation value ⟨M⟩ stores the weighted average of the 2^d functions g(x; Θ_h), which is accessible by measuring the data register. While extracting a single contribution g(x; Θ_h) would require an exponential number of measurements (since those values live in a superposition of 2^d possible basis states), in a classical supervised learning scenario the measure of interest is the weighted average of all the functions, which can be directly accessed by measuring the data register while leaving the control register intact.
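The measurement step can be verified end-to-end: the expectation of an observable acting only on the data register equals the |β*_h|²-weighted average of the per-trajectory values. Pauli-Z as the observable and the branch angles are assumptions for the sketch.

```python
import numpy as np

d = 2
beta_star = np.sqrt([0.4, 0.3, 0.2, 0.1])       # so the weights |beta*_h|^2 sum to 1
angles = [0.3, 0.9, 1.4, 2.0]                   # one trajectory output per branch
branches = np.array([[np.cos(t / 2), np.sin(t / 2)] for t in angles])

state = (beta_star[:, None] * branches).ravel() # sum_h beta*_h |h> F G(Theta_h)|x>
Z = np.array([[1, 0], [0, -1]])                 # observable on the data qubit only
M_full = np.kron(np.eye(2 ** d), Z)             # identity on the control register

expval = state @ M_full @ state                 # quantum expectation <M>
# Classical check: weighted average of the per-trajectory values g(x; Theta_h).
classical = sum(b ** 2 * (br @ Z @ br) for b, br in zip(beta_star, branches))
```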
To summarise, the proposed architecture calculates the aggregation of multiple and diverse functions described in Eq. (2) using a quantum algorithm. In particular, it is possible to access the final result by measuring only the data register, obtaining the weighted average of 2^d different transformations g(x; ·) of the input data x, where d is the size of the control register. Properly specifying S_β, S_x, {G(θ_{i,1}), G(θ_{i,2})}_{i=1,...,d} and F potentially allows reproducing the quantum version of all the ML algorithms discussed in Section 2. Furthermore, the framework is very generic and can be adopted for hybrid and fault-tolerant quantum computation. The quantum circuit implementing MAQA is depicted in Figure 1.

Discussion
As shown in the previous section, MAQA produces a quantum state that reproduces the idea of ML models as aggregators of functions using the properties of quantum computing. From a classical ML perspective, relevant computational advantages are introduced. Given 2^d component functions, any classical method that leverages the idea of function aggregation scales linearly in 2^d, since those functions must be computed explicitly to obtain the overall average. Furthermore, in the worst-case scenario, each component function has to process all available data; this implies a cost linear in the training set size multiplied by 2^d. Using big-O notation, given a dataset (x_i, y_i) for i = 1, ..., N, where x_i is a p-dimensional vector and y_i is the target variable of interest, the overall time complexity of a model based on the aggregation of 2^d functions is:

O(2^d · N · p).

In contrast, MAQA generates a superposition of 2^d different transformations of the input in only d steps, since the single transformations are not computed directly but result from the combination of different unitaries G(θ_{i,j}). Then, once the quantum state in Eq. (10) is generated, any operation (the unitary F) is propagated to all the quantum trajectories with a single execution. Using big-O notation, the time complexity of implementing MAQA is:

O(2 · d · C_G + C_F),

where C_G is the cost of the controlled operation C^(j)G(θ) and C_F is the cost of F. Note that the number of different functions grows exponentially with respect to the parameter d, which has only a linear impact on the overall time complexity. This means that it is possible to generate an exponentially large number of different transformations of the input while obtaining their average efficiently, at the cost of increasing the depth of the corresponding quantum circuit linearly by a factor of 2C_G.
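The scaling argument reduces to simple bookkeeping: d steps with two controlled unitaries each produce one trajectory per choice of θ_{i,1} versus θ_{i,2} at every step. The helper below is an illustrative count, not part of the paper.

```python
from itertools import product

def maqa_costs(d):
    """Gate count vs. number of component functions for Step 2:
    two controlled unitaries per control qubit, 2^d trajectories."""
    controlled_ops = 2 * d                            # linear circuit-depth cost
    trajectories = list(product((1, 2), repeat=d))    # theta_{i,j} choice per step
    return controlled_ops, len(trajectories)

ops, funcs = maqa_costs(10)
```

Here 20 controlled operations suffice for 1024 component functions, against the 1024 explicit evaluations a classical aggregator would need.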
However, these advantages come with some compromises. The first regards the assumption about the nature of the operator G(θ_{i,j}). In fact, MAQA assumes that the product of the G(θ_{i,j}) for i = 1, ..., d produces a quantum gate G(Θ_k) of the same family:

Π_{i=1}^{d} G(θ_{i,j}) = G(Θ_k),  with j ∈ {1, 2}.

In practice, this means that multiple applications of unitaries that depend on some sets of parameters θ_{i,j} result in a single transformation of the same nature that depends on a derived set of parameters Θ_k. Although any quantum circuit can be expressed as the product of different unitary matrices, in the context of supervised learning these gates need to be designed such that the final measurement provides the target variable of interest.
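One family satisfying this closure assumption is the set of RY rotations, since rotations about a fixed axis compose by adding angles; a generic parametrised ansatz need not have this property. The check below is an illustration, not a general proof.

```python
import numpy as np

def ry(t):
    """Parametrised family G(theta): rotation about the Y axis."""
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

a, b = 0.4, 1.1
composed = ry(b) @ ry(a)      # product of two family members along a trajectory
derived = ry(a + b)           # a single member with derived parameter Theta
```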
Finally, when comparing classical and quantum algorithms, it is important to consider that quantum computation introduces a new complexity class, Bounded-error Quantum Polynomial time (BQP), representing the class of problems solvable in polynomial time by an innately probabilistic quantum Turing machine [13]. Nevertheless, quantum algorithms also need to be evaluated in terms of gate complexity. Thus, it is necessary that the exponential scaling introduced with respect to d is preserved when considering a specific QML model.

MAQA as Hybrid Quantum-Classical Algorithm
Recently, the idea of aggregating two different unitary operators to reproduce the output of a two-neuron single-layer neural network via a quantum circuit has been proposed (qSLP) [14,15]. Since multiple aggregations are the basis of both qSLP and MAQA, the latter can be seen as a natural extension of the former with an exponentially large number of neurons in the hidden layer. In fact, the entanglement between the control and data registers implies that the number of linear combinations equals the number of basis states of the control register. This, in turn, implies that the number of hidden neurons H scales exponentially with the number of control qubits, as each hidden neuron is represented by a quantum trajectory. This exponential scaling might enable the construction of a qSLP with an arbitrarily large number of hidden neurons as the number of available qubits increases. In other words, by adopting MAQA to generalise the qSLP, we can build a model with substantial descriptive power, capable of being a universal approximator.
From a computational point of view, given H hidden neurons and L training epochs, training a classical SLP scales (at least) linearly in H and L, since the output of each hidden neuron has to be calculated explicitly to obtain the final output. Furthermore, if H is too large (a necessary condition for an SLP to be a universal approximator [14,16]), the problem becomes NP-hard [17]. Adopting MAQA to generalise the qSLP allows scaling linearly with respect to log_2(H) = d, thanks to the entanglement between the two quantum registers, which allows generating an exponentially large number of quantum trajectories in superposition.
However, the main challenge for qSLP-MAQA in the near future is still the design of a proper activation function (in the sense of the Universal Approximation Theorem), which is one of the significant open issues for building a complete quantum neural network. Yet, the recent proposal of QSplines [8] opened the possibility of approximating non-linear activation functions via a quantum algorithm. Even so, QSplines use the HHL algorithm as a subroutine, a fault-tolerant quantum routine that cannot be adopted in hybrid computation on NISQ devices.
Nevertheless, it has recently been shown that quantum feature maps alongside function aggregation are able to achieve universal approximation [18]. Thus, possible future work consists of studying qSLP-MAQA on top of a quantum feature map to enable it as a universal function approximator without implementing a non-linear quantum activation function.

MAQA as Fault-Tolerant Quantum Algorithm
Recently, a quantum algorithm that implements the idea of ensemble methods has been proposed [19] and further developed [20]. Looking at the specific quantum circuit in use, it is possible to observe that quantum ensembles can be considered a particular instance of MAQA, where the controlled rotations in Eqs. (7) and (8) are implemented using only the basis state |1⟩ as control state, which is transformed through a Pauli-X gate at each iteration. Furthermore, while MAQA allows flexible quantum trajectories in terms of the parametrised quantum gates S_β and {G(θ_{i,1}), G(θ_{i,2})}_{i=1,...,d}, in the quantum ensemble [20] the weights are fixed in advance (uniform superposition of the control register) and the transformations of the input data are implemented by CSWAP operations. Thus, MAQA potentially extends the proposed quantum ensemble, originally defined for the bagging strategy, to other ensembles such as boosting and randomisation, where the parameters of the single base model and the corresponding weights are not fixed in advance. Still, the main drawback of the quantum ensemble remains the underlying assumption of encoding the complete training and test sets into two different quantum registers and using a large number of trajectories in superposition to compute different sub-samples of the training set. This would require an extremely large number of qubits in a fault-tolerant quantum computer.
In this respect, the main challenge in making the ensemble effective (using MAQA) in the near future is the design of a quantum classifier based on interference that guarantees a more efficient data encoding strategy (e.g., amplitude encoding) and can process larger datasets.

Conclusions
The practical advantages of using quantum resources to solve machine learning tasks are still to be demonstrated. However, the setting provided by quantum mechanics is highly appealing, since a small number of qubits gives access to an exponentially large Hilbert space.
In this work, we tried to take a further step towards understanding how machine learning can benefit from quantum computation. The proposed quantum framework, the Multiple Aggregator Quantum Algorithm (MAQA), is capable of reproducing some of the most important classical machine learning algorithms using quantum computing resources. MAQA can potentially improve, in terms of time complexity, all those models that explicitly compute multiple and diverse functions to produce a final strong model. In particular, aggregating H different functions in classical machine learning requires a computational cost linear in H. Instead, the proposed quantum architecture requires only log_2(H) steps, under the assumption that each step has unit cost in terms of circuit complexity; the number of aggregated functions thus scales exponentially in the number of steps. The advantage comes directly from using superposition and entanglement as resources for generating different transformations of the input. Furthermore, quantum interference allows propagating a specific unitary (the gate F) to all the quantum trajectories in superposition. Hence, the application of F impacts the overall time complexity additively, whereas the same operation would imply a multiplicative cost in classical computation.
In addition, we discussed how the proposed approach could be adopted as a fault-tolerant (quantum ensemble) and hybrid quantum-classical (quantum Single Layer Perceptron) algorithm, though different technical aspects need to be further investigated for both cases.
We are still at an early stage of QML, and its contribution to solving real-world machine learning problems is yet to be understood. However, many research findings, including this work, suggest that the potential of quantum computing is huge, and machine learning will likely benefit from it in the future.
Notice that the entanglement performed in Step 2.1 influences the entanglement in Step 2.2, and each trajectory describes a different transformation of |x⟩. Eq. (18) can be rewritten expressing the four basis states of the control register using natural numbers:

Σ_{h=1}^{4} β*_h |h⟩ ⊗ G(Θ_h) |x⟩,

where each G(Θ_h) is the product of d = 2 unitaries G(θ_{i,j}) and the coefficients β*_h result from products of the coefficients a_i and b_i. Thus, using 2 control qubits, 4 different quantum trajectories are generated, corresponding to 4 different transformations of the data |x⟩.
Step 3 (i = 3). Extending the same procedure to d = 3, the result is the following:

Σ_{h=1}^{8} β*_h |h⟩ ⊗ G(Θ_h) |x⟩,

where each G(Θ_h) is the product of 3 unitaries G(θ_{i,j}) for i = 1, 2, 3 and j = 1, 2.
Repeating this procedure d times with different control qubits, the result is the following quantum state:

Σ_{h=1}^{2^d} β*_h |h⟩ ⊗ G(Θ_h) |x⟩,

where each G(Θ_h) is the product of d unitaries G(θ_{i,j}) for i = 1, ..., d and j = 1, 2.
Finally, the gate F is applied, as shown in Eq. (10), and the measurement of the data register is performed.