1 Introduction

Breakthroughs in quantum technologies have allowed the construction of small-scale prototypes of quantum computers (Madsen et al. 2022; Dumitrescu et al. 2022; Huang et al. 2022), namely NISQ devices (Preskill 2018). Even though many sources of noise may corrupt the execution on these devices (Pelofske et al. 2022), we can run a certain class of algorithms (Bharti et al. 2022) that trades the strong theoretical speedup of fault-tolerant quantum algorithms (Montanaro 2016) for shorter, less noisy computations. A large subset of the NISQ-ready algorithms is dedicated to the development of machine learning models.

One of the most interesting techniques among them is the quantum classifier (Schuld and Killoran 2019; Havlíček et al. 2019; Mengoni and Di Pierro 2019): the function \(f(x) = \textrm{Tr}[\rho _\textbf{x} \rho _w]\), where \(\rho _\textbf{x}\) represents the encoding of a data point \(\textbf{x}\) in a quantum state through the feature map \(|{0}\rangle \!\langle {0}| \mapsto U(\textbf{x})|{0}\rangle \!\langle {0}| U^\dagger (\textbf{x}) = \rho _\textbf{x}\) and \(\rho _w\) represents the weight vector encoded through the mapping \(|{0}\rangle \!\langle {0}| \mapsto W|{0}\rangle \!\langle {0}| W^\dagger = \rho _w\), can be interpreted as a linear model (Schuld and Petruccione 2021). Such a function can be immediately used to solve supervised learning tasks. By choosing the weight mapping to be parametric, \(W(\theta )\), we can train the parameters to minimize some loss function using gradient-descent-based techniques: such an approach is named a quantum neural network (Mitarai et al. 2018; Abbas et al. 2021). However, the training phase of these models can be affected by barren plateaus (McClean et al. 2018; Holmes et al. 2021), i.e. flat loss landscapes where the variance of the gradient vanishes exponentially fast with respect to the number of qubits. Highly entangled states (Marrero et al. 2021), noise (Wang et al. 2021), global measurements (Arrasmith et al. 2021), and the expressibility of the feature map (Holmes et al. 2022) have been linked to the appearance of barren plateaus. To avoid this problem, the Quantum Kernel Estimation (QKE) algorithm (Schuld and Killoran 2019; Havlíček et al. 2019; Mengoni and Di Pierro 2019) can be used in a hybrid form: we implement a quantum kernel function \(\kappa (\textbf{x}, \textbf{x}') = \textrm{Tr}[\rho _\textbf{x} \rho _{\textbf{x}'}]\), quantifying the similarity between two encoded data points, and feed it to a classical machine learning algorithm. The training of the model is classical and is expected to end successfully and efficiently due to the representer theorem (Schölkopf et al. 2001). Classical kernel methods are a cornerstone of machine learning and have been applied to a wide variety of tasks, including signal processing (Pérez-Cruz and Bousquet 2004; Rojo-Álvarez et al. 2018; Camps-Valls 2006), bioinformatics (Camps-Valls 2006; Ben-Hur et al. 2008), and image processing (Wang and Qi 2014; Yang 2001).

A clear benefit of using quantum kernel estimation to enhance machine learning applications has yet to be found (Schuld and Killoran 2022). Quantum kernels have been shown to improve the performance of classical machine learning algorithms for some problems, such as predicting the output of quantum systems (Huang et al. 2021, 2022) and learning from distributions based on the discrete logarithm (Liu et al. 2021). They have been applied to several real-world, industrial-scale problems such as anomaly detection (Liu and Rebentrost 2018), fraud detection (Di Pierro and Incudini 2021; Grossi et al. 2022; Kyriienko and Magnusson 2022), the effectiveness of pharmaceutical treatments (Krunic et al. 2022), and supernova classification (Peters et al. 2021). These approaches have been experimentally tested on superconducting (Peters et al. 2021; Wang et al. 2021), optical (Bartkiewicz et al. 2020), and NMR (Kusumoto et al. 2021) quantum devices, and their effectiveness is usually assessed empirically.

Most of these experiments share, at least partially, a common structure: dimensionality reduction techniques, used to limit the number of quantum resources needed for the computation; the scaling of the input; the choice of the quantum kernel; and the evaluation of the quantum kernel. The researcher is usually in charge of developing a software prototype, which requires knowledge of many different software frameworks and platforms: the ones dealing with machine learning tasks (Paszke et al. 2019; Chollet et al. 2015) and the ones dealing with quantum computing (Bergholm et al. 2018; Anis et al. 2021; Broughton et al. 2020; Killoran et al. 2019; Baidu 2020). As the prototype becomes larger, the probability of introducing bugs in the code increases (Lipow 1982), possibly leading to erroneous results (Fidler et al. 2017; Botvinik-Nezer et al. 2020; Campos and Souto 2021). Well-organized code has been shown to facilitate code reuse and reproducibility (Trisovic et al. 2022; Mineault and Nozawa 2021). Minimizing the quantity of code needed to run an experiment has clear benefits: it speeds up the research, reduces the time spent learning, and eases putting the software into a production environment.

We propose QuASK (Quantum Advantage Seeker with Kernel), a Python3 software framework unifying under a single interface all the features needed to run experiments with quantum kernels. QuASK can be run from the terminal using a single command line which specifies how to operate on the given data. Within the same command, the researcher can choose to analyze the data and subsequently generate graphics. QuASK can also be used as a library, to be integrated within an existing pipeline. Finally, the open-source nature of the framework allows the user to integrate further capabilities into the software, having them immediately available through the command line interface. QuASK is freely available at https://github.com/CERN-IT-INNOVATION/QuASK and the documentation is available at https://quask.readthedocs.io/en/latest/index.html.

2 Theoretical aspects of quantum kernels

A binary, symmetric function \(K: \mathcal {X} \times \mathcal {X} \rightarrow \mathbb {R}\) is a kernel function if it is positive definite (pd), i.e.

$$\begin{aligned} \sum _{i=1}^n \sum _{j=1}^n c_i c_j K(x_i, x_j) \ge 0 \end{aligned}$$
(1)

for all \(x_1, \ldots , x_n \in \mathcal {X}\) and all real coefficients \(c_1, \ldots , c_n \in \mathbb {R}\). Supposing K continuous, we can associate with it a linear Hilbert–Schmidt integral operator

$$\begin{aligned}{}[T g](x') = \int _{x \in \mathcal {X}} K(x', x) g(x) \, dx, \end{aligned}$$
(2)

whose eigenfunctions \(\{ e_i \}_{i=1}^\infty \) form an orthonormal basis of square-integrable functions, and the sequence of corresponding eigenvalues \(\{ \lambda _i \}_{i=1}^\infty \) is non-negative. In such a case,

$$\begin{aligned} K(x, x') = \sum _{i=1}^\infty \lambda _i e_i(x) e_i(x') \end{aligned}$$
(3)

For \(x, x' \in \mathcal {X} \subseteq \mathbb {R}^d\), some examples of kernels are:

$$\begin{aligned} K_\text {l}(x, x')&= x \cdot x'{} & {} \text {Linear} \end{aligned}$$
(4)
$$\begin{aligned} K_\text {p}(x, x')&= (x \cdot x' + b)^r{} & {} \text {Polynomial of degree }r \end{aligned}$$
(5)
$$\begin{aligned} K_\text {rbf}(x, x')&= \exp (-\alpha \Vert x \!-\! x'\Vert ^2){} & {} \text {RBF or Gaussian } (\alpha \!>\! 0) \end{aligned}$$
(6)
$$\begin{aligned} K_\text {eq}(x, x')&= \delta _{x, x'}{} & {} \text {Equality kernel} \end{aligned}$$
(7)

while, for example, \(\vert x-x'\vert \) is not a valid kernel due to the lack of positive definiteness. Important classes of kernels are the translation-invariant kernels \(K(x, x') = \Phi (x - x')\), given that \(\mathcal {X}\) is a vector space (the Gaussian kernel is an instance of such a family), and the group kernels \(K(x, x') = \Phi (x^{-1} x')\), given that \(\mathcal {X}\) has a group structure. Kernels can form other kernels: a non-negative linear combination of kernels, a product of kernels, and the limit of a kernel sequence (if it exists) are kernels too. \(K_\text {eq}\) is a valid non-continuous kernel, which can be obtained as the limit for \(n \rightarrow \infty \) of the Laplacian kernels \(\exp \{-n\Vert x - x' \Vert \}\). A larger list of kernels and their compositions can be found in Rasmussen and Williams (2005); Duvenaud (2014).
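These closure properties are easy to check numerically. The following quick sketch (NumPy only) builds the Gram matrices of a linear and a Gaussian kernel on random points and verifies that their sum and their elementwise (Schur) product remain positive semidefinite, up to numerical tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

# Gram matrices of a linear and a Gaussian kernel on the same points.
K_lin = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-0.5 * sq_dists)

# Sums and elementwise products of kernels are kernels: the smallest
# eigenvalue of the combined Gram matrix stays non-negative (up to tolerance).
for K in (K_lin + K_rbf, K_lin * K_rbf):
    print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True, True
```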

Positive definite kernels can be thought of as a generalization of the notion of inner product due to their strong relationship with the concept of Reproducing Kernel Hilbert Space (RKHS). A space \(\mathcal {H} = \{ f: \mathcal {X} \rightarrow \mathbb {R} \}\) of real-valued functions over \(\mathcal {X}\) is an RKHS if every evaluation functional \(L_x: \mathcal {H} \rightarrow \mathbb {R} , \hspace{4pt} L_x(f) = f(x)\) is bounded on \(\mathcal {H}\) (meaning that if two functions are close in norm, then they are also close pointwise); equivalently, by the Riesz representation theorem, for every \(x \in \mathcal {X}\) there exists a unique \(K_x \in \mathcal {H}\) such that \(f(x) = L_x(f) = \langle f, K_x \rangle _\mathcal {H} , \hspace{4pt} \forall f \in \mathcal {H}\). For every RKHS \(\mathcal {H}\) there is a unique K such that \(K(x, x') = \langle K_x, K_{x'} \rangle \), namely K is the reproducing kernel of \(\mathcal {H}\); vice versa, given a positive definite kernel K there is a unique Hilbert space of functions on \(\mathcal {X}\) for which K is a reproducing kernel (Aronszajn 1950). The mapping \(\phi : \mathcal {X} \rightarrow \mathcal {H}\) encoding a data point within the RKHS is a feature map. Since

$$\begin{aligned} K(x, x') = \langle K_x, K_{x'} \rangle = \langle \phi (x), \phi (x') \rangle \end{aligned}$$
(8)

we can interpret the application of K as calculating the inner product over a different vector space than the original point space \(\mathcal {X}\).

In supervised learning applications it is common to use a feature map \(\phi \) to encode the data in a higher-dimensional (Hilbert) space so as to find a linear separation of the transformed data; by using the inverse map \(\phi ^{-1}\) we can recover a complex, nonlinear decision boundary in the original space. Due to the representer theorem, the linear pattern can be found independently of the dimensionality of \(\mathcal {H}\): given the data points \(\{ (x_1, y_1), \ldots , (x_m, y_m) \}\), the algorithms are fed with the kernel Gram matrix \([K]_{i,j} = K(x_i, x_j)\) of pairwise kernel similarities, and no other information about the data is necessary for the classifier. Formally, the representer theorem asserts that the solution of

$$\begin{aligned} \min _{f \in \mathcal {H}} L(f) + \lambda \Vert f \Vert \end{aligned}$$
(9)

i.e. the minimizer of the regularized empirical risk, is always of the form:

$$\begin{aligned} f(x) = \sum _{i=1}^m \alpha _i K(x, x_i). \end{aligned}$$
(10)

The term L is the loss function, e.g. the mean square error \(L(f) = \frac{1}{m} \sum _{i=1}^m \Vert f(x_i) - y_i \Vert ^2\). The term \(\lambda \Vert f \Vert \), \(\lambda > 0\), has regularization purposes, i.e. it penalizes high-norm solutions, thus preferring smooth functions over non-smooth ones. The determination of the \(\{ \alpha _i \}\) values is a convex (hence efficiently solvable) optimization problem.
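For instance, with the squared loss the \(\{ \alpha _i \}\) admit a closed-form solution. A minimal sketch (the exact scaling convention for \(\lambda \) varies across references):

```python
import numpy as np

def kernel_ridge_fit(K, y, lam=1e-3):
    # By the representer theorem, f(x) = sum_i alpha_i K(x, x_i);
    # for the squared loss, alpha solves (K + lam * I) alpha = y.
    m = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(m), y)

def kernel_ridge_predict(alpha, K_cross):
    # K_cross[i, j] = K(x_test_i, x_train_j)
    return K_cross @ alpha
```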

Kernel methods can be applied to supervised learning tasks using the kernel ridge regression algorithm (Murphy 2012), which is a straightforward generalization of linear regression, and the support vector machine (SVM) (Cortes and Vapnik 1995), which finds the linear classifier that maximizes the margin (i.e. the minimum distance between the data points and the boundary, on both sides). The SVM usually finds a sparse solution, i.e. a classifier whose output depends only on a few dataset items, named support vectors. Kernel methods can be applied to unsupervised learning tasks too. Kernel PCA (Schölkopf et al. 1997) is the straightforward extension of the principal component analysis algorithm: it finds the components in the higher-dimensional Hilbert space that have the largest variance. Kernels can be applied to clustering techniques too, including the k-means algorithm (MacQueen 1967).
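Since these algorithms only ever consume the Gram matrix, any kernel, quantum kernels included, can be plugged into off-the-shelf implementations. A minimal sketch using scikit-learn's precomputed-kernel interface on synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(20, 2)), rng.integers(0, 2, 20)
X_test = rng.normal(size=(5, 2))

def linear_kernel(A, B):
    # Any kernel function works here, including a quantum one.
    return A @ B.T

svm = SVC(kernel="precomputed")
svm.fit(linear_kernel(X_train, X_train), y_train)     # training Gram matrix
y_pred = svm.predict(linear_kernel(X_test, X_train))  # rows: test, columns: train
```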

Fig. 1 (a), (b) Fidelity test and SWAP test for Quantum Kernel Estimation, where U is the feature map associated with the quantum kernel. (c) Quantum circuit for the feature map associated with the projected kernel; the Hermitian observable H can be arbitrary

Kernel functions can be parameterized, i.e. they may depend on one or more hyper-parameters, which can be trained according to some loss function or chosen using a grid search. A different approach is multiple kernel learning (Bach et al. 2004), which consists of defining multiple fixed kernels and learning the most effective linear combination of such kernels.

2.1 Quantum kernel implementation

Any parametric quantum circuit implementing a unitary transformation \(U_\phi \) acting on the Hilbert space \(\mathcal {H}\) of an n-qubit quantum system can be used to implement a feature map:

$$\begin{aligned}&\phi : \mathcal {X} \rightarrow \mathcal {H} \nonumber \\&\vert 0 \cdots 0 \rangle \mapsto U_\phi (x) \vert 0 \cdots 0 \rangle = \vert \phi (x)\rangle \end{aligned}$$
(11)

Such a feature map allows using the quantum space as an RKHS. In fact, we can obtain a kernel function sharing the same structure as Eq. 8 by encoding a pair of data points into quantum states and estimating their overlap:

$$\begin{aligned} K(x, x')&= \vert \langle {\phi (x)}\vert {\phi (x')}\rangle \vert ^2 \end{aligned}$$
(12)
$$\begin{aligned}&= \text {prob}(\text {a measurement of the state } U_\phi ^\dagger (x')U_\phi (x)\vert 0 \cdots 0 \rangle \nonumber \\&\quad \text {with the observable } \sigma _z^{\otimes n} \text { collapses it to the eigenstate }\vert 0 \cdots 0 \rangle ). \end{aligned}$$
(13)

Such a kernel can be concretely implemented using the overlap test circuit, whose structure is shown in Fig. 1a. We can equivalently use the SWAP test, whose circuit structure is shown in Fig. 1b. Thus, we estimate each entry of the kernel matrix \(K_{i,j}\) by executing the circuit multiple times (shots) for each pair of data points. Each execution ends with a measurement that forces the quantum wavefunction to collapse, and the fraction of all-zero outcomes estimates the fidelity between the two encoded data points. The kernel matrix can finally be fed to a kernel machine (e.g. SVM, Kernel PCA). Moreover, given the parametric quantum circuit for the feature map \(U_\phi \) and a second quantum circuit W implementing the state \(\vert {w}\rangle = W\vert 0\cdots 0\rangle \) corresponding to the linear weights, the function

$$\begin{aligned} f(x) = \langle {\phi (x)}\vert {w}\rangle \end{aligned}$$
(14)

is a linear classifier.
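The overlap test is straightforward to sketch in PennyLane, the framework QuASK builds upon (Section 4.2). The feature map below is an illustrative angle encoding standing in for any \(U_\phi \):

```python
import numpy as np
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

def feature_map(x):
    # Illustrative angle-encoding feature map; any U_phi(x) can replace it.
    for i in range(n_qubits):
        qml.RY(x[i], wires=i)
    qml.CNOT(wires=[0, 1])

@qml.qnode(dev)
def overlap_circuit(x1, x2):
    feature_map(x1)               # U_phi(x)
    qml.adjoint(feature_map)(x2)  # U_phi^dagger(x')
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x1, x2):
    # The probability of the all-zero outcome estimates |<phi(x)|phi(x')>|^2.
    return overlap_circuit(x1, x2)[0]

print(quantum_kernel(np.array([0.1, 0.2]), np.array([0.4, 0.3])))
```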

Due to the large dimensionality of \(\mathcal {H}\), exponential in the number of qubits, the computation of the inner product may be affected by the curse of dimensionality: any two quantum states uniformly sampled in the Hilbert space have a high probability of being almost orthogonal (Ball et al. 1997). Each off-diagonal element of the Gram matrix vanishes as the dimension of \(\mathcal {H}\) increases. When performing QKE on current NISQ hardware, such small values of \(K_{ij}\) become indistinguishable from the pervasive noise, making the classifier worthless. Distinguishing them would require a number of measurements per kernel value that grows polynomially in the dimension of \(\mathcal {H}\), thus exponentially in n. Therefore, we need to accurately design our unitary transformation in order to avoid losing quantum states within the Hilbert space.

We can design an effective quantum transformation, i.e. not affected by the curse of dimensionality, using several techniques.

The first approach is the use of parametric quantum circuits that we know analytically are restricted to a small subspace \(\mathcal {H}' \subset \mathcal {H}\), i.e. any parameter assignment x results in \(U(x)\vert 0\cdots 0 \rangle \in \mathcal {H}'\). The second approach is the use of a bandwidth coefficient (Canatar et al. 2022), i.e. a small scalar applied pointwise to the components of x, diminishing the range of each component. The third approach is to implement a projected or biased quantum kernel (Huang et al. 2021), which projects the quantum state to an approximate classical representation through an observable O (the choice of the observable is an educated guess). The quantum state lives in a large Hilbert space, but the observable O usually involves partial traces. The effect of a partial trace over all qubits but, e.g., the k-th one is to restrict the quantum state to a smaller representation, thus projecting it onto the k-th qubit subspace. A projected kernel function can take a Gaussian form as follows:

$$\begin{aligned} k(x, x') = \exp (-\gamma \Vert \langle 0\vert U_\phi ^\dagger (x) O U_\phi (x)\vert 0 \rangle - \langle 0 \vert U_\phi ^\dagger (x') O U_\phi (x') \vert 0\rangle \Vert ^2). \end{aligned}$$
(15)
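One possible realization of such a projected kernel, assuming the educated guess for O is the collection of single-qubit Pauli-Z expectations, is sketched below:

```python
import numpy as np
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def projected_features(x):
    # Illustrative feature map U_phi(x); the projection measures one
    # Pauli-Z expectation per qubit (our choice of observable O).
    for i in range(n_qubits):
        qml.RY(x[i], wires=i)
    qml.CNOT(wires=[0, 1])
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

def projected_kernel(x1, x2, gamma=1.0):
    f1, f2 = np.array(projected_features(x1)), np.array(projected_features(x2))
    return np.exp(-gamma * np.linalg.norm(f1 - f2) ** 2)
```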

The transformation \(U_{\phi }\) defining the quantum kernel dramatically influences the linear decision boundary to be found in the feature space through classical optimization. In fact, we can search for an optimal form in an automatic fashion. The approach proposed by Glick et al. (2021) suggests that the unitary transformation \(U(x; \theta )\) should depend on both the data point features x and on some trainable parameters \(\theta \); the parameters are then trained with a stochastic-gradient-descent-based algorithm to minimize a loss function. Such an approach has been shown to be ineffective (Thanasilp et al. 2022). A different approach, which optimizes the parametric quantum circuit by choosing the basis gates at each point of the circuit through a combinatorial optimization algorithm (possibly a meta-heuristic), has been proposed by Incudini et al. (2022); Altares-López et al. (2021).

To evaluate the performance of a quantum kernel, a family of metrics has been introduced: geometric difference, approximate dimension, model complexity, and target-kernel alignment. The first three metrics constitute the central discussion of Huang et al. (2021). The last one has several implementations already; the groundwork can be found in Cristianini et al. (2001).

  • The geometric difference g compares the classical and quantum kernel feature spaces, evaluating the separation in performance between the two kernels. A value of g large compared with \(\sqrt{N}\), where N is the number of training samples, indicates a deviation between the performances of the two kernels.

  • The approximate dimension d gives us an effective dimension of the quantum feature space generated by the encoding of the training samples. This quantity helps us understand the expressibility of the quantum kernel. If d saturates at N, the quantum states of the training data points are all (nearly) orthogonal; a small value of d, instead, tells us the Hilbert space has not been fully exploited and the model has limited expressivity.

  • The model complexity represents a final test where we compute the complexity of a kernel, including the data labels in the computation. This metric derives from a generalization bound on the prediction error.

  • The target-kernel alignment, like the model complexity, captures the relation between a kernel and the corresponding target function, that is, the labels. The final objective of a kernel-based method is to approximate the label distribution with the data distribution in the feature space, and the alignment quantifies this relation (see the sketch after this list).
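As an illustration, the target-kernel alignment admits a compact implementation given a precomputed Gram matrix and labels in \(\{-1, +1\}\). A sketch following Cristianini et al. (2001):

```python
import numpy as np

def target_kernel_alignment(K, y):
    # A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F),
    # where <.,.>_F is the Frobenius inner product.
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))
```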

3 Quantum software frameworks

In recent years, a variety of software frameworks and programming languages have been developed to perform quantum computation. Most of the frameworks express quantum computation in terms of quantum circuits (Feynman 1985), which is the de facto standard model. They are usually able to apply a universal set of basis gates, decompose a unitary matrix into a quantum circuit, reverse a circuit, perform uncomputation (e.g. to restore the original value of an auxiliary qubit), and perform circuit transformations (e.g. replacing part of a circuit with another one). Some frameworks allow simulation on the host computer, while others allow sending the quantum circuit to some remote quantum hardware to be executed. Some possible alternatives to the quantum circuit model are the quantum lambda calculus (Van Tonder 2004), the quantum Turing machine (Deutsch 1985), the adiabatic quantum model (Farhi et al. 2000), measurement-based quantum computation (Raussendorf and Briegel 2001), topological quantum computation (Kitaev 2003), and the ZX calculus (Coecke and Duncan 2011).

A comparison of frameworks using the quantum circuit model is shown in Table 1. Most frameworks allow importing and exporting circuits in the OpenQASM format (Cross et al. 2017), an open-source specification for quantum circuits. This facilitates the porting of quantum software among the different platforms.

Table 1 Comparison of relevant quantum computing frameworks

3.1 Quantum machine learning frameworks

PennyLane has been the first framework offering quantum machine learning capabilities. These include the possibility to train a parametric quantum circuit, whose gradient can be calculated using the parameter-shift rule (Wierichs et al. 2022) or the finite difference method. It allows the integration of a quantum transformation as a layer in a neural network object defined in the Keras (Chollet et al. 2015) or PyTorch (Paszke et al. 2019) libraries. PennyLane also has facilities to define a quantum kernel, whose fidelity circuit (Fig. 1a) is created automatically given the circuit for a quantum embedding. Strawberry Fields offers the same high-level capabilities for the continuous-variable formalism of quantum computing, such as photonic quantum computing.
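The parameter-shift rule itself fits in a few lines: for gates generated by a Pauli operator, the two-point formula below is exact rather than a finite-difference approximation. A minimal sketch for a single rotation:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def expval_z(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))  # equals cos(theta)

def parameter_shift_grad(theta):
    # Exact gradient: (f(theta + pi/2) - f(theta - pi/2)) / 2.
    return (expval_z(theta + np.pi / 2) - expval_z(theta - np.pi / 2)) / 2

theta = 0.3
print(parameter_shift_grad(theta), -np.sin(theta))  # both equal -sin(theta)
```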

Qiskit Quantum Machine Learning has similar features, allowing us to embed quantum transformations within PyTorch networks and to calculate kernel matrices. TensorFlow Quantum (Broughton et al. 2020) allows for rapid prototyping of hybrid quantum-classical models due to its straightforward integration with the machine learning library TensorFlow (Abadi et al. 2015). Paddle Quantum allows for the effortless application of QNNs to define LOCC (Local Operations and Classical Communication) protocols (Chitambar et al. 2014). Moreover, it allows the simulation of some quantum machine learning algorithms defined in the measurement-based quantum computation formalism.

4 Proposed approach

As described in Section 3, many different quantum machine learning software frameworks exist, and most of them offer few high-level algorithmic capabilities. There are thus several issues to address: many experiments require the interaction between different software platforms, e.g. Qiskit with PyTorch, requiring specific expert knowledge. Furthermore, experiments need to be compared with theoretical results, which are growing in the literature without, usually, a common implementation baseline.

Therefore, we have designed QuASK, a unifying, easy-to-use software framework that automates each phase of an experiment: the selection of the dataset, the preprocessing, the definition of the kernel, its implementation, and the analysis of the results. QuASK can be used both as a standalone executable, through its command line interface, and as a software library. The first approach performs the experiment without writing a single line of code. The second approach might be interesting if the researcher needs to integrate QuASK with existing code routines. After the data have been accurately processed and the classical and quantum Gram matrices computed, QuASK provides a range of metrics to evaluate the obtained kernel methods.

4.1 Running experiments through a command line interface

We show how to use QuASK to perform an end-to-end experiment. Once installed, the software is run with quask \(<\texttt{command}>\). QuASK performs the sequence of operations illustrated in Fig. 2.

Fig. 2 Sequence of operations performed when analyzing a dataset using QuASK

The experiment starts with the choice of a dataset. QuASK offers several classical datasets for both regression and classification tasks. Moreover, some quantum datasets are available, i.e. datasets whose features have been encoded into a quantum system and modified by a unitary transformation, such as the one used in Huang et al. (2021). The output of the process is a pair of NumPy binary files representing the feature data and the corresponding labels.


The dataset, which can be obtained by using the QuASK dataset-retrieval command described above or provided by the user in NumPy format (a feature matrix X.npy, \(X \in \mathbb {R}^{d \times n}\), and a label vector y.npy, \(y \in \mathbb {R}^{1 \times n}\)), can be preprocessed classically before being fed to the quantum machine learning algorithm. Several preprocessing techniques are available. Firstly, the researcher can vertically slice the dataset, keeping only a certain range of labels. Specifically, the software prompts the researcher to simplify the classification task by restricting it to binary classification. However, it is worth noting that most kernel-based predictors are able to handle both binary and multi-class classification problems. Secondly, the user can apply dimensionality reduction techniques. These are especially important in the NISQ setting due to the lack of resources. The techniques available are PCA for numerical data and FAMD for mixed numerical and categorical data. These choices are motivated by the fact that PCA is a widely used dimensionality reduction technique, while FAMD is a newer method specifically tailored to handle categorical data. The user can extend QuASK to include further dimensionality reduction techniques. Thirdly, it is possible to fix a possible class imbalance using random undersampling or random oversampling. When loading the data, the script shows some statistics about the dataset, both for classification and regression tasks, which can guide the user through the preprocessing. The output of the process is the four files X_train, y_train, X_test, y_test, which can now be fed to some kernel machine.
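The same preprocessing steps can be reproduced in a few lines of scikit-learn, which may help clarify what the command automates (the snippet below is an illustrative sketch, not QuASK's own API, and assumes samples are stored as rows):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = np.load("X.npy"), np.load("y.npy")      # dataset in NumPy format
X_red = PCA(n_components=2).fit_transform(X)   # limit the number of qubits
X_train, X_test, y_train, y_test = train_test_split(X_red, y, test_size=0.2)
for name, arr in [("X_train", X_train), ("y_train", y_train),
                  ("X_test", X_test), ("y_test", y_test)]:
    np.save(f"{name}.npy", arr)
```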


At this point, the quantum kernel is built on the processed dataset. There are several available techniques the researcher can select from. The results of such an evaluation are the kernel Gram matrices corresponding to the training and testing datasets.


The researcher can use optimized quantum kernels, i.e. quantum kernels whose circuits have been chosen after an optimization process. Such a process can be gradient-descent based (ADAM optimizer) or gradient-free (grid search optimizer), in case we are optimizing the angles of the quantum operations, or combinatorial-optimization based, in case we are optimizing the generators of the quantum transformations. Although some quantum machine learning frameworks, such as PennyLane and Qiskit, already allow gradient-descent optimization of any circuit (including quantum kernels), none offers the capability to adaptively choose the generators of the transformation through combinatorial optimization.


Finally, the researcher can calculate the accuracy of the kernel model using the training and testing Gram matrices given as input. The output is a plot comparing the different kernels. For each kernel matrix, the user specifies the label that appears on the x-axis of the plot. If multiple instances are specified with the same label, these are interpreted as i.i.d. random experiments and will contribute to the error bars. Multiple metrics are available.


4.2 Integrating QuASK in an existing code base

There might be cases in which the command line interface of QuASK cannot be straightforwardly used for a certain project. For example, the researcher might be forced to use a particular preprocessing technique, or to analyze the results according to a custom metric. Such cases can be addressed by integrating QuASK with the existing code base. In fact, QuASK provides a library of components that can be integrated with other projects. The software is organized into several modules, whose structure is shown in Fig. 3.

Fig. 3 Software stack describing the modularity of QuASK. QuASK is built on top of PennyLane, for defining quantum circuits, and scikit-learn. These Python frameworks provide access to basic machine-learning routines

4.2.1 Download or generate datasets

The quask.datasets module facilitates the researcher in choosing a suitable dataset, providing some of the most popular datasets from the OpenML platform as well as custom datasets, some generated from quantum experiments (the latter allow reproducing the results in Huang et al. (2021)).


4.2.2 Evaluation metrics

The quality of a quantum kernel can be empirically tested through the performance of a kernel machine on a certain dataset, by evaluating some metrics. The module quask.metrics contains the metrics to compare and evaluate the kernels, including the kernel polarity (i.e. the Frobenius inner product between two Gram matrices), the target-kernel alignment (Cristianini et al. 2001), the training and testing accuracy of the Support Vector Machine with the precomputed kernel, the geometric difference (Huang et al. 2021) (which can be used to find a potential quantum advantage), and the model complexity (Huang et al. 2021).


4.2.3 Implement quantum kernels

The quantum kernel requires the definition of a feature map \(\phi \), which is implemented using a parameterized unitary transformation \(U_\phi (x)\). The Quantum Kernel Estimation algorithm (Havlíček et al. 2019) calculating the function \(k(x, x') = \vert \langle {\phi (x)}\vert {\phi (x')}\rangle \vert ^2\) is implemented through either the overlap test (Fig. 1a) or the C-SWAP test (Fig. 1b). The projected quantum kernel calculates classically the inner product between two feature vectors \(\phi (x_1), \phi (x_2)\), each one being the output of the quantum feature map \(\phi (x) = \langle 0\vert U^\dagger (x) H U(x)\vert 0\rangle \) (Fig. 1c): the feature map first crosses the quantum space through U(x) and then projects the data back to a classical representation via measurement of the Hermitian operator H. QuASK contains both some notable unitary transformations U from the literature and the code to use such unitary transformations as kernel functions through one of the three methods described above. The user can define their own unitary transformation and immediately get the corresponding kernel function. QuASK is agnostic with respect to the software framework used to define, simulate, and execute the quantum circuits: we have implemented some unitary transformations in PennyLane. This also allows the use of the different functionalities offered by the different frameworks; for example, noiseless simulation with PennyLane can be sped up using JAX. The open-source nature of QuASK allows for easy integration of other quantum computing frameworks into the platform.
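Whatever test is chosen, a kernel function can be turned into the Gram matrices required by the kernel machines with a generic helper. A sketch independent of QuASK's own API:

```python
import numpy as np

def gram_matrix(kernel, X1, X2):
    # Entry (i, j) holds kernel(X1[i], X2[j]); with X1 = X2 this yields the
    # training Gram matrix, otherwise the test-versus-training cross matrix.
    return np.array([[kernel(a, b) for b in X2] for a in X1])
```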

The module quask.kernels collects all the quantum kernels defined within the platform. Most of the quantum kernels available are parametric quantum transformations in the form of Eq. 12. Such a module can be straightforwardly extended to include user-specified quantum kernels. However, we can also have a more expressive quantum transformation, parameterized by both the data features and some trainable parameters, which can be adjusted using gradient-descent-based techniques to minimize some criterion. Such a criterion can be one of the functions implemented in the quask.metrics module. The user can take advantage of the efficient optax optimization library. Moreover, we can use structure learning techniques (Incudini et al. 2022) to optimize the generators of the transformation using a combinatorial-optimization-based technique such as Simulated Annealing (Kirkpatrick et al. 1983) or Genetic Algorithms (Forrest 1996).


4.3 Execution on real hardware

As the software is built on top of PennyLane, QuASK offers the same possibilities for execution on real hardware. In particular, with the Qiskit-PennyLane plugin it is possible to run the quantum circuits on IBM superconducting quantum hardware, and with the Braket-PennyLane plugin it is possible to exploit Rigetti, IonQ, and Oxford Quantum Circuits hardware. The execution on NISQ hardware is noisy, and the results may deviate largely from the simulated ones; the authors of Heyraud et al. (2022) have studied the effect of noise on quantum kernels.

5 Conclusions

We have introduced QuASK, a tool supporting researchers in creating powerful quantum kernels. The software takes care of the most time-consuming and error-prone aspects of the experimentation. It exploits theoretical metrics in QML, providing users with an environment to easily assess cases for potential quantum advantage. This package offers the exciting perspective of testing these metrics on real-world datasets. The QuASK project will be extended in future versions with a wider range of datasets and feature maps, both classical and quantum.