1 Introduction

The field of quantum technologies (Nielsen and Chuang 2010) has seen tremendous progress in recent years, with the potential to transform a wide range of scientific research and industries. One promising application of quantum computing is quantum machine learning (Rebentrost et al. 2014; Biamonte et al. 2017; Cerezo et al. 2022), which could be used to classify and recognize complex patterns more efficiently than classical methods (Abbas et al. 2021). The quantum kernel method, a leading candidate in this field, leverages quantum states prepared by quantum circuits to compute inner products between pairs of data points in a high-dimensional quantum feature space (Havlíček et al. 2019; Schuld and Killoran 2019). Classification based on the quantum kernel method is known as the quantum support vector machine (QSVM), a quantum analog of the classical support vector machine (SVM), which has been used for a variety of machine learning tasks (Cortes and Vapnik 1995; Schölkopf and Smola 2002). An advantage of QSVM with certain feature maps for classically hard problems has been mathematically analyzed in the regime of fault-tolerant quantum computing (Liu et al. 2021; Jäger and Krems 2023). On the other hand, current quantum computers are still noisy intermediate-scale quantum (NISQ) devices (Preskill 2018); that is, NISQ processors are error-prone, and error mitigation is sometimes necessary to reduce the impact of errors (Temme et al. 2017; LaRose et al. 2022). Despite these challenges, with the aid of cloud computing technology, there has been growing interest in the quest for early practical applications of near-term devices (Bharti et al. 2022).

In recent years, there has been remarkable progress in quantum hardware (de Leon et al. 2021), opening the path for the implementation of NISQ algorithms. Previous studies on quantum kernels have explored the use of various quantum hardware platforms, such as superconducting qubits (Havlíček et al. 2019; Djehiche and Löfdahl 2021; Heredge et al. 2021; Peters et al. 2021; Wang et al. 2021; Hubregtsen et al. 2022; Krunic et al. 2022), trapped-ion qubits (Moradi et al. 2022), Gaussian boson sampling (Schuld et al. 2020; Giordani et al. 2023), neutral-atom qubits (Albrecht et al. 2023), and nuclear-spin qubits (Kusumoto et al. 2021). Owing to quantum decoherence and the noise of quantum gates, one can typically perform only a limited number of quantum operations on NISQ devices. In this regard, trapped-ion quantum processors seem to offer some advantages, thanks to long coherence times, all-to-all connectivity, and high-fidelity gate operations (Bruzewicz et al. 2019). Previous studies have demonstrated the implementation of different NISQ algorithms on trapped-ion quantum computers; in particular, researchers have recently used the IonQ Harmony quantum processor and reported interesting results in quantum machine learning (Johri et al. 2021; Ishiyama et al. 2022; Rudolph et al. 2022), finance (Zhu et al. 2022), quantum chemistry (Nam et al. 2020; Zhao et al. 2023), and the generation of pseudo-random quantum states (Cenedese et al. 2023). A recent study has shown the feasibility of implementing QSVM with a simple quantum circuit on a trapped-ion quantum computer (Moradi et al. 2022); nonetheless, further investigation is necessary to understand the full potential of the quantum kernel method on this platform using a different quantum kernel and various datasets.

In the present work, we investigate the performance of quantum support vector classification (QSVC) and quantum support vector regression (QSVR) on a trapped-ion quantum computer, using datasets from different industry domains including finance and materials science, aiming to bridge the gap between potential quantum computing applications and real-world industrial needs. Here, we employ quantum kernels described by a shallow quantum circuit that can be implemented on the IonQ Harmony quantum processor and analyze the performance of the models, in comparison with that of the classical counterpart as well as with that obtained from noiseless quantum circuit simulations.

The remainder of the paper is organized as follows. To estimate the number of quantum measurements necessary for reliable quantum kernel estimation, we first perform noiseless and noisy quantum computing simulations before conducting quantum experiments. Next, we investigate the effect of noise on the performance of the QSVC models using noisy simulations with various qubit gate error rates. Then, we report the results of QSVMs on the trapped-ion quantum processor. We train our QSVC models using a dataset of fraudulent credit card transactions and image datasets, namely the MNIST dataset and the Fashion-MNIST dataset (Xiao et al. 2017). We also train our QSVR models using a financial market dataset and a dataset of superconducting materials. In the QSVR tasks, to reduce the effect of noise, we use a low-rank approximation of the noisy quantum kernel and carefully optimize the hyperparameters in the SVMs. We demonstrate that our quantum kernel can be used for both the QSVC and QSVR tasks on all the datasets examined. Finally, we summarize our conclusions.

2 Results

2.1 Quantum circuit and the quantum kernel method

In the NISQ era, two-qubit gates typically have error rates an order of magnitude higher than those of single-qubit gates. This means that only a limited number of quantum operations can be performed before the results become indistinguishable from noise. In the present study, we use the following quantum feature map, implemented by a shallow quantum circuit:

$$\left|\phi\left({\varvec{x}}\right)\right\rangle = U\left({\varvec{x}}\right)\left|0^{\otimes n}\right\rangle = \left(\bigotimes_{q=1}^{n} R_{z}\left(x_{q}\right)\right) U_{2^{n}}^{\text{ent}} \left(\bigotimes_{q=1}^{n} R_{y}\left(x_{q}\right) R_{z}\left(x_{q}\right) H\right) \left|0^{\otimes n}\right\rangle$$
(1)

and

$$U_{2^{n}}^{\text{ent}} := \prod_{q=1}^{n-1} \mathrm{CNOT}_{q,q+1}$$
(2)

Note that the connectivity required by the entangling layer in Eq. 2 is limited to nearest neighbors, resulting in \((n-1)\) CNOT gates. This makes the circuit more amenable to near-term quantum devices. The quantum feature map given in Eq. 1 has been applied to image classification using a specialized quantum kernel simulator, highly customized for this particular circuit using field-programmable gate arrays (Suzuki et al. 2023). In quantum machine learning, the quantum kernel \(K\left({\varvec{x}},{\varvec{x}}^{\prime}\right)\) induced by the quantum feature map can be estimated from the inner product of the quantum states obtained from the two data points \({\varvec{x}}\) and \({\varvec{x}}^{\prime}\):

$$K\left({\varvec{x}},{\varvec{x}}^{\prime}\right) = {\left|\langle \phi\left({\varvec{x}}\right)|\phi\left({\varvec{x}}^{\prime}\right)\rangle \right|}^{2} = {\left|\langle 0^{\otimes n}| U^{\dagger}\left({\varvec{x}}\right) U\left({\varvec{x}}^{\prime}\right) |0^{\otimes n}\rangle \right|}^{2}$$
(3)
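
As a concrete illustration, the feature map of Eq. 1 and the kernel entry of Eq. 3 can be sketched in a few lines of Qiskit: the all-zeros frequency of the compute–uncompute circuit \(U^{\dagger}({\varvec{x}})U({\varvec{x}}^{\prime})|0^{\otimes n}\rangle\) estimates the kernel. This is a minimal sketch assuming Qiskit with the Aer simulator, not the PhiQonnect SDK used in this work (see Sect. 4.1); on hardware, the simulator would be replaced by the cloud backend.

```python
# Minimal sketch (not the authors' PhiQonnect code) of the feature map in
# Eq. 1 and the kernel estimate in Eq. 3; assumes qiskit and qiskit-aer.
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

def feature_map(x: np.ndarray) -> QuantumCircuit:
    """Circuit for U(x)|0...0> in Eq. 1, with one qubit per feature."""
    n = len(x)
    qc = QuantumCircuit(n)
    for q in range(n):          # initial layer: H, then Rz, then Ry
        qc.h(q)
        qc.rz(x[q], q)
        qc.ry(x[q], q)
    for q in range(n - 1):      # U^ent: nearest-neighbor CNOT chain (Eq. 2)
        qc.cx(q, q + 1)
    for q in range(n):          # final Rz layer
        qc.rz(x[q], q)
    return qc

def kernel_entry(x1: np.ndarray, x2: np.ndarray, shots: int = 500) -> float:
    """K(x1, x2) of Eq. 3, estimated as the all-zeros frequency."""
    qc = feature_map(x2).compose(feature_map(x1).inverse())  # U†(x1) U(x2)
    qc.measure_all()
    backend = AerSimulator()
    counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()
    return counts.get("0" * qc.num_qubits, 0) / shots
```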

The kernel represents the similarity between the two data points in the high-dimensional Hilbert space. Each kernel entry can be estimated using the shallow quantum circuit depicted in Fig. 1a. Once the quantum kernel has been estimated by a quantum computer or a quantum circuit simulator, we can apply the kernel-based method (Fig. 1b). The goal of SVM is to find the decision function for binary classification. Suppose we are given a set of samples \(\left({{\varvec{x}}}_{i},{y}_{i}\right)\) with \({{\varvec{x}}}_{i}\in {\mathbb{R}}^{d}\) and \({y}_{i}\in \{\pm 1\}\), where \(d\) is the dimension of the input vector and the index \(i\) runs over \(1,\cdots ,m\). To find the decision function, we solve the following problem (Schölkopf and Smola 2002):

$$\underset{\boldsymbol{\alpha } \in {\mathbb{R}}^{m}}{{\text{max}}}W\left(\boldsymbol{\alpha }\right)=-\frac{1}{2}\sum_{i,j=1}^{m}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}K\left({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\right)+\sum_{i=1}^{m}{\alpha }_{i}$$
(4)

subject to

$${\alpha }_{i}\in \left[0,C\right]\mathrm{\;and\;}{\sum }_{i=1}^{m}{y}_{i}{\alpha }_{i}=0$$
(5)

Fig. 1 Schematic representation of our QSVC and QSVR workflows. a Shallow quantum circuit for estimating our quantum kernel \({\left|\langle \phi \left({\varvec{x}}\right)|\phi \left({\varvec{x}}^{\prime}\right)\rangle \right|}^{2}\). Our quantum feature map \(\left|\phi\left({\varvec{x}}\right)\right\rangle = U\left({\varvec{x}}\right)\left|0^{\otimes n}\right\rangle\) is given in Eq. 1 in the text. b Workflow for our QSVC and QSVR tasks (orange and green, respectively). For QSVC tasks, a credit card dataset, the MNIST dataset, and the Fashion-MNIST dataset were used. For QSVR tasks, a financial dataset and a materials dataset were used. The quantum kernel can be estimated using a quantum circuit simulator (with or without noise) on a CPU (white boxes) or using the IonQ Harmony (blue box). For QSVR tasks, low-rank approximation was employed to reduce noise in the quantum kernel (for more details, see Sect. 2.4.2). The hyperparameters of the models were also optimized

Here, the coefficients \(\left\{{\alpha }_{i}\right\}\) are parameters determined through the optimization process. The patterns \({{\varvec{x}}}_{i}\) for which \({\alpha }_{i}>0\) are called support vectors (SVs). The regularization parameter \(C\) controls the tradeoff between model complexity and its capacity to tolerate errors. The decision function \(f\left({\varvec{x}}\right)\) takes the form:

$$f\left({\varvec{x}}\right)={\text{sgn}}\left(\sum_{i=1}^{m}{y}_{i}{\alpha }_{i}K\left({{\varvec{x}}}_{i},{\varvec{x}}\right)+b\right)$$
(6)

The bias \(b\) can be determined by the support vectors once the SVs and their Lagrange multipliers are obtained by the dual optimization (Schölkopf and Smola 2002).
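
In practice, the dual problem of Eqs. 4 and 5 can be solved by handing the precomputed quantum Gram matrix to a standard SVM solver. The sketch below uses scikit-learn's SVC (backed by LIBSVM, as in Sect. 4.1) together with the `kernel_entry` estimator sketched above; `X_train`, `y_train`, `X_test`, and `y_test` are placeholder arrays. For a real device run, one would compute only the upper triangle of the symmetric training kernel (see Sect. 4.1).

```python
# Sketch: QSVC with a precomputed quantum kernel; placeholder data arrays.
import numpy as np
from sklearn.svm import SVC

def gram_matrix(XA, XB, shots=500):
    """Kernel matrix with entries K(XA[i], XB[j]) from Eq. 3."""
    return np.array([[kernel_entry(a, b, shots) for b in XB] for a in XA])

K_train = gram_matrix(X_train, X_train)   # m x m training Gram matrix
clf = SVC(kernel="precomputed", C=1.0)    # C bounds the alphas in Eq. 5
clf.fit(K_train, y_train)

K_test = gram_matrix(X_test, X_train)     # test-versus-train kernel
print("number of SVs:", len(clf.support_))
print("test accuracy:", clf.score(K_test, y_test))
```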

2.2 Noiseless and noisy quantum simulations

To understand the effects of noise, we first employed a device noise model provided by Qiskit Aer (Aleksandrowicz et al. 2019). The noise model is based on a depolarizing noise model in which both single-qubit and two-qubit gate errors are taken into account: single-qubit gate errors consist of a single-qubit depolarizing error followed by a single-qubit thermal relaxation error, whereas two-qubit gate errors comprise a two-qubit depolarizing error followed by single-qubit thermal relaxation errors on both qubits in the gate. Hereafter, we denote the single- and two-qubit gate error rates as \({p}_{1}\) and \({p}_{2}\), respectively. In the context of the quantum kernel method, it is of particular importance to understand how noise affects the quality of the quantum kernel matrix and the prediction accuracy. To quantify this, we use the alignment between two kernels (Cristianini et al. 2001), defined by

$$A\left(K,{K}^{\prime}\right)=\frac{{\langle K,{K}^{\prime}\rangle }_{F}}{\sqrt{{\langle K,K\rangle }_{F}\,{\langle {K}^{\prime},{K}^{\prime}\rangle }_{F}}}$$
(7)

where \({\langle P,Q\rangle }_{F}\) is the Frobenius inner product between the matrices \(P\) and \(Q\):

$${\langle P,Q\rangle }_{F}=\sum_{ij}{P}_{ij}{Q}_{ij}={\text{Tr}}\left\{{P}^{T}Q\right\}$$
(8)

The alignment \(A\left(K,{K}^{\prime}\right)\) can be viewed as the cosine of the angle between the two matrices regarded as vectors. The alignment of the noisy quantum kernel \({K}^{{\text{noise}}}\) with the noiseless quantum kernel \(K\), \(A\left(K,{K}^{{\text{noise}}}\right)\), thus provides a convenient measure of the deviation of a noisy kernel from the noiseless one.
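
Eq. 7 amounts to a few lines of NumPy; the straightforward helper below is reused for the low-rank analysis in Sect. 2.4.2.

```python
import numpy as np

def kernel_alignment(K: np.ndarray, K2: np.ndarray) -> float:
    """Alignment of Eq. 7: cosine between matrices under the Frobenius
    inner product of Eq. 8."""
    num = np.sum(K * K2)  # <K, K'>_F
    return num / np.sqrt(np.sum(K * K) * np.sum(K2 * K2))
```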

Using device noise model simulations, we investigated the robustness of our quantum kernel matrix in the presence of noise and the resulting prediction performance (Fig. 2). In our noisy simulations, we varied the number of qubits in Eq. 1 from 4 to 12 and considered the following conditions for qubit gate error rates: (i) \({p}_{1} = 0.001\), \({p}_{2}= 0.005\) and (ii) \({p}_{1}= 0.01\), \({p}_{2}=0.05\). In the Appendix, we show numerically that 500 shots per kernel entry were enough to ensure the quality of our quantum kernel and to maintain reliable predictions; thus, 500 shots were used for each kernel entry throughout our simulations. To explore the applicability of our quantum kernel, three different datasets were considered: the credit card fraudulent transaction dataset, the MNIST dataset, and the Fashion-MNIST dataset. For all three datasets, the test accuracy obtained from the noisy quantum kernel was on par with that obtained from the noiseless quantum simulations, suggesting that the noise in the quantum kernel had minimal impact on the test accuracy of our QSVM models. This is consistent with the alignment remaining above 0.996, which may be partly due to the shallowness of our quantum circuits. We note, however, that our simulations based on the device noise model are only an approximation of the errors that occur on actual devices (in Sect. 2.3, we demonstrate the performance of our QSVC models on the real quantum device using 4 qubits).
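
For reference, the depolarizing part of such a noise model can be sketched with Qiskit Aer as follows. This is a simplified illustration: the device noise model used in this work additionally includes thermal relaxation errors, and the gate names below are assumptions about the simulated basis gates rather than the exact configuration.

```python
# Sketch (simplified; depolarizing errors only) of a noisy Aer backend with
# single- and two-qubit gate error rates p1 and p2.
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error

def noisy_backend(p1: float, p2: float) -> AerSimulator:
    noise_model = NoiseModel()
    noise_model.add_all_qubit_quantum_error(
        depolarizing_error(p1, 1), ["h", "ry", "rz"])  # single-qubit gates
    noise_model.add_all_qubit_quantum_error(
        depolarizing_error(p2, 2), ["cx"])             # two-qubit gates
    return AerSimulator(noise_model=noise_model)

backend = noisy_backend(0.001, 0.005)  # condition (i) in the text
```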

Fig. 2 Dependence of the prediction performance of our QSVC models on the number of qubits, from 4 to 12 (noisy simulations). Top panel: test accuracy for a credit card dataset, b MNIST dataset (binary classification of two labels: “0” vs. “1”), and c Fashion-MNIST dataset (binary classification of two image categories: “T-shirt” vs. “trouser”). Bottom panel: alignment of the noisy quantum kernel with the noiseless quantum kernel for d credit card dataset, e MNIST dataset, and f Fashion-MNIST dataset. In our device noise model simulations, we consider the following conditions for single- and two-qubit gate error rates: (i) \({p}_{1} = 0.001\), \({p}_{2}= 0.005\) (blue); (ii) \({p}_{1}= 0.01\), \({p}_{2}=0.05\) (red). Five independent seeds for each dataset were used to obtain the statistical results; the error bars indicate the standard deviation over the 5 seeds. The numbers of training and test data were 40 and 20, respectively

Next, we investigated how the device noise level affects the alignment and the test accuracy (Fig. 3). To this end, we performed device noise model simulations using 4 qubits for a range of qubit gate error rates: \(0.001 \le {p}_{1}\le 0.55\) and \(0.001 \le {p}_{2}\le 0.55\). By comparing the alignment and the test accuracy, we found that, in a certain region (\({p}_{1}\lesssim 0.05\) and \({p}_{2}\lesssim 0.1\)), our QSVC model can make reliable predictions even in the presence of noise (Fig. 3a, b). When qubit gate error rates become relatively high, however, the noisy quantum kernel deviates from the ideal quantum kernel, and the prediction performance begins to deteriorate rapidly. Given that \({p}_{1}\approx 0.001\) and \({p}_{2}\approx 0.01\) for state-of-the-art NISQ devices, we can expect reliable prediction performance from our QSVC model on real quantum computers (as demonstrated in the next subsection). Our noise model simulations suggest that the alignment between noiseless and noisy quantum kernels is an indicator of how reliably a QSVC model can predict on a NISQ device in comparison with its noiseless counterpart. In our QSVC model, if the alignment is higher than 0.98, the model can make reliable predictions (Fig. 3c). To understand this more intuitively, recall that the alignment can be viewed as the cosine of the angle between the two matrices (regarded as vectors); in this picture, the angle between the noiseless and noisy kernels needs to be less than about 11.5° for reliable predictions on a noisy quantum device.

Fig. 3 Effects of noise on our QSVC model. a The alignment, b the test accuracy of the QSVC model, and c the correlation between the two. The device noise model simulations were performed using 4 qubits for a range of qubit gate error rates: \(0.005 \le {p}_{1}\le 0.55\) and \(0.005 \le {p}_{2}\le 0.55\). The results suggest that our QSVC model is capable of making reliable predictions if the alignment is higher than 0.98, which roughly corresponds to the condition \({p}_{1}\lesssim 0.05\) and \({p}_{2}\lesssim 0.1\). The shaded area in part c indicates the standard deviation. The Fashion-MNIST dataset was used; the numbers of training and test data were 40 and 20, respectively

2.3 QSVC on the IonQ Harmony quantum computer

Having examined the results of the noise model simulations, we now turn to our quantum experiments using the IonQ Harmony. The Gram matrices we obtained using the quantum device (4 qubits) are shown in Fig. 4. To validate the quality of our noisy quantum kernels, we investigated the alignment of the noisy quantum kernel with the noiseless quantum kernel: the values for the alignment \(A\left(K,{K}^{{\text{noise}}}\right)\) were 0.986, 0.984, and 0.993 for the credit card dataset, the MNIST dataset, and the Fashion-MNIST dataset, respectively. Since the three values were higher than 0.98, this suggests that the quantum kernel matrix entries were successfully estimated using the IonQ Harmony and indicates that reliable predictions can be made using our QSVC models on the quantum device (see also Fig. 3c).

Fig. 4 Quantum kernel matrices obtained using the IonQ Harmony quantum computer. a Credit card dataset, b MNIST dataset, and c Fashion-MNIST dataset. For all the cases, 4 qubits were used to obtain the matrices, with the number of training data \(N=20\). The values of the alignment of the noisy quantum kernel with the noiseless quantum kernel, \(A\left(K,{K}^{{\text{noise}}}\right)\), were 0.986, 0.984, and 0.993 for a, b, and c, respectively. All of the values were higher than 0.98, which suggests that the quantum kernel matrix entries were successfully obtained using the IonQ quantum computer and indicates that reliable predictions can be made using our QSVC models on the quantum device (see also Fig. 3c)

Motivated by the reliable estimation of the quantum kernel on the IonQ Harmony, we trained QSVC models using the three datasets and validated the models using test data (Table 1). For comparison, we used classical Gaussian kernels \(K({\varvec{x}},{\varvec{x}}{^\prime})={\text{exp}}(-{\gamma \Vert {\varvec{x}}-{{\varvec{x}}}{^\prime}\Vert }^{2})\). For the credit card dataset, the classical SVM parameters were the regularization constant \(C=3.2\) and \(\gamma =0.25\), whereas QSVM parameters were \(C=6.2\) for the noiseless simulation and \(C=4.2\) for the IonQ machine. For the MNIST dataset, the classical SVM had \(C=3.5\) and \(\gamma =0.25\), whereas both the noiseless and noisy QSVMs had \(C=1.0\). Finally, for the Fashion-MNIST dataset, the classical SVM parameters were \(C=1.5\) and \(\gamma =0.25\), whereas the QSVM had \(C=0.4\) for the noiseless simulation and \(C=1.0\) for the IonQ machine.
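
For reference, the classical baseline is a standard Gaussian-kernel SVC; a minimal sketch with the credit card dataset's reported hyperparameters (`X_train` etc. are placeholder arrays) is:

```python
# Classical baseline: RBF-kernel SVC with the reported hyperparameters
# for the credit card dataset (C = 3.2, gamma = 0.25).
from sklearn.svm import SVC

clf_classical = SVC(kernel="rbf", C=3.2, gamma=0.25)
clf_classical.fit(X_train, y_train)
print("test accuracy:", clf_classical.score(X_test, y_test))
```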

Table 1 Model accuracy and the number of support vectors (SVs) for classical and quantum SVMs on three different datasets. The dimension of the input data was reduced to 4

Our results show that the prediction performance of our QSVC models was maintained even in the presence of noise. Test accuracies achieved with the quantum computer for the credit card dataset, the MNIST dataset, and the Fashion-MNIST dataset were 70%, 100%, and 100%, respectively, matching the performance of the QSVC models using noiseless quantum kernels. This is consistent with the results of our noise model simulations, in which reliable predictions could be made using noisy quantum kernels with an alignment higher than 0.98. Furthermore, the performance of our QSVC models on the IonQ Harmony was comparable to that of the classical counterparts. In the Supplementary Information, we also include the results of IonQ Aria experiments involving 4 and 8 qubits. Thanks to the improved fidelity of the two-qubit gate operations in the IonQ Aria, we were able to obtain a quantum kernel using an 8-qubit system; our QSVC model with 8 qubits achieved a 100% test accuracy on the Fashion-MNIST dataset. On a final note, we mention the number of support vectors (recall that the maximum-margin decision boundary is determined solely by the positions of the support vectors): there was a slight difference in the number of support vectors between QSVC models with noiseless kernels and those with noisy kernels on the IonQ Harmony, which might imply a subtle difference in the quantum feature map between noiseless simulations and actual quantum experiments, even though both QSVC models gave the same test accuracies.

2.4 QSVR on the IonQ Harmony quantum computer

2.4.1 Datasets

Two different datasets were used in our QSVR tasks. The first is a financial market dataset (given in Tables S2 and S3 in the Supplementary Information). Financial data are characterized by high volatility and are often subject to noise caused by random fluctuations. In recent years, pandemics, geopolitical risks, and other macroeconomic factors have caused supply chain disruptions, which have led to price fluctuations of metal commodities such as nickel. In our QSVR model, the target variable \({y}_{i}\) was the UK nickel price, and three attributes \({{\varvec{x}}}_{i}\in {\mathbb{R}}^{3}\) were considered: the Shanghai Stock Exchange Composite (SSE) Index, West Texas Intermediate (WTI) crude oil, and the US Dollar Index. Thus, 3 qubits were used to describe the quantum feature map on the IonQ Harmony. Hereafter, this dataset is referred to as the financial dataset. The second is a superconducting materials dataset (Hamidieh 2018). Here, the target variable \({y}_{i}\) was the critical temperature \({T}_{c}\) for a broad class of superconducting materials. The original dataset contains 81 features (or descriptors); using dimensionality reduction, four-dimensional vectors \({{\varvec{x}}}_{i}\in {\mathbb{R}}^{4}\) were used as input data in this work (for more details on preprocessing, see Sect. 4.2). Hence, 4 qubits were used to describe the quantum feature map. Hereafter, this dataset is referred to as the materials dataset.

2.4.2 Low-rank approximation in the noisy quantum kernel

A popular approach to reducing the noise in the quantum kernel is to use a depolarizing model (Hubregtsen et al. 2022; Moradi et al. 2022); however, such a noise model may not necessarily be suited for real quantum devices, because there are various sources of noise. In addition, at the time of our quantum experiment, full control of the native quantum gates of the trapped-ion quantum computer was not available through the cloud service. In this work, we instead employed a postprocessing approach for error mitigation; in particular, we used low-rank approximation to reduce the noise in the quantum kernel. This approach retains the important information of the original matrix while reducing the noise. The low-rank approximation can be performed, for instance, by singular value decomposition (SVD). Recently, a study by Wang et al. (2021) showed that the training performance of noisy quantum kernels improves when a spectral transformation (eigendecomposition) is adopted. Our idea is to reconstruct a quantum kernel \(\widehat{K}\) from a noisy quantum kernel \({K}^{{\text{noise}}}\) by using eigendecomposition:

$$\widehat{K}=\sum_{k=1}^{r}{\mu }_{k}{{\varvec{u}}}_{k}{{\varvec{u}}}_{k}^{{\text{T}}}$$
(9)

where \({\mu }_{k}\) is the \(k\)-th eigenvalue of \({K}^{{\text{noise}}}\) and \({{\varvec{u}}}_{k}\) is the corresponding eigenvector, with eigenvalues sorted in descending order. The quantum kernel is approximated by summing \({\mu }_{k}{{\varvec{u}}}_{k}{{\varvec{u}}}_{k}^{{\text{T}}}\) over \(k=1,\cdots ,r\). Motivated by the important role of the alignment in QSVC (Fig. 3c), we argue that the optimal value for \(r\) can be determined by maximizing the alignment of the reconstructed kernel with the noiseless kernel \(K\):

$${r}^{*}=\underset{r\in \mathbb{N}}{\mathrm{argmax}}\;A\left(K,\widehat{K}\right)$$
(10)

In the case of test data, we calculate a train-test kernel matrix, which is generally a rectangular matrix; hence, SVD was used for low-rank approximation.
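
A minimal sketch of this procedure (reusing the `kernel_alignment` helper of Sect. 2.2) is given below: `low_rank` implements Eq. 9, `best_rank` the rank selection of Eq. 10, and `low_rank_svd` the SVD variant for the rectangular train-test matrix.

```python
# Sketch: low-rank denoising of a quantum kernel (Eqs. 9 and 10).
import numpy as np

def low_rank(K_noisy: np.ndarray, r: int) -> np.ndarray:
    """Eq. 9: keep the r leading eigenpairs of the symmetric noisy kernel."""
    mu, U = np.linalg.eigh(K_noisy)   # eigenvalues in ascending order
    mu, U = mu[::-1], U[:, ::-1]      # reorder so the largest come first
    return (U[:, :r] * mu[:r]) @ U[:, :r].T

def best_rank(K_noiseless: np.ndarray, K_noisy: np.ndarray) -> int:
    """Eq. 10: rank maximizing the alignment with the noiseless kernel."""
    ranks = range(1, K_noisy.shape[0] + 1)
    return max(ranks,
               key=lambda r: kernel_alignment(K_noiseless, low_rank(K_noisy, r)))

def low_rank_svd(K_rect: np.ndarray, r: int) -> np.ndarray:
    """Truncated SVD for the rectangular train-test kernel matrix."""
    U, s, Vt = np.linalg.svd(K_rect, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```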

We investigated the effects of the low-rank approximation on the quality of the noisy quantum kernel (Fig. 5). For the financial dataset, the alignment reached its maximum value of 0.993 at \({r}^{*}=8\) (Fig. 5a). For the materials dataset, on the other hand, the alignment reached its maximum value of 0.984 at \({r}^{*}=10\) (Fig. 5d). The difference in the optimal \({r}^{*}\) appears to be related to the difference in the nature of the datasets. In both cases, after the alignment peaked at the optimal \({r}^{*}\), it gradually decreased and eventually saturated for larger values of \(r\). This is consistent with the fact that the contributions of the eigenvectors for \(k>{r}^{*}\) became substantially small (Fig. 5b, e), indicating that a large portion of the information is concentrated in the first \({r}^{*}\) eigenvectors. The results suggest that the low-rank approximation can improve the quality of the noisy quantum kernel to some extent.

Fig. 5 Use of spectral decomposition to improve the quantum kernel obtained on the quantum processor. Top (financial dataset; 3 qubits): a alignment between the reconstructed quantum kernel and the noiseless quantum kernel with respect to the rank of the low-rank approximation (the alignment has its maximum value of 0.993 at \({r}^{*}=8\), indicated by the red circle); b eigenvalues with respect to the eigenvector index (the first 8 components are indicated by the red bars); c reconstructed quantum kernel using the low-rank approximation (\({r}^{*}=8\)). Bottom (materials dataset; 4 qubits): d alignment between the reconstructed quantum kernel and the noiseless quantum kernel with respect to the rank of the low-rank approximation (the alignment has its maximum value of 0.984 at \({r}^{*}=10\), indicated by the red circle); e eigenvalues with respect to the eigenvector index (the first 10 components are indicated by the red bars); f reconstructed quantum kernel using the low-rank approximation (\({r}^{*}=10\))

2.4.3 Optimization of hyperparameters in \({\varvec{\varepsilon}}\)-support vector regression (SVR)

The goal of \(\varepsilon\)-SVR is to find a regression function \(f({\varvec{x}})\) that has at most \(\varepsilon\) deviation from the obtained targets \(\left\{{y}_{i}\right\}\) for all the training data \(\left\{{{\varvec{x}}}_{i}\right\}\) (Schölkopf and Smola 2002). Here, the hyperparameter \(\varepsilon\) defines a margin of tolerance (or \(\varepsilon\)-insensitive tube) within which no penalty is associated with errors. In other words, data points within this allowable error range are not counted as errors, even if they do not fall directly on the regression function. This is realized by the \(\varepsilon\)-insensitive loss function introduced by Vapnik, an analog of the soft margin in SVC (Schölkopf and Smola 2002). The linear \(\varepsilon\)-insensitive loss function can be described by

$${L}_{\varepsilon }=\left\{\begin{array}{cc}0& \left|y-f({\varvec{x}})\right|\le \varepsilon \\ \left|y-f({\varvec{x}})\right|-\varepsilon & {\text{otherwise}}\end{array}\right.$$
(11)
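
As a one-line helper, the loss of Eq. 11 reads:

```python
def eps_insensitive_loss(y: float, f_x: float, eps: float) -> float:
    """Linear eps-insensitive loss of Eq. 11: zero inside the tube."""
    return max(0.0, abs(y - f_x) - eps)
```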

A smaller value of \(\varepsilon\) narrows the “no penalty” region, making the model more sensitive to the training data, whereas a larger value of \(\varepsilon\) creates a wider tube, making the model less sensitive to the training data. An appropriate value of \(\varepsilon\) is related to the noise magnitude of data (Zhang and Han 2013). By using the Lagrangian formalism and introducing a dual set of variables, the primal problem in \(\varepsilon\)-SVR can be transformed into the dual optimization problem (Schölkopf and Smola 2002), which is described as follows:

$$\underset{\boldsymbol{\alpha },{\boldsymbol{\alpha }}^{\mathbf{*}} \in {\mathbb{ R}}^{m}}{{\text{max}}}W(\boldsymbol{\alpha },{\boldsymbol{\alpha }}^{*})=-\frac{1}{2}\sum_{i,j=1}^{m}\left({\alpha }_{i}^{*}-{\alpha }_{i}\right)\left({\alpha }_{j}^{*}-{\alpha }_{j}\right)K\left({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\right)-\varepsilon \sum_{i=1}^{m}\left({\alpha }_{i}^{*}+{\alpha }_{i}\right)+\sum_{i=1}^{m}\left({\alpha }_{i}^{*}-{\alpha }_{i}\right){y}_{i}$$
(12)

subject to

$${\alpha }_{i}^{*}, {\alpha }_{i}\in \left[0,C\right]\mathrm{\;and\;}{\sum }_{i=1}^{m}\left({\alpha }_{i}^{*}-{\alpha }_{i}\right)=0$$
(13)

Here, the coefficients \(\left\{{\alpha }_{i}\right\}\) and \(\left\{{\alpha }_{i}^{*}\right\}\) are parameters determined through the optimization process. The regularization parameter \(C\) determines the tradeoff between the complexity of the model and its capacity to tolerate errors. A larger value of \(C\) makes the model less tolerant of errors, which potentially leads to a risk of overfitting, whereas a smaller value of \(C\) helps the model be more tolerant of errors, which tends to make the model less complex. By tuning the hyperparameters \(\varepsilon\) and \(C\), one can find a good combination of parameters that makes the model more robust on new data, thus improving its generalization performance. The regression function takes the form (Schölkopf and Smola 2002):

$$f\left({\varvec{x}}\right)=\sum_{i=1}^{m}\left({\alpha }_{i}^{*}-{\alpha }_{i}\right)K({{\varvec{x}}}_{i},{\varvec{x}})+b$$
(14)

The bias \(b\) can be determined from the SVs once the Lagrange multipliers are obtained by the dual optimization. To assess the performance of the model, we used the root-mean-square error (RMSE):

$${\text{RMSE}}=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}$$
(15)

By performing a grid search for \(\varepsilon\) and \(C\), we optimized the hyperparameters in ε-SVR using the quantum kernel that had been reconstructed by low-rank approximation (Fig. 6). For the financial dataset, the optimal values for the hyperparameters \(\varepsilon\) and \(C\) were 0.21 and 1.4, respectively, indicating that the ε-insensitive loss function was effective in enhancing the generalization of the model and reducing overfitting. The result is partly due to the nature of the financial market data; that is, the data is characterized by high volatility owing to various factors such as geopolitical events and market sentiment and is often subject to noise caused by random fluctuations. For the materials dataset, on the other hand, the optimal values for the hyperparameters \(\varepsilon\) and \(C\) were 0.0 and 0.3, respectively. The optimal value of \(\varepsilon =0\) means that the allowable error range in the training data was unnecessary for this particular case. A possible reason may be that the impact of noise in the quantum kernel was effectively canceled through dimensionality reduction of input data and the use of low-rank approximation of the noisy quantum kernel. At the same time, a slightly smaller value of \(C=0.3\) made the model more robust against overfitting. For both cases, increasing the value for \(C\) (i.e., making the model fit the training data more tightly) did not improve the performance; instead, it had an adverse effect on the results (Fig. 6). Our results suggest that a combined approach that involves both the low-rank approximation to the noisy quantum kernel and the optimization of the hyperparameters in ε-SVR can be a useful strategy for improving the performance and robustness of the QSVR models.
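
The grid search can be sketched as below, with scikit-learn's SVR solving the dual problem of Eqs. 12 and 13 on the reconstructed kernel. `K_hat_train` and `K_hat_test` are the low-rank-reconstructed train and train-test matrices from Sect. 2.4.2, and the grid ranges are illustrative rather than the exact values used in this work; in practice, held-out validation data (rather than the final test set) should drive the selection.

```python
# Sketch: grid search over (epsilon, C) for eps-SVR with a precomputed
# (reconstructed) quantum kernel; grid values are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

best_params, best_rmse = None, np.inf
for eps in np.arange(0.0, 0.51, 0.01):
    for C in np.arange(0.1, 10.0, 0.1):
        model = SVR(kernel="precomputed", epsilon=eps, C=C)
        model.fit(K_hat_train, y_train)
        rmse = mean_squared_error(y_test, model.predict(K_hat_test),
                                  squared=False)  # RMSE of Eq. 15
        if rmse < best_rmse:
            best_params, best_rmse = (eps, C), rmse
print(f"optimal (epsilon, C) = {best_params}, RMSE = {best_rmse:.3f}")
```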

Fig. 6 Optimization of the hyperparameters \(C\) and \(\varepsilon\) in \(\varepsilon\)-SVR using the quantum kernel reconstructed by low-rank approximation (see also Fig. 5). RMSE with respect to \(\varepsilon\) for a the financial dataset and b the materials dataset. The optimal values of \((\varepsilon , C)\) were (0.21, 1.4) and (0.0, 0.3) for a and b, respectively

2.4.4 Performance

Herein, we report the results of our QSVR models on the IonQ trapped-ion quantum computer and compare their performance with that of the classical SVR models (Table 2 and Fig. 7). For the classical SVR, we used Gaussian kernels \(K({\varvec{x}},{\varvec{x}}^{\prime})={\text{exp}}(-\gamma {\Vert {\varvec{x}}-{{\varvec{x}}}^{\prime}\Vert }^{2})\). The optimized hyperparameters \(\left(\gamma , \varepsilon , C\right)\) were (0.6, 0.1, 5.7) and (0.3, 0.18, 3.4) for the financial dataset and the materials dataset, respectively. For the financial dataset, the performance of our QSVR model using the noiseless simulation was comparable to that of our classical SVR model: the coefficient of determination (\({R}^{2}\)) for the classical SVR model was 0.930, whereas that for the QSVR model was 0.932. On the other hand, \({R}^{2}\) for the QSVR model using the IonQ Harmony was 0.868, which was 6.9% lower than that obtained by the noiseless simulation (Table 2). The QSVR model worked well in predicting the nickel price for this particular period (note that the model may not guarantee a similar performance for another time window). For the materials dataset, \({R}^{2}\) for the classical SVR model was 0.728, whereas that for the QSVR with the noiseless quantum kernel was 0.703. The coefficient of determination \({R}^{2}\) for the QSVR model using the real quantum device was 0.628, which was 10.7% lower than that obtained from the noiseless simulation (Table 2). Since the alignment of the reconstructed quantum kernel for the materials dataset (0.984) was slightly lower than that for the financial dataset (0.993), the decrease in \({R}^{2}\) observed on the real device was more pronounced for the materials dataset. Overall, the presence of noise negatively affected the performance of the regression tasks (Fig. 7). In QSVC models, the performance appears to be less affected by noise, as data points can be effectively separated in the high-dimensional space; regression models, which predict real values, were more acutely affected. As with the QSVC models, the results suggest that the alignment of the quantum kernel is a reliable measure for assessing the performance of the quantum kernel method on a real quantum device.

Fig. 7 Parity plots between the predicted and observed values. Top panel (financial dataset): a classical SVR (\({R}^{2}=0.930\)); b QSVR using the noiseless simulation (\({R}^{2}=0.932\)); c QSVR using the IonQ Harmony with 3 qubits (\({R}^{2}=0.868\)). Bottom panel (materials dataset): d classical SVR (\({R}^{2}=0.728\)); e QSVR using the noiseless simulation (\({R}^{2}=0.703\)); f QSVR using the IonQ Harmony with 4 qubits (\({R}^{2}=0.628\)). For all cases, the number of test data was 40

Table 2 Coefficients of determination \({R}^{2}\) and RMSE for classical and quantum SVR models on the financial dataset and the superconducting materials dataset (for more details, see the text)

3 Discussion

In the present work, we have investigated our QSVC and QSVR models by performing quantum circuit simulations and by using the IonQ Harmony quantum processor. For the classification tasks, we used the credit card dataset, the MNIST dataset, and the Fashion-MNIST dataset. The performance of our QSVC models obtained using 4 qubits of the trapped-ion quantum computer was comparable to that of the classical counterparts and to that of the QSVC models obtained from the noiseless quantum kernels. This suggests that the presence of noise in the quantum kernel had a minimal impact on the test accuracy of the QSVM models. Our quantum experiments with 4 qubits were consistent with the analysis of our device noise simulations, in which the prediction performance was maintained so long as the device noise level was lower than a certain threshold. The robustness of our quantum kernel in the presence of noise can be explained by the fact that the alignment between the noiseless and the noisy quantum kernels was close to one (higher than 0.98). Hence, our results suggest that the alignment is a reliable measure for evaluating the performance of a QSVM model on a NISQ device in comparison with its noiseless counterpart.

In the case of our QSVR models, we used the financial dataset and the dataset for superconducting materials. In particular, we investigated the role of the low-rank approximation and the effect of hyperparameter tuning in \(\varepsilon\)-SVR in improving the performance and robustness of the QSVR models. We found that the low-rank approximation was effective in reducing the effects of noise in the quantum kernel, and that the optimization of the hyperparameters in \(\varepsilon\)-SVR was also beneficial for mitigating the effect of noise. Therefore, a combined approach using the low-rank approximation of the noisy quantum kernel and hyperparameter tuning in \(\varepsilon\)-SVR can be a useful method for enhancing the performance of QSVR models. We have demonstrated that the quantum kernel described by our shallow circuit was versatile enough for both the QSVC and QSVR tasks on the different datasets we examined. While our quantum feature map did not necessarily exemplify a so-called quantum advantage, because of its shallow quantum circuit and the limited number of qubits, our findings could provide valuable insights for designing quantum feature maps.

Let us now discuss open questions and challenges. A recent theoretical study by Thanasilp et al. (2022) shows that, under certain conditions, quantum kernel entries can exponentially concentrate around a certain value as the number of qubits increases. Such exponential concentration was not observed in the quantum experiments conducted for our tasks. This can be attributed partly to our choice of shallow quantum circuits, which avoid high expressibility, and partly to the nature of the datasets we used. In general, quantum kernels with high expressibility can lead to training difficulties. For instance, quantum kernels with \(L\) layers of a hardware-efficient circuit unit reach the exponential decay regime when \(L\) is sufficiently large (\(L \ge 75\)); in contrast, for a small number of layers (\(L \le 8\)), quantum kernels do not enter the exponential concentration regime (see Fig. 4 in the paper by Thanasilp et al. (2022)). Our quantum kernels, consisting of low-depth circuits, correspond to the latter scenario, which is in line with our successful results in training and prediction. Furthermore, noise can impose significant limitations on the potential of quantum computing in the NISQ era: the presence of noise results in the loss of information during gate operations, before information is extracted through measurements. Although our quantum experiments have shown that training and prediction were feasible for systems with 4 and 8 qubits, an increase in the number of qubits (which in turn increases the number of two-qubit gate operations) may result in a substantial decrease in the values of the quantum kernel entries, thereby making the training process more difficult. Lastly, the scalability of quantum kernels remains an open question (Thanasilp et al. 2022; Jerbi et al. 2023), and further advancements are necessary for the practical application of quantum kernel methods. In this context, the emerging field of geometric quantum machine learning (Meyer et al. 2023; Ragone et al. 2023; West et al. 2023), given its broad theoretical scope, could facilitate the development of carefully designed quantum kernels.

4 Computational details

4.1 Quantum-computing experiments

All the quantum calculations were carried out using the IonQ Harmony quantum processor provided by the Amazon Braket cloud service. To conduct our quantum computing experiments, we developed our own quantum software development kit (SDK), called PhiQonnect, which is especially intended for quantum kernel-based methods, including QSVC and QSVR. The SDK utilizes open-source libraries such as IBM Qiskit (Aleksandrowicz et al. 2019) (ver. 0.39.5); pytket (ver. 1.11.1), a language-agnostic optimizing compiler provided by Quantinuum (Sivarajah et al. 2020); amazon-braket-sdk (ver. 1.35.3), developed by Amazon Braket (Amazon Braket SDK Python 2022); and scikit-learn (Pedregosa et al. 2011) (ver. 1.2.1), which includes the LIBSVM library (Chang and Lin 2011). All the quantum computations on the NISQ device were performed using our SDK, which is available as open-source software (see Code Availability).

The computational details of our quantum experiments using the IonQ Harmony are summarized in Table 3. For the quantum kernel estimation, the number of quantum measurements per kernel entry was set to 500 (we measured the quantum state in the \(Z\) basis). This is supported by our noise model simulations, in which 500 shots were shown to be enough to ensure the quality of the quantum kernel for the objective of this study (see Appendix). To obtain the quantum kernel matrix for the training data, only the upper triangular entries were computed, exploiting the symmetric nature of the quantum kernel and thereby reducing the computational cost of using the quantum device. In training and testing our QSVC models, we used 20 data points for training and 10 data points (separate from the training data) for testing. A total of \(105{,}000\) \((=20\times \frac{21}{2}\times 500)\) quantum measurements were conducted to obtain the quantum kernel, and a total of \(100{,}000\) \((=20\times 10\times 500)\) shots were conducted to obtain the train-test kernel matrix. In training and testing our QSVR models, we used 40 data points for training and 40 out-of-sample data points for testing. A total of \(410{,}000\) \((=40\times \frac{41}{2}\times 500)\) quantum measurements were conducted to obtain the quantum kernel, and a total of \(800{,}000\) \((=40\times 40\times 500)\) shots were conducted to obtain the train-test kernel matrix.

Table 3 Details of quantum computing experiments using the IonQ Harmony

To improve the performance of the machine learning models using the quantum kernels, we introduced a scaling hyperparameter \(\uplambda\) in the quantum feature map (i.e., \({{\varvec{x}}}^{(i)}\leftarrow\uplambda {{\varvec{x}}}^{(i)}\) in the quantum circuit). Such a hyperparameter can calibrate the angles of the rotation gates and affect the quantum feature map in the Hilbert space. The hyperparameter can help improve the performance of the QSVM model (Canatar et al. 2022; Shaydulin and Wild 2022; Suzuki et al. 2023). In the present work, for the classification tasks, the hyperparameter \(\uplambda\) was set to 1.0, whereas for the regression tasks, \(\uplambda\) was set to 1.3.
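
As a minimal illustration (reusing the `feature_map` sketch of Sect. 2.1), the scaling simply multiplies the input before encoding:

```python
lam = 1.3                    # 1.0 for classification, 1.3 for regression here
qc = feature_map(lam * x)    # x: length-n feature vector (placeholder)
```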

4.2 Preprocessing the materials dataset

We used a dataset for superconducting materials provided by Hamidieh (2018), which was originally compiled by the National Institute for Materials Science in Japan. This dataset contains 81 features for predicting the critical temperature \({T}_{c}\). To reduce the number of features to be encoded into the NISQ device, the original 81-dimensional vectors were reduced to 4-dimensional vectors using principal component analysis (Subasi and Gursoy 2010).

The distribution of critical temperature is concentrated in the low-temperature region. Such a non-normal distribution is not suited for building regression models. To overcome this, we used the Box–Cox transformation (Sakia 1992), which is a statistical technique used to transform a non-normal distribution into a normal distribution. It is often used in regression analysis to improve the performance of the model when the data does not follow a normal distribution. The Box–Cox transformation is defined by the following equation:

$${y}^{(\xi )}=\left\{\begin{array}{cc}\frac{{y}^{\xi }-1}{\xi }& (\xi \ne 0)\\ \mathrm{log}(y)& (\xi =0)\end{array}\right.$$
(16)

Here, \(y\) is the original data, \({y}^{(\xi )}\) is the transformed data, and \(\xi\) is the Box–Cox transformation parameter. In this study, we used \(\xi =0.15084028\).
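
A sketch of this preprocessing, with `X_raw` and `y_raw` as placeholders for the 81-feature inputs and the (strictly positive, as `boxcox` requires) critical temperatures, is:

```python
# Sketch of the preprocessing: PCA to 4 features, Box-Cox on the target.
from sklearn.decomposition import PCA
from scipy.stats import boxcox

X_reduced = PCA(n_components=4).fit_transform(X_raw)  # 81 -> 4 features
y_transformed = boxcox(y_raw, lmbda=0.15084028)       # Eq. 16 with fixed xi
```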