Expressive Quantum Supervised Machine Learning using Kerr-nonlinear Parametric Oscillators

Quantum machine learning with variational quantum algorithms (VQAs) has been actively investigated as a practical algorithm in the noisy intermediate-scale quantum (NISQ) era. Recent research reveals that data reuploading, which repeatedly encodes classical data into a quantum circuit, is necessary for obtaining an expressive quantum machine learning model in the conventional quantum computing architecture. However, data reuploading tends to require a large amount of quantum resources, which motivates us to find an alternative strategy for realizing expressive quantum machine learning efficiently. In this paper, we propose quantum machine learning with Kerr-nonlinear parametric oscillators (KPOs), a promising alternative quantum computing device. The key idea is to use not only the ground state and first excited state but also higher excited states, which allows us to exploit a large Hilbert space even with a single KPO. Our numerical simulations show that the expressibility of our method with only one mode of the KPO is much higher than that of the conventional method with six qubits. Our results pave the way towards resource-efficient quantum machine learning, which is essential for practical applications in the NISQ era.


I. INTRODUCTION
Quantum computers have attracted much attention due to their potential impact on quantum chemistry [1][2][3][4], machine learning [5][6][7], cryptography [8][9][10], search problems [11], and so on. With advancements in quantum technology, commercially available quantum computers have become a reality. In principle, we could realize a fault-tolerant quantum computer if the number of qubits exceeds 10 million with a fidelity around 0.999 [12][13][14]. However, in current devices, the available number of qubits is on the order of 500 or less, which is much smaller than that required for fault-tolerant quantum computation. A more feasible scenario in the near future is the so-called NISQ regime [15,16].
Numerous quantum algorithms have been designed for execution on NISQ devices. Among these, VQAs are considered some of the most promising applications for NISQ devices [16,17]. Specifically, quantum machine learning has emerged as an appealing use case for VQAs. As a NISQ algorithm, quantum machine learning has been predominantly investigated in the context of qubit-based systems. Recent studies have shown that data reuploading, the process of repeatedly encoding classical data into a quantum circuit, is necessary for obtaining an expressive model, although it requires a large amount of quantum resources.

On the other hand, the KPO is one of the candidates to realize quantum computation [26][27][28]. The KPO is a parametric oscillator with a large Kerr nonlinearity, which can be used to generate cat states. We can realize the KPO by using superconducting resonators with Josephson junctions [29,30]. The KPO is a candidate platform for gate-type quantum computation [28,31,32] and quantum annealing [33,34], and the KPO qubit has been realized experimentally [35]. It is known that the KPO qubit is highly tolerant to bit-flip errors, and we can exploit this property to reduce the overhead for fault-tolerant quantum computation [32,36].
In this paper, we propose to use the KPO for supervised machine learning with a variational algorithm. The KPO is a bosonic system, and so we can in principle use an infinitely large Hilbert space with a single KPO. Also, unlike the conventional approach using parametrized gates, we use the natural Hamiltonian dynamics, where we change the Hamiltonian parameters to implement the variational algorithm. We numerically compare the performance of our method using the KPO with that of the conventional method using qubits.
In our method, we start from a coherent state with an amplitude α. Importantly, we numerically find that we can tune the expressibility by changing this amplitude. Since we encode the input classical data by using the detuning of the KPO, we can include higher frequencies as we increase the amplitude of the coherent state. We expect that the high-frequency terms improve the expressibility, and we confirm this point by numerical simulations. On the other hand, as the expressibility increases, overfitting occurs more often, and so our method allows us to optimize the expressibility by tuning the amplitude of the coherent state.
This paper is organized as follows. In Sec. II, we review the physics of single and multiple KPO systems; the latter is called a KPO network. In Sec. III, we explain a standard supervised machine learning algorithm as a NISQ algorithm, and in Sec. IV, we propose a supervised machine learning algorithm for the KPO based on these ideas. We performed numerical simulations to validate our proposed method; in Sec. V, we explain the simulations and the results in detail. Finally, we conclude with some final thoughts in Sec. VI.

II. KPO
The KPO is a bosonic system with a nonlinear effect called Kerr nonlinearity. Here we first describe a single KPO, and then explain a network of KPOs, which has been used for gate-type quantum computation or quantum annealing.
First, in a frame rotating at half the pump frequency of the parametric drive and in the rotating-wave approximation, the Hamiltonian of the single KPO is written as [33,37]

Ĥ = ∆â†â + χâ†²â² − p(â†² + â²) + r(â + â†),   (1)

where χ, ∆, p, and r are the Kerr nonlinearity, the detuning, the pump amplitude of the parametric drive, and the strength of the coherent drive, respectively. We can easily tune ∆, p, and r during the experiment by changing the parameters of the external driving fields. Although we can tune χ by changing the magnetic flux penetrating the superconducting loop of the KPO, the dynamic range is typically small, and therefore we assume that χ is fixed at a specific value.
The coherent state is defined by

|α⟩ = e^{−|α|²/2} Σ_{k=0}^{∞} (α^k/√(k!)) |k⟩,   (2)

where |k⟩ are the Fock states. In our method, the system is initially prepared in a coherent state. For a linear resonator, we can prepare the coherent state simply by adding the coherent driving term r(â + â†). However, due to the term χâ†²â² in Eq. (1), we cannot prepare the coherent state just by adding the coherent drive. Instead, we can prepare the coherent state by using the KPO as follows.
By setting p = r = 0, the Hamiltonian (1) becomes

Ĥ = ∆â†â + χâ†²â².   (3)

If ∆ > χ is satisfied, the ground state of this Hamiltonian is the vacuum state |0⟩. On the other hand, when ∆ and r are zero, Eq. (1) can be rewritten as

Ĥ = χ(â†² − p/χ)(â² − p/χ) − p²/χ,   (4)

whose degenerate ground states include the coherent states |±α⟩ with α = √(p/χ). Therefore, by starting from the vacuum state and adiabatically ramping up the pump amplitude p (with a small coherent drive r to lift the degeneracy), we can prepare the coherent state |α⟩.

A KPO network is described by the Hamiltonian

Ĥ = Σ_{j=1}^{K} [∆_j â†_j â_j + χ_j â†_j² â_j² − p_j(â†_j² + â_j²) + r_j(â_j + â†_j)] + Σ_{j<j′} J_{jj′}(â†_j â_{j′} + â_j â†_{j′}),   (5)

where K denotes the number of KPOs and J_{jj′} denotes the coupling strength between KPOs. Here, we assume that we fix the values of χ_j and J_{jj′} during the experiment, while we can control the values of ∆_j, p_j, and r_j.
If J_{jj′} is zero, we can independently perform the adiabatic state preparation described above and prepare the following product state:

|α_1⟩ ⊗ |α_2⟩ ⊗ · · · ⊗ |α_K⟩.   (6)

Here, each α_j is the eigenvalue of the annihilation operator â_j on the j-th KPO with the eigenstate |α_j⟩, i.e., â_j|α_j⟩ = α_j|α_j⟩.
It is worth mentioning that, even when J_{jj′} is nonzero, we can prepare the product of coherent states as follows. Let us assume that ∆_j, r_j, and J_{jj′} are much smaller than p_j and χ_j. In this case, the last terms of the Hamiltonian in Eq. (5) can be interpreted as a longitudinal-field Ising Hamiltonian in the coherent-state basis. If J_{jj′} is negative, we have a ferromagnetic Hamiltonian. Moreover, by setting J_{jj′} to be much smaller than r_j, the state in Eq. (6) becomes the ground state, and so we can prepare this state in an adiabatic way. Also, a coupling scheme of KPOs with high fidelity has already been proposed theoretically [36][37][38].

III. QUANTUM SUPERVISED MACHINE LEARNING AS A NISQ ALGORITHM
In this section, we review quantum supervised machine learning as a preparation for introducing our model. In a supervised learning task, a training set {(x_m, y_m)}_{m=1}^{N} is given. Here, each input datum x_m (output datum y_m) is a d_x-dimensional (d_y-dimensional) array. Suppose that there is a hidden relationship between an input x and the output y, written as y = f(x) with a function f. The objective of the task is to find this hidden relationship from the training data. More specifically, we define a model function f(x; θ) and optimize it so that it becomes close to f by using the training data.
In most quantum machine learning with near-term devices, we use a parameterized quantum circuit to construct the model function. More precisely, by tuning the parameters, we try to minimize a cost function. In usual cases, we choose the mean squared error as the cost function:

L(θ) = (1/N) Σ_{m=1}^{N} |f(x_m; θ) − y_m|²,   (7)

where N is the number of data sets, f(x; θ) is an array given as the output of the parameterized quantum circuit, and θ is the corresponding parameter.
Let us summarize such a quantum machine learning as follows.
1. Prepare an initial state |ψ⟩, and apply an input gate Û(x) to encode the input data x.
2. Apply a parameterized unitary V̂(θ) to the state.
3. Measure the expectation value of an observable M̂, and define the model function as f(x; θ) = ⟨M̂⟩.
4. By repeating the above three steps, minimize the cost function L by iteratively tuning the parameter θ.
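As an illustrative minimal instance of steps 1-3 (a toy single-qubit model of our own construction, not the circuit used in this paper), the encoding, variational rotation, and measurement can be simulated with plain matrix algebra:

```python
import numpy as np

# Pauli matrices for a single qubit
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

def Ry(t):
    """Single-qubit rotation exp(-i t Y / 2)."""
    return np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * Y

def model(x, theta):
    """Steps 1-3: encode x, apply a parameterized gate, measure <Z>."""
    psi = np.array([1.0, 0.0], dtype=complex)   # initial state |0>
    psi = Ry(np.pi * x) @ psi                   # input gate U(x)
    psi = Ry(theta[0]) @ psi                    # parameterized gate V(theta)
    return np.real(np.vdot(psi, Z @ psi))       # f(x; theta) = <Z>

def cost(theta, xs, ys):
    """Step 4: mean squared error over the training set."""
    preds = np.array([model(x, theta) for x in xs])
    return np.mean((preds - ys) ** 2)
```

With θ = 0 this toy model outputs cos(πx), a single sinusoid, which is consistent with the limited expressibility of single-qubit rotations discussed next.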
The function f(x; θ) is represented as

f(x; θ) = ⟨ψ|Û†(x)V̂†(θ) M̂ V̂(θ)Û(x)|ψ⟩.   (8)

According to a previous study [20], we may not expect high expressibility from a parametrized quantum circuit using single-qubit rotations in the NISQ era. In fact, that study shows that only a sinusoidal curve can be obtained as Eq. (8) when using a single qubit and single-qubit rotations. If we want different functions as an output, we need to prepare more qubits or obtain outputs beyond Eq. (8) by adding another operation called data reuploading. However, neither increasing the number of qubits nor increasing the number of noisy gate operations is desirable for a NISQ algorithm.

IV. QUANTUM SUPERVISED MACHINE LEARNING WITH KPO
We introduce our method to use the KPO for supervised quantum machine learning. We begin by describing a simplified scenario with d_x = d_y = 1 using a single KPO. Next, we explain how to use the KPO network for supervised quantum machine learning with d_x = d_y = 1. Finally, we describe how to implement supervised quantum machine learning with d_x > 1 and/or d_y > 1 by using the KPO network.

Single KPO
In our paper, the initial state is set to be a coherent state. To upload the classical data, we could adopt Û(x) = e^{−iπxn̂}, where n̂ is the number operator defined by n̂ = â†â. However, if we use the KPO, it is difficult to realize in situ tunability of the nonlinearity χ. Assuming that we fix the value of χ during the experiment, we instead adopt the following operator to upload the classical data:

Û(x) = e^{−i(πx n̂ + χ̃ â†²â²)},   (9)

where we have

χ̃ = t_d χ,  πx = t_d ∆.   (10)

In the actual experiment, we can easily tune the time duration t_d and the detuning ∆. Throughout this paper, we fix the value of χ. Let us define a set of unitary operators

V̂_i(∆_i, p_i, r_i) = e^{−iĤ(∆_i, p_i, r_i)τ},   (11)

where Ĥ denotes the Hamiltonian of the KPO in Eq. (1) and τ denotes the evolution time under the Hamiltonian. By turning the parameters of the Hamiltonian on and off, we can construct the unitary operator

V̂(θ) = V̂_D(∆_D, p_D, r_D) · · · V̂_1(∆_1, p_1, r_1),   (12)

where D is the number of combinations of (∆_i, p_i, r_i).
Here, θ corresponds to the set of parameters {∆_i, p_i, r_i}_{i=1}^{D}. We choose M̂ = â + â† as the observable to be measured. Since a bosonic system has an infinite-dimensional Fock space, even a single KPO may have the ability to approximate the target function, while the previous approach required multiple qubits to represent the target function.
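The data-uploading unitary and Hamiltonian-dynamics layers described above can be simulated in a truncated Fock space with dense matrix exponentials. The sketch below uses the parameter values of Sec. V; the sign convention of the pump term in the layer Hamiltonian is our assumption, not taken from the paper:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

DIM, CHI, TD, TAU = 25, 0.1, 0.7, 0.7    # cutoff and values from Sec. V

a = np.diag(np.sqrt(np.arange(1, DIM)), k=1)   # annihilation operator
ad = a.conj().T
n = ad @ a                                     # number operator
kerr = ad @ ad @ a @ a                         # Kerr term a^dag^2 a^2
M = a + ad                                     # observable a + a^dag

def coherent(alpha):
    v = np.array([alpha**k / np.sqrt(factorial(k)) for k in range(DIM)],
                 dtype=complex)
    return v * np.exp(-abs(alpha)**2 / 2)

def U_enc(x):
    """Data uploading: detuning chosen so that t_d * Delta = pi * x."""
    return expm(-1j * (np.pi * x * n + TD * CHI * kerr))

def V_layer(delta, p, r):
    """One evolution under the KPO Hamiltonian for time tau
    (pump-term sign is an assumption)."""
    H = delta * n + CHI * kerr - p * (a @ a + ad @ ad) + r * (a + ad)
    return expm(-1j * TAU * H)

def f(x, thetas, alpha=1.0):
    """Model function: encode x, apply D layers, measure <a + a^dag>."""
    psi = U_enc(x) @ coherent(alpha)
    for delta, p, r in thetas:
        psi = V_layer(delta, p, r) @ psi
    return float(np.real(np.vdot(psi, M @ psi)))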
To minimize the cost function, we need to tune the parameter θ. For this purpose, we adopt a classical algorithm that determines how to update the parameters based on the measured expectation value of M̂.
Several types of classical algorithms can be used to update θ. One of them is the gradient descent method, which uses the gradient of the cost function. If we construct the unitary operator V̂(θ) from a sequence of parameterized gates, we can use the so-called parameter-shift rule [39,40] to determine the gradient. On the other hand, since we use Hamiltonian dynamics to realize the unitary operator V̂(θ), it is not straightforward to apply the parameter-shift rule. We could instead use numerical differentiation, changing θ in small increments and detecting the resulting small changes in the output f(x; θ). However, to detect such small changes, this method requires a large number of measurements.
If we cannot use a sufficient number of shots, we can adopt an optimization method such as Nelder-Mead or Powell, which does not use gradient information. Throughout this paper, we use the Nelder-Mead method [41] for our simulations.
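A gradient-free fit with SciPy's Nelder-Mead implementation might look as follows; the two-parameter cosine model here is our own stand-in for the measured expectation value, not the actual KPO dynamics:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 100)            # N = 100 random inputs
ys = 0.5 * np.cos(np.pi * xs)           # toy target (stand-in for f)

def model(x, theta):
    # Placeholder for the measured <M>; in the actual scheme this value
    # would come from running the KPO dynamics and measuring.
    return theta[0] * np.cos(np.pi * x + theta[1])

def cost(theta):
    return np.mean((model(xs, theta) - ys) ** 2)

res = minimize(cost, x0=[0.1, 0.3], method="Nelder-Mead",
               options={"maxiter": 7200, "xatol": 1e-4, "fatol": 1e-4})
```

Because Nelder-Mead only compares cost-function values, it needs no gradient estimate and is therefore less sensitive to shot noise than numerical differentiation.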
Our method to use the single KPO needs to access higher excited states in the Fock space, which may cause experimental difficulties.This problem may be circumvented by using the KPO network.

KPO network
Next, we consider the case of a KPO network. We prepare the product of coherent states (6) as the initial state. To upload the classical data, we apply the following operator on the j-th KPO:

Û_j(x) = e^{−i(πx n̂_j + χ̃_j â†_j² â_j²)},   (13)

where χ̃_j = t_d χ_j and πx = t_d ∆_j. We define a unitary operator with 3K parameters,

V̂(⃗∆, ⃗p, ⃗r) = e^{−iĤτ},   (14)

where Ĥ is given by Eq. (5). Here,

⃗∆ = (∆_1, . . . , ∆_K),  ⃗p = (p_1, . . . , p_K),  ⃗r = (r_1, . . . , r_K).   (15)

If we need more than 3K adjustable parameters, we can consider different combinations of ⃗∆, ⃗p, and ⃗r. Let us define a set of such combinations as

{(⃗∆_i, ⃗p_i, ⃗r_i)}_{i=1}^{D},   (16)

where D is the number of combinations. Thus, we can generate D different unitary operators based on Eq. (16).
When we sequentially implement these, the unitary operator is given as

V̂(θ) = V̂(⃗∆_D, ⃗p_D, ⃗r_D) · · · V̂(⃗∆_1, ⃗p_1, ⃗r_1).   (17)

Here, θ corresponds to the set of parameters {⃗∆_i, ⃗p_i, ⃗r_i}_{i=1}^{D}.
After applying V̂(θ) given by Eq. (17), we measure an observable M̂. For M̂, we can choose, for example, â_1 + â†_1.
B. d_x > 1 and/or d_y > 1 case

We describe our method to implement supervised quantum machine learning with d_x > 1 and/or d_y > 1 by using the KPO network. Let us assume K ≥ d_x. To upload the classical data, we use the unitary operator ∏_{j=1}^{d_x} Û_j(x_j). Subsequently, we apply V̂(θ) in Eq. (17), and measure a set of observables {M̂_k}_{k=1}^{d_y}. The expectation value of M̂_k corresponds to the k-th component of y. By repeating these steps, we update the parameter θ to minimize the cost function. In principle, we could use a single KPO with d_x > 1 and d_y > 1, and we discuss such an example in Appendix C.

C. Potential advantage to use KPOs
Even if we can use only a single KPO, the function obtained as Eq. (8) is expected to exhibit large expressibility. Similar to the previous study [20], we construct the Fourier spectrum of Eq. (8) in the case of a single KPO. When χ is negligibly small, we obtain

f(x; θ) = Σ_{n,m=0}^{∞} e^{−|α|²} ((α*)^n α^m/√(n! m!)) ⟨n|V̂†(θ) M̂ V̂(θ)|m⟩ e^{iπ(n−m)x},   (19)

which is a Fourier series in x. Importantly, there are high-frequency terms in this series, and the number of terms is infinite. This would improve the expressibility. A similar discussion has been made by Gan et al. in the context of a multi-mode photonic device, which supports our claim [24].
If we can provide an appropriate V̂(θ) and M̂, we could represent any function that can be represented by a Fourier series. Moreover, previous research shows that the Kerr nonlinearity can enhance the performance of a specific scheme of quantum machine learning [25], and so our method utilizing the Kerr nonlinearity might improve the expressibility.
On the other hand, if we use ordinary qubits, the number of high-frequency terms is limited by the finite number of qubits. This can limit the expressibility, as suggested in [20]. To improve the expressibility, we could increase the number of qubits [20] or the circuit depth. However, it is difficult to increase either on a NISQ device.
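This counting argument can be made concrete: the frequencies available in the model function are the pairwise differences of the eigenvalues of the encoding generator. The small sketch below compares the two cases; the specific generators (sum of Pauli-Z halves for qubits, number operator for the bosonic mode) are illustrative assumptions:

```python
import numpy as np
from itertools import product

def frequencies(eigvals):
    """Distinct eigenvalue differences of the encoding generator; these
    are the frequencies that can appear in the model function."""
    diffs = {round(a - b, 9) for a, b in product(eigvals, repeat=2)}
    return sorted(diffs)

# K = 6 qubits, each encoded via exp(-i x Z/2): the total generator's
# eigenvalues are sums of +-1/2 over the qubits.
K = 6
qubit_eigs = [sum(s) / 2 for s in product([-1, 1], repeat=K)]
print(len(frequencies(qubit_eigs)))   # 2K + 1 = 13 frequencies

# A single bosonic mode encoded via exp(-i pi x n), truncated at 25 levels.
fock_eigs = list(range(25))
print(len(frequencies(fock_eigs)))    # 2*25 - 1 = 49 frequencies
```

A single truncated mode already supports far more frequencies than six qubits, and the untruncated mode supports infinitely many, in line with the Fourier-series argument above.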

V. SIMULATIONS AND RESULTS
To evaluate the performance of our proposed method, we perform numerical simulations for d_x = d_y = 1 and compare the results of our method with those of the conventional one [39]. Specifically, we perform the fitting of f(x) = e^{−36x²} (Gaussian), |x|, and 0.4 sin(4πx) + 0.5 sin(6πx). Also, we perform the fitting of a square wave.

We create the training set as follows. We set N = 100. First, we randomly choose values between −1 and 1, and adopt these values as x_m. Next, for each x_m, we calculate f(x_m) by using the given function f and assign this value as y_m.
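The training-set construction just described can be sketched as follows (the function names and the fixed random seed are our own choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Target functions used in Sec. V (square wave omitted here)
targets = {
    "gaussian":  lambda x: np.exp(-36 * x**2),
    "abs":       lambda x: np.abs(x),
    "two_sines": lambda x: 0.4 * np.sin(4 * np.pi * x)
                           + 0.5 * np.sin(6 * np.pi * x),
}

def make_training_set(f, N=100):
    """Draw x_m uniformly from [-1, 1] and label with y_m = f(x_m)."""
    xs = rng.uniform(-1.0, 1.0, N)
    return xs, f(xs)

xs, ys = make_training_set(targets["gaussian"])
```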
For our method using a single KPO, we choose χ = 0.1, t_d = τ = 0.7, M̂ = â + â†, and D = 12. Also, we set the cutoff of the Hilbert-space dimension to 25.
For the conventional method [39], we set the depth D = 2, the number of qubits K = 6, the time step τ = 10, and M̂ = 2Z^{(1)}. The precise setup of the conventional method is given in Appendix B. Here, for a fair comparison, we set the number of parameters θ to 36, which is equal to that of our method.
We show the results of the fitting in Fig. 1. Our method approximates all functions better than the conventional method. In order to compare the expressibility more clearly, we define the Fourier transform F̃(ν) = ∫ F(x) e^{−2πiνx} dx for any function F(x), and we plot the absolute value of f̃(ν) in Fig. 2. As can be easily seen in (b) and (d), the results of our method contain more Fourier components than those of the conventional method. Also, we show the value of the cost function after optimization by our method, and compare it with the conventional method in Table I.

Next, let us discuss the case of the KPO network for d_x = d_y = 1. Here, we use χ_1 = χ_2 = 1, J_12 = −0.1, K = 2, and t_d = τ = 1. Also, by choosing D = 6, we set the total number of parameters to 36, which is equal to that of the single KPO.
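A numerical version of this Fourier analysis can be sketched in a few lines; restricting the integral to the data interval [−1, 1] and the chosen normalization are our assumptions:

```python
import numpy as np

def fourier_mag(F, nus, ngrid=2001):
    """|integral of F(x) exp(-2*pi*i*nu*x) dx| over x in [-1, 1],
    evaluated with a simple Riemann sum."""
    x = np.linspace(-1, 1, ngrid)
    dx = x[1] - x[0]
    Fx = F(x)
    return np.array([abs(np.sum(Fx * np.exp(-2j * np.pi * nu * x)) * dx)
                     for nu in nus])

# Example: 0.4 sin(4 pi x) + 0.5 sin(6 pi x) has peaks at nu = 2 and 3.
F = lambda x: 0.4 * np.sin(4 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)
mags = fourier_mag(F, np.arange(6))
```

The two sinusoids show up as isolated peaks of magnitude 0.4 and 0.5, so the same routine applied to the fitted model functions reveals which frequencies the optimization actually uses.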
We could choose M̂ = (â_1 + â†_1) ⊗ (â_2 + â†_2) for our numerical simulations. However, it is not straightforward to measure such a nonlocal observable with the KPO. Instead, we consider two observables, M̂_1 = â_1 + â†_1 and M̂_2 = â_2 + â†_2, and represent the function as f(x; θ) = ⟨M̂_1⟩⟨M̂_2⟩. We perform the fitting of the Gaussian and the square wave, which we used in the case of the single KPO. Finally, we set the Hilbert-space cutoff dimension of each KPO to 10.
We plot the results in Fig. 3, and compare the performance of our method using the KPO network with that using the single KPO. The cost functions after optimization for one KPO (two KPOs) are 1.016 × 10⁻⁴ (9.711 × 10⁻⁵) for the Gaussian e^{−36x²} and 1.344 × 10⁻² (2.119 × 10⁻²) for the square wave. The performance of our method using the KPO network is similar to that using the single KPO. However, the single KPO requires access to higher excited states than the KPO network does, and therefore we can avoid this experimental difficulty by using the KPO network.
Let us explain the runtime of our scheme. During our simulations, we employed a maximum iteration count of 7200, the default setting provided by scipy.optimize.minimize [42] when dealing with 36 variables. The optimization process terminates when the cost function either meets the predefined tolerance level (default value: 10⁻⁴) or when the maximum allowed number of iterations is reached. In either case, both the parameter set minimizing the cost function and the number of iterations are reported in Table II.
In the KPO cases, the number of iterations is equal to or less than that in the conventional cases. The most time-consuming part of the practical runtime of a superconducting circuit is the execution time of two-qubit gates. Importantly, the coupling strength between KPOs, as demonstrated in previous work [43], is approximately 10 MHz, which is similar to that of superconducting transmon qubits [44]. Consequently, these findings indicate that the runtime of our method using KPOs is comparable to that of the conventional approach using transmon qubits.
We show how our fitting results depend on the number of training data N in Fig. 4, and the variation of the number of iterations in Table III. For small N, our method seems to be susceptible to overfitting due to its inherently high expressibility. Fortunately, we can regulate this expressibility, and thereby reduce the impact of overfitting, by adjusting the photon number of the initial coherent state, as we show in Sec. V A.

From Eq. (19), we find a tendency that, as we increase (decrease) α, more (fewer) high-frequency terms are added. Therefore, we expect that we can control the expressibility by tuning the amplitude of the coherent state prepared as the initial state.
We confirmed this point by numerical simulations. In Fig. 6, we show the results for α = 1, 3, and 5, using supervised data generated from the two functions shown in Fig. 6 (the Gaussian and the square wave).

In machine learning, there is a trade-off between expressive power and overfitting: as we increase the expressibility, the problem of overfitting becomes more severe. In our method, we can tune the parameter α to choose the best point for the fitting.
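The mechanism behind this tunability can be illustrated by the photon-number (Poisson) distribution of the initial coherent state: larger α spreads the population over more Fock levels, and hence over more frequency terms in Eq. (19). A small sketch (the 99% threshold is our own choice):

```python
import numpy as np
from math import factorial

def photon_dist(alpha, dim=60):
    """Poisson photon-number distribution P(k) = e^{-|a|^2} |a|^{2k} / k!
    of the coherent state |alpha>, truncated at `dim` levels."""
    ks = np.arange(dim)
    return np.exp(-abs(alpha)**2) * np.array(
        [abs(alpha)**(2 * k) / factorial(k) for k in ks])

for alpha in (1, 3, 5):
    p = photon_dist(alpha)
    # number of Fock levels holding 99% of the population
    levels = int(np.searchsorted(np.cumsum(p), 0.99)) + 1
    print(alpha, levels)
```

The number of significantly populated Fock levels grows with α, which is why increasing α adds high-frequency terms and decreasing α suppresses them.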
To illustrate this concept, we performed numerical simulations in which we varied the photon number of the initial coherent state. As mentioned before, overfitting occurs in Fig. 4 for a small number of training data N. We apply our method of tuning the expressibility to this case. In Fig. 7, we present the results, highlighting that reducing the photon number of the initial coherent state effectively mitigates the impact of overfitting.

VI. CONCLUSIONS AND DISCUSSION
In conclusion, we propose to use the KPO for quantum supervised machine learning with variational quantum circuits. We numerically show that, although we use only a single KPO, the expressibility of our method is higher than that of the conventional method with six qubits. In our method, we can tune the amplitude of the initial coherent state, and we numerically show that the expressibility increases as we increase the amplitude.
In this paper, we provide a proof of concept using a regression problem as an example. Due to its expressive nature, our method could also offer advantages for other machine learning problems, including classification, generation, reinforcement learning, and sequential learning. Furthermore, the quantum kernel method [45] could be another promising application of our approach, as our methodology of encoding data into quantum states introduces new types of quantum kernels. Exploring these applications is a promising direction for future research.
In the NISQ era, it is crucial to implement algorithms with fewer resources, and our results using the KPO contribute to this resource reduction. The KPO network may also be used as a variant of a continuous-variable neural network [21]. There are many potential applications of the continuous degrees of freedom of the KPO. We hope that our research will help to expand the range of applications of the KPO.

Appendix A: Circuit model of the KPO

The circuit Hamiltonian of the KPO (Fig. 8) is written as

Ĥ = 4E_C n̂² − N_S E_J cos(φ̂/N_S),

where N_S, E_J, and E_C are the number of SQUIDs, the Josephson energy of a SQUID, and the charging energy of the resonator, respectively.
By using a time-dependent magnetic flux Φ(t) penetrating the SQUID loops, we can modulate the Josephson energy as E_J(t) = E_J + δE_J cos(ω_p t). For simplicity, we set the phase of the pump field to be zero, θ = 0.
By performing a Taylor expansion of cos(φ̂/N_S) and taking into account terms up to fourth order in φ̂/N_S, we approximate the Hamiltonian as

Ĥ ≈ 4E_C n̂² − N_S E_J + (E_J/(2N_S)) φ̂² − (E_J/(24N_S³)) φ̂⁴.

Here, we assume |⟨φ̂/N_S⟩| ≪ 1.

It is notable that φ̂ is defined on the interval [−π, π). We obtain the solution as follows.

FIG. 2. Plot of the absolute value of the Fourier transform of the function against the frequency. The blue dotted line denotes the function to be fitted. The orange (green) line denotes the output of our (the conventional) method after optimization. The functions are (a) e^{−36x²}, (b) |x|, (c) the square wave, and (d) 0.4 sin(4πx) + 0.5 sin(6πx), respectively.

FIG. 3. Demonstration results of our quantum machine learning for e^{−36x²} (a) and the square wave (b) for the 1-KPO and 2-KPO cases. Left: the teacher data and the training results. Right: the average photon number. (Since we evaluated ⟨â†â⟩, the result depends on the variable x.)

FIG. 6. Demonstration results of our quantum machine learning for e^{−36x²} (a) and the square wave (b) for the cases α = 1, 3, and 5. Left: the teacher data and the training results. Right: the Fourier spectrum of the training results.

FIG. 8. Circuit model of a KPO with N_S SQUIDs and a shunt capacitor C.

TABLE III. Variation of the number of iterations with the number of training data.