1 Introduction

Quantum computers have attracted much attention due to their potential impact on quantum chemistry (Aspuru-Guzik et al. 2005; Cao et al. 2019; McArdle et al. 2020; Armaos et al. 2020), machine learning (Schuld et al. 2015; Biamonte et al. 2017; Schuld and Killoran 2022), cryptography (Shor 1994, 1997; Lenstra 2000), search problems (Grover 1996), and so on. With advancements in quantum technology, commercially available quantum computers have become a reality. In principle, a fault-tolerant quantum computer could be realized with more than 10 million qubits at fidelities around 0.999 (Jones et al. 2012; Devitt et al. 2013; Gidney and Ekerå 2021). However, current devices offer on the order of 500 qubits or fewer, far fewer than required for fault-tolerant quantum computation. A more feasible scenario in the near future is the so-called noisy intermediate-scale quantum (NISQ) regime (Preskill 2018; Bharti et al. 2022).

Numerous quantum algorithms have been designed for execution on NISQ devices. Among these, variational quantum algorithms (VQAs) are considered some of the most promising applications for NISQ devices (Bharti et al. 2022; Endo et al. 2021). Specifically, quantum machine learning has emerged as an appealing use case for VQAs. As a NISQ algorithm, quantum machine learning has been predominantly investigated in the context of qubit-based systems. Recent studies have shown that data reuploading, the process of repeatedly encoding classical data into quantum circuits, is essential for achieving expressive quantum machine learning models within conventional quantum computing frameworks (Pérez-Salinas et al. 2020; Gil Vidal and Theis 2020; Schuld et al. 2021). However, data reuploading often demands substantial quantum resources, which encourages us to seek alternative approaches to expressive quantum machine learning. One option is a photonic device, where Fock states are available and the quantum resources needed for data embedding can differ from those of the conventional qubit-based approach (Killoran et al. 2019; Steinbrecher et al. 2019; Volkoff 2021; Gan et al. 2020; Liu et al. 2023).

On the other hand, the Kerr parametric oscillator (KPO) is one of the candidates for realizing quantum computation (Milburn and Holmes 1991; Wielinga and Milburn 1993; Cochrane et al. 1999). The KPO is a parametric oscillator with a large Kerr nonlinearity, which can be used to generate cat states. A KPO can be realized with superconducting resonators containing Josephson junctions (Bourassa et al. 2012; Meaney et al. 2014). The KPO is a candidate for both gate-type quantum computation (Cochrane et al. 1999; Goto 2016a; Puri et al. 2017) and quantum annealing (Goto 2016b; Puri et al. 2017), and the KPO qubit has been realized experimentally (Grimm et al. 2020). The KPO qubit is known to be highly tolerant to bit-flip errors, a property that can be exploited to reduce the overhead of fault-tolerant quantum computation (Puri et al. 2017; Masuda et al. 2022).

In this paper, we propose using the KPO for supervised machine learning with a variational algorithm. The KPO is a bosonic system, so in principle we can exploit the infinitely large Hilbert space of even a single KPO. Also, unlike the conventional approach of using parametrized gates, we use natural Hamiltonian dynamics, changing the Hamiltonian parameters to implement the variational algorithm. We numerically compare the performance of our KPO-based method with that of the conventional method using qubits.

In our method, we start from a coherent state with amplitude \(\alpha \). Importantly, we numerically find that we can tune the expressibility by changing this amplitude. Since we encode the input classical data via the detuning of the KPO, increasing the amplitude of the coherent state introduces higher-frequency terms. We expect these high-frequency terms to improve the expressibility, and we confirm this point with numerical simulations. On the other hand, as the expressibility increases, overfitting occurs more often, so our method allows us to optimize the expressibility by tuning the amplitude of the coherent state.

This paper is organized as follows. In Section 2, we review the physics of single and multiple KPO systems; the latter is called a KPO network. In Section 3, we review a standard quantum supervised machine learning algorithm as a NISQ algorithm. In Section 4, we propose a supervised machine learning algorithm for KPOs based on these ideas. In Section 5, we describe the numerical simulations performed to validate our proposed method and present the results in detail. Finally, we conclude in Section 6.

2 KPO

The KPO is a bosonic system with a nonlinear effect called the Kerr nonlinearity. Here, we first describe a single KPO and then explain a network of KPOs, which has been used for gate-type quantum computation or quantum annealing.

First, in a frame rotating at half the pump frequency of the parametric drive and in the rotating wave approximation, the Hamiltonian of the single KPO is written as (Goto 2016b, 2019)

$$\begin{aligned} \hat{H}&=\chi \hat{a}^{\dagger 2}\hat{a}^{2} + \Delta \hat{a}^{\dagger } \hat{a}\nonumber \\&\quad - p(\hat{a}^{2} + \hat{a}^{\dagger 2}) + r(\hat{a}+\hat{a}^{\dagger }), \end{aligned}$$
(1)

where \(\chi \), \(\Delta \), p, and r are the Kerr nonlinearity, the detuning, the pump amplitude of the parametric drive, and the strength of the coherent drive, respectively.

We can easily tune \(\Delta \), p, and r during the experiment by changing the parameters of the external driving fields. Although we can tune \(\chi \) by changing the magnetic flux penetrating the superconducting loop of the KPO, the dynamic range is typically small; therefore, we assume that \(\chi \) is fixed at a specific value.

The coherent state is defined by

$$\begin{aligned} \vert {\alpha }\rangle = e^{-\frac{|\alpha |^{2}}{2}}\sum _{k=0}^{\infty } \frac{\alpha ^{k}}{\sqrt{k!}}\vert {k}\rangle , \end{aligned}$$
(2)

where \(\vert {k}\rangle \) are the Fock states. In our method, the system is initially prepared in a coherent state. For a linear resonator, we can prepare a coherent state simply by adding the coherent driving term \(r(\hat{a}+\hat{a}^{\dagger })\). However, due to the term \(\chi \hat{a}^{\dagger 2}\hat{a}^{2}\) in Eq. 1, we cannot prepare a coherent state with the coherent drive alone. Instead, we can prepare the coherent state by using the KPO as follows. By setting \(p=r=0\), the Hamiltonian of Eq. 1 becomes

$$\begin{aligned} \hat{H}&=\chi \hat{a}^{\dagger 2}\hat{a}^{2} + \Delta \hat{a}^{\dagger } \hat{a}. \end{aligned}$$
(3)

If \(\Delta >\chi \) is satisfied, the ground state of this Hamiltonian is the vacuum state \(\vert {0}\rangle \). On the other hand, when \(\Delta \) and r are zero, Eq. 1 can be rewritten as

$$\begin{aligned} \hat{H} =\chi \left( \hat{a}^{\dagger 2}-\frac{p}{\chi }\right) \left( \hat{a}^{2}-\frac{p}{\chi }\right) -\frac{p^{2}}{\chi }, \end{aligned}$$
(4)

and the ground state lies in the degenerate subspace spanned by the two coherent states \(\vert {\sqrt{p/\chi }}\rangle \) and \(\vert {-\sqrt{p/\chi }}\rangle \). By adding a coherent drive as a perturbation, we can lift the degeneracy, and the ground state becomes approximately \(\vert {\sqrt{p/\chi }}\rangle \) for a negative value of r. If we prepare a vacuum state under the Hamiltonian of Eq. 3, the system is in the ground state. By adiabatically changing the Hamiltonian from Eq. 3 to Eq. 4, we obtain the coherent state \(\vert {\sqrt{p/\chi }}\rangle \) due to the adiabatic theorem. This operation is frequently used in quantum annealing with KPOs (Goto 2016a, b, 2019; Puri et al. 2017).
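As a concrete illustration, the following QuTiP sketch prepares \(\vert {\sqrt{p/\chi }}\rangle \) by a piecewise-constant (Trotterized) ramp from Eq. 3 to Eq. 4. The Kerr value follows Section 5, while the final pump amplitude, the perturbative drive, and the ramp time are illustrative assumptions rather than values fixed by the discussion above.

```python
import numpy as np
import qutip as qt

N = 25                 # Fock-space cutoff (the value used in Section 5)
chi = 0.1              # Kerr nonlinearity (Section 5 value); assumed fixed
p_max = 0.4            # final pump amplitude (assumed), so alpha = sqrt(p/chi) = 2
r_pert = -0.01         # weak negative coherent drive that lifts the degeneracy
T, steps = 200.0, 400  # ramp time and Trotter steps (assumed slow enough to be adiabatic)

a = qt.destroy(N)
psi = qt.basis(N, 0)   # vacuum: ground state of Eq. 3 when Delta > chi

for s in range(steps):
    lam = (s + 0.5) / steps            # ramp parameter, 0 -> 1
    Delta = 2 * chi * (1 - lam)        # detuning: 2*chi -> 0
    p = p_max * lam                    # pump: 0 -> p_max
    r = r_pert * lam                   # perturbation: 0 -> r_pert
    H = (chi * a.dag()**2 * a**2 + Delta * a.dag() * a
         - p * (a**2 + a.dag()**2) + r * (a + a.dag()))    # Eq. 1
    psi = (-1j * (T / steps) * H).expm() * psi

target = qt.coherent(N, np.sqrt(p_max / chi))
print("fidelity:", abs(target.overlap(psi))**2)  # approaches 1 for a slow ramp
```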

Next, the Hamiltonian of a system of multiple KPOs, called a KPO network, is written as

$$\begin{aligned} \hat{H} =&\sum _{j=1}^{K} \chi _{j}\hat{a}^{\dagger 2}_{j}\hat{a}_{j}^{2} + \Delta _{j}\hat{a}^{\dagger }_{j}\hat{a}_{j}\nonumber \\&\qquad \quad -p_{j}(\hat{a}_{j}^{2}+\hat{a}^{\dagger 2}_{j})+r_{j}(\hat{a}_{j} + \hat{a}^{\dagger }_{j})\nonumber \\&\ +\sum _{j>j'}^{K} \left( J_{jj'}\hat{a}^{\dagger }_{j}\hat{a}_{j'}+J_{jj'}^{*}\hat{a}^{\dagger }_{j'}\hat{a}_{j}\right) , \end{aligned}$$
(5)

where K denotes the number of KPOs and \(J_{jj'}\) denotes the coupling strength between KPOs. Here, we assume that the values of \(\chi _{j}\) and \(J_{jj'}\) are fixed during the experiment, while we can control the values of \(\Delta _{j}\), \(p_{j}\), and \(r_{j}\).

If \(J_{jj'}\) is zero, we can independently perform the adiabatic state preparation described above and prepare the following state:

$$\begin{aligned} \bigotimes _{j=1}^{K} \vert {\alpha _{j}}\rangle . \end{aligned}$$
(6)

Here, each \(\vert {\alpha _{j}}\rangle \) is an eigenstate of the annihilation operator \(\hat{a}_{j}\) of the j-th KPO with eigenvalue \(\alpha _{j}\).

It is worth mentioning that, even when \(J_{jj'}\) is nonzero, we can prepare the product of coherent states as follows. Let us assume that \(\Delta _j\), \(r_j\), and \(J_{jj'}\) are much smaller than \(p_j\) and \(\chi _j\). In this case, the last term of the Hamiltonian in Eq. 5 can be interpreted as a longitudinal-field Ising Hamiltonian in the coherent-state basis. If \(J_{jj'}\) is negative, we have a ferromagnetic Hamiltonian. Moreover, by setting \(J_{jj'}\) to be much smaller than \(r_j\), the state in Eq. 6 becomes a ground state, so we can prepare this state adiabatically. Also, a coupling scheme for KPOs with high fidelity has already been proposed theoretically (Goto 2019; Masuda et al. 2022; Aoki et al. 2024).

3 Quantum supervised machine learning as a NISQ algorithm

In this section, let us review quantum supervised machine learning as a preparation for introducing our model. In a supervised learning task, a training set \(\{(\varvec{x}_{m}, \varvec{y}_{m})\}_{m=1}^N\) is given. Here, each input datum \(\varvec{x}_{m}\) (output datum \(\varvec{y}_{m}\)) is a \(d_{x}\) (\(d_{y}\)) dimensional array. Suppose that there is a hidden relationship between input data \(\varvec{x}\) and output data \(\varvec{y}\), written as \(\varvec{y} = \tilde{f}(\varvec{x})\) with a function \(\tilde{f}\). The objective of the task is to find the hidden relationship \(\tilde{f}\) from the training data. More specifically, we define a model function f and optimize it by using the training data so that it becomes close to \(\tilde{f}\).

In most quantum machine learning schemes for near-term devices, a parameterized quantum circuit is used to construct the model function. More precisely, we try to minimize a cost function by tuning the parameters. Typically, we choose the mean squared error

$$\begin{aligned} L(\varvec{\theta }) = \frac{1}{N}\sum _{m=1}^{N} \left| f(\varvec{x}_{m};\varvec{\theta })-\varvec{y}_{m}\right| ^{2}, \end{aligned}$$
(7)

for the cost function. Here, N is the number of data pairs, \(f(\varvec{x};\varvec{\theta })\) is an array given as the output of the parameterized quantum circuit, and \(\varvec{\theta }\) is the corresponding parameter vector. Such a quantum machine learning scheme can be summarized as follows.

  1. Prepare an initial state \(\vert {\psi }\rangle \), and apply an input gate \(\hat{U}(\varvec{x})\) to encode the input data \(\{\varvec{x}_i\}\).

  2. Apply a parameterized unitary \(\hat{V}(\varvec{\theta })\) to the state.

  3. Measure the expectation value of an observable \(\hat{M}\), and define the model function as \(f(\varvec{x};\varvec{\theta })= \langle \hat{M}\rangle \).

  4. By repeating the above three steps, minimize the cost function L by tuning the parameter \(\varvec{\theta }\) iteratively.

The function \(f(\varvec{x};\varvec{\theta })\) is then represented as

$$\begin{aligned} f(\varvec{x};\varvec{\theta })=\langle {\psi }\vert \hat{U}^{\dagger }(\varvec{x})\hat{V}^{\dagger }(\varvec{\theta })\hat{M}\hat{V}(\varvec{\theta })\hat{U}(\varvec{x})\vert {\psi }\rangle . \end{aligned}$$
(8)

According to a previous study (Schuld et al. 2021), we cannot expect high expressibility from a parametrized quantum circuit using single-qubit rotations in the NISQ era. In fact, that study shows that, with a single qubit and single-qubit rotations, Eq. 8 can only produce a sinusoidal curve; to obtain other functions as outputs, we need either more qubits or an additional operation called data reuploading. However, neither increasing the number of qubits nor increasing the number of noisy gate operations is desirable for a NISQ algorithm.
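To make this limitation concrete, consider a single qubit with the data encoding \(\hat{U}(x)=e^{-i\pi x \hat{Z}/2}\), an illustrative choice in the spirit of Schuld et al. (2021). Writing the initial state as \(\vert {\psi }\rangle = c_{0}\vert {0}\rangle + c_{1}\vert {1}\rangle \) in the \(\hat{Z}\) eigenbasis, Eq. 8 becomes

$$\begin{aligned} f(x;\varvec{\theta }) = \sum _{k,l=0}^{1} c_{k}^{*}c_{l}\langle {k}\vert \hat{V}^{\dagger }(\varvec{\theta })\hat{M}\hat{V}(\varvec{\theta })\vert {l}\rangle e^{i\pi x(l-k)}, \end{aligned}$$

so the accessible frequencies \(l-k\) are restricted to \(\{-1,0,1\}\): a constant plus a single sinusoid in \(\pi x\). Comparing this with Eq. 19 below, the KPO replaces this two-level sum with an infinite one.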

4 Quantum supervised machine learning with KPO

We now introduce our method of using the KPO for supervised quantum machine learning. We begin by describing a simplified scenario with \(d_x=d_y=1\) using a single KPO. Next, we explain how to use the KPO network for supervised quantum machine learning with \(d_x=d_y=1\). Finally, we describe how to implement supervised quantum machine learning with \(d_x>1\) and/or \(d_y>1\) by using the KPO network.

4.1 \(d_{x} = d_{y} =1\) case

4.1.1 Single KPO

In our method, the initial state is a coherent state. To upload the classical data, we could adopt \(\hat{U}(x) = e^{-i\pi x \hat{n}}\), where \(\hat{n}=\hat{a}^{\dagger }\hat{a}\) is the number operator. However, with the KPO it is difficult to realize in situ tunability of the nonlinearity \(\chi \). Assuming that \(\chi \) is fixed during the experiment, we instead adopt the following operator to upload the classical data:

$$\begin{aligned} \hat{U}(x) = e^{-i\tilde{\chi }\hat{n}^{2}-i\pi x \hat{n}}, \end{aligned}$$
(9)

where

$$\begin{aligned} \tilde{\chi }&=t_{d}\chi ,\end{aligned}$$
(10)
$$\begin{aligned} \pi x&= t_{d}(\Delta -\chi ). \end{aligned}$$
(11)

In an actual experiment, we can easily tune the time duration \(t_{d}\) and the detuning \(\Delta \). Throughout this paper, we fix the value of \(\tilde{\chi }\).
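In other words, Eqs. 10 and 11 give a direct recipe for programming the device: for a given data value x and a fixed pulse duration \(t_{d}\), the detuning to be set is

$$\begin{aligned} \Delta = \chi + \frac{\pi x}{t_{d}}. \end{aligned}$$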

Let us define a set of unitary operators \(\hat{V}_{i}(\Delta _{i}, p_{i}, r_{i})\) as

$$\begin{aligned} \hat{V}_{i}(\Delta _{i}, p_{i}, r_{i}) = e^{-i\tau \hat{H}}, \end{aligned}$$
(12)

where \(\hat{H}\) denotes the KPO Hamiltonian of Eq. 1 with the parameters \((\Delta _{i}, p_{i}, r_{i})\), and \(\tau \) denotes the evolution time under this Hamiltonian. By switching the parameters of the Hamiltonian, we can construct a unitary operator

$$\begin{aligned} \hat{V}(\varvec{\theta })=\prod _{i=1}^{D} \hat{V}_{i}(\Delta _{i}, p_{i}, r_{i}), \end{aligned}$$
(13)

where D is the number of combinations of \((\Delta _{i}, p_{i}, r_{i})\). Here, \(\varvec{\theta }\) corresponds to the set of parameters \(\{\Delta _{i}, p_{i}, r_{i}\}_{i=1}^{D}\). For simplicity, we define

$$\begin{aligned} \theta _{k} := {\left\{ \begin{array}{ll} \Delta _{i} &{} k = 3 i -2,\\ p_{i} &{} k = 3 i - 1,\\ r_{i} &{} k = 3 i, \end{array}\right. } \end{aligned}$$
(14)

with \(i = 1, \dots , D\). We choose \(\hat{M}=\hat{a}+\hat{a}^{\dagger }\) as the observable to be measured. Since a bosonic system has an infinite-dimensional Fock space, even a single KPO may have the ability to approximate the target function, whereas the previous approach required multiple qubits to represent it.
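A minimal QuTiP sketch of this single-KPO model function is shown below. The parameter values follow Section 5 (\(\chi =0.1\), \(t_{d}=\tau =0.7\), \(D=12\), Fock cutoff 25); the initial amplitude \(\alpha =1\) and the ordering convention of the product in Eq. 13 are our assumptions.

```python
import numpy as np
import qutip as qt

N, D = 25, 12                 # Fock cutoff and number of layers (Section 5)
chi, td, tau = 0.1, 0.7, 0.7  # Kerr nonlinearity and durations (Section 5)
a = qt.destroy(N)
n = a.dag() * a
M = a + a.dag()               # observable M = a + a^dagger
psi0 = qt.coherent(N, 1.0)    # initial coherent state; alpha = 1 assumed

def U_encode(x):
    """Data-uploading unitary of Eq. 9, with chi~ = td * chi (Eq. 10)."""
    return (-1j * (td * chi * n**2 + np.pi * x * n)).expm()

def V(theta):
    """Variational unitary of Eq. 13: D layers of e^{-i tau H} (Eq. 12)."""
    V_tot = qt.qeye(N)
    for Delta, p, r in theta.reshape(D, 3):      # theta packs (Delta_i, p_i, r_i), Eq. 14
        H = (chi * a.dag()**2 * a**2 + Delta * n
             - p * (a**2 + a.dag()**2) + r * (a + a.dag()))   # Eq. 1
        V_tot = (-1j * tau * H).expm() * V_tot   # layer ordering assumed
    return V_tot

def f(x, theta):
    """Model function of Eq. 8."""
    return qt.expect(M, V(theta) * U_encode(x) * psi0)
```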

To minimize the cost function, we need to tune the parameter \(\varvec{\theta }\). For this purpose, we adopt a classical algorithm that determines how to update the parameters based on the expectation value of \(\hat{M}\).

Several types of classical algorithms can be used to update \(\varvec{\theta }\). One of them is the gradient descent method, which uses the gradient of the cost function. If the unitary operator \(\hat{V}(\varvec{\theta })\) is constructed from a sequence of parameterized gates, the so-called parameter shift rule (Mitarai et al. 2018; Wierichs et al. 2022) can be used to determine the gradient. However, since we use Hamiltonian dynamics to realize the unitary operator \(\hat{V}(\varvec{\theta })\), it is not straightforward to apply the parameter shift rule. We could instead use numerical differentiation, changing \(\varvec{\theta }\) in small increments and detecting the resulting small changes in the output \(f(x;\varvec{\theta })\). However, detecting such small changes requires a large number of measurements.

If a sufficient number of shots is not available, we can adopt a gradient-free optimizer such as the Nelder-Mead or Powell method. Throughout this paper, we use the Nelder-Mead method (Nelder and Mead 1965) for our simulations.
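Continuing the sketch above, a minimal Nelder-Mead training loop then looks as follows. The training-set construction matches the setup later used in Section 5 (N = 100 inputs in [-1, 1], Gaussian target), while the initial-guess scale is an assumption.

```python
from scipy.optimize import minimize

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, 100)          # N = 100 inputs in [-1, 1] (Section 5)
ys = np.exp(-36 * xs**2)              # Gaussian target f~(x) = e^{-36 x^2}

def cost(theta):
    """Mean squared error of Eq. 7."""
    return np.mean([(f(x, theta) - y)**2 for x, y in zip(xs, ys)])

theta0 = rng.uniform(-1, 1, 3 * D)    # 36 parameters; initial scale assumed
res = minimize(cost, theta0, method="Nelder-Mead",
               options={"maxiter": 7200})   # SciPy default 200 * 36 (see Section 5)
print(res.fun, res.nit)               # final cost and iteration count
```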

Our method using a single KPO needs to access highly excited states in the Fock space, which may cause experimental difficulties. This problem may be circumvented by using a KPO network.

4.1.2 KPO network

Next, we consider the case of a KPO network. We prepare the product of coherent states (6) as the initial state. To upload the classical data, we apply the following operator to the j-th KPO:

$$\begin{aligned} \hat{U}_{j}(x) = e^{-i\tilde{\chi }_{j}\hat{n}_{j}^{2}-i\pi x \hat{n}_{j}}, \end{aligned}$$
(15)

where \(\tilde{\chi }_j=t_{d}\chi _j\) and \(\pi x= t_{d}(\Delta _j-\chi _j)\), in analogy with Eqs. 10 and 11. We define a unitary operator with 3K parameters,

$$\begin{aligned} \hat{V}(\mathbf {\Delta }, \textbf{p}, \textbf{r}) = e^{-i t_{d} \hat{H}}, \end{aligned}$$
(16)

where \(\hat{H}\) is given by Eq. 5. Here, \(\mathbf {\Delta }=(\Delta _1, \Delta _2, \cdots , \Delta _K)\), \(\textbf{p}=(p_1, p_2,\cdots ,p_{K})\) and \(\textbf{r}=(r_1,r_2,\cdots , r_K)\) are K dimensional arrays.

If we need more than 3K adjustable parameters, we can consider different combinations of \(\mathbf {\Delta }\), \(\textbf{p}\), and \(\textbf{r}\). Let us define a set of such combinations as \(\{\mathbf {\Delta }_{i}, \textbf{p}_{i}, \textbf{r}_{i}\}_{i=1}^{D}\), where D is the number of combinations. We can thus generate D different unitary operators based on Eq. 16. Implementing these sequentially, the total unitary operator is given as

$$\begin{aligned} \hat{V}(\varvec{\theta }) = \prod _{i=1}^{D}\hat{V}_{i}(\mathbf {\Delta }_{i},\textbf{p}_{i},\textbf{r}_{i}). \end{aligned}$$
(17)

Here, \(\varvec{\theta }\) corresponds to the following set of parameters:

$$\begin{aligned}&(\mathbf {\Delta }_{1},\textbf{p}_{1},\textbf{r}_{1},...,\mathbf {\Delta }_{D},\textbf{p}_{D},\textbf{r}_{D})\nonumber \\&\quad = (\Delta _{11},\Delta _{12},...,\Delta _{1K},p_{11},p_{12},...,p_{1K},...,r_{DK}). \nonumber \end{aligned}$$

After applying \(\hat{V}(\varvec{\theta })\) given by Eq. 17, we measure an observable \(\hat{M}\); for example, we can choose \(\hat{M}=\hat{a}_{1}+\hat{a}^{\dagger }_{1}\).
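A sketch of the corresponding two-KPO model function is given below, using the network values quoted later in Section 5 (\(\chi _1=\chi _2=1\), \(J_{12}=-0.1\), \(t_d=\tau =1\), \(D=6\), cutoff 10 per mode). Encoding the data on every mode and the initial amplitudes \(\alpha _j=1\) are our assumptions.

```python
import numpy as np
import qutip as qt

K, N, D = 2, 10, 6            # two KPOs, Fock cutoff 10 each, D = 6 (Section 5)
chi, J, td = 1.0, -0.1, 1.0   # chi_1 = chi_2 = 1, J_12 = -0.1, t_d = tau = 1
a = [qt.tensor(qt.destroy(N), qt.qeye(N)),
     qt.tensor(qt.qeye(N), qt.destroy(N))]
num = [aj.dag() * aj for aj in a]
M = a[0] + a[0].dag()         # local observable suggested in the text
psi0 = qt.tensor(qt.coherent(N, 1.0), qt.coherent(N, 1.0))  # Eq. 6; alpha_j = 1 assumed

def U_encode(x):
    """Eq. 15 applied to each mode (an assumption for the d_x = 1 case)."""
    H = (td * chi * num[0]**2 + np.pi * x * num[0]
         + td * chi * num[1]**2 + np.pi * x * num[1])
    return (-1j * H).expm()

def H_net(Delta, p, r):
    """KPO-network Hamiltonian of Eq. 5 for K = 2 with real J."""
    H = J * (a[0].dag() * a[1] + a[1].dag() * a[0])
    for j in range(K):
        H += (chi * a[j].dag()**2 * a[j]**2 + Delta[j] * num[j]
              - p[j] * (a[j]**2 + a[j].dag()**2) + r[j] * (a[j] + a[j].dag()))
    return H

def f(x, theta):
    """Model function: D layers of Eq. 16 composed as in Eq. 17, then <M>."""
    psi = U_encode(x) * psi0
    for Delta, p, r in theta.reshape(D, 3, K):   # 3K parameters per layer, 36 total
        psi = (-1j * td * H_net(Delta, p, r)).expm() * psi
    return qt.expect(M, psi)
```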

4.2 \(d_x>1\) and/or \(d_y>1\) case

We describe our method to implement supervised quantum machine learning with \(d_x>1\) and/or \(d_y>1\) by using the KPO network. Let us assume \(K\ge d_x\). For \(j=1,2,\cdots , d_x\), we define

$$\begin{aligned} \hat{U}_{j}(x_{j}) = e^{-i\tilde{\chi }_{j}\hat{n}_{j}^{2}-i\pi x_{j} \hat{n}_{j}}. \end{aligned}$$
(18)

To upload the classical data, we use a unitary operator of \(\prod _{j=1}^{d_x}\hat{U}_{j}(x_{j})\).

Subsequently, we apply \(\hat{V}(\varvec{\theta })\) in Eq. 17 and measure a set of observables \(\{\hat{M}_{k}\}_{k=1}^{d_{y}}\). The expectation value of \(\hat{M}_{k}\) corresponds to the k-th component of \(\varvec{y}\). By repeating these steps, we update the parameter \(\varvec{\theta }\) to minimize the cost function. In principle, we could use a single KPO with \(d_x>1\) and \(d_y>1\); we discuss such an example in Appendix C.

Fig. 1 Demonstration of quantum machine learning to represent functions. Blue dots indicate the teacher data. KPO (conventional) indicates the output of our (the conventional) method after optimization. We fit a \(e^{-36x^{2}}\), b |x|, c a square wave, and d \(0.4\sin (4\pi x)+0.5\sin (6\pi x)\)

Fig. 2 Absolute value of the Fourier transform of each function against the frequency. The blue dotted line denotes the function to be fitted. The orange (green) line denotes the output of our (the conventional) method after optimization. The functions are a \(e^{-36x^{2}}\), b |x|, c a square wave, and d \(0.4\sin (4\pi x)+0.5\sin (6\pi x)\)

4.3 Potential advantages of using KPOs

Even if we can use only a single KPO, the function obtained as Eq. 8 is expected to exhibit large expressibility. Similar to the previous study (Schuld et al. 2021), we construct the Fourier spectrum of Eq. 8 for a single KPO with \(d_x=d_y=1\).

When \(\chi \) is negligibly small, we obtain

$$\begin{aligned} f(x;\varvec{\theta })&= \langle {\alpha |e^{i\pi x \hat{n}}\hat{V}^{\dagger }(\varvec{\theta })\hat{M}\hat{V}(\varvec{\theta })e^{-i\pi x \hat{n}}|\alpha }\rangle \nonumber \\&=e^{-|\alpha |^{2}}\sum _{k, l=0}^{\infty } \langle {k|\hat{V}^{\dagger }(\varvec{\theta })\hat{M}\hat{V}(\varvec{\theta })|l}\rangle \frac{\alpha ^{l}\alpha ^{*k}}{\sqrt{k! l!}}e^{i\pi x (k-l)}, \end{aligned}$$
(19)

which is a Fourier series. Importantly, this form contains high-frequency terms, and the number of terms is infinite; this should improve the expressibility. A similar discussion was given by Gan et al. in the context of a multi-mode photonic device, which supports our claim (Gan et al. 2020).

If we can implement appropriate \(\hat{V}(\varvec{\theta })\) and \(\hat{M}\), we could represent any function that is representable by a Fourier series. Moreover, previous research shows that the Kerr nonlinearity can enhance the performance of a specific scheme of quantum machine learning (Liu et al. 2023), so our method of utilizing the Kerr nonlinearity might improve the expressibility.

On the other hand, if we use ordinary qubits, the number of high-frequency terms is limited by the finite number of qubits, which can limit the expressibility, as suggested in Schuld et al. (2021). To improve the expressibility, we could increase the number of qubits (Schuld et al. 2021) or the circuit depth. However, increasing either is difficult on a NISQ device.

5 Simulations and results

To evaluate the performance of our proposed method, we perform numerical simulations for \(d_x=d_y=1\) and compare the results of our method with those of the conventional one (Mitarai et al. 2018). Specifically, we fit \(\tilde{f}(x)=e^{-36x^{2}}\) (Gaussian), |x|, and \(0.4\sin (4\pi x)+0.5\sin (6\pi x)\). Also, we fit the square wave defined as

$$\begin{aligned} \tilde{f}(x) = {\left\{ \begin{array}{ll} 1 &{} (|x|<0.4) \\ 0 &{} (|x|\ge 0.4). \end{array}\right. } \end{aligned}$$
(20)

We create the training set as follows, with \(N=100\). First, we randomly choose values between \(-1\) and 1 and adopt them as \(x_m\). Next, for each \(x_m\), we calculate \(\tilde{f}(x_m)\) by using the given function \(\tilde{f}\) and assign this value to \(y_m\).

For our method using a single KPO, we choose \(\chi =0.1\), \(t_{d}=\tau =0.7\), \(\hat{M} = \hat{a} + \hat{a}^{\dagger }\), and \(D=12\). Also, we set the cutoff of the Hilbert-space dimension to 25.

For the conventional method (Mitarai et al. 2018), we set the depth \(D=2\), the number of qubits \(K=6\), the time step \(\tau =10\), and \(\hat{M}=2Z^{(1)}\). The precise setup of the conventional method is given in Appendix B. Here, for a fair comparison, we set the number of parameters \(\theta \) to 36, equal to that of our method.

We show the fitting results in Fig. 1. Our method approximates all functions better than the conventional method. To compare the expressibility more clearly, we define the Fourier transform

$$\begin{aligned} \hat{F}(\nu )= \frac{1}{\sqrt{2\pi }}\int _{-1}^{1} dx F(x)e^{-2\pi i\nu x}, \end{aligned}$$
(21)

for any function F(x), and we plot the absolute value of \(\hat{F}(\nu )\) in Fig. 2. As can easily be seen in (b) and (d), the results of our method contain more Fourier components than those of the conventional method.
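Numerically, Eq. 21 can be evaluated on a grid. A short sketch, reusing the model function f and the optimized parameters res.x from the training sketch in Section 4.1.1, reads:

```python
xs_grid = np.linspace(-1, 1, 401)
dx = xs_grid[1] - xs_grid[0]
fs = np.array([f(x, res.x) for x in xs_grid])   # trained model output

def fourier(nu):
    """Riemann-sum approximation of Eq. 21."""
    return np.sum(fs * np.exp(-2j * np.pi * nu * xs_grid)) * dx / np.sqrt(2 * np.pi)

spectrum = np.abs([fourier(nu) for nu in range(11)])  # compare with Fig. 2
```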

Also, in Table 1, we show the values of the cost function after optimization by our method and compare them with those of the conventional method.

Next, let us discuss the case of the KPO network for \(d_x=d_y=1\). Here, we use \(\chi _{1} = \chi _{2} = 1\), \(J_{12} = -0.1\), \(K=2\), and \(t_{d} = \tau = 1\). Also, by choosing \(D=6\), we set the total number of parameters to 36, equal to that of the single KPO.

We could choose \(\hat{M}= (\hat{a}_{1} + \hat{a}^{\dagger }_{1})\otimes (\hat{a}_{2} +\hat{a}^{\dagger }_{2}) \) for our numerical simulations. However, it is not straightforward to measure such a nonlocal observable with KPOs. Instead, we consider two observables \(\hat{M}_{1} = \hat{a}_{1} + \hat{a}^{\dagger }_{1}\) and \(\hat{M}_{2} = \hat{a}_{2} + \hat{a}^{\dagger }_{2}\), and we represent the function as \(f(x;\varvec{\theta })=\langle {\hat{M}_{1}}\rangle \langle {\hat{M}_{2}}\rangle \). We fit the Gaussian and the square wave used in the single-KPO case. Finally, we set the Hilbert-space cutoff dimension of each KPO to 10.

Table 1 Finally obtained values of the cost function
Fig. 3 Demonstration results of our quantum machine learning for \(e^{-36x^{2}}\) (a) and the square wave (b) for the cases of one and two KPOs. Left: the teacher data and the training results. Right: the average photon number \(\langle \hat{a}^{\dagger }\hat{a}\rangle \), which depends on the variable x

We plot the results in Fig. 3 and compare the performance of our method using the KPO network with that using the single KPO. The cost function after optimization for one KPO (two KPOs) is \(1.016\times 10^{-4}\) (\(9.711\times 10^{-5}\)) for the Gaussian \(e^{-36x^{2}}\) and \(1.344\times 10^{-2}\) (\(2.119\times 10^{-2}\)) for the square wave. The performance of the KPO network is similar to that of the single KPO. However, the single KPO requires access to more highly excited states than the KPO network does; therefore, the KPO network lets us avoid this experimental difficulty.

Let us explain the runtime of our scheme. During our simulations, we employed a maximum iteration count of 7200, the default setting provided by scipy.optimize.minimize (Virtanen et al. 2020) when dealing with 36 variables. The optimization terminates when the cost function meets the predefined tolerance (default, \(10^{-4}\)) or when the maximum number of iterations is reached. In either case, the optimizer returns the parameter set that minimizes the cost function; the numbers of iterations are listed in Table 2.

Table 2 Numbers of iterations
Fig. 4 Results of our quantum machine learning for the Gaussian \(e^{-36x^{2}}\) with \(N=10, 30, 300, 1000\) training data. When N is small, overfitting occurs

Table 3 Variation of the number of iterations with the number of training data
Fig. 5 Results of our quantum machine learning for the square wave with \(N=10, 30, 300, 1000\) training data. When N is small, overfitting occurs

Fig. 6 Demonstration results of our quantum machine learning for \(e^{-36x^{2}}\) (a) and the square wave (b) for \(\alpha =1, 3, 5\). Left: the teacher data and the training results. Right: the Fourier spectrum of the training results

In the KPO cases, the number of iterations is equal to or less than that in the conventional cases. The most time-consuming part of the practical runtime of a superconducting circuit is the execution time of two-qubit gates. Importantly, the coupling strength between KPOs, as demonstrated in previous work (Yamaji et al. 2022), is approximately \(10\,\textrm{MHz}\), similar to that of superconducting transmon qubits (Stehlik et al. 2021). Consequently, the runtime of our method using KPOs is comparable to that of the conventional approach using transmon qubits.

We show how our fitting results depend on the number of training data N in Figs. 4 and 5, and the corresponding iteration counts in Table 3. For small N, our method appears susceptible to overfitting due to its inherently high expressiveness. Fortunately, we can regulate this expressiveness, and thereby reduce the impact of overfitting, by adjusting the photon number of the initial coherent state, as we show in Section 5.1.

5.1 \(\alpha \) and expressive power

From Eq. 19, we find a tendency that, as we increase (decrease) \(\alpha \), more (fewer) high-frequency terms contribute. Therefore, we expect to be able to control the expressibility by tuning the amplitude of the initial coherent state.

We confirmed this point by numerical simulations. In Fig. 6, we show numerical simulations for \(\alpha = 1, 3,\) and 5 with supervised data generated from two functions, a Gaussian and a square wave. Only for \(\alpha = 5\) do we change the cutoff dimension of the Hilbert space from 25 to 100, because the average photon number is 25.

In machine learning, there is a trade-off between expressive power and overfitting: as we increase the expressibility, the problem of overfitting becomes more severe. In our method, we can tune the parameter \(\alpha \) to choose the best operating point for the fitting.

To illustrate this concept, we performed numerical simulations in which we varied the photon number of the initial coherent state. As mentioned above, in Fig. 4, overfitting occurs for a small number of training data N. We apply our expressibility-tuning method to this case. The results in Fig. 7 highlight that reducing the photon number of the initial coherent state effectively mitigates the impact of overfitting.

Fig. 7 Results of our quantum machine learning for the Gaussian (left) and the square wave (right) with \(N=10\), where we vary the photon number of the initial coherent state. Reducing the photon number successfully mitigates the impact of overfitting

6 Conclusions and discussion

In conclusion, we have proposed using the KPO for quantum supervised machine learning with variational quantum circuits. We numerically showed that, even with a single KPO, the expressibility of our method is higher than that of the conventional method with six qubits. In our method, we can tune the amplitude of the initial coherent state, and we numerically showed that the expressibility increases with the amplitude.

In this paper, we provided a proof of concept using a regression problem as an example. Owing to its expressive nature, our method may also offer advantages for other machine learning problems, including classification, generation, reinforcement learning, and sequential learning. Furthermore, the quantum kernel method (Havlíček et al. 2019) could be another promising application of our approach, as our methodology for encoding data into quantum states introduces new types of quantum kernels. Exploring these applications is a promising direction for future research.

In the NISQ era, it is crucial to implement algorithms with fewer resources, and our results on using the KPO will contribute to reducing resource requirements. The KPO network may also be used as a variant of a continuous-variable neural network (Killoran et al. 2019). There are many potential applications of the continuous degrees of freedom of the KPO, and we hope that our research will help to expand its range of applications.