Introduction

Machine learning, as a subfield of artificial intelligence, focuses on developing algorithms and models capable of learning from data without explicit programming. This characteristic enables them to adapt and improve performance automatically as they gain exposure to more information. Consequently, they can identify patterns, extract meaningful insights, and make predictions or decisions.

In supervised learning, algorithms are trained using labeled datasets, where each pattern is associated with a known class label. This methodology allows the algorithm to learn the relationship between input features and corresponding labels. Supervised learning algorithms encompass multiple methods. For instance, in 1952, Fix and Hodges introduced the k-nearest neighbors (kNN) algorithm 1, which utilizes the distance between patterns to assign class labels. In the same decade, Rosenblatt 2 presented the perceptron, a fundamental neural network model that adjusts weights for classification tasks. Another significant development occurred in 1986 when Quinlan proposed the Iterative Dichotomiser 3 (ID3) algorithm, a model based on decision trees 3, which divides data into branches based on relevant features. Subsequently, in 1995, Vapnik and Chervonenkis presented support vector machines (SVM) 4, aiming to find optimal hyperplanes in a space to separate data from different classes. These algorithms found numerous applications across various fields, including computer vision, image, and speech recognition 5,6,7, natural language processing 8,9,10, recommendation systems 11,12, fraud detection 13,14,15, healthcare 16,17, finance 18,19,20, among others.

On the other hand, in recent years, quantum computing has emerged as a new attractive field that leverages quantum phenomena such as entanglement and superposition to efficiently solve complex mathematical problems that traditional computers may struggle with or find unfeasible to solve. This new paradigm has opened opportunities for developing algorithms that demonstrate superiority over classical computation in specific tasks, such as factorizing large numbers, simulating quantum systems, and optimizing complex systems 21,22,23,24. Naturally, quantum computing began to be applied to machine learning tasks, giving rise to the field now known as Quantum Machine Learning (QML). Thanks to quantum parallelism, tasks that were challenging for classical computers can be executed more efficiently on quantum computers, instilling optimism about the potential of QML 25.

To date, several quantum and quantum-classical hybrid versions of classical machine learning algorithms have been proposed, including quantum neural networks (QNN) 26,27,28,29, quantum associative memories (QAM) 30, quantum support vector machines (QSVM) 31,32, and the quantum k-nearest neighbors algorithm (QkNN) 25,33,34,35. Quantum approaches have demonstrated effectiveness in various applications such as image classification 26,36,37, text processing 38, medical applications 39,40,41,42,43, data mining 44 and financial applications 45. QkNN algorithms, in particular, use quantum circuits for distance calculations. This method can reduce time complexity from polynomial to logarithmic scales in some scenarios, making it feasible to analyze certain datasets where the classical kNN is computationally too demanding 44,46,47. However, as we will show, there is still room for improvement, particularly in reducing the number of qubits required by these algorithms to encode numerical features. This reduction can potentially make them easier to implement on NISQ computers.

In this paper, we explore the modification of two such QkNN models 25,33 to further optimize their efficiency and memory requirements. Instead of the originally used Hamming metric 48 with qubit-encoded features, this work proposes a quantum subroutine that computes the distance between patterns using amplitude-encoded features. To assess the performance of the modified algorithms, we conduct a thorough analysis using thirteen numerical datasets. Our objective is to evaluate the effectiveness and impact of these modified algorithms in classification tasks, offering a promising alternative to metric-based quantum algorithms.

This paper is organized as follows: Section 2 provides some basic definitions and descriptions of the subroutines and the quantum kNN versions proposed by Schuld and Quezada. Section 3 gives the details of the proposed metric and the corresponding adaptations of the quantum kNN algorithms incorporating the proposed subroutine. The results of the computational experiments and the comparison between the two algorithms are presented in Section 4. Lastly, Section 5 concludes the paper and discusses potential future work.

Background

Classical kNN

One common approach in supervised learning involves using distances to classify patterns. These distances, such as the Euclidean or the Hamming metrics, quantify the similarity or dissimilarity between patterns. The underlying assumption is that elements of the same class are more likely to exhibit similarities and be closer to each other in the feature space. By calculating the distances between patterns and using predefined decision boundaries or thresholds, an algorithm can assign a class to an unlabeled pattern.

One of the most popular metric-based algorithms is the kNN 1. Due to its simplicity, it is widely used on different datasets, including images and texts 49. Furthermore, its implementation can employ different classical, quantum, and hybrid (quantum-classical) approaches 50,51.

Consider a training set of binary n-dimensional patterns \(T=\{ (x^1,c^1),\dots ,(x^N,c^N) \}\), where \(x^j \in \{0,1\}^n\), and an unlabeled pattern \(x^k\) to be classified. The classical version of the kNN algorithm consists of the following steps:

  • The algorithm stores the training patterns and their corresponding class labels in the training phase.

  • Subsequently, in the classification phase, the algorithm calculates the distance between a given pattern to be classified, denoted as \(x^k\), and each pattern element of the training set T.

  • Lastly, the algorithm assigns the majority class among the k closest elements based on the computed distances. In the case of ties, a rule such as selecting the class with the smallest mean distance must be predefined.

The choice of the distance metric and the value of k can significantly affect the performance of the kNN algorithm. Different distance metrics may be more suitable for different data types and domains. Even though the kNN algorithm is simple and easy to implement, it can be computationally expensive, especially for large datasets, as it requires calculating the distance to all training samples for every prediction. The resulting complexity is thus O(nN), where n is the number of features (the dimension of the pattern), and N is the number of patterns in the training set.
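The classification phase can be summarized with the following minimal sketch; the Euclidean metric and the smallest-mean-distance tie-break are illustrative assumptions, not prescriptions of the algorithm.

```python
# Minimal classical kNN sketch (illustrative): Euclidean metric, majority vote,
# hypothetical tie-break by smallest mean distance among tied classes.
import numpy as np

def knn_classify(train_x, train_c, x_k, k=3):
    """Classify x_k by majority vote among its k nearest training patterns."""
    dists = np.linalg.norm(train_x - x_k, axis=1)      # O(nN) distance computations
    nearest = np.argsort(dists)[:k]                     # indices of the k closest patterns
    classes, counts = np.unique(train_c[nearest], return_counts=True)
    best = classes[counts == counts.max()]
    if len(best) == 1:
        return best[0]
    # Tie-break: smallest mean distance among the tied classes.
    means = [dists[nearest][train_c[nearest] == c].mean() for c in best]
    return best[int(np.argmin(means))]

# Example: two clusters in a two-feature space.
train_x = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
train_c = np.array([0, 0, 1, 1])
print(knn_classify(train_x, train_c, np.array([0.15, 0.15]), k=3))  # -> 0
```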

Schuld’s quantum version

The quantum kNN algorithm proposed by Schuld et al. 25 initializes the patterns in the training set into an equiprobable superposition

$$\begin{aligned} \mid \psi _0 \rangle = \frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \mid x^k;x^j;c^j;0 \rangle . \end{aligned}$$
(1)

The similarity between features is stored in the qubits associated with the training patterns \(\mid x^j \rangle\), and a special unitary operator \(U_f\) is used to encode the corresponding Hamming distance \(d_{h}\) in the amplitude of each element in the superposition, resulting in an output state of the form:

$$\begin{aligned} \mid \psi _f \rangle = \frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \cos \left( \beta ^j \right) \mid x^k;d^{j};c^j;0 \rangle +\frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \sin \left( \beta ^j \right) \mid x^k;d^{j};c^j;1 \rangle , \end{aligned}$$
(2)

where \(\beta ^j = \displaystyle \frac{\pi \cdot d_h(x^k;x^j)}{2n}\) and

$$\begin{aligned} d^{j}_{i} = \begin{cases} 1 &{} \text {if} \quad x^{k}_{i} = x^{j}_{i} ,\\ 0 &{} \text {if} \quad x^{k}_{i} \ne x^{j}_{i} . \end{cases} \end{aligned}$$
(3)

Figure 1 shows the circuit associated with Schuld’s algorithm.

Notice that the term where the last qubit is \(\mid 0 \rangle\) in Eq. (2) is the one where it is more likely to measure a class corresponding to one of the nearest neighbors. This is because the amplitude is \(\cos (\beta ^{j})\), and \(\beta ^{j}\) is proportional to the Hamming distance between patterns; if the neighbors are near, then \(\beta ^{j} \approx 0\) and thus \(\cos (\beta ^{j}) \approx 1\). Conversely, in the term where the last qubit is \(\mid 1 \rangle\), the sine amplitude favors the opposite outcome, namely measuring a class corresponding to one of the farthest neighbors.

Schuld et al. propose to run the algorithm up to t times, where t is a previously defined threshold such that \(t > k\). For each execution, the ancilla qubit is measured. If \(\mid 0 \rangle\) is obtained, the class is also measured; if \(\mid 1 \rangle\) is measured, the execution is discarded. This process is repeated until k neighbors are gathered or the threshold t is reached. The class that appears the most among the k (or fewer) candidates is selected. Analogous to the classical version, tie-breaking rules must be defined beforehand.
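This repeat-and-discard procedure can be sketched as follows; the stand-in sampler replacing the quantum circuit and the threshold \(t = 5k\) (the value later adopted in the Methodology) are illustrative assumptions.

```python
# Sketch of the repeat-and-discard post-processing (not the authors' code):
# runs ending with ancilla |1> are discarded, runs ending with |0> contribute
# one class candidate, and the loop stops after k candidates or t tries.
import numpy as np
from collections import Counter

def schuld_postprocess(sample_run, k=5, t=None):
    """sample_run() -> (ancilla_bit, class_label) for one circuit execution."""
    t = 5 * k if t is None else t
    candidates = []
    for _ in range(t):
        ancilla, label = sample_run()
        if ancilla == 0:
            candidates.append(label)
        if len(candidates) == k:
            break
    if not candidates:
        return None                      # threshold reached with no candidates
    return Counter(candidates).most_common(1)[0][0]

# Toy stand-in for the quantum circuit: P(ancilla=0) = 0.7 and, conditioned on
# ancilla=0, class 0 is the more probable (i.e., closer) class.
rng = np.random.default_rng(7)
def fake_run():
    anc = 0 if rng.random() < 0.7 else 1
    label = rng.choice([0, 1], p=[0.8, 0.2]) if anc == 0 else rng.choice([0, 1])
    return anc, label

print(schuld_postprocess(fake_run, k=5))   # most likely prints 0
```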

The probability of measuring the ancilla qubit at \(\mid 0 \rangle\) is given by

$$\begin{aligned} P_0 = \frac{1}{N} \sum \limits _{j=1}^{N} \cos ^2\left( \beta ^{j} \right) . \end{aligned}$$
(4)

So, the probability of obtaining a specified class c is given by

$$\begin{aligned} P(c) = \frac{1}{P_0 N} \sum \limits _{j\mid x^j\in c}^{N} \cos ^2\left( \beta ^{j} \right) . \end{aligned}$$
(5)

It is worth mentioning that this algorithm relies strongly on the Hamming distance, thus requiring classical data to be binarized and encoded in qubits. If the analyzed dataset has f features, each feature requires an average of \(\bar{\alpha }\) qubits to encode the corresponding numerical values, and c qubits are required to encode the class, then the algorithm necessitates at least

$$\begin{aligned} N^{\text {S}}_{\text {original}} = 2\bar{\alpha }f+c+1 \end{aligned}$$
(6)

qubits in order to be implemented (ignoring initialization).

Quezada’s quantum version

The quantum kNN algorithm proposed by Quezada et al. 33 is based on the (m, p) sorting algorithm, where m is the length of the array to be sorted and \(p \in \mathbb {N}\) is the number of times the Grover subroutine is applied. The initial state is prepared as follows:

$$\begin{aligned} \mid \psi _0 \rangle = \frac{1}{\sqrt{N}} \sum _{j=1}^{N} \mid c^{j};x^{k};x^{j} \rangle \otimes \mid T_x \rangle ^{\otimes (m-1)} \otimes \mid 0 \rangle , \end{aligned}$$
(7)

where \(\mid T_x \rangle = \displaystyle \frac{1}{\sqrt{N}} \sum _{j=1}^{N} \mid x^{j} \rangle .\)

As in Schuld’s version, the second step consists of computing the features’ similarities between the training patterns \(x^{j}\) and the unlabeled pattern \(x^{k}\) and storing them in the qubits associated with the training patterns. The (m, p) sorting algorithm is then applied to these qubits, which at this point of the algorithm are in the state \(\mid d^j\rangle ^{\otimes m}\otimes \mid 0 \rangle\), where \(d^{j}_{i}\) is as in Eq. (3).

The resulting final state is given by

$$\begin{aligned} \mid \psi _f \rangle = \frac{\cos [(2p+1)\theta ]}{\sqrt{\nu }} \sum \limits _{ \begin{matrix} {j_1},...,{j_m} \\ \text {No ord} \end{matrix} }^{N} \mid c^{j_{1}};x^k;d^{j_1}...d^{j_m};0 \rangle + \frac{\sin [(2p+1)\theta ]}{\sqrt{\mu }} \sum \limits _{ \begin{matrix} {j_1},...,{j_m} \\ \text {Ord} \end{matrix} }^{N} \mid c^{j_{1}};x^k;d^{j_1}...d^{j_m};1 \rangle , \end{aligned}$$
(8)

where

$$\begin{aligned} \mu&= \displaystyle \frac{N!}{m!(N-m)!}, \end{aligned}$$
(9)
$$\begin{aligned} \nu&= N^m - \mu ,\end{aligned}$$
(10)
$$\begin{aligned} \theta&= \arcsin \left( \displaystyle \sqrt{\frac{\mu }{N^m}}\right) , \end{aligned}$$
(11)

and “No ord”, “Ord” stand for “non-ordered” and “ordered”, respectively. This order is the one established by the (m, p) sorting algorithm, which tags those registers that do not respect \(d^{j_1}< \cdots < d^{j_m}\), where the relation < is based on the number of 1’s in each \(d^{j_i}\). Figure 1 shows the circuit associated with Quezada’s algorithm.

Figure 1. Left: Quantum circuit of Schuld’s algorithm. \(U_i\) represents the initialization phase, and \(U_f\) denotes the unitary operator that encodes the Hamming distance between patterns in the amplitude of the corresponding element in the superposition. Right: Quantum circuit of Quezada’s algorithm. \(U_i\) represents the initialization phase, \(U_{f_m}\) denotes the sorting algorithm and G(p) the p applications of the Grover subroutine.

The last step is simply measuring the class qubit \(\mid c^j \, \rangle\), and adding it as one of the k possible candidates. The probability of getting an arbitrary class c is given by

$$\begin{aligned} P(c) = \frac{\cos ^{2}[(2p+1)\theta ]}{\nu } \sum \limits _{x \in c} N_0(x) + \frac{\sin ^{2}[(2p+1)\theta ]}{\mu } \sum \limits _{x \in c} N_1(x), \end{aligned}$$
(12)

where,

$$\begin{aligned} N_0(x) = \begin{cases} N^{m-1} &{} \text {if} \quad x < m, \\ N^{m-1} - \frac{(x-1)!}{(m-1)!(x-m)!} &{} \text {if} \quad x \ge m, \end{cases} \end{aligned}$$
(13)
$$\begin{aligned} N_1(x) = \begin{cases} 0 &{} \text {if} \quad x < m, \\ \frac{(x-1)!}{(m-1)!(x-m)!} &{} \text {if} \quad x \ge m. \end{cases} \end{aligned}$$
(14)

Lastly, the whole process is repeated k times to obtain the k class candidates.

Under the assumptions that \(N \gg m\) and \(\arcsin {\frac{1}{\sqrt{m!}}} \approx \frac{1}{\sqrt{m!}}\), the relationship between m and p that optimizes the algorithm is given by

$$\begin{aligned} (2p+1)\sqrt{\frac{1}{m!}} \approx \frac{\pi }{2} (2 w + 1), \end{aligned}$$
(15)

where w is an integer. Recalling that \(p \in \mathbb {N}\) represents the number of times that the Grover subroutine is applied, the optimal value of p as a function of m is thus

$$\begin{aligned} p_{\text {opt}} \approx \frac{\pi }{4}\sqrt{m!}-\frac{1}{2}. \end{aligned}$$
(16)
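For reference, the following snippet evaluates Eq. (16) numerically; rounding to the nearest natural number is an assumption about how \(p\) would be chosen in practice.

```python
# Quick numerical check of Eq. (16) under the stated approximation
# arcsin(1/sqrt(m!)) ~ 1/sqrt(m!).
import math

for m in range(2, 7):
    p_opt = (math.pi / 4) * math.sqrt(math.factorial(m)) - 0.5
    print(f"m = {m}:  p_opt ~ {p_opt:.2f}  ->  p = {max(round(p_opt), 1)}")
# m = 2 gives p_opt ~ 0.61, close to the p = 0.5 value employed in the Results section.
```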

As in Schuld’s version, Quezada’s also uses the Hamming distance to compare patterns, and thus its implementation requires numerical data to be binarized. In this case, if the analyzed dataset has f features, each feature requires an average of \(\bar{\alpha }\) qubits to encode the corresponding numerical values, and c qubits are required to encode the class, then the algorithm necessitates at least

$$\begin{aligned} N^{\text {Q}}_{\text {original}} = (m+1)\bar{\alpha }f+c+1 \end{aligned}$$
(17)

qubits in order to be implemented (ignoring initialization).

QkNN algorithm with a non-binary similarity between features

Metric based on amplitude-encoded features

This section introduces a subroutine that calculates the similarity between features of distinct patterns, and subsequently, this information will be used to compute the distance between them.

When conducting real data analysis, it is necessary to perform preprocessing steps to prepare the data. For numerical data, normalization to the range [0, 1] is required, while categorical data needs to be encoded as binary numbers. In this context, let us consider a training set consisting of n-dimensional numerical patterns denoted as \(T=\left\{ \left( x^1,c^1\right) ,\dots ,\left( x^N,c^N\right) \right\}\), where \(x^{j} = \left( x^{j}_{1},\dots , x^{j}_{n} \right)\). Additionally, we have an unlabeled pattern \(x^k\) that needs to be classified.

Computing the Hamming distance, which is employed in both Schuld’s and Quezada’s QkNN algorithms, requires the utilization of CNOT gates to compare the qubit-encoded features of the test pattern \(x^{k}\) with those of the training patterns \(x^{j}\). The comparison information, termed as \(d^{j}\) in Eq. (3), is then stored in the qubits corresponding to the training patterns, effectively deleting the original data in the process. This can be clearly seen in Eqs. (2) and (8), where the qubits corresponding to the training patterns \(x^{j}\) have been replaced by \(d^{j}\).

Here, we present an alternative similarity measure that eliminates the need for binarizing numerical data. Furthermore, our proposed method reduces the number of qubits needed to implement the algorithms, as it only requires a single qubit per numerical feature. The computation of the proposed similarity between features involves applying a rotation around the y-axis on a single qubit initialized in \(\mid 0 \rangle\). This rotation employs the difference between the numerical values of the corresponding features as the angle of rotation. The resulting quantum state compares each feature of \(x^k\) with all the patterns from the training set:

$$\begin{aligned} \mid d^{j} \rangle = \bigotimes \limits _{i=1}^{n} \left[ \cos \left( \frac{\pi \lambda _i^j}{2} \right) \mid 0 \rangle + \sin \left( \frac{\pi \lambda _i^j}{2} \right) \mid 1 \rangle \right] \end{aligned}$$
(18)

with \(\lambda _i^j = x_i^j - x_i^k\), \(i \in \{1,2,\dots ,n\}\) and \(j \in \{1,2,\dots ,N\}\). Notice that Eq. (18) can also be written in the more convenient form

$$\begin{aligned} \mid d^j \rangle = \sum \limits _{g=0}^{2^n-1} \gamma _g^j \mid g \rangle , \end{aligned}$$
(19)

where

$$\begin{aligned} \gamma _g^j = \prod \limits _{i=1}^{n} \sin \left[ \frac{\pi }{2}\left( x_{i}^{j}-x_{i}^{k} + g_{i}\right) \right] , \end{aligned}$$
(20)

and \(g_{i}\) denotes the i-th binary digit of g.

The transition from \(\mid 0 \, \rangle ^{\otimes n}\) to \(\mid d^{j} \, \rangle\) can be done unitarily using a set of controlled rotations. These rotations need to have the corresponding angles encoded in them, and the controlling qubits must uniquely identify each pattern in the training set, similar to having an index register. Expressing the index register of \(\mid x^{j} \rangle\) as \(\mid j \rangle\), the unitary operator, which we will denote as \(U_{i}\), satisfies

$$\begin{aligned} U_{i} \mid j \rangle \otimes \mid 0 \rangle ^{\otimes n} = \mid j \rangle \otimes \bigotimes _{w=1}^{n} \left[ R_{y} \left( \frac{\pi }{2}\cdot x_{w}^{j} \right) \mid 0 \rangle \right] . \end{aligned}$$
(21)

Afterwards, the comparison with the features of \(x^{k}\) is performed by rotating the same n qubits in the opposite direction using the operator \(R_{y} \left[ -\frac{\pi }{2}\left( x_{w}^{k}\right) \right]\), which results in the state \(\mid d^{j} \, \rangle\) from Eq. (19).

Notice that the cosine term in Eq. (18) quantifies the similarity between the features: \(\left( x_{w}^{j}-x_{w}^{k} \right) \rightarrow 0\) indicates a high degree of similarity, resulting in a cosine value close to one. Conversely, if the corresponding features are significantly different, the sine term becomes dominant. Furthermore, if the patterns are binary, then \(\cos \left( \frac{\pi \lambda _{i}^{j}}{2}\right)\) is equivalent to the binary similarity outlined in Eq. (3).
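The single-qubit similarity can be verified with a short Qiskit sketch. Note a convention assumption: Qiskit’s \(R_y(\theta)\) produces \(\cos(\theta/2)\mid 0 \rangle + \sin(\theta/2)\mid 1 \rangle\), so the rotation angle \(\pi x\) is used below to reproduce the \(\cos(\pi \lambda /2)\) amplitude of Eq. (18).

```python
# Illustrative check of the single-qubit similarity of Eq. (18).
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def feature_similarity(x_j, x_k):
    """Amplitude of |0> encodes cos(pi*(x_j - x_k)/2) for one feature pair."""
    qc = QuantumCircuit(1)
    qc.ry(np.pi * x_j, 0)    # encode the training feature
    qc.ry(-np.pi * x_k, 0)   # compare with the unlabeled pattern's feature
    return Statevector(qc).data[0].real

for x_j, x_k in [(0.80, 0.80), (0.80, 0.60), (0.10, 0.90)]:
    lam = x_j - x_k
    print(f"lambda = {lam:+.2f}:  amplitude = {feature_similarity(x_j, x_k):.4f}"
          f"  (expected cos(pi*lambda/2) = {np.cos(np.pi * lam / 2):.4f})")
```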

Schuld’s QkNN modified algorithm

The algorithm implements the similarity measure discussed in the former subsection and modifies the QkNN algorithm proposed by Schuld et al. in the following way:

  • The initial step of the algorithm is preparing the pattern superposition

    $$\begin{aligned} \mid \psi _0 \rangle = \frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \mid j; c^j \rangle \otimes \mid 0 \rangle ^{\otimes n} \otimes \mid 0 \rangle , \end{aligned}$$
    (22)

    where N is the number of patterns in the training set, n is the number of features in each pattern, and \(\mid j \, \rangle\) is an index register for the pattern \(x^j\).

  • The next step involves applying the \(U_{i}\) operator described in Eq. (21), which rotates the n qubits \(\mid 0 \rangle ^{\otimes n}\) based on the n features associated with each pattern in the training dataset. To differentiate the features belonging to each specific pattern, these rotations must be controlled through the index register qubits \(\mid j \rangle\), which provide unique identification for each pattern in the training dataset. Subsequently, the comparison with the features of \(x^{k}\) is performed by rotating the same n qubits in the opposite direction using the operator \(R_{y} \left[ -\frac{\pi }{2}\left( x_{w}^{k}\right) \right]\). The resulting state is given by

    $$\begin{aligned} \mid \psi _1 \rangle = \frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \left[ \mid j; c^j \rangle \otimes \mid d^j \rangle \right] \otimes \mid 0 \rangle , \end{aligned}$$
    (23)

    where \(\mid d^j \rangle\) is described as in Eq. (19).

  • Apply a Hadamard gate to the ancilla qubit, resulting in

    $$\begin{aligned} \mid \psi _2 \rangle = \frac{1}{\sqrt{2N}} \sum \limits _{j=1}^{N} \left( \mid j; c^j ; d^j \rangle \right) \otimes \left( \mid 0 \rangle + \mid 1 \rangle \right) . \end{aligned}$$
    (24)
  • This step is similar to the one described in Schuld’s proposal to encode the Hamming distance in the amplitude of the corresponding states. Nevertheless, in this case, its purpose is to amplify the probability amplitude of the states with more \(\mid 0 \rangle\)’s, as these correspond to the training patterns closest to \(x^{k}\).

    Apply the unitary operator \(U_f = e^{-i\pi H / 2n}\) to qubits \(\mid d^j \rangle\) and the ancilla, where H is the operator that sums all the binary digits of \(\mid d^j \rangle\). The resulting state is thus

    $$\begin{aligned} \mid \psi _3 \rangle = \frac{1}{\sqrt{2N}} \sum \limits _{j=1}^{N} \mid j; c^j \rangle \otimes \left( \sum \limits _{g=0}^{2^n-1} \gamma _g^j \mid g;\phi _0 \rangle \right) , \end{aligned}$$
    (25)

    where

    $$\begin{aligned} \mid \phi _0 \rangle = e^{i\pi z_{g} / 2n} \mid 0 \rangle + e^{-i\pi z_{g} / 2n} \mid 1 \rangle , \end{aligned}$$
    (26)

    and \(z_g\) is the sum of all the binary digits of g.

  • Lastly, apply a Hadamard gate to the ancilla qubit:

    $$\begin{aligned} \mid \psi _4 \rangle = \frac{1}{\sqrt{N}} \sum \limits _{j=1}^{N} \sum \limits _{g=0}^{2^n-1} \gamma _g^j \mid j; c^j; g; \phi _1 \rangle , \end{aligned}$$
    (27)

    where

    $$\begin{aligned} \mid \phi _1 \rangle = \cos \left( \displaystyle \frac{\pi }{2n}z_g \right) \mid 0 \rangle + i \sin \left( \displaystyle \frac{\pi }{2n}z_g \right) \mid 1 \rangle . \end{aligned}$$
    (28)

    Similar to Schuld’s original algorithm, if the ancilla qubit is in state \(\mid 0 \rangle\), there is a high probability of measuring a state corresponding to a pattern close to \(x^k\). This occurs because the states corresponding to small values of \(z_{g}\), those with more digits equal to 0 in the binary representation of g, are the ones corresponding to the nearest neighbors.

    In this case, the probability of finding the ancilla qubit in the state \(\mid 0 \rangle\) is

    $$\begin{aligned} P_{0} = \frac{1}{N}\sum \limits _{j=1}^{N} \sum \limits _{g=0}^{2^n-1} \left[ \gamma _g^j \cos \left( \frac{\pi }{2n}z_g \right) \right] ^2. \end{aligned}$$
    (29)

    Meanwhile, the probability of measuring a specific class C is given by

    $$\begin{aligned} P(C) = \frac{1}{P_0N} \sum \limits _{j\mid x^j\in C}^{N} \sum \limits _{g=0}^{2^n-1} \left[ \gamma _g^j \cos \left( \frac{\pi }{2n}z_g \right) \right] ^{2}. \end{aligned}$$
    (30)

Analogously, as done in Schuld’s original proposal, the algorithm concludes in one of the two following ways. Firstly, if the ancilla qubit is found to be in \(\mid 1 \rangle\), the result is disregarded and counted as one of the t tryouts (t being the previously defined threshold). Alternatively, if the ancilla measurement yields \(\mid 0 \rangle\), the class information qubit is also measured, and the outcome is stored as a class candidate. This process continues until k neighbors are obtained or the threshold is reached. At the end, the class that appears the most among the k (or fewer) candidates is selected. Figure 2 shows the circuit associated with Schuld’s modified algorithm.

As previously mentioned, the main advantage of employing the proposed similarity measure lies in reducing required qubits. In contrast to binarized data, which necessitates multiple qubits for each feature, our approach only needs one qubit per feature. To implement this modified version, the number of required qubits (ignoring initialization) is given by

$$\begin{aligned} N^{\text {S}}_{\text {modified}} = f+c+1, \end{aligned}$$
(31)

where f is the number of features in the dataset and c represents the number of qubits required to encode the class. Regarding the algorithm’s complexity, the modification only impacts the initialization process, leading to an overall complexity of O(ntN), which aligns with the original algorithm’s complexity when initialization is considered.
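As an illustration, the following is a minimal Qiskit sketch (not the implementation used for the experiments below) of the modified circuit for a hypothetical toy dataset with two training patterns and two features. It assumes Qiskit’s convention \(R_y(\theta)\mid 0 \rangle = \cos(\theta/2)\mid 0 \rangle + \sin(\theta/2)\mid 1 \rangle\) and realizes the phase step of Eqs. (25)–(28) with single- and two-qubit phase gates.

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# Toy training set (features normalized to [0, 1]) and the unlabeled pattern.
train_x = np.array([[0.10, 0.20],   # pattern 0, class 0
                    [0.90, 0.80]])  # pattern 1, class 1
x_k = np.array([0.15, 0.25])        # expected to be classified as class 0

n = train_x.shape[1]                          # number of features
qc = QuantumCircuit(1 + 1 + n + 1, 2)         # index, class, n feature qubits, ancilla
idx, cls, feat, anc = 0, 1, [2, 3], 4

# Initialization: uniform superposition over the index register (Eq. 22),
# with the class qubit entangled to it (pattern 1 carries class 1).
qc.h(idx)
qc.cx(idx, cls)

# U_i: controlled rotations encoding the training features (Eq. 21).
for j, pattern in enumerate(train_x):
    if j == 0:
        qc.x(idx)                             # select index |0> as the control state
    for i, q in enumerate(feat):
        qc.cry(np.pi * pattern[i], idx, q)
    if j == 0:
        qc.x(idx)

# Comparison with the unlabeled pattern: rotations in the opposite direction (Eq. 23).
for i, q in enumerate(feat):
    qc.ry(-np.pi * x_k[i], q)

# Distance-to-amplitude step (Eqs. 24-28): phase +pi/(2n) per feature qubit in |1>
# on the ancilla-|0> branch and -pi/(2n) on the ancilla-|1> branch.
qc.h(anc)
for q in feat:
    qc.p(np.pi / (2 * n), q)
    qc.cp(-np.pi / n, q, anc)
qc.h(anc)

qc.measure(anc, 0)
qc.measure(cls, 1)

# Post-processing: keep shots with ancilla = 0 and majority-vote the class register.
sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=4096).result().get_counts()
votes = {0: 0, 1: 0}
for bitstring, shots in counts.items():
    class_bit, ancilla_bit = bitstring[0], bitstring[1]   # clbits are ordered right to left
    if ancilla_bit == '0':
        votes[int(class_bit)] += shots
print(votes)   # class 0 should dominate in this toy example
```

Keeping only the shots with the ancilla in \(\mid 0 \rangle\) and majority-voting the class register reproduces the post-processing described above; for this toy data, class 0 dominates the retained shots.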

Figure 2. Left: Quantum circuit of Schuld’s modified QkNN algorithm. \(U_i\) represents the initialization phase (including training data encoding), the \(R_y(-x^{k})\) gates implement the comparison rotations with the data of the unlabeled pattern, and \(U_f\) is the gate that implements the evolution of the sum-of-binary-digit Hamiltonian. Right: Quantum circuit of Quezada’s modified QkNN algorithm. \(U_i\) represents the initialization phase (including training data encoding), the \(R_y(-x^{k})\) gates implement the comparison rotations, and \(U_{f_m}\) followed by G(p) form the (m, p) subroutine.

Quezada’s QkNN modified algorithm

Implementing the similarity measure discussed in the subsection “Metric based on amplitude-encoded features” modifies the QkNN algorithm proposed by Quezada et al. in the following way:

  • First, the modified algorithm requires the initial state to be in the following superposition,

    $$\begin{aligned} \mid \psi _1 \rangle = \frac{1}{\sqrt{N^{m}}} \sum \limits _{j_{1}, \ldots , j_{m}} \mid c^{j_{1}}; j_{1}, \ldots , j_{m} \rangle \otimes \mid 0 \rangle ^{\otimes n m + 1}, \end{aligned}$$
    (32)

    where each \(j_{i}\) runs from 1 to N.

  • Apply the controlled rotations \(U_{i}\) defined in Eq. (21) to each of the m copies of \(\mid 0 \rangle ^{\otimes n}\), each one of them controlled by the corresponding register \(j_{i}\). Subsequently, implement the similarity between features by applying the rotations \(R_{y} \left[ \frac{\pi }{2}\left( -x_{w}^{k}\right) \right]\) in the opposite direction. The resulting state is given by

    $$\begin{aligned} \mid \psi _2 \rangle = \frac{1}{\sqrt{N^{m}}} \times \sum \limits _{j_{1}, \ldots , j_{m}} \mid c^{j_{1}}; j_{1}, \ldots , j_{m} ; d^{j_{1}}, \ldots ,d^{j_{m}} \rangle \otimes \mid 0 \rangle , \end{aligned}$$
    (33)

    where each \(\mid d^{j_{i}} \rangle\) is as in Eq. (19).

  • Next, apply \(U_{f_m}\), the (m, p) sorting algorithm, to sort the states and tag (via the ancilla qubit) those corresponding to the patterns closest to \(x^{k}\), followed by G(p), the p applications of Grover’s subroutine, which amplify the tagged states:

    $$\begin{aligned} \mid \psi _3 \rangle = \left[ \frac{\cos [(2p+1)\theta ]}{\sqrt{\nu }} \sum \limits _{J} \sum _{ G \, \text {No ord}} \! \Gamma _{G}^{J} \mid c^{j_{1}}; J; G; \! 0 \rangle \right. + \left. \frac{\sin [(2p+1)\theta ]}{\sqrt{\mu }} \sum \limits _{J} \sum _{ G\,\text {Ord}} \Gamma _{G}^{J}\mid c^{j_{1}}; J; G; \! 1 \rangle \right] . \end{aligned}$$
    (34)

    Here, in order to simplify the notation, the indexes J and G respectively denote the sets of indexes \(j_{1},\ldots ,j_{m}\) and \(g_{1},\ldots ,g_{m}\), such that

    $$\begin{aligned} \Gamma _{G}^{J} = \gamma _{g_1}^{j_1} \gamma _{g_2}^{j_2} \ldots \gamma _{g_m}^{j_m}, \end{aligned}$$
    (35)

    and

    $$\begin{aligned} \mu&= \sum \limits _{J} \sum _{ G \, \text {ord}} \! \mid \Gamma _{G}^{J} \mid ^2 , \end{aligned}$$
    (36)
    $$\begin{aligned} \nu&= \sum \limits _{J} \sum _{ G \, \text {No ord}} \! \mid \Gamma _{G}^{J} \mid ^2 ,\end{aligned}$$
    (37)
    $$\begin{aligned} \theta&= \arcsin \left( \displaystyle \sqrt{\frac{\mu }{N^m}}\right) . \end{aligned}$$
    (38)

    The probability of measuring a specific class C is thus given by

    $$\begin{aligned} P(C) = \sum \limits _{J\mid x^{j_{1}}\in C} \left[ \frac{\cos ^{2}[(2p+1)\theta ]}{\nu } \sum _{G \, \text {No ord}} \mid \Gamma _{G}^{J} \mid ^2 + \frac{\sin ^{2}[(2p+1)\theta ]}{\mu } \sum _{G \, \text {ord}} \mid \Gamma _{G}^{J} \mid ^2 \right] . \end{aligned}$$
    (39)
  • As in the original algorithm, we run the previous steps k times in order to obtain k class candidates. Figure 2 shows the circuit associated with Quezada’s modified algorithm.

Given that the probability distribution in Eq. (19) is structured so that the largest amplitude of the superposition aligns with the Hamming state defined by Eq. (3), we expect the optimal value of p to be similar to that of the non-modified version. However, the numerical representation of the features may induce slight variations depending on the dataset.

Like Schuld’s algorithm, the modification here only affects the initialization process, resulting in an overall complexity identical to the original algorithm when initialization is considered. However, the true advantage becomes evident in the memory requirement, as the modified version needs only one qubit per feature. Here, the number of required qubits, without considering initialization, is

$$\begin{aligned} N^{\text {Q}}_{\text {modified}} = mf+c+1, \end{aligned}$$
(40)

where f is the number of features in the dataset and c represents the number of qubits required to encode the class.
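For concreteness, the qubit counts of Eqs. (6), (17), (31) and (40) can be compared with a small helper; the feature count, per-feature resolution and class-qubit numbers below are illustrative, not taken from any of the benchmark datasets.

```python
# Qubit counts of Eqs. (6), (17), (31) and (40), initialization registers ignored.
def qubit_counts(f, alpha_bar, c, m=2):
    return {
        "Schuld original":  2 * alpha_bar * f + c + 1,        # Eq. (6)
        "Quezada original": (m + 1) * alpha_bar * f + c + 1,  # Eq. (17)
        "Schuld modified":  f + c + 1,                        # Eq. (31)
        "Quezada modified": m * f + c + 1,                    # Eq. (40)
    }

# Illustrative example: f = 4 features, 4 qubits per binarized feature, 2 class qubits.
for name, q in qubit_counts(f=4, alpha_bar=4, c=2, m=2).items():
    print(f"{name:>17}: {q} qubits")
# Schuld original: 35, Quezada original: 51, Schuld modified: 7, Quezada modified: 11.
```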

Results

Table 1 Datasets information: number of patterns, features, classes and imbalance ratio (IR).
Table 2 Probabilities of measuring the final qubit as zero and of failing to gather the required k candidates in Schuld’s algorithm.
Table 3 Accuracy comparison between preprocessing techniques on the classic kNN (\(k=1,15\)), Schuld’s original and Quezada’s original algorithms, using the Iris dataset and Hamming distance.

In this section, we conduct a performance comparison of the two QkNN algorithms, including both the original versions and the modified adaptations. The analysis utilizes a set of 13 numerical datasets: Iris 52, Cryotherapy 53, Seed 54, Raisin 55,56, Mine 57,58, Data Bank Authentication (DBA) 59 and Caesarian 60, which are balanced datasets, as well as Wine 61, Haberman 62, Transfusion 63, Immunotherapy 64,65, Balance scale 66, and Glass 67, which are imbalanced datasets. Detailed information regarding these datasets, including their imbalance ratio (IR), can be found in Table 1.

Methodology

All the data analysis here presented was performed using Python 3.12: the scikit-learn library to evaluate performance metrics and Qiskit to simulate algorithms and assess noise. The computations were carried out on a system with the following specifications: an Intel Core i7 10700K CPU at 3.80GHz, 48GB of RAM, and an Nvidia GeForce RTX 2060S GPU.

For the performance comparison, we employ two metrics: Accuracy and F1 score. Accuracy represents the proportion of correctly classified patterns out of the total predictions, and it is widely used for evaluating classification models. However, for imbalanced datasets, accuracy may not provide reliable results. In such cases, utilizing the F1 score, defined as the harmonic mean of precision and recall, is recommended. In addition, following the common practice in the performance analysis of quantum machine learning algorithms, we employ the Leave-One-Out method as the validation method. This deterministic approach ensures that no additional probabilistic factors impact the outcome, providing reliable and consistent results.
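The corresponding classical baseline pipeline (scikit-learn, Leave-One-Out, accuracy and F1) can be sketched as follows; the macro-averaged F1 and the scikit-learn copy of the Iris dataset are assumptions made for the example.

```python
# Classical kNN baseline with Leave-One-Out validation (illustrative sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))   # normalize to [0, 1]

for k in (1, 15):
    preds = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X[train_idx], y[train_idx])
        preds[test_idx] = clf.predict(X[test_idx])
    print(f"k = {k}: accuracy = {accuracy_score(y, preds):.3f}, "
          f"F1 (macro) = {f1_score(y, preds, average='macro'):.3f}")
```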

Data preprocessing is a crucial step in classification algorithms, as it can significantly enhance their performance. Here, various tests were conducted to analyze the performance of both QkNN algorithms, including normalization of numerical values and different types of binary encoding (Table 3). In the case of the original versions, we found that the best results were obtained using normalization and scaling of the datasets to integers, followed by binarization using the Gray code, in which consecutive integers differ by a single bit. On the other hand, for the modified algorithms, preprocessing consisted solely of normalizing the numerical data to values between 0 and 1. These preprocessing techniques were applied to all datasets to ensure a fair comparison of results, guaranteeing that the only factor influencing performance was the algorithms and their corresponding metric.
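A minimal sketch of the preprocessing used for the Hamming-based versions is given below; the 4-bit resolution is an illustrative choice rather than the one used in the experiments.

```python
# Min-max normalization, scaling to integers, and Gray-code binarization (sketch).
import numpy as np

def to_gray_bits(values, bits=4):
    """Normalize a feature column to [0, 1], scale to integers, and Gray-encode."""
    v = np.asarray(values, dtype=float)
    v = (v - v.min()) / (v.max() - v.min())          # normalization
    ints = np.round(v * (2**bits - 1)).astype(int)   # scaling to integers
    gray = ints ^ (ints >> 1)                        # binary-reflected Gray code
    return [format(g, f"0{bits}b") for g in gray]

print(to_gray_bits([5.1, 4.9, 7.0, 6.4]))   # ['0001', '0000', '1000', '1110']
```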

For Schuld’s algorithm, we set the threshold value at \(T=5k\). The resulting probability of measuring the last qubit as \(\mid 0 \rangle\) and the probability of not gathering the required k candidates at the algorithm’s conclusion (\(P(\lnot k)\)) are presented in Table 2 for each dataset. As for Quezada’s original algorithm, when \(m=2\), the optimal value of p is found to be \(p_{\text {opt}} = 0.5\) regardless of the dataset (as long as N is large enough). However, as we have stated before, in the modified version, the optimal parameter is expected to exhibit slight variations depending on the dataset. Consequently, we conduct our analyses using \(m=2\) in all cases and a range of p values, specifically \(p \in \{0.5, 1, \dots , 8\}\). It is important to note that the case \(p=0.5\) introduces a minor complication, as p is defined as a natural number. Nevertheless, from a strict mathematical perspective, \(p=0.5\) also defines a valid unitary operator, and thus we employ it for testing purposes.

The maximum accuracy and F1-score obtained for each dataset and version of QkNN are presented in Tables 4 and 5, respectively. In most cases, this value was obtained utilizing a theoretical approach, assigning to each pattern the class with the maximum probability, calculated using Eqs. (5),  (12),  (30) and  (39). However, for some datasets, the maximum performance was obtained through experimentation and for a finite value of k.

It is important to recognize that a practical implementation of these algorithms may produce different results. To analyze the practical behavior of the algorithms, we conducted a series of experiments with specific values for k, specifically \(k \in \{1, 15, 50\}\). Given the inherently probabilistic nature of quantum algorithms, we repeated these experiments 100 times for each value of k, resulting in an accuracy (or F1-score) distribution in each case. The findings are presented in Figs. 3, 4, 5, 6, 7, 8, and 9, illustrated as box-whisker plots for all datasets.

Table 4 Accuracy.
Table 5 F1 score.
Figure 3. Accuracy results for the Iris dataset (left) and F1 score for the Balance Scale dataset (right) using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 4. Accuracy results for the Cryotherapy (left) and Caesarian (right) datasets using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 5. Accuracy results for the Raisin (left) and DBA (right) datasets using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 6. F1 score results for the Transfusion (left) and Immunotherapy (right) datasets using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 7. Accuracy results for the Mine dataset (left) and F1 score for the Haberman dataset (right) using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 8. Accuracy results for the Seed dataset (left) and F1 score for the Wine dataset (right) using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\).

Figure 9. F1 score results for the Glass dataset (left) using the algorithms Schuld, Schuld-Mod, Quezada and Quezada-Mod with \(m = 2\), and accuracy results for the Iris dataset (right) using Quezada’s original and modified algorithms with \(m = 2\) and \(m = 3\).

Discussion

From Eqs. (6), (17), (31), and (40), it is evident that the reduction in the number of required qubits when using the modified algorithms can be significant in most real-life scenarios. Here, we analyze whether this benefit comes at the cost of performance.

The theoretical results in Tables 4 and 5 indicate that Schuld’s algorithm shows significant improvements in Cryotherapy and Balance Scale, but exhibits reduced performance in Iris and Raisin, while maintaining similar results in the remaining nine datasets. Conversely, Quezada’s algorithm demonstrates better performance in Caesarian, Haberman, and Immunotherapy, but performs worse in Iris, Balance Scale, and Glass, while maintaining similar results in the remaining seven datasets.

Regarding real-life performance, experiments on the Iris dataset (left panel of Fig. 3) show that both modified versions exhibit performance comparable to Schuld’s original algorithm for the analyzed values of k. In contrast, Quezada’s original algorithm consistently achieves the highest accuracy in all cases. However, it is important to note that the theoretical maximum accuracy of both original versions exceeds that of the modified versions.

The case of the Balance Scale dataset (right panel of Fig. 3) is particularly interesting. Here, for the studied values of k, the performance of the four algorithms is quite similar, with one exception: Schuld’s original version lags slightly behind when \(k=50\). Nevertheless, when we consider the theoretical maximums, Schuld’s modified version achieves the highest F1-score, followed by the original version. In contrast, the modification has an adverse effect on Quezada’s algorithm, resulting in Quezada’s modified version delivering the lowest performance among the four.

In the Cryotherapy dataset (left panel of Fig. 4), the observed behavior for finite values of k closely aligns with the behavior at the theoretical limit. In both cases, the performance of the modified algorithms falls between that of their original counterparts. Here, Schuld’s original algorithm consistently performs the worst, while Quezada’s original algorithm delivers the best results. Notably, the mean accuracy values for both modified versions are similar; however, the accuracy distribution for Quezada’s modified version exhibits a narrower spread across all finite cases.

The experiments on the Caesarian dataset (right panel of Fig. 4) exhibit a consistent performance trend across all four quantum algorithms, with performance steadily and slowly increasing as k increases. However, the results also show a highly dispersed distribution, suggesting significant overlap among classes in the feature space. Interestingly, although Caesarian’s imbalance ratio (IR) is not high enough to be considered an imbalanced dataset, the theoretical F1 score gives a significant advantage to Quezada’s modified version. Unfortunately, this advantage does not manifest for the analyzed values of k.

In the Raisin dataset (left panel of Fig. 5), all algorithms show a steadily increasing performance as k increases. However, they do so at different rates, which is crucial for real-life implementations, as larger k values result in longer processing times for quantum algorithms. In this case, Quezada’s original algorithm leads, reaching near-maximum performance at \(k=50\). In the DBA dataset (right panel of Fig. 5), a similar pattern is observed. However, in this case, both Quezada’s original and modified algorithms show a comparable rate of performance improvement, with the modified version demonstrating slight superiority for the analyzed values of k.

Experiments on both the Transfusion and Immunotherapy datasets reveal interestingly similar results (Fig. 6) for all algorithms, with all reaching their maximum theoretical performance at relatively low values of k. Furthermore, the performance distribution of all four algorithms converged to a single value, showing no advantage for any specific metric or approach. This behavior can be attributed to one class having a significantly higher probability of being assigned to every pattern (usually the majority class), which also explains why the classical algorithm achieves the same performance as its quantum counterparts for large values of k (\(k=50\) in this case).

Results on the Mine dataset (left panel of Fig. 7) show poor performance with all algorithms, consistent with the classical results. This suggests that the dataset is not well-suited for analysis using distance-based algorithms. Even so, the benefit of Quezada’s more complex approach, compared to Schuld’s, is evident, as its performance increases with k for both the original and modified versions, even on this challenging dataset.

Similar to the Transfusion and Immunotherapy datasets, the Haberman dataset (right panel of Fig. 7) also exhibits a performance distribution with low spread, even for \(k=1\), eventually converging to a single value by \(k = 50\). However, in this case, Quezada’s approach demonstrates its capabilities, as both the original and modified versions outperform Schuld’s algorithms, suggesting the metric itself was not the decisive factor in the analysis of this dataset.

In both the Seed and Wine datasets (Fig. 8), the performance of both versions of Schuld’s algorithm is strikingly deficient, even decreasing as k increases. In contrast, Quezada’s algorithm behaves as expected, whether in its modified or original form, with the mean accuracy values increasing as k grows. Both versions display similar behavior across all scenarios, with the original algorithm producing slightly higher mean accuracy values.

Lastly, in the case of the Glass dataset, for finite values of k, the performance of both versions of Schuld’s algorithm and the modified version of Quezada’s algorithm appears to stall without substantial improvement as k increases. This pattern remains consistent even when considering the theoretical maximum values, which closely resemble the mean F1-score obtained for \(k=1\). In this dataset, only Quezada’s original version behaves as expected by consistently enhancing its performance as k increases, ultimately achieving the highest overall performance among the algorithms.

One of the key differences between the classical and quantum versions of the kNN algorithm is the interpretation of the parameter k. While in the classical algorithm k represents the number of (strictly different) neighbors considered for class selection, the quantum algorithms behave as a probabilistic version of the classical 1NN algorithm, randomly picking one neighbor in each execution (weighted by the corresponding probability distribution). Hence, the performance of the quantum versions will increase with k if the maximum obtained probability corresponds to the correct class. Conversely, if this is not the case, performance will decrease with k. This feature explains the behavior observed in the analysis of some datasets, as Quezada’s algorithm, which sacrifices simplicity for performance, is more likely to obtain a maximum probability for the correct class. On the other hand, Schuld’s algorithm sacrifices performance for simplicity, making it more prone to obtain a maximum probability for the wrong class.

Throughout this analysis, for both versions of Quezada’s algorithm, we have used \(m=2\). The reason for this choice is that the computation time required for simulating the quantum algorithm on a classical computer significantly increases with higher values of m, particularly in the case of the modified version, as can be inferred from Eq. (39). In the right panel of Fig. 9, we compare the performance achieved with \(m=2\) and \(m=3\) on the Iris dataset. Here, we observe that increasing the value of m, and consequently the value of p, results in a steeper rise in performance as k increases. For each finite value of k the accuracy of both algorithms with \(m=3\) outperforms their counterparts with \(m=2\). Interestingly, this pattern holds even for the theoretical maximum, even though the performance improvement is less than 1.5%. Regarding the proposed metric, we observe that the modified versions perform similarly to their original counterparts for \(k=1\). However, as k increases, the performance of the original versions surpasses that of the modified versions, consistent with the pattern observed in the left panel of Fig. 3 for the Iris dataset.

Noise analysis

In order to assess the possible effect that noise would have on the performance of the modified algorithms compared to the original versions, we simulated a noisy implementation using Qiskit’s AerSimulator, introducing depolarization errors for one-, two-, and three-qubit gates.
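A sketch of such a noise model, assuming Qiskit Aer’s NoiseModel API and illustrative (not the actual) error rates, is given below.

```python
# Depolarizing noise model sketch; the error rates are placeholders, not the
# values used for Table 6.
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error

noise_model = NoiseModel()
noise_model.add_all_qubit_quantum_error(
    depolarizing_error(0.001, 1), ["ry", "h", "x", "p"])     # one-qubit gates
noise_model.add_all_qubit_quantum_error(
    depolarizing_error(0.01, 2), ["cx", "cry", "cp"])        # two-qubit gates
noise_model.add_all_qubit_quantum_error(
    depolarizing_error(0.03, 3), ["ccx"])                    # three-qubit gates

noisy_backend = AerSimulator(noise_model=noise_model)
# Running the same transpiled circuit on noisy_backend and on a noiseless
# AerSimulator() allows the accuracy comparison reported in Table 6.
```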

Table 6 Accuracy obtained from 2048 executions for each algorithm with and without noise.

For this purpose, a prototype dataset of four patterns, two features, and two classes was created. Using this toy-model dataset, the implementation of Schuld’s original algorithm required 10 qubits, while the modified version required only 6. Similarly, Quezada’s original algorithm (with \(m=2\)) utilized 14 qubits, whereas the modified version needed only 10, demonstrating the advantage of the modified versions in reducing the number of required qubits.

Table 6 shows the results obtained from 2048 executions. The noisy implementation of Schuld’s and Quezada’s original algorithms decreased the accuracy by \(8.18\%\) and \(15.26\%\), respectively. The corresponding reductions for Schuld’s and Quezada’s modified algorithms were \(16.94\%\) and \(9.39\%\), respectively.

These mixed results clearly indicate that, for Schuld’s algorithm, the modified version is more susceptible to depolarization than its original counterpart. In contrast, for Quezada’s algorithm, the modified version proved to be more resilient. In both cases, susceptibility to depolarization was observed during the initialization phase, where the noise induced by the QRAM impacted the performance of both modified algorithms. However, the results suggest that Quezada’s original initialization phase is more prone to noise-induced errors than Schuld’s.

Conclusions

In this work, we introduced a quantum similarity measure for patterns and integrated it into two quantum adaptations of the kNN algorithm. To evaluate the impact of this modification, we conducted benchmark tests on both the original and modified versions of these algorithms across 13 diverse datasets (Iris, Seeds, Raisin, Mine, Cryotherapy, Data Bank Authentication, Caesarian, Wine, Haberman, Transfusion, Immunotherapy, Balance Scale, and Glass). The main advantages of this implementation encompass the use of non-binarized numerical data and a reduced memory requirement when compared to the original versions, all while maintaining their complexity.

Both theoretical and real-life results show that both algorithms benefit from the proposed subroutine, achieving a considerable reduction in the number of required qubits, while maintaining a similar overall performance. It is expected that not all datasets will show improvement, as the “no-free-lunch” theorem 68 states that no classifier delivers good results for all datasets. The modified versions can thus be considered the first choice for analyzing datasets similar to those where they showed improved performance, and be regarded as a memory-efficient option for analyzing datasets where they did not show improvement.

This study highlights the dynamic nature of quantum machine learning and the need for adaptable quantum algorithms. The contrasting outcomes for Schuld’s and Quezada’s algorithms illustrate the intricate interplay between quantum techniques and the specific characteristics of datasets. These results reinforce the notion that quantum machine learning is an evolving discipline where choices must be made according to the distinctive requirements of each application. Future studies should focus on developing and optimizing quantum algorithms for various datasets, ensuring that quantum machine learning continues evolving as a powerful data analysis tool.