Quantum density peak clustering

Clustering algorithms are of fundamental importance when dealing with large unstructured datasets and discovering new patterns and correlations therein, with applications ranging from scientific research to medical imaging and marketing analysis. In this work, we introduce a quantum version of the density peak clustering algorithm, built upon a quantum routine for minimum finding. We prove a quantum speedup for a decision version of density peak clustering depending on the structure of the dataset. Specifically, the speedup is dependent on the heights of the trees of the induced graph of nearest-highers, i.e. the graph of connections to the nearest elements with higher density. We discuss this condition, showing that our algorithm is particularly suitable for high-dimensional datasets. Finally, we benchmark our proposal with a toy problem on a real quantum device.


I. INTRODUCTION
Machine Learning (ML) [1,2] is a field with an exceptional cross-disciplinary breadth of applications and studies.The aim of ML is to develop computer algorithms that improve automatically through experience by learning from data, so as to identify distinctive patterns and make decisions with minimal human intervention.The applications of ML that are already possible today are extremely compelling and diverse [3][4][5], and still growing at a steady pace.However, the training and deployment of these models, involving an ever increasing amount of data, faces computational challenges [6] that are only partially met by the development of special purpose classical computing units such as GPUs.
This challenge posed by ML algorithms has led to a recent interest in applying quantum computing to machine learning tasks [7][8][9][10][11] that sparked the field of Quantum Machine Learning (QML).So far, there have been many proposals for different QML algorithms.Several of them [12][13][14] have shown the potential to accelerate ML tasks, but largely rely on heuristic methods, and proving an advantage in terms of computational complexity for these particular algorithms is generally hard.On the other hand, there have been a number of works that, by using quantum algorithms with known complexity as subroutines [15,16], could prove the existence of a quantum advantage in QML.Indeed, it has been conjectured that QML algorithms could provide the first breakthrough algorithms on near-term quantum devices, given the inherent robustness of these algorithms to noise and perturbations.
Most of the literature on QML has focused on supervised learning [17] problems.This particular subset of ML has a number of appealing characteristics as it is more immediate to implement and allows for feedback loops and minimization of well-kown and well-behaved cost functions.Unlike supervised learning, unsupervised learning is a much harder, and still largely unsolved, problem.And yet, it has the appealing potential to learn the hidden statistical correlations of large unlabeled datasets [18,19], which constitute the vast majority of data being available today.Amongst all unsupervised learning problems, clustering is one of the most popular.In clustering, we consider a dataset D composed of n elements, in which we are given a notion of distance between every two elements in the dataset x i , x j ∈ D, dist(x i , x j ). ( Interpreting this distance as a similarity measure, the (ill-defined) problem of clustering can be formulated as follows.
Clustering.Given a dataset D and a distance dist(•, •) between each pair of elements of D, partition D into sets called clusters, such that similar elements belong to the same cluster and dissimilar elements belong to distinct clusters.
Clustering algorithms thus need to separate unlabeled data into different classes (or clusters) without any external labeling and supervision.Solutions to the clustering problem based on quantum computing have been proposed for a long time, resorting to a plethora of different strategies [14,[20][21][22][23][24][25][26][27][28][29][30].For example, in Ref. [22] the data poins are treated arXiv:2207.10559v1[quant-ph] 21 Jul 2022 as interacting quantum walkers on a lattice; whereas in Ref. [27] quantum subroutines for distance estimation and matrix arithmetics are employed to develop an efficient quantum version of a classical standard algorithm, k-means clustering.In [14], the clustering problem was reformulated as an optimisation problem and solved by applying a hybrid optimisation algorithm on a Rigetti quantum processor, hinting to the possibility of realising such advantage on near term quantum devices.
In this work, we adopt the black-box/oracular model of clustering introduced in [20], in which the information concerning the distances between points in the dataset is available only through oracle queries.Under this framework, references [20,23], using variants of Grover's search [31], quantize typical subroutines of learning algorithms, such as finding the largest distance in a dataset, computing the median, or constructing the c-neighbourhood graph.These subroutines are then used to accelerate standard clustering methods, namely, divisive clustering, k-medians clustering and clustering via minimum spanning tree.Nevertheless, the state-of-the-art in clustering has been steadily evolving in recent years, and so the question arises: can we also use quantum computing to speedup modern clustering algorithms?
We consider a recent and very popular clustering method, usually referred to as density peak clustering (DPC) [32].In a nutshell, the idea is to attribute a "density" value to every element of the dataset based on their distances to all other elements, and then assign each element to the same cluster as the nearest neighbour that has a density greater than itself (referred to as its nearest-higher ).Relying on a variant of quantum search known as quantum minimum finding [33], we show that, for any element of the dataset, we can find its nearest-higher (up to boundederror probability)in time O n 3/2 , as opposed to the classical O n 2 complexity.Unfortunately, to fully solve DPC we would need to repeat this subroutine n times.This would provide no advantage as we can classically implement DPC in O n 2 time.
Motivated by this, in this work we consider a simpler variant of the clustering problem, which we refer to a decision clustering.
Decision Clustering.Given a dataset D, a distance dist(•, •) between each pair of elements of D, and two elements x i , x j ∈ D, decide if x i and x j are in the same cluster.
Evidently, any solution of the clustering problem contains the answer to decision clustering, but not the other way around.This problem is relevant in situations where one is interested in establishing connections between particular elements but does not need to know the cluster structure of the entire dataset.For example, we may want to know if two users of social media platform are friends, or if an individual fits a particular consumer segment.Moreover, this can also be useful when one has to decide to which cluster a new datapoint belongs without having to re-cluster all the data.Finally, decision clustering can be iteratively applied to cluster small subsets of the data.
Classically solving decision clustering with DPC has the same complexity as solving the full clustering problem, O n 2 .However, in the quantum setting we show that we can solve it in Õ n 3/2 H time, where H is the maximum height of all trees in the graph of nearest-highers (to be properly defined in Section III).The factor H is not known a priori, depending on the specific structure of the dataset.Nevertheless, we argue that for high-dimensional datasets H scales as O(n 1/d eff ), where d eff is a constant greater than 2, and confirm this hypothesis with numerical simulations.When this holds, quantum density peak decision clustering provides a speedup over its classical counterpart.
Finally, we benchmark our quantum algorithm using a toy problem on a real quantum device, the ibm-perth 7qubit quantum processor.As expected, the hardware errors severally mitigate any possible quantum advantage.Nevertheless, the noiseless simulations confirm the potential of our approach.
The article is structured as follows.Section II provides summary of background material that is necessary for understanding our quantum algorithm.Namely, we introduce the data model and review the quantum minimum finding algorithm.Section III explains the classical DPC algorithm, assuming that the reader is not yet familiar with it.Then, in section IV we present our quantum algorithm.The complexity of the algorithm depends on the H factor, and so in section V we study how H behaves for different datasets.In section VI we show the results of the implementation of our proposal on a real quantum device.Section VII concludes the article with a brief summary and discussion.

II. PRELIMINARIES A. Data model
In this article, we work in a black-box model.We assume that our knowledge about the data comes uniquely from querying an "oracle" that returns the distance between pairs of points (i, j) We make no assumptions about this distance besides that it is non-negative and symmetric.That is, it does not need to be a distance by a proper mathematical definition.Our complexity measure is the number of performed queries (this is known as query complexity).This model is an abstraction that reasonably fits a number of problems.For example, the oracle may represent accessing a database with the distances between a group of cities, or a routine that estimates the dissimilarity between pairs of images.The query complexity is a good estimate of the total time complexity whenever determining the distance between elements is the most computationally intensive part of the algorithm.Nevertheless, we also point out that this model may not be a good description of other common situations.For example, if we know the coordinates of a set of points in R d (for some integer d), we already have more structure than in the black-box model and there are geometrical methods that allow significant speedups for certain tasks.
In the quantum setting, we assume that we can query the distances between elements in quantum superposition.That is, we assume access to a unitary Q (the quantum oracle) such that where q is the number of bits necessary to store the distances up to desired accuracy.In particular, given a superposition ij α ij |i, j (for any set of normalized complex amplitudes {α ij } ij ), Q acts as The quantum query complexity is counted as the number of applications of the unitary Q (for more details on the quantum query model refer to [34]).When describing classical data, the oracle may be realized with a quantum random access memory (qRAM) architecture [35].In the end, our results are critically dependent on the existence of a qRAM with the above mentioned properties, representing a common setting in theoretical work on quantum algorithms, and in quantum machine learning in particular.Nevertheless, even though there have been proposals of physical architectures for implementing QRAM [36,37], there are still significant challenges to overcome before such a device can be practically realized [38].

B. Quantum minimum finding
The key quantum routine we use in our work is the well-known quantum minimum finding algorithm of Dürr and Høyer [33], which in itself is a specific application of the quantum search algorithm [31].
Begin by considering a boolean function F defined on a domain of size n, F : {0, 1, . . ., n − 1} → {0, 1}.Our goal is to find an element x such that F (x) = 1, assuming one exists.F is provided as a black box, that is, we can only gain information about the function by evaluating it on given elements.In the worst-case scenario, we may need to query all n possible inputs before succeeding.
Now suppose that we have access to a unitary O F that marks the 1-inputs with a −1 phase, The unitary O F is an oracle for F , and an application of O F is referred to as a quantum query to F as introduced in the section above.Lets now consider |Ψ to be the uniform superposition of all input states, If we measure |Ψ , we obtain every input with equal probability.Grover's algorithm [31] is based on the observation that we can amplify the probability of measuring 1-input states via the repeated application of the operator to an initial state |Ψ .Roughly, the amplitudes of the 1-input states grow linearly with each application of G F , while their measurement probabilities grow quadratically.This means that ∼ √ n quantum queries are sufficient to measure an input state with high probability.Boyer, Brassard, Høyer, and Tapp [39], generalizing Grover's algorithm [31], prove the following theorem.
Quantum minimum finding calls quantum search as a subroutine.Suppose that we want to find a minimizer of a black-box function f : {0, . . ., n − 1} → N.For any element i ∈ [n], define the boolean function which can be evaluated with two queries to f .Let O Fi be a unitary that evaluates F i , as in expression (6).The quantum minimum finding algorithm starts by choosing a threshold element i uniformly at random between 0 and n − 1. Employing quantum search with O Fi as oracle, we look an element j such that F (j) < F (i).We then repeat this process, updating j as the threshold element, until the probability that the selected threshold is the minimum of f is sufficiently large.With this algorithm, Dürr and Høyer [33] reach the following result.

III. DENSITY PEAK CLUSTERING
Since its introduction in 2014 [32], density peak clustering (DPC) has been widely studied and applied [40][41][42].This algorithm, albeit remaining quite simple in the concept and implementation, presents some interesting features that are absent from most of the simpler clustering algorithms discussed in the QML literature.As an example, it does not make assumptions on the number of clusters present in the data (unlike simpler algorithms such as k-means [43]), but it can infer this information from the data itself.Linked to this first property, DPC does not require any prior hypothesis on the shape of the dataset (i.e., gaussian distributed or symmetric), and works well with datasets of virtually any shape [32,44].Finally, DPC is able to detect outliers in the dataset, that is, elements that do not belong to any cluster.This last property opens the possibility of using DPC beyond the usual scope of clustering problems, such as anomaly detection [42,45].We will now introduce this algorithm from a mathematical point of view and have a deeper look at its implementation to understand its capabilities and its criticalities.
In density peak clustering, the first step is to compute the density ρ(x i ) of every element where χ is a convolutional kernel that can be optimized according to the specific application.Common choices include the step kernel χ(x) = Θ(x − d c ) or the Gaussian kernel χ(x) = exp −x 2 /d 2 c , for some normalization parameter d c .Next, for each element x i we find its "nearest-higher" h(x i ), defined as the closest element to x i with higher density than x i , The nearest-higher separation δ(x i ) is the distance between x i and its nearest-higher, Naturally, if the point in question is the one with highest density in the entire dataset then it does not have a nearest-higher.In that case, by convention, the nearest-higher separation is +∞.
The key observation at the basis of DPC is that, for the elements that are local maxima of the density, the nearesthigher separation is much larger than the typical nearest-neighbour distance.This idea is illustrated in Figure 1, where we consider a small dataset embedded in R 2 with the similarity measure provided by the Euclidean distance.In Figure 1b, by plotting the density versus the nearest-higher separation for all elements in the dataset, it becomes clear that the elements 7, 11, and 15 stand out from their neighbours.These three elements are classified as roots, and each root will originate its own cluster.The elements 0 and 16, despite also having large nearest-higher separations, also have low densities and so are classified as outliers.In more detail, for some choice of threshold ρ c and δ c (to be specified according to the typical scales of the dataset), we promote to roots the elements x i satisfying and demote to outliers those who obey Then, the rest of the elements are assigned to the same cluster as their nearest-higher.As evidenced in Figure 2c, the directed graph of nearest-highers (with edges originated from roots removed) forms a forest and each tree corresponds to a distinct cluster, the root nodes of the tree being the elements that were prometed to roots.The main steps of density peak clustering are summarized in Algorithm 1.The complexity of the algorithm becomes clear immediately from the first step.Indeed, in order to compute the density of a given element, we need to query the distance to every other element in the dataset.Repeating this for all elements in the dataset means that we end up querying all (n 2 − n)/2 distances.
Claim 1.The density peak clustering algorithm (Algorithm 1) has O(n 2 ) query complexity, where n is the size of the dataset.

Now consider a decision version of the clustering problem
, where the goal is, given two elements x i , x j ∈ D, to decide if they belong to the same cluster.Evidently, we could solve the full clustering problem and then verify if x i and x j were assigned to the same cluster.But in the context of DPC there is a more direct approach, relying on the following simple observation: the two elements belong to the same cluster if and only if the sequences of nearest-highers starting from x i and x j lead to the same root.In other words, the problem is reduced to finding the roots of the respective trees in the graph of nearest-highers.This can be solved with Algorithm 2, where we just walk up the trees node by node by computing the corresponding nearest-higher.We know that we have reached a root when the nearest-higher separation is larger than δ c .
Unfortunately, this approach comes with no significant complexity advantage as every step we take up the tree has essentially the same computational cost of the full clustering.To see this, just note that computing a nearest-higher of an element x i requires knowing how the density of all other element compares to ρ(x i ) (cf.Eq. ( 11)).So, it seems unavoidable to compute all the densities.Claim 2. The decision version of density peak clustering (Algorithm 2) has O(n 2 ) query complexity, where n is the size of the dataset.
In the next section, we show that we can circumvent this cost with quantum search, establishing a quantum advantage for the decision version of density peak clustering.

IV. QUANTUM ALGORITHM
We now have all the elements to introduce a quantum algorithm to solve the decision version of density peak clustering.The quantum algorithm will use the same strategy as in Algorithm 2, calling the quantum minimum finding routine (section II B) to determine the nearest-highers.
For each i ∈ {0, . . ., n − 1}, consider the function From the definition of nearest-higher (Eq.( 11)), it is clear that finding the minimizer of f i is equivalent to determining the nearest-higher of x i , For any element in the dataset, we can evaluate its density (Eq.( 10)) with n − 1 distance queries.So, we can compute f i (j), for any j ∈ {0, . . ., n − 1}, with O(n) queries.Since any classical computation can be simulated on a quantum computer [46], there exists a unitary U i that uses O(n) quantum queries and achieves the transformation for any j ∈ {0, . . ., n − 1}, where q is the number of qubits to store the distance up to the desired accuracy and we add an extra flag qubit to indicate if f i evaluates to +∞.Therefore, by Theorem 2, we can use the quantum FIG. 1. Density peak clustering.In Figure 1a we represent a dataset of 23 elements embedded in R 2 , with the similarity measure provided by the Euclidean distance.The roots are represented by octagons, the outliers by pentagons, and the rest of the dataset by circles.The elements are coloured by cluster (with the outliers coloured in black).The gray arrows indicate the edges of the directed graph of nearest-higher connections.This graph forms a forest, with one tree per distinct cluster.In Figure 1b we plot the nearest-higher separation as a function of density for all points in the dataset.The elements satisfying ρ(xi) > ρc ∧ δ(xi) > δc are promoted to roots, while the ones satisfying ρ(xi) < ρc ∧ δ(xi) > δc are demoted to outliers.minimum finding algorithm to find the minimizer of f i (i.e., the nearest-higher of x i ) with probability at least 1 − with O( √ n log(1/ )) applications of U i , amounting to a total of O n 3/2 log(1/ ) (18) quantum queries.
Let H be the maximum height of any tree in the graph of nearest-highers.In the worst case scenario, one may need to call quantum minimum finding 2H times before reaching the root node for both inputs.Moreover, we want to make sure that it finds the correct nearest-highers every single call with high probability, which can be done by setting the constant in equation ( 18) to be sufficiently small.A simple calculation concludes the following.Claim 3. Let D be a dataset of n elements and let H be the maximum height of any tree in the graph of nearesthighers.For any two elements x i , x j ∈ D, we can solve density peak decision clustering with success probability at least 1 − (for any > 0) with quantum query complexity Recall that the classical complexity of the decision version of density peak clustering is O(n 2 ) (Claim 2), irrespective of the value of H.

V. HEIGHT OF TREES
We have seen that we can reach a lower quantum query complexity for the density peak decision clustering problem depending on the factor H, the maximum height of the trees in the graph of nearest-highers.Indeed, there is quantum speedup when H scales as O(n a ) for some a < 1/2.In contrast, there is no speedup when H = Ω(n a ) with a > 1/2, as we may solve clustering classically in O(n 2 ) time.
To understand how the factor H scales, we can start by considering very simple data model.We assume that the dataset is generated by sampling n points uniformly at random from a bounded region of R d .In this case, we expect to see only one cluster that spans the entire region.A straightforward calculation reveals that the expected nearest-neighbour distance nn scales as While the nearest-higher does not always coincide with the nearest-neighbour, the nearest-higher separation is most likely not much larger than the typical nearest-neighbour distance (except for the root node).So, we should have Now consider a leaf node of the graph of nearest-highers at a distance, say, L from the root node.As it probably lies close the edge of the region, L characterizes the "size" of the cluster.Its nearest-higher is found roughly along the direction of the centre of the cluster.That is, the nearest-higher is at a distance ∼ δ closer to the root node.
Repeating this reasoning for all nodes in the path to the root, we conclude that Admittedly, we have merely provided an informal argument for the expression (22), not a fully rigorous proof.Still, we can verify this behaviour numerically.We generate such artificial datasets by sampling uniformly from d-balls of radius one, for different dimensions d.The results, shown in Figure 2a, confirm that indeed H scales as n 1/d .To study how H behaves in more general settings, we consider two other types of datasets: • Well-separated, Gaussian-shaped clusters (Figure 2b).For different values of d, we pick ten centroids at random in range [0, 100] d , associating to each a randomly chosen covariance matrix.We then generate artificial datasets by drawing from the corresponding Gaussian distributions.With high probability, the clusters will be wellseparated.FIG. 2. Heights of trees.We plot the maximum height of any tree in the graphs of nearest-highers, H, for different types of datasets (we show the average over five runs).Both axis are shown in logarithmic scale.For plots (a) and (b) we artificially generate uniform and Gaussian datasets (respectively) for several dimensions d with varying number of elements n.In the upper left corners, we show examples of such datasets in two dimensions.For plot (c), we take n random samples from the Forest Cover Type dataset [47].To visualize the dataset in two dimensions, we used a t-distributed stochastic neighbour embedding [48].
• Real-world dataset (Figure 2c).We randomly sample entries from the Forest Cover Type dataset [47].This dataset originally contains fifty-five features, both numerical and categorical, conveying information about the Roosevelt National Forest in Colorado, namely, tree types, shadow coverage, distance to nearby landmarks, soil type, and local topography.We preprocess the dataset by numerically encoding the categorical data, and then rescaling the numerical variables such that each has mean zero and variance one.We define the clustering distance as the Euclidean distance between the first ten principal components of the data.
For both cases, we observe that for some parameter d eff .We interpret d eff as an "effective dimension" of the dataset, which can be smaller than the number of features of the data.For example, for the Gaussian datasets a simple polynomial fit reveals d eff = 1.94, 2.36, 2.80, 3.22 for d = 2, 3, 4, 5, respectively.For the Forest Cover Type dataset, we find d eff = 3.71.An interesting research question (outside the scope of the present article) would be to properly understand how this effective dimension arises from the structure of the data.
For the cases considered where the datasets had more than two features, we have verified that the effective dimension was greater than two, entering the regime where our density peak decision clustering algorithm shows a quantum speedup.This is evidence that our quantum algorithm could be suitable for high-dimensional, real-world problems.Our conclusions are summarized in the following claim.Claim 4. Let D be a dataset of n elements and let the maximum height of any tree in the graph of nearest-highers scale as O n 1/d eff for some parameter d eff .Then, for any two elements x i , x j ∈ D, we can solve density peak decision clustering with constant error probability with quantum query complexity In particular, if d eff > 2, quantum query complexity is better than the classical query complexity O n 2 .
We would like to stress that our speedup shows a dependence on a geometric property of the dataset that, to the best of our knowledge, has not yet been seen in the literature.While it is a common idea that quantum speedups in machine learning may rely on the structure of the dataset, it is usually hard to rigorously characterize the necessary structure.In contrast, in this work we were able prove that we have speedup if H scales better than √ n (or, in other words, if the effective dimension is larger than 2).FIG. 3. Experimental implementation of toy problem. Figure 3a shows toy dataset of 8 points.The two colours (blue and yellow) denote the two distinct clusters in the dataset.The arrows denote the nearest-higher (except for the elements 1 and 5 which are the cluster centers).In plot 3b we plot the average number of iterations, or average number of oracle calls nO , to find the nearest-higher for each of the points in the dataset.We reported a classical random search strategy, an ideal simulated quantum minimum finding subroutine and the implementation of the same quantum subroutine in a real IBM quantum processor.The average is taken over 1000 consecutive runs of each strategy.The datapoints reported in the x-axis of the bar graph are the decimal representation of the binary points in the dataset (e.g.datapoint 3 is the element 011 of the dataset), notice that datapoints 1 and 5 are missing from the bar graph as they are the cluster centers and their number of iterations is fixed.

VI. EXPERIMENTAL IMPLEMENTATION
In this section, we test the proposed quantum density peak clustering on a real quantum processor.Specifically, we implement the main quantum routine, minimum finding.Since we are limited by the size of the devices, we solve a synthetic clustering problem involving just eight elements -see the dataset depicted in Figure 3a.We can encode each element of the dataset in log(8) = 3 qubits, being suitable to run on NISQ machines.The first problem to address is implementing the oracle in a suitable manner.
Given the density ρ i computed for each i ∈ {0, . . ., 7} in our synthetic dataset, consider the function introduced in Eq. (15).The quantum minimum finding calls an oracle O i,j that implements the Boolean function as Present-day quantum machines do not have qRAM access, nor can perform complex arithmetic operations such as computing densities.So, we classically pre-compute F i,j (k) for every values of i, j, and k and construct a circuit for each oracle O i,j following scheme outlined in [49].While this is not how the algorithm is meant to be implemented in practice (cf.section IV), we believe that this approach is suited for the purposes of a proof-of-concept.At this point, we have all the ingredients to start the quantum nearest-higher search.Given a point i we start by selecting a random threshold j and call the oracle dictionary F i,j .We then mark the states that have F i,j (k) = 1 following the same marking scheme outlined in [49] and apply the amplitude amplification subroutine.At the end, we measure a state |k that is selected as the new threshold j = k if f i (k) ≥ f i (j), otherwise a new threshold j is selected at random.This procedure is iterated until the nearest higher is found for every point i in the dataset.
We benchmark this strategy against the classical method of nearest-higher search that is just a random sampling of j.In either case, the figure of merit is the average number of oracles calls before finding the nearest-higher, n O , i.e., the quantum query complexity.
In Figure 3b we show the results for n O taken over 1000 run of the algorithm by using both the classical strategy and the quantum routine.As expected, the classical nearest-higher algorithm (i.e.random search) always takes on average iterations ∼ 3.5 (blue bars in Figure 3b), irrespectively of the point as one would expect given the size of the dataset.
Regarding the quantum search, we first used the Qiskit package [50] to simulate the algorithm without noise.In general, we can have multiple rounds of amplitude amplification in each step of our quantum minimum finding subroutine.However, here we opted by always running just one round.The simulations (green bars in Figure 3b) clearly demonstrate quantum speedup, as the average number of oracle calls before convergence is lower than in the classical case for all points.We can observe that there are some points with speedups more pronounced than others as, for each specific datapoint, this depends on the number of states marked by the oracle.
Finally, we ran our Qiskit program on a real quantum computer by IBM, the ibm-perth 7-qubit processor (see red bars in Figure 3b).In this case, the analysis is more subtle because of the real-world noise on top of our ideal quantum algorithm.Indeed, for points whose oracles mark several states, the depth search circuits is quite large, making the computations more sensible to noise.On the other hand, there are some points for which the number of states marked by the oracle is low and thus the circuit is small enough that we can still observe an advantage over the classical case (even if it is less pronounced than in the ideal quantum simulation due to noise).This observation is very relevant as it is generally difficult to observe such quantum advantages running quantum machine learning subroutines on real hardware, even if for a toy problem as the one explored here.We can thus hope that, with some improvement in coherences and maybe error correcting codes being built in real quantum processors, we can start observing some advantages for real-world problems.

VII. CONCLUSIONS
In this work, we have introduced a quantum version of the density peak clustering algorithm, specifically aimed at its decision version.Our proposed algorithm builds upon the well-known quantum minimum finding algorithm, giving us the possibility of computing the query complexity of quantum density peak clustering.Indeed, while the classical query complexity of density peak clustering is O n 2 , we prove that our proposed quantum algorithm has complexity O n 3/2+1/d eff , for a parameter d eff that depends on the structure of the dataset.For values of d eff > 2, we have a quantum speedup, albeit a modest one.This provable dependence of the complexity on this geometric property of the dataset constitutes by itself an notable result.Indeed, while it is widely accepted that quantum speedups for machine learning may depend on the structure of the data, it is often difficult to precisely characterize this dependence.As discussed in section V, in our case we interpret the parameter d eff as an "effective dimension" of the dataset, making quantum density peak clustering specially suited for high-dimensional problems.The successful implementation of a toy problem in a real quantum computer and the observation of an advantage, even in the presence of the noise typical of a NISQ device, hints at the concrete possibility of exploiting the capability of quantum density peak clustering in near-term quantum computers.
To conclude, we would like to raise two points that are relevant not only for the quantum density peak clustering, but also for the quantum machine learning community at large.The first one regards the efficient implementation of the classical computation routines, which in this work were included in the oracles.For example, while comparing two distances consumes O(1) time, the overhead of this calculation prohibits its implementation on present-day quanutm hardware.The second point is to understand how the effective dimension of the dataset d eff scales in general for an arbitrary dataset (and if we can always define an effective dimension given the scaling of nearest neighbors).Going even deeper, one could ask if d eff represents some fundamental property of the data and its structure (and can thus be exploited further) or if it is just a scaling parameter.We believe that answering these questions could be crucial for quantum density peak clustering and its future as a viable clustering algorithm for quantum machine learning on near-term quantum devices.