1 Introduction

Graph data models are useful to store, process and analyse highly interlinked data [1]. This is achieved through the use of graph theory to store data in the form of nodes and edges [2]. With the recent rise in the popularity of graph databases, which rely on graph data models to store and query data, there is a growing need to incorporate state-of-the-art learning algorithms to analyse data in graph data models. This incorporation enables the extraction of meaningful information that might otherwise remain hidden when the data is analysed in a tabular structure. Employing unsupervised and supervised learning techniques is highly effective for analysing data across several domains, e.g. to study social networks [3], physical systems [4], proteomics knowledge graphs [5], etc. [6].

Fig. 1

Taxonomy of clustering algorithms [7, 8]. Coloured nodes and dark edges highlight the roadmap leading to spectral clustering. The dotted arrow denotes that k-means is a hard/crisp clustering algorithm in addition to being a squared-error-reduction-based clustering algorithm

One of the most commonly used unsupervised learning tasks is clustering, which is widely used by data analysts and domain experts to group similar instances and explore hidden structures in a wide spectrum of fields, ranging from engineering, computer science and medical sciences to social sciences and economics [2]. The challenges and opportunities of graph data have led to the development of specialized clustering algorithms designed specifically for graph data. These include spectral clustering, which often outperforms basic clustering algorithms such as K-means and hierarchical clustering [9].

The popularity of graph data is on the rise with the development of graph database management systems, e.g. AllegroGraph, ArangoDB, InfiniteGraph, Neo4J [10]. The flexibility of the graph data structures allows novel possibilities for data exploration, and consequently, knowledge discovery. As a result, clustering algorithms designed for graph data models, e.g. spectral clustering algorithms, are gaining momentum along with the applications of graph data and databases.

A popular class of clustering algorithms designed specifically for graph data models is known as spectral clustering. Spectral clustering utilizes eigendecomposition to represent and group data into clusters [11], and its conceptualization dates back to 1973 [12]. This survey analyses popular spectral clustering algorithms, which have been widely discussed in the scientific community due to their high efficiency in applications such as image segmentation and the analysis of patterns in gene expression or proteomics data. We first provide a roadmap (see Fig. 1) for navigating the clustering paradigm up to spectral clustering, then elaborate on 16 primary spectral clustering algorithms and conclude with a comparison of their complexities and applications. Since several variations of these algorithms have been developed, we concentrate on discussing the most significant of them in a single survey.

In this paper, we have provided a comprehensive overview of the following topics:

  • Background: Graph theory, database models, and cluster analysis (Sect. 3)

  • Types of clustering algorithms (Sect. 3.3)

  • Clustering graph data: Graph and node clustering (Sect. 4.1 and 4.2)

  • Spectral clustering algorithms (Sect. 4.3)

1.1 Related work

Several comprehensive analyses of clustering algorithms exist, e.g. of clustering algorithms in general by Ezugwu et al. [7], clustering algorithms for graph data by Aggarwal et al. [8], and spectral clustering algorithms by Nascimento et al. [11] and Verma et al. [13]. New spectral clustering algorithms are increasingly conceived for specific tasks rather than generic data. Karim et al. [14] have demonstrated, in their survey on deep learning-based clustering approaches, the usefulness of such approaches in the field of bioinformatics. Another similar work by Qi et al. [15] surveys various clustering and classification methods specifically for single-cell RNA-sequencing data. The several recent developments in the domain of spectral clustering call for an elaborate survey of significant techniques and a roadmap tracing the development of the use of eigenvectors for clustering.

2 Definitions and notations

Graph: A graph is represented by G = (V,E), where V denotes a set of nodes (vertices) and E denotes a set of edges (relationships) between the nodes. Edges can be weighted or unweighted; they should be undirected for spectral clustering algorithms (von Luxburg [9]). There are three different methods to transform data points into a similarity graph for spectral clustering [9], as shown in Fig. 2 (a minimal construction sketch in code follows the list below):

Fig. 2

Types of neighbourhood graphs. a Fully-connected graph; b \(\epsilon\)-neighbourhood graph; c k-neighbourhood graph. For weighted graphs, the edge weights in b and c would not be replaced by 1

  • Fully connected graph [16, 17]: All pairs of data points with positive similarity values are connected to form a graph, which is only useful when the similarity function itself models local neighbourhoods, e.g.:

    $$A_{ij} = \exp\left(-\frac{\lVert s_{i}-s_{j}\rVert^{2}}{2\sigma^{2}}\right) \quad (1)$$

    where \(A_{ij}\) is the affinity between \(s_{i}\) and \(s_{j}\), \(\lVert s_{i}-s_{j}\rVert\) is the distance between \(s_{i}\) and \(s_{j}\), and \(\sigma\) is the scaling parameter.

  • \(\epsilon\)-neighbourhood graph [18]: Data points are connected based on a threshold \(\epsilon\). Edges with weights lower than \(\epsilon\) are discarded to form an \(\epsilon\)-neighbourhood graph from a fully-connected graph.

  • k-neighbourhood graph [19]: Formed using the k-nearest neighbour algorithm, resulting in either a k-nearest neighbour or a mutual k-nearest neighbour graph, depending on how the vertices are connected. k denotes the minimum number of points required to define a local neighbourhood.
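The following minimal sketch (in Python with NumPy) illustrates how each of the three neighbourhood graphs can be built from a set of data points; the function names, the default \(\sigma\) and the symmetrisation choice are ours for illustration, not taken from the cited works:

```python
import numpy as np

def gaussian_affinity(points, sigma=1.0):
    """Fully connected graph: Gaussian affinity as in Eq. (1)."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                          # no self-loops
    return A

def epsilon_graph(A, eps):
    """epsilon-neighbourhood graph: drop edges with affinity below eps."""
    return np.where(A >= eps, A, 0.0)

def knn_graph(A, k, mutual=False):
    """(Mutual) k-nearest-neighbour graph derived from an affinity matrix."""
    keep = np.zeros_like(A, dtype=bool)
    for i in range(A.shape[0]):
        keep[i, np.argsort(A[i])[::-1][:k]] = True    # k most similar neighbours of i
    keep = (keep & keep.T) if mutual else (keep | keep.T)  # symmetrise
    return np.where(keep, A, 0.0)
```

For an unweighted variant, the surviving affinities in the \(\epsilon\)- and k-neighbourhood graphs would simply be replaced by 1, as noted in the caption of Fig. 2.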

Proximity measure: Measure of distance, similarity, dissimilarity and/or adjacency between vertices/nodes/data points, derived from their attributes. Clustering algorithms require calculating proximity measures among the given data points as their first step. Popular examples (Mehta et al. [20]) in the context of clustering can be broadly categorized into two types: metric and non-metric (see Table 1). Metric proximity measures satisfy properties such as non-negativity, symmetry, and the triangle inequality; non-metric proximity measures may violate symmetry and/or the triangle inequality.

Table 1 A generic overview of popular proximity measures used in clustering

Distance matrix: Square matrix (refer Fig. 3a) representing the distance between data points based on a distance measure, e.g. Euclidean, Manhattan, etc. The diagonal values are 0 which signifies the lowest possible distance value.

Fig. 3

Matrix representation of a distance, b similarity (affinity) and c adjacency matrices for nodes N1, N2 and N3

Similarity matrix: Square matrix (refer Fig. 3b) representing the similarity between data points based on a similarity measure, e.g. Dice, Cosine, etc. Diagonal value 1 represents the highest possible similarity value. However, the diagonal of a square matrix containing dissimilarity between data points may contain 0 as the lowest possible dissimilarity value.

Adjacency (affinity) matrix: Square matrix (refer Fig. 3c) representing the adjacency (also affinity or node similarity) between points/nodes in a graph, G (Hogben [29]). It can be of two types—unweighted and weighted.

For a weighted adjacency matrix: A = [\(\alpha_{ij}\)], where \(\alpha_{ij}\) is the weight of the edge {i,j} of graph G, and \(\alpha_{ij}\) = 0 if {i,j} is not an edge.

For an unweighted adjacency matrix: A = [\(\alpha_{ij}\)], where \(\alpha_{ij}\) = 1 if {i,j} is an edge of graph G and \(\alpha_{ij}\) = 0 otherwise.

Diagonal (degree) matrix: Square matrix containing the degree of each node on its diagonal (Hogben [29]), where the degree of a node is the number of edges connected to it in the graph.

D = diag(deg\(_{G}\)(\(v_{1}\)),..., deg\(_{G}\)(\(v_{n}\))), where n = number of nodes/data points and deg\(_{G}\)(\(v_{i}\)) = degree of node \(v_{i}\) (number of edges connected to it).

Laplacian matrix: Spectral clustering algorithms require the adjacency and degree matrices to create the Laplacian matrix of the input graph, which acts as the primary input to most of them. Commonly used types in spectral clustering [9, 30] are listed below (a short construction sketch in code follows the list):

  • Unnormalized, L = D − A

  • Normalized:

    • Symmetric, L\(_{sym}\) = \(D^{-1/2}LD^{-1/2}\)

    • Random Walk, L\(_{rw}\) = \(D^{-1}L\)

  • Relaxed, L\(_{\rho }\) = L − \(\rho\)D

    where D = Diagonal and A = Adjacency matrix, and \(\rho\) = relaxation parameter.
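A compact sketch of these definitions in Python/NumPy is given below; the function name and the example value of \(\rho\) are purely illustrative:

```python
import numpy as np

def graph_laplacians(A, rho=0.5):
    """Derive common Laplacian variants from a symmetric adjacency matrix A.

    rho is the relaxation parameter of the relaxed Laplacian; 0.5 is only an
    illustrative default, not a recommended value.
    """
    d = A.sum(axis=1)                                        # node degrees
    D = np.diag(d)
    L = D - A                                                # unnormalized Laplacian
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    d_inv = np.where(d > 0, 1.0 / d, 0.0)
    L_sym = np.diag(d_inv_sqrt) @ L @ np.diag(d_inv_sqrt)    # symmetric normalized
    L_rw = np.diag(d_inv) @ L                                # random-walk normalized
    L_rho = L - rho * D                                      # relaxed
    return L, L_sym, L_rw, L_rho
```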

3 Background

3.1 Graph theory

Graph theory is a branch of discrete mathematics that deals with the study of mathematical structures that model entities and the relationships between them, in the form of nodes/vertices and relations/edges, respectively. It has been discussed by Pal Singh et al. [31], along with its wide spectrum of applications in database design, software engineering, circuit design, network design, and visual interfaces. Graph theory influenced the conception of several database models, such as semantic, object-oriented, graph, and XML, as shown in Fig. 4. Some popular data structures influenced by graph theory, such as trees and linked lists, can be used to model graphs.

Fig. 4

Evolution of database models. Arrows denote influence—dotted arrows represent the influence of graph theory on various database models [32]

3.2 Graph database models

Graph databases primarily provide storage and querying of data stored in graph data models. Additionally, available plug-ins [32] can provide features such as conceptual visualization, e.g. Neo4j Bloom [33], and data analytics, e.g. the Graph Data Science library [34] and the Decision Tree Plug-in [35]. Hence, the full potential of graph data models can be realized on a graph database management system such as Neo4j [36] or AllegroGraph [10]. Chad et al. [36] have evaluated Neo4j, a graph database management system, against a relational database management system, as shown in Table 2. The flexibility of the data schema in a graph database represents an added benefit over a relational database. In terms of scalability, it could be argued that relational databases handle data distribution across several machines better, as discussed by Pokorny [1]; however, scalability to large datasets is not an issue for graph databases [1].

Table 2 Graph vs relational database models [36, 37]

3.3 Cluster analysis

Clustering (cluster analysis) is the process of grouping data into distinct classes so that objects with similar attributes and/or characteristics are grouped in the same class/cluster [7]. It is classified as an unsupervised learning approach in which the goal is to find meaningful patterns in underlying unlabeled data [38].

Traditional clustering algorithms such as K-means, DBSCAN, and agglomerative clustering suffer when dealing with high-dimensional data, since most often Euclidean distance alone is used as the distance/proximity measure between data points, and it fails to accurately portray the relative positions of data points in high dimensions [39]. Spectral clustering algorithms overcome this (graphs are non-Euclidean data structures [40]) by computing eigenvalues and eigenvectors of the graph Laplacian matrix built from such pairwise distances and partitioning the graph in the resulting eigenspace [8].

Clustering algorithms can be broadly generalized into two categories: partitional and hierarchical clustering (Celebi et al. [41]). The former partitions data points according to a pre-defined number of groups, while the latter hierarchically assigns data points as groups of subgroups, until all points belong to one cluster (bottom-up) or to individual clusters (top-down). A brief comparison between partitional and hierarchical clustering algorithms is provided in Table 3. While hierarchical clustering produces an illustrative, elaborate overview of cluster formation, a substantial advantage for understanding the similarity between data points in sub-clusters, it comes at the cost of higher time and space complexity than partitional clustering (Garima et al. [42]).

Table 3 Comparison of commonly used clustering algorithms [41, 42]

Sum of squares of error (SSE) minimization: The most commonly used partitional clustering technique, K-means, minimizes the SSE of clusters; Ezugwu et al. [7] also label it a hard clustering technique. It has the advantage of being easily implemented on large datasets at a considerably low run time, and its results are easily interpretable, giving the user a general overview of the data and possible clusters.
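For k clusters \(C_1, \ldots, C_k\) with centroids \(\mu_1, \ldots, \mu_k\), the SSE objective that K-means minimizes can be written as:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2}$$

where the inner sum runs over the data points \(x_i\) assigned to cluster \(C_j\).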

Fuzzy: Fuzzy clustering involves assigning the degree of membership, for each data point to more than one cluster, e.g. fuzzy c-means algorithm [48].

Mixture resolving: Mixture resolving methods, according to Grira et al. [49], assume that data points belong to one of several distributions. Expectation maximization (EM) is an iterative approach that aims to find the maximum likelihood estimates of the parameters [50] and is used in this case for parameter estimation.

Hard clustering: Hard clustering, e.g. K-means, groups data into prespecified k non-overlapping groups, without a hierarchy [7].

  • Density-based clustering algorithms such as DBSCAN (Ester et al. [51]), OPTICS (Ankerst et al. [44]), DENCLUE (Hinneburg et al. [52]) provide better performance than K-means by handling arbitrary shapes and detecting outliers.

  • Subspace clustering algorithms can be categorized into top-down and bottom-up algorithms. PART (Cao et al. [53]) and PROCLUS (Aggarwal et al. [54]) are top-down algorithms, in which the whole set of dimensions is used to find an initial grouping, and the subspaces of each cluster are assessed. On the other hand CLIQUE (Agrawal et al. [55]), and MAFIA (Nagesh et al. [56]) are bottom-up subspace algorithms, in which first dense areas in low-dimensional spaces are identified, then by combining them, clusters are created (Gan et al. [57]).

  • In model-based clustering such as COOLCAT (Barbará et al. [58]) and STUCCO (Bay et al. [59]), it is presupposed that the data are produced by a combination of probability distributions, each component of which represents a distinct cluster (Gan et al. [57]).

  • Search-based algorithms such as Genetic Algorithms (Holland et al. [60]) and Al-Sultan's Method (Barbara et al. [58]) work towards a globally optimal clustering that fits the data. Compared to, for example, fuzzy clustering, search-based algorithms do not stop at a locally optimal partition (Gan et al. [57]).

  • Other algorithms within the class of hard clustering are termed Miscellaneous Algorithms. Examples include (Gan et al. [57]) Time Series Clustering Algorithms, where the data usually falls into one of two categories: many individual time series or a single long time series; Streaming Algorithms, for tremendous amounts of data including network data, temperature data, and satellite imagery data; and Transaction Data Clustering Algorithms, for transaction (market basket) data, etc.

  • Graph clustering is another type of hard clustering, tailored to cluster data stored in graph data structures (Nascimento et al. [11]). Algorithms of graph clustering can be broadly classified into two categories—graph and node clustering algorithms.

4 Clustering graph data: graph and node clustering algorithms

4.1 Graph clustering algorithms

Graph clustering algorithms are concerned with clustering several graphs rather than one, each with its own set of nodes and edges, based on their underlying structure, and can be discussed in the context of both graph data and semi-structured data, e.g. XML data. Some popular approaches in this regard are:

  • Structural distance-based approach, e.g. XClust (Lee et al. [61]).

  • Structural summary-based approach (Dalamagas et al. [62]).

  • The XProj approach (Aggarwal et al. [63]).

The use of eigenvectors to represent and cluster data points or graph nodes was made popular by the conception of spectral clustering, which is a form of a node clustering algorithm.

4.2 Node clustering algorithms

Node clustering algorithms use a distance function to measure proximity between data points or nodes, of a multi-dimensional dataset (Aggarwal et al. [8]). The desired goal is to partition the graph by minimizing the weights of the edges across the partition.

Minimum cut: Given a graph G = (V,E) with vertex (node) set V and edge (relation) set E, the minimum-cut algorithm identifies the set of edges with the smallest sum of weights whose removal separates the graph into two disconnected components, for binary graph partitioning [64]. It has a complexity of O(\(n^{2}\)), where n is the number of nodes (Karger [65]).

Ratio cut: Ratio cut is a measure of the quality of a partition: the total weight of the edges cut between clusters, normalized by the sizes of the resulting clusters. The objective in ratio cut is to find a partition that minimizes this quantity, indicating a good separation of clusters without producing trivially small ones. Ratio cut is often employed in the context of spectral clustering, especially for binary partitioning [66].
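For a binary partition of the vertex set V into A and its complement \(\bar{A}\), the ratio cut objective is commonly written as:

$$\mathrm{RatioCut}(A, \bar{A}) = \frac{\mathrm{cut}(A, \bar{A})}{|A|} + \frac{\mathrm{cut}(A, \bar{A})}{|\bar{A}|}$$

where cut(A, \(\bar{A}\)) is the total weight of edges with one endpoint in each set; the size normalization discourages trivially small partitions.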

Multi-way graph partitioning: Multi-way graph partitioning is an NP-Hard problem, where the goal is to partition the set of vertices into k (greater than 2) clusters so that the weights of edges whose ends are in different partitions are minimized (Kernighan et al. [67]). The time complexity in this case increases exponentially with the value of k. A variation of this heuristic approach has been discussed by Fjällström [68].

Network-structure index: In this technique, the graph is partitioned into zones through a competitive flooding algorithm: seed nodes are labelled with zone identifiers, and each labelled node repeatedly selects an unlabelled neighbour at random and assigns it its own label. The process repeats until all nodes are labelled [69].

Girvan–Newman algorithm: A divisive clustering algorithm based on the concept of edge betweenness centrality (Girvan et al. [70]), which is the number of shortest paths passing through an edge. The algorithm starts by calculating the edge betweenness of every edge in the graph, then removes the edge with the highest betweenness and recalculates the betweenness of the remaining edges. The process repeats until all edges are removed (Despalatović et al. [71]).

Determining quasi-cliques: While most partitioning algorithms try to minimize the edge weight cut across partitions, this technique focuses on maximizing edge density within a partition. To elaborate, a clique is a graph in which every pair of nodes is connected by an edge, and a quasi-clique is defined by imposing a lower bound on the degree of each vertex in the given set of nodes (Abello et al. [72]).

Min-hash approach: The min-hash approach treats a node's outlinks (hyperlinks) as a set, i.e. two nodes are considered similar if they share many outlinks [73]. The Jaccard coefficient is used to represent the similarity between two nodes (Baharav et al. [74]).
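As an illustration of the underlying idea (not the exact scheme of [73]), the sketch below computes the exact Jaccard coefficient between two nodes' outlink sets and a min-hash estimate of it; the hashing scheme and signature length are our own choices:

```python
import hashlib

def jaccard(outlinks_u, outlinks_v):
    """Exact Jaccard coefficient between two outlink sets."""
    u, v = set(outlinks_u), set(outlinks_v)
    return len(u & v) / len(u | v) if (u or v) else 0.0

def minhash_signature(outlinks, num_hashes=64):
    """Min-hash signature: the minimum hash value of the (non-empty) set per seed."""
    return [min(int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
                for x in outlinks)
            for seed in range(num_hashes)]

def estimated_jaccard(sig_u, sig_v):
    """The fraction of matching signature positions estimates the Jaccard coefficient."""
    return sum(a == b for a, b in zip(sig_u, sig_v)) / len(sig_u)
```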

4.3 Spectral clustering algorithms

Spectral clustering uses eigenvalues and eigenvectors to represent clusters, as sets of vertices (nodes), derived from the node adjacency matrix of a graph [8]. These algorithms can provide lower/upper bounds for minimization/maximization versions of graph partitioning problems [11]. In the following, we discuss popular spectral clustering algorithms, along with their steps and complexity. In Fig. 5, we compare the steps of spectral clustering algorithms against those of a basic clustering algorithm such as K-means. The steps are often repeated iteratively to reduce the value of a chosen cost function (SSE in K-means clustering). Spectral clustering algorithms have the additional steps of creating the Laplacian matrix and deriving eigenvectors and eigenvalues from it, which is why they have high computational costs, similar to hierarchical clustering (Table 3).

Fig. 5

Step-wise comparison of spectral algorithms against basic clustering algorithms. Dashed lines represent paths for spectral clustering, while solid lines represent paths for basic clustering algorithms such as K-means

4.3.1 EIGI algorithm

Algorithm 4.3.1

EIGI algorithm [75]

The EIGI algorithm (see Algorithm 4.3.1) is based on the linear ordering given by the Fiedler eigenvector, computed using the Lanczos algorithm [11]. The eigenvector corresponding to the second smallest eigenvalue of a graph Laplacian matrix is usually referred to as the Fiedler vector, as defined by Doshi et al. [76]. The Lanczos method is an algorithm used to find a few extreme eigenvalues of a large symmetric matrix along with the associated eigenvectors (Parlett et al. [77]).

The Lanczos algorithm has a complexity of O(nk), where n = number of nodes and k = number of Lanczos iterations. As Nascimento [11] states, EIGI has the same computational complexity as the Lanczos algorithm, which would be O(\(n^{2}\)) in the worst-case scenario, if k = n.
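A minimal sketch of the core idea (bisecting a graph by the linear ordering of the Fiedler vector, computed here with SciPy's Lanczos-based eigsh) is shown below; this is our illustration rather than the original EIGI implementation, and the even split of the ordering is an arbitrary choice:

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh

def fiedler_bisection(A):
    """Split a graph into two parts by sorting nodes along the Fiedler vector."""
    L = csgraph.laplacian(np.asarray(A, dtype=float), normed=False)
    vals, vecs = eigsh(L, k=2, which="SM")     # two smallest eigenpairs (Lanczos/ARPACK)
    fiedler = vecs[:, np.argmax(vals)]         # eigenvector of the second smallest eigenvalue
    order = np.argsort(fiedler)                # linear ordering of the nodes
    half = len(order) // 2
    return order[:half], order[half:]
```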

4.3.2 KP algorithm

Algorithm 4.3.2

KP Algorithm [78]

The KP algorithm (Nascimento et al. [11]), named for its k-way partitioning, calculates how close nodes are by computing cosine similarities between pairs of rows of the eigenmatrix U (see Algorithm 4.3.2).
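The proximity notion used here can be illustrated in a few lines of NumPy (a sketch of the row-wise cosine similarity only, not of the full KP partitioning procedure):

```python
import numpy as np

def row_cosine_similarities(U):
    """Pairwise cosine similarity between the rows of an eigenmatrix U."""
    U_norm = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return U_norm @ U_norm.T   # entry (i, j) = cosine similarity of rows i and j
```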

4.3.3 Multiple Eigenvector linear orderings (MELO) algorithm

Algorithm 4.3.3

MELO algorithm [79]

The MELO algorithm (see Algorithm 4.3.3) is a greedy approach proposed by Alpert et al. [79], which partitions data into k segments using a dynamic programming procedure.

4.3.4 Shi and Malik (SM/KNSC) algorithm

Algorithm 4.3.4

SM algorithm [80]

In a popular algorithm, Shi and Malik [80] (refer Algorithm 4.3.4) applied the K-means algorithm to the eigenmatrix, where each row of the matrix is treated as a single object from the dataset.

4.3.5 Meila-Shi (MS/multicut) algorithm

Algorithm 4.3.5

MS algorithm [82]

Proposed by Meila and Shi [82] (refer Algorithm 4.3.5), the algorithm clusters the eigenvector matrix corresponding to the k largest eigenvalues. In this case, the graph is normalized through a random-walk formulation.

4.3.6 Ng–Jordan–Weiss (NJW/KNSC1) algorithm

Algorithm 4.3.6

NJW algorithm [17]

The Ng–Jordan–Weiss algorithm proposed by Ng et al. [17] (refer Algorithm 4.3.6) is another improvement over the SM algorithm, in that the NJW algorithm applies the K-means algorithm to a row-renormalized eigenvector matrix derived from the normalized Laplacian of the dataset.
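A compact sketch of this pipeline (affinity matrix in, cluster labels out) is given below; it follows the commonly described NJW steps, with scikit-learn's KMeans used for the final step and all hyperparameters chosen only for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def njw_clustering(A, k, random_state=0):
    """NJW-style spectral clustering on a symmetric affinity matrix A (sketch)."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                    # eigenvalues in ascending order
    U = vecs[:, :k]                                    # k smallest eigenvectors of L_sym
    T = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # renormalize rows
    return KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(T)
```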

4.3.7 Kannan, Vempala and Vetta (KVV) algorithm

Algorithm 4.3.7

KVV algorithm [83]

The KVV algorithm is an improvement over the SM algorithm that uses Cheeger conductance for partitioning. Calculating the Cheeger conductance is beneficial in the context of graph partitioning and clustering because it provides a quantitative measure of the quality of a graph cut [84]. In order to find the Cheeger conductance of a cluster, the set of vertices is weighted to reflect their importance (Kannan et al. [83]).
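For a vertex subset \(S \subset V\), the conductance of the cut \((S, \bar{S})\) is commonly defined as:

$$\phi(S) = \frac{w(S, \bar{S})}{\min\big(w(S),\, w(\bar{S})\big)}$$

where \(w(S, \bar{S})\) is the total weight of edges crossing the cut and \(w(S)\) is the total edge weight incident to S; lower conductance indicates a better-separated cluster.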

4.3.8 Self-tuning spectral clustering algorithm

Algorithm 4.3.8

Self-tuning spectral clustering algorithm [85]

Most of the algorithms discussed so far require the scaling parameter to be stated explicitly by the user, derived through domain knowledge or trial and error, or found through several runs. To derive an optimal scaling hyperparameter for a given graph, Zelnik-Manor et al. [85] introduced a method that computes a local scaling parameter \(\sigma\) for each data point. The self-tuning algorithm performs an eigendecomposition similar to NJW, resulting in a worst-case complexity of O(\(n^3\)).
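A minimal sketch of the local scaling idea is shown below (our illustration; taking \(\sigma_i\) as the distance to the K-th neighbour, with K = 7 used here only as a commonly cited default that should be treated as tunable):

```python
import numpy as np

def local_scaling_affinity(points, K=7):
    """Self-tuning-style affinity: sigma_i = distance from point i to its K-th neighbour."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    sigma = np.sort(D, axis=1)[:, K]             # column 0 is the point itself (distance 0)
    A = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)
    return A
```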

4.3.9 Co-trained multi-view spectral clustering algorithm

Algorithm 4.3.9

Co-trained multi-view spectral clustering algorithm [86]

Multi-view data refers to data that is generated from different sources or observed from different perspectives (data pre-processing and/or analysis methods). As Yang et al. [87] discuss, such data objects can be viewed from different angles or measured with different instruments, resulting in multiple views of the same data. Each individual view, in this case, can lead to distinct knowledge discovery. These algorithms can be classified into five categories [87]:

  • Co-training algorithms bootstrap the clusterings of the different views, either by using prior knowledge or by gaining knowledge from one another.

  • Multi-kernel learning algorithms predefine and combine kernels corresponding to each view, either linearly or non-linearly.

  • Multi-view graph clustering fuses graphs from all views to a single graph and then implements graph-cut (e.g. node clustering) algorithms.

    • Multi-view spectral clustering

  • Multi-view subspace clustering algorithms learn unified feature representations from all feature subspaces of all views.

  • Multi-task multi-view clustering uses tasks to assess views and extracts inter-task knowledge to exploit multi-task and multi-view relationships.

In Algorithm 4.3.9, Kumar et al. [86] merge the co-training and the multi-view graph clustering as a novel approach to the problem of multi-view spectral clustering.

4.3.10 Constrained spectral clustering algorithm

Algorithm 4.3.10

Constrained spectral clustering algorithm [88]

Constrained spectral clustering [88] (refer Algorithm 4.3.10) is a method of encoding constraints into an algorithm such as K-means or hierarchical clustering. It uses the graph Laplacian and its eigenspace to explicitly encode ML (must-link) and CL (cannot-link) constraints into the objective function for better results.

4.3.11 Anchor algorithm

Algorithm 4.3.11

Anchor algorithm [89]

The anchors hierarchy is a method of structuring data by generating anchor nodes suited to the given task [89] (refer Algorithm 4.3.11). This concept has been used to define the anchors for the anchor algorithm.

4.3.12 Hierarchical spectral clustering algorithm

Algorithm 4.3.12

Hierarchical spectral clustering algorithm [90]

HSC (Hierarchical Spectral Clustering) [90] (refer Algorithm 4.3.12) is a novel clustering algorithm that combines spectral clustering with hierarchical methods to cluster datasets in convex and non-convex spaces more efficiently and accurately. It obviates a disadvantage of traditional spectral clustering by not using misleading information from noisy neighbouring data points, thus avoiding local-optimum traps.

4.3.13 Spectral clustering using deep neural networks algorithm

Algorithm 4.3.13

Spectral clustering using deep neural networks algorithm [91]

Spectral clustering using deep neural networks [91] (refer Algorithm 4.3.13) is a technique that uses deep neural networks to cluster data points into groups. It overcomes the limitations of scalability and generalization by training a network, called SpectralNet, which learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and then clusters them there.

4.3.14 Ultra-scalable spectral clustering algorithm

Algorithm 4.3.14

Ultra-scalable spectral clustering algorithm [93]

Ultra-scalable spectral clustering (U-SPEC) [93] (refer Algorithm 4.3.14) is an efficient algorithm for partitioning large datasets into clusters. It has nearly linear time and space complexity, allowing it to robustly and efficiently process 10-million-level nonlinearly separable data sets on a PC with 64 GB memory.

4.3.15 Spectral clustering with graph neural network for graph pooling

Algorithm 4.3.15

Spectral clustering with graph neural network for graph pooling [94]

The algorithm (refer Algorithm 4.3.15) employs Graph Neural Networks (GNNs) for spectral clustering, introducing a MinCutPool layer to coarsen the graph representation hierarchically. It utilizes a multi-layer perceptron (MLP) to compute soft cluster assignments based on node features, optimizing an unsupervised loss that balances a cut loss and an orthogonality loss. Through iterative pooling, the algorithm generates a hierarchy of coarsened graph representations, capturing structural information at diverse scales. End-to-end training ensures jointly optimized GNN and MLP parameters, and the method avoids degenerate solutions and handles imbalanced clusters across various tasks.
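As commonly stated for the MinCutPool layer (our transcription; \(\tilde{A}\) and \(\tilde{D}\) denote the normalized adjacency and degree matrices, S the soft cluster-assignment matrix, and K the number of clusters), the two loss terms take the form:

$$\mathcal{L}_{cut} = -\frac{\mathrm{Tr}(S^{\top}\tilde{A}S)}{\mathrm{Tr}(S^{\top}\tilde{D}S)}, \qquad \mathcal{L}_{ortho} = \left\lVert \frac{S^{\top}S}{\lVert S^{\top}S \rVert_{F}} - \frac{I_{K}}{\sqrt{K}} \right\rVert_{F}$$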

4.3.16 Quantum spectral clustering algorithm

Algorithm 4.3.16

Quantum spectral clustering [95]

Among the several implementations of quantum spectral clustering algorithms [96,97,98], Kerenidis et al. [95] implemented a method to group data with non-convex or nested structures. This method derives the normalized incidence matrix of a graph from the adjacency matrix and calculates the Laplacian from it. As a result, the data is projected into a low-dimensional space where, using the spectral properties of the Laplacian matrix, clustering can be done more efficiently and quickly than with traditional methods.

5 Discussion

5.1 Computational complexity of spectral clustering algorithms

Spectral clustering, at its worst, has a computational complexity of O(n\(^{3}\)) for calculating the eigenvectors and eigenvalues from the adjacency matrix. This is similar to hierarchical clustering and some density-based approaches, e.g. OPTICS (Tables 3 and 4). However, spectral techniques benefit from the Laplacian representation of the data, which helps identify local neighbourhoods using eigenvectors. The least computationally expensive is the EIGI algorithm, which employs the ratio cut solution to partition a graph into two clusters with a computational complexity of O(n\(^{2}\)). The general range of the computational expense of spectral clustering algorithms is plotted in Fig. 6.

Table 4 Comparison of spectral clustering algorithms discussed in this survey
Fig. 6

Range of computational complexity of spectral clustering algorithms. The black line represents O(n), the blue line O(\(n^2\)) and the red line the highest possible complexity of O(\(n^3\)). Values on the y-axis (number of computations) are in units of \(10^{6}\)

The rapidly accelerating computational complexity with the increasing number of nodes and samples is one of the primary, if not the main, drawbacks of employing spectral clustering on large datasets. Scalability issues have been the key driving factor in investigating improved methods that lower the expense of spectral clustering algorithms to O(n\(^{2}\)) or even O(n) [94].

The space (memory) complexity of spectral clustering algorithms is O(n\(^{2}\)) at its worst, to store the square adjacency matrix and perform further calculations within the same memory. Compared with other clustering algorithms, spectral clustering algorithms are as computationally expensive as the agglomerative clustering algorithm. However, with spectral clustering, the benefits outweigh the expenses, as we obtain a representation of the data points in the eigenspace that can be used for tasks other than clustering, e.g. visualization.

5.2 Applications of spectral clustering algorithms

The primary strength of spectral clustering lies in partitioning a graph of nodes, whether this graph is created from the pixel data of an image, vectors generated from texts or documents, or the abundance data of proteins in samples. In Table 4, a comprehensive overview of the applications of different spectral clustering algorithms is provided. We can also observe a correlation between the type of Laplacian matrix used and the application areas.

The initial spectral clustering algorithms, in which unnormalized Laplacian matrices were used to generate eigenvalues and eigenvectors, were mainly used for tasks such as parallel computing, sparse matrix partitioning, and electronic chip design (VLSI, Very Large Scale Integration). The introduction of the normalized Laplacian, in algorithms such as SM, NJW, and self-tuning spectral clustering, proved successful in image segmentation and general-purpose data analysis [108]. Some algorithms have been designed to handle very specific tasks; for example, the Anchor algorithm is used for text and document categorization. Additionally, there has been progress in the application of spectral clustering in several other domains, such as protein abundance, gene expression, and social network analysis. Such progress deserves attention to motivate further research on graph data and spectral clustering.

5.3 Future research scope

As discussed in the previous section, scalability is still a major challenge facing spectral clustering and, as a result, one of the major areas of improvement in the domain of graph clustering. While the parallelization of calculations using GPUs has already made a huge positive difference, along with substantial decreases in the computational complexity of algorithms, one issue that still persists is the recalculation of all intermediate steps when new data points are introduced for clustering on an existing model. Spectral clustering using deep neural networks and graph neural networks overcomes this issue, and there could possibly be solutions that use simpler models than neural networks to solve it.

Another promising direction of improvement is heterogeneous node clustering. While it is quite straightforward to create similarity graphs from homogeneous node attributes or features, it is quite challenging to create similarity graphs from heterogeneous sets of nodes with varying node attributes or features. This could be addressed by improved methods of meta-path selection, cross-domain generalization techniques, and the incorporation of external information for heterogeneous sets of nodes.