Clustering graph data: the roadmap to spectral techniques

Graph data models enable efficient storage, visualization, and analysis of highly interlinked data, by providing the benefits of horizontal scalability and high query performance. Clustering techniques, such as K-means, hierarchical clustering, are highly beneficial tools in data mining and machine learning to find meaningful similarities and differences between data points. Recent developments in graph data models, as well as clustering algorithms for graph data, have shown promising results in image segmentation, gene data analysis, etc. This has been primarily achieved through research and development of algorithms in the field of spectral theory, leading to the conception of spectral clustering algorithms. Spectral clustering algorithms have been one of the most effective in grouping similar data points in graph data models. In this paper, we have compiled 16 spectral clustering algorithms and compared their computational complexities, after an overview of graph data models and graph database models. Furthermore, we provided a broad taxonomy to classify most existing clustering algorithms and discussed the taxonomy in detail.


Introduction
Graph data models are useful to store, process and analyse highly interlinked data [1]. This is achieved through the use of graph theory to store data in the form of nodes and edges [2]. With the recent rise in the popularity of graph databases, which rely on graph data models to store and query data, there is a growing need to incorporate state-of-the-art learning algorithms to analyse data in graph data models. This incorporation enables the extraction of meaningful information that might otherwise be hidden when analysed in a tabular structure. Employing unsupervised and supervised learning techniques is highly effective for analysing data across several domains, e.g. to study social networks [3], physical systems [4], proteomics knowledge graphs [5], etc. [6].
One of the most commonly used unsupervised learning algorithms is clustering, which is widely used by data analysts and domain experts to group similar instances and explore hidden structures in a wide spectrum of fields, ranging from engineering, computer science and medical sciences to social sciences and economics [2]. The challenges and opportunities of graph data have led to the development of specialized clustering algorithms designed specifically for graph data. These algorithms include spectral clustering, which often outperforms basic clustering algorithms such as K-means, hierarchical clustering, etc. [9]. The popularity of graph data is on the rise with the development of graph database management systems, e.g. AllegroGraph, ArangoDB, InfiniteGraph, Neo4j [10]. The flexibility of graph data structures allows novel possibilities for data exploration and, consequently, knowledge discovery. As a result, clustering algorithms designed for graph data models, e.g. spectral clustering algorithms, are gaining momentum along with the applications of graph data and databases.

Review
Discover Artificial Intelligence (2024) 4:7 | https://doi.org/10.1007/s44163-024-00102-x
A popular class of clustering algorithms, designed specifically for graph data models, is known as spectral clustering. Spectral clustering utilizes eigendecomposition to represent and group data into clusters [11], and its conceptualization dates back to 1973 [12]. This survey analyses popular spectral clustering algorithms, which have been widely discussed in the scientific community due to their high efficiency in applications such as image segmentation and analysing patterns in gene expression data or proteomics data. We initially provide a roadmap (see Fig. 1) to navigate through the clustering paradigm until spectral clustering is reached, upon which we elaborate on 16 primary spectral clustering algorithms and conclude with a comparison of their complexities and applications. Since several variations of these algorithms were developed, we concentrate on discussing most of them extensively in a single survey.
In this paper, we have provided a comprehensive overview of the following topics:

Fig. 1 Taxonomy of clustering algorithms [7,8]. Coloured nodes and dark edges highlight the roadmap leading to spectral clustering. The dotted arrow denotes that k-means is also a hard/crisp clustering algorithm, in addition to being a squared error reduction-based clustering algorithm

Developments in the domain of spectral clustering require an elaborate survey of significant techniques and a roadmap to trace the development of the use of eigenvectors for clustering.

Definitions and notations
Graph: A graph is represented by G = (V, E), where V denotes a set of nodes (vertices) and E denotes a set of edges (relationships) between the nodes. Edges can be weighted or unweighted. Edges should be undirected for spectral clustering algorithms (von Luxburg [9]). There are three different methods to transform data points into a similarity graph for spectral clustering [9], as shown in Fig. 2:
• Fully connected graph [16,17]: Any data points with positive adjacency values can be connected to form a graph, which is only useful when the similarity function itself can model local neighbourhoods, e.g.:

$$A_{ij} = \exp\left(-\frac{\lVert s_i - s_j \rVert^2}{2\sigma^2}\right) \tag{1}$$

where $A_{ij}$ is the affinity between $s_i$ and $s_j$; $\lVert s_i - s_j \rVert$ is the distance between $s_i$ and $s_j$; $\sigma$ is the scaling parameter.
• ε-neighbourhood graph [18]: Data points are connected based on a threshold, ε. Edges with weights lower than ε are discarded to form an ε-neighbourhood graph from a fully connected graph.
• k-neighbourhood graph [19]: Formed using the k-nearest neighbour algorithm, resulting in either a k-nearest neighbour or a mutual k-nearest neighbour graph, depending on how the vertices were connected. k denotes the minimum number of points required to define a local neighbourhood.
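The three constructions can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes the Gaussian similarity with scaling parameter σ as the affinity function, and the function names (`gaussian_affinity`, `epsilon_graph`, `knn_graph`) are hypothetical.

```python
import numpy as np

def gaussian_affinity(points, sigma=1.0):
    """Fully connected graph: A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2))."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)  # no self-loops
    return affinity

def epsilon_graph(affinity, eps):
    """Discard edges whose weight is lower than the threshold eps."""
    return np.where(affinity >= eps, affinity, 0.0)

def knn_graph(affinity, k, mutual=False):
    """Connect each node to its k most similar nodes.

    mutual=False keeps an edge if either endpoint selects the other;
    mutual=True keeps it only if both endpoints select each other.
    """
    n = affinity.shape[0]
    selected = np.zeros((n, n), dtype=bool)
    nearest = np.argsort(-affinity, axis=1)[:, :k]  # k largest affinities per row
    selected[np.repeat(np.arange(n), k), nearest.ravel()] = True
    keep = (selected & selected.T) if mutual else (selected | selected.T)
    return np.where(keep, affinity, 0.0)
```

All three functions return a symmetric weighted adjacency matrix, so the resulting graph is undirected, as required above.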
Proximity measure: Measure of distance, similarity, dissimilarity and/or adjacency between vertices/nodes/data points derived from their attributes. Clustering algorithms always require calculating proximity measures, among the given data points, as their primary step. Popular examples (Mehta et al. [20]) in the context of clustering can be broadly categorized into two types: metric and non-metric (see Table 1). Metric proximity measures satisfy properties such as non-negativity, symmetry, and triangle inequality. On the other hand, non-metric proximity measures may violate symmetry and/or triangle inequality.
Distance matrix: Square matrix (refer Fig. 3a) representing the distance between data points based on a distance measure, e.g. Euclidean, Manhattan, etc. The diagonal values are 0, which signifies the lowest possible distance value.
Similarity matrix: Square matrix (refer Fig. 3b) representing the similarity between data points based on a similarity measure, e.g. Dice, Cosine, etc. Diagonal value 1 represents the highest possible similarity value. However, the diagonal of a square matrix containing dissimilarity between data points may contain 0 as the lowest possible dissimilarity value.
Adjacency (affinity) matrix: Square matrix (refer Fig. 3c) representing the adjacency (also affinity or node similarity) between points/nodes in a graph, G (Hogben [29]). It can be of two types: unweighted and weighted.
For weighted adjacency matrix: $A = [a_{ij}]$, where $a_{ij}$ is the weight of edge $\{i,j\}$ of graph G.
For unweighted adjacency matrix: $A = [a_{ij}]$, where $a_{ij} = 1$ if $\{i,j\}$ is an edge of graph G and $a_{ij} = 0$ otherwise.
Diagonal (degree) matrix: Square matrix containing the degree of each node in its diagonal (Hogben [29]), which represents the number of edges connected to each node in the graph.
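Laplacian matrix: Spectral clustering algorithms require the adjacency and degree matrices to create the Laplacian matrix of the input graph, which acts as the primary input to most of them. Commonly used types in spectral clustering [9,30] include the unnormalized Laplacian $L = D - A$, the symmetric normalized Laplacian $L_{sym} = D^{-1/2} L D^{-1/2}$ and the random-walk normalized Laplacian $L_{rw} = D^{-1} L$, where D = diagonal (degree) matrix and A = adjacency matrix. A further variant introduces a relaxation parameter.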
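To illustrate how the degree and adjacency matrices combine, the sketch below computes the unnormalized Laplacian and the two normalized forms described by von Luxburg [9]. The function name `laplacians` is hypothetical, and the code assumes an undirected graph in which every node has at least one edge, so that all degrees are non-zero.

```python
import numpy as np

def laplacians(adjacency):
    """Return D, L = D - A, L_sym = D^{-1/2} L D^{-1/2} and L_rw = D^{-1} L."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)              # weighted degree of each node
    D = np.diag(degrees)
    L = D - A                            # unnormalized Laplacian
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt  # symmetric normalized Laplacian
    L_rw = np.diag(1.0 / degrees) @ L    # random-walk normalized Laplacian
    return D, L, L_sym, L_rw
```

The rows of L and L_rw sum to zero, which is why the constant vector is always an eigenvector with eigenvalue 0.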
Table 1 A generic overview of popular proximity measures used in clustering
n = data set size, $x_i$, $y_i$ = vectors with i elements, p = scaling factor, V = covariance matrix, E = cross-correlation, σ = standard deviation, N(x)/N(y) = set of vertices that form the "neighbourhood" of a single vertex x/y, ∩ = set intersection, ∪ = set union, A = ground-truth, B = predicted label, $C_{ij}$ = sum of lesser values for species found in sites i and j, $S_i$ = sum of species found at site i, $S_j$ = sum of species found at site j
Proximity; Type; Equation
Euclidean distance [21]; Metric
[22]; Non-metric
Hamming distance [22]; Non-metric

Graph theory
Graph theory is a branch of discrete mathematics that deals with the study of mathematical structures to model entities and relationships between them, in the form of nodes/vertices and relations/edges, respectively. It has been discussed by Pal Singh et al. [31], along with its wide spectrum of applications in database design, software engineering, circuit designing, network designing, and visual interfaces. Graph theory influenced the conception of several database models, such as semantic, object-oriented, graph, and XML, as shown in Fig. 4. Some popular data structures influenced by graph theory are trees, linked lists, etc., which can be used to model graphs.

Graph database models
Graph databases primarily provide storage and querying of data stored in graph data models. Additionally, available plug-ins [32] can provide features such as conceptual visualization, e.g. Neo4j Bloom [33], and data analytics, e.g. Graph Data Science library [34], Decision Tree Plug-in [35]. Hence the full potential of graph data models could be realized on a Graph Database Management System such as Neo4j [36] or AllegroGraph [10]. Chad et al. [36] have evaluated Neo4j,   2. The flexibility of the data schema in a graph database represents the added benefit over a relational database. Regarding scalability, it could be argued that relational databases perform better regarding data distribution across several machines, as discussed by Pokorny [1]. However, scalability on large datasets is not an issue for graph databases [1].

Cluster analysis
Clustering (cluster analysis) is a process of grouping data into distinct classes so that objects with similar attributes and/or characteristics are grouped in the same class (cluster) [7]. It is classified as an unsupervised learning technique where the goal is to find meaningful patterns in underlying unlabeled data [38].
Traditional clustering algorithms such as K-means, DBSCAN, and agglomerative clustering, suffer when dealing with high-dimensional data since most often Euclidean distance alone is used as the distance/proximity measure between data points which fails at accurately portraying the relative positions of data points at high dimensions [39].Spectral clustering algorithms overcome this (graphs are non-Euclidean data structures [40]) through the calculation of eigenvalues and eigenvectors from Euclidean distances of the graph Laplacian matrix to partition the graph in the eigenspace [8].
Clustering algorithms could be broadly generalized into two categories: partitional and hierarchical clustering (Celebi et al. [41]). The former partitions data points according to a pre-defined number of groups, while the latter hierarchically assigns data points as groups of subgroups, until all points belong to one cluster (bottom-up) or individual clusters (top-down). A brief comparison between partitional and hierarchical clustering algorithms is provided in Table 3. While hierarchical clustering gives an elaborate, illustrative overview of the cluster formation, which is a great advantage for understanding the similarity between data points in sub-clusters, it comes at the cost of higher time and space complexity (Garima et al. [42]) than partitional clustering.
Sum of squares of error (SSE) minimization: The most commonly used partitional clustering technique, K-means, optimizes the SSE of clusters, while Ezugwu et al. [7] also label it as a hard clustering technique. It has the advantage of being easily implemented on large datasets at a considerably low run time. The results are easily interpretable, which benefits the user in having a general overview of the data and possible clusters.
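As a small illustration of the quantity K-means tries to reduce, the SSE of a given assignment can be computed as below. The function name `sse` and its argument layout are assumptions made for this sketch.

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    residuals = points - centroids[np.asarray(labels)]
    return float(np.sum(residuals ** 2))
```

For the points (0, 0), (2, 0) and (10, 0) with centroids (1, 0) and (10, 0), the first two points each contribute a squared distance of 1 and the third contributes 0.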
Fuzzy: Fuzzy clustering involves assigning a degree of membership, for each data point, to more than one cluster, e.g. the fuzzy c-means algorithm [48].
Mixture resolving: Mixture resolving methods, according to Grira et al. [49], assume that data points belong to one of several distributions.Expectation maximization (EM) is an iterative approach that aims to find the maximum likelihood estimates of the parameters [50] and is used in this case for parameter estimation.
• Density-based clustering algorithms such as DBSCAN (Ester et al. [51]), OPTICS (Ankerst et al. [44]), DENCLUE (Hinneburg et al. [52]) provide better performance than K-means by handling arbitrary shapes and detecting outliers.
• Subspace clustering algorithms can be categorized into top-down and bottom-up algorithms. PART (Cao et al. [53]) and PROCLUS (Aggarwal et al. [54]) are top-down algorithms, in which the whole set of dimensions is used to find an initial grouping, and the subspaces of each cluster are assessed. On the other hand, CLIQUE (Agrawal et al. [55]) and MAFIA (Nagesh et al. [56]) are bottom-up subspace algorithms, in which dense areas in low-dimensional spaces are identified first and clusters are then created by combining them (Gan et al. [57]).
• In model-based clustering such as COOLCAT (Barbará et al. [58]) and STUCCO (Bay et al. [59]), it is presupposed that the data are produced by a combination of probability distributions, each component of which represents a distinct cluster (Gan et al. [57]).
• Search-based algorithms such as Genetic Algorithms (Holland et al. [60]) and Al-Sultan's Method (Barbara et al. [58]) work towards a globally optimal clustering to fit the data. Compared to, for example, fuzzy clustering, search-based algorithms do not stop at a locally optimal partition (Gan et al. [57]).

Clustering graph data: graph and node clustering algorithms

Graph clustering algorithms
Graph clustering algorithms are concerned with clustering several graphs rather than one, each with a set of nodes and edges, based on their underlying structure, and could be discussed in the context of both graph data and semi-structured data, e.g. XML data. Some popular approaches in this regard are:
• Structural distance-based approach, e.g. XClust (Lee et al. [61]).
The use of eigenvectors to represent and cluster data points or graph nodes was made popular by the conception of spectral clustering, which is a form of a node clustering algorithm.

Node clustering algorithms
Node clustering algorithms use a distance function to measure proximity between data points or nodes of a multidimensional dataset (Aggarwal et al. [8]). The desired goal is to partition the graph by minimizing the weights of the edges across the partition.
Minimum cut: Given a graph G = (V, E) with vertex (node) set V and edge (relation) set E, the minimum-cut algorithm tries to identify the smallest sum of edge weights that needs to be removed to separate the graph G into two disconnected components for binary graph partitioning [64]. It has a complexity of $O(n^2)$, where n is the number of nodes (Karger [65]).
Ratio cut: Ratio cut is a measure of the quality of a partition. It relates the total weight of the edges between the clusters to the sizes of the clusters, so that partitions which cut off very small clusters are penalized. The objective in ratio cut is to find a partition that minimizes this ratio, indicating a good separation of clusters. Ratio cut is often employed in the context of spectral clustering, especially for binary partitioning [66].
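The two objectives can be compared on a small graph with the sketch below. It assumes one common formulation of ratio cut, in which the weight of the edges leaving each cluster is divided by the number of nodes in that cluster; constant factors differ between references. The function names are hypothetical.

```python
import numpy as np

def cut_weight(adjacency, labels):
    """Total weight of edges whose endpoints carry different labels (each edge counted once)."""
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    crossing = labels[:, None] != labels[None, :]
    return float(A[crossing].sum() / 2.0)  # symmetric matrix stores every edge twice

def ratio_cut(adjacency, labels):
    """Sum over clusters C of W(C, rest) / |C| (assumed formulation, see lead-in)."""
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        total += A[inside][:, ~inside].sum() / inside.sum()
    return float(total)
```

On two triangles joined by a single edge, splitting at the joining edge gives a cut weight of 1 and a lower ratio cut than isolating a single node, which illustrates why size normalization favours balanced partitions.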
Multi-way graph partitioning: Multi-way graph partitioning is an NP-Hard problem, where the goal is to partition the set of vertices into k (greater than 2) clusters so that the weights of edges whose ends are in different partitions are minimized (Kernighan et al. [67]). The time complexity in this case increases exponentially with the value of k. A variation of this heuristic approach has been discussed by Fjällström [68].

Table 4 (fragment): Normalized; Text and document categorization [89]; Image segmentation [80]; Dimension reduction [105]; Text clustering [106]; General clustering; Quantum [95]; O(T dlog(d) 2; Projected normalized; General clustering [95]

Girvan-Newman algorithm: A divisive clustering algorithm based on the concept of edge betweenness centrality (Girvan et al. [70]), which is the number of shortest paths between pairs of nodes that pass through the edge. The algorithm starts with calculating edge betweenness for every edge in the graph, then removes the edge with the highest edge betweenness and recalculates edge betweenness for the remaining edges. The process repeats until all edges are removed (Despalatović et al. [71]).
Determining quasi-cliques: While most partitioning algorithms try to minimize the edge density between partitions, this technique focuses on maximizing edge density within a partition. To elaborate, a clique is a graph where all pairs of nodes have an edge between them, and a quasi-clique is defined by imposing a lower bound on the degree of each vertex in the given set of nodes (Abello et al. [72]).
Min-hash approach: The min-hash approach represents a node's outlinks (hyperlinks) as sets, i.e. two nodes are considered similar if they share many outlinks [73]. The Jaccard coefficient is used to represent the similarity between two nodes (Baharav et al. [74]).
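The idea can be sketched in plain Python: the Jaccard coefficient is computed exactly from two outlink sets, and a min-hash signature estimates it from a fixed number of hash values. The use of Python's built-in `hash` on `(seed, item)` pairs as a stand-in hash family, and all function names, are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two outlink sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def minhash_signature(items, seeds):
    """One minimum hash value per seed; hash((seed, item)) is a stand-in hash family."""
    return [min(hash((seed, item)) for item in items) for seed in seeds]

def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of the Jaccard coefficient."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Two nodes with identical outlink sets always receive identical signatures, while the estimate for partially overlapping sets approaches the exact Jaccard coefficient as the number of seeds grows.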

Spectral clustering algorithms
Spectral clustering uses eigenvalues and eigenvectors to represent clusters, as a set of vertices (nodes), derived from the node adjacency matrix of a graph [8]. These algorithms can provide lower/upper bounds for minimization/maximization of graph partitioning problems [11]. In the following, we discuss popular spectral clustering algorithms, along with their steps and complexity. In Fig. 5, we compare the steps of spectral clustering algorithms with those of a basic clustering algorithm, such as K-means. The steps are often repeated iteratively to reduce the value of a chosen cost function (SSE in K-means clustering). Spectral clustering algorithms have the additional steps of creating the Laplacian matrix and deriving eigenvectors and eigenvalues from it, which is why they have high computational costs, similar to hierarchical clustering (Table 3).

EIGI algorithm
The EIGI algorithm (see Algorithm 4.3.1) is based on the linear ordering of the Fiedler eigenvectors, performed using the Lanczos algorithm [11]. The eigenvector corresponding to the second smallest eigenvalue of a graph Laplacian matrix is usually referred to as the Fiedler vector, as defined by Doshi et al. [76]. The Lanczos method is an algorithm which is used to find a few extreme eigenvalues of a large symmetric matrix along with the associated eigenvectors (Parlett et al. [77]).
The Lanczos algorithm has a complexity of $O(nk)$, where n = number of nodes and k = number of Lanczos iterations. As Nascimento [11] states, EIGI has the same computational complexity as the Lanczos algorithm, which would be $O(n^2)$ in the worst-case scenario, if k = n.
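A bipartition based on the Fiedler vector can be sketched as follows. This is not the EIGI implementation: for brevity it uses NumPy's dense symmetric eigensolver in place of the Lanczos method and splits the nodes by comparing the Fiedler vector entries with a threshold. The function name and the default threshold of 0 are assumptions.

```python
import numpy as np

def fiedler_bipartition(adjacency, threshold=0.0):
    """Split a connected graph into two clusters using the Fiedler vector of L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # Dense solver used for brevity; Lanczos would be preferred for large sparse graphs.
    _, eigenvectors = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    fiedler = eigenvectors[:, 1]         # eigenvector of the second smallest eigenvalue
    labels = (fiedler > threshold).astype(int)
    return labels, fiedler
```

Because an eigenvector is only defined up to sign, the two cluster labels may be swapped between runs or platforms; only the grouping itself is meaningful.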

KP algorithm
The KP algorithm (Nascimento et al. [11]), named after its k-way partitioning, calculates how close nodes are by observing cosine similarities between pairs of rows from the eigenmatrix U (see Algorithm 4.3.2).
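The row-wise comparison can be illustrated with the sketch below, which only covers the cosine-similarity step for a given eigenmatrix U and not the full KP procedure. The function name is hypothetical, and all-zero rows are left as zero vectors to avoid division by zero.

```python
import numpy as np

def row_cosine_similarity(U):
    """Cosine similarity between every pair of rows of the eigenmatrix U."""
    U = np.asarray(U, dtype=float)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    unit_rows = U / np.where(norms == 0.0, 1.0, norms)  # leave all-zero rows untouched
    return unit_rows @ unit_rows.T
```

Rows that point in the same direction in the eigenspace obtain a similarity of 1 regardless of their length, so nodes are compared by direction rather than by magnitude.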

Algorithm 4.3.3 MELO algorithm [79]
1: Input: Graph G, its Laplacian matrix L, the number of desired clusters k and the number of eigenvectors to be used d.
2: Construct matrix of scaled eigenvectors.
3: Perform linear ordering on the eigenvectors.
4: Find the final k-way partition using linear ordering.
5: Output: k partitions.

Algorithm 4.3.4 SM algorithm [80]
1: Input: Graph G, its normalized Laplacian matrix $L_{sym}$ and the number of desired clusters k.
2: Find k eigenvectors of the generalised eigensystem [81] and arrange them in matrix U.

The MELO algorithm [79] partitions data into k segments using a dynamic programming procedure.

Shi and Malik (SM/KNSC) algorithm
In their popular algorithm, Shi and Malik [80] (refer Algorithm 4.3.4) apply the K-means algorithm on the eigenmatrix, where each row of the matrix is treated as a single object from the dataset.
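A compact sketch of this idea is given below. It assumes the generalized eigensystem $(D - A)y = \lambda D y$, solved through the symmetric normalized Laplacian, and uses a plain Lloyd-style K-means with a deterministic farthest-point initialisation. Both helper functions are illustrative stand-ins, not the authors' implementation, and the code assumes that no node is isolated.

```python
import numpy as np

def kmeans(rows, k, n_iter=100):
    """Plain Lloyd iterations; farthest-point initialisation keeps the sketch deterministic."""
    centroids = [rows[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((rows - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(rows[np.argmax(dists)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        updated = np.array([
            rows[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels

def shi_malik_clustering(adjacency, k):
    """Cluster the rows of the matrix of the k smallest generalized eigenvectors."""
    A = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes non-zero degrees
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)         # ascending eigenvalues
    # If L_sym v = lambda v, then y = D^{-1/2} v solves (D - A) y = lambda D y.
    U = d_inv_sqrt[:, None] * vectors[:, :k]
    return kmeans(U, k)
```

Each row of U is the spectral representation of one node, which is what allows an ordinary K-means step to recover clusters that are defined by graph connectivity.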

Meila-Shi (MS/multicut) algorithm
Proposed by Meila and Shi [82] (refer Algorithm 4.3.5), the algorithm clusters the matrix of eigenvectors associated with the k largest eigenvalues. In this case, the normalized graph is formed through random walks.

Kannan, Vempala and Vetta (KVV) algorithm
The KVV algorithm is an improvement over the SM algorithm that uses Cheeger conductance for partitioning. Calculating the Cheeger conductance is beneficial in the context of graph partitioning and clustering because it provides a quantitative measure of the quality of a graph cut [84]. In order to find the Cheeger conductance, or conductance, of a cluster, the set of vertices is weighted to reflect their importance (Kannan et al. [83]).
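The conductance of a candidate cluster can be computed as sketched below, assuming the usual definition in which the weight of the edges leaving the cluster is divided by the smaller of the two volumes (sums of weighted degrees) on either side of the cut. The function name and argument layout are hypothetical.

```python
import numpy as np

def conductance(adjacency, members):
    """Cut weight between `members` and the rest, divided by the smaller side's volume."""
    A = np.asarray(adjacency, dtype=float)
    inside = np.zeros(len(A), dtype=bool)
    inside[list(members)] = True
    cut = A[inside][:, ~inside].sum()  # weight of edges leaving the cluster
    vol_inside = A[inside].sum()       # sum of weighted degrees inside the cluster
    vol_outside = A[~inside].sum()
    return float(cut / min(vol_inside, vol_outside))
```

For two triangles joined by one edge, taking one triangle as the cluster gives a conductance of 1/7, whereas isolating a single node gives 1, so lower values indicate better cuts.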

Self-tuning spectral clustering algorithm
Most algorithms discussed so far require the scaling parameter to be stated explicitly by the user, derived through domain knowledge, trial and error, or found through several runs. To find the optimal hyperparameter value for scaling for a given graph, Zelnik-Manor et al. [85] introduced a method to analyse the local scaling parameter for each data point. The self-tuning algorithm performs an eigendecomposition similar to NJW, resulting in a worst-case complexity of $O(n^3)$.
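Local scaling can be sketched as follows, assuming the formulation in which each point i receives its own scale $\sigma_i$, the distance to its K-th nearest neighbour, and the affinity is $A_{ij} = \exp(-d(s_i, s_j)^2 / (\sigma_i \sigma_j))$. The function name and the default K = 7 are assumptions of this sketch, which also requires more than K points and no more than K exact duplicates of any point.

```python
import numpy as np

def self_tuning_affinity(points, K=7):
    """Affinity matrix with a per-point scale taken from the K-th nearest neighbour."""
    X = np.asarray(points, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Column 0 of each sorted row is the point itself (distance 0), so column K
    # holds the distance to the K-th nearest neighbour.
    sigma = np.sort(dist, axis=1)[:, K]
    affinity = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(affinity, 0.0)  # no self-loops
    return affinity
```

Points in dense regions obtain a small scale and points in sparse regions a large one, so a single global scaling parameter no longer has to be chosen by the user.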

Co-trained multi-view spectral clustering algorithm
Multi-view data refers to data that is generated from different sources or observed from different perspectives (data pre-processing and/or analysis methods). As Yang et al. discussed [87], multi-view data refers to data objects that can be viewed from different angles or measured using different instruments, resulting in multiple views of the same data. Each individual view, in this case, can lead to distinct knowledge discovery. Multi-view clustering algorithms can be classified into five categories [87]:
• Co-training algorithms bootstrap clustering of the different views, either by using the prior or by gaining knowledge from one another.
• Multi-kernel learning algorithms predefine and combine kernels corresponding to each view, either linearly or non-linearly.
• Multi-view graph clustering fuses graphs from all views to a single graph and then implements graph-cut (e.g. node clustering) algorithms.
  - Multi-view spectral clustering
• Multi-view subspace clustering algorithms learn unified feature representations from all feature subspaces of all views.

Algorithm 4.3.9 Co-trained multi-view spectral clustering algorithm [86]
1: Input: Graph G, Laplacian matrices, e.g. $L_1$ and $L_2$ for two views (derived from similarity matrices $S_1$ and $S_2$), and the number of desired clusters k.
2: Get discriminative eigenvectors in each view, $U_1$ and $U_2$.
3: Cluster $U_1$ and modify graph in view 2 and vice versa.
4: Go to step 1 and repeat for a number of iterations.
5: Output: k partitions.
In Algorithm 4.3.9, Kumar et al. [86] merge co-training and multi-view graph clustering as a novel approach to the problem of multi-view spectral clustering.

1: Input: Graph G, its (normalized) Laplacian matrix $L_{sym}$, threshold β, number of desired clusters K.
Return empty set of cluster assignment indicator $u^*$
6: else
7: Solve the generalized eigenvalue system;
8: Remove eigenvectors associated with negative eigenvalues and normalize the rest by, v , where the columns of V are a subset of the feasible eigenvectors generated in step 8;
10:

1: Input: Graph G and the number of desired clusters k.
2: Choose anchors at random, set k' = 1 and k" = 0 and assign final anchors as the farthest points from initially chosen points.
3: Construct clusters associated with anchors, $x_k$
4: Test if $x_k$ has enough points, specified through threshold parameter.
5: Set k' = k' + 1. Choose $x_k$ to be the farthest from all other existing anchors
6: If k' − k" < k, go to step 3
7: Output: k partitions.

Algorithm 4.3.12 Hierarchical spectral clustering algorithm [90]
1: Input: Graph G, its Laplacian matrix L, number of desired clusters k, indicated number of eigenvectors α.
2: Find the largest eigenvectors of L and produce the normalized feature vector space $T = (t_1, ..., t_n)$
3: for i ∈ 1, 2, . . ., n do
4:

Anchor algorithm
Anchors hierarchy is a method of structuring data by generating nodes suited to the given task [89] (refer Algorithm 4.3.11). This concept has been used to define anchors for the anchor algorithm.

Hierarchical spectral clustering algorithm
HSC (Hierarchical based Spectral Clustering) [90] (refer Algorithm 4.3.12) is a novel clustering algorithm that combines spectral clustering with hierarchical methods to cluster datasets in convex and non-convex spaces more efficiently and accurately. It obviates the disadvantage of traditional spectral clustering by not using misleading information from noisy neighbouring data points, thus avoiding local optimum traps.

Algorithm 4.3.13 Spectral clustering using deep neural networks algorithm [91]
1: Input: Graph G from unlabelled data $X \subseteq \mathbb{R}^d$, loss of similarity $L_{SpectralNet}(\theta)$, Siamese net $L_{siamese}$ [92], number of desired clusters k and batch size m.

Quantum spectral clustering algorithm
Of the several implementations of quantum spectral clustering algorithms [96][97][98], Kerenidis et al. [95] implemented a method to group data with non-convex or nested structures. This method derives the normalized incidence matrix of a graph from the adjacency matrix and calculates the Laplacian from it. As a result, the data is projected into a low-dimensional space where clustering can be done more efficiently and quickly than with traditional methods, using the spectral properties of the Laplacian matrix.

Computational complexity of spectral clustering algorithms
Spectral clustering, at its worst, has a computational complexity of $O(n^3)$ to calculate the eigenvectors and eigenvalues from the adjacency matrix. This is similar to hierarchical clustering and some density-based approaches, e.g. OPTICS (Tables 3 and 4). However, spectral techniques benefit from the Laplacian representation of the data, which helps identify local neighbourhoods using eigenvectors. The least computationally expensive is the EIGI algorithm, which employs the ratio cut solution to partition a graph into two clusters with a computational complexity of $O(n^2)$. The general range of the computational expenses of spectral clustering algorithms has been plotted in Fig. 6. The rapid growth of the computational cost with the number of nodes and samples is one of the primary, if not the main, drawbacks of employing spectral clustering on large datasets. Scalability issues have been the key driving factor to investigate improved methods which lower the expense of spectral clustering algorithms down to $O(n^2)$ or even $O(n)$ [94].
The space (memory) complexity of spectral clustering algorithms is $O(n^2)$, at its worst, to store the square adjacency matrix and perform further calculations within the same memory storage. When compared with other clustering algorithms, spectral clustering algorithms are as computationally expensive as the agglomerative clustering algorithm. However, with spectral clustering, the benefits outweigh the expenses, as we get a representation of the data points in the eigenspace which can be used for tasks other than spectral clustering, e.g. visualization.

Applications of spectral clustering algorithms
The primary strength of spectral clustering lies in partitioning a graph containing nodes, whether this graph is created from pixel data of an image, vectors generated from texts or documents, or abundance data of proteins in samples. In Table 4, a comprehensive overview of the applications of different spectral clustering algorithms is provided. We can also observe a correlation between the type of Laplacian matrix used and the application areas in this case.
The initial spectral clustering algorithms, where the unnormalized Laplacian matrices were used to generate eigenvalues and eigenvectors, were mainly used for tasks such as parallel computing, sparse matrix partitioning, and electronic chip design (VLSI, Very Large Scale Integration). The introduction of the normalized Laplacian, in algorithms such as SM, NJW, and self-tuning, proved successful in image segmentation and general-purpose data analysis [108]. Some algorithms have been designed to handle very specific tasks, e.g. the Anchor algorithm is used for text and document categorization. Additionally, there has been progress in the application of spectral clustering in several other domains such as protein abundance, gene expression, and social network analysis. Such progress deserves attention to motivate further research in graph data and spectral clustering.

Future research scope
As discussed in the previous section, scalability is still a major challenge facing spectral clustering and, as a result, is one of the major areas for improvement in the domain of graph clustering. While parallelization of calculations using GPUs has already made a substantial positive difference, along with a substantial decrease in the computational complexity of algorithms, one issue that still persists is the recalculation of all intermediate steps when new data points are introduced for clustering on an existing model. Spectral clustering using deep neural networks and graph neural networks overcomes this issue, and there could possibly be solutions which use simpler models than neural networks to solve it.
Another promising direction of improvement could be heterogeneous node clustering. While it is quite straightforward to create similarity graphs from homogeneous node attributes or features, it is quite challenging to create similarity graphs from heterogeneous sets of nodes containing varying node attributes or features. This could be addressed by improved methods of meta-path selection, cross-domain generalization techniques and incorporating external information for heterogeneous sets of nodes.

Fig. 2 Types of neighbourhood graphs. a Fully-connected graph; b ε-neighbourhood graph; c k-neighbourhood graph. For weighted graphs, all the edge weights in b and c would not be replaced by 1

Fig. 3 Matrix representation of a distance, b similarity (affinity) and c adjacency matrices for nodes N1, N2 and N3

Fig. 4 Evolution of database models. Arrows denote influence; dotted arrows represent the influence of graph theory on various database models [32]

Fig. 5 Step-wise comparison of spectral algorithms against basic clustering algorithms. Dashed lines represent paths for spectral clustering while solid lines represent paths for basic clustering algorithms, such as K-means

• Other algorithms inside the class of hard clustering are termed Miscellaneous Algorithms. Examples include (Gan et al. [57]): Time Series Clustering Algorithms, in which data is usually classified into two categories (many individual time series and a single time series); Streaming Algorithms, for tremendous amounts of data, including network data, temperature data, and satellite imagery data; Transaction Data Clustering Algorithms, for transaction data (market basket data); etc.

Table 3 Comparison of commonly used clustering algorithms [41,42]

• Graph clustering is another type of hard clustering, tailored to cluster data stored in graph data structures (Nascimento et al. [11]). Algorithms of graph clustering can be broadly classified into two categories: graph and node clustering algorithms.

Table 4 Comparison of spectral clustering algorithms discussed in this survey
n = number of nodes, d = number of eigenvectors used, $b = \max_{1 \le i \le n} d_i$, k = number of partitions (clusters), K = number of nearest representatives, p = number of representatives, t = number of iterations, v = number of views, E = number of non-zero edges in coarsened adjacency matrix, T = time of quantum state, = relative error

2: Calculate the second smallest set of eigenvalues and corresponding eigenvectors of L, using the Lanczos Algorithm.
3: Compare the second set of eigenvalues to threshold r and assign it to one of two clusters.
4: Output: Resulting partition.
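The bipartitioning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the function name `spectral_bipartition` and the example graph are hypothetical, L is taken to be the unnormalized Laplacian D - A, the threshold r is applied to the entries of the eigenvector belonging to the second smallest eigenvalue, and NumPy's dense solver `numpy.linalg.eigh` is swapped in for the Lanczos algorithm, which is only practical for small graphs.

```python
import numpy as np


def spectral_bipartition(adjacency, r=0.0):
    """Split a graph in two with the eigenvector of the second smallest eigenvalue."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A  # assumption: unnormalized Laplacian L = D - A
    # Dense symmetric solver used in place of Lanczos; eigenvalues are ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    second_vector = eigenvectors[:, 1]  # eigenvector of the second smallest eigenvalue
    labels = (second_vector > r).astype(int)  # threshold every entry at r
    return labels, eigenvalues[1]


# Hypothetical example: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels, lambda_2 = spectral_bipartition(A)
print(labels, lambda_2)
```

Because an eigenvector is only defined up to sign, which triangle receives label 0 and which receives label 1 may differ between runs or platforms; only the grouping is meaningful.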

3: Apply K-means algorithm on matrix U and find k partitions.
4: Assign nodes to clusters if their eigenvalue belongs to the partition.
5: Output: k partitions.
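As a rough sketch of these steps, the snippet below runs K-means on the rows of U and gives node i the cluster of row i. Several things are assumptions made for illustration: the helper names `kmeans` and `spectral_clustering`, the use of the unnormalized Laplacian D - A, the choice of the eigenvectors of the k smallest eigenvalues as the columns of U, and a plain Lloyd's iteration standing in for a production K-means.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre.
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        new_centres = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
            for c in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def spectral_clustering(adjacency, k):
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A  # assumption: unnormalized Laplacian
    _, eigenvectors = np.linalg.eigh(L)
    U = eigenvectors[:, :k]  # columns: eigenvectors of the k smallest eigenvalues
    return kmeans(U, k)  # node i receives the cluster of row i of U


# Hypothetical example: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = spectral_clustering(A, k=2)
print(labels)
```

The cluster indices themselves are arbitrary; what the example is meant to show is that each triangle ends up in its own partition.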

.3 Multiple Eigenvector Linear Orderings (MELO) algorithm
The MELO algorithm (see Algorithm 4.3.3) is a greedy approach proposed by Alpert et al.
1: Input: Graph G, its normalized Laplacian matrix $L_{sym}$ and the number of desired clusters k.
2: Find k eigenvectors of the normalized Laplacian matrix L, arranging them in matrix U'.
Assign nodes to clusters if their eigenvalue belongs to the partition.

The Ng-Jordan-Weiss (NJW) algorithm proposed by Ng et al. [17] (refer Algorithm 4.3.6) is another improvement over the SM algorithm: it applies the K-means algorithm to a renormalized Laplacian matrix representing the dataset.
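The renormalization that distinguishes NJW can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name `njw_embedding` is hypothetical, the graph is assumed to have no isolated nodes, and the eigenvectors of the k smallest eigenvalues of $L_{sym} = I - D^{-1/2} A D^{-1/2}$ are used, which span the same space as the k largest eigenvectors of $D^{-1/2} A D^{-1/2}$. Each row of the eigenvector matrix is rescaled to unit length; K-means would then be run on the rows of the result.

```python
import numpy as np


def njw_embedding(adjacency, k):
    """Return the row-normalized spectral embedding used before K-means."""
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes every node has degree > 0
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = eigenvectors[:, :k]
    # Renormalization step: scale each row of U to unit Euclidean length.
    return U / np.linalg.norm(U, axis=1, keepdims=True)


# Hypothetical example: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Y = njw_embedding(A, k=2)
print(np.linalg.norm(Y, axis=1))  # every row lies on the unit circle
```

After this step, nodes of the same triangle map to nearby points on the unit circle, which is what makes the subsequent K-means step easy.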
1: Input:
3: Find k partitions using Cheeger Conductance.
4:
5: Output: k partitions.

1: Input: Graph G, its normalized Laplacian matrix $L_{sym}$ calculated from optimal σ for every pair of nodes and desired number of clusters, k.
2: Find k eigenvectors of the normalized Laplacian matrix L, arranging them in matrix U'.
6: Output: k partitions.
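The listing above does not say how the per-pair σ is chosen. One common choice, assumed here purely for illustration, is local scaling: each point i gets its own scale $\sigma_i$, the distance to its K-th nearest neighbour, and the affinity is $W_{ij} = \exp(-d_{ij}^2 / (\sigma_i \sigma_j))$. The function name `locally_scaled_affinity`, the parameter `n_neighbours` and the sample points are hypothetical.

```python
import numpy as np


def locally_scaled_affinity(points, n_neighbours=2):
    """Affinity matrix with one scale per point (assumed local-scaling rule)."""
    X = np.asarray(points, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # sigma_i: distance from point i to its n_neighbours-th nearest neighbour.
    # Column 0 of each sorted row is the point itself, at distance 0.
    # Assumes no duplicate points, so every sigma_i is strictly positive.
    sigma = np.sort(dist, axis=1)[:, n_neighbours]
    W = np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


# Hypothetical data: a tight group near the origin and a looser group far away.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 12], [12, 10]]
W = locally_scaled_affinity(X)
print(np.round(W, 3))
```

The resulting W would take the place of a single-σ Gaussian affinity when forming the normalized Laplacian; the tight and loose groups both receive sizeable within-group affinities despite their different densities.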

2: Construct a training set of positive and negative pairs and train a Siamese network.
3: Randomly initialize the network weights θ.
4: while $L_{SpectralNet}(\theta)$ not converged do
   a: Sample a random minibatch $x_1, \ldots, x_m$
   b: Compute the m×m affinity matrix W using the Siamese net
   c: Forward propagate $x_1, \ldots, x_m$ to get $y_1, \ldots, y_m$
   d: Compute the loss
   e: Use the gradient of $L_{SpectralNet}(\theta)$ to tune all $F_\theta$ weights, except for the output layer
7: end while
8: Output: Embeddings $y_1, \ldots, y_n$, $y_i \in \mathbb{R}^k$, cluster assignments $c_1, \ldots, c_n$, $c_i \in \{1, \ldots, k\}$

1: Input: Dataset $X = \{x_1, x_2, \ldots, x_N\}$
2: Hybrid Representative Selection:
   a: Randomly sample a set of p' candidate representatives such that $p < p' \ll N$.
   b: On the p' candidates, perform the K-means method to obtain p clusters and use the p cluster centres as the set of representatives.

13 Spectral clustering using deep neural networks algorithm
Spectral clustering using deep neural networks [91] (refer Algorithm 4.3.13) is a technique that uses deep neural networks to cluster data points into groups. It overcomes the limitations of scalability and generalization by training a network, called SpectralNet, which learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and then clusters them.

Ultra-scalable spectral clustering (U-SPEC) [93] (refer Algorithm 4.3.14) is an efficient algorithm for partitioning large datasets into clusters. It has nearly linear time and space complexity, allowing it to robustly and efficiently process 10-million-level nonlinearly separable data sets on a PC with 64 GB memory.
4.3.15 Spectral clustering with graph neural network for graph pooling
The algorithm (refer Algorithm 4.3.15) employs Graph Neural Networks (GNNs) for spectral clustering, introducing a MinCutPool layer to coarsen the graph representation hierarchically. It utilizes a multi-layer perceptron (MLP) to compute soft cluster assignments based on node features, optimizing an unsupervised loss that balances cut loss and orthogonality loss. Through iterative pooling, the algorithm generates a hierarchy of coarsened graph representations, capturing diverse scales of structural information. End-to-end training ensures jointly optimized GNN and MLP parameters, demonstrating effectiveness in various tasks by avoiding degenerate solutions and handling imbalanced clusters.
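To make the balance between the two loss terms concrete, the sketch below evaluates them for a given soft assignment matrix S (one row per node, one column per cluster). It assumes the usual formulation of the MinCutPool objective, cut loss $-\mathrm{Tr}(S^\top A S) / \mathrm{Tr}(S^\top D S)$ and orthogonality loss $\lVert S^\top S / \lVert S^\top S \rVert_F - I_K / \sqrt{K} \rVert_F$, and for brevity uses the raw adjacency matrix instead of a normalized one. The function name `mincut_losses` and the two example assignments are hypothetical; in the actual method S would be the softmax output of the MLP and the losses would be minimized by gradient descent.

```python
import numpy as np


def mincut_losses(adjacency, S):
    """Cut loss and orthogonality loss for a soft assignment matrix S (n x K)."""
    A = np.asarray(adjacency, dtype=float)
    S = np.asarray(S, dtype=float)
    D = np.diag(A.sum(axis=1))
    K = S.shape[1]
    cut_loss = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    # np.linalg.norm on a matrix defaults to the Frobenius norm.
    orth_loss = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(K) / np.sqrt(K))
    return cut_loss, orth_loss


# Hypothetical example: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# One cluster per triangle versus a degenerate assignment that is uniform everywhere.
S_balanced = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
S_uniform = np.full((6, 2), 0.5)

cut_b, orth_b = mincut_losses(A, S_balanced)
cut_u, orth_u = mincut_losses(A, S_uniform)
print(cut_b + orth_b, cut_u + orth_u)
```

On its own the cut loss prefers the uniform assignment, which reaches the minimum value of -1 without separating anything; the orthogonality term penalizes that degenerate solution, so the summed loss favours the balanced partition.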
1: Input: Graph G, its normalized Laplacian L and the number of desired clusters, k.
2: Calculate L projected on its k lowest eigenvectors, the projected normalized Laplacian $L^{(k)}$.
3: Quantum clustering in the spectral space.
4: Output: k partitions.