1 Introduction

Graphs play a pivotal role in modeling and analyzing complex relationships and structures in various real-world scenarios, ranging from social networks and biological systems to transportation networks and recommendation systems (Tsitsulin et al. 2023; Du et al. 2023). Clustering, as a fundamental task in data analysis, aims to group similar data points together based on certain criteria, thereby uncovering underlying patterns and structures within the data (Wen et al. 2023). In this context, graph clustering emerges as a powerful technique for analyzing and understanding the inherent structure of graph data (Zhang et al. 2023; Zhao et al. 2024). The need for graph clustering as a graph analysis technique arises from the inherent complexity and interconnectedness of real-world graphs (Wen et al. 2023; Qi et al. 2024). However, this task is not without its challenges. Traditional clustering methods often overlook the rich structural and attribute information present in graphs, leading to suboptimal results. In response to these challenges, attributed graph clustering techniques have been developed, which leverage both structural and attribute information to learn effective graph representations (Tsitsulin et al. 2023; Ugander and Yin 2023).

Existing attributed graph clustering techniques encompass a variety of approaches: methods that assume the representation of the same node should be similar whether it is derived from attributes or from structure, methods that use attribute information for a linear representation of structure, strategies that divide the graph embedding process into several distributed models, approaches that identify communities with attribute homogeneity and structural coherence, methods that create a low-rank representation from both attribute and structure information, and models that integrate attribute-based and structure-based graphs to create augmented representations (Fan 2020; Yin et al. 2024). Beyond these approaches, the emergence of Graph Neural Networks (GNNs), inspired by Convolutional Neural Networks (CNNs), has revolutionized the field of graph clustering (Ugander and Yin 2023; Qiao et al. 2024).

GNNs are a class of neural networks designed specifically to operate on graph-structured data. They leverage the inherent connectivity of graphs to perform node-level or graph-level tasks, such as node classification, link prediction, and graph clustering. GNNs typically operate by iteratively aggregating information from neighboring nodes, passing messages between nodes to update their representations based on the local graph structure (Wen et al. 2023; Li et al. 2023). This message-passing process allows GNNs to capture both the topological structure of the graph and the attributes associated with each node (Tsitsulin et al. 2023). To cluster a graph using GNNs, the network is trained to learn representations of nodes that are discriminative for the clustering task. These learned representations are then used as input to a clustering algorithm, such as k-means, to partition the nodes into clusters based on their learned embeddings (Cheng et al. 2022; Xia et al. 2023). By leveraging both the graph structure and node attributes, GNNs can effectively capture the inherent patterns and structures within the graph data, leading to improved clustering performance compared to traditional methods (Xing et al. 2023; Chen et al. 2022).

Meanwhile, GNNs have emerged as powerful tools for attributed graph clustering, leveraging both structural and attribute information to learn effective representations of graph data. Several GNN-based methods have been proposed for attributed graph clustering, each offering unique approaches to address the challenges associated with this task. While GNN-based methods for attributed graph clustering have shown promising results, they are not without their challenges (Liu et al. 16,17,; Liu et al. 2024). Some of the key challenges associated with existing methods based on GNN for attributed graph clustering include: neglect of global relationships, graph dispersion, vulnerability to noise, limited discriminative power, lack of edge awareness (Li et al. 2023; Li et al. 2023). Addressing these challenges requires the development of novel GNN-based methods that can effectively capture global relationships, mitigate the impact of noise, and enhance discriminative power in attributed graph clustering tasks (Yang et al. 2024). By overcoming these challenges, future GNN-based approaches have the potential to significantly improve the accuracy and robustness of attributed graph clustering in various real-world applications.

GNNs operate by iteratively aggregating information from neighboring nodes to learn representations of graph data (Wen et al. 2023; Fan et al. 2024). They have demonstrated impressive performance in graph clustering tasks, particularly for homophilic graphs where nodes with similar attributes tend to be densely connected. Homophilic graphs and heterophilic graphs represent different patterns of connectivity between nodes based on their attributes (Wen et al. 2023; Fan 2020; Xu et al. 2022). In homophilic graphs, nodes with similar attributes tend to be densely connected, meaning that nodes belonging to the same class or category are more likely to be connected to each other (Zhang et al. 2022; Chen et al. 2023). On the other hand, in heterophilic graphs, the majority of connections occur between nodes belonging to different classes or categories, resulting in a more dispersed and interconnected network structure (Fan 2020; Yin 2024). Traditional GNNs use message passing policy to represent data in homophilic graphs. However, traditional GNNs have two major challenges for data representation in heterophilic graphs (Wen et al. 2023; Fan 2020; Xia et al. 2023): (1) Local neighbors in a graph include nodes with close hops, while two nodes with semantic similarity in heterophilic graphs may be far apart; (2) Similar and dissimilar neighbors with different information cannot be identified in heterophilic graphs. According to these challenges, the use of GNNs to learn the representation of heterophilic graphs requires passing diverse messages. As a result, traditional GNN-based methods perform well on homophilic graphs, but are not always efficient on heterophilic graphs.
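The distinction between homophilic and heterophilic graphs can be made concrete with the edge homophily ratio, i.e. the fraction of edges whose two endpoints share a label. This is a commonly used measure rather than one defined in this paper, and the function name, toy labels, and edge lists below are illustrative assumptions. Note that it requires node labels, which are not available in the clustering setting.

```python
import numpy as np


def edge_homophily(edges, labels) -> float:
    """Fraction of edges that connect two nodes with the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same_label = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same_label.mean())


# Hypothetical toy graph: four nodes, two classes.
labels = [0, 0, 1, 1]
mostly_intra_class = [(0, 1), (2, 3), (1, 2)]  # two of three edges stay inside a class
only_inter_class = [(0, 2), (0, 3), (1, 2)]    # every edge crosses classes

h_homophilic = edge_homophily(mostly_intra_class, labels)
h_heterophilic = edge_homophily(only_inter_class, labels)
```

A ratio close to 1 indicates a homophilic graph, while a ratio close to 0 indicates a heterophilic one.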

So far, several GNN methods have been presented to deal with these challenges in heterophilic graphs (Tsitsulin et al. 2023; Du et al. 2023). Some methods extend the neighbor fields, while others modify the GNN architecture (Xia et al. 2023; Cao et al. 2022). However, these methods are only suitable for heterophilic graphs with downstream tasks and have two basic limitations (Ugander and Yin 2023; Fan 2020): (1) Mechanisms such as graph rewiring, adaptive filter learning, and customized network training are applicable only on labeled samples, while this information is not available for the clustering problem; (2) GNN uses a single subspace for embedding, which has negative effects for heterophilic graphs when faced with both topological structure and attribute.

Clustering is an unsupervised task, so it is difficult to determine whether a real-world graph is homophilic or heterophilic (Zhu et al. 2024; Tang et al. 2024). Moreover, most GCN-based models neglect global relationships during clustering, and issues such as graph dispersion and vulnerability to noise should not be ignored (Wang et al. 2024c; Jin et al. 2024). The absence of an edge between two nodes does not necessarily mean that they are unrelated; conversely, considering all direct neighbors may reduce the generalizability of the representation. Furthermore, in the propagation phase of GCN, when nodes seek to identify related nodes, neighbors with different labels can distort graph embedding learning (Wen et al. 2023; Ugander and Yin 2023). Therefore, there is a need for novel GNN-based methods that can effectively handle both homophilic and heterophilic graphs while considering global relationships and addressing challenges such as graph dispersion and noise vulnerability.

To address these issues, we propose an Attribute Graph Clustering approach based on an Enhanced graph and a Reconstructed graph (AGCER). AGCER includes four key components: an enhanced graph, a reconstructed graph, a refined graph, and a dual-guidance supervisor module. The dual-guidance module includes two supervisors: an enhanced graph and a reconstructed graph. The enhanced graph can represent distant relationships between nodes. The reconstructed graph is obtained by creating and combining homophilic and heterophilic graphs through an unsupervised process, in order to explore both low-frequency and high-frequency information. Using both the enhanced and reconstructed graphs in the dual-guidance supervisor module can reduce the misleading effect of a sparse graph. Meanwhile, the refined graph continuously improves dual-guidance by creating a new auxiliary graph. In dual-guidance, we include a subspace clustering module that converts attribute-based embeddings into relationship-based ones.

Our main contributions to the development of attributed graph clustering are summarized as follows:

  • We introduce an enhanced graph and a graph refine module to improve graph representation learning.

  • An unsupervised strategy is applied to reconstruct the graph structure that includes both homophilic and heterophilic information.

  • A dual-guidance supervisor module is developed to map features from extracted graphs in attribute and structure subspaces.

  • A subspace clustering approach is applied to improve the dual-guidance supervisor module and increase the robustness of the shallow GCN.

The rest of the paper is organized as follows: Sect. 2 surveys related work on graph clustering. Section 3 describes the system model. Section 4 presents the proposed methodology. Section 5 describes the experiments and discusses the results. Finally, Sect. 6 concludes the paper.

2 Related works

Several studies on graph clustering have been presented during the last decade, and we review some of them in this section. Early approaches to graph clustering primarily focused on leveraging structural properties of graphs to partition nodes into clusters (Ugander and Yin 2023; Fan 2020). However, with the proliferation of attributed graph data in various domains, researchers began exploring techniques that incorporate both structural and attribute information for more informative clustering (Du et al. 2023; Hu et al. 2024). This led to the development of attributed graph clustering techniques, which aim to learn representations that capture both the topological structure of the graph and the attributes associated with each node. Over the years, researchers have proposed a variety of methods for attributed graph clustering, including those based on low-rank representation, graph embedding, and community detection (Tsitsulin et al. 2023; Ugander and Yin 2023; Jia et al. 2024). More recently, GNNs have emerged as powerful tools for attributed graph clustering, enabling the integration of structural and attribute information through message passing and representation learning (Fan 2020; Zhou et al. 2023).

Graph clustering methodologies aim to organize nodes based on both node attributes and the topological structure of the graph. While certain techniques, such as node2vec (Grover 2016) and Graph Auto-Encoders (GAE) (Kipf 2016), offer approaches for learning embeddings suitable for traditional clustering methods, the resulting embeddings may lack specificity for capturing cluster-related features. Additionally, while S3GC (Devvrit et al. 2022) achieves promising outcomes on large-scale graphs through random walk-based sampling and a lightweight encoder, it incurs significant computational costs during training. In contrast, our model offers linear-time graph clustering capabilities with favorable performance. Embedding Graph Auto-Encoder (EGAE) (Zhang et al. 2022) introduces an optimizable soft k-means module as an additional decoder, enhancing the suitability of learned representations for clustering tasks. Moreover, recent advancements include the application of contrastive learning to attributed graph clustering. Hard Sample Aware Network (HSAN) (Liu et al. 2023) devises a dynamic sample weighting strategy to tackle the challenge of hard positive sample pairs, while Contrastive deep Clustering Graph network (CCGC) (Yang et al. 2023) leverages cluster information to enhance the quality of negative sample pairs. In (Kang 2024), the Complex Data Clustering (CDC) framework for graph clustering is introduced, which can cluster different data with linear complexity. CDC uses a graph filter to combine attribute information and geometric structure. Complexity is then reduced with high-quality anchors that are adaptively learned by a new similarity-preserving regularizer.

Sieranoja et al. (2022) proposed two methods for clustering networks and graphs. The first, referred to as the K-algorithm, is derived directly from the k-means algorithm. It employs an analogous iterative local optimization but does not require computing the means. Like k-means, it has strong local optimization ability but tends to converge to a local optimum. The second, known as the M-algorithm, finds new and better local optima by iteratively improving the output of the K-algorithm. It repeatedly splits and merges random clusters, then fine-tunes the result using the K-algorithm. Both techniques are general in that they can be used with different cost functions. Group recommendation approaches typically employ aggregation functions to represent the preferences of group members. Huang et al. (2020) proposed a Multi Attention-based Group Recommendation Model (MAGRM) that takes into account preference interactions between groups and their members as well as the social dynamics inside the groups.

Wu et al. (2023) developed Consensus learning behind interactions for group Recommendation (ConsRec). ConsRec employs diverse graph-level neural networks to obtain group behavioral representations, which are then aggregated using learnable influence mechanisms. For the purpose of learning static and dynamic group preferences in a consistent manner, Wang et al. (2021) presented a social-based multi-interactive group representation approach. In order to capture a group’s static and dynamic preferences, this approach introduced social homogeneity and social influence. Multiple user-item and group-hidden item interactions were then studied using bipartite graphs to represent the group. Furthermore, Chen et al. (2021) presented the Attentive MultiTask learning-based group Itinerary REcommendation model (AMT-IRE), which used the attention mechanism to dynamically learn the inner relationships among group members and derive consensus group preferences.

Wang et al. (2024a) introduced HyperGraph convolution networks for group Recommendation (HGRec). Specifically, the method constructs a high-order preference extraction view, represented by a hypergraph, in order to obtain better group preferences. An overlap graph represents a fixed preference extraction view, and a bipartite graph represents a traditional preference extraction view. Cross-view contrastive learning is then used to link these three views. Leng and Yu (2022) introduced a model that addresses group recommendation by using both global and local social networks. Within a global network, a user's social influence spreads through social relationships and affects the preferences of others. Within a local network, individuals within a group may contribute in varying ways to reach a final decision, resulting in a dynamic process of negotiation and consensus. The approach builds the global and local networks using two main components. Firstly, a Global Network Diffusion (GND) module based on an attentive GCN models the spread of social influence and determines the social gate of each user. Secondly, a Local Network Fusion (LNF) module based on multi-channel attention models the intricate decision-making process among group members and combines their inputs into a final representation of the group. Finally, two distinct Neural Collaborative Filtering (NCF) modules represent group-item and user-item interactions, respectively, so that the two tasks improve each other.

GNN-based methods for attributed graph clustering demonstrate the effectiveness of leveraging both structural and attribute information to learn representations that capture the inherent patterns and structures within graph data. For example, Marginalized Graph Auto-Encoder (MGAE) (Wang 2017) integrates the interaction between structure and attribute information to improve graph representation. It employs a denoising autoencoder framework, training layer by layer to capture the interplay between content and structure, leading to enhanced clustering performance. Deep Attentional Embedded Graph Clustering (DAEGC) (Wang 2019) incorporates an attention mechanism to create a graph with weighted neighbors, allowing for more effective representation learning. By attending to informative neighbors, DAEGC improves the discriminative power of the learned embeddings, leading to improved clustering performance. Multi-View Representation Graphs Learning (MVGRL) (Hassani 2020) leverages both local and global information for graph embedding by maximizing mutual information between node representations and sub-graph representations. This approach enables MVGRL to capture rich structural and attribute information, leading to more robust and informative embeddings for clustering. Automated Self-Supervised Learning (AutoSSL) (Jin 2021) employs a deep adaptive strategy to fuse multiple pretext tasks, improving the quality of graph representations. By adaptively selecting and combining pretext tasks, AutoSSL enhances the discriminative power of the learned embeddings, leading to improved clustering performance. By effectively integrating structural and attribute information, these methods offer promising solutions for various real-world applications, ranging from social network analysis to bioinformatics and beyond.

While GNN-based methods for attributed graph clustering have shown promising results, they are not without their challenges (Shi et al. 2023; He et al. 2024). These challenges include the neglect of global relationships, making them ineffective for graphs with sparse or disconnected regions; difficulty in capturing the inherent structure of graphs with dispersed connectivity patterns; vulnerability to noise in attributed graph data, leading to degraded clustering performance; limited discriminative power, especially in scenarios where nodes with different attributes are densely connected; and the lack of edge awareness, where all edges are treated equally during the message-passing process, neglecting the semantic significance of different edges (Yang et al. 2022; Peng et al. 2023). Overcoming these challenges requires the development of novel GNN-based methods that can effectively capture global relationships, mitigate the impact of noise, enhance discriminative power, and incorporate edge awareness, thereby improving the accuracy and robustness of attributed graph clustering in real-world applications (Wu et al. 2023).

3 System model

Let the input graph be denoted by \(\:G=(V,A,W,X)\), where \(\:V\) is the set of nodes, \(\:A\) is the topology structure of the graph, \(\:W\) is the similarity matrix, and \(\:X\) is the attribute matrix. In \(\:G\), there is a set of \(\:n\) nodes defined as \(\:V=\{{v}_{1},{v}_{2},…,{v}_{n}\}\). Also, \(\:G\) contains the adjacency matrix \(\:A={\left({a}_{i,j}\right)}_{n\times\:n}\) where \(\:{a}_{i,j}=1\) indicates that \(\:{v}_{i}\) and \(\:{v}_{j}\) are adjacent, and otherwise \(\:{a}_{i,j}=0\). The similarity matrix \(\:W=COS(X,{X}^{T})\) is calculated based on the cosine similarity of node attributes, where \(\:{w}_{i,j}\) expresses the similarity between \(\:{v}_{i}\) and \(\:{v}_{j}\). Since \(\:W\) is computed directly from \(\:X\), it inherits the noise present in the attributes. Hence, we sparsify the similarity matrix by keeping only the \(\:k\) largest values of each row, as in Eq. (1). We set \(\:k=100\) for all datasets empirically. The filtered matrix is then row-normalized, as in Eq. (2), and the resulting normalized similarity matrix \(\:\widehat{W}\) is used for the graph clustering task.

$$\:{w}_{i,j}^{k}=\left\{\begin{array}{ll}{w}_{i,j}&\text{if }{w}_{i,j}\ge {w}_{i,k}^{r}\\ 0&\text{otherwise}\end{array}\right.$$
(1)
$$\:{\widehat{w}}_{i,j}=\frac{{w}_{i,j}^{k}}{\sum\:_{l=1}^{n}{w}_{i,l}^{k}}$$
(2)

where \(\:{w}_{i}^{r}=[{w}_{i,1}^{r},{w}_{i,2}^{r},\dots\:,{w}_{i,n}^{r}]\) is the \(\:i\)-th row of the matrix \(\:W\) sorted in descending order, so that \(\:{w}_{i,k}^{r}\) is the \(\:k\)-th largest value in that row.
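The construction in Eqs. (1) and (2) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name and the toy attribute matrix are assumptions, the attributes are assumed non-negative so that all similarities are non-negative, and ties at the threshold are simply kept.

```python
import numpy as np


def topk_normalized_similarity(X: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity, top-k filtering per row (Eq. 1), row normalization (Eq. 2)."""
    # Cosine similarity between every pair of attribute vectors.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against all-zero rows
    W = Xn @ Xn.T

    # The threshold of row i is its k-th largest value, w^r_{i,k}.
    k = min(k, W.shape[1])
    thresholds = np.sort(W, axis=1)[:, -k][:, None]
    W_k = np.where(W >= thresholds, W, 0.0)

    # Row-normalize; rows that sum to zero are left as zeros.
    row_sums = W_k.sum(axis=1, keepdims=True)
    return np.divide(W_k, row_sums, out=np.zeros_like(W_k), where=row_sums != 0)


# Hypothetical toy attributes: two pairs of similar nodes.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
W_hat = topk_normalized_similarity(X, k=2)
```

With \(k=2\), each row keeps the node itself and its most similar neighbor, and every row of the result sums to 1.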

The attributes of node \(\:{v}_{i}\) in \(\:G\) are represented by the vector \(\:{x}_{i}\), where \(\:X=\{{x}_{1},{x}_{2},…,{x}_{n}{\}}^{T}\) represents the attribute matrix. The degree matrix is defined by \(\:D=\text{diag}({d}_{1},{d}_{2},…,{d}_{n})\), where \(\:{d}_{i}={\sum\:}_{j=1}^{n}{a}_{i,j}\). Furthermore, let the normalized adjacency matrix be \(\:\widehat{A}={D}^{-1}(I+A)\) and the graph Laplacian be \(\:L=I-\widehat{A}\), where \(\:I\) is the identity matrix. Finally, let \(\:\widehat{1}\) denote a matrix with all elements equal to 1.
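As a small illustration of these definitions, the following sketch computes \(\widehat{A}=D^{-1}(I+A)\) and \(L=I-\widehat{A}\) for a toy graph. The function name and the treatment of isolated nodes (inverse degree set to 0) are assumptions made for the example.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Return D^{-1}(I + A) with d_i = sum_j a_ij, following the definition in the text."""
    degrees = A.sum(axis=1)
    # Assumption: isolated nodes (d_i = 0) get an inverse degree of 0.
    inv_deg = np.divide(1.0, degrees, out=np.zeros_like(degrees, dtype=float), where=degrees > 0)
    return inv_deg[:, None] * (np.eye(A.shape[0]) + A)


# Hypothetical star graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
A_hat = normalized_adjacency(A)
L = np.eye(3) - A_hat
```

Because \(d_i\) is computed from \(A\) without the added self-loop, row \(i\) of \(\widehat{A}\) sums to \((d_i+1)/d_i\) rather than 1 under this definition.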

4 Methodology

Most existing GCN-based graph clustering methods share several limitations: they ignore global relations, cannot deal with sparse graphs, consider two nodes connected only when an explicit edge exists between them, fail to consider the attribute and structure spaces simultaneously, are vulnerable to noisy graphs, and produce representations with limited generalizability. To address these issues, we propose the Attribute Graph Clustering approach based on an Enhanced graph and a Reconstructed graph (AGCER). Figure 1 shows the overview of the proposed methodology. Also, the pseudo-code of AGCER is shown in Algorithm 1.

Fig. 1 An overview of the proposed methodology for graph clustering

AGCER addresses key challenges encountered by existing methods through a multi-phase process. It leverages an enhanced graph module to capture distant relationships, a graph reconstruction phase to integrate both homophilic and heterophilic information, and a dual-guidance supervisor module to perform graph clustering. Also, a refined graph in AGCER is designed to improve graph representation learning with the help of the dual-guidance supervisor module. Additionally, AGCER incorporates a subspace clustering module to enhance the robustness of the shallow GCN. Through these components, AGCER demonstrates promising performance in handling sparse graphs, mitigating noise vulnerability, and improving clustering accuracy, showcasing its potential for diverse real-world applications.

Algorithm 1. Attribute graph clustering based on enhanced graph and reconstructed graph

Input: Graph \(\:G=(V,A,W,X)\), cluster number \(\:K\), refine interval \(\:\tau\:\), iteration number \(\:T\), \(\:\alpha\:\), \(\:\beta\:\), \(\:\gamma\:\), \(\:\delta\:\), \(\:\omega\:\), \(\:\xi\:\).

Output: Matrix \(\:C\) as clusters.

Establish \(\:\widehat{W}\) using Eqs. (1) and (2).
Establish \(\:\widehat{A}\) by \(\:{D}^{-1}(I+A)\).
// Enhanced graph module
Build enhanced graph \(\:E\) by Eq. (3).
// Reconstructed graph
Extract the homophilic graph \(\:S\) using Eqs. (5) to (7).
Extract the heterophilic graph \(\:H\) using Eq. (8).
Establish \(\:\widehat{S}\) by normalizing \(\:S\).
Establish \(\:\widehat{H}\) by normalizing \(\:H\).
Build reconstructed graph \(\:R\) by Eq. (9).
for \(\:t\) = 1 to \(\:T\) do
    // Refined graph
    Build refined graph \(\:\mathcal{R}\mathcal{F}\) by Eq. (12).
    // Dual-guidance supervisor module
    Create the embeddings \(\:{Z}_{E}\) using Eq. (13) and \(\:{Z}_{R}\) using Eq. (14).
    Establish reconstructed graph \(\:{A}_{\mathcal{R}}\) with \(\:{Z}_{E}\) and \(\:{Z}_{R}\) using Eq. (15).
    Calculate the loss associated with the reconstructed graph by Eq. (16).
    Calculate the loss associated with the refined graph by Eq. (18).
    Calculate the loss function to minimize for the dual-guidance module by Eq. (19).
    Obtain the \(\:C\) matrix by a subspace clustering module.
    Calculate the loss associated with self-expressive learning by Eq. (17).
    if \(\:t\) mod \(\:\tau\:\) = 0 then
        Build affinity graph \(\:{\widehat{A}}_{C}\) with \(\:C\) by Eqs. (10) and (11).
        Renew refined graph \(\:\mathcal{R}\mathcal{F}\) using Eq. (12).
    end
    Update the final loss function by Eq. (20).
end
Return matrix \(\:C\).

4.1 Enhanced graph

The representation of the graph \(\:G\) by the normalized adjacency matrix \(\:\widehat{A}\) focuses only on local relationships. Therefore, the propagation phase to identify neighbors using \(\:\widehat{A}\) only includes 1-hop neighbors. To improve the representation, we construct an enhanced graph based on the filtered and normalized similarity matrix \(\:\widehat{W}\). Figure 2 shows the architecture of enhanced graph. Here, the enhanced graph \(\:E\) is created through the weighted combination of \(\:\widehat{A}\) and \(\:\widehat{W}\):

$$\:E=\alpha\:.\widehat{W}+\left(1-\alpha\:\right).\widehat{A}$$
(3)

where \(\:\alpha\:\) is an adjustable coefficient to account for the effect of \(\:\widehat{W}\) on \(\:\widehat{A}\). According to this definition, \(\:E\) considers global relationships and can provide a wider neighborhood range.
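A minimal sketch of Eq. (3) shows how the similarity term adds weight between nodes that are not adjacent. The function name, the range check on \(\alpha\), and the toy matrices are assumptions chosen for illustration; they are not derived from a real dataset.

```python
import numpy as np


def enhanced_graph(W_hat: np.ndarray, A_hat: np.ndarray, alpha: float) -> np.ndarray:
    """Weighted combination of Eq. (3): E = alpha * W_hat + (1 - alpha) * A_hat."""
    # Assumption: alpha is restricted to [0, 1] so that E is a convex combination.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * W_hat + (1.0 - alpha) * A_hat


# Hypothetical path graph 0 - 1 - 2: nodes 0 and 2 are not adjacent,
# but their attributes are assumed to be similar.
A_hat = np.array([[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]])
W_hat = np.array([[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]])
E = enhanced_graph(W_hat, A_hat, alpha=0.4)
```

In this toy case the entry \(E_{0,2}\) is positive although \(\widehat{A}_{0,2}=0\), which is the sense in which \(E\) widens the neighborhood beyond 1-hop neighbors.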

Fig. 2 Architecture of enhanced graph. The enhanced graph \(\:E\) is created through the weighted combination of the normalized adjacency matrix \(\:\widehat{A}\) and the normalized similarity matrix \(\:\widehat{W}\)

4.2 Reconstructed graph

Since it is not possible to determine whether a graph is homophilic or heterophilic in an unsupervised problem, we present a holistic model that includes a reconstructed graph. We create the reconstructed graph based on the homophilic and heterophilic graphs that are extracted from the original graph. Pan and Kang (2023) used an optimization approach based on minimizing the distance between adjacent nodes for homophilic extraction, which is defined as follows:

$$\:\underset{{S}_{i}}{\text{min}}\sum\limits_{j=1}^{n}{s}_{i,j}.{k}_{i,j}+{s}_{i,j}^{2}$$
(4)

where \(\:{S}_{i}\) is the \(\:i\)-th row of matrix \(\:S\) initialized with \(\:\widehat{A}\) and \(\:{k}_{i,j}\) is the Euclidean distance between nodes \(\:{v}_{i}\) and \(\:{v}_{j}\) based on their corresponding attribute vector.

In general, the similarity between two nodes influences the homophily ratio and thus the message propagation path. Two cases are relevant: (1) the highest similarity is observed for pairs of nodes that are present in both the 1-hop and 2-hop neighbor sets; (2) the lowest similarity is observed for pairs of nodes that are present in the set of 2-hop neighbors but not in the set of 1-hop neighbors. Hence, Pan and Kang (2023) developed a mechanism that merges 1-hop and 2-hop neighborhood relationships to improve the homophily ratio. Here, taking \(\:{S}^{\left(2\right)}=S\times\:S\) as the 2-hop graph, the following optimization problem is defined to extract a homophilic graph:

$$\:\begin{array}{c}\underset{{s}_{i,j}}{\text{min}}{s}_{i,j}.{k}_{i,j}+{s}_{i,j}^{2}+{\left({s}_{i,j}^{\left(2\right)}-{s}_{i,j}\right)}^{2},\\\:\text{s}.\text{t}.\:\:\:{s}_{i,j}>0,\:\:\sum\limits_{j=1}^{n}{s}_{i,j}=1\end{array}$$
(5)

This optimization problem has been solved by Pan and Kang (2023) based on Lagrangian and assuming the removal of self-loops from the graph as follows:

$$\:{s}_{i,j}=\text{max}\left(\left[\frac{2{s}_{i,j}^{\left(2\right)}+{\lambda\:}_{j}-{k}_{i,j}-2\sum\limits_{l\ne\:j}{s}_{j,l}.{C}_{l}}{2(2+\sum\limits_{l\ne\:j}{s}_{j,l}^{2})}\right],0\right)$$
(6)

where \(\:{C}_{l}\) is equal to \(\:{s}_{i,l}^{\left(2\right)}-{s}_{i,j}.{s}_{j,l}-{s}_{i,l}\) if \(\:i\ne\:l\), and otherwise \(\:{C}_{l}=0\). Also, \(\:{\lambda\:}_{j}\) is defined as follows and can be solved using a gradient descent algorithm.

$$\:\underset{{\lambda\:}_{j}}{\text{min}}\sum\:_{j=1}^{n}\text{max}\left(\left[\frac{2{s}_{i,j}^{\left(2\right)}+{\lambda\:}_{j}-{k}_{i,j}-2\sum\:_{l\ne\:j}{s}_{j,l}.{C}_{l}}{2(2+\sum\:_{l\ne\:j}{s}_{j,l}^{2})}\right],0\right)-\widehat{1}$$
(7)

In addition, to derive a heterophilic graph, we define all nodes that are far from each other in both topology and attribute spaces as negative pairs. Therefore, the heterophilic graph based on \(\:\widehat{A}\) and \(\:\widehat{W}\) is created as follows:

$$\:H=\left(\widehat{1}-\widehat{W}\right)\odot\:\left(\widehat{1}-\widehat{A}\right)$$
(8)

where \(\:\odot\:\) denotes the Hadamard product, so that \(\:H\) describes non-adjacent relationships in both the topology and attribute spaces. Because \(\:H\) may be dense, only 5 edges are kept for each node, corresponding to its 5 most dissimilar nodes (Pan and Kang 2023).
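Eq. (8) and the subsequent sparsification can be sketched as follows. This is an illustrative reading, not the authors' code: the function name is hypothetical, the diagonal is zeroed on the assumption that a node should not be paired with itself, and the random input matrices merely stand in for \(\widehat{W}\) and \(\widehat{A}\).

```python
import numpy as np


def heterophilic_graph(W_hat: np.ndarray, A_hat: np.ndarray, top: int = 5) -> np.ndarray:
    """Eq. (8), followed by keeping the `top` largest entries in every row."""
    H = (1.0 - W_hat) * (1.0 - A_hat)  # element-wise (Hadamard) product
    np.fill_diagonal(H, 0.0)           # assumption: a node is not its own negative pair
    top = min(top, H.shape[1])
    # Column indices of the `top` largest entries in each row.
    idx = np.argpartition(H, -top, axis=1)[:, -top:]
    mask = np.zeros(H.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, H, 0.0)


rng = np.random.default_rng(0)
W_hat = rng.random((8, 8))        # hypothetical similarity values in [0, 1)
A_hat = 0.5 * rng.random((8, 8))  # hypothetical normalized adjacency values
H = heterophilic_graph(W_hat, A_hat)
```

Each row of the result keeps exactly five positive entries here, because every off-diagonal product is strictly positive for these inputs.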

Let \(\:\widehat{S}\) and \(\:\widehat{H}\) be the normalized matrices of \(\:S\) and \(\:H\), respectively. Considering the graphs \(\:\widehat{S}\) and \(\:\widehat{H}\), the reconstructed graph \(\:R\) is defined. Figure 3 shows the architecture of reconstructed graph. Graph \(\:R\) is defined as a combination filter of low-pass and high-pass representations as follows:

$$\:R=\beta\:.{\left(\frac{\widehat{H}}{2}\right)}^{k}.X+\left(1-\beta\:\right).(I-\frac{\widehat{S}}{2}{)}^{k}.X$$
(9)

where \(\:\beta\:\) is an adjustable coefficient that balances the high-pass (heterophilic) term against the low-pass (homophilic) term.
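Read literally, Eq. (9) applies two matrix-power filters to the attribute matrix and mixes the results. The sketch below follows that reading under stated assumptions: the function name is hypothetical, \(k\) is taken to be a non-negative integer filter order, and the random inputs are placeholders for \(\widehat{S}\), \(\widehat{H}\), and \(X\).

```python
import numpy as np


def reconstructed_graph(S_hat: np.ndarray, H_hat: np.ndarray, X: np.ndarray,
                        beta: float, k: int) -> np.ndarray:
    """Eq. (9): beta * (H_hat / 2)^k X + (1 - beta) * (I - S_hat / 2)^k X."""
    n = S_hat.shape[0]
    high_pass = np.linalg.matrix_power(H_hat / 2.0, k) @ X
    low_pass = np.linalg.matrix_power(np.eye(n) - S_hat / 2.0, k) @ X
    return beta * high_pass + (1.0 - beta) * low_pass


rng = np.random.default_rng(0)
S_hat = rng.random((4, 4))  # placeholder for the normalized homophilic graph
H_hat = rng.random((4, 4))  # placeholder for the normalized heterophilic graph
X = rng.random((4, 3))      # placeholder attribute matrix
R = reconstructed_graph(S_hat, H_hat, X, beta=0.3, k=2)
```

Under this reading, \(R\) has the same shape as \(X\), and setting \(k=0\) returns \(X\) unchanged for any \(\beta\).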

Fig. 3 Architecture of reconstructed graph. The reconstructed graph \(\:R\) is created through the weighted combination of the normalized homophilic graph \(\:\widehat{S}\) and the normalized heterophilic graph \(\:\widehat{H}\). Graph \(\:R\) is defined as a combination filter of low-pass and high-pass representations

4.3 Refined graph

Let the self-expression coefficient matrix \(\:C\) obtained from the subspace clustering module be available when running the dual-guidance module; its details are explained in the next subsection. Although \(\:\widehat{W}\) provides an acceptable level of connectivity between nodes, \(\:C\) captures the overall relationships between distant nodes more effectively. Meanwhile, \(\:C\) improves the quality of dual-guidance compared to \(\:\widehat{A}\). Therefore, we create a refined graph to improve the performance of the dual-guidance module. Let \(\:\stackrel{\sim}{W}=COS(C,{C}^{T})\) be the updated similarity matrix computed from \(\:C\). As before, only the \(\:k\) largest values of each row are kept, where \(\:k\) is set to 100. Accordingly, we periodically construct the graph \(\:{\widehat{A}}_{C}\) as follows:

$$\:{\stackrel{\sim}{w}}_{i,j}^{k}=\left\{\begin{array}{ll}{\stackrel{\sim}{w}}_{i,j}&\text{if }{\stackrel{\sim}{w}}_{i,j}\ge {\stackrel{\sim}{w}}_{i,k}^{r}\\ 0&\text{otherwise}\end{array}\right.$$
(10)
$$\:{\left({\widehat{A}}_{C}\right)}_{i,j}=\frac{{\stackrel{\sim}{w}}_{i,j}^{k}}{\sum\:_{l=1}^{n}{\stackrel{\sim}{w}}_{i,l}^{k}}$$
(11)

Figure 4 shows the architecture of the refined graph. Finally, the refined graph \(\:\mathcal{R}\mathcal{F}\) is created from the \(\:\widehat{W}\) and \(\:{\widehat{A}}_{C}\) matrices in each predefined iteration as follows:

$$\:\mathcal{R}\mathcal{F}=\gamma\:.{\widehat{A}}_{C}+\left(1-\gamma\:\right).\widehat{W}$$
(12)

where \(\:\gamma\:\) is an adjustable coefficient to account for the effect of \(\:{\widehat{A}}_{C}\) on \(\:\widehat{W}\).
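The refinement steps in Eqs. (10)–(12) can be sketched as follows. This is only an illustration under stated assumptions: \(COS(C,{C}^{T})\) is read here as the cosine similarity between rows of \(C\), ties at the \(k\)-th largest value are all kept, and the helper names are hypothetical.

```python
import numpy as np

def cosine_similarity_rows(C):
    """Cosine similarity between every pair of rows of C (assumed reading of COS)."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    C_unit = C / np.maximum(norms, 1e-12)   # guard against all-zero rows
    return C_unit @ C_unit.T

def top_k_per_row(W, k):
    """Eq. (10): keep entries >= the k-th largest value of their row, zero the rest."""
    k = min(k, W.shape[1])
    kth_largest = np.sort(W, axis=1)[:, -k][:, None]
    return np.where(W >= kth_largest, W, 0.0)

def row_normalize(W):
    """Eq. (11): divide every row by its sum (rows summing to zero are left as-is)."""
    sums = W.sum(axis=1, keepdims=True)
    return W / np.where(sums == 0, 1.0, sums)

def refined_graph(C, W_hat, gamma=0.5, k=100):
    """Eq. (12): RF = gamma * A_C + (1 - gamma) * W_hat."""
    A_C = row_normalize(top_k_per_row(cosine_similarity_rows(C), k))
    return gamma * A_C + (1.0 - gamma) * W_hat
```

With a non-negative \(C\), every row of \({\widehat{A}}_{C}\) sums to one, so the combination in Eq. (12) stays row-stochastic whenever \(\widehat{W}\) is.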

Fig. 4

Architecture of refined graph. The refined graph \(\:\mathcal{R}\mathcal{F}\) is created through the weighted combination of the similarity graph \(\:{\widehat{A}}_{C}\) with \(\:k\) largest values and the normalized similarity matrix \(\:\widehat{W}\)

4.4 Dual-guidance supervisor module

In this subsection, the dual-guidance supervisor module is applied to perform graph clustering. This module includes a GCN with two unshared encoders and a decoder. Here, dual guidance is implemented based on the Vanilla Graph Auto-Encoder (VGAE) (Yang et al. 2024). The encoder in VGAE includes a 2-layer GCN, while the decoder is designed based on an inner-product operation. It is worth noting that all GCNs in VGAE are configured based on a Multi-Layer Perceptron (MLP) neural network (Talatian Azad et al. 2022; Huang et al. 2024). We train the MLP encoder with the Adam optimizer. With its two encoders, VGAE performs the mapping in two separate spaces, attribute and structure, which reduces the interaction between attribute and structure in heterophilic graphs. Let the encoder \(\:{\mathbb{E}}_{{\theta\:}_{E}}\) perform the mapping in the attribute space based on the enhanced graph \(\:E\). Also, let \(\:{\mathbb{E}}_{{\theta\:}_{R}}\) perform the mapping in the structure space based on the reconstructed graph \(\:R\). During training of both \(\:{\mathbb{E}}_{{\theta\:}_{E}}\) and \(\:{\mathbb{E}}_{{\theta\:}_{R}}\), graph \(\:\widehat{A}\) is used as a supervisor, while graphs \(\:E\) and \(\:R\) are used for aggregation. VGAE merges the attributes \(\:X\) with the \(\:E\) and \(\:R\) graphs to produce the following representations:

$$\:Z_{E}^{{l + 1}} = {\mathbb{E}}_{{\theta \:_{E} }} \left\langle {\Delta \:\left( {E.Z_{E}^{l} .\psi \:} \right)} \right\rangle$$
(13)
$$\:Z_{R}^{{l + 1}} = {\mathbb{E}}_{{\theta \:_{R} }} \left\langle {\Delta \:\left( {R.Z_{R}^{l} .\psi \:} \right)} \right\rangle$$
(14)

where \(\:l\) is the layer index, \(\:\psi\:\) denotes the encoder parameters, \(\:\varDelta\:\) is an activation function, and \(\:Z\) is the representation in a layer, initialized with \(\:X\). Here, \(\:{Z}_{E}\) and \(\:{Z}_{R}\) are the outputs of the encoders \(\:{\mathbb{E}}_{{\theta\:}_{E}}\) and \(\:{\mathbb{E}}_{{\theta\:}_{R}}\), respectively.

The decoder can provide reconstructed attributes. After encoding, VGAE applies an inner-product operation as a decoder to produce the reconstructed graph \(\:{A}_{\mathcal{R}}\).

$$\:{A}_{\mathcal{R}}=\sigma\:\left({Z}_{E}.{{Z}_{E}}^{T}+{Z}_{R}.{{Z}_{R}}^{T}\right)$$
(15)

We optimize the error between the reconstructed graph \(\:{A}_{\mathcal{R}}\) and the given graph \(\:\widehat{A}\), where \(\:{\mathcal{L}}_{\mathcal{G}}\) represents the reconstruction objective in the training process.

$$\:{\mathcal{L}}_{\mathcal{G}}=\frac{1}{2n}\sum\:{\left(\widehat{A}-{A}_{\mathcal{R}}\right)}^{2}$$
(16)
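To make Eqs. (13)–(16) concrete, the following forward-pass sketch propagates the attributes through two unshared encoders, decodes with an inner product, and evaluates \({\mathcal{L}}_{\mathcal{G}}\). It is a simplified illustration rather than the authors' implementation: \(E\) and \(R\) are assumed to be \(n\times n\) propagation matrices, the activation is taken to be Leaky ReLU with Xavier-uniform weights (as stated in the experimental setup), no training step is shown, and all function names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xavier_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def encode(G, X, weights):
    """Eqs. (13)-(14): Z^{l+1} = activation(G @ Z^l @ psi_l), with Z^0 = X."""
    Z = X
    for psi in weights:
        Z = leaky_relu(G @ Z @ psi)
    return Z

def decode(Z_E, Z_R):
    """Eq. (15): A_R = sigmoid(Z_E Z_E^T + Z_R Z_R^T)."""
    return sigmoid(Z_E @ Z_E.T + Z_R @ Z_R.T)

def reconstruction_loss(A_hat, A_rec):
    """Eq. (16): squared error between A_hat and A_R, scaled by 1/(2n)."""
    n = A_hat.shape[0]
    return np.sum((A_hat - A_rec) ** 2) / (2.0 * n)

# Toy example with small layer sizes (the paper uses <input>-512-16).
rng = np.random.default_rng(0)
n, d, hidden, out = 6, 4, 8, 3
X = rng.random((n, d))
E_graph = np.full((n, n), 1.0 / n)   # placeholder propagation matrices
R_graph = np.eye(n)
A_hat = np.eye(n)
weights_E = [xavier_uniform(d, hidden, rng), xavier_uniform(hidden, out, rng)]
weights_R = [xavier_uniform(d, hidden, rng), xavier_uniform(hidden, out, rng)]
Z_E = encode(E_graph, X, weights_E)
Z_R = encode(R_graph, X, weights_R)
A_rec = decode(Z_E, Z_R)
loss_G = reconstruction_loss(A_hat, A_rec)
```

Keeping the two weight lists separate mirrors the unshared-encoder design: each encoder sees the same attributes but aggregates over a different graph.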

To obtain clustering results in AGCER, a subspace clustering module based on self-expressive learning is applied. The subspace clustering used in this study follows (Ji 2014). This module is attached to the encoder output and can improve the robustness of graph embedding learning. Clustering is done based on a spectral clustering algorithm. Let \(\:C\) be defined as the self-expression coefficient matrix for spectral clustering. In matrix \(\:C\), all elements are empirically initialized with \(10^{-8}\). The result of spectral clustering is obtained by applying it to a final affinity graph, which is built as \(\:\left|C\right|+\left|{C}^{T}\right|\). The self-expressive objective is as follows:

$$\:\begin{array}{c}{\mathcal{L}}_{\mathcal{C}}=\delta\:\sum\:{\left({Z}_{E}-C{Z}_{E}\right)}^{2}+\left(1-\delta\:\right)\sum\:{\left({Z}_{R}-C{Z}_{R}\right)}^{2}+\sum\:{C}^{2},\\\:\text{s}.\text{t}.\:\:\:Diag\left(C\right)=0\end{array}$$
(17)

where \(\:\delta\:\) is an adjustable coefficient to account for the effect of \(\:{Z}_{E}\) on \(\:{Z}_{R}\), and \(\:Diag\left(C\right)\) is a vector of diagonal elements in \(\:C\).
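A minimal sketch of the self-expressive objective in Eq. (17) and of the affinity graph \(\left|C\right|+\left|{C}^{T}\right|\) is given below. The constraint \(Diag(C)=0\) is enforced here by simply zeroing the diagonal before evaluating the loss, which is one possible choice and an assumption of this sketch; the function names are hypothetical.

```python
import numpy as np

def self_expressive_loss(C, Z_E, Z_R, delta=0.5):
    """Eq. (17) with the diagonal of C forced to zero."""
    C = C - np.diag(np.diag(C))              # enforce Diag(C) = 0
    term_E = np.sum((Z_E - C @ Z_E) ** 2)    # attribute-space residual
    term_R = np.sum((Z_R - C @ Z_R) ** 2)    # structure-space residual
    penalty = np.sum(C ** 2)                 # regularizer on C
    return delta * term_E + (1.0 - delta) * term_R + penalty

def affinity_graph(C):
    """Symmetric affinity |C| + |C^T| passed to spectral clustering."""
    return np.abs(C) + np.abs(C.T)

# Toy example: C starts at the constant 1e-8, as described in the text.
rng = np.random.default_rng(0)
n, dim = 5, 3
Z_E = rng.random((n, dim))
Z_R = rng.random((n, dim))
C0 = np.full((n, n), 1e-8)
loss_C = self_expressive_loss(C0, Z_E, Z_R, delta=0.6)
affinity = affinity_graph(C0)
```

The resulting affinity matrix is symmetric and non-negative, which is what a standard spectral clustering routine expects as a precomputed affinity.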

To improve the supervisor quality, we create the \(\:\mathcal{R}\mathcal{F}\) graph for the dual-guidance configuration. In addition to minimizing \(\:{\mathcal{L}}_{\mathcal{G}}\), we also force \(\:{A}_{\mathcal{R}}\) to approximate the \(\:\mathcal{R}\mathcal{F}\) graph during training.

$$\:{\mathcal{L}}_{\mathcal{R}}=\frac{1}{2n}\sum\:{\left(\mathcal{R}\mathcal{F}-{A}_{\mathcal{R}}\right)}^{2}$$
(18)

Considering the two loss functions \(\:{\mathcal{L}}_{\mathcal{G}}\) and \(\:{\mathcal{L}}_{\mathcal{R}}\), the reconstruction objective of the dual-guidance module is minimized jointly:

$$\:{\mathcal{L}}_{rec}={\mathcal{L}}_{\mathcal{G}}+\omega\:.{\mathcal{L}}_{\mathcal{R}}=\frac{1}{2n}\sum\:{\left(\widehat{A}-{A}_{\mathcal{R}}\right)}^{2}+\frac{\omega\:}{2n}\sum\:{\left(\mathcal{R}\mathcal{F}-{A}_{\mathcal{R}}\right)}^{2}$$
(19)

where \(\:\omega\:<1\) is an adjustable coefficient that weights \(\:{\mathcal{L}}_{\mathcal{R}}\) relative to \(\:{\mathcal{L}}_{\mathcal{G}}\).

Briefly, the dual-guidance supervisor module seeks to optimize a fusion between \(\:{\mathcal{L}}_{rec}\) and \(\:{\mathcal{L}}_{\mathcal{C}}\). This optimization function for training should be minimized to obtain the clustering result for each node.

$$\:{\mathcal{L}}_{total}=\xi\:.{\mathcal{L}}_{rec}+\left(1-\xi\:\right).{\mathcal{L}}_{\mathcal{C}}$$
(20)

where \(\:\xi\:\) is a balance factor in total loss between \(\:{\mathcal{L}}_{rec}\) and \(\:{\mathcal{L}}_{\mathcal{C}}\). We set \(\:\xi\:=0.5\) for all datasets empirically.
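The way the individual terms combine in Eqs. (18)–(20) can be sketched as below. The self-expressive loss is passed in as a precomputed scalar, and the function names and toy matrices are hypothetical; the default \(\xi =0.5\) follows the text, while the default for \(\omega\) is only a placeholder.

```python
import numpy as np

def squared_error(target, A_rec):
    """Shared form of Eqs. (16) and (18): sum of squared differences over 2n."""
    n = target.shape[0]
    return np.sum((target - A_rec) ** 2) / (2.0 * n)

def total_loss(A_hat, RF, A_rec, loss_C, omega=0.5, xi=0.5):
    """Eq. (19): L_rec = L_G + omega * L_R; Eq. (20): xi * L_rec + (1 - xi) * L_C."""
    loss_G = squared_error(A_hat, A_rec)   # supervision by the given graph
    loss_R = squared_error(RF, A_rec)      # supervision by the refined graph
    loss_rec = loss_G + omega * loss_R
    return xi * loss_rec + (1.0 - xi) * loss_C

# Toy check with 2 x 2 matrices.
A_hat = np.eye(2)
RF = np.zeros((2, 2))
A_rec = np.zeros((2, 2))
value = total_loss(A_hat, RF, A_rec, loss_C=1.0, omega=0.5, xi=0.5)
```

In this toy case \({\mathcal{L}}_{\mathcal{G}}=2/4=0.5\) and \({\mathcal{L}}_{\mathcal{R}}=0\), so the total is \(0.5\cdot 0.5+0.5\cdot 1.0=0.75\).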

4.5 Computational complexity

The computational complexity of our model can be analyzed by considering the primary phases involved in the process. In the graph enhancement phase, the model constructs an auxiliary graph by processing the adjacency and feature matrices, which involves operations with a time complexity of \(\:O\left({n}^{2}\right)\), where \(\:n\) is the number of nodes. Next, in the graph reconstruction phase, the model generates and integrates homophilic and heterophilic graphs. This phase also involves matrix operations, including matrix multiplication and combination, leading to a time complexity of \(\:O\left({n}^{2}\right)\) when using dense matrices, though this can be reduced with sparsity optimizations. In the graph refinement phase, the model iteratively improves the auxiliary graph, which includes both feature extraction and subspace clustering. The complexity here depends on the number of iterations \(\:T\) and the dimensionality of the feature space \(\:d\), resulting in a complexity of \(\:O(T.n.{d}^{2})\). Finally, the dual-guidance supervisor module combines the enhanced and reconstructed graphs, which involves mapping features to subspaces and clustering them. The complexity of this step is dominated by the clustering process, which can vary but generally falls within \(\:O\left({n}^{2}\right)\) to \(\:O\left(n\text{log}{n}^{2}\right)\), depending on the clustering algorithm used. Given the iterations and the operations within each step, the total computational complexity of our model is influenced by both the number of nodes and the number of iterations, making it scalable but computationally intensive for very large graphs.

5 Experiments

In this section, a series of experiments is presented to evaluate AGCER. We show that the proposed model performs better than existing baseline and state-of-the-art methods. In the rest of this section, we introduce the datasets used, discuss the experimental setup, describe the evaluation metrics, present the benchmark approaches for comparison, and then analyze some components of the proposed model. Finally, the results of the experiments for homophilic and heterophilic graphs are reported.

5.1 Datasets

For the homophilic graph, we conducted graph clustering experiments on four widely used network datasets: CORA, ACM, AMAP, and DBLP. The attributes in the CORA and ACM datasets are represented as binary word vectors, while in the AMAP and DBLP datasets, nodes are linked to tf-idf weighted word vectors (Ji 2014). For the heterophilic graph, four datasets are chosen: Cornell, Washington, Squirrel, and Roman-Empire. Cornell and Washington are web graphs collected by Carnegie Mellon University, extracted from the computer science departments of several universities (Sun et al. 2024). Squirrel contains page-page networks on specific topics, sourced from Wikipedia. Roman-Empire is derived from the Roman Empire article on Wikipedia. In this study, the homophily ratio is computed according to (Wang et al. 2024a). Table 1 provides statistical information for the four datasets associated with homophilic graphs as well as the four datasets associated with heterophilic graphs.

Table 1 Details of the dataset

5.2 Experimental setup

All experiments were conducted on a desktop computer equipped with an Intel® Core™ i7 3.30 GHz CPU, an NVIDIA GeForce RTX 3050 Ti GPU (graphics processing unit), 16 GB DDR4 RAM, and running the Windows 11 Pro operating system. The GCN used for the implementation consists of a two-layer encoder with layer dimensions <input dimension>-<512>-<16> (Zhou et al. 2021; Guo et al. 2024). The activation function \(\:{\Delta\:}\) in both encoders is LeakyReLU, and the initial weights are assigned based on Xavier uniform initialization (Yang 2023; Xia et al. 2019). The settings of all methods used for comparison follow their respective papers (Cui 2020; Zhu et al. 2024). For reliability, we report the average results over 10 runs. Our model includes several hyperparameters: number of epochs, interval rate, dropout rate, learning rate, \(\:\alpha\:\), \(\:\beta\:\), \(\:\gamma\:\), \(\:\delta\:\), and \(\:\omega\:\). The values of these hyperparameters, set experimentally for the different datasets, are given in Table 2.

Table 2 Details on hyper-parameters

Meanwhile, the dual-guidance supervisor module seeks to optimize a fusion between \(\:{\mathcal{L}}_{rec}\) and \(\:{\mathcal{L}}_{\mathcal{C}}\). Here, \(\:\xi\:\) is a balance factor in the total loss between \(\:{\mathcal{L}}_{rec}\) and \(\:{\mathcal{L}}_{\mathcal{C}}\). A sensitivity analysis of the balance factor is crucial for understanding its influence on the performance and robustness of our model. This analysis shows the stability of our model and identifies the best setting for the balance factor. According to the results in Fig. 5, the optimal value for \(\:\xi\:\) is 0.5.

Fig. 5

Analysis of the optimal value for \(\:\xi\:\)

5.3 Evaluation metrics

We adopt ACCuracy (ACC), Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and runtime as evaluation metrics in all experiments (Wu et al. 2024; Wei et al. 2024). ACC, ARI, and NMI are known as external measures and are used when true labels are available. Clustering evaluation metrics assess the quality of clustering results. ACC measures the proportion of correctly assigned data points (Li et al. 2022; Wang et al. 2024b). Because cluster labels produced by unsupervised learning are arbitrary, the objective is to identify the label permutation that maximizes the alignment between predicted and true labels, so that clustering accuracy is evaluated under the optimal label matching.

$$\:ACC\:=\underset{\rho\:\in\:\mathcal{M}}{\text{max}}\frac{1}{n}\sum\:_{i=1}^{n}\left\{\begin{array}{cc}1&\:{y}_{i}=\rho\:\left({\widehat{y}}_{i}\right)\\\:0&\:\text{otherwise}\end{array}\right.$$
(21)

where \(\:{y}_{i}\) and \(\:{\widehat{y}}_{i}\) denote the true and estimated labels, respectively. Additionally, \(\:\mathcal{M}\) denotes the complete collection of label permutations.
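Enumerating every permutation in Eq. (21) is impractical for more than a few clusters, so the maximization is commonly solved as an assignment problem. The sketch below uses SciPy's `linear_sum_assignment` (the Hungarian method) on the contingency table; this is a common way to compute ACC and is an assumption here, not necessarily the procedure used by the authors. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one mapping of predicted to true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_ids, t = np.unique(y_true, return_inverse=True)
    pred_ids, p = np.unique(y_pred, return_inverse=True)
    # counts[a, b] = number of samples with predicted label a and true label b
    counts = np.zeros((len(pred_ids), len(true_ids)), dtype=np.int64)
    np.add.at(counts, (p, t), 1)
    # Pick the matching that maximizes the number of agreeing samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size

acc_perfect = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
acc_partial = clustering_accuracy([0, 0, 1, 1], [1, 1, 1, 0])
```

In the second call, the best matching maps predicted cluster 1 to true class 0 (two agreements) and predicted cluster 0 to true class 1 (one agreement), giving 3 of 4 samples correct.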

ARI measures the degree of agreement between the cluster assignments produced by a clustering algorithm and the true labels, while accounting for the agreement expected by random chance (Liu and Xu 2024; Bothorel et al. 2015). ARI is a chance-corrected version of the Rand Index (RI), which calculates the proportion of correct clustering decisions; the correction ensures a value close to 0 for random assignments. While RI ranges from 0 to 1, ARI may yield negative values.

$$\:ARI\left({P}^{*},P\right)=\frac{\sum\:_{ij}\left(\begin{array}{c}{n}_{ij}\\\:2\end{array}\right)-\left[\sum\:_{i}\left(\begin{array}{c}{n}_{i}\\\:2\end{array}\right)\sum\:_{j}\left(\begin{array}{c}{n}_{j}\\\:2\end{array}\right)\right]/\left(\begin{array}{c}n\\\:2\end{array}\right)}{0.5\left[\sum\:_{i}\left(\begin{array}{c}{n}_{i}\\\:2\end{array}\right)+\sum\:_{j}\left(\begin{array}{c}{n}_{j}\\\:2\end{array}\right)\right]-\left[\sum\:_{i}\left(\begin{array}{c}{n}_{i}\\\:2\end{array}\right)\sum\:_{j}\left(\begin{array}{c}{n}_{j}\\\:2\end{array}\right)\right]/\left(\begin{array}{c}n\\\:2\end{array}\right)}$$
(22)

where \(\:{P}^{\text{*}}\) represents the estimated partition, \(\:P\) represents the real partition, \(\:{n}_{i}\) refers to the number of samples in the \(\:i\)-th cluster of partition \(\:P\), \(\:{n}_{j}\) refers to the number of samples in the \(\:j\)-th cluster of partition \(\:{P}^{\text{*}}\), and \(\:{n}_{ij}\) represents the number of samples shared by the clusters \(\:{c}_{i}\in\:P\) and \(\:{c}_{j}\in\:{P}^{\text{*}}\).
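As an illustration, ARI can be computed directly from the contingency table of the two partitions. The sketch below uses the standard Hubert-Arabie form, in which the chance-correction term is the product of the two marginal pair counts divided by \(\left(\begin{array}{c}n\\ 2\end{array}\right)\). The function names are hypothetical, and the degenerate case of a zero denominator (for example, both partitions consisting of a single cluster) is not handled.

```python
import numpy as np
from math import comb

def contingency_table(y_true, y_pred):
    """table[i, j] = number of samples in true cluster i and predicted cluster j."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table

def adjusted_rand_index(y_true, y_pred):
    table = contingency_table(y_true, y_pred)
    n = int(table.sum())
    pairs_ij = sum(comb(int(v), 2) for v in table.ravel())     # sum over n_ij
    pairs_i = sum(comb(int(v), 2) for v in table.sum(axis=1))  # sum over n_i
    pairs_j = sum(comb(int(v), 2) for v in table.sum(axis=0))  # sum over n_j
    expected = pairs_i * pairs_j / comb(n, 2)                  # chance agreement
    maximum = 0.5 * (pairs_i + pairs_j)
    return (pairs_ij - expected) / (maximum - expected)

ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call compares two partitions that differ only in label names and yields 1; the second compares two maximally crossed partitions and yields a negative value, which shows why ARI is not bounded below by 0.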

NMI assesses the agreement between true and predicted partitions and is invariant to permutations of the cluster labels (Li et al. 2022; Bothorel et al. 2015). Specifically, NMI quantifies the mutual information between the cluster labels and the ground-truth labels, and then applies a normalizing procedure. NMI between the real partition \(\:P\) and the estimated partition \(\:{P}^{\text{*}}\) can be defined as:

$$\:NMI\left({P}^{*},P\right)=\frac{-2\sum\:_{i}\sum\:_{j}{n}_{ij}.\text{log}\left(\frac{n.{n}_{ij}}{{n}_{i}.{n}_{j}}\right)}{\sum\:_{i}{n}_{i}.\text{log}\left(\frac{{n}_{i}}{n}\right)+\sum\:_{j}{n}_{j}.\text{log}\left(\frac{{n}_{j}}{n}\right)}$$
(23)
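Eq. (23) can be evaluated from the same contingency counts. The following sketch is a direct transcription of the formula, skipping empty cells because they contribute nothing to the sum; the function name is hypothetical, and the case where both partitions have a single cluster (zero denominator) is not handled.

```python
import numpy as np

def normalized_mutual_information(y_true, y_pred):
    """NMI as in Eq. (23), computed from the contingency table."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=float)
    np.add.at(table, (t, p), 1.0)
    n = table.sum()
    n_i = table.sum(axis=1)          # sizes of the true clusters
    n_j = table.sum(axis=0)          # sizes of the predicted clusters
    numerator = 0.0
    for i, j in zip(*np.nonzero(table)):   # only cells with n_ij > 0
        n_ij = table[i, j]
        numerator += n_ij * np.log(n * n_ij / (n_i[i] * n_j[j]))
    denominator = np.sum(n_i * np.log(n_i / n)) + np.sum(n_j * np.log(n_j / n))
    return -2.0 * numerator / denominator

nmi_same = normalized_mutual_information([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
nmi_independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions (up to relabeling) give 1, while the two statistically independent partitions in the second call give 0.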

Finally, runtime measures the computational efficiency of the clustering algorithm in seconds. Runtime metrics play a crucial role in assessing the computational complexity of graph clustering methods. By measuring the time taken for the algorithm to execute, runtime metrics provide insights into the efficiency and scalability of the clustering approach. This information is essential for determining the feasibility of applying the method to large-scale datasets or real-time applications.

In addition to these metrics, we use the Wilcoxon Signed-Rank Test (WSRT) (Mirzaei et al. 2008) to perform a significance test at a significance level of 0.05. The WSRT assesses the differences within each set of matched pairs, making it useful for identifying statistically significant differences between the results of various graph clustering methods. In WSRT, the null hypothesis (H₀) states that there is no significant difference between the paired samples, meaning the median difference between the pairs is zero. In contrast, the alternative hypothesis (H₁) posits that there is a significant difference between the paired samples, implying the median difference is not zero. If the test yields a \(\:p\)-value less than the chosen significance level (e.g., 0.05), the null hypothesis is rejected in favor of the alternative hypothesis, indicating a statistically significant difference between the paired samples.
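A paired comparison of this kind can be run with `scipy.stats.wilcoxon`. In the sketch below, the two score lists are made-up placeholder numbers used only to show the call; they are not results from this paper, and the variable names are hypothetical.

```python
from scipy.stats import wilcoxon

# Placeholder paired scores for two methods over the same eight runs
# (illustrative values only, not taken from the experiments).
scores_a = [0.71, 0.64, 0.80, 0.58, 0.75, 0.69, 0.77, 0.62]
scores_b = [0.70, 0.62, 0.77, 0.54, 0.70, 0.63, 0.70, 0.54]

alpha = 0.05                               # significance level used in the text
result = wilcoxon(scores_a, scores_b)      # two-sided test on the paired differences
reject_null = result.pvalue < alpha        # True means the difference is significant
```

Because the test works on the signed ranks of the paired differences, it makes no normality assumption about the scores themselves, which suits small numbers of runs.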

5.4 Benchmark approaches

To demonstrate the superiority of our model, we adopt seven baseline and state-of-the-art methods for performance comparison. Specifically, these methods are as follows:

  • GAE (Kipf 2016): Graph Auto-Encoders.

  • EGAE (Zhang et al. 2022): Embedding Graph Auto-Encoder.

  • MVGRL (Hassani 2020): Multi-View Graph Representation Learning.

  • AutoSSL (Jin 2021): Automated Self-Supervised Learning.

  • HSAN (Liu et al. 2023): Hard Sample Aware Network.

  • CCGC (Yang et al. 2023): Contrastive deep Clustering Graph network.

  • CDC (Kang 2024): Complex Data Clustering.

5.5 Model analysis

In this subsection, some components and parameters of the proposed model are analyzed. Figure 6 depicts the NMI scores for graph clustering across four datasets for different numbers of MLP layers. The results demonstrate that AGCER outperforms GAE in most scenarios while employing fewer parameters. Moreover, GAE’s performance notably diminishes with increasing layer count, whereas AGCER exhibits more consistent performance. This phenomenon can be attributed to the challenge of effectively training multiple weight matrices when stacking numerous graph convolution layers. Additionally, the depth of the network influences training efficiency.

Fig. 6

Comparison of the baseline GAE method with our AGCER in terms of different number of layers

For unsupervised learning, choosing good hyper-parameters is extremely difficult. We set these hyper-parameters empirically in the proposed model. The sensitivity analysis shows that our model is not sensitive to them within a certain range. The sensitivity analysis for the hyper-parameters is shown in Figs. 7, 8 and 9, where ‘Dropout’ denotes the dropout rate, ‘Interv’ denotes the number of updates, and ‘Epochs’ denotes the number of training iterations. The update happens every \(\:\left\lfloor {\frac{{epochs}}{{interv}}} \right\rfloor\) iterations.

Fig. 7

The sensitivity of AGCER with the variation of Dropout

Fig. 8

The sensitivity of AGCER with the variation of Interv

Fig. 9

The sensitivity of AGCER with the variation of Epochs

In general, \(\:t\)-distributed Stochastic Neighbor Embedding (\(\:t\)-SNE) is a dimensionality reduction technique primarily used for visualizing high-dimensional data in a lower-dimensional space (Van der Maaten and Hinton 2008). The key idea behind \(\:t\)-SNE is to map high-dimensional data points to a lower-dimensional space while preserving their local and global structure as much as possible. The algorithm works by first constructing a probability distribution over pairs of high-dimensional data points, where the probability of two points being neighbors is proportional to their similarity. Next, \(\:t\)-SNE constructs a similar probability distribution over pairs of points in the low-dimensional space. The goal is to minimize the difference between these two probability distributions, measured by the Kullback-Leibler divergence. This process encourages nearby points in the high-dimensional space to remain close together in the low-dimensional space, while pushing apart points that are dissimilar. To reveal the underlying clustering structure, we use \(\:t\)-SNE to visualize the distribution of the embeddings \(\:Z\). We apply the \(\:t\)-SNE algorithm to the CORA, ACM, AMAP, and DBLP datasets. Our findings show that our model produces clearer cluster boundaries compared to alternative methods.
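For reference, the standard \(t\)-SNE cost from Van der Maaten and Hinton (2008) can be written as follows, where \({p}_{ij}\) and \({q}_{ij}\) are the pairwise neighbor probabilities in the high- and low-dimensional spaces, and \({y}_{i}\) denotes the low-dimensional image of point \(i\) (symbols introduced here only for this illustration):

$$KL\left(P\Vert Q\right)=\sum_{i\ne j}{p}_{ij}\log\frac{{p}_{ij}}{{q}_{ij}},\qquad {q}_{ij}=\frac{{\left(1+{\Vert {y}_{i}-{y}_{j}\Vert }^{2}\right)}^{-1}}{\sum_{k\ne l}{\left(1+{\Vert {y}_{k}-{y}_{l}\Vert }^{2}\right)}^{-1}}$$

The heavy-tailed Student \(t\) kernel in \({q}_{ij}\) is what allows moderately dissimilar points to be placed far apart in the embedding, which tends to make cluster boundaries easier to see.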

5.6 Results in homophilic graph

This subsection compares the simulation results for different methods in homophilic graphs. Tables 3, 4 and 5 present the performance comparison of our proposed model and seven other baseline and state-of-the-art approaches across four homophilic datasets. Table 3 reports the results for ACC, while Tables 4 and 5 show the results for ARI and NMI, respectively. Bold values in these tables represent the best results for each metric. The results demonstrate the superiority of our model over methods that exclusively utilize node attributes for representation learning. This performance gap can be attributed to our model’s use of both structural and attribute information. Unlike GAE and EGAE, which rely solely on attributes and thus suffer from limited generalization capability, our proposed model achieves enhanced performance. Additionally, our model outperforms shallow GCN models like AutoSSL and CCGC, which incorporate only attribute or structure, by effectively leveraging both aspects. Moreover, these GCN-based models are vulnerable to noise, which limits their potential for improvement. By integrating GCN with a subspace clustering module, our model enhances the robustness of representations while maintaining discriminability.

Table 3 Experimental results of node clustering based on ACC metric in homophilic graphs
Table 4 Experimental results of node clustering based on ARI metric in homophilic graphs
Table 5 Experimental results of node clustering based on NMI metric in homophilic graphs

The proposed model surpasses the deep graph clustering models MVGRL and HSAN, while demonstrating comparable results to CDC, which ranks as the second most effective approach. Compared to CDC, our model shows improvements of 3.3% and 2.6% in average ARI and NMI, respectively, but is 0.9% worse in terms of average ACC. These approaches effectively balance discriminativeness and robustness in representation learning by integrating an auto-encoder with GCN, leading to significantly enhanced performance. However, they lack a reliable supervisor for training despite reducing excessive smoothness. In contrast, AGCER is specifically designed to address these challenges through its enhanced, reconstructed, and refined graph mechanisms, which aim to bolster resilience and provide a more dependable supervisor for representation learning. In most cases, the proposed model outperforms CCGC and CDC, although CDC achieves better ACC on the ACM and AMAP datasets, while CCGC attains the highest NMI on the AMAP dataset. These strategies aim to enhance representation quality through contrastive learning, maintaining the ability to distinguish between data points. However, existing models are constrained by noise, while the proposed model can acquire noise-resistant representations. Overall, compared to GAE, EGAE, MVGRL, AutoSSL, HSAN, CCGC, and CDC, the proposed model changes average ACC by 41.2%, 35.1%, 13.9%, 9.7%, 13.8%, 6.8%, and −0.9%, respectively. The corresponding improvements in average ARI are 98.7%, 79.5%, 19.6%, 9.7%, 14.6%, 6.5%, and 3.3%, respectively. In addition, AGCER is 57.3%, 48.5%, 20.0%, 6.2%, 13.5%, 5.5%, and 2.6% better than these methods in terms of average NMI.

In this experiment, the significance level of the WSRT is set to 0.05. Here, the \(\:p\)-value for each method compared to AGCER is presented. The WSRT results show that AGCER is statistically significantly better than all other methods except AutoSSL. For AutoSSL, the \(\:p\)-value results show that the difference from AGCER on the ACM and AMAP datasets is not significant, and as a result the null hypothesis cannot be rejected.

In another experiment, we perform an ablation study considering several scenarios to evaluate the effectiveness of each component of the proposed model. These scenarios cover the use or non-use of the Enhanced Graph (EG), Reconstructed Graph (RG), and Refined Graph (RF) components in the proposed AGCER model. Table 6 shows the details of the defined scenarios. The results of this experiment for the four datasets in terms of the ACC, ARI and NMI metrics are reported in Table 7. Bold values represent the best results for each metric. The results confirm the importance and effectiveness of all three components EG, RG, and RF in improving graph clustering. The removal of different components reduces the performance of AGCER to different degrees, and performance improves with the addition of each new component. We found that the improvement from the RF component is more pronounced than that of the others. The original AGCER (i.e. scenario 8) performs significantly better than the version without the RF component (i.e. scenario 7), with ACC improvements of 3.9%, 7.5%, 17.2%, and 24.2% on the CORA, ACM, AMAP, and DBLP datasets, respectively. Also, the improvement obtained with the EG component confirms the effectiveness of introducing an auxiliary graph to facilitate the supervisor. Meanwhile, the improvement obtained with the RG component indicates the ability to learn representations of sparse graphs. Overall, this experiment validates the effectiveness of each component of AGCER for graph clustering.

Table 6 Defined scenarios to evaluate the components of the proposed model

In another experiment, we evaluate the efficiency of AGCER against various baseline and state-of-the-art graph clustering methods. This assessment covers different datasets and considers both runtime and GPU memory usage. Based on the data presented in Fig. 10 and Table 8, AGCER has GPU memory requirements and training time similar to those of other clustering algorithms. To summarize, the efficiency of AGCER is satisfactory. The reason is that we adopt a graph filter to extract the attributes, which circumvents the need for intricate convolution and aggregation operations.

Table 7 Ablation study to evaluate the components of the proposed model based on 8 defined scenarios
Fig. 10

Illustration of GPU memory cost in AGCER and seven available methods on homophilic datasets. Average results from 25 independent runs are reported

Table 8 Average running time (s) comparison as training time in four homophilic datasets

Based on observing the results of this experiment, AGCER can achieve a reasonable balance between the clustering performance and the running time. Furthermore, both GAE and EGAE focus solely on attribute-based algorithms, offering high-speed processing. However, our primary focus lies on algorithms that leverage both attributes and structure. Despite a slight increase in processing time, our proposed model outperforms the majority of graph clustering approaches in terms of performance, highlighting the effectiveness of incorporating both attribute and structural information. Nonetheless, AutoSSL and CDC are comparable methods known for their efficiency and effectiveness, attributed to their utilization of a GCN-based optimizable clustering module.

5.7 Results in heterophilic graph

This subsection presents an in-depth analysis of our model’s performance on heterophilic graphs, where nodes with dissimilar attributes are more likely to be connected. We evaluate our model using several well-known heterophilic datasets, including Cornell, Washington, Squirrel, and Roman-Empire. The clustering results in heterophilic graphs are shown in Tables 9, 10 and 11. Table 9 reports the results for ACC, while Tables 10 and 11 show the results for ARI and NMI, respectively. Bold values in these tables represent the best results for each metric. The results demonstrate the effectiveness of our approach in capturing the complex relationships in these graphs, highlighting the model’s robustness and adaptability in scenarios where traditional homophilic assumptions do not hold. This section provides empirical evidence that our model performs well not only on homophilic graphs but also in heterophilic contexts, thus broadening the applicability of our proposed method.

Table 9 Experimental results of node clustering based on ACC metric in heterophilic graphs
Table 10 Experimental results of node clustering based on ARI metric in heterophilic graphs
Table 11 Experimental results of node clustering based on NMI metric in heterophilic graphs

The results indicate that AGCER consistently outperforms other methods across all datasets. Specifically, our model surpasses AutoSSL and CCGC, which account for heterophily, demonstrating that our reconstructed graph module more effectively leverages graph information. Unlike these methods, which primarily focus on local information, our approach incorporates a global cluster structure. Notably, AGCER achieves up to 8.7% higher ACC on Cornell, 12.3% on Washington, 4.6% on Squirrel, and 2.6% on Roman-Empire compared to other models, confirming that considering heterophily enhances model performance on real-world datasets. Additionally, traditional GAE-based methods perform poorly on heterophilic graphs, aligning with previous findings. In general, AGCER delivers consistent and strong performance in heterophilic scenarios, thanks to its ability to capture both homophilic and heterophilic information within graphs, making it well-suited for clustering real-world graphs, even when homophily ratios are uncertain.

5.8 Scalability analysis

Overall, graph clustering is essential for gaining insights from large-scale datasets, enabling efficient analysis and informed decision-making. In the last experiment, we evaluate the performance of AGCER on large-scale data. We use the Epinion dataset with 75,879 nodes and 508,837 edges as a large dataset for this experiment. Epinion is a who-trusts-whom online social network, where an edge between two users is formed based on trust. The results of this experiment for the proposed AGCER and the CDC method are reported in Table 12. The outcome indicates that the proposed approach can handle sizable datasets with satisfactory accuracy and manageable execution time. Nevertheless, improving the efficiency of handling large datasets remains a direction for future research. To achieve this, various strategies can be explored, including optimizing algorithmic efficiency, leveraging parallel and distributed computing techniques, adopting data partitioning and streaming approaches, and implementing hardware acceleration where applicable. Additionally, advancements in hardware infrastructure, such as high-performance computing clusters and specialized processing units, can also contribute to improving efficiency.

Table 12 Performance comparison of AGCER and CDC on Epinion large-scale dataset

Based on the WSRT and the results presented in Table 12, the difference between our AGCER method and the CDC method on the Epinion dataset is statistically significant. The test, performed at a significance level of 0.05, resulted in a \(\:p\)-value below this threshold. This allows us to reject the null hypothesis, confirming that the differences in performance between the AGCER and CDC methods are significant and not due to random variation.

6 Conclusion

This paper presents a novel approach, AGCER (Attribute Graph Clustering based on Enhanced Graph and Reconstructed Graph), for attributed graph clustering that addresses several key challenges encountered by existing methods. By leveraging an enhanced graph module to capture distant relationships, a graph reconstruction phase to incorporate both homophilic and heterophilic information, a refined graph to improve the representation, and a dual-guidance supervisor module that maps features from the extracted graphs into attribute and structure subspaces, AGCER demonstrates promising performance in handling sparse graphs, mitigating noise vulnerability, and improving clustering accuracy. Furthermore, the integration of a subspace clustering module enhances the robustness of the shallow GCN, ensuring more reliable clustering results. In experiments on benchmark datasets, AGCER outperforms existing methods in most cases. The proposed model not only contributes to the theoretical understanding of attributed graph clustering but also holds practical implications for various real-world applications where accurate clustering of complex relational data is essential. Overall, AGCER represents a step forward in the field of attributed graph clustering, promising improved applicability and performance in diverse domains.

One potential drawback to consider is the possibility of information loss stemming from attribute reconstruction without structural consideration, suggesting a potential avenue for future research. In subsequent endeavors, our focus will be on devising a novel architecture to address this challenge effectively. Additionally, future directions for our research entail efforts to diminish computational overhead while also exploring the adaptation of our model to multi-view clustering scenarios, thereby broadening its applicability and scope.