UCoDe: unified community detection with graph convolutional networks

Community detection finds homogeneous groups of nodes in a graph. Existing approaches either partition the graph into disjoint, non-overlapping communities, or determine only overlapping communities. To date, no method supports detection of both overlapping and non-overlapping communities. We propose UCoDe, a unified method for community detection in attributed graphs that detects both overlapping and non-overlapping communities by means of a novel contrastive loss that captures node similarity on a macro scale. Our thorough experimental assessment on real data shows that, regardless of the data distribution, our method is either the top performer or among the top performers in both overlapping and non-overlapping detection without burdensome hyperparameter tuning.


Introduction
Community detection (Fortunato, 2010) is the problem of identifying sets of nodes in a graph that share common characteristics. In social networks, community detection identifies groups of individuals who participate in joint activities (e.g., sports clubs) or share similar preferences (Perozzi et al., 2014); in biological networks, communities represent proteins that contribute to a specific disease (Mall et al., 2017). Such networks include information in node attributes that may be helpful when identifying similarities (e.g., the age of a person). However, these attributes are typically not considered by traditional community detection methods, such as spectral clustering (Shi & Malik, 2000), modularity maximization (Newman, 2006), or more recent graph embeddings (Cai et al., 2018), making them ill-suited for detecting node communities in attributed graphs.
In recent years, graph neural networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017; Bronstein et al., 2017; Battaglia et al., 2018) have shown superior performance in a number of supervised tasks on graphs, especially link prediction, node classification, and graph classification. GNNs' popularity stems from their aptitude to capture complex relationships in networks, typically by propagating node attributes and features to neighboring nodes in a message-passing process (Battaglia et al., 2018). These are typically accompanied by graph pooling (Bruna et al., 2014; Bianchi et al., 2020; Lee et al., 2019), which aggregates multiple nodes into higher-level representations to reduce the number of parameters of the neural network.
GNNs have propelled advancements in supervised tasks; yet on unsupervised tasks such as community detection, GNNs have not received the same attention. Most existing GNN methods do not directly optimize for community detection but achieve the objective indirectly. Unsupervised GNNs, such as the popular Deep Graph Infomax (DGI) (Veličković et al., 2018), find node representations that, in a second step, need to be subjected to a clustering algorithm, such as the widely used k-means, to actually obtain communities.
Recently, a few methods propose GNNs that explicitly optimize for community detection. GNNs for non-overlapping community detection either optimize for a single score or combine several scores. Single-score methods revisit traditional measures such as the min-cut (Bianchi et al., 2020) and modularity (Tsitsulin et al., 2020) objectives to return node-community probabilities. Combined-score methods (Zhang et al., 2019, 2020) integrate multiple different objectives. These methods outperform single-score methods in non-overlapping community detection, but require substantial tuning to the dataset at hand and are typically less robust and less interpretable than their single-objective counterparts.
Non-overlapping community detection aims at returning a single community assignment for each node. As such, these methods are ill-suited for overlapping community detection. NOCD (Shchur & Günnemann, 2019) is, at the time of writing, the only GNN that optimizes for overlapping community detection. In particular, NOCD finds communities that maximize the probability of recovering the graph structure. Yet, this approach constrains the community structure to be overlapping and thus does not capture non-overlapping communities. In conclusion, to date, no GNN detects both overlapping and non-overlapping communities.
Contributions. (1) We introduce a new GNN method, UCoDe, for community detection on graphs. We devise a simple, effective single-score model which leverages state-of-the-art representations; (2) UCoDe features a novel contrastive loss function that promotes both overlapping and non-overlapping communities, making it the first approach to achieve competitive results across these tasks with a single model; (3) We perform extensive experiments on real data, showing that our method outperforms single-objective methods without the need for extensive parameter tuning, achieving quality on par with more complex combined scores.

Related work
Before delving into our solution, we provide an overview of the literature on community detection, graph neural networks, graph pooling, and graph embeddings. Table 1 provides a summary of the characteristics of the most important work in the area, highlighting core properties of the methods, such as their ability to capture overlapping and non-overlapping communities and whether they achieve their results in an unsupervised manner with a single-score approach.

Traditional community detection
Community detection has a long history in graph analysis (Fortunato, 2010), with applications across the natural sciences. There are two main categories of community detection: non-overlapping community detection, also called partitioning, which seeks an assignment of each node to exactly one community; and overlapping community detection, which seeks a soft assignment of nodes into potentially multiple communities. A community detection algorithm optimizes a score that describes the cohesiveness of the nodes in a community with respect to the rest of the nodes. A number of scores and methods have been proposed based on the graph structure, such as spectral clustering for min-cut (Shi & Malik, 2000), modularity maximization (Newman, 2006), and the Girvan-Newman algorithm for betweenness (Girvan & Newman, 2002). Other works extend such methods by incorporating node features into the graph analysis (Yang et al., 2013).
Overlapping community detection is often approached with algorithms similar to the Expectation-Maximization algorithm for soft clustering (Dempster et al., 1977), where each point is a distribution over the clusters. Similarly, AGM (Yang & Leskovec, 2012) and BigCLAM (Yang & Leskovec, 2013) formulate the community detection problem as finding soft assignments to communities that maximize the model likelihood. Other traditional methods find overlapping communities by removing high-betweenness edges (Gregory, 2007) or by propagating label information (Gregory, 2010). Lastly, EPM (Zhou, 2015) fits a Bernoulli-Poisson model, while SNMF (Wang et al., 2010) and CDE (Li et al., 2018) use non-negative matrix factorization.

Graph neural networks for community detection
GNNs (Wu et al., 2020) are a family of parametric models that learn node representations by aggregating features over the graph's structure. GNNs exhibit state-of-the-art performance in supervised tasks, such as link prediction, node classification, and graph classification.
Popular GNN models include spectral GNNs (Defferrard et al., 2016; Bronstein et al., 2017), GCNs (Hamilton et al., 2017; Kipf & Welling, 2017), graph autoencoders (GAEs) (Kipf & Welling, 2016), graph isomorphism networks (Xu et al., 2018), and Deep Graph Infomax (DGI) (Veličković et al., 2018). These models compute node features in an unsupervised manner if equipped with a reconstruction loss. A clustering algorithm, such as k-means, can then cluster the node features to return communities. Since there is no coupling between such GNN model objectives and the clustering algorithm, the resulting communities may not accurately represent all groups in the graph.

GNNs for community detection
Some GNNs directly optimize for non-overlapping community detection with community-wise loss functions. Single-objective approaches propose variations of traditional cohesiveness scores, such as min-cut (Bianchi et al., 2020) and modularity (Tsitsulin et al., 2020). Yet, single-objective methods inherit the limitations of the score they aim to optimize, providing community memberships that are subject to the loss objective's definition of community.
CommDGI (Zhang et al., 2020) proposes a combined objective as a linear combination of three objectives: the DGI objective (Veličković et al., 2018), modularity, and mutual information. CommDGI's combined objective overcomes the limitations of the single-score methods but requires extensive parameter tuning to perform well. Similarly, recent multi-objective methods operate on the pairwise correlation matrix (Liu et al., 2022), unsupervised contrastive relations (Park et al., 2022), KL-divergence between clusters (Zhao et al., 2021; Bo et al., 2020), and structured encodings (He et al., 2021). These methods, besides employing complex combined objectives, often require initialization with elaborate pretrained models (Bo et al., 2020; Zhao et al., 2021; Liu et al., 2022), running k-means either in the computation of the embeddings (Liu et al., 2022), in each epoch (Sun et al., 2021), or as an initialization step (Bo et al., 2020), and hyperparameter tuning for each dataset (Bo et al., 2020; Zhao et al., 2021; Liu et al., 2022; Park et al., 2022). In contrast, our model uses the same hyperparameters for all datasets, devises a single-objective contrastive loss, requires no sophisticated initialization, and detects communities without the need to run k-means. Nevertheless, in our evaluation, we also compare with DCRN (Liu et al., 2022), the most recent of such combined-objective methods.
While models like DMoN (Tsitsulin et al., 2020) return soft community assignments through a softmax output layer, both single and combined objective methods explicitly penalize overlap among communities.
NOCD (Shchur & Günnemann, 2019) proposes an overlapping community detection loss that maximizes the likelihood of a Bernoulli-Poisson model. NOCD achieves competitive results on overlapping community detection but cannot directly detect non-overlapping communities.

Graph pooling
Graph pooling (Bruna et al., 2014;Bianchi et al., 2020;Lee et al., 2019) is an operation that aggregates nodes so as to learn summarized representations.The purpose of graph pooling is to remove redundant information and reduce the number of parameters of the GNN.
Model-free pooling coarsens the graph structure by aggregating nodes without considering the node attributes. Graclus (Dhillon et al., 2007) revisits max-pooling to aggregate similar nodes in a hierarchical fashion. SAGPool (Lee et al., 2019) proposes a self-attention layer to reweight nodes in the graph. Model-free approaches act as layers in the network and do not provide communities as output.
Model-based pooling learns coarsening operators through a differentiable loss function. DiffPool (Ying et al., 2018) learns a hierarchical clustering assignment of the graph for supervised graph classification. Top-K pooling (Gao & Ji, 2019) trains an autoencoder that assigns a score to each node; the pooling phase retains the k nodes with the highest score. Yet, these methods do not explicitly optimize for cluster assignments, resulting in substandard communities (Bianchi et al., 2020).
MinCutPool (Bianchi et al., 2020), although a pooling technique, returns community assignments by optimizing the min-cut objective of spectral clustering (Shi & Malik, 2000).MinCutPool does not require eigendecomposition of the Laplacian matrix and instead propagates node attributes over the GNN.

Node embedding methods
Node embeddings (Cai et al., 2018; Chami et al., 2020) learn node representations of the graph structure in an unsupervised manner with shallow neural networks (Perozzi et al., 2014; Tang et al., 2015), autoencoders (Wang et al., 2016), or matrix factorization (Ou et al., 2016; Qiu et al., 2018). Similar to GNN-based representations, a clustering algorithm on the embeddings can be used to detect communities. Node embeddings can be seen as a generalization of dimensionality reduction methods; they tend to preserve the structure but disregard node attributes.
A few recent works address the problem of attributed node embeddings through matrix factorization (Yang et al., 2015) or deep models (Gao & Huang, 2018). None of these models is designed for community detection. AGC (Zhang et al., 2019) proposes a combined score based on spectral clustering on top of a GNN representation.

Consider an attributed graph G = (V, E, A), where V = {v_1, ..., v_n} is a set of n nodes, E ⊆ V × V is a set of edges, and A = {a_1, ..., a_l} is a set of l attributes. Each node v_i has an associated vector x_i ∈ ℝ^l of real features, one for each attribute. The node features form an n × l matrix X ∈ ℝ^{n×l} where each node-feature vector x_i is a row. The adjacency matrix A is a matrix representation of the graph's structure, where A_ij = 1 if (v_i, v_j) ∈ E, and 0 otherwise. The degree d_i of a node v_i is its number of neighbors, i.e., d_i = Σ_{j=1}^{n} A_ij; d is the vector containing the degrees of all nodes, and D is the diagonal degree matrix.
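To make the notation concrete, the following sketch builds A, X, d, and D for a small hypothetical toy graph (the graph and features are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical toy graph with n = 4 nodes and l = 2 attributes.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency: A[i, j] = 1 iff (v_i, v_j) in E
X = np.array([[0.1, 0.9],
              [0.2, 0.8],
              [0.7, 0.3],
              [0.9, 0.1]])                  # node-feature matrix, one row x_i per node

d = A.sum(axis=1)    # degree vector: d_i = number of neighbors of v_i
D = np.diag(d)       # diagonal degree matrix
```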
Problem (Attributed graph community detection) We aim to assign each node to at least one of k communities, C_1, ..., C_k, such that a score of community cohesiveness is maximized. The cluster assignment of node v_i is a probability vector c_i whose j-th entry indicates the probability of v_i belonging to community C_j. Cluster assignments form a matrix C ∈ [0, 1]^{n×k} where row i contains node v_i's cluster assignment c_i.
One of the key design choices for a community detection algorithm is the community cohesiveness score that determines the quality of the cluster assignments. We now review modularity (Newman, 2006), a popular measure for community detection.

Modularity
Modularity (Newman, 2006) Q(G; C) measures the quality of a partition C of the nodes of the graph G; a high modularity score indicates that the nodes grouped by C have dense internal connections and sparse connections to outside nodes. More specifically, modularity captures the difference in density between the edges inside a community and the edges of a fixed null model:

Q(G; C) = (1/2|E|) Σ_ij ( A_ij − d_i d_j / 2|E| ) 𝟙[c_i = c_j].   (1)

The quantity d_i d_j / 2|E| is the null model, representing the probability that two nodes v_i, v_j are connected by chance. The null model in the modularity score is the rewiring model, in which each node v_i preserves its degree d_i but connects randomly to any other node in the graph. By defining the modularity matrix B = A − d d⊤ / 2|E|, Eq. 1 simplifies into

Q(G; C) = (1/2|E|) Tr(C⊤ B C).   (2)
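As a sanity check, modularity can be computed directly from the two forms above; the snippet below (a toy example, not from the paper) verifies that the matrix form Tr(C⊤BC) agrees with the pairwise sum:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
m2 = d.sum()                           # 2|E|
B = A - np.outer(d, d) / m2            # modularity matrix

# hard partition {v1, v2, v3} | {v4, v5} as a one-hot assignment matrix
C = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

Q = np.trace(C.T @ B @ C) / m2         # matrix form (Eq. 2)

# pairwise form (Eq. 1): sum B_ij over node pairs in the same community
labels = C.argmax(axis=1)
Q_pairwise = sum(B[i, j] for i in range(5) for j in range(5)
                 if labels[i] == labels[j]) / m2
```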

Limits and pitfalls
Modularity maximization is one of the most popular methods for community detection (Fortunato, 2010).However, its direct maximization may fail to provide optimal communities.As shown in (Fortunato & Barthelemy, 2007), modularity may fail to recognize communities that fall below a graph-specific size.Furthermore, modularity is a measure for discrete partitioning and does not perform well in the case of overlapping communities (Devi & Poovammal, 2016).In the following section, we show how to overcome these limitations of modularity by combining the expressiveness of Graph Neural Networks with a novel contrastive modularity loss that captures both overlapping and non-overlapping communities.

Our solution: UCoDe
Maximizing the modularity objective in Eq. 2 is NP-hard, but a relaxation can be solved efficiently with a spectral approach similar to spectral clustering (Newman, 2006) if we allow the matrix C to be real rather than binary. This relaxed objective admits as solutions the k leading eigenvectors of the matrix B. This convenient relaxation enables soft clustering assignments and, in principle, overlapping community detection.
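For illustration, the relaxed solution can be computed directly (a NumPy sketch on a toy graph; large graphs would use sparse eigensolvers instead):

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
B = A - np.outer(d, d) / d.sum()       # modularity matrix

# Relaxed maximizer of Tr(C^T B C): the k leading eigenvectors of B.
k = 2
vals, vecs = np.linalg.eigh(B)         # eigh returns eigenvalues in ascending order
C_relaxed = vecs[:, -k:]               # real-valued; not yet a proper soft assignment
```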
To circumvent the modularity's resolution limit and capture interactions among nodes that are not directly connected, we further assume that C is the output of a Graph Neural Network model.

Graph neural network approach
Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017; Bronstein et al., 2017; Battaglia et al., 2018) transform the node attributes by nonlinear aggregation of the attributes of each node's neighbors. By virtue of this aggregation mechanism, these networks are called message passing (Battaglia et al., 2018). We now review the Graph Convolutional Network (GCN) model (Kipf & Welling, 2017). We denote as X^[0] = X the initial node attributes, Ã = D̃^{−1/2}(A + I)D̃^{−1/2} the normalized adjacency matrix with self-loops (where D̃ is the degree matrix of A + I), and W^[t] the weight matrix at layer t, which encodes the parameters of the network. Layer t+1 computes

X^[t+1] = φ(Ã X^[t] W^[t]),

where φ is a non-linear activation function, such as softmax, SELU, or ReLU. The matrix W^[0] is randomly initialized, typically as W^[0] ∼ N(0, 1). The parameters W are learned via stochastic gradient descent on a supervised or unsupervised loss function. The result of a GNN at the last layer T is a matrix X^[T] whose rows are embeddings of the nodes in a d-dimensional space.
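A single propagation step can be sketched as follows (a minimal NumPy illustration of the GCN layer rule; the graph, weights, and ReLU choice are illustrative assumptions):

```python
import numpy as np

def normalized_adjacency(A):
    """A_tilde = D^{-1/2} (A + I) D^{-1/2}: normalized adjacency with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

def gcn_layer(A_tilde, X, W, activation=lambda z: np.maximum(z, 0.0)):
    """One propagation step: X^{[t+1]} = phi(A_tilde X^{[t]} W^{[t]}), ReLU by default."""
    return activation(A_tilde @ X @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, 4))        # 3 nodes, 4 attributes
W = rng.normal(size=(4, 2))        # W^{[0]} ~ N(0, 1)
H = gcn_layer(normalized_adjacency(A), X, W)
```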
To train a GNN, we need to specify a differentiable loss function. For instance, in the node classification task, the loss function is typically the binary cross-entropy. An optimizer, such as Adam (Kingma & Ba, 2014), finds the parameters W^[1], ..., W^[T] that minimize the loss function.
The choice of the architecture and the loss function are key design choices for GNNs. In what follows, we present our model UCoDe, which integrates the simplicity of single-objective community detection with the power of combined scores, by virtue of a new loss function that encourages robust community memberships while maintaining consistent separation between dissimilar nodes.

UCoDe loss function
We build our loss function based on community modularity (Eq. 2). We start by showing that the entire matrix C⊤BC can be interpreted as the modularity across communities. Afterward, we introduce our contrastive loss and show how such a loss aims to detect overlapping and non-overlapping communities alike.

C⊤BC as modularity across communities
We observe that C⊤BC encodes the modularity matrix at the community scale:

Q_M = C⊤BC = A^C − d^C (d^C)⊤ / 2|E|,  where A^C = C⊤AC and d^C = C⊤d.

We refer to Q_M as the community-wise modularity matrix.
In the simple setting where C is binary, such that C ∈ {0, 1}^{n×k}, Q_M reasonably represents the modularity across the community graph. Note that A^C is the weighted adjacency matrix of a graph whose nodes are communities, where the off-diagonal weight A^C_ij is the number of edges between community C_i and community C_j. The diagonal entries A^C_ii represent the weight from community i to itself and are equal to double the number of edges between the nodes within community C_i. We also observe that d^C = C⊤d is the community degree vector and that d^C (d^C)⊤ / 2|E| represents the likelihood of an edge existing between communities. As such, we can interpret Q_M as the modularity of the graph in which nodes are replaced with their corresponding communities.
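The identity above is easy to verify numerically (a toy example, not from the paper):

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
m2 = d.sum()                                   # 2|E|
B = A - np.outer(d, d) / m2

# binary assignment: communities {v1, v2, v3} and {v4, v5}
C = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

Q_M = C.T @ B @ C                              # community-wise modularity matrix
A_C = C.T @ A @ C                              # community-graph adjacency
d_C = C.T @ d                                  # community degree vector
# A_C[i, i] is twice the number of edges inside community C_i,
# and Q_M = A_C - d_C d_C^T / (2|E|).
```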
In the more practical case of non-binary community memberships with C ∈ [0, 1] n×k , we can interpret Q M as the modularity across "fuzzy" communities, where each entry of the matrix is proportional to the corresponding community membership strengths.
We can now state our objective as maximizing the diagonal values of Q_M while minimizing off-diagonal entries that correspond to dissimilar communities. Clearly, then, our target diagonal values should be 1. However, setting the target off-diagonal values to 0 would penalize overlapping community detection. For this reason, we define a target distribution y ∈ ℝ^{2k} as follows:

y_i = 1 if 1 ≤ i ≤ k,  y_i = δ if k < i ≤ 2k,   (3)

where δ is a threshold parameter set to 0 in the non-overlapping setting and to a pre-determined value in the overlapping setting (we set δ = 0.85 in all datasets in our experimental cohort). A 2k vector is necessary to enforce the similarity on the first 1, ..., k elements and dissimilarity on the next k + 1, ..., 2k elements. Under the distribution in Eq. 3, we optimize for community-wise modularity by matching intra-community similarities (Q_M)_ii to the targets y_{i; i≤k} and inter-community similarities (Q_M)_{jl; l≠j} to the targets y_{j; j>k}. Thus, our loss function becomes

L_UCoDe(C) = −(1/2k) Σ_{i=1}^{2k} [ y_i log σ(ŷ_i) + (1 − y_i) log(1 − σ(ŷ_i)) ],  with ŷ = [dg(Q_M); dg(P(Q_M))],   (4)

where dg extracts the vector of the diagonal of a matrix, P returns a random row-permuted matrix, and σ is the element-wise sigmoid. The row permutation ensures that every community is compared repulsively to another community, as the post-permutation diagonal contains the community modularity between separate clusters. Although the loss allows for including multiple permutations P of the modularity matrix, in practice we consider only one, as we find that this choice strikes a balance between speed and quality. Thus, the loss function has the straightforward interpretation of clustering similar groups of nodes while encouraging separation between dissimilar ones.
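The loss can be sketched in NumPy as follows (our reading of Eq. 4; the handling of permutation fixed points and the numerical clamp `eps` are implementation choices, not prescribed by the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ucode_loss(Q_M, delta=0.0, rng=np.random.default_rng(0)):
    """Contrastive modularity loss: attract the diagonal of Q_M toward 1 and
    push the diagonal of a randomly row-permuted copy toward the target delta."""
    k = Q_M.shape[0]
    perm = rng.permutation(k)                  # random row permutation P
    y_hat = np.concatenate([np.diag(Q_M),      # intra-community modularity
                            np.diag(Q_M[perm])])  # modularity between permuted pairs
    y = np.concatenate([np.ones(k), np.full(k, delta)])  # targets (Eq. 3)
    p = sigmoid(y_hat)
    eps = 1e-12                                # clamp to keep the logs finite
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).sum() / (2 * k)

loss = ucode_loss(np.array([[2.0, -1.0], [-1.0, 2.0]]), delta=0.0)
```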
Note that L_UCoDe has a natural relationship to cross-entropy and contrastive objective functions. In the non-overlapping setting, it corresponds to the cross-entropy loss, as it represents the KL divergence between Bernoulli random variables. In the overlapping setting, however, our target is not a probability distribution, requiring us to scale the loss by (1 + δ) to recover the cross-entropy interpretation.

A loss for overlapping and non-overlapping communities
Our loss in Eq. 4 clearly encourages non-overlapping community structure by maximizing the diagonal of Q_M and minimizing the off-diagonal. It is less clear whether such a loss also supports overlapping community detection. To this end, we consider the bowtie graph depicted in Fig. 1 with 5 vertices and 6 edges, in which vertices v_1, v_2, v_4, v_5 have degree 2 and v_3 has degree 4. The optimal overlapping clustering then groups vertices v_1, v_2, v_3 into community c_1 and v_3, v_4, v_5 into c_2, with v_3 shared among c_1 and c_2.
If we assume our loss is minimized by non-overlapping communities, it would incentivize orthogonal binary community indicator vectors. WLOG, let c^n_1 = [1, 1, 0, 0, 0]⊤ and c^n_2 = [0, 0, 1, 1, 1]⊤ be two such non-overlapping communities. Comparing this to the optimal overlapping clustering c^o_1 = [1, 1, 0.5, 0, 0]⊤ and c^o_2 = [0, 0, 0.5, 1, 1]⊤, we obtain a strictly better value of the loss for the overlapping assignment: the intra-community entries of Q_M increase while the inter-community entries decrease. An exhaustive search over all possible communities shows that the minimum of the loss function is attained exactly at the overlapping clustering (c^o_1, c^o_2). As such, the loss already encourages overlapping communities. Yet, the value of δ can increase to allow for additional overlap-sensitivity if necessary. In the future, one could consider varying δ on a per-community basis. We support the above example with an ablation study across datasets: Table 2 shows that optimizing both terms of the contrastive loss yields the best overlapping and non-overlapping NMI.
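The bowtie example can be checked numerically (vertices are 0-indexed below; B is the modularity matrix defined above):

```python
import numpy as np

# Bowtie graph: triangle {v1, v2, v3} and triangle {v3, v4, v5} sharing v3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)                       # degrees [2, 2, 4, 2, 2]
B = A - np.outer(d, d) / d.sum()        # modularity matrix, 2|E| = 12

C_hard = np.array([[1, 1, 0, 0, 0],     # c^n_1
                   [0, 0, 1, 1, 1]],    # c^n_2
                  dtype=float).T
C_soft = np.array([[1, 1, 0.5, 0, 0],   # c^o_1
                   [0, 0, 0.5, 1, 1]],  # c^o_2
                  dtype=float).T

QM_hard = C_hard.T @ B @ C_hard         # diag [2/3, 2/3], off-diag -2/3
QM_soft = C_soft.T @ B @ C_soft         # diag [1, 1],     off-diag -1
# The overlapping assignment has larger intra-community modularity and
# more negative inter-community modularity, hence a lower loss.
```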

UCoDe architecture
The main purpose of our GCN is to learn the community assignment matrix C using the graph structure and the node attributes. Our architecture is a two-layer GCN (Kipf & Welling, 2017) whose last layer outputs community assignments via

C = softmax(Ã RReLU(Ã X W^[0]) W^[1]).   (5)

This architecture, although simple, allows for propagating information over the entire graph, thus capturing relationships within the graph's structure and the nodes' attributes.
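A forward pass of Eq. 5 can be sketched as follows (a NumPy illustration; a fixed-slope leaky ReLU stands in for the randomized slope of RReLU, and the weights and graph are illustrative):

```python
import numpy as np

def normalized_adjacency(A):
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)           # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def ucode_forward(A, X, W0, W1, slope=0.25):
    """C = softmax(A_tilde * RReLU(A_tilde X W0) * W1), cf. Eq. 5."""
    A_t = normalized_adjacency(A)
    H = A_t @ X @ W0
    H = np.where(H > 0, H, slope * H)              # leaky-ReLU stand-in for RReLU
    return softmax_rows(A_t @ H @ W1)              # rows of C sum to 1

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
C = ucode_forward(A, rng.normal(size=(3, 4)),
                  rng.normal(size=(4, 8)), rng.normal(size=(8, 2)))
```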

Experiments
In this section, we empirically evaluate UCoDe in comparison with state-of-the-art approaches for community detection on several benchmark graph datasets. We analyze our results both in non-overlapping community detection (graph partitioning) in Sect. 5.1, and in overlapping community detection, where nodes may be assigned to more than one community, in Sect. 5.2. We further analyze the stability of the performance (Sect. 5.3) and the sensitivity of our approach to its few hyperparameters (Sect. 5.4). We implement UCoDe using PyTorch v1.10.0 and Python v3.8. We release the implementation of UCoDe at https://github.com/AU-DIS/UCODE. We evaluate our methods on a 14-core Intel Core i9 10940X 3.3GHz machine with 256GB RAM.
Our method UCoDe outputs an assignment matrix C where c_ij represents the likelihood of node v_i belonging to community j. For non-overlapping community detection, we assign each node to the community with the highest score, i.e., arg max_j c_ij.
In additional experiments, we also investigate a second version, UCoDe k , which applies the k-means algorithm on the representations obtained by the RReLU function in Eq. 5.
Studying this version, we show the benefit of our method compared to decoupled community detection approaches. The results suggest that k-means contributes only marginal quality improvement, which confirms the validity of our efficient end-to-end loss function for community detection.
Adapting to regularization We note that the values in Q_M can be positive or negative and are not necessarily bounded. The sigmoid is thus necessary in order to compute the cross-entropy against the target distribution. However, we found empirically that the edge-count normalization in Eq. 2 settles the values in C⊤BC close to 0, leaving the sigmoid outputs near 1/2. To this end, we apply a logarithm to C⊤BC that preserves the ordering but amplifies the values. In preliminary experiments, we empirically confirmed that this approach sufficiently amplifies the values so as to achieve good performance when using network regularization.
Competitors We collect results for a number of state-of-the-art non-overlapping (Sect. 1 in the appendix) and overlapping (Sect. 1 in the appendix) community detection methods.
Quality measures For both overlapping and non-overlapping community detection, we report the Normalized Mutual Information (NMI) between the cluster assignments and the ground-truth communities. In addition, for non-overlapping community detection we provide the pairwise F1 score between all node pairs and their corresponding ground-truth community; we also provide two intrinsic quality measures, namely modularity (Eq. 1) and network conductance (Yang & Leskovec, 2015). The network conductance (C) measures how well-connected the nodes in the communities are and relates to the escape probabilities of random walks. Modularity (Q) (Newman, 2006) assesses whether intra-community nodes are more densely connected than their inter-community counterparts. We report the average value of each measure over 10 runs of the algorithms.
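For completeness, conductance for a single community can be sketched as follows (cut weight over the smaller of the two volumes, a common definition of the measure; the toy graph is illustrative):

```python
import numpy as np

def conductance(A, mask):
    """Conductance of the node set S given by boolean `mask`:
    edges leaving S divided by the smaller of the two volumes."""
    d = A.sum(axis=1)
    cut = A[mask][:, ~mask].sum()
    return cut / min(d[mask].sum(), d[~mask].sum())

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
mask = np.array([True, True, True, False, False])   # community {v1, v2, v3}
phi = conductance(A, mask)                          # 1 cut edge, volumes 7 and 3
```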
Data We perform experiments on 14 real-world graphs with non-overlapping and overlapping communities.The largest graph has 34.5K nodes and 247K edges.Further details on the datasets, quality measures and parameter settings can be found in Table 3.Our choice of datasets includes graphs with different types of communities, density and attributes, as well as the largest networks evaluated by the competitors.
• Cora, Citeseer, and Pubmed (Sen et al., 2008) are co-citation networks among papers where attributes are bag-of-words representations of the papers' abstracts, and labels are paper topics.
• Amz-Pho and Amz-PC (Shchur et al., 2018) are subsets of the Amazon co-purchase graph with the frequency of products purchased together; attributes are bag-of-words representations of product reviews, and class labels are product categories.
• CoA-CS and CoA-Phy (Shchur et al., 2018) are co-authorship networks based on the MS Academic Graph (MAG) for the computer science and physics fields, respectively; attributes are collections of paper keywords; class labels indicate common fields of study.
• Fb-X datasets (Mcauley & Leskovec, 2014) are ego-nets from Facebook where X is the id of the central node.
• Eng (Shchur & Günnemann, 2019) is a co-authorship graph from MAG.

Non-overlapping community detection
We begin our experimental evaluation with an overall comparison of methods for non-overlapping community detection across different datasets. We compare with the methods described in Sect. 1 in the appendix. We additionally include NOCD (Shchur & Günnemann, 2019), a state-of-the-art GNN for overlapping community detection. To obtain non-overlapping clusters, we assign each node to the cluster with the highest probability.
UCoDe parameter setup We train UCoDe for 1000 epochs, which shows consistent results across datasets and tasks. We use two GCN layers with a hidden dimension of 256. We default to producing k = 16 communities for all datasets, as this choice is consistent with MinCut (Bianchi et al., 2020) and DMoN (Tsitsulin et al., 2020); in a set of preliminary experiments, we found k = 8 and k = 32 to give inferior results. We apply batch normalization in both internal layers and set a learning rate of 10^{-3} for the Adam optimizer (Kingma & Ba, 2014). We add weight decay to both weight matrices with regularization strength 10^{-1}.
We additionally experimented with GraphSAGE (Hamilton et al., 2017) for the internal propagation layer, but opt for GCN (Kipf & Welling, 2017) due to the superior performance in our analyses.

Analysis of ground-truth communities
We compare the methods in terms of NMI and F1-score with respect to ground-truth communities. As Fig. 2 confirms, UCoDe is the most robust choice for non-overlapping communities across datasets. Regardless of dataset characteristics, UCoDe attains competitive results even on datasets where existing approaches under-perform. Indeed, a more detailed analysis reveals that UCoDe ranks on average higher than any other competitor (Sect. 2 in the appendix). The additional k-means clustering afforded to DGI_k and UCoDe_k offers a competitive edge only on three of the seven datasets. Further, note that on the denser Amz-PC and Amz-Pho, methods like MinCut and DCRN fail to converge. They provide overall lower scores, indicating that graph pooling and combined objectives are not always viable approaches for the community detection task. Our method outperforms traditional methods, such as k-means, demonstrating the advantage of a graph-learning approach over attribute clustering in capturing the structural characteristics of a graph. NOCD fares relatively well against methods explicitly targeting non-overlapping communities, but still fails to provide competitive results against UCoDe. In conclusion, there is no clear second choice, promoting UCoDe to the method of choice, as it shows consistent behavior across datasets.

Analysis of conductance and modularity
We now turn our attention to intrinsic measures to analyze the impact of the various objective functions on community connectedness. Table 4 reports conductance (C) and modularity (Q). UCoDe shows the best performance in terms of conductance, which means that UCoDe is particularly good at identifying well-connected communities. This makes sense, as our loss function specifically encourages high intra-community connections and low inter-community connections.
At the same time, DMoN, which optimizes for modularity, does not consistently attain the best modularity. Yet, UCoDe attains modularity superior to DMoN in most datasets, although not explicitly encouraging modularity. This indicates that the contrastive loss in UCoDe indeed yields a more nuanced community structure than can be obtained through optimizing modularity alone. This is even more notable when considering the other measures, where UCoDe outperforms DMoN.
In conclusion, the empirical evaluation clearly shows that our model is highly robust and widely applicable in the non-overlapping setting, obtaining competitive results across the evaluation metrics and datasets rather than targeting any single one. We note that methods that directly optimize modularity achieve good modularity scores at the expense of performance on other measures. UCoDe instead achieves competitive results across every metric with little-to-no hyperparameter tuning.

Overlapping community detection
Here, we analyze the performance of UCoDe on overlapping community detection. The list of competitors is described in Sect. 1 in the appendix. UCoDe parameter setup While UCoDe does not require hyperparameter tuning across datasets, it requires small adaptations across tasks to accommodate the uncertain nature of overlapping communities. To reflect the intrinsic dimensionality of each dataset, which grows with the number of nodes (Tsitsulin et al., 2019), we set the size of the first layer to 128 while keeping the output layer's size fixed to the number of communities k. We apply batch normalization after the first graph convolutional layer. We add weight decay to both weight matrices with regularization strength 10^{-2}. The rest of the hyperparameters are the same as in non-overlapping community detection.
We set the diagonal elements of the permuted matrix P(Q_M) in Eq. 4 to a value in [0, 1] to avoid penalizing intra-cluster connections. We find experimentally that the value 0.85 attains good results on all datasets, without the need for further tuning.
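A sketch of how the permuted matrix with its adjusted diagonal can be constructed, assuming Q_M is the k x k soft modularity matrix C^T B C / 2m and that P cyclically permutes its rows; both are assumptions on our part, and Eq. 4 should be consulted for the exact definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, eps = 6, 3, 0.85

A = np.ones((n, n)) - np.eye(n)              # toy adjacency (complete graph)
C = rng.random((n, k))
C /= C.sum(axis=1, keepdims=True)            # soft assignments, rows sum to 1

deg = A.sum(axis=1)
two_m = A.sum()
B = A - np.outer(deg, deg) / two_m           # modularity matrix
Q_M = C.T @ B @ C / two_m                    # k x k soft modularity (assumed form)

# Row-permuted negative target; the diagonal is reset to eps = 0.85 so that
# intra-cluster connections are not penalized.
P_QM = np.roll(Q_M, 1, axis=0)
np.fill_diagonal(P_QM, eps)
```

Note that the rows of the modularity matrix B sum to zero by construction, a useful sanity check when implementing the loss.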
Community assignment. In the overlapping scenario, we set a threshold p for scores c_ij above which a node i is assigned to a community j. We choose a threshold that exhibits good average performance on all datasets, thereby eschewing per-dataset tuning. Our first threshold p_1 is the average of the exponentiated assignment scores, i.e., p_1 = (1/nk) Σ_ij exp(c_ij), where the exp encourages sparsity by spreading the values over the range [0, +∞). We note in Fig. 5 that this choice corresponds to elbow points in a grid search. For the NOCD model, we set p_2 = 0.5 as in their experiments. We evaluate the DMoN model using p_1, p_2, and p_3 = [C], and report results with p_3, as it was the highest in all experiments.
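The threshold p_1 and the resulting assignments can be computed as follows; comparing exp(c_ij) against p_1 is our reading of the rule, and the scores here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 4
c = rng.standard_normal((n, k))       # hypothetical assignment scores c_ij

# p1 is the average of the exponentiated scores: (1 / nk) * sum_ij exp(c_ij).
p1 = np.exp(c).mean()

# Node i joins community j whenever its exponentiated score exceeds p1;
# a node may therefore belong to several communities, or to none.
member = np.exp(c) > p1
```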

Analysis of ground-truth communities
Overlapping community detection results are given in Table 5 and verify that UCoDe outperforms the state-of-the-art methods on the majority of datasets. The direct optimization of modularity in DMoN cannot easily detect overlapping communities, as opposed to our contrastive modularity loss. More importantly, UCoDe outperforms NOCD, a GCN that directly aims to detect overlapping communities, in many cases. Lastly, we note that none of the other methods in Table 5 obtains results comparable to ours. This suggests that our method, which requires no hyperparameter tuning, is an effective choice for both overlapping and non-overlapping community detection.

Stability analysis
After having established the top performers in the respective tasks, namely MinCut, DGI_k, NOCD, and DMoN for non-overlapping community detection, and NOCD for overlapping community detection, we analyze them in terms of variance. Table 6 shows the NMI and the confidence intervals at the 95% level. UCoDe retains competitive stability across datasets. More interestingly, while our k-means variant, UCoDe_k, attains lower variance thanks to the k-means clustering, UCoDe is typically comparable to, and sometimes more stable than, the competitors.
For overlapping community detection, in Table 7 we compare only with NOCD, the closest competitor to UCoDe. The probabilistic nature of the two methods is reflected in the deviation, which is typically around 1.0. However, in most cases the deviation does not affect the final result, showing that UCoDe is competitive regardless of the variance.
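For reproducibility, confidence intervals such as those in Tables 6 and 7 can be derived from repeated runs as in this sketch, which uses a normal approximation; a Student t quantile would give a slightly wider interval for five runs.

```python
import math

# NMI scores from five independent runs of a hypothetical model.
runs = [0.50, 0.52, 0.48, 0.51, 0.49]
n = len(runs)
mean = sum(runs) / n
var = sum((x - mean) ** 2 for x in runs) / (n - 1)   # sample variance
half_width = 1.96 * math.sqrt(var / n)               # 95% normal-approx CI

ci = (mean - half_width, mean + half_width)          # reported as mean +/- half_width
```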

Sensitivity analysis
We analyse UCoDe as a function of the number of training epochs. Figure 3 gives results for non-overlapping community detection on the Cora dataset (other datasets show similar trends). As expected, in Fig. 3 (left) the loss decreases with more epochs and converges after only about 100 epochs. This behaviour is confirmed by the NMI score (center). We also study the impact of the embedding dimension on the quality of the communities. In Fig. 4, we report both NMI and modularity for the Cora dataset (other datasets show similar trends). For NMI, increasing the dimension is beneficial until it reaches 256-512; after that point, the quality plateaus and gently decreases. We settle on 256 dimensions, as this choice exhibits consistent behaviour across datasets and tasks. For overlapping community detection, 128 and 256 dimensions display comparable results; we opt for 128 for the sake of efficiency.
Modularity, on the other hand, is maximal at 16 dimensions. This discrepancy between NMI and modularity reinforces once more the observation that pure modularity optimization, as in models such as DMoN, does not necessarily lead to superior quality. Finally, the results for other datasets follow a similar trend, confirming the robustness of UCoDe.
Figure 5 reports the effect of the overlapping threshold p_1 on the Fb-686 dataset, as an example of a dataset with overlapping communities; we observe similar results on other datasets. The results indicate a relatively broad range of values within [0, 40] in which our method performs well. A threshold > 22 misses community assignments, while a low value assigns every node to all communities. The choice of 21 corresponds to the earlier discussed setting p_1 (Fig. 5, red line).

Ablation study
Table 2 in Sect. 4.1.2 shows the results of the ablation study on three datasets for non-overlapping and four datasets for overlapping community detection. We experiment with variants of our loss function in Eq. 4 with only intra-cluster similarity (modularity), with only inter-cluster similarity (row-permuted modularity), and with UCoDe's full loss. In non-overlapping community detection, the intra-cluster similarity alone produces noisy communities. The results improve significantly with the combination of the two modularity scores, as the objective drives the model to discriminate true communities from noise. In overlapping community detection, the effect of the row-permuted modularity is more tangible and vindicates the choice of our contrastive loss, showing a noticeable increase in performance when both similarities are introduced. Furthermore, the introduction of overlapping community probabilities in UCoDe effectively encourages the model to discover nodes belonging to multiple communities. Overall, the results show that combining the intra-cluster and the inter-cluster similarity brings the largest benefit.

Conclusion
We propose UCoDe, a new graph neural network method for community detection in attributed graphs. UCoDe performs both overlapping and non-overlapping community detection by virtue of a novel contrastive loss that maximizes a soft version of network modularity. Our experimental assessment confirms that our method is expressive and overall superior in both overlapping and non-overlapping community detection, exhibiting competitive performance in comparison with state-of-the-art methods designed for either one of the tasks.

A Additional material
Here we introduce additional material for reproducing the experiments and to further support the analyses in the paper.
Non-overlapping

• DGI_k: Deep Graph Infomax (DGI) (Veličković et al., 2018) is an unsupervised GNN model. After obtaining DGI node representations, k-means clusters these representations; we use the implementation from the authors.
• DMoN (Tsitsulin et al., 2020) is a state-of-the-art community detection model that trains a shallow GCN to exclusively optimize graph modularity; we use the implementation from the authors.
• MinCut (Bianchi et al., 2020) is a graph pooling technique that trains a GNN with a min-cut loss similar to spectral clustering (Shi & Malik, 2000); we use the pytorch implementation from DMoN (Tsitsulin et al., 2020).
• DCRN (Liu et al., 2022) is the most recent GNN for non-overlapping community detection. DCRN employs a combined objective and requires a pre-trained DFCN (Tu et al., 2021) network to initialize the model embeddings; the communities are k-means clusters of the output embeddings. We downloaded the implementation from the authors, including the pre-trained networks. We tried to reproduce the experiments to the best of our ability, using the same versions of the libraries, the same hyperparameters, and the same code, but the results were inconsistent with the ones reported in (Liu et al., 2022). After a thorough investigation, we realized that the reported values must be the maximum NMI across epochs. However, in an unsupervised task the ground-truth communities are unknown, hence the maximum NMI is unknown as well. Even considering the maximum value, the method does not attain the results declared in the paper. We therefore report results obtained using the standard evaluation methodology, i.e., NMI after a fixed number of training epochs, consistent with the remainder of our experiments.
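The post-hoc clustering step shared by DGI_k and DCRN can be sketched with a plain Lloyd's k-means on toy embeddings; in practice a library implementation would be used, and the data and initialization below are illustrative.

```python
import numpy as np

def kmeans(X, k, init_idx, iters=10):
    """Plain Lloyd's k-means with explicit initial centers."""
    centers = X[init_idx].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute centers.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Two well-separated clumps standing in for learned node embeddings.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = kmeans(X, k=2, init_idx=[0, 3])
```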

Overlapping
We compare UCoDe against the following baselines and state-of-the-art methods for overlapping community detection, including DMoN:

• NOCD (Shchur & Günnemann, 2019) is a GCN based on the BigClam objective; we use the implementation from the authors.
• COPRA (Gregory, 2010) discovers an arbitrary number of overlapping communities via label propagation; we use the implementation from the authors.
• CDE (Li et al., 2018) and SNMF (Wang et al., 2010) employ non-negative matrix factorization to detect communities; results are from (Shchur & Günnemann, 2019).
• BigClam (Yang & Leskovec, 2013) finds overlapping communities by optimizing the parameters of a Bernoulli-Poisson model; results are from (Shchur & Günnemann, 2019).

Complete results for non-overlapping communities
For completeness, Table 8 reports the NMI values and Table 9 the F1 values for each method reported in Fig. 2.

Statistical significance test
We perform an individual two-sided t-test using NMI to compare each model with UCoDe. The arrows in Table 10 indicate a statistically significant difference (p-value < 0.05) compared to UCoDe. The results demonstrate that in 85% of the cases UCoDe is significantly better than the competitors.
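The per-comparison tests behind Table 10 follow the standard two-sample t statistic; a minimal dependency-free version, with hypothetical NMI runs, is sketched below.

```python
import math

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

ucode = [0.70, 0.71, 0.69, 0.72, 0.70]   # hypothetical NMI runs
rival = [0.50, 0.51, 0.49, 0.52, 0.50]
t = t_statistic(ucode, rival)

# df = 8; the two-sided 5% critical value is 2.306.
significant = abs(t) > 2.306
```

With five runs per model, the statistic is compared against the two-sided 5% critical value for 8 degrees of freedom (2.306); in practice a statistics library would also report the exact p-value.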

Extended sensitivity analysis
Figure 6 extends the analysis in Sect. 5.4 to the Citeseer and Amz-Pho datasets. The results consistently indicate 256 as the optimal embedding dimension for the intermediate layer.
We additionally present the analysis of the loss function for Citeseer and Amz-Pho in Fig. 7. The loss exhibits a steadily decreasing behaviour, stabilizing around 100 epochs, as experienced in Sect. 5.4. Interestingly, DMoN's performance fluctuates more on Amz-Pho than on the other datasets.

Fig. 6
Impact of the embedding dimension on NMI and modularity for non-overlapping community detection; Amz-Pho and Citeseer datasets

Fig. 7
Training UCoDe, which quickly minimizes the loss (left); NMI increases steadily and achieves a 9% (19% on Amz-Pho) higher value than DMoN (center); UCoDe gradually outperforms DMoN's modularity (right)

Fig. 2
NMI, F1, and confidence intervals for non-overlapping community detection

Figure 3 (right) compares the modularity score for DMoN and UCoDe per training epoch. We note that while the contrastive loss stabilizes after 200 epochs, the modularity continues to increase until it outperforms DMoN's, confirming our analysis in Sect. 4.1.

Fig. 4
Fig. 4
Impact of the embedding dimension for the non-overlapping dataset Cora (similar for other datasets). The maximum modularity (right) does not correspond to the best NMI (left). The optimal embedding dimension for the intermediate layer is 256

Table 1
Related work in terms of present (✔) and absent (✘) properties

Table 2
NMI scores optimizing only intra-community similarity (target function y_{i; i<k}), only inter-community similarity (target function y_{i; i>k}), and the UCoDe objective in Eq. 4. The values in bold indicate the model's superior performance, achieved by optimizing both inter-community and intra-community similarity. We did not perform any t-test because of the low variance

Table 3
Datasets and their main characteristics

Table 4
Graph conductance C (low is better) and modularity Q. Best performer in bold; second-best performer underlined. Louvain is included for reference only: a direct comparison is not possible, as Louvain does not allow setting the number of communities

Table 5
NMI for overlapping community detection; CDE, SNMF, and BigClam results are from (Shchur & Günnemann, 2019). We highlight the model with the best performance. We did not perform any t-test because of the low variance

Table 6
Non-overlapping community detection: NMI and confidence intervals

Table 8
NMI for non-overlapping community detection. The best performer is highlighted in bold, and the second best is underlined. Columns: Dataset, k-means, Louvain, DCRN, DGI_k, MinCut, NOCD, DMoN, UCoDe_k, UCoDe

Table 9
F1 scores for non-overlapping community detection on the seven real-world datasets, as summarized in Fig. 2. The best performer is highlighted in bold, and the second best is underlined. Columns: Dataset, k-means, Louvain, DCRN, DGI_k, MinCut, NOCD, DMoN, UCoDe_k, UCoDe

Table 10
Results of the t-tests using Normalized Mutual Information (NMI) at p-value < 0.05; the arrows indicate statistical significance; ↑ indicates that UCoDe is significantly better than the competitor