Introduction

The complex networks play a pivotal role in various fields such as physics, biology, economics, social sciences, and urban planning. Thus understanding the underlying community structure of the networks saw a wide range of applications, including social science (Plantié and Crampes 2013), biology(Guimerà and Nunes Amaral 2005), and economics (Piccardi and Tajoli 2012). In particular, partitioning the networks of human mobility and interactions is broadly applied to regional delineation (Ratti et al. 2010; Blondel et al. 2010; Sobolevsky et al. 2013; Amini et al. 2014; Hawelka et al. 2014; Kang et al. 2013; Sobolevsky et al. 2014; Belyi et al. 2017; Grauwin et al. 2017; Xu et al. 2021) as well as urban zoning (Sobolevsky et al. 2018; Landsman et al. 2020, 2021).

Over the last two decades, a large number of approaches and algorithms for community detection in complex networks have been suggested. Some of them are just straightforward heuristics such as hierarchical clustering (Hastie 2001) or the Girvan-Newman (Girvan and Newman 2002) algorithm, while the vast majority rely on optimization techniques based on the maximization of various objective functions. The first and the most well-known partition quality function is modularity (Newman and Girvan 2004; Newman 2006) assessing the relative strength of edges and quantifying the cumulative strength of the intra-community links. Many modularity optimization strategies have been suggested over the last two decades (Newman and Girvan 2004; Newman 2006, 2004; Clauset et al. 2004; Agarwal and Kempe 2008; Sun et al. 2009; Blondel et al. 2008; Guimera et al. 2004; Good et al. 2010; Duch and Arenas 2005; Lee et al. 2012; Aloise et al. 2012; Barber and Clark 2009; Liu and Murata 2010; Sobolevsky et al. 2014; Džamić et al. 2019; Traag et al. 2019; Biedermann et al. 2018). Comprehensive historical overviews are presented in Fortunato (2010); Fortunato and Hric (2016) as well as some later surveys (Khan and Niazi 2017; Javed et al. 2018).

And while the problem of finding the exact modularity maximum is known to be NP-hard (Brandes et al. 2006), most of the available modularity optimization approaches rely on specific discrete optimization heuristics (although, in some cases, an algorithmic optimality proof of the partition is possible Agarwal and Kempe 2008; Aloise et al. 2010; Sobolevsky et al. 2017; Belyi et al. 2019, 2021; Belyi and Sobolevsky 2022).

As we show below, modularity optimization can be formulated as a continuous matrix optimization problem. However, the direct application of generic gradient descent methods is inefficient due to a large number of local maxima, which gradient descent might not be able to overcome.

Recently, graph neural networks (GNNs) became increasingly popular for supervised classifications and unsupervised embedding of the graph nodes with diverse applications in text classification, recommendation systems, traffic prediction, computer vision and many others (Wu et al. 2020). GNNs were already successfully applied for community detection, including supervised learning of the ground-truth community structure (Chen et al. 2017) as well as unsupervised learning of the node features enabling representation modeling of the network, including stochastic block-model (Bruna and Li 2017) and other probabilistic models with overlapping communities (Shchur and Günnemann 2019) or more complex self-expressive representation (Bandyopadhyay and Peter 2020). However, existing GNN applications overlook unsupervised modularity optimization, which so far has been a major approach in classic community detection.

This paper aims to fill this gap by proposing a straightforward GNN-inspired algorithmic framework for unsupervised community detection through modularity optimization. We perform a comprehensive comparative evaluation of the performance of the proposed method against the state-of-the-art ADVNDS and Combo algorithms (capable of reaching the best known partition in most cases), a viral Louvain algorithm (which, despite its sub-optimal performance, is very fast and capable of handling the large-scale networks), and its successor, Leiden algorithm, that improves partition quality while preserving short execution time. We demonstrate that the method provides a reasonable balance between performance and speed for classic, synthetic and real-world networks, including temporal networks, and is sometimes capable of finding partitions with a higher modularity score that other algorithms cannot achieve.

More importantly, we believe the proposed approach serves as a proof of concept of leveraging GNN approaches for solving a broader range of network optimization problems. Such problems often arise when the aim is to reconstruct nodes’ attributes based on their features and the network structure, including various types of unsupervised graph clustering (Kampffmeyer et al. 2019; Bianchi 2022). Following the work applying machine learning to combinatorial optimization problems by  Bengio et al. (2021), several attempts to apply GNNs to hard combinatorial optimization problems were recently made. Some first promising results were obtained for the problems of minimum vertex cover, maximal clique, maximal independent set, and the satisfiability problem (Li et al. 2018). Furthermore, various GNN architectures were adapted to address the graph correlation clustering problem formulated as the minimum cost multicuts problem (Jung and Keuper 2022). Initial steps were even taken towards graph clustering via maximizing some variant of modularity function (Lobov and Ivanov 2019; Tsitsulin et al. 2020). However, their results still fall behind current state-of-the-art approaches in terms of modularity score. In this work, we show that with GNN-inspired techniques it is possible to achieve more practical results close to state-of-the-art.

In the following sections, first, we recall the formulation of community detection through modularity maximization and show how it could be framed as continuous quadratic optimization. Then we propose our GNN-inspired method and describe how to select its parameters. Lastly, we evaluate our approach using three benchmarks: classical real-world networks used previously in the literature to test modularity maximization algorithms, synthetic-networks benchmark, and a temporal network of taxi trips. The paper ends with conclusions and discussions.

The modularity optimization problem

The network modularity was among the first quality/objective functions proposed to assess and optimize the community structure (Newman 2006). However, it is now known to have certain shortcomings, including a resolution limit (Fortunato and Barthélémy 2007; Good et al. 2010) and the fact that it does not compare with a proper baseline and finds communities in random networks. Therefore alternative objective functions should be mentioned, e.g., Infomap description code length (Rosvall and Bergstrom 2007, 2008), Stochastic Block Model likelihood (Karrer and Newman 2011; Ball et al. 2011; Bickel and Chen 2009; Decelle et al. 2011, 2011; Yan et al. 2014), and Surprise (Aldecoa and Marìn 2011). Nevertheless, despite its limitations, modularity remains perhaps the most commonly used objective function so far.

In 2014, the authors proposed a novel optimization technique for community detection, “Combo” (Sobolevsky et al. 2014), capable of maximizing various objective functions, including modularity, description code length, and pretty much any other metric based on the link scoring and assessing the cumulative score of the intra-community links. At the time of publication, for modularity optimization, Combo outperformed other state-of-the-art algorithms, including a popular Louvain method (Blondel et al. 2008) in terms of the quality (modularity score) of the resulting partitioning, which could be achieved within a reasonable time for most real-world and synthetic networks of up to tens of thousands of nodes. The size limitation for the algorithm evaluation is due to the current implementation handling a full modularity matrix in the memory entirely. However, this is not a fundamental limitation and could be overcome by using sparse matrix operations. Recently, more precise algorithms were proposed, but they are slower and often designed to run on a distributed cluster (Džamić et al. 2019; Biedermann et al. 2018; Hamann et al. 2018; Lu et al. 2015). Moreover, their code is not available.

The proposed algorithms, including Combo, are often quite efficient and, in some cases, are able to reach the theoretical maximum of the modularity score as revealed by a suitable upper bound estimate (Sobolevsky et al. 2017). However, in general, finding the theoretically optimal solution may not be feasible, and one has to rely on heuristic algorithmic solutions without being certain of their optimality. Instead, an empiric assessment of their performance in comparison with other available algorithms could be performed.

The modularity function

In short, the modularity (Newman and Girvan 2004; Newman 2006) function of the proposed network partition quantifies how relatively strong all the edges between the nodes attached to the same community are. Specifically, if the network edge weights between each pair of nodes ij are denoted as \(e_{i,j}\), then the modularity of the partition com(i) (expressed as a mapping assigning community number com to each node i) can be defined as

$$\begin{aligned} M=\sum _{i,j, com(i)=com(j)}q_{i,j}, \end{aligned}$$
(1)

where the quantity \(q_{i,j}\) for each edge ij (call q a modularity score for an edge) is defined as its normalized relative edge weight in comparison with the random network model with the same node strengths. Namely,

$$\begin{aligned} q_{i,j}=\frac{e_{i,j}}{T}-\frac{w^{out}(i)w^{in}(j)}{T^2}, \end{aligned}$$

where \(w^{out}(i)=\sum _k e_{i,k}\), \(w^{in}(j)=\sum _k e_{k,j}\), \(T=\sum _i w^{out}(i)=\sum _j w^{in}(j)=\sum _{i,j}e_{i,j}\).

Rewrite the modularity optimization problem in a vector form: let \(Q=\left( q_{i,j}\right)\) be the matrix of all the modularity scores for all the edges (call it a modularity matrix). Let C be an \(n\times k\) matrix, where n is the number of network nodes and k is the number of communities we are looking to build. Each element \(c_{i,p}\) of the matrix can be zero or one depending on whether the node i belongs to the community p or not, i.e., whether \(com(i)=p\). If the communities are not overlapping, then each row of the matrix has one single unit element, and the rest of its elements are zeros.

More generally, if we admit uncertainty in community attachment, then the elements \(c_{i,p}\) of the matrix C could represent the probabilities of the node i to be attached to the community p. This way, \(c_{i,p}\in [0,1]\) and the sum of each row of the matrix C equals 1.

Then the modularity score M in the case of a discrete community attachment could be represented as a trace of matrix product

$$\begin{aligned} M=tr(C^T Q C), \end{aligned}$$
(2)

where tr denotes the trace of the matrix – a sum of all of its diagonal elements.

This way, finding the community structure of up to k communities optimizing the network modularity could be expressed as a constrained quadratic optimization problem of finding the \(n\times k\) matrix C maximizing the trace of matrix product \(M=tr(C^TQC)\), such that all \(c_{i,p}\in \{0,1\}\) and the sum of each row of the matrix C equals 1 (having a single unit element).

Replacing the binary attachment constraint \(c_{i,p}\in \{0,1\}\) with a continuous attachment \(c_{i,p}\in [0,1]\) relaxes the optimization problem to finding probabilistic community attachments. It could be easily shown that the optimal solution of the binary attachment problem could be derived from the optimal solution of the probabilistic attachment problem after assigning \(q_{i,i}=0\) for all the diagonal elements of the matrix Q. As diagonal elements \(q_{i,i}\) are always included in the sum M since \(com(i)=com(j)\) for \(i=j\), the values of the diagonal elements serve as constant adjustment of the objective function M and do not affect the choice of the optimal partition, so we are free to null them without loss of generality. At the same time, for each given i, once \(q_{i,i}=0\), if we fix the community attachments of all the other nodes \(j\ne i\), the objective function M becomes a linear function of the variables \(c_{i,p}\) subject to constraints \(\sum _p c_{i,p}=1\) and \(c_{i,p}\in [0,1]\). Obviously, the maximum of the linear function with linear constraints is reached at one of the vertices of the domain of the allowed values for \(c_{i,p}\), which will involve a single \(c_{i,p}\) being one and the rest being zeros. This way, we have proven the following:

Proposition

The optimal probabilistic attachment \(c_{i,p}\in [0,1]\) maximizing (2) in the case of \(q_{i,i}=0\) represents a binary attachment \(c_{i,p}\in \{0,1\}\) maximizing (2) for an arbitrary original Q.

So the discrete community detection problem through modularity optimization could be solved within the continuous constrained quadratic optimization framework. This allows the application of methods and techniques developed for continuous optimization, such as gradient descent, for example. However, despite its analytic simplicity, the quadratic programming problem with indefinite matrix Q is still NP-hard. In particular, the dimensionality of the problem leads to multiple local maxima challenging direct application of the standard continuous optimization techniques, like gradient descent. Indeed, any discrete partition, such that no single node could be moved to a different community with a modularity gain, will become such a local maximum. Unfortunately, finding such a local maximum rarely provides a plausible partition – such solutions could have been obtained with a simple greedy discrete heuristic iteratively adjusting the single node attachments, while we know that the modularity optimization, being NP-hard, generally requires more sophisticated non-greedy heuristics, like Sobolevsky et al. (2014). To address this challenge, we introduce a new heuristic method that efficiently finds high-quality solutions to the described quadratic optimization problem, although without guaranteed achievement of the global optimum.

The GNNS method

In this section, we present a GNN-style method (GNNS) for unsupervised network partition through modularity optimization inspired by recurrent graph neural network models (in the definition of Wu et al. (2020)) as well as an older Weisfeiler-Lehmann graph node labeling algorithm Weisfeiler and Leman (1968). Ideas similar to the Weisfeiler-Lehmann algorithm have already found their application to community detection in a well-known label propagation algorithm Raghavan et al. (2007). Their development and application to modularity maximization were proposed and studied in detail in consequent works (Barber and Clark 2009; Liu and Murata 2010). However, they still were discrete optimization heuristics in their spirit and could not take into account recent advances in graph neural networks. Unlike these methods, we propose a continuous optimization technique that considers current nodes’ attachments combined with attachments of their neighbors. Namely, we propose a simple iterative process starting with a random initial matrix \(C=C_0\) and at each step \(t=1,2,3,...,N\) performing an iterative update of the rows \(c_i\) of the matrix C representing the node i community attachments as follows:

$$\begin{aligned} \tilde{c}_{i}^t = F\left( c_i^{t-1}, Q_i C^{t-1}\right) , \end{aligned}$$
(3)

where \(Q_i=\left( q_{i,j}:j=1,2,...,n\right)\) is the i-th row of the modularity matrix Q representing the outgoing edges from the node i. This way, the term \(Q_i C^{t-1}\) collects information about the neighbor nodes’ community attachments (this could be viewed as a development of ideas discussed in Barber and Clark (2009)), and the equation (3) updates the node community attachments with respect to their previous attachments as well as the neighbor node attachments. In order to ensure the conditions \(\sum _p c_{i,p}=1\), a further normalization \(c_{i,p}^t=\tilde{c}_{i,p}^t/\sum _{p^*}\tilde{c}_{i,p^*}^t\) needs to be applied at each iteration.

A simple form for an activation function F could be a superposition of a linear function subject to appropriate scale normalization and a rectified linear unit \(ReLU(x)=\left\{ \begin{array}{c}0,x\le 0\\ x,x>0\end{array}\right.\), leading to

$$\begin{aligned} \tilde{c}_{i,p}^t=ReLU\left( f_1 c_{i,p}^{t-1} + f_2 Q_i C_p^{t-1}/\tau _i^t + f_0\right) , \ c_{i,p}^t = \tilde{c}_{i,p}^t/\sum _{p^*}\tilde{c}_{i,p^*}^t, \end{aligned}$$
(4)

where \(f_0, f_1, f_2\) are the model parameters, \(C_p=\left( c_{j,p}:j=1,2,...,n\right)\) is the p-th column of the matrix C representing all the node attachments to the community p, and the \(\tau _i^k = \left| \max _{p^*}{Q_i C_{p^*}^{t-1}}\right|\) are the normalization coefficients ensuring the same scale for terms of the formulae.

Intuitive considerations allow defining possible ranges for the model coefficients \(f_0,f_1,f_2\). Defining the coefficient \(f_1\) within the range \(f_1\in [0,1]\) would ensure decay scaling of the community attachment at each iteration unless confirmed by the strength of the node’s attachment to the rest of the community expressed by \(Q_i C_p^{t-1}\). A free term \(f_0\in [-1,0]\) provides some additional constant decay of the community attachment at each iteration, while the term \(Q_i C_p^{t-1}/\tau _i^t\) strengthens the attachment of node i to those communities having positive modularity scores of the edges between node i and the rest of the community. Normalization term \(\tau _i^t\) ensures that the strongest community attachment gets a maximum improvement of a fixed scale \(f_2\).

A consistent node attachment that cannot be improved by assigning the given node i to a different community p (i.e., having \(Q_i C_p^{t-1}=\tau _i^{t-1}=\max _{p^*} Q_i C_{p^*}^{t-1}\)) should see the fastest increase in the attachment score \(\tilde{c}_{i,p}^t\), eventually converging to the case of \(c_{i,p}=1\). Any weaker attachment should see a decreasing community membership score \(c_{i,p}^t\), eventually dropping to zero. This could be ensured by the balancing equation \(f_1+f_2+f_0 = 1\), allowing to define appropriate \(f_2\) given \(f_0\) and \(f_1\).

Training the GNNS

The sequence of the GNNS iterations (4) depends on the choice of the model parameters \(f_0,f_1\), as well as the initial community attachments. The final convergence also often depends on those choices. Given that, a good strategy is to simulate multiple iteration sequences with different initial attachments and choose the best final result. Also, it turns out that the method demonstrates reasonable performance for a broad range of parameter values \(f_0\in [-1,0]\), \(f_1\in [0,1]\), so rather than trying to fit the best choice for all the networks or a given network, one may simply include the random choice for \(f_0, f_1\) along with a random choice of the initial community attachments.

So the proposed GNNS algorithm starts with a certain number of S random partitions and parameter choices. Then the GNNS performs 10 iterations of the partition updates according to (4). Among those, a batch of the best \(\lfloor S/3 \rfloor\) partitions (with the highest achieved modularity scores) is selected and further supplemented with another \(S-\lfloor S/3 \rfloor\) configurations derived from the selected batch by randomly shuffling partitions and assigning new random parameters. Another 10 iterations are performed. Another batch of \(\lfloor S/9 \rfloor\) best partitions is selected and shuffled, creating a total of \(\lfloor S/3 \rfloor\) samples. Then another 30 iterations are performed with those, and for small S (\(S \le 1000\)), the best resulting partition is selected as the final outcome of the algorithm. For larger S (\(S > 1000\)), another batch of \(\lfloor S/30 \rfloor\) best partitions is selected and shuffled, creating a total of \(\lfloor S/10 \rfloor\) samples. Then the final 100 iterations are performed with those, and the best resulting partition is selected as the final outcome of the algorithm. Finally, the partition is discretized by assigning each node to the cluster with maximum probability. The following algorithm describes the whole process.

figure a

If matrix Q is stored as a sparse graph matrix and two separate vectors for in- and out-degrees, this algorithm has a time complexity of O(Simk), where S is the number of attempts with different initial attachments and parameter values, i is the number of iterations, m is the number of edges, and k is the maximum number of communities. Value k is usually selected as an (educated) guess of the expected maximum number of communities or as the maximum feasible value. One nice property the GNNS inherits from neural networks is that it is easily parallelizable by modern frameworks.

Below we evaluate the three versions of the GNNS: a fast version with \(S=100\) denoted GNNS100, a slower but more precise version with \(S=2500\) denoted GNNS2500, and a slow but very precise version with \(S=25000\) denoted GNNS25000.

Comparative evaluation

We implemented the proposed GNNS modularity optimization algorithm in Python and ran our experiments in Google ColabFootnote 1 (results of Combo runs on the three largest networks were obtained on a laptop because of the time limits in Colab). The source code is available on GitHubFootnote 2. We evaluate our algorithm against other state-of-the-art techniques mentioned above: a fast and popular Louvain method Blondel et al. (2008), its successor, the Leiden algorithm Traag et al. (2019), Combo algorithm Sobolevsky et al. (2014), often capable of reaching the best known modularity score over a wide range of networks, and the ADVNDS method claimed to be state-of-the-art method in 2017 Džamić et al. (2019). Both papers introducing Combo and ADVNDS make a thorough comparison with other methods available at their time and conclude that those methods outperform their competitors. We also tried all other modularity maximization methods implemented in the CDLib library developed specifically for evaluating community detection methods Rossetti et al. (2019). However, their performance was much worse both in terms of running time and achieved modularity scores. Some more recent methods claim to be new state-of-the-art, but they are designed to be run on distributed systems, and their results cannot be fairly compared Biedermann et al. (2018). Thus, first, we shall compare our method with four selected approaches over a sample of classic network examples. And then, all those methods but ADVNDS (whose implementation is not available to us) shall also be evaluated over the series of the two types of random graphs often used for benchmarking the community detection algorithms:  Lancichinetti et al. (2008) and Block-model graphs (Karrer and Newman 2011).

As the GNNS chooses the best partition among multiple runs, we consider its three configurations involving a) 100, b) 2500, and c) 25000 initial random samples of partitions plus model configurations. We shall refer to those as GNNS100, GNNS2500, and GNNS25000. All other algorithms in our comparison also involve random steps, and the best partition they converge to is not perfectly stable. Thus their performance could also benefit from choosing the best partition among multiple runs, especially for the Louvain method. It often takes up to 10-20 attempts to find the best partition they are capable of producing. E.g. applying Combo and Louvain to the classic case of the Email network leads to the following performance reported in Table 1 below. As we see, both reach their best performance after 20 iterations (although results of the Combo for this network are significantly better compared to Louvain) but do not further improve over the subsequent 30 iterations. Based on that, in the further experiments, we shall report the best performance of the Combo, Louvain, and Leiden algorithms achieved after 20 attempts for each. Since the implementation of the ADVNDS algorithm is not available to us, we provide results reported in its original paper after ten runs with running time calculated as the reported average multiplied by ten Džamić et al. (2019).

Table 1 The best modularity scores reached by the Combo and Louvain method after a different number of attempts for the Email network

Classic examples

Most of the classic instances were taken from the clustering chapter of the 10th DIMACS Implementation ChallengeFootnote 3 and were often reused in community detection literature Sanders et al. (2014). Table 2 reports the sources and details of those networks. Since the complexity of our method is proportional to the maximum number of communities, we limited this dataset to the networks with less than three hundred communities, according to the ADVNDS paper Džamić et al. (2019). All originally directed networks were symmetrized, and self-loops were removed. The largest network, Krong500slogn16, is synthetic, while all other networks represent real-world data.

Table 2 List, with sources, of the networks we used in our benchmark

Tables 3 and 4 report the performance of the proposed approach compared with ADVNDS Džamić et al. (2019), Leiden Traag et al. (2019), Louvain Blondel et al. (2008), and Combo methods Sobolevsky et al. (2014). Missing values in the ADVNDS column mean that the corresponding networks were not present in the original paper. For the Leiden and Louvain methods, we used their implementation in the Python igraph package, setting the number of iterations to \(-1\) allowing the Leiden algorithm to run until the best modularity is achieved. The missing value in the Combo column corresponds to the network too large for the current implementation.

Table 3 Execution time (in seconds) of the ADVNDS, Leiden, Louvain, Combo, and GNNS algorithms over the classic network examples
Table 4 Best modularity scores of the ADVNDS, Leiden, Louvain, Combo, and GNNS algorithms over the classic network examples

According to the results reported in Tables 3 and 4, Louvain is the fastest algorithm, closely followed by the GNNS100 method, especially for larger networks, with GNNS100 often providing higher modularity scores, especially for smaller networks, while ADVNDS, Combo, and GNNS25000 are by far the slowest, demonstrating however superior performance. GNNS2500 finds pretty good partitions for some networks, and it works much faster than ADVNDS and Combo. In general, its performance is comparable to Leiden. Both algorithms work quickly and find partitions with modularity just a bit below the best-known. Moreover, Combo could not handle the largest network, while GNNS2500 found a partition better than Leiden and did so almost six times faster. For that network, GNNS25000 finds the highest modularity score, outperforming all other methods, including ADVNDS.

While ADVNDS reported the highest known modularity scores for all the networks where it was applied except the largest one, it is much slower than GNNS, and its implementation is not available, so we could not test it on other networks. Besides, the current GNNS implementation uses pure Python, while implementing it in C++ (as done for all other algorithms) could provide further speed improvements.

Overall, while no single heuristic is the best solution for all the cases, a GNNS algorithm often finds a plausible solution, sometimes the best-known one, and provides a flexible parameter-controlled trade-off between speed and performance ranging from the fastest to the close-to-optimal performance, which makes it a valuable addition to an existing collection of algorithms. More importantly, since this simple GNN-style heuristic can perform comparably to the state-of-the-art, that serves as a proof-of-concept for considering more sophisticated GNN architectures, configurations, and learning techniques that could provide further improvement in solving community detection problem as well as other complex network optimization problems like minimum vertex cover, maximal independent set Li et al. (2018), clique partitioning, correlation clustering Jung and Keuper (2022), spectral clustering Bianchi et al. (2020), and others Bengio et al. (2021), Yow and Luo (2022).

Synthetic networks

In the next test, the methods were applied to the sets of synthetic networks – Lancichinetti–Fortunato–Radicchi (LFR) Lancichinetti et al. (2008) and Stochastic Block-Model (SBM) Holland et al. (1983). We built two sets of ten LFR networks of size 250 each, with overall average node degrees of \(\overline{d}=29.6\) and \(\overline{d}=13.3\), and three sets of ten SBM networks of size 300 for each value of the parameter \(\nu\), defining the ratio of the probability for the model graph to have inner-community edges (three communities of size 100 each) divided by the probability of the inter-community edges. For all SBM, networks the average probability of an edge was 0.1, leading to the average node degree \(\overline{d}=30\). The Python package networkx was used to generate these synthetic networks.

Table 5 shows the average values of modularity, normalized mutual information (NMI), and execution time obtained after partitioning ten instances of each type using the Leiden, Louvain, Combo (best of the 20 runs), and GNNS100 algorithms. As we can see, Combo demonstrates superior performance in terms of modularity in all sets of networks except for the last SBM set with the highest \(\nu =3\), where all algorithms performed achieve the same score. While GNNS100 demonstrates suboptimal performance for those networks compared to Combo, it works at unparalleled speed, several times faster than Louvain and Leiden and two orders of magnitude faster than Combo. Based on that, GNNS100 proves itself to be the fastest solution for partitioning the synthetic networks demonstrating reasonable performance in terms of the modularity and NMI scores achieved.

Table 5 Comparative evaluation of the Leiden, Louvain, Combo, and GNNS algorithms over the synthetic networks

Temporal networks

Earlier works established the applicability of the GNN architecture for capturing dynamic properties of the evolving graphs Ma et al. (2020). As GNNS is well-suited for the iterative partition improvement, it could be suggested for active learning of the temporal network partition. For example, initial warm-up training could be performed over the first temporal layers with subsequent tuning iterations while moving from a current temporal layer to the next one.

Below we apply the approach to the temporal network of the daily taxi mobility between the taxi zones in New York City (NYC). We use the 2016-2017 data provided by the NYC Taxi and Limousine CommissionFootnote 4 to build the origin-destination network of yellow and green taxi ridership between the NYC taxi zones (edges of the network are weighted by the number of trips). The results of the temporal GNNS (GNNStemp) for each daily ridership network are compared against single runs (for the sake of speed) of the Louvain and Combo algorithms as the fastest and the most precise from the available algorithms. GNNStemp uses an initial warm-up over the year 2016 aggregated network and then performs a single run of 20 fine-tune iterations for each daily temporal layer in 2017, starting with the previously achieved partition. The achieved best modularity scores fluctuate slightly between the daily layers. The 2017 yearly average of the ratios of the daily scores achieved by each algorithm to the best score of all three algorithms for that day look as follows: \(97.97\%\) for Louvain, \(99.99\%\) for Combo, and \(99.78\%\) for GNNStemp. At the same time, the total elapsed time is as follows: 1.71 sec for Louvain, 16.04 sec for Combo, and 8.40 sec for GNNStemp. Furthermore, GNNStemp managed to find the best modularity score not reached by two other algorithms on \(11.2\%\) of the temporal layers.

So the performance of GNNStemp in terms of the achieved modularity score falls right in the middle between Louvain and Combo, while GNNStemp is nearly twice as fast as the single run of Combo.

Details of the day-by-day GNNStemp performance compared to the best of the three algorithms are shown in fig. 1. On some days, one may notice somewhat visible differences between the top algorithm performance and that of the GNNStemp, but GNNStemp captures the temporal pattern of the modularity score dynamics pretty well, with a correlation of \(99.44\%\) between the GNNStemp score and the top algorithm score timelines.

Fig. 1
figure 1

Comparative performance of GNNS vs. the best of GNNS, Louvain, and Combo for community detection on the temporal network of daily taxi ridership in NYC. Subplots depict achieved network modularity and its time-series properties: autocorrelation, seasonality, and periodicity obtained by seasonal decomposition using moving averages

By performing time-series periodicity and trend-seasonality analysis, one could notice some interesting temporal patterns in the strength of the network community structure, as also presented in fig. 1. The community structure quantified by means of the best achieved modularity score demonstrates a strong weekly periodicity with a stronger community structure over the weekends, including Fridays. The strength of the community structure also shows noticeable seasonality, with stronger communities over the winter and weaker over the summer. One may relate this observation with the people exploring more destinations during their weekend and holiday time. As seen in fig. 1, the GNNStemp accurately reproduces the patterns discovered for the best partition timeline.

In conclusion, while Combo runs could demonstrate higher partition accuracy at certain temporal layers, the GNNStemp is capable of reaching similar performance much faster while adequately reproducing the qualitative temporal patterns. So GNNStemp could be trusted as a fast and efficient solution for extracting insights into the dynamics of the temporal network structure.

Conclusions

We proposed a novel recurrent GNN-inspired framework for unsupervised learning of the network community structure through modularity optimization. A simple iterative algorithm depends on only two variable parameters, and we propose an integrated technique for tuning those so that parameter selection and modularity optimization are performed within the same iterative learning process.

The algorithm’s performance has been evaluated on classic network examples, synthetic network models, and a real-world temporal network case. Despite its simplicity, the new algorithm reaches similar and, in some cases, higher modularity scores compared to the more sophisticated discrete optimization state-of-the-art algorithms. One of the possible limitations of the proposed method is its dependence on the number of runs. However, this could be viewed as an advantage since it allows flexible adjustment of the algorithm’s complexity, tuning the model parameters to find the right balance between speed and performance. At the low-complexity settings, it can significantly outperform alternative methods in terms of running time while maintaining reasonable performance in terms of partition quality. Thus, the algorithm is efficiently applicable in both scenarios—when the execution time is of the essence as well as when the quality of the resulting partition is a paramount priority.

Furthermore, the algorithm enables a special configuration for the active learning of the community structure on temporal networks, reconstructing all the important longitudinal patterns.

But more importantly, we believe the algorithm serves as a successful proof of concept for applying more advanced GNN-type techniques for unsupervised network learning, and opens possibilities for solving a broader range of network optimization problems. Similar approaches could work for such related problems as clique partitioning and correlation clustering. Applying more sophisticated model architectures along with constantly developing new methods for neural network training can possibly further improve the quality of found solutions. Developing these methods will constitute our future work in this direction.