1 Introduction

Spectral clustering is one of the most popular clustering techniques and—despite its simplicity—often outperforms traditional clustering algorithms (von Luxburg 2007). The goal is to identify groups of vertices in a graph that are highly connected to other vertices within the cluster, but only loosely coupled to other clusters. This can be viewed as an unsupervised learning problem. The graph could, for instance, represent the similarity between a set of given items or data points. Each item or data point corresponds to a vertex, and two vertices are connected by an edge if they are similar. The degree of similarity can be represented by the associated edge weight. By applying clustering techniques to such a similarity graph, it is possible to identify groups of items or data points that share similar properties. If the similarity measure is symmetric (i.e., object A is similar to object B implies that object B is similar to object A and the weights are identical), then the resulting graph is undirected. Spectral clustering algorithms for undirected graphs are well understood and have been successfully applied to a host of different applications, see von Luxburg (2007) for a detailed introduction and overview. If the similarity measure, however, is asymmetric, this results in a directed graph. A simple example is the internet, where website A might point to website B, but not vice versa. Many different spectral clustering algorithms for directed graphs have been proposed over the last decades. A compelling idea is to turn the directed graph into an undirected graph and to then leverage state-of-the-art clustering techniques for undirected graphs. Many symmetrization approaches construct—often based on intuition or empirically determined hyperparameters—symmetric graph representations using, e.g., combinations of adjacency matrices, Laplacians, (reweighted) in- and out-degree matrices, and invariant distributions. 
Introducing all the proposed clustering algorithms in detail would go beyond the scope of this paper; we thus refer the reader to Zhou et al. (2005), Huang et al. (2006), Meila and Pentney (2007), Satuluri and Parthasarathy (2011), Malliaros and Vazirgiannis (2013) and Rüdrich (2019) and references therein. Instead of constructing a symmetric matrix representation, a clustering approach that is based on computing dominant eigenvalues and eigenvectors of a complex-valued but Hermitian adjacency matrix is described in Cucuringu et al. (2020).

Our approach relies on established dynamical systems theory and in particular the analysis of transfer operators—e.g., the Perron–Frobenius operator or the Koopman operator—that describe the evolution of a dynamical system (Koopman 1931; Lasota and Mackey 1994; Dellnitz and Junge 1999; Budišić et al. 2012; Klus et al. 2016). These methods have been successfully applied to high-dimensional molecular dynamics, fluid dynamics, and quantum mechanics problems (Rowley et al. 2009; Klus et al. 2018a, 2022), but also to stock-market, EEG, and traffic data (Hua et al. 2017; Marrouch et al. 2020; Avila and Mezić 2020). A recent review of these methods can be found in Klus et al. (2018b). Transfer operator approaches have also been used to analyze undirected graphs, e.g., for partitioning power networks (Raak et al. 2016), spectral network identification (Mauroy and Hendrickx 2017), and decentralized spectral clustering (Zhu et al. 2022). We will focus in particular on directed and time-evolving graphs and relationships between graph Laplacians and transfer operators.

It is well known that spectral clustering algorithms for undirected graphs can be interpreted in terms of random walks. The goal is to find a partition of the graph with the property that a random walker stays within a cluster for a long time and rarely jumps to other clusters (von Luxburg 2007; Djurdjevac et al. 2011; Sarich et al. 2014; Djurdjevac 2012). This is equivalent to the detection of metastable sets in stochastic dynamical systems. Metastability is well defined if the process is in equilibrium, which means that it is reversible with respect to its stationary distribution. The associated transfer operators are then self-adjoint with respect to appropriately defined inner products, and the eigenvalues are consequently real-valued and can be interpreted as inherent time scales. However, in many cases the process is out of equilibrium as described in Koltai et al. (2018). The system might, for instance, be time-homogeneous but non-reversible or it might be time-inhomogeneous due to time-dependent energy potentials or external forces. In the graph setting, this corresponds to directed and time-evolving networks (Naoki et al. 2017; Petit et al. 2018; Holme and Saramäki 2019). Non-equilibrium processes are in general not reversible, and the eigenvalues of transfer operators are complex-valued (Embree and Trefethen 2005; Djurdjevac Conrad et al. 2016). A natural extension of the definition of metastability to non-reversible and time-inhomogeneous systems is the notion of coherence. Coherent sets (Froyland 2013; Allshouse and Peacock 2015; Banisch and Koltai 2017; Froyland and Junge 2018; Klus et al. 2019a) are regions of the state space that are only slowly—compared to other sets—dispersed by the flow and play an essential role in the analysis of complex fluid flows. For the detection of such coherent sets, we need to analyze generalized transfer operators that are related to the forward–backward dynamics of the system (Banisch and Koltai 2017). 
We will directly define these operators on graphs and also use the random walk perspective to extend spectral clustering algorithms to directed and time-evolving networks. This results in a generalized Laplacian, whose eigenvalues and eigenvectors again encode information about (potentially time-dependent) clusters. The main contributions of this work are:

  • We show that spectral clustering of undirected graphs corresponds to computing and analyzing eigenfunctions of the Koopman operator associated with the random walk process defined on the graph. The detected patterns can be interpreted as metastable sets.

  • We apply transfer operator theory to random walks on directed and time-evolving graphs and compute eigenfunctions of an operator that is related to the forward–backward dynamics. By clustering the eigenfunctions, we obtain coherent sets.

  • We construct benchmark problems by discretizing dynamical systems such as a rotating double-well problem and the quadruple gyre and show that random walkers starting within the same cluster will on average remain in close proximity over long time scales.

  • Furthermore, we analyze a time-evolving network that describes the social interactions of high-school students and show that the detected clusters correspond to the different specializations.

Our approach provides a clear physical interpretation of clusters in directed and time-evolving graphs and a principled way to evaluate the quality of the clustering. The remainder of the paper is structured as follows: In Sect. 2, we will introduce transfer operators and directed and undirected graphs. In Sect. 3, we will show how transfer operator theory can be applied to graphs and how this relates to conventional spectral clustering techniques. Furthermore, we define the forward–backward Laplacian and analyze its properties. This allows us to extend spectral clustering methods to directed and time-evolving graphs as we will show in Sect. 4. All results will be illustrated with the aid of guiding examples and benchmark problems. Open questions and future work will be discussed in Sect. 5.

2 Transfer Operators and Graphs

Our goal is to define transfer operators on graphs and to apply data-driven methods for the approximation of these operators to time-series data generated by random walkers. In this section, the required concepts and notation will be introduced.

2.1 Transfer Operator Theory

Transfer operators such as the Perron–Frobenius operator or the Koopman operator describe the evolution of probability densities or observables of a dynamical system. Eigenvalues and eigenfunctions of these operators contain important information about global properties of the underlying system.

2.1.1 Time-Homogeneous Systems

Let \( \{ X_t \}_{t \ge 0} \) be a time-homogeneous stochastic process defined on the state space \( \mathbb {X} \subset \mathbb {R}^n \) and \( p_\tau :\mathbb {X} \times \mathbb {X} \rightarrow \mathbb {R}_{\ge 0} \) the transition density function for a fixed lag time \( \tau \) so that for every set \( \mathbb {A} \) it holds that

$$\begin{aligned} \mathbb {P}[X_{t+\tau } \in \mathbb {A} \mid X_t = x] = \intop _\mathbb {A} p_\tau (x,y) \textrm{d}y. \end{aligned}$$

That is, \( p_\tau (x, y) \) is the probability density of \( X_{t+\tau } = y \) conditioned on \( X_t = x \).

Definition 2.1

(Transfer operators) Let \( \rho \in L^1(\mathbb {X}) \) be a probability density and \( f \in L^\infty (\mathbb {X}) \) an observable of the system.

  (i) The Perron–Frobenius operator \( \mathcal {P}_\tau :L^1(\mathbb {X}) \rightarrow L^1(\mathbb {X}) \) is given by

    $$\begin{aligned} \mathcal {P}_\tau \rho (x) = \intop _{\mathbb {X}} p_{\tau }(y,x) \rho (y) \textrm{d}y. \end{aligned}$$

  (ii) The Koopman operator \( \mathcal {K}_\tau :L^\infty (\mathbb {X}) \rightarrow L^\infty (\mathbb {X}) \) is defined by

    $$\begin{aligned} \mathcal {K}_\tau f(x) = \intop _{\mathbb {X}} p_{\tau }(x,y) f(y) \textrm{d}y = \mathbb {E}[f(X_{t+\tau }) \mid X_t = x]. \end{aligned}$$

Remark 2.2

Assuming the system admits a unique invariant density \( \pi \), let \( u(x) \in L_\pi ^1(\mathbb {X}) \) be a probability density with respect to the equilibrium density, then the Perron–Frobenius operator with respect to the equilibrium density is defined by

$$\begin{aligned} \mathcal {T}_\tau u(x) = \intop _{\mathbb {X}} \frac{\pi (y)}{\pi (x)} p_{\tau }(y, x) u(y) \textrm{d}y. \end{aligned}$$

A dynamical system is said to be reversible if the so-called detailed balance condition

$$\begin{aligned} \pi (x) p_{\tau }(x, y) = \pi (y) p_{\tau }(y, x) \end{aligned}$$

is fulfilled for all \( x, y \in \mathbb {X} \). Roughly speaking, this means that the stochastic process is indistinguishable from its time-reversed counterpart. If the system is reversible—this is, for example, the case for classical molecular dynamics problems—then the eigenvalues of the associated Perron–Frobenius operator and Koopman operator are real-valued and we can compute metastable sets by applying clustering techniques to the dominant eigenfunctions of these operators (Klus et al. 2016, 2018b). For non-reversible systems, we typically obtain complex eigenvalues.
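To make the role of reversibility more concrete, the following short computation sketches why the detailed balance condition is so convenient. It is a formal argument only: we assume \( \pi (x) > 0 \) and ignore the different function spaces on which the operators are defined. Substituting \( \pi (y) p_{\tau }(y, x) = \pi (x) p_{\tau }(x, y) \) into the definition of \( \mathcal {T}_\tau \) yields

$$\begin{aligned} \mathcal {T}_\tau u(x) = \intop _{\mathbb {X}} \frac{\pi (y)}{\pi (x)} p_{\tau }(y, x) u(y) \textrm{d}y = \intop _{\mathbb {X}} \frac{\pi (x)}{\pi (x)} p_{\tau }(x, y) u(y) \textrm{d}y = \intop _{\mathbb {X}} p_{\tau }(x, y) u(y) \textrm{d}y = \mathcal {K}_\tau u(x). \end{aligned}$$

That is, for reversible systems the Perron–Frobenius operator with respect to the equilibrium density and the Koopman operator coincide on functions for which both expressions are defined.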

2.1.2 Time-Inhomogeneous Systems

For time-inhomogeneous systems, the transition density function and thus the operators defined above explicitly depend on the starting time t. Such systems are in general not reversible (Koltai et al. 2018). To simplify the notation, we will omit the explicit time-dependence and write again, e.g., \( \mathcal {P}_\tau \) instead of \( \mathcal {P}_{t, \tau } \).

Definition 2.3

(Forward–backward operator) Let \( \mathbbm {1}_\mathbb {X} \) denote the indicator function on \( \mathbb {X} \) and define \( \nu = \mathcal {P}_\tau \mathbbm {1}_\mathbb {X} \). The forward–backward operator \( \mathcal {F}_\tau \) is given by

$$\begin{aligned} \mathcal {F}_\tau f(x) = \intop _{\mathbb {X}} p_\tau (x, y) \frac{1}{\nu (y)} \intop _{\mathbb {X}} p_\tau (z, y) f(z) \textrm{d}z \textrm{d}y. \end{aligned}$$

Applying clustering techniques to the dominant eigenfunctions of the operator \( \mathcal {F}_\tau \), we obtain coherent sets, see Froyland (2013), Banisch and Koltai (2017) and Klus et al. (2019a).

2.1.3 Data-Driven Transfer Operator Approximation

Popular data-driven approaches for the approximation of transfer operators include extended dynamic mode decomposition (EDMD) (Williams et al. 2015; Klus et al. 2016) and its various extensions (Williams et al. 2015; Li et al. 2017; Klus et al. 2019b). Assume that we have data of the form \( \{ (x^{(i)}, y^{(i)}) \}_{i=1}^m \), where \( y^{(i)} = \Theta ^\tau (x^{(i)}) \) and \( \Theta ^\tau \) is the flow map associated with the dynamical system. In addition to the training data, EDMD requires a vector-valued function \( \phi :\mathbb {R}^n \rightarrow \mathbb {R}^N \), with \( \phi (x) = [\phi _1(x), \dots , \phi _N(x)]^\top \), that maps the data into a typically higher-dimensional feature space. Given the transformed data matrices \( \Phi _x, \Phi _y \in \mathbb {R}^{N \times m} \), defined by

$$\begin{aligned} \Phi _x = \begin{bmatrix} \phi \big (x^{(1)}\big )&\phi \big (x^{(2)}\big )&\dots&\phi \big (x^{(m)}\big ) \end{bmatrix} \quad \text {and} \quad \Phi _y = \begin{bmatrix} \phi \big (y^{(1)}\big )&\phi \big (y^{(2)}\big )&\dots&\phi \big (y^{(m)}\big ) \end{bmatrix}, \end{aligned}$$

we can compute empirical estimates of matrix representations of transfer operators projected onto the space spanned by \( \phi \): The approximated matrix representations of the Koopman operator \( \mathcal {K}_\tau \) and the Perron–Frobenius operator \( \mathcal {P}_\tau \) are given by

$$\begin{aligned} \widehat{K}_\tau ^{(m)} = C_{xx}^+ C_{xy} \quad \text {and} \quad \widehat{P}_\tau ^{(m)} = C_{xx}^+ C_{yx}, \end{aligned}$$

where \( C_{xx} = \frac{1}{m} \Phi _x \Phi _x^\top \) and \( C_{xy} = C_{yx}^\top = \frac{1}{m} \Phi _x \Phi _y^\top \) and \( ^+ \) denotes the pseudoinverse. Analogously, we can approximate the forward–backward operator by

$$\begin{aligned} \widehat{F}_\tau ^{(m)} = C_{xx}^+ C_{xy} C_{yy}^+ C_{yx} \end{aligned}$$

as shown in Klus et al. (2019a). The pseudoinverses \( C_{xx}^+ \) and \( C_{yy}^+ \) could also be replaced by regularized inverses of the form \( (C_{xx} + \varepsilon I)^{-1} \) and \( (C_{yy} + \varepsilon I)^{-1} \), respectively, where \( \varepsilon \) is a regularization parameter. The representation of the forward–backward operator is closely related to canonical correlation analysis (CCA) (Hotelling 1936; Melzer et al. 2001), which aims at maximizing the correlation between multidimensional random variables, see Klus et al. (2019a) for details.
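The estimators above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used for the experiments: the function name `edmd_operators` and the keyword `eps` are our own choices, where `eps = 0` selects the pseudoinverse and `eps > 0` the regularized inverse. The toy data (a linear map with the identity dictionary \( \phi (x) = x \)) are likewise an assumption made purely for checking the formulas.

```python
import numpy as np

def edmd_operators(Phi_x, Phi_y, eps=0.0):
    """Empirical Koopman, Perron-Frobenius, and forward-backward matrices.

    Phi_x, Phi_y: arrays of shape (N, m) containing the transformed data.
    eps: hypothetical regularization parameter; 0 selects the pseudoinverse.
    """
    N, m = Phi_x.shape
    C_xx = Phi_x @ Phi_x.T / m
    C_yy = Phi_y @ Phi_y.T / m
    C_xy = Phi_x @ Phi_y.T / m
    C_yx = C_xy.T
    if eps > 0:
        C_xx_inv = np.linalg.inv(C_xx + eps * np.eye(N))
        C_yy_inv = np.linalg.inv(C_yy + eps * np.eye(N))
    else:
        C_xx_inv = np.linalg.pinv(C_xx)
        C_yy_inv = np.linalg.pinv(C_yy)
    K_hat = C_xx_inv @ C_xy
    P_hat = C_xx_inv @ C_yx
    F_hat = C_xx_inv @ C_xy @ C_yy_inv @ C_yx
    return K_hat, P_hat, F_hat

# Toy check: deterministic linear map y = M x with dictionary phi(x) = x.
# In this basis the Koopman matrix is M^T.
rng = np.random.default_rng(0)
M = np.array([[0.9, 0.1], [0.0, 0.8]])
X = rng.standard_normal((2, 200))
Y = M @ X
K_hat, P_hat, F_hat = edmd_operators(X, Y)
```

For this invertible deterministic map, `K_hat` reproduces `M.T` and `F_hat` is the identity, which reflects that nothing is dispersed by the dynamics.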

Remark 2.4

For the approximation of the Perron–Frobenius operator, we need to assume that \( x^{(i)} \) is sampled from the uniform distribution. If we use one long equilibrated trajectory instead, then \( x^{(i)} \sim \pi \) and we obtain \( \widehat{T}_\tau ^{(m)} = C_{xx}^+ C_{yx} \), i.e., an approximation of the reweighted operator \( \mathcal {T}_\tau \), cf. Klus et al. (2018b).

2.2 Graph Theory

We assume the reader to be familiar with graph-theoretical concepts and will thus only briefly define directed and undirected graphs as well as associated matrix representations; a more detailed introduction can be found, e.g., in Cormen et al. (2009), Rigo (2016) and Lambiotte and Schaub (2022).

2.2.1 Basic Graph Properties

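A directed graph \( \mathcal {G} = (\mathcal {X}, \mathcal {E}) \) is given by a set of vertices \( \mathcal {X} = \{ x_1, \dots , x_n \} \) and a set of edges \( \mathcal {E} \subseteq \mathcal {X} \times \mathcal {X} \). An undirected graph can then be regarded as a special case, where the edges have no directionality.

Definition 2.5

(Weighted adjacency matrix) The weighted adjacency matrix \( A = (a_{ij})_{i,j=1}^n \) associated with a graph \( \mathcal {G} \) is defined by

$$\begin{aligned} a_{ij} = {\left\{ \begin{array}{ll} w(x_i, x_j), &{} \text {if } (x_i, x_j) \in \mathcal {E}, \\ 0, &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \( w :\mathcal {E} \rightarrow \mathbb {R}_{> 0} \) is a function that determines the weight of the edge \( (x_i, x_j) \).

Furthermore, we define the out-degree of a vertex and the degree matrix by

$$\begin{aligned} o(x_i) = \sum _{j=1}^n a_{ij} \quad \text {and} \quad D_o = {\text {diag}}\big (o(x_1), \dots , o(x_n)\big ). \end{aligned}$$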

In many applications, the network structure is changing in time. We will consider time-evolving graphs—also called temporal graphs, dynamic graphs, or time-varying graphs—where the number of vertices is fixed, but the number of edges can change. Different approaches have been developed to study dynamics of network change. Aggregation-based methods merge all temporal information into one graph (Onnela et al. 2007), focusing on the global network structure and neglecting information on smaller scales. More temporal information is preserved when aggregating networks to, e.g., multilayer networks (Boccaletti et al. 2014). In this paper, we focus on approaches where each time-slice is analyzed separately and then, this information is studied to extract global network change (Holme and Saramäki 2019). In particular, we assume that a temporal change of a graph at discrete points in time, \( t_1, t_2, \dots \), is formally given by a sequence \((A^{(t_1)}, A^{(t_2)},\dots )\), where \(A^{(t_i)}\) is the weighted adjacency matrix of the graph at time \( t_i \).

2.2.2 Random Walks on Graphs

A time-discrete random walk on a graph is a discrete stochastic process \( X_t \) that starts in a vertex \( x_i \) and at each time step moves to an adjacent vertex \( x_j \) (or stays in \( x_i \) if self-loops are allowed) with a probability that is proportional to the weight of the edge \( (x_i, x_j) \). Such a random walk on \( \mathcal {G} \) is defined by the row-stochastic transition probability matrix

$$\begin{aligned} P = D_o^{-1} A. \end{aligned}$$
(1)

That is, with \( o(x_i) \) denoting the out-degree of \( x_i \), the entry \( p_{ij} \) of P is given by \( p_{ij} = \frac{a_{ij}}{o(x_i)} \). If a directed graph is strongly connected and aperiodic, then the random walk process is ergodic and it converges to a unique stationary distribution. If \( \mathcal {G} \) does not satisfy these properties, typically the random walk process is modified by including a so-called teleportation probability, i.e., a small probability to randomly jump to any vertex in the graph (Brin and Page 1998). If the graph is undirected, the adjacency matrix is symmetric and the matrix P is similar to a symmetric matrix and its spectrum is real-valued. Many variants of Laplacians for directed and undirected graphs have been used in the literature; we will adopt the following definition, cf. von Luxburg (2007).

Definition 2.6

(Graph Laplacian) The unnormalized graph Laplacian associated with \( \mathcal {G} \) is defined by \( L = D_o - A \), where \( D_o \) is the degree matrix, and the random-walk normalized graph Laplacian by

$$\begin{aligned} L_{\textrm{rw}} = D_o^{-1} L = I - P. \end{aligned}$$

Note that if \( \lambda \) is an eigenvalue of P, then \( 1 - \lambda \) is an eigenvalue of \( L_{\textrm{rw}} \). Furthermore, the eigenvectors of the two matrices are identical (Meila and Shi 2001).
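The relationship between the transition matrix and the two Laplacians can be checked numerically. The sketch below is only an illustration: the helper name `random_walk_matrices` and the small weighted example graph are our own assumptions. It builds the row-stochastic matrix from the out-degrees and compares the spectra.

```python
import numpy as np

def random_walk_matrices(A):
    """Return transition matrix P, unnormalized Laplacian L, and L_rw = I - P."""
    A = np.asarray(A, dtype=float)
    o = A.sum(axis=1)  # out-degrees (row sums of the adjacency matrix)
    if np.any(o == 0):
        raise ValueError("every vertex needs at least one outgoing edge")
    P = A / o[:, None]
    L = np.diag(o) - A
    L_rw = np.eye(A.shape[0]) - P
    return P, L, L_rw

# Small undirected example graph (weights chosen for illustration only).
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.5],
              [1.0, 0.5, 0.0]])
P, L, L_rw = random_walk_matrices(A)

# The graph is undirected, so both spectra are real-valued.
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_L_rw = np.sort(np.linalg.eigvals(L_rw).real)
```

Sorting `1 - eig_P` reproduces `eig_L_rw`, in line with the note above.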

2.2.3 Spectral Clustering of Undirected Graphs

Several different clustering algorithms based on graph Laplacians have been proposed; we will use the normalized spectral clustering algorithm as defined in von Luxburg (2007).

[Algorithm listing (figure b): normalized spectral clustering, cf. von Luxburg (2007)]

We can either use \( L_{\textrm{rw}} \) or P for spectral clustering. The only difference is that for the former we have to compute the smallest and for the latter the largest eigenvalues. The number of clusters k is typically chosen in such a way that there exists a spectral gap between \( \lambda _k \) and \( \lambda _{k+1} \) (Djurdjevac 2012; Djurdjevac et al. 2011). We can also apply other clustering algorithms in step 4. In what follows, we will sometimes use the sparse eigenbasis approximation (SEBA) algorithm (Froyland et al. 2019), which is advantageous when there is no clear eigengap and was specifically developed for the extraction of metastable and coherent sets.
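A minimal sketch of normalized spectral clustering for an undirected graph could look as follows. It is not the listing from von Luxburg (2007): the function names, the plain Lloyd-type `kmeans` helper with farthest-point initialization, and the two-triangle example graph are assumptions made for illustration. The eigenvectors of P are obtained from the symmetric matrix \( D^{-1/2} A D^{-1/2} \), which is similar to P.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iteration with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(A, k):
    """Cluster an undirected graph using the k dominant eigenvectors of P."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))        # D^{-1/2} A D^{-1/2}, similar to P
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:k]       # k largest eigenvalues of P
    V = vecs[:, idx] / np.sqrt(d)[:, None] # eigenvectors of P: v = D^{-1/2} u
    return kmeans(V, k), vals[idx]

# Two triangles joined by one weak edge (illustrative example).
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
A[2, 3] = A[3, 2] = 0.01
labels, vals = spectral_clustering(A, k=2)
```

The rows of `V` play the role of the points passed to the final clustering step; any other clustering method could be substituted for `kmeans` there.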

3 The Forward–Backward Laplacian

We will now define transfer operators on graphs and highlight relationships with conventional spectral clustering algorithms.

3.1 Transfer Operator Perspective

We consider the case \( \tau = 1 \). That is, given a discrete distribution \( \rho \) on the graph at time t the Perron–Frobenius operator applied to \( \rho \) yields the distribution at time \( t + 1 \). Correspondingly, the Koopman operator describes the evolution of observables f. From a data-driven perspective, this means that each random walker takes just one step. The state space of the random walkers is given by \( \mathbb {X} = \{ x_1, \dots , x_n \} \) and since \( \tau = 1 \) it holds that \( p_\tau (x_i, x_j) = p_{ij} \). The Perron–Frobenius operator and the Koopman operator defined on a graph can thus be written as

$$\begin{aligned} \mathcal {P}_\tau \rho (x_i) = \sum _{j=1}^n p_{ji} \rho (x_j) \quad \text {and} \quad \mathcal {K}_\tau f(x_i) = \sum _{j=1}^n p_{ij} f(x_j), \end{aligned}$$

respectively. Analogously, the forward–backward operator \( \mathcal {F}_\tau \) can be expressed as

$$\begin{aligned} \mathcal {F}_\tau f(x_i) = \sum _{j=1}^n p_{ij} \frac{1}{\nu (x_j)} \sum _{k=1}^n p_{kj} f(x_k). \end{aligned}$$

With a slight abuse of notation, we define vectors \( \rho , f \in \mathbb {R}^n \) with entries \( \rho _i = \rho (x_i) \) and \( f_i = f(x_i) \) so that we can write

$$\begin{aligned} \mathcal {P}_\tau \rho = P^\top \rho \quad \text {and} \quad \mathcal {K}_\tau f = P f, \end{aligned}$$

where the operators are applied component-wise. The matrix representation of the operator \( \mathcal {F}_\tau \) is given by

$$\begin{aligned} Q = P D_\nu ^{-1} P^\top , \end{aligned}$$

where \( D_\nu = {\text {diag}}(\nu (x_1), \dots , \nu (x_n)) \) with \( \nu (x_j) = \sum _{k=1}^n p_{kj} \). To ensure that the inverse of \( D_\nu \) exists, the probability that a random walker (starting in any vertex) ends up in \( x_j \) after one step must be nonzero for every vertex \( x_j \), i.e., there must be at least one incoming edge. We account for this by including self-loops with a small weight to each vertex, which can be regarded as a form of regularization. Another possible approach would be to introduce teleportation probabilities, as described in Sect. 2.

Lemma 3.1

It holds that \( Q = P D_\nu ^{-1} P^\top \) is a doubly stochastic matrix.

Proof

First, note that P and \( D_\nu ^{-1} P^\top \) are row-stochastic matrices and thus so is their product Q. Furthermore, Q is symmetric, so that its columns sum to one as well. \(\square \)

In order to determine eigenfunctions of the operators introduced above, we can thus simply compute eigenvectors of the corresponding matrix representations. This illustrates that the conventional spectral clustering for undirected graphs is based on the eigenfunctions of the Koopman operator. For directed and time-evolving graphs, we will now extend this idea and propose a clustering algorithm that utilizes eigenfunctions of the forward–backward operator.

Definition 3.2

(Forward–backward Laplacian) We call the matrix

$$\begin{aligned} L_{\textrm{fb}} = I - Q \end{aligned}$$

associated with the graph the forward–backward Laplacian.

That is, we define the forward–backward Laplacian based on the generalized operator \( \mathcal {F}_\tau \) by mirroring the definition of the random-walk Laplacian as a shifted matrix representation of the Koopman operator \( \mathcal {K}_\tau \).
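The construction of the forward–backward Laplacian and the statement of Lemma 3.1 can be verified numerically. The following sketch makes a few assumptions of its own: the function name `forward_backward_laplacian`, the keyword `self_loop_weight` (with an arbitrary illustrative value of 0.01), and the small directed example graph are not taken from the text.

```python
import numpy as np

def forward_backward_laplacian(A, self_loop_weight=0.0):
    """Return L_fb = I - Q and Q = P D_nu^{-1} P^T for adjacency matrix A."""
    n = len(A)
    # Optional self-loops as regularization (the weight is a free parameter here).
    A = np.asarray(A, dtype=float) + self_loop_weight * np.eye(n)
    o = A.sum(axis=1)
    P = A / o[:, None]            # row-stochastic transition matrix
    nu = P.sum(axis=0)            # nu = P^T 1, i.e., column sums of P
    if np.any(nu == 0):
        raise ValueError("every vertex needs at least one incoming edge")
    Q = (P / nu) @ P.T            # dividing column j by nu_j gives P D_nu^{-1}
    return np.eye(n) - Q, Q

# Small directed example graph (weights chosen for illustration only).
A = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
L_fb, Q = forward_backward_laplacian(A, self_loop_weight=0.01)
eigenvalues = np.linalg.eigvalsh(L_fb)  # L_fb is symmetric
```

Since Q is symmetric and doubly stochastic, the eigenvalues of `L_fb` are real, and the smallest one is zero with a constant eigenvector.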

3.2 Random Walker Perspective

We derived the forward–backward Laplacian by applying the definition of transfer operators to graphs. Alternatively, we can interpret these results in terms of random walks again. We define \( \phi (x) = [\phi _1(x), \dots , \phi _n(x)]^\top \), with

$$\begin{aligned} \phi _i(x) = {\left\{ \begin{array}{ll} 1, &{} x = x_i, \\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

That is, the basis function \( \phi _i(x) \) is the indicator function for vertex \( x_i \) and the random walk data are simply represented in the one-hot encoding format. Furthermore, the feature space dimension is \( N = n \).

Proposition 3.3

Assume \( x^{(i)} \sim U(\{1, \dots , n\}) \), where U denotes the uniform distribution. Using a basis comprising indicator functions, we obtain

$$\begin{aligned} \lim \limits _{m \rightarrow \infty }{\widehat{K}_\tau ^{(m)}} = P, \quad \lim \limits _{m \rightarrow \infty }{\widehat{P}_\tau ^{(m)}} = P^\top , \quad \lim \limits _{m \rightarrow \infty }{\widehat{F}_\tau ^{(m)}} = Q. \end{aligned}$$

The proof is an application of EDMD and CCA convergence results to discrete Markov chains, where the dictionary is now given by indicator functions. For the sake of completeness, it is included in the appendix.
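The convergence stated in Proposition 3.3 can also be observed empirically. The sketch below is an illustration under our own assumptions (a three-vertex example graph, \( m = 10^5 \) one-step random walks, a fixed random seed). With one-hot features, the covariance matrices reduce to normalized transition counts, so the feature matrices never have to be formed explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small directed example graph (weights chosen for illustration only).
A = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
n = A.shape[0]
P = A / A.sum(axis=1, keepdims=True)
nu = P.sum(axis=0)
Q = (P / nu) @ P.T

# Draw m one-step random walks with uniformly distributed starting vertices.
m = 100_000
x = rng.integers(0, n, size=m)
cdf = np.cumsum(P, axis=1)
y = (rng.random(m)[:, None] > cdf[x]).sum(axis=1)  # inverse-CDF sampling
y = np.minimum(y, n - 1)  # guard against round-off in the last CDF entry

# Covariance matrices for the indicator-function dictionary.
C_xy = np.zeros((n, n))
np.add.at(C_xy, (x, y), 1.0 / m)
C_xx = np.diag(np.bincount(x, minlength=n) / m)
C_yy = np.diag(np.bincount(y, minlength=n) / m)

K_hat = np.linalg.pinv(C_xx) @ C_xy
P_hat = np.linalg.pinv(C_xx) @ C_xy.T
F_hat = np.linalg.pinv(C_xx) @ C_xy @ np.linalg.pinv(C_yy) @ C_xy.T

max_errors = [np.abs(K_hat - P).max(),
              np.abs(P_hat - P.T).max(),
              np.abs(F_hat - Q).max()]
```

The entries of `max_errors` shrink as `m` grows, which is the behavior one would expect from a Monte Carlo estimate.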

Remark 3.4

We can also approximate the Koopman operator associated with a time-homogeneous system with invariant density \( \pi \) using one long equilibrated trajectory. Furthermore, in this case \( \widehat{T}_\tau ^{(m)} = C_{xx}^+ C_{yx} \) converges to the Perron–Frobenius operator with respect to the equilibrium density.

This shows that we can estimate metastable and coherent sets from data and also allows us to interpret the spectral clustering methods for directed and time-evolving graphs in terms of random walkers.

4 Spectral Clustering of Directed and Time-Evolving Graphs

We will now illustrate how the forward–backward Laplacian \( L_{\textrm{fb}} \) can be used for spectral clustering.

4.1 Spectral Clustering of Directed Graphs

We have seen that the detection of clusters in undirected graphs is related to the computation of metastable sets. In order to detect clusters in directed graphs, we compute coherent sets.

[Algorithm 4.1 (figure c): spectral clustering of directed graphs based on the forward–backward Laplacian]

As before, instead of computing the k smallest eigenvalues of \( L_{\textrm{fb}} \), we can determine the k largest eigenvalues and associated eigenvectors of Q. Alternatively, we can compute the first k right singular vectors of the matrix \( D_\nu ^{-1/2} P^\top \). This shows that spectral clustering for directed graphs can again be regarded as a spectral decomposition of an appropriately normalized adjacency matrix.
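The singular value formulation lends itself to a compact sketch. The code below is not the listing of Algorithm 4.1 but an illustration under our own assumptions: the function names, the plain Lloyd-type `kmeans` helper, and the example graph (three fully connected groups of four vertices with self-loops, linked unidirectionally by edges of weight 0.01) are hypothetical. It uses the fact that the right singular vectors of \( D_\nu ^{-1/2} P^\top \) are eigenvectors of \( Q = P D_\nu ^{-1} P^\top \).

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iteration with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def directed_spectral_clustering(A, k):
    """Cluster a directed graph via the SVD of D_nu^{-1/2} P^T."""
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)
    nu = P.sum(axis=0)
    M = P.T / np.sqrt(nu)[:, None]      # D_nu^{-1/2} P^T, so that M^T M = Q
    _, s, Vt = np.linalg.svd(M)
    V = Vt[:k].T                        # first k right singular vectors
    return kmeans(V, k), s

# Three groups of four vertices (self-loops included), weakly linked in a cycle.
A = np.zeros((12, 12))
for c in range(3):
    A[4 * c:4 * c + 4, 4 * c:4 * c + 4] = 1.0
    A[4 * c + 3, (4 * c + 4) % 12] = 0.01
labels, s = directed_spectral_clustering(A, k=3)
```

The squared singular values `s**2` are the eigenvalues of Q; in this example three of them are close to 1, followed by a clear gap.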

Example 4.5

Consider the directed graph shown in Fig. 1. The graph comprises three unidirectionally connected clusters, and a random walker will typically spend a long time in one cluster before moving to the next one. Although this behavior is highly similar to the undirected case, the eigenvalues and eigenvectors of \( L_{\textrm{rw}} \) are complex-valued and standard spectral clustering techniques fail. We instead compute the eigenvalues and eigenvectors of the forward–backward Laplacian \( L_{\textrm{fb}} \) and then apply clustering techniques to the dominant eigenvectors. Due to the symmetry, we obtain repeated eigenvalues and the corresponding eigenvectors are only determined up to basis rotations. Nevertheless, it can be seen that the values of the eigenvectors are almost constant within the clusters. This indicates a crisp clustering. \(\triangle \)

Fig. 1

a Directed graph with three clusters. The weight of each solid edge is 1 and the weight of each dashed edge 0.01. Self-loops are omitted. b Corresponding asymmetric adjacency matrix. c Two random walks of length 2000 starting in different vertices. d Eigenvalues of the matrix Q. Three eigenvalues are close to 1, which implies that there are three coherent sets. e Eigenvectors corresponding to the dominant eigenvalues. By applying k-means, SEBA, or other clustering techniques to the eigenvectors, we can extract the coherent sets. f Resulting clustering of the graph

In order to avoid having to deal with complex-valued eigenvalues and eigenvectors, one might be tempted to consider only the real parts (or a combination of real and imaginary parts). This will, however, in general not lead to satisfactory results. Even for the simple graph introduced in Example 4.5, we would not obtain the three clusters shown in Fig. 1. It is important to note here that the self-loops are crucial and improve the clustering results. Without the self-loops, a random walker starting in vertex 1 that moves forward and then backward will end up in vertex 1 with probability one. This leads to clusters of size one. Adding self-loops hence regularizes the problem and leads to more balanced clusters. Unless noted otherwise, we will always add self-loops with edge weights to all vertices. Let us now apply Algorithm 4.1 to a larger graph that does not have such a clearly defined cluster structure.
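The effect of the self-loops can be reproduced on an even simpler graph than the one in Example 4.5. As an illustrative stand-in (our own choice), the sketch below uses a directed cycle with four vertices; the self-loop weight of 1 is an arbitrary assumption and not the value used in the experiments. Without self-loops, the forward–backward matrix is the identity, so every vertex forms its own cluster.

```python
import numpy as np

def forward_backward_matrix(A):
    """Q = P D_nu^{-1} P^T for a weighted adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)
    nu = P.sum(axis=0)
    return (P / nu) @ P.T

n = 4
cycle = np.roll(np.eye(n), 1, axis=1)  # directed edges i -> i + 1 (mod n)

# Without self-loops: forward then backward always returns to the start.
Q_without = forward_backward_matrix(cycle)

# With self-loops (weight 1 is an arbitrary illustrative choice).
Q_with = forward_backward_matrix(cycle + np.eye(n))
```

Here `Q_without` equals the identity matrix, whereas the first row of `Q_with` is `[0.5, 0.25, 0, 0.25]`, so that neighboring vertices become coupled.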

Fig. 2

a Randomly generated directed graph with 100 vertices consisting of 10 sparsely connected graphs with 10 vertices. We apply k-means with \( k = 10 \) to the dominant eigenvectors of \( L_{\textrm{fb}} \). The resulting clusters are represented in different colors. b Adjacency matrix of the graph, where the clusters are marked in the corresponding colors

Example 4.6

We generate a directed graph comprising 100 vertices by sparsely connecting 10 randomly generated sparse matrices of size 10. The clustered graph and its adjacency matrix are shown in Fig. 2. Algorithm 4.1 splits the graph into the 10 clusters associated with the 10 randomly generated matrices of size 10. \(\triangle \)

This shows that the proposed spectral clustering algorithm for directed graphs successfully identifies groups of vertices that share similar properties. In particular, reversing the direction of certain edges may also change the cluster assignments, which illustrates that the directionality of the edges is taken into account. The proposed spectral clustering algorithm requires only matrix–matrix multiplications, inverses of diagonal matrices, and methods to compute spectral properties of the resulting forward–backward Laplacian. Using state-of-the-art numerical linear algebra libraries containing, for instance, iterative Arnoldi-type methods for the computation of dominant eigenvalues and eigenvectors of high-dimensional sparse matrices, the algorithm can be easily applied to large-scale problems.

Example 4.7

We cluster a directed graph representing a memory circuit. The matrix, which is available on the Matrix Market website, is of size \( 17{,}758 \times 17{,}758 \) and contains 99,147 nonzero entries. We apply Algorithm 4.1 and arbitrarily choose \( k = 50 \) since there is no clear spectral gap in this case. Applying the clustering algorithm just takes a couple of seconds on a conventional laptop. The clustering is shown in Fig. 3. The results demonstrate the efficacy and scalability of our approach. \(\triangle \)

Fig. 3

a Adjacency matrix of the memory circuit. b Spectral clustering of the graph into 50 clusters. In addition to the pink cluster in the middle, which contains approximately 18% of the vertices, and the light-blue cluster, which contains approximately 9%, we obtain many regular-looking clusters surrounding the center cluster (Color figure online)

4.2 Spectral Clustering of Time-Evolving Graphs

To illustrate the versatility of the forward–backward Laplacian, we will now apply our approach to find coherent sets in time-evolving graphs. In particular, we will consider the following two approaches for estimating the matrix Q: In Approach A, the matrix Q is obtained from random walk data by computing \( \widehat{F}_\tau ^{(m)} = C_{xx}^+ C_{xy} C_{yy}^+ C_{yx} \), see Proposition 3.3. In Approach B, we define \( Q = P D_\nu ^{-1} P^\top \) using

$$\begin{aligned} P = \prod _{t=0}^{T} P^{(t)}, \end{aligned}$$

where the transition matrix \( P^{(t)} \) corresponds to the adjacency matrix of the graph at time \( t \in \{0, \dots , T \} \). Constructed in this way, the matrix P contains transition probabilities over time T. We then again apply Algorithm 4.1 to obtain the clusters. The two approaches differ in that the former requires only random walk data, whereas the latter assumes that the time-evolving network structure is known. In the following examples, we will demonstrate how both approaches can be applied to real-world data. We will use Approach A in Example 4.8, mainly to illustrate the notion of coherent sets in time-evolving graphs, and Approach B in Examples 4.9 and 4.7 to show how coherent sets can be found when the structure of a time-evolving graph is known.
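Approach B amounts to a few matrix products. The following sketch is an illustration only: the function names and the two-step example sequence are our own, and we assume that the product is ordered in time from left to right, i.e., \( P = P^{(0)} P^{(1)} \cdots P^{(T)} \), which is the natural ordering for row-stochastic matrices.

```python
import numpy as np

def transition_matrix(A):
    """Row-stochastic transition matrix of a weighted adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)

def approach_b(adjacency_sequence):
    """Return P = P^(0) P^(1) ... P^(T) and Q = P D_nu^{-1} P^T."""
    matrices = [transition_matrix(A_t) for A_t in adjacency_sequence]
    P = np.eye(matrices[0].shape[0])
    for P_t in matrices:
        P = P @ P_t                # time-ordered product (assumed ordering)
    nu = P.sum(axis=0)
    if np.any(nu == 0):
        raise ValueError("some vertex cannot be reached at the final time")
    Q = (P / nu) @ P.T
    return P, Q

# Two snapshots of a small time-evolving graph with self-loops (illustrative).
A_seq = [np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]]),
         np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]])]
P, Q = approach_b(A_seq)
```

The resulting matrix `Q` can then be passed to the same eigenvalue or singular value computation as in the static directed case.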

Example 4.8

Let us now analyze the time-evolving graph shown in Fig. 4a–c. The graph can be viewed as a discretization of a two-dimensional double-well problem with rotating wells. Random walkers will quickly move to one of the two attracting sets and follow the movement of these time-dependent clusters. The behavior of random walkers starting in the same cluster is hence coherent. We estimate the forward–backward transition matrix Q by computing \( \widehat{F}_\tau ^{(m)} \) using \( m = 5000 \) random walks of length 100 and then compute the eigenvalues and eigenvectors. Since the matrix \( C_{yy} \) is singular in this case (there are no random walkers in some vertices at the final time), we have to use Tikhonov regularization (or, alternatively, the pseudoinverse). We choose \( \varepsilon = 10^{-8} \). In order to detect coherent sets, we apply SEBA to the dominant two eigenvectors. The results are shown in Fig. 4d–f. Unlike k-means, SEBA does not assign all vertices to clusters. In addition to the coherent sets (marked in red and green), we obtain a transition region (marked in yellow). Figure 4g illustrates the dependence of the estimated second eigenvalue on the number of random walkers. The convergence rate in this case is approximately \( \mathcal {O}\left( m^{-1}\right) \), although in general we can only expect \( \mathcal {O}\left( m^{-1/2}\right) \) since EDMD and its extensions are based on Monte Carlo approximations, see also Williams et al. (2015). \(\triangle \)

Fig. 4

Time-evolving graph with two attracting sets (marked in light gray) at times a \( t = 0 \), b \( t = 10 \), and c \( t = 20 \). All edge weights are one. Note that some edges are directed and some undirected. After every ten steps, the edges of the graph are ‘rotated’ counterclockwise. The red, green, and yellow dots represent initially uniformly distributed random walkers colored according to the coherent sets computed below. (For visualization purposes, we added Gaussian noise to the positions of the random walkers. Nevertheless, the state space is discrete and the random walkers are always assigned to one of the vertices.) d Clustering of the time-evolving graph at time \( t = 0 \) using SEBA. Random walkers starting in a coherent set will typically be trapped within the set for a long time. The yellow nodes are not assigned to any coherent set and can be regarded as transition regions. The corresponding random walkers will end up in either the red or the green cluster. e Eigenvalues of the matrix Q estimated from trajectory data. Two eigenvalues are close to 1, which implies that there are two coherent sets. f Eigenvectors corresponding to the two dominant eigenvalues. g Convergence of the second eigenvalue using Approach A to the eigenvalue computed with the aid of Approach B. The red shaded area represents the standard deviation (Color figure online)

The coherence of the sets detected in Example 4.8 can be decreased by adding reverse edges with a low edge weight to all directed edges. This allows random walkers to transition between the coherent sets more easily. It is important to stress here that the notion of coherence is different from the standard interpretation of clusters: Spectral clustering algorithms for static undirected graphs identify fixed sets of vertices with the property that random walkers will remain in such a set for a long time—with a probability that is related to the dominant eigenvalues of the random-walk Laplacian—before moving to another cluster. Coherent sets, on the other hand, are time-evolving structures. Random walkers trapped in such time-dependent sets will with a high probability—now determined by the dominant eigenvalues of the forward–backward Laplacian—move in a coherent way.
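The modification described at the beginning of this paragraph, adding weak reverse edges to all directed edges, can be sketched as follows. The helper name `add_weak_reverse_edges` and the weight 0.1 are hypothetical choices for illustration. The sketch assumes a nonnegative adjacency matrix in which entry \( (i, j) \) is the weight of the edge from vertex i to vertex j.

```python
import numpy as np

def add_weak_reverse_edges(A, weight=0.1):
    """Add an edge j -> i with a small weight for every edge i -> j
    that has no reverse edge. Undirected edges are left unchanged."""
    A = np.asarray(A, dtype=float)
    directed = (A > 0) & (A.T == 0)  # i -> j exists, j -> i does not
    B = A.copy()
    B[directed.T] = weight           # insert the missing reverse edges
    return B

# Edges 0 -> 1 and 1 -> 2 are directed, the edge between 0 and 2 is undirected.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
B = add_weak_reverse_edges(A)
```

Larger values of `weight` make it easier for random walkers to move against the original edge directions, so the detected sets should become less coherent.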

Example 4.9

We create a time-evolving graph based on the quadruple-gyre system defined in Denner et al. (2016). Let \( f(t, z) = \delta \sin (\omega t) z^2 + (1 - 2 \delta \sin (\omega t)) z \) and

$$\begin{aligned} g(t, z_1, z_2) = \pi \sin (\pi f(t, z_1)) \cos (\pi f(t, z_2)) \frac{\partial f}{\partial z_2}(t, z_2), \end{aligned}$$

where \( \delta = 0.1 \) and \( \omega = 2 \pi \). The quadruple-gyre flow on the 2-torus \( \mathbb {X} = [0, 2] \times [0, 2] \) is then given by

$$\begin{aligned} \begin{aligned} \dot{x}&= -g(t, x, y), \\ \dot{y}&= g(t, y, x). \end{aligned} \end{aligned}$$
(2)

The intersection point of the lines separating the four gyres that are either rotating clockwise or counterclockwise moves periodically along the diagonal, see Denner et al. (2016). We subdivide \( \mathbb {X} \) into \( 10 \times 10 \) equally sized boxes and select 16 test points per box, which are then mapped forward by the flow map \( \Theta ^{0.05} \). Each box is represented by a vertex. If a test point is transported from box i to box j, we add an edge from vertex i to vertex j. We simulate the system from \( \overline{t} = 0 \) to \( \overline{t} = 1 \), resulting in 20 graphs given by a sequence of adjacency matrices \( A^{(t)} \), \( t = 0, \dots , 19 \) (i.e., \( \overline{t} = 0.05 t \)), some of which are shown in Fig. 5b–g. At time \( t = 0 \), the clusters corresponding to the four gyres are disconnected, but as time increases, transitions between clusters become possible. Next, we apply Approach B to compute the transition probability matrix P and define \( Q = P D_\nu ^{-1} P^\top \). Applying Algorithm 4.1 results in four coherent sets corresponding to the four vortices of the original quadruple-gyre system as shown in Fig. 5h–j. \(\triangle \)
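The construction in Example 4.9 can be sketched in Python under several assumptions that are not fixed by the text. The velocity field is taken as \( (-g(t, x, y), g(t, y, x)) \), which is the divergence-free field belonging to the stream function \( \sin (\pi f(t, x)) \sin (\pi f(t, y)) \). The flow map is approximated by a classical Runge–Kutta scheme, the 16 test points per box lie on a regular \( 4 \times 4 \) sub-grid, and each edge weight counts the transported test points. Approach B is assumed to form P as the product of the row-normalized adjacency matrices, and \( \nu \) is assumed to be the image of the all-ones density under P. All function names are hypothetical.

```python
import numpy as np

DELTA, OMEGA = 0.1, 2 * np.pi

def f(t, z):
    s = DELTA * np.sin(OMEGA * t)
    return s * z**2 + (1 - 2 * s) * z

def df(t, z):
    # Derivative of f with respect to z.
    s = DELTA * np.sin(OMEGA * t)
    return 2 * s * z + (1 - 2 * s)

def g(t, z1, z2):
    return np.pi * np.sin(np.pi * f(t, z1)) * np.cos(np.pi * f(t, z2)) * df(t, z2)

def velocity(t, p):
    # Assumed form of the vector field: (-g(t, x, y), g(t, y, x)).
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([-g(t, x, y), g(t, y, x)])

def flow_map(t0, p, tau=0.05, steps=20):
    # Classical fourth-order Runge-Kutta integration (an assumed integrator).
    h, t = tau / steps, t0
    for _ in range(steps):
        k1 = velocity(t, p)
        k2 = velocity(t + h / 2, p + h / 2 * k1)
        k3 = velocity(t + h / 2, p + h / 2 * k2)
        k4 = velocity(t + h, p + h * k3)
        p = p + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return np.mod(p, 2.0)  # periodic domain [0, 2] x [0, 2]

def box_index(p, n_boxes=10):
    ij = np.minimum((p / 2.0 * n_boxes).astype(int), n_boxes - 1)
    return ij[:, 0] * n_boxes + ij[:, 1]

def adjacency(t0, n_boxes=10, k=4):
    # k * k = 16 test points per box on a regular sub-grid (assumed placement).
    c = (np.arange(n_boxes * k) + 0.5) * 2.0 / (n_boxes * k)
    X, Y = np.meshgrid(c, c, indexing="ij")
    p = np.column_stack([X.ravel(), Y.ravel()])
    A = np.zeros((n_boxes**2, n_boxes**2))
    # Edge weight = number of test points transported from box i to box j.
    np.add.at(A, (box_index(p, n_boxes), box_index(flow_map(t0, p), n_boxes)), 1.0)
    return A

def forward_backward(As):
    # Assumed Approach B: product of row-stochastic transition matrices.
    P = np.eye(As[0].shape[0])
    for A in As:
        P = P @ (A / A.sum(axis=1, keepdims=True))
    nu = P.sum(axis=0)  # image of the all-ones density under P
    inv_nu = np.divide(1.0, nu, out=np.zeros_like(nu), where=nu > 0)
    return P, (P * inv_nu) @ P.T  # Q = P D_nu^{-1} P^T

As = [adjacency(0.05 * t) for t in range(20)]
P, Q = forward_backward(As)
eigvals = np.sort(np.linalg.eigvalsh(Q))[::-1]
```

With these choices, Q is symmetric and row-stochastic, so its eigenvalues are real and lie in \( [0, 1] \). The clustering step itself (Algorithm 4.1) is not reproduced here.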

Fig. 5 (a) One long trajectory generated by the quadruple-gyre system defined in (2). (b–g) Time-evolving quadruple-gyre graph at different times t. (h) Eigenvalues of the matrix Q. (i) Clustering of the graph at time \( t = 0 \) into four sets. (j) Clustering of the graph at time \( t = 20 \) obtained by mapping indicator functions for each individual cluster forward by the transition probability matrices. The resulting probabilities are plotted as pie charts, where the size of each vertex corresponds to the probability that a random walker will end up in this vertex at \( t = 20 \). It can be seen that information is leaking into the other clusters, but the influx and outflux are comparatively small, although some of the dominant eigenvalues are relatively close to zero. The identified sets are therefore coherent

Fig. 6 (a–e) Adjacency matrices of the time-evolving contact network on days 1–5. (f) Spectral clustering of the graph. The clusters are 2BIO1, 2BIO2, 2BIO3, MP*1, MP*2, PSI*, PC, PC*, and MP. The gray vertices have not been assigned to any of the clusters. Two vertices corresponding to students who did not have any contacts with their peers over the entire week are not displayed (Color figure online)

In order to illustrate how the spectral clustering approach can be applied to more complex time-evolving networks, let us consider a high school contact and friendship network.

Example 4.10

We analyze data describing the social interactions of more than 300 high school students over a period of 1 week (Mastrandrea et al. 2015). Face-to-face contacts were measured using wearable sensors that exchange IDs only when two students are facing each other. The data can be downloaded from www.sociopatterns.org. We construct an undirected time-evolving graph by aggregating the contacts for each of the five consecutive days (smaller time intervals would be possible as well but would result in very sparse graphs). The adjacency matrices, shown in Fig. 6a–e, exhibit a cluster structure that corresponds to the different specializations: Mathematics and Physics (MP*1, MP*2, and MP), Physics and Chemistry (PC and PC*), Engineering (PSI*), and Biology (2BIO1, 2BIO2, and 2BIO3). We again construct a transition probability matrix P as described in Example 4.9 (i.e., using Approach B), compute the forward–backward transition matrix Q, and cluster the associated eigenvectors into 9 coherent sets. The resulting clustering is shown in Fig. 6f.

The results are also summarized in Table 1. More than 93% (307/329) of the students are classified correctly. Approximately 5% (17/329) could not be assigned to a class. These are exactly the students who did not have any contacts with other students on the first day. Less than 2% (5/329) are misclassified. The incorrectly assigned students (IDs 274, 1543, 446, 9, and 784) either have only few contacts or more contacts with students from other classes. \(\triangle \)
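The aggregation step of Example 4.10 can be sketched as follows. The sketch assumes that the raw data have already been converted into triples (day, i, j) with zero-based day and vertex indices, and that the edge weight is the number of recorded contacts on that day. Neither the input format nor the weighting is specified above, and the function name `daily_adjacency` is hypothetical.

```python
import numpy as np

def daily_adjacency(contacts, n, n_days=5):
    """Aggregate contact records into one symmetric adjacency matrix per day.

    contacts : iterable of (day, i, j) with 0 <= day < n_days and 0 <= i, j < n
    n        : number of students (vertices)
    """
    A = np.zeros((n_days, n, n))
    for day, i, j in contacts:
        # Face-to-face contacts are symmetric, so the graph is undirected.
        A[day, i, j] += 1
        A[day, j, i] += 1
    return A

# Toy records: two contacts between students 0 and 1 on day 0, one contact
# between students 1 and 2 on day 1.
contacts = [(0, 0, 1), (0, 0, 1), (1, 1, 2)]
A = daily_adjacency(contacts, n=3)
```

Students without any contact on a given day correspond to zero rows in that day's matrix, which has to be handled before the rows are normalized to obtain transition probabilities.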

Table 1 The entry \( (i, j) \) of the table shows how many students belonging to class i are assigned to class j by the spectral clustering algorithm

This demonstrates that the spectral clustering algorithm correctly identifies clusters in time-evolving networks. Interestingly, Gephi’s (Bastian et al. 2009) graph layout algorithm automatically places the Biology classes (2BIO1, 2BIO2, and 2BIO3) and Mathematics and Physics classes (MP*1, MP*2, and MP) in close proximity, which indicates that there is more interaction between students belonging to classes specializing in the same subject.

5 Conclusion

We showed how spectral clustering of undirected graphs is related to computing eigenfunctions of the Koopman operator pertaining to the associated random walk process. For such reversible processes, the operator is self-adjoint and the spectrum real-valued. If the process, however, is non-reversible or time-inhomogeneous, we in general obtain complex eigenvalues, which cannot be interpreted as relaxation time scales anymore. For such systems, the conventional concept of metastability no longer applies since metastable sets might now be time-dependent and move in state space. This leads to the definition of coherent sets, which have been extensively used to study fluid flows such as ocean or atmospheric dynamics. By defining transfer operators on graphs, it is possible to detect coherent sets in directed and time-evolving networks. These sets have the property that random walkers starting in such a cluster behave in a coherent way. Our approach can be regarded as a straightforward extension of the popular spectral clustering approach for undirected graphs. It follows the same steps, but replaces the standard random-walk Laplacian by a forward–backward counterpart. We illustrated that the generalized Laplacian leads to meaningful and interpretable clusters. Moreover, we showed how time-evolving benchmark graphs with intricate but intuitively accessible cluster structure can be constructed by discretizing time-inhomogeneous molecular dynamics or fluid dynamics problems. These graphs could, for instance, be used to compare various clustering algorithms for time-varying networks. Additionally, the probability leakage—i.e., the amount of information escaping the clusters—could be used as a quality measure. This would allow for a systematic comparison of different clustering techniques, which will be considered in future work.

The next steps include analyzing the properties of the forward–backward Laplacian in detail and applying these methods to complex time-evolving graphs such as social networks in order to identify, for example, groups of users sharing similar behavioral patterns. Finding large-scale clusters in complex networks is often challenging and might necessitate fine-tuning and optimizing the proposed approach. Furthermore, it would also require efficient, reliable, and ideally easily parallelizable implementations of the algorithms. Another open question is how the chosen lag time \( \tau \) affects the detected coherent sets. The current (data-driven) spectral clustering method for time-evolving graphs takes only the start and end points of the trajectories into account. If the dynamics are smooth, this is often sufficient. More complex problems might benefit from multi-view CCA (Shawe-Taylor and Cristianini 2004), which would take multiple time steps into account. Alternatively, the time-averaging approach using forward–backward diffusion matrices proposed in Banisch and Koltai (2017) to construct space–time diffusion maps could be applied in our setting as well. These topics will be the subject of future research.