Recurrent segmentation meets block models in temporal networks

A popular approach to model interactions is to represent them as a network with nodes being the agents and the interactions being the edges. Interactions are often timestamped, which leads to having timestamped edges. Many real-world temporal networks have a recurrent or possibly cyclic behaviour. For example, social network activity may be heightened during certain hours of the day. In this paper, our main interest is to model recurrent activity in such temporal networks. As a starting point we use the stochastic block model, a popular choice for modelling static networks, where nodes are split into $R$ groups. We extend this model to temporal networks by modelling the edges with a Poisson process. We make the parameters of the process dependent on time by segmenting the time line into $K$ segments. To enforce the recurring activity we require that only $H<K$ different sets of parameters can be used, that is, several, not necessarily consecutive, segments must share their parameters. We prove that searching for optimal blocks and segmentation is an NP-hard problem. Consequently, we split the problem into three subproblems where we optimize blocks, model parameters, and segmentation in turn while keeping the remaining structures fixed. We propose an iterative algorithm that requires $O(KHm + Rn + R^2H)$ time per iteration, where $n$ and $m$ are the number of nodes and edges in the network. We demonstrate experimentally that the number of required iterations is typically low, that the algorithm is able to discover the ground truth from synthetic datasets, and that certain real-world networks exhibit recurrent behaviour as the likelihood does not deteriorate when $H$ is lowered.


Introduction
A popular approach to model interactions between a set of agents is to represent them as a network with nodes being the agents and the interactions being the edges. Naturally, many interactions in real-world datasets have a timestamp, in which case the edges in the networks also have timestamps. Consequently, developing methodology for temporal networks has gained attention in the data mining literature.
Many temporal phenomena have recurrent or possibly cyclic behaviour. For example, social network activity may be heightened during certain hours of the day. Our main interest is to model recurrent activity in temporal networks. As a starting point we use the stochastic block model, a popular choice for modelling static networks. We can immediately extend this model to temporal networks, for example, by modelling the edges with a Poisson process. Furthermore, Corneli et al. [6] modelled the network by also segmenting the timeline and modelled each segment with a separate Poisson process.
To model the recurrent activity we can either model it explicitly, for example, by modelling explicitly cyclic activity, or we can use a more flexible approach where we look for a segmentation but restrict the number of distinct parameters. Such a notion was proposed by Gionis and Mannila [10] in the context of segmenting sequences of real-valued vectors.
In this paper we extend the model proposed by Corneli et al. [6] using the ideas proposed by Gionis and Mannila [10]. More formally, we consider the following problem: given a temporal graph with n nodes and m edges, we are looking to partition the nodes into R groups and segment the timeline into K segments that are grouped into H levels. Note that a single level may contain non-consecutive segments. An edge e = (u, v) is then modelled with a Poisson process with a parameter λ_{ijh}, where i and j are the groups of u and v, and h is the level of the segment containing e.
To obtain good solutions we rely on an iterative method, splitting the problem into three subproblems: (i) optimize blocks while keeping the remaining parameters fixed, (ii) optimize model parameters Λ while keeping the blocks and the segmentation fixed, (iii) optimize the segmentation while keeping the remaining parameters fixed. We approach the first subproblem by iteratively optimizing the block assignment of each node while keeping the remaining nodes fixed. We show that such a single round can be done in O(m + Rn + R^2 H + K) time, where n is the number of nodes and m is the number of edges. Fortunately, the second subproblem is trivial since there is an analytic solution for the optimal parameters, and we can obtain the solution in O(m + R^2 H + K) time. Finally, we show that we can find the optimal segmentation with a dynamic program. Using a stock dynamic program leads to a computational complexity of O(m^2 KH). Fortunately, we show that we can speed up the computation by using the SMAWK algorithm [2], leading to a computational complexity of O(mKH + HR^2).
In summary, we extend the model by Corneli et al. [6] to have recurring segments. We prove that the main problem is NP-hard, as well as several related optimization problems where we fix a subset of parameters. Navigating around these NP-hard problems, we propose an iterative algorithm where a single iteration requires O(KHm + Rn + R^2 H) time, linear in the number of edges and nodes.
The rest of the paper is organized as follows. First we introduce preliminary notation, the model, and the optimization problem in Section 2. We then proceed to describe the iterative algorithm in Section 3. We present the related work in Section 4. Finally, we present our experiments in Section 5 and conclude the paper with discussion in Section 6. The proofs are provided in Appendix 1.

Preliminary notation and problem definition
Assume a temporal graph G = (V, E), where V is a set of nodes and E is a set of edges, where each edge is a tuple (u, v, t) with u, v ∈ V and t being the timestamp. We will use n = |V| to denote the number of nodes and m = |E| the number of edges. For simplicity, we assume that we do not have self-loops, though the models can be adjusted for such a case. We write t(e) to mean the timestamp of the edge e. We also write N(u) to denote all the edges adjacent to a node u ∈ V.
Perhaps the simplest way to model a graph (with no temporal information) is with the Erdős–Rényi model, where each edge is sampled independently from a Bernoulli probability parameterized with q. Let us consider two natural extensions of this model. The first extension is a block model, where nodes are divided into k blocks, and an edge (u, v) is modelled with a Bernoulli probability parameterized with q_{ij}, where i is the block of u and j is the block of v. Given a graph, the optimization problem is to cluster the nodes into blocks so that the likelihood of the model is optimized. For the sake of variety we will use the words block and group interchangeably.
A convenient way of modelling events in temporal data is using a Poisson process: Assume that you have observed c events with timestamps t_1, ..., t_c in a time interval T of length ∆. The log-likelihood of observing these events at these exact times is equal to c log λ − λ∆, where λ is a model parameter. Note that the log-likelihood does not depend on the individual timestamps.
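As a small illustration, the log-likelihood above depends only on the event count and the interval length; a minimal sketch in Python (our own helper, not the paper's code):

```python
import math

def poisson_loglik(c, lam, delta):
    """Log-likelihood c*log(lam) - lam*delta of observing c events in an
    interval of length delta under a Poisson process with rate lam.
    With the convention 0*log(0) = 0 when no events occur."""
    if lam == 0:
        return 0.0 if c == 0 else -math.inf
    return c * math.log(lam) - lam * delta

# Only the count matters, not where in the interval the events fall:
ll = poisson_loglik(3, 1.5, 2.0)
```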
If we were to extend the block model to temporal networks, the log-likelihood of c edges occurring between the nodes u and v in a time interval would be equal to c log λ_{ij} − λ_{ij}∆, where λ_{ij} is the Poisson process parameter, i is the block of u, and j is the block of v. Note that λ_{ij} does not depend on time, so discovering optimal blocks is very similar to discovering blocks in a static model.
A natural extension of this model, proposed by Corneli et al. [6], is to make the parameters depend on time. Here, we partition the timeline into k segments and assign a different set of λs to each segment.
More formally, we define a time interval T to be a continuous interval either containing the starting point, T = [t_1, t_2], or excluding the starting point, T = (t_1, t_2]. In both cases, we define the duration as ∆(T) = t_2 − t_1. Given a time interval T, let us define c(u, v, T) to be the number of edges between u and v in T. The log-likelihood of the Poisson model for nodes u, v and a time interval T is ℓ(u, v, T; λ) = c(u, v, T) log λ − λ∆(T). We extend the log-likelihood to two sets of nodes U and W by writing ℓ(U, W, T; λ) = Σ_{{u,w} ∈ U × W} ℓ(u, w, T; λ), where U × W is the set of all node pairs {u, w} with u ∈ U, w ∈ W, and u ≠ w. We consider {u, w} and {w, u} the same, so only one of these pairs is visited.
Given a time interval D, a K-segmentation T = T_1, ..., T_K splits D into K consecutive intervals. For notational simplicity, we require that the segment boundaries t_i coincide with the timestamps of individual edges. We also assume that D covers the edges. If D is not specified, then it is set to be the smallest interval covering the edges.
Given a K-segmentation, a partition of nodes P = P_1, ..., P_R into R groups, and a set of KR(R + 1)/2 parameters Λ = {λ_{ijk}},^2 the log-likelihood is equal to ℓ(P, T, Λ) = Σ_{k=1}^{K} Σ_{i ≤ j} ℓ(P_i, P_j, T_k; λ_{ijk}). This leads immediately to the problem considered by Corneli et al. [6].
Problem 1 ((K, R) model). Given a temporal graph G, a time interval D, and integers R and K, find a node partition with R groups, a K-segmentation, and a set of parameters Λ so that ℓ(P, T, Λ) is maximized.
We should point out that for fixed P and T, the optimal Λ is equal to λ_{ijk} = c(P_i, P_j, T_k) / (|P_i × P_j| ∆(T_k)).
In this paper we consider an extension of the (K, R) model. Many temporal networks exhibit cyclic or repeating behaviour. Here, we allow the network to have K segments but we also limit the number of distinct parameters to be at most H ≤ K. In other words, we force certain segments to share their parameters. We do not know beforehand which segments should share the parameters.
We can express this constraint more formally by introducing a mapping g : [K] → [H] that maps a segment index to its matching parameters. We can now define the likelihood as follows: given a K-segmentation, a partition of nodes P = P_1, ..., P_R into R groups, a mapping g : [K] → [H], and a set of HR(R + 1)/2 parameters Λ = {λ_{ijh}}, the log-likelihood is equal to ℓ(P, T, g, Λ) = Σ_{k=1}^{K} Σ_{i ≤ j} ℓ(P_i, P_j, T_k; λ_{ij g(k)}).
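A direct, unoptimized evaluation of this likelihood can be sketched as follows. The data layout (edge triples, a node-to-group dict, segment boundaries as a sorted list) is our own choice for illustration, not the paper's:

```python
import bisect
import math
from collections import Counter

def log_likelihood(edges, part, bounds, g, lam):
    """Level-mapped log-likelihood: for each segment k and group pair i <= j,
    add c * log(rate) - rate * (#pairs) * duration with rate = lam[(i, j, g[k])].
    edges: (u, v, t) triples; part: node -> group in 0..R-1;
    bounds: t_0 < ... < t_K, segment k covering (t_k, t_{k+1}];
    g: segment index -> level; lam: rates keyed by (i, j, h) with i <= j,
    assumed positive wherever edges occur."""
    R = max(part.values()) + 1
    K = len(bounds) - 1
    sizes = Counter(part.values())

    def npairs(i, j):  # unordered node pairs between groups i and j
        return sizes[i] * sizes[j] if i < j else sizes[i] * (sizes[i] - 1) // 2

    counts = Counter()
    for u, v, t in edges:
        i, j = sorted((part[u], part[v]))
        k = bisect.bisect_left(bounds, t, lo=1) - 1  # first segment is closed
        counts[(i, j, k)] += 1

    ll = 0.0
    for k in range(K):
        dur = bounds[k + 1] - bounds[k]
        for i in range(R):
            for j in range(i, R):
                rate = lam[(i, j, g[k])]
                c = counts[(i, j, k)]
                ll += (c * math.log(rate) if c else 0.0) - rate * npairs(i, j) * dur
    return ll
```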
We will refer to g as the level mapping. This leads to the following optimization problem.
Problem 2 ((K, H, R) model). Given a temporal graph G, a time interval D, and integers R, H, and K, find a node partition with R groups, a K-segmentation, a level mapping g : [K] → [H], and parameters Λ maximizing ℓ(P, T, g, Λ).
2 For notational simplicity we will equate λ_{ijh} and λ_{jih}.
Algorithm 1: Main loop of the algorithm


Fast algorithm for obtaining a good model
In this section we will introduce an iterative, fast approach for obtaining a good model. The computational complexity of one iteration is O(KHm + Rn + R^2 H), which is linear in both the nodes and edges.

Iterative approach
Unfortunately, finding an optimal solution for our problem is NP-hard.
Proposition 1. Problem 2 is NP-hard.

Consequently, we resort to a natural heuristic approach, where we optimize certain parameters while keeping the remaining parameters fixed. We split the original problem into three subproblems as shown in Algorithm 1. First, we find good groups, then update Λ, and then optimize the segmentation, followed by yet another update of Λ.
When initializing, we select the groups P and the parameters Λ randomly, then proceed to find the optimal segmentation, followed by optimizing Λ.
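The loop just described can be sketched as follows; this is a reconstruction from the surrounding text (the listing of Algorithm 1 is not reproduced here), and the stopping criterion is an assumption:

```
Input: temporal graph G, interval D, integers R, K, H
P ← random partition into R groups; Λ ← random rates
T, g ← FindSegments(P, Λ)
Λ ← UpdateLambda(P, T, g)
repeat
    P ← FindGroups(Λ, T, g)
    Λ ← UpdateLambda(P, T, g)
    T, g ← FindSegments(P, Λ)
    Λ ← UpdateLambda(P, T, g)
until ℓ(P, T, g, Λ) no longer improves
```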
Next we will explain each step in detail.

Finding groups
Our first step is to update the groups P while keeping the remaining parameters fixed. Unfortunately, finding the optimal solution for this problem is NP-hard.
Proposition 2. Finding the optimal partition P for fixed Λ, T and g is NP-hard, even for K = H = 1 and R = 2.

Due to the previous proposition, we perform a simple greedy optimization where each node is individually reassigned to the optimal group while keeping the remaining nodes fixed.
We should point out that there are more sophisticated approaches, for example based on SDP relaxations, see a survey by Abbe [1].However, we resort to a simple greedy optimization due to its speed.
Algorithm 2: Algorithm FindGroups(P, Λ) for finding groups for a fixed segmentation T, level mapping g, and parameters Λ

A naive implementation of computing the log-likelihood gain for a single node may require Θ(m) steps, which would lead to Θ(nm) time as we need to test every node. Luckily, we can speed up the computation using the following straightforward proposition.

Proposition 3. Let P be the partition of nodes, Λ the set of parameters, and T and g the segmentation and the level mapping. Let S_h = {T_k ∈ T | h = g(k)} be the segments using the hth level.
Let u be a node, and let P_b be the set such that u ∈ P_b. Select P_a, and let P′ be the partition where u has been moved from P_b to P_a. Then ℓ(P′, T, g, Λ) − ℓ(P, T, g, Λ) = Z + Σ_{h=1}^{H} Σ_{j=1}^{R} ( c_{jh} log λ_{ajh} − λ_{ajh} t_h |P′_j \ {u}| ), where Z is a constant not depending on a, t_h = ∆(S_h) is the total duration of the segments using the hth level, and c_{jh} = c(u, P_j, S_h) is the number of edges between u and P_j in the segments using the hth level.
The proposition leads to the pseudo-code given in Algorithm 2. The algorithm computes an array c and then uses Proposition 3 to compute the gain for each swap, and consequently finds the optimal gain.
Computing the array requires iterating over the adjacent edges, leading to O(|N(v)|) time, and computing the gains requires O(R^2 H) time. Consequently, the computational complexity of FindGroups is O(m + R^2 Hn + K).
The running time can be further optimized by modifying Line 8. There are at most 2m non-zero c[i, j] entries (across all v ∈ V); consequently, we can speed up the computation of the second term by ignoring the zero entries in c[i, j]. In addition, for each a, the remaining terms can be precomputed in O(RH) time and maintained in O(1) time. This leads to a running time of O(m + Rn + R^2 H + K).
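A single greedy reassignment step based on Proposition 3 might look as follows. This is an unoptimized sketch (it recomputes group sizes and iterates over all zero entries), not the paper's implementation; the data layout is our own assumption:

```python
import math
from collections import defaultdict

def reassign_node(u, part, lam, t_h, neigh_counts, R, H):
    """Move node u to the group a maximizing the likelihood terms that
    depend on u's assignment (the gain of Proposition 3, up to a constant).
    neigh_counts[(j, h)]: edges between u and group j within level-h segments;
    t_h[h]: total duration of the level-h segments;
    lam: rates keyed by (i, j, h) with i <= j."""
    sizes = defaultdict(int)  # group sizes excluding u
    for v, grp in part.items():
        if v != u:
            sizes[grp] += 1
    best, best_a = -math.inf, part[u]
    for a in range(R):
        score = 0.0
        for j in range(R):
            for h in range(H):
                rate = lam[tuple(sorted((a, j))) + (h,)]
                c = neigh_counts.get((j, h), 0)
                score += (c * math.log(rate) if c else 0.0) - rate * t_h[h] * sizes[j]
        if score > best:
            best, best_a = score, a
    part[u] = best_a
    return best_a
```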

Updating Poisson process parameters
Our next step is to update Λ while keeping the rest of the parameters fixed. This refers to UpdateLambda in Algorithm 1. Fortunately, this step is straightforward as the optimal parameters are equal to λ_{ijh} = c(P_i, P_j, S_h) / (|P_i × P_j| ∆(S_h)), where S_h = {T_k ∈ T | g(k) = h} are the segments using the hth level. Updating the parameters requires O(m + R^2 H + K) time. In practice, we would like to avoid having λ = 0 as this forbids any edges occurring in the segment, and we may get stuck in a local maximum. We approach this by shifting λ slightly towards a baseline value, controlled by user parameters θ and η.
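A sketch of this update under the data layout used earlier. The closed-form rate is the standard Poisson maximum-likelihood estimate; the final shift toward θ is only one plausible form of the smoothing, as the paper's exact formula is not reproduced here:

```python
import bisect
from collections import Counter

def update_lambda(edges, part, bounds, g, R, H, theta=0.0, eta=0.0):
    """Closed-form rate update: for groups i <= j and level h,
    rate = (edge count) / (#pairs * total duration of the level's segments),
    then shifted slightly toward theta (smoothing form is an assumption)."""
    sizes = Counter(part.values())

    def npairs(i, j):
        return sizes[i] * sizes[j] if i < j else sizes[i] * (sizes[i] - 1) // 2

    K = len(bounds) - 1
    t_h = [0.0] * H  # total duration of each level's segments
    for k in range(K):
        t_h[g[k]] += bounds[k + 1] - bounds[k]
    counts = Counter()
    for u, v, t in edges:
        i, j = sorted((part[u], part[v]))
        k = bisect.bisect_left(bounds, t, lo=1) - 1
        counts[(i, j, g[k])] += 1
    lam = {}
    for i in range(R):
        for j in range(i, R):
            for h in range(H):
                denom = npairs(i, j) * t_h[h]
                mle = counts[(i, j, h)] / denom if denom > 0 else 0.0
                lam[(i, j, h)] = (1 - eta) * mle + eta * theta
    return lam
```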

Finding segmentation
Our final step is to update the segmentation T and the level mapping g, while keeping Λ and P fixed. Luckily, we can solve this subproblem in linear time.
Note that we need to keep Λ fixed, as otherwise the problem is NP-hard.
Proposition 4. Finding optimal Λ, T and g for fixed P is NP-hard.
On the other hand, if we fix Λ, then we can solve the optimization problem with a dynamic program. To be more specific, assume that the edges in E are ordered, and write o[e, k] for the log-likelihood of the optimal k-segmentation covering the edges prior to and including e. Given two edges s, e ∈ E, let y(s, e; h) be the log-likelihood of a segment (t(s), t(e)] using the hth level of parameters, λ_{••h}. If s occurs after e we set y to −∞. Then the identity o[e, k] = max_{s, h} o[s, k − 1] + y(s, e; h) holds. Using an off-the-shelf approach by Bellman [5] leads to a computational complexity of O(m^2 KH), assuming that we can evaluate y(s, e; h) in constant time.
However, we can speed up the dynamic program by using the SMAWK algorithm [2]. Given a function x(i, j), where i, j = 1, ..., m, SMAWK computes z(j) = arg max_i x(i, j) in O(m) time, under two assumptions. The first assumption is that we can evaluate x in constant time. The second assumption is that x is totally monotone: for any i_1 < i_2 and j_1 < j_2, if x(i_2, j_1) > x(i_1, j_1), then x(i_2, j_2) ≥ x(i_1, j_2).
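SMAWK itself is intricate; as a simpler illustration of how monotonicity is exploited, the following divide-and-conquer routine (not SMAWK, and not from the paper) computes column maxima of a matrix whose argmax row is monotone in j, a property implied by total monotonicity. It needs O((rows + cols) log cols) evaluations, already far better than brute force, though weaker than SMAWK's linear bound:

```python
def monotone_argmax(x, n_rows, n_cols):
    """Return z with z[j] an argmax_i of x(i, j), assuming some argmax row
    is nondecreasing in j. Divide and conquer: solve the middle column,
    then recurse on the two halves with narrowed row ranges."""
    z = [0] * n_cols

    def solve(jlo, jhi, ilo, ihi):
        if jlo > jhi:
            return
        jm = (jlo + jhi) // 2
        best_i = max(range(ilo, ihi + 1), key=lambda i: x(i, jm))
        z[jm] = best_i
        solve(jlo, jm - 1, ilo, best_i)   # left half: rows at most best_i
        solve(jm + 1, jhi, best_i, ihi)   # right half: rows at least best_i
    solve(0, n_cols - 1, 0, n_rows - 1)
    return z
```

For instance, x(i, j) = −(i − j)² satisfies the inverse Monge condition, hence total monotonicity, and its column argmax is simply i = j clamped to the row range.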
We have the immediate proposition.
Our last step is to evaluate x in constant time. This can be done by first precomputing f[e, h], the log-likelihood of a segment starting from the epoch and ending at t(e) using the hth level. The log-likelihood of a segment is then y(s, e; h) = f[e, h] − f[s, h], which we can compute in constant time.
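The precomputation can be sketched as follows, assuming edges sorted by timestamp and the data layout used earlier (our own choice for illustration):

```python
import math
from collections import Counter

def prefix_scores(edges, part, lam, R, H, t0):
    """Precompute f[e][h]: log-likelihood of a segment starting at the epoch
    t0 and ending at the e-th edge's timestamp, under level h. The score of
    any segment (t(s), t(e)] is then f[e][h] - f[s][h], evaluated in O(1)."""
    sizes = Counter(part.values())
    # sum over i <= j of rate * (#node pairs between groups i and j)
    total_rate = [0.0] * H
    for h in range(H):
        for i in range(R):
            for j in range(i, R):
                np_ = sizes[i] * sizes[j] if i < j else sizes[i] * (sizes[i] - 1) // 2
                total_rate[h] += lam[(i, j, h)] * np_
    f, run = [], [0.0] * H  # run[h]: running sum of log-rate terms
    for u, v, t in edges:
        i, j = sorted((part[u], part[v]))
        for h in range(H):
            run[h] += math.log(lam[(i, j, h)])
        f.append([run[h] - total_rate[h] * (t - t0) for h in range(H)])
    return f

def segment_score(f, s, e, h):
    # log-likelihood of the segment (t(s), t(e)] at level h; s, e edge indices
    return f[e][h] - f[s][h]
```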
Algorithm 3: Algorithm FindSegments(P, Λ) for finding the optimal segmentation for fixed groups P and parameters Λ

In the inner loop we use SMAWK to find the optimal starting points. Note that we have to do this for each h, and only then select the optimal h for each segment. Note that we define x on Line 5 but do not compute its values. Instead, this function is given to SMAWK and is evaluated in a lazy fashion.
Once we have constructed the arrays, we can recursively recover the optimal segmentation and the level mapping from q and r, respectively.
FindSegments runs in O(mKH + HR^2) time since we need to call SMAWK O(HK) times.
We were able to use SMAWK because the optimization criterion turned out to be totally monotone. This was possible only because we fixed Λ. The notion of using SMAWK to speed up a dynamic program with totally monotone scores was proposed by Galil and Park [9]. Fleischer et al. [7] and Hassin and Tamir [14] used this approach to solve a dynamic program segmenting monotonic one-dimensional sequences with L1 cost.
We fixed Λ because Proposition 4 states that the optimization problem for H < K cannot be solved in polynomial time (unless P = NP) if we optimize T, g, and Λ at the same time. Proposition 4 is the main reason why we cannot directly use the ideas proposed by Corneli et al. [6], as the authors use the dynamic program to find T and Λ at the same time.
If K = H, on the other hand, the problem is solvable with a dynamic program, but this requires O(Km^2 R^2) time. However, if we consider the optimization problem as a minimization problem and shift the cost by a constant so that it is always positive, then using the algorithms by Guha et al. [11] and Tatti [25] we can obtain a (1 + ε)-approximation with O(K^3 log K log m + K^3 ε^{-2} log m) cost evaluations. Finding the optimal parameters and computing the cost of a single segment can be done in O(R^2) time with O(R^2 + m) time for precomputing. This leads to a total time of O(R^2 (K^3 log K log m + K^3 ε^{-2} log m) + m) for the special case of K = H.

Related work
The closest related work is the paper by Corneli et al. [6], which can be viewed as a special case of our approach obtained by requiring K = H; in other words, while the Poisson process may depend on time, we do not take into account any recurrent behaviour. Having K = H simplifies the optimization problem somewhat. While the general problem still remains difficult, we can now solve the segmentation T and the parameters Λ simultaneously using a dynamic program, as was done by Corneli et al. [6]. In our problem we are forced to fix Λ while solving the segmentation problem. Interestingly enough, this gives us an advantage in computational time: we only need O(KHm + HR^2) time to find the optimal segmentation, while optimizing T and Λ simultaneously requires O(R^2 Km^2) time. On the other hand, by fixing Λ we may have a higher chance of getting stuck in a local maximum.
The other closely related work is by Gionis and Mannila [10], where the authors propose a segmentation with shared centroids. Here, the input is a sequence of real-valued vectors and the segmentation cost is either the L2 or the L1 distance. Note that there is no notion of groups P; the authors are only interested in finding a segmentation with recurrent sources. The authors propose several approximation algorithms as well as an iterative method. The approximation algorithms rely specifically on the underlying cost, in this case the L1 or L2 distance, and cannot be used in our case. Interestingly enough, the proposed iterative method did not use the SMAWK optimization, so it is possible to use the optimization described in Section 3 to speed up the iterative method proposed by Gionis and Mannila [10].
In this paper, we used the stochastic block model (see [3,16], for example) as a starting point and extended it to temporal networks with recurrent sources. Several past works have extended stochastic block models to temporal networks: Matias and Miele [20] and Yang et al. [28] proposed approaches where the nodes can change block memberships over time. In a similar fashion, Xu and Hero [26] proposed a model where the adjacency matrix snapshots are generated with a logistic function whose latent parameters evolve over time. The main difference to our approach is that in these models the group memberships of the nodes change, while in our case we keep the memberships constant and update the probabilities of the nodes. Moreover, these methods are based on graph snapshots while we work with temporal edges. In another related work, Matias et al. [21] modelled interactions using Poisson processes conditioned on a stochastic block model. Their approach was to estimate the intensities non-parametrically through histograms or kernels, while we model the intensities with recurring segments. For a survey on stochastic block models, including extensions to temporal settings, we refer the reader to Lee and Wilkinson [18].
Stochastic block models group similar nodes together; here similarity means that nodes in the same group have similar probabilities of connecting to nodes from other groups. A similar notion but a different optimization criterion was proposed by Arockiasamy et al. [4]. Moreover, Henderson et al. [15] proposed a method that discovers nodes with similar neighborhoods.
In this paper we modelled the recurrency by forcing segments to share their parameters. An alternative approach to discover recurrency is to look explicitly for recurrent patterns [8,12,13,19,22,27]. We should point out that these works are not designed to work with graphs; instead they work with event sequences. We leave adapting this methodology to temporal networks as an interesting future line of work.
Using segmentation to find evolving structures in networks has been proposed in the past: Kostakis et al. [17] introduced a method where a temporal network is segmented into k segments with h < k summaries. A summary is a graph, and the cost of an individual segment is the difference between the summary and the snapshots in the segment. Moreover, Rozenshtein et al. [24] proposed discovering dense subgraphs in individual segments.

Experimental evaluation
The goal of this section is to evaluate our algorithm experimentally. Towards that end, we first test how well the algorithm discovers the ground truth using synthetic datasets. Next we study the performance of the algorithm on real-world temporal datasets in terms of running time and likelihood. We compare our results to the following baselines: the running times are compared to a naive implementation that does not utilize the SMAWK algorithm, and the likelihoods are compared to the likelihoods of the (K, R) model.
We implemented the algorithm in Python 3 and performed the experiments using a 2.4GHz Intel Core i5 processor and 16GB RAM.
Synthetic datasets: To test our algorithm, we generated 5 temporal networks with known groups and known parameters Λ, which we use as a ground truth. To generate data, we first chose a set of nodes V, the number of groups R, the number of segments K, and the number of levels H. Next we assumed that each node has an equal probability of being chosen for any group; based on this assumption, the group memberships were selected at random. We then randomly generated Λ from a uniform distribution. More specifically, we generated H distinct values for each pair of groups and mapped them to the segments. Note that we need to ensure that each distinct level is assigned to at least one segment. To guarantee this, we first deterministically assigned the set of H levels to the first H segments, and the remaining (K − H) segments were mapped by randomly selecting (K − H) elements from the set of H levels.

3 The source code is available at https://version.helsinki.fi/chamwick/recurrent-segmentation-sbm.git

Table 1: Dataset characteristics and results from the experiments. Here, n is the number of nodes, m is the number of edges, R is the number of groups, K is the number of segments, H is the number of levels, LL1 is the normalized log-likelihood for the ground truth, G is the Rand index, LL2 is the discovered normalized log-likelihood, I is the number of iterations, and CT is the computational time in seconds.
Given the group memberships and their related Λ, we then generated a sequence of timestamps with a Poisson process for each pair of nodes.The sizes of all synthetic datasets are given in Table 1.
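Timestamps for one node pair can be drawn with the standard exponential-gap construction of a homogeneous Poisson process; the paper does not specify its generator at this level of detail, so the following is only a plausible sketch:

```python
import random

def sample_poisson_times(lam, t0, t1, rng=random):
    """Sample event timestamps of a homogeneous Poisson process with rate
    lam on (t0, t1]: inter-arrival gaps are exponential with mean 1/lam."""
    times, t = [], t0
    while True:
        t += rng.expovariate(lam)
        if t > t1:
            return times
        times.append(t)
```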
Real-world datasets: We used 7 publicly available temporal datasets. Email-Eu-1 and Email-Eu-2 are collaboration networks between researchers at a European research institution.4 Math Overflow contains user interactions on the Math Overflow web site while answering questions.4 CollegeMsg is an online message network at the University of California, Irvine.4 MOOC contains actions by users of a popular MOOC platform.4 Bitcoin contains member rating interactions in a bitcoin trading platform.4 Santander contains station-to-station links that occurred on Sep 9, 2015 in the Santander bike hires in London.5 The sizes of these networks are given in Table 1.

Results for synthetic datasets: To evaluate the accuracy of our algorithm, we compare the set of discovered groups and their intensity functions with the ground-truth groups and intensity functions. Our algorithm found the exact groups of nodes: in Table 1 we can see that the Rand index6 (column G) is equal to 1.
Next we compare the log-likelihood values of the true models against the log-likelihoods of the discovered models. To evaluate the log-likelihoods, we normalize them, that is, we computed ℓ(P, T, g, Λ) / ℓ(P′, T′, g′, Λ′), where P′, T′, g′, Λ′ is a model with a single group and a single segment. Since all our log-likelihood values were negative, the normalized log-likelihood values were between 0 and 1, and smaller values are better.
As demonstrated in columns LL1 and LL2 of Table 1, we obtained similar normalized log-likelihood values when compared to the normalized log-likelihood of the ground truth. The obtained normalized log-likelihood values were all slightly better than the log-likelihoods of the generated models; that is, our solution is as good as the ground truth.
An example of the discovered parameters, λ_{11} and λ_{12}, for the Synthetic-4 dataset is shown in Figure 1. The discovered parameters matched the generated parameters closely, with the biggest absolute difference being 0.002 for Synthetic-4. The figures for other values and other synthetic datasets are similar.
Computational time: Next we consider the computational time of our algorithm. We varied the parameters R, K, and H for each dataset. The model parameters and computational times are given in Table 1. From the last column, CT, we see that the running times are reasonable despite using inefficient Python libraries: for example, we were able to compute the model for the MOOC dataset, with over 400 000 edges, in under four minutes. This implies that the algorithm scales well for large networks. This is further supported by a low number of iterations, column I in Table 1.
Next we study the computational time as a function of the number of edges m. We first prepared 4 datasets with different numbers of edges from a real-world dataset, Santander-large. To vary the number of edges, we uniformly sampled edges without replacement, using fractions of 0.4, 0.6, 0.8, and 1 of the edges.
Next we created 4 different Synthetic-large datasets with 30 nodes and 3 segments with unique λ values, but with different numbers of edges. To do that, we gradually increased the number of Poisson samples we generated for each segment.
From the results in Figure 2 we see that, generally, the computational time increases as |E| increases. For instance, a set of 17 072 edges takes 18.46s whereas a set of 34 143 edges takes 36.36s for Santander-large. Thus a linear trend w.r.t. |E| is evident in this experiment.
To emphasize the importance of SMAWK, we replaced it with a stock solver for the dynamic program and repeated the experiment. We observe in Figure 2 that the computational time increases drastically when the stock dynamic program is used. For example, a set of 34 143 edges required 3.7h for the Santander-large dataset but only 36.36s when SMAWK is used.
Likelihood vs. number of levels: Our next experiment studies how the normalized log-likelihood behaves for different choices of H. We conducted this experiment for K = 20 and varied the number of levels from H = 1 to H = 20. The results for the Santander, Bitcoin, Synthetic-5, and Email-Eu-1 datasets are shown in Figure 3. From the results we see that, generally, the normalized log-likelihood decreases as H increases. This is due to the fact that with more levels there is a higher degree of freedom for optimizing the likelihood. Note that if H = K, then our model corresponds to the model studied by Corneli et al. [6]. Interestingly enough, the log-likelihood values plateau for values of H ≪ K, suggesting the existence of recurring segments in the displayed datasets.

Concluding remarks
In this paper we introduced the problem of finding recurrent sources in temporal networks: we introduced a stochastic block model with recurrent segments.
To find good solutions we introduced an iterative algorithm considering three subproblems, where we optimize blocks, model parameters, and segmentation in turn while keeping the remaining structures fixed. We demonstrated how each subproblem can be optimized in O(m) time; here, the key step is using the SMAWK algorithm for solving the segmentation. This leads to a computational complexity of O(KHm + Rn + R^2 H) for a single iteration. We showed experimentally that the number of iterations is low, and that the algorithm can find the ground truth in synthetic datasets.
The paper introduces several interesting directions: Gionis and Mannila [10] considered several approximation algorithms, but they cannot be applied directly to our problem because our optimization function is different. Adapting these algorithms in order to obtain an approximation guarantee is an interesting challenge. We used a simple heuristic to optimize the groups; we chose this approach due to its computational complexity. Experimenting with more sophisticated but slower methods for discovering block models, such as the methods discussed in [1], provides a fruitful line of future work.


Appendix

edges are excluded) in the (i, j)th block of P. Note that k_{12} + k_{11} + k_{22} = n and m_{12} + m_{11} + m_{22} = 2m.
The parameters for P are , and .
Let us define P′ = {X ∪ U, Y ∪ V}. Note that the parameters for P′ are equal to .
To prove the claim we will assume that a_1b_1 + a_2b_2 > 0 or k_{12} < n, and show that ℓ(P′) > ℓ(P), which is a contradiction.
First, note that we can write the score difference as a sum of three terms, A + B + C. We claim that A ≥ 0, B > α2^{−7}, and C ≥ 2m log(1/(4αnr^2)). This proves the lemma. We will first bound C. Since we may have at most αn edges per node pair, we have λ_{11}, λ_{12}, λ_{22} ≤ αn. Moreover, since r > n, we have λ′_{11} ≥ (r + n)^{−2} ≥ r^{−2}/4. The bound follows from Eq. 1.
Next we will bound B. Assume that x_1, x_2 ≥ 2n. Our next step is to upper bound λ_{11} and λ_{22}. In order to do this, first note that since m_{11} ≤ α/2 and k_{11} ≤ y_1, we have In addition, since x_1 ≥ 3, we have We can combine the two bounds, leading to The same bound holds for λ_{22}.
Since r ≥ 4n, we have λ′_{12} ≥ 17α/25. Thus, Finally we will bound A by showing that λ_{12} ≤ λ′_{12}. Assume for simplicity that Assume that z_1, z_2 ≥ r. We claim that N ≤ r^2 + (z_1 − r)(z_2 − r), which leads to Here the first inequality holds since m_{12} ≤ 2m ≤ α. The right-hand side achieves its maximum when z_1z_2 is maximized, that is, z_1 = z_2 = n + r. In such a case, the upper bound is equal to λ′_{12}.
To prove the claim, first note that with the equality holding if and only if Assume a_1b_1 + a_2b_2 = 0. Then k_{12} < n, and the inequality is strict in Eq. 2. Consequently, Assume now that x_1 ≤ x_2 − 1. Since a node in (X ∪ Y) ∩ P_1 is connected to r nodes, we must have a_1b_2 + a_2b_1 ≤ x_1r. Due to the assumption, x_1 ≤ r − 1 and x_2 ≥ r + 1, which leads to and As a final case assume that z_1 < r. If x_1 = z_1, then and again The case for z_2 < r is symmetrical.
Proof (Proof of Proposition 2). To prove NP-hardness we reduce from the Max-Cut problem, where we are asked to partition a graph into 2 subgraphs, maximizing the number of cross-edges.
Assume that we are given a static graph $H$. We use $H$ as our temporal graph $G$ by assigning every edge the same timestamp, say $t$. We set $H = K = 1$, use $R = 2$ groups, and set the segmentation $\mathcal{T} = [t, t]$.
Let $P_1, P_2$ be a partition of the nodes and let $x$ be the number of inner edges, that is, edges $(u, v, t)$ with $u, v \in P_1$ or $u, v \in P_2$. Note that $m - x$ is the number of cross edges.
The log-likelihood is then equal to , which is maximized when $m - x$ is maximized since $\beta > \alpha$. Since $m - x$ is the number of cross edges, this completes the proof.
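To make the reduction concrete, here is a minimal sketch (the function names and the toy graph are ours, not from the paper): every static edge receives one common timestamp, and for any 2-partition the quantity $m - x$ used in the proof is exactly the cross-edge count.

```python
# Sketch of the Max-Cut reduction: each static edge becomes a temporal
# edge with the same timestamp t, so the likelihood depends on a
# 2-partition only through the number of cross edges (m - x).
# Names (make_temporal, count_edges) are illustrative.

def make_temporal(static_edges, t=0):
    """Turn each static edge (u, v) into a temporal edge (u, v, t)."""
    return [(u, v, t) for (u, v) in static_edges]

def count_edges(temporal_edges, part1):
    """Return (inner, cross) edge counts for the partition (part1, rest)."""
    inner = cross = 0
    for (u, v, _) in temporal_edges:
        if (u in part1) == (v in part1):
            inner += 1
        else:
            cross += 1
    return inner, cross

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # small example graph
temporal = make_temporal(edges, t=7)
inner, cross = count_edges(temporal, part1={0, 2})
# With part1 = {0, 2}, only edge (0, 2) is inner; the other four cross.
```

Maximizing `cross` over all 2-partitions is precisely Max-Cut, which gives the hardness.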
Proof (of Proposition 3). Let $Q$ be the partition obtained from $P$ by deleting $u$, that is, for any $\mathcal{T}$ and $\lambda$. Moreover, . Here we used the fact that $c(u, P_j, S_h) = c(u, P'_j, S_h)$. The claim follows by setting $Z = \ell(Q, \mathcal{T}, g, \Lambda) - \ell(P, \mathcal{T}, g, \Lambda)$.
Proof (of Proposition 4). Assume that we are given an instance of 3-Matching, that is, a domain $X$ of size $n$, where $n$ is divisible by 3, and a collection $\mathcal{S}$ of $m$ sets such that $S \subseteq X$ and $|S| = 3$ for each $S \in \mathcal{S}$. Deciding whether there is a disjoint subcollection of $\mathcal{S}$ covering $X$ is known to be NP-complete.
Let $\mathcal{T} = \{S \subseteq X \mid |S| = 3,\ S \notin \mathcal{S}\}$ be the complement collection of $\mathcal{S}$. For each $i \leq j \leq n$, define $c_{ij}$ to be the number of sets in $\mathcal{S}$ containing $i$ and $j$. To construct the dynamic graph $G$ we will use 5 sets of nodes, namely $\{u\}$, $A$, $B$, $C$, $D$. The first set consists of only one node $u$; every edge will be adjacent to $u$. The second set $A$ contains as many nodes as there are sets in $\mathcal{T}$. For each $i \in T_j$ with $T_j \in \mathcal{T}$, we add an edge $(u, a_j)$ at timestamp $i$. The third set $B$ contains $\sum_{i<j} c_{ij}$ nodes, which we divide further into $n(n-1)/2$ sets $B_{ij}$ with $|B_{ij}| = c_{ij}$. For each $i < j$ we connect the nodes in $B_{ij}$ with $u$ at timestamp $i$ and at timestamp $j$. The fourth set $C$ contains $\sum_i (m - c_{ii})$ nodes, which we divide further into $n$ sets $C_i$ with $|C_i| = m - c_{ii}$. For each $i \leq n$ we connect the nodes in $C_i$ with $u$ at timestamp $i$. The fifth set $D$ contains $nw$ nodes, where $w = 24n^6$, which we divide further into $n$ sets $D_i$ with $|D_i| = w$. For each $i \leq n$ we connect the nodes in $D_i$ with $u$ at timestamp $i$.
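The construction above can be sketched in a few lines of code (the helper name, node labels, and the toy instance are ours). A useful sanity check, which the sketch exhibits on a tiny instance, is that the sets $A$, $B$, and $C$ balance the edge counts so that every timestamp is adjacent to the same number of edges:

```python
from itertools import combinations

def build_gadget(n, S, w):
    """Sketch of the reduction gadget: temporal edges for domain {1..n}
    and a collection S of 3-element frozensets. Every edge touches 'u'.
    Node labels (tuples) are illustrative, not from the paper."""
    edges = []
    # Complement collection T: all 3-subsets of the domain not in S.
    T = [frozenset(c) for c in combinations(range(1, n + 1), 3)
         if frozenset(c) not in S]
    for j, Tj in enumerate(T):              # set A: one node a_j per set in T
        for i in Tj:
            edges.append(('u', ('a', j), i))
    c = {(i, jj): sum(1 for s in S if i in s and jj in s)
         for i in range(1, n + 1) for jj in range(i, n + 1)}
    for (i, jj), cij in c.items():          # set B: c_ij nodes per pair i < j
        if i == jj:
            continue
        for b in range(cij):
            edges.append(('u', ('b', i, jj, b), i))
            edges.append(('u', ('b', i, jj, b), jj))
    for i in range(1, n + 1):               # set C: m - c_ii nodes per element
        for k in range(len(S) - c[(i, i)]):
            edges.append(('u', ('c', i, k), i))
    for i in range(1, n + 1):               # set D: w nodes per element
        for k in range(w):
            edges.append(('u', ('d', i, k), i))
    return edges

# Tiny instance: n = 3, S = {{1,2,3}}, with a small w for illustration.
edges = build_gadget(3, {frozenset({1, 2, 3})}, w=2)
```

In this toy instance each of the timestamps 1, 2, 3 receives the same number of edges, so only how the large $D$-blocks are grouped drives the score, which is the lever the proof uses.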
We will set $K = n$ and $H = n/3$. We set $R$ to be the number of nodes and let $P$ be the partition where each node is contained in its own block. We require the segmentation to start from 0. Since there are only $n$ timestamps, the segmentation consists of $n$ segments of the form $(i-1, i]$ or $[0, 1]$.
Let $g$ be the optimal grouping of segments and let $\Lambda$ be its parameters. We first claim that $g$ groups the timestamps into groups of 3. To prove this, assume that there is a group of size $y = 1, 2$. Then there is another group of size $x \geq 6 - y$. Let $g'$ be a mapping where we move $3 - y$ timestamps from the larger group to the smaller group, and let $\Lambda'$ be the new optimal parameters.
The number of edges adjacent to $A$, $B$, and $C$ can be bounded by $3n^3 + 2n^2 m + nm \leq 6n^5$. Moreover, the non-zero parameters can be bounded by $\lambda' \geq 1/n$ and $\lambda \leq 1$. Consequently, the score difference can be bounded by $\ell(g') - \ell(g) \geq 6n^5(\log 1/n - \log 1) + Z(x) \geq -6n^6 + Z(x)$, where $Z(x)$ is equal to . The derivative of $Z(x)$ with respect to $x$ is equal to $\log(x/(x - 3 + y)) > 0$; that is, $Z(x)$ is smallest when $x = 6 - y$. A direct calculation shows that $Z(x)$ is smallest when $y = 2$, leading to $\ell(g') - \ell(g) \geq -6n^6 + w(6 \log \ldots) > -6n^6 + w/4 = 0$. In summary, $\ell(g') > \ell(g)$, which is a contradiction. Thus, $g$ groups the segments into groups of size at least 3. Since $K = 3H$, the groups are exactly of size 3. Our next step is to calculate the impact of a single group on the score. To do so, first note that for any $i < j$ there are

Proof (of Proposition 5). Assume four edges $s_1$, $s_2$, $e_1$, and $e_2$ with $t(s_1) \leq t(s_2)$ and $t(e_1) \leq t(e_2)$. We can safely assume that $t(s_2) \leq t(e_1)$. We can write the difference as $x(s_1, e_2) - x(s_1, e_1) = y(e_1, e_2; h) = x(s_2, e_2) - x(s_2, e_1)$.
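The exchange identity in Proposition 5 is what licenses the SMAWK speed-up: extending a segment's end contributes the same amount regardless of where the segment starts. As a quick sanity check with a toy additive segment score (the weights and names below are illustrative, assuming $x(s, e)$ sums per-edge scores between positions $s$ and $e$), the two differences coincide:

```python
# Sanity check of the identity in Proposition 5 for an additive segment
# score x(s, e) = sum of per-edge scores w[i] for s <= i <= e.
# The weights w form an arbitrary toy example, not data from the paper.

w = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]

def x(s, e):
    """Score of the segment of edges s..e (inclusive), additive in edges."""
    return sum(w[s:e + 1])

s1, s2, e1, e2 = 0, 2, 3, 5     # positions with s1 <= s2 <= e1 <= e2
d1 = x(s1, e2) - x(s1, e1)      # gain from extending the segment at s1
d2 = x(s2, e2) - x(s2, e1)      # gain from extending the segment at s2
# Both differences equal w[4] + w[5], independently of the start point.
```

This start-independence of the difference is exactly the total monotonicity condition that SMAWK requires of the score matrix.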

B Detailed pseudo-code for FindSegments
Algorithm 4: Algorithm FindSegments(P, Λ) for finding optimal segmentation for fixed groups P and parameters Λ

7 $z[e, h] \leftarrow \arg\max_s x(s, e; h)$ for each $e \in E$ (use SMAWK);
8 $o[e, k] \leftarrow \max_h x(z[e, h], e; h)$ for each $e \in E$;
9 $r[e, k] \leftarrow \arg\max_h x(z[e, h], e; h)$;
10 $q[e, k] \leftarrow z[e, r[e, k]]$;

The pseudo-code for finding the segmentation is given in Algorithm 3; a more detailed version is given in Algorithm 4. Here, we first precompute $f[e, h]$. We then solve the segmentation with a dynamic program by maintaining 3 arrays: $o[e, k]$ is the log-likelihood of the $k$-segmentation covering the edges up to $e$, $q[e, k]$ is the starting point of the last segment responsible for $o[e, k]$, and $r[e, k]$ is the level of the last segment responsible for $o[e, k]$.
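A direct, non-SMAWK version of this dynamic program is easy to state. The sketch below (all names are ours) assumes a precomputed table `x[h][s][e]` giving the log-likelihood of a single segment over edges $s..e$ at level $h$, and fills the three arrays described above by brute force over the segment start, at the cost of a quadratic factor in the number of edges:

```python
import math

def find_segments_naive(x, n_edges, K, H):
    """Naive DP over segment scores x[h][s][e] (log-likelihood of one
    segment covering edges s..e at level h). Returns (o, q, r), where
    o[e][k] is the best score of a k-segmentation of edges 0..e,
    q[e][k] the start of its last segment, r[e][k] that segment's level."""
    NEG = -math.inf
    o = [[NEG] * (K + 1) for _ in range(n_edges)]
    q = [[0] * (K + 1) for _ in range(n_edges)]
    r = [[0] * (K + 1) for _ in range(n_edges)]
    for e in range(n_edges):
        for k in range(1, K + 1):
            for s in range(e + 1):          # last segment covers edges s..e
                if s == 0:
                    prev = 0.0 if k == 1 else NEG
                else:
                    prev = o[s - 1][k - 1]  # best (k-1)-segmentation before s
                if prev == NEG:
                    continue
                for h in range(H):
                    cand = prev + x[h][s][e]
                    if cand > o[e][k]:
                        o[e][k], q[e][k], r[e][k] = cand, s, h
    return o, q, r
```

This runs in $O(K H |E|^2)$ time; replacing the inner maximization over $s$ with SMAWK, as in Algorithm 4, removes one factor of $|E|$.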

Fig. 2: Computational time as a function of the number of temporal edges ($|E|$) for Synthetic-large (a, c) and Santander-large (b, d). This experiment was done with $R = 3$, $K = 5$, and $H = 3$ using the SMAWK algorithm (a-b) and naive dynamic programming (c-d). The times are in seconds in (a-c) and in hours in (d).