1 Introduction

In the era of digital transformation, networks have become a fundamental part of our lives: social networks, brain connectivity, and population migrations are all examples of dynamic networks that evolve over time. As these networks have grown in complexity and scale, so has the need to understand their evolution, leading to the rise of change detection in dynamic networks: the task of identifying change points at which the behavior of the observed network deviates from its usual state. Such changes often indicate the onset of drifting data, where the newly observed data differ from the previously observed data. This phenomenon is common across various fields. In market phase discovery, change detection can identify shifts in market trends; in fraud detection, it can spot unusual patterns that may indicate fraudulent activities; in healthcare monitoring, it can detect changes in brain connectivity patterns, potentially signaling health issues. In essence, change detection in dynamic networks serves as a crucial tool for understanding, monitoring, and responding to an ever-evolving digital landscape.

Unfortunately, traditional change detection approaches designed for (multidimensional) time series are hardly applicable to dynamic networks, if applicable at all, and may lead to poor detection accuracy or efficiency issues (Akoglu et al., 2015). In fact, traditional approaches require a data transformation procedure that adapts the observed dynamic networks into an equivalent data representation compatible with the change detection approach at hand. However, such a conversion is not straightforward and can lead to information loss. First, changes may concern the whole structure of the network or only some nodes and edges, since they may be associated with evolutions of the majority of the domain or of small portions of it. Second, changes do not only affect structural aspects but may also have an impact on the properties of nodes and edges. Third, regardless of where the changes occur, not all of them are worthy of interest: episodic or sporadic evolutions may be of little relevance for the problem at hand, even when they have an impact on the whole network, while changes occurring on well-established portions of the network can be of higher interest. Consequently, it is hard to take into account all the factors of a dynamic network that could possibly originate a change over time: an agnostic change detection approach should be favored in all the scenarios where it is not straightforward to define in advance the factors on which to detect the changes, as is the case for dynamic networks.

The category of Pattern-based Change Detection methods (PBCD hereafter) (Zhang et al., 2023; Flouvat et al., 2020; Chaturvedi et al., 2021; Scharwächter et al., 2016) turns out to be promising, as it provides a convenient methodological framework for revealing changes in dynamic networks without requiring an a-priori data transformation. PBCDs become a particularly useful alternative to model-driven approaches (e.g., supervised learning algorithms) when the validation of the detection model is not applicable, which is the typical situation when labelled change points are inaccessible, that is, when ground truth is missing.

PBCDs are unsupervised, non-parametric change detection algorithms that seek changes by measuring variations in the set of observed patterns over time; these patterns, in turn, are incrementally updated through pattern mining algorithms as new data arrive. Recent works argue for the relevance of detecting changes through sub-structures of the dynamic network, since these turn out to be helpful in interpreting and understanding the cause of the change. Indeed, an interpretable detection method also locates the source of the change and characterizes its nature (Micevska et al., 2021).

However, traditional PBCDs suffer from efficiency issues inherited from their graph mining core procedures. Indeed, they adopt exhaustive search strategies that guarantee the completeness of pattern mining but, on the other hand, generate large sets of patterns, which introduce further costs for the subsequent inspection of relevant patterns (Haghir Chehreghani et al., 2020). To filter out irrelevant information, we must rely either on manual activities or on semi-automatic filtering techniques, which introduce further computation or require end-user interaction (e.g., the configuration of subjective interestingness criteria) (van Leeuwen et al., 2016). Moreover, patterns manifest redundant knowledge, as some express the same changes already represented by others. Thus, of all the possible patterns, only a fraction is likely to be relevant for detecting changes: a quality particularly desirable in dynamic network scenarios, where data are acquired at a high rate and giving a prompt response is paramount. Therefore, giving up the completeness property and considering only a reduced set of patterns for change detection could increase the overall efficiency.

A promising idea for reducing the number of interesting patterns, otherwise exhaustively discovered by mining algorithms, is to resort to non-exhaustive search (Ventura & Luna, 2016), in which uninteresting subspaces are ignored during mining, thus greatly reducing the computational effort. This can be done by an alternative process that explores only the branches of the search space that guarantee quality with respect to heuristic criteria. Some attempts have been made in this direction, but very few are designed for dynamic networks. Giacometti and Soulet (2016) use pattern sampling, based on frequency, to draw itemsets used to compute outlier scores of transactions; however, they do not handle network data and search for outliers, which are different from change points. Recently, Preti et al. (2023) focused on (static) network data with a pattern sampling approach, which however requires a trade-off between the sample size, the time needed to analyze the sample, and the quality that can be obtained from a sample of that size. The only contributions on non-exhaustive search we find are for transactional streams (Barik et al., 2023). For instance, Galbrun (2022), Yamanishi (2023), and van Leeuwen and Siebes (2008) resort to heuristics to assign minimum description length-based optimal codes to itemsets frequently occurring in a transactional stream; changes are detected when incoming data blocks do not fit the codes.

In this work, we detect changes in dynamic networks by proposing new non-exhaustive PBCD methodologies based on novel heuristic criteria. The heuristic criteria guide the selection of promising portions of the search space (pattern sub-spaces) that are then inspected to reveal whether and how much they have changed. The core assumption is that the most relevant changes should be related to the occurrences of the patterns sought on promising sub-spaces. Experimental results demonstrate that adopting alternative heuristic criteria and search strategies provides data scientists with the flexibility to prioritize either accuracy improvement or reduction in time consumption, depending on their specific needs.

2 Contribution

To explore sub-spaces of the whole pattern space through heuristic criteria, we need to revise the two-step blueprint of a PBCD, that is, mining and detection, and, specifically, integrate the heuristic criteria into the mining step.

In particular, we study non-exhaustive PBCDs based on frequency measures and contrast measures (Bailey, 2013) as heuristic criteria. Given a generic pattern, frequency measures (e.g., area and support; Impedovo et al., 2019) quantify the pattern's interestingness in a time window, while contrast measures (e.g., growth-rate; Chavary et al., 2017) quantify variations in the frequencies the pattern manifests in one time window against another. We deem contrast measures potentially useful heuristic criteria based on two considerations. Firstly, traditional interestingness measures polarize the mining process around the most interesting patterns in a single time window, neglecting whether they are equally interesting in two different time windows, with the result of leaving any change-related consideration to the change detection step of the PBCD algorithm; on the contrary, it would be useful to evaluate how the patterns deviate over two different time windows at an early stage. Secondly, contrast measures identify discriminating patterns as the most interesting ones, that is, patterns whose frequency significantly differs between two data portions. It follows that their adoption within non-exhaustive PBCDs can help in identifying only the most relevant patterns on which to detect the change.

To give a concrete illustration of the contribution of heuristic methods in PBCDs, we provide a toy example in Fig. 1. Consider a dynamic network represented by a sequence of undirected simple graphs, each depicting the network at a specific time point. At \(t_0\), the graph consists of four nodes, namely Alice, Bob, Carol, and Dave: Alice knows (i.e., 'is connected with') both Bob and Carol, Bob knows Dave and Alice, Carol knows only Alice, and Dave knows only Bob. At time \(t_1\) the graph does not change. Then, at \(t_2\), there are structural changes corresponding to the appearance of the node Eve, which is connected with Dave. The graph remains unaltered at \(t_3\). Thus, the dynamic network exhibits changes over the time points \(t_2\) and \(t_3\) compared to the time points \(t_0\) and \(t_1\). The most interesting changes are not those that concern only individual insertions of nodes but subgraphs, and they can be seen at a higher temporal granularity, that is, time intervals rather than individual time points. For instance, during [\(t_0\),\(t_1\)] we register two occurrences of the (open) triplets “Alice-Bob-Carol” and “Alice-Bob-Dave”, while during [\(t_2\), \(t_3\)] we register two (new) occurrences of the triplet “Bob-Dave-Eve”. To reveal these changes, a traditional PBCD, which relies on exhaustive search, would also enumerate the triplet “Bob-Carol-Dave”, or even more simply the subgraph “Bob-Carol”, which never appeared over time and conveys no information depicting the changes of the whole network, yet would be equally considered as a 'source' of changes. In a toy example this has little impact on the efficiency of the PBCD, but things change in real-world dynamic networks with thousands of nodes, where the efficiency may be severely affected.

The key idea behind heuristic PBCDs is to narrow down the search space of frequent subgraphs. This is done by limiting the selection of edges when generating subgraphs of n edges from subgraphs of \(n-1\) edges to a restricted set of edges. In heuristic PBCDs based on a frequency measure, each subgraph is expanded by considering only the top-k most frequently observed edges in a time window. For instance, given the subgraph “Bob-Dave”, this is further expanded by prioritizing the edges with the highest frequency in \([t_2, t_3]\), thus selecting the top-k edges from the list “Alice-Bob” (2 occurrences) and “Dave-Eve” (2 occurrences). Here, values of k lower than 3 can lead to discarding both candidates, which are therefore never evaluated.

In heuristic PBCDs based on a contrast measure, each subgraph is expanded by considering the top-k edges whose frequency has changed most significantly in \([t_2, t_3]\) compared to \([t_0, t_1]\); thus, for the subgraph “Bob-Dave”, we select the top-k edges from the list “Dave-Eve” (2 occurrences against 0), “Alice-Bob” (2 occurrences against 2) and “Alice-Carol” (2 occurrences against 2). Here, on the contrary, already for k=1, “Dave-Eve” is selected for generating the subgraph “Bob-Dave-Eve”, which is precisely the triplet that completely depicts the change. The exhaustive approach, instead, would have expanded “Bob-Dave” with “Alice-Bob”, “Dave-Eve”, and “Alice-Carol” without considering any associated interestingness.
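The two edge-selection strategies of the toy example can be replayed in a few lines of Python. This is an illustrative sketch, not part of the proposed algorithms: the edge counts are those of Fig. 1, the function names are ours, and the contrast heuristic is simplified to the absolute variation in occurrence counts.

```python
# Occurrence counts of the candidate edges in the two windows of the toy example.
counts_w1 = {"Alice-Bob": 2, "Alice-Carol": 2, "Dave-Eve": 0}  # [t0, t1]
counts_w2 = {"Alice-Bob": 2, "Alice-Carol": 2, "Dave-Eve": 2}  # [t2, t3]

def top_k_by_frequency(counts, k):
    # Frequency heuristic: rank candidate edges by their count in one window.
    return sorted(counts, key=counts.get, reverse=True)[:k]

def top_k_by_contrast(counts_a, counts_b, k):
    # Simplified contrast heuristic: rank by absolute variation across windows.
    delta = {e: abs(counts_b[e] - counts_a[e]) for e in counts_a}
    return sorted(delta, key=delta.get, reverse=True)[:k]

# With k=1, the contrast heuristic immediately picks the change-bearing edge,
# while the frequency heuristic cannot break the three-way tie.
assert top_k_by_contrast(counts_w1, counts_w2, 1) == ["Dave-Eve"]
```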

Our study comprises an extensive empirical evaluation conducted on both real-world and synthetic dynamic networks. We evaluate the efficiency and accuracy of different algorithmic solutions that comply with the general PBCD architecture. This has been done by injecting four different heuristic approaches, namely three contrast measures (the growth rate, the odds ratio, and the support difference) and one frequency-based measure, into multiple non-exhaustive PBCDs built by tweaking:

  • The time-window models used to scan and collect dynamic network data. Specifically, we consider three different time-window models to detect the underlying changes.

  • The dissimilarity measures used to quantify the changes. These measures operate on pattern sets and are used to estimate the changes between two different time windows in terms of variations occurring on the two respective pattern sets.

  • The pattern representations. Two alternative notions of patterns are considered to model different structural configurations of the underlying dynamic network. The first one refers to connected unweighted induced sub-graphs. The second one corresponds to unweighted induced sub-trees and is used to make the pattern generation operation more efficient, at the cost of missing some topological regularities when compared to the first one.

Then, we compare each resulting heuristic non-exhaustive PBCD against its exhaustive counterpart. Evaluating the efficiency allows us to draw indications concerning methods that equally leverage the non-completeness of the pattern set, while evaluating the accuracy allows us to compare with significant competitors from the viewpoint of change detection predictive performance. We also provide arguments on the viability of heuristic approaches with a qualitative evaluation of the measured changes on a synthetic dynamic network. Finally, this study also comprises an analysis of the worst-case computational complexity and a discussion of the related literature.

Fig. 1

Social dynamic network made of 4 snapshots observed at time points \(t_0\), \(t_1\), \(t_2\), and \(t_3\), respectively

3 Background

Let N be the set of nodes, L be the set of edge labels, and \(I = N \times N \times L\) the alphabet of all the possible labeled edges, on which a lexicographic order \(\ge \) is defined. A dynamic network is represented as the time-ordered stream of graph snapshots \(D = \langle G_1, G_2, \dots , G_n \rangle \). Each snapshot \(G_i \subseteq I\) is a set of edges denoting a directed graph observed at \(t_i\), which allows self-loops and multiple edges with different labels; \(G_i\) is uniquely identified by the id i. Given a directed graph G, two notions can be derived: i) a connected subgraph \(C \subseteq G\), which is a directed graph such that for any pair of nodes (of the node set of C) there exists a path linking them, and ii) a subtree \(T \subseteq G\), which is a connected subgraph where every node (of the node set of T) is linked to a unique parent node, except for the root node. Intuitively, subtrees have a less complex structure than connected subgraphs, which, on an equal node set, may represent other sorts of relationships besides the hierarchical one typical of subtrees.

The data representation fits the one adopted in transactional data mining, allowing the mining of frequent patterns by adapting traditional frequent itemset mining algorithms. In this perspective, a snapshot \(G_{tid} \in D\) is a transaction uniquely identified by tid, whose items are labeled edges from I, while a pattern \(P \subseteq I\), with length |P|, can be seen as a word \(P = \langle i_1\dots i_n\rangle \) of n lexicographically sorted items, with prefix \(S = \langle i_1\dots i_{n-1} \rangle \) and suffix \(i_n\). The tidset of P in the network D is defined as \(tidset(P,D) = \{ tid \mid G_{tid} \in D \wedge P \subseteq G_{tid}\}\), while the support of P in D is \(sup(P,D) = \frac{|tidset(P,D)|}{|D|}\). P is frequent in D if \(sup(P,D) > \alpha \), where \(\alpha \in [0,1]\).
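Under this transactional view, tidsets and supports reduce to simple set operations. The following minimal Python sketch illustrates the definitions (the snapshots and names are illustrative, not taken from the paper's datasets):

```python
# A dynamic network as a list of snapshots; each snapshot is a set of
# labeled edges (node, node, label), i.e., a transaction over the alphabet I.
D = [
    {("Alice", "Bob", "knows"), ("Alice", "Carol", "knows")},  # G_1 (tid 0)
    {("Alice", "Bob", "knows"), ("Bob", "Dave", "knows")},     # G_2 (tid 1)
    {("Alice", "Bob", "knows")},                               # G_3 (tid 2)
]

def tidset(P, D):
    """tidset(P, D): ids of the snapshots containing every edge of pattern P."""
    return {tid for tid, G in enumerate(D) if P <= G}

def sup(P, D):
    """sup(P, D) = |tidset(P, D)| / |D|."""
    return len(tidset(P, D)) / len(D)

P = {("Alice", "Bob", "knows")}
assert tidset(P, D) == {0, 1, 2}
assert sup(P, D) == 1.0
# P is frequent in D whenever sup(P, D) > alpha, e.g., for alpha = 0.5 here.
```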

We deem two types of patterns interesting for PBCDs designed for network data. They are defined upon the two notions of directed graph introduced above, namely i) frequent connected subgraphs (FCSs) and ii) frequent subtrees (FSs). Both FCSs and FSs are mined from snapshots belonging to time windows. A window \(W=[t_i, t_j]\), with \(t_i < t_j\), is the sequence of snapshots \(\{G_i, \dots , G_j\} \subseteq D\); consequently, the width \(|W|=j-i+1\) equals the number of snapshots collected in W. For convenience, we denote by \(F_W\) the set of all the FCSs (FSs) in the window W.

3.1 Problem statement

Let \(D = \langle G_1, G_2, \dots , G_n \rangle \) be a dynamic network, \(\alpha \in [0,1]\) be the minimum support threshold, and \(\beta \in [0,1]\) be the minimum change threshold. Then, pattern-based change detection finds pairs of windows \(W=[t_b, t_e]\) and \(W'=[t_b', t_e']\), where \(t_b \le t_b' \le t_{e+1}\) and \(t_e < t_e'\), satisfying \(d(F_W, F_{W'}) > \beta \), where i) \(F_W\) and \(F_{W'}\) are the sets of patterns discovered on W and \(W'\) according to \(\alpha \), and ii) \(d(F_W, F_{W'}) \in [0,1]\) is a dissimilarity measure between sets of patterns. In this perspective, changes correspond to significant variations in the sets of patterns discovered on two windows, which denote stable features exhibited by the graph snapshots.

4 Architecture of a PBCD

The change detection problem can be solved by various computational solutions. In this section, we provide the general architecture of a PBCD for network data, by generalizing the framework KARMA (Loglisci et al., 2018).

In general, a PBCD is a two-step approach in which: i) a pattern mining algorithm extracts the set of patterns observed in the incoming data, and ii) the amount of change is quantified by adopting a dissimilarity measure defined between sets of patterns. Practically speaking, a PBCD is an iterative algorithm that consumes data coming from a data source, in our case a dynamic network, and produces quantitative measures of change. In its original design, the framework KARMA implemented a PBCD algorithm based on the exhaustive mining of FCSs, whose general workflow can be seen in Fig. 2. The algorithm iteratively consumes blocks \(\varPi \) of graph snapshots coming from D (Step 2) by using two successive landmark windows W and \(W'\) (Step 3). This way, it mines the complete sets of FCSs, \(F_W\) and \(F_{W'}\), necessary for the detection step (Steps 4-5). The window grows (\(W = W'\), Step 8) with new graph snapshots, and the associated set of FCSs is kept updated (Step 9), until the Tanimoto coefficient \(d(F_W,F_{W'})\) exceeds \(\beta \) and a change is detected. In that case, the algorithm drops the content of the window, retaining only the last block of transactions (\(W = \varPi \), Steps 6-7), and the analysis restarts.
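The KARMA-style loop just described can be sketched as follows. This is a schematic Python rendering, not the original implementation: `mine` stands for any procedure mapping a window (a list of snapshots) to its pattern set, and the dissimilarity is the Tanimoto coefficient mentioned above.

```python
def tanimoto_dissimilarity(fw, fw2):
    """d(F_W, F_W') = 1 - |F_W ∩ F_W'| / |F_W ∪ F_W'|, a value in [0, 1]."""
    union = fw | fw2
    return 1.0 - len(fw & fw2) / len(union) if union else 0.0

def pbcd(blocks, mine, beta):
    """Iterative PBCD with landmark windows, in the spirit of KARMA (Fig. 2).
    `blocks` yields blocks Π of graph snapshots; `mine` maps a window
    to its pattern set. Returns the indices of the detected change points."""
    changes = []
    W, FW = [], set()
    for t, block in enumerate(blocks):        # Step 2: consume a block Π
        W2 = W + block                        # Step 3: landmark window W'
        FW2 = mine(W2)                        # Steps 4/9: (re)mine patterns
        if FW and tanimoto_dissimilarity(FW, FW2) > beta:
            changes.append(t)                 # change detected:
            W, FW = list(block), mine(block)  # Steps 6-7: keep last block only
        else:
            W, FW = W2, FW2                   # Step 8: grow the window
    return changes
```

For instance, with `mine` returning the edges appearing in the majority of a window's snapshots, a stream whose dominant edges disappear triggers a change at the point where the pattern sets diverge.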

Fig. 2

Block-diagram of a PBCD architecture

The framework offers a general architecture (Fig. 2) for building different variants of a PBCD through different design decisions of four basic components. These are i) the window model (Fig. 2, Steps 3, 8 and 6), ii) the feature space (FCSs or FSs), iii) the mining step (Fig. 2, Steps 4, 9 and 7), and iv) the detection step (Fig. 2, Step 5). In the following sections, we will focus on both the mining step and the detection step, also by commenting on their contribution to the efficiency of the PBCD strategy.

Here, we briefly discuss the choice of an appropriate time-window model for a PBCD. We deem three models as interesting: the landmark model, the sliding model, and the mixed model. They differ in the way they consume the incoming block \(\varPi \) of graph snapshots. In its original design, KARMA uses the landmark model: \(\varPi \) is added to the window W, forming the successive window \(W' = W \cup \varPi \) (Fig. 2, Step 3). In this model, the window grows until a change is detected; when that happens, old data are discarded. In the sliding model, the detection is performed on two successive non-overlapping windows W and \(\varPi \) of fixed size, which always slide forward (\(W = \varPi \)), whether or not a change is detected; therefore, old data are always discarded. In the mixed model, the detection is performed on W and \(\varPi \), as in the sliding model; however, as in the landmark model, \(\varPi \) is added to W, forming \(W' = W \cup \varPi \), until a change is detected, in which case old data are discarded.
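The three consumption policies differ in two aspects: how the window evolves and which pair of windows the dissimilarity compares. A schematic sketch (function names are ours; windows are plain lists of snapshots):

```python
def window_update(model, W, block, change_detected):
    """Window evolution for the three models; `block` is the incoming Π."""
    if model == "sliding":
        return block                 # always slide forward: W = Π
    # landmark and mixed both grow W' = W ∪ Π until a change is detected,
    # then retain only the last block.
    return block if change_detected else W + block

def detection_pair(model, W, block):
    """Which two windows the dissimilarity d is computed on."""
    if model == "landmark":
        return W, W + block          # d(F_W, F_{W ∪ Π})
    return W, block                  # sliding and mixed: d(F_W, F_Π)
```

Note that the landmark and mixed models share the same update rule; what distinguishes the mixed model is that detection compares W against the incoming block alone.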

5 Exhaustive and non-exhaustive mining in PBCDs

The main difference between exhaustive and non-exhaustive PBCDs lies in the exhaustiveness of the mining step used to discover the patterns, which is the major bottleneck of any exhaustive PBCD approach. In fact, the discovery of an exponentially large number of patterns affects the efficiency of both the mining and detection steps, hence rendering this class of algorithms not particularly efficient in practice. The main objective of this paper is to reduce the computational complexity of exhaustive approaches by adopting heuristic and non-exhaustive ones. In particular, we propose a mining algorithm able to prune the search space of patterns following a beam-search approach.

Being based on beam search, the proposed approach relies on a heuristic criterion \(h(P,W)\) and a parameter k which controls the beam size of the mining step. In particular, the mining algorithm traverses the search space of patterns, that is, the lattice \(L=(2^I, \subseteq )\) ordered by the generality relation \(\subseteq \) and conveniently represented in a SE-tree (Rymon, 1992) data structure. The mining step only considers the most interesting patterns according to both the heuristic and the beam size k, thus preventing the mining of less interesting ones.

As in every beam-search approach, different heuristic criteria and values of k may produce very different results. For instance, regardless of the heuristic adopted, exhaustive mining can be achieved with non-exhaustive procedures by setting \(k = |I|\): in this case, any heuristic leads to evaluating every pattern because, independently of the ranking it imposes, all of them are considered in the next evaluation step. For this reason, we will refer to Algorithm 1 in both cases. Furthermore, the interestingness of patterns is subjective and can therefore be evaluated according to different heuristic measures, which completely changes the way the search space is pruned. For instance, frequency measures \(h(P,W)\) evaluate the interestingness of patterns according to the window W only, while a different approach would evaluate the interestingness \(h(P,W,W')\) with respect to the two time windows W and \(W'\) on which to detect the change, as done by contrast measures. In the following, we refer to Algorithm 1 in the case of heuristic approaches based on frequency measures, and to Algorithm 2 in the case of heuristic approaches based on contrast measures.

Regardless of the heuristic adopted, both Algorithms 1 and 2 implement a pattern-growth approach for mining the patterns \(F_W\) in a time window W, and they are initially called with the empty prefix \(\emptyset \). An important remark is that exhaustive PBCDs rely on the complete pattern sets discovered by the exhaustive mining procedure as the feature sets for the detection problem, whereas non-exhaustive PBCDs rely only on the limited pattern sets discovered by the non-exhaustive mining procedure.

Algorithm 1

Mining based on beam search and frequency heuristics.

5.1 Exhaustive FCSs and FSs mining

The mining procedure (Algorithm 1) takes 5 input parameters, that is, the content of the window W, the minimum support threshold \(\alpha \), the beam size k, the pattern prefix (initially equal to \(\emptyset \)), and the heuristic \(h(P,W) \in \mathbb {R}\). The algorithm exhaustively traverses the search space of FCSs and FSs by setting \(k=|I|\), following a recursive DFS approach. In particular, it i) builds patterns with a pattern-growth approach in which items are appended as suffixes to a pattern prefix, and ii) evaluates the supports through tidset intersection. The result is the complete set of frequent patterns \(F_W\) in W according to \(\alpha \).

The procedure considers the window W as an i-conditional database of transactions in which every item \(j \le i\) has been removed, as done in Geerts et al. (2004). At the beginning of each recursive call, Line 2 initializes the set F[P] of frequent patterns on prefix P as empty. Then, Lines 3-10 exploit the vertical layout of W and test the supports against the threshold \(\alpha \). The FCS (FS) \(P \cup \{i\}\) is built by appending the item i to the prefix P only when allowed by the predicate isValid (Line 4), which checks whether \(P \cup \{i\}\) is a connected subgraph (when mining FCSs) or a subtree (when mining FSs), respectively. Lastly, the patterns built this way are added to the set F[P], and the suffix i of any pattern discovered is added to Items.

Line 11 selects only the most promising subset of k patterns, according to the heuristic adopted. In practice, this line is irrelevant in exhaustive mining, as it always selects all the patterns since \(k=|I|\). Then, Lines 12-22 build the i-conditional databases on which to perform the recursive calls. In particular, the algorithm iterates over each item i in Beam, and Line 13 initializes the associated i-conditional database \(W_i\) as empty. Lines 14-17 iterate on the items j from Beam such that \(j > i\). This way, Line 15 computes the tidset C as the set intersection between the tidsets of i and j in the database W; C is then the tidset of j in the newly created i-conditional database \(W_i\). The mining procedure is recursively called at Line 19 for mining the set \(F[P \cup \{i\}]\) of FCSs (or FSs) with valid prefix \(P \cup \{i\}\), according to the pattern language: subgraphs that are not connected (or do not represent trees) are pruned at Line 18. Finally, the patterns in \(F[P \cup \{i\}]\) are added to F[P], which is returned as the final result.
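As a sketch of the overall recursion, the following Python function mirrors Algorithm 1 on plain itemsets: the structural isValid test of FCSs/FSs is omitted for brevity, and the heuristic at Line 11 is specialized to the support. With \(k\) equal to the number of items, the selection step keeps everything and the search degenerates to exhaustive mining.

```python
def mine(V, n, alpha, k, prefix=()):
    """Beam-limited pattern-growth mining, a simplified sketch of Algorithm 1.
    V: vertical layout {item: tidset} of the current conditional database;
    n: window width |W|; alpha: minimum support; k: beam size."""
    F = {}
    for i, tids in V.items():                 # Lines 3-10: frequent suffixes
        if len(tids) / n > alpha:
            F[prefix + (i,)] = tids
    # Line 11: rank by the frequency heuristic h(P, W) = sup(P, W), keep top-k.
    beam = sorted(F, key=lambda P: len(F[P]), reverse=True)[:k]
    for a, P in enumerate(beam):              # Lines 12-22: recursive calls
        # Build the i-conditional database by tidset intersection (j > i,
        # where the order is the beam order).
        Vi = {Q[-1]: F[P] & F[Q] for Q in beam[a + 1:] if F[P] & F[Q]}
        F.update(mine(Vi, n, alpha, k, P))    # Line 19: recurse on prefix P
    return F
```

For instance, on the vertical layout `{"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 2}}` with `n=3` and `alpha=0.4`, an exhaustive call (`k=3`) discovers the patterns a, b, c, ab, ac, whereas `k=1` keeps only the singletons, since only the top-ranked pattern is expanded.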

The exhaustive mining of FCSs and FSs requires time proportional to \(O(2^{|I|})\) in the worst case, in line with traditional frequent itemset mining. Moreover, due to the constraints imposed by the pattern language, the number of FCSs and FSs is in practice much lower than the number of itemsets (FSs < FCSs \(< 2^{|I|}\)). However, the mining time is still exponential in the number of edges |I|.

5.2 Non-exhaustive FCSs and FSs mining with frequency heuristics

Non-exhaustive mining is achieved by pruning the search space of FCSs and FSs according to some heuristic criterion. In particular, when calling Algorithm 1 with \(k < |I|\), the intermediate selection step (Line 11) selects only the most promising subset of k patterns to further advance the process (Lines 12-22). As in traditional beam search-based algorithms, the frequent patterns found in a recursive call are evaluated by means of a heuristic evaluation function and sorted in decreasing order of interestingness; then, only the top-k of them are further considered. Clearly, many heuristics can be used to select the most promising subset of patterns; among them, we adopt frequency measures. In particular, Line 11 sorts the patterns in F[P] according to their supports in the window W, then only the k of them with the greatest scores are kept to further advance the process, while the remaining ones are ignored. In this case, since the algorithm sorts patterns having the same length, the evaluation based on the support is consistent with the evaluation based on the area.

Proof

Let S be a pattern and W be a window. Then, the support \(sup(S,W) = \frac{|tidset(S,W)|}{|W|}\) and the area \(area(S,W) = |S|\cdot |tidset(S,W)|\) are linearly proportional to \(|tidset(S,W)|\), with constant factors \(\frac{1}{|W|}\) and |S|, respectively. The area area(S,W) of the FCS (or FS) S in the window W is an interestingness measure adopted in tile mining (Geerts et al., 2004). In our case, it is used to restrict the search space by considering only the most interesting patterns at each recursion step.

This simple yet effective approach allows us to significantly prune the search space when mining the limited sets \(F_W\) and \(F_{W'}\). In particular, when adopting frequency measures, the mining is focused on patterns covering large portions of the window (tiles, from a transactional database point of view; Geerts et al., 2004), and hence more interesting ones. The non-exhaustive mining procedure is more efficient than the exhaustive one, requiring time proportional to \(O(2^k)\) in the worst case. In fact, the algorithm restricts its attention to only k items from I in the base recursion step, with \(k \ll |I|\).

Algorithm 2

Mining based on beam search and contrast heuristics.

5.3 Non-exhaustive FCSs and FSs mining with contrast heuristics

When performing non-exhaustive mining with contrast heuristics, additional considerations arise. While traditional frequency-based interestingness is evaluated on a single window, contrast heuristics quantify the discriminating power of patterns between two windows, typically ascribing such interestingness to variations in the observed frequencies of patterns over the two windows. For this reason, when performing non-exhaustive PBCD with contrast heuristics we refer to Algorithm 2, which, in place of Algorithm 1, is able to compute the frequencies associated with W and \(W'\) for every considered pattern.

Algorithm 2 extends Algorithm 1 in three directions. Firstly, the mining procedure takes 6 input parameters instead of 5, that is, the content of the windows W and \(W'\), the minimum support threshold \(\alpha \), the beam size k, the pattern prefix (initially equal to \(\emptyset \)), and the contrast heuristic \(h(P,W,W') \in \mathbb {R}\). Secondly, Lines 15-20 build two initially empty i-conditional databases \(W_i\) and \(W'_i\), instead of one: Lines 16-17 compute the tidsets C and \(C'\) as the set intersections between the tidsets of i and j in the databases W and \(W'\), respectively, while Lines 18-19 append them to the newly created \(W_i\) and \(W'_i\). Thirdly, the procedure is recursively called at Line 22 for mining the set \(F[P \cup \{i\}]\) by passing the two i-conditional databases \(W_i\) and \(W'_i\).

As for the contrast heuristics, we consider three measures: the growth-rate, the odds-ratio, and the support difference of patterns (Bailey, 2013) between two windows W and \(W'\). In their original formulations, the three measures quantify the interestingness of patterns when picked as discriminative features between two databases. Specifically, they quantify in a directional way how the support of patterns deviates from one database to the other, distinguishing three cases: i) the support increases, ii) the support decreases, and iii) the support does not vary between the two databases. In the case of PBCD, we deem patterns whose support decreases as equally interesting as those whose support increases between two time windows W and \(W'\); for this reason, we give the following symmetric formulations:

Definition 1

(Growth-rate) Let P be a pattern, and W and \(W'\) be two time windows. The growth-rate heuristic is defined as follows

$$\begin{aligned} h(P,W,W') = GR(P,W,W') = \frac{max(sup(P,W), sup(P,W'))}{min(sup(P,W),sup(P,W'))} \end{aligned}$$
(1)

The growth rate is an interestingness measure used in Dong and Li (1999) to assess the quality of emerging patterns. It is greater than or equal to 1: high values are associated with the most interesting patterns, while a growth rate equal to 1 denotes uninteresting patterns.

Definition 2

(Support difference) Let P be a pattern, and W and W’ two time windows. The support difference is defined as follows

$$\begin{aligned} h(P,W,W') = SD(P,W,W') = |sup(P,W) - sup(P,W')| \end{aligned}$$
(2)

Similarly to the growth rate, the support difference quantifies the absolute difference in support of a pattern P between W and \(W'\). It was used as an interestingness measure in Dong et al. (2004) to assess the quality of contrast sets. It is greater than or equal to 0: high values are associated with the most interesting patterns, while a difference equal to 0 denotes uninteresting ones.

Definition 3

(Odds ratio) Let P be a pattern, and W and W’ two time windows. The odds-ratio is defined as follows

$$\begin{aligned} h(P,W,W') = OR(P,W,W') = \frac{max(\frac{sup(P,W)}{1-sup(P,W)}, \frac{sup(P,W')}{1-sup(P,W')})}{min(\frac{sup(P,W)}{1-sup(P,W)}, \frac{sup(P,W')}{1-sup(P,W')})} \end{aligned}$$
(3)

Similarly to the growth rate, the odds ratio is greater than or equal to 1: high values are associated with the most interesting patterns, while a ratio equal to 1 denotes uninteresting ones.
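The three heuristics of Eqs. (1)-(3) can be sketched directly from their definitions; this is a minimal illustration, where the `eps` guard against zero supports (and odds of 1) is our own addition:

```python
def growth_rate(s, s_prime, eps=1e-12):
    # Growth rate (Eq. 1): max/min of the two supports, >= 1.
    hi, lo = max(s, s_prime), min(s, s_prime)
    return hi / max(lo, eps)          # eps guards a zero support

def support_difference(s, s_prime):
    # Support difference (Eq. 2): absolute spread, >= 0.
    return abs(s - s_prime)

def odds_ratio(s, s_prime, eps=1e-12):
    # Odds ratio (Eq. 3): ratio of the larger to the smaller odds, >= 1.
    odds, odds_p = s / max(1 - s, eps), s_prime / max(1 - s_prime, eps)
    return max(odds, odds_p) / max(min(odds, odds_p), eps)
```

For example, a pattern whose support moves from 0.6 to 0.3 scores a growth rate of 2 and a support difference of 0.3, regardless of the direction of the change.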

Regardless of the heuristic used, non-exhaustive mining based on contrast measures prunes the search space by focusing on patterns covering large portions of W and small portions of \(W'\), and vice versa. This means that the mining is focused on patterns on which a change is more likely to be detected by subsequent steps of the PBCD process.

From a computational perspective, the procedure is as efficient as the one using frequency-based heuristics. In fact, Algorithm 2 still requires time proportional to \(O(2^k)\) in the worst-case scenario since, as for Algorithm 1, it restricts the attention to only k items from I in the base recursion step, with \(k \ll |I|\). Moreover, the three contrast heuristics considered can be computed in O(1), without adding significant overhead with respect to frequency-based ones. Finally, the computation of an additional tidset \(C'\) for each pattern could worsen the efficiency of the algorithm. However, since the two windows W and \(W'\) are always related to each other according to the time window model used by the PBCD algorithm, so are the tidsets. Therefore, the algorithm can be improved by storing the tidsets previously computed on \(W'\) (respectively, W) for later use, that is, when non-exhaustively discovering patterns on W (\(W'\)) while evaluating the contrast with respect to \(W'\) (W), and vice versa.
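This caching idea can be sketched as follows; the class name, the window identifiers, and the `promote` operation are our own assumptions about how tidsets would be keyed and reused when the windows advance:

```python
class TidsetCache:
    """Memoizes pattern tidsets per window, so that when the windows
    advance and the old W' is reused as the new W, its tidsets are not
    recomputed from scratch."""

    def __init__(self):
        self._store = {}                       # (window_id, pattern) -> tidset

    def get(self, window_id, pattern, compute):
        # Return the cached tidset, computing it only on a cache miss.
        key = (window_id, frozenset(pattern))
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]

    def promote(self, old_id, new_id):
        # Relabel the tidsets of the old window so they serve the new one.
        self._store = {(new_id if w == old_id else w, p): t
                       for (w, p), t in self._store.items()}
```

The `compute` callback stands in for the (comparatively expensive) tidset intersection, which is then executed at most once per pattern and window.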

6 Detecting changes on pattern sets

Once the complete or limited pattern sets \(F_W\) and \(F_{W'}\) have been discovered, by the exhaustive or non-exhaustive procedure respectively, the detection step can be executed and the dissimilarity score \(d(F_W, F_{W'})\) computed. We recall that \(d(F_W, F_{W'})\) is a binary dissimilarity measure defined on sets of patterns. For convenience, we define it as operating on the vector encodings \(\textbf{w}\) and \(\mathbf {w'}\) of \(F_W\) and \(F_{W'}\), respectively.

6.1 Detecting changes on complete pattern sets

When detecting changes on complete pattern sets, the encoding is built by enumerating the patterns in \(F_W \cup F_{W'}\). More specifically, \(\textbf{w}\) (\(\mathbf {w'}\)) is a vector of size \(n = |F_W \cup F_{W'}|\), where the i-th element is a weight associated with the i-th pattern in the enumeration of \(F_W \cup F_{W'}\) with respect to W (\(W'\), respectively). Then, a change is detected if the dissimilarity score exceeds the minimum threshold \(\beta \), that is, when \(d(F_W,F_{W'}) > \beta \).
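For illustration, the encoding step can be sketched as follows; this is a simplified version in which patterns are represented by sortable keys (e.g., strings) and `FW`, `FWp` are assumed to map each discovered pattern to its support:

```python
def encode(FW, FWp, binary=False):
    """Builds the aligned vector encodings w, w' over the enumeration of
    F_W ∪ F_W'. FW, FWp map pattern -> support; binary=True yields the
    0/1 encoding (frequent or not), otherwise real-valued supports."""
    patterns = sorted(set(FW) | set(FWp))    # fixed enumeration order
    if binary:
        w = [1.0 if p in FW else 0.0 for p in patterns]
        wp = [1.0 if p in FWp else 0.0 for p in patterns]
    else:
        w = [FW.get(p, 0.0) for p in patterns]
        wp = [FWp.get(p, 0.0) for p in patterns]
    return patterns, w, wp
```

Fixing the enumeration once over the union guarantees that the i-th entries of \(\textbf{w}\) and \(\mathbf {w'}\) always refer to the same pattern.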

Fig. 3

Example of binary (top) and real-valued (bottom) vector encoding of \(F_{W}\) and \(F_{W'}\). Dashed circles denote infrequent patterns with \(\alpha =0.5\)

In the case of KARMA, as shown in Fig. 3, \(\textbf{w}\) and \(\mathbf {w'}\) are binary vectors indicating whether each FCS from the enumeration is frequent in W and \(W'\), respectively. The algorithm then computes the Tanimoto coefficient \(d(F_W,F_{W'}) = 1 - \frac{\textbf{w} \cdot \mathbf {w'}}{\Vert \textbf{w}\Vert ^2 + \Vert \mathbf {w'}\Vert ^2 - \textbf{w} \cdot \mathbf {w'}}\). By doing so, KARMA quantifies the portion of FCSs that have crossed the minimum support threshold, thus indicating a relevant change in the underlying graph data distribution. However, this solution does not take into account the FCSs that do not cross the minimum support threshold, despite their potentially significant support spread.

To overcome this limitation, an alternative approach, also shown in Fig. 3, is to build the vector encodings as real-valued vectors of supports in W and \(W'\), respectively. Then, it is possible to compute the weighted Jaccard dissimilarity \(d(F_W,F_{W'}) = 1 - \frac{\sum _{i} min(\textbf{w}_i, \mathbf {w'}_i)}{\sum _{i} max(\textbf{w}_i, \mathbf {w'}_i)}\). We deem this measure relevant because it relates the dissimilarity to the support difference (Bailey, 2013) of each pattern S, defined as \(SD(S,W,W') = |sup(S,W) - sup(S,W')|\).
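A minimal sketch of the two dissimilarity computations on the vector encodings (the function names and the zero-vector guard are ours):

```python
def tanimoto_dissimilarity(w, wp):
    # 1 - (w·w') / (||w||² + ||w'||² - w·w'), on binary or real vectors.
    dot = sum(a * b for a, b in zip(w, wp))
    denom = sum(a * a for a in w) + sum(b * b for b in wp) - dot
    return 1.0 - (dot / denom if denom else 1.0)   # identical zero vectors -> 0

def weighted_jaccard_dissimilarity(w, wp):
    # 1 - Σ min(w_i, w'_i) / Σ max(w_i, w'_i), on real-valued supports.
    num = sum(min(a, b) for a, b in zip(w, wp))
    den = sum(max(a, b) for a, b in zip(w, wp))
    return 1.0 - (num / den if den else 1.0)
```

On identical encodings both measures return 0; the more the supports drift apart, the closer the scores get to 1.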

Proof

Given the analytic formulations \(max(a,b) = \frac{1}{2}(a+b+|a-b|)\) and \(min(a,b)=\frac{1}{2}(a+b-|a-b|)\), and the vector encodings \(\textbf{w}\) and \(\mathbf {w'}\) of \(F_W\) and \(F_{W'}\), the weighted Jaccard dissimilarity can be rewritten as \(d(F_W,F_{W'}) = 1 - \frac{\sum _{i} min(\textbf{w}_i, \mathbf {w'}_i)}{\sum _{i} max(\textbf{w}_i, \mathbf {w'}_i)} = 1 - \frac{\sum _{i} (\textbf{w}_i + \mathbf {w'}_i - SD(S_i,W,W'))}{\sum _{i} (\textbf{w}_i + \mathbf {w'}_i + SD(S_i,W,W'))}\)

In exhaustive PBCDs, the number of patterns grows exponentially in the number of items. Therefore, regardless of the measure \(d(F_W,F_{W'})\) adopted, the detection step on complete pattern sets requires an amount of time proportional to the number of patterns in the enumeration.

6.2 Detecting changes on limited pattern sets

Non-exhaustive PBCDs detect changes in the same way exhaustive PBCDs do, that is, by computing the score \(d(F_W,F_{W'})\) in terms of the Tanimoto coefficient or the weighted Jaccard dissimilarity, and testing it against the minimum change threshold \(\beta \). Although the detection approach remains the same, there is a subtle difference in the meaning of the detection. In fact, while the dissimilarity measures adopted on complete pattern sets quantify how much the supports of FCSs (FSs) change between W and \(W'\), they do not consider the interestingness of patterns on W and \(W'\) as intended by the non-exhaustive mining algorithm.

Fig. 4

Example of limited sets of frequent patterns \(F_W\) (left) and \(F_{W'}\) (right) discovered with \(k=2\) and \(\alpha =0.1\). Dashed circles denote pruned non-interesting patterns

Non-exhaustive mining prunes the search space of patterns in two different ways for W and \(W'\), respectively, thus restricting the search only to the most interesting FCSs and FSs while discarding the less interesting ones (Fig. 4). By doing this, the detection relies on a considerably smaller number of patterns, hence becoming more efficient, while losing the information associated with patterns that have been pruned. This affects the construction of the vector encodings \(\textbf{w}\) and \(\mathbf {w'}\), which are built according to the enumeration \(F_W \cup F_{W'}\), consisting of a reduced number of patterns. The example reported in Fig. 4 depicts a scenario in which every pattern is frequent in both W and \(W'\), although with different supports, thus determining different interestingness. Any information related to the patterns “ac” and “abc” is lost, as they are not present in the enumeration of \(F_W \cup F_{W'}\) and therefore do not contribute to the change. The detection step thus becomes non-exhaustive by focusing only on the most interesting frequent patterns. In particular, as the number of patterns discovered by the non-exhaustive mining procedure grows exponentially with the parameter \(k \ll |I|\), the detection step requires in practice much less time than that required on complete pattern sets.

The heuristic plays a crucial role in determining which portions of the search space to prune, and consequently the most interesting features on which to measure the dissimilarity score \(d(F_W,F_{W'})\). In particular, non-exhaustive mining with frequency heuristics discovers FCSs (or FSs) among patterns associated with large portions of the window, thus denoting stable features over time. On the contrary, non-exhaustive mining with contrast heuristics discovers FCSs (or FSs) among those associated with unstable regions of W and \(W'\), thus denoting highly varying features over time which are more promising generators of changes. The advantage of contrast heuristics is that of accounting for changes at an early stage, that is, during mining, rather than postponing any change-related decision to the late stage of change detection in the PBCD process.

7 Computational Complexity

In this section, we study the computational complexity of PBCDs in the worst-case scenario. In particular, the analysis takes into account the influence of the feature space, the mining step, the detection step, and the window model. Given the dynamic network \(D = \langle G_1, G_2, \dots , G_n \rangle \), where I denotes the set of possible labeled edges observed over time and \(|\varPi |\) the size of blocks, every PBCD built according to the architecture in Fig. 2 consumes exactly \(e = \frac{n}{|\varPi |}\) blocks of transactions, thus requiring O(e) iterations. The time complexity \(O(a + b)\) of every iteration depends on the cost a of the mining step and the cost b of the detection step.

The mining step requires time \(a = O(2^{c} \cdot d)\) in the worst-case scenario. \(O(2^{c})\) denotes the number of patterns discovered, according to the feature space and to the exhaustiveness of the mining step. In the exhaustive setting, all the edges (\(c=|I|\)) are considered to discover \(O(2^{|I|})\) patterns. Since the numbers of FSs and FCSs are both bounded by \(2^{|I|}\), we refer to \(O(2^{|I|})\) as the maximum number of patterns discovered in the worst-case scenario. This reduces to \(O(2^k)\) in the non-exhaustive setting (\(c=k\)), where \(k \ll |I|\). The term d denotes a multiplicative factor describing the amount of work spent by the algorithm in tidset intersections, which depends on the time window model adopted: in the case of the landmark and mixed models it is \(O(|W|+|\varPi |)\), while in the case of the sliding model it is \(O(|\varPi |)\). As for the detection step, the computation of \(d(F_{W},F_{W'})\) requires time O(b) proportional to the size of the enumeration \(|F_W \cup F_{W'}|\), which is \(O(2^{|I|})\) in the exhaustive setting and \(O(2^k)\) in the non-exhaustive one.

Then, the worst-case computational complexity of exhaustive PBCDs is \(O(e \cdot (d2^{|I|} + 2^{|I|}))\), while that of non-exhaustive PBCDs is \(O(e \cdot (d2^k + 2^k))\); that is, exponential in |I| and in k, respectively, with \(k \ll |I|\).

8 Experimental results

The experiments are organized alongside different perspectives concerning both synthetic and real-world dynamic networks.

In particular, we answer the following research questions:

\({\textbf {Q1)}}\):

How do the components of the general architecture affect efficiency and accuracy?

\({\textbf {Q2)}}\):

How does the parameter k affect the efficiency and the accuracy of non-exhaustive PBCD on synthetic dynamic networks?

\({\textbf {Q3)}}\):

How does the parameter k affect the efficiency of non-exhaustive PBCD on real-world dynamic networks?

\({\textbf {Q4)}}\):

How do the frequency and contrast heuristics affect the accuracy of non-exhaustive PBCD on synthetic dynamic networks?

For the experiments on synthetic networks, we generated 40 networks, 20 with frequent drifts and 20 with rare drifts. Every network consists of 200 hourly blocks, each made of 120 graph snapshots (one observed every 30 seconds), for a total of 24,000 snapshots. Each hourly block is built by randomly choosing, with replacement, i) one out of 10 different generative models in the case of frequent drifts, and ii) one out of 2 different generative models in the case of rare drifts. As a consequence, it is more likely that two consecutive hourly blocks are built according to different generative models, thus denoting a change, in the dataset with frequent drifts than in the one with rare drifts. Every generative model builds a first snapshot made of 50 nodes by adopting a random scale-free network generator, which is then replicated for the remaining snapshots of the block. Every graph snapshot of a block is then perturbed by adding new edges and removing existing ones with a probability equal to \(1.5\%\). A random perturbation is required to test the false alarm rate of the two approaches.
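The perturbation step can be sketched as follows; this is a simplified illustration under our own assumptions (edges as sorted node pairs, the `perturb` name is ours, and the scale-free generator producing the initial snapshot is not reproduced):

```python
import random
from itertools import combinations

def perturb(snapshot, nodes, p=0.015, rng=random):
    """One perturbation pass over a snapshot: every existing edge is
    removed with probability p, and every absent edge is added with
    probability p. Edges are (u, v) pairs with u < v."""
    out = set()
    for u, v in combinations(sorted(nodes), 2):
        if (u, v) in snapshot:
            if rng.random() >= p:        # keep the edge
                out.add((u, v))
        elif rng.random() < p:           # inject a new random edge
            out.add((u, v))
    return out
```

With p = 0 the snapshot is returned unchanged, while with p = 1 the graph is fully inverted; at the 1.5% rate used in the experiments, each snapshot differs only slightly from the replicated template.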

The experiments were executed on a desktop computer equipped with an Intel i7 @ 3.4 GHz processor, 16 GB of RAM, and Windows 10. We implemented every exhaustive and non-exhaustive PBCD algorithm using the jKarma software framework (Impedovo et al., 2020). In particular, the implementations of 60 different PBCDs used in the experiments, as well as every dataset considered, are publicly available and completely open-sourceFootnote 2.

8.1 Q1: How do the components of the general architecture affect efficiency and accuracy

In this paper, we report a discussion of the differences, in terms of accuracy and efficiency, among the PBCD variants that can be built from the four main components of Fig. 2. These are: i) the time window model (in the three alternatives landmark-LAN, sliding-SLI, and mixed-MIX), ii) the feature space (FCS, FS), iii) the mining step (exhaustive-EX, non-exhaustive-NEX), and iv) the detection step (Tanimoto dissimilarity score-TAN, weighted Jaccard score-WJ). Thus, we obtain a total of 24 alternative configurations, which we tested on 40 synthetic dynamic networks at six different values of \(\beta \) (while k was fixed to 20). To compare the variants under statistical significance criteria, two different statistical tests have been considered (both at significance level \(\alpha =0.05\)): the Wilcoxon post-hoc test on the variants along the feature space, mining step, and detection step, and the Nemenyi-Friedman post-hoc test on the variants along the time window models. In Table 1, for each component, we report the solution that outperforms the others in terms of accuracy and in terms of efficiency, along the values of \(\beta \).

Table 1 Most accurate (top) and most efficient (bottom) PBCD variant when tuning \(\beta \)

As for the accuracy, the results show that FSs are more appropriate features than FCSs. This is not surprising, since the simplified structure of the sub-trees mirrors the changes occurring to the edges involved better than the structure of the connected sub-graphs, which must account for the occurrences of a larger set of edges. Furthermore, the PBCDs equipped with non-exhaustive mining steps always outperform exhaustive ones. As expected, ranking mechanisms able to select the candidates to explore are beneficial to the ability to capture changes. The Tanimoto measure, as originally used by the KARMA framework, outperforms the weighted Jaccard measure for low values of \(\beta \). From this set of experiments, it is clear that the factors with the greatest impact on PBCD accuracy are the change detection measure and the time window model. In particular, it is strongly evident that the mixed model outperforms the landmark model, which is preferred only when \(\beta = 0.10\). This suggests that storing long “histories” of snapshots often does not make the detector sensitive to changes emerging from new snapshots.

As for the efficiency, the very low p-values indicate strong evidence that non-exhaustive PBCD based on FSs and the Tanimoto distance in the sliding model outperforms every other PBCD approach. In particular, this is an expected result, since i) the mining of FSs is less time-consuming than that of FCSs, and ii) a non-exhaustive mining strategy is more efficient than an exhaustive one.

An aspect worth considering is that the original configuration of KARMA (FCSs + EX + Tanimoto + Landmark) is never selected as the best PBCD. In this perspective, the adoption of a new feature set, the FSs, jointly with a non-exhaustive mining step generally improves the detection accuracy and efficiency. Moreover, the test suggests that the landmark model adopted by KARMA turns out to be brittle, leading to poor accuracy and efficiency, while the Tanimoto coefficient leads to poor detection accuracy.

8.2 Q2: How does the parameter k affect the efficiency and the accuracy of non-exhaustive PBCD on synthetic dynamic networks

We report the results of a comparative evaluation in which we compare the running times (Table 2), accuracy values, and false alarm rates (Table 3) of two PBCD variants against two competitors, namely KARMA (Loglisci et al., 2018) and StreamKRIMP (van Leeuwen & Siebes, 2008). KARMA works as described above, while StreamKRIMP handles the dynamic network as a stream of labeled edges and adopts a compression-based mining step.

The two selected PBCD variants are the configurations FSs + NEX + Weighted Jaccard + Mixed (hereafter, PBCD-1) and FSs + NEX + Tanimoto + Sliding (hereafter, PBCD-2), selected for their accuracy and efficiency, as illustrated in Section 8.1 (at \(\beta = 0.2\)). Here, we test their performance on eight synthetic networks while tuning the parameter k.

The results in Tables 2 and 3 show increasing efficiency and accuracy for decreasing values of k. The accuracy is computed as in a conventional classification problem: we frame change detection as a binary classification problem (change detected vs. change undetected) and evaluate the rate of correct detections with respect to the synthetic ground truth. Similarly, the false alarm rate quantifies the rate of detections that do not correspond to a change in the synthetic ground truth.
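Under this binary-classification framing, the two measures can be sketched as follows (our own formulation: one boolean flag per block boundary, comparing detections against the ground truth):

```python
def detection_metrics(detected, ground_truth):
    """detected, ground_truth: equal-length boolean lists, one flag per
    block boundary (True = change). Returns (accuracy, false_alarm_rate)."""
    assert len(detected) == len(ground_truth)
    correct = sum(d == g for d, g in zip(detected, ground_truth))
    negatives = sum(1 for g in ground_truth if not g)
    false_alarms = sum(1 for d, g in zip(detected, ground_truth)
                       if d and not g)
    accuracy = correct / len(detected)
    # Fraction of no-change boundaries wrongly flagged as changes.
    far = false_alarms / negatives if negatives else 0.0
    return accuracy, far
```

For instance, flagging one spurious change out of three quiet boundaries yields a false alarm rate of 1/3 even when the single true change is caught.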

Table 2 Running times of PBCD-1 and PBCD-2, when tuning k, against KARMA and StreamKRIMP on synthetic data (\(\alpha =0.5\), \(\beta =0.20\), \(|\varPi |=15\))
Table 3 Accuracy and false alarm rate of PBCD-1 and PBCD-2, when tuning k, against KARMA and StreamKRIMP on synthetic data (\(\alpha =0.5\), \(\beta =0.20\), \(|\varPi |=15\))

Results in Table 2 show improved efficiency for both PBCD-1 and PBCD-2 against KARMA and StreamKRIMP. This is an expected result, also confirmed by the statistical significance test in Section 8.1, because non-exhaustive mining of FSs is more efficient than i) the exhaustive mining of FCSs performed by KARMA, and ii) the non-exhaustive mining of itemsets performed by StreamKRIMP. In particular, the running times of both PBCD-1 and PBCD-2 increase with k, since high values of k lead to the discovery of an increasing number of patterns. Moreover, the results show that PBCD-2 is more efficient than PBCD-1, as the sliding window model leads to higher efficiency than the mixed model. This is explained by the forgetful nature of the sliding window, in which old graph snapshots are immediately discarded upon the arrival of a new block of snapshots. In this way, the mining step requires reduced computational effort, as patterns are mined from reduced sets of transactions, thus intersecting small tidsets when computing the support of each pattern.

As for the accuracy (Table 3), both PBCD-1 and PBCD-2 are optimal change detection solutions, for the considered synthetic networks, at low values of k (\(k=5\) and \(k=10\), respectively). Moreover, the results show a decreasing tendency in the accuracy of both PBCD-1 and PBCD-2. In particular, PBCD-1 always outperforms KARMA and StreamKRIMP (except for \(k=30\)), while this is not the case for PBCD-2. These are expected results, again confirmed by the significance test in Section 8.1, as the non-exhaustive mining of FSs with a detection step based on the weighted Jaccard dissimilarity best leverages the absolute growth rate and the interestingness of patterns. This is not the case with PBCD-2, in which the Tanimoto coefficient, computed in the sliding window setting on large sets of patterns, exhibits higher false positive rates. For high values of k, the mining step discovers patterns representing behavior local to the snapshots collected in two successive sliding windows of equal size, thus injecting noisy features into the detection step. Furthermore, both PBCD-1 and PBCD-2 perform optimally for very low values of k. This is an expected result, since the non-exhaustive search on a reduced number of patterns has the discriminating ability to select the patterns truly associated with changes. Consequently, changes occurring on these (reduced) pattern sets can be effectively detected. We also note that i) PBCD-1 exhibits moderately lower false alarm rates than PBCD-2, also outperforming KARMA for high values of k and StreamKRIMP for low values of k, and ii) PBCD-2 outperforms KARMA and StreamKRIMP for low values of k only.

We conclude that both the accuracy and efficiency of non-exhaustive PBCDs benefit from the limited pattern sets that have been discovered. In particular, the combination of a mixed window model with the weighted Jaccard dissimilarity leads to accurate detection, while the combination of sliding windows and the Tanimoto coefficient leads to efficient detection while improving the detection accuracy for very low values of k. From this perspective, the two approaches offer two efficient alternatives to the KARMA algorithm, in which the running times can be greatly reduced (up to two orders of magnitude in this set of experiments).

8.3 Q3: How does the parameter k affect the efficiency and the change sensitivity of non-exhaustive PBCD on real-world dynamic networks?

We also provide a practical illustration of the performance, in terms of efficiency and sensitivity in change detection, through experiments on real-world networks (Table 4). We report the results of a comparative evaluation between PBCD-1, PBCD-2, and KARMA on 5 real-world dynamic networks when tuning k. These datasets were also used to test the competitor KARMA in Loglisci et al. (2018). A description is reported in the following.

The keds datasetFootnote 3 (Schrodt et al., 1994; Brandes & Lerner, 2008) depicts the socio-political interactions occurring between nations and worldwide organizations in the Gulf region, as reported in the news collected day by day from April 1979 to July 2009. In this dataset, 208 different nodes correspond to nations and worldwide organizations operating in the Gulf region, while 20 different edge labels denote the types of political relationships occurring between the nodes.

The nodobo datasetFootnote 4 (Bell et al., 2011) concerns the state of the telecommunication network made of transactions (phone calls, SMSs, and Bluetooth interactions) between 27 students of a Scottish state high school, from September 2010 to February 2011. In its original form, the dataset collected the transactional records of 13035 phone calls, 83542 SMSs, and 5292103 Bluetooth interactions; the duration of the phone calls and the length of the SMSs were also reported. In the dynamic graph counterpart, nodes represent the students, while 11 edge labels denote the modalities of communication. In particular, the edge labels were generated by discretizing, with an equal-width discretization in 5 bins, the duration of the phone calls and the length of the text messages between two students, respectively. A single additional label denotes the presence of a Bluetooth connection.

Table 4 Real world datasets used in the experiments, values aggregated over time points

The noaa datasetFootnote 5 (Kalnay et al., 1996) was developed within the Reanalysis project by the National Center for Environmental Prediction and the National Center for Atmospheric Research. In its original form, the dataset gathered atmospheric measurements of different meteorological quantities (e.g., air temperature, wind speed, and relative humidity) made by geo-localized sensors equally distributed over space. When building the dynamic graph, we focused on the relative humidity measurements from January 1st, 1990 to December 31st, 2010, recorded on a daily basis over an area roughly spanning North-Central America. Nodes of the dynamic graph denote sensors, while edge labels are nominal values denoting the relative humidity between two linked sensors. As for nodobo, the edge labels were generated by applying an equal-width discretization with 10 bins to the values of relative humidity.

The wikitalkFootnote 6 dataset depicts the network of interactions among the authors of Wikipedia, the free encyclopedia, observed day by day from December 2001 to January 2008. More specifically, 1140141 nodes correspond to the authors, while 1 edge label is used to denote an edit performed by an author towards the Wikipedia talk page of another author.

The mawi datasetFootnote 7 was developed in the MAWI Project by the Measurement and Analysis on the WIDE Internet Working Group. In its original form, the dataset collected the network traffic monitored over the network, in the form of IPv6 packets sent. In this work, we use only a portion of the whole network, that is, the traffic monitored by sampling point D from January 25th, 2005 to January 31st, 2005. In particular, we built the dynamic graph by considering the IPv6 addresses as nodes, while 2745 edge labels represent both the communication protocol and the communication port used between two devices. For example, the pair (TCP, 80) indicates an HTTP packet. A graph snapshot represents a time interval of five seconds, during which two IPv6 addresses may communicate through different protocols.

To guarantee a fair comparison, we fixed the value \(\beta = 0.20\) in each experiment. However, since the networks span different periods, we independently fixed the block size \(|\varPi |\) to 10% of each dataset, guaranteeing 100 iterations of each PBCD on every dataset. The minimum support \(\alpha \) has been fixed to 0.05 for keds, nodobo, and mawi, 0.20 for noaa, and 0.40 for wikitalk. By observing the results in Tables 5 and 6, it is evident that both PBCD-1 and PBCD-2 are always more efficient than KARMA for all the values of k. Furthermore, as observed on synthetic networks in Section 8.2 and as confirmed in Section 8.1, PBCD-2 is still more efficient than PBCD-1. The increasing tendency of the running times with k is verified on the mawi, noaa, and nodobo datasets.

As for the sensitivity in change detection, we assessed the impact of k on the characteristics of the changes detected over the considered datasets. To do so, we computed two measures: i) the average dissimilarity score over each dataset, and ii) the distribution of changes, that is, the average number of blocks between two consecutive change points. The results shown in Table 6 clearly indicate that k has no influence on the changes: we observe nearly constant average change dissimilarities for both PBCD-1 and PBCD-2 on every considered dataset. Notably, the set of patterns discovered with the non-exhaustive strategy when \(k = 5\) appears to be sufficient for detecting the same changes discovered at higher values of k. This consideration also applies to the change distribution. In fact, except for PBCD-2 on the mawi dataset, the detected changes have an almost uniform distribution over the blocks of the dynamic networks. Therefore, we conclude that non-exhaustive PBCDs on real-world datasets achieve approximately the same change detection performance regardless of the value of k, with the advantage of improved efficiency for decreasing values of k.

Table 5 Running times of PBCD-1 and PBCD-2, when tuning k, against KARMA on real-world networks (\(\beta =0.20\), \(|\varPi |= 10\%\) of each dataset)
Table 6 Average dissimilarity scores and average distance between subsequent change points of PBCD-1 and PBCD-2, when tuning k, against KARMA on real-world networks (\(\beta =0.20\), \(|\varPi |= 10\%\) of each dataset)

8.4 Q4: How do the frequency and contrast heuristics affect the accuracy of non-exhaustive PBCD on synthetic dynamic networks?

Up to this point, we have evaluated how the parameter k affects the efficiency and the accuracy of non-exhaustive PBCD algorithms without considering the heuristics adopted. In fact, we have only considered non-exhaustive PBCDs with frequency heuristics, and this analysis allowed us to gain general knowledge of the behavior of non-exhaustive PBCDs against exhaustive ones.

We now evaluate how the frequency heuristic (area) and the three contrast heuristics (growth-rate, odds-ratio, support difference) affect the accuracy and the efficiency of non-exhaustive PBCD algorithms. In particular, our main claim is that different heuristics profoundly impact the overall detection accuracy. Specifically, differently informed mining, which is guided by different heuristic criteria, discovers considerably different incomplete sets of patterns. Furthermore, our second claim is that different heuristics would not significantly alter the efficiency of the mining process for two reasons: i) they can be efficiently computed in O(1) without adding significant computation overhead in the mining step, and ii) the efficiency of non-exhaustive PBCDs is controlled by the k parameter.

Testing for the best heuristic is not an easy task, as this would add a factor to the combinatorial explosion of the PBCDs already considered in the statistical significance test (Section 8.1), leading to a considerably high number of computational solutions. Instead, we perform a comparative evaluation between PBCDs drawn from a reduced space of all the possible combinations at \(\beta = 0.20\). In particular, we compare only non-exhaustive PBCDs based on FSs (FSs + NEX), since they are always selected as the most efficient and the most accurate PBCDs. For the same reason, we do not focus on PBCDs based on landmark windows, since they are never selected as the most accurate or most efficient time window model at \(\beta = 0.20\). On the contrary, we make no assumptions on the dissimilarity measures adopted. This spans a set of 16 PBCDs based on i) non-exhaustive FSs mining (FSs + NEX), ii) 2 time window models (mixed and sliding), iii) 2 dissimilarity measures (Tanimoto and weighted Jaccard), and iv) 4 heuristics (area, growth rate, odds ratio, support difference), which have been executed on 8 randomly generated datasets when tuning k. Then, tendencies are computed by averaging the accuracy and the running times on the considered datasets.

Table 7 Average accuracy of non-exhaustive PBCDs (FSs+NEX) with different heuristics on 8 synthetic dynamic networks (\(\alpha =0.50\), \(\beta =0.20\), \(|\varPi |= 15\))

As for the accuracy (Table 7), two considerations arise: i) regardless of the dissimilarity score and the time window model, the area leads to the best accuracy on average, and ii) the contrast measures always perform better in combination with the Tanimoto measure rather than with the weighted Jaccard measure. In particular, when looking closely at the heuristics involved, on average we observe that i) the growth rate and the odds ratio are comparable for low values of k on PBCDs based on sliding windows and Tanimoto, ii) the growth rate performs comparably with the area on PBCDs based on mixed windows and Tanimoto, and iii) the odds ratio performs better than the other contrast measures on PBCDs based on the weighted Jaccard dissimilarity. Moreover, regardless of the heuristic adopted, PBCDs working with mixed windows are more accurate than those working with sliding windows, as also suggested by the statistical significance test. Although comparable in some cases, frequency heuristics perform better than contrast ones. Such an unexpected behavior is easily explained by considering that contrast heuristics bias the mining towards frequent (infrequent) patterns which can become infrequent (frequent). Specifically, in the presence of candidate infrequent patterns, the mining is severely limited, since no more specific frequent patterns (FCSs or FSs) can be discovered starting from infrequent ones. Therefore, frequency-based heuristics are a better choice for the non-exhaustive mining of frequent patterns (FCSs or FSs).

Table 8 Average running times of non-exhaustive PBCDs (FSs+NEX) with different heuristics on 8 synthetic dynamic networks (\(\alpha =0.50\), \(\beta =0.20\), \(|\varPi |= 15\))

As for the efficiency (Table 8), it is evident that the running times are always comparable, regardless of the heuristic, time window model, and dissimilarity measure used. In particular, non-exhaustive PBCDs based on sliding windows are slightly more efficient than those based on mixed windows, as also confirmed by the statistical significance test. The heuristic does not heavily influence the running times of PBCDs on average, confirming our claim, since its computation is always done in constant time, thus without adding overhead to the mining process. Furthermore, the running times of PBCDs equipped with contrast heuristics are bounded by those of PBCDs equipped with frequency heuristics. Therefore, contrast heuristics lead to faster PBCD algorithms. This is explained by the same consideration made when studying the accuracy, that is, the mining is severely limited when searching for frequent patterns among those that are more likely to become infrequent. Putting these considerations together, we acknowledge that contrast heuristics lead to more efficient but generally less accurate PBCD algorithms based on frequent patterns, while frequency heuristics lead to more accurate and comparably efficient PBCDs.

Fig. 5

Changes quantified over time by exhaustive PBCD and heuristic PBCD based on the area of FCSs (top) and FSs (bottom) (\(k=5\), \(\alpha =0.1\), \(\beta =0.2\), \(|\varPi |=15\))

9 Qualitative analysis

The experimental results presented so far show the contribution of heuristic PBCDs, compared to exhaustive ones, in numeric terms of accuracy and efficiency. In this section, we present a qualitative discussion by exploring the differences in how the heuristic approach quantifies the amount of change in comparison to the exhaustive one. This has been done on a synthetic dataset, paying attention to two aspects: i) differences between frequency and contrast measures (Figs. 5 and 6), and ii) the response of heuristic PBCDs to the setting of the parameter k (Fig. 7). Note that, in the following, we compare i) a frequency-based heuristic PBCD (SLI + TAN) based on the area of FCSs and FSs, ii) a contrast-based heuristic PBCD (SLI + TAN) based on the support difference of FCSs and FSs, and iii) an exhaustive PBCD (SLI + TAN) based on FCSs and FSs. Furthermore, since the synthetic dataset is very long, for the sake of conciseness, we only report excerpts made of the first 100 change measurements.

Fig. 6

Changes quantified over time by exhaustive PBCD and heuristic PBCD based on the support difference of FCSs (top) and FSs (bottom) (\(k=5\), \(\alpha =0.1\), \(\beta =0.2\), \(|\varPi |=15\))

Figure 5a and b show the changes quantified (Tanimoto) by the exhaustive PBCDs against those quantified by the heuristic PBCDs based on the area of FCSs and FSs. The figures clearly show that the Tanimoto scores always peak at five distinct time points. In particular, the changes are so relevant because the data has been generated to guarantee that the overall topology of the underlying network snapshots changed considerably, and consequently so did the sets of sub-graphs and sub-trees. It is worth noting that exhaustive PBCDs consistently yield higher quantifications of changes than frequency-based heuristic PBCDs. This result is determined by the non-exhaustive mining: while in exhaustive PBCDs every FCS (FS) contributes to the Tanimoto dissimilarity score, even noisy or irrelevant patterns, in heuristic PBCDs based on the area of FCSs (FSs) only the most important ones contribute to the changes. Additionally, the spread between the quantifications at actual change points and the remaining ones is more evident for the heuristic PBCD based on the area of FSs (Fig. 5b), where the majority of change measurements correctly fall under the minimum change threshold \(\beta =0.2\), thus denoting a lower amount of false alarms compared to the heuristic PBCD based on the area of FCSs (Fig. 5a). In practice, the change measurements are more stable and less noisy on FSs than on FCSs: detecting changes by observing FSs, a subset of FCSs, already mitigates the problem of injecting noisy features into the detection stage. Even if end-users are allowed to carefully tune \(\beta \), change detection techniques able to separate real change points from the remaining ones could free them from the burden of choosing the optimal \(\beta \) value.
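The quantification just described can be sketched as a set-based Tanimoto dissimilarity between the patterns discovered in the two windows, thresholded by \(\beta \). The function names are ours, and the plain set formulation is a simplification: it ignores any support weighting the actual measure may use.

```python
def tanimoto_dissimilarity(patterns_a, patterns_b):
    """1 - |A ∩ B| / |A ∪ B| over the two sets of discovered patterns.
    Identical sets score 0.0 (no change); disjoint sets score 1.0
    (abrupt change)."""
    a, b = set(patterns_a), set(patterns_b)
    union = a | b
    if not union:  # no patterns discovered on either side
        return 0.0
    return 1.0 - len(a & b) / len(union)

def change_detected(patterns_ref, patterns_new, beta=0.2):
    """Flag a change point when the dissimilarity exceeds beta."""
    return tanimoto_dissimilarity(patterns_ref, patterns_new) > beta
```

Read this way, the overstated scores discussed below for contrast heuristics follow directly: when the mining of the two windows yields almost disjoint pattern sets, the score saturates near 1.0 regardless of how much the underlying network actually changed.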

Complementary to the frequency-based heuristic PBCDs, Fig. 6a and b show the changes measured (Tanimoto) by exhaustive PBCDs against those measured by heuristic PBCDs based on the support difference of FCSs and FSs. This time, regardless of whether they operate on FCSs or FSs, the change measurements are overstated: abrupt changes are quantified (Tanimoto = 1.0) even in the presence of only minor changes in the underlying network (that is, time points at which the topology of the network has not significantly mutated), thus denoting a considerably high amount of false alarms. Note that the same change quantifications occur regardless of whether the support difference is computed on FCSs or FSs. As stated in the introduction, the contrast heuristic prunes the search space of patterns during mining towards those whose support has already considerably shifted (in an absolute sense, in the case of support difference) between the time windows. This way, change measurements are magnified because change-related information guides the mining process itself. Consequently, mining patterns by giving importance to those whose support has significantly shifted over time leads to high Tanimoto dissimilarity scores, as they will be computed upon almost completely disjoint sets of patterns that were frequent (infrequent) and have become infrequent (frequent), hence almost completely different.

As concerns the response to the k setting, we observe that as the value of k increases, so does the number of patterns the heuristic PBCDs inject into the detection step. Figure 7a and b show that the change quantifications are gradually adjusted as k increases, thus ultimately (when \(k = |I|\)) letting the heuristic PBCDs behave like exhaustive PBCDs, regardless of the heuristic considered (frequency-based area measure or contrast-based support difference measure). Note that, to avoid the abruptly shifting change dissimilarity scores observed in Fig. 6, it was necessary to increase the value of k considerably, up to values of 350 and 700.
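The convergence to the exhaustive behavior can be illustrated with a toy sketch, where the ranked pattern list and its size are entirely hypothetical: as the beam size k approaches the size of the exhaustive result, the heuristic pattern set, and hence its Tanimoto dissimilarity from the exhaustive set, shrinks to zero.

```python
# Hypothetical ranked output of an exhaustive miner (100 patterns);
# a heuristic miner with beam size k keeps only the k best-ranked ones.
exhaustive = [f"p{i}" for i in range(100)]

def heuristic(k):
    return exhaustive[:k]

def tanimoto(a, b):
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

# The gap to the exhaustive pattern set vanishes as k grows towards |I|.
gaps = [tanimoto(heuristic(k), exhaustive) for k in (5, 25, 100)]
```

This also suggests why tuning k trades efficiency against fidelity: small beams discard most patterns and diverge from the exhaustive quantification, while large beams reproduce it at a higher mining cost.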

Fig. 7

Changes quantified over time (Tanimoto) by exhaustive PBCD and i) heuristic PBCD based on the area of FCSs when tuning \(k =\{5, 25\}\), and ii) heuristic PBCD based on the support difference of FCSs when tuning \(k =\{350, 700\}\) (\(\alpha =0.1\), \(\beta =0.2\), \(|\varPi |=15\))

10 Related work

The identification of concise patterns revealing changes in time-evolving data has gathered contributions from different fields, but no existing attempt seems to apply non-exhaustive approaches to network data.

The idea that the best representations of changes are those that best compress the data is the common factor of a series of works exploiting the MDL principle, whose main peculiarity is that it prevents the generation of all the change-aware representations.

Paudel and Eberle (2020) process the graph stream using a sliding window technique, decomposing each graph into a number of discriminative subgraphs that best compress it according to an MDL heuristic. The entropy of the window is calculated by analyzing the distribution of these discriminative subgraphs in relation to the graphs. A density-ratio estimation approach is then applied to detect concept drift in the sequence of entropy values obtained by advancing the sliding window one step at a time.

Yamanishi and Miyaguchi (2016) investigate sequential change detection using minimum description length (MDL) change statistics. The authors first compute the MDL change statistic as the difference between the minimum encoding lengths required for the case where the change does not occur and the case where it does. Then, they aggregate MDL changes over sequences of time windows. The method appears able to recognize both gradual and abrupt changes. The same authors extended that research by focusing on 'meta changes', that is, variations in when and how changes occur (Fukushima & Yamanishi, 2019). StreamKRIMP, one of the competitors we used, also encompasses MDL (van Leeuwen & Siebes, 2008). It computes and updates an MDL code table, which encodes the data-generating distribution in a space-efficient manner; an MDL-optimal codeword is assigned to each frequently occurring itemset in the stream. New blocks of items are tested to check whether they fit the distribution encoded in the current code table; when they do not, a change in the data distribution is signaled.

Non-exhaustive pattern mining for time-evolving data has been designed to solve outlier detection problems, which cannot be seen as change detection since the targets are not data points that determine data distribution drifts, but spots that do not follow the expected distribution. In Giacometti and Soulet (2016) and Koufakou et al. (2011), the authors leverage the ability of itemsets to handle high-dimensional categorical data for outlier detection and face the huge number of frequent itemsets with the well-known condensed representation of non-derivable itemsets (Calders et al., 2004). For instance, Koufakou et al. (2011) extract non-derivable itemsets and non-almost derivable itemsets (a flexible notion of the non-derivable ones) from each point and use these to compute an anomaly score. A long list of contributions is instead available under the umbrella of evolutionary algorithms adopting heuristic criteria on static data. For instance, Lim et al. (2012) propose a genetic algorithm and exploit the intra-item correlation, in place of frequency-based thresholds and the anti-monotonicity property, to derive itemsets with high correlation and supersets of those itemsets. On the contrary, in Djenouri et al. (2012), the authors propose a bee swarm optimization strategy that randomly explores the space to find candidate itemsets, which are then selected using quality measures based on inter-distance. The algorithm has been later extended with a Tabu search optimizer to identify the best neighbors (Djenouri et al., 2013).

As to network data, Bifet et al. (2011) offer a perspective on representative patterns centered on the notion of closed patterns. They focus on the discovery of closed frequent sub-graphs that remain stable over streaming graphs under the influence of concept drift. However, their algorithm has not been designed for change detection and is exhaustive, since the closure operator implicitly evaluates all the patterns. One of the few methods of change detection on graphs is reported in Chavary et al. (2017). It uses closed contrast patterns to summarize changes in network traffic. In particular, it first discovers closed patterns from attack connections and normal connections, and then selects as contrast patterns the closed patterns that statistically discriminate one category of connections from the other. Another category of non-exhaustive methods is represented by pattern sampling techniques, which pick individual patterns proportionally to a given quality measure instead of enumerating all of them. The sampling performs a search on the partial order of candidate subgraphs and returns subgraph samples when the exploration converges to a desired stationary distribution (Hasan & Zaki, 2009) or when it solves a constraint satisfaction problem (Dzyuba et al., 2017). These works, however, have been designed only for large graphs and not for time-evolving graphs, which justifies the study on the heuristics proposed in this manuscript.

Recent research in deep learning has boosted interest both in representation models for graph-structured data and in the design of algorithms to analyze advanced forms of graph data, such as dynamic graphs. Deep learning approaches, along with matrix factorization techniques and random walk-based methods, complete the landscape of works proposed in the literature.

Factorization-based techniques for dynamic graph embedding aim to produce node embeddings across different time points by decomposing time-dependent similarity measures. These measures can be represented as sequences of matrices or three-way tensors. Matrix factorization methods express network evolution as matrices, leveraging time-dependent structural correlations among node pairs. They vary based on the minimized loss function, such as similarity matrix factorization with an inner-product function (Ferreira et al., 2019) or adjacency matrix reconstruction using Laplacian Eigenmaps (Li et al., 2017). The innovation lies in extending factorization over time while maintaining embedding stability and incorporating temporal dependence into matrix decomposition. Three paradigms characterize the modeling of connections between timestamps: direct temporal smoothing in the loss function, updating similarity matrices via matrix perturbation theory, and decomposing similarity matrices into constant and time-dependent terms.

Another category of methods for graph embedding relies on random walks. Multiple random walks of fixed length are treated as sentences, providing context for each node and capturing higher-order dependencies without adjacency matrices. This method conducts random walks on each snapshot of a dynamic graph, deriving vector representations by optimizing a joint problem considering temporal dependencies, and concatenates node representations across time (Mitrovic & De Weerdt, 2019).

Architectures based on recurrent neural networks (RNNs), including long short-term memory units (LSTMs) and gated recurrent units (GRUs), utilize a sequence of graphs or their representations to learn or enhance embeddings, considering temporal correlation and storing information over time to handle complex correlations beyond consecutive timestamps (Chen et al., 2021). Also, convolutional networks are used to exploit temporal dependencies in graph data (Xiong et al., 2019). Another approach employs autoencoding mechanisms, with methods like DynGEM employing transfer learning to share parameters between consecutive autoencoders, adapting to growing graph sizes and node attributes (Goyal et al., 2018). Dynamic autoencoders use historical network records to forecast future states, addressing the temporal graph offset reconstruction problem. Techniques like dyngraph2vecAE and dyngraph2vecRNN employ various architectures, including autoencoders and LSTM networks (Goyal et al., 2020).

A category of neural models explicitly designed for networked data is Graph Neural Networks (GNNs). GNNs possess the capability to extract information from arbitrarily structured network topologies by incorporating various graph-related features, such as node and edge roles, and correlations between them. Recent advancements have seen innovative GNNs adept at capturing both the topological and temporal features of time-varying graphs. Many of these approaches focus on algorithms aimed at learning the topology and temporal dynamics simultaneously. For instance, Ma et al. (2020) models evolving graphs but does not explicitly address the changes that evolution entails. It specifically targets growing-interactions graphs, primarily considering the insertion of new edges without accounting for edge deletions. The method continuously updates node information using a GNN upon the arrival of new edges, incorporating only the most recent node information with LSTMs. Similarly, Taheri et al. (2019) aims to model changes in dynamic graphs but falls short in revealing them. It embeds GNNs within an encoder to preserve the topology of dynamic graphs at each time step. LSTMs are then introduced to capture temporal dynamics by propagating information across consecutive time steps. Finally, a decoder reconstructs the graph’s history. However, these methods require sequential reading of graph snapshots before modeling evolution, potentially hindering their ability to detect changes effectively. While changes are accommodated within the GNN models, they may not be discernible. One of the few works focused on GNNs and aimed at detecting changes is presented in Wang et al. (2023). However, in this work, the notion of change primarily concerns the strength of relationships between nodes rather than structural aspects.

Change detection algorithms have been investigated also by using embedding techniques for static graphs. Many of these approaches focus on proposing novel statistical tests defined on continuous spaces. For example, Grattarola et al. (2020) concentrates on detecting changes in non-Euclidean spaces, such as hyperspherical and hyperbolic spaces. They utilize autoencoder-based embeddings and conduct tests on a stream of geodesic distances computed between embedded graphs and a ’mean’ graph. Similarly, Luo and Krishnamurthy (2024) applies a similar concept to identify multiple change points. Adhering to the principle of matrix-based embeddings, Hewapathirana et al. (2020) introduces a spectral representation derived from the weighted adjacency matrix of the graph at each time instant. Each embedded point characterizes the behavior of a vertex in the graph at a specific time, reflecting the average behavior in preceding time instants. Their detection method employs a moving window strategy to compute change scores for vertices at each time step, with these embedded points represented as vectors. More recently, Zhang et al. (2023) argues that dynamic networks often exhibit changes in their regular patterns. They propose a framework that leverages existing embeddings enhanced with learned patterns to identify different patterns in dynamic networks. This framework adjusts network embeddings based on detected patterns by assigning node features from previous snapshots to subsequent snapshots within the same pattern, thereby enhancing the distinction between different network patterns. Sulem et al. (2024) introduces a change-point agnostic method for online change detection by incorporating general graph features such as node attributes, edge weights, or attributes. This method relies on a similarity statistic, data-driven graph similarity function, and short-term history of graph snapshots to detect changes efficiently.

While dynamic network embedding and node embedding techniques offer numerous benefits, they also come with some disadvantages, such as limited generalization to unseen nodes or graphs with significantly different structures, the huge computational demand, especially for deep learning-based approaches, and the limited interpretability of the learned representations.

11 Concluding remarks

In this paper, we have studied the problem of change detection in dynamic networks with pattern-based approaches. Specifically, we have proposed heuristic approaches of non-exhaustive pattern-based change detection (PBCD), which, to the best of our knowledge, have never been considered before. In particular, we have collected several improvements contributing to the efficiency and accuracy of traditional PBCDs. These have been made possible by resorting to the general PBCD schema and integrating algorithmic improvements and pattern evaluation measures. Specifically, we have relaxed the exhaustiveness of the PBCD mining step with a non-exhaustive mining strategy, inspired by beam search algorithms and heuristic criteria. Since non-exhaustive mining discovers limited sets of patterns according to the heuristic adopted, we evaluated how frequency-based and contrast-based heuristics affect the overall performance. In particular, we discussed how they affect the mining efficiency and the detection accuracy under different PBCD configurations, that is, different time window models and different dissimilarity measures. To do so, we have conducted an extensive exploratory evaluation on both real and synthetic networks.

The experimental results provided some insights on the most efficient and the most accurate PBCDs among the possible approaches (Section 8.1).

In particular, they have shown that non-exhaustive PBCDs are more efficient than exhaustive PBCDs, while achieving comparable levels of accuracy (Sections 8.2 and 8.3). Specifically, following the experimental results on both synthetic and real-world networks of different characteristics, we concluded that heuristic PBCDs should be favored over exhaustive ones for detecting changes in dynamic networks. In particular, heuristic PBCDs were up to 16 times more efficient than KARMA on real-world networks, up to 20 times more efficient than KARMA on synthetic networks with frequent drifts, and up to 506 times more efficient than KARMA on synthetic networks with rare drifts.

Moreover, we concluded that heuristic PBCDs based on frequency measures are more accurate than those based on contrast measures (Section 8.4).

The two-step procedure makes many PBCDs modular in the mining and detection steps and demonstrates adaptability to various categories of dynamic networks beyond directed simple networks, such as attributed-node networks and weighted networks. The detection step operates on (two) object sets representing pattern subspaces and makes the heuristics applicable to any pair of sets of objects, irrespective of how the pattern subspaces were constructed or which structural components or network properties were utilized. By contrast, the mining step does not have the same adaptability, because the pattern mining algorithms used in this manuscript work on directed simple networks, which are required to generate sub-graphs and sub-trees indicative of structural changes. This reveals a limitation of the proposed approach: indeed, it is unable to work on attributed-node temporal networks, that is, sequences of graph snapshots having the same topology (the edges remain unaltered) but time-varying node properties. This does not exclude a priori the applicability of a PBCD but, thanks to the modularity of the schema, only requires appropriate mining algorithms able to generate pattern subspaces representing changes at the level of node properties. Both exhaustive and heuristic PBCDs could potentially be used, but for attributed-node networks and weighted networks in real-world data, the search space of changes becomes overwhelming for exhaustive approaches. Consequently, heuristic methods become more favorable in such scenarios.

Future directions of research involve the evaluation of performance when adopting more sophisticated feature spaces in conjunction with the contrast measures, for example by considering variations in emerging patterns, which can be non-exhaustively discovered according to the contrast measures discussed in the present work.

Furthermore, as PBCDs denote a versatile class of change detection algorithms, another direction involves evaluating their effectiveness on dynamic networks of different kinds (e.g., dynamic attributed networks, dynamic weighted networks).