The central idea of update handling is to employ a carefully designed data structure that stores partial solutions and maintains them incrementally.
A concise representation
There is a long tradition in the graph community of harnessing tree structures for fast pattern matching and search [1, 8]. Following this tradition, we design a succinct data structure for keeping partial solutions, built on a spanning tree PT of the pattern graph P. If P contains cycles, PT is obtained by removing the edges that are not in the spanning tree, i.e., the non-tree edges. The vertices of P are partitioned according to their levels in PT, where the level of a vertex is its depth relative to the root vertex of PT.
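The section leaves open how the spanning tree is chosen. As a minimal sketch, assuming an undirected, connected pattern stored as an adjacency dict and a breadth-first traversal from a chosen root (both assumptions; `spanning_tree` and `partition_by_level` are hypothetical names), PT, its non-tree edges and the level partition can be computed as follows.

```python
from collections import deque

def spanning_tree(pattern_adj, root):
    """BFS spanning tree of a connected, undirected pattern graph P.

    Returns (parent, level, non_tree_edges): parent[root] is None, level[v] is
    the depth of v below the root, and every edge of P not used by the BFS
    tree is reported as a non-tree edge (a sorted vertex pair)."""
    parent, level = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in sorted(pattern_adj[v]):
            if w not in level:            # first time w is reached: tree edge
                parent[w], level[w] = v, level[v] + 1
                queue.append(w)
    tree = {tuple(sorted((v, p))) for v, p in parent.items() if p is not None}
    edges = {tuple(sorted((v, w))) for v in pattern_adj for w in pattern_adj[v]}
    return parent, level, edges - tree

def partition_by_level(level):
    """Group pattern vertices by their level in PT."""
    groups = {}
    for v, d in sorted(level.items()):
        groups.setdefault(d, []).append(v)
    return groups

# A triangle 1-2-3 with a pendant vertex 4 attached to 3, rooted at 1.
P = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
parent, level, non_tree = spanning_tree(P, root=1)
print(non_tree)                    # {(2, 3)}
print(partition_by_level(level))   # {0: [1], 1: [2, 3], 2: [4]}
```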
To keep partial solutions, we propose a concise representation named TreeMat, which stores the vertices of the topology graph G that match the vertices of PT. Given a vertex v in PT, its matching vertices in TreeMat are arranged into two sets:
-
match(v): the set of vertices u in G such that v is mapped to u in some solution to PT; and
-
stree(v): the set of vertices u in G such that 1) the subtree residing at v matches the corresponding subtree at u via subgraph homomorphism [10], and 2) no solution to PT maps v to u.
Subgraph homomorphism is obtained from subgraph isomorphism by simply dropping the injectivity constraint. By definition, the two sets are mutually exclusive, and we refer to the vertices in either match(v) or stree(v) as the candidates of v, denoted cand(v) = match(v) ∪ stree(v). The structure of TreeMat is then defined as follows.
-
TreeMat is a tree-like structure in which, for each query vertex v in PT, there is a node holding the candidates of v, consisting of the two sets match(v) and stree(v); and
-
for query vertices v and \(v^{\prime }\) adjacent in PT, there is an edge in TreeMat between u ∈cand(v) and \(u^{\prime }\in \textsf {cand}(v^{\prime })\) if and only if \(\langle u,u^{\prime }\rangle \in G\).
Note that stree(vr) is empty for the root vertex vr of PT, since the subtree residing at vr is PT itself.
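A minimal Python sketch of the TreeMat layout may help fix the definitions. It assumes, hypothetically, that the data graph is an adjacency dict and that each node stores match(v) and stree(v) as sets; the class and method names are illustrative only. In this sketch TreeMat edges are derived on demand from G and the candidate sets rather than stored, which is one possible design choice, and `check` raises if the two invariants stated above are violated.

```python
from dataclasses import dataclass, field

@dataclass
class TreeMat:
    """Hypothetical TreeMat sketch: one node per query vertex of PT."""
    parent: dict   # query vertex -> its parent in PT (the root maps to None)
    G: dict        # data vertex -> set of neighbours in the data graph
    match: dict = field(default_factory=dict)   # v -> match(v)
    stree: dict = field(default_factory=dict)   # v -> stree(v)

    def cand(self, v):
        # cand(v) is the union of the two (disjoint) sets
        return self.match.get(v, set()) | self.stree.get(v, set())

    def edges(self, v, w):
        """TreeMat edges between the nodes of PT-adjacent v and w:
        exactly the pairs of candidates joined by an edge of G."""
        if self.parent.get(w) != v and self.parent.get(v) != w:
            raise ValueError("v and w are not adjacent in PT")
        return {(u, x) for u in self.cand(v) for x in self.cand(w)
                if x in self.G.get(u, set())}

    def check(self):
        """Raise ValueError if the invariants stated in the text are violated."""
        for v in self.parent:
            if self.match.get(v, set()) & self.stree.get(v, set()):
                raise ValueError(f"match({v}) and stree({v}) overlap")
            if self.parent[v] is None and self.stree.get(v):
                raise ValueError("stree(root) must be empty")

# PT is the single edge v1 - v2; u21 is a candidate of v2 with no neighbour
# in match(v1), so it is kept in stree(v2) rather than match(v2).
T = TreeMat(parent={"v1": None, "v2": "v1"},
            G={"u10": {"u20"}, "u20": {"u10"}, "u21": {"u99"}, "u99": {"u21"}},
            match={"v1": {"u10"}, "v2": {"u20"}},
            stree={"v2": {"u21"}})
T.check()
print(sorted(T.edges("v1", "v2")))   # [('u10', 'u20')]
```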
Example 1
Figure 4b shows the TreeMat for PT (Figure 4a) and the initial data graph G0. For a vertex v in PT, an orange square in cand(v) denotes a data vertex u ∈stree(v), and a black square in cand(v) denotes a data vertex u ∈match(v). Furthermore, the root vertex v1 of PT has only the set match(⋅).
Remark 1
As pointed out in [10], existing work on continuous subgraph matching caches either a set of partial solutions or a set of candidate vertices for each query vertex. These paradigms incur not only high memory overhead but also large computational cost. In contrast, our model takes a more eager strategy: it keeps the vertices of complete solutions (in match(⋅)) as well as the vertices likely to become part of solutions (in stree(⋅)). In this way, TreeMat avoids filling up main memory while still guiding the efficient derivation of affected answers.
Data graph change-oriented rationale of maintenance
In this subsection, we propose a vertex state transition strategy (VST) to efficiently maintain the intermediate results.
When an edge update \(\langle u,u^{\prime }\rangle \) arrives, we try to match it with an edge \(\langle v,v^{\prime }\rangle \) in PT, where the level of v is assumed to be smaller than that of \(v^{\prime }\). We then use VST to maintain TreeMat. We write u ∈NULL, i.e., u is in the NULL state with respect to v, if u∉cand(v). Figure 1 shows the state transition diagram, which consists of three states and six transition rules (Transitions 1–6) describing how a vertex moves from one state to another. Transitions 1–3 are triggered by edge insertion, and Transitions 4–6 by edge deletion.
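The three states and six transitions can be encoded compactly. In this sketch the rule numbers are an assumption (they follow the order in which the rules are presented below, since Figure 1 is not reproduced here), and `State`, `TRANSITIONS`, `state_of` and `classify` are hypothetical names. Ordering the states NULL < stree < match makes one property easy to check: every insertion-triggered transition moves a vertex up and every deletion-triggered one moves it down.

```python
from enum import IntEnum

class State(IntEnum):
    """States of a data vertex u with respect to a query vertex v."""
    NULL = 0    # u is not in cand(v)
    STREE = 1   # u is in stree(v)
    MATCH = 2   # u is in match(v)

# Hypothetical numbering: rules are numbered in the order the text presents
# them, Transitions 1-3 for edge insertion and 4-6 for edge deletion.
TRANSITIONS = {
    (State.NULL, State.MATCH): (1, "insertion"),
    (State.NULL, State.STREE): (2, "insertion"),
    (State.STREE, State.MATCH): (3, "insertion"),
    (State.MATCH, State.NULL): (4, "deletion"),
    (State.MATCH, State.STREE): (5, "deletion"),
    (State.STREE, State.NULL): (6, "deletion"),
}

def state_of(u, v, match, stree):
    """State of data vertex u with respect to query vertex v."""
    if u in match.get(v, ()):
        return State.MATCH
    if u in stree.get(v, ()):
        return State.STREE
    return State.NULL

def classify(old, new):
    """(rule number, trigger) for a state change, or None if nothing changed."""
    return None if old == new else TRANSITIONS[(old, new)]

print(classify(State.STREE, State.MATCH))   # (3, 'insertion')
```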
Handling edge insertion
Consider an edge \(\langle u,u^{\prime }\rangle \) inserted into G0 and matched to the edge \(\langle v,v^{\prime }\rangle \) in PT, where v is the parent vertex of \(v^{\prime }\).
From NULL to match. Suppose that u ∈match(v) and \(u^{\prime }\in \textsf {NULL}\). If \(v^{\prime }\) is a leaf vertex, then we add \(u^{\prime }\) into \(\textsf {match}(v^{\prime })\).
Suppose that v is the root vertex of PT, \(u^{\prime }\in \textsf {cand}(v^{\prime })\) and u ∈NULL. For each child vertex vc of v other than \(v^{\prime }\), we check whether there is an edge 〈u,uc〉 matching 〈v,vc〉 and, if vc is not a leaf vertex, whether additionally uc ∈cand(vc). If this holds for every such child, we add u into match(v). In particular, if vc is a leaf vertex and uc ∈NULL, we also add uc into match(vc).
From NULL to stree. Suppose that u ∈NULL and \(u^{\prime }\in \textsf {cand}(v^{\prime })\), where v is not the root vertex of PT. For each child vertex vc of v other than \(v^{\prime }\), we check whether there is an edge 〈u,uc〉 matching 〈v,vc〉 and, if vc is not a leaf vertex, whether additionally uc ∈cand(vc). If this holds for every such child, we add u into stree(v). In particular, if vc is a leaf vertex and uc ∈NULL, we also add uc into stree(vc).
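The child check shared by the two NULL rules above can be sketched as follows. It assumes, hypothetically, that 〈u,uc〉 matches 〈v,vc〉 exactly when uc is a neighbour of u in G and the vertex labels agree; `try_add` and its parameters are illustrative names, and all sets are keyed by query vertex. The caller is assumed to have already checked that u' ∈ cand(v').

```python
def try_add(u, v, skip, children, parent, G, lab_G, lab_P, match, stree):
    """NULL -> match if v is the root of PT, NULL -> stree otherwise.

    skip is the child v' already covered by the inserted edge <u, u'>.
    Assumption: <u, uc> matches <v, vc> iff uc is adjacent to u in G and the
    vertex labels agree. Returns True if u was added."""
    if lab_G[u] != lab_P[v]:
        return False
    target = match if parent[v] is None else stree
    leaf_adds = []
    for vc in children[v]:
        if vc == skip:
            continue
        hits = [uc for uc in G[u] if lab_G[uc] == lab_P[vc]]
        if children[vc]:
            # internal child: uc must already be a candidate of vc
            hits = [uc for uc in hits if uc in match[vc] or uc in stree[vc]]
        else:
            # leaf child: NULL neighbours join the leaf's node as well
            leaf_adds += [(vc, uc) for uc in hits
                          if uc not in match[vc] and uc not in stree[vc]]
        if not hits:
            return False
    target[v].add(u)
    for vc, uc in leaf_adds:
        target[vc].add(uc)
    return True

# PT: root v1 with a leaf child v2 and an internal child v3, whose child v4 is a leaf.
children = {"v1": ["v2", "v3"], "v2": [], "v3": ["v4"], "v4": []}
parent = {"v1": None, "v2": "v1", "v3": "v1", "v4": "v3"}
lab_P = {"v1": "A", "v2": "B", "v3": "C", "v4": "D"}
lab_G = {10: "A", 20: "B", 30: "C", 40: "D"}
G = {10: {20, 30}, 20: {10}, 30: {10, 40}, 40: {30}}
match = {v: set() for v in children}
stree = {"v1": set(), "v2": set(), "v3": {30}, "v4": {40}}

# Edge <10, 30> is inserted and matched to <v1, v3>; 10 is NULL for the root v1.
ok = try_add(10, "v1", "v3", children, parent, G, lab_G, lab_P, match, stree)
print(ok, match["v1"], match["v2"])   # True {10} {20}
```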
Once the data vertex u is added into stree(v), the update is propagated upwards: for each up ∈NULL adjacent to u such that 〈u,up〉 matches 〈v,vp〉, where vp is the parent vertex of v, we further check in a similar manner whether up can be added into stree(vp) (Figure 2).
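The upward propagation can be sketched as a worklist. Here `try_add(up, vp, skip)` is a hypothetical callback standing in for the NULL-to-stree check described above, so the snippet stays self-contained; the demo stub simply accepts and, as an assumption, puts vertices of the root into match (since stree of the root is always empty) and all others into stree. Whether 〈u,up〉 matches 〈v,vp〉 is decided by vertex labels, which is also an assumption.

```python
def propagate_stree_up(u, v, parent, G, lab_G, lab_P, match, stree, try_add):
    """u has just entered stree(v): look for NULL neighbours xp of u, labelled
    like the parent vp, and ask try_add(xp, vp, skip=v) whether they qualify.
    Successful additions are propagated further up, level by level."""
    added, work = [], [(u, v)]
    while work:
        x, w = work.pop()
        wp = parent[w]
        if wp is None:                    # reached the root of PT
            continue
        for xp in sorted(G[x]):
            is_null = xp not in match[wp] and xp not in stree[wp]
            if is_null and lab_G[xp] == lab_P[wp] and try_add(xp, wp, w):
                added.append((xp, wp))
                work.append((xp, wp))
    return added

# PT is the chain v1 -> v2 -> v3; the data path u10 - u20 - u30 carries the same labels.
parent = {"v1": None, "v2": "v1", "v3": "v2"}
lab_P = {"v1": "A", "v2": "B", "v3": "C"}
lab_G = {"u10": "A", "u20": "B", "u30": "C"}
G = {"u10": {"u20"}, "u20": {"u10", "u30"}, "u30": {"u20"}}
match = {v: set() for v in parent}
stree = {"v1": set(), "v2": set(), "v3": {"u30"}}   # u30 was just added

def demo_try_add(up, vp, skip):
    # Stand-in check: in this chain vp has no child other than skip, so it
    # always succeeds; the root receives match, other vertices stree.
    (match if parent[vp] is None else stree)[vp].add(up)
    return True

added = propagate_stree_up("u30", "v3", parent, G, lab_G, lab_P, match, stree, demo_try_add)
print(added)   # [('u20', 'v2'), ('u10', 'v1')]
```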
From stree to match. Suppose that \(u^{\prime }\in \textsf {stree}(v^{\prime })\) and u ∈match(v). Then we move \(u^{\prime }\) from \(\textsf {stree}(v^{\prime })\) to \(\textsf {match}(v^{\prime })\).
Suppose that the data vertex u is added into match(v). For each child vertex vc of v, every vertex uc ∈stree(vc) that is adjacent to u in TreeMat is moved from stree(vc) to match(vc).
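Because moving uc into match(vc) is itself an addition to match(⋅), the rule above can fire again one level further down. The sketch below (hypothetical function names, adjacency dicts keyed by data vertex) applies both stree-to-match rules, repeating the downward step until no vertex moves. The data reuse the vertex names of Figure 2g–h and model only the facts stated in Example 2: u2 ∈ match(v2), u7 ∈ stree(v4), and u13, u14 are the neighbours of u7 in stree(v7).

```python
def promote(u, v, children, G, match, stree):
    """u has just entered match(v): move every TreeMat neighbour of u in
    stree(vc), for each child vc, into match(vc), and repeat downwards."""
    moved, work = [], [(u, v)]
    while work:
        x, w = work.pop()
        for wc in children[w]:
            for xc in sorted(stree[wc] & G[x]):   # adjacent to x in TreeMat
                stree[wc].discard(xc)
                match[wc].add(xc)
                moved.append((xc, wc))
                work.append((xc, wc))
    return moved

def insert_edge(u, v, u2, v2, children, G, match, stree):
    """Data edge <u, u2>, matched to <v, v2> with v the parent of v2, is inserted."""
    G[u].add(u2)
    G[u2].add(u)
    if u in match[v] and u2 in stree[v2]:          # stree -> match for u2
        stree[v2].discard(u2)
        match[v2].add(u2)
        return [(u2, v2)] + promote(u2, v2, children, G, match, stree)
    return []

children = {"v2": ["v4"], "v4": ["v7"], "v7": []}
G = {"u2": set(), "u7": {"u13", "u14"}, "u13": {"u7"}, "u14": {"u7"}}
match = {"v2": {"u2"}, "v4": set(), "v7": set()}
stree = {"v2": set(), "v4": {"u7"}, "v7": {"u13", "u14"}}
changes = insert_edge("u2", "v2", "u7", "v4", children, G, match, stree)
print(changes)   # [('u7', 'v4'), ('u13', 'v7'), ('u14', 'v7')]
```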
Example 2
Figure 2c–h give examples of the vertex state transition strategy for edge insertion: Figure 2c–d illustrate the transition from NULL to match, Figure 2e–f the transition from NULL to stree, and Figure 2g–h the transition from stree to match. In Figure 2c, the edge insertion Δo1 matches 〈v4,v7〉, where u6 ∈match(v4). Since v7 is a leaf vertex in PT, we add u17 to match(v7). In Figure 2d, the edge insertion Δo2 matches 〈v1,v2〉, where u2 ∈match(v2). Since v1 is the root vertex in PT and 〈u18,u4〉 matches 〈v1,v3〉 with u4 ∈match(v3), we add u18 into match(v1). In Figure 2e, the edge insertion Δo3 matches 〈v4,v7〉, where u14 ∈stree(v7). Since v4 has no child vertex other than v7, we add u19 into stree(v4). In Figure 2f, u19 has a neighbor u20 such that 〈u19,u20〉 matches 〈v4,v2〉. Since 〈u20,u9〉 matches 〈v2,v5〉, we further add u20 into stree(v2). In Figure 2g, the edge insertion Δo4 matches 〈v4,v2〉, where u2 ∈match(v2) and u7 ∈stree(v4). We then move u7 from stree(v4) to match(v4). In Figure 2h, we further check the data vertices in stree(v7), where v7 is the child vertex of v4. Since u13 and u14 are the neighbors of u7 in stree(v7), we move them from stree(v7) to match(v7).
Handling edge deletion
Consider an edge \(\langle u,u^{\prime }\rangle \) deleted from G0 and matched to the edge \(\langle v,v^{\prime }\rangle \) in PT, where v is the parent vertex of \(v^{\prime }\).
From match to NULL. Suppose that u ∈match(v) and \(u^{\prime }\in \textsf {match}(v^{\prime })\). If no data vertex in \(\textsf {match}(v^{\prime })\) other than \(u^{\prime }\) is adjacent to u, we delete u from match(v). In particular, if \(v^{\prime }\) is a leaf vertex and no other data vertex in cand(v) is adjacent to \(u^{\prime }\), we delete \(u^{\prime }\) from \(\textsf {match}(v^{\prime })\).
Suppose that u is deleted from match(v). For each neighbor up of u in match(vp) where vp is the parent of v, if there is no other data vertex in match(v) that is adjacent to up, then we delete up from match(vp).
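A sketch of the match-to-NULL rule with its upward propagation follows. It reads the leaf clause as an independent check on u' made after the edge is removed from G (an interpretation of "in particular"), and uses hypothetical names and a toy chain-shaped PT whose only match is the path u10 - u20 - u30.

```python
def delete_match_edge(u, v, u2, v2, parent, children, G, match, stree):
    """Data edge <u, u2>, matched to <v, v2> (v the parent of v2), is deleted.
    Returns the (vertex, query vertex) pairs that fall back to NULL."""
    G[u].discard(u2)
    G[u2].discard(u)
    removed = []
    if u in match[v] and u2 in match[v2] and not (match[v2] & G[u]):
        work = [(u, v)]   # u lost its last neighbour in match(v2)
        while work:
            x, w = work.pop()
            if x not in match[w]:
                continue
            match[w].discard(x)
            removed.append((x, w))
            wp = parent[w]
            if wp is None:
                continue
            for xp in sorted(match[wp] & G[x]):
                if not (match[w] & G[xp]):   # no other support left in match(w)
                    work.append((xp, wp))
    # Leaf clause: u2 itself drops out once no candidate of v is adjacent to it.
    if not children[v2] and u2 in match[v2] and not ((match[v] | stree[v]) & G[u2]):
        match[v2].discard(u2)
        removed.append((u2, v2))
    return removed

# PT is the chain v1 -> v2 -> v3 (v3 a leaf).
parent = {"v1": None, "v2": "v1", "v3": "v2"}
children = {"v1": ["v2"], "v2": ["v3"], "v3": []}
G = {"u10": {"u20"}, "u20": {"u10", "u30"}, "u30": {"u20"}}
match = {"v1": {"u10"}, "v2": {"u20"}, "v3": {"u30"}}
stree = {v: set() for v in parent}
removed = delete_match_edge("u20", "v2", "u30", "v3", parent, children, G, match, stree)
print(removed)   # [('u20', 'v2'), ('u10', 'v1'), ('u30', 'v3')]
```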
From match to stree. Suppose that u ∈match(v) and \(u^{\prime }\in \textsf {match}(v^{\prime })\). If no other data vertex in match(v) is adjacent to \(u^{\prime }\), we move \(u^{\prime }\) from \(\textsf {match}(v^{\prime })\) to \(\textsf {stree}(v^{\prime })\). In particular, if \(v^{\prime }\) is a leaf vertex, we further check whether some vertex in stree(v) is adjacent to \(u^{\prime }\), and move \(u^{\prime }\) from \(\textsf {match}(v^{\prime })\) to \(\textsf {stree}(v^{\prime })\) only in that case.
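The match-to-stree rule can be sketched as below, with hypothetical names. For an internal v', moving u' to stree(v') is safe because the deleted edge lies above v', so the subtree below v' still matches at u'; for a leaf, the extra stree(v) check decides, and the unsupported leaf case is left to the match-to-NULL rule.

```python
def demote_on_delete(u, v, u2, v2, children, G, match, stree):
    """Data edge <u, u2>, matched to <v, v2> (v the parent of v2), is deleted.
    Returns "stree" if u2 moves from match(v2) to stree(v2), else None."""
    G[u].discard(u2)
    G[u2].discard(u)
    if u not in match[v] or u2 not in match[v2] or (match[v] & G[u2]):
        return None                      # u2 still has a neighbour in match(v)
    if not children[v2] and not (stree[v] & G[u2]):
        return None                      # leaf without stree support: match -> NULL applies
    match[v2].discard(u2)                # the subtree below v2 is untouched,
    stree[v2].add(u2)                    # so u2 still matches it
    return "stree"

# v1 -> v2 -> v3 (leaf). u30 loses its match-parent u20 but keeps u21 in stree(v2).
children = {"v1": ["v2"], "v2": ["v3"], "v3": []}
G = {"u10": {"u20"}, "u20": {"u10", "u30"}, "u21": {"u30"}, "u30": {"u20", "u21"}}
match = {"v1": {"u10"}, "v2": {"u20"}, "v3": {"u30"}}
stree = {"v1": set(), "v2": {"u21"}, "v3": set()}
state = demote_on_delete("u20", "v2", "u30", "v3", children, G, match, stree)
print(state)   # stree
```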
From stree to NULL. Suppose that u ∈stree(v) and \(u^{\prime }\in \textsf {cand}(v^{\prime })\). If no other data vertex in \(\textsf {cand}(v^{\prime })\) is adjacent to u, we delete u from stree(v). In particular, if \(v^{\prime }\) is a leaf vertex in PT and \(u^{\prime }\in \textsf {stree}(v^{\prime })\), we further check whether some data vertex in stree(v) is adjacent to \(u^{\prime }\); if not, we delete \(u^{\prime }\) from \(\textsf {stree}(v^{\prime })\).
Suppose that the vertex u is deleted from stree(v). For each neighbor up of u in stree(vp) where vp is the parent of v, if there is no other data vertex in cand(v) that is adjacent to up, then we delete up from stree(vp).
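Likewise, the stree-to-NULL rule and its propagation through stree(⋅) can be sketched with a worklist. The names are hypothetical, and, as in the text, a vertex is only dropped once no other candidate supports it.

```python
def delete_stree_edge(u, v, u2, v2, parent, children, G, match, stree):
    """Data edge <u, u2>, matched to <v, v2> (v the parent of v2), is deleted.
    Returns the (vertex, query vertex) pairs removed from stree(.)."""
    G[u].discard(u2)
    G[u2].discard(u)

    def cand(w):
        return match[w] | stree[w]

    removed = []
    if u in stree[v] and u2 in cand(v2) and not (cand(v2) & G[u]):
        work = [(u, v)]   # u lost its last candidate neighbour under v2
        while work:
            x, w = work.pop()
            if x not in stree[w]:
                continue
            stree[w].discard(x)
            removed.append((x, w))
            wp = parent[w]
            if wp is None:
                continue
            for xp in sorted(stree[wp] & G[x]):
                if not (cand(w) & G[xp]):    # no other candidate of w supports xp
                    work.append((xp, wp))
    # Leaf clause: u2 leaves stree(v2) once no vertex of stree(v) is adjacent to it.
    if not children[v2] and u2 in stree[v2] and not (stree[v] & G[u2]):
        stree[v2].discard(u2)
        removed.append((u2, v2))
    return removed

# PT: v1 -> v2 -> v3 -> v4 (leaf); the path u20 - u30 - u40 sits in stree(.).
parent = {"v1": None, "v2": "v1", "v3": "v2", "v4": "v3"}
children = {"v1": ["v2"], "v2": ["v3"], "v3": ["v4"], "v4": []}
G = {"u20": {"u30"}, "u30": {"u20", "u40"}, "u40": {"u30"}}
match = {v: set() for v in parent}
stree = {"v1": set(), "v2": {"u20"}, "v3": {"u30"}, "v4": {"u40"}}
removed = delete_stree_edge("u30", "v3", "u40", "v4", parent, children, G, match, stree)
print(removed)   # [('u30', 'v3'), ('u20', 'v2'), ('u40', 'v4')]
```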
Pattern graph change-oriented rationale of maintenance
If the inserted (or deleted) edge is a non-tree edge, TreeMat is left unchanged, since TreeMat is built over PT and a non-tree edge has no impact on it. The following exposition therefore concentrates on tree edges.
Handling edge insertion
Consider a tree edge \(\langle v,v^{\prime }\rangle \) inserted into PT, where \(v^{\prime }\) is the newly introduced vertex. In this scenario, candidate vertices can only be excluded from match(⋅) or stree(⋅) back to the NULL state, but not vice versa, since the new edge only adds a constraint. To identify affected candidates, we check, for each vertex u in match(v), whether there is an edge \(\langle u,u^{\prime }\rangle \) matching \(\langle v,v^{\prime }\rangle \); every such \(u^{\prime }\) is initially NULL, as \(v^{\prime }\) is new. If there is none, we delete u from match(v); otherwise, we add every such \(u^{\prime }\) into \(\textsf {match}(v^{\prime })\). stree(v) and \(\textsf {stree}(v^{\prime })\) are updated in a similar fashion.
Moreover, when a vertex u is excluded from the candidates of v, the update must be propagated upwards in TreeMat to the root. Let vp be the parent vertex of v: if up is a neighbor of u in match(vp) and no vertex remaining in match(v) is adjacent to up in TreeMat, we exclude up from match(vp).
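A sketch of pattern-edge insertion follows, under the assumption that matching is decided by vertex labels (`add_pattern_leaf` and its parameters are hypothetical). The toy data loosely re-enact the Figure 3e narrative of Example 3 (u11 and then u5 leave stree(⋅), while u17 and u18 join match(v10)); the remaining vertices, edges and labels are invented. Propagating stree(⋅) exclusions as well as match(⋅) ones follows Example 3, where u5 is removed from stree(v3).

```python
def add_pattern_leaf(v, v_new, lab_new, parent, children, G, lab_G, match, stree):
    """Tree edge <v, v_new> is inserted into PT, v_new being a new vertex with
    label lab_new. Assumption: <u, u'> matches <v, v_new> iff u' is a
    G-neighbour of u labelled lab_new. Returns the excluded (vertex, v) pairs."""
    parent[v_new], children[v_new] = v, []
    children[v].append(v_new)
    match[v_new], stree[v_new] = set(), set()
    removed, work = [], []
    for kind, sets in (("match", match), ("stree", stree)):
        for u in sorted(sets[v]):
            hits = {x for x in G[u] if lab_G[x] == lab_new}
            if hits:
                sets[v_new] |= hits
            else:                          # u cannot cover the new edge
                sets[v].discard(u)
                removed.append((u, v))
                work.append((u, v, kind))
    stree[v_new] -= match[v_new]           # keep the two sets disjoint
    while work:                            # propagate exclusions towards the root
        x, w, kind = work.pop()
        wp = parent[w]
        if wp is None:
            continue
        sets = match if kind == "match" else stree
        support = match[w] if kind == "match" else match[w] | stree[w]
        for xp in sorted(sets[wp] & G[x]):
            if not (support & G[xp]):
                sets[wp].discard(xp)
                removed.append((xp, wp))
                work.append((xp, wp, kind))
    return removed

parent = {"v1": None, "v3": "v1", "v6": "v3"}
children = {"v1": ["v3"], "v3": ["v6"], "v6": []}
G = {"u1": {"u3"}, "u3": {"u1", "u10"}, "u5": {"u11"}, "u10": {"u3", "u17", "u18"},
     "u11": {"u5"}, "u17": {"u10"}, "u18": {"u10"}}
lab_G = {"u1": "A", "u3": "C", "u5": "C", "u10": "F", "u11": "F", "u17": "X", "u18": "X"}
match = {"v1": {"u1"}, "v3": {"u3"}, "v6": {"u10"}}
stree = {"v1": set(), "v3": {"u5"}, "v6": {"u11"}}
removed = add_pattern_leaf("v6", "v10", "X", parent, children, G, lab_G, match, stree)
print(removed)                  # [('u11', 'v6'), ('u5', 'v3')]
print(sorted(match["v10"]))     # ['u17', 'u18']
```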
Handling edge deletion
We discuss edge deletion in two cases based on whether the deletion involves a leaf vertex of PT.
Case 1
Consider a tree edge \(\langle v,v^{\prime }\rangle \) where \(v^{\prime }\) is a leaf vertex. In this case, NULL vertices can only be included into match(⋅) or stree(⋅), but not vice versa, since the deletion only removes a constraint. Intuitively, a vertex u of G0 is added into stree(v) only if, for each child vertex vc of v other than \(v^{\prime }\), there is a candidate uc ∈cand(vc) such that 〈u,uc〉 matches 〈v,vc〉.
The update then needs to be propagated upwards to the root of TreeMat. Suppose that vertex u is added into stree(v), and let vp be the parent vertex of v. For each vertex up adjacent to u such that 〈up,u〉 matches 〈vp,v〉: if up ∈NULL, we check in a similar manner whether up can be added into stree(vp); otherwise, if up ∈match(vp), we move u from stree(v) to match(v). Conversely, when vertex u is added into match(v), we examine, for each child vertex vc of v, whether some vertex uc ∈stree(vc) is adjacent to u in TreeMat; if so, we move uc from stree(vc) to match(vc).
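Case 1 can be sketched as a worklist over newly qualifying NULL vertices. This is a simplified reading under hypothetical names: it scans every data vertex with the right label, assumes label-based matching, and, since stree(⋅) of the root is always empty, places qualifying root vertices directly in match(⋅).

```python
def remove_pattern_leaf(v, leaf, parent, children, G, lab_G, lab_P, match, stree):
    """Tree edge <v, leaf> is deleted from PT and the leaf disappears; NULL data
    vertices may now become candidates. Assumption: edges match when the
    endpoint labels agree."""
    children[v].remove(leaf)
    for table in (parent, children, match, stree):
        table.pop(leaf)

    def cand(w):
        return match[w] | stree[w]

    def qualifies(u, w):
        # u carries w's label and every child wc of w has a candidate adjacent to u
        return lab_G[u] == lab_P[w] and all(cand(wc) & G[u] for wc in children[w])

    def promote_down(u, w):
        # u entered match(w): pull adjacent stree(wc) vertices into match(wc)
        for wc in children[w]:
            for uc in sorted(stree[wc] & G[u]):
                stree[wc].discard(uc)
                match[wc].add(uc)
                promote_down(uc, wc)

    work = [(u, v) for u in sorted(G) if u not in cand(v) and qualifies(u, v)]
    while work:
        u, w = work.pop()
        if u in cand(w):
            continue
        wp = parent[w]
        if wp is None or match[wp] & G[u]:
            match[w].add(u)          # root, or u has a neighbour in match(vp)
            promote_down(u, w)
        else:
            stree[w].add(u)          # propagate upwards through NULL neighbours
            for up in sorted(G[u]):
                if up not in cand(wp) and qualifies(up, wp):
                    work.append((up, wp))

# PT: v1 -> v2, and v2 has two leaves v3, v4. Deleting <v2, v4> frees the path 10 - 20 - 30.
parent = {"v1": None, "v2": "v1", "v3": "v2", "v4": "v2"}
children = {"v1": ["v2"], "v2": ["v3", "v4"], "v3": [], "v4": []}
lab_P = {"v1": "A", "v2": "B", "v3": "C", "v4": "D"}
lab_G = {10: "A", 11: "A", 20: "B", 21: "B", 30: "C", 31: "C", 41: "D"}
G = {10: {20}, 11: {21}, 20: {10, 30}, 21: {11, 31, 41}, 30: {20}, 31: {21}, 41: {21}}
match = {"v1": {11}, "v2": {21}, "v3": {31}, "v4": {41}}
stree = {"v1": set(), "v2": set(), "v3": {30}, "v4": set()}
remove_pattern_leaf("v2", "v4", parent, children, G, lab_G, lab_P, match, stree)
print({v: sorted(s) for v, s in match.items()})
# {'v1': [10, 11], 'v2': [20, 21], 'v3': [30, 31]}
```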
Case 2
Consider a tree edge \(\langle v,v^{\prime }\rangle \) that involves no leaf vertex. Deleting such an edge breaks the connectivity of PT but not that of P (Footnote 1). Hence, a non-tree edge connecting \(v^{\prime }\) with some other vertex becomes a tree edge. Intuitively, among all such non-tree edges, we choose the one whose other endpoint \(v^{\prime \prime }\) is closer to the root and has a smaller match(⋅) set.
Then, for each vertex \(u^{\prime \prime } \in \textsf {stree}(v^{\prime \prime })\), we check whether there is a candidate \(u^{\prime }\) of \(v^{\prime }\) such that \(\langle u^{\prime \prime },u^{\prime }\rangle \) matches \(\langle v^{\prime \prime },v^{\prime }\rangle \); if not, we exclude \(u^{\prime \prime }\) from \(\textsf {stree}(v^{\prime \prime })\), and further check the vertices in stree(vp), where vp is the parent of \(v^{\prime \prime }\). The update is propagated upwards till the root.
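Case 2 can be sketched in two steps: re-attaching the detached subtree via a non-tree edge, then pruning stree(v''). Two details are assumptions: the text does not say which of its two criteria takes precedence, so the sketch compares |match(⋅)| first and level second (Example 3 is consistent with either order), and it skips non-tree edges whose other endpoint lies inside the detached subtree, since those cannot restore connectivity. The function names, demo tree, deleted edge and set contents are invented; only the two candidate edges 〈v5,v6〉, 〈v6,v8〉 and the tie on |match(⋅)| echo Example 3.

```python
def in_subtree(w, top, parent):
    """True if query vertex w lies in the PT subtree rooted at top."""
    while w is not None:
        if w == top:
            return True
        w = parent[w]
    return False

def reconnect(v, vc, non_tree, parent, children, level, match):
    """Delete tree edge <v, vc> (vc not a leaf) and promote a non-tree edge
    <w, vc> that re-attaches the detached subtree. Returns w."""
    children[v].remove(vc)
    parent[vc] = None                          # vc is now a detached root
    options = []
    for a, b in non_tree:
        if vc in (a, b):
            w = b if a == vc else a
            if not in_subtree(w, vc, parent):  # w must lie outside the detached part
                options.append(w)
    if not options:
        raise ValueError("no non-tree edge re-attaches vc")
    # Hypothetical priority: smaller |match(w)| first, then closer to the root.
    w = min(options, key=lambda x: (len(match[x]), level[x], x))
    non_tree.discard(tuple(sorted((w, vc))))
    parent[vc] = w
    children[w].append(vc)
    stack = [vc]                               # levels below vc have changed
    while stack:
        x = stack.pop()
        level[x] = level[parent[x]] + 1
        stack.extend(children[x])
    return w

def prune(w, vc, parent, G, match, stree):
    """Drop u'' from stree(w) when no candidate of vc is adjacent to it, and
    propagate the exclusion upwards through stree(.) towards the root."""
    removed = []
    cand_c = match[vc] | stree[vc]
    work = [(u, w) for u in sorted(stree[w]) if not (cand_c & G[u])]
    while work:
        x, y = work.pop()
        if x not in stree[y]:
            continue
        stree[y].discard(x)
        removed.append((x, y))
        yp = parent[y]
        if yp is None:
            continue
        for xp in sorted(stree[yp] & G[x]):
            if not ((match[y] | stree[y]) & G[xp]):
                work.append((xp, yp))
    return removed

parent = {"v1": None, "v3": "v1", "v5": "v1", "v6": "v3", "v8": "v5", "v10": "v6"}
children = {"v1": ["v3", "v5"], "v3": ["v6"], "v5": ["v8"], "v6": ["v10"], "v8": [], "v10": []}
level = {"v1": 0, "v3": 1, "v5": 1, "v6": 2, "v8": 2, "v10": 3}
non_tree = {("v5", "v6"), ("v6", "v8")}
match = {"v1": {"u1"}, "v3": {"u3"}, "v5": {"u50", "u52"}, "v6": {"u6"},
         "v8": {"u80", "u81"}, "v10": {"u100"}}
stree = {v: set() for v in parent}
stree["v5"] = {"u51", "u53"}
G = {"u51": set(), "u53": {"u6"}, "u6": {"u53"}}

w = reconnect("v3", "v6", non_tree, parent, children, level, match)
removed = prune(w, "v6", parent, G, match, stree)
print(w, removed)   # v5 [('u51', 'v5')]
```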
Example 3
Figure 3d–h give examples of the update process for edge insertion and deletion in the pattern graph. In Figure 3d, since 〈v4,v5〉 is a non-tree edge, we only add edge 〈v6,v10〉 into PT. In Figure 3e, since no vertex \(u^{\prime }\) adjacent to u11 satisfies that \(\langle u_{11},u^{\prime }\rangle \) matches 〈v6,v10〉, we remove u11 from stree(v6). Accordingly, we remove u5, the parent of u11 in TreeMat, from stree(v3). Moreover, since u10 ∈match(v6) and its two neighbors u17 and u18 yield edges 〈u10,u17〉 and 〈u10,u18〉 that match 〈v6,v10〉, we add u17 and u18 into match(v10). Figure 3f gives the updated TreeMat after the edge insertion Δg2. When the edge Δp1 is deleted from P, two non-tree edges, 〈v5,v6〉 and 〈v6,v8〉, can be promoted to tree edges. We promote 〈v5,v6〉, since |match(v5)| = |match(v8)| and v5 is closer to the root vertex v1. The updated PT and TreeMat are given in Figures 3g and 3h, respectively.