Efficient Implementation of Color Coding Algorithm for Subgraph Isomorphism Problem

We consider the subgraph isomorphism problem where, given two graphs G (source graph) and F (pattern graph), one is to decide whether there is a (not necessarily induced) subgraph of G isomorphic to F. While many practical heuristic algorithms have been developed for the problem, as pointed out by McCreesh et al. [JAIR 2018], for each of them there are rather small instances which they cannot cope. Therefore, developing an alternative approach that could possibly cope with these hard instances would be of interest. A seminal paper by Alon, Yuster and Zwick [J. ACM 1995] introduced the color coding approach to solve the problem, where the main part is a dynamic programming over color subsets and partial mappings. As with many exponential-time dynamic programming algorithms, the memory requirements constitute the main limiting factor for its usage. Because these requirements grow exponentially with the treewidth of the pattern graph, all existing implementations based on the color coding principle restrict themselves to specific pattern graphs, e.g., paths or trees. In contrast, we provide an efficient implementation of the algorithm significantly reducing its memory requirements so that it can be used for pattern graphs of larger treewidth. Moreover, our implementation not only decides the existence of an isomorphic subgraph, but it also enumerates all such subgraphs (or given number of them). We provide an extensive experimental comparison of our implementation to other available solvers for the problem.


Introduction
Many real-world domains incorporate large and complex networks of interconnected units. Examples include social networks, the Internet, or biological and chemical systems. These networks raise interesting questions regarding their structure. One of those questions asks whether a given network contains a particular pattern, which typically represents a specific behaviour of interest [1,7,15]. The problem of locating a particular pattern in the given network can be restated as a problem of locating a subgraph isomorphic to the given pattern graph in the network graph.
Formally, the Subgraph Isomorphism (SubIso) problem is, given two undirected graphs G and F , to decide whether there is a (not necessarily induced) subgraph of G isomorphic to F . Or, in other words, whether there is an adjacency-preserving injective mapping from vertices of F to vertices of G. Since we do not require the subgraph to be induced (or the mapping to preserve non-adjacencies), some authors call this variant of the problem Subgraph Monomorphism.
For many applications it is not enough to just learn that the pattern does occur in the network, but it is necessary to actually obtain the location of an occurrence of the pattern or rather of all occurrences of the pattern [21,27]. Because of that, we aim to solve the problem of subgraph enumeration, in which it is required to output all subgraphs of the network graph isomorphic to the pattern graph. In Subgraph Enumeration (SubEnum), given again two graphs G and F , the goal is to enumerate all subgraphs of G isomorphic to F . Note, that SubEnum is at least as hard as SubIso. We call the variants, where the problem is required to be induced IndSubIso and IndSubEnum, respectively.
As Clique, one of the problems on the Karp's original list of 21 NP-complete problems [18], is a special case of SubIso, the problem is NP-complete. Nevertheless, there are many heuristic algorithms for SubEnum, many of them based on ideas from constraint programming (see Section 1.1), which give results in reasonable time for most instances. However, for each of them there are rather small instances which they find genuinely hard, as pointed out by McCreesh et al. [25]. Therefore, developing an alternative approach that could possibly cope with these hard instances would be of interest.
In this paper we focus on the well known randomized color coding approach [2], which presumably has almost optimal worst case time complexity. Indeed, its time complexity is O n tw(F )+1 G 2 O(n F ) with memory requirements of O n tw(F )+1 G tw(F )n F 2 n F , where n G and n F denote the number of vertices in the network graph G and the pattern graph F , respectively, and tw(F ) is the treewidth of graph F -a measure of tree-likeness (see Section 1.2 for exact definitions). Moreover, we presumably cannot avoid the factor exponential in treewidth in the worst case running time, as Marx [23] presented an ETH 1based lower bound for Partitioned Subgraph Isomorphism problem.
Proposition 1 (Marx [23]). If there is a recursively enumerable class F of graphs with unbounded treewidth, an algorithm A, and an arbitrary function f such that A correctly decides every instance of Partitioned Subgraph Isomorphism with the smaller graph F in F in time f (F )n o(tw(F )/log tw(F )) G , then ETH fails.
As the memory requirements of the color coding approach grow exponentially with treewidth of the pattern graph, existing implementations for subgraph enumeration based on this principle restrict themselves to paths [16] or trees [27], both having treewidth 1. As the real world applications might employ networks of possibly tens to hundreds of thousands of vertices and also pattern graphs with structure more complicated than trees, we need to significantly reduce the memory usage of the algorithm.
Using the principle of inclusion-exclusion, Amini et al. [3,Theorem 15] suggested a modification of the color coding algorithm, which can decide whether the pattern F occurs in the graph G in expected time O n tw(F )+1 G 2 O(n F ) with memory requirements reduced to O n tw(F )+1 G log n F ). 2 While single witnessing occurrence can be found by means of self-reduction (which is complicated in case of randomized algorithm), the inclusion-exclusion nature of the algorithm does not allow to find all occurrences of pattern in the graph, which is our main goal. Therefore, our approach rather follows the paradigm of generating only those parts of a dynamic programming table that correspond to subproblems with a positive answer, recently called "positive instance driven" approach [30]. This further prohibits the use of the inclusion-exclusion approach of Amini et al. [3], since the inclusion-exclusion approach tends to use most of the table and the term O n tw(F )+1 G is itself prohibitive in the memory requirements for tw(F ) ≥ 2. Because of the time and memory requirements of the algorithm, for practical purposes we restrict ourselves to pattern graphs with at most 32 vertices.
Altogether, our main contribution is twofold: -We provide a practical implementation of the color coding algorithm of Alon, Yuster, and Zwick [2] capable of processing large networks and (possibly disconnected) pattern graphs of small, yet not a priory bounded, treewidth. -We supply a routine to extract the occurrences of the subgraphs found from a run of the algorithm.
It is important to note that all the modifications only improve the practical memory requirements and running time. The theoretical worst case time and space complexity remain the same as for the original color coding algorithm and the algorithm achieves these, e.g., if the network graph is complete. Also, in such a case, there are n Θ(n F ) G occurrences of the pattern graph in the network implying a lower bound on the running time of the enumeration part.
In Section 2 we describe our modifications to the algorithm and necessary tools used in the process. Then, in Section 5, we benchmark our algorithm on synthetic and realistic data and compare its performance with available existing implementations of algorithms for subgraph isomorphism and discuss the results obtained. Section 6 presents future research directions.

Related work
There are several algorithms tackling SubIso and its related variants. Some of them only solve the variant of subgraph counting, our main focus is however on algorithms actually solving SubEnum. Following Carletti et al. [10] and Kimmig et al. [19], we categorize the algorithms by the approach they use (see also Kotthoff et al. [20] for more detailed description of the algorithms). Many of the approaches can be used both for induced and non-induced variants of the problem, while some algorithms are applicable only for one of them.
Vast majority of known algorithms for the subgraph enumeration problem is based on the approach of representing the problem as a searching process. Usually, the state space is modelled as a tree and its nodes represent a state of a partial mapping. Finding a solution then typically resorts to the usage of DFS in order to find a path of mappings in the state space tree which is compliant with isomorphism requirements. The efficiency of those algorithms is largely based on early pruning of unprofitable paths in the state space. Indeed, McCreesh et al. [25] even measure the efficiency in the number of generated search tree nodes. The most prominent algorithms based on this idea are Ullmann's algorithm [31], VF algorithm and its variants [8,10,12,13] (the latest VF3 [8] only applies to IndSubEnum) and RI algorithm [7]. The differences between these algorithms are based both on employed pruning strategies and on the order in which the vertices of pattern graph are processed (i.e. in the shape of the state space tree).
Another approach is based on constraint programming, in which the problem is modelled as a set of variables (with respective domains) and constraints restricting simultaneous variable assignments. The solution is an assignment of values to variables in a way such that no constraint remains unsatisfied. In subgraph isomorphism, variables represent pattern graph vertices, their domain consists of target graph vertices to which they may be mapped and constraints ensure that the properties of isomorphism remain satisfied. Also in this approach, a state space of assignments is represented by a search tree, in which non-profitable branches are to be filtered. Typical algorithms in this category are LAD algorithm [28], Ullmann's bitvector algorithm [32], and Glasgow algorithm [24]. These algorithms differ in the constraints they use, the way they propagate constraints, and in the way they filter state space tree.
There are already some implementations based on the color coding paradigm, where the idea is to randomly color the input graph and search only for its subgraphs, isomorphic to the pattern graph, that are colored in distinct colors (see Section 2.1 for more detailed description). This approach is used in subgraph counting algorithms, e.g., in ParSE [33], FASCIA [26], and in [1], or in algorithms for path enumeration described in [27] or in [16]. Each of these algorithms, after the color coding step, tries to exploit the benefits offered by this technique in its own way; although usually a dynamic programming sees its use. Counting algorithms as ParSE and FASCIA make use of specifically partitioned pattern graphs, which allow to use combinatorial computation. Weighted path enumeration algorithms [27,16] describe a dynamic programming approach and try to optimize it in various ways. However, to the best of our knowledge there is no color coding algorithm capable of enumerating patterns of treewidth larger than 1.
Our aim is to make step towards competitive implementation of color coding based algorithm for SubEnum, in order to see, where this approach can be potentially beneficial against the existing algorithms. To this end, we extend the comparisons of SubEnum algorithms [9,20,25] to color coding based algorithms, including the one proposed in this paper.

Basic definitions
All graphs in this paper are undirected and simple. For a graph G we denote V (G) its vertex set, n G the size of this set, E(G) its edge set, and m G the size of its edge set.
As already said, we use the color coding algorithm. The algorithm is based on a dynamic programming on a nice tree decomposition of the pattern graph. We first define a tree decomposition and then its nice counterpart.

Definition 1.
A tree decomposition of a graph F is a triple (T, β, r), where T is a tree rooted at node r and β : V (T ) → 2 V (F ) is a mapping satisfying: (i) We shall denote bag β(x) as V x . The width of tree decomposition (T, β, r) is max x∈V (T ) |V x | − 1. Treewidth tw(F ) of graph F is the minimal width of a tree decomposition of F over all such decompositions.

Definition 2.
A tree decomposition of a graph F is nice if deg T (r) = 1, V r = ∅, and each node x ∈ V (T ) is of one of the following four types: -Leaf node-x has no children and |V x | = 1; -Introduce node-x has exactly one child y and V x = V y ∪ {u} for some u ∈ V (F ) \ V y ; -Forget node-x has exactly one child y and V x = V y \ {u} for some u ∈ V y ; -Join node-x has exactly two children y, z and V x = V y = V z .
Note that for practical purposes, we use a slightly modified definition of nice tree decomposition in this paper. As the algorithm starts the computation in a leaf node, using the standard definition with empty bags of leaves [14] would imply that the tables for leaves would be somewhat meaningless and redundant. Therefore, we make bags of leaf nodes contain a single vertex.
Definition 3. For a tree decomposition (T, β, r), we denote by V * x the set of vertices in V x and in V y for all descendants y of x in T .
Note that, by Definition 3, for the root

Algorithm Description
In this section we first briefly describe the idea of the original color coding algorithm [2], show, how to alter the computation in order to reduce its time and memory requirements, and describe implementation details and further optimizations of the algorithm.

Idea of the Algorithm
The critical idea of color coding is to reduce the problem to its colorful version. For a graph G and a pattern graph F , we color the vertices of G with exactly n F colors. We use the randomized version, i.e., we create a random col- After the coloring, the algorithm considers as valid only subgraphs G of G that are colorful copies of F as follows.
if G is isomorphic to F and all of its vertices are colored by distinct colors in ζ.
As the output of the algorithm heavily depends on the chosen random coloring of G, in order to reach some predefined success rate of the algorithm, we need to repeat the process of coloring several times. The probability of a particular occurrence of pattern graph F becoming colorful with respect to the random coloring is n F ! n n F F , which tends to e −n F for large n F . Therefore, by running the algorithm e n F log 1 ε times, each time with a random coloring ζ : V (G) → {1, 2, . . . , n F }, the probability that an existing occurrence of the pattern will be revealed in none of the runs is at most ε. While using more colors can reduce the number of iterations needed, it also significantly increases the memory requirements. Hence, we stick to n F colors. Even though it is possible to derandomize such algorithms, e.g., by the approach shown in [14], in practice the randomized approach usually yields the results much quicker, as discussed in [27]. Moreover, we are not aware of any actual implementation of the derandomization methods.
The main computational part of the algorithm is a dynamic programming. The target is to create a graph isomorphism Φ : V (F ) → V (G). We do so by traversing the nice tree decomposition (T, β, r) of the pattern graph F and at each node x ∈ V (T ) of the tree decomposition, we construct possible partial mappings ϕ : V * x → V (G) with regard to required colorfulness of the copy. Combination of partial mappings consistent in colorings then forms a desired resulting mapping.
The semantics of the dynamic programming table is as follows. For any tree decomposition node x ∈ V (T ), any partial mapping ϕ : V x → V (G) and any color subset x ] using exactly the colors in C, that is, If there is no such isomorphism, then we let D(x, ϕ, C) = 0. We denote all configurations (x, ϕ, C) for which D(x, ϕ, C) = 1 as nonzero configurations. The original version of the algorithm is based on top-down dynamic programming approach with memoization of already computed results. That immediately implies a big disadvantage of this approach-it requires the underlying dynamic programming table (which is used for memoization) to be fully available throughout the whole run of the algorithm. To avoid this inefficiency in our modification we aim to store only nonzero configurations, similarly to the recent "positive instance driven" dynamic programming approach [30].

Obtaining a Nice Tree Decomposition
The algorithm requires a nice tree decomposition of the pattern graph for its work. The running time of the algorithm actually does not depend on the treewidth of the graph, but rather on the width of the tree decomposition supplied. On one hand, for a given graph G and an integer k, the problem of determining whether the treewidth of G is at most k, is NP-complete [4]. On the other hand, as the main algorithm is exponential in the size of the pattern anyway, we can afford to use exponential time algorithms in order to obtain a tree decomposition of minimum possible width. We employ known technique based on graph triangulation, chordal graphs and elimination orderings, combining the ideas of Bodlaender et al. [5,6]. Time complexity of this algorithm is O(n 2 F ·2 n F ). After obtaining tree decomposition of the pattern graph, we construct a nice tree decomposition as described by Cygan et al. [14]-any tree decomposition of a graph F that consists of O(n F ) nodes with width at most t, can be, in O(t 2 ·n F ) time, transformed to a nice tree decomposition of F with O(t · n F ) nodes and width bounded by t. Our further modification to the nice tree decomposition does not affect the running time.

Initial Algorithm Modification
In our implementation, we aim to store only nonzero configurations, therefore we need to be able to construct nonzero configurations of a parent node just from the list of nonzero configurations in its child/children.
We divide the dynamic programming table D into lists of nonzero configurations, where each nice tree decomposition node has a list of its own. Formally, for every node x ∈ V (T ), let us denote by D x a list of all mappings ϕ with a list of their corresponding color sets C, for which D(x, ϕ, C) = 1. The list D x for all x ∈ V (T ) is, in terms of contained information, equivalent to maintaining the whole table D-all configurations not present in the lists can be considered as configurations with a result equal to zero.
Dynamic Programming Description We now describe how to compute the lists D(x, ϕ, C) for each type of a nice tree decomposition node.
For a leaf node x ∈ T , there is only a single vertex u in V * x to consider. We can thus map u to all possible vertices of G, and we obtain a list with n G partial mappings ϕ, in which the color list for each mapping contains a single color set {ζ(ϕ(u))}.
For an introduce node x ∈ T and its child y in T , we denote by u the vertex being introduced in x, i.e., {u} = V x \V y . For all nonzero combinations of a partial mapping and a color set (ϕ , C ) in the list D y , we try to extend ϕ by all possible mappings of the vertex u to the vertices of G. We denote one such a mapping as ϕ. We can consider mapping ϕ as correct, if (i) the new mapping ϕ(u) of the vertex u extends the previous colorset C , that is, C = C ∪ {ζ(ϕ(u))} = C , and (ii) ϕ is edge consistent, that is, for all edges {v, w} ∈ E(F ) between currently mapped vertices, i.e., in our case v, w ∈ V x , there must be an edge {ϕ(v), ϕ(w)} ∈ E(G). However, because ϕ was by construction already edge consistent, it suffices to check the edge consistency only for all edges in F [V x ] with u as one of their endpoints, i.e., for all edges {u, w} ∈ E(F [V x ]) with w ∈ N F [Vx] (u). After checking those two conditions, we can add (ϕ, C) to D x .
For a forget node x ∈ V (T ) and its child y in T , there is not much work to do, as in the bottom-up construction, we directly obtain the parent list of nonzero configurations. We denote by u, the vertex being forgotten in x, i.e., {u} = V y \ V x . In this case, for all partial mappings ϕ in the list D y , we create a new mapping ϕ that excludes the mapping ϕ (u) for vertex u, i.e., ϕ = ϕ Vx . Color sets corresponding to ϕ are carried over to ϕ, as they represent colors already used in the construction. After this step, we might need to merge color lists of previously different mappings, as after the removal of the mapping ϕ (u), they might become the same mappings.
For a join node x ∈ V (T ), we denote by y and z its children in T . We traverse the children lists D y and D z and look for partial mappings ϕ and ϕ , for which ϕ = ϕ holds. Such mappings are the only ones to potentially form a new nonzero configuration in the parent list, as due to the fact that V x = V y = V z , we construct the new partial mapping ϕ as ϕ = ϕ = ϕ . However, for each such mapping ϕ, we must also construct the new list of color sets, which would afterwards be corresponding to the mapping in the parent list. We do that by traversing color lists corresponding to mappings ϕ and ϕ in D y and D z , respectively, and for particular sets C and C from the color lists of ϕ and ϕ construct a new color set C = C ∪ C . We must check, whether the intersection of C and C contains exactly the colors to color the vertices in V x . That is, for mapping ϕ, we add to D x a color set Because we build the result from the leaves of the nice tree decomposition, we employ a recursive procedure on its root, in which we perform the computations in a way of a post-order traversal of a tree. From each visited node, we obtain a bottom-up dynamic programming list of nonzero configurations. After the whole nice tree decomposition is traversed, we obtain a list of configurations, that were valid in its root. Such configurations thus represent solutions found during the algorithm, from which we afterwards reconstruct results. Note that as we prepend a root with no vertices in its bag to the nice tree decomposition, there is a nonzero number of solutions if and only if, at the end of the algorithm, the list D r contains a single empty mapping using all colors.

Further Implementation Optimizations
Representation of Mappings For mapping representation, we suppose that the content of all bags of the nice tree decomposition stays in the same order during the whole algorithm. This natural and easily satisfied condition allows us to represent a mapping ϕ : V x → V (G) in a nice tree decomposition node x simply by an ordered tuple of |V x | vertices from G. From this, we can easily determine which vertex from F is mapped to which vertex in G. Also, for a mapping in an introduce or a forget node, we can describe a position in the mapping, on which the process of introducing/forgetting takes place.

Representation of Color Sets
We represent color sets as bitmasks, where the i-th bit states whether color i is contained in the set or not. For optimization purposes, we represent bitmasks with an integer number. As we use n F colors in the algorithm and restricted ourselves to pattern graphs with at most 32 vertices, we represent a color set with a 32-bit number.
Compressing the Lists Because we process the dynamic programming lists one mapping at a time, we store these lists in a compressed way and decompress them only on a mapping retrieval basis. We serialize dynamic programming lists into a simple buffer of bytes. We store a dynamic programming list as a continuous group of records, each of which represents one partial mapping and its corresponding list of color sets. Each record contains: a mapping in the form of ordered tuple of vertices, the number of color sets included, and color sets corresponding to the mapping in the form of non-negative integer numbers.
While deserializing, the number of vertices in a mapping can be easily determined by the size of the bag of a particular node. As there is no requirement on the order of the color sets, we store these sets sorted in the increasing order of their number representation. This allows us to effectively use delta compression. Moreover, we use variable length encoding to store the numbers into buffer.
For several routines we use the LibUCW library, 3 a C language library highly optimized for performance. The employed routines include data structures like growing buffers, hash tables, red-black trees, fast sorters, fast buffered input/output, and also the efficient variable length encoding of integers.
Masking Unprofitable Mappings Our implementation supports an extended format of input graphs where one can specify for each vertex of the network, which vertices of the pattern can be mapped to it. This immediately yields a simple degree-based optimization. Before the run of the main algorithm, we perform a linear time preprocessing of input graphs and only allow a vertex y ∈ V (F ) to be mapped to a vertex

Mapping Expansion Optimizations
The main "brute-force" work of the algorithm is performed in two types of nodes-leaf and introduce nodes, as we need to try all possible mappings of a particular vertex in a leaf node or all possible mappings of an introduced vertex in a introduce node to a vertex from G. We describe ways to optimize the work in introduce nodes in this paragraph.
Let x be an introduce node, u the vertex introduced and ϕ a mapping from a nonzero configuration for the child of x. We always need to check whether the new mapping of u is edge consistent with the mapping ϕ of the remaining vertices for the corresponding bag, i.e., whether all edges of F incident on u would be realized by an edge in G. Therefore, if u has any neighbors in F [V x ], then a vertex of G is a candidate for the mapping of u only if it is a neighbor of all vertices in the set ϕ(N F [Vx] (u)), i.e., the vertices of G, where the neighbors of u in F are mapped. Hence, we limit the number of candidates by using the adjacency lists of the already mapped vertices.
In the case deg F [Vx] (u) = 0 we have to use different approach. The pattern graphs F tend to be smaller than the input graphs G by several orders of magnitude. Hence, if the introduced vertex is in the same connected component of F as some vertex already present in the bag, a partial mapping processed in an introduce node anchors the possible resulting component to a certain position in G. During the construction of possible mapping of u, it is useless to try mapping u to vertices in G that are very distant to the current position and, therefore, could by no means form a resulting subgraph isomorphic to the component of F . We obtain the maximal possible distance to be considered in G as a minimal distance of u to a vertex in V x \ {u}. I.e., we determine w = argmin v∈Vx\{u} dist F (v, u). Due to the limit on pattern graph size, we precompute distances between every pair of vertices by using BFS on each vertex before the start of the algorithm. Then it suffices to try vertices from G that are in distance at most dist(w, u) from ϕ(w) in G; again, by a simple BFS usage.
Only if there is no vertex in the bag sharing a connected component of F with u, we have to fall back to trying all possible mappings.

Result Reconstruction
It is usual in dynamic programming to store a witness or all witnesses for each reasonable value in the table. However, this would completely neglect the effect of compression of the lists and significantly increase the memory requirements.
Hence, to enumerate all results, we recursively traverse the computed dynamic programming lists D x starting with the root r of underlying nice tree decomposition. The main idea is to ask the child y (or children y, z) of a node, how was a certain partial mapping ϕ obtained during the computation. In other words, for each partial mapping ϕ in D x of interest, we will extract partial mappings ϕ in D y (or possibly also partial mappings ϕ in D z ) that lead to the addition of ϕ to D x . To preserve the colorfulness (and thus validity) of glued partial mappings, with a partial mapping ϕ we also recursively pass a set of colors that can still be used in the choice of corresponding partial mappings to glue with ϕ. By applying this approach recursively and by gluing the partial information together in mappings, we obtain all possible ways an empty mapping in r could have been obtained-which is exactly the same as all possible ways a pattern graph can be mapped to the original graph. For the efficiency of the computation, we process each node only once and we thus recursively pass all partial mappings of interest and their remaining colorsets at once.
Formally, we will consider a recursive call of function R(x, M), where x ∈ V (T ), M is a set of pairs (ϕ, C), ϕ is a partial mapping ϕ : V x → V (G) and C is a color subset C ⊆ {1, 2, . . . , n F }. This function returns a list L containing, for each pair (ϕ, C) in M, a list L (ϕ,C) of mappings Φ : V * x → V (G) that lead to the appearance of (ϕ, C) in D x .
In detail, the function R(x, M) proceeds as follows.
If node x has a child y, we first scan D t,y for pairs (ϕ , C ), in which ϕ and ϕ and colorsets C and C are in some sense consistent. Pairs satisfying these conditions are then recursively passed to y in a list M, from which we obtain a list L. In case of a join node, we perform this process for both children. After that, we construct the resulting list from recursively computed list(s) by an addition of a mapping of the introduced vertex (in the case of a introduce node) or by all possible combinations of mappings (in the case of a join node). In the end, we sometimes need to flatten the resulting list, so all resulting partial mappings corresponding to a pair (ϕ, C) are in a single list L (ϕ,C) ∈ L. We now describe the process and consistency conditions formally for each of the node types.
For a leaf node x ∈ T , we simply return a list L of single element lists L (ϕ,C) for each pair (ϕ, C) ∈ M. Each such list contains a partial mapping Φ := ϕ.
For an introduce node x ∈ T and its child y in T , we denote by u the vertex being introduced in x, i.e., {u} = V x \ V y . We create M consisting of ( ϕ, C) by setting, for each pair (ϕ, C) ∈ M, ϕ := ϕ Vy and C := C \ {ζ(ϕ(u))}. Note that ( ϕ, C) must be in D y , as otherwise (ϕ, C) would not be present in M (or equivalently in D x ). Then we obtain L := R(y, M). Let ( ϕ, C) ∈ M be constructed from (ϕ, C) ∈ M and L ( ϕ, C) be the corresponding list of L. We construct the list L (ϕ,C) corresponding to (ϕ, C) by extending each Φ ∈ L ( ϕ, C) by ϕ(u).
For a forget node x ∈ T and its child y in T , we denote by u the vertex being forgotten in x, i.e., {u} = V y \ V x . We add to M for each (ϕ, C) ∈ M all ( ϕ, C) from D y such that ϕ = ϕ Vx and C = C. Then we obtain L := R(y, M). We construct L from L as follows. We do not need to modify any Φ in any L ∈ L, as the computation in forget node adds no new information to the mapping. However, as there may have been multiple (say k) pairs ( ϕ 1 , C 1 ), . . . , ( ϕ k , C k ) ∈ M constructed from a single (ϕ, C), we flatten all lists L ( ϕ1, C1) , . . . , L ( ϕ k , C k ) ∈ L obtained from such pairs into a single list L (ϕ,C) corresponding to (ϕ, C) ∈ M.
For a join node x ∈ V (T ), we denote by y and z its children in T . In this case, we create M and M consisting of (ϕ , C ) and (ϕ , C ), respectively. For each (ϕ, C), we find all (ϕ , C ) ∈ D y and (ϕ , C ) ∈ D z , such that ϕ = ϕ = ϕ , C = C ∪ C and C ∩ C = {ζ(ϕ(V x ))} and add them to M and M , respectively. Then we obtain L := R(y, M ) and L := R(z, M ). Let (ϕ 1 , C 1 ) and (ϕ 1 , C 1 ), . . . , (ϕ k , C k ) and (ϕ k , C k ) be the pairs constructed from single (ϕ, C) ∈ M and let L (ϕ 1 ,C 1 ) , . . . , L (ϕ k ,C k ) and L (ϕ 1 ,C 1 ) , . . . , L (ϕ k ,C k ) be the corresponding lists in L and L , respectively. We obtain L (ϕ,C) corresponding to (ϕ, C) as a union of lists L 1 , . . . , L k , where L i contains the union of Φ and Φ for each Φ ∈ L (ϕ i ,C i ) and each Φ ∈ L (ϕ i ,C i ) .

Experimental Results
The testing was performed on a 64-bit linux system with Intel Xeon CPU E3-1245v6@3.70GHz and 32GB 1333MHz DDR3 SDRAM memory. The module was compiled with gcc compiler (version 7.3.1) with -O3 optimizations enabled. Implementation and instances utilized in the testing are available at http://users.fit.cvut.cz/malikjo1/subiso/. All results are an average of 5 independent measurements.
We evaluated our implementation in several ways. Firstly, we compare available implementations on two different real world source graphs and a set of more-or-less standard target graph patterns. Secondly, we compare available implementations on instances from ICPR2014 Contest on Graph Matching Algorithms for Pattern Search in Biological Databases [11] with suitably small patterns. We also adapt the idea of testing the algorithms on Erdős-Rényi random graphs [25].

Algorithm Properties and Performance
In the first two subsection we used two different graphs of various properties as target graph G. The first instance, Images, is built from an segmented image, and is a courtesy of [29]. It consists of 4838 vertices and 7067 edges. The second instance, Trans, is a graph of transfers on bank accounts. It is a very sparse network, which consists of 45733 vertices and 44727 undirected edges.

Fig. 1: Pattern graph illustration
For the pattern graphs, we first use a standard set of basic graph patterns, as the treewidth of such graphs is well known and allows a clear interpretation of the results. In particular, we use paths, stars, cycles, an complete graphs on n vertices, denoted P n , S n , C n , and K n with treewidth 1, 1, 2, and n − 1, respectively. We further used grids G n,m on n × m vertices, with treewidth min{n, m}. Secondly, we use a special set of pattern graphs in order to demonstrate performance on various patterns. Patterns A, B, C, and D have 9, 7, 9, and 7 vertices, 8, 7, 12, 6 edges, and treewidth 1, 2, 2, and 2, respectively. Patterns A, B, and D appear in both dataset, pattern C in neither and pattern D is disconnected. Description of these pattern graphs is shown in Table 1 and their illustration is shown in Figure 1.
Due to randomization, in order to achieve some preselected constant error rate, we need to repeat the computation more than once. The number of found results thus depends not only on the quality of the algorithm, but also on the choice of the number of its repetitions. Hence, it is logical to measure performance of the single run of the algorithm. Results from such a testing, however, should be still taken as a rough average, because the running time of a single run of the algorithm depends on many factors. Therefore, we first present measurements, where we average the results of many single runs of the algorithm. We average not only the time and space needed, but also the number of found subgraphs. To obtain the expected time  needed to run the whole algorithm, it suffices to sum the time needed to create a nice tree decomposition and times the time required for a single run, if there are runs in total.

Comparison on Real World Graphs and Fixed Graph Patterns
We compare our implementation to three other tools for subgraph enumeration: RI algorithm [7] (as implemented in [22]), LAD algorithm [28] and color coding algorithm for weighted path enumeration [16] (by setting, for comparison purposes, all weights of edges to be equal). The comparison is done on the instances from previous subsection and only on pattern graphs which occur at least once in a particular target graph. In comparison, note the following specifics of measured algorithms. The RI algorithm does not support outputting solutions, which might positively affect its performance. LAD algorithm uses adjacency matrix to store input graphs, and thus yields potentially limited use for graphs of larger scale. Neither of RI or LAD algorithms supports enumeration of disconnected patterns. 4 Also we did not measure the running time of the weighted path algorithm on non-path queries and also on Trans dataset, as its implementation is limited to graph sizes of at most 32 000. We run our algorithm repeatedly to achieve an error rate of ε = 1 e . In order to be able to measure the computation for larger networks with many occurrences of the pattern, we measure only the time required to retrieve no more than first 100 000 solutions and we also consider running time greater than 10 minutes (600 seconds) as a timeout. Since we study non-induced occurrences (and due to automorphisms) there might be several ways to map the pattern to the same set of vertices. Other measured algorithms do count all of them. Our algorithm can behave also like this, or can be switched to count only occurrences that differ in vertex sets. For the sake of equal measurement, we use the former version of our algorithm.
From Table 4 and Table 5, we can see that RI algorithm outperforms all other measured algorithms. We can also say our algorithm is on par with LAD algorithm, as the results of comparison of running times are similar, but vary instance from instance. Our algorithm nevertheless clearly outperforms another color coding algorithm, which on one hand solves more complicated problem of weighted paths, but on the another, is still limited only to paths. Also, our algorithm is the only algorithm capable of enumerating disconnected patterns.
The weak point of the color coding approach (or possibly only of our implementation) appears to be the search for a pattern of larger size with very few (or possibly zero) occurrences. To achieve the desired error rate, we need to repeatedly run the algorithm many times. Therefore our algorithm takes longer time to run on some instances (especially close to zero-occurrence ones), which are easily solved by the other algorithms.

ICPR2014 Contest Graphs
To fully benchmark our algorithm without limitations on time or number of occurrences found, we perform a test on ICPR2014 Contest on Graph Matching Algorithms for Pattern Search in Biological Databases [11].
In particular, we focus our attention on a Molecules dataset, containing 10,000 (target) graphs representing the chemical structures of different small organic compounds and on a Proteins dataset, which contains 300 (target) graphs representing the chemical structures of proteins and protein backbones. Target graphs in both datasets are sparse and up to 99 vertices or up 10,081 vertices for Molecules and Proteins, respectively.
In order to benchmark our algorithm without limiting its number of iterations, we focus on pattern graphs of small sizes, which offer reasonable number of iterations for an error rate of 1 e . Both datasets contain 10 patterns for each of considered sizes constructed by randomly choosing connected subgraphs of the target graphs. We obtained an average matching time of all pattern graphs of a given size to all target graphs in a particular dataset. From the results in Table 6, we can see our algorithm being on par with LAD algorithm, while being outperformed by RI algorithm. However, we mainly include these results as a proof of versatility of our algorithm. As discussed in [25], benchmarks created by constructing subgraphs of target graphs do not necessarily encompass the possible hardness of some instances and might even present a distorted view on algorithms' general performance. Thus, in the following benchmark we opt to theoretically analyze our algorithm.

Erdős-Rényi Graph Setup
In order to precisely analyze the strong and weak points of our algorithm we measure its performance is a setting where both the pattern and the target are taken as an Erdős-Rényi random graph of fixed size with varying edge density and compare the performance of our algorithm with the analysis of McCreesh et al. [25], which focused on algorithms Glasgow, LAD, and VF2.
An Erdős-Rényi graph G(n, p) is a random graph on n vertices where each edge is included in the graph independently at random with probability p. We measure the performance on target graph of 150 vertices and pattern graph of 10 vertices with variable edge probabilities. As our algorithm cannot be classified in terms of search nodes used (as in [25]), we measure the time needed to complete 10 iterations of our algorithm.  Behavior for target graph of 150 vertices and pattern graph of 10 vertices. The x-axis is the pattern edge probability, the y-axis is the target edge probability, from 0 to 1 with step of 0.03. Graph shows the time required for our algorithm to complete 10 iterations (the darker, the more time is required). Black regions indicate instances on which a timeout of 600 s occurred.
From Fig. 4 we can see our algorithm indeed follows a well observed phase transition (transition between instances without occurrence of the pattern and with many occurrences of the pattern). If we compare our results from Fig. 2 to the results of [25], we can see that hard instances for our algorithm start to occur later (in terms of edge probabilities). However, due to the almost linear dependency of treewidth on edge probabilities (see Fig. 3), hard instances for our algorithm concentrate in the "upper right corner" of the diagram, which contains dense graphs with naturally large treewidth.   Therefore, it seems that our algorithm complements the portfolio of algorithms studied by Kotthoff et al. [20] by an algorithm suitable just below the phase transition (in view of Fig. 2).

Conclusion
We described an efficient implementation of the well known color coding algorithm for the subgraph isomorphism problem. Our implementation is the first color-coding based algorithm capable of enumerating all occurrences of patterns of treewidth larger than one. Moreover, we have shown that our implementation is competitive with existing state-of-the-art solutions in the setting of locating small pattern graphs. As it exhibits significantly different behaviour than other solutions, it can be an interesting contribution to the portfolio of known algorithms [20,25].
As an obvious next step, the algorithm could be made to run in parallel. We also wonder whether the algorithm could be significantly optimized even further, possibly using some of the approaches based on constraint programming.