Computing the number of induced copies of a fixed graph in a bounded degree graph

In this paper we show that for any graph $H$ of order $m$ and any graph $G$ of order $n$ and maximum degree $\Delta$ one can compute the number of subsets $S$ of $V(G)$ that induce a graph isomorphic to $H$ in time $O(c^m\cdot n)$ for some constant $c = c(\Delta)>0$. This is essentially best possible.


Introduction
For two graphs H and G we denote by ind(H, G) the number of subsets of the vertex set of G that induce a graph that is isomorphic to H. (We recall that two graphs $H = (V_H, E_H)$ and $G = (V_G, E_G)$ are said to be isomorphic if there exists a bijection $f : V_H \to V_G$ such that for any $u, v \in V_H$, we have that $f(u)f(v) \in E_G$ if and only if $uv \in E_H$.) Throughout we take G and H to have n and m vertices respectively.
Understanding the numbers ind(H, G) for different choices of H gives us much important information about G. Determining these induced subgraph counts is closely related to determining subgraph counts and homomorphism counts; these parameters play a central role in the theory of graph limits [11], and frequently appear in statistical physics; see e.g. Section 2.2 in [14] and the references therein.
When H and G are both part of the input, computing ind(H, G) is clearly an NP-hard problem because it includes the problem of determining the size of a maximum clique in G. When the graph H is fixed, the brute-force algorithm takes time $O(n^m)$, and improvements have been made for certain choices of the pattern graph H [9,12].
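To make the brute-force baseline concrete, the following minimal Python sketch enumerates all m-subsets of V(G) and tests each one for isomorphism with H by trying every bijection. The adjacency-dictionary representation (each vertex mapped to its set of neighbours) and the function names are illustrative assumptions, not notation from the paper; the naive isomorphism test also adds a factor of m! on top of the $O(n^m)$ subsets.

```python
from itertools import combinations, permutations

def is_induced_copy(adj_h, adj_g, subset):
    """Return True if G[subset] is isomorphic to H (tries every bijection)."""
    h_vertices = list(adj_h)
    for image in permutations(subset):
        f = dict(zip(h_vertices, image))
        # f is an isomorphism iff it preserves both edges and non-edges
        if all((f[v] in adj_g[f[u]]) == (v in adj_h[u])
               for u, v in combinations(h_vertices, 2)):
            return True
    return False

def ind_bruteforce(adj_h, adj_g):
    """Count the m-subsets of V(G) that induce a graph isomorphic to H."""
    m = len(adj_h)
    return sum(is_induced_copy(adj_h, adj_g, s)
               for s in combinations(adj_g, m))

# Example: the 5-cycle and the path on three vertices.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
p3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(ind_bruteforce(p3, c5))
```

This is only practical for very small m and n, which is precisely the motivation for the parameterized approach discussed next.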
It is natural to consider this problem from a fixed parameter tractability (FPT) perspective. In general computing ind(H, G) when parameterizing by m = |H| is W[1]-hard because even deciding whether G contains an independent set of size m is W[1]-hard [7]. Curticapean, Dell and Marx [6] prove a number of interesting dichotomy results for W[1]-hardness using the treewidth of H (and of a certain class of graphs obtained from H) as an additional parameter.
However, when the graph G is of bounded degree, which is often of interest in statistical physics, the problem is no longer W[1]-hard. Indeed, Curticapean, Dell, Fomin, Goldberg, and Lapinskas [5, Theorem 13] showed that for a graph H on m vertices and a bounded degree graph G on n vertices, ind(H, G) can be computed in time $O(n m^{O(m)})$, thus giving an FPT algorithm in terms of m. In the present paper we go further and give an algorithm with essentially optimal running time. We assume the standard word-RAM machine model with logarithmic-sized words.

Theorem 1.1. There is an algorithm which, given an n-vertex graph G of maximum degree at most ∆ and an m-vertex graph H, computes ind(H, G) in time $\tilde{O}(n(7\Delta)^{2m} + 2^{10m})$. (Here the $\tilde{O}$-notation means that we suppress polynomial factors in m.)

Remark 1.1. Theorem 13 in [5] in fact concerns vertex-coloured graphs H and G. Our proof of Theorem 1.1 also easily extends to the coloured setting. We discuss this in Section 4.
The running time here is essentially optimal under the exponential time hypothesis. Indeed, if we could find an algorithm with an improved running time $O(c^{o(m)}\,\mathrm{poly}(n))$ (for some constant c possibly dependent on ∆), we could use it to determine the size of a maximum independent set in time $O(n c^{o(n)}\,\mathrm{poly}(n)) = O(c^{o(n)}\,\mathrm{poly}(n))$, which is not possible (even in graphs of maximum degree 3) under the exponential time hypothesis; see [8, Lemma 2.1].
Note that our algorithm allows us to compute ind(H, G) in polynomial time in |G|, even when |H| is logarithmic in |G| (and G has bounded degree). The special case of this when H is an independent set was a crucial ingredient in our recent paper [13], which uses the Taylor approximation method of Barvinok [2] to give (amongst others) a fully polynomial time approximation algorithm for evaluating the independence polynomial for bounded degree graphs. Our present paper completes the running time complexity picture for computing ind(H, G) on bounded degree graphs G.
We add a few remarks to give further perspective on the problem. Note that computing ind(H, G) in time $\tilde{O}(\Delta^m n)$ is relatively straightforward for G of bounded degree ∆ when H is connected (see Lemma 2.2). Thus the difficulty lies in graphs H that have many components. Note also that Curticapean et al. [5] use the fact that induced graph counts can be expressed in terms of homomorphism counts (see e.g. [11]) and that homomorphism counts from H to G can be computed in time $\tilde{O}(\Delta^m n)$ in their FPT algorithm. However, the limiting factor is the time cost of expressing induced graph counts in terms of homomorphism graph counts, which is significantly larger than the running time of our algorithm.
Both our approach and the approach in [5] crucially use the bounded degree assumption. It would be very interesting to know if Theorem 1.1 could be extended to graphs of bounded average degree, such as planar graphs.

Organization. The remainder of the paper is devoted to proving Theorem 1.1. The main idea in our proof is to define a multivariate graph polynomial where the coefficients are certain induced graph counts; in particular ind(H, G) will be the coefficient of a monomial. We then want to use machinery from [13] to compute coefficients of univariate evaluations of this polynomial. However, we need to slightly modify the result from [13]. This will be done in the next section, and in Section 3, we combine certain univariate evaluations to compute the coefficients of the multivariate polynomial.

Computing coefficients of graph polynomials
An efficient way to compute the coefficients of a large class of (univariate) graph polynomials for bounded degree graphs was given in [13]. We will need a small modification of this result, for which we will provide the details here. We start with some definitions after which we state the main result of this section.
By $\mathcal{G}$ we denote the collection of all graphs and by $\mathcal{G}_k$ for $k \in \mathbb{N}$ we denote the collection of graphs with at most k vertices. A graph invariant is a function $f : \mathcal{G} \to S$ for some set S that takes the same value on isomorphic graphs. A (univariate) graph polynomial is a graph invariant $p : \mathcal{G} \to \mathbb{C}[z]$, where $\mathbb{C}[z]$ denotes the ring of polynomials in the variable z over the field of complex numbers. Call a graph invariant f multiplicative if $f(\emptyset) = 1$ and $f(G_1 \cup G_2) = f(G_1) f(G_2)$ for all graphs $G_1, G_2$ (here $G_1 \cup G_2$ denotes the disjoint union of the graphs $G_1$ and $G_2$). We can now give the key definition and tool we need from [13].

Definition 2.1. Let p be a multiplicative graph polynomial defined by
$$p(G)(z) := \sum_{i=0}^{d(G)} e_i(G) z^i \qquad (1)$$
for each $G \in \mathcal{G}$ with $e_0(G) = 1$, where d(G) is the degree of the polynomial p(G). We call p a bounded induced graph counting polynomial (BIGCP) if there exists $\alpha \in \mathbb{N}$ and a non-decreasing sequence $\beta \in \mathbb{N}^{\mathbb{N}}$ such that the following two conditions are satisfied:
(i) for every graph G, the coefficients $e_i$ satisfy
$$e_i(G) = \sum_{H \in \mathcal{G}_{\alpha i}} \lambda_{H,i}\, \mathrm{ind}(H, G) \qquad (2)$$
for certain $\lambda_{H,i} \in \mathbb{C}$;
(ii) for each i and $H \in \mathcal{G}_{\alpha i}$, the coefficient $\lambda_{H,i}$ can be computed in time $\beta_i$.
We have the following result for computing coefficients of BIGCPs. Before we prove Theorem 2.1 we will first gather some facts from [13] about induced subgraph counts and the number of connected induced subgraphs of fixed size that occur in a graph. Compared to [13] we actually need to slightly sharpen the statements.

Induced subgraph counts
Define $\mathrm{ind}(H, \cdot) : \mathcal{G} \to \mathbb{C}$ by $G \mapsto \mathrm{ind}(H, G)$. So we view ind(H, ·) as a graph invariant. We can take linear combinations and products of these invariants. In particular, for two graphs $H_1, H_2$ we have

In what follows we will often have to maintain a list L of subsets S of [n] with $|S| \le k$ (for some k) as well as some (complex) number $c_S$ associated to S. We will use the standard word-RAM machine model with logarithmic-sized words. This means that given a set S of size k, we have access to $c_S$ in O(k) time. In particular, this also means we can determine whether S is contained in our list in O(k) time.
The next lemma says that computing ind(H, G) is fixed parameter tractable (and moreover gives an essentially optimal running time) when G has bounded degree and H is connected. Note that Lemma 2.2 (i) enables us to test for graph isomorphism between bounded degree graphs when |V(G)| = |V(H)|.
Proof. We follow the proof from [13]. We assume that V(G) = [n]. Let us list the vertices of V(H) as $v_1, \ldots, v_k$ in such a way that for $i \ge 2$ vertex $v_i$ has a neighbour among $v_1, \ldots, v_{i-1}$. Then to embed H into G we first select a target vertex for $v_1$ and then, given that we have embedded $v_1, \ldots, v_{i-1}$ with $i \ge 2$, there are at most ∆ choices for where to embed $v_i$. After k iterations, we have a total of at most $n\Delta^{k-1}$ potential ways to embed H and each possibility is checked in the procedure above. Hence we determine if ind(H, G) is zero or not in $O(n\Delta^{k-1})$ time.
Throughout the procedure above we maintain a list L that contains all sets S such that G[S] = H found thus far. Each time we find a set S ⊂ [n] such that G[S] = H we check if it is contained in L. If this is not the case we add S to L and we discard S otherwise. The length of the resulting list gives the value of ind(H, G).
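The embedding procedure from this proof can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: graphs are adjacency dictionaries, a breadth-first order supplies the vertex ordering in which each vertex after the first has an earlier neighbour, and a Python set of frozensets plays the role of the list L. Unlike the description above, the sketch discards a partial embedding as soon as an adjacency mismatch appears rather than checking only complete embeddings.

```python
def connected_order(adj_h):
    """Order V(H) so that every vertex after the first has an earlier neighbour."""
    start = next(iter(adj_h))          # assumes H is non-empty
    order, seen = [start], {start}
    for u in order:                    # the list grows while we scan it (BFS)
        for v in adj_h[u]:
            if v not in seen:
                seen.add(v)
                order.append(v)
    if len(order) != len(adj_h):
        raise ValueError("H must be connected")
    return order

def ind_connected(adj_h, adj_g):
    """Count vertex subsets of G inducing a copy of the connected graph H."""
    order = connected_order(adj_h)
    found = set()                      # plays the role of the list L

    def extend(f):
        i = len(f)
        if i == len(order):
            found.add(frozenset(f.values()))   # duplicates are discarded
            return
        v = order[i]
        # an already embedded neighbour of v; its image has at most Delta neighbours
        anchor = next(u for u in order[:i] if u in adj_h[v])
        for w in adj_g[f[anchor]]:
            if w in f.values():
                continue
            # induced condition: adjacency to all earlier vertices must match
            if all((w in adj_g[f[u]]) == (u in adj_h[v]) for u in order[:i]):
                f[v] = w
                extend(f)
                del f[v]

    for w in adj_g:                    # n choices for the image of the first vertex
        extend({order[0]: w})
    return len(found)
```

Each vertex set is typically reached through several embeddings (one per automorphism of H), which is why the deduplication step is needed.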
Next we consider how to enumerate all possible connected induced subgraphs of fixed size in a bounded degree graph. We will need the following result of Borgs, Chayes, Kahn, and Lovász [

As a consequence we can efficiently enumerate all connected induced subgraphs of logarithmic size that occur in a bounded degree graph G.

Proof. We assume that V(G) = [n]. By the previous result, we know that $|T_k| \le n(e\Delta)^{k-1}$ for all k.
We inductively construct $T_k$. For k = 1, $T_k$ is clearly the set of singleton vertices and takes time O(n) to output.
Given that we have found $T_{k-1}$, we compute $T_k$ as follows. We go over all $S \in T_{k-1}$ and, for each such S, over all $v \in N_G(S)$ (the collection of vertices that are adjacent to an element of S), checking whether $S \cup \{v\}$ is already contained in $T_k$. We add it to $T_k$ if it is not already contained in $T_k$.
The set $N_G(S)$ has size at most $|S|\Delta \le k\Delta$ and takes time $O(k\Delta)$ to find (assuming G is given in adjacency list form). Therefore computing $T_k$ takes time bounded by $O(|T_{k-1}| k^2 \Delta) = O(n k^2 (e\Delta)^k)$.
Starting from $T_1$, we perform the above iteration k times, requiring a total running time of $O(n k^3 (e\Delta)^k)$. The proof that $T_k$ contains all the sets we desire is straightforward and can be found in [13].
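The inductive construction of the lists of connected vertex sets can be sketched as follows. This is a minimal illustration, assuming an adjacency-dictionary representation and using Python sets of frozensets for the membership test; the function name `connected_subsets` is our own.

```python
def connected_subsets(adj_g, k):
    """Return [T_1, ..., T_k], where T_j holds the j-subsets of V(G)
    that induce a connected subgraph (assumes k >= 1)."""
    levels = [{frozenset([v]) for v in adj_g}]      # T_1: singletons
    for _ in range(k - 1):
        nxt = set()
        for s in levels[-1]:
            # N_G(S): vertices outside S adjacent to some element of S
            boundary = set().union(*(adj_g[u] for u in s)) - s
            for v in boundary:
                nxt.add(s | {v})                    # set insertion deduplicates
        levels.append(nxt)
    return levels

# Example: on the path 0-1-2-3 the connected sets are exactly the intervals.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print([len(t) for t in connected_subsets(path4, 4)])
```

Because all smaller levels are produced along the way, a single run up to size k yields every connected set of size at most k.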
We call a graph invariant $f : \mathcal{G} \to \mathbb{C}$ additive if $f(G_1 \cup G_2) = f(G_1) + f(G_2)$ for all graphs $G_1, G_2$.

The following lemma is a variation of a lemma due to Csikvári and Frenkel [4]; it is fundamental to our approach. See [13] for a proof.

The next proposition is a variant of the Newton identities that relate the inverse power sums and the coefficients of a polynomial. We refer to [13] for a proof.

Proof of Theorem 2.1
We follow the proof as given in [13], which we modify slightly at certain points.
Recall that p(·) is a bounded induced graph counting polynomial (BIGCP). Given an n-vertex graph G with maximum degree at most ∆, we must show how to compute the first m coefficients of p. We will use $\tilde{O}$-notation throughout to mean that we suppress polynomial factors in m. To reduce notation, let us write p = p(G), d = d(G) for the degree of p, and $e_i = e_i(G)$ for i = 0, ..., d for the coefficients of p (from (1)). We also write $p_k := \zeta_1^{-k} + \cdots + \zeta_d^{-k}$, where $\zeta_1, \ldots, \zeta_d \in \mathbb{C}$ are the roots of the polynomial p(G). Noting $e_0 = 1$, Proposition 2.6 gives
$$p_k = -k e_k - \sum_{i=1}^{k-1} e_i p_{k-i} \qquad (4)$$
for each k = 1, ..., d.
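The Newton identities used here can be checked numerically. The sketch below assumes the standard form $p_k = -k e_k - \sum_{i=1}^{k-1} e_i p_{k-i}$ for a polynomial normalised so that $e_0 = 1$, with $p_k$ the sum of the $-k$-th powers of its roots; the function names and the use of exact rationals are our own illustrative choices.

```python
from fractions import Fraction

def inverse_power_sums(e, m):
    """Given coefficients e[0..d] with e[0] == 1, return [p_1, ..., p_m]."""
    coeff = lambda i: e[i] if i < len(e) else 0     # e_i = 0 beyond the degree
    p = [None]                                      # p[0] is unused
    for k in range(1, m + 1):
        p.append(-k * coeff(k)
                 - sum(coeff(i) * p[k - i] for i in range(1, k)))
    return p[1:]

def coefficients_from_power_sums(p, m):
    """Inverse direction: recover [e_0, ..., e_m] from p = [p_1, ..., p_m]."""
    e = [Fraction(1)]
    for k in range(1, m + 1):
        # k * e_k = -(e_0 p_k + e_1 p_{k-1} + ... + e_{k-1} p_1)
        e.append(-sum(e[i] * p[k - i - 1] for i in range(k)) / k)
    return e

# Example: (1 - z)(1 - z/2) = 1 - (3/2) z + (1/2) z^2 has roots 1 and 2,
# so p_k = 1 + 2**(-k).
e = [Fraction(1), Fraction(-3, 2), Fraction(1, 2)]
print(inverse_power_sums(e, 3))
```

The second function is the direction needed in the algorithm: once the inverse power sums are known, the coefficients follow one at a time.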
By (2), for $i \ge 1$, the $e_i$ can be expressed as linear combinations of induced subgraph counts of graphs with at most αi vertices. Since $p_1 = -e_1$, this implies that the same holds for $p_1$. By induction, (3), and (4) we have that for each k
$$p_k = \sum_{H \in \mathcal{G}_{\alpha k}} a_{H,k}\, \mathrm{ind}(H, G) \qquad (5)$$
for certain, yet unknown, coefficients $a_{H,k}$. Since p is multiplicative, the inverse power sums are additive. Thus Lemma 2.5 implies that $a_{H,k} = 0$ if H is not connected. Denote by $C_i(G)$ the set of connected graphs of order at most i that occur as induced subgraphs in G. Let us assume that G has vertex set [n]. Denote by $T_{\le \alpha k}(G)$ the list consisting of those sets $S \subseteq [n]$ of size at most αk that induce a connected graph in G. This way we can rewrite (5) as follows:
$$p_k = \sum_{H \in C_{\alpha k}(G)} a_{H,k}\, \mathrm{ind}(H, G) = \sum_{S \in T_{\le \alpha k}(G)} a_{G[S],k}. \qquad (6)$$
The next lemma says that we can compute the coefficients $a_{S,k} := a_{G[S],k}$ efficiently for k = 1, ..., m.

Lemma 2.7. There is an $\tilde{O}(n(e\Delta)^{\alpha m} \beta_m 4^{\alpha m})$-time algorithm which, given a BIGCP p (with parameters α and β) and an n-vertex graph G of maximum degree ∆, computes and lists the coefficients $a_{S,k}$ in (6) for all $S \in T_{\le \alpha k}(G)$ and all k = 1, ..., m.
Proof. We assume that the vertex set of G is equal to [n]. Using the algorithm of Lemma 2.4, we first compute the list $T_{\le \alpha m}$ consisting of all subsets S of V(G) such that $|S| \le \alpha m$ and G[S] is connected. This takes time bounded by
$$O(n(\alpha m)^3 (e\Delta)^{\alpha m}) = \tilde{O}(n(e\Delta)^{\alpha m}). \qquad (7)$$
(Note that the algorithm in Lemma 2.4 actually computes $T_{\le \alpha m}$ when it computes $T_{\alpha m}$.) To prove the lemma, let us fix $k \le m$ and show how to compute the coefficients $a_{S,k}$, assuming that we have already computed and listed the coefficients $a_{S',k'}$ for all $k' < k$ and $S' \in T_{\le \alpha k'}$. Let us fix $S \in T_{\le \alpha k}$. Let H = G[S]. By (4), it suffices to compute the coefficient of ind(H, ·) in $p_{k-i} e_i$ for i = 1, ..., k (where we set $p_0 = 1$). By (2), (3) and (5) we know that the coefficient of ind(H, ·) in $p_{k-i} e_i$ is given by

As $|V(H)| \le \alpha k$, the second sum in (8) is over at most $4^{\alpha k} = O(4^{\alpha m})$ pairs (U, T). For each such pair, we need to compute $\lambda_{H[U],i}$ and $a_{H[T],(k-i)}$. We can compute $\lambda_{H[U],i}$ in time bounded by $O(\beta_i) = O(\beta_m)$ since p is a BIGCP. As H[T] = G[T], to compute $a_{H[T],(k-i)}$ we just need to look up the coefficient $a_{T,k-i}$, which takes time O(k − i).
Together, all this implies that the coefficient of ind(H, ·) in $p_{k-i} e_i$ can be computed in time bounded by $O(4^{\alpha m}(\beta_m + m)) = \tilde{O}(\beta_m \cdot 4^{\alpha m})$.
by Lemma 2.3. So the total running time is bounded by the time to compute the list $T_{\le \alpha m}$ (which is given by (7)) plus the time to compute the $a_{S,k}$ for $S \in T_{\le \alpha k}$ (which is given by (10)) for k = 1, ..., m. This proves the lemma.
Computing ind(H, G) for any two graphs $H = i_1 H_1 \cup \cdots \cup i_r H_r$ and G can now be modelled as computing the coefficient of the monomial x

Let us start by gathering some facts about the polynomial $Z_H$.

Proposition 3.1. The polynomial $Z_H$ is multiplicative, i.e., for any two graphs $G_1$ and $G_2$, $Z_H(G_1 \cup G_2; x) = Z_H(G_1; x)\, Z_H(G_2; x)$. In particular, any evaluation of $Z_H$ is also multiplicative.
Proof. Note first that every monomial in $Z_H(G; x)$ is of the form $x^{\gamma \bullet h}$ for some unique choice of γ. For notational convenience we write $s_\gamma(G) := \mathrm{ind}(\gamma H, G)$. Consider the coefficient of which counts precisely the number of copies of $\gamma_1 H_1 \cup \cdots \cup \gamma_r H_r$ in $G_1 \cup G_2$, that is,

Suppose $\mu \in \mathbb{Z}^r_{\ge 0}$ and let z be a variable. Define the graph polynomial $Z_\mu = Z_{\mu,H}(G) \in \mathbb{Z}[z]$ by
$$Z_\mu(z) = Z_H(G; (\mu_1 z, \ldots, \mu_r z)) =: \sum_{i \ge 0} s_i(\mu) z^i;$$
here the second equality defines the numbers $s_i(\mu) = s_i(\mu)(G)$. In particular, we know that

Proposition 3.2. Fix $H = (H_1, \ldots, H_r)$ where the $H_i$ are pairwise non-isomorphic connected graphs each of maximum degree at most ∆ and fix $\mu \in \mathbb{Z}^r_{\ge 0}$. Then $Z_{\mu,H}(G; z)$ is a BIGCP with parameters α = 1 and $\beta_i = i^2 r \Delta^{i-1}$.
Proof. Since $Z_{\mu,H}(G)$ is a particular evaluation of $Z_H(G)$, we know by Proposition 3.1 that it is multiplicative.
The coefficient of $z^i$ in $Z_{\mu,H}(G; z)$ is given by (11). Since γH is a graph with exactly γ · h = i vertices, we can take α to be 1 in the definition of BIGCP.
For a given graph F, we must determine $\lambda_{F,i}$ in the definition of BIGCP and the time $\beta_i$ required to do this. Note that we may assume |V(F)| = i; otherwise $\lambda_{F,i} = 0$. If |V(F)| = i, we must test if F is isomorphic to a graph of the form γH with γ · h = i and if so we must output the value of $\lambda_{F,i}$ as $\mu^\gamma$ (this last step taking i arithmetic operations). To test if F is isomorphic to a graph of the form γH, we test isomorphism of each component of F against each of the graphs $H_1, \ldots, H_r$, which takes time at most $O(ir\Delta^{i-1})$ using Lemma 2.2 at most ir times. Thus the total time to compute $\lambda_{F,i}$ is at most $O(i^2 r \Delta^{i-1})$.

Now since $Z_{\mu,H}(G; z)$ is a BIGCP, Theorem 2.1 allows us to compute the coefficients $s_i(\mu)$ in (11) with the desired running time. However, the $s_i(\mu)$ are linear combinations of the numbers ind(γH, G), while we wish to compute one of these numbers in particular, say ind(ρH, G). By making careful choices of different µ, we will obtain an invertible linear system whose solution will include the number ind(ρH, G). We will require Alon's Combinatorial Nullstellensatz [1], which we state here for the reader's convenience.

Theorem 3.3. Let f be a polynomial of degree d over a field $\mathbb{F}$ in the variables $x_1, \ldots, x_n$. Suppose that the coefficient of the monomial $x_1^{\mu_1} \cdots x_n^{\mu_n}$ in f is nonzero and $\mu_1 + \cdots + \mu_n = d$. If $S_1, \ldots, S_n$ are finite subsets of $\mathbb{F}$ with $|S_i| \ge \mu_i + 1$, then there exists a point $x \in S_1 \times \cdots \times S_n$ for which $f(x) \ne 0$.
Given a vector $h \in \mathbb{N}^r$, let us write $P_{m,r,h}$ for the set of vectors $\gamma \in \mathbb{Z}^r_{\ge 0}$ such that γ · h = m. We note that, as the number of elements in $P_{m,r,h}$ is at most the number of monomials in r variables of degree m, we have

Lemma 3.4. Fix $m, r \in \mathbb{N}$ and $h \in \mathbb{N}^r$, and let $\gamma_1, \ldots, \gamma_k$ be an enumeration of the elements in $P_{m,r,h}$. Given a vector $\nu \in \mathbb{N}^r$, let us write $\nu^* \in \mathbb{N}^k$ for the vector $(\nu^{\gamma_i \bullet h})_{i=1}^k$. In time $O(k^5 + k^2 m e^m)$, we can find vectors $\nu_1, \ldots, \nu_k \in \mathbb{N}^r$ such that $\nu_1^*, \ldots, \nu_k^*$ are linearly independent.
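For intuition about the index set in this lemma, the elements of $P_{m,r,h}$ can be enumerated by a simple recursion. The sketch below is illustrative only; the function name `weighted_compositions` is our own, and h is passed as a tuple of positive integers.

```python
def weighted_compositions(m, h):
    """Return all tuples gamma of non-negative integers with
    sum(gamma[i] * h[i]) == m, i.e. the elements of P_{m,r,h} for r = len(h)."""
    if not h:
        return [()] if m == 0 else []
    out = []
    for g in range(m // h[0] + 1):      # multiplicity of the first part
        for rest in weighted_compositions(m - g * h[0], h[1:]):
            out.append((g,) + rest)
    return out

# Example: m = 4 with component orders h = (1, 2), e.g. isolated vertices
# and single edges as the connected pieces.
print(weighted_compositions(4, (1, 2)))
```

The length of the returned list is the number k of unknowns in the linear system that is set up later.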
Assume the components of $\gamma_\ell \in \mathbb{N}^r$ are $a_1, \ldots, a_r$. Applying Theorem 3.3 to the monomial $x^{\gamma_\ell \bullet h}$ and taking the sets $S_i = \{1, \ldots, a_i h_i + 1\}$ for i = 1, ..., r, we know there exists a vector $\nu_\ell \in S := S_1 \times \cdots \times S_r$ such that $P(\nu_{\ell,1}, \ldots, \nu_{\ell,r}) \ne 0$. Computing the polynomial P requires time at most $O(k \cdot k^3)$ (using that computing the determinant of an n × n matrix takes $O(n^3)$ time) and evaluating it at every point in S requires at most $O(m \cdot k \cdot |S|)$ operations. We can bound |S| as follows:
$$|S| = \prod_{i=1}^{r} (a_i h_i + 1) \le \Big(1 + \frac{m}{r}\Big)^r \le e^m.$$
The first inequality follows from the arithmetic-geometric mean inequality. Iterating the procedure, we can determine $\nu_1, \ldots, \nu_k$ in time $O(k \cdot (k^4 + mk|S|)) \le O(k^5 + k^2 m e^m)$.
Remark 3.1. We suspect there should be a simpler argument than the one we have just given (perhaps one where the vectors $\nu_1, \ldots, \nu_k$ can be explicitly written down rather than having an algorithm to determine them). Note that one can also use a faster randomised algorithm by applying the Schwartz-Zippel Lemma.
We can now prove Theorem 1.1.

Recall that this coefficient is
More conveniently, writing $s \in \mathbb{Z}^k_{\ge 0}$ for the vector given by $s = (s_{\gamma_j})_{j=1}^k = (\mathrm{ind}(\gamma_j H, G))_{j=1}^k$, we have the invertible system of linear equations given by $\nu_i^* \cdot s = s_m(\nu_i)$ for i = 1, ..., k, where we have computed the values of $s_m(\nu_i)$ and $\nu_i^*$, while the vector s is unknown (the system is invertible because we chose the $\nu_i^*$ to be linearly independent). We can then invert the system in time $O(k^3) = \tilde{O}(2^{6m})$. In particular, finding the value of $s_1 = \mathrm{ind}(\rho H, G)$ can be done in time $\tilde{O}(2^{6m})$. The total running time is bounded by $\tilde{O}(n(7\Delta)^{2m} + 2^{10m})$.
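The final step, solving a square linear system with linearly independent rows, can be carried out exactly with rational arithmetic. The sketch below is a generic Gauss-Jordan elimination using $O(k^3)$ arithmetic operations; it is an illustration under our own naming (`solve_exact`), where `rows` stands for the vectors $\nu_i^*$ and `rhs` for the computed values $s_m(\nu_i)$.

```python
from fractions import Fraction

def solve_exact(rows, rhs):
    """Solve rows * s = rhs over the rationals by Gauss-Jordan elimination."""
    k = len(rows)
    # augmented matrix with exact rational entries
    a = [[Fraction(x) for x in row] + [Fraction(b)]
         for row, b in zip(rows, rhs)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("rows are not linearly independent")
        a[col], a[pivot] = a[pivot], a[col]
        inv = 1 / a[col][col]
        a[col] = [x * inv for x in a[col]]          # scale pivot row to 1
        for r in range(k):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[k] for row in a]

# Toy example: s_1 + s_2 = 3 and s_1 + 2 s_2 = 5.
print(solve_exact([[1, 1], [1, 2]], [3, 5]))
```

Exact rationals avoid rounding issues, which matters here because the unknowns are integer counts.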

Concluding remarks
As we remarked in the introduction, our approach also works in the setting of vertex- and edge-coloured graphs. We will not elaborate on the details here, but just refer the interested reader to Section 3.3 of [13], where we have briefly explained how to extend the results for computing coefficients of BIGCPs to the setting of coloured graphs. In addition we note that the part of the proof given in Section 3 also carries over to the coloured graphs setting, replacing graph by coloured graph everywhere.
We moreover remark that the approach used to prove Theorem 1.1 is very robust. Besides extending to the coloured setting, it also easily extends to other graph-like structures. For example, in [13] it has been extended to fragments, i.e., vertex-coloured graphs in which some edges may be unfinished, and more recently Liu, Sinclair, and Srivastava [10] extended it to insects, i.e., vertex-coloured hypergraphs in which some edges may be unfinished. We expect our approach to be applicable to the problem of counting (induced) substructures in other structures as well, as long as there is a notion of connectedness and maximum degree.