Compression of Dynamic Graphs Generated by a Duplication Model

We continue building up the information theory of non-sequential data structures such as trees, sets, and graphs. In this paper, we consider dynamic graphs generated by a full duplication model in which a new vertex selects an existing vertex and copies all of its neighbors. We ask how many bits are needed to describe the labeled and unlabeled versions of such graphs. We first estimate the entropies of both versions and then present compression algorithms that are asymptotically optimal up to two bits. Interestingly, for the full duplication model the labeled version needs Θ(n) bits while its unlabeled version (the structure) can be described by Θ(log n) bits, due to a significant amount of symmetry (i.e., a large average size of the automorphism group of sample graphs).


underlying graph); (ii) how to infer underlying dynamic processes governing network evolution; (iii) how to infer information about previous states of the network; and (iv) how to predict the forward evolution of the network state. In this paper we deal with the first question (i.e., labeled and unlabeled graph compression).
Clearly, some models are more suitable to certain types of data than others. For example, it has been claimed that the preferential attachment mechanism [2] plays a strong role in the formation of citation networks [23]. However, due to the high power law exponent of their degree sequence (greater than 2) and lack of community structure [6], preferential attachment graphs are not likely to describe well biological networks such as protein interaction networks or gene regulatory networks [19]. For such networks another model, known as the vertex-copying model, or simply the duplication model, has been claimed as a better fit [25]. In the vertex-copying model, one picks an existing vertex and inserts its clone, possibly with some random modifications, depending on the exact variation of the model [6,14,20]. Experimental results show that these variations on the duplication model better capture salient features of protein interaction networks than does the preferential attachment model [22].
In this paper we present comprehensive information-theoretic results for the full duplication model, in which every new vertex is a copy of some older vertex. We establish precisely (that is, within an o(1) additive error) the entropy for both unlabeled and labeled graphs generated by this model and design asymptotically optimal compression algorithms that match the entropies up to a constant additive term. Interestingly, we shall see that the entropy of labeled graphs is H(G_n) = Θ(n), while the structural entropy (the entropy of the isomorphism class of a random graph from the model, denoted by S(G_n)) is significantly smaller: H(S(G_n)) = Θ(log n). Thus, the vast majority of the information in the labeled graphs of this model resides in the labeling itself, not in the underlying graph structure. In contrast, the entropy of labeled graphs generated by, e.g., the preferential attachment model is Θ(n log n) [17].
Clearly, given its simplicity, this model should be regarded as a stepping stone toward a better understanding of more advanced models of this type. The extensions are typically defined by a fixed-probability mix of the full duplication model and other rules, such as no-duplication or uniform attachment. We shall deal with such models in a forthcoming paper.
Graph compression has enjoyed a surge in popularity in recent years, as the recent survey [3] shows. However, rigorous information-theoretic results are still lacking, with a few notable exceptions. The rigorous information-theoretic analysis of graph compression (particularly in the unlabeled case) was initiated by Choi and Szpankowski [5], who analyzed structural compression of Erdős-Rényi graphs (see also [1]). The authors of [5] presented a compression algorithm that provably achieves asymptotically the first two terms of the structural entropy. Łuczak et al. [17] precisely analyzed the labeled and structural entropies and gave asymptotically optimal compression algorithms for preferential attachment graphs. There has been recent work on universal compression schemes, including in a distributed scenario, by Delgosha and Anantharam [8,9]. Additionally, several works deal with compression of trees [11,12,18,26]. The full duplication model itself has been analyzed almost exclusively in the context of typical properties such as the degree distribution [6]. It was shown that the average degree depends strongly on the initial conditions [16]. It was also proved that the asymptotic degree distribution fails to converge, yet it exhibits power-law behavior with an exponent dependent on the lowest nonzero degree in the initial graph [21]. Other parameters studied in the context of duplication models are the number of small cliques [13] and degree-degree correlations [4]. To the best of our knowledge, the entropy and compression of duplication models have not been discussed previously in the available literature.
The rest of the paper is organized as follows: In Sect. 2 we define the full duplication model and present its basic properties. In Sect. 3 we establish main results concerning the entropy of the unlabeled and labeled graphs with Sect. 4 being devoted to the construction of algorithms that achieve these bounds within a constant additive term.

Full Duplication Model
In this section we define the full duplication model and present some of its properties.

Definitions
The full duplication model is defined as follows: let G_0 denote a given graph on n_0 vertices, for some fixed constant n_0. Then, for any 1 ≤ i ≤ n we obtain G_i from G_{i−1} by choosing one of the vertices of G_{i−1} (denoted by v) uniformly at random, attaching to the graph a new vertex v_i, and adding edges between v_i and all vertices adjacent to v. Note that v and v_i are not connected, although if one wants to achieve higher clustering, the results in this paper can be straightforwardly applied to the model in which we add not only edges between v_i and the neighbors of v, but also an edge between v_i and v. Observe that G_n has n + n_0 vertices. Also, the properties of G_n depend heavily on G_0 and its structure, which we assume to be fixed.
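The growth rule above is easy to simulate. The following sketch (illustrative code, not from the paper; the seed graph and step count are arbitrary choices) implements one duplication step on an adjacency-set representation:

```python
import random

def duplicate(adj, rng):
    """One step of the full duplication model: pick an existing vertex v
    uniformly at random and attach a new vertex adjacent to N(v), not to v."""
    v = rng.randrange(len(adj))
    new = len(adj)
    adj.append(set(adj[v]))      # the new vertex copies v's neighborhood
    for w in adj[v]:             # register the new vertex with those neighbors
        adj[w].add(new)

# Illustrative seed G_0: a path on 4 vertices (n_0 = 4).
rng = random.Random(0)
adj = [{1}, {0, 2}, {1, 3}, {2}]
for _ in range(10):              # grow to G_10, which has n_0 + n = 14 vertices
    duplicate(adj, rng)
print(len(adj))                  # 14
```

Note that the new vertex is deliberately not joined to v itself, matching the model above; adding the edge {v_i, v} inside `duplicate` would give the higher-clustering variant.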
Throughout this paper, we will refer to the vertices of the starting graph G_0 as {u_1, …, u_{n_0}} and to all other vertices of G_n as {v_1, …, v_n}. We denote by V(G) and E(G) the set of vertices and the set of edges of a graph G, respectively. Moreover, we denote by N_n(v) the neighborhood of the vertex v, that is, the set of all vertices adjacent to v in G_n. Sometimes we drop the subscript if the size of the graph is clear from the context.
An example of the duplication process is presented in Fig. 1. On the top, we show the original G_0 on 5 vertices, and on the bottom we plot G_3 with new vertices such that v_1 is a copy of u_2, v_2 is a copy of u_1, and v_3 is a copy of v_1.
Here, due to limited space, we restrict our analysis to an asymmetric G_0 (i.e., one whose automorphism group has size 1); however, extensions to a general G_0 are rather straightforward. We note that even moderate-sized graphs are typically asymmetric.

Basic Properties
Let us introduce the concepts of a parent and an ancestor of a vertex. We say that w is the parent of v (denoted by w = P(v)) when v was copied from w at some step of the process. The ancestor A(v) of a vertex v is the vertex of the starting graph obtained from v by repeatedly applying the parent mapping, that is, the unique u_i ∈ U = {u_1, …, u_{n_0}} from which v descends via a chain of copies. For convenience we write that if u ∈ U, then P(u) = u and A(u) = u. Note that the ancestor of any given vertex is unique. In our example from Fig. 1, u_2 is the ancestor of both v_1 and v_3, but it is the parent of v_1 and not of v_3.
Let us now define the set of descendants of u_i ∈ U as 𝒞_{i,n} = {v ∈ V(G_n) : A(v) = u_i}. The neighborhood of a vertex is closely tied to its ancestor, as the following lemma shows:

Lemma 1 Fix any 1 ≤ i ≤ n_0. For all n ≥ 0 and any v ∈ 𝒞_{i,n} we have

N_n(v) = ⋃_{u_i u_j ∈ E(G_0)} 𝒞_{j,n}.

Proof We proceed by induction on n. For n = 0 we have 𝒞_{i,0} = {u_i} and the claim holds. Now suppose that the claim holds for some n ≥ 0 and that P(v_{n+1}) = w with w ∈ 𝒞_{i,n}. The new vertex v_{n+1} receives exactly the neighborhood N_n(w), which by the induction hypothesis equals ⋃_{u_i u_j ∈ E(G_0)} 𝒞_{j,n}; moreover, v_{n+1} joins 𝒞_{i,n+1}, so it is added to the neighborhood of precisely those vertices whose clusters are adjacent to the i-th cluster. Checking the remaining cases (vertices inside and outside the affected clusters) completes the induction. ◻

This means that effectively G_n is composed of clusters such that every vertex of the i-th cluster is connected to every vertex of the j-th cluster if and only if u_i u_j ∈ E(G_0). For example, for the graph in Fig. 1b we may identify (marked with ellipses in the figure) the following classes of vertices with identical neighborhoods: {u_1, v_2}, {u_2, v_1, v_3}, {u_3}, {u_4} and {u_5}. Let now C_{i,n} = |𝒞_{i,n}|, that is, the number of vertices of G_n that are ultimately copies of u_i (including u_i itself).
It is not hard to see that the sequence (C_{i,n})_{i=1}^{n_0} can be described by a balls-and-urns model with n_0 urns. At time n = 0 each urn contains exactly one ball. Each iteration consists of picking an urn at random, proportionally to the number of balls in each urn (that is, urn i is picked with probability C_{i,n} / ∑_{j=1}^{n_0} C_{j,n}), and adding a new ball to the chosen urn. It is known [15] that the joint distribution of (C_{i,n})_{i=1}^{n_0} is directly related to the Dirichlet multinomial distribution Dir(n, α_1, …, α_K), with K = n_0 and α_1 = … = α_K = 1, where B(x, y) below denotes the Euler beta function. Each variable C_{i,n} is identically distributed, though not independent, as ∑_{i=1}^{n_0} C_{i,n} = n + n_0, so we may analyze the properties of C_n ∼ C_{i,n} for every 1 ≤ i ≤ n_0. In fact, C_n − 1 has the beta-binomial distribution BBin(n, α, β) with parameters α = 1, β = n_0 − 1. That is, for any k ≥ 0:

Pr(C_n − 1 = k) = binom(n, k) B(k + 1, n − k + n_0 − 1) / B(1, n_0 − 1).   (1)

Chung et al. claimed in [6] that the distribution of C_n can only be approximated by a density; instead, here we have an exact formula.
Moreover, since C_n ∼ BBin(n, 1, n_0 − 1) + 1, we immediately get E[C_n] = n/n_0 + 1. For further results we will also need additional properties of the beta-binomial distribution (with proofs provided in the appendices). Note that all logarithms used in subsequent theorems (unless explicitly noted as ln) are base 2.
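Equation (1) can be checked numerically. The snippet below (an illustrative sketch, not part of the paper; n and n_0 are arbitrary test values) evaluates the beta-binomial formula for Pr(C_n = k) and confirms that it sums to one and has mean n/n_0 + 1:

```python
from math import comb, gamma

def beta(x, y):
    # Euler beta function via the gamma function (fine for the small arguments here)
    return gamma(x) * gamma(y) / gamma(x + y)

def pmf_C(k, n, n0):
    """Pr(C_n = k), where C_n - 1 ~ BBin(n, 1, n0 - 1); Eq. (1) in the text."""
    j = k - 1                    # shift: C_n = (beta-binomial variable) + 1
    if not 0 <= j <= n:
        return 0.0
    return comb(n, j) * beta(j + 1, n - j + n0 - 1) / beta(1, n0 - 1)

n, n0 = 30, 4
total = sum(pmf_C(k, n, n0) for k in range(1, n + 2))
mean  = sum(k * pmf_C(k, n, n0) for k in range(1, n + 2))
print(total, mean)               # ≈ 1.0 and n/n0 + 1 = 8.5
```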

Main Theoretical Results
As discussed in the introduction, our goal is to present results for duplication graphs on structural parameters which are fundamental to statistical and information-theoretic problems involving the information shared between the labels and the structure of a random graph. In graph structure compression the goal is to remove label information in order to produce a compact description of the graph structure. Formally, the labeled graph compression problem can be phrased as follows: one is given a probability distribution G_n on graphs on n vertices, and the task is to exhibit a pair of mappings (i.e., a source code) (E, D), where E maps graphs to binary strings satisfying the standard prefix code condition and D maps binary strings back to graphs, such that D(E(G)) = G for all graphs G, and the expected code length E[|E(G)|], with G ∼ G_n, is minimized. The standard source coding theorem tells us that the fundamental limit for this quantity is the Shannon entropy

H(G) = − ∑_g Pr(G = g) log Pr(G = g),

where the sum runs over all graphs g and the entropy is a functional of the distribution, not of a fixed graph.
The unlabeled version of this problem relaxes the invertibility constraint on the encoder and decoder. In particular, we only require D(E(G)) ≅ G ; i.e., the decoder only outputs a graph isomorphic to G. Again, the optimization objective is to minimize the expected code length. Thus, in effect, the source code efficiently describes the isomorphism type of its input. Denoting by S(G) the isomorphism type of G, the fundamental limit for the expected code length is the structural entropy of the model, which is given by H(S(G)).
There is a relation between the labeled entropy H(G) and the structural entropy H(S(G)). To express it succinctly for a broad class of graph models we need the automorphism group Aut(G) and the set Γ(G) of feasible permutations of G, i.e., the set of permutations of the vertices of G that yield a graph with positive probability under the random graph model in question. See [5,17] for more details. Now we are ready to present a relation between H(G) and H(S(G)). The following lemma was proved in [17]:

Lemma 4 We have, for any graph model G_n in which all positive-probability labeled graphs that are isomorphic have the same probability,

H(G_n) − H(S(G_n)) = E[log |Γ(G_n)|] − E[log |Aut(G_n)|].

Now we prove the following results regarding the expected logarithms of the sizes of the automorphism group and of the feasible permutation set for samples G_n from the full duplication model.

Lemma 5 For large n we have

E[log |Aut(G_n)|] = n log n − n H_{n_0} log e + (3 n_0 / 2) log n + O(1).

Proof Under the assumption that |Aut(G_0)| = 1, we claim that

|Aut(G_n)| = ∏_{i=1}^{n_0} C_{i,n}!.

To prove it, it is sufficient to notice that all vertices v, w such that A(v) = A(w) can be mapped onto one another arbitrarily (since by Lemma 1 they have equal neighborhoods), but if A(v) ≠ A(w), there does not exist any automorphism for which v and w are in the same orbit. Precisely, this is because, if such an automorphism did exist, then one may show that it induces a nontrivial automorphism of G_0. Thus,

E[log |Aut(G_n)|] = ∑_{i=1}^{n_0} E[log C_{i,n}!] = n_0 E[log C_n!].

We use Stirling's approximation together with Corollaries 1 and 2 to obtain

E[log C_n!] = E[C_n log C_n] − E[C_n] log e + (1/2) E[log C_n] + O(1).

Finally, combining this with E[C_n log C_n] = (n log n)/n_0 + n (1 − H_{n_0}) (log e)/n_0 + log n + O(1) and E[C_n] = n/n_0 + 1 yields

E[log |Aut(G_n)|] = n log n − n H_{n_0} log e + (3 n_0 / 2) log n + O(1).

The proof is completed. ◻

Observe that G_n has n + n_0 vertices; therefore, the trivial upper bound on |Γ(G_n)| is (n + n_0)!. We can compute Γ(G_n) exactly using the following lemma:

Lemma 6
For a permutation σ of all vertices of G_n, the following two claims are equivalent:

1. σ is a relabeling of G_n which produces a positive-probability graph under the full duplication model;
2. σ is a permutation such that for every 1 ≤ k ≤ n_0 the vertex mapped by σ to u_k belongs to 𝒞_{k,n}.

Proof
Throughout the proof we denote by u′_1, …, u′_{n_0} the vertices that are mapped by σ to the starting graph vertices u_1, …, u_{n_0}.

(⇒) Let σ produce a graph arising under the considered model with positive probability. Suppose that there exists 1 ≤ k ≤ n_0 such that u′_k ∉ 𝒞_{k,n}, but u′_k ∈ 𝒞_{l,n} for some l ≠ k. Then, by Lemma 1 we know that N_n(u_k) = ⋃_{u_k u_j ∈ E(G_0)} 𝒞_{j,n} and N_n(u′_k) = N_n(u_l) = ⋃_{u_l u_j ∈ E(G_0)} 𝒞_{j,n}. Since |Aut(G_0)| = 1 by assumption, N_0(u_k) ≠ N_0(u_l), and therefore the two unions above differ, which proves that N_n(u′_k) ≠ N_n(u_k); hence the seed graph of the relabeled graph cannot be identical to G_0.

(⇐) Suppose now that u′_k ∈ 𝒞_{k,n} for every 1 ≤ k ≤ n_0; i.e., all remaining vertices are mapped by σ to vertices outside the seed graph. Then the relabeled graph can be generated by the model: in the i-th step we may just copy v′_i from its respective u′_j. It is easy to check that, for every 1 ≤ k ≤ n_0 and every v′_i ∈ 𝒞_{k,n}, the neighborhoods N′(v′_i) in the graph created in this way satisfy N′(v′_i) = ⋃_{u_k u_j ∈ E(G_0)} 𝒞_{j,n},

which concludes the proof. ◻

Lemma 7 Asymptotically,

E[log |Γ(G_n)|] = n log n − n log e + (n_0 + 1/2) log n − n_0 H_{n_0−1} log e + (1/2) log(2π) + o(1).
Proof From Lemma 6, we may construct all admissible permutations by choosing for each 𝒞_{i,n} exactly one vertex to be mapped to u_i and then arranging the remaining n vertices in any order. Therefore

|Γ(G_n)| = n! ∏_{i=1}^{n_0} C_{i,n}.

Then E[log |Γ(G_n)|] = log n! + n_0 E[log C_n], and the final result follows from the Stirling approximation and the properties of C_n (see the appendices). ◻

We now proceed to estimate the structural entropy.

Theorem 1 For large n we have

H(S(G_n)) = (n_0 − 1) log n − log((n_0 − 1)!) + o(1).

Proof Recall that we assume throughout that the initial graph G_0 is asymmetric; hence the isomorphism type of G_n is entirely specified by the vector (C_{i,n})_{i=1}^{n_0}. We know that (C_{i,n})_{i=1}^{n_0} has the (shifted) Dirichlet multinomial distribution with α_i = 1 for 1 ≤ i ≤ n_0, which is the uniform distribution over all binom(n + n_0 − 1, n_0 − 1) possible vectors. Therefore

H(S(G_n)) = log binom(n + n_0 − 1, n_0 − 1),

and the claim follows from the Stirling approximation and the Taylor expansion of log B(n, n_0), which completes the proof. ◻
To compute the graph entropy H(G_n) we can use Lemmas 4, 5 and 7 together with Theorem 1, thereby obtaining the following result.

Theorem 2 For large n,

H(G_n) = n (H_{n_0} − 1) log e + ((n_0 − 1)/2) log n + O(1),

where H_{n_0} = ∑_{i=1}^{n_0} 1/i is the n_0-th harmonic number.
Clearly, to compress the whole G_n we would have to encode G_0 as well, but since n_0 is fixed, this affects only the constant term. Moreover, by the conditional entropy property, any optimal G_0 compression algorithm yields an asymptotically optimal compression of G_n.

Algorithmic Results
In this section we present asymptotically optimal algorithms for the compression of labeled and unlabeled graphs generated according to the full duplication model.

Retrieval of Parameters from G n
In order to present efficient compression algorithms for the duplication model, we must first reconstruct G_0 from G_n and find the values of n_0 and n. This is relatively easy to accomplish, as the proof of the next theorem shows.
Theorem 3 For a given labeled G_n or its unlabeled version S(G_n), we can retrieve n, n_0 and G_0 (in the case of the structure, up to isomorphism of G_0) in time polynomial in n.
Proof For a labeled G_n let (w_1, w_2, …, w_{n+n_0}) be its vertices in order of appearance. Since (w_1, …, w_{n_0}) = (u_1, …, u_{n_0}) and (w_{n_0+1}, …, w_{n_0+n}) = (v_1, …, v_n), it is sufficient to find the smallest k such that N_n(w_k) = N_n(w_i) for some 1 ≤ i < k. Then n_0 = k − 1 and G_0 is the subgraph induced by (w_1, …, w_{k−1}).
The case of S(G_n) is similar: we know (for details see Lemma 6) that the sequence of the first n_0 vertices of the graph (that is, G_0) contains exactly one vertex from each set 𝒞_{i,n}.
From Lemma 1 it follows that A(v) = A(w) iff N_n(v) = N_n(w) for every v, w ∈ V(G_n), so it is sufficient to scan all vertices of G_n and split them into sets such that v and w belong to the same set iff N_n(v) = N_n(w). Then we pick one vertex from each set to form G_0. Obviously, n_0 and n may be extracted from the sizes of G_0 and G_n. ◻ Recall, for example, that in Fig. 1b we identified the clusters {u_1, v_2}, {u_2, v_1, v_3}, {u_3}, {u_4} and {u_5}. Therefore, we know that n_0 = 5, n = 3, and that G_0 is isomorphic to the graph induced, for example, by the set {v_2, v_3, u_3, u_4, u_5}.
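The neighborhood-grouping step of this proof can be sketched as follows (illustrative code; the toy graph is an assumption, not the paper's Fig. 1 — for simplicity its seed is a triangle, whose vertices have pairwise distinct neighborhoods, which is all the grouping step needs):

```python
def neighborhood_classes(adj):
    """Group vertices of G_n by neighborhood; by Lemma 1, v and w share an
    ancestor iff N(v) = N(w), so the classes are exactly the clusters and
    their number is n_0."""
    classes = {}
    for v, nb in enumerate(adj):
        classes.setdefault(frozenset(nb), []).append(v)
    return list(classes.values())

# Toy G_n built by hand: seed u0-u1-u2 (a triangle), then v3 copies u0
# and v4 copies v3.
adj = [{1, 2}, {0, 2, 3, 4}, {0, 1, 3, 4}, {1, 2}, {1, 2}]
groups = neighborhood_classes(adj)
reps = [g[0] for g in groups]        # one representative per cluster induces G_0
n0 = len(groups)
print(n0, sorted(map(len, groups)))  # 3 [1, 1, 3]
```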

Unlabeled Graphs
A trivial algorithm, CompressUnlabeledsimple, for unlabeled compression writes down the sequence (C_{i,n})_{i=1}^{n_0} associated with our G_n as log n-bit numbers. This always requires n_0 log n bits, so L_SU(n) = n_0 log n, where L_SU denotes the code length of this scheme. By Theorem 1 this achieves the fundamental limit to within a multiplicative factor of 1 + 1/(n_0 − 1). However, it is easy to design an algorithm that is optimal up to a constant additive error, provided we have already compressed G_0 or S(G_0) (in any case, a graph of fixed size). The optimal algorithm, called CompressUnlabeledopt, encodes the sequence (C_{i,n})_{i=1}^{n_0} with an arithmetic coder, coordinate by coordinate, using the conditional distributions described in the proof below. The next finding proves that CompressUnlabeledopt is nearly optimal.
Theorem 4 Algorithm CompressUnlabeledopt is optimal up to two bits for unlabeled graph compression when the graph is generated by the full duplication model.

Proof It is sufficient to observe that the arithmetic coder can process the vector (C_{i,n})_{i=1}^{n_0} coordinate by coordinate. The marginal distribution of a single coordinate of the Dirichlet multinomial distribution is the beta-binomial distribution, given by Eq. 1. Moreover, if we fix the value of the last coordinate of (C_{i,n})_{i=1}^{n_0} to k + 1, then the resulting distribution is also a (shifted) Dirichlet multinomial, but with n_0 − 1 coordinates and all values summing up to n + n_0 − k − 1. We repeat this process until we are left with a two-dimensional distribution. By the properties of arithmetic coding (see e.g. [7]),

L_O(S(G_n) | G_0) ≤ H(S(G_n) | G_0) + 2,

where L_O denotes the code length. This completes the proof. ◻
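A back-of-the-envelope check of the resulting code length (illustrative sketch with assumed parameters, not part of the paper): with all α_i = 1 the Dirichlet multinomial is uniform over the binom(n + n_0 − 1, n_0 − 1) possible cluster-size vectors, so an ideal code spends the same number of bits on every structure, and this matches the (n_0 − 1) log n leading term of Theorem 1:

```python
from math import comb, log2

def ideal_structure_bits(n, n0):
    """Bits for the cluster-size vector (C_1, ..., C_{n0}): the Dirichlet
    multinomial with all alpha_i = 1 is uniform over the C(n + n0 - 1, n0 - 1)
    compositions, so every outcome is equally costly; arithmetic coding
    attains this within 2 bits."""
    return log2(comb(n + n0 - 1, n0 - 1))

n, n0 = 10**6, 5
exact = ideal_structure_bits(n, n0)
approx = (n0 - 1) * log2(n) - log2(24)   # 24 = (n0 - 1)!
print(exact, approx)                     # the two agree to within ~1e-5 bits
```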

Labeled Graphs
We note that the labeled graph G_n is equivalent to the sequence (A(v_i))_{i=1}^n for a given (labeled) G_0, which can obviously be encoded separately using a constant number of bits.
A trivial algorithm, Compresslabeledsimple, just writes all A(v_i) as log n_0-bit numbers. Clearly, this always gives a codeword of length exactly L_SL(n) = n log n_0. From Theorem 2 it follows that this algorithm is asymptotically (1 + (1 − γ) log e / log n_0)-approximately optimal, where γ is the Euler-Mascheroni constant.
It is easy to design an asymptotically optimal algorithm up to a constant error. Indeed, the sequence of A(v i ) is random with Pr(A(v i ) = u j ) = C j,i−1 n 0 +i−1 for 1 ≤ i ≤ n , 1 ≤ j ≤ n 0 . Therefore, given G i−1 we know the conditional probabilities of G i and we may construct another algorithm based on arithmetic coding.
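These conditional probabilities determine the ideal code length of a labeled graph: −log_2 of the probability of its ancestor sequence, which is what the arithmetic coder attains within two bits. A sketch (illustrative code; the sampler loop and the parameter values are assumptions for the demo):

```python
from math import log2
import random

def ideal_labeled_bits(ancestors, n0):
    """-log2 Pr of an ancestor sequence (A(v_1), ..., A(v_n)): step i picks
    seed u_j with probability C_{j,i-1} / (n0 + i - 1), exactly the urn
    probabilities fed to the arithmetic coder."""
    counts = [1] * n0                  # cluster sizes C_{j,0}
    bits = 0.0
    for i, j in enumerate(ancestors):  # j in 0..n0-1; denominator is n0 + i
        bits -= log2(counts[j] / (n0 + i))
        counts[j] += 1
    return bits

# Sample a sequence from the urn model itself, then price it.
rng = random.Random(1)
n0, n = 4, 1000
seq, counts = [], [1] * n0
for _ in range(n):
    j = rng.choices(range(n0), weights=counts)[0]
    counts[j] += 1
    seq.append(j)
print(ideal_labeled_bits(seq, n0))
```

By Theorem 2 the printed value should concentrate around n (H_{n_0} − 1) log_2 e bits for large n, far below the n log_2 n_0 bits of the trivial scheme's worst case only when n_0 is large; the point of the optimal coder is the additive, not multiplicative, gap.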
The next theorem proves that Compresslabeledopt is almost optimal up to a known additive constant.

Theorem 5 Algorithm Compresslabeledopt is optimal up to two bits for labeled graph compression when the graph is generated by the full duplication model.

Proof By the well-known properties of arithmetic coding (see [7]), we know that L_O(G_n | G_0) ≤ H(G_n | G_0) + 2, where L_O denotes the code length. ◻

Note that the two algorithms for labeled graphs differ only in that the optimal one updates the probabilities at each step, whereas the simple one fixes them to the constant value 1/n_0.