Uniform random generation of large acyclic digraphs

Abstract

Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but also the reconstruction of the structure itself is of great interest. A uniform sample from the space of directed acyclic graphs is then needed both to assess different structure learning algorithms in simulation studies and to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling, we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives, while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.



Author information


Correspondence to Jack Kuipers.

Appendices

Appendix A: Convergence of the Markov chain sampler

To explore the convergence of the Markov chain sampler, consider the spectral decomposition of the real, symmetric transition matrix T

$$ T = \sum_{i=1}^{a_n} \lambda_i v_i v_i', $$
(30)

in terms of its real eigenvalues \(\lambda_i\), which can be labelled by their size

$$ 1=\lambda_1 > \vert\lambda_2 \vert\geq\vert \lambda_3 \vert\geq \cdots\geq\vert\lambda_{a_n} \vert, $$
(31)

and orthonormal eigenvectors \(v_i\). The vector \(v_1\) is simply \((1,\ldots ,1)'/\sqrt{a_{n}}\) and is responsible for the underlying uniform distribution. The matrix

$$ T^j = v_1v_1' + \sum_{i=2}^{a_n} \lambda_i^j v_i v_i', $$
(32)

then converges to this uniform background with a rate depending on the remaining eigenvalues, and \(\lambda_2\) in particular. This is easiest to see in terms of the Frobenius norm, which, setting \(S_{j} = T^{j} - v_{1}v_{1}' \) to be the transition matrix with the uniform background removed, satisfies

$$\begin{aligned} \Vert S_j \Vert =& \sqrt{\sum _{m,l=1}^{a_n}(S_j)_{ml}^2} = \sqrt{\sum_{i=2}^{a_n} \lambda_i^{2j}} \\ \leq&\vert\lambda_2\vert ^{j} \sqrt {a_n-1} = \vert\lambda_2 \vert^{j} \Vert S_0 \Vert , \end{aligned}$$
(33)

as \(T^0\) is the identity matrix. From the resulting inequality,

$$ \bigl\Vert T^j - v_1v_1' \bigr\Vert \leq\vert\lambda_2 \vert^j \bigl\Vert I - v_1v_1' \bigr\Vert , $$
(34)

since the Frobenius norm involves a sum over the squared elements, it follows that every element of \(T^j\) must approach the uniform background exponentially, at least as fast as \(\sim\exp(j \log\vert\lambda_2\vert)\), or

$$ \biggl\vert \bigl(T^j\bigr)_{ml} - \frac{1}{a_n}\biggr\vert \leq C_{ml} \exp\bigl(j \log\vert \lambda_2\vert\bigr) $$
(35)

for some constants \(C_{ml}\). A similar inequality holds for the maximum and minimum elements of \(T^j\), and for their difference, with corresponding constants. We can obtain upper bounds for the constants by returning to (32):

$$\begin{aligned} \biggl\vert \bigl(T^j\bigr)_{ml} - \frac{1}{a_n}\biggr\vert =& \Biggl\vert \sum_{i=2}^{a_n} \lambda _i^j (v_i)_m (v_i)_l \Biggr\vert \leq \vert\lambda_2 \vert^j \sum_{i=2}^{a_n} \bigl\vert (v_i)_m (v_i)_l \bigr\vert \\ \leq&\vert\lambda_2\vert^j (a_n-1). \end{aligned}$$
(36)

The comparison to (35) directly gives \(C_{ml} < a_n\).
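
The bound in (34)–(36) is not specific to the DAG chain: it holds for any symmetric transition matrix with a uniform stationary distribution. As a quick illustration, the following minimal Python sketch (a toy lazy random walk on a cycle, not the actual sampler's chain; all names are illustrative) checks that every element of \(T^j\) approaches \(1/a_n\) at least as fast as \(\vert\lambda_2\vert^j(a_n-1)\).

```python
import numpy as np

# Toy symmetric, doubly stochastic transition matrix: lazy random walk on a cycle,
# so the stationary distribution is uniform over the N states.
N = 6
T = np.zeros((N, N))
for i in range(N):
    T[i, i] = 0.5
    T[i, (i + 1) % N] += 0.25
    T[i, (i - 1) % N] += 0.25

lam = np.sort(np.abs(np.linalg.eigvalsh(T)))[::-1]
lam2 = lam[1]                              # largest |eigenvalue| below 1

for j in (1, 5, 20, 50):
    Tj = np.linalg.matrix_power(T, j)
    max_dev = np.max(np.abs(Tj - 1.0 / N))  # left-hand side of (35)
    bound = lam2**j * (N - 1)                # right-hand side with C_ml < a_n from (36)
    print(j, max_dev <= bound, max_dev, bound)
```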

For the irreducible matrix \(U=T^{2L}\), its powers \(U^k\) converge to uniformity \(\sim\exp(-\alpha k)\) with a rate given by

$$ \alpha\geq-2L\log\vert\lambda_2\vert $$
(37)

which is simply minus the logarithm of the largest eigenvalue of U below the one at unity. For the difference from the uniform background to converge to a scale below \(1/a_n\), as discussed in Sect. 3.1, the upper bound in (35) requires \(-j\log\vert\lambda_2\vert\), or equivalently \(-2Lk\log\vert\lambda_2\vert\), to be of order \(n^2\) (since \(\log a_n\) itself grows like \(n^2\)).

For comparison with other methods, and in particular the enumeration method studied in this paper, it would be useful to obtain a tight lower bound for convergence on the scale well below \(1/a_n\). If \(v_2\), like \(v_1\), also had its weight evenly spread among all \(a_n\) DAGs, for example if \(v_2\) were \((\pm1,\ldots, \pm1)'/\sqrt{a_{n}}\), then the term \(\lambda_{2} v_{2} v_{2}'\) would start at the scale of \(1/a_n\) and converge directly once \(-j\log\vert\lambda_2\vert\) is of order 1. The overall convergence would then depend on the smaller eigenvalues and on how the weight of their eigenvectors is spread amongst the DAGs. For a better handle on this, we can focus on the diagonal elements of the transition matrix in (30), which we can write as

$$ \mathrm{diag} (T) = \sum_{i=1}^{a_n} \lambda_i D^{i}, \quad D^{i} = \mathrm{diag} \bigl(v_i v_i' \bigr), $$
(38)

where the elements of the diagonal matrices \(D^i\) are real, non-negative and satisfy

$$ \sum_{i=1}^{a_n} D^{j}_{i,i} = 1, \qquad\sum_{i=1}^{a_n} D^{i}_{j,j} = 1. $$
(39)

The diagonal elements of T depend on the number of edges in each DAG and how they are arranged, but we can consider the two extremes. For the empty DAG with no edges, the probability of staying still is 1/n, so if this DAG is chosen as the first element of the transition matrix

$$ \sum_{i=1}^{a_n}\lambda_i D^{i}_{1,1} =\frac{1}{n}. $$
(40)

At the other end of the spectrum are all the n! DAGs with L arcs and a staying probability of \(1/2+1/(2n)\). If one of them is chosen as the last element of the transition matrix

$$ \sum_{i=1}^{a_n}\lambda_i D^{i}_{a_n,a_n} = \frac{1}{2} +\frac {1}{2n}. $$
(41)

Assuming the eigenvalues are all positive, this along with (39) implies that the eigenvalues must cover a range from O(1) to O(1/n) and that the DAG with no edges must have its weight (in the matrices \(D^i\)) preferentially spread over the smaller eigenvalues. Similarly, the DAGs with as many edges as possible concentrate their weight over the larger eigenvalues. Intuitively, the different staying probabilities of the DAGs are encoded in how far down the chain of eigenvalues the diagonal contribution is stored.

In fact, looking at the examples for n=2 and n=3, for which we can fill out the transition matrix explicitly, we find that the matrices are positive definite and that the eigenvalues are repeated for n=3. Looking at the sum over the terms belonging to the largest eigenvalue below 1,

$$ \sum_{i}D^i \delta_{\lambda_2,\lambda_i}, $$
(42)

we find that almost all of the weight is evenly spread amongst the n! DAGs with the highest number of edges. This suggests that the corresponding diagonal elements of the transition matrix are bounded below by

$$ \bigl(T^{j}\bigr)_{a_n,a_n} -\frac{1}{a_n} \gtrsim \frac{\vert\lambda_2\vert ^j}{n!}. $$
(43)

Convergence on the required scale still requires \(-j\log\vert\lambda_2\vert\) to be of order \(n^2\) to bring the right-hand side below \(1/a_n\). Moving to the irreducible matrix \(U=T^{2L}\) and combining with (37), we have \(\alpha=-2L\log\vert\lambda_2\vert\), making j of the order of \(n^4/\alpha\), or k of the order of \(n^2/\alpha\), as in Sect. 3.

Appendix B: Complexity of the enumeration method

The time complexity of computing all the integers \(a_{j,k}\), \(b_{j,k}\) as well as the totals \(a_j\) for j up to n is now considered. A binary representation is used for the integers, and from (1) it follows that the number of bits needed to store \(a_j\) grows like \(\log_2(a_j)\), which is of order \(j^2\). Once all the \(a_{j,k}\) for all j up to n−1 have been calculated, the \(b_{n,k}\) can be computed using (6) in the following way. For k>1, for each s first multiply \(a_{m,s}\) by \((2^k-1)^s\). This can be done in s steps, where first a simple shift of the binary representation is performed to multiply by \(2^k\), and then a copy of the previous number is subtracted, an operation which takes a time proportional to the length of its binary representation, bounded by \(n^2\). For each s, calculating the term in the sum therefore takes \(O(sn^2)\), while the final multiplication by \(2^{k(m-s)}\) is again just a shift of the binary sequence. Finding all the terms in the sum is then \(O(n^3)\). Adding the terms to obtain \(b_{n,k}\) means adding up to n sequences of length up to \(n^2\), which is also \(O(n^3)\).
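
To make the binary-arithmetic argument concrete, here is a minimal Python sketch (the function name is illustrative; Python integers are arbitrary precision, so shifts and subtractions act directly on the binary representation) of computing one term of the sum in (6) with s shift-and-subtract steps and a final shift:

```python
def term_of_sum(a_ms, k, m, s):
    """One term (2**k - 1)**s * 2**(k*(m-s)) * a_ms of the sum in (6),
    computed with s shift-and-subtract steps plus one final shift."""
    x = a_ms
    for _ in range(s):
        x = (x << k) - x          # multiply by 2**k, then subtract a copy
    return x << (k * (m - s))     # final multiplication by 2**(k*(m-s))

# e.g. the single term of b_{3,2}: (2**2 - 1) * 2**0 * a_{1,1} = 3
print(term_of_sum(1, 2, 1, 1))    # 3
```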

Next the \(a_{n,k}\) are obtained by multiplying by the binomial coefficients. These can be calculated recursively without any complexity overhead compared to recursively calculating the \(b_{j,k}\). The binomial coefficients also have a binary length bounded by n, so that multiplying \(b_{n,k}\) by \(\binom{n}{k}\) is still below \(O(n^3)\). However, the \(b_{n,k}\) need to be calculated for all \(1\leq k\leq n\), which leads to a complexity of \(O(n^4)\) for computing the \(a_{n,k}\) and \(b_{n,k}\) given all the previous values. Finding \(a_n\) by summing the \(a_{n,k}\) is also of order \(n^3\), so it does not further affect the complexity.

Computing all the integers \(a_{j,k}\), \(b_{j,k}\) as well as the totals \(a_j\) for j up to n then involves repeating the above process n times, giving a final complexity of \(O(n^5)\).

The above steps provide an upper bound for completing the first step of the algorithm in Sect. 4, but they rely on the assumption that all of the \(a_{n,k}\) have a similar length to \(a_n\). As seen in Sect. 5, however, \(a_{n,k}\) has a similar length to \(a_n\) only for a limited range of k, and the length then decays to 1 for \(a_{n,n}\). Looking at the distribution of \(s\log(a_{m,s})\) can then give a more accurate estimate of the complexity of finding all the terms in the sum needed to calculate \(b_{n,k}\). This also gives a lower bound, since at each of the s steps in the simplified algorithm above the length of the binary representation actually increases. Numerically, this distribution has a single peak for s just beyond m/2, but its maximal value seems to grow like \(n^3\), which again leads to a total complexity of \(O(n^5)\).

When sampling the DAGs, the first step of finding k given an integer between 1 and \(a_n\) involves subtracting up to n numbers of binary length up to \(n^2\) and is \(O(n^3)\). Then, given (n,k), we look to sample (m,s), again using the sums over s that appear in (6). As discussed above, performing the sum is \(O(n^3)\), while the sampling is performed up to n times, which would seem to give a total complexity of \(O(n^4)\). However, the sum of the outpoints sampled has to be exactly n, while the complexity of sampling each \(k_i\) is bounded by \((k_{i}+1-\delta_{k_{i},1})n^{2}\). With \(\sum_i k_i = n\), we have \(\sum_i (k_i+1)n^2 \leq 2n^3\), which reduces the total complexity to \(O(n^3)\). Also, since there is effectively no chance of choosing a large \(k_i\), as discussed in Sect. 5, the complexity of sampling \(k_1\) and each of the following \(k_i\) reduces to \(O(n^2)\), immediately leading to a total complexity of \(O(n^3)\).

Appendix C: Pseudocode for uniform DAG sampling

The code uses arbitrary precision integers, as in Maple or as provided by the GMP library and the ‘bigz’ package in R. First we recursively calculate and store the numbers \(a_{n,k}\) and \(b_{n,k}\) for \(n\leq N\).

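A minimal Python sketch of this step, based on the recursion in (6) (not the article's original listing; the function and table names are illustrative, and Python's built-in integers play the role of the GMP/‘bigz’ arbitrary-precision integers mentioned above):

```python
from math import comb

def dag_tables(N):
    """Tables a[n][k], b[n][k] and totals tot[n] for 1 <= n <= N, where
    a[n][k] counts labelled DAGs on n nodes with exactly k outpoints
    and a[n][k] = comb(n, k) * b[n][k]."""
    a, b, tot = {0: {0: 1}}, {}, {0: 1}
    for n in range(1, N + 1):
        a[n], b[n] = {}, {}
        for k in range(1, n + 1):
            m = n - k
            if m == 0:
                b[n][k] = 1      # the DAG with no arcs: all nodes are outpoints
            else:
                b[n][k] = sum((2**k - 1)**s * 2**(k * (m - s)) * a[m][s]
                              for s in range(1, m + 1))
            a[n][k] = comb(n, k) * b[n][k]
        tot[n] = sum(a[n].values())
    return a, b, tot

# sanity check against the known counts of labelled DAGs
a, b, tot = dag_tables(5)
print([tot[n] for n in range(1, 6)])   # [1, 3, 25, 543, 29281]
```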

If the binomial coefficients are not readily available they can be built recursively in the same loops with no computational overhead. Next we sample an integer r uniformly between 1 and \(a_n\); this only needs a primitive ‘rand{0,1}’ that provides a 0 or a 1 with probability 1/2, for instance by drawing as many random bits as there are in the binary representation of \(a_n\) and rejecting values outside the range. We then use r to sample the number of outpoints k, which we store as the first element of a vector k. The current value of r is now between 1 and \(a_{n,k}\), and we rescale it to lie between 1 and \(b_{n,k}\). Next we recursively generate the remaining outpoints in a loop. The resulting vector k should be of length I and have its elements sum to n.

We can now use this vector to fill the lower triangle of an empty matrix Q. Finally we sample a permutation π by randomly drawing all the integers {1,…,n} without replacement, and to obtain the adjacency matrix R of the uniformly sampled DAG we correspondingly permute the column and row labels of Q via \(R_{\pi(m),\pi(l)}=Q_{m,l}\). These steps are combined in the sketch below.
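
The following is a minimal Python sketch of the whole sampling procedure (not the article's original pseudocode; all names are illustrative). It assumes the tables produced by the dag_tables sketch above and returns the adjacency matrix R of a uniformly sampled DAG:

```python
import random

def sample_dag(n, a, b, tot, rng=random):
    """Draw a uniformly random labelled DAG on n nodes; R[i][j] = 1 means an arc j -> i."""
    # sample r uniformly in {1, ..., tot[n]} from fair bits by rejection
    bits = tot[n].bit_length()
    while True:
        r = rng.getrandbits(bits) + 1
        if r <= tot[n]:
            break
    # number of outpoints k of the whole DAG, then rescale r to {1, ..., b[n][k]}
    k = 1
    while r > a[n][k]:
        r -= a[n][k]
        k += 1
    r = (r - 1) % b[n][k] + 1
    ks = [k]
    # recursively sample the remaining outpoint numbers using the terms of (6)
    m = n - k
    while m > 0:
        s = 1
        while True:
            term = (2**k - 1)**s * 2**(k * (m - s)) * a[m][s]
            if r <= term:
                break
            r -= term
            s += 1
        r = (r - 1) % b[m][s] + 1
        ks.append(s)
        k, m = s, m - s
    # fill the lower triangle of Q layer by layer: a node in layer i takes a
    # non-empty parent set in layer i-1 and an arbitrary parent set in layers < i-1
    layers, start = [], 0
    for size in ks:
        layers.append(list(range(start, start + size)))
        start += size
    Q = [[0] * n for _ in range(n)]
    for i in range(1, len(layers)):
        prev = layers[i - 1]
        earlier = [u for lay in layers[:i - 1] for u in lay]
        for v in layers[i]:
            mask = rng.randrange(1, 1 << len(prev))   # uniform non-empty subset
            for idx, u in enumerate(prev):
                if (mask >> idx) & 1:
                    Q[v][u] = 1
            for u in earlier:
                Q[v][u] = rng.getrandbits(1)
    # permute the labels to remove the ordering by layers: R_{pi(m),pi(l)} = Q_{m,l}
    perm = list(range(n))
    rng.shuffle(perm)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            R[perm[i]][perm[j]] = Q[i][j]
    return R

a, b, tot = dag_tables(10)   # tables from the first sketch above
R = sample_dag(10, a, b, tot)
```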

Appendix D: Limiting conditional outpoint distribution

In the expression defining \(b_{n,k}\) in (6), we can reorganise the powers of two

$$ b_{n,k}=2^{km}a_m\sum _{s=1}^{m} \biggl(1-\frac{1}{2^k} \biggr)^{s} \frac {a_{m,s}}{a_m}, $$
(44)

and artificially bring out a factor of \(a_m\). For large m, the fraction \(a_{m,s}/a_m\) can be replaced by its limit \(A_s\), since at a given accuracy it is only non-zero for a small number of s

$$ b_{n,k}\propto\sum_{s} \biggl(1-\frac{1}{2^k} \biggr)^{s} A_s. $$
(45)

Given k, to sample the next number of outpoints s we can sample uniformly between 1 and \(b_{n,k}\) as in Sect. 4.1. The limiting probability of sampling each value of s is then

$$ P(s\mid k) \propto \biggl(1-\frac{1}{2^k} \biggr)^{s} A_s, $$
(46)

which through normalisation reduces to (14).
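
A small numerical check of this limiting behaviour, using the dag_tables sketch from Appendix C (the choice m=20, k=1 below is an arbitrary illustration; for moderate m the ratios \(a_{m,s}/a_m\) are expected to be already close to their limits \(A_s\)):

```python
a, b, tot = dag_tables(21)          # tables from the Appendix C sketch

m, k = 20, 1
ratio = {s: a[m][s] / tot[m] for s in range(1, 7)}    # a_{m,s}/a_m, decays quickly in s
w = {s: (1 - 2.0**-k)**s * ratio[s] for s in ratio}
norm = sum(w.values())
print({s: w[s] / norm for s in w})                    # approximates P(s | k) in (46)
```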

Cite this article

Kuipers, J., Moffa, G. Uniform random generation of large acyclic digraphs. Stat Comput 25, 227–242 (2015). https://doi.org/10.1007/s11222-013-9428-y


Keywords

  • Random graph generation
  • Acyclic digraphs
  • Recursive enumeration
  • Bayesian networks
  • MCMC