Abstract
Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but also the reconstruction of the structure itself is of great interest. A uniform sample from the space of directed acyclic graphs is then required both to assess different structure learning algorithms in simulation studies and to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.


Appendices
Appendix A: Convergence of the Markov chain sampler
To explore the convergence of the Markov chain sampler, consider the spectral decomposition of the real, symmetric transition matrix \(T\),
\[ T = \sum_{i=1}^{a_n} \lambda_i v_i v_i' , \]
in terms of its real eigenvalues \(\lambda_i\), which can be labelled by their size,
\[ 1 = \lambda_1 > |\lambda_2| \geq \cdots \geq |\lambda_{a_n}| , \]
and orthonormal eigenvectors \(v_i\). The vector \(v_1\) is simply \((1,\ldots,1)'/\sqrt{a_n}\) and responsible for the underlying uniform distribution. The matrix
\[ T^j = v_1 v_1' + \sum_{i=2}^{a_n} \lambda_i^j v_i v_i' \]
then converges to this uniform background at a rate depending on the remaining eigenvalues, and on \(\lambda_2\) in particular. This is easiest to see in terms of the Frobenius norm which, setting \(S_j = T^j - v_1 v_1'\) to be the transition matrix with the uniform background removed, satisfies
\[ \lVert S_j \rVert_F^2 = \sum_{i=2}^{a_n} \lambda_i^{2j} \leq \lambda_2^{2j} \lVert S_0 \rVert_F^2 = \lambda_2^{2j} (a_n - 1) \]
as \(T^0\) is the identity matrix. From the resulting inequality,
\[ \sum_{m,l} \Bigl[ (T^j)_{ml} - \frac{1}{a_n} \Bigr]^2 \leq (a_n - 1)\, \mathrm{e}^{2j\log|\lambda_2|} , \]
since the Frobenius norm involves a sum over the squared elements, it follows that every element of \(T^j\) must approach the uniform background exponentially, at least as fast as \(\sim\exp(j\log|\lambda_2|)\), or
\[ \Bigl| (T^j)_{ml} - \frac{1}{a_n} \Bigr| \leq C_{ml}\, \mathrm{e}^{j\log|\lambda_2|} \]
for some constants \(C_{ml}\). A similar inequality likewise holds for the maximum and minimum elements of \(T^j\) and their difference, with corresponding constants. We can obtain upper bounds for the constants by returning to (32); the comparison to (35) directly gives \(C_{ml} < a_n\).
For the irreducible matrix \(U = T^{2L}\), its powers \(U^k\) converge to uniformity \(\sim\exp(-\alpha k)\) with a rate given by
\[ \alpha = -\log\bigl(\lambda_2^{2L}\bigr) = -2L\log|\lambda_2| , \]
which is simply minus the log of the largest eigenvalue of \(U\) below the one at unity. For the difference from the uniform background to converge on the scale below \(1/a_n\), as discussed in Sect. 3.1, (35) provides an upper bound which requires \(-j\log|\lambda_2|\), or equivalently \(-2Lk\log|\lambda_2|\), to be of order \(n^2\).
For comparison with other methods, and in particular the enumeration method studied in this paper, it would be useful to obtain a tight lower bound for convergence on the scale well below \(1/a_n\). If \(v_2\), like \(v_1\), also had its weight evenly spread among all \(a_n\) DAGs, for example if \(v_2\) were \((\pm1,\ldots,\pm1)'/\sqrt{a_n}\), then the term \(\lambda_2^j v_2 v_2'\) would start at the scale of \(1/a_n\) and converge directly for \(-j\log|\lambda_2|\) of order 1. The overall convergence would then depend on the smaller eigenvalues and on how the weight of their eigenvectors is spread amongst the DAGs. For a better handle on this, we can focus on the diagonal elements of the transition matrix in (30), which we can write as
\[ T_{ll} = \sum_{i=1}^{a_n} \lambda_i (D_i)_{ll} , \]
where the elements of the diagonal matrices \(D_i = \operatorname{diag}(v_i v_i')\) are real, positive and, by the orthonormality of the eigenvectors, satisfy
\[ \sum_{i=1}^{a_n} D_i = I . \]
The diagonal elements of \(T\) depend on the number of edges in each DAG and how they are arranged, but we can consider the two extremes. For the empty DAG with no edges, the probability of staying still is \(1/n\), so if this DAG is chosen as the first element of the transition matrix,
\[ \sum_{i=1}^{a_n} \lambda_i (D_i)_{11} = \frac{1}{n} . \]
At the other end of the spectrum are all the \(n!\) DAGs with \(L\) arcs and a staying probability of \(1/2 + 1/2n\). If one of them is chosen as the last element of the transition matrix,
\[ \sum_{i=1}^{a_n} \lambda_i (D_i)_{a_n a_n} = \frac{1}{2} + \frac{1}{2n} . \]
Assuming the eigenvalues are all positive, this along with (39) implies that the eigenvalues must cover a range from \(O(1)\) to \(O(1/n)\), and that the DAG with no edges must have its weight (in the matrices \(D_i\)) preferentially spread over the smaller eigenvalues. Similarly, the DAGs with as many edges as possible concentrate their weight over the larger eigenvalues. Intuitively, the different staying probabilities of the DAGs are encoded in how far down the chain of eigenvalues the diagonal contribution is stored.
In fact, looking at the examples for \(n=2\) and \(n=3\), for which we can fill out the transition matrix explicitly, we find that the matrices are positive definite and that repeated eigenvalues occur for \(n=3\). Looking at the sum over the largest eigenvalues below 1, we find that almost all of the weight is evenly spread amongst the \(n!\) DAGs with the highest number of edges. This suggests that the corresponding diagonal elements of the transition matrix are bounded below by
\[ (T^j)_{ll} - \frac{1}{a_n} \gtrsim \frac{\mathrm{e}^{j\log|\lambda_2|}}{n!} . \]
Convergence on the required scale still requires \(-j\log|\lambda_2|\) to be of order \(n^2\) to bring the right hand side below \(1/a_n\). When moving to the irreducible matrix \(U = T^{2L}\), we have \(\alpha = -2L\log|\lambda_2|\) when combined with (37), making \(j\) of the order of \(n^4/\alpha\), or \(k\) of the order of \(n^2/\alpha\), as in Sect. 3.
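These small cases are straightforward to reproduce numerically. The sketch below is ours, not the paper's code, and assumes a toggle chain consistent with the staying probabilities quoted above: an ordered pair \((i,j)\) is drawn uniformly from the \(n^2\) possibilities, the chain stays put if \(i=j\) or if toggling the arc \(i \to j\) would break acyclicity, and otherwise the arc is toggled. It builds the full transition matrix for \(n=3\) and displays its spectrum.

```r
# Sketch: transition matrix of the assumed toggle chain on all DAGs for n = 3
n <- 3
pairs <- subset(expand.grid(i = 1:n, j = 1:n), i != j)

is_acyclic <- function(M) {  # peel off in-degree-zero nodes (Kahn's algorithm)
  while (nrow(M) > 0) {
    src <- which(colSums(M) == 0)
    if (length(src) == 0) return(FALSE)
    M <- M[-src, -src, drop = FALSE]
  }
  TRUE
}

# enumerate all DAGs on n nodes (a_3 = 25 of them)
dags <- list()
for (r in seq_len(2^nrow(pairs)) - 1) {
  M <- matrix(0, n, n)
  M[cbind(pairs$i, pairs$j)] <- (r %/% 2^(seq_len(nrow(pairs)) - 1)) %% 2
  if (is_acyclic(M)) dags[[length(dags) + 1]] <- M
}
an <- length(dags)
key <- sapply(dags, paste, collapse = "")

Tm <- diag(rep(1 / n, an))  # the n picks with i = j: stay with probability 1/n
for (d in seq_len(an)) {
  for (p in seq_len(nrow(pairs))) {
    M <- dags[[d]]
    M[pairs$i[p], pairs$j[p]] <- 1 - M[pairs$i[p], pairs$j[p]]
    e <- match(paste(M, collapse = ""), key)  # NA if the toggle leaves the DAGs
    if (is.na(e)) e <- d                      # rejected move: stay put
    Tm[d, e] <- Tm[d, e] + 1 / n^2
  }
}

rowSums(Tm)                         # all 1: the uniform distribution is stationary
eigen(Tm, symmetric = TRUE)$values  # lambda_1 = 1, then the rest of the spectrum
```

The empty DAG only stays put when \(i=j\), giving the diagonal entry \(1/n\), while a fully connected DAG additionally rejects the \(L\) reversal toggles, giving \(1/n + L/n^2 = 1/2 + 1/2n\), matching the two extremes above.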
Appendix B: Complexity of the enumeration method
The time complexity of computing all the integers \(a_{j,k}\), \(b_{j,k}\) as well as the totals \(a_j\) for \(j\) up to \(n\) is now considered. A binary representation is used to treat the integers, and from (1) it follows that the number of bits needed to store \(a_j\) grows like
\[ \log_2(a_j) \sim \binom{j}{2} , \]
or is of order \(j^2\). Once all the \(a_{j,k}\) for all \(j\) up to \(n-1\) have been calculated, the \(b_{n,k}\) can be computed using (6) in the following way. For \(k>1\), for each \(s\) first multiply \(a_{m,s}\) (with \(m = n-k\)) by \((2^k-1)^s\). This can be done in \(s\) steps, where first a simple shift of the binary representation is performed to multiply by \(2^k\), and then a copy of the previous number is subtracted, an operation which takes a time proportional to the length of its binary representation, which is bounded by \(n^2\). For each \(s\), calculating the term in the sum therefore takes \(O(sn^2)\), while the final multiplication by \(2^{k(m-s)}\) is again just a shift of the binary sequence. Finding all the terms in the sum is then \(O(n^3)\). Adding the terms to obtain \(b_{n,k}\) then means adding up to \(n\) sequences of length up to \(n^2\), which is also \(O(n^3)\).
Next the \(a_{n,k}\) are obtained by multiplying by the binomial coefficients. These can be calculated recursively without any complexity overhead over recursively calculating the \(b_{j,k}\). The binomial coefficients also have a binary length bounded by \(n\), so that multiplying \(b_{n,k}\) by \(\binom{n}{k}\) is still below \(O(n^3)\). However, the \(b_{n,k}\) need to be calculated for all \(1 \leq k \leq n\), which leads to a complexity of \(O(n^4)\) for computing the \(a_{n,k}\) and \(b_{n,k}\) given all the previous values. Finding \(a_n\) by summing the \(a_{n,k}\) is also of order \(n^3\), so it does not further affect the complexity.
Computing all the integers \(a_{j,k}\), \(b_{j,k}\) as well as the totals \(a_j\) for \(j\) up to \(n\) then involves repeating the above process \(n\) times, giving a final complexity of \(O(n^5)\).
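The growth of these integers is easy to check numerically. The following small self-contained R sketch (ours, not the paper's code) uses the 'gmp' package for arbitrary precision arithmetic and assumes (1) to be Robinson's alternating recursion for \(a_n\); the reported bit lengths indeed track the leading-order growth \(n(n-1)/2\):

```r
library(gmp)

N <- 12
a <- as.bigz(rep(0, N + 1))  # a[j + 1] holds a_j, with a_0 = 1
a[1] <- 1
for (n in 1:N) {
  tot <- as.bigz(0)
  # assumed form of (1): a_n = sum_k (-1)^(k+1) choose(n,k) 2^(k(n-k)) a_(n-k)
  for (k in 1:n) {
    tot <- tot + (-1)^(k + 1) * chooseZ(n, k) *
      as.bigz(2)^(k * (n - k)) * a[n - k + 1]
  }
  a[n + 1] <- tot
}
a[1:6]  # 1, 1, 3, 25, 543, 29281: the number of DAGs on 0, ..., 5 nodes
# bits needed to store a_n against the leading-order growth n(n-1)/2
cbind(n = 0:N, bits = sapply(0:N, function(n) sizeinbase(a[n + 1], 2)),
      lead = choose(0:N, 2))
```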
The above steps provide an upper bound for completing the first step of the algorithm in Sect. 4, but they use the assumption that all of the \(a_{n,k}\) have a similar length to \(a_n\). As seen in Sect. 5, however, \(a_{n,k}\) has a similar length to \(a_n\) only for a limited number of \(k\), and the length then decays to 1 for \(a_{n,n}\). Looking at the distribution of \(s\log(a_{m,s})\) can then give a more accurate estimate of the complexity of finding all the terms in the sum to calculate \(b_{n,k}\). This also gives a lower bound, since at each of the \(s\) steps in the simplified algorithm above the length of the binary representation actually increases. Numerically, this distribution has a single peak for \(s\) just beyond \(m/2\), but its maximal value appears to grow like \(n^3\), which again leads to a total complexity of \(O(n^5)\).
When sampling the DAGs, the first step of finding \(k\) given an integer between 1 and \(a_n\) involves subtracting up to \(n\) numbers of binary length up to \(n^2\) and is \(O(n^3)\). Then, given \((n,k)\), we sample \((m,s)\) again using the sums over \(s\) that appear in (6). As discussed above, performing the sum is \(O(n^3)\), while the sampling is in the end performed up to \(n\) times, which seems to give a total complexity of \(O(n^4)\). However, the sum of the outpoints sampled has to be exactly \(n\), while the complexity of sampling each \(k_i\) is bounded by \((k_i + 1 - \delta_{k_i,1})n^2\). With \(\sum_i k_i = n\), then \(\sum_i (k_i + 1)n^2 \leq 2n^3\), which reduces the total complexity to \(O(n^3)\). Also, since there is effectively no chance of choosing a large \(k_i\), as discussed in Sect. 5, the complexity of sampling \(k_1\) and each of the following \(k_i\) reduces to \(O(n^2)\), immediately leading to a total complexity of \(O(n^3)\).
Appendix C: Pseudocode for uniform DAG sampling
The code uses arbitrary precision integers, as in Maple or as provided by the GMP library and its R interface 'gmp' with the 'bigz' class. First we recursively calculate and store the numbers \(a_{n,k}\) and \(b_{n,k}\) for \(n \leq N\)

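The original listings are not reproduced here. In their place, the following R sketches (using the 'gmp' package) reconstruct each step as described in the text; they are our reading of the algorithm, not the authors' code, and share their variables from one step to the next. For the tables we assume (6) to take the form \(b_{n,k} = \sum_{s} (2^k - 1)^s \, 2^{k(n-k-s)}\, a_{n-k,s}\) with \(a_{n,k} = \binom{n}{k} b_{n,k}\):

```r
library(gmp)

N <- 20  # tables for up to N nodes

a <- vector("list", N)      # a[[n]][k]: DAGs on n nodes with k outpoints
b <- vector("list", N)      # b[[n]][k]: a_{n,k} without the binomial factor
avec <- as.bigz(rep(0, N))  # avec[n]: the total a_n

a[[1]] <- b[[1]] <- as.bigz(1)
avec[1] <- 1
for (n in 2:N) {
  a[[n]] <- b[[n]] <- as.bigz(rep(0, n))
  for (k in 1:(n - 1)) {
    m <- n - k
    tot <- as.bigz(0)
    for (s in 1:m) {  # the sum over s in (6)
      tot <- tot + (as.bigz(2)^k - 1)^s * as.bigz(2)^(k * (m - s)) * a[[m]][s]
    }
    b[[n]][k] <- tot
    a[[n]][k] <- chooseZ(n, k) * b[[n]][k]
  }
  a[[n]][n] <- b[[n]][n] <- 1  # the empty DAG: all n nodes are outpoints
  avec[n] <- sum(a[[n]])
}
```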
If the binomial coefficients are not readily available they can be built recursively in the same loops with no computational overhead. Next we sample an integer \(r\) between 1 and \(a_n\)

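A sketch of this step: we draw random bits until the assembled integer falls in the required range, the rejection keeping the draw uniform.

```r
n <- N
len <- sizeinbase(avec[n], 2)  # number of bits needed for a_n
repeat {
  r <- as.bigz(0)
  for (bit in sample(0:1, len, replace = TRUE)) {  # each bit is a rand{0,1}
    r <- 2 * r + bit
  }
  if (r >= 1 && r <= avec[n]) break  # accept only if 1 <= r <= a_n
}
```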
where ‘rand{0,1}’ provides a 0 or a 1 with probability 1/2. Now we use the integer to sample the number of outpoints k

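Continuing the sketch, the number of outpoints follows by stepping through the cumulative sums of the \(a_{n,k}\):

```r
k1 <- 1
while (r > a[[n]][k1]) {  # find the block of DAGs containing r
  r <- r - a[[n]][k1]
  k1 <- k1 + 1
}
kvec <- k1  # the vector k of the text, holding the outpoints of each layer
```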
which we store as the first element of a vector k. The current value of r should be between 1 and a n,k which we rescale to between 1 and b n,k

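Since \(a_{n,k} = \binom{n}{k} b_{n,k}\), a reduction modulo \(b_{n,k}\) performs the rescaling in our sketch:

```r
r <- (r - 1) %% b[[n]][k1] + 1  # now uniform on {1, ..., b_{n,k}}
```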
Next we recursively generate the outpoints in the loop

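In our reconstruction the loop walks through the terms of the sum in (6), exactly as in the complexity analysis of Appendix B:

```r
m <- n - kvec[1]
while (m > 0) {
  kc <- kvec[length(kvec)]  # outpoints of the layer just fixed
  s <- 1
  repeat {  # locate the term of (6) that contains r
    term <- (as.bigz(2)^kc - 1)^s * as.bigz(2)^(kc * (m - s)) * a[[m]][s]
    if (r <= term) break
    r <- r - term
    s <- s + 1
  }
  r <- (r - 1) %% b[[m]][s] + 1  # rescale to {1, ..., b_{m,s}}
  kvec <- c(kvec, s)             # s outpoints in the next layer
  m <- m - s
}
I <- length(kvec)
```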
The resulting vector k should be of length I and have its elements sum to n. We can now use this to fill the lower triangle of an empty matrix Q

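For the adjacency matrix we sample the connections layer by layer, assuming outpoints to be the nodes without incoming arcs: each node must then have at least one parent in the preceding layer and arbitrary parents in all earlier ones. A sketch:

```r
Q <- matrix(0L, n, n)
top <- cumsum(kvec)             # last node of each layer
bot <- c(1, head(top, -1) + 1)  # first node of each layer
for (i in seq_len(I - 1) + 1) { # layers 2, ..., I
  for (u in bot[i]:top[i]) {
    repeat {  # at least one arc from the preceding layer
      bits <- sample(0:1, kvec[i - 1], replace = TRUE)
      if (any(bits == 1)) break
    }
    Q[u, bot[i - 1]:top[i - 1]] <- bits
    if (i > 2) {  # arcs from all earlier layers are unconstrained
      Q[u, 1:top[i - 2]] <- sample(0:1, top[i - 2], replace = TRUE)
    }
  }
}
```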
Finally we sample a permutation \(\pi\) by randomly drawing all the integers \(\{1,\ldots,n\}\) without replacement. To obtain the adjacency matrix \(R\) of the uniformly sampled DAG we correspondingly permute the column and row labels of \(Q\) via \(R_{\pi(m),\pi(l)} = Q_{m,l}\).
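In R the final relabelling can be done by indexing with a random permutation, as a sketch:

```r
perm <- sample(n)   # the permutation pi
R <- matrix(0L, n, n)
R[perm, perm] <- Q  # R[perm[m], perm[l]] = Q[m, l]
```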
Appendix D: Limiting conditional outpoint distribution
In the expression defining \(b_{n,k}\) in (6), we can reorganise the powers of two,
\[ b_{n,k} = 2^{km} \sum_{s=1}^{m} \bigl( 1 - 2^{-k} \bigr)^{s} a_{m,s} , \qquad m = n - k , \]
and artificially bring out a factor of \(a_m\). For large \(m\), the fraction \(a_{m,s}/a_m\) can be replaced by its limit \(A_s\), since it is only non-zero for a small number of \(s\) at a given accuracy:
\[ b_{n,k} \approx 2^{km} a_m \sum_{s=1}^{m} \bigl( 1 - 2^{-k} \bigr)^{s} A_s . \]
Given \(k\), to sample the next number of outpoints \(s\) we can sample uniformly between 1 and \(b_{n,k}\) as in Sect. 4.1. The limiting probability of sampling each value of \(s\) is then
\[ p(s \mid k) \propto \bigl( 1 - 2^{-k} \bigr)^{s} A_s , \]
which through normalisation reduces to (14).
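Reusing the tables from the Appendix C sketch, the limiting weights can be tabulated directly: for moderate \(n\) the ratios \(a_{n,s}/a_n\) have already converged to \(A_s\) for the relevant \(s\). A sketch, with the normalised weights standing in for (14):

```r
A <- asNumeric(a[[N]] / avec[N])   # a_{N,s} / a_N approximates the limit A_s
k <- 2                             # current number of outpoints, for example
w <- (1 - 2^(-k))^seq_along(A) * A # unnormalised weights (1 - 2^-k)^s A_s
p <- w / sum(w)                    # limiting conditional outpoint distribution
round(head(p), 4)
```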
Cite this article
Kuipers, J., Moffa, G. Uniform random generation of large acyclic digraphs. Stat Comput 25, 227–242 (2015). https://doi.org/10.1007/s11222-013-9428-y
Keywords
- Random graph generation
- Acyclic digraphs
- Recursive enumeration
- Bayesian networks
- MCMC