1 Introduction

Decomposable graphs, sometimes also called triangulated or chordal graphs, are characterized by the property that every cycle of length greater than three has an edge (or chord) joining two nonconsecutive vertices (Lauritzen 1996). Another characteristic property is that these graphs can be recursively decomposed into smaller complete graphs, called cliques, in which every pair of vertices is connected by an edge. In this paper we rely on the fact that a graph is decomposable if and only if its cliques can be arranged into a so-called junction tree. Figure 1 shows an example of a decomposable graph along with one of its junction-tree representations. Decomposable graphs and their junction-tree representations as auxiliary data structures have been used in various contexts; examples include computational geometry, estimation of large-scale random graph models with local dependence, statistical inference (such as sparse covariance- and concentration-matrix computation), contingency-table analysis, probabilistic graphical models, and message passing; see e.g. (Eppstein 2009; Lauritzen 1996; Pearl 1997).
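As a quick illustration of these two characterisations, the following minimal sketch (our own, not from trilearn; it assumes the networkx library and a connected graph) checks chordality and assembles a junction tree as a maximum-weight spanning tree of the clique graph, a classical construction:

```python
# A minimal sketch (not the paper's code) assuming the networkx library and a
# connected graph: check chordality and build a junction tree as a
# maximum-weight spanning tree of the clique graph.
import itertools
import networkx as nx

# A small decomposable graph: two triangles glued along the edge {2, 3}.
g = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
assert nx.is_chordal(g)

cliques = list(nx.chordal_graph_cliques(g))  # frozensets {1,2,3} and {2,3,4}

# Clique graph: one node per clique, edges weighted by separator size.
cg = nx.Graph()
cg.add_nodes_from(cliques)
for a, b in itertools.combinations(cliques, 2):
    if a & b:
        cg.add_edge(a, b, weight=len(a & b))

# Any maximum-weight spanning tree of the clique graph is a junction tree.
jt = nx.maximum_spanning_tree(cg)
print(sorted(map(sorted, jt.nodes())))  # [[1, 2, 3], [2, 3, 4]]
```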

This work is mainly driven by the application of decomposability to probabilistic graphical models for representing conditional independence relations. From a statistical point of view, learning the underlying graph structure from observed data in such models is particularly convenient since the graph likelihood has a closed form. However, the complexity of the graph space makes estimators such as the maximum-likelihood graph estimate intractable, which has led to an increasing interest in Bayesian methods, in particular Monte Carlo methods for sampling-based approximations of the graph posterior.

The available methods are based on Markov chain Monte Carlo (MCMC) schemes (Tierney 1994), especially variations of the Metropolis–Hastings algorithm (Hastings 1970; Metropolis et al. 1953), where new graphs are proposed by means of random single-edge perturbations, and the set of possible moves generated by subjecting a given graph to such perturbations defines a neighborhood in the decomposable-graph space; see e.g. (Frydenberg and Lauritzen 1989; Giudici and Green 1999; Green and Thomas 2013; Thomas and Green 2009a). However, operations on the edge set are inherently local: the only vertices that may be connected by an edge in a (connected) decomposable graph while maintaining decomposability are those that already have a neighbor in common, and the removable edges are necessarily contained in exactly one clique. As a consequence, an MCMC sampler based on such moves will most likely suffer from mixing problems (Giudici and Green 1999; Green and Thomas 2013).

Green and Thomas (2013) showed that edge moves on the decomposable-graph space can sometimes be designed more easily if one operates on the extended junction-tree space. While this approach is mainly computationally motivated, it is also justified from a statistical point of view; indeed, a given distribution on the space of decomposable graphs can always be embedded into an extended version defined on the space of junction trees in such a way that the push-forward of the extended distribution with respect to the underlying graph equals the given distribution on the decomposable-graph space. Thus, by running an MCMC sampler producing a trajectory of junction trees targeting the extended distribution, an MCMC trajectory targeting the original distribution is obtained as a by-product by simply extracting the underlying graphs of the trees in the former sequence.
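To make the embedding concrete, one natural choice, consistent with Green and Thomas (2013), spreads the mass of each graph uniformly over its junction-tree representations (a sketch in our notation, with \(\mu (g)\) denoting the number of junction trees of the graph \(g\), as in Sect. 5, and \(g(\tau )\) the graph underlying the tree \(\tau \)):

$$\begin{aligned} \tilde{\pi }(\tau ) = \frac{\pi (g(\tau ))}{\mu (g(\tau ))}, \qquad \sum _{\tau \,:\, g(\tau ) = g} \tilde{\pi }(\tau ) = \pi (g), \end{aligned}$$

so that the push-forward of \(\tilde{\pi }\) under \(\tau \mapsto g(\tau )\) is exactly \(\pi \).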

Against this background, it is desirable to explore alternative ways of simulating decomposable graphs. In the present paper we take a different approach: instead of altering the edge set of a graph with a fixed vertex set, we build new graphs incrementally, starting from the empty graph and adding vertices one by one. More specifically, we present two novel stochastic algorithms operating on junction-tree structures: the junction-tree expander (JTE, or the Christmas-tree algorithm) and the junction-tree collapser (JTC). The JTE (JTC) expands (collapses) a junction tree by randomly adding (removing) one vertex to (from) the underlying decomposable graph. As we shall see, the JTE and JTC have two theoretical properties that are of fundamental importance in Monte Carlo simulation. First, the transition probabilities of the induced Markov kernels are available in closed form and can be computed efficiently; second, when applied sequentially, the JTE is able to generate, with positive probability, every junction tree whose underlying graph has a given number of vertices.

In order to illustrate their application potential, we employ the JTE and the JTC jointly to construct a sequential Monte Carlo (SMC) sampler (Del Moral et al. 2006) for sampling from more or less arbitrary distributions defined on spaces of decomposable graphs. In this construction, which relies on the above-mentioned junction-tree embedding proposed by Green and Thomas (2013), the JTC is used to extend the target distribution to a path space of junction trees of increasing dimension, whereas the JTE is used to generate proposals on this new space.

Using the SMC approach, we are able to provide unbiased estimates of the numbers of decomposable graphs and junction trees for any given number of vertices. This importance-sampling approach to the combinatorics of decomposable graphs and junction trees is the first of its kind. In the follow-up paper (Olsson et al. 2019), we further cast such an SMC sampler into the framework of particle Gibbs samplers (Andrieu et al. 2010). The resulting MCMC algorithm, which relies heavily on the JTE and JTC derived in the present paper, allows for global MCMC moves across the decomposable-graph space and, consequently, weakly correlated samples and fast mixing.

The JTE is related to other existing approaches for generating junction trees. For instance, the algorithm presented in Markenzon et al. (2008) is similar to ours in the sense that it expands the underlying graph incrementally at each step. However, unlike our proposed JTE, that algorithm is restricted to connected decomposable graphs, and transition probabilities are not directly provided. A completely different strategy for decomposable-graph sampling, based on tree-dependent bipartite graphs, is presented in Elmasri (2017a, 2017b). A recent MCMC algorithm for joint sampling of general undirected graphs and corresponding concentration matrices in Gaussian graphical models is presented in van den Boom et al. (2022).

The rest of this paper is structured as follows. Sect. 2 introduces notational conventions and gives a short background on decomposable graphs and junction trees; for a more detailed presentation, the reader is referred to e.g. (Blair and Peyton 1993) or (Lauritzen 1996). Sect. 3 and Sect. 4 present the JTE and the JTC, respectively, along with their corresponding transition probabilities. Sect. 5 provides a novel factorisation of the number of junction trees of a decomposable graph and demonstrates its computational advantage. The application of the JTE and the JTC in the framework of SMC sampling is found in Sect. 6, and Sect. 7 contains our numerical study. Appendix A contains detailed algorithm descriptions along with the proofs of the lemmas and theorems stated in the paper, whereas Appendix B provides an algorithm, originally presented in (Thomas and Green 2009b), for randomly connecting a forest into a tree.

Finally, we remark that the code used for generating the examples in the paper is contained in the Python library trilearn, available at https://github.com/felixleopoldo/trilearn. The junction-tree expander is also available through Benchpress (Rios et al. 2021), a recent software tool that enables execution of and seamless comparison between state-of-the-art structure-learning algorithms. The junction-tree expander is implemented as a module in Benchpress for simulating graphs underlying data in benchmarking studies.

2 Preliminaries

2.1 Notational convention

For any finite set \(a\), we denote its power set by \(\varvec{\wp }(a)\). The uniform distribution over the elements in \(a\) is denoted by . We assume that all random variables are defined on a common probability space \((\Omega , \mathcal {F}, \mathbb {P})\). Abusing notation, we will use the same symbol for a random variable and a realisation of the same. Further, we will use the same notation for a distribution and its corresponding probability density function. For an arbitrary space \({\mathsf {X}}\), the support of a nonnegative function h defined on \({\mathsf {X}}\) is denoted by . For all sequences \((a_j)_{j = 1}^\ell \), we apply the convention . Moreover, for all sequences \((a_j)_{j = 1}^\ell \) of sets and all nonempty sets \(b\), we set . We denote by \({\mathbb {N}}\) the set of natural numbers \(\{1,2,\dots \}\) and by \({\mathbb {N}}_{p}\) the set \(\{1,\dots ,p\}\) for some \(p\in {\mathbb {N}}\).

The notation \(\mathsf {pr}(\{ w_\ell \}_{\ell = 1}^N)\) is used to denote the categorical distribution induced by a set \(\{ w_\ell \}_{\ell = 1}^N\) of positive (possibly unnormalised) numbers. More specifically, writing \(x \sim \mathsf {pr}(\{ w_\ell \}_{\ell = 1}^N)\) means that the random variable x takes on the value \(\ell \) with probability \(\textstyle w_\ell / \sum _{\ell ' = 1}^Nw_{\ell '}\).
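As a minimal sketch (ours, for illustration only), such categorical draws from unnormalised weights can be implemented with numpy as follows:

```python
# Sample x ~ pr({w_l}) from positive, possibly unnormalised weights,
# matching the convention above (1-based index as in the text).
import numpy as np

rng = np.random.default_rng(seed=0)

def categorical(weights, rng):
    """Draw l in {1, ..., N} with probability w_l / sum(w)."""
    w = np.asarray(weights, dtype=float)
    return int(rng.choice(len(w), p=w / w.sum())) + 1

print(categorical([2.0, 1.0, 1.0], rng))  # returns 1 with probability 1/2
```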

2.2 Graph theory

A pair of a vertex set and an edge set , where is a set of unordered pairs such that , is called an undirected graph. Two vertices and \({y'}\) in are adjacent if they are directly connected by an edge, i.e., belongs to . The neighbors of a vertex are the vertices in adjacent to . A sequence of distinct vertices is called an --path, denoted by , if for all \(j\in \{2, \ldots , \ell \}\), belongs to . Two vertices and \({y'}\) are said to be connected if there exists an --path. Moreover, a graph is said to be connected if all pairs of vertices are connected. A graph is called a tree if there is a unique path between any pair of vertices in the graph. A connectivity component of a graph is a maximal subset of vertices that are pairwise connected. A graph is a forest if each of its connectivity components induces a tree. Further, two graphs are said to be isomorphic if they have the same number of vertices and equivalent edge sets when disregarding the labels of the vertices.

Now, consider a general graph which we call . The order and the size of refer to the number of vertices and the number of edges , respectively. Let \(a\), \(b\), and \(s\) be subsets of ; then the set \(s\) separates \(a\) from \(b\) if for all and \({y'}\in b\), all paths intersect \(s\). We denote this by . The graph is complete if all vertices are adjacent to each other. A graph is a subgraph of if and . A subtree is a connected subgraph of a tree. For , the induced subgraph is the subgraph of with vertices and edge set given by the set of edges in having both endpoints in . A subset of is a complete set if it induces a complete subgraph. A complete subgraph is called a clique if it is not an induced subgraph of any other complete subgraph.

The primary interest of this paper concerns decomposable graphs and their junction-tree representation.

Definition 1

A graph is decomposable if its cliques can be arranged in a so-called junction tree, i.e. a tree whose nodes are the cliques in , and where for any pair of cliques and in , the intersection is contained in each of the cliques on the unique path .

Note that a decomposable graph may have many junction-tree representations (each referred to as a junction tree for the specific graph), whereas for any specific junction tree, the underlying graph is uniquely determined. For clarity, from now on we follow Green and Thomas (2013) and reserve the terms vertices and edges for the elements of . Vertices and edges of junction trees will be referred to as nodes and links, respectively. Each link \((a, b)\) in a junction tree is associated with the intersection \(a\cap b\), which is referred to as a separator and denoted by \(s_{a,b}\). Note that the empty set is also a valid separator and can separate any pair of cliques that belong to distinct connected components. The set of distinct separators in a junction tree with graph is denoted by . Since all junction-tree representations of a specific decomposable graph have the same set of separators, we may talk about the separators of a decomposable graph. In the following we consider a fixed sequence of vertices and denote by the space of decomposable graphs with vertex set . The space of junction-tree representations for graphs in is analogously denoted by . The graph corresponding to a junction tree is denoted by . We let denote the subtree induced by the nodes of a junction tree containing the separator \(s\) and let denote the forest obtained by deleting, in , the links associated with \(s\).
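A hedged sketch (our illustration, assuming networkx and a connected tree whose nodes are frozensets of vertices) of how the junction-tree property of Definition 1 can be verified, and of how the underlying graph is recovered uniquely from a junction tree:

```python
import itertools
import networkx as nx

def is_junction_tree(tree):
    """Check the running intersection property of Definition 1 on a
    connected tree whose nodes are frozensets (cliques)."""
    if not nx.is_tree(tree):
        return False
    nodes = list(tree.nodes())
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            # Every clique on the unique a-b path must contain a & b.
            if not all(a & b <= c for c in nx.shortest_path(tree, a, b)):
                return False
    return True

jt = nx.Graph()
jt.add_edge(frozenset({1, 2, 3}), frozenset({2, 3, 4}))
print(is_junction_tree(jt))  # True

# The underlying graph is recovered uniquely by completing every clique.
g = nx.Graph()
for clique in jt.nodes():
    g.add_edges_from(itertools.combinations(sorted(clique), 2))
print(sorted(g.edges()))  # [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
```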

3 Expanding and collapsing junction trees

At the highest level, the JTE can be described in a few main steps, illustrated in Fig. 3. In the first step, the algorithm draws, at random, a subtree of the given tree (see Step 1 in Fig. 3). In the second step, a new vertex is connected to a random subset of each of the cliques in to form a new subtree , which is isomorphic to . The edges in are then removed and each of the nodes in is connected to the node in from which it stems, while maintaining the junction tree property (see Steps 2-4 in Fig. 3). The JTC, on the other hand, starts by selecting the unique subtree induced by a given vertex \({y'}\) (see Step 4 in Fig. 3). The second step amounts to drawing, for each clique in , a neighboring clique not containing \({y'}\), for which it is substituted while maintaining the junction tree property (see Steps 3-1 in Fig. 3). The two algorithms are complementary in the sense that the output obtained by subjecting a given tree to either the JTE followed by the JTC, or, vice versa, the JTC followed by the JTE, coincides with the initial tree with positive probability.

3.1 Sampling subtrees

Before presenting our main algorithm for expanding junction trees, we present one of its crucial subroutines: an algorithm for randomly sampling subtrees of a given, arbitrary tree. It takes two tuning parameters, \((\alpha , \beta ) \in (0,1)^2\), which together control the number of nodes in the subtree. With probability \(1-\beta \) the algorithm returns the empty tree; otherwise a breadth-first tree traversal is performed in which new nodes are visited with probability \(\alpha \). Thus, the parameter \(\alpha \) controls the number of nodes in the subtree given that it is nonempty. We call this algorithm the stochastic breadth-first tree traversal and provide an outline below. Full details are given in Algorithm 3 in Appendix A.

Stochastic breadth-first tree traversal

Let be a tree.

Step 1. :

Perform a Bernoulli trial that, with success probability \(\beta \), determines whether the subtree will be nonempty.

If the empty tree was sampled, return it. Otherwise, proceed according to the following steps.

Step 2. :

Sample a node uniformly at random from and add it to a list \( a\).

Step 3. :

Remove the first item, say , from \(a\) and add it to the set .

Step 4. :

Add independently each of the non-visited neighbors of to the end of \(a\) with probability \(\alpha \).

Step 5. :

If \(a\) is not empty, go to Step 3.

Step 6. :

Return the induced subtree .

The probability of extracting the induced subtree from by following the above steps is given by

where is the number of components in the forest . The factor stems from the fact that any vertex in is a valid starting vertex in the breadth-first traversal-like procedure and the probability of extracting a certain subtree is equal for each choice.
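For concreteness, a minimal sketch of the traversal (our code; Algorithm 3 in Appendix A is the authoritative version) may look as follows:

```python
from collections import deque
import random
import networkx as nx

def random_subtree(tree, alpha, beta, rng=random):
    # Step 1: with probability 1 - beta, return the empty subtree.
    if rng.random() > beta:
        return tree.subgraph([])
    # Step 2: uniform starting node.
    queue = deque([rng.choice(list(tree.nodes()))])
    visited = set()
    # Steps 3-5: visit queued nodes, enqueueing each unvisited neighbor
    # independently with probability alpha.
    while queue:
        v = queue.popleft()
        visited.add(v)
        for u in tree.neighbors(v):
            if u not in visited and u not in queue and rng.random() < alpha:
                queue.append(u)
    # Step 6: return the induced subtree.
    return tree.subgraph(visited)

t = nx.path_graph(10)  # a tree on 10 nodes
print(sorted(random_subtree(t, alpha=0.5, beta=0.9).nodes()))
```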

3.2 Expanding junction trees

In this section we present the main contribution of this paper, namely an algorithm for randomly expanding a given junction tree , \(m\in {\mathbb {N}}\), into a new junction tree such that is the induced subgraph of . This operation defines a Markov transition kernel , whose expression is derived at the end of this section. The full procedure, which in the following will be referred to as the junction-tree expander, is given below. Further details of these steps are provided in Algorithm 4 in Appendix A.

Junction tree expander

Let be a junction tree in .

Step 1. :

Sample a random subtree of .

If is empty, proceed as follows:

Step 2. :

Create a new node containing merely the vertex and connect it to an arbitrary node in .

Step 3. :

Cut the new tree at the empty separator to obtain a forest.

Step 4. :

Randomly reconnect the forest into a tree (see Appendix B).

If is non-empty, enumerate the nodes in as , and let, for each , be defined as the union of the separators associated with in . Proceed as follows:

Step 2\(^*\).:

For each node , draw uniformly at random a (possibly empty) subset of to create a new unique node , consisting of and the vertex . Note that for to be unique, has to be non-empty if any separator associated with in equals . If was engulfed in (i.e. ), simply delete .

Step 3\(^*\).:

To the nodes in , assign links which replicate the structure of . Then remove the links in and connect by a link each to its corresponding new node .

Step 4\(^*\).:

For each node , the neighbors whose links can be moved to while maintaining an equivalent junction tree are distributed uniformly between and . The set of neighbors of is denoted by .

When using the subtree sampler provided in Algorithm 3 at Step 1, the parameters \(\alpha \) and \(\beta \) have clear impacts on the sparsity of the outcome of the JTE; more specifically, since each node in the selected subtree will give rise to a new node in , \(\alpha \) controls the number of nodes containing the new vertex . The parameter \(\beta \) is simply interpreted as the probability of being connected to some vertex in .

Example 1

We illustrate two possible scenarios for how the junction tree in Fig. 1 with underlying vertex set could be expanded by the vertex 10. Figure 2 shows the possible scenario where the subtree picked at Step 1 is empty. Figure 3 demonstrates the possible scenario where the subtree sampled at Step 1 contains the nodes , , and , colored in blue. The new nodes, colored in red, are \(d_{1}^+=\{3,4,10\}\), \(d_{2}^+=\{4,5,10\}\), and \(d_{3}^+=\{5,6,10\}\), built from the sets \(z_{1}=\{4\}\), \(z_{2}=\{4,5\}\), \(z_{3}=\{5\}\) and \(q_{1}=\{3\}\), \(q_{2}=\emptyset \), \(q_{3}=\{6\}\). The sets of moved neighbors are \(n_{1}=\emptyset \), \(n_{2}=\emptyset \) and \(n_{3}=\{\{5,6,9\}\}\). The resulting underlying graphs for these two examples are shown in Fig. 4.

Note that in this example, is a leaf node in the resulting tree, making it look like a decoration on a Christmas tree.

Fig. 1 A decomposable graph (left panel) and one of its junction tree representations (right panel)

Fig. 2 A possible expansion of the junction tree in Fig. 1, where the empty subtree is drawn at Step 1

Fig. 3 A possible outcome of the JTE where a non-empty subtree was drawn in the expansion of the junction tree in Fig. 1

Fig. 4 Two decomposable graphs resulting from expanding the graph in Fig. 1 by the vertex 10

Example 2

Figure 5 should be read in chunks of two rows (except for the first row) and shows the junction trees, the corresponding decomposable graphs and the subgraphs generated by the JTE  for \(m\in \{1,\dots ,5\}.\) The left column shows the expansion of the junction trees and the right column shows the underlying decomposable graphs. Subtrees are colored in blue and the new nodes are colored in red. Unaffected nodes are black. Vertices in the underlying graphs are colored analogously. For example, the subtree selected in the generation of on Row 5 is found on Row 4. The underlying nodes in for creating are also found on Row 4, and so on. Note that the subtree used in the creation of , is the empty tree, thus is black. The tuning parameters of the junction tree expander are set to \(\alpha =0.3\) and \(\beta =0.9\).

The main reason for operating on junction trees as opposed to decomposable graphs directly is computational tractability. Next we provide explicit expressions of the transition kernel of the JTE, for any given \(m\in {\mathbb {N}}\).

For and generated by the JTE, let denote the set of possible subtrees bridging and through the first step of the JTE. This set contains, depending on and , either a single tree or two different trees, whose explicit forms are provided by Proposition 1.

Proposition 1

Let \(m \in {\mathbb {N}}\), , and be generated by the JTE. If the subtree of induced by the nodes containing the vertex has a single node with exactly two neighbors and such that , then ; otherwise, (a single tree), where and . Here and denote new nodes in and and are the corresponding nodes in . The sets \( r_{j}\) and \( r_{k}\) may be empty.

From a computational point of view, Proposition 1 is crucial since it guarantees a tractable expression of . Before we state this expression we introduce some further notation. We let denote the number of possible ways that , the tree obtained by cutting at the separator \(s\), can be connected to form a tree; this number is described in more detail in Theorem 5. Now, the transition probability of the JTE  takes the following form

(3.1)

where is understood as the probability that the JTE generates with as input, given that was drawn at Step 1. We stress again that the sum in (3.1) has either one or two terms and is thus easily computed. The conditional probability takes two different forms depending on whether is empty or not. If is empty, since is randomised at \(\emptyset \), all the obtainable equivalent junction trees have equal probability. Otherwise, the probabilities of the subsets \(q_{j}\) are calculated according to the uniform subset distributions in Step 2\(^*\). Observe that, given and , the resulting tree is completely determined by and . Since the pairs are drawn conditionally independently given and , we obtain

(3.2)

We examine the probabilities in (3.2) in the case where is nonempty. Since for each \(j\), the existence of a node such that forces \(q_{j}\) to be nonempty, it holds that

Conditionally upon , , and \(q_{j}\), the probability of each neighbor set \(n_{j}\) at Step \(4^*\) follows straightforwardly; indeed, the distribution of \(n_{j}\) takes two different forms depending on whether was engulfed into (i.e. ) or not. If so, all of the neighbors of are moved to with probability 1. Otherwise, \(n_{j}\) is uniformly distributed over all subsets of , giving

Observe that the simplicity of (3.1) is appealing from a computational point of view. In particular, as shown in Sect. 7, when is used as a proposal kernel in an SMC algorithm, fast computation of the transition probability is crucial, especially as the graph space grows.

Fig. 5 Example of a recursive application of the JTE with parameters \(\alpha =0.3\) and \(\beta =0.9\)

An important property of the JTE  is that for any \(m\in {\mathbb {N}}\) and , a tree generated by the JTE  is also a junction tree. In addition, is an induced subgraph of , having one additional vertex.

Theorem 1

For any \(m \in {\mathbb {N}}\) and it holds that

  1. (i)

    ,

  2. (ii)

    for all .

The following theorem states that for any \(m\in {\mathbb {N}}\), all junction trees in can be generated with positive probability using recursive application of the JTE. More specifically, we may define the marginal probability for any where and state the following theorem.

Theorem 2

For any ordering of vertices , \(m \in {\mathbb {N}}\), it holds that

For comparison, the algorithm for sequential sampling of junction trees presented by Markenzon et al. (2008) corresponds to recursive application of a special case of the JTE, where \(\alpha =0\), \(\beta =1\), and where Step 4 is omitted. Note that Theorem 2 does not hold under these restrictions, since the algorithm is forced to operate on the restricted space of junction trees for connected decomposable graphs. Markenzon et al. (2008) also propose a final step that merges neighboring cliques an unspecified number of times in order to increase the number of edges in the underlying graphs. While this step has the intended effect on the graphs, the space remains restricted and calculating the transition probabilities becomes intractable in general.

4 Collapsing junction trees

In this section we present the junction-tree collapser, a reversed version of the JTE introduced in the previous section. The idea is to collapse a junction tree into a new tree by removing from the underlying graph in such a way that . As will be proved in this section, this procedure defines a Markov kernel .

Next follows a description of the different suboperations in the sampling procedure for . The details of the steps are given in Algorithm 5 in Appendix A.

Junction tree collapser

Let be a junction tree in . Similarly to the JTE, the JTC  takes two different forms depending on whether is present as a node in or not.

If is a node in proceed as follows:

Step 1. :

Remove and its incident links to obtain a forest, possibly containing only one tree.

Step 2. :

Randomly connect the forest into a tree.

If is not a node in proceed as follows:

Step 1\(^*\).:

Let be the subtree of induced by the nodes containing the vertex and enumerate the nodes in by .

Step 2\(^*\).:

For all , draw at random from \(M_{j}\), the set of neighbors of in having the associated separator . If no such neighbor exists, let .

Step 3\(^*\).:

Replace each node by the corresponding node in the sense that is assigned all former neighbors of .

The next example illustrates a reversed version of Example 1.

Example 3

Consider collapsing the junction tree in the bottom right panel of Fig. 3 by the vertex 10. The induced subgraph , having the nodes , and is colored in red in the same subfigure. Further we see that \(M_{1}=\emptyset \) implies that and \(M_{2}=\{\{1,4,5\}\}\) implies . By drawing from \(M_{3}=\{\{2,5,6\}, \{5,6,9\}\}\), the junction tree in the top left panel of Fig. 3 is obtained.

The induced transition probability of collapsing into a tree has the form

where, as before, is the set of nodes in containing . The max operation is needed in order to make the expression well defined even when \(M_{j}\) is empty.

The JTC  is a reversed version of the JTE  in the sense that for all \(m\in {\mathbb {N}}\), a junction tree , generated by the JTC  from a junction tree , can be used as input to the JTE  to generate . This property is formulated in the next theorem.

Theorem 3

For all \(m \in {\mathbb {N}}\) and ,

  1. (i)

    ,

  2. (ii)

    ,

  3. (iii)

    for any .

Theorem 3 proves to be crucial in the SMC  context described in Sect. 6 and in particular in the refreshment step of the particle Gibbs sampler detailed in Olsson et al. (2019).

5 Counting the number of junction trees for an expanded decomposable graph

Thomas and Green (2009b) provide an expression for counting the number of equivalent junction trees of a given decomposable graph. In this section we derive a factorisation of the same expression which proves to alleviate the computational burden when it is evaluated for expanded graphs. For the sake of completeness, we restate three theorems from (Thomas and Green 2009b). The first counts the number of ways a forest can be reconnected into a tree and was first established in Moon (1967).

Theorem 4

(Moon (1967)) The number of distinct ways that a forest of order \(m\) comprising q subtrees of orders \(r_1,\dots ,r_q\) can be connected into a single tree by adding \(q-1\) edges is

$$\begin{aligned} m^{q-2}\prod _{i=1}^qr_i. \end{aligned}$$
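In code, Moon's count is a one-liner; a small sketch (ours) with a sanity check for a forest of two components:

```python
# Moon's formula: a forest of order m with q component trees of orders
# r_1, ..., r_q can be connected into a single tree in m^(q-2) * prod(r_i)
# ways (here assuming q >= 2).
from math import prod

def moon_count(orders):
    m, q = sum(orders), len(orders)
    return m ** (q - 2) * prod(orders)

# Two components of orders 2 and 3: one connecting edge between any vertex
# of the first tree and any vertex of the second, hence 2 * 3 = 6 ways.
print(moon_count([2, 3]))  # 6
```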

For a given junction tree , let \(t_{s}\) denote the order of the subtree induced by the separator \(s\). Now, let \(m_s\) be the number of links associated with \(s\) and let \(f_1,\dots ,f_{m_s+1}\) be the orders of the tree components in . Then, by Theorem 4 the following is obtained.

Theorem 5

(Thomas and Green (2009b)) The number of ways that the components of , where \(s\) is a separator in a graph with junction tree , can be connected into a single tree by adding the appropriate number of links is given by

$$\begin{aligned} \nu (s) = t_s^{m_s-1}\prod _{i=1}^{m_s+1}f_i. \end{aligned}$$

Theorem 6

(Thomas and Green (2009b)) The number of junction trees for a decomposable graph \(g\) is given by

$$\begin{aligned} \mu (g) = \prod _{s} \nu (s), \end{aligned}$$

where the product ranges over the distinct separators \(s\) of \(g\).

In the sequential sampling context considered in this paper it is useful to exploit that any decomposable graph can be regarded as an expansion of another decomposable graph , in the sense that is obtained by expanding with the vertex . This follows for example by induction using (Lauritzen 1996, Corollary 2.8).

The key insight when calculating is that when a vertex is added to , not all separators will necessarily be affected. This implies that for some separators.

Theorem 7

Let be an expansion of some graph by the extra vertex . Let be the set of unique separators created by the expansion (note that might be non-empty). Then

(5.1)

where is the set of separators in contained in some separator in \(S^\star \).

The potential computational gain obtained by using the factorisation in Theorem 7 is illustrated by the following example.

Example 4

Let be an expansion of a graph in the sense that is connected to every vertex in one of the cliques in . Then, since the set of separators is the same in the two graphs, it holds that the two graphs also have the same number of junction trees.

6 Applications to sequential Monte Carlo sampling

Sequential Monte Carlo (SMC) methods (Chopin and Papaspiliopoulos 2020) are a class of simulation-based algorithms that offer a principled way of sampling online from very general sequences of distributions, known only up to normalising constants, by recursively propagating a population of random draws, so-called particles, with associated importance weights. The particles evolve randomly and iteratively through selection and mutation. In the selection step, particles are duplicated or eliminated depending on their importance, while the mutation operation disseminates the particles randomly in the state space and assigns them new importance weights for further selection at the next iteration. SMC methods have been particularly successful for online approximation of state posteriors in general state-space hidden Markov models (Arulampalam et al. 2002).

In this section we demonstrate how the JTE  and the JTC  can be cast into the framework of SMC  methods—or, more precisely, the SMC samplers proposed in Del Moral et al. (2006)—in order to sample from a sequence of probability distributions, where each is a distribution on . For every m we assume that is known only up to a normalising constant, i.e., , where is a tractable, unnormalised function. Following (Del Moral et al. 2006), we introduce path spaces and let

(6.1)

be extended target distributions. Importantly, each target is the marginal of \({\bar{\eta }}_{m}\) with respect to the mth component. In many applications, the aim is to sample from a given distribution \(\pi \) on some junction-tree space induced by n vertices, and in this case one may let and be the marginals of \(\pi \) (if these are known up to normalising constants), serving to guide the distribution flow towards the target \(\pi \).

Now, introduce, for all m, proposal distributions

(6.2)

Since Theorem 3 implies that for all \(\ell \in \{1,\dots ,m-1\}\), it is readily checked that \( {\text {Supp}}({\bar{\eta }}_{m}) \subseteq {\text {Supp}}({\bar{\rho }}_m)\). This property, along with Theorems 1 and 2, allows the extended target distributions (6.1) to be sampled by means of an importance-sampling procedure, in which independent tree paths \(\tau _{1:m}^{i} = (\tau _{1}^{i},\ldots ,\tau _{m}^{i})\), generated sequentially using the JTE, are assigned importance weights

(6.3)

Here N is the Monte Carlo sample size. Thanks to the Markovian structure of the proposal (6.2) and the multiplicative structure of the weights (6.3), this procedure can be implemented sequentially by applying recursively the update described in Algorithm 1. This yields a sequence \((\tau _{m}^{i}, \omega _{m}^{i})_{i = 1}^N\), \(m \in {\mathbb {N}}\), of weighted samples, where, since is the marginal of \({\bar{\eta }}_{m}\) with respect to the last component, \(\sum _{i = 1}^N \omega _{m}^{i} h(\tau _{m}^{i}) / \Omega _{m}^N\), with , is a strongly consistent self-normalised estimator of the expectation of any real-valued test function h under . In the SMC literature, the draws \((\tau _{m}^{i})_{i = 1}^N\) are typically referred to as particles.

Algorithm 1
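To fix ideas, the following hedged sketch shows one such update in the spirit of Algorithm 1; jte_sample, jte_prob, jtc_prob, and gamma are hypothetical stand-ins for the JTE proposal kernel, the JTC kernel, and the unnormalised targets, and the weight update shown is the generic SMC-sampler form of Del Moral et al. (2006), of which (6.3) is the instance used here:

```python
# Hypothetical stand-ins (not trilearn's API): jte_sample / jte_prob for the
# JTE kernel, jtc_prob for the JTC kernel, gamma for the unnormalised targets.
def sis_update(particles, weights, m, jte_sample, jte_prob, jtc_prob, gamma):
    """One sequential importance sampling step in the spirit of Algorithm 1."""
    new_particles, new_weights = [], []
    for tau, w in zip(particles, weights):
        tau_new = jte_sample(tau)  # mutate: expand the tree with the JTE
        # Multiplicative update: target ratio times the backward (JTC) over
        # forward (JTE) kernel ratio, as in Del Moral et al. (2006).
        w_new = w * gamma(tau_new, m) * jtc_prob(tau_new, tau) / (
            gamma(tau, m - 1) * jte_prob(tau, tau_new))
        new_particles.append(tau_new)
        new_weights.append(w_new)
    return new_particles, new_weights
```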

Even though this sequential importance sampling procedure, described in Algorithm 1, appears appealing at first sight, the multiplicative weight-updating formula (6.3) (Line 3 in Algorithm 1) is problematic in the sense that it will inevitably lead to severe weight skewness and, consequently, high Monte Carlo variance. In fact, it can be shown that updating the weights in this naive manner leads to a Monte Carlo variance that increases geometrically fast with m; see e.g. (Cappé et al. 2005, Chapter 7.3) for a discussion. Needless to say, this is impractical for most applications.

In order to cope with the weight-degeneracy problem, Gordon et al. (1993) proposed furnishing the previous sequential importance sampling algorithm with a selection step, in which the particles are resampled, with replacement, in proportion to their importance weights. Upon selection, all particles are assigned the unit weight, and the particles and importance weights are then updated as in Algorithm 1. Such selection is a key ingredient in SMC methods, and it can be shown mathematically that the resulting sequential importance sampling with resampling algorithm, which is given in Algorithm 2, is indeed numerically stable (Chopin and Papaspiliopoulos 2020; Del Moral 2004).

Algorithm 2
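A minimal sketch (ours) of the multinomial selection step described above:

```python
# Multinomial resampling: draw N ancestor indices in proportion to the
# weights, then reset all weights to one.
import numpy as np

def resample(particles, weights, rng):
    w = np.asarray(weights, dtype=float)
    idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
    return [particles[i] for i in idx], [1.0] * len(particles)

rng = np.random.default_rng(seed=0)
particles, weights = resample(["t1", "t2", "t3"], [3.0, 1.0, 1.0], rng)
print(particles, weights)
```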

In standard self-normalised importance sampling, the average weight provides an unbiased estimator of the normalising constant of the target. However, when the particles are resampled systematically, as in Algorithm 2, this simple estimator is no longer valid. Instead, it is possible to show that for every m, the estimator

with , is an unbiased estimator of \(\gamma _m(h)\) for any real-valued test function h. In particular,

(6.4)

provides an unbiased estimator of the normalising constant of . This estimator will be illustrated in the next section.
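In code, the estimator (6.4) is simply the running product of the average unnormalised weights across iterations; a sketch under the assumption that resampling is performed at every iteration:

```python
# weight_history[l] holds the N unnormalised weights of iteration l + 1.
import numpy as np

def normalising_constant_estimate(weight_history):
    return float(np.prod([np.mean(w) for w in weight_history]))

# Two iterations with average weights 3.0 and 2.0 give the estimate 6.0.
print(normalising_constant_estimate([[2.0, 4.0], [1.0, 3.0]]))
```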

7 Numerical study

We demonstrate two applications of Algorithm 2 for estimating the cardinalities and of the spaces of decomposable graphs and junction trees, respectively.

7.1 Estimating

Wormald (1985) provides an exact expression for and evaluates the same for \(m\le 13\). In the same reference, the author also establishes the asymptotic expression . Another exact algorithm that calculates for \(m\le 10\) is proposed in Kawahara et al. (2018).

In this study we will use Algorithm 2 for estimating , \(m \in {\mathbb {N}}\), on the basis of the target probability distributions

Note that the normalising constant of equals ; indeed,

With this formulation, unbiased estimates of , \(m \in {\mathbb {N}}\), can be obtained directly using (6.4). Note that in this setting Line 4 of Algorithm 2 reduces to

(7.1)

where , , for which, as demonstrated by Example 4, the computational burden can be substantially reduced using the factorisation (5.1) since .

Table 1 shows means and standard errors based on 10 estimates of for . The upper panel of the table shows , while the lower panel shows , i.e. estimates of the fraction of undirected graphs that are decomposable. For \(m\le 13\), the exact enumerations are given in the second column. We ran the SMC sampler with tuning parameters \(\alpha =0.5\) and \(\beta =0.5\), and the number of particles was set to \(N=10000\). Figure 6 displays the asymptotic behavior of and for \(m\le 50\), along with the exact values for \(m\le 13\), showing concordance with the exact results. Each of the 10 estimates took about 10 minutes to compute.

Finally, we also explored other parameterisations of \(\alpha \) and \(\beta \) and found that, in this case, the estimates seem to be less accurate in terms of standard error for high values of \(\alpha \) (around 0.9) combined with low values of \(\beta \) (around 0.3). However, for \(\alpha \) and \(\beta \) around 0.3 and 0.9, respectively, the performance of the estimator was similar to that for the parameterisation \(\alpha =\beta = 0.5\) considered above.

Table 1 Sequential Monte Carlo estimation of the number of decomposable graphs and the fraction of graphs which are decomposable
Fig. 6 The number of decomposable graphs as a function of the number of vertices

7.2 Estimating

As far as we know there is no method available in the literature for efficiently calculating . However, for \(m\le 5\) it is computationally tractable to first find all the 822 graphs by Monte Carlo sampling and then evaluate \(\mu \) for each of them.

As in Sect. 7.1 we find an unbiased estimator of by constructing target distributions

so that the normalising constant equals , and then use (6.4). Note that with this setting the first factor in Line 4 in Algorithm 2 simplifies as

(7.2)

for all , .

The third and fourth columns of the upper panel in Table 2 show estimated means and standard deviations of for \(m\le 15\), based on 10 replicates. The true values for are shown in the first column for \(m\le 5\). The lower panel of Table 2 displays estimates of the number of junction trees per decomposable graph, , for different numbers of vertices. True numbers are shown in the first column, and estimated means and standard deviations of are shown in the third and fourth columns. Interestingly, Figure 7 indicates an exponential growth rate of the estimated number of junction trees per decomposable graph for \(p\le 50\). Each of the 10 estimates took about 6 minutes to compute.

Table 2 Sequential Monte Carlo estimation of the number junction trees and the expected number of junction trees per decomposable graph
Fig. 7 Estimates of the expected number of junction trees per decomposable graph

8 Discussion

In this paper we have presented the JTE  and the JTC  for stochastically generating and collapsing junction trees for decomposable graphs in a vertex-by-vertex fashion. The Markovian nature of these procedures enables the development of sophisticated sampling technology such as SMC and particle MCMC methods; see (Olsson et al. 2019).

Several MCMC methods for approximating distributions on the space of decomposable graphs have been proposed in the literature. Still, in most of these methods, an MCMC chain of graphs (or junction trees) is evolved by means of locally limited random perturbations, generally leading to poor mixing (Giudici and Green 1999; Green and Thomas 2013). The main benefit of casting the JTE and JTC procedures into the particle Gibbs framework is a substantial improvement of the mixing properties of the resulting MCMC chain; this improvement is possible since the JTE procedure allows the produced chain of junction trees to make long-range, global transitions across the state space.

The appealing properties of our approach do not come without a certain price. For instance, relying on the junction-tree representation when sampling from a given decomposable-graph distribution imposes an additional computational burden associated with calculating the number of possible junction-tree representations of each of the sampled graphs. In the present paper, we have been able to alleviate this burden by means of the factorisation property derived in Theorem 7, allowing for faster dynamic updates. Another challenge arising when the SMC procedure in Algorithm 2 is used for sampling distributions over spaces of decomposable graphs with a very large number p of vertices stems from the well-known particle-path degeneracy phenomenon; see (Jacob et al. 2015; Koskela et al. 2020). More specifically, since the graphs propagated by Algorithm 2 are resampled systematically, many of them will eventually, as the number of SMC iterations increases, have parts of their underlying graph in common. This may lead to high variance when p is large compared to the sample size N, and the \({\mathcal {O}}(N)\) bound on the resampling-induced particle-path coalescing time obtained recently in Koskela et al. (2020) suggests that p and \(N\) should be at least of the same order in order to keep the Monte Carlo error under control. In the particle Gibbs approach developed in Olsson et al. (2019), the particle-path degeneracy phenomenon is handled by means of an additional JTC-based backward-sampling operation.

As an alternative approach to the JTE, which incrementally constructs a junction tree by adding one vertex at a time to the underlying graph, one may suggest a method that operates directly on the space of decomposable graphs. The main difficulty arising when designing such a scheme is to express the transition probabilities in a tractable form while maintaining the ability to generate any decomposable graph with a given number of vertices, qualities possessed by the methods that we propose.

Finally, we expect that tailored data structures for the junction-tree implementation that respect the sequential nature of the algorithms could greatly increase the computational speed. For instance, when propagating the particles in Algorithm 2, the junction trees are not altered but rather copied and expanded (since several trees must be able to stem from the same ancestor); thus, using persistent data structures, which are widely used in functional programming to avoid the copying of data, in the SMC context of the present paper is an interesting line of research.
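As a toy illustration of this idea (our sketch; it uses the third-party pyrsistent library and does not reflect how trilearn is currently implemented), updating a persistent map returns a new version that shares structure with the old one, so many child trees can stem from one ancestor without full copies:

```python
# Assumes the pyrsistent library (pip install pyrsistent); this illustrates
# structural sharing only, not trilearn's implementation.
from pyrsistent import pmap, pset

# A tiny junction tree as a persistent adjacency map: clique -> neighbors.
adj = pmap({frozenset({1, 2, 3}): pset([frozenset({2, 3, 4})]),
            frozenset({2, 3, 4}): pset([frozenset({1, 2, 3})])})

# "Expanding" returns a new version; the ancestor is untouched, and the two
# versions share most of their internal structure instead of being copied.
child = adj.set(frozenset({4, 10}), pset([frozenset({2, 3, 4})]))
assert frozenset({4, 10}) in child and frozenset({4, 10}) not in adj
```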