1 Introduction

The statistical analysis of network data is becoming increasingly commonplace, with applications across various disciplines, such as epidemiology, social science, neuroscience and finance (Kolaczyk 2009). Over the last four decades, a number of statistical models for networks have been developed, including stochastic blockmodels (Holland et al. 1983), latent space models (Hoff et al. 2002) and—the focus of this article—exponential random graph models (ERGMs) (Frank and Strauss 1986).

An exponential random graph model is a family of parametric statistical distributions on network data (see Schweinberger et al. (2020) for a recent review). The aim of the model is to characterise the distribution of a network in terms of a set of summary statistics. These summary statistics typically comprise topological features of the network, such as the number of edges and subgraph counts. The summary statistics enter the likelihood via a weighted sum; the weights are (unknown) model parameters that quantify the relative influence of the corresponding summary statistic on the overall network structure and must be inferred from the data. ERGMs thus provide a flexible way to describe the global network structure as a function of network summary statistics.

To date, statistical network models, including ERGMs, have largely focused on the analysis of a single network. Formally, a network consists of a set of nodes and a set of edges between these nodes. Let \(\mathcal {N} = \lbrace 1, \dots , N \rbrace \) be a finite set of nodes, each of which may be associated with covariates \(x_i \in \mathcal {X} \subseteq \mathbb {R}^q\). An edge from node i to node j is denoted by \(Y_{ij}\), so that the network is encoded by the adjacency matrix \(\pmb {Y} = (Y_{ij})_{i,j \in \mathcal {N}}\). For our purposes, the set of nodes \(\mathcal {N}\) and their covariates \(\pmb {x} = \lbrace x_1, \dots , x_N \rbrace \) are considered fixed, while the edges are considered to be random variables. Let \(\pmb {y}\) denote an instantiation, or outcome, of the random adjacency matrix \(\pmb {Y}\) and write \(\mathbb {P}(\pmb {Y} = \pmb {y}):= \pi (\pmb {y})\) for the probability that \(\pmb {Y}\) takes the value \(\pmb {y}\). A statistical network model specifies a parametrised probability distribution on the adjacency matrix \(\pi (\pmb {y} | \pmb {x}, \theta )\), where \(\theta \) is a vector of model parameters.

A population of networks consists of \(n > 1\) adjacency matrices \(\pmb {Y}^{(1)}, \dots , \pmb {Y}^{(n)}\) defined on a common set of nodes \(\mathcal {N}\). We will assume for simplicity that the nodal covariates are the same across networks, though in principle this need not be the case. A common example of a population of networks arises in neuroimaging, where a typical study consists of brain data across a number of participants, each constituting an individual network. Network analyses of the brain can provide insight into cognitive function by revealing how distinct brain areas work in conjunction (Fuster 2006). These analyses aim to identify salient topological features of the brain’s connectivity structure that are common across individuals or that vary with a given covariate.

While one could fit a single model to each individual network separately, it is not straightforward to combine these individual results into a single coherent result that is representative of the whole population. An alternative approach is to construct a group-representative network by, for example, taking the mean of the edges across the individual networks and applying a threshold to the resulting weighted network (Achard et al. 2006). These approaches ignore the individual variability present in the networks and, moreover, typically do not accurately summarise the topological information across the individual networks (Ginestet et al. 2011).

A more statistical approach is to treat the individual networks as distinct statistical units arising from a joint probability distribution \(\pi (\pmb {y}^{(1)}, \dots , \pmb {y}^{(n)}| \pmb {x}, \theta )\) (Ginestet et al. 2017). Here, we describe how to perform Bayesian linear regression where the outcome of interest is a network-valued random variable whose distribution is described by an exponential random graph model. By modelling the networks jointly, this framework provides a principled approach to characterising the relational structure of an entire population, and allows one to assess how network structure varies with a given set of network-level covariates. In the case of binary covariates, our method can be used to infer group-level differences in network structure between sets of networks. We demonstrate the approach on both simulated networks and real networks derived from resting-state functional magnetic resonance imaging (fMRI) scans from an ageing study.

Inference for Bayesian ERGMs is challenging due to the double-intractability of the ERGM posterior distribution; standard Markov chain Monte Carlo (MCMC) schemes such as the Metropolis algorithm are not feasible as it is not possible to evaluate the acceptance ratio. A common workaround is to apply the exchange algorithm (Murray et al. 2006), which was first employed in the context of Bayesian ERGMs by Caimo and Friel (2011). To perform inference for our framework for populations of networks, we implemented an exchange-within-Gibbs algorithm that combines the exchange algorithm with the Gibbs sampler (Geman and Geman 1984) to produce samples from the target posterior distribution. The parametrisation of general multilevel models can play an important role in the overall efficiency of an MCMC scheme (Gelfand et al. 1995; Papaspiliopoulos et al. 2003, 2007). To improve the mixing properties of the algorithm, we use an ancillarity-sufficiency interweaving strategy (ASIS) that interweaves between the centred and non-centred parametrisations (Yu and Meng 2011). To further boost efficiency, we also employ adaptation of the random-walk proposal parameters in the algorithm (see e.g. Roberts et al. (1997)).

1.1 Related work

Our work builds on that of Slaughter and Koehly (2016) who studied multiple approaches to building Bayesian hierarchical models for populations of networks based on ERGMs, including an example of Bayesian linear regression with a single covariate per network. We extend the approach of Slaughter and Koehly (2016) to explicitly handle multiple network-level covariates, employing a matrix Normal prior on the regression coefficients which admits (partial) conjugacy. As noted by Slaughter and Koehly (2016), multilevel models frequently exhibit poor mixing for some of the parameters. Our use of the ASIS algorithm greatly improves the efficiency of the sampler, allowing us to perform linear regression on larger populations of networks, and with a larger number of nodes in each network. We now describe some alternative approaches to modelling populations of networks.

1.2 Hierarchical ERGMs

Multilevel networks are networks with a nested hierarchical structure such that nodes may be grouped into subsets of nodes, which may further be grouped into subsets of subsets of nodes, and so on. It is worth emphasising that the hierarchical nature of a multilevel network corresponds to the grouping of nodes, as opposed to model parameters as might be typical in a Bayesian hierarchical model. A population of networks represents a two-level network such that each subset of nodes corresponds to a separate network, with no connections between distinct subsets (see Fig. 1). Wang et al. (2013) proposed ERGMs for multilevel networks, introducing a variety of model specifications to account for different multilevel structures in two-level networks. Yin and Butts (2022) developed a preprocessing approach to efficiently fit a ‘pooled’ ERGM to multiple networks drawn from the same model. Yin et al. (2022) proposed a mixture of ERGMs to model populations of networks in which the group membership is unknown, which was extended to a data-adaptive Dirichlet process mixture of ERGMs by Ren et al. (2023). Schweinberger and Handcock (2015) introduced exponential random graph models with local dependence, providing a general framework encompassing multilevel networks (and thus populations of networks) and establishing a central limit theorem for this class of models.

Fig. 1 A population of networks (bottom) can be seen as a special case of a multilevel network (top) in which each subset of nodes contains the same number of nodes and there are no edges between each subset of nodes. (colour figure online)

1.3 ERGMs for brain networks

Exponential random graph models have been applied to resting-state fMRI brain networks (see Simpson et al. (2011) for an early example). Simpson et al. (2012) constructed group-representative networks by taking the mean of the parameter estimates from ERGMs fit to each individual network. Sinke et al. (2016) constructed group-representative networks directly from individual diffusion tensor imaging (DTI) brain networks and then fit Bayesian ERGMs to the resulting group networks. Obando and Fallani (2017) applied ERGMs to functional connectivity brain networks derived from electroencephalographic (EEG) signals. In each of these approaches, the networks are fit independently of each other and, unlike the hierarchical approach described here, there is no pooling of information across networks.

1.4 Other models for populations of networks

Other statistical network models have recently been extended to handle populations of networks. Sweet et al. (2013) proposed a general framework of hierarchical network models (HNMs), which encompasses the model described in this article. They focus on a hierarchical representation of latent space models (Hoff et al. 2002) applied to social networks. Sweet et al. (2014) studied stochastic blockmodels in the HNM framework to infer clusters of nodes shared across networks. Durante et al. (2017) developed an alternative extension of the latent space model (Hoff et al. 2002) to populations of networks based on a low-dimensional mixture model representation. Durante et al. (2018) applied this model in the context of groups of networks to test for differences between groups. Mukherjee et al. (2017) used graphons to detect clusters among multiple networks within a population (as opposed to clusters within networks). Signorelli and Wit (2020) used a model-based clustering method based on generalized linear (mixed) models to cluster networks that share certain network properties of interest.

2 Model formulation

2.1 Exponential random graph models

The family of exponential random graph models defines probability distributions over the space of networks in terms of sets of summary (or sufficient) statistics. We will focus on the case of undirected, binary networks, with \(Y_{ij} = Y_{ji} \in \lbrace 0,1 \rbrace \). Let \(\mathcal {Y}\) be the range of \(\pmb {Y}\), i.e. the set of all possible outcomes. Let \(s(\pmb {y}, \pmb {x})\) denote a vector of p summary statistics, such that each component is a function \(s_i: \mathcal {Y} \times \mathcal {X}^N \rightarrow \mathbb {R}\).

An ERGM is specified by a particular set of p summary statistics and a map \(\eta : \Theta \mapsto \mathbb {R}^p \). The probability mass function of \(\pmb {Y}\) under the corresponding ERGM is given by

$$\begin{aligned} \pi (\pmb {y} \mid \pmb {x}, \theta ) = \dfrac{\exp \left\{ \eta (\theta )^Ts(\pmb {y}, \pmb {x})\right\} }{Z(\theta )}. \end{aligned}$$
(1)

Here, \(\theta \in \Theta \subseteq \mathbb {R}^p\) is a vector of p model parameters that must be estimated from the data and \(Z(\theta ) = \sum _{\pmb {y}' \in \mathcal {Y}} \exp \left\{ \eta (\theta )^Ts(\pmb {y}', \pmb {x})\right\} \) is the normalising constant ensuring the probability mass function sums to one. Given data, that is, a realisation \(\pmb {y}\), the goal is to infer which values of \(\theta \) best correspond to the data under this distribution. To reduce the notational burden, we will henceforth omit the dependence on the nodal covariates \(\pmb {x}\), considering this to be implicit in the specification of the probability distribution.
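
To make this concrete, the numerator of (1) is cheap to evaluate; only \(Z(\theta )\) is intractable. A minimal sketch using the ergm R package (Hunter et al. 2008), assuming the common identity specification \(\eta (\theta ) = \theta \) and a toy two-statistic model:

library(ergm)

set.seed(1)
net  <- network(30, directed = FALSE, density = 0.1)   # toy 30-node network
form <- net ~ edges + gwesp(0.9, fixed = TRUE)         # defines s(y)
s_y  <- summary(form)                                  # observed summary statistics
theta <- c(-3, 0.5)                                    # one parameter per statistic

# Unnormalised log-probability theta^T s(y); Z(theta) itself is intractable
log_unnorm <- sum(theta * s_y)

# Simulating y ~ pi(. | theta), the operation used repeatedly in Sect. 3
y_sim <- simulate(form, coef = theta, nsim = 1)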

2.2 A Bayesian multilevel model for populations of networks

The ERGM provides a flexible family of distributions for a single network. Our aim is to extend this to a model for a population of networks in which each network is accompanied by a set of covariates. To do so, we use ERGMs as the basis of a Bayesian multilevel model. Let \(\varvec{Y} = (\pmb {Y}^{(1)}, \dots , \pmb {Y}^{(n)})\) be a set of n networks, and let \(X \in \mathbb {R}^{n \times q}\) be a matrix of q network-level covariates. Identify each network \(\pmb {Y}^{(i)}\) with its own vector-valued ERGM parameter \(\theta ^{(i)}\). Write \(\varvec{\theta } = (\theta ^{(1)}, \dots , \theta ^{(n)})\) for the set of network-level parameters.

We model each individual network \(\pmb {Y}^{(i)}\) as an exponential random graph, which we denote \(\pmb {Y}^{(i)} \sim \pi (\cdot | \theta ^{(i)})\). Each individual ERGM must consist of the same set of p summary statistics \(s(\cdot )\). We then propose the following multilevel model:

$$\begin{aligned} \begin{aligned} \pmb {Y}^{(i)}&\sim \pi (\cdot | \theta ^{(i)}), ~~ i = 1, \dots , n \\ \theta ^{(i)}&\sim \mathcal {N}\left( x^T_i\beta , \Sigma _\epsilon \right) , ~~ i = 1, \dots , n \end{aligned} \end{aligned}$$
(2)

where \(\beta \) is a \(q \times p\) matrix of parameters, and the q-vector \(x_i\) corresponds to the \(i^{th}\) row of the matrix X. We assume that, conditional on their respective network-level parameters \(\theta ^{(i)}\), the \(\pmb {Y}^{(i)}\) are independent. We highlight the connection to multivariate linear regression: we have vector-valued ‘response’ variables \(\theta ^{(i)}\) whose dependence on a set of q explanatory variables X we would like to assess, allowing the components of the residuals \(\epsilon ^{(i)}:= \theta ^{(i)}-x^T_i\beta \) to be correlated. The difference from standard multivariate linear regression is that the responses are not observed but are instead latent parameters of an ERGM. With this specification comes the flexibility associated with multivariate linear regression; the X matrix can include polynomial terms and interactions between the covariates of interest.

2.2.1 Prior specification

The joint density of the networks and network-level parameters described by (2) can be written

$$\begin{aligned} \begin{aligned} p(\varvec{Y}, \varvec{\theta } \mid X, \beta , \Sigma _\epsilon )&= \prod _{i=1}^n p(\pmb {Y}^{(i)}, \theta ^{(i)} \mid x_i, \beta , \Sigma _\epsilon ) \\&= \prod _{i=1}^n \pi (\pmb {Y}^{(i)} \mid \theta ^{(i)})p(\theta ^{(i)} \mid x_i, \beta , \Sigma _\epsilon ). \end{aligned} \end{aligned}$$
(3)

To complete this model, we must therefore specify a prior on \((\beta , \Sigma _\epsilon )\). Motivated by computational simplicity, we opt for the (conditional) conjugate prior \(p(\beta , \Sigma _\epsilon ) = p(\beta \mid \Sigma _\epsilon )p(\Sigma _\epsilon )\), with

$$\begin{aligned} \begin{aligned} \beta \mid \Sigma _\epsilon&\sim \mathcal{M}\mathcal{N}\left( \beta _0, \Lambda _0^{-1}, \Sigma _\epsilon \right) , \\ \Sigma _\epsilon&\sim \mathcal {W}^{-1}\left( V_0, \nu _0 \right) , \end{aligned} \end{aligned}$$
(4)

where \(\beta _0\) is a \(q \times p\) prior mean matrix, \(\Lambda _0\) is a \(q \times q\) positive definite matrix, \(V_0\) is a \(p \times p\) positive definite matrix, and \(\nu _0 > p - 1\). Here, \(\mathcal{M}\mathcal{N}(M, U, V)\) denotes a matrix-normal distribution with location matrix M, row-based scale matrix U, and column-based scale matrix V. \(\mathcal {W}^{-1}(\Psi , \nu )\) is an inverse-Wishart distribution with scale \(\Psi \) and \(\nu \) degrees of freedom.

Equipped with this prior, we can factorise the posterior of \((\beta , \Sigma _\epsilon )\) given the matrix X and the network-level parameters \(\varvec{\theta }\), into \(p(\beta , \Sigma _\epsilon \mid \varvec{\theta }, X) = p(\beta \mid \Sigma _\epsilon , \varvec{\theta }, X)p(\Sigma _\epsilon \mid \varvec{\theta }, X)\) with

$$\begin{aligned} \begin{aligned} \beta \mid \Sigma _\epsilon , \varvec{\theta }, X&\sim \mathcal{M}\mathcal{N}\left( \beta _n, \Lambda _n^{-1}, \Sigma _\epsilon \right) , \\ \Sigma _\epsilon \mid \varvec{\theta }, X&\sim \mathcal {W}^{-1}\left( V_n, \nu _n \right) , \end{aligned} \end{aligned}$$
(5)

where

$$\begin{aligned} \begin{aligned} \nu _n&= \nu _0 + n \\ \Lambda _n&= X^TX + \Lambda _0 \\ \beta _n&= \Lambda _n^{-1}\left( X^T\varvec{\theta } + \Lambda _0\beta _0\right) \\ V_n&= V_0 + \left( \varvec{\theta } - X\beta _n\right) ^T\left( \varvec{\theta } - X\beta _n\right) \\&\quad + \left( \beta _n - \beta _0\right) ^T \Lambda _0 \left( \beta _n - \beta _0\right) . \end{aligned} \end{aligned}$$
(6)

Note that conditional on X and \(\varvec{\theta }\), the networks Y are independent of \((\beta , \Sigma _\epsilon )\) and hence do not appear directly in the (conditional) posterior. However, as we shall see below, the networks are present in the posterior for \(\varvec{\theta }\). This motivates a Gibbs sampling approach, whereby we iteratively draw from the required conditional distributions.
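
For illustration, the draws in (5)–(6) take only a few lines of base R. The sketch below is our own code (not part of any package); it stacks the \(\theta ^{(i)}\) as the rows of an \(n \times p\) matrix and uses the fact that an inverse-Wishart draw is the inverse of a Wishart draw:

draw_beta_sigma <- function(Theta, X, beta0, Lambda0, V0, nu0) {
  n <- nrow(Theta); p <- ncol(Theta); q <- ncol(X)

  # Posterior quantities of Eq. (6)
  Lambda_n <- crossprod(X) + Lambda0                 # X^T X + Lambda_0
  beta_n   <- solve(Lambda_n, crossprod(X, Theta) + Lambda0 %*% beta0)
  resid    <- Theta - X %*% beta_n
  V_n  <- V0 + crossprod(resid) +
    t(beta_n - beta0) %*% Lambda0 %*% (beta_n - beta0)
  nu_n <- nu0 + n

  # Sigma_eps | theta, X ~ InvWishart(V_n, nu_n): invert a Wishart draw
  Sigma <- solve(rWishart(1, df = nu_n, Sigma = solve(V_n))[, , 1])

  # beta | Sigma_eps, theta, X ~ MatrixNormal(beta_n, Lambda_n^{-1}, Sigma):
  # beta = beta_n + A Z B with A A^T = Lambda_n^{-1} and B^T B = Sigma
  A <- t(chol(solve(Lambda_n)))
  B <- chol(Sigma)
  Z <- matrix(rnorm(q * p), q, p)
  list(beta = beta_n + A %*% Z %*% B, Sigma = Sigma)
}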

Regarding the choice of values for the prior hyperparameters \((\beta _0, \Lambda _0, V_0, \nu _0)\), studies of single-network Bayesian ERGMs typically assume relatively flat multivariate normal prior distributions on the model parameters (Caimo and Friel 2011; Sinke et al. 2016; Thiemichen et al. 2016). In this spirit, we suggest default priors of \(\beta _0 = 0\) (i.e. the \(q \times p\) zero matrix), \(\Lambda _0^{-1} = 100I_q\), \(V_0 = I_p\), and \(\nu _0 = p + 1\). Informative priors can be used given information from previous studies (see e.g. Caimo et al. (2022), Caimo et al. (2017)), though we note that the appropriate setting of informative priors can be a challenging task due to the typically high levels of dependence between parameters (Koskinen et al. 2013).

3 Posterior computation

The double-intractability of the ERGM posterior distribution means that standard MCMC schemes such as the Metropolis algorithm are not suitable. This is due to the presence of the intractable normalising constants \(Z(\theta ^{(i)})\) in the denominator, rendering calculation of the Metropolis acceptance ratios computationally infeasible. Several methods have been proposed in recent years to perform Bayesian inference in the presence of intractable normalising constants (see Park and Haran (2018) for a review). We focus here on the exchange algorithm (Murray et al. 2006), employed in the context of single-network Bayesian ERGMs by Caimo and Friel (2011). We first recap the exchange algorithm in the context of Bayesian ERGMs before describing an exchange-within-Gibbs scheme to generate samples from the joint posterior.

Consider a Metropolis update for a single-network Bayesian ERGM. The acceptance probability for a proposal \(\theta '\) from current value \(\theta \) requires evaluation of the ratio \(Z(\theta ) / Z(\theta ')\), which is computationally intractable. The exchange algorithm is an MCMC scheme designed to circumvent this obstacle. This is achieved by introducing an auxiliary variable \(\pmb {y}' \sim \pi (\cdot |\theta ')\), i.e. a network drawn from the same exponential random graph model with parameter \(\theta '\).

The algorithm targets an augmented posterior

$$\begin{aligned} \pi (\theta , \theta ', \pmb {y}'|\pmb {y}) \propto \pi (\theta |\pmb {y})h(\theta '|\theta )\pi (\pmb {y}'|\theta ') \end{aligned}$$
(7)

where \(\pi (\theta |\pmb {y})\) is the original (target) posterior, \(h(\theta '|\theta )\) is an arbitrary, normalisable proposal function, and \(\pi (\pmb {y}'|\theta ')\) is the likelihood of the auxiliary variable. For simplicity, we assume \(h(\theta '|\theta )\) to be symmetric. Each of the three terms on the right-hand side of Eq. (7) can be normalised, so the left-hand side is well-defined as a probability distribution.

The algorithm proceeds as follows. At each iteration, first perform a Gibbs update of \((\theta ', \pmb {y}')\) by drawing \(\theta ' \sim h(\cdot | \theta )\) followed by \(\pmb {y}' \sim \pi (\cdot |\theta ')\). Next, exchange \(\theta \) and \(\theta '\) with probability \(\min (1, AR(\theta ', \theta , \pmb {y}, \pmb {y}'))\), where

$$\begin{aligned} \begin{aligned}&AR(\theta ', \theta , \pmb {y}, \pmb {y}') \\&\quad = \dfrac{\pi (\theta '|\pmb {y})}{\pi (\theta |\pmb {y})} \cdot \dfrac{\pi (\pmb {y}'|\theta )}{\pi (\pmb {y}'|\theta ')} \\&\quad = \dfrac{\exp \left\{ \theta '^Ts(\pmb {y})\right\} \pi (\theta ')}{\exp \left\{ \theta ^Ts(\pmb {y})\right\} \pi (\theta )}\dfrac{Z(\theta )}{Z(\theta ')} \cdot \dfrac{\exp \left\{ \theta ^Ts(\pmb {y}')\right\} }{\exp \left\{ \theta '^Ts(\pmb {y}')\right\} }\dfrac{Z(\theta ')}{Z(\theta )} \\&\quad = \exp \left\{ [\theta ' - \theta ]^T[s(\pmb {y}) - s(\pmb {y}')]\right\} \frac{\pi (\theta ')}{\pi (\theta )} \end{aligned} \end{aligned}$$
(8)

Crucially, the ratio of intractable normalising constants cancels out, and so this acceptance ratio can indeed be evaluated. The stationary distribution of the Markov chain constructed through this scheme is \(\pi (\theta , \theta ', \pmb {y}'|\pmb {y})\) (Murray et al. 2006). Thus, by marginalising out \(\theta '\) and \(\pmb {y}'\), the algorithm yields samples from the desired posterior, namely \(\pi (\theta |\pmb {y})\).

Algorithm 1 The exchange algorithm update for a Bayesian ERGM (Caimo and Friel 2011)

The exchange algorithm update requires a sample \(\pmb {y}'\) from the ERGM \(\pi (\cdot |\theta ')\) in order to compute the acceptance ratio. Although perfect sampling for ERGMs is possible, it is computationally impractical except for a few special cases (Butts 2018). A pragmatic alternative, employed in Caimo and Friel (2011) and Wang and Atchadé (2014), is to use the final iteration of a Metropolis–Hastings algorithm as an approximate sample from \(\pi (\cdot |\theta ')\) (Hastings 1970; Hunter et al. 2008). A theoretical justification of this approach is given by Everitt (2012): under certain conditions, despite using an approximate sample, the algorithm nevertheless targets an approximation to the correct posterior distribution. Further, this approximation improves as the number of iterations of the inner MCMC increases.
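
A minimal sketch of one such update, with the auxiliary draw taken from recent versions of the ergm package's simulate() under a fixed inner-MCMC budget and the prior density evaluated with the mvtnorm package (the names exchange_update, prop_cov and n_aux are our own):

library(ergm)
library(mvtnorm)  # multivariate normal proposal and prior density

# One exchange update of theta for a single network (cf. Algorithm 1).
# form: ergm formula with the observed network on the left-hand side.
exchange_update <- function(theta, form, prior_mean, prior_cov,
                            prop_cov, n_aux = 1000) {
  s_obs <- summary(form)                                 # s(y)
  theta_prop <- as.vector(rmvnorm(1, theta, prop_cov))   # theta' ~ h(. | theta)

  # Approximate auxiliary draw: statistics s(y') after n_aux inner iterations
  s_aux <- simulate(form, coef = theta_prop, nsim = 1, output = "stats",
                    control = control.simulate.formula(MCMC.burnin = n_aux))

  # Log acceptance ratio of Eq. (8), here with a normal prior on theta
  log_AR <- sum((theta_prop - theta) * (s_obs - as.vector(s_aux))) +
    dmvnorm(theta_prop, prior_mean, prior_cov, log = TRUE) -
    dmvnorm(theta, prior_mean, prior_cov, log = TRUE)

  if (log(runif(1)) < log_AR) theta_prop else theta
}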

3.1 The exchange-within-Gibbs algorithm

We now extend the exchange algorithm in order to generate samples from our full posterior on a population of networks. As the name suggests, the exchange-within-Gibbs algorithm combines the exchange algorithm with the Gibbs sampler (Geman and Geman 1984) to produce samples from the desired posterior. Note that we can treat the unknown parameters of the model \((\beta , \epsilon , \Sigma _\epsilon )\) as components of a single multi-dimensional parameter. We iteratively sample each component from its conditional distribution given the remaining components.

The full exchange-within-Gibbs scheme is outlined in Algorithm 2. Since each step samples from the respective full conditional distribution, the algorithm ensures that the stationary distribution of the resulting Markov chain is indeed the joint posterior \(\pi (\beta , \epsilon , \Sigma _\epsilon |\varvec{y})\) (Tierney 1994). As with the exchange algorithm for the single-network Bayesian ERGM, the most computationally expensive step is sampling \(\pmb {y}'\) from \(\pi (\cdot |\theta ')\), i.e. simulating an exponential random graph at the proposed parameter value. Moreover, this step must be performed for each of the individual-level parameter updates \(\theta ^{(i)}\). Thus, the computational cost of each iteration increases linearly with the number of networks in the data. However, these updates may be performed in parallel so, with access to a sufficient number of computing cores, the actual computation time per iteration typically increases sub-linearly with the number of networks.

Algorithm 2 The exchange-within-Gibbs algorithm for a multilevel Bayesian ERGM
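
In outline, one sweep of Algorithm 2 chains the two building blocks sketched earlier (exchange_update and draw_beta_sigma are the illustrative helpers defined above, not library functions):

# One Gibbs sweep of Algorithm 2.
# forms: list of n ergm formulas, one per observed network;
# Theta: n x p matrix of current theta^(i); X: n x q design matrix.
gibbs_sweep <- function(Theta, forms, X, beta, Sigma, prior, prop_covs) {
  # 1. Exchange update of each theta^(i); these n updates are independent
  #    and can be farmed out with, e.g., parallel::mclapply
  for (i in seq_len(nrow(Theta))) {
    Theta[i, ] <- exchange_update(Theta[i, ], forms[[i]],
                                  prior_mean = as.vector(t(X[i, ]) %*% beta),
                                  prior_cov  = Sigma,
                                  prop_cov   = prop_covs[[i]])
  }
  # 2. Conjugate draw of (beta, Sigma_eps) from Eqs. (5)-(6)
  bs <- draw_beta_sigma(Theta, X, prior$beta0, prior$Lambda0,
                        prior$V0, prior$nu0)
  list(Theta = Theta, beta = bs$beta, Sigma = bs$Sigma)
}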

3.1.1 Choice of parametrisation: centering vs. non-centering

The parametrisation of general multilevel models in the context of MCMC computation has been studied in some detail (Gelfand et al. 1995; Papaspiliopoulos et al. 2003, 2007; Yu and Meng 2011). Here, we discuss the two most commonly used parametrisations: the ‘centred’ and the ‘non-centred’. Let \(\mu ^{(i)} = x^T_i\beta \) be the mean for the \(i^{th}\) ERGM parameter \(\theta ^{(i)}\). The parametrisation presented thus far is known as the centred parametrisation (CP) (Gelfand et al. 1995; Papaspiliopoulos et al. 2007), in which the parameters \((\beta , \pmb {\Sigma }_\epsilon )\) are conditionally independent of the data \(\varvec{Y}\) given \(\varvec{\theta }\):

$$\begin{aligned} \begin{aligned} \pmb {Y}^{(i)}&\sim \pi (\cdot \mid \theta ^{(i)}), ~~~ i = 1, \dots , n \\ \theta ^{(i)}&\sim \mathcal {N}(\mu ^{(i)}, \pmb {\Sigma }_\epsilon ), ~~~ i = 1, \dots , n. \end{aligned} \end{aligned}$$
(9)

In contrast, the non-centred parametrisation (NCP) can be written as follows:

$$\begin{aligned} \begin{aligned} \pmb {Y}^{(i)}&\sim \pi (\cdot \mid \mu ^{(i)} + \epsilon ^{(i)}), ~~~ i = 1, \dots , n \\ \epsilon ^{(i)}&\sim \mathcal {N}(0, \pmb {\Sigma }_\epsilon ), ~~~ i = 1, \dots , n. \end{aligned} \end{aligned}$$
(10)

The identity \(\theta ^{(i)} = \mu ^{(i)} + \epsilon ^{(i)}\) confirms the equivalence of the two parametrisations. Note that the parameter \(\beta \) enters the likelihood directly in (10) via the \(\mu ^{(i)}\), so the conditional distribution of \(\beta \) given the remaining parameters has an intractable normalising constant \(\prod _{i=1}^n Z(x^T_i\beta )\). As above, this can be dealt with via an exchange update, in this case requiring simulation of n networks for the normalising constants to cancel in the acceptance ratio, now given by

$$\begin{aligned} \begin{aligned}&AR(\beta ', \beta , \varvec{Y}, \varvec{Y}'; \pmb {\Sigma }_\epsilon , X) \\&\quad = \exp \left\{ \sum _{i=1}^n [x^T_i(\beta ' - \beta )]^T[s(\pmb {y}^{(i)}) - s(\pmb {y}'^{(i)})]\right\} \frac{\pi (\beta ' \mid \pmb {\Sigma }_\epsilon )}{\pi (\beta \mid \pmb {\Sigma }_\epsilon )}. \end{aligned} \end{aligned}$$
(11)

The centred parametrisation and the non-centred parametrisation tend to be complementary: when one performs poorly, the other tends to perform better (Papaspiliopoulos et al. 2007). More precisely, the centred parametrisation tends to lead to more efficient MCMC performance when \(\theta \) is well-identified by the data \(\pmb {Y}\), whereas the non-centred parametrisation can be more competitive when \(\theta \) is weakly identified (relative to \((\beta , \pmb {\Sigma }_\epsilon )\)). In intermediate settings, and when the parameters of interest are the higher-level parameters \((\beta , \pmb {\Sigma }_\epsilon )\), it is possible to combine both approaches using an ancillarity-sufficiency interweaving strategy (ASIS; Yu and Meng 2011). ASIS combines the updating schemes of the CP and NCP approaches, first drawing \((\beta , \theta )\) under the centred parametrisation and then redrawing the parameters under the non-centred parametrisation. The ASIS algorithm for a multilevel Bayesian ERGM is described in Algorithm 3.

Algorithm 3 The ASIS algorithm for a multilevel Bayesian ERGM
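
Schematically, the interweaving adds a non-centred redraw of \(\beta \) to each sweep. In the sketch below, nc_exchange_update_beta is a placeholder for an exchange update of \(\beta \) based on the acceptance ratio (11):

# One ASIS sweep: centred updates followed by a non-centred redraw of beta.
asis_sweep <- function(Theta, forms, X, beta, Sigma, prior,
                       prop_covs, prop_cov_beta) {
  # (i) Centred updates, as in gibbs_sweep()
  st <- gibbs_sweep(Theta, forms, X, beta, Sigma, prior, prop_covs)
  Theta <- st$Theta; beta <- st$beta; Sigma <- st$Sigma

  # (ii) Move to the non-centred parametrisation: hold the residuals fixed
  Eps <- Theta - X %*% beta

  # (iii) Exchange update of beta under the NCP using the acceptance
  #       ratio (11); nc_exchange_update_beta() is a placeholder that must
  #       simulate one auxiliary network per individual
  beta <- nc_exchange_update_beta(beta, Eps, forms, X, Sigma,
                                  prior, prop_cov_beta)

  # (iv) Map back to the centred parametrisation
  list(Theta = X %*% beta + Eps, beta = beta, Sigma = Sigma)
}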

3.1.2 Proposal adaptation

We use multivariate normal random walk proposals in the respective exchange updates of both \(\theta ^{(i)}\) and \(\beta \), for example

$$\begin{aligned} h(\theta ' | \theta _{k-1}) = \mathcal {N}(\theta _{k-1}, \Sigma ) \end{aligned}$$
(12)

The choice of the proposal covariance matrix \(\Sigma \) is crucial to the overall efficiency of the MCMC algorithm; we wish to make large proposals that are likely to be accepted in order to explore the posterior in as few iterations as possible. A common approach to tuning the proposal covariance for a wide range of random walk based algorithms, including Metropolis-within-Gibbs, is to target an acceptance rate close to 0.234, with acceptance rates between 0.1 and 0.5 often yielding satisfactory results (Roberts et al. 1997, 2001; Roberts and Rosenthal 2009). Since manual tuning of the \(n + 1\) proposal covariance matrices would be impractical, we instead implement an adaptive proposal scheme.

For each proposal, we use a version of the adaptive Metropolis algorithm (Haario et al. 2001) considered by Roberts and Rosenthal (2009). Specifically, for the first 1000 iterations, we adapt every 20 iterations, with proposals of the form

$$\begin{aligned} h_k(\theta ' | \theta _{k-1}) = (1-\gamma ) \mathcal {N}\left( \theta _{k-1}, (2.38)^{2} \delta _{k} \Sigma _{k} / p \right) + \gamma \mathcal {N}\left( \theta _{k-1}, (0.1)^{2} \delta _{k} I_{p} / p \right) , \end{aligned}$$
(13)

where \(\Sigma _k\) is the sample covariance matrix of the posterior samples \((\theta _{1},\dots ,\theta _{k-1})\) and \(\delta _k\) is an additional scaling factor that is varied to control the magnitude of the proposals. Following Roberts and Rosenthal (2009), we set \(\gamma = 0.05\). The role of \(\Sigma _k\) is to adapt the direction of the proposals to the MCMC run so far, while \(\delta _k\) serves to target an acceptance rate of 0.234. Specifically, we start with \(\delta _1 = 1\) and increase (resp. decrease) \(\log (\delta _k)\) by \(\min (0.5, 1 / \sqrt{k})\) if the acceptance rate was above (resp. below) 0.234 in the previous 20 iterations.
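
A sketch of this adaptation scheme (our own helper functions; the fixed mixture component also serves as a fallback before enough samples have accumulated):

# Adaptive random-walk proposal of Eq. (13); the mixture is sampled by
# flipping a gamma-coin between the two components.
propose_adaptive <- function(theta, samples, delta, gamma = 0.05) {
  p <- length(theta)
  if (runif(1) > gamma && nrow(samples) > 2 * p) {
    Sigma_k <- cov(samples)              # empirical covariance of past draws
    mvtnorm::rmvnorm(1, theta, 2.38^2 * delta * Sigma_k / p)
  } else {                               # fixed component; also used early on
    mvtnorm::rmvnorm(1, theta, 0.1^2 * delta * diag(p) / p)
  }
}

# Every 20 iterations, rescale delta towards an acceptance rate of 0.234
update_delta <- function(delta, acc_rate, k) {
  step <- min(0.5, 1 / sqrt(k))
  if (acc_rate > 0.234) exp(log(delta) + step) else exp(log(delta) - step)
}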

3.2 Posterior predictive assessment

Having produced a sufficient number of samples from the posterior distribution, we then assess whether the model adequately describes the data. Since determining the distribution of appropriate test quantities is difficult, goodness-of-fit assessment for ERGMs is typically performed graphically (Hunter et al. 2008). For a single ERGM fit, one can simulate a large number of networks from the fitted model and compare these ‘posterior predictive networks’ to the observed network. This comparison is usually done via a set of network metrics. If a model fits the data well, then the network metrics of the posterior predictive networks should be similar to those of the observed network.

For a population of networks, we can apply the same principles. To do so, we choose S values uniformly at random from the posterior samples of \(\beta \). For each sampled value \(\beta ^{(s)}\), we simulate a network from \(\pi (\cdot |X\beta ^{(s)})\). We can then compare these posterior predictive networks to the observed networks based on a set of network metrics. For this purpose, we use three network metric distributions that are not explicitly modelled, namely the degree distribution, the geodesic distance distribution (lengths of shortest paths) and the edgewise shared partner distribution.
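
A sketch of this check for a single posterior draw, computing the three distributions with the ergm and sna packages (the helper ppc_one, the initialisation formula form0 and the range 0:15 are our own choices):

library(ergm)
library(sna)  # geodist() for geodesic distance distributions

# Posterior predictive distributions for one sampled beta (a q x p matrix).
# form0: formula whose left-hand side is any network on the common node set,
# used only to initialise the simulation; x: the covariate vector of interest.
ppc_one <- function(beta_s, x, form0) {
  theta_s <- as.vector(t(x) %*% beta_s)                # x^T beta^(s)
  y_s <- simulate(form0, coef = theta_s, nsim = 1)
  list(degree  = summary(y_s ~ degree(0:15)),          # degree distribution
       esp     = summary(y_s ~ esp(0:15)),             # edgewise shared partners
       geodist = table(geodist(y_s)$gdist))            # geodesic distances
}

# Repeat for S draws chosen uniformly at random from the posterior samples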

4 Results

To illustrate our method, we apply it to a set of simulated networks, demonstrating that it is capable of recovering the ground truth. We also apply our method to resting-state fMRI networks from the Cam-CAN project, a study on healthy ageing (Shafto et al. 2014), to assess how network structure varies with age and fluid intelligence. The R scripts used to generate these results can be found at https://github.com/brieuclehmann/multibergm-scripts.

4.1 Simulation

We generated sets of 30-node networks with nodes split into two ‘hemispheres’ of 15 nodes each. We simulated the networks from an exponential random graph model with three terms: total number of edges (‘edges’), total number of edges between nodes in the same hemisphere (‘nodematch.hemisphere’), and the geometrically-weighted edgewise-shared partner (GWESP) statistic (‘gwesp.fixed.0.9’). The GWESP statistic of a network y is a measure of clustering and is given by:

$$\begin{aligned} GWESP(\pmb {y}) = e^\tau \sum _{w=1}^N \lbrace 1 - (1 - e^{-\tau })^w \rbrace EP_w(\pmb {y}), \end{aligned}$$
(14)

where \(EP_w(y)\) is the number of connected node pairs having exactly w shared partners and \(\tau \) is a decay parameter, which we fix at \(\tau = 0.9\). The decay parameter attenuates the effect of the number of higher-order edgewise shared partners relative to lower-order edgewise shared partners.
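
As a check of (14), the GWESP value can be recomputed from the edgewise shared partner counts \(EP_w\), which the ergm package exposes through its esp term; a small sketch on a toy network:

library(ergm)

set.seed(1)
y   <- network(30, directed = FALSE, density = 0.15)
tau <- 0.9
N   <- network.size(y)

EP <- summary(y ~ esp(1:(N - 2)))    # EP_w(y) for w = 1, ..., N - 2
gwesp_manual <- exp(tau) * sum((1 - (1 - exp(-tau))^(1:(N - 2))) * EP)

# Matches the packaged statistic used in our model specification
gwesp_pkg <- summary(y ~ gwesp(tau, fixed = TRUE))
all.equal(unname(gwesp_pkg), gwesp_manual)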

We simulate networks under three distinct settings, varying the number of networks in each case: (i) a population of networks with no additional covariate information, (ii) a population of networks where each network is associated with a single continuous covariate, and (iii) a population of networks with two subgroups, so that each network is associated with a binary covariate indicating group allocation.

4.1.1 No covariate information

To simulate the networks, we first generated individual-level parameters \(\theta _i \sim \mathcal {N}(\mu , \Sigma ), ~ i = 1, \dots , n\) where

$$\begin{aligned} \mu&= (-3, 0.5, 0.5)^T \end{aligned}$$
(15)
$$\begin{aligned} \Sigma&= \frac{1}{50}\begin{pmatrix} 1 & -0.5 & 0 \\ -0.5 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}. \end{aligned}$$
(16)

We then used the ergm R package (Hunter et al. 2008) to simulate n networks \(\pmb {y}_i \sim p(\cdot |\theta _i), ~ i = 1, \dots , n\). The simulation procedure is based on an MCMC algorithm, initialised at a network with the prescribed number of nodes and covariates (in this case, hemisphere labels). With these simulated networks, we applied our exchange-within-Gibbs algorithm with ASIS (Algorithm 3) to generate 12,000 posterior samples, adapting the random-walk proposals for the first 1,000 iterations, and discarding the first 2,000 as burn-in.
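
For concreteness, a sketch of this generation step (using MASS::mvrnorm for the parameter draws; the hemisphere labels are attached as a nodal covariate so that nodematch can use them):

library(ergm)
library(MASS)  # mvrnorm() for the multivariate normal parameter draws

set.seed(1)
n     <- 10
mu    <- c(-3, 0.5, 0.5)
Sigma <- matrix(c(1, -0.5, 0, -0.5, 0.5, 0, 0, 0, 0.5), 3, 3) / 50

# Base network carrying the node set and the hemisphere covariate
base <- network.initialize(30, directed = FALSE)
base %v% "hemisphere" <- rep(c("L", "R"), each = 15)
form <- base ~ edges + nodematch("hemisphere") + gwesp(0.9, fixed = TRUE)

# Individual-level parameters and one simulated network per individual
theta <- mvrnorm(n, mu, Sigma)
nets  <- lapply(seq_len(n),
                function(i) simulate(form, coef = theta[i, ], nsim = 1))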

Figure 2 displays summaries of the posterior samples for the group-level mean parameter \(\mu \) of the model fit to \(n=10\) networks. The true value of \(\mu \) is covered by the posterior density, while the trace and autocorrelation plots indicate that the MCMC has mixed well. To assess the goodness-of-fit, we generated \(S=100\) networks from the model at posterior samples of \(\mu \) chosen uniformly at random. Figure 3 shows the degree distribution, geodesic distance distribution and edgewise shared partner distribution of these simulated networks against those to which the model was fit.

Fig. 2 Posterior samples produced by the exchange-within-Gibbs algorithm for the group-level mean parameter, \(\mu \), of a single group of ten simulated networks. The true value of \(\mu \) is indicated by the red line. (colour figure online)

Fig. 3 Graphical goodness-of-fit assessment for a single group of ten simulated networks. The box plots correspond to the simulated networks, while the ribbons represent 90% credible intervals corresponding to the posterior predictive networks. Note that a geodesic distance of infinity between two nodes means that there is no path connecting the nodes. (colour figure online)

To complete our analysis of a single group of networks, we compare the density of the posterior samples between groups of size \(n = 10, 20, 50, 100\). Figure 4 illustrates how the posterior samples of \(\beta \) concentrate around the true value as the number of networks in the group increases. We also investigated different settings for the prior hyperparameters (\(\Lambda _0^{-1} = 1, 10, 100\) and \(\nu _0 = 5, 10, 50\)) with \(n=10\), finding that these did not have an appreciable effect on the posterior density (Supplementary Figure 2).

Fig. 4 Posterior density plots for the effect parameters \(\beta \) in a single group of networks with no covariate information. As the number of networks increases, the posterior concentrates around the true value, depicted by the red vertical line. (colour figure online)

As a supplementary analysis, we investigated our model’s performance for increasing network size \(N = 30, 60, 90, 120, 150\), where N corresponds to the number of nodes in each network. We kept the number of auxiliary MCMC iterations used to simulate each network within the exchange algorithm fixed at \(n_{aux} = 1000\). This ensured that the computation time for each of these settings was of similar order, ranging from 50 min for \(N=30\) to 90 min for \(N=150\) using 10 Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz processors on a computing cluster. However, the number of auxiliary iterations necessary for convergence increases with network size (Krivitsky and Handcock 2014) and hence these auxiliary draws may not adequately represent draws from the ERGM required in the exchange algorithm. Supplementary Figure 1 illustrates this, with model performance degrading significantly for \(N > 60\).

4.1.2 Continuous covariate

We now consider a simulation setting where each network is associated with a single continuous covariate, such as age. We again consider three cases with \(n = 10, 20, 50\) networks in the population, with model matrix \(x^T_i = (1, (i-1) / (n-1))\) and \(\beta = (b^T, (a-b)^T)^T\) where \(a = (-3, 0.5, 0.5)^T\) and \(b = (-2.6, 0.5, 0.2)^T\), so that the network-level parameter means \(\mu _i = x^T_i\beta \) are uniformly spaced between the vectors b and a. We then generate \(\theta _i \sim \mathcal {N}(\mu _i, \Sigma ), ~ i = 1, \dots , n\) with \(\Sigma \) as above, and

$$\begin{aligned} \mu _i = b + \frac{(i - 1)}{n - 1}(a - b), ~~ i = 1, \dots , n. \end{aligned}$$

Again, the posterior samples of \(\beta \) concentrate around the true values for both the intercept and the covariate effect parameters as the number of networks in the group increases (Fig. 5).

Fig. 5 Posterior density plots for the effect parameters \(\beta \) in a single group of networks associated with a single continuous covariate x. As the number of networks increases, the posterior concentrates around the true value, depicted by the red vertical line. (colour figure online)

4.1.3 Binary covariate

To complete our simulation study, we consider a multilevel setting in which the networks are split into two distinct groups \(\mathcal {J}_1, \mathcal {J}_2\), so that \(x^T_i = (1,0)\) if \(i \in \mathcal {J}_1\) and \(x^T_i = (0, 1)\) if \(i \in \mathcal {J}_2\). We set \(\beta = (a^T, b^T)^T\) so that \(\mu _i = a\) if \(i \in \mathcal {J}_1\) and \(\mu _i = b\) if \(i \in \mathcal {J}_2\). As above, we first generated individual-level parameters \(\theta _i \sim \mathcal {N}(\mu ^{(g_i)}, \Sigma ), ~ i = 1, \dots , n\), where \(g_i \in \lbrace 1, 2 \rbrace \) denotes the group membership of the \(i^{th}\) network, and then simulated networks \(\pmb {y}_i \sim \pi (\cdot |\theta _i), ~ i = 1, \dots , n\). We considered a range of numbers of networks, \(n = 10, 20, 50\), in each of the two groups. The true values were

$$\begin{aligned} \mu ^{(1)}&= (-3, 0.5, 0.5)^T \end{aligned}$$
(17)
$$\begin{aligned} \mu ^{(2)}&= (-2.6, 0.5, 0.2)^T \end{aligned}$$
(18)
$$\begin{aligned} \Sigma&= \frac{1}{50}\begin{pmatrix} 1 & -0.5 & 0 \\ -0.5 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}. \end{aligned}$$
(19)

Figure 6 shows the density of the posterior samples for the group-level parameters \((\mu ^{(1)}, \mu ^{(2)})\) for increasing number of networks n per group. We see that, as in the single-group setting, the posteriors concentrate around the true values for each group as the number of networks increases.

Fig. 6 Posterior density plots for a two-group model with \(n = 10,20,50\) simulated networks in each group. (colour figure online)

4.2 Application to human functional connectivity brain networks

We now turn our attention to a real data example: networks derived from resting-state fMRI scans of human brains from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) research project (Shafto et al. 2014), a study on the effect of healthy ageing on cognitive and brain function. The Cam-CAN dataset consists of a range of cognitive tests and functional neuroimaging experiments for approximately 650 healthy individuals aged 18–87. Our aim is to assess how the functional connectivity structure of the brain varies with age and fluid intelligence, as measured by the Cattell score.

Full details of data collection and preprocessing can be found in Lehmann et al. (2021). To summarise, both structural (T1 and T2) and eyes-closed, resting-state fMRI scans (261 volumes, lasting 8 min 40 s) were acquired for each individual. The fMRI scans were motion-corrected and co-registered to the respective structural scans and then mapped to the common Montreal Neurological Institute (MNI) template to ensure comparability across individuals. The fMRI time series were then extracted from 90 cortical and subcortical regions of interest (ROIs) from the AAL atlas (Tzourio-Mazoyer et al. 2002) and adjusted for various confounds using the optimised pipeline of Geerligs et al. (2017).

To construct networks for each individual, we followed a thresholded correlation matrix approach. For individual i, we computed the pairwise Pearson correlation between each of the \(N = 90\) preprocessed time series, yielding an \(N \times N\) correlation matrix \(\pmb {C}^{(i)}\). We then applied a threshold r to \(\pmb {C}^{(i)}\) to produce an \(N \times N\) adjacency matrix, \(\pmb {A}^{(i)}\), with entries:

$$\begin{aligned} \pmb {A}^{(i)}_{kl} = {\left\{ \begin{array}{ll} 1 & \text {if } \pmb {C}^{(i)}_{kl} \ge r, \quad k,l=1,\dots ,N\\ 0 & \text {otherwise.} \end{array}\right. } \end{aligned}$$
(20)

The adjacency matrix defines an individual’s network, \(\pmb {y}^{(i)}\), with an edge between nodes k and l if and only if \(\pmb {A}^{(i)}_{kl}=1\). The threshold r was chosen to yield an average node degree of 3 across all the networks, as recommended by Fallani et al. (2017). See Table 1 for summary statistics on the resulting networks for these individuals, as well as their ages and Cattell scores.
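
A sketch of this construction; the exact procedure used to select r may differ, but a threshold achieving a target average degree can be read off as a quantile of the pooled off-diagonal correlations (assuming C is the list of individual correlation matrices):

# C: list of individual N x N correlation matrices; choose the threshold r
# so that the average node degree across all networks is k_target = 3.
k_target <- 3
N <- 90

# An average degree of k_target corresponds to an edge density of
# k_target / (N - 1); take r as the matching quantile of the pooled
# off-diagonal correlations.
all_cors <- unlist(lapply(C, function(Ci) Ci[upper.tri(Ci)]))
r <- quantile(all_cors, 1 - k_target / (N - 1))

# Apply Eq. (20) to each individual
A <- lapply(C, function(Ci) {
  Ai <- (Ci >= r) * 1L
  diag(Ai) <- 0L
  Ai
})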

Table 1 Summary of age, Cattell score, network density, and network transitivity for all individuals in the Cam-CAN fMRI dataset, as well as the youngest 100 individuals, and the oldest 100 individuals

We model the population of networks using the framework described in Sect. 2.2 with an exponential random graph model with four terms: total number of edges (‘edges’), total number of edges between nodes in the same hemisphere (‘nodematch.hemisphere’), total number of edges between homotopic nodes (mirror ROIs in each hemisphere; ‘nodematch.homotopy’) and the geometrically-weighted edgewise-shared partner (GWESP) statistic with decay parameter \(\tau = 0.9\) (‘gwesp.fixed.0.9’).

4.2.1 Young vs. old

We first turn our attention to an age-only analysis, comparing the functional connectivity network structure between the 100 youngest individuals, indexed \(\mathcal {J}_{\text {young}}\), aged 18–33, and the 100 oldest individuals, \(\mathcal {J}_{\text {old}}\), aged 74–87. As in the simulation experiment with a binary covariate, we have \(x^T_i = (1,0)\) if \(i \in \mathcal {J}_{\text {young}}\) and \(x^T_i = (0, 1)\) if \(i \in \mathcal {J}_{\text {old}}\).

We used the exchange-within-Gibbs algorithm with ASIS to generate 22,000 posterior samples, discarding the first 2,000 samples as burn-in. Figure 7 shows summaries of the posterior samples for \((\mu ^{(1)}, \mu ^{(2)})\), with the trace and autocorrelation plots demonstrating that the MCMC has mixed well. The posterior density plots show that the clearest difference between the old group and the young group was in the parameter associated with the number of edges between homotopic nodes (‘nodematch.homotopy’). While this parameter is large and positive for both groups, it is moderately smaller in the old group, indicating that the propensity for homotopic connections is lower in old age. On the other hand, there is no clear evidence for group differences in the remaining parameters. The edges parameters are large and negative, pointing to the overall sparsity of the networks; the intrahemisphere parameters (‘nodematch.hemisphere’) are small and positive, indicating a moderate propensity for connections between nodes in the same half of the brain; and the GWESP parameters are also positive, indicating a propensity to form triangles and thus a degree of functional segregation (Bullmore and Sporns 2009).

Fig. 7 MCMC output from the exchange-within-Gibbs algorithm for the group-level mean parameters of a population of resting-state fMRI networks from a group of 100 young individuals and a group of 100 old individuals. (colour figure online)

To assess goodness-of-fit, for both groups we generated \(S=100\) networks from the model at posterior samples of \(\mu ^{(j)}\) chosen uniformly at random. Figure 8 indicates a reasonable fit for both groups, with the geodesic distance and edgewise shared partner distributions showing a good correspondence between the simulated networks and the observed networks. There appears to be a slight discrepancy in the degree distributions, with the simulated networks in the young group in particular having fewer nodes of degree 4 to 6 relative to the observed networks.

Fig. 8 Graphical goodness-of-fit assessment for resting-state fMRI networks from a young group and an old group, fitted in a joint model. The box plots correspond to the observed networks, while the ribbons represent 95% credible intervals corresponding to the posterior predictive networks. Note that a geodesic distance of infinity between two nodes means that there is no path connecting the nodes. (colour figure online)

4.2.2 Age and fluid intelligence

Finally, we consider a model that jointly assesses the effect of age and fluid intelligence on the brain’s functional connectivity structure. We use the same ERGM summary statistics as before (edges, ‘nodematch.hemisphere’, ‘nodematch.homotopy’, GWESP) and set

$$\begin{aligned} x^T_i = (1, \text {age}_i, \text {IQ}_i, \text {age}_i \times \text {IQ}_i), \end{aligned}$$

where \(\text {age}_i\) and \(\text {IQ}_i\) denote the age and Cattell score (a measure of fluid intelligence), respectively, of individual i. For this model, we took a subset of 100 individuals across the range of non-missing Cattell scores. We highlight the use of the interaction term between age and fluid intelligence to capture the joint effect of these two covariates over and above their corresponding main effects. We again used the exchange-within-Gibbs algorithm with ASIS to generate 22,000 posterior samples, and discarded the first 2,000 samples as burn-in.
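
In R, such a design matrix is conveniently built with model.matrix; a minimal sketch assuming a data frame df with one row per individual (whether and how the covariates were standardised is our assumption, not stated above):

# df: one row per individual with columns age and cattell
df$age_c <- as.vector(scale(df$age))       # centring and scaling the covariates
df$iq_c  <- as.vector(scale(df$cattell))   # is our assumption, not stated above
X <- model.matrix(~ age_c * iq_c, data = df)  # columns: 1, age, IQ, age x IQ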

Figure 9 shows the density plots for the resulting posterior samples. As with the previous analysis comparing a group of young individuals and a group of old individuals, the clearest age-related effect was associated with the number of edges between homotopic nodes. Higher fluid intelligence, as measured by the Cattell score, was associated with a higher propensity for the total number of edges, but a lower propensity for both intrahemispheric connections and homotopic connections. Reduced homotopic connectivity has previously been observed in rs-fMRI networks, with evidence suggesting that reduced synchrony between brain hemispheres at rest may be predictive of higher intelligence (Santarnecchi et al. 2015). The parameter estimates for the age–fluid intelligence interaction term were centred around zero, indicating no additional effect on top of the additive effects associated with age and fluid intelligence separately. To explore possible non-linear effects of age and fluid intelligence, we also fit a model with quadratic terms for age and Cattell score, finding no quadratic effects for age but small quadratic effects of the Cattell score on intrahemispheric connections (positive) and triangle propensity (GWESP; negative) (Supplementary Figure 3).

Fig. 9 MCMC output from the exchange-within-Gibbs algorithm for the effect parameters of a population of 100 resting-state fMRI networks. (colour figure online)

5 Discussion

The main contribution of this article is to introduce a multilevel framework for modelling populations of networks with network-level covariate information, along with a novel MCMC procedure for performing inference within the framework. While the framework itself is a natural multilevel extension of the single-network ERGM, the inference procedure is more involved due to the intractability of the ERGM likelihood and the challenges associated with MCMC for hierarchical models. We have shown how our framework can be applied to resting-state fMRI data to assess how the brain’s functional connectivity network structure varies with age and intelligence score. Although we chose here to focus on networks constructed from resting-state fMRI scans, our framework could also be applied to networks derived from other neuroimaging modalities such as magnetoencephalography (MEG) or diffusion tensor imaging (DTI).

An important extension to the framework would be to use weighted exponential random graph models (Krivitsky 2012; Desmarais and Cranmer 2012). These are extensions of the binary ERGM that can be applied to weighted networks, thus avoiding the thresholding step in the construction of functional connectivity networks. Indeed, one version of a weighted ERGM, the generalised exponential random graph model (GERGM) (Desmarais and Cranmer 2012), was recently applied to a 20-node functional connectivity network (Stillman et al. 2017). This approach has the additional advantage of modelling the mean connectivity directly and thus would avoid any confounding due to differences in mean connectivity. However, the GERGM is at present extremely computationally intensive, rendering it infeasible for a population of networks.

One of the key challenges in applying our framework to real data is the choice of which network summary statistics to include in the model. A fully Bayesian model selection method based on reversible-jump MCMC has been developed for exponential random graph models on single networks (Caimo and Friel 2013). A similar approach could be developed for our framework, though the computational cost is likely to be prohibitive. A more pragmatic approach would be to develop a graphical goodness-of-fit method by comparing the posterior predictive distributions under different models. More flexible specifications of the relationship between the covariates and ERGM parameters, such as spline-based models, would also be a fruitful avenue for future work.

The computational cost of our MCMC algorithm is considerable. Even with a 20-core computing cluster (Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz), the algorithm took over 5 h to produce the 22,000 posterior samples in the real data example presented above. The main computational bottleneck lies in simulating the exponential random graphs at each MCMC iteration. The overall cost increases roughly linearly with the number of networks, while for each individual simulation Krivitsky and Handcock (2014) provide empirical evidence indicating that the cost may grow on the order of \(p(N + E)\log (E)\), where p is the number of summary statistics, N is the number of nodes, and E is the number of edges. It may be possible to reduce the number of ERGM simulations at each MCMC iteration using noisy Monte Carlo methods (Alquier et al. 2016). Other promising avenues include variational inference for ERGMs (Tan and Friel 2020), or pseudolikelihood methods (Bouranis et al. 2017), which could both be extended to our framework to yield approximate Bayesian inference at a much reduced computational cost relative to MCMC.