Meta-analytic Gaussian Network Aggregation

A growing number of publications focus on estimating Gaussian graphical models (GGM, networks of partial correlation coefficients). At the same time, the generalizability and replicability of these highly parameterized models are debated, and sample sizes typically found in datasets may not be sufficient for estimating the underlying network structure. In addition, while recent work has emerged that aims to compare networks based on different samples, these studies do not take potential cross-study heterogeneity into account. To this end, this paper introduces methods for estimating GGMs by aggregating over multiple datasets. We first introduce a general maximum likelihood estimation modeling framework in which all discussed models are embedded. This modeling framework is subsequently used to introduce meta-analytic Gaussian network aggregation (MAGNA). We discuss two variants: fixed-effects MAGNA, in which heterogeneity across studies is not taken into account, and random-effects MAGNA, which models sample correlations and takes heterogeneity into account. We assess the performance of MAGNA in large-scale simulation studies. Finally, we exemplify the method using four datasets of post-traumatic stress disorder (PTSD) symptoms, and summarize findings from a larger meta-analysis of PTSD symptoms. Supplementary Information: The online version contains supplementary material available at 10.1007/s11336-021-09764-3.


GGM Estimation Example
In this supplement, we detail the steps used in maximum likelihood estimation of a Gaussian graphical model (GGM) for a small example. Suppose that a dataset of three variables with 1,000 observations gives the following sample correlation matrix: Given that there are three variables, there are 3 · 2/2 = 3 potential parameters in the GGM, leading to the model parameters $\boldsymbol{\theta} = (\omega_{21}, \omega_{31}, \omega_{32})^{\top}$. We first wish to estimate a saturated GGM in which all parameters are included. The manual Jacobian then becomes an identity matrix: This matrix encodes a model in which the free parameters (parameters to be estimated) are $\psi_1 = \omega_{21}$, $\psi_2 = \omega_{31}$, and $\psi_3 = \omega_{32}$. Next, we could estimate the parameters by minimizing Equation (2) (finding a solution at which Equation (3) is a vector of zeroes). For the saturated model, however, we do not have to perform an iterative model search and can obtain ML estimates directly by standardizing the inverse of $\boldsymbol{R}$ and multiplying the off-diagonal elements by $-1$ (Lauritzen, 1996). This process is automated in, for example, the cor2pcor function from the corpcor package (Schafer et al., 2017). The ML estimate becomes: With this estimate, we can compute the parameter variance-covariance matrix in Equation (6). The square root of the diagonal of this matrix gives the standard errors 0.028, 0.032, and 0.030, which can be used to obtain p-values in order to test whether each edge is non-zero: The edge between the first and the third variable is not significant. Next, we could fit a model with this edge constrained to zero. To do this, we specify the manual Jacobian as: This model encodes that we now have two parameters to estimate: $\psi_1 = \omega_{21}$ and $\psi_2 = \omega_{32}$. As there is no free parameter for the edge $\omega_{31}$, this parameter is no longer included in the model and its value will be fixed to its starting value (in this case, 0). All other Jacobians have the same form as before (although numeric values will differ).
Next, we can fix this element to zero in a matrix of starting values for $\boldsymbol{\Omega}$, and we can numerically optimize Equation (2). We then obtain as estimates: Note that these estimates also differ slightly for the other edges in the model. We term this process of removing non-significant edges and re-estimating parameters pruning. This is the most basic form of model selection for network models.
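The direct ML computation described above (standardizing the inverse of the sample correlation matrix and multiplying its off-diagonal elements by −1, as automated by cor2pcor in corpcor) can be sketched numerically in Python. The correlation matrix below is hypothetical, as the example's actual values are not reproduced in this text:

```python
import numpy as np

def cor_to_pcor(R):
    """Saturated-GGM ML estimate: standardize inv(R) and negate off-diagonals."""
    K = np.linalg.inv(R)
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)      # standardize and multiply by -1
    np.fill_diagonal(P, 0.0)     # the diagonal of a GGM weight matrix is zero
    return P

# Hypothetical sample correlation matrix (not the one from the example)
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
Omega = cor_to_pcor(R)
```

For three variables, the element Omega[1, 0] matches the textbook partial correlation (r21 − r31 r32) / √((1 − r31²)(1 − r32²)), which is a convenient correctness check.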

A Tutorial on GGM Estimation using psychonetrics
In this supplement, we describe how the psychonetrics package (Epskamp, 2020a, 2020b) for the statistical programming language R (R Core Team, 2019) can be used for performing the analyses described in the paper. For the single-group models, there are many other software packages that estimate pruned GGMs in which some edges are set to equal zero. For example, the glasso package (Friedman et al., 2008; Friedman et al., 2019) can be used to estimate a GGM for a given network structure (the regularization parameter needs to be set to zero), which is utilized in the ggmFit function in the qgraph package (Epskamp et al., 2012). Furthermore, the GGMnonreg package (Williams et al., 2019) can be used for ML pruning methods. More advanced ML model search strategies have also been implemented in qgraph (the ggmModSelect routine) and GGMnonreg. Finally, Bayesian estimation procedures have been implemented in the BDgraph (Mohammadi & Wit, 2019) and BGGM (Williams & Mulder, 2020) packages.
First, we need to install the psychonetrics package:

install.packages("psychonetrics")

Next, we can load the psychonetrics package as well as the qgraph package for drawing networks and the dplyr (Wickham et al., 2019) package to gain access to the pipe operator %>%:

library("psychonetrics")
library("dplyr")
library("qgraph")

The pipe operator allows us to write f(x, ...) as x %>% f(...), in which x is some object, f some function, and ... any number of arguments. This notation simplifies reading R code, as code can now be read from left to right rather than from inside to outside.

nobs <- 1000
In psychonetrics, we first need to form a model object, which is subsequently evaluated and adjusted. The ggm function can be used to form a GGM model:

mod1 <- ggm(
  covs = corMat,
  nobs = nobs,
  corinput = TRUE
)

The command above forms a model based on a correlation matrix as input: the covs argument takes a single covariance (or correlation) matrix or a list of such matrices, the nobs argument takes a single number of observations or a vector with one number per dataset, and the corinput argument tells psychonetrics that the input is a correlation matrix. By default, a model is formed in which all edges are included. For fitting a pre-specified GGM structure, the omega argument could be supplied a binary matrix with a 1 indicating a parameter is free and a 0 indicating a parameter is fixed to zero. Next, we can run the model: Here, the var1, op, and var2 columns indicate what type of parameter is estimated, the est column contains the ML estimate, the se column the standard error, the p column the p-value (which differs slightly from above due to rounding), the row and col columns the row and column index of the matrix, and finally the par column the parameter number (e.g., the row and column index in the Fisher information matrix). To fit a model in which the edge V3 -- V1 is fixed to zero, we can use the fixpar command:

mod2 <- mod1 %>% fixpar("omega", "V1", "V3") %>% runmodel

Alternatively, we could have used the prune function to automate this:

mod2 <- mod1 %>% prune(alpha = 0.05)

We can compare these models using the compare function: The output includes the AIC, BIC, and a χ² difference test (which is only valid if the models are nested). All these measures indicate that the pruned model fits better. Finally, to obtain the network structure, we can use the getmatrix command:

net <- getmatrix(mod2, "omega")

which can subsequently be plotted using qgraph:

qgraph(net, theme = "colorblind", layout = "spring")
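The χ² difference test reported by compare can be reproduced by hand from the two models' χ² statistics and degrees of freedom. A minimal sketch in Python with hypothetical fit statistics (psychonetrics performs this computation internally):

```python
from scipy.stats import chi2

def chi2_difference_test(chisq_restricted, df_restricted, chisq_full, df_full):
    """Likelihood-ratio (chi-square difference) test between two nested models."""
    stat = chisq_restricted - chisq_full
    df = df_restricted - df_full
    # Survival function gives the upper-tail p-value of the chi-square statistic
    return stat, df, chi2.sf(stat, df)

# Hypothetical values: the pruned model frees 1 df and adds 2.1 chi-square points
stat, df, p = chi2_difference_test(2.1, 1, 0.0, 0)
```

A non-significant difference test (as here, p > 0.05) would indicate that the pruned model does not fit significantly worse than the saturated model.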

Multiple Dataset ML Estimation: Fixed-effects MAGNA
Suppose we observed two more correlation matrices based on samples of 500 cases each:

nobs <- c(1000, 500, 500)

Two-stage Estimation
In two-stage estimation, we first fit a model for a common correlation matrix over all datasets. The model can be formed with the corr function:

mod_stage1 <- corr(
  covs = corMats,
  nobs = nobs,
  corinput = TRUE,
  equal = "rho"
) %>% runmodel

Setting the equal argument to "rho" ($\boldsymbol{P}$) tells psychonetrics we wish to fit an equal correlational structure (alternatively, the groupequal command could have been used after forming the model). Typical SEM fit indices can be obtained to check if this pooled correlational structure fits well:

Fisher <- mod_stage1@information
This can subsequently be used for the stage-two model:

mod_stage2 <- ggm(
  covs = pool_cors,
  nobs = sum(nobs),
  corinput = TRUE,
  estimator = "WLS",
  WLS.W = Fisher
) %>% runmodel

The estimator = "WLS" argument tells psychonetrics to use WLS estimation, and the WLS.W argument sets the WLS weights matrix. This model can be used in exactly the same way as the model described in Section 2.
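The stage-two WLS objective has the generic quadratic form F(θ) = (r − ρ(θ))ᵀ W (r − ρ(θ)), with r the pooled sample correlations, ρ(θ) the model-implied correlations, and W the weight matrix (here, the stage-one Fisher information). A minimal Python sketch of evaluating this fit function, with all values hypothetical:

```python
import numpy as np

def wls_fit(r, rho, W):
    """Weighted least squares discrepancy between sample and implied statistics."""
    d = r - rho
    return float(d @ W @ d)

# Hypothetical pooled correlations, model-implied values, and weight matrix
r   = np.array([0.50, 0.30, 0.40])
rho = np.array([0.48, 0.31, 0.40])
W   = np.diag([900.0, 850.0, 880.0])   # stand-in for the Fisher information
F = wls_fit(r, rho, W)
```

Minimizing F over θ (e.g., with a quasi-Newton optimizer) yields the stage-two estimates; larger weights make discrepancies in precisely estimated correlations more costly.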

Multi-dataset Estimation
For multi-dataset estimation, we can form a common model directly using the ggm function, using the same command as above but replacing corr with ggm and specifying that the omega matrix must now be constrained equal across datasets:

mod_common <- ggm(
  covs = corMats,
  nobs = nobs,
  corinput = TRUE,
  equal = "omega"
) %>% runmodel

Like before, we can look at the parameter estimates with:

mod_common %>% parameters
This shows the exact same estimates as above.

Multiple Dataset ML Estimation: Random-effects MAGNA
Continuing the example from Section 2.2, we can form the random-effects MAGNA model with the meta_ggm function:

mod_ranef <- meta_ggm(
  cors = corMats,
  nobs = nobs,
  Vmethod = "individual",
  Vestimation = "per_study"
)

Here, the Vmethod argument controls how we form the $\boldsymbol{V}_i$ matrix, which can be set to "individual" for individual estimation or "pooled" for pooled estimation. The Vestimation argument controls how the estimator handles the sampling variation, which can be set to "averaged" for averaging over all estimates or "per_study" to use unique estimates per study. As before, we can use the runmodel and prune commands to run and prune the model: showing that the same edge as before was removed. All the elements of $\boldsymbol{T}$ are estimated to be near zero and non-significant, indicating that there are no random effects. The estimates of the random-effect standard deviations can be obtained via:

mod_ranef %>% getmatrix("sigma_randomEffects") %>% diag %>% sqrt %>% round(3)

which also shows very low estimates: [1] 0.019 0.023 0.002
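Random-effects MAGNA estimates heterogeneity jointly for all correlations; a conceptually related but much simpler univariate analogue is the DerSimonian-Laird estimator of the between-study variance τ² for a single Fisher-transformed correlation. A Python sketch, purely to illustrate what "near-zero random effects" means (this is not the estimator psychonetrics uses):

```python
import numpy as np

def dl_tau2(r, n):
    """DerSimonian-Laird between-study variance of Fisher-z correlations."""
    z = np.arctanh(r)                    # Fisher z transform
    v = 1.0 / (np.asarray(n) - 3.0)      # approximate sampling variance of z
    w = 1.0 / v
    zbar = np.sum(w * z) / np.sum(w)
    Q = np.sum(w * (z - zbar) ** 2)      # Cochran's Q statistic
    C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (len(z) - 1)) / C)

# Hypothetical correlations of the same edge observed in three studies
tau2 = dl_tau2(r=np.array([0.41, 0.40, 0.42]), n=np.array([1000, 500, 500]))
```

When the study-level correlations barely differ, as here, τ² is estimated at or near zero, mirroring the near-zero random-effect standard deviations in the example above.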

Full-information Maximum Likelihood
The log-likelihood function L will typically take the form of a sum of log-likelihoods for n individual cases: Often, L will only be a function of summary statistics of the data, and there is no need to explicitly sum over individual likelihoods in the fit function computation. However, there may be reasons to instead define the fit function per individual case. In the main text, we discuss, for example, that the assumed distribution may differ per case because each case represents a set of sample correlations with more or less sampling variation. Another common reason for using a case-wise fit function is that some observations are missing. When maximum likelihood estimation is used, the term full information maximum likelihood (FIML) estimation describes this optimization scheme.
Here, we can define the fit function for each individual case: The gradient and Fisher information functions then simply take the form of a sum over cases: For each case-wise fit function, we may define a set of case-specific distribution parameters $\boldsymbol{\phi}_c$, which is still a function of a single set of model parameters $\boldsymbol{\theta}$, which in turn is still a function of a single set of free parameters $\boldsymbol{\psi}$. Some datasets may feature missing variables, leading to fewer distribution parameters being needed to evaluate the fit function. Let $\boldsymbol{\phi}^{*}_c$ denote a subset of $\boldsymbol{\phi}_c$ containing only the distribution parameters used in the model for case c (e.g., only the means, variances, and covariances related to the variables that are not missing for case c). We can use a filter matrix $\boldsymbol{F}_c$ with zeroes and ones such that: In the special case where no variables are missing, $\boldsymbol{F}_c = \boldsymbol{I}$. We make use of the case-specific distribution set $\boldsymbol{\phi}_c$ and its subset $\boldsymbol{\phi}^{*}_c$ in two ways: • We can assume every case is an independent replication of the same model, but not every case has responded to every item. In this case, all case-wise distribution parameters are equal ($\boldsymbol{\phi}_1 = \boldsymbol{\phi}_2 = \ldots = \boldsymbol{\phi}$), but the subsets $\boldsymbol{\phi}^{*}_c$ may differ, as different cases may require different subsets of parameters (e.g., a person that did not respond to item 1 does not need the mean of item 1).
• We can also assume that cases are not replications from the same model, but rather that a different model generated each of the responses. We do this when modeling sample correlation coefficients to take into account that samples with a low sample size feature more sampling variation than samples with a large sample size.
The case-wise Jacobian takes the same form as Equation (4), with the exception that the distribution Jacobian is now defined per case separately and the filter matrix is added: Likewise, the case-wise Fisher information function follows a similar extension: The case-specific model Jacobian takes the exact same form as the general model Jacobian when FIML is not used. The distribution Jacobian and distribution Hessian can be defined per individual case, and usually take the same form as the derivatives in the general case. No new manual Jacobian is required.
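The filter matrix $\boldsymbol{F}_c$ described above simply selects the distribution parameters needed for case c. A minimal Python sketch with a hypothetical parameter vector (in the special case that nothing is missing, $\boldsymbol{F}_c$ reduces to the identity matrix):

```python
import numpy as np

def filter_matrix(p_total, used):
    """F_c: the rows of the identity matrix for the parameters used by case c."""
    return np.eye(p_total)[used, :]

phi = np.array([0.1, 0.2, 0.3, 0.4])    # hypothetical full distribution parameters
F_c = filter_matrix(4, used=[0, 2, 3])  # case c does not need the 2nd parameter
phi_star = F_c @ phi                    # phi*_c = F_c phi_c
```

Pre-multiplying case-wise Jacobians by $\boldsymbol{F}_c$ in the same way drops the rows belonging to unused (e.g., missing-variable) parameters.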

Empirical Example: Depression, Anxiety and Stress
For a second empirical illustration, we analyzed data from the short version of the Depression Anxiety Stress Scales (DASS21; Lovibond and Lovibond, 1995), which contains 21 items intended to measure depression, anxiety, and stress. Table 1 gives a description of the variables included in the DASS21. We obtained the data from the Open Source Psychometrics Project (openpsychometrics.org), which contained n = 39,775 full responses (no missings) on 21 items measured on a 4-point scale. To illustrate MAGNA, we split the data into 40 random samples (average n = 994.3). We estimated a model on the full dataset as described in Section 3, a multi-dataset fixed-effects MAGNA model as described in Section 4.2, and finally a random-effects MAGNA model (individual and averaged estimation) as described in Section 5. As these 'datasets' are drawn from the same population, our expectation is that these results would align. To increase the feasibility of this analysis, we used a slightly lower relative convergence tolerance of 1 × 10⁻⁵. Figure 1 shows the results of the DASS21 analysis. Panel 1a shows the estimated GGM structures, and Panel 1b the correspondence between edge weights. The full dataset model and the fixed-effects MAGNA model are near identical. The random-effects MAGNA model, on the other hand, is highly similar but sparser than the other models.

(Figure 1 caption fragment: DASS21; Lovibond and Lovibond, 1995. Orange nodes indicate anxiety items, blue nodes depression items, and green nodes stress items. Node descriptions can be read in Table 1.)

Edge weights that were included in the random-effects GGM were near identical to edge weights in the other models (Panel 1b). The average random-effects standard deviation was estimated to be 0.06, which is lower than in the PTSD example, but still relatively high.
As such, this dataset shows that all methods recover the same dominant structure, but also that including random effects in the model costs statistical power in detecting weaker edge weights, which is not surprising given the vastly increased model complexity of random-effects MAGNA compared to fixed-effects MAGNA.

Estimating Multi-group Ising Models from Summary Statistics
While the current paper focuses on the GGM for continuous data, other network models for other types of data are also commonly used. The most prevalent is the Ising model (Ising, 1925; Marsman et al., 2018), which models dichotomous data. Originally used to model ferromagnetism, the Ising model has the particularly interesting property that it can feature two stable states, rather than one as in the GGM (Dalege et al., 2016; Dalege et al., 2018). More importantly, this behavior can be modeled with a single parameter, β, that controls the temperature of the system. Common methods for estimating Ising model parameters from data, such as the eLasso algorithm (van Borkulo et al., 2014), cannot directly estimate the β parameter, as it is not identified together with the network structure. We can make use of the estimator presented in Section 2 of the main text to also estimate Ising models with equality constraints across parameters as well as across multiple datasets. In addition, we will see that these estimators allow for the estimation of Ising model parameters from summary statistics (the variance-covariance matrix of the data), a novel contribution which has not yet been implemented in user-friendly software.
The methods presented below have been implemented in the psychonetrics (Epskamp, 2020b) package, and can be used through the function Ising. The package also contains example code and an example dataset. Let $\boldsymbol{y}_c$ now represent a dichotomous random variable vector with observation $\boldsymbol{y}_c \in \{a, b\}^p$ for case $c \in 1, \ldots, n$. Typically, the encoding a = 0 and b = 1 is used in psychometric modeling, and the encoding a = −1 and b = 1 is used when the Ising model serves as a computational model. While in the single-group setting these encodings are equivalent and the resulting parameters can be transformed between the two encodings, they are not equivalent in the multi-group setting with equality constraints on some parameters across groups. The interpretation of some parameters can also differ vastly depending on the encoding used (Haslbeck et al., 2020). As such, this is an important consideration to make before estimating an Ising model. For the expressions below, however, the encoding is irrelevant and can arbitrarily be defined as a and b.
The notation $\sum_{\boldsymbol{y}}$ here indicates that a sum is taken over all possible outcomes of $\boldsymbol{y}_c$. This sum can be very large and becomes computationally intractable for models with many (e.g., over 20) nodes. However, many applications of Ising models to psychological datasets concern network structures with fewer nodes (e.g., Fried et al., 2015), in which case the evaluation of Z is entirely feasible.
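For small p, the normalizing constant Z can indeed be computed by brute-force enumeration of all 2^p outcome patterns. A Python sketch, assuming the common Ising potential exp(β(τᵀy + Σ_{i<j} ω_ij y_i y_j)) with the −1/1 encoding; thresholds and network below are hypothetical:

```python
import itertools
import numpy as np

def ising_Z(tau, Omega, beta=1.0):
    """Brute-force partition function over all {-1, 1}^p outcome patterns."""
    p = len(tau)
    Z = 0.0
    for pattern in itertools.product([-1.0, 1.0], repeat=p):
        y = np.array(pattern)
        # 0.5 * y' Omega y equals sum over i < j of omega_ij y_i y_j (zero diagonal)
        Z += np.exp(beta * (tau @ y + 0.5 * y @ Omega @ y))
    return Z

tau = np.array([0.2, -0.1, 0.0])            # hypothetical thresholds
Omega = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.2],
                  [0.0, 0.2, 0.0]])         # hypothetical network, zero diagonal
Z = ising_Z(tau, Omega)
```

With Ω = 0 the model reduces to independent nodes and Z factorizes as Π_i 2 cosh(β τ_i), which provides a convenient correctness check.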
Let $v^{(1)}_i = \sum_{c=1}^{n} y_{ci}$ and $v^{(2)}_{ij} = \sum_{c=1}^{n} y_{ci} y_{cj}$ represent summary statistics of the data. These summary statistics can also be obtained directly from the sample means and sample variance-covariance matrix discussed in Section of the main text (there presented as sample means and covariances of the sample correlations rather than of the raw data):

$\boldsymbol{v}^{(1)} = n \bar{\boldsymbol{y}}, \qquad \boldsymbol{V}^{(2)} = n \left( \boldsymbol{S} + \bar{\boldsymbol{y}} \bar{\boldsymbol{y}}^{\top} \right)$.
Using these summary statistics, we can characterize the sum of Hamiltonians over all cases, here defined as $H^{*}$, as a function of these summary statistics:

$H^{*} = \sum_{c=1}^{n} H(\boldsymbol{y}_c) = -\sum_{i} \tau_i v^{(1)}_i - \sum_{\langle i,j \rangle} \omega_{ij} v^{(2)}_{ij}$.
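The point of $H^{*}$ is that the Hamiltonians of all cases can be summed without touching the raw data. A Python sketch verifying, on a small hypothetical dataset, that the summary-statistic form −τᵀv⁽¹⁾ − Σ_{i<j} ω_ij v⁽²⁾_ij agrees with summing the case-wise Hamiltonians:

```python
import numpy as np

# Hypothetical raw data (4 cases, 3 nodes, -1/1 encoding) and parameters
Y = np.array([[ 1, -1,  1],
              [ 1,  1, -1],
              [-1, -1, -1],
              [ 1,  1,  1]], dtype=float)
tau = np.array([0.2, -0.1, 0.0])
Omega = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.2],
                  [0.0, 0.2, 0.0]])

def hamiltonian(y, tau, Omega):
    """H(y) = -tau'y - sum_{i<j} omega_ij y_i y_j (zero-diagonal Omega)."""
    return -(tau @ y) - 0.5 * (y @ Omega @ y)

# Case-wise sum versus summary-statistic form
H_casewise = sum(hamiltonian(y, tau, Omega) for y in Y)
v1 = Y.sum(axis=0)    # v(1): column sums
V2 = Y.T @ Y          # V(2): sums of products y_ci * y_cj
H_star = -(tau @ v1) - 0.5 * np.sum(Omega * V2)
```

Because the Hamiltonian is linear in y_i and in the products y_i y_j, the two computations agree exactly.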
In this setting, we will not further model the parameters of the Ising model. As such, the distribution parameters $\boldsymbol{\phi}$ and the model parameters $\boldsymbol{\theta}$ are the same. For the single-group setting, these become:

$\boldsymbol{\phi} = \boldsymbol{\theta} = \begin{pmatrix} \boldsymbol{\tau} \\ \boldsymbol{\omega} \\ \beta \end{pmatrix}$,

in which $\boldsymbol{\omega} = \mathrm{vechs}(\boldsymbol{\Omega})$. This also simplifies the model Jacobian to: $\partial \boldsymbol{\phi} / \partial \boldsymbol{\theta} = \boldsymbol{I}$.
Other setups are possible. For example, one could model a low-rank Ising model as described by Marsman et al. (2015) in this setting as well by assigning a different model Jacobian. The manual Jacobian will be a matrix of ones and zeroes indicating which parameters are free to be estimated and which are fixed to their starting values. As the inverse temperature parameter β is not identified together with $\boldsymbol{\tau}$ and $\boldsymbol{\Omega}$, it is typically constrained to 1, which can be modeled with a manual Jacobian that is an identity matrix with the last diagonal element set to zero. Other (equality) constraints in the manual Jacobian can identify β, especially when extended to the multi-group setting. The distribution Jacobian for the single-group Ising model takes the following form: These elements take the following form:

$\boldsymbol{H}_{\tau_i, \tau_j} = 2\beta^2 \left( E(y_i y_j) - E(y_i) E(y_j) \right)$
$\boldsymbol{H}_{\omega_{ij}, \tau_k} = 2\beta^2 \left( E(y_i y_j y_k) - E(y_i y_j) E(y_k) \right)$
$\boldsymbol{H}_{\omega_{ij}, \omega_{kl}} = 2\beta^2 \left( E(y_i y_j y_k y_l) - E(y_i y_j) E(y_k y_l) \right)$
$\boldsymbol{H}_{\tau_i, \beta} = 2 \left( E(y_i) E(H) - E(y_i H) \right)$
$\boldsymbol{H}_{\omega_{ij}, \beta} = 2 \left( E(y_i y_j) E(H) - E(y_i y_j H) \right)$
$\boldsymbol{H}_{\beta, \beta} = 2 \left( E(H)^2 - E(H^2) \right)$.
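The τ-by-τ block of these expressions, 2β²(E(y_i y_j) − E(y_i)E(y_j)), is 2β² times the model-implied covariance matrix of y, so for small p it can be computed directly by enumerating the model's expectations. A Python sketch with hypothetical parameters, the −1/1 encoding, and the potential exp(β(τᵀy + Σ_{i<j} ω_ij y_i y_j)):

```python
import itertools
import numpy as np

def tau_tau_block(tau, Omega, beta=1.0):
    """2 beta^2 * Cov(y) under the Ising model, computed by enumeration."""
    p = len(tau)
    patterns = np.array(list(itertools.product([-1.0, 1.0], repeat=p)))
    # Unnormalized probabilities exp(beta * (tau'y + sum_{i<j} omega_ij y_i y_j))
    energy = patterns @ tau + 0.5 * np.einsum("ci,ij,cj->c", patterns, Omega, patterns)
    weights = np.exp(beta * energy)
    probs = weights / weights.sum()
    Ey = probs @ patterns                           # E(y_i)
    Eyy = patterns.T @ (probs[:, None] * patterns)  # E(y_i y_j)
    return 2 * beta**2 * (Eyy - np.outer(Ey, Ey))

tau = np.array([0.2, -0.1, 0.0])                    # hypothetical thresholds
Omega = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.2],
                  [0.0, 0.2, 0.0]])                 # hypothetical network
block = tau_tau_block(tau, Omega)
```

As a covariance matrix (times a positive constant), the block is symmetric and positive semidefinite, and with the −1/1 encoding its diagonal elements 2(1 − E(y_i)²) lie between 0 and 2.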
With these expressions, the Ising model can be estimated using maximum likelihood estimation from the sample means and variance-covariance structure, in exactly the same manner as the GGM is estimated in the main text and supplementary materials. This allows for imposing equality constraints as well as estimating an Ising model with certain edge weights fixed to zero. Extension to the multi-group setting can also be performed in exactly the same manner as described in the main text: by specifying the distribution Jacobians and Hessians described above for each group and imposing equality constraints across groups in the manual Jacobian.