Introduction

In many scientific disciplines, researchers are interested in the linear dependencies and unique relations among large sets of variables, such as genes1, proteins2, symptoms of a disease3, functional brain connectivity4, etc. There is consensus that computing all pairwise correlations between these variables is misleading, because such correlations do not correct for linear relations that might be due to other variables. Therefore, many researchers resort to calculating partial correlation coefficients, which express the remaining linear dependency between two variables after the effect of the rest of the variables under study is removed. More specifically, Gaussian Graphical Models (GGMs) have become increasingly popular5,6. These models yield an undirected network (i.e., undirected graph) in which the variables are depicted as nodes and the partial correlations among the variables are visualized as the edges among the nodes. The width of an edge reflects the size of the corresponding partial correlation (see Fig. 1).

Figure 1

The undirected network implied by the toy example.

Often, a sparse GGM is fitted, which implies that many of the partial correlations are forced to zero and thus that the corresponding edges in the network can be dropped. In some applications, the assumption of sparsity is intrinsic to the phenomenon under study. For instance, it has been shown that most genetic networks are sparse7,8. In other applications, the assumption of sparsity is motivated through improved interpretability. Indeed, even if the true model is not sparse, the sparsity assumption allows one to estimate the remaining parameters more accurately when the amount of information per parameter (n/p) is relatively small9, and it prevents overfitting10.

Popular methods to estimate sparse GGMs are the regularized nodewise regression approach of Meinshausen and Bühlmann11, the joint sparse regression (SPACE) approach by Peng et al.12 and the Graphical lasso (Glasso) proposed by Friedman, Hastie and Tibshirani13. These three approaches optimize different objective functions (see Methods section) but all set some of the estimated parameters, and thus some of the network edges, to zero through \({\ell }_{1}\) penalization. This penalization boils down to summing the absolute values of the estimated parameters and adding this sum to the objective function, after multiplying it by a regularization parameter. This parameter determines the impact of the penalty and has to be tuned by the user. Different tuning approaches have been proposed, based on cross-validation, information criteria, or finite sample derivations. Yet, \({\ell }_{1}\) penalization often does not work well. Indeed, recent studies on the use of \({\ell }_{1}\) penalization in standard regression analysis have shown that it tends to yield too many non-zero regression weights14,15,16. Translating these results to the estimation of sparse GGMs, we expect regularized nodewise regression, SPACE and the Glasso to often yield false positives, implying that some of the drawn edges should have been dropped. We will test this hypothesis in extensive simulations, in which we will also evaluate the effect of the tuning approach (i.e., information criteria, k-fold cross-validation or finite sample derivations).

To overcome the problem of incorrectly included edges, we will present a novel approach that we call Partial Correlation Screening (PCS). Our PCS approach consists of two steps. In the first step, we estimate a sparse partial correlation network using one of the state-of-the-art methods mentioned above. In the second step, we try to filter out the false positives that will probably be present in the estimated network. To this end, we screen the resulting partial correlation matrix for values that are smaller in absolute value than a cross-validation-based threshold and set these to zero. This novel approach is based on earlier work on thresholding after regularization. Specifically, Saligrama et al.17 and Descloux and Sardy18 proposed the idea of thresholding after applying an \({\ell }_{1}\) regularized procedure in the context of regression analysis. Ha and Sun19 presented a related idea for GGMs that consists of estimating the partial correlation matrix using a ridge penalty and then determining the non-zero entries of the matrix by hypothesis testing. Therefore, we will also evaluate what happens if we replace the \({\ell }_{1}\) penalty by a ridge penalty. We will apply the Partial Correlation Screening approach to the same simulated data to show that it indeed performs better. Finally, we will show how the PCS approach can be used to estimate networks based on real datasets: (1) a gene regulatory network of patients with breast cancer, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with Post-Traumatic Stress Disorder (PTSD).

The rest of the article is organized as follows. In the next section, we first present a toy example to introduce some notation and concepts and to illustrate that state-of-the-art estimation approaches yield networks that differ from the population model. Then, using this toy example, we show how our PCS procedure works. Next, we discuss the results of two simulation studies, one based on settings that have been used in other papers on this topic and one based on the estimated network for a real data set. Subsequently, we present applications to three real datasets. Next, we discuss our findings and formulate conclusions. Finally, the Methods section presents a detailed description of the evaluated tuning approaches for each of the state-of-the-art estimation approaches and of the PCS procedure.

Results

Toy example

The toy data consists of n = 100 observations that are sampled from a p = 6-dimensional multivariate Gaussian distribution. We set the covariance matrix of the distribution Σ to:

$$\Sigma =\left[\begin{array}{cccccc}1.63 & 0.00 & -0.70 & 0.00 & 0.63 & -0.70\\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00\\ -0.70 & 0.00 & 1.68 & 0.00 & -0.70 & -0.13\\ 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00\\ 0.63 & 0.00 & -0.70 & 0.00 & 1.63 & -0.70\\ -0.70 & 0.00 & -0.13 & 0.00 & -0.70 & 1.68\end{array}\right]$$
(1)

The conditional independence structure of this distribution can be represented by a GGM. The corresponding undirected network is shown in Fig. 1. The six variables X1 to X6 form the set of nodes V = {1, 2, 3, 4, 5, 6}. The set of edges E contains all node pairs (i, j) that are connected in the network, implying that Xi is conditionally dependent on Xj, given all the remaining variables. Thus, variable pairs that do not belong to the edge set are conditionally independent, given all remaining variables. For instance, in this illustration, the network shows an edge between variables X3 and X6. Therefore, these variables are conditionally dependent. However, there is no edge between variables X1 and X2, implying that X1 and X2 are conditionally independent.

Because the variables are Gaussian distributed, a variable pair (i, j) is conditionally independent if and only if their partial correlation given the rest of the variables is zero5. Let us denote the partial correlation matrix by Γ. The entries ρij|V\{i, j} of this matrix are the partial correlations between variables Xi and Xj, conditioned on the rest of the variables. For the toy example the matrix Γ equals:

$$\Gamma =\left[\begin{array}{cccccc}1.00 & 0.00 & -0.45 & 0.00 & 0.00 & -0.45\\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00\\ -0.45 & 0.00 & 1.00 & 0.00 & -0.45 & -0.45\\ 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00\\ 0.00 & 0.00 & -0.45 & 0.00 & 1.00 & -0.45\\ -0.45 & 0.00 & -0.45 & 0.00 & -0.45 & 1.00\end{array}\right]$$
(2)
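To make the example concrete, the following R sketch (the variable names are chosen here for exposition and are not taken from the paper's supplementary script) reconstructs Γ from Σ via the standard relation ρij|V\{i,j} = −ωij/√(ωii ωjj), where Ω = Σ−1, and draws the n = 100 toy observations.

```r
library(MASS)   # for mvrnorm

# Covariance matrix of Eq. (1)
Sigma <- matrix(c( 1.63, 0.00, -0.70, 0.00,  0.63, -0.70,
                   0.00, 1.00,  0.00, 0.00,  0.00,  0.00,
                  -0.70, 0.00,  1.68, 0.00, -0.70, -0.13,
                   0.00, 0.00,  0.00, 1.00,  0.00,  0.00,
                   0.63, 0.00, -0.70, 0.00,  1.63, -0.70,
                  -0.70, 0.00, -0.13, 0.00, -0.70,  1.68), nrow = 6)

# Partial correlations: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
Omega <- solve(Sigma)           # precision matrix
Gamma <- -cov2cor(Omega)        # rescale by the diagonal and flip the sign
diag(Gamma) <- 1
round(Gamma, 2)                 # should reproduce Eq. (2) up to rounding

# Draw the n = 100 toy observations
set.seed(123)
X <- mvrnorm(n = 100, mu = rep(0, 6), Sigma = Sigma)
```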

We can now define the neighborhood of each node. The neighborhood of node i consists of all the nodes j that form an edge with node i, implying that the partial correlation of Xi and Xj differs from zero. In the toy example the neighborhood of node 1 is formed by nodes 3 and 6, while the neighborhood of node 2 is empty.

Since the true edge set of the toy example is sparse, we can estimate it by means of the Glasso, SPACE, \({\ell }_{1}\) regularized nodewise regression (NR) and ridge nodewise regression (Ridge). Unlike Glasso and SPACE, which directly estimate the edge structure, NR fits a separate regression model for each node and thus yields two regression weights for each edge. To combine the information in these two weights into one edge, we can consider two variants, NR-AND and NR-OR. The AND rule means that an edge is only included in the model if both regression weights differ from zero, whereas the OR rule is more liberal and selects all edges for which at least one of the regression weights is not set to zero. Ridge estimates the partial correlations by fitting a regression model for each node using an \({\ell }_{2}\) penalty, which shrinks the regression weights towards zero.

For each of the estimation methods, a number of approaches have been put forward to tune the regularization parameter, the details of which are provided in the Methods section. For Glasso we will use 10-fold cross-validation with two different loss functions: the first approach aims to minimize the negative log-likelihood function (CV1) and the second approach focuses on the sum of the prediction errors of each node (CV2). Moreover, we will apply two selection rules when using cross-validation: selecting the model that yields the lowest loss value and applying the one-standard-error rule (1se)20. Additionally, we will consider the Bayesian Information Criterion (BIC) and the Extended Bayesian Information Criterion (EBIC)21. To tune the weight of the \({\ell }_{1}\) penalty term in SPACE and NR, we will apply 10-fold CV, its one-standard-error-rule variant, BIC and the finite sample result (FSR) proposed by Meinshausen and Bühlmann11. Note that in NR the tuning is performed for each separate regression. To optimize the weight of the \({\ell }_{2}\) penalty term in Ridge we will apply 10-fold CV for each separate regression. We note that the considered set of procedures is not intended to be exhaustive. Yet, the set is sufficient to illustrate the problem of efficiently tuning the penalty weight when there is limited information.

Figure 2 shows the GGMs obtained with the nineteen considered approaches (i.e., nineteen combinations of an estimation method (Glasso, SPACE, NR-AND, NR-OR and Ridge) and a tuning option (CV, CV-1se, BIC, EBIC, FSR)). We observe that Glasso-CV1-1se (panel c), NR-AND-FSR (panel o) and NR-OR-FSR (panel s) yield a network that is sparser than the true network. With Glasso-CV1-1se, all edges are set to zero. Whereas with NR-AND-FSR the edges (1, 3) and (3, 6) are set to zero, with NR-OR-FSR only the edge (3, 6) is set to zero. The other estimation methods yield networks that contain the true set of edges as well as false positives, with the number of false positives varying from ten (Glasso-CV2, panel d; NR-AND-CV-1se, panel m; Ridge-CV, panel t) to one (SPACE-CV-1se, panel i).

Figure 2

Estimated undirected networks for the toy example, before applying PCS.

Our PCS procedure aims to remove these false positive edges. The first step of the procedure is to apply one of the nineteen considered approaches. In the second step, we try to single out the false positives by thresholding the entries of the estimated partial correlation matrix. Specifically, only the partial correlations that are larger in absolute value than a given threshold are retained, whereas the others are set to zero and thus removed from the network. The threshold is calibrated by means of 10-fold cross-validation (see Methods section for more information). For the toy example the nineteen computed thresholds range from 0.0001 to 0.283. Figure 3 presents the networks that we obtain by applying these thresholds to the networks in Fig. 2. We observe that PCS-Glasso-CV1 (panel b), PCS-Glasso-CV2 (panel d), PCS-Glasso-CV2-1se (panel e), PCS-Glasso-BIC (panel f), PCS-Glasso-EBIC (panel g), PCS-SPACE-CV (panel h), PCS-SPACE-BIC (panel j), PCS-SPACE-FSR (panel k), PCS-NR-AND-CV (panel l), PCS-NR-AND-CV-1se (panel m), PCS-NR-AND-BIC (panel n), PCS-NR-OR-CV-1se (panel q), PCS-NR-OR-BIC (panel r) and PCS-Ridge-CV (panel t) remove the false positives and yield the true network. For PCS-SPACE-CV-1se (panel i), none of the false positives are removed. PCS-NR-OR-CV (panel p) discards all but one false positive edge. Obviously, the networks with false negatives (panels c, o and s) cannot be improved by PCS.

Figure 3

Estimated undirected networks for the toy example, after applying PCS.

Simulation study with synthetic data

In this section we perform an extensive simulation study to evaluate and compare the performance of the different procedures. We will inspect the results obtained with the nineteen combinations used above and study whether they improve when adding PCS. To this end, we replicated the settings used by Liu et al.22, Ravikumar et al.23, Rothman et al.24 and Yuan and Lin25.

Design

Each simulated data set is generated by drawing n independent observations from a p-variate Gaussian distribution with mean zero and partial correlation matrix Γ. We considered two possible sample sizes n = {100, 500} and three different values of p = {20, 60, 200}. We inspected four different specifications of the population partial correlation matrix Γ. To illustrate these specifications for p = 60, we visualized them in Fig. 4.

  1. Model 1: 2-neighbor chain graph, in which ρii|V\{i} = 1, ρi, i+1|V\{i, i+1} = ρi−1, i|V\{i, i−1} = −0.4, and all other partial correlations equal 0 (a short R sketch of this construction is given after the list).

  2. Model 2: 3-neighbor chain graph, in which ρii|V\{i} = 1, ρi, i+1|V\{i, i+1} = ρi−1, i|V\{i, i−1} = −0.4, ρi, i+2|V\{i, i+2} = ρi−2, i|V\{i, i−2} = −0.2, and all other partial correlations equal 0.

  3. Model 3: 2 nearest-neighbor graph. We first specify the inverse of the covariance matrix Σ as follows: we randomly select p points from a unit square and compute all pairwise distances between them. Then, the neighborhood set of each node is formed by the two nodes at the smallest distance. Next, the OR-rule is applied to these neighborhood sets to derive the associated undirected network. The off-diagonal elements of the corresponding Σ−1 are randomly chosen from the interval \([-1,-0.5]\cup [0.5,1]\). To ensure that Σ−1 is positive definite, the matrix is transformed as Σ−1 + (|λmin(Σ−1)| + 0.1)Ip, where λmin(Σ−1) refers to the smallest eigenvalue of Σ−1 and Ip is an identity matrix of dimension p. To compute Γ, we normalize Σ−1 and multiply the off-diagonal elements by (−1).

  4. Model 4: Random graph. We first specify Σ−1 as follows: each upper triangular element of Σ−1 is set equal to 0.3 with probability ρ and to zero otherwise. We set the probability ρ = {0.1, 0.01, 0.001} when p = {20, 60, 200}, respectively. Next, we set the lower triangular elements equal to the corresponding upper triangular elements. To ensure that Σ−1 is positive definite, the matrix is transformed as in Model 3. Finally, to compute Γ we normalize Σ−1 and multiply the off-diagonal elements by (−1).
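As a concrete illustration of Models 1 and 2, the following R sketch (a simplified construction of our own, not the supplementary script) builds a chain-graph partial correlation matrix, converts it to a covariance matrix under the convenient scaling that sets the diagonal of Σ−1 to one, and draws a data set.

```r
# Chain-graph partial correlation matrix: lag-k neighbors get the k-th value in 'rho'.
chain_gamma <- function(p, rho = c(-0.4)) {
  Gamma <- diag(p)
  for (k in seq_along(rho)) {
    for (i in 1:(p - k)) Gamma[i, i + k] <- Gamma[i + k, i] <- rho[k]
  }
  Gamma
}

# Under the scaling diag(Omega) = 1, Omega_ij = -rho_ij, so Sigma = Omega^{-1}.
gamma_to_sigma <- function(Gamma) {
  Omega <- -Gamma
  diag(Omega) <- 1
  solve(Omega)
}

p <- 60; n <- 100
Gamma1 <- chain_gamma(p, rho = c(-0.4))          # Model 1
Gamma2 <- chain_gamma(p, rho = c(-0.4, -0.2))    # Model 2
X <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = gamma_to_sigma(Gamma1))
```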

Figure 4

Heatmaps of the true simulated networks when p = 60. White represents partial correlations equal to zero, and black represents partial correlations different from zero.

We generated 100 replicates for each cell of the design. An R script to conduct the simulation experiment is provided in the Supplementary Information.

Performance measures

To evaluate how well the different methods perform in distinguishing between true non-zero partial correlations and true zero ones, we compute the True Positive Rate (TPR) and False Positive Rate (FPR):

$${\rm{TPR}}=\frac{{\rm{TP}}}{{\rm{TP}}+{\rm{FN}}}$$
(3)
$${\rm{FPR}}=\frac{{\rm{FP}}}{{\rm{TN}}+{\rm{FP}}}$$
(4)

where TP is the number of true positives (true non-zero edges that are estimated as such), TN is the number of true negatives (true zero edges that are recognized as such), FP is the number of false positives (true zero edges that are estimated as non-zero) and FN is the number of false negatives (true non-zero edges that are estimated as zero). The TPR and FPR coefficients take values in the range [0, 1]. For the TPR, a value of 0 indicates that the labeling of edges as non-zero is completely wrong, a value of 0.5 indicates that the procedure cannot do better than random prediction and a value of 1 indicates a perfect recovery of the non-zero edges. Similarly, an FPR value of 0 indicates a perfect recovery of the zero edges, a value of 0.5 indicates that the procedure cannot do better than random prediction and a value of 1 indicates that the labeling of edges as zero is completely wrong. We also report the average number of TP and FP values across the 100 replicates.
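In code, with Γ̂ the estimated and Γ the true partial correlation matrix, these rates can be computed as in the following short R sketch (our own helper name):

```r
# TPR (Eq. 3) and FPR (Eq. 4) over the off-diagonal (upper-triangular) entries.
edge_rates <- function(Gamma_hat, Gamma_true) {
  est   <- abs(Gamma_hat[upper.tri(Gamma_hat)])   > 0
  truth <- abs(Gamma_true[upper.tri(Gamma_true)]) > 0
  c(TPR = sum(est & truth) / sum(truth),
    FPR = sum(est & !truth) / sum(!truth))
}
```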

Results

Tables 1 to 3 show the average TPR and FPR scores for the different methods under consideration for the different choices of p. We also report the average number of TP and FP for the different methods in Tables 4 to 6. First, we compare the performance of the methods without conducting PCS. The TPR and FPR scores depend strongly on the model used to generate the data and on the values of n and p (i.e., the amount of available information). In general, when comparing how well the different methods control the number of false non-zero partial correlations, we observe that for every combination of p and n, SPACE and NR perform better than Glasso. The results for Glasso are affected by the penalty tuning approach: whereas using cross-validation tends to introduce a large number of false positives across all conditions, applying EBIC yields many false negatives. For NR and SPACE, the results depend on n, p and the data generating model. For n = 100, the best overall, though still rather poor, performance is obtained with some of the NR-AND variants for Models 1 and 4 and with some of the SPACE variants for Models 2 and 3. We also note that for Model 2 none of the state-of-the-art methods (excluding Ridge) is able to accurately estimate the true number of non-zero edges. Furthermore, in the high-dimensional case (i.e., p > n) all approaches perform badly in controlling the number of false positive edges. When n = 500, the TPR and FPR values are clearly better than for the low-sample-size setting and indicate good overall performance.

Table 1 Average true positive rate (TPR) and false positive rate (FPR) over 100 replications when p = 20.
Table 2 Average true positive rate (TPR) and false positive rate (FPR) over 100 replications when p = 60.
Table 3 Average true positive rate (TPR) and false positive rate (FPR) over 100 replications when p = 200.
Table 4 Average number of true positive edges (TP) and false positive edges (FP) over 100 replications when p = 20. For each model the number of non-zero partial correlations are: 19 for Model 1, 37 for Model 2, 16 for Model 3 and 12 for Model 4.
Table 5 Average number of true positive edges (TP) and false positive edges (FP) over 100 replications when p = 60. For each model the number of non-zero partial correlations are: 59 for Model 1, 117 for Model 2, 48 for Model 3 and 13 for Model 4.
Table 6 Average number of true positive edges (TP) and false positive edges (FP) over 100 replications when p = 200. For each model the number of non-zero partial correlations are: 199 for Model 1, 397 for Model 2, 142 for Model 3 and 23 for Model 4.

Turning to the results after applying PCS, we observe in Tables 1 to 6 that PCS-SPACE and PCS-NR estimate networks that contain a smaller number of false positive edges than the state-of-the-art methods without PCS. This improvement is larger for Models 1, 3 and 4 and when n = 100 and n < p, in that PCS is able to control the number of false positive edges without compromising the number of correctly estimated true edges. Furthermore, the performance differences among the SPACE and NR variants diminish. PCS-SPACE-BIC has the best overall performance across all the n = 100 conditions. The Glasso-EBIC results cannot be improved by PCS, because Glasso-EBIC yields networks with a large number of false negatives. When n = 500, PCS performs almost perfectly in finding the non-zero edges in Models 1, 3 and 4, while for Model 2 the best overall performance is obtained with PCS-NR-OR-BIC when p = 20, 60 and with PCS-NR-OR-FSR when p = 200.

Finally, we study how the sample size and the non-sparsity level influence the magnitude of the estimated threshold in the PCS procedure. For Glasso, SPACE, NR using the AND rule and NR using the OR rule, we estimate a linear mixed effect model with a random intercept in which observations are clustered according to the tuning procedure (i.e., different CV variants, information criteria or finite sample results). The model includes the estimated thresholds as the dependent variable and the sample size and the non-sparsity level as predictors. The non-sparsity level is computed as the number of true non-zero partial correlations divided by the total number of possible edges in the network. For Ridge we estimate the same model using OLS regression. Table 7 shows the obtained regression coefficients for each estimation procedure. We observe that across the different estimation procedures there is a significant negative relation between the sample size and the estimated threshold value. We also find a significant negative relation between the non-sparsity level and the threshold value for all methods except Glasso.

Table 7 Regression coefficients, standard errors (SE), associated Wald’s t-scores and p-values for all predictors in the analysis.

Simulation study based on real data

In this section we simulate data based on the sparse GGM results obtained by Armour et al.3 for 20 Post-Traumatic Stress Disorder (PTSD) symptoms of 221 U.S. military veterans. The 20 PTSD symptoms are assumed to form four symptom clusters: intrusions (B1-B5), avoidance (C1-C2), negative alterations in cognitions and mood (D1-D7), and alterations in arousal and reactivity (E1-E6). Armour et al.3 applied the Glasso-EBIC approach and used bootstrapping techniques to estimate the parameter accuracy and stability of the partial correlation matrix Γ26. The associated network, shown in Fig. 5, reveals strong positive within-cluster connections between nightmares (B2) and flashbacks (B3), blame of self or others (D3) and negative trauma related emotions (D4), detachment (D6) and restricted affect (D7), and hypervigilance (E3) and exaggerated startle response (E4). In addition, they find many moderately positive connections within the symptom clusters, for instance between intrusive thoughts (B1) and nightmares (B2), between avoidance of thoughts (C1) and avoidance of reminders (C2), and between irritability/anger (E1) and self-destructive behaviour (E2), but also between symptom clusters, for instance between loss of interest (D5) and difficulty in concentrating (E5).

Figure 5

Heatmap of the true network based on the data on 20 PTSD symptoms. White represents partial correlations equal to zero, and black represents partial correlations different from zero.

To compare the performance before and after using PCS, we drew n observations from a 20-variate Gaussian distribution with mean zero and partial correlation matrix Γ. We used two sample sizes n = {100, 500} and replicated the simulation 100 times.

Table 8 shows the average TPR and FPR scores and Figs. 6 and 7 present heatmaps of the frequency with which the entries of the partial correlation matrix are detected as non-zero. We observe that Partial Correlation Screening (PCS) significantly outperforms Glasso, NR and SPACE. When n = 100, PCS-SPACE-BIC has the best performance in terms of the false positive rate, which is in line with the simulation results on synthetic data. For n = 500, all the estimation procedures using PCS show an average TPR higher than 0.999 and an average FPR below 0.020 (see Table 8).

Table 8 Average true positive rate (TPR) and false positive rate (FPR) over 100 simulations based on the PTSD data.
Figure 6

Heatmaps of the frequency with which the edges in the PTSD-data-based simulations (n = 100) are estimated as non-zero by the different methods before applying PCS. White indicates that an edge was excluded from the network in all replications, whereas black reflects that the edge was always retained in the network.

Figure 7

Heatmaps of the frequency with which the edges in the PTSD-data-based simulations (n = 100) are estimated as non-zero by the different methods after applying PCS. White indicates that an edge was excluded from the network in all replications, whereas black reflects that the edge was always retained in the network.

Breast cancer data

GGMs have been widely applied to analyze gene expression data, since many authors hypothesize that the complex interactions between genes take the form of sparse pathways or networks27,28,29. More specifically, given mRNA levels of different patients, researchers have studied the conditional dependencies of genes for a variety of diseases1.

We estimate a sparse partial correlation network for gene expression data from a breast cancer study by West et al.30. The dataset contains measurements of 7,129 genes for 49 breast tumor tissue samples: 25 samples from patients diagnosed as estrogen receptor positive and 24 samples from patients diagnosed as estrogen receptor negative. In line with Sheridan et al.31, we focus on a subset of p = 150 genes related to the estrogen receptor gene ESR1. This gene acts as an estrogen-activated transcription factor and plays a key role in the proliferation of cancerous cells32.

Table 9 shows how many edges are obtained with the Glasso, NR, SPACE and Ridge techniques under consideration and how much these numbers of edges decrease by applying PCS. It can be concluded that the sparsity level varies considerably depending on the approach used. We observe that when the procedures yield dense networks (i.e. Ridge-CV, Glasso-CV1, Glasso-CV1-1se, Glasso-CV2, Glasso-CV2-1se and NR-OR-CV), applying PCS produces a larger reduction in the number of edges.

Table 9 Estimated number of edges of the gene regulatory network for the breast cancer data, the symptom network of patients with a diagnosis within the nonaffective psychotic spectrum using the BPRS scale and the symptom network of patients with PTSD.

Given that the results vary considerably across the methods, the next question is how we should deal with this uncertainty when interpreting the networks. We opt to combine the results of the different estimation methods33,34, by computing a network that includes all edges that occur in at least two of the nineteen obtained PCS networks. Note that if we apply this combination approach to the estimated PCS networks for the toy example (see Fig. 3), we would recover the true network.
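A minimal sketch of this combination rule, assuming pcs_networks is a list holding the nineteen binary adjacency matrices obtained after PCS:

```r
# Keep every edge that appears in at least 'min_count' of the PCS networks.
combine_networks <- function(pcs_networks, min_count = 2) {
  counts <- Reduce(`+`, pcs_networks)   # edge-wise frequency across networks
  (counts >= min_count) * 1
}
```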

Figure 8 shows the resulting combined network for the breast cancer data. Figure 9 focuses on the sub-network of the genes that are related with the estrogen receptor gene ESR1 (Panel a) and the gene FOXA1 (Panel b). We can identify some important regulatory interactions in the estimated GGM. As a first example, the ESR1 (ESR) gene is partially correlated with SLC39A6 (SLC). The latter gene functions as a zinc transporter, has been shown to be highly expressed in ESR1-positive tumours, and is highly significantly associated with the spread of breast cancer to the lymph nodes32. As a second example, we can inspect the genes that belong to the neighborhood of FOXA1 (FOX). FOXA1 has been found to be predominantly expressed in luminal type A carcinomas35 and may prevent metastatic progression of this type of breast cancer36. We observe an edge between FOXA1 (FOX) and AR (androgen receptor), which is in line with findings that indicate that AR regulates estrogen receptor expression37.

Figure 8

Estimated gene regulatory network for the breast cancer data.

Figure 9

Estimated sub-network of genes in the neighborhood of ESR1 and FOXA1 for the breast cancer data.

Psychopathological symptoms data

For a long time, modeling approaches to psychopathological data started from the assumption that psychopathological symptoms reflect an underlying mental disorder and thus are caused by this disorder38. This assumption has recently been challenged and an alternative hypothesis has been put forward stating that symptoms are causally active components of a mental disorder39,40. Within this framework, network analysis is then used to study the conditional dependencies between a set of symptoms41,42.

We studied the conditional dependencies of a set of 24 psychopathological symptoms in a sample of 184 patients (189 before patients with missing data were discarded) with a diagnosis within the nonaffective psychotic spectrum, who participated in the second wave of the multicenter Genetic Risk and Outcome of Psychosis (GROUP) cohort study43. The symptoms are measured using the Brief Psychiatric Rating Scale (BPRS)44, which captures the following symptoms: Somatic Concern (SmC), Anxiety (Anx), Depression (Dpr), Guilt (Glt), Hostility (Hst), Suspiciousness (Ssp), Unusual Thought (UnT), Grandiosity (Grn), Hallucinations (Hll), Disorientation (Dsr), Conceptual Disorganization (CnD), Excitement (Exc), Elevated mood (ElM), Tension (Tns), Mannerisms (Mnn), Uncooperativeness (Unc), Motor Retardation (MtR), Suicidality (Scd), Self Neglect (SlN), Bizarre Behaviour (BzB), Motor Hyperactivity (MtH), Distractibility (Dst), Emotional Withdrawal (EmW) and Blunted Affect (BlA). Each symptom is rated on a 7-point Likert scale. Because the data are measured on a Likert scale rather than a continuous one, we apply the nonparanormal transformation proposed by Liu et al.45, which uses the Gaussian copula to transform the data into normal scores.
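One readily available implementation of this transformation is huge.npn in the R package huge; the call below is an illustrative sketch in which bprs stands for the n × 24 matrix of BPRS item scores (a placeholder name, not an object from our scripts).

```r
library(huge)
# Gaussian-copula (nonparanormal) transformation to normal scores
bprs_npn <- huge.npn(bprs)
```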

Table 9 shows the number of edges that result from applying the different methods under consideration. We observe that Ridge-CV, Glasso-CV1, NR-OR-CV and Glasso-BIC yield the densest networks and that applying PCS drastically reduces the number of edges when the original network is not very sparse.

Figure 10 shows the network computed by combining the different PCS networks and discarding all edges that occur only once. Cognitive models of psychosis have postulated that some of the most prominent symptoms are delusional beliefs (grandiosity, suspiciousness, unusual thoughts)46. We indeed observe a strong positive relation between Unusual Thoughts (UnT) and Suspiciousness (Ssp) and between Emotional Withdrawal (EmW) and Blunted Affect (BlA). There are also strong positive relations between Unusual Thoughts (UnT) and Grandiosity (Grn), between Motor Retardation (MtR) and Elevated mood (ElM), between Anxiety (Anx) and Depression (Dpr), between Depression (Dpr) and Guilt (Glt), and between Tension (Tns) and Distractibility (Dst).

Figure 10

Estimated symptoms network of patients with a diagnosis within the nonaffective psychotic spectrum using the BPRS data.

Post-traumatic stress disorder symptoms data

Finally, we return to the PTSD data that we studied in the subsection Simulation study based on real data. Table 9 shows the number of edges for each of the procedures. We observe a similar pattern as in the previous applications. Figure 11 displays the network that results from applying our combination approach to the PCS networks. This combined network recovers the conditional dependencies that Armour et al.3 found to be strongly positive: nightmares (B2) and flashbacks (B3), blame of self or others (D3) and negative trauma related emotions (D4), detachment (D6) and restricted affect (D7), and hypervigilance (E3) and exaggerated startle response (E4).

Figure 11

Estimated symptoms network for the PTSD data.

Discussion

In this article, we have demonstrated through an extensive simulation study that the most popular procedures to estimate partial correlation networks, Glasso, SPACE, NR and Ridge, often do not yield the true underlying network, no matter which procedure is applied to select the regularization parameter. Results are heavily influenced by the sample size and the number of variables (i.e., the lower the sample size and the higher the number of variables, the worse the performance), with high-dimensional problems being especially difficult. We also note that the Glasso results heavily depend on which approach is used to tune the regularization parameter. Specifically, we found that in the high-dimensional setting, using the BIC or EBIC yields many false negatives and thus an overly sparse network.

Given that the state-of-the-art methods frequently cannot satisfactorily recover the true set of edges, we have presented a novel approach that allows better control of the false positive rate. This procedure boils down to performing an additional second step after applying one or more state-of-the-art methods of choice. In this second step, we discard the partial correlation coefficients in the estimated network that are smaller in absolute value than a given threshold, which is obtained through cross-validation. Our novel procedure clearly improved the performance of the estimation methods and tuning approaches considered, especially in the settings where the state-of-the-art methods yielded bad results. Whereas PCS-SPACE-BIC seems to be the best choice for small sample sizes, the method applied in the first step hardly matters when the sample size increases.

We also applied all approaches to three real data sets. The results again show that our PCS approaches yield sparser networks than the state-of-the-art methods. To deal with the multitude of obtained networks, we proposed to compute a network that combines the different PCS estimates, but discards the edges that occurred in only one network. Although the results seemed interpretable, future research should investigate further how to efficiently combine the different estimators or how to optimally select among the nineteen obtained networks.

In this paper we used standard simulation settings from the literature to demonstrate the problematic behaviour of existing approaches. It is important to mention that except for Glasso, none of the state-of-the-art procedures studied in this paper estimates a covariance matrix that is positive definite. Also, it is not guaranteed that this property still holds after applying the PCS to Glasso. In future research, it would be useful to investigate the behavior of the different approaches under more difficult settings as well as the theoretical properties of the PCS. This would lead to several possible extensions of our method. One extension targets data in which the assumption of multivariate normality is violated. Here, our approach can be easily extended to make use of techniques to estimate semiparametric undirected graphs45,47,48. We also note that in some applications, such as in psychology data or in the high dimensional setting, some variables might be highly linearly correlated. In this setting, the assumption regarding the regularity of the covariance matrix might not hold. A possible solution is to first cluster the strongly correlated variables and then take this cluster structure into account when estimating the GGM using the PCS approach49,50.

Finally, it is important to note that imposing sparsity might be too stringent in some applications. For instance, in some cases researchers are also interested in detecting partial correlations that are very close to zero. Moreover, it can also happen that the true network is not so sparse to begin with. In such cases, using approaches based on \({\ell }_{1}\) regularization may affect the validity of the results51. Therefore, we believe that future research should also focus on exploring how the methods proposed in this paper behave when the true underlying network is less sparse or includes some very weak edges.

Methods

Partial correlation estimation procedures

In this subsection we present the technical details of the state-of-the-art methods to estimate sparse partial correlation networks and the associated approaches to tune the regularization parameter.

The graphical lasso

Yuan and Lin25 and Rothman et al.24 proposed a penalized maximum likelihood approach to estimate the inverse of the covariance matrix Σ, denoted by Ω = [ωij]. If S denotes the sample covariance matrix, the problem is to minimize the following penalized negative log-likelihood function:

$$\hat{{\boldsymbol{\Omega }}}({\lambda }_{1})=\mathop{{\rm{a}}{\rm{r}}{\rm{g}}{\rm{m}}{\rm{i}}{\rm{n}}}\limits_{{\boldsymbol{\Omega }}\succ 0}\{{\rm{t}}{\rm{r}}({\bf{S}}{\boldsymbol{\Omega }})-\,\log \,det({\boldsymbol{\Omega }})+{\lambda }_{1}\sum _{i\ne j}\,|{\omega }_{ij}|\},$$
(5)

where tr(⋅) denotes the trace of a matrix and λ1 > 0 controls the size of the penalty. The penalty term is a convex proxy for the number of non-zero elements in the precision matrix. The smaller the value of λ1, the more non-zero elements the model includes. Friedman, Hastie and Tibshirani13 proposed an efficient algorithm to implement this method, which is called the Graphical lasso (Glasso). Afterwards, the partial correlation matrix can be computed using the known relation between the entries of the inverse of the covariance matrix and the partial correlation coefficients (see Lemma 1 in Peng et al.12).
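For a single value of λ1, the estimate in Eq. (5) and the implied partial correlations can be obtained, for instance, with the glasso package; the value λ1 = 0.1 below is purely illustrative and X stands for the n × p data matrix.

```r
library(glasso)
S   <- cov(X)                        # sample covariance matrix
fit <- glasso(S, rho = 0.1)          # lambda1 = 0.1, illustrative only
Omega_hat <- fit$wi                  # estimated precision matrix
Gamma_hat <- -cov2cor(Omega_hat)     # partial correlations (Lemma 1 in Peng et al.)
diag(Gamma_hat) <- 1
```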

For the different applications we select the regularization parameter as follows. We generate a grid of 100 equidistant values for λ1 ranging from 0.001 to |max(S)| when p < 100. When p ≥ 100 the sequence limits are (0.05, |max(S)|). We consider six approaches to select the optimal value from this grid. The first one is to implement K-fold cross-validation using the log-likelihood as performance measure (see Section 4.2 in Huang et al.52 and Section 2.3 in Price et al.53). We denote this procedure Glasso-CV1. We split the sample into K subsets. Using all but the k-th subset, we estimate the precision matrix with Glasso for different values of λ1 and denote this estimate \(\hat{\Omega }\). On the basis of the discarded k-th subset we estimate the sample covariance matrix Sk. Next, for each value of λ1 we compute the following loss function:

$${\rm{C}}{\rm{V}}1({\lambda }_{1})=\mathop{\sum }\limits_{k=1}^{K}\,\{{\rm{t}}{\rm{r}}({{\bf{S}}}^{k}\hat{{\boldsymbol{\Omega }}}({\lambda }_{1}))-\,\log \,det(\hat{{\boldsymbol{\Omega }}}({\lambda }_{1}))\}.$$
(6)

We plot CV1(λ1) versus λ1 and we select the tuning parameter that minimizes the loss function CV1(λ1).
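A simplified R sketch of this CV1 loop is given below; the fold assignment and the grid handling are illustrative choices of ours, not the exact implementation used in the paper.

```r
# Cross-validated negative log-likelihood loss of Eq. (6) for one lambda1 value.
cv1_loss <- function(X, lambda1, K = 10) {
  folds <- sample(rep(1:K, length.out = nrow(X)))
  loss  <- 0
  for (k in 1:K) {
    fit <- glasso::glasso(cov(X[folds != k, , drop = FALSE]), rho = lambda1)
    Sk  <- cov(X[folds == k, , drop = FALSE])
    loss <- loss + sum(diag(Sk %*% fit$wi)) -
            as.numeric(determinant(fit$wi, logarithm = TRUE)$modulus)
  }
  loss
}
# lambda1_grid <- seq(0.001, max(abs(S)), length.out = 100)
# lambda1_cv1  <- lambda1_grid[which.min(sapply(lambda1_grid, cv1_loss, X = X))]
```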

The second approach uses the one-standard-error-rule20. We denote this procedure Glasso-CV1-1se. Using the loss function in Eq. (6), we first compute the standard deviation of CV11(λ1), …, CV1K(λ1):

$${\rm{sd}}({\lambda }_{1})={\rm{sd}}({\rm{CV}}{1}_{1}({\lambda }_{1}),\ldots ,{\rm{CV}}{1}_{K}({\lambda }_{1})).$$
(7)

Next, we compute the standard error of CV1(λ1):

$${\rm{se}}({\lambda }_{1})={\rm{sd}}({\lambda }_{1})/\sqrt{K}.$$
(8)

Finally, given the tuning weight that minimizes the cross-validation error in Eq. (6), denoted by \({\hat{\lambda }}_{1}\), we choose the largest tuning weight that satisfies the following rule:

$${\rm{CV}}1({\lambda }_{1})\le {\rm{CV}}1({\hat{\lambda }}_{1})+{\rm{se}}({\hat{\lambda }}_{1})$$
(9)
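In code, given a K × 100 matrix cv_folds of fold-wise losses (one column per value in an assumed grid lambda1_grid), the rule can be sketched as follows; choosing the largest qualifying λ1 reflects the usual 1se convention assumed here.

```r
cv_mean <- colMeans(cv_folds)                                # CV1(lambda1) per grid value
cv_se   <- apply(cv_folds, 2, sd) / sqrt(nrow(cv_folds))     # Eqs. (7)-(8)
best    <- which.min(cv_mean)                                # lambda1 minimizing CV1
lambda1_1se <- max(lambda1_grid[cv_mean <= cv_mean[best] + cv_se[best]])   # Eq. (9)
```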

The third approach implements K-fold cross-validation using the prediction errors of each node as performance measure. We denote this procedure Glasso-CV2. We split the sample into K subsets. Using all but the k-th subset, we estimate the precision matrix with Glasso for different values of λ1 and denote this estimate \(\hat{\Omega }\). Next, for each value of λ1 we compute the following loss function:

$${\rm{CV}}2({\lambda }_{1})=\mathop{\sum }\limits_{k=1}^{K}\,\mathop{\sum }\limits_{i=1}^{p}\,{\left\Vert {X}_{i}^{k}-\sum _{j\ne i}\left(-\frac{{\hat{\omega }}_{ij}}{{\hat{\omega }}_{ii}}\right){X}_{j}^{k}\right \Vert}^{2}.$$
(10)

We plot CV2(λ1) versus λ1 and we select the tuning parameter that minimizes the loss function CV2(λ1).

The fourth procedure selects the tuning weight by applying the one-standard-error-rule on the cross-validation procedure CV2. We denote this procedure Glasso-CV2-1se.

The fifth and sixth procedures to select the optimal regularization parameter from the 100 considered λ1 values are based on the Bayesian Information Criterion (BIC) or the Extended Bayesian Information Criterion (EBIC). We refer to these procedures as Glasso-BIC and Glasso-EBIC, respectively. We select the value of λ1 that minimizes the following loss function:

$${\rm{EBIC}}({\lambda }_{1})=-\,2 {\mathcal L} (\hat{{\boldsymbol{\Omega }}}({\lambda }_{1}))+\kappa \,\log (n)+4\kappa \gamma \,\log (p)$$
(11)

where \( {\mathcal L} (\cdot )\) is the value of the log-likelihood function that corresponds to the estimated matrix \(\hat{\Omega }\), κ is the number of edges in the estimated network and γ ∈ [0, 1] is a parameter that controls the strength of the additional penalty on the number of edges. If γ = 0, Eq. (11) reduces to the classical BIC. Positive values of γ lead to stronger penalization. To compute the EBIC, we follow the recommendation of Chen and Chen54 and Foygel and Drton21 and set γ to 0.5 (see also refs. 55,56).
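For completeness, a hedged R sketch of Eq. (11) for a Glasso fit, with the Gaussian log-likelihood written up to an additive constant, is given below (helper name ours).

```r
# EBIC for an estimated precision matrix; gamma = 0 gives the classical BIC.
ebic_glasso <- function(Omega_hat, S, n, gamma = 0.5) {
  loglik <- (n / 2) * (as.numeric(determinant(Omega_hat, logarithm = TRUE)$modulus) -
                       sum(diag(S %*% Omega_hat)))
  kappa  <- sum(Omega_hat[upper.tri(Omega_hat)] != 0)        # number of edges
  -2 * loglik + kappa * log(n) + 4 * kappa * gamma * log(ncol(S))
}
```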

Nodewise regression

Meinshausen and Bühlmann11 proposed to estimate the set of network edges by performing p separate lasso regressions:

$${\hat{{\boldsymbol{\beta }}}}_{i}({\lambda }_{2})=\mathop{{\rm{a}}{\rm{r}}{\rm{g}}{\rm{m}}{\rm{i}}{\rm{n}}}\limits_{{\beta }_{ij}}\left\{\frac{1}{2}{\left\Vert {X}_{i}-\sum _{j\ne i}{\beta }_{ij}{X}_{j}\right\Vert }^{2}+{\lambda }_{2}\sum _{j\ne i}\,|{\beta }_{ij}|\right\},$$
(12)

where \({\hat{\beta }}_{i}\) is a vector that contains the p − 1 estimated regression weights of node i and λ2 > 0 is the regularization parameter that controls the number of non-zero elements in the neighborhood of node i. The set of edges can be computed with the AND-rule:

estimate an edge between nodes i and j ⇔ \({\hat{\beta }}_{ij}\) ≠ 0 and \({\hat{\beta }}_{ji}\) ≠ 0

yielding the NR-AND procedure.

Alternatively, we can use the NR-OR method and compute the edge set with the OR-rule:

estimate an edge between nodes i and j ⇔ \({\hat{\beta }}_{ij}\) ≠ 0 or \({\hat{\beta }}_{ji}\) ≠ 0.

Next, the partial correlation matrix can be computed using the relation between the prediction errors of the best linear predictor of each node and the partial correlation coefficients (see Lemma 1 in Peng et al.12).
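A compact sketch of nodewise regression with glmnet and the AND/OR rules, for a fixed node-specific λ2 (the tuning procedures are discussed next; the bookkeeping of column ordering is ours):

```r
library(glmnet)

nodewise_edges <- function(X, lambda2, rule = c("AND", "OR")) {
  rule <- match.arg(rule)
  p <- ncol(X)
  B <- matrix(0, p, p)                       # B[i, j]: weight of X_j in the regression of X_i
  for (i in 1:p) {
    fit <- glmnet(X[, -i], X[, i], lambda = lambda2)
    B[i, -i] <- as.numeric(coef(fit))[-1]    # drop the intercept
  }
  A <- if (rule == "AND") (B != 0) & (t(B) != 0) else (B != 0) | (t(B) != 0)
  diag(A) <- FALSE
  A * 1                                      # binary adjacency matrix
}
```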

To select the tuning parameter λ2 for each regression separately, we generate a grid of 100 possible values using the sequence generated with the function glmnet of the R package glmnet57. We consider four different tuning procedures. The first, NR-CV, performs K-fold cross-validation. Discarding the k-th subset, we estimate the vector of regression weights \({\hat{\beta }}_{i}\) using a lasso regression. We select the value of λ2 that minimizes the following loss function:

$${\rm{CV}}({\lambda }_{2})=\mathop{\sum }\limits_{k=1}^{K}\,{\left\Vert {X}_{i}^{k}-\sum _{j\ne i}{\hat{\beta }}_{ij}{X}_{j}^{k}\right\Vert }^{2},$$
(13)

where Xik are the observations in the discarded subset k.

The second approach adapts this cross-validation approach by using the one-standard-error-rule. We denote this procedure NR-CV-1se.

The third procedure to select the regularization parameter, NR-BIC, involves computing the Bayesian Information Criterion (BIC) for different values of λ2. For each node, we select the value of λ2 that minimizes the following loss function:

$${{\rm{BIC}}}_{i}({\lambda }_{2})=n{\rm{RSS}}({\hat{{\boldsymbol{\beta }}}}_{i})+{\kappa }_{i}\,\log (n)$$
(14)

where RSS(⋅) is the value of the residual sum of squares for the i-th regression and κi is the number of elements in the estimated neighborhood of node i.

The fourth procedure is NR-FSR and uses a finite sample result. Meinshausen and Bühlmann11 show that, under certain assumptions regarding the sparsity and regularity conditions of the covariance matrix and the regression weights, the estimated neighborhood of a node i contains false positive edges with probability at most α ∈ (0, 1) if the \({\ell }_{1}\) penalty parameter is set as \({\lambda }_{2}(\alpha )=\frac{2}{\sqrt{n}}{\Phi }^{-1}(1-\frac{\alpha }{2{p}^{2}})\), where Φ−1 is the inverse of the c.d.f. of N(0, 1). We set this bound α to 0.05.
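The corresponding penalty level can be computed directly; the helper name below is ours and standardized data are assumed, as in the formula above.

```r
# Finite-sample penalty weight for nodewise regression.
lambda2_fsr <- function(n, p, alpha = 0.05) {
  (2 / sqrt(n)) * qnorm(1 - alpha / (2 * p^2))
}
```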

Joint sparse linear regression

Peng et al.12 proposed to estimate the partial correlation matrix by minimizing the following joint sparse regression (SPACE):

$$\hat{\Gamma }({\lambda }_{3})=\mathop{{\rm{a}}{\rm{r}}{\rm{g}}{\rm{m}}{\rm{i}}{\rm{n}}}\limits_{{\rho }_{ij},{\omega }_{ii}}\left \{\frac{1}{2}\left(\mathop{\sum }\limits_{i=1}^{p}\,{\left\Vert {X}_{i}-\sum _{j\ne i}{\rho }_{ij| V{\rm{\setminus }}\{i,j\}}\sqrt{\frac{{\omega }_{jj}}{{\omega }_{ii}}}{X}_{j}\right\Vert }^{2}\right)+{\lambda }_{3}\sum _{1\le i < j\le p}\,|{\rho }_{ij}|\right\},$$
(15)

where ωii is the i-th diagonal element of the matrix Ω, whose reciprocal 1/ωii equals the residual variance of the optimal prediction of Xi given all remaining variables, and λ3 > 0 is the regularization parameter that controls the number of non-zero elements in the partial correlation matrix Γ.

Given a grid of 100 equidistant values for λ3 ranging from \(\sqrt{n}{\Phi }^{-1}(1-\frac{0.9}{2{p}^{2}})\) to \(\sqrt{n}{\Phi }^{-1}(1-\frac{{10}^{-4}}{2{p}^{2}})\), we consider four different procedures to calibrate the tuning parameter λ3. The first, SPACE-CV, performs K-fold cross-validation: we split the sample into K subsets and select the parameter value that minimizes the following loss function:

$${\rm{C}}{\rm{V}}({\lambda }_{3})=\mathop{\sum }\limits_{k=1}^{K}\,\mathop{\sum }\limits_{i=1}^{p}\,{\left\Vert {X}_{i}^{k}-\sum _{j\ne i}{\hat{\rho }}_{ij| V{\rm{\setminus }}\{i,j\}}\sqrt{\frac{{\hat{\omega }}_{jj}}{{\hat{\omega }}_{ii}}}{X}_{j}^{k}\right\Vert }^{2}.$$
(16)

The second procedure again adapts this cross-validation approach by using the one-standard-error-rule. We denote this procedure SPACE-CV-1se.

The third procedure, SPACE-BIC, involves computing the Bayesian Information Criterion (BIC) for the 100 values of λ3. First, we compute for each node the residual sum of squares:

$${{\rm{R}}{\rm{S}}{\rm{S}}}_{i}({\hat{\rho }}_{ij| V{\rm{\setminus }}\{i,j\}},{\hat{\omega }}_{ii})={\left\Vert {X}_{i}-\sum _{j\ne i}{\hat{\rho }}_{ij| V{\rm{\setminus }}\{i,j\}}\sqrt{\frac{{\hat{\omega }}_{jj}}{{\hat{\omega }}_{ii}}}{X}_{j}\right\Vert }^{2},$$

Next, we select the value of λ3 by minimizing:

$${\rm{B}}{\rm{I}}{\rm{C}}({\lambda }_{3})=\mathop{\sum }\limits_{i=1}^{p}\,\left(n{{\rm{R}}{\rm{S}}{\rm{S}}}_{i}({\hat{\rho }}_{ij| V{\rm{\setminus }}\{i,j\}},{\hat{\omega }}_{ii})+{\kappa }_{i}\,\log (n)\right)$$
(17)

where κi is the number of elements in the estimated neighborhood of node i.

The fourth procedure, SPACE-FSR, is based on the finite sample result by Peng et al.12. These authors show that, under certain assumptions regarding the sparsity and regularity conditions of the covariance matrix and the regression weights, the estimated neighborhood of a node i contains false positive edges with probability at most α ∈ (0, 1) if the penalty parameter is set as \({\lambda }_{3}(\alpha )=\sqrt{n}{\Phi }^{-1}(1-\frac{\alpha }{2{p}^{2}})\), where Φ−1 is the inverse of the c.d.f. of N(0, 1). We again set this bound α to 0.05.

Partial correlation estimation using ridge regression

Ha and Sun19 proposed to estimate a penalized partial correlation matrix using a ridge penalty. We apply a simpler version of their method by performing p separate ridge regressions:

$${\hat{{\boldsymbol{\delta }}}}_{i}({\lambda }_{4})=\mathop{{\rm{a}}{\rm{r}}{\rm{g}}{\rm{m}}{\rm{i}}{\rm{n}}}\limits_{{\delta }_{ij}}\left\{\frac{1}{2}{\left\Vert {X}_{i}-\sum _{j\ne i}{\delta }_{ij}{X}_{j}\right\Vert }^{2}+{\lambda }_{4}\sum _{j\ne i}\,{\delta }_{ij}^{2}\right\},$$
(18)

where \({\hat{\delta }}_{i}\) is a vector that contains the p − 1 estimated regression weights for node i and λ4 > 0 is the regularization parameter that controls the amount of shrinkage of the regression weights toward zero in the neighborhood of node i. The partial correlation matrix is computed using the relation between the prediction errors of the best linear predictor of each node and the partial correlation coefficients (see Lemma 1 in Peng et al.12).

To select the tuning parameter λ4 for each regression separately we generate a grid of 100 possible values using the sequence generated with the function glmnet of the R package glmnet57. We select the regularization parameter by performing K-fold cross-validation. Discarding the k-th subset we estimate the vector of regression weights \({\hat{\delta }}_{i}\) using ridge regression. We select the value of λ4 that minimizes the following loss function:

$${\rm{CV}}({\lambda }_{4})=\mathop{\sum }\limits_{k=1}^{K}\,{\left\Vert {X}_{i}^{k}-\sum _{j\ne i}{\hat{\delta }}_{ij}{X}_{j}^{k}\right\Vert }^{2},$$
(19)

where Xik are the observations in the discarded subset k. We denote this procedure Ridge-CV.
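In practice this can be done with cv.glmnet using alpha = 0 (ridge); the sketch below handles a single node i and is a simplified stand-in for the procedure described above, with X denoting the n × p data matrix.

```r
library(glmnet)
i <- 1                                                           # node index, for illustration
cvfit   <- cv.glmnet(X[, -i], X[, i], alpha = 0, nfolds = 10)    # 10-fold CV over lambda4
delta_i <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]         # ridge weights of node i
```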

Partial correlation screening procedure

In this subsection we present the technical details of the Partial Correlation Screening (PCS) algorithm. The procedure estimates the set of edges in two steps. In the first step, we determine a sparse partial correlation network, denoted by \(\hat{\Gamma }=[{\hat{\rho }}_{ij|V\backslash \{i,j\}}]\), using one of the methods that we discussed in the previous subsection.

In the second step of the algorithm, we detect unimportant pairs of variables by thresholding the partial correlations estimated in the first step. For i ∈ V and a threshold parameter τ ∈ (0, 1), we estimate the neighborhood of node i as follows

$${\hat{\mathscr A}}_{i,\tau }=\{j\in V\backslash \{i\}:|{\hat{\rho }}_{ij|V\backslash \{i,j\}}| > \tau \}.$$
(20)

The algorithm outputs the estimated set of edges for a given threshold τ:

$${\hat{E}}_{\tau }=\{(i,j)\in V\times V:|{\hat{\rho }}_{ij|V\backslash \{i,j\}}| > \tau \}.$$
(21)

Finally, the prediction error of the regression of each node i conditioned on the variables that belong to the estimated neighborhood set \({\hat{\mathscr A}}_{i,\tau }\) is given by

$${\hat{\varepsilon }}_{i,\tau }={X}_{i}-\sum _{j\in {\hat{\mathscr A}}_{i,\tau }}\,{\hat{\theta }}_{ij,\tau }{X}_{j},$$

where \({\hat{\theta }}_{i,\tau }\) is the vector of estimated regression coefficients of node i ∈ V given the variables in the estimated neighborhood set \({\hat{\mathscr A}}_{i,\tau }\).
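The screening step itself is a one-line operation on the estimated partial correlation matrix; a minimal sketch (helper name ours):

```r
# Set all partial correlations with |rho| <= tau to zero (Eqs. 20-21).
pcs_threshold <- function(Gamma_hat, tau) {
  Gamma_hat[abs(Gamma_hat) <= tau] <- 0
  diag(Gamma_hat) <- 1
  Gamma_hat
}
```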

Choice of the tuning parameter

To select the threshold parameter τ, we perform K-fold cross-validation. We generate a sequence of 100 equidistant values for the threshold τ ranging from 0.0001 to 1. The procedure to select the threshold uses a double loop. First, for each of the estimation procedures proposed in the previous subsection, we select the regularization parameter λ. Second, we split the sample into K subsets. Using all but the k-th subset, we estimate a sparse partial correlation network using the selected regularization parameter λ. Next, for each value of τ in the grid, we estimate the neighborhood of each node (see Eq. (20)) and the regression weights vector \({\hat{\theta }}_{i,\tau }\). For each value of τ we compute the following loss function:

$$CV(\tau )=\mathop{\sum }\limits_{k=1}^{K}\,\mathop{\sum }\limits_{i=1}^{p}\,{\left\Vert {X}_{i}^{k}-\sum _{j\in {\hat{\mathscr A}}_{i,\tau }}{\hat{\theta }}_{ij,\tau }{X}_{j}^{k}\right\Vert }^{2}.$$
(22)

We plot CV(τ) versus τ and we select the threshold parameter that minimizes the loss function CV(τ).
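A simplified R sketch of this double loop is given below; estimate_gamma is a placeholder for any first-step estimator (Glasso, SPACE, NR or Ridge) whose regularization parameter has already been selected, and refitting the weights by ordinary least squares on the selected neighborhood is our reading of how \({\hat{\theta }}_{i,\tau }\) is obtained.

```r
pcs_cv_tau <- function(X, estimate_gamma, K = 10,
                       tau_grid = seq(0.0001, 1, length.out = 100)) {
  n <- nrow(X); p <- ncol(X)
  folds  <- sample(rep(1:K, length.out = n))
  cv_err <- numeric(length(tau_grid))
  for (k in 1:K) {
    Xtr <- X[folds != k, , drop = FALSE]
    Xte <- X[folds == k, , drop = FALSE]
    Gamma_hat <- estimate_gamma(Xtr)                             # first-step sparse estimate
    for (g in seq_along(tau_grid)) {
      for (i in 1:p) {
        nb <- setdiff(1:p, i)[abs(Gamma_hat[i, -i]) > tau_grid[g]]   # neighborhood of node i
        if (length(nb) == 0) {
          pred <- 0                                              # empty neighborhood: predict zero
        } else {
          theta <- coef(lm(Xtr[, i] ~ Xtr[, nb, drop = FALSE] - 1))  # refitted weights
          pred  <- Xte[, nb, drop = FALSE] %*% theta
        }
        cv_err[g] <- cv_err[g] + sum((Xte[, i] - pred)^2)        # Eq. (22)
      }
    }
  }
  tau_grid[which.min(cv_err)]
}
```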