Inferring functional connectivity in fMRI using minimum partial correlation

Functional connectivity has emerged as a promising approach to study the functional organisation of the brain and to define features for prediction of brain state. The most widely used method for inferring functional connectivity is Pearson's correlation, but it cannot differentiate direct and indirect effects. This disadvantage is often avoided by computing the partial correlation between two regions controlling all other regions, but this method suffers from Berkson's paradox. Some advanced methods, such as regularised inverse covariance, have been applied; however, these methods usually depend on tuning parameters. Here we propose use of minimum partial correlation as a parameter-free measure for the skeleton of functional connectivity in functional magnetic resonance imaging (fMRI). The minimum partial correlation between two regions is the minimum of the absolute values of the partial correlations obtained by controlling all possible subsets of other regions. Theoretically, there is a direct effect between two regions if and only if their minimum partial correlation is non-zero under the faithfulness and Gaussian assumptions. The elastic PC-algorithm is designed to efficiently approximate minimum partial correlation within a computational time budget. A simulation study shows that the proposed method outperforms others in most cases, and its application is illustrated using a resting-state fMRI dataset from the human connectome project.


Introduction
Functional connectivity is defined as measurable temporal dependences among spatially segregated brain regions of interest (ROIs) [1]. It is a statistical summarisation of brain signals rather than a generative model of brain functions. Although functional connectivity cannot give a comprehensive interpretation of how the brain works, it can show important features of brain activities at a macroscale level. For example, functional connectivity can inspire hypotheses of brain organisation [2] and contribute to predictive models [3], e.g., to characterise pathological [4] or cognitive states [5].
Implicit in interpretations is the hypothesis that functional connectivity describes true connections (i.e., direct causal effects) between nodes (i.e., distinguishable functional anatomic regions of interest). The accuracy of functional connectivity is typically evaluated in three stages: in terms of the "skeleton" or organisation, strength and, in some instances, directionality. These stages can be considered to be conceptually sequential. This paper focuses on inferring the skeleton, which is to determine whether a connection between two ROIs exists or not. Smith et al. [6] systematically investigated the accuracy of 12 methods for skeleton inference using 28 synthetic networks and their corresponding functional magnetic resonance imaging (fMRI) blood-oxygen-level dependent (BOLD) signals generated from the dynamic causal model (DCM) [7]. The results show that the top methods are inverse covariance (ICOV), fully partial correlation, Bayes net and full correlation.
Among the above four methods, full correlation is the most widely used. Application of full correlation methods to fMRI datasets enables inferences regarding brain structures [8] and provides powerful features for brain "decoding" [9]. However, while full correlation is robust and parameter-free, it has well-known intrinsic limitations in differentiating direct and indirect effects. Fig. 1 shows an example of differentiating direct and indirect effects in the network where Node 1 → Node 2 → Node 3. The full correlation between Node 1 and Node 3 is as high as 1.0, but there is no real connection between the two nodes.
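The indirect-effect pitfall in Fig. 1 is straightforward to reproduce numerically. The sketch below (edge weights and noise levels are illustrative choices of ours, not taken from the NetSim simulations) generates a chain Node 1 → Node 2 → Node 3 and shows that the full correlation between the two unconnected end nodes is close to 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000  # number of time points

# Chain network: Node 1 -> Node 2 -> Node 3 (no direct 1-3 edge).
x1 = rng.normal(size=T)
x2 = x1 + 0.1 * rng.normal(size=T)
x3 = x2 + 0.1 * rng.normal(size=T)

# Full (Pearson's) correlation between Node 1 and Node 3 is close to 1.0
# even though the effect of Node 1 on Node 3 is entirely indirect.
r13 = np.corrcoef(x1, x3)[0, 1]
print(round(r13, 2))  # close to 1.0
```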
Fully partial correlation [10] can often successfully identify the direct effects in Fig. 1 by regressing out the intermediate node. The fully partial correlation of two nodes is their correlation when all other nodes in the network are controlled. "Fully partial correlation" was simply called "partial correlation" in previous studies of functional connectivity. Mathematically, partial correlation expresses the correlation when any subset of nodes is controlled. To avoid ambiguity, fully partial correlation and partial correlation are explicitly distinguished in this paper. It is difficult to estimate the fully partial correlation when the number of data points is less than the number of nodes. Methods based on a sparse linear regression model with L1-norm regularisation [11] and with both L1-norm and L2-norm regularisation [12] were developed. In [6], it is shown that fully partial correlation can perform well. Recent studies show that the fully partial correlation of fMRI signals is correlated with lifestyle, demographic and psychometric measures [13,14]. However, this strategy sometimes causes two independent nodes to become conditionally dependent, which is known as Berkson's paradox [15]. As illustrated by Fig. 2, there is no connection between Node 1 and Node 2, but the fully partial correlation between these two nodes is very close to 1.0. ICOV is a regularised version of fully partial correlation, which tends to reduce the number of non-zero elements of the precision matrix [16,17]. This method was extended to incorporate both functional and diffusion imaging data [18]. ICOV may correct Berkson's paradox if the regularisation parameter is set to an appropriate value. The main drawback of ICOV is that its accuracy depends heavily on the regularisation parameter. As observed in Fig. 3(a), the absolute value of ICOV between Node 1 and Node 3 is significantly smaller than those of other node pairs when the value of the regularisation parameter is smaller than 1.0, which means the direct and indirect effects can be correctly distinguished. However, the absolute value of ICOV between Node 1 and Node 3 is close to the values of other node pairs if the value of the regularisation parameter is larger than 1.0. Thus, the direct and indirect effects cannot be identified. One possible way to overcome this drawback is to use cross-validation to automatically adjust the regularisation parameter.
For real datasets, cross-validation scores were usually calculated based on the accuracy of predicted BOLD signals rather than of inferred connections [17], because the real connections are unknown. However, the validity of tuning the regularisation parameter in this way is still doubtful for two reasons. One reason is that a parameter value that results in accurate prediction (e.g., of fMRI BOLD signals) is not necessarily the same as the one that is likely to recover the true model (e.g., real connections) [19]. The other reason is that functional connectivity provides a data summary rather than a generative model [1], so it may not be reasonable to use functional connectivity to replicate fMRI BOLD signals for cross-validation. Several other approaches were also proposed for overcoming this drawback. Utilising the monotone property of the lasso path of ICOV, Huang et al. [20] used the largest regularisation parameter value that preserves the existence of a connection as a quasi-measure of the strength of this connection. Recently, a statistic also based on the lasso path was proposed to test whether a regularised inverse covariance matrix represents all the real connections [21,22]. The statements of their method are asymptotic and based on strong mathematical assumptions. Even worse, the proper value of the regularisation parameter for a sub-network may not be suitable for other sub-networks. The existence of an appropriate value for a large-scale network is still unclear.
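Berkson's paradox from Fig. 2 can be reproduced just as easily. In the sketch below (weights again illustrative, not from the paper), Node 1 and Node 2 are independent causes of the common child Node 3; once the child is controlled, the two parents appear strongly anti-correlated even though no edge connects them:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

# Collider: Node 1 -> Node 3 <- Node 2; Nodes 1 and 2 are independent.
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
x3 = x1 + x2 + 0.1 * rng.normal(size=T)

def partial_corr(a, b, controls):
    """Correlation of a and b after regressing out the control series."""
    Z = np.column_stack([np.ones(len(a))] + list(controls))
    res_a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    res_b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

c12 = np.corrcoef(x1, x2)[0, 1]   # near 0: the parents really are independent
p12 = partial_corr(x1, x2, [x3])  # near -1: Berkson's paradox
print(round(c12, 2), round(p12, 2))
```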
Here we propose use of minimum partial correlation as a parameter-free measure for the skeleton of functional connectivity in fMRI. The minimum partial correlation between two nodes is the minimum of the absolute values of the partial correlations obtained by controlling all possible subsets of other nodes. Fully partial correlation, which controls all other nodes rather than a proper subset of them, can be regarded as an approximation of minimum partial correlation. Theoretically, there is a direct effect between two nodes if and only if their minimum partial correlation is non-zero under the faithfulness assumption [23] and the Gaussian assumption. However, it is time-consuming to calculate minimum partial correlation, since the number of possible controlling subsets grows exponentially with the number of nodes. To address this, we have modified the PC-algorithm to approximate the minimum partial correlation, yielding an "elastic PC-algorithm". Unlike the PC-algorithm and its other variations, the elastic PC-algorithm is not based on a specific significance threshold. Within a given time budget, the elastic PC-algorithm approaches the minimum partial correlation as closely as possible. We have evaluated the elastic PC-algorithm using the NetSim dataset [6] and illustrate its application with a resting-state fMRI dataset from the human connectome project [24]. This paper is an extension of its conference version [25]; more details of the algorithm and applications are provided in this journal version.

Preliminaries
In this paper, a graph is denoted as G = (V, E), which consists of a node set V = {r1, r2, · · · , rN} and an edge set E. A directed graph contains only directed edges; an undirected graph contains only undirected edges. A graph is usually represented by an adjacency matrix MG. For a directed graph, there is an edge from the node ri to the node rj if and only if the element mji in the j-th row and i-th column is non-zero. For an undirected graph, the adjacency matrix is symmetric, and a non-zero element in the adjacency matrix represents an undirected edge. Given a directed graph, its skeleton is the undirected graph obtained by stripping away all arrowheads from its edges and removing redundant undirected edges. If there is a directed edge from the node ri to the node rj, then the node ri is a parent of the node rj and the node rj is a child of the node ri. If there is a directed path from the node ri to the node rj, then the node ri is an ancestor of the node rj and the node rj is a descendant of the node ri.
Given a directed acyclic graph (DAG) G = (V, E), each node denotes a random variable and each directed edge represents a causal effect from one random variable to another. A causal model represented by G generates a joint probability distribution over the nodes, which is denoted as P(V).
Definition 1. For a DAG G = (V, E) and a distribution P(V) generated by the causal model represented by G, G and P satisfy the causal Markov condition if and only if, for every node r ∈ V, r is independent of V \ (Descendants(r) ∪ Parents(r)) given Parents(r) [23].
When the causal Markov condition is satisfied, the conditional independence relationships in the generated distribution P are entailed in the DAG G via the concept of d-separation. For a 3-node path r1 → r2 ← r3, the middle node r2 is called a collider.

Definition 2. A path is said to be d-separated by a set Z if and only if it contains at least one non-collider that is in Z, or at least one collider such that neither it nor any of its descendants is in Z. Two nodes are said to be d-separated by a set Z if and only if all paths between the two nodes are d-separated by Z [23].
If two nodes ri and rj in DAG G are d-separated by a node set Z, then ri and rj are conditionally independent in distribution P given Z. There is no edge between nodes ri and rj if and only if they can be d-separated by some node set. This means that the nodes ri and rj are conditionally independent in the distribution P given some node set if there is no edge in DAG G between the two nodes. However, the reverse statement is not true: for some DAG G and distribution P that satisfy the causal Markov condition, the conditional independence between two nodes ri and rj does not necessarily imply the absence of an edge between ri and rj. In order to exclude these cases, two equivalent concepts, stability [26] and faithfulness [23], are introduced.

Definition 3. A distribution P is faithful to a DAG G if all and only the conditional independence relations in P are entailed by G, i.e., under the faithfulness assumption, there is no edge between two nodes ri and rj in G if and only if ri and rj are conditionally independent in P given some node set [23].
For the details of Bayesian networks and causal Bayesian networks, please refer to the two seminal texts [23,26]. For references to applications of Bayesian networks in neuroscience, refer to two recent reviews [27,28].

Minimum partial correlation
In this paper, we assume the mechanism that generates BOLD signals can be represented by a DAG G = (V, E), where V = {r1, r2, · · · , rN} denotes the N ROIs. Given T samples {x^t_i}, the Pearson's correlation between two nodes ri and rj can be estimated as

ρ̂i,j = Σ_{t=1..T} (x^t_i − x̄i)(x^t_j − x̄j) / √( Σ_{t=1..T} (x^t_i − x̄i)² · Σ_{t=1..T} (x^t_j − x̄j)² ),  (1)

where x̄i and x̄j are sample means of the random variables ri and rj. The Pearson's correlation coefficient is also called full correlation. Full correlation counts both direct and indirect effects between two random variables. To remove indirect effects, we can calculate the second-order correlation between ri and rj by removing the effects of a node set Z ⊆ (V − {ri, rj}). This second-order correlation is called the partial correlation ρi,j•Z by controlling the node set Z. Similarly, partial correlation can be estimated from samples:

ρ̂i,j•Z = Σ_{t=1..T} ε^t_{i•Z} ε^t_{j•Z} / √( Σ_{t=1..T} (ε^t_{i•Z})² · Σ_{t=1..T} (ε^t_{j•Z})² ),  (2)

where {ε^t_{i•Z}} is the set of residuals of the linear regression with ri as the response variable and Z as the predictor variable set, and {ε^t_{j•Z}} is defined analogously. The sample partial correlation is the sample full correlation of the residuals once the controlling factors have been regressed out.
The partial correlation between two nodes varies with the controlling set.For a node set V , the minimum partial correlation (MPC) between two nodes is the minimum of absolute values of partial correlations by controlling all possible subsets of other nodes, which is formally defined as follows.
Definition 4. Given two nodes ri and rj in a node set V, the minimum partial correlation ωi,j between the two nodes is

ωi,j = min_{Z ∈ 2^(V−{ri,rj})} |ρi,j•Z|,  (3)

where ρi,j•Z is the partial correlation between ri and rj by controlling the node set Z, and 2^(V−{ri,rj}) denotes the power set of V − {ri, rj}, i.e., all subsets of the set V − {ri, rj}. A controlling set Z that satisfies |ρi,j•Z| = ωi,j is called a minimum controlling set for nodes ri and rj.

Next, we will discuss an important property of minimum partial correlation, which implies that it is a good measure of the existence of functional connectivity.
Proposition 1. Assuming the distribution P of the nodes V in a DAG G is multivariate Gaussian and faithful to G, there is an edge between the two nodes ri and rj in G if and only if the minimum partial correlation ωi,j ≠ 0.
According to [29], the partial correlation ρi,j•Z = 0 if and only if ri and rj are conditionally independent given Z under the Gaussian assumption. Based on this result and Definition 3, Proposition 1 can be easily proved.
According to Proposition 1, minimum partial correlation is a measure of the existence of functional connectivity (i.e., of the links in functional networks). Because minimum partial correlation is symmetric, it cannot infer the directions of functional connectivity. According to Definition 4, minimum partial correlation can be calculated based on partial correlations alone. Thus, minimum partial correlation can be estimated from samples; the estimate is denoted as ω̂i,j.
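Because Definition 4 involves nothing beyond sample partial correlations, a brute-force estimator of minimum partial correlation is easy to write down, though its cost grows exponentially with the number of nodes. The sketch below (function names and weights are ours) enumerates every controlling subset for a toy 3-node chain; the minimum vanishes for the unconnected end pair but stays clearly non-zero for a directly connected pair:

```python
import itertools
import numpy as np

def partial_corr(x, i, j, Z):
    """Sample partial correlation of columns i and j of x, controlling the
    columns in Z, computed as the correlation of regression residuals."""
    A = np.column_stack([np.ones(x.shape[0])] + [x[:, k] for k in Z])
    res_i = x[:, i] - A @ np.linalg.lstsq(A, x[:, i], rcond=None)[0]
    res_j = x[:, j] - A @ np.linalg.lstsq(A, x[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

def min_partial_corr(x, i, j):
    """Brute-force minimum partial correlation: the minimum of |rho_{i,j.Z}|
    over every subset Z of the remaining nodes (exponential in node count)."""
    others = [k for k in range(x.shape[1]) if k not in (i, j)]
    return min(abs(partial_corr(x, i, j, Z))
               for r in range(len(others) + 1)
               for Z in itertools.combinations(others, r))

# Toy chain 0 -> 1 -> 2 with illustrative weights.
rng = np.random.default_rng(0)
T = 5_000
x = np.empty((T, 3))
x[:, 0] = rng.normal(size=T)
x[:, 1] = x[:, 0] + 0.3 * rng.normal(size=T)
x[:, 2] = x[:, 1] + 0.3 * rng.normal(size=T)
print(round(min_partial_corr(x, 0, 2), 2))  # near 0: only an indirect effect
print(round(min_partial_corr(x, 0, 1), 2))  # clearly non-zero: a direct edge
```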
We denote the cardinality of the controlling node set Z as |Z|. According to [30], the distribution of the estimated partial correlation ρ̂i,j•Z is the same as that of a full correlation estimated from T − |Z| samples. Under the hypothesis that ρi,j•Z = 0, the z-score ẑi,j•Z of the estimated partial correlation ρ̂i,j•Z can be calculated as follows:

ẑi,j•Z = √(T − |Z| − 3) · (1/2) ln( (1 + ρ̂i,j•Z) / (1 − ρ̂i,j•Z) ).  (4)

However, the distribution of the estimated minimum partial correlation is unknown. In this paper, we approximate the absolute value of the z-score of the estimated minimum partial correlation ω̂i,j as

|ẑ(ω̂i,j)| ≈ min_{Z ∈ 2^(V−{ri,rj})} |ẑi,j•Z|.  (5)

Given the BOLD signal samples from all ROIs of a brain, we can calculate, for each pair of ROIs, the estimated minimum partial correlation or the absolute value of its z-score. These values form a symmetric matrix Ω, which represents the skeleton of functional connectivity in a fractional way. A pair of ROIs with a higher estimated minimum partial correlation, or a higher absolute value of the z-score, is more likely to have a link between them. The matrix Ω is therefore a parameter-free measure of functional connectivity, which can be used for decoding cognitive states, evaluating brain disease, etc.
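The z-score of an estimated partial correlation is the standard Fisher transform, with the effective sample size reduced by the cardinality of the controlling set. A minimal helper consistent with that convention (the function name is ours):

```python
import numpy as np

def fisher_z(rho, T, card_Z):
    """Fisher-transformed z-score of an estimated partial correlation,
    treated as a full correlation estimated from T - |Z| samples."""
    return np.sqrt(T - card_Z - 3) * 0.5 * np.log((1 + rho) / (1 - rho))

# Under the null hypothesis rho_{i,j.Z} = 0 the z-score is approximately
# standard normal, so |z| > 1.96 rejects independence at the 5% level.
print(round(fisher_z(0.3, T=200, card_Z=2), 2))  # → 4.32
```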
There are three advantages of parameter-free methods for inferring functional connectivity. The most obvious one is that parameter-free methods are user-friendly: no prior knowledge or experience is required. A second advantage is that parameter-free methods increase the comparability of results across studies. If a method has tuning parameters, its results usually vary with the values of those parameters; even if all experimental conditions are identical in two studies, it is not valid to compare their results if different parameter values were used. A third advantage is that the interpretability of parameter-free methods is usually better. Taking ICOV as an example, the physiological meaning of a connection that exists under some values of the regularisation parameter but not under others is hard to explain.

Elastic PC-algorithm
Estimating the partial correlation for a given controlling node set takes polynomial time. However, estimating minimum partial correlation requires enumerating all possible controlling sets, the number of which is exponential. Several algorithms for causal inference can be easily modified to calculate minimum partial correlation, such as the SGS-algorithm [23], PC-algorithm [23], TPDA [31], MMHC [32] and TPMB [33].
This paper focuses on extending the PC-algorithm to approximately calculate minimum partial correlation. The PC-algorithm is composed of two main steps: skeleton graph inference and edge direction determination. This paper aims at generating functional connectivity without edge orientations; thus we concentrate on the first step of the PC-algorithm.
The PC-algorithm uses an important feature of Bayesian networks to achieve fast computation: if two nodes ri and rj can be d-separated, then there is a subset of their neighbours which d-separates them. Therefore, there is no need to enumerate all possible controlling sets. Instead, the PC-algorithm only considers subsets of the neighbour sets. Because the structure of the graph is not known in advance, the PC-algorithm implements an iterative strategy: each step in the PC-algorithm is based on the graph reduced by the previous steps. The edges of the graph are removed according to a statistical test and a significance threshold. The more edges are removed, the smaller the search space is. However, if an edge is deleted incorrectly, the search space will be improperly reduced, and these mistakes will propagate to the following steps.
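The iterative strategy can be sketched compactly. The version below is the plain PC skeleton phase with a pluggable conditional-independence test, not the PC-stable variant that the elastic PC-algorithm extends; the function names and the toy d-separation oracle are ours:

```python
import itertools

def pc_skeleton(nodes, indep_test, alpha):
    """PC-algorithm skeleton phase (simplified). indep_test(i, j, Z) returns
    a p-value for conditional independence of i and j given Z. Starting from
    the complete graph, an edge is removed as soon as a separating set is
    found among the current neighbours; the subset size k grows until no
    node pair has enough remaining neighbours."""
    adj = {i: set(nodes) - {i} for i in nodes}
    k = 0
    while any(len(adj[i] - {j}) >= k for i in nodes for j in adj[i]):
        for i in nodes:
            for j in list(adj[i]):
                for Z in itertools.combinations(adj[i] - {j}, k):
                    if indep_test(i, j, Z) > alpha:  # cannot reject independence
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        k += 1
    return {frozenset((i, j)) for i in nodes for j in adj[i]}

# d-separation oracle for the chain 0 - 1 - 2: only nodes 0 and 2 become
# independent, and only when node 1 is controlled.
def oracle(i, j, Z):
    return 1.0 if {i, j} == {0, 2} and 1 in Z else 0.0

edges = pc_skeleton([0, 1, 2], oracle, alpha=0.05)
print(sorted(tuple(sorted(e)) for e in edges))  # [(0, 1), (1, 2)]
```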
Edge deletion depends heavily on the choice of the significance threshold α. The null hypothesis of the statistical test in the PC-algorithm is that there is no edge between the selected two nodes. With a lower significance threshold α, it is less likely that the null hypothesis is rejected; in other words, edges are more likely to be removed, which increases type II error. To increase the accuracy of edge removal at each step, a larger α is required. However, increasing α increases the calculation time, as more edges remain. Therefore, the significance threshold α balances the accuracy of the result against the complexity of computation.
It is difficult to determine α optimally, because the accuracy and execution time for a given α cannot be predicted in advance. A common choice of α is 0.05. We can first set α to 0.05 and gradually increase its value to obtain further results with higher accuracy. This avoids a situation in which the PC-algorithm runs for an undesirably long time when evaluating a large value of α. For example, suppose we would like to choose α from the set {0.05, 0.1, 0.15, 0.2, 0.25}. We first run the PC-algorithm with α = 0.05 and then with larger values. If the algorithm keeps running for a long time at α = 0.2, we can simply record the results for α = 0.15, as they are the most accurate results obtainable within an acceptable execution time.
Although it has been demonstrated that the above approach achieved satisfactory accuracy for inferring functional connectivity in simulated fMRI datasets [34], it brings in extra computational load, as the results with a lower significance threshold are not reused in the calculations with higher significance thresholds. To save computational effort, this paper proposes the elastic PC-algorithm, borrowing the concept of elasticity from cloud computing [35]. Our method automatically increases the significance threshold within a given time budget to maximally approach the minimum partial correlation. More specifically, an execution of the elastic PC-algorithm includes several iterations. The execution starts from an iteration with a low significance threshold and continues with iterations at higher thresholds. The execution stops when the given time budget is used up. An iteration with a higher significance threshold can exploit the results of the iterations with lower significance thresholds in order to avoid repeated calculation of the same partial correlations.

[Algorithm 1: pseudocode of the elastic PC-algorithm. The listing is not fully recoverable here; surviving fragments include deleting undirected edges (ri, rj) from the reference skeletons Sα and Sβ, copying C(:, :, k) ← D(:, :, k), and iterating over each ordered node pair (ri, rj) satisfying |adj(Sα, ri)\{rj}| ≥ k and each controlling node set.]

The elastic PC-algorithm is based on the PC-stable algorithm [36], a recent variant of the PC-algorithm. The symbol N⁻¹(•) represents the inverse of the standard normal cumulative distribution function (CDF). The z-score of the full correlation ρi,j and the z-score of the partial correlation ρi,j•Z can be calculated according to (4). The symbol adj(G, ri) indicates the neighbour set of node ri in graph G.
In the elastic PC-algorithm, we introduce a 3D matrix that contains the minimum partial correlation values at different steps; we call this matrix the s-cube for short. The inputs of the elastic PC-algorithm include the signal samples, the significance threshold α, the previous significance threshold β and its corresponding s-cube D. At the beginning, the s-cube C is initialised using the full correlation between every node pair. In the k-th iteration, the reference skeletons Sα and Sβ are derived using the (k − 1)-th slices of the s-cubes C and D respectively. A reference skeleton is used as an approximation of the skeleton of the graph for calculating the neighbour sets of nodes. Afterwards, the k-th slice of C is calculated. For an ordered pair (ri, rj) that is adjacent in Sα but not adjacent in Sβ, the minimum partial correlation must be calculated, as it is not included in D. For an ordered pair (ri, rj) that is adjacent in both Sα and Sβ, some steps of the minimum partial correlation calculation have already been done in D and can be directly reused. That is, if a subset of neighbours Z for (ri, rj) in Sα is also a subset of neighbours for (ri, rj) in Sβ, there is no need to calculate the partial correlation ρi,j•Z. In this way, computational effort is saved by avoiding repeated calculations. The estimated z-score matrix Ω of minimum partial correlation for significance threshold α is C(j, i, N − 2). As shown in Algorithm 2, Algorithm 1 is repeatedly executed with increasing significance thresholds within the given time budget. An example of the elastic PC-algorithm is illustrated in Fig. 4.

Results
In this section, the elastic PC-algorithm was evaluated using a synthetic dataset: NetSim [6] .Its application is illustrated using a resting-state fMRI dataset from the human connectome project [24] .Results with the elastic PC-algorithm were compared with full correlation, fully partial correlation, regularized inverse covariance (ICOV) [16,17] , network deconvolution (ND) [37] and global silencing (GS) [38] .A brief description of these methods is as follows:

Fig. 4 Example of the elastic PC-algorithm. The table in (c) shows the calculations required at Step 1 with the significance threshold α. The column "need calculation?" tells us whether the corresponding partial correlation needs to be calculated or can be directly obtained from D. For the node pairs (r2, r3), (r2, r5) and (r3, r4), whose edges are in red in (a), all partial correlations need to be calculated, as Sβ in (b) does not contain the edges between them. For the node pair (r1, r2), the partial correlation on the set {r5} needs to be calculated, because in Sβ, {r5} is not a subset of neighbours for (r1, r2).

1) Full correlation (Full): The full correlation of two ROIs ri and rj can be calculated using (1).

2) Fully partial correlation (FP): The fully partial correlation of two ROIs ri and rj can be calculated using (2) by setting Z = V \{ri, rj}, where V is the set of all ROIs.

3) ICOV: Let Σsample be the sample covariance matrix calculated according to (1). The ICOV method finds an optimal matrix Λ according to

Λ = argmin_{Λ ≻ 0} [ tr(ΛΣsample) − log det Λ + λ‖Λ‖ℓ1 ],

where ‖•‖ℓ1 is the element-wise ℓ1 norm of the elements of the matrix. ICOV is a parameter-dependent method, and the value of the regularisation parameter λ greatly influences the results. For the simulation dataset, the regularisation parameter λ was set to 5 and 100, as suggested by the designers of this dataset [6]. For the real dataset, a cross-validation method was used to tune the regularisation parameter λ as described in [17], although we do not think this cross-validation method is appropriate.

4) ND: Let Σsample be the sample covariance matrix calculated according to (1). In ND, Σsample is modelled as the sum of the direct dependence Λ and the indirect dependence due to indirect paths. Λ is derived from

Λ = Σsample (I + Σsample)⁻¹,

where I is an identity matrix.

5) GS: In the GS method, the relationship between Λ and Σsample is approximately modelled as

Λ = (Σsample − I + ϕ((Σsample − I) Σsample)) Σsample⁻¹,

where ϕ(•) sets the off-diagonal terms of a matrix to zero.
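Among these baselines, ND has the simplest closed form, and the formula can be sanity-checked by a round trip: if the observed dependence matrix is generated from a direct-dependence matrix through the series of indirect paths, Λ = Σ(I + Σ)⁻¹ recovers the direct part exactly. A small sketch (the toy matrix is ours):

```python
import numpy as np

def network_deconvolution(sigma):
    """ND closed form: direct dependence Lambda = Sigma (I + Sigma)^(-1),
    which inverts Sigma = Lambda + Lambda^2 + Lambda^3 + ... (all paths)."""
    n = sigma.shape[0]
    return sigma @ np.linalg.inv(np.eye(n) + sigma)

# Round trip on a toy direct-dependence matrix (spectral radius < 1, so the
# geometric series of indirect paths converges).
L = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.4],
              [0.0, 0.4, 0.0]])
S = L @ np.linalg.inv(np.eye(3) - L)  # total = direct + all indirect paths
print(np.allclose(network_deconvolution(S), L))  # True
```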

Simulation evaluation
The NetSim dataset [6] designed by Smith et al. was used to evaluate the accuracy of our method. Because this paper focuses on skeletons, we did not evaluate direction inference. The BOLD signals were generated using DCM with the nonlinear balloon model for the "vascular dynamics" [7]. The dynamics of the signals in the neural network were defined as follows:

ṡ = σΦs + Ψu,

where s is the neural time series, ṡ is its rate of change, u is the external input, Ψ contains the weights controlling how the external inputs feed into the network, Φ defines the network connections between nodes with the diagonal elements set to −1, and σ controls both the within-node temporal inertia and the neural lag between nodes. The neural signals were then fed into the nonlinear balloon model for vascular dynamics to generate the BOLD signals. The values of the balloon model parameters were almost identical to the prior means used in [7].
All network topologies were constructed from the same building block: a 5-node ring. The ring is a variation of a chain whose head is joined to its tail. Networks with more than 5 nodes were built by linking several 5-node rings together. 28 networks were designed for different purposes of evaluation; please refer to [6] for the details of these networks. For each network, 50 simulated subjects were generated by slightly changing the parameter values.
Following [6], we used the c-sensitivity of skeleton inference to measure accuracy. Given a directed graph G, its skeleton is the undirected graph obtained by stripping away all arrowheads from its edges and removing redundant undirected edges. Suppose that each element of the symmetric matrix Â represents the estimated "confidence" about whether there is a connection between the corresponding nodes. All elements of the matrix Â are assumed to be non-negative. The distribution of "true positive" (TP) connections includes all the elements of the matrix Â whose corresponding connections are found in the skeleton of graph G. Similarly, the distribution of "false positive" (FP) connections includes all the elements of the matrix Â whose corresponding connections do not exist in the skeleton of graph G. The c-sensitivity is defined as the fraction of true positives that have a higher value in the matrix Â than the 95th percentile of the false positive distribution. The c-sensitivity reflects the ability of the matrix Â to differentiate true connections from spurious connections. If there is no overlap between the TP and FP distributions, the c-sensitivity is 100%. It is worth noting that diagonal elements are ignored when calculating c-sensitivity. Fig. 5 shows an example of calculating c-sensitivity.
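The c-sensitivity computation follows directly from this definition. A sketch for a symmetric score matrix (the helper name and toy matrices are ours; only the off-diagonal upper triangle is scored):

```python
import numpy as np

def c_sensitivity(A_hat, skeleton):
    """Fraction of true-positive scores that exceed the 95th percentile of
    the false-positive score distribution; diagonal elements are ignored."""
    iu = np.triu_indices_from(A_hat, k=1)  # off-diagonal upper triangle
    scores, truth = A_hat[iu], skeleton[iu].astype(bool)
    tp, fp = scores[truth], scores[~truth]
    return np.mean(tp > np.percentile(fp, 95))

# Toy example: edges 0-1 and 1-2 exist; the 0-2 score is spurious.
A_hat = np.array([[0.0, 0.9, 0.1],
                  [0.9, 0.0, 0.8],
                  [0.1, 0.8, 0.0]])
skeleton = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
print(c_sensitivity(A_hat, skeleton))  # 1.0: TP and FP fully separated
```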
We investigated the performance of the different methods by estimating the resulting c-sensitivity values. The results are summarised in Table 1, where each value is the mean over 50 simulations. ICOV-5 and ICOV-100 represent the ICOV method with the regularisation parameter set to 5 and 100, respectively. These settings give good ICOV results according to [6].
The figures highlighted in yellow in Table 1 are the highest c-sensitivity scores in their corresponding simulations. Generally speaking, the elastic PC-algorithm (EPC) performed best, achieving the highest c-sensitivity in 24 out of 28 simulations. Full correlation (Full) and ICOV-100 each worked best in only 1 simulation, while fully partial correlation (FP) achieved the maximal c-sensitivity in 2 simulations. ICOV-5 and network deconvolution (ND) attained the maximal values in 3 simulations, while global silencing (GS) worked best in 4 simulations. For the results obtained with the elastic PC-algorithm, 19 simulations had c-sensitivity values larger than 85%. The results from all methods for the 11th simulation were less than 20%; this is because the synthetic signals for this simulation were generated in a way that makes each ROI's signal greatly influence all the others. Some c-sensitivity values in Table 1 differ slightly from the previous results in [6]; these differences are presumably caused by minor variations in implementation details.

We further investigated the relationship between c-sensitivity and significance threshold for the elastic PC-algorithm. Fig. 6 shows the c-sensitivity improvements with increasing significance thresholds. Negative values in Fig. 6 mean the c-sensitivity decreased at higher significance thresholds. Theoretically, these cases should not happen, but fMRI data do not strictly follow the faithfulness and Gaussian assumptions. Although the c-sensitivity improvements in Fig. 6 show some local fluctuations, the overall increasing trend can still be clearly observed. An obvious increase for a large fraction of networks occurred at the first two steps, from 0.05 to 0.10 and from 0.10 to 0.15.
The elastic PC-algorithm saves computational effort by avoiding repeated calculations: some partial correlations at a higher significance threshold can be directly obtained from the results at the previous significance threshold. To assess this, we measured the computational effort saved at each significance threshold as the proportion of times that the last condition in Line 16 of Algorithm 1 was not satisfied. Fig. 7 shows the relationship between the saved computational effort and the significance threshold for the different simulations. It can be seen that the maximum proportion approached 100%. For some simulations, the proportion reached 100% at some significance thresholds, meaning that the elastic PC-algorithm returned stable results once the significance threshold was above a certain value; all results at higher thresholds were then directly obtained from previous results, so the computational effort was entirely saved in those cases.
The proportion of saved effort varied from 23% to 100%, with a mean of 84.3%. When the significance threshold exceeded 0.15, 27 out of 28 networks had proportion values larger than 50%. For most simulations, the largest increase of the proportion occurred when the significance threshold increased from 0.05 to 0.15. As pointed out above, a significant increase of c-sensitivity in Fig. 6 also occurred in this range. These observations demonstrate that the elastic PC-algorithm can save computational effort while maintaining accuracy.

Empirical illustration
A real resting-state fMRI dataset from the human connectome project [24] was used to illustrate the elastic PC-algorithm and compare it with the other methods listed above. In contrast to the simulation dataset, there was no "ground truth" for the real dataset, so accuracy measures such as c-sensitivity cannot be calculated. Here we only present and discuss the properties of the results.
The real dataset included resting-state fMRI data of 10 unrelated subjects. All were healthy (6 women, 4 men): two subjects in the 22-25 age range, three in the 26-30 age range, and five in the 31-35 age range. For each subject, the functional data were acquired in four approximately 15-minute runs, carried out in two separate imaging sessions. Within each session, the oblique axial acquisition of one run was phase encoded in the right-to-left direction, and that of the other run in the left-to-right direction. All functional images were acquired using a gradient-echo echo-planar imaging (EPI) sequence with the following parameters: repetition time (TR) = 720 ms, echo time (TE) = 33.1 ms, flip angle = 52°, FOV = 208 × 180 mm, slice thickness = 2.0 mm, 72 slices, frames per run = 1 200, run duration = 14:33. The data were processed using the FMRIB software library (FSL) [39], FreeSurfer [40] and the Connectome Workbench image analysis suites [41], and de-noised using an independent component analysis based method [42, 43]. The details of the pipelines are presented in [44].
The automated anatomical labeling (AAL) atlas [45] was used to parcellate the whole brain into 116 ROIs: 90 regions in the cerebrum and 26 regions in the cerebellum. The time series of an ROI was defined as the average time series of all voxels in that ROI. Although this paper only aims at estimating the connections between nodes, we also faced the problem of parcellating the brain into nodes, because there is as yet no widely accepted parcellation for functional connectivity inference [46, 47]. Although it has been recommended to use data-driven approaches, such as high-dimensional independent component analysis (ICA), rather than gross structural-atlas-based parcellations that may not correspond closely to real functional boundaries in the data [48], we chose the AAL atlas (which shows greater stability than other atlases tested [49]) because of its simplicity and interpretability.
For the elastic PC-algorithm, the initial significance threshold was 0.05 and the step size was 0.05. Ten steps were executed for the single-subject analysis. For the multi-subject analysis, only three steps were executed to save computational time, so the significance threshold ranged from 0.05 to 0.15. For ICOV, the regularisation parameter was set to 6.5, obtained from the cross-validation method described in [17], although the validity of this method is still unclear.

Single subject analysis:
We first arbitrarily selected a subject to visually compare the elastic PC-algorithm with the other methods. The results of each method form a matrix representing the functional connectivity skeleton. All matrices were normalized in the same way, so that all elements range from 0 to 1; an element with a higher score indicates higher confidence in the existence of a connection. Fig. 8 gives a straightforward overview of the different results: the 10 images in the first two columns show results of the elastic PC-algorithm over different significance thresholds, and the 5 images in the last column display the results of the 5 other methods. Each panel presents a 116 × 116 matrix; the colour indicates the element values from 0 to 1, and a panel with more white area corresponds to a matrix with more zero elements.
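The normalisation used for this comparison can be sketched as a simple min-max rescaling; `normalize01` is our own illustrative helper name, and the paper does not specify the exact scaling beyond mapping elements into [0, 1]:

```python
import numpy as np

def normalize01(M):
    """Min-max scale a connectivity matrix so its elements lie in [0, 1],
    making results from different methods visually comparable."""
    lo, hi = M.min(), M.max()
    if hi == lo:
        # Constant matrix: no spread to rescale, map everything to zero.
        return np.zeros_like(M, dtype=float)
    return (M - lo) / (hi - lo)
```

Applied to each method's raw output, this puts all panels of Fig. 8 on the same 0-to-1 colour scale.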
More elements in the results of the elastic PC-algorithm approached zero as the significance threshold increased; the resulting matrix at the significance threshold of 0.50 was the sparsest. Many elements remained consistently large in all results of the elastic PC-algorithm, especially those adjacent to the diagonal, suggesting that ROIs with neighbouring labels in the AAL atlas are more likely to be connected. Because two homotypic regions in the left and right hemispheres were assigned adjacent labels in the AAL atlas (e.g., lingual regions 47 and 48 in the left and right hemispheres), this is consistent with the observation that two homotypic regions are more likely to be highly correlated with each other. Results from the elastic PC-algorithm with significance thresholds larger than 0.10 were sparser than those from all other methods, including full correlation, fully partial correlation, ICOV with a regularisation parameter of 6.5, ND and GS. The ND method performed poorly in terms of sparsity, while the GS method generated results with patterns that differed from those of all other methods and appear unreliable. Fully partial correlation and ICOV returned results with patterns similar to the elastic PC-algorithm, but less sparse. These observations suggest that the results from the elastic PC-algorithm may be more plausible and reliable than those from other methods.
The relationship between saved computational effort and significance threshold was also studied; as before, the computational effort saved at each significance threshold was measured as the proportion of times that the last condition in Line 16 of Algorithm 1 was not satisfied. The proportion of saved computational effort varied from 9.2% to 72%. When the significance threshold increased from 0.05 to 0.10, only 9.2% of the computational effort was saved, meaning that the reference skeleton at threshold 0.10 changed greatly from the reference skeleton at threshold 0.05. When the significance threshold increased above 0.10, the proportion of saved effort was always above 50%. Thus the results of the elastic PC-algorithm reached a relatively stable state, in which the number of re-calculations was limited.
The patterns of the results of the elastic PC-algorithm at different significance thresholds were similar, but became sparser at higher significance thresholds (Fig. 8). Moreover, the changes between results at two adjacent significance thresholds were calculated, measured as the root mean square difference of the matrix elements. The change at threshold 0.10 was as high as 0.17. When the significance threshold increased above 0.20, the changes remained at a constant level below 0.02, indicating that the results became relatively stable for significance thresholds larger than 0.20. This is consistent with the observation that the proportion of saved computational effort remained high when the significance threshold was larger than 0.20.
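The stability measure used here, the root mean square difference between results at adjacent thresholds, can be sketched as follows (the function name is ours):

```python
import numpy as np

def rms_changes(matrices):
    """Root mean square element-wise difference between connectivity
    matrices obtained at adjacent significance thresholds."""
    return [float(np.sqrt(np.mean((b - a) ** 2)))
            for a, b in zip(matrices, matrices[1:])]
```

A sequence of values from `rms_changes` dropping below a small constant (here, 0.02) indicates the threshold range in which the inferred skeleton has stabilised.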

Multi-subject analysis:
After the single-subject analysis, we applied the different methods to infer functional connectivity for all 10 subjects. As in the single-subject analysis, functional connectivity was represented in matrices whose elements were normalized between 0 and 1. The matrix for each subject was inferred using each method, and the average matrix across the 10 subjects was then computed for each method. This generated 6 matrices, one for each of the 6 methods: full correlation (Full), fully partial correlation (FP), ICOV, network deconvolution (ND), global silencing (GS) and the elastic PC-algorithm (EPC). To compare these 6 results, we selected the 20 largest elements of each matrix and used their corresponding connections to form a functional network for each method, giving an estimate of the 20 most confident connections. There were 40 connections selected by at least one method, and many were detected by at least 2 different methods.
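The averaging and top-20 selection can be sketched as below; `top_connections` is an illustrative helper of ours, operating on the upper triangle of symmetric connectivity matrices:

```python
import numpy as np

def top_connections(matrices, k=20):
    """Average normalised connectivity matrices across subjects and return
    the k highest-scoring node pairs from the upper triangle."""
    avg = np.mean(matrices, axis=0)
    rows, cols = np.triu_indices_from(avg, k=1)   # exclude the diagonal
    vals = avg[rows, cols]
    order = np.argsort(vals)[::-1][:k]            # indices of the k largest
    return [(int(rows[m]), int(cols[m]), float(vals[m])) for m in order]
```

Running this once per method on the 10 per-subject matrices yields the 6 candidate networks compared in Table 2.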
Table 2 shows the details of these connections, classified into three categories according to the pattern of the connected pair: 1) homotypic regions in the left and right hemispheres (Type I), 2) heterotypic regions in the same hemisphere (Type II), and 3) heterotypic regions in different hemispheres (Type III). If a connection was found by a method, the corresponding value in the table was set to 1, otherwise to 0. There were 16 Type I connections, 22 Type II connections and 2 Type III connections. Connections discovered by the elastic PC-algorithm were similar to those detected by fully partial correlation and ICOV. However, the results of full correlation were very different from those of the other methods, consistent with the expectation that the full correlation method detects indirect, as well as direct, effects. Six Type I connections were coherently discovered by all 6 methods, in the following hemispherically homotypic ROIs: Calcarine, Cuneus, Occipital Mid, Postcentral, Supramarginal and Precuneus. Compared with the other types, the fraction of Type I connections consistently detected by different methods was relatively high. Only 1 of the 22 Type II connections was discovered by all 6 methods (the connection between the precentral and postcentral gyri in the right hemisphere), and ten Type II connections were found by only one method.
There were only two Type III connections (between the left calcarine and right lingual regions, and between the right cuneus and left superior occipital regions), and these were identified only by the full correlation method. As shown in Fig. 9, it is quite likely that these Type III connections were enhanced by indirect effects.
We also investigated the distribution of the top 1% of connections found by the elastic PC-algorithm. In total, 67 connections were selected, with the corresponding matrix elements ranging from 0.20 to 0.75. These connections were plotted using BrainNet Viewer [50] (Fig. 10). There were 26 Type I connections, 40 Type II connections and only 1 Type III connection. It is interesting that only a very limited number of connections between heterotypic regions in different hemispheres were obtained, suggesting that in the resting state, functional connectivity may be composed primarily of Type I and Type II connections.

Discussion
We proposed the use of minimum partial correlation to infer the skeleton of functional connectivity from fMRI data, and designed an algorithm, called the elastic PC-algorithm, to efficiently approximate the minimum partial correlation within a computational time budget. A Matlab implementation of the proposed algorithm is publicly available at https://github.com/LNie/ElasticPC. The simulation study demonstrated the capability of the proposed method to improve the accuracy of skeleton inference. The empirical study showed that direct functional effects between two heterotypic regions in different hemispheres might be rare during the resting state.
There are limitations to our approach. One drawback of the proposed method is that the minimum partial correlation is always zero if the number of time points in the fMRI time series is less than or equal to the number of ROIs. This situation is not common in areal-level studies, because the number of time points is normally more than 600 and the number of ROIs is usually 300 or fewer. However, minimum partial correlation cannot be directly applied in voxel-level studies, because the number of voxels is much larger than the number of time points. ICOV [16, 17] and recently proposed L0-norm based methods [51] can avoid this drawback via proper regularization. Future work is needed to explore the possibility of combining minimum partial correlation with L0-norm or L1-norm based approaches.
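The degenerate case behind this limitation can be illustrated numerically: with fewer time points than ROIs, the sample covariance matrix is rank-deficient, so it cannot be inverted to obtain partial correlations that control all remaining ROIs at once. A minimal sketch with surrogate Gaussian data in place of real fMRI signals:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 80                      # fewer time points than ROIs
X = rng.standard_normal((T, N))    # surrogate for an fMRI time-series matrix
cov = np.cov(X, rowvar=False)      # N x N sample covariance

# With T samples in N dimensions, the sample covariance has rank at most
# T - 1 < N, so it is singular and full-order partial correlations are
# not identifiable without regularization.
rank = np.linalg.matrix_rank(cov)
print(rank, N)
```

Regularised estimators such as ICOV sidestep this by shrinking the covariance toward an invertible matrix before inversion.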
Another drawback of our study is that the theoretical analysis of minimum partial correlation is based on the faithfulness, Gaussian and DAG assumptions. As these are approximations for fMRI data, potential confounds arising as a consequence need to be considered when applying the proposed method. Although the performance of the proposed method was demonstrated on a simulation dataset generated without these assumptions, the simulation dataset was artificially designed and the dimensions of the network structures were highly simplified. Future work is needed to investigate theoretical bounds on performance under more realistic assumptions.
Methods for network modelling from fMRI data are based on various assumptions, and their model complexity differs accordingly. A simpler model, e.g., full correlation, is less neurologically meaningful but can stably and efficiently infer a network with more nodes; in contrast, a more complex model, e.g., DCM, is more neurologically meaningful but may be more sensitive to its assumptions and can only analyse smaller networks [48]. In terms of model complexity, the proposed minimum partial correlation is simpler than DCM and more complex than full correlation, partial correlation and ICOV. Minimum partial correlation is a statistical model rather than a generative model, and makes few neurophysiological assumptions. It may therefore be best used to discover network structures as an alternative to full correlation and partial correlation.
A recent study showed that functional connectivity can be used as a "fingerprint" to identify individuals with high accuracy [52]. Further work is needed to investigate the ability of minimum partial correlation to preserve such subject-specific patterns via identification experiments. It will also be of interest to study the heritability of subject-specific patterns of functional connectivity. Future work may further use minimum partial correlation to explore the relationship between brain function and behaviour, genetics or disease-related phenotypes [3]. This could be particularly important for using minimum partial correlation as a biomarker [53] to better classify psychiatric diseases.
Research Article. Manuscript received August 20, 2016; accepted February 21, 2017; published online June 13, 2017. Recommended by Associate Editor Hong Qiao. © The Author(s)

Fig. 1
Fig. 1 An example of differentiating direct and indirect effects. (a) Left: the structure of a 3-node network; right: the dynamics of each node. The input signal u1(t) is the BOLD signal of a voxel from a real fMRI image. The noise is randomly sampled from a Gaussian distribution with mean 0 and standard deviation 0.01. (b) The absolute values of full correlation, fully partial correlation and minimum partial correlation, respectively. The direct effects (1-2 and 2-3) and the indirect effect (1-3) have the same full correlation values.

Fig. 2
Fig. 2 An example of Berkson's paradox. (a) Left: the structure of a 3-node network; right: the dynamics of each node. The input signals u1(t) and u2(t) are the BOLD signals of two voxels from a real fMRI image; the correlation between u1(t) and u2(t) is −0.035. The noise is randomly sampled from a Gaussian distribution with mean 0 and standard deviation 0.01. (b) The absolute values of full correlation, fully partial correlation and minimum partial correlation, respectively. The fully partial correlation between Node 1 and Node 2 is as high as 1.0, although there is no effect between the two nodes.

Fig. 3
Fig. 3 ICOV with various values of the regularisation parameter.(a) Absolute values of ICOV for the network in Fig. 1 with the regularisation parameter varying from 0 to 10.(b) Absolute values of ICOV for the network in Fig. 2 with the regularisation parameter varying from 0 to 1 000.

Algorithm 2. Elastic PC-algorithm (EPC)
Input: signal samples {x_i^t | t ∈ [1 : T], i ∈ [1 : N]}, step δ
Output: s-cube C
1) for each ordered node pair (ri, rj) do
2)   C(i, j, 0 : (N − 2)) ← |ρi,j|
3) end for
4) β = 0
5) α = β + δ
6) while remaining time budget > 0 do
7)   C ← Call Algorithm 1 ({x_i^t}, α, β, C)
8) end while

In the algorithm, there are two s-cubes, C and D, for the significance thresholds α and β respectively. As there are N nodes in the graph G and k must be smaller than |adj(G, ri) \ {rj}| for any pair of nodes (ri, rj), the maximum value of k is N − 2. Thus k, which indicates the number of neighbours, can vary from 0 to N − 2. The dimension of the s-cube is N × N × (N − 1), where the k-th slice C(:, :, k) contains the minimum partial correlations calculated in Step k.
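The outer loop of Algorithm 2 can be sketched in Python as below. This is an illustrative skeleton only: `refine` stands in for Algorithm 1 (which is not reproduced here), and the threshold advance at each iteration (β ← α, α ← α + δ) is our reading of the surrounding text, since the pseudocode leaves it implicit:

```python
import time
import numpy as np

def elastic_pc(X, step, budget_s, refine):
    """Sketch of the EPC outer loop (Algorithm 2).

    X: (T, N) array of signal samples; step: threshold increment delta;
    budget_s: remaining time budget in seconds; refine: stand-in for
    Algorithm 1, mapping (X, alpha, beta, C) to an updated s-cube C.
    """
    n_time, n_roi = X.shape
    # Lines 1-3: initialise every slice of the s-cube with |rho_ij|.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    C = np.repeat(corr[:, :, None], n_roi - 1, axis=2)  # N x N x (N-1)
    # Lines 4-5: initial thresholds.
    beta, alpha = 0.0, step
    # Lines 6-8: refine until the time budget is exhausted.
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        C = refine(X, alpha, beta, C)
        beta, alpha = alpha, alpha + step   # advance to the next thresholds
    return C
```

With a zero budget the function simply returns the full-correlation initialisation, which matches the algorithm's anytime character: more budget yields more refinement steps.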

Fig. 4
Fig. 4 An example of the elastic PC-algorithm. (a) and (b) display the reference skeletons Sα and Sβ in Algorithm 1, satisfying α > β. The table in (c) shows the calculations required at Step 1 with the significance threshold α. The column "need calculation?" indicates whether the corresponding partial correlation needs to be calculated or can be directly obtained from D. For the node pairs (r2, r3), (r2, r5) and (r3, r4), whose edges are shown in red in (a), all partial correlations need to be calculated, as Sβ in (b) does not contain the edges between them. For the node pair (r1, r2), the partial correlation on the set {r5} needs to be calculated, because in Sβ, {r5} is not a subset of the neighbours of (r1, r2).

Fig. 5
Fig. 5 An example of calculating c-sensitivity. (a) The true network that generates the observations. (b) The skeleton of the true network, constructed by stripping all arrowheads from the links of the true network and removing redundant undirected links. (c) The estimated matrix, each element of which represents the "confidence" that there is a link between the corresponding nodes. (d) The true positive and false positive distributions, calculated using the estimated matrix and the skeleton. For example, the element 0.16 in Row 1 and Column 2 belongs to the true positive distribution because Node 1 and Node 2 are linked in the skeleton, whereas the element 0.11 in Row 1 and Column 3 belongs to the false positive distribution since there is no connection between Node 1 and Node 3 in the skeleton. The 95th percentile of the false positive distribution is 0.15, marked with a red circle. Four elements of the true positive distribution are larger than 0.15 and one is smaller, so the c-sensitivity is 0.8.
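The c-sensitivity computation described in this caption can be sketched in code; `c_sensitivity` is a hypothetical helper of ours, operating on symmetric matrices and using NumPy's default percentile interpolation:

```python
import numpy as np

def c_sensitivity(estimate, skeleton):
    """Fraction of true-connection scores exceeding the 95th percentile
    of the false-connection (null) score distribution."""
    rows, cols = np.triu_indices_from(estimate, k=1)  # node pairs, no diagonal
    scores = estimate[rows, cols]
    is_true = skeleton[rows, cols].astype(bool)       # pairs linked in skeleton
    threshold = np.percentile(scores[~is_true], 95)   # 95th pct of false scores
    return float(np.mean(scores[is_true] > threshold))
```

A c-sensitivity of 1.0 means every true connection scores above essentially all false ones; the value 0.8 in the caption corresponds to 4 of 5 true connections clearing the threshold.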

Fig. 6
Fig. 6 Relationship between c-sensitivity improvements and significance thresholds in the elastic PC-algorithm

Fig. 8
Fig. 8 Single-subject results of the different methods. The x-y axes indicate the labels of the 116 ROIs

Fig. 9
Fig. 9 Connections detected by various methods between the following ROIs: calcarine, lingual, cuneus and superior occipital regions

Fig. 10
Fig. 10 Top 1% of connections selected by the elastic PC-algorithm. The nodes outside the cerebrum are in the cerebellum. The figure is generated by BrainNet Viewer [50] to show all sides of the brain, including the lateral and medial sides of each hemisphere, the dorsal and ventral sides, and the anterior and posterior sides of the brain.

Table 2
Top 20 connections detected by different methods