A measure of agreement across numerous conditions: assessing when changes in network structures are tissuespecific
Abstract
Background
There is great interest to study how gene pathways change their structure across different tissues. The assessment of interstudy reliability of pathway changes across tissues can inform on the fraction of tissues with specific functional changes in network structure. However, there is a lack of agreement measures among studies that independently observe how a group of observations change across conditions. We, therefore, propose λ, a new interstudy reliability measure that determines the consistency to distinguish observations by condition.
Results
We derived λ’s distributional characteristics, determine its reliability properties and compared it with Cohen’s κ. We studied the coexpression structure of 287 gene pathways across four brain regions in two transcriptomic studies and applied λ to assess the interstudy reliability of the pathways’ brainregional changes. Brainrelated pathways showed highest λ; the top value was for the nicotine addiction pathway whose structure was reliably distinguishable among regions with dopaminergic projections.
Conclusion
Our results offer novel substantial evidence that changes in network structure across tissues can be inferred independently of samples, algorithms and experiments (RNAsequencing or microarrays). Reliability measures, such as λ, can inform on the tissues where changes in a network’s structure are likely functional. An R package is available at https://github.com/isglobalbrge/lambda .
Keywords
Brain Transcriptome Coexpression networks Reliability Coehn’s kappa GTEx RNAsequencing Addiction NicotineAbbreviations
 Binomial(n,p)
Binomial distribution of n trials with success probability p
 BRAINEAC study
The brain eQTL almanac study
 CRBL
Cerebellum
 erf(x)
Error function of x
 FCTX
Frontal cortex
 GTEx study
Genotypetissue expression study
 HIPP
Hippocampus
 KEGG
The Kyoto encyclopedia of genes and genomes
 netA
Network A
 PUTM
Putamen
 RPKM
Reads per kilobase per million mapped reads
 RNAseq
RNA sequencing
 N(μ,σ^{2})
Normal distribution with mean μ and variance σ^{2}
Background
Classical statistics include numerous ways to measure the reliability of an observation [8]. Reliable observations are reproducible and accurate. Agreement measures between two independent experiments are used to assess the consistency of the observations being made. If observations are categorical, Cohen’s κ and its generalizations are typically used [9, 10]; if observations are continuous then a number of correlation measures can be used, such as intraclass correlations [11]. These and other agreement measures are suitable when experiments are performed under comparable, unchanged conditions. When studies are designed to test how a group of individuals change under a range of varying conditions, it is of interest to assess how reliable the changes across conditions are. However, there is a lack of recommended interrater measures that assess reliable changes among conditions [8, 12].
A first question that can be asked is the degree to which the most similar observations between studies correspond to those under the same condition, and hence have a measure of how reliably the observations can be distinguished by condition. Similarity can be assessed by any statistic, namely a preservation statistic, that compares two sets of observations at any particular property. A correlation of the observations between two conditions is an example of one preservation statistic. If a preservation statistic for observations between studies is defined [4], we can ask whether the preservation is maximum when the conditions between studies match. The question can be addressed mathematically by computing a preservation matrix, whose elements are the preservation statistics between conditions across studies, and assess the extent to which the diagonal terms are maxima across rows and columns. In particular, a preservation matrix across conditions can be defined for crosstabulated tables used in classical interstudy reliability studies and, therefore, the ability to correctly pair observations by condition can be studied in those cases. We thus aimed to construct a reliability measure for the correct condition pairing of observations between studies, compare it with Cohen’s κ and apply it to assess the interstudy reproducibility of changes in coexpression network structure across tissues.
The GTEx project is an unprecedented effort to study the gene expression in tens of tissues in hundreds of subjects using RNAsequencing [13]. It is, therefore, a strong candidate for becoming a preferred benchmark for interaction networks across different tissues. As the number of studies with expression data in multiple tissues is expected to increase, agreement measures against GTEx may serve to assess the reproducibility of network changes across tissues [3]. Some studies that measure gene expression in different tissues are publicly available, one of which is the BRAINEAC project [14]. Here, the gene expression using microarray data was measured in hundreds on individuals, free of neurodegenerative disorders at the time of death, in nine different brain tissues, four of which overlap with those in GTEx. Using our agreement measure between the studies, we assessed the fraction of brain regions for which 287 KEGG pathways [15] reliably changed across the cerebellum, frontal cortex, hippocampus and putamen.
Results
For two studies that measure the same group of units (subjects, coexpression pairs in a network, etc.) across k conditions, we propose to construct the k×k preservation matrix Z between conditions. The matrix element \(z_{ij'}\phantom {\dot {i}\!}\) (i,j=1,...k) is the correlation (or any preservation statistic) of the units’ observations between condition i in the first study and condition j in the second study. When the observations are the connections between all nodepairs in a network then \(\phantom {\dot {i}\!}z_{ij'}\) may be any normally distributed statistic where pairwise topologies are compared. Figure 1 shows the case in which a coexpression network A of m gene transctipts is inferred for n subjects in 6 tissues (1,...6) in 2 different experiments (E1 and E2). The matrix Z is given by all pairwise comparisons between tissues for the network connections across both experiments. The arrows show the values of Z that are maxima across the rows and columns that intersect at each diagonal term.
We are interested in measuring how reproducible the changes of units observations, such as network edges, are. In the case of perfect interstudy reproducibility across all conditions/tissues then the matrix Z looks diagonal. That is, \(\phantom {\dot {i}\!}z_{ii'}\) is maximum across row i and column i^{′}, for all i. In general, this is not the case and the preservation corresponding to row i and column j^{′}≠i^{′} may be maximum across the elements in row i and column i^{′}. Figure 1 shows, for instance, that the element of Z at tissue 3 in E1 and tissue 2 in E2 is the maximum in row 3 and column 3 of Z. As the network structure of tissue 3 in E1 is paired with that of tissue 2 in E2, network connections in tissue 3 have not been reliably reproduced between experiments. By contrast, all the structures at tissues 1, 2, 4 and 5 are correctly paired between experiments, as the corresponding diagonal terms are their row and column maxima and, therefore, their differences are reproduced between experiments.
where p_{ii} is the probability that the diagonal term \(z_{ii'}\phantom {\dot {i}\!}\) is maximum within its row and column elements. λ measures the fraction of conditions that can be reliably distinguished by structural changes, taking into account a probability for random pairing. We modeled p_{ii} assuming independence among the similarities of observations between conditions, and normality in the preservation statistic (see “Materials and methods” section). As an agreement measure, we derived its reliability properties and compare it with a classical measure of interstudy agreement, Cohen’s κ. We then applied it to determine which of 287 KEGG pathways presented higher rates of correct condition pairing across four brain regions between two independent transcriptomic studies.
Comparison between λ and Cohen’s κ in simulation studies
We studied the properties of λ with extensive simulations. While λ is applicable to more general situations than those covered by Cohen’s κ, we compared the performance of λ with κ in simulated cases where both measures can be used. We simulated numerous agreement tables between studies, in which 500 units were classified by both studies into numerous categories/conditions, under different scenarios. From the crosstabulation table of agreement, see expression 14 in “Materials and methods” section, we computed κ: the interstudy reliability of classification that accounts for random agreement. From the corresponding preservation matrix obtained from Eqs. 17 and 18 in “Materials and methods” section, we computed λ: the fraction of conditions that can be correctly distinguished between experiments. We also computed P_{0}: the proportion of units correctly classified between experiments; and r: the observed fraction of conditions that are correctly paired between experiments.
Simulations showed that when null agreement is expected (P_{0}=0) λ is 0, while λ is 1 for full agreement (P_{0}=1). Consistently with this, we observed that λ increased monotonically with κ for all the simulation scenarios, see Additional file 2: Figure S1. The functional dependence was highly stable under different scenarios, revealing, as expected, high λ agreement for fair values of κ (0.2,0.4), given that the latter is a measure of exact agreement rather than discriminative agreement. Therefore, the agreement measured by λ is consistently higher than the agreement measured by κ. We also observed that for low values, λ tends to zero when κ takes small negative values, a situation already described in Cohen’s work [9]. For a given κ, we observed a sizable range of λ values, in particular, as tissues become less equiprobable (scenario 3). We noted that if the number of tissues is small (k=5) and the marginal distribution greatly concentrates around one single tissue (j=1 for scenario 3), then λ tends to 1/k (0.2) because the experiments can clearly distinguish that tissue from the rest. In this case, κ tends to zero.
We additionally studied the variance σ of the fraction R, for which r is a realization and λ its expected value. We observed that σ decreased with the number of tissues and with departure from marginal equiprobability (Additional file 3: Figure S2). We observed that for a given λ, multiple values of σ are allowed (Additional file 4: Figure S3). However, the mean of κ, for a single draw of the binomial process in Eq. (16), is a function of its variance. In particular, κ’s variance is minimum at extreme values (0,1) and is inversely proportional to the number of subjects [16]. In contrast, we observed that σ decreases towards zero for a range of values of λ. This occurs when λ tends to r, that is, when the probabilities that the diagonal terms of the preservation matrix are maxima tend either to zero or to one, see Eq. (13) in “Materials and methods” section. We also computed the variability of λ from the 5% confidence intervals (CI) obtained from bootstrapping the units observations in the crosstabulation tables. For simulations corresponding to 5 equiprobable conditions and varying number of subjects, we observed that the length of λ’s CI was proportional to σ and independent of the number of subjects (Additional file 5: Figure S4).
Overall, these observations show that the fraction of tissues for which network changes are reproducible can be determined with high precision, low σ and therefore small CIs for λ. From a practical point of view, low or moderate values of λ can be used to select the tissues for which the network changes can be reliably measured, and are likely tissuespecific. Consider, for instance, a situation where the structure of a gene pathway is inferred in 50 different tissues. If we expect that the network is biologically functional in only 5 of the tissues, then λ will be at most 0.1. In addition, low σ will also indicate that the network structure in 90% of tissues could be discarded on grounds of measurement reliability. By contrast, κ is maximally informative only as it tends to 1, that is when all the conditions are distinguishable and the preservation matrix is fully diagonal.
We finally performed a power test for λ, where we simulated a true agreement scenario in which 3/5 tissues could be reliably paired across two studies. We simulated a true scenario using a multivariate distribution for 10 random variables corresponding to 5 networks per experiment; 3 networks between experiments correlated with strength of 0.5 while all other possible network correlations, across tissues and studies, varied from 0 to 0.5. We also varied the number of genes N in the network (10, 20, 40) corresponding to N(N−1)/2 nonredundant connections (45, 190, 780). We observed that λ had sufficient power (> 80%) to detect true agreement when the difference between matching tissue correlations and background correlations is 0.05 for 40 gene networks or 0.2 for networks consisting of 10 genes (Additional file 6: Figure S5).
KEGG pathways
The Kyoto encyclopedia of genes and genomes (KEGG) offers a list of experimentally characterized biochemichal pathways. We computed λ between two independent transcriptomic studies, GTEx (RNAsequencing) and BRAINEAC (microarray), that measured gene expression levels with different technologies in the cerebellum, frontal cortex, hippocampus and putamen.
Agreement measure λ>0.5 between BRAINEAC and GTEx across 4 brain regions in 287 KEGG pathways
λ  CI: 2.5%  CI: 97.5%  σ  Ref  Description 

0.68  0.72  0.75  0.02  hsa05033  Nicotine addiction 
0.67  0.61  0.88  0.04  hsa04720  Longterm potentiation 
0.58  0.61  0.77  0.04  hsa05206  MicroRNAs in cancer 
0.55  0.50  0.59  0.01  hsa04080  Neuroactive ligandreceptor interaction 
0.53  0.42  0.69  0.03  hsa04020  Calcium signaling pathway 
0.52  0.45  0.65  0.03  hsa04261  Adrenergic signaling in cardiomyocytes 
0.51  0.49  0.62  0.02  hsa04912  GnRH signaling pathway 
Agreement measure λ>0.7 for the benchmarking of BRAINEAC against GTEx across 4 brain regions in 287 KEGG pathways
λ  CI: 2.5%  CI: 97.5%  σ  Ref  Description 

0.97  0.99  1.00  0.01  hsa04724  Glutamatergic synapse 
0.95  0.97  1.00  0.01  hsa05033  Nicotine addiction 
0.94  0.93  1.00  0.01  hsa04360  Axon guidance 
0.91  0.93  1.00  0.02  hsa05414  Dilated cardiomyopathy 
0.89  0.97  1.00  0.02  hsa04931  Insulin resistance 
0.84  0.88  0.99  0.03  hsa04725  Cholinergic synapse 
0.84  0.82  0.97  0.02  hsa05032  Morphine addiction 
0.82  0.78  0.93  0.02  hsa04720  Longterm potentiation 
0.82  0.76  0.89  0.02  hsa05031  Amphetamine addiction 
0.81  0.76  0.95  0.02  hsa04912  GnRH signaling pathway 
0.79  0.70  0.91  0.03  hsa04261  Adrenergic signaling in cardiomyocytes 
0.76  0.75  0.78  0.01  hsa04727  GABAergic synapse 
0.75  0.75  0.75  0.00  hsa04080  Neuroactive ligandreceptor interaction 
0.75  0.75  0.75  0.00  hsa04723  Retrograde endocannabinoid signaling 
0.74  0.75  0.75  0.00  hsa04020  Calcium signaling pathway 
0.74  0.73  0.76  0.01  hsa04728  Dopaminergic synapse 
0.74  0.76  0.90  0.03  hsa05206  MicroRNAs in cancer 
0.73  0.75  0.76  0.01  hsa04015  Rap1 signaling pathway 
0.71  0.67  0.95  0.04  hsa04971  Gastric acid secretion 
0.71  0.68  0.80  0.02  hsa05034  Alcoholism 
Transcriptomewide gene network
We inferred the transcriptomewide coexpression network for 9071 genes across the GTEx and BRAINEAC studies in the four brain tissues. The network was fully characterized by 8.2×10^{7} genepair correlations. We observed λ=0.5 for the transcriptomewide network across all four tissues. Figure 5 illustrates the preservation matrix between studies across tissues. We observed that all the preservation statistics in the matrix were similar in size, between 0.38 and 0.49, and that the cerebellum and frontal cortex diagonals were correctly paired between studies. We noted that the diagonal terms at the hippocampus and putamen were the second maxima after their correlations with the frontal cortex in GTEx. Therefore, the transcriptomewide networks cannot clearly disentangle the frontal cortex from the hippocampus and putamen, suggesting that a large amount of the 8.2×10^{7} genepair correlations may not be brainregion specific.
We finally benchmarked the transcriptomewide network obtained in BRAINEAC against GTEx across the four brain regions. In this case, we confirmed that all diagonal terms were their column maxima (see Fig. 5) and therefore benchmark λ=1. These results show that the transcriptomewide network can correctly pair BRAINEAC tissues with those of GTEx. As the benchmarking only takes into account the variability between tissues of GTEx and not BRAINEAC, the observation: benchmark λ=1, is consistent with the consideration that RNAsequencing (GTEx) may provide better characterization of the transcriptome than microarray technology (BRAINEAC) [19].
Discussion
We proposed a new measure, λ, of agreement between studies. The motivation of the measure is the assessment of agreement between studies that measure the effects of varying conditions on a set of units (subjects, coexpression pairs, etc). We showed that the measure conformed to the properties of agreement measures and used simulations to compare it with Cohen’s κ. Our results illustrate the large potential of λ, in particular, for studying the changes in genenetwork structure across numerous tissues. The measure is particularly useful to determine the fraction of tissues for which network structures can be reliably distinguished, and thus informs on the ratio of structural changes that may be tissuespecific.
In our application to coexpression networks in brain, we observed that the structure of a number of brainrelated pathways could be reliably matched by tissue across GTEx and BRAINEAC. We observed that the nicotine addiction pathway presented the highest λ in which the frontal cortex, the hippocampus and the putamen were reliably distinguished. Interestingly the cerebellum, with a limited role in addiction, could not be reliably distinguished. Benchmarking of BRAINEAC against GTEx, confirmed that the differences in the nicotine pathway structure across relevant brain regions were highly reliable, as were those in glutamatergic, cholinergic, GABAergic, dopaminergic synapse pathways, whose complex interactions are critical to nicotine addiction [20]. The reliability of other addiction pathways (morphine, amphetamine and alcohol) supports these findings and indicates that structural changes of those pathways in regions with dopamine projections, which are essential in addiction processes (frontal cortex, hippocampus and putamen), are likely tissuespecific [21].
The agreement measure λ is based on the preservation matrix between tissues. Langfelder and colleagues thoroughly studied a number of pairwise preservation statistics of Cholesterol Biosynthetic Process network across sex and tissue differences in mice [4]. Their approach to assessing the reproducibility of the data was to test the preservation of network structure against a reference set of conditions (female/liver), allowing them to identify the condition (male/liver) for which structures are most similar. However, their data show, more generally, that sex changes are much smaller than tissue changes and, therefore, that differences in network structures can reliably pair tissues between sex changes. In this case, the interstudy reliability for changes in the network structure across a range of tissues is clearly high, as it would be summarized by a by high λ. While a pairwise preservation statistic assesses the reliability of a network’s structure between two conditions [4], the preservation matrix registers all pairwise changes of the structure among numerous conditions. λ is a measure on the preservation matrix that allows the assessment of agreement between studies to reliably discriminate between conditions, measuring the probability that the diagonal terms of the matrix are their row and column maxima.
We are unaware of similar measures of agreement on akin preservation matrices that test the changes in the structure of a network across tissues. Other functions of the preservation matrix could also be investigated. We observed, in particular, that λ is a measure of interobserver agreement that is consistently higher than Cohen’s κ. Perfect agreement for κ is exclusively given by diagonal tables, while perfect agreement for λ is given by maxima diagonal terms in tables where their elements, irrespective of their magnitude, are estimated with sufficient low variability. λ does not test the level of pairwise preservation but the extent to which structural patterns are correctly classified by conditions. This is an important difference between the measures, which allows λ to be used in more general situations where the elements of crosstabulated tables between raters/studies are inferences. In particular, we observed that λ is informative for intermediate values, as it can be used to select the fraction of tissues where network changes are reproducible and likely to be tissuespecific.
Currently, the validity of a gene or protein network derived from highthroughput data is often benchmarked against networks derived from current knowledge of specific interactions, given by curated pathways, specific experiments or even text mining of published articles [22]. This type of confirmatory analysis extracts networks that are a mixture of interactions that have been individually reported on different tissues. Therefore, while validity is investigated, in terms of consistency with previous knowledge, reproducibility of network structure in an independent confirmatory study is not being assessed. We observed important preservation of network patterns between two equivalent, independent studies, based on different technologies (RNAseq and microarray), analysis methods and subjects. As such, our results fully test interobserver reproducibility and provide support that inferred network structures are biological entities independent of numerous sources of variability and heterogeneity in the studies. Previous work has tested the agreement of transcription measurements between RNAsequencing and microarray experiments on the same subject samples ([23, 24]). These are important studies to validate experimental techniques. However, fully testing interobserver reproducibility of network inferences, as we have done, must be based on independent confirmatory experiments on different population samples, experiments and analyses. Finally, recent efforts have been made to assess Bayesian network reproducibility, where intrastudy preservation statistics have been proposed [25]. While there is still a need to apply those metrics to assess interstudy reproducibility, note that λ will also be applicable in such a case.
Conclusions
We propose a novel interstudy measure of agreement λ to determine the fraction of conditions for which structural changes in a set of observations may be deemed reliable. The measure revealed that changes in the coexpression structure of addictionrelated pathways can correctly distinguish the frontal cortex, hippocampus and putamen but not cerebellum between studies. Reliable tissue differences in network structure can help to identify tissuespecific pathwaybiology and increase the reproducibility of network inferences. More generally, our agreement measure can be used in a set of independent studies that measure how the same group of units changes across numerous conditions.
Materials and methods
We propose an interstudy measure of agreement to discriminate conditions. While the measure can be applied to different research studies, such as those with factorial designs, we illustrate how its need arises from an example in current functional genomic research. We consider two studies that measure the same units (subjects, genepair correlations, etc.) across multiple conditions. We aim to construct a measure that informs on the fraction of conditions that can be reliably distinguished by structural changes in the observations.
The problem
Let us assume that we have two studies that measure the expression of a set of genes in two different population samples, in the same range of tissues and using different experimental setups. For instance, one experiment may use RNAseq and the other a microarray technology. We may be interested in inferring the coexpression relationships between the genes across tissues and determine whether the changes in the relationships are consistent between experiments. The coexpression between two genes in the network is given by their correlation over the subjects’ gene expression levels. Figure 1 illustrates the situation where the coexpression between the genes of a network is inferred for 6 tissues in two different studies. Our aim is then to propose a measure that tells us the fraction of tissues that can be reliably distinguished by the network structure across studies. The measure will then inform on the fraction of tissues for which the changes in gene relationships may be tissuespecific.
A network A of m genes (nodes) can be represented by a m×m adjacency matrix between genes, where the entries are the edges linking a pair of nodes, see Fig. 1. We are interested in genenetwork modules, such as pathways or genesets associated with a biological function. There are numerous structural properties of networks that can be used to test the preservation of a network between two studies. Preservation statistics quantify given aspects of withinnetwork topology that are preserved between studies, see Langfelder et al. for a review [4]. Given a preservation statistics z, we aim to assess the interstudy reliability of a network’s structure between studies E1 and E2 across numerous conditions.
We would then like to have a measure of interstudy agreement from Z, which can tell how diagonal Z is or, more explicitly, the extent to which i can be correctly paired with i^{′}, for all i.
A solution

\(p_{11}=Pr(z_{11'}> z_{12'}, z_{13'}, z_{21'}, z_{31'}) \phantom {\dot {i}\!}\),

\(p_{22}=Pr(z_{22'}> z_{21'}, z_{23'}, z_{12'}, z_{32'})\phantom {\dot {i}\!}\) and

\(p_{22}=Pr(z_{33'}> z_{31'}, z_{32'}, z_{13'}, z_{23'})\phantom {\dot {i}\!}\),
The probability that the diagonal term \(z_{ii'}\phantom {\dot {i}\!}\) is the maximum in the column i follows a similar form.
The agreement measure λ is then the expected fraction of conditions with the correct matching of network structures between studies. In the case that E1 is the benchmark for experiment E2, then one is interested in testing whether the diagonal terms are the maxima of their columns only, generalizing the concepts of sensitivity and specificity. In this case λ can be computed by simply setting \(Pr(z_{ii}>z_{ji'})=1\phantom {\dot {i}\!}\) for j=1,...k. Note also that it is straightforward to generalize the measure for more than two studies by expanding the products in Eq. (4).
Reliability properties of λ
Suitable reliability measures satisfy three basic properties: i) their values range from 0 to 1; ii) they tend to 0 when no agreement is expected and tend to 1 when a full agreement is expected and iii) they account for random agreement.
That is, when there is no variability in z_{ij}, p_{ii} is 1 if \(\phantom {\dot {i}\!}z_{ii'}\) is maximum across rows and columns and 0 otherwise. Therefore, regarding property ii), null and perfect reliabilities are clearly expected with null variability. For such cases all p_{ii} are either 0 or 1 according to Eq. (13). In the case of null reliability none of the μ_{ii} is maximum, all p_{ii} are 0 and λ=0. Whereas, in the case of perfect reliability all μ_{ii} are maxima, all p_{ii} are 1 and λ=1. Regarding property iii), we can see that under null variability, λ tends to the observed fraction of conditions r that are correctly paired between experiments. The fraction r, which is a realization of R, can be considered as a reliability measure but does not account for random agreement. By contrast, λ, as an expected value of R under a probability distribution, incorporates a variability that accounts for correct random pairing.
Comparison between measures of agreement
where P(i,i)=n_{ii}/l is the observed frequency of units that were measured in category i by both raters and P_{d}(i) is the frequency of units in i observed by rater d (d=1,2). The sum \(P_{0}= \sum _{i} P(i,i)\) is the total fraction of agreement: the proportion of observations that falls in the diagonal, which does not account for random agreement. Cohen’s κ measures the fraction of discordant observations expected by chance that are actually observed in agreement.
For for large n_{i} the binomial distribution tends to a normal distribution. Therefore, the values in Eqs. 17 and 18 can be used in Eq. 7. With a similar computation for the column elements, the measure λ can be obtained for a table in the form (14) and compared with the value of κ for varying values of the total fraction of agreement P_{0}.
Simulations

Scenario 1 (equiprobable): \(P_{1}(i)=P_{2}(i)=\frac {1}{k}\), ∀i

Scenario 2: \(P_{1}(i)=P_{2}(i)=\frac {1}{i} / {\sum \nolimits }_{j=1}^{k} \frac {1}{j} \)

Scenario 3 (the least equiprobable): \(P_{1}(i)=P_{2}(i)=\frac {1}{i^{2}} / {\sum \nolimits }_{j=1}^{k} \frac {1}{j^{2}} \)
We set the number of observations to l=500. For each scenario, we simulated 50 cases of perfect agreement tables (P_{0}=1), i.e. diagonal matrices, and 50 cases of perfect disagreement (P_{0}=0); those are tables with zeros on the diagonal terms except for the cell of maximum joint probability. For each case, we permuted a pair of observations 3000 times, such that the original marginal frequencies were conserved. After each set of 30 permutations, we computed the four agreement measures. This procedure allowed the assessment of 10,000 simulations, in each scenario and tissue level, covering the whole agreement interval of P_{0}. The simulation scheme is shown in Fig. 2 and further details are given in Additional file 1: Table S1.
We used R.3.30 and the package psych to perform calculations and compute Cohen’s κ.
Gene expression data
We downloaded expression data from the GTEx project obtained from RNAseq [27]. Reads per kilobase per million mapped reads (RPKM) of version 6 were obtained for all brain tissues. Covariates for each tissue were also downloaded.
We also downloaded the brain expression data of the BRAINEAC project [28] obtained from winsorized values of exon array data (Affymetrix Human Exon 1.0 ST array). Downloaded data had been previously normalized and corrected for batch effects.
We identified four brain tissues common in both datasets and for which GTEx had covariates’ information. Those were cerebellum (CRBL) with 125 individuals in GTEx and 130 in BRAINEAC, frontal cortex (FCTX) with 108 and 135 individuals, (HIPP) hippocampus with 94 and 130 individuals, and putamen (PUTM) with 82 and 135 individuals, respectively.
Between the two studies, we mapped 9071 common genes to compute all genepair correlations of expression levels.
Coexpression networks
For each tissue in GTEx, we downloaded its corresponding table of covariates. These included 20 variables, 3 of which are genomewide principal components that inform on ancestry, 15 surrogate transcriptomic variables that inform on batch effects, one variable containing gender and another platform. As expression levels for GTEx are derived from count data, we performed Spearman’s partial correlations of the expression levels between each genepair, among 9071 genes. We used the par.cor function of the ppcor R package to test the rank correlations adjusting for covariates. A transcriptomewide adjacency matrix was constructed for each tissue, where the entries, corresponding to the edges of the genomewide network, were the absolute value of all genepair correlations.
For the BRAINEAC data, similar adjacency matrices per tissue were obtained. In this case, however, the downloaded gene expression values had already been normalized and corrected for batch effects. As gene expression levels are close to normality, we performed a Pearson’s correlation between all genepairs and used these values to compute the adjacency matrices.
between tissues i and j, across experiments E1 and E2, respectively. The elements z_{ij} of the preservation matrix Z for each pathway A were obtained by the Fisher’s Ztransformation of cor.cor(A_{i},Aj′).
Notes
Acknowledgments
We thank the reviewers of the manuscript for their contribution.
Funding
This work was partially supported by Red Española de Supercomputación (BCV201630002) and by Ministerio de Economía e Innovación (Spain) (MTM201568140R).
Availability of data and materials
The entire code to fully reproduce this work is given in Additional file 9: Computer Code and all required pre and postprocessed data has been deposited in Figshare and assigned the identifier https://figshare.com/account/home#/projects/20065, see README file within computer code. The function lambda.agreement has been implemented in an R package available in https://github.com/isglobalbrge/lambda.
Authors’ contributions
AC designed the study, analyzed the data and wrote the manuscript, JRG supervised the work and revised the manuscript. All authors have read and approved the manuscript.
Ethics approval and consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material
References
 1.Baker M. 1500 scientists lift the lid on reproducibility. Nature. 2016; 533(7604):452–4.CrossRefGoogle Scholar
 2.Wasserstein RL, Lazar NA. The asa’s statement on pvalues: context, process, and purpose. Am Stat. 2016; 70(2):129–33.CrossRefGoogle Scholar
 3.Mogil JS, Macleod MR. No publication without confirmation. Nature. 2017; 542(7642):409–11.CrossRefGoogle Scholar
 4.Langfelder P, Luo R, Oldham MC, Horvath S. Is my network module preserved and reproducible?. PLoS Comput Biol. 2011; 7(1):1001057.CrossRefGoogle Scholar
 5.Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH. Functional organization of the transcriptome in human brain. Nat Neurosci. 2008; 11(11):1271.CrossRefGoogle Scholar
 6.Cai C, Langfelder P, Fuller TF, Oldham MC, Luo R, van den Berg LH, Ophoff RA, Horvath S. Is human blood a good surrogate for brain tissue in transcriptional studies?. BMC Genomics. 2010; 11(1):589.CrossRefGoogle Scholar
 7.Chen C, Cheng L, Grennan K, Pibiri F, Zhang C, Badner JA, Gershon ES, Liu C. Two gene coexpression modules differentiate psychotics and controls. Mol Psychiatry. 2013; 18(12):1308.CrossRefGoogle Scholar
 8.Hallgren KA. Computing interrater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012; 8(1):23.CrossRefGoogle Scholar
 9.Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960; 20(1):37–46.CrossRefGoogle Scholar
 10.Banerjee M, Capozzoli M, McSweeney L, Sinha D. Beyond kappa: A review of interrater agreement measures. Can J Stat. 1999; 27(1):3–23.CrossRefGoogle Scholar
 11.Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979; 86(2):420.CrossRefGoogle Scholar
 12.Gwet KL. Handbook of Interrater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 4th edn.Gaithersburg, USA: Advanced Analytics, LLC; 2014.Google Scholar
 13.Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, et al.The genotypetissue expression (gtex) project. Nat Genet. 2013; 45(6):580–5.CrossRefGoogle Scholar
 14.Trabzuni D, Ryten M, Walker R, Smith C, Imran S, Ramasamy A, Weale ME, Hardy J. Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies. J Neurochem. 2011; 119(2):275–82.CrossRefGoogle Scholar
 15.Kyoto encyclopedia of genes and genomes. http://www.genome.jp/kegg. Accessed 12 Apr 2018.
 16.Fleiss JL, Cohen J, Everitt B. Large sample standard errors of kappa and weighted kappa. Psychol Bull. 1969; 72(5):323.CrossRefGoogle Scholar
 17.Krichevsky AM, King KS, Donahue CP, Khrapko K, Kosik KS. A microrna array reveals extensive regulation of micrornas during brain development. Rna. 2003; 9(10):1274–81.CrossRefGoogle Scholar
 18.O’Dell TJ, Connor SA, Guglietta R, Nguyen PV. βadrenergic receptor signaling and modulation of longterm potentiation in the mammalian hippocampus. Learn Mem. 2015; 22(9):461–71.CrossRefGoogle Scholar
 19.Zhao S, FungLeung WP, Bittner A, Ngo K, Liu X. Comparison of rnaseq and microarray in transcriptome profiling of activated t cells. PLoS ONE. 2014; 9(1):78644.CrossRefGoogle Scholar
 20.Markou A. Phil Trans R Soc B Biol Sci. 2008; 363(1507):3159–68.Google Scholar
 21.Volkow ND, Wang GJ, Fowler JS, Tomasi D, Telang F, Baler R. Addiction: decreased reward sensitivity and increased expectation sensitivity conspire to overwhelm the brain’s control circuit. Bioessays. 2010; 32(9):748–55.CrossRefGoogle Scholar
 22.Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, HuertaCepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al.String v10: protein–protein interaction networks, integrated over the tree of life. Nucl Acids Res. 2014; 43:D447–52.CrossRefGoogle Scholar
 23.Trost B, Moir CA, Gillespie ZE, Kusalik A, Mitchell JA, Eskiw CH. Concordance between rnasequencing data and dna microarray data in transcriptome analysis of proliferative and quiescent fibroblasts. R Soc Open Sci. 2015; 2(9):150402.CrossRefGoogle Scholar
 24.Guo Y, Sheng Q, Li J, Ye F, Samuels DC, Shyr Y. Large scale comparison of gene expression levels by microarrays and rnaseq using tcga data. PloS ONE. 2013; 8(8):71462.CrossRefGoogle Scholar
 25.Cohain A, Divaraniya AA, Zhu K, Scarpa JR, Kasarskis A, Zhu J, Chang R, Dudley JT, Schadt EE. Exploring the reproducibility of probabilistic causal molecular network models. In: PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017. Singapore: World Scientific: 2017. p. 120–31.Google Scholar
 26.Weisstein ew. heaviside step function. in mathworld–a wolfram web resource. 2018. http://mathworld.wolfram.com/heavisidestepfunction.html. Accessed 12 Apr 2018.
 27.Genotype tissueexpression project. http://www.gtexportal.org. Accessed 12 Apr 2018.
 28.Braineac project. http://www.braineac.org/. Accessed 12 Apr 2018.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.