Single network analysis results
A single weighted gene coexpression network was constructed using expression data from livers of 135 female mice of the B × H cross, utilizing the 3421 most connected and varying transcripts from the approximately 23,000 transcripts present on the arrays (Ghazalpour et al. 2006). Using hierarchical clustering, we obtained 12 modules (each designated by a color). Gray denotes genes outside of modules. In this network, the Blue module had the highest module significance score for the physiologic trait of mouse weight (g) (module significance = 0.395, p = 7.7 × 10−5), and was also highly significant for abdominal fat pad mass (g) (module significance = 0.323, p = 0.009). These p values remain significant after Bonferroni correction adjusting for 12 modules. We mention that total mass (g) of other fat depots is also significant (module significance = 0.309, p = 0.02), but does not remain significant after Bonferroni correction.
To study the preservation of modules across different F2 intercrosses, we used the B × H module color assignment to cluster the corresponding network in the B × D mouse cross data set (Fig. 2a). A weighted gene coexpression network was constructed using 1953 genes in the B × D data set that have corresponding probes in the B × H data set. We observe that several modules (the Red, Blue, Green-yellow, Turquoise, and Green modules being notable examples) are roughly preserved between these two data sets. Figure 2b shows a multidimensional scaling (MDS) plot of the B × D data colored by B × H modules. This plot visualizes the pairwise gene dissimilarities by projecting them into a 3-dimensional Euclidean space.
If, in fact, intramodular connectivity (centrality and membership to the Blue module) reflects physiologic significance, one would expect to see a high correlation between kME and GSweight for the Blue module genes. As in Ghazalpour et al. (2006), we find a high correlation between kME and GSweight in the B × H cross (r = 0.47, p ≤ 10−20, Fig. 3c). Here we validate this relationship in the B × D cross (r = 0.57, p ≤ 10−20, Fig. 3d).
Figure 3a shows that intramodular connectivity (kME) with regard to the Blue module is preserved between the B × H and the B × D crosses (correlation r = 0.45, p ≤ 10−20). GSweight was conserved with a Spearman correlation of 0.19 (p = 1.0 × 10−17, see Fig. 3b). Network-based gene screening uses both GSweight and kME to find weight-related genes. Note that kME is better preserved than GSweight, which suggests that kME may be a more robust gene-screening variable (see Fig. 3).
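The preservation statistics quoted above compare a per-gene measure (kME or GSweight) between the two crosses. A minimal sketch of that comparison follows, assuming the measure has already been computed for the same genes in both data sets; the function names and the five kME values are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical kME values for the same five genes in the two crosses.
kme_bxh = [0.91, 0.75, 0.40, 0.62, 0.10]
kme_bxd = [0.85, 0.60, 0.55, 0.70, 0.05]
print("Pearson:", pearson(kme_bxh, kme_bxd))
print("Spearman:", spearman(kme_bxh, kme_bxd))
```

A higher cross-data-set correlation for one screening variable than for another is what motivates calling it the better preserved of the two.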
A module QTL on chromosome 19
We had previously identified a single nucleotide polymorphism (SNP) marker on chromosome 19 (SNP19) that affected weight and module expression. Table 2 demonstrates the preservation of correlations between the Blue module eigengene MEblue, weight, and SNP19 in both the B × H and the B × D data sets. A relationship was seen between MEblue and weight in both the B × H data (r = 0.62, p = 1.3 × 10−15) and the B × D cross (r = 0.34, p = 2.1 × 10−4). We note here that although the p values are not adjusted for multiple comparisons, applying the most conservative correction (the Bonferroni correction, in which each p value is multiplied by the number of modules) still results in a significant correlation between MEblue and weight in the B × H data. More explicitly, multiplying p = 1.3 × 10−15 by the number of modules (12) leads to a still significant p = 1.6 × 10−14. This illustrates the value of using WGCNA to reduce the number of multiple comparisons common to microarray analysis. We note that the mQTL on chromosome 19 had a single-point LOD score of 3.36. While a relatively weaker correlation between SNP19 (d19mit71) and weight is seen in the B × D data compared with B × H, animals homozygous for the B6 allele of a different marker on chromosome 19 (d19mit63) have significantly different weight from DBA homozygotes (in the B × D cross). This result is consistent with the previous finding that B6 and DBA homozygotes have significantly different subcutaneous fat pad mass (a weight-related trait) (Ghazalpour et al. 2005). It is also possible that differences in experimental design, such as diet, age of the animals, and the status of the Apoe gene, could account for the weaker correlation observed in the B × D network.
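The Bonferroni arithmetic for the B × H correlation can be reproduced in a few lines. This is only an illustrative sketch: the helper name `bonferroni` and the 0.05 significance level are assumptions, while the p value and the number of modules are those quoted in the text.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value: p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

n_modules = 12      # number of modules tested (from the text)
p_bxh = 1.3e-15     # MEblue vs. weight in the B x H cross (from the text)

adjusted = bonferroni(p_bxh, n_modules)
# Formatting to two significant digits gives the 1.6e-14 quoted in the text.
print(f"adjusted p = {adjusted:.2g}")
print("significant at 0.05:", adjusted < 0.05)  # 0.05 is an assumed threshold
```

Because only one test per module is carried out, the multiplier is the number of modules rather than the number of transcripts on the array.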
Table 2 Studying the preservation of correlations between the B × H and the B × D mouse cross data
Using a body weight–related mQTL to prioritize genes inside the Blue module
A SNP marker allows one to define a gene significance measure, GS.SNP, which can be used to prioritize genes within a module.
For the ith gene, GS.SNP(i) is defined as the absolute value of the correlation between the ith gene’s expressions and a given SNP’s additive marker coding value:
$$ \mathrm{GS.SNP}(i) = |\mathrm{cor}(x(i), \mathrm{SNP})|. $$
Additive marker coding reflects the dosage of a given allele; alternatively, one could use dominant or recessive marker coding (see Supplementary Material, Supplementary Table 2).
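The definition of GS.SNP can be sketched as follows. The numeric codings below are the conventional ones (additive as an allele count, dominant and recessive as indicator variables) and are an assumption here, since Supplementary Table 2 is not reproduced in this section; the genotype labels, the expression values, and the names `CODINGS` and `gs_snp` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Assumed conventional codings with respect to the "B" allele.
CODINGS = {
    "additive":  {"AA": 0, "AB": 1, "BB": 2},  # dosage of the B allele
    "dominant":  {"AA": 0, "AB": 1, "BB": 1},  # at least one B allele
    "recessive": {"AA": 0, "AB": 0, "BB": 1},  # two B alleles
}

def gs_snp(expression, genotypes, coding="additive"):
    """GS.SNP(i) = |cor(x(i), SNP)| for one gene under the chosen marker coding."""
    codes = [CODINGS[coding][g] for g in genotypes]
    return abs(pearson(expression, codes))

# Hypothetical genotypes for six mice and one gene's expression values.
genotypes = ["AA", "AB", "BB", "AB", "AA", "BB"]
expression = [1.0, 2.1, 2.9, 1.8, 1.2, 3.2]
for coding in CODINGS:
    print(coding, gs_snp(expression, genotypes, coding))
```

Taking the absolute value means that genes whose expression rises or falls with allele dosage are ranked alike.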
Observed GS.SNP values are reported in Supplementary Fig. 2a for our simulated module example. We explore the relationship between the GS.SNP values obtained by different marker coding methods in Supplementary Material, Appendix B, and depict the strong relationship between GS.SNP and the traditional LOD score in Supplementary Fig. 3. In short, this figure demonstrates that regardless of whether additive, dominant, or recessive marker coding is used, GS.SNP is highly related to the LOD score values.
Systems genetics gene-screening criteria
As described above, we found a SNP marker on chromosome 19 that is highly related to body weight and to the Blue module expressions. To determine which gene expressions mediate between this mQTL and body weight, it is natural to rank gene expressions based on their correlations with SNP19 and the clinical trait. This suggests screening for genes with high GS.SNP19 and high GSweight. Furthermore, since the Blue module was found to be related to body weight, it is natural to rank genes by membership to the Blue module, i.e., by intramodular connectivity. Our gene-screening criteria for finding the genetic drivers of body weight are as follows: (1) high association with body weight, i.e., high values of GSweight; (2) membership and hub status in a trait-related module, i.e., a high value of kME; and (3) high association with a body weight–related mQTL, i.e., high values of GS.SNP. Specifically, we used the 85th percentile of each screening variable, which resulted in nine genes inside the Blue module (Table 3). The gene list is quite robust with respect to the percentile, as the reader may explore using our online R software tutorial. An examination of their potential relationship to body weight using the Mouse Genome Informatics gene ontology database (http://www.informatics.jax.org/) (Eppig et al. 2005) and existing literature yields the following: Fsp27 encodes a pro-apoptotic protein. Nordstrom et al. (2005) found that Fsp27-null mice are resistant to obesity and diabetes. In addition, Fsp27 expression is halved in obese humans after weight loss, and other recent research suggests that Fsp27 regulates lipolysis in white human adipocytes (Nordstrom et al. 2005). A number of the other genes are related to basic biological processes that may be altered in the obese state, which is associated clinically with both the metabolic syndrome and vascular disease, among other conditions.
Gpld1 (glycosylphosphatidylinositol-specific phospholipase D1) expression in liver is increased with a high-fat diet in mice, and overexpression is associated with an increase in fasting and postprandial plasma triglycerides and a reduction in triglyceride-rich lipoprotein catabolism (Raikwar et al. 2006). Gene products of F7 and Kng2 are elements of the hemostatic system and may play roles in thrombosis and vascular disease (Kaschina et al. 2004; Reiner et al. 2007; Viles-Gonzalez et al. 2006). Our network-based gene screening method appears to identify biologically relevant genes, considering the evidence from primary literature supporting involvement of these genes in obesity (Fsp27) and/or known obesity-related disorders (diabetes, metabolic syndrome, and vascular disease). Other genes identified by this method may be novel candidates. As such, these results should be considered a starting point for subsequent experimentation to explore involvement of these genes in obesity.
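The three screening criteria (high GSweight, high kME, and high GS.SNP, each thresholded at the 85th percentile) amount to an intersection of three filters. The sketch below is one way to express that, not the code of the R tutorial. It assumes the percentiles are taken over whatever set of genes is passed in and uses linear interpolation between order statistics; the gene names and values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def screen_genes(genes, q=85):
    """Return the names of genes at or above the q-th percentile of all three variables.

    genes: dict mapping gene name -> dict with keys 'GSweight', 'kME', 'GS.SNP'.
    """
    keys = ("GSweight", "kME", "GS.SNP")
    cutoffs = {k: percentile([g[k] for g in genes.values()], q) for k in keys}
    return sorted(name for name, g in genes.items()
                  if all(g[k] >= cutoffs[k] for k in keys))

# Hypothetical module of ten genes whose three measures increase together.
genes = {f"g{i}": {"GSweight": i / 10, "kME": i / 10, "GS.SNP": i / 10}
         for i in range(10)}
genes["g8"]["GS.SNP"] = 0.0  # strong trait and hub evidence, but no SNP association

print(screen_genes(genes))        # g8 is excluded by the third criterion
print(screen_genes(genes, q=50))  # a looser percentile admits more genes
```

Varying `q` in such a sketch is a quick way to check how sensitive a gene list is to the chosen percentile.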
Table 3 Gene-screening results of the single-network analysis
Sector plots for identifying differentially expressed and differentially connected genes
Differential network analysis is concerned with identifying both differentially connected and differentially expressed genes. To measure differential gene expression between the lean and the obese mice, we use the absolute value of the Student t-test statistic. Plotting DiffK, the difference in connectivity between lean and obese mice, versus the t-test statistic value for each gene gives a visual demonstration of how difference in connectivity relates to a more traditional t-statistic describing difference in expression between the two networks.
Figure 4a shows a scatterplot of DiffK vs. the t statistic. Eight sectors of the plot with high absolute values of DiffK (> 0.4) and/or t-statistic values (> 1.96) are shown. Horizontal lines depict sector boundaries based on t-statistic values, and vertical lines depict boundaries based on DiffK. These eight sectors are marked by numbers in Fig. 4a. To assign a significance level (p value) to a gene’s DiffK value or to its membership in a particular sector defined by DiffK and t statistic, we use a permutation test approach that randomly permutes the microarray sample labels. The permutation test contrasts networks built by randomly partitioning the 60 mice into two groups. We consider the number of genes inside a given sector (which is defined by thresholding the t statistic and DiffK as described above) in determining the significance level. Figure 4b shows the same plot after network membership has been permuted. Based on 1000 random permutations, sector membership was found to be significant for sectors 2, 3, and 6 with p ≤ 1.0 × 10−3. Membership in sector 5 was significant with p ≤ 1.0 × 10−2.
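The permutation test described above can be sketched as follows. Several details are assumptions that this section does not state: connectivity is the row sum of |cor|^β with β = 6, DiffK is the difference of connectivities scaled by their maximum in each network, the t statistic uses pooled variance, and the p value is computed as (hits + 1)/(permutations + 1). All function names and the simulated data are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def scaled_connectivity(expr, beta=6):
    """expr: one expression vector per gene. Returns k_i / max(k), assumed definition."""
    n = len(expr)
    k = [sum(abs(pearson(expr[i], expr[j])) ** beta for j in range(n) if j != i)
         for i in range(n)]
    top = max(k)
    return [v / top for v in k]

def t_stat(a, b):
    """Student t statistic with pooled variance for two samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))

def sector_count(expr, labels, in_sector):
    """Count genes whose (DiffK, t) pair satisfies the predicate in_sector."""
    g1 = [[v for v, l in zip(row, labels) if l == 1] for row in expr]
    g2 = [[v for v, l in zip(row, labels) if l == 2] for row in expr]
    k1, k2 = scaled_connectivity(g1), scaled_connectivity(g2)
    return sum(in_sector(a - b, t_stat(x, y)) for a, b, x, y in zip(k1, k2, g1, g2))

def permutation_p(expr, labels, in_sector, n_perm=1000, seed=0):
    """Permute sample labels and compare permuted sector counts with the observed one."""
    rng = random.Random(seed)
    observed = sector_count(expr, labels, in_sector)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved
        if sector_count(expr, shuffled, in_sector) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Simulated data: 8 genes measured in 12 samples, 6 per group.
rng = random.Random(1)
expr = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(8)]
labels = [1] * 6 + [2] * 6
high_both = lambda diffk, t: diffk > 0.4 and abs(t) > 1.96  # thresholds from the text
print(permutation_p(expr, labels, high_both, n_perm=50))
```

The predicate argument lets each of the eight sectors be tested with the same routine by changing only the sign and threshold conditions.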
Functional enrichment analysis of sector 3 genes
We analyzed 61 sector 3 genes that were both highly connected in network 1 and lowly connected in network 2 for functional enrichment using the DAVID database (Dennis et al. 2003). This software, which is free and available for download at http://www.d.abcc.ncifcrf.gov/home.jsp, calculates the p value for the extent of enrichment of a given biological pathway/set by performing Fisher’s exact test. We focused on sector 3 for two reasons. First, sector 3 members had extreme values of DiffK as well as high t-statistic values. Second, as one can readily see from Fig. 4a, a high proportion of Yellow module genes, based on network 1 module definitions, were found in this sector. These Yellow module genes were lowly connected in network 2, and therefore were annotated as Gray module (background) members in a module assignment scheme based on network 2. This result suggests that in a pathophysiologic state (mouse obesity), the Yellow module can no longer be found.
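As an illustration of the enrichment p value, the sketch below computes a plain one-sided Fisher’s exact test, i.e., the upper tail of the hypergeometric distribution. It is not DAVID’s own code, and DAVID’s reported values may differ if it applies a modified form of the test. The category size and background size in the example are hypothetical; only the list size of 61 genes, and 23 genes as roughly 37.7% of that list, come from the text.

```python
from math import comb

def fisher_enrichment_p(hits, list_size, category_size, background_size):
    """One-sided Fisher's exact p value, P(X >= hits).

    X is the number of category members in a gene list of size list_size drawn
    without replacement from background_size genes, category_size of which
    belong to the category.
    """
    upper = min(list_size, category_size)
    total = comb(background_size, list_size)
    tail = sum(comb(category_size, k)
               * comb(background_size - category_size, list_size - k)
               for k in range(hits, upper + 1))
    return tail / total

# 23 of 61 sector 3 genes in a category (about 37.7%).
# The category size (400) and background size (3421) are hypothetical.
print(fisher_enrichment_p(hits=23, list_size=61,
                          category_size=400, background_size=3421))
```

Exact integer binomial coefficients keep the tail probability accurate even when it is very small.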
Results for this analysis that were significant at the p < 0.05 level are shown in Table 4. These genes were markedly enriched for the extracellular region (37.7% of genes, p = 1.8 × 10−4), extracellular space (34.4% of genes, p = 5.7 × 10−4), signaling (36.1% of genes, p = 5.4 × 10−4), cell adhesion (16.4% of genes, p = 7.7 × 10−4), and glycoproteins (34.4% of genes, p = 1.6 × 10−3). Furthermore, 12 terms for epidermal growth factor or its related proteins were recovered in the functional analysis. A few of the notable results are EGF-like 1 (8.2% of genes, p = 8.7 × 10−4), EGF-like 3 (6.6% of genes, p = 1.6 × 10−3), EGF-like 2 (6.6% of genes, p = 6.0 × 10−3), EGF (8.2% of genes, p = 0.013), and EGF_CA (6.6% of genes, p = 0.015).
Table 4 Functional enrichment analysis of the results of the differential network analysis
In summary, we find a group of rewired genes identified by differential connectivity in lean and obese mice. These genes are highly enriched for extracellular and cell–cell interaction terms, notably 12 terms for epidermal growth factor (EGF) or EGF-related proteins. An indirect validation of the differential network results is provided by a published article that reports that EGF plays a causal role in inducing obesity in ovariectomized mice (Kurachi et al. 1993).
Functional enrichment analysis of sector 5 genes
Sector 5 is analogous to sector 3 in that it contains genes with both extreme differences in connectivity and extreme t-statistic values. After Bonferroni correction, these genes are enriched for enzyme inhibitor activity (p = 2.93 × 10−3), protease inhibitor activity (p = 6.00 × 10−3), endopeptidase activity (p = 6.00 × 10−3), dephosphorylation (p = 0.0122), protein amino acid dephosphorylation (p = 0.0122), and serine-type endopeptidase inhibitor activity (p = 0.0417) (Supplementary Table 6). Two genes were enriched for all significant categories: Itih1 and Itih3. These two genes are located near a QTL marker for hyperinsulinemia (D14Mit52) identified in C57Bl/6, 129S6/SvEvTac, and (B6 × 129) F2 intercross mice (Almind and Kahn 2004). Itih3 was independently determined to be a gene candidate for obesity-related traits based on differential expression in murine hypothalamus (Bischof and Wevrick 2005). Two serine protease inhibitors, Serpina3n and Serpina10, were enriched for the categories of enzyme inhibitor, protease inhibitor, and endopeptidase inhibitor. In humans, Serpina10 is also known as Protein Z-dependent protease inhibitor (ZPI). This serpin inhibits activated coagulation factors X and XI; ZPI deficiencies have been found to be associated with venous thrombosis (Water et al. 2004). We note that obesity is a strong independent risk factor for venous thrombosis (Abdollahi et al. 2003; Goldhaber et al. 1997) and that, accordingly, ZPI may be a link between obesity and increased risk of venous thrombotic events.
Results from functional enrichment analysis for all other sectors are described in Supplementary Material, Appendix C and Supplementary Tables 3, 4, 5, 7, and 8 (Supplementary Table 3: enrichment of biological pathways/sets for Blue module genes intersecting B × H and B × D data sets; Supplementary Table 4: enrichment of biological pathways/sets for sector 2 genes; Supplementary Table 5: enrichment of biological pathways/sets for all sector 3 genes; Supplementary Table 7: enrichment of biological pathways/sets for sector 6 genes; Supplementary Table 8: enrichment of biological pathways/sets for sector 8 genes).