Background

Lung development is a complex developmental process initiated by budding of the lungs from the gut endodermal tube, multiple rounds of expansion and branching morphogenesis, and final maturation of the epithelial and endothelial components that comprise the airways, pulmonary circulation, and gas exchange surface [1, 2]. It is the final maturation of the lung epithelial cells that is frequently interrupted by premature birth, leading to both acute and chronic lung disease in premature infants [3, 4]. Here we demonstrate apparently related roles of the Nfib and glucocorticoid receptor (hereafter either Nr3c1 or GR) genes in lung maturation.

Previous studies showed that loss of Nfib resulted in perinatal lethality due to lung immaturity [5]. The lungs of late fetal mice lacking Nfib showed reduced expression of Type I and Type II epithelial markers along with morphological immaturity exemplified by a failure of the formation of saccules, the precursor to the alveolar air exchange region. In addition, excess proliferation of both mesenchymal and epithelial cells is seen in Nfib null lungs. Surprisingly, while the phenotype is clearly related to the failure of epithelial cell maturation, loss of Nfib only in the mesenchymal cells of the lung yields a very similar phenotype [6], indicating that mesenchymal cells regulate late epithelial maturation through as yet unknown inductive mechanisms [7].

Prenatal administration of glucocorticoids has been shown to stimulate lung maturation in both mice and premature infants [810]. Conversely, deletion of Nr3c1, the gene encoding the glucocorticoid receptor, results in a phenotype remarkably similar to that of loss of Nfib, including excess cell proliferation, failure of saccularization and reduced expression of markers of epithelial differentiation [10]. As with Nfib, loss of Nr3c1 only in the mesenchyme recapitulates much of this phenotype [11]. The similarity in phenotype seen with the loss of either Nfib or Nr3c1, together with the shared cell-type expression requirement suggests that these two genes may co-regulate a specific set of genes essential for lung maturation. We therefore examined the lung genes regulated by Nfib and Nr3c1 and the specific binding targets of NFIB to determine how these genes may cooperate in the regulation of lung maturation.

Results

ChIP-seq shows that NFIB binds to the known NFI motif in mouse fetal lung

We conducted a ChIP-seq analysis of NFIB in wild type mouse fetal lung at E16.5 and identified 759 peaks from an initial set of 8,717,818 unpaired reads (see Methods). The distribution of the distances between these peaks and the closest TSS shows a strong enrichment within 1 kbp both upstream and downstream of the TSSs compared to a random control (Figure 1). Peaks are particularly enriched at about 100bp upstream of the nearest known TSS, showing that NFIB frequently binds the proximal promoter. There is also considerable enrichment of peaks downstream of the nearest known TSS for several hundred base-pairs. This could represent either binding in the 5’UTR of the known gene or binding in the promoter of an unannotated alternative transcript.

Figure 1
figure 1

Distribution of NFIB ChIP-seq peaks relative to closest TSS. The black curve represents the distribution of the distances between the 759 NFIB ChIP-seq peaks and the closest TSS. Negative distances correspond to upstream peaks and positive distances to downstream peaks. The red curve shows the distances when the peaks are randomly and uniformly repositioned on their chromosomes. Density is estimated using a Gaussian kernel with bandwidth h=0.4.

We applied the MEME algorithm [12] to repeat-masked [13], 100bp genomic regions centered on each of the 759 NFIB ChIP-seq peaks. The most statistically significant motif found by MEME matches the known NFIB palindromic consensus sequence TGGCnnnnnGCCA. More importantly, the motif found by MEME is extremely similar to the in vitro NFIB motif obtained by Jolma et al.[14] using SELEX technology (Figure 2). This observation confirms that NFIB has similar DNA-binding specificity in mouse fetal lung cells as in a cell-free in vitro system. The palindromic binding motif found by MEME further strongly suggests that NFIB binds mainly as a dimer in these cells. Finally, the strong similarity between the in vivo and in vitro motifs for NFIB in Figure 2 show that the ChIP-seq experiment and downstream data analysis succeeded.

Figure 2
figure 2

NFIB DNA-binding motifs in mouse fetal lung and in vitro . The motif reported by Jolma et al. [14] using SELEX is shown above the motif discovered by MEME in mouse fetal lung NFIB ChIP-seq peak regions.

To further assess the quality of our ChIP-seq data, we considered the fraction of predicted ChIP-seq peaks that contain a match to the discovered NFIB motif at different motif score thresholds, as computed by the FIMO scanning algorithm [15]. At a motif score p-value threshold of 10-5, the NFIB motif is present in 10.1% of the ChIP-seq peaks but in only 1.7% of the randomized control sequences (Figure 3). This represents a six-fold enrichment (77/13 = 5.92), which exceeds the ENCODE guidelines requiring at least 10% of the peaks to have a four-fold enrichment for the ChIPed TF’s binding motif [16].

Figure 3
figure 3

Enrichment of the NFIB motif in NFIB ChIP-seq peaks. The black curve shows the fraction of 759 ChIP-seq peaks with at least one predicted NFIB binding site using different motif score thresholds. The red curve shows distribution obtained when each ChIP-seq peak region is shuffled while preserving trinucleotide frequencies.

Correlation of NFIB binding and expression of nearby genes

We studied the mechanism of transcriptional regulation by NFIB in fetal lung cells using our NFIB ChIP-seq data from E16.5 fetal lung cells and previously published gene expression data from E18.5 fetal lung cells in WT and Nfib-knockout mice [6].

We first sought for dysregulated genes in the Nfib-KO using a 2-fold expression change threshold and a maximal q-value of 0.05 for selection (see Methods).

We identified 631 genes, of which 412 are down-regulated and 219 are up-regulated. For convenience, we will refer to the down-regulated genes as “NFIB-activated”, and to the up-regulated as “NFIB-repressed”. Of course, we realize that the observed effect could be due to direct or indirect regulation of the gene in question (e.g., via NFIB interacting with another TF).

We then counted the numbers of genes with an NFIB ChIP-seq peak within 1 kbp, 10 kbp or 100 kbp (Table 1). As can be seen in Table 1, only 0.3% of the NFIB-activated genes (1 gene) have an NFIB ChIP-seq peak within 1 kbp of their TSS. This is a lower percentage than for all genes (1.2%), but the difference is not statistically significant (p=0.1, two-tailed Fisher’s exact test).

Table 1 Number of NFIB-activated/repressed genes with an NFIB ChIP-seq peak near their TSS

If we extend the analysis to binding at up to 10 kbp and 100 kbp from the TSS, neither the NFIB-activated nor NFIB-repressed genes have a number of NFIB ChIP-seq peaks that differs significantly from the number expected by chance (p>0.05, two-tailed Fisher’s exact test).

The lack of evidence of a clear relationship between proximal NFIB binding and gene expression in Table 1 may be due to the fact that the expression data is from a later stage of fetal lung development than the ChIP-seq data (E18.5 vs. E16.5). It is quite possible that the set of genes bound by NFIB changes substantially between E16.5 and E18.5. Another confounding factor is that the gene expression data comes from embryonic lungs where Nfib is deleted from E10, but expression is not measured until E18.5, leaving ample time for compensatory changes in gene expression. In fact among the 631 genes identified as activated or repressed at day E18.5 in the Nfib-KO mouse, 28 are annotated as having “sequence-specific DNA binding transcription factor activity” in the Gene Ontology (GO) database [17]. The changes in expression of these TFs will affect the expression of many genes, so many of the observed dysregulated genes may be indirect rather than direct targets of Nfib. Another possibility is that the majority of regulation by NFIB is via long-distance chromatin looping [18], but we consider this unlikely given the clear enrichment of NFIB binding events we observe in proximal promoter regions (Figure 1).

Promoters of genes activated by NFIB are enriched in NFIB motifs, but repressed ones are not

In the absence of ChIP-seq data in E18.5 mouse fetal lung, we turned to a motif-based analysis of the relationship between NFIB binding and gene expression. First, we tested for over-representation of putative NFIB binding sites (predicted using our new NFIB motif) in the promoters of NFIB-activated and NFIB-repressed genes (see Methods). We found a significant enrichment in the NFIB-activated genes (p a c t <0.0013), but not in the NFIB-repressed (p r e p >0.3). This suggest that many genes are activated by NFIB through direct interaction, but that repression generally results from indirect regulation.

As a control and to get a broader picture, we then calculated p a c t and p r e p and our motif association score (MAS, see Methods) for each of the 738 SELEX-derived motifs reported by Jolma et al.[14], which cover the DNA-binding specificity of most transcription factor families in mammals. We found that the three motifs with the largest MAS are the three NFI-family motifs in the Jolma et al.[14] compendium (Figure 4 and Table 2). These motifs have large positive MAS scores, which indicates that their presence in the promoter of a gene is highly correlated with it having reduced expression in the Nfib-KO mouse at E18.5. This suggests that NFIB acts as a direct activator of transcription for many genes in our NFIB-activated set, in mouse fetal lung at E18.5. The complete MAS results are given in Additional file 1.

Figure 4
figure 4

Motif Association Score distribution for Nfib targets. The histogram shows the distribution of the MAS for the 739 motifs in the compendium (our ChIP-seq motif inferred with MEME (Nfib_MEME) plus 738 mouse and human TF motifs determined by SELEX [14]).

Table 2 Association of TF motifs with Nfib targets

As an additional control, we repeated the complete MAS analysis after replacing mouse promoter sequences by their ortholog from rat or human. For both species, NFI motifs were among the most strongly enriched within the set of NFIB-activated genes. In human, NFIX ranked second (p a c t <10-4), and NFIB ranked fourteenth (p a c t <10-3). In rat, NFIX ranked third (p a c t <10-3) and NFIB ranked tenth (p a c t <10-3). Overall, this strongly support the hypothesis that NFIB activates its targets during lung maturation through direct interactions near the promoter regions.

The presence of our novel, ChIP-derived NFIB motif in gene promoters shows less significant correlation with gene expression than the SELEX-based motifs for NFIA, NFIB and NFIX (Table 2). The new motif (NFIB_MEME) ranks twentieth according to MAS, despite being highly similar to the SELEX-based NFIB motif (Figure 2), from which it differs primarily in the preference for an ‘A’ in the right-most position. This may indicate that the new, ChIP-based motif is slightly less accurate than the SELEX-based motif, which, if true, could be due to numerous reasons. The accuracy of motifs derived from ChIP-seq experiments depends strongly on the number of sequences without the motif presented to the motif discovery algorithm. Such sequences can be due to imperfect antibody specificity or to indirect DNA-binding by the antibody via a protein complex or via chromatin loops bound jointly by the antibody and another DNA-binding protein [19]. None of these issues are present in SELEX experiments, although they suffer from their own limitations. There is no guarantee that the DNA-binding specificity of the protein or DNA-binding domain assayed by SELEX is the same under the in vitro SELEX conditions as it is in in vivo. At any rate, our expression correlation results suggest that the existing SELEX-based NFI-family motifs are at least as accurate as our ChIP-derived motif for NFIB.

Although our new NFIB motif ranks twentieth among the 739 motifs in the combined motif database, there are effectively only three distinct motifs with higher MAS scores (Figure 5). This is because the compendium contains motifs for both mouse and human TFs (e.g., Meis3 and MEIS3) and it contains multiple members of transcription factor families. Paralogous and orthologous TFs tend to have highly similar DNA-binding affinities, as shown by the highly similar motif logos at the leaves of the motif tree in Figure 5.

Figure 5
figure 5

Motifs associated with dysregulated genes in Nfib -KO mouse clustered by motif similarity. The tree shows the top 20 motifs according to MAS clustered using the Pearson Correlation Coefficient to measure the similarity between pairs of motifs, and the UPGMA tree-building algorithm to create the tree [20]. The tree was drawn using Phylodendron [21].

Other TFs may contribute to the Nfib-KO phenotype

For each motif in Table 2 we have included the expression fold change of the corresponding gene, if significant (p-value ≤0.05). The increased expression of Nfia in the Nfib-KO further suggests that there may be some compensatory mechanism at play between the two paralogs. Since the DNA-binding motif of NFIA is almost identical to that of NFIB, it is probable that both transcription factors bind to the same regulatory elements. Such an apparently compensatory change in one NFI family member upon loss of another was noted previously in Nfib-KO lungs and suggests some type of homeostatic regulation of total NFI levels [5]. However, these data are not sufficient to indicate whether NFIA acts as an activator or a repressor, or whether the same genes are regulated by both NFIA and NFIB.

In addition to the NFI motifs, we note the large positive MAS of the Meis, SNAI2 and NR2F1 motifs, indicating that they are enriched in the NFIB-activated gene set (i.e. in genes that are under-expressed in the Nfib-KO). Because Snai2, Meis2 and Nr2f1 are over-expressed in the Nfib-KO, it is possible that some genes in our NFIB-activated set are repressed by these genes instead of being directly activated by NFIB. Repressor activity has been documented for each of these three factors [2224]. For example Snai2, which represses transcription via the recruitment of histone deacetylases to target gene promoters [25], is known for its antiapoptotic activity and plays a role in epithelial-mesenchymal transition. While neither epithelial-mesenchymal transition nor altered apoptosis seem implicated in the phenotype of Nfib-/- lungs [6], other epigenetic changes mediated by Snai2 affecting cell proliferation and cell differentiation are clearly possible and will be investigated. Finally, we note the enrichment of the EBF1 motif, which regulates cell differentiation [26]. However, according to the microarray data, the Ebf1 gene is not significantly dysregulated in the Nfib-KO.

The apparently paradoxical increased expression of Nfib in the Nfib-KO lungs (Table 2) is explained by the fact that only exon 2 (containing the DNA-binding domain and dimerization domain) is actually knocked out. The microarray is detecting an increase in a transcript that is missing exon 2, and thus cannot lead to a functional NFIB protein. One explanation for this increased expression from the Nfib promoter in the absence of production of a functional Nfib transcript is that NFIB normally represses its own production, either directly or indirectly (Table 3). A second possibility is that the shorter, disrupted transcript is less subject to post-transcriptional degradation that the complete transcript, leading to higher measured expression in the Nfib-KO lungs.

Table 3 Comparison of expression change for some TFs of interest in Nfib - and Nr3c1 -knockout lungs

Nfib and Nr3c1regulate an overlapping set of genes

Nfib-knockout mice show a phenotype very similar to that seen in glucocorticoid receptor (Nr3c1) knockout mice. To ask whether this may be due to a common set of dysregulated genes, we compared our Nfib-KO microarray data with available microarray data for Nr3c1-KO in fetal mice lung at 18.5 [27]. Using the same selection threshold as for the Nfib-KO dataset (2-fold expression change and q-value ≤ 0.05), we identified 158 activated genes and 160 repressed genes by GR.

The sets of genes activated or repressed by GR overlap significantly with the analogous gene sets for NFIB (Table 4). The sets of activated genes share 52 genes in common, which is 13.1 times higher than what would be expected by chance, while the sets of repressed genes have 22 genes in common, a 9-fold enrichment.

Table 4 Regulatory targets of Nfib and Nr3c1 overlap significantly

As we can see from Figure 6, the direction of the change in expression is the same for all but one of the genes activated/repressed in the two knockout experiments (Figure 6). The one gene whose expression changes in a different direction is Aspg (log2(FC) = 1.48 in Nfib-KO; -1.93 in Nr3c1-KO). The Pearson correlation coefficient between the log2(FC) in the two knockouts of these genes is 0.907. This strongly suggests that the common phenotype in the two knockouts are due to some or all of the genes in the overlap set. The list of genes activated/repressed in both datasets is available in Additional file 2.

Figure 6
figure 6

Expression change of genes activated/repressed in both Nfib -KO and Nr3c1 -KO mice is highly correlated. Each point represents a gene activated/repressed in both knockouts. Expression change is expressed as log2(KO/WT), and only genes with a 2-fold change and a q-value less than 0.05 in both knockouts are shown.

Promoters of genes that are activated by both NFIB and GR are enriched in NFIB and GR motifs

To further test our hypothesis that Nfib and Nr3c1 co-regulate an overlapping set of genes, we looked for motif enrichment in the sets of commonly activated or repressed genes identified above. Figure 7 shows the distribution of the MAS score for 739 motifs in the Jolma et al.[14] compendium, and the relative position of the GR (NR3C1) and NFIB motifs. We note that the MAS score of the two motifs are highly significant and among the largest positive ones, ranking at the 8th and 12th positions, respectively (Table 5). The complete results are available in Additional file 3.

Figure 7
figure 7

Motif Association Score distribution for Nfib and Nr3c1 targets. The histogram shows the distribution of the MAS for the 739 motifs in the compendium (our ChIP-seq motif inferred with MEME (Nfib_MEME) plus 738 mouse and human TF motifs determined by SELEX [14]).

Table 5 Association of TF motifs with Nfib and Nr3c1 common targets

Among the motifs with a larger (but similar) positive MAS, we find the androgen receptor (which is almost identical to GR), the EBFI motif, motifs for two members of the T-box family of transcription factors and the ZNF410 motif. However, none of the mouse genes corresponding to these motifs shows a significant (p- value ≤0.05) expression change in both the Nr3c1 and Nfib knockouts. We also note the large negative MAS of Foxi1 and some estrogen-related receptors (Esrra, Esrrb). Estrogen controls many cellular processes such as growth and differentiation. While Esrrb is over-expressed in the Nr3c1-KO, we identified no such dysregulation in the Nfib-KO. We have no expression data for Foxi1 in Nfib-KO, and this gene is not dysregulated in the Nr3c1-KO. However, some other genes of the same Fox family are dysregulated in both knockouts, such as Foxp2 (over-expressed in both) and Foxn3 (under-expressed in Nr3c1-KO and over-expressed in Nfib-KO). Interestingly, it has been shown that loss of Foxp2 leads to defective postnatal lung alveolarization in mouse [28].

Similarly to what we did in a previous section, we repeated the MAS analysis using human and rat orthologous sequences. For human, NFIX ranked third (p a c t <10-4), and NFIB ranked nineteenth (p a c t <10-2). For rat, NFIX ranked sixth (p a c t <10-3), and NFIB ranked eighth (p a c t <10-2). This suggests that NFIB activates some NR3C1-activated genes through binding at their promoter sequences. However, NR3C1 did not show a significant enrichment for human, and it ranked only 51 st in rat (p a c t =0.044). These data suggest that in some instances mechanisms other than direct binding of promoter sequences by NR3C1 may mediate co-regulation by NR3C1 and NFIB. Consistent with this finding, previous studies have indicated that some functions of NR3C1 are mediated by mechanisms other than direct NR3C1 binding to DNA. For example, mice defective in DNA-binding by NR3C1 are viable while those deleted for NR3C1 die at birth [29]. Thus it will be important to determine the fraction of co-regulated genes whose expression is regulated by direct binding of NR3C1 versus other indirect mechanisms of regulation.

Regulatory sub-network involving Nfib and Nr3c1

Based on the data presented above, we propose a possible regulatory sub-network involving Nfib, Nr3c1, Nfia and their 52 common activated target genes (Figure 8). Each of the links represents activation (pointed arrow) or repression (flat arrow) of transcription by a transcription factor. Each of the links in the figure is numbered, and we describe the experimental support for each link in what follows.

Figure 8
figure 8

Possible regulatory sub-network involving Nfib , Nfia and Nr3c1 . A flat head on an edge indicates a repressive effect. The connector labeled 5 indicates cooperative activation by both Nfib and Nr3c1. Evidence for each numbered edge is given in the main text. The 52 activated genes are 2-fold over-expressed in the wildtypes compared to Nfib-KO and Nfib-KO, with a q-value ≤0.05.

Firstly, as shown in Table 2 and discussed above, Nfib and Nfia transcripts are both repressed by NFIB, which is indicated by links 1 and 2 in Figure 8. For simplicity, and because both NFIA and NFIB are transcription factors and thus can act directly to affect transcription, we depict these as direct interactions, but they may well be indirect. The repression of Nfia by Nfib is further supported by the observation that Nfia is significantly under-expressed (p<10-5) while Nfib is significantly over-expressed (p=0.001) in the Nr3c1-KO (Table 3).

Secondly, both Nr3c1 and Nfib transcripts are significantly repressed by GR (p < 0.001 and p = 0.001, respectively, Table 3), which we indicate by links 3 and 4 in Figure 8. We once again depict these as direct interactions for simplicity.

Thirdly, we propose that the set of 52 genes under-expressed in both knockouts (hereafter G52) are activated cooperatively by Nfib and Nr3c1. While there may be some indirect regulation at play, our motif analysis revealed a significant enrichment for both NFIB and GR motifs in this set, which suggests that some of these genes are direct targets of these factors. Moreover, the two alternative trivial topologies that could connect Nfib and Nr3c1 to G52 are not supported by the data. Indeed, the topology

Nr3c1 Nfib G52

is precluded by the data supporting edge 3, and the topology

Nfib Nr3c1 G52

would involve under-expression of Nr3c1 in the Nfib-KO, which is not observed. We indicate the direct cooperation hypothesis by link 5 in Figure 8.

Finally, since the DNA binding affinities of NFIA and NFIB are highly similar (see motif logos in Figure 5), we infer that NFIA may bind many of the same regulatory elements as NFIB. We therefore hypothesize that Nfia may regulate the genes in G52, which we indicate by link 6 Figure 8. However, it is not clear from the available data whether Nfia acts as a repressor or an activator of these genes.

GO analysis of putative common targets of Nfib and Nr3c1

To identify which biological processes could be activated by Nfib and Nr3c1, we tested for gene-annotation enrichment using GOrilla [30] and a ranked list approach (see Methods). We found significant enrichment for several general terms including “cell adhesion” (q-value = 0.03), “transport” (q-value = 0.0006) and “immune system process” (q-value = 0.004), but this required the inclusion of genes that are not in the set of 52 putative common targets (i.e. genes that appear under-expressed in both knockouts, but that do not meet our strict fold-change and q-value thresholds, see Methods). This suggests that both Nfib and Nr3c1 may regulate these processes in some manner during lung development. On the other hand, we found significant enrichment for three specific terms where Nfib and Nr3c1 are more likely to play a direct role since the associated genes are among the 52 common targets we have identified: First, “cellular defense response” (q-value = 0.03), with the Tnfrsf4 (tumor necrosis factor receptor superfamily, member 4) and Ncf1 (neutrophil cytosolic factor 1) genes; Second, “regulation of vascular endothelial growth factor signaling pathway” (q-value = 0.02), with the Myoc1c (myosin ic) and Xdh (xanthine dehydrogenase) genes; Third, “regulation of fibroblast proliferation” (q-value = 0.04), with Sphk1 (sphingosine kinase 1), Aqp1 (aquaporin 1) and Cdkn1a (cyclin-dependent kinase inhibitor 1a) genes. We are currently examining these genes in more detail to assess NFIB and GR binding to putative regulatory regions.

Discussion

Lung maturation is a complex process dependent on cooperation between the mesenchyme, epithelium and endothelial cells to promote vascularization, epithelial differentiation, mesenchyme thinning and lung morphogenesis [1, 2]. Previous studies using transcriptional profiling have identified large numbers of genes whose expression levels change substantially during each of the characteristic stages of mouse lung development: 1) embryonic, 2) pseudoglandular, 3) canalicular, 4) saccular and 5) alveolar [3133]. For example Mariani et al. demonstrated groups of extracellular matrix genes that exhibit stage-specific expression patterns in mouse lung [32, 33]. However the vast majority of these changes in gene expression represent stage-specific differentiation markers of epithelial or mesenchymal cells which define the cellular phenotype at each stage, but give little information on the regulatory mechanisms that control the differentiation process. Conversely, genetic studies have been instrumental in the discovery of gene regulatory pathways essential for early, and to a lesser extent, later stages of lung development. For example, reciprocal epithelial-mesenchymal inductions have been shown to be essential for both early and late stages of lung development [7, 34, 35]. In addition, while a number of signaling pathways including the FGF [36, 37], Shh [38, 39], BMP [40, 41], Wnt [42] and TGF [43, 44] pathways are known to signal within and between the epithelium and mesenchyme, the transcriptional networks that respond to and/or generate these pathways and mediate the maturation program remain largely unknown [2, 45].

Both the NFIB and GR transcription factors have been shown previously to regulate the transition from the canalicular to the saccular stage in mouse lung [5, 10, 46, 47]. Indeed mutations in either gene result in a very similar phenotype characterized by an excess of mesenchymal and epithelial cells at E18, severe reduction or failure of saccule formation, severe delay in type I and type II cell differentiation and resultant perinatal lethality with largely non-inflated lungs. More recently it was determined using conditional KO alleles that Nfib and Nr3c1 expression in the lung mesenchyme, not lung epithelium, is essential for normal lung maturation [6, 11]. The similarity in phenotype seen with loss of either Nfib or Nr3c1 initially suggested that these two transcription factors might be cooperating in a conserved pathway of lung maturation. Our determination of a significant overlap between the genes whose expression changes with the loss of either Nfib or Nr3c1 (Table 4 and Figure 6) in lung mesenchyme is consistent with this hypothesis. In addition, the motif analysis showed significant enrichment of NFI and GR binding motifs in genes whose expression decreased in both Nfib-KO and Nr3c1-KO lungs (Figure 7) suggesting cooperative activation of these genes by these factors.

Conclusions

Our computational analysis combining expression, binding and motif data provide the first evidence for Nfib and Nr3c1 co-regulating genes related to lung maturation. Although these data are consistent with the model of Nfib and Nr3c1 function shown in Figure 8, other models are possible. For example, it may be that the overlap in differentially expressed genes in the two mutants (Table 4) is a reflection of the overall phenotype (lung immaturity) rather than being causally dependent on direct co-regulation by Nfib and Nr3c1. To distinguish between such alternate hypotheses it will be necessary to simultaneous assess GR and NFIB binding to mouse lung target genes and assess the frequency of co-occupancy at potential target genes. While GR ChIP-seq data is available for human lung cancer cells [48], rat pheochromocytoma cells [49] and mouse adipocytes [50], because GR binding to DNA is highly dependent on chromatin context and thus shows high cell-type specificity [51] these data sets are of marginal use at best for determining GR targets in normal lung mesenchyme. In addition, while the ChIP-seq data set used here for NFIB was determined at E16.5, it will be necessary to assess both NFIB and GR binding at multiple stages of lung development to determine the temporal sequence during which NFIB and GR bind to target sites to regulate gene expression. However, even in the absence of such data our determination that both NFIB and GR binding motifs are among those most highly associated with genes whose expression is decreased by loss of either Nfib and Nr3c1 (Figure 8) provides a useful framework for future studies.

Methods

NFIB ChIP-seq

E16.5 wild type lungs were minced on ice in PBS, incubated in 1% formaldehyde at room temperature for 10 min., quenched with 0.125M glycine for 5 min. and prepared for chromatin isolation as described previously [6]. Chromatin was sheared to ∼200–500 bp using a Branson Sonifier 250 sonicator and subjected to ChIP using a ChIP assay kit (Upstate Biotechnology) and NFIB antibody (Geneka Biotechnology). Immunoprecipitated chromatin was isolated, the crosslinks reversed, and the isolated DNA was used to prepare a sequencing library and sequenced at the Cornell University DNA Sequencing and Genotyping Laboratory, resulting in 8,717,818 unpaired reads of 42bp.

Reads were aligned to the mouse genome (mm9) using Bowtie2 software [52] with default parameters. 2,311,332 reads (26.51%) aligned at a unique location and 971,349 (11.14%) aligned to more than one location. Reads with a mapping quality score (MAPQ) below 30 were discarded. The remaining reads were then used to call peaks with the MACS1.4 software [53], using default parameter values except for the following: -g 1.87e9 (effective genome size) -m 5,30 (MFOLD parameter). The list of peak summits output by MACS is available in Additional file 4. De novo motif discovery was performed on the 100bp sequences centered around each peak summit with repeats masked (mm9, downloaded from UCSC), using the MEME software [12]. DNA sequences were shuffled using the uShuffle software [54] with parameter k=3 to preserve trinucleotide frequencies.

Microarray analysis

For Nfib-KO, we used the mRNA expression profiling datasets published in [6]. In this KO strain only exon 2 is deleted from the Nfib gene leaving the remaining exons present for detection in the microarray analysis. These datasets have been produced using Affymetrix arrays and are available at the NCBI Gene Expression Omnibus (GEO) data repository under Accession number GSE24465. The probe intensity signals were normalized using the GC-RMA algorithm.

For Nr3c1-KO, we used the datasets published in [27]. These datasets have been produced using Codelink BioArrays and are available at the ArrayExpress data repository under Accession number E-MEXP-861. The signal of the probes with “Good quality” flag was extracted, log2-transformed and quantile normalized using R.

For both datasets, differential expression and p-values were calculated using the linear model implemented in the R Limma package (Smyth, 2004), and q-values were obtained from the p-values using the Benjamini-Hochberg method (they are referred to as adjusted p-values in the Limma output). For downstream analysis, we considered only the probe set with the most significant result for each gene. The complete results with all probes are available in Additional files 5 and 6. Note that differential expression is expressed as log2(KO/WT) in Additional files 5 and 6, but are simply expressed as ratio (KO/WT) in the main text.

In this paper we will refer to a gene as activated by TF X if its expression decreases at least 2-fold in the knockout of X, with a q-value ≤ 0.05. Similarly, we will say that a gene is repressed by X if its expression increases at least 2-fold in the KO of X, with a q-value ≤ 0.05. These two definitions define three sets of genes:

  • X-activated genes: K O/W T≤ 0.5 and q-value ≤ 0.05,

  • X-repressed genes: K O/W T≥ 2 and q-value ≤ 0.05, and

  • X-non-target genes: All other genes for which we have expression data.

Of course we are using the terms “activated” and “repressed” somewhat loosely here, since these sets of genes will include both direct and indirect regulatory targets of TF X.

We use q-values in the above definition to take multiple testing into account. However, when we refer to a particular gene that has been selected using another criteria (e.g. a motif analysis), we report unadjusted p-values of differential expression, and we say that this gene is dysregulated if its p-value ≤ 0.05, without enforcing any fold change threshold.

Motif enrichment in promoters of target genes

To determine how a given motif M is associated with the regulatory targets of TF X, we measure the enrichment of M in the proximal promoters of genes activated or repressed by X, relative to the promoters of non-target genes. (Note that the motif M can be any motif, not just that of the TF X).

First, we use motif M to scan the proximal promoter of each gene, defined as the region within 1000 bp of its TSS, and save the best score for each promoter. We then determine p a c t , the p-value of a Wilcoxon rank-sum test [55] with the null hypothesis that the promoter motif scores are no better in the activated genes compared to the non-target genes. Similarly, p r e p is the p-value when we test the scores of the repressed genes compared with the non-target genes. For analysis and plotting purposes, we combine these two p-values into a single motif association score (MAS), whose magnitude is log10(min(p a c t ,p r e p )), and whose sign is positive if the motif M is more significantly enriched in the activated genes, p a c t p r e p , and negative otherwise.

In the current work, we first compute the MAS for Nfib targets using each of the 738 motifs in the compendium of SELEX-based motifs published by Jolma et al.[14]. In a subsequent analysis, we compute the MAS for the common targets of Nfib and Nr3c1. In this case, the “activated” set of genes is simply the intersection of the Nfib-activated and Nr3c1-activated, likewise for the “repressed” set of genes. The set of “non-targets” is the intersection of the Nfib-non-targets and the Nr3c1-non-targets. We note that since p a c t and p r e p are not adjusted for multiple tests, MAS should be viewed as a score rather than as a true statistical confidence measure.

We extracted TSS coordinates from the UCSC browser KnownGene table. When a gene has multiple TSSs, we used the one corresponding to the shortest transcript.

Intersection between Nfib and Nr3c1targets

The size of the overlap z between two independent sets A and B, each sampled without replacement from a set C, follows a hypergeometric distribution with parameters m=|A|, n=|C|-|A| and k=|B|, where |X| denotes the number of elements in set X. If C is the set of all genes, and A and B are two sets of target genes, the probability that their intersection contains z or more genes (under the null model) can be obtained in the R programming language with a call to the hypergeometric distribution function phyper(z-1,m,n,k,lower.tail=FALSE). The expected size of the overlap is equal to m k/(n+m).

GO-term analysis

All genes were ranked according to their likeliness of being activated by both Nfib and Nr3c1, with the most likely at the top of the list. To achieve this, we used the maximum of the fold-change (KO/WT) observed in either the Nfib-KO or the Nr3c1-KO as a sorting key. This ranked list of genes was used as input to GOrilla with default parameters [30], and a q-value threshold of 0.05 was used to define significant results.