Positional variations among heterogeneous nucleosome maps give dynamical information on chromatin
- Cite this article as:
- Tanaka, Y., Yoshimura, I. & Nakai, K. Chromosoma (2010) 119: 391. doi:10.1007/s00412-010-0264-y
Although nucleosome remodeling is essential to transcriptional regulation in eukaryotes, little is known about its genome-wide behavior. Since a number of in vivo nucleosome positioning maps have recently been determined, we examined whether comparing them could yield a genome-wide profile of nucleosome remodeling. Using seven yeast maps, we found that the local variability of nucleosomes, measured by entropy, was significantly higher in a set of reported unstable nucleosomes. The binding sites of four transcription factors known as remodeling factors were distinctively high in both entropy and linker ratio, whereas those of Yhp1, their potential inhibitor, showed the lowest values of both. Taken together, our map captures general information on nucleosome dynamics reasonably well. This “nucleosome dynamics” map reveals a new significant correlation with the degree of expression variety of genes rather than with their expression intensity. Associations with gene function and histone modification are also discussed.
The genome of eukaryotes is organized into chromatin, whose structure plays important roles in many cellular processes, such as transcription and replication. The unit of the chromatin structure is the nucleosome, which consists of DNA wrapped around a histone octamer, and its positioning on DNA seems to be optimized for facilitating these cellular processes. For example, a number of researchers have reported that the region upstream of the transcription start sites (TSSs) of genes is statistically depleted of nucleosomes (Jiang and Pugh 2009a, b). Studies of this kind have been accelerated by a series of recent releases of genome-wide nucleosome positioning maps in human (Ozsolak et al. 2007; Schones et al. 2008), fly (Mavrich et al. 2008a, b), nematode (Johnson et al. 2006; Valouev et al. 2008), medaka fish (Sasaki et al. 2009), and budding yeast (Kaplan et al. 2009; Lee et al. 2007; Mavrich et al. 2008a, b; Whitehouse et al. 2007; Yuan et al. 2005).
Nucleosome positioning is not uniform over time, either. This phenomenon is known as nucleosome remodeling, in which ATP-dependent chromatin remodeling factors change the nucleosome organization by inducing DNA superhelical torsion (Chandy et al. 2006; Havas et al. 2000). In yeast, several transcription factors, such as Abf1, Reb1, and Rap1, known as General Regulatory Factors (GRFs; Chasman et al. 1990), also perturb nucleosome positioning in the vicinity of their binding sites to activate neighboring regulatory sites (Fourel et al. 2002; Hartley and Madhani 2009). In the promoters of ribosomal protein genes, Rap1 is required for recruitment of Esa1, the catalytic subunit of the NuA4 H4 lysine acetyltransferase (Reid et al. 2000). Additionally, it is known that the induction of chromatin remodeling factors is controlled by a variety of histone modifications. For example, a typical remodeling complex, Swi/Snf, requires histone acetylation by the Spt-Ada-Gcn5-acetyltransferase (SAGA) complex to bind to DNA and then displaces the acetylated histone (Chandy et al. 2006; Varga-Weisz 2001). Another remodeling complex, Mi-2, also contains histone deacetylases (Whitehouse et al. 1999). Therefore, it is of great interest to compare histone modification patterns with the associated nucleosome dynamics.
To examine the chromatin remodeling globally, it is necessary to compare the change of nucleosome positions upon some stimuli. So far, while the positional changes of nucleosomes have been analyzed in several loci (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997), only a little is known about their genome-wide behavior. As a pioneering study, Shivaswamy et al. showed the depletion of nucleosomes in promoters induced by heat shock (Shivaswamy et al. 2008). However, their results are based on the comparison between only two conditions. Much more data are necessary to get the general genome-wide landscape. Jiang and Pugh developed a compiled reference map of nucleosome positions in Saccharomyces cerevisiae, using multiple maps independent of our study (Jiang and Pugh 2009a, b). However, detailed analyses of its “dynamic” positions have not been published yet.
In this study, we tested a simple hypothesis: that the local positional variance of nucleosomes in a set of heterogeneous maps can be an indicator of local nucleosome remodeling. Using seven yeast maps, we show that our results are quite promising. They are consistent with a number of previous observations on promoters, replication origins, and transcription factor binding sites (TFBSs) (Field et al. 2008; Hartley and Madhani 2009; Shivaswamy et al. 2008), and they are useful for deriving novel but reasonable correlations between the change of nucleosome positioning and several features, such as gene function, expression variety, and histone modifications.
Re-definition of nucleosome positions
Summary of our detection method and the comparison with reported nucleosomes (table rows, one per map: Lee et al. 2007; Whitehouse et al. 2007 (WT); Whitehouse et al. 2007 (Δisw2); Mavrich et al. 2008a; Kaplan et al. 2009 (Ethanol); Kaplan et al. 2009 (Galactose); Kaplan et al. 2009 (YPD))
Consistency with known stable/unstable nucleosomes
By comparing the relative positions of re-assigned nucleosomes with the above procedure, we can identify local regions where nucleosome positions are stable or unstable among the seven maps. As a measure of the positional variability of nucleosomes, we adopted the average of “entropy” value over a 100-bp window (see Materials and methods). This definition of entropy is based on Shannon's entropy, which is a standard measure for estimating the uncertainty of a given signal in Information Theory (Schneider 2010). While the entropy is high in the region where nucleosome positioning is variable between datasets (black rectangle in Fig. 1a), the entropy is low where nucleosomes are occupied or depleted in all datasets (white or gray rectangles in Fig. 1a).
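The windowed entropy described above can be illustrated with a minimal sketch. The per-position formula is not reproduced in this text, so the code assumes the simplest reading: at each base pair, the fraction of maps in the nucleosome state is treated as a two-state probability, its Shannon entropy is taken in bits, and the values are averaged over a 100-bp window. The function names `binary_entropy` and `window_entropy` are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a two-state variable with P(state 1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # all maps agree: no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def window_entropy(maps, start, window=100):
    """Average per-position entropy over positions start .. start+window-1.

    maps: list of equal-length 0/1 sequences (1 = nucleosome, 0 = linker),
    one sequence per nucleosome map. The per-position formula is an
    assumption, not necessarily the authors' exact definition.
    """
    n_maps = len(maps)
    total = 0.0
    for pos in range(start, start + window):
        p = sum(m[pos] for m in maps) / n_maps  # fraction of maps with a nucleosome
        total += binary_entropy(p)
    return total / window
```

Under this reading, a window in which all seven maps agree (all nucleosome or all linker) scores 0, and a window in which the maps split 3 to 4 scores close to the maximum of 1 bit.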
To verify our “nucleosome dynamics” map, we collected experimentally verified examples of 21 stable and 13 unstable nucleosomes on five promoters from literature (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997) (Supplementary Table 2). The unstable nucleosomes show significantly higher entropy values than the stable ones (P = 1.6e-3 from Wilcoxon test; Fig. 1b).
It is also important to know how much of the variability of nucleosome locations is explained by differences in experimental techniques. Thus, the mutual correlation of assigned positions in highly dynamic regions between different maps was examined using a cluster analysis (Supplementary Fig. 3). The data produced by the same authors tend to be clustered together. To check whether this intra-laboratory correlation distorts our results, we further examined the effect of removing one or two of the three datasets from Kaplan et al.'s laboratory. The Pearson's correlation coefficient (PCC) was calculated along the whole genome between the entropy values of each reduced set and those of the original set. In all cases, a high correlation was observed [PCC ≥0.936 (P < 2.2e-16)], indicating that the observed differences of nucleosome locations are not explained solely by differences in experimental techniques or authors. In the following sections, we will further validate the reliability of our map by comparing our observations with a variety of previous reports.
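The robustness check above boils down to a Pearson correlation between two genome-wide entropy profiles, one computed from all maps and one from a reduced set. A minimal sketch, assuming both profiles are given as equal-length lists of numbers (the name `pearson` is hypothetical):

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either profile is constant.
    return sxy / math.sqrt(sxx * syy)
```

A value close to 1 between the full and reduced profiles would indicate that dropping the maps changes the entropy landscape very little.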
Nucleosomes around TSSs and replication origins
Nucleosomes around the binding sites of 100 transcription factors
Genome-wide distribution of “dynamic” regions
From the above results, we concluded that the entropy/linker ratio values of the combined map are reliable indicators of (at least) the averaged dynamic status of chromatin. Hereafter, we report some of the new results derived from our data.
Difference between the inside and the outside of genes
Next, the difference of “nucleosome dynamics” between intragenic regions (i.e., exons plus introns) and intergenic regions was examined. The entropy in intergenic regions was significantly higher than in intragenic regions (P < 1.0e-15 from the Wilcoxon test; Fig. 4b). Within the intragenic regions, the entropy in introns was significantly higher than in exons (P < 1.0e-15), but was lower than in the intergenic regions (P < 1.0e-15). Like entropy, the linker ratio in exons was significantly lower than in introns and in intergenic regions (P < 1.0e-15 in both regions; Fig. 4c). These results suggest that nucleosomes are stably and densely located within exonic regions.
Correlation with gene expression
Next, using 173 DNA microarrays for which time-scale expression changes under 16 different stresses have been measured (Gasch et al. 2000), we calculated the standard deviation of log2(ratio) for each gene as a measure of expression variety and compared this value among the clusters (Fig. 5b). Cluster 1, the “dynamic and open” cluster, shows the largest expression variety (Bonferroni-adjusted P < 3.0e-3 in all comparisons from Wilcoxon test). On the other hand, cluster 3, the “stable and open” cluster, shows the smallest expression variety (Bonferroni-adjusted P < 1.5e-12 in all comparisons). As a confirmation, we performed the reverse analysis: we classified the genes into 20 clusters based on the degree of their expression variety. As shown in Fig. 5c, clusters with larger variety tend to have higher entropy values while clusters with smaller variety show lower values, an observation that supports our claim that “dynamic” promoters often regulate genes with larger expression variety. We also tested the correlation with gene expression intensity (interpreted as equivalent to the mRNA amount) by comparing the counts of cDNA tags mapped on the intragenic regions among the four clusters (Fig. 5d; Nagalakshmi et al. 2008). Although the expression level of cluster 1 was slightly higher than that of cluster 2 (Bonferroni-adjusted P = 8.1e-4), there were no other clear differences. Thus, these results support the idea that chromatin remodeling is linked to the degree of expression change but not to the expression level.
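The expression variety measure is simple enough to show directly. The sketch below assumes that each gene comes with a list of expression ratios, one per array, that missing values are encoded as `None`, and that the sample standard deviation (n − 1 denominator) is intended; none of these details is stated in the text, and the name `expression_variety` is hypothetical.

```python
import math
import statistics

def expression_variety(ratios):
    """Standard deviation of log2(ratio) across arrays for one gene.

    ratios: expression ratios (treated / reference), one per array.
    Missing (None) and non-positive values are skipped (an assumption).
    """
    logs = [math.log2(r) for r in ratios if r is not None and r > 0]
    return statistics.stdev(logs)  # sample standard deviation; needs >= 2 values
```

A gene whose ratio stays at 1 on every array has a variety of 0, while a gene that alternates between twofold repression and twofold induction has a clearly positive value.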
Relationship with gene function
Over-represented GO terms in each cluster (terms listed in the table: DNA metabolic process; structural molecule activity; microtubule organizing center)
Comparison between two histone modification maps and ours
To clarify the correlation between the variability of nucleosome positioning and the histone modification pattern, we used two datasets: ChIP-on-chip data (Pokholok et al. 2005) and tiling array data (Liu et al. 2005).
Similarly, in the comparison with the tiling array data, hyper-H3K4me1 showed significantly higher entropies while hyper-H3K4me3 and H3K14ac showed significantly lower entropies (P < 1.0e-04 from Wilcoxon test; Supplementary Fig. 9). Note that trimethylation of H3 lysine 79 (H3K79me3), H3K36me3, and H4ac were not included in this tiling array data. The linker ratio was not significantly different between hyper- and hypo-modifications in all of the five modifications (P > 0.01). Thus, at least the consensus of these two results (positive correlation of H3K4me1 and negative correlation of H3K4me3 and H3K14ac) seems to be reliable.
Histone modifications in promoters
In this study, we tested our hypothesis that a compendium of heterogeneous nucleosome maps should give us the local information of nucleosome instability, in other words, nucleosome dynamics. There are two kinds of raw data: the tag counts of next-generation sequencers and the hybridization signals in tiling arrays. They differ both in signal intensities and in resolutions. To make objective alignments of nucleosome positions, it is desirable to use a common method to assign these positions. Therefore, we modified the HMM method by Yuan et al. so that signal gradients, instead of intensity, are used as inputs (Yuan et al. 2005). Another modification was to add two self-loops to nucleosome states to make the model flexible enough to deal with noisy data.
The comparison of entropy values between known stable nucleosomes and known unstable nucleosomes supported the accuracy of our “nucleosome dynamics” map (Fig. 1b). Although there was a statistically significant difference between the two groups, the distinction was not perfect. This suggests that our map may not be accurate enough to assess the state of an individual nucleosome, but that its statistical analysis is meaningful. Then, we analyzed the averaged distribution of the two indicator values (the entropy and the linker ratio) around TSSs with three groups of mRNA data and ARSs (Fig. 2). There was a sharp peak of open and dynamic chromatin at around the −100 bp position, and this open region extends to around the −400 bp position from the TSS. A similar peak was also observed within ARSs. In contrast, there was a weak tendency toward dense and stable chromatin just downstream of the TSS. Notably, these observations are consistent with our current knowledge of chromatin remodeling in promoters (Shivaswamy et al. 2008) and in replication origins (Field et al. 2008). The observed tendency becomes weaker as the reliability of the data decreases.
Furthermore, our data indicate that most transcription factor binding sites are dynamic and open to some degree. Notably, GRFs showed especially large entropy and linker ratio values (Fig. 3a). This is consistent with previous studies proposing that GRFs may have chromatin remodeling activity (Fourel et al. 2002; Hartley and Madhani 2009). Our observation that Abf1 and Reb1 show similar values, but are a little apart from Rap1, is also consistent with previous reports (Kaplan et al. 2009; Yarragudi et al. 2004). Mcm1, which is also located near GRFs in our plot, has been implicated in chromatin remodeling (Chang et al. 2003). The fact that Mcm1 regulates the expression of a number of replication initiation factors may be related to our observation that there is a peak of nucleosome dynamics around ARSs (Fig. 2). In contrast, Yhp1, which can bind to Mcm1 and to the sites adjacent to Mcm1-binding sites, was observed in stable and closed chromatin regions. Pramila et al. previously showed that the Yhp1 protein represses cell-cycle-regulated genes with Mcm1 (Pramila et al. 2002). Thus, Yhp1 may restrict the chromatin remodeling activity of Mcm1. Moreover, the two transcription activators, Ime1 and Fhl1, are also associated with regulators of chromatin structure. The former activates meiosis-specific genes by recruiting the RSC chromatin remodeling complex (Inai et al. 2007); the latter is involved in the activation of ribosomal protein genes with Rap1 and is proposed to mediate the recruitment of Esa1 (Schawalder et al. 2004; Wade et al. 2004).
These results suggest that our map is reliable enough to predict the degree of chromatin remodeling around the binding sites of individual transcription factors. In addition, the significant differences in the entropy and the linker ratio between the inside and the outside of genes (Fig. 4b, c) are consistent with the fact that TFBSs are enriched in intergenic regions, which contain promoters and enhancers. Recent reports suggesting that nucleosome positioning within exon regions may function as an exon–intron marker also support our results (Schwartz et al. 2009; Tilgner et al. 2009).
The correlation between nucleosome positioning and gene expression has often been reported. For example, using a single nucleosome map, Lee et al. showed that the average nucleosome occupancy in promoters with high expression intensity is significantly lower than in those with low intensity (Lee et al. 2007). Our results add a further dimension to this interpretation: based on the entropy value derived from the comparison of multiple maps, we further classified the “open” chromatin status, which is characterized by higher linker ratio values, into “dynamic” (higher entropy) and “stable” (lower entropy) statuses (Fig. 5a). As an example, we mapped the four clusters onto the promoters of genes involved in a typical stress-responsive pathway, the cAMP-protein kinase A pathway (Aguilera et al. 2007). There is a clear tendency for upstream genes to have “dynamic and open” promoters and for downstream ones to have “stable” promoters, with a few exceptions such as Msn4 (Supplementary Fig. 10). Additionally, genes involved in nucleotide metabolism, whose promoters are enriched in cluster 3, are reported to show little change in expression under various winemaking conditions (Varela et al. 2005), whereas promoters of upstream genes in another stress-responsive pathway, the protein kinase C pathway, such as Rho1 and Mid2, are enriched in cluster 1. Taken together, we conclude that our clustering results seem to reflect the difference between genes that respond to various external stimuli and genes that are constantly expressed. Furthermore, the difference between the “open” and “closed” statuses was not significantly correlated with the expression intensity. This important observation is obtainable only by using multiple maps.
Although the interpretation of various histone modification data is complicated, the results obtained from two independent datasets (Liu et al. 2005; Pokholok et al. 2005) were basically consistent, further supporting our “nucleosome dynamics” map. Moreover, we observed that the contribution of each histone modification to nucleosome dynamics is different. H3K4me3, H3K14ac, and H3K9ac, which are known to be enriched at the 5′ end of active genes (Barski et al. 2007; Heintzman et al. 2007; Li et al. 2007; Liu et al. 2005; Pokholok et al. 2005; Sung and Amasino 2006; Wang et al. 2008), were enriched in stable chromatin promoters in our study. Furthermore, H3K4me3 showed a clear difference in the upstream regions of genes in cluster 3 (Fig. 7). Recently, Vermeulen et al. showed that H3K4me3, but not the other two acetylations, is specifically associated with the TFIID complex (Vermeulen et al. 2007). They also showed that H3K9ac and H3K14ac have the potential to enhance the interaction of TFIID with H3K4me3. Their associations with remodeling factors have also been presented in several studies (Kuo et al. 1998; Wysocka et al. 2006; Zhang et al. 1998). On the other hand, the H3K4me1-rich regions showed significantly higher entropy values (Fig. 6). Since this modification is reported to be associated with enhancer activity (Heintzman et al. 2007), it is possible that the modification modulates enhancer activity through changes in nucleosome positioning.
From the above results, we conclude that the integration of various maps can show general features of nucleosome dynamics. Using our “nucleosome dynamics” map, a number of novel observations are made. We hope that some of them will be experimentally verified in the future. For example, the order of nucleosome dynamics was: “intergenic regions” > “introns” > “exons”. The degree of nucleosome instability correlates well with their degree of expression variety but not their intensity. The genes whose TSS region is highly dynamic and open tend to encode proteins that can sense extracellular conditions while the genes whose TSS regions are stably open tend to encode nuclear proteins. In addition to the GRFs, there are additional transcription factors whose binding sites exist at dynamically open regions: Leu3, Ime1, Rds1, and Fhl1, two of which have been suggested to be associated with chromatin remodeling.
It is evident that our methodology is quite intuitive, and there is much room for further improvement. Nevertheless, this methodology is promising in clarifying various aspects of epigenetic effects in cellular processes such as gene expression.
Materials and methods
The in vivo nucleosome maps were obtained from four articles (Supplementary Table 1; Kaplan et al. 2009; Lee et al. 2007; Mavrich et al. 2008a, b; Whitehouse et al. 2007). The genome sequences, the gene coordinates, and the GO Slim data of S. cerevisiae were downloaded from the SGD (http://www.yeastgenome.org/). Yeast cDNA tags generated by random primers (GSE11209) were obtained from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/; Nagalakshmi et al. 2008). The microarray data of time-scale expression changes were obtained from Stanford Genomic Resources (http://www-genome.stanford.edu/yeast_stress/; Gasch et al. 2000). The in vivo binding sites of yeast transcription factors, with the criteria of a p value cutoff of 0.005 and no conservation, were downloaded from the website of Fraenkel's laboratory (http://fraenkel.mit.edu/; Harbison et al. 2004; MacIsaac et al. 2006). The ChIP-on-chip data for histone acetylations and methylations were obtained from the website of Young's laboratory (http://inside.wi.mit.edu/young/pub/download.html; Pokholok et al. 2005). The tiling array dataset of histone modifications was downloaded from the supporting information of Liu et al. (2005). The data on stable and unstable nucleosomes were collected from five articles (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997). In each article, nucleosomes that were depleted or largely moved in another condition were defined as unstable nucleosomes (Supplementary Table 2).
Assignment of nucleosome locations
Since the downloaded nucleosome maps were heterogeneous in both their experimental techniques and the algorithms for nucleosome position assignment (Supplementary Table 1), we modified the HMM algorithm by Yuan et al. to be applicable to both the next-generation sequencer and the tiling array data with various resolutions (Yuan et al. 2005):
Note that the above formulae are applicable to both of the data. It is possible that m = n−1 where the tiling array is sparse.
Our HMM contains three types of hidden states, each of which outputs the gradient signal value: one linker node (L), seven nucleosome nodes on the 5′ side (5N1-5N7), and seven nucleosome nodes on the 3′ side (3N1-3N7; Supplementary Fig. 1b). In addition to one self-connecting loop on the L node, which allows linker regions of various lengths, two self-loops were added to the 5N7 and 3N7 nodes for the detection of wider peaks. The emission probability function in each node is represented by a Gaussian distribution N(μ,ρ), where μ and ρ are parameters that take the same value in all nodes for each state (Supplementary Fig. 1c). All model parameters were estimated from a sliding window of 100 consecutive positions by the Baum-Welch algorithm using the gradient signal in chromosome 3 as input (therefore, the parameter values are different for each original map). The averaged parameters from all windows were used for the estimation of hidden states by the Viterbi algorithm.
Since several nucleosome signals were obtained from short DNA sequences, we removed abnormal regions, where the length of the linker region is more than 300 bp (Supplementary Fig. 11).
For control studies, the nucleosome positions were shuffled randomly. This shuffling was iterated ten times. The distance between each nucleosome reported in the original article and the nearest nucleosome detected by our method was used for evaluation.
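The evaluation distance described above (from each reported nucleosome to the nearest detected one) can be computed efficiently with a binary search. This is a minimal sketch that assumes each nucleosome is represented by a single center coordinate on one chromosome and that the detected centers are sorted and non-empty; the names `nearest_distance` and `mean_nearest_distance` are hypothetical. The shuffled control is not sketched because the text does not say exactly how the positions were shuffled.

```python
from bisect import bisect_left

def nearest_distance(position, sorted_centers):
    """Distance from `position` to the closest value in `sorted_centers`.

    sorted_centers must be a non-empty list sorted in ascending order.
    """
    i = bisect_left(sorted_centers, position)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = sorted_centers[max(i - 1, 0):i + 1]
    return min(abs(position - c) for c in candidates)

def mean_nearest_distance(reported, detected):
    """Average nearest distance from reported centers to detected centers."""
    detected_sorted = sorted(detected)
    distances = [nearest_distance(p, detected_sorted) for p in reported]
    return sum(distances) / len(distances)
```

Running the same function on randomized positions would give the baseline against which the real distances are judged.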
Similarity of nucleosome maps in highly dynamic regions
In the highly dynamic regions, where the entropy is more than 0.8, the output of the Viterbi algorithm was converted to a binary code (1 = nucleosome state and 0 = linker state). The phi coefficient was calculated for each pair of nucleosome maps.
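For reference, the phi coefficient of two binary sequences is the Pearson correlation of their 0/1 values and can be computed from the 2×2 contingency table. A minimal sketch, assuming the two maps have already been restricted to the same highly dynamic positions (the name `phi_coefficient` is hypothetical):

```python
import math

def phi_coefficient(a, b):
    """Phi coefficient between two equal-length 0/1 sequences."""
    n11 = n10 = n01 = n00 = 0
    for x, y in zip(a, b):
        if x and y:
            n11 += 1      # nucleosome in both maps
        elif x:
            n10 += 1      # nucleosome only in map a
        elif y:
            n01 += 1      # nucleosome only in map b
        else:
            n00 += 1      # linker in both maps
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # convention chosen here: a constant sequence gives 0
    return (n11 * n00 - n10 * n01) / denom
```

Applying this to every pair of the seven maps gives a 7 × 7 similarity matrix that can then be passed to a cluster analysis.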
Two indicators for representing chromatin status
Additionally, we used the linker ratio, which is defined as the count of linker states within the window, summed over all maps, divided by the product of the window size (=100 bp) and the number of nucleosome maps (=7).
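The linker ratio follows directly from this definition. The sketch below assumes each map is a 0/1 sequence (1 = nucleosome, 0 = linker) and that the window lies fully inside the sequence; the name `linker_ratio` is hypothetical.

```python
def linker_ratio(maps, start, window=100):
    """Fraction of (position, map) pairs in the window that are linker states.

    maps: list of equal-length 0/1 sequences (1 = nucleosome, 0 = linker).
    The window start .. start+window-1 must lie inside the sequences.
    """
    n_linker = sum(
        1
        for m in maps
        for state in m[start:start + window]
        if state == 0
    )
    return n_linker / (window * len(maps))
```

The value ranges from 0 (nucleosome in every map at every position, "closed") to 1 (linker everywhere, "open").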
Histone modification data analysis
In Pokholok et al.'s data, which covers the whole genome, we used the average of the entropy and the linker ratio within the probe center ±250 bp for characterizing the chromatin state. In Liu et al.'s tiling array data, which covers the entire chromosome 3 and parts of the other chromosomes at 20-bp resolution, we used the values at the corresponding sites as they were. For the analysis of hyper- and hypo-modified sites, the 250 highest and the 250 lowest probes were used.
Transcription factor binding sites
Among the data of 119 transcription factors downloaded from Fraenkel's website, we removed those of 19 factors because their binding sites were not observed in the “normal” regions more than five times. The entropy and the linker ratio at each position within ±50 bp of the binding sites were averaged over all binding sites for each transcription factor. As a control, we randomly picked 100 sites and calculated the values in the same manner. This sampling was repeated 100 times. The significance of the values of each transcription factor against the control distribution was calculated based on a two-dimensional Gaussian distribution.
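One way to read the two-dimensional Gaussian test is sketched below; the exact procedure is not given in the text, so this is an assumption. A bivariate normal is fitted to the 100 control points (mean entropy, mean linker ratio), and the significance of a factor is the probability of falling at least as far from the control mean in Mahalanobis distance. For a bivariate normal the squared distance follows a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(−d²/2). The name `gaussian2d_pvalue` is hypothetical.

```python
import math

def gaussian2d_pvalue(point, controls):
    """Tail probability of `point` under a bivariate normal fitted to `controls`.

    point: (entropy, linker_ratio) of one transcription factor.
    controls: list of (entropy, linker_ratio) pairs from random sampling.
    """
    n = len(controls)
    mean_x = sum(c[0] for c in controls) / n
    mean_y = sum(c[1] for c in controls) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((c[0] - mean_x) ** 2 for c in controls) / (n - 1)
    syy = sum((c[1] - mean_y) ** 2 for c in controls) / (n - 1)
    sxy = sum((c[0] - mean_x) * (c[1] - mean_y) for c in controls) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (controls not collinear)
    dx = point[0] - mean_x
    dy = point[1] - mean_y
    # Squared Mahalanobis distance using the inverse of the 2x2 covariance
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-d2 / 2.0)  # chi-square (2 d.o.f.) upper tail
```

A factor whose averaged values sit at the control mean gets a value of 1, and the value shrinks toward 0 as the factor moves away from the cloud of control points.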
Mapping of cDNA tags
About 15 million cDNA tags were mapped onto the yeast genome using the Maq software with the option “-n 3 -e 100” (http://maq.sourceforge.net/). Tags that failed to map were further mapped by BLAT with the options “-trimHardA -minIdentity=85 -mask=lower” (Kent 2002). Then, the transcriptome map was constructed by counting the mapped tags within a sliding window of 100 bp with a 10-bp interval. Additionally, the count of tags mapped on the intragenic region of each gene was used as its expression intensity.
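The sliding-window counting step can be sketched as follows. The code assumes that each mapped tag is reduced to a single 0-based coordinate on one chromosome (for example its start position) and that windows are half-open intervals [s, s + 100) starting every 10 bp; these conventions and the name `window_counts` are assumptions for illustration.

```python
from bisect import bisect_left

def window_counts(tag_positions, chrom_length, window=100, step=10):
    """Count tags in each window [s, s + window) for s = 0, step, 2*step, ...

    Returns a list of (window_start, count) pairs.
    """
    tags = sorted(tag_positions)
    counts = []
    for s in range(0, chrom_length - window + 1, step):
        lo = bisect_left(tags, s)            # first tag at or after s
        hi = bisect_left(tags, s + window)   # first tag at or after s + window
        counts.append((s, hi - lo))
    return counts
```

Sorting once and using binary search keeps the cost low even for millions of tags, since each window needs only two lookups.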
Gene clustering and gene ontology analysis
Using the average of the entropy and the linker ratio over the 400-bp region upstream of the TSS of “verified mRNAs” and “uncharacterized mRNAs”, we categorized these mRNAs (genes) into four clusters using Ward's method. In this clustering, we ignored 683 genes whose promoters overlapped with the above-mentioned abnormal regions. In each cluster, p values for 84 GO terms were calculated by the hypergeometric test and were adjusted using the Benjamini-Hochberg method (false discovery rate <0.05).
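The enrichment statistics can be sketched with the standard library alone. The code assumes a one-sided (over-representation) hypergeometric test, which is the usual choice for GO enrichment but is not stated explicitly in the text; the function names are hypothetical. The clustering step itself (Ward's method) is not shown.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster.

    N: genes in total, K: genes carrying the GO term,
    n: genes in the cluster, k: cluster genes carrying the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 84 GO terms per cluster, the 84 p values of one cluster would be passed to `benjamini_hochberg` together, and terms with an adjusted value below 0.05 would be reported.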
We are grateful to Riu Yamashita, Alexis Vandenbon, and all other members of the Nakai–Kinoshita Laboratory for their valuable advice and discussions. We also thank Takashi Ito for valuable comments. Computational time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo. This work was supported in part by Global COE Program (Center of Education and Research for Advanced Genome-Based Medicine), MEXT, Japan.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.