Motivation

Epigenetic regulation is an important mechanism of transcriptional control. There is evidence that neighbouring genes, even when they are not involved in the same pathways, are often similarly regulated through histone modifications. Most current studies are limited to local epigenetic patterns, and methods for analysing large-scale organization are still lacking.

Methods

We developed a computational approach to detect multi-gene domains with coherent epigenetic patterns. We applied this method to a published ChIP-seq dataset for five histone modification marks (H3K4me2, H3K4me3, H3K27me3, H3K9me3, H3K36me3) in mouse embryonic stem cells. We first computed a 5-dimensional score for every known gene, based on the average modification activity in selected regions. We then used hidden Markov models and the corresponding decoding algorithms to determine the most probable domain status of each gene. We found that a three-state hidden Markov model best describes the data, with states corresponding to active, inactive, and null domains.
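The decoding step can be illustrated with a minimal sketch of Viterbi decoding for a three-state HMM over per-gene score vectors. This is an illustration under stated assumptions, not the implementation used in this work: the diagonal Gaussian emission model, the function name `viterbi_gaussian`, and all parameter values are hypothetical, and in practice the parameters would be estimated from the data (for example with Baum-Welch).

```python
import numpy as np

# State labels follow the abstract; the index order is an arbitrary choice.
STATES = ("active", "inactive", "null")


def viterbi_gaussian(scores, means, variances, trans, start):
    """Most probable state path for an HMM with diagonal Gaussian emissions.

    scores:    (n_genes, n_marks) per-gene scores, in genomic order
    means:     (n_states, n_marks) emission means (assumed known here)
    variances: (n_states, n_marks) diagonal emission variances
    trans:     (n_states, n_states) transition probabilities, rows sum to 1
    start:     (n_states,) initial state probabilities
    """
    scores = np.asarray(scores, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n_genes = scores.shape[0]
    n_states = means.shape[0]

    # Log-density of every gene's score vector under every state.
    diff = scores[:, None, :] - means[None, :, :]
    log_emit = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )

    log_trans = np.log(trans)
    delta = np.log(start) + log_emit[0]          # best log-score ending in each state
    backptr = np.zeros((n_genes, n_states), dtype=int)

    for t in range(1, n_genes):
        # cand[i, j]: best path ending in state i at gene t-1, then moving to j.
        cand = delta[:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        delta = cand[backptr[t], np.arange(n_states)] + log_emit[t]

    # Trace back from the best final state.
    path = np.empty(n_genes, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_genes - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

Runs of consecutive genes assigned to the same state can then be read off the returned path as candidate multi-gene domains. How a domain is judged "significantly large" is not specified here and is left out of the sketch.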

Results

This model predicts 339 significantly large multi-gene domains, including known domains such as the olfactory receptor (OR) gene clusters as well as previously uncharacterized domains (Figure 1). Histone modification variability within our domains was lower than within domains defined by randomly selected boundaries. We further validated our predictions against gene expression and Gene Ontology data and found that the domains are functionally relevant.
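The comparison against randomly selected boundaries can be sketched as follows. The exact variability statistic is not stated here, so this example assumes one plausible choice: the length-weighted within-segment variance summed over marks, compared with the same statistic under random boundaries that produce the same number of segments. The function names and defaults are hypothetical.

```python
import numpy as np


def mean_within_segment_variance(scores, boundaries):
    """Length-weighted average of within-segment variance, summed over marks.

    scores:     (n_genes, n_marks) per-gene scores, in genomic order
    boundaries: sorted segment start indices; the first entry must be 0
    """
    scores = np.asarray(scores, dtype=float)
    boundaries = np.asarray(boundaries, dtype=int)
    segments = np.split(scores, boundaries[1:])
    total = sum(seg.var(axis=0).sum() * len(seg) for seg in segments)
    return total / len(scores)


def random_boundary_baseline(scores, n_segments, n_draws=1000, seed=0):
    """Same statistic under randomly placed boundaries (assumed null model)."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes = len(scores)
    baseline = np.empty(n_draws)
    for k in range(n_draws):
        # Draw distinct cut points so that every segment is non-empty.
        cuts = np.sort(
            rng.choice(np.arange(1, n_genes), size=n_segments - 1, replace=False)
        )
        starts = np.concatenate(([0], cuts)).astype(int)
        baseline[k] = mean_within_segment_variance(scores, starts)
    return baseline
```

The fraction of random draws whose statistic is at or below the observed value gives an empirical p-value for the claim that the predicted domains are more homogeneous than arbitrary partitions.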

Figure 1

Heatmaps of known gene clusters. The 35-gene region on Ch6 from Npy to 2410066E13Rik is depicted as a heatmap of histone modification and gene expression for (a) the ES cell line and (c) the ES and NP cell lines. The 250-gene region on Ch7 from rt2a to Insc is depicted as a heatmap of histone modification and gene expression for (b) the ES cell line and (d) the ES and NP cell lines. For the figures on the left (a and c), the cluster state assignment is given in the first and second tracks and the HMM state assignments are in the third and fourth tracks (red for the active state, blue for the non-active state, yellow for the null state). For the figures on the right (b and d), the NP HMM state assignment is given in the first and second tracks and the ES HMM state assignments are in the third and fourth tracks. In all figures, the bottom track shows whether (black) or not (white) a gene belongs to the respective gene cluster.

Conclusion

Our method provides a novel approach to analysing large-scale epigenetic patterns. Applying it to other cell lines should provide insight into the general structure, organization, and regulation of the mammalian genome.