One genome, many epigenomes

Embryonic stem cells (ESCs) and the early developmental stage embryo share a unique property called pluripotency, which is the ability to give rise to the three germ layers (endoderm, ectoderm and mesoderm) and, consequently, all tissues represented in the adult organism [1, 2]. Pluripotency can also be induced in somatic cells during in vitro reprogramming, leading to the formation of so-called induced pluripotent stem cells (iPSCs; extensively reviewed in [3-7]). In order to fulfill the therapeutic potential of human ESCs (hESCs) and iPSCs, an understanding of the fundamental molecular properties underlying the nature of pluripotency and commitment is required, along with the development of methods for assessing biological equivalency among different cell populations.

The functional complexity of the human body, with over 200 specialized cell types and intricately built tissues and organs, arises from a single set of instructions: the human genome. How, then, do distinct cellular phenotypes emerge from this genetic homogeneity? Interactions between the genome and its cellular and signaling environments are the key to understanding how cell-type-specific gene expression patterns arise during differentiation and development [8]. These interactions ultimately occur at the level of the chromatin, which comprises the DNA polymer repeatedly wrapped around histone octamers, forming a nucleosomal array that is further compacted into higher-order structures. Regulatory variation is introduced to the chromatin via alterations within the nucleosome itself - for example, through methylation and hydroxymethylation of DNA, various post-translational modifications (PTMs) of histones, and inclusion or exclusion of specific histone variants [9-15] - as well as via changes in nucleosomal occupancy, mobility and organization [16, 17]. In turn, these alterations modulate access of sequence-dependent transcriptional regulators to the underlying DNA, the level of chromatin compaction, and communication between distant chromosomal regions [18]. The entirety of chromatin regulatory variation in a specific cellular state is often referred to as the 'epigenome' [19].

Technological advances have made the exploration of epigenomes feasible in a rapidly increasing number of cell types and tissues. Systematic efforts at such analyses have been undertaken by the human ENCyclopedia Of DNA Elements (ENCODE) and NIH Roadmap Epigenomics projects [20, 21]. These and other studies have already produced, and will generate in the near future, an overwhelming number of genome-wide datasets that are often not readily comprehensible to many biologists and physicians. However, given the importance of epigenetic patterns in defining cell identity, understanding and utilizing epigenomic mapping will become a necessity in both basic and translational stem cell research. In this review, we strive to provide an overview of the main concepts, technologies and outputs of epigenomics in a form that is accessible to a broad audience. We summarize how epigenomes are studied, discuss what we have learned so far about unique epigenetic properties of hESCs and iPSCs, and envision direct implications of epigenomics in translational research and medicine.

Technological advances in genomics and epigenomics

Epigenomics is defined here as genomic-scale studies of chromatin regulatory variation, including patterns of histone PTMs, DNA methylation, nucleosome positioning and long-range chromosomal interactions. Over the past 20 years, many methods have been developed to probe different forms of this variation. For example, a plethora of antibodies recognizing specific histone modifications has been developed and used in chromatin immunoprecipitation (ChIP) assays for studying the local enrichment of histone PTMs at specific loci [22, 23]. Similarly, bisulfite-sequencing (BS-seq)-based, restriction enzyme-based and affinity-based approaches for analyzing DNA methylation have been established [24, 25], in addition to methods to identify genomic regions with low nucleosomal content (for example, the DNase I hypersensitivity assay) [26] and to probe long-range chromosomal interactions (such as chromosomal conformation capture or 3C [27]).

Although these approaches were first established for low- to medium-throughput studies (for example, interrogation of a selected subset of genomic loci), recent breakthroughs in next-generation sequencing have allowed rapid adaptation and expansion of existing technologies for genome-wide analyses of chromatin features with an unprecedented resolution and coverage [28-44]. These methodologies include, among others, the ChIP-sequencing (ChIP-seq) approach to map histone modification patterns and occupancy of chromatin modifiers in a genome-wide manner, and MethylC sequencing (MethylC-seq) and BS-seq techniques for large-scale analysis of DNA methylation at single-nucleotide resolution. The main epigenomic technologies have been reviewed recently [45-47] and are listed in Table 1. The burgeoning field of epigenomics has already begun to reveal the enormous predictive power of chromatin profiling in annotating functional genomic elements in specific cell types. Indeed, chromatin signatures that characterize different classes of regulatory elements, including promoters, enhancers, insulators and long non-coding RNAs, have been uncovered (summarized in Table 2). Additional signatures that further specify and distinguish unique classes of genomic regulatory elements are likely to be discovered over the next few years. In the following section we summarize epigenomic studies of hESCs and pinpoint unique characteristics of the pluripotent cell epigenome that they reveal.

Table 1 Next-generation sequencing-based methods used in epigenomic studies
Table 2 Chromatin signatures defining different classes of regulatory elements

Epigenomic features of hESCs

ESCs provide a robust, genomically tractable in vitro model to investigate the molecular basis of pluripotency and embryonic development [1, 2]. In addition to sharing many fundamental properties with chromatin of somatic cells, chromatin of pluripotent cells appears to have unique features, such as the increased mobility of many structural chromatin proteins, including histones and heterochromatin protein 1 [48], and differences in nuclear organization suggestive of a less compacted chromatin structure [48-51]. Recent epigenomic profiling of hESCs has uncovered several characteristics that, although not absolutely unique to hESCs, appear particularly pervasive in these cells [52-54]. Below, we focus on these characteristics and their potential role in mediating the epigenetic plasticity of hESCs.

Bivalent domains at promoters

The term 'bivalent domains' is used to describe chromatin regions that are concomitantly modified by the trimethylation of lysine 4 of histone H3 (H3K4me3), a modification generally associated with transcriptional initiation, and trimethylation of lysine 27 of histone H3 (H3K27me3), a modification associated with Polycomb-mediated gene silencing. Although first described and most extensively characterized in mouse ESCs (mESCs) [55, 56], bivalent domains are also present in hESCs [57, 58], and in both species they mark transcription start sites of key developmental genes that are poorly expressed in ESCs, but induced upon differentiation. Albeit defined by the presence of H3K27me3 and H3K4me3, bivalent promoters are also characterized by other features, such as the occupancy of the histone variant H2AZ [59]. Upon differentiation, bivalent domains at specific promoters resolve into a transcriptionally active H3K4me3-marked monovalent state, or a transcriptionally silent H3K27me3-marked monovalent state, depending on the lineage commitment [42, 56]. However, a subset of bivalent domains is retained upon differentiation [42, 60], and bivalently marked promoters have been observed in many progenitor cell populations, perhaps reflecting their remaining epigenetic plasticity [60]. Nevertheless, promoter bivalency seems considerably less abundant in differentiated cells, and appears to be further diminished in unipotent cells [42, 54, 56]. These observations led to the hypothesis that bivalent domains are important for pluripotency, allowing early developmental genes to remain silent yet able to rapidly respond to differentiation cues. A similar function of promoter bivalency can be hypothesized for multipotent or oligopotent progenitor cell types. 
However, it needs to be more rigorously established how many of the apparently 'bivalent' promoters observed in progenitor cells truly possess this chromatin state, and how many reflect heterogeneity of the analyzed cell populations, in which some cells display H3K4me3-only and others H3K27me3-only signatures at specific promoters.
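The promoter states described above can be expressed as a simple decision rule. The following Python sketch is illustrative only: it assumes that each promoter has already been reduced to a set of histone marks called as present (for example, from ChIP-seq peak calls), and the gene names and the function `classify_promoter` are hypothetical. As noted above, a population-level call of 'bivalent' made this way cannot by itself distinguish true co-occurrence of both marks from a mixture of H3K4me3-only and H3K27me3-only cells.

```python
from typing import Dict, Set


def classify_promoter(marks: Set[str]) -> str:
    """Assign a chromatin state to a promoter from the marks called as present."""
    has_k4me3 = "H3K4me3" in marks
    has_k27me3 = "H3K27me3" in marks
    if has_k4me3 and has_k27me3:
        return "bivalent"
    if has_k4me3:
        return "H3K4me3-only"   # monovalent, associated with active transcription
    if has_k27me3:
        return "H3K27me3-only"  # monovalent, associated with Polycomb silencing
    return "unmarked"


# Hypothetical mark calls for four promoters (illustrative data, not real results).
promoter_marks: Dict[str, Set[str]] = {
    "gene_A": {"H3K4me3", "H3K27me3"},
    "gene_B": {"H3K4me3"},
    "gene_C": {"H3K27me3"},
    "gene_D": set(),
}

promoter_states = {gene: classify_promoter(m) for gene, m in promoter_marks.items()}
for gene, state in promoter_states.items():
    print(gene, state)
```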

Poised enhancers

In multicellular organisms, distal regulatory elements, such as enhancers, play a central role in cell-type and signaling-dependent gene regulation [61, 62]. Although embedded within the vast non-coding genomic regions, active enhancers can be identified by epigenomic profiling of certain histone modifications and chromatin regulators [63-65]. A recent study revealed that unique chromatin signatures distinguish two functional enhancer classes in hESCs: active and poised [66]. Both classes are bound by coactivators (such as p300 and BRG1) and marked by H3K4me1, but while the active class is enriched in acetylation of lysine 27 of histone H3 (H3K27ac), the poised enhancer class is marked by H3K27me3 instead. Active enhancers are typically associated with genes expressed in hESCs and in the epiblast, whereas poised enhancers are located in proximity to genes that are inactive in hESCs, but which play critical roles during early stages of post-implantation development (for example, gastrulation, neurulation, early somitogenesis). Importantly, upon signaling stimuli, poised enhancers switch to an active chromatin state in a lineage-specific manner and are then able to drive cell-type-specific gene expression patterns. It remains to be determined whether H3K27me3-mediated enhancer poising represents a unique feature of hESCs. Recent work by Creighton et al. [67] suggests that poised enhancers are also present in mESCs and in various differentiated mouse cells, although in this case the poised enhancer signature did not involve H3K27me3, but H3K4me1 only. Nevertheless, our unpublished data indicate that, similar to the bivalent domains at promoters, simultaneous H3K4me1/H3K27me3 marking at enhancers is much less prevalent in more restricted cell types compared with both human and mouse ESCs (A Rada-Iglesias, R Bajpai and J Wysocka, unpublished observations). 
Future studies should clarify whether poised enhancers are marked by the same chromatin signature in hESCs, mESCs and differentiated cell types, and evaluate the functional relevance of the Polycomb-mediated H3K27 methylation at enhancers.
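The distinction between the two enhancer classes can be summarized as a small rule set. The sketch below is a simplified illustration under the assumption that mark presence has already been called per distal element; the function name `classify_enhancer` and the category labels other than 'active' and 'poised' are our own. Elements carrying H3K4me1 alone are kept as a separate category, because, as noted above, such elements were described as poised in mouse cells [67].

```python
from typing import Set


def classify_enhancer(marks: Set[str]) -> str:
    """Classify a distal element by the histone marks called as present."""
    if "H3K4me1" not in marks:
        return "not enhancer-like"
    has_k27ac = "H3K27ac" in marks
    has_k27me3 = "H3K27me3" in marks
    if has_k27ac and has_k27me3:
        # Both modifications target lysine 27 of H3, so a joint call most
        # likely reflects a mixed cell population or different histone tails.
        return "ambiguous"
    if has_k27ac:
        return "active"
    if has_k27me3:
        return "poised"
    return "H3K4me1-only"


print(classify_enhancer({"H3K4me1", "H3K27ac"}))   # active
print(classify_enhancer({"H3K4me1", "H3K27me3"}))  # poised
```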

Unique DNA methylation patterns

Mammalian DNA methylation occurs at position 5 of cytosine residues, generally in the context of CG dinucleotides (that is, CpG dinucleotides), and has been associated with transcriptional silencing both at repetitive DNA, including transposon elements, and at gene promoters [13, 14]. Initial DNA methylation studies of mESCs revealed that most CpG-island-rich gene promoters, which are typically associated with house-keeping and developmental genes, are DNA hypomethylated, whereas CpG-island-poor promoters, typically associated with tissue-specific genes, are hypermethylated [41, 60]. Moreover, methylation of H3K4 at both promoter-proximal and distal regulatory regions is anti-correlated with their DNA methylation level, even at CpG-island-poor promoters [60]. Nevertheless, these general correlations are not ESC-specific features as they have also been observed in a variety of other cell types [25, 60, 68]. On the other hand, recent comparisons of DNA methylation in early pre- and postimplantation mouse embryos with those of mESCs revealed that, surprisingly, mESCs accumulate promoter DNA methylation that is more characteristic of the postimplantation stage embryos rather than the blastocyst from which they are derived [69].

Although the coverage and resolution of mammalian DNA methylome maps have been steadily increasing, whole-genome analyses of human methylomes at single-nucleotide resolution require an enormous sequencing effort and have been reported only recently [70]. These analyses revealed that in hESCs, but not in differentiated cells, a significant proportion (approximately 25%) of methylated cytosines are found in a non-CG context. Non-CG methylation is a common feature of plant epigenomes [40] and, while it has been previously reported to occur in mammalian cells [71], its contribution to as much as a quarter of all cytosine methylation in hESCs had not been anticipated. It remains to be established whether non-CG methylation in hESCs is functionally relevant or, alternatively, is simply a by-product of high levels of de novo DNA methyltransferases and a hyperdynamic chromatin state that characterizes hESCs [49, 50, 72]. Regardless, its prevalence in hESC methylomes emphasizes unique properties of pluripotent cell chromatin. However, one caveat to the aforementioned study and all other BS-seq-based analyses of DNA methylation is their inability to distinguish between methylcytosine (5mC) and hydroxymethylcytosine (5hmC), as both are refractory to bisulfite conversion [15, 73], and thus it remains unclear how much of what has been mapped as DNA methylation in fact represents hydroxymethylation.
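To make the notion of sequence context concrete, the following Python sketch assigns each methylated cytosine to the CG, CHG or CHH context (where H is A, C or T) and computes the non-CG fraction. It is a minimal illustration, assuming forward-strand, 0-based positions and a toy sequence; the function names are hypothetical, and a real analysis would also handle the reverse strand and per-site read coverage. In keeping with the caveat above, the input positions would come from bisulfite data and therefore could not separate 5mC from 5hmC.

```python
from typing import Iterable


def cytosine_context(seq: str, pos: int) -> str:
    """Return 'CG', 'CHG', 'CHH' or 'unknown' for the cytosine at 0-based pos."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is not a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    # A non-CG call needs two unambiguous downstream bases.
    if len(downstream) < 2 or any(base not in "ACGT" for base in downstream):
        return "unknown"
    return "CHG" if downstream[1] == "G" else "CHH"


def non_cg_fraction(seq: str, methylated_positions: Iterable[int]) -> float:
    """Fraction of methylated cytosines (with known context) outside CG."""
    contexts = [cytosine_context(seq, p) for p in methylated_positions]
    known = [c for c in contexts if c != "unknown"]
    if not known:
        return 0.0
    return sum(c != "CG" for c in known) / len(known)


# Toy example: cytosines at positions 1 (CG), 4 (CHG), 7 (CHH) and 10 (CG).
toy_seq = "ACGTCAGCTTCG"
toy_fraction = non_cg_fraction(toy_seq, [1, 4, 7, 10])
print(toy_fraction)
```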

DNA hydroxymethylation

Another, previously unappreciated modification of DNA, hydroxymethylation, has become a subject of considerable attention. DNA hydroxymethylation is mediated by the TET family enzymes [15], which convert 5mC to 5hmC. Recent studies have shown that mESCs express high levels of TET proteins, and consequently their chromatin is 5hmC-rich [74, 75], a property that, to date, has only been observed in a limited number of other cell types - for example, in Purkinje neurons [76]. Although the functionality of 5hmC is still unclear, it has been suggested that it represents a first step in either active or passive removal of DNA methylation from select genomic loci. New insights into 5hmC genomic distribution in mESCs have been obtained from studies that utilized immunoprecipitation with 5hmC-specific antibodies coupled to next-generation sequencing or microarray technology, respectively [77, 78], revealing that a significant fraction of 5hmC occurs within gene bodies of transcriptionally active genes and, in contrast to 5mC, also at CpG-rich promoters [77], where it overlaps with the occupancy of the Polycomb complex PRC2 [78]. Intriguingly, a significant fraction of the intra-genic 5hmC occurs within a non-CG context [77], which prompts investigating whether a subset of the reported non-CG methylation in hESCs might actually represent 5hmC. Future studies should establish whether hESCs show a similar 5hmC distribution to mESCs. More importantly, it will be essential to re-evaluate the extent to which cytosine residues that have been mapped as methylated in hESCs are indeed hydroxymethylated, and to determine the functional relevance of this novel epigenetic mark.

Reduced genomic blocks marked by repressive histone modifications

A comprehensive study of epigenomic profiles in hESCs and human fibroblasts showed that, in differentiated cells, regions enriched in histone modifications associated with heterochromatin formation and gene repression, such as H3K9me2/3 and H3K27me3, are significantly expanded [79]. These two histone methylation marks cover only 4% of the hESC genome, but well over 10% of the human fibroblast genome. Parallel observations have been made independently in mice, where large H3K9me2-marked regions are more frequent in adult tissues in comparison with mESCs [80]. Interestingly, H3K9me2-marked regions largely overlap with the recently described nuclear lamina-associated domains [81], suggesting that the appearance or expansion of the repressive histone methylation marks might reflect a profound three-dimensional reorganization of chromatin during differentiation [82]. Indeed, heterochromatic foci increase in size and number upon ESC differentiation, and it has been proposed that an 'open', hyperdynamic chromatin structure is a crucial component of pluripotency maintenance [48-50].

Are hESCs and iPSCs epigenetically equivalent?

Since Yamanaka's seminal discovery in 2006 showing that introduction of the four transcription factors Oct4, Sox2, Klf4 and c-Myc is sufficient to reprogram fibroblasts to a pluripotent state, progress in the iPSC field has been breathtaking [4, 83, 84]. iPSCs have now been generated from a variety of adult and fetal somatic cell types using a myriad of alternative protocols [3, 6, 7]. Remarkably, the resulting iPSCs seem to share phenotypic and molecular properties of ESCs; these properties include pluripotency, self-renewal and similar gene expression profiles. However, an outstanding question remains: to what extent are hESCs and iPSCs functionally equivalent? The most stringent pluripotency assay, tetraploid embryo complementation, demonstrated that mouse iPSCs can give rise to all tissues of the embryo proper [85, 86]. On the other hand, many iPSC lines do not support tetraploid complementation, and those that do remain quite inefficient in comparison with mESCs [85, 87]. Initial genome-wide comparisons between ESCs and iPSCs focused on gene expression profiles, which reflect the transcriptional state of a given cell type, but not its developmental history or differentiation potential [4, 84, 88]. These additional layers of information can be uncovered, at least partially, by examining epigenetic landscapes. In this section, we summarize studies comparing DNA methylation and histone modification patterns in ESCs and iPSCs.

Sources of variation in iPSC and hESC epigenetic landscapes

Bird's eye view comparisons show that all major features of the hESC epigenome are re-established in iPSCs [89, 90]. On the other hand, when more subtle distinctions are considered, recent studies have reported differences between iPSC and hESC DNA methylation and gene expression patterns [90-94]. Potential sources of these differences can be largely divided into three groups: (i) experimental variability in cell line derivation and culture; (ii) genetic variation among cell lines; and (iii) systematic differences representing hotspots of aberrant epigenomic reprogramming.

Although differences arising as a result of experimental variability do not constitute biologically meaningful distinctions between the two stem cell types, they can be informative when assessing the quality and differentiation potential of individual lines [91, 95]. The second source of variability is a natural consequence of the genetic variation among human cells or embryos from which iPSCs and hESCs are respectively derived. Genetic variation likely underlies many of the line-to-line differences in DNA and histone modification patterns, underscoring the need for using cohorts of cell lines and stringent statistical analyses to draw systematic comparisons between hESCs, healthy donor-derived iPSCs, and disease-specific iPSCs. In support of the significant impact of human genetic variation on epigenetic landscapes, recent studies of specific chromatin features in lymphoblastoid cells [96, 97] isolated from related and unrelated subjects showed that individual, as well as allele-specific, heritable differences in chromatin signatures can be largely explained by the underlying genetic variants. Although genetic differences make comparisons between hESC and iPSC lines less straightforward, we will discuss later how these can be harnessed to uncover the role of specific regulatory sequence variants in human disease. Finally, systematic differences between hESC and iPSC epigenomes may arise through the incomplete erasure of marks characteristic of the somatic cell type of origin (somatic memory) during iPSC reprogramming, or defects in the re-establishment of hESC-like patterns in iPSCs, or as a result of selective pressure during reprogramming and the appearance of iPSC-specific signatures [90, 98]. Regardless of the underlying sources of variation, understanding epigenetic differences between hESC and iPSC lines will be essential for harnessing the potential of these cells in regenerative medicine.

Remnants of the somatic cell epigenome in iPSCs: lessons from DNA methylomes

Studies of stringently defined models of mouse reprogramming have shown that cell-type-of-origin-specific differences in gene expression and differentiation potential exist in early passage iPSCs, leading to the hypothesis that an epigenetic memory of previous fate persists in these cells [98, 99]. This epigenetic memory has been attributed to the presence of residual somatic DNA methylation in iPSCs, most of which is retained within regions located outside of, but in proximity to, CpG islands, at so-called 'shores' [98, 100]. The incomplete erasure of somatic methylation appears to predispose iPSCs to differentiation into fates related to the cell type of origin, while restricting differentiation towards other lineages. Importantly, this residual memory of past fate appears to be transient, and diminishes upon continuous passaging, serial reprogramming or treatment with small molecule inhibitors of histone deacetylase or DNA methyltransferase activity [98, 99]. These results suggest that remnants of somatic DNA methylation are not actively maintained in iPSCs during replication and thus can be erased through cell division.

More recently, whole-genome, single-base-resolution DNA methylome maps have been generated for five distinct human iPSC lines and compared with those of hESCs and somatic cells [90]. That study demonstrated that although the hESC and iPSC DNA methylation landscapes are remarkably similar overall, hundreds of differentially methylated regions (DMRs) exist. Nevertheless, only a small fraction of DMRs represents failure in erasure of somatic DNA methylation, whereas the vast majority corresponds to either hypomethylation (defects in the methylation of genomic regions that are marked in hESCs) or the appearance of iPSC-specific methylation patterns, not present in hESCs or the somatic cell type of origin. Moreover, these DMRs are likely to be resistant to passaging, as the methylome analyses were performed using relatively late passage iPSCs [80]. Because only a limited number of iPSC and hESC lines were used in the study, genetic and experimental variation among individual lines may be a major contributor to the reported DMRs. However, a significant subset of DMRs is shared among iPSC lines of different genetic background and cell type of origin, and is transmitted through differentiation, suggesting that at least some DMRs may represent non-stochastic epigenomic hotspots that are refractory to reprogramming.
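The three classes of DMRs distinguished above follow from a three-way comparison between the somatic cell of origin, the iPSC line and the hESC reference. The following Python sketch encodes that logic under a strong simplifying assumption, namely that each region has been reduced to a binary methylated/unmethylated call in each cell type; the function `classify_dmr` and its labels are hypothetical and do not reproduce the statistical procedure of the cited study.

```python
def classify_dmr(somatic: bool, ipsc: bool, hesc: bool) -> str:
    """Classify a region from binary methylation calls (True = methylated)."""
    if ipsc == hesc:
        return "not a DMR"  # iPSC matches the hESC reference
    if ipsc and not hesc:
        # Methylation present in iPSCs but absent from hESCs.
        if somatic:
            return "somatic memory"  # somatic methylation was not erased
        return "iPSC-specific methylation"  # absent from both hESC and origin
    # Remaining case: unmethylated in iPSCs but methylated in hESCs.
    return "hypomethylation relative to hESC"


print(classify_dmr(somatic=True, ipsc=True, hesc=False))
print(classify_dmr(somatic=False, ipsc=True, hesc=False))
print(classify_dmr(somatic=False, ipsc=False, hesc=True))
```

In practice, methylation levels are continuous and vary between lines, so any such binarization would need thresholds and replicate lines, which this sketch deliberately leaves out.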

Reprogramming resistance of subtelomeric and subcentromeric regions?

In addition to erasing somatic epigenetic marks, an essential component of reprogramming is the faithful re-establishment of hESC-like epigenomic features. Although, as discussed above, most of the DNA methylation is correctly re-established during reprogramming, large megabase-scale regions of reduced methylation can be detected in iPSCs, often within the vicinity of centromeres and telomeres [90]. Biased depletion of DNA methylation from subcentromeric and subtelomeric regions correlates with blocks of H3K9me3 that mark these loci in iPSCs and somatic cells, but not in hESCs [79, 90]. Aberrant DNA methylation in proximity to centromeres and telomeres suggests that these chromosomal territories may have features that render them more resistant to epigenetic changes. Intriguingly, histone variant H3.3, which is generally implicated in transcription-associated and replication-independent histone deposition, was recently found to also occupy subtelomeric and subcentromeric regions in mESCs and mouse embryo [36, 101, 102]. It has been previously suggested that H3.3 plays a critical role in the maintenance of transcriptional memory during reprogramming of somatic nuclei by the egg environment (that is, reprogramming by somatic cell nuclear transfer) [103], and it is tempting to speculate that a similar mechanism may contribute to the resistance of the subtelomeric and subcentromeric regions to reprogramming in iPSCs.

Anticipating future fates: reprogramming at regulatory elements

Pluripotent cells are in a state of permanent anticipation of many alternative developmental fates, and this is reflected in the prevalence of the poised promoters and enhancers in their epigenomes [42, 66]. Although multiple studies have demonstrated that bivalent domains at promoters are re-established in iPSCs with high fidelity [89], the extent to which chromatin signatures associated with poised developmental enhancers in hESCs are recapitulated in iPSCs remains unclear. However, the existence of a large class of poised developmental enhancers linked to genes that are inactive in hESCs, but involved in postimplantation steps of human embryogenesis [66], suggests that proper enhancer rewiring to a hESC-like state may be central to the differentiation potential of iPSCs. Defective epigenetic marking of developmental enhancers to a poised state may result in impaired or delayed ability of iPSCs to respond to differentiation cues, without manifesting itself at the transcriptional or promoter modification level in the undifferentiated state. Therefore, we would argue that epigenomic profiling of enhancer repertoires should be a critical component in evaluating iPSC quality and differentiation potential (Figure 1) and could be incorporated into already existing pipelines [91, 95].

Figure 1

Epigenomics as a tool to assess iPSC identity. Chromatin signatures obtained by epigenomic profiling of a cohort of human embryonic stem cell (hESC) lines can be used to generate hESC reference epigenomes (left panels). The extent of reprogramming and differentiation potential of individual induced pluripotent stem cell (iPSC) lines can be assessed by comparing iPSC epigenomes (right panels) to the reference hESC epigenomes. (a-c) Such comparisons should evaluate epigenetic states at regulatory elements of self-renewal genes that are active in hESCs (a), developmental genes that are poised in hESCs (b), and tissue-specific genes that are inactive in hESCs, but are expressed in the cell type of origin used to derive iPSC (c). H3K4me1, methylation of lysine 4 of histone H3; H3K4me3, trimethylation of lysine 4 of histone H3; H3K27ac, acetylation of lysine 27 of histone H3; H3K27me3, trimethylation of lysine 27 of histone H3; meC, methylcytosine.

Relevance of epigenomics for human disease and regenerative medicine

In this section, we envision how recent advances in epigenomics can be used to gain insight into human development and disease, and to facilitate the transition of stem cell technologies towards clinical applications.

Using epigenomics to predict developmental robustness of iPSC lines for translational applications

As discussed earlier, epigenomic profiling can be used to annotate functional genomic elements in a genome-wide and cell-type specific manner. Distinct chromatin signatures can distinguish active and poised enhancers and promoters, identify insulator elements and uncover non-coding RNAs transcribed in a given cell type [42, 56, 63, 64, 66, 104, 105] (Table 2). Given that developmental potential is likely to be reflected in the epigenetic marking of promoters and enhancers linked to poised states, epigenomic maps should be more predictive of iPSC differentiation capacity than transcriptome profiling alone (Figure 1). However, before epigenomics can be used as a standard tool in assessing iPSC and hESC quality in translational applications, the appropriate resources need to be developed. For example, although ChIP-seq analysis of chromatin signatures is extremely informative, its reliance on antibody quality requires the development of renewable, standardized reagents. Also, importantly, to assess the significance of epigenomic pattern variation, sufficient numbers of reference epigenomes need to be obtained from hESC and iPSC lines that are representative of genetic variation and have been rigorously tested in a variety of differentiation assays. The first forays towards the development of such tools and resources have already been made [89, 91, 106, 107].

Annotating regulatory elements that orchestrate human differentiation and development

As a result of ethical and practical limitations, we know very little about the regulatory mechanisms that govern early human embryogenesis. hESC-based differentiation models offer a unique opportunity to isolate and study cells that correspond to transient progenitor states arising during human development. Subsequent epigenomic profiling of hESCs that have been differentiated in vitro along specific lineages can be used to define the functional genomic regulatory space, or 'regulatome', of a given cell lineage (Figure 2a). This approach is particularly relevant for genome-wide identification of tissue-specific enhancers and silencers, which are highly variable among different, even closely related, cell types. Characterizing cell-type-specific regulatomes will be useful for comparative analyses of gene expression circuitries. In addition, through bioinformatic analysis of the underlying DNA sequence, they can be used to predict novel master regulators of specific cell fate decisions, and these can then serve as candidates in direct transdifferentiation approaches. Moreover, mapping enhancer repertoires provides an enormous resource for the development of reporters for isolation and characterization of rare human cell populations, such as the progenitor cells that arise only transiently in the developmental process [66]. Ultimately, this knowledge will allow refinement of the current differentiation protocols and derivation of well-defined, and thus safer and more appropriate, cells for replacement therapies [3, 108-110]. Furthermore, as discussed below, characterizing cell-type specific regulatomes will be essential for understanding non-coding variation in human disease.

Figure 2

The combination of stem cell models and epigenomics in studies of the role of non-coding mutations in human disease. Epigenomic analyses of cells derived through in vitro stem cell differentiation models can be used to define the functional regulatory space, or 'regulatome', of a given cell type and to study the significance of the non-coding genetic variation in human disease. (a) The vast non-coding fraction of the human genome can be significantly reduced by defining the regulatome of a given cell type via epigenomic profiling of chromatin signatures that define different types of regulatory elements, such as enhancers, promoters and insulators. Regulatome maps obtained in the disease-relevant cell types define genomic space that can be subsequently searched for the recurrent disease-associated genetic variants. (b) Most genetic variants associated with complex human diseases appear to reside in non-coding regions of the human genome. To assess functional consequences of such variants, disease-relevant cell types can be derived from healthy and disease-affected donor induced pluripotent stem cells (iPSCs) and epigenomic profiling can be used to evaluate how these genetic variants affect chromatin signatures, and transcription factor and coactivator occupancy at regulatory elements. CTCF, CCCTC-binding factor, insulator associated protein; ESC, embryonic stem cell; H3K4me1, methylation of lysine 4 of histone H3; H3K4me3, trimethylation of lysine 4 of histone H3; H3K27ac, acetylation of lysine 27 of histone H3; H3K27me3, trimethylation of lysine 27 of histone H3; meC, methylcytosine.

Cell-type-specific regulatomes as a tool for understanding the role of non-coding mutations in human disease

During the past few years, genome-wide association studies have dramatically expanded the catalog of genetic variants associated with some of the most common human disorders, such as various cancer types, type 2 diabetes, obesity, cardiovascular disease, Crohn's disease and cleft lip/palate [111-118]. One recurrent observation is that most disease-associated variants occur in non-coding parts of the human genome, suggesting a large non-coding component in human phenotypic variation and disease. Indeed, several studies document a critical role for genetic aberrations occurring within individual distal enhancer elements in human pathogenesis [119-121]. To date, the role of regulatory sequence mutation in human disease has not been systematically examined. However, given the rapidly decreasing cost of high-throughput sequencing and the multiple disease-oriented whole genome sequencing projects that are under way, the coming years will bring the opportunity and challenge to ascribe functional significance to disease-associated non-coding mutations [122]. Doing so will require both an ability to identify and obtain cell types relevant to disease, and the ability to characterize their specific regulatomes.

We envision that combining pluripotent cell differentiation models with epigenomic profiling will provide an important tool for uncovering the role of non-coding mutations in human disease. For example, if the disease of interest affects a particular cell type that can be derived in vitro from hESCs, characterizing the reference regulatome of this cell type, as described above, will shrink the vast genomic regions that might be implicated in disease into a much smaller regulatory space that can be more effectively examined for recurrent variants that are associated with disease (Figure 2a). The function of these regulatory variants can be further studied using in vitro and in vivo models, of which iPSC-based 'disease in a dish' models appear particularly promising [123]. For example, disease-relevant cell types obtained from patient-derived and healthy-donor-derived iPSCs can be used to study the effects of the disease genotype on cell-type-specific regulatomes (Figure 2b). Moreover, given that many, if not most, regulatory variants are likely to be heterozygous in patients, loss or gain of chromatin features associated with those variants (such as p300 binding, histone modifications and nucleosome occupancy) can be assayed independently for each allele within the same iPSC line. Indeed, allele-specific sequencing assays are already being developed [42, 96, 97, 124] (Table 1). The results of such assays can in turn be compared with allele-specific RNA-seq transcriptome analyses from the same cells [125], yielding insights into the effects of disease-associated regulatory alleles on the transcription of genes located in relative chromosomal proximity [96, 125].
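
The two computational steps implied above, restricting candidate variants to a cell-type-specific regulatome and then asking whether a chromatin feature is unequally distributed between the two alleles of a heterozygous variant, can be illustrated with a minimal sketch. This is not a method taken from the studies cited here. The function names, the half-open (chromosome, start, end) interval convention, and the use of an exact two-sided binomial test against an expected 50:50 allelic ratio are all assumptions made for illustration; real analyses would also need to account for reference mapping bias and multiple testing.

```python
from bisect import bisect_right
from collections import defaultdict
from math import comb


def build_index(regions):
    """Group half-open (chrom, start, end) intervals by chromosome and merge overlaps.

    `regions` stands in for a regulatome map (e.g. enhancer calls); hypothetical input.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        merged = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def in_regulatome(index, chrom, pos):
    """Return True if position `pos` falls inside any indexed interval on `chrom`."""
    spans = index.get(chrom, [])
    # Find the last interval whose start is <= pos.
    i = bisect_right(spans, (pos, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= pos < spans[i][1]


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for deviation from a 50:50 allelic ratio.

    Assumes reads are independent and that there is no mapping bias (a simplification).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical enhancer intervals and disease-associated variants.
    enhancers = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    variants = [("chr1", 250), ("chr1", 400), ("chr2", 55)]
    index = build_index(enhancers)
    candidates = [v for v in variants if in_regulatome(index, *v)]
    print(candidates)
    # Hypothetical allele-specific ChIP-seq read counts at one heterozygous variant.
    print(allelic_imbalance_p(ref_reads=10, alt_reads=0))
```

In this sketch the interval filter plays the role of the reduced regulatory search space (Figure 2a), and the per-variant test corresponds to comparing allele-specific signal within a single iPSC-derived line (Figure 2b).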

Conclusions and future perspective

Analyses of hESC and iPSC chromatin landscapes have already provided important insights into the molecular basis of pluripotency, reprogramming and early human development. Our current view of the pluripotent cell epigenome has largely been acquired through recent advances in next-generation sequencing technologies, such as ChIP-seq or MethylC-seq. Several chromatin features, including bivalent promoters, poised enhancers and pervasive non-CG methylation, seem to be more abundant in hESCs than in differentiated cells. It will be important in future studies to dissect the molecular function of these epigenomic attributes and their relevance for hESC biology. Epigenomic tools are also being widely used in the evaluation of iPSC identity. In general, the epigenomes of iPSC lines seem highly similar to those of hESC lines, although recent reports suggest that differences in DNA methylation patterns exist between the two pluripotent cell types. It will be important to understand the origins of these differences (that is, somatic memory, experimental variability, genetic variation), as well as their impact on iPSC differentiation potential or clinical applications. Moreover, epigenetic features other than DNA methylation should be thoroughly compared, including proper re-establishment of poised enhancer patterns. As a more complete picture of the epigenomes of ESCs, iPSCs and other cell types emerges, important lessons regarding early developmental decisions in humans will be learnt, facilitating not only our understanding of human development, but also the establishment of robust in vitro differentiation protocols. These advancements will in turn allow for generation of replacement cells for cellular transplantation approaches and for development of the appropriate 'disease in a dish' models. Within such models, epigenomic profiling could be especially helpful in understanding the genetic basis of complex human disorders, where most of the causative variants are predicted to occur within the vast non-coding fraction of the human genome.