Main

During development, cells within a multicellular organism progressively differentiate into functionally and phenotypically distinct fates, a specialization that is enabled by cell type-specific gene expression. Once established, cellular states are remarkably stable and can be sustained over many cell divisions throughout an organism's lifespan. Differentiation proceeds from the totipotent zygote, which is itself formed at fertilization by the reactivation of early developmental programmes within the nuclei of two highly specialized gametes. The tremendous reprogramming potential of the ooplasm is highlighted by somatic cell nuclear transfer (SCNT) experiments, in which nuclei of specialized cells from any somatic lineage are rapidly directed to totipotency by the cytoplasm of the enucleated oocyte (Box 1). The feasibility of using SCNT to generate germline-competent organisms proved that developmental restrictions are imposed by epigenetic, rather than genetic, mechanisms and are thus reversible. SCNT remained a major technology for studying the regulatory mechanisms behind this functional reset until it was shown that ectopic expression of only four transcription factors, OCT4 (also known as POU5F1), SRY-box 2 (SOX2), Krüppel-like factor 4 (KLF4) and MYC (collectively referred to as OSKM), can establish pluripotency in vitro, producing induced pluripotent stem cells (iPSCs)1 (Fig. 1). Whereas SCNT is technically challenging and relies on a limited gametic resource, reprogramming by OSKM is experimentally tractable and can be performed in vitro on large populations of cells, enabling systematic biochemical and genomic characterization of the mechanisms that impose or surmount the epigenetic constraints governing cellular identity (see Supplementary information S1 (box)).

Figure 1: Approaches for reprogramming somatic nuclei.

A somatic nucleus can be directed to an early embryonic state through three general approaches: somatic cell nuclear transfer (SCNT), fusion with embryonic stem cells (ES cells) and ectopic factor expression (direct reprogramming). In somatic nuclei, pluripotency-specific genes such as Oct4 are epigenetically silenced. Nuclear potential scales from totipotent (the ability to generate an entire organism, including extra-embryonic and embryonic tissues) to unipotent. In SCNT, somatic nuclei are transferred into enucleated oocytes. Upon oocyte activation, maternal factors rapidly and globally remodel the somatic genome (Box 1). Successfully reprogrammed SCNT zygotes have restored totipotency and express select pluripotency factors, such as Oct4, that have developmental functions in both the early embryo and inner cell mass (ICM). After ∼3–4 days, the ICM is formed in the blastocyst and developmental potency is restricted, producing pluripotent cells that contribute to the embryo proper, or from which ES cells can be derived. In experimentally induced ES cell fusion, a somatic cell is fused with a pluripotent cell. Reprogramming is initiated in the heterokaryon phase, during which both nuclei remain discrete, and includes global epigenetic remodelling that precedes the activation of pluripotent loci216,217. In the late heterokaryon phase, select loci are activated through a process that may require DNA replication218,219,220. The somatic and pluripotent nuclei fuse after cell division, and additional genes are then reprogrammed to consolidate the pluripotent network within the somatic genome218,220. During direct reprogramming, OCT4, SRY-box 2 (SOX2), Krüppel-like factor 4 (KLF4) and MYC (OSKM) are introduced into somatic cells, which respond by increasing proliferation and undergoing local changes to their epigenome. Shortly afterwards, KLF4 induces epithelial genes, and genes supporting the mesenchymal state are repressed. 
Early responses do not include activation of true pluripotency-associated regulators, including Oct4. The transition from a factor-dependent, non-pluripotent state to induced pluripotent stem cells (iPSCs) requires multiple divisions and an extended latency period under persistent ectopic OSKM expression. Complete reprogramming of the somatic nucleus occurs within a small percentage of responding cells and includes activation of the endogenous network, as well as consolidation of additional global epigenetic features of stable pluripotency. NT zygote, nuclear transfer zygote.

In this Review, we describe insights that have been uncovered using direct reprogramming as an experimental tool, beginning by defining cellular identity as a specific molecular state that is stabilized and maintained through cooperating transcriptional and epigenetic mechanisms, including many parameters that are unique to functional pluripotency (that is, the ability to contribute to the formation of all embryonic tissues). We also address how chromatin remodellers, transcription factors and various levels of epigenetic regulation are coordinated to re-establish developmental potential in vitro.

A molecular definition of cell state

Cellular identity and differentiation potential were originally assigned on the basis of phenotype and lineage history, as well as from the stability of these traits following transplantation of cells to ectopic sites within the developing embryo. Studies of molecular regulation of cellular identity initially focused on master transcription factors that are expressed at the inception of a cell lineage and that are often necessary or sufficient to direct its identity2. However, genomic expression profiling revealed that specific master regulators are often shared among different cell types of the same lineage, or even across lineages, indicating that they perform context-specific activities within an identical genomic framework3,4,5. For instance, the transcription factor SOX2 participates in the regulation of pluripotency6, specification to the neural lineage7 and adult tissue homeostasis8. The largely non-overlapping sets of genes targeted by this trans-acting factor in each context contribute to the functional differences between pluripotent stem cells (PSCs) and neural progenitors, and target selection is determined, in part, by lineage-specific cofactors5,9.

Although the combinatorial expression of master transcriptional regulators constitutes a simple and plausible model for the control of diverse cell functions during development, the genomic distribution of transcription factor-binding sequences (motifs) and the co-expression of factors within a given cell type are surprisingly imprecise predictors of target loci or transcriptional output. These limitations hinder efforts to identify sets of factors that reprogramme one cell type to another; such combinations have instead generally been found by systematically testing known regulators until a sufficient set emerges, as initially performed by Takahashi and Yamanaka10,11,12. During development, the genomic occupancy of master transcriptional regulators and their cofactors is often restricted to local 'nucleosome-free regions', in which cis-regulatory sequences are not occluded by chromatin, making it difficult to study the stepwise coordination of cell type-specific regulatory elements from static snapshots13,14,15. Recent efforts to comprehensively map regulatory networks across mammalian cell types demonstrate the complex interplay between transcriptional regulators and the local epigenetic environment as they cooperate to direct cellular identity16,17. By integrating transcriptional and chromatin-state data with empirically determined definitions of cell function, lineage and developmental potential, such maps can delineate each cell type as a precise molecular genomic state.

Reprogramming the somatic cell state

Although epigenomic annotation efforts have compiled extensive information about the genomic abundance, location and function of regulatory sequences that underlie natural developmental transitions, they cannot predict the outcome of experimentally directed reprogramming. Ontogeny is sequential and spatiotemporally controlled, such that prior cellular states influence the activation of subsequent molecular programmes. By contrast, direct reprogramming of differentiated cells to pluripotency proves that, despite its marked stability, cell fate is reversible and need not depend on lineage history. Classically, ectopic introduction of reprogramming factors in somatic cells generates iPSCs only after an extended latency of one or several weeks, and at a quantifiable but low frequency (Fig. 1). However, populations of intermediate cells can be isolated and assessed by molecular profiling or screened for emerging heterogeneity over the experimental time course. These experimental features have repeatedly been used to uncover insights into the nature of somatic cell identity, by distinguishing ectopic factor activities that can function within pre-established, responsive regulatory programmes from those that must surmount the more complex, epigenetically constrained barriers that are imposed during differentiation.
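The combination of long latency and low frequency can be illustrated with a deliberately simple probabilistic sketch. It assumes, purely for illustration and not as a model drawn from the cited studies, that each cell division independently carries the same small probability p of passing a rate-limiting step; the function names and all values of p are hypothetical.

```python
import math


def fraction_reprogrammed(p_per_division: float, divisions: int) -> float:
    """Expected fraction of lineages that have passed the rate-limiting step.

    Hypothetical assumption: every division carries an independent,
    constant probability p_per_division of passing the step, so the
    chance of never passing it after n divisions is (1 - p) ** n.
    """
    if not 0.0 <= p_per_division <= 1.0:
        raise ValueError("p_per_division must be a probability")
    return 1.0 - (1.0 - p_per_division) ** divisions


def divisions_needed(p_per_division: float, target_fraction: float) -> int:
    """Smallest number of divisions at which the expected fraction reaches target_fraction."""
    if not (0.0 < p_per_division < 1.0 and 0.0 <= target_fraction < 1.0):
        raise ValueError("need 0 < p_per_division < 1 and 0 <= target_fraction < 1")
    # Solve 1 - (1 - p) ** n >= target for n and round up to whole divisions.
    return math.ceil(math.log(1.0 - target_fraction) / math.log(1.0 - p_per_division))


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, round(fraction_reprogrammed(0.001, n), 4))
```

For instance, with a hypothetical p of 0.001, `fraction_reprogrammed(0.001, 20)` is roughly 0.02: under this assumption the yield stays low and grows only with the number of divisions, which is one simple way to picture why many divisions and weeks of factor expression are needed.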

Initial effects on the somatic epigenome. Early studies of the reprogramming process noted that ectopic introduction of OSKM into differentiated cells results in population-wide morphological changes and a loss of somatic identity markers, as well as a rapid reduction in cell size coupled with increased proliferation18,19,20,21,22,23. Notably, this initial 'de-differentiation' response appears to be largely unstable, such that induced cells will reactivate somatic state-specific markers if reprogramming factors are prematurely removed; similarly, isolated 'intermediate' cell states that divide continuously but are not pluripotent require persistent OSKM expression19,24,25,26 (Fig. 2). The inability of OSKM to rapidly alter the differentiated state can be explained by epigenetic regulatory mechanisms that preserve global features of cellular identity during mitosis in the absence of genetically targeted activating or repressive cues (see Supplementary information S2 (box)). Indeed, global profiling of epigenetic modifications such as DNA methylation and the trimethylation of histone H3 Lys4 and Lys27 (H3K4me3 and H3K27me3, respectively) indicates very limited epigenetic remodelling following factor induction in mouse embryonic fibroblasts (MEFs), except in very specific contexts18,26,27,28,29. Thus, the global epigenetic landscape of the somatic genome is preserved in perpetually dividing, factor-dependent intermediate-state cells, further demonstrating the remarkable stability of epigenetic modifications over weeks-long experiments18,26,27,28.

Figure 2: Direct reprogramming traverses stable somatic and pluripotent state boundaries.

The ectopic expression of lineage-altering transcription factors in somatic cells does not immediately result in the establishment of another cell state (known as transdifferentiation). For example, the ectopic expression of myoblast determination protein (MYOD) in fibroblasts can activate muscle-specific genes, but complete conversion to a muscle cell does not occur in all cell types2. Endogenous MyoD is not activated, suggesting that the myogenic programme is dependent on sustained expression of exogenous MYOD221. Moreover, genes expressed within the original cell type are not always fully suppressed, and their expression sometimes persists despite the acquisition of some myogenic markers. These 'myoblast-like cells' have distinct morphological features and gene expression programmes resulting from local reprogramming of select targets, and not from global epigenetic reprogramming, as is observed when directing cells to pluripotency. Initially, ectopic expression of OCT4, SRY-box 2 (SOX2), Krüppel-like factor 4 (KLF4) and MYC, collectively referred to as OSKM, induces unstable modulation of the original somatic state, with local rather than global epigenetic changes that depend on sustained OSKM expression and thus do not represent complete nuclear reprogramming. Mesenchymal genes are initially repressed by OCT4 and SOX2, and the cell cycle is accelerated by MYC, after which epithelial gene expression programmes are induced, in part by KLF4. At either of these stages, OSKM-dependent, non-pluripotent intermediates can be isolated as perpetually dividing cell lines. Neither transdifferentiation nor early direct reprogramming establishes new state characteristics beyond those prescribed as somatic. Reprogramming cells enter pluripotency after a sufficient number of key regulators have been activated, presumably downstream of a single switch-like event.
During consolidation, cells undergo complete nuclear reprogramming and acquire many features that are specific to pluripotency, including complete embryonic potential, establishment of bivalent domains at promoters of developmental genes, silencing of retroviral vectors, insensitivity to loss of global repressors such as DNA methylation, and X chromosome reactivation in females. Alternative somatic fates can be induced by OSKM, including neural stem cells (NSCs) or cardiomyocytes, but require support from exogenous factors and transient passage through pluripotency, as measured by X chromosome reactivation and the silencing of viral vectors. iPSC, induced pluripotent stem cell.

Mechanisms for the mitotic inheritance of epigenetic modifications provide a robust, transcription factor-independent mode of maintaining nuclear homeostasis. As a consequence, foreign transcription factor activity is largely restricted to loci where stabilizing repressive mechanisms are either not in play or can be effectively reversed. By contrast, induction of OSKM expression in MEFs results in the rapid loss of thousands of distal, state-specific enhancers27, presumably in response to the downregulation of the corresponding transcription factors18,30. Immediate transcriptional responses to ectopic OSKM expression are also limited in scope, occurring largely at accessible, H3K4me3-modified promoters18,27. Broadly perturbing somatic chromatin states can help to remove the epigenetic memory that buffers against OSKM activity. For example, the modifications H3K79me2 and H3K36me3 are canonically distributed over the bodies of transcribed genes, and interfering with their regulation supports iPSC generation by eliminating a potent memory of prior transcriptional activity in the pre-induced somatic cell31,32. Similarly, the H3K27me3 demethylase UTX facilitates the induction of pluripotency-associated genes that are repressed by this modification during early development33. In each of these cases, a specific epigenetic pathway actively participates in the erasure of somatic memory at target genes. More broadly, global epigenetic maintenance of somatic nuclear identity persists until iPSCs emerge, after which most of these modifications are reset by the unique regulatory characteristics of the pluripotent state (see below).

Transcriptional changes during early reprogramming. The epigenetic robustness of the somatic nuclear state initially limits reprogramming in response to ectopic OSKM, but transcriptional programmes that function as modules within multiple cell types during development seem to be particularly sensitive to induction. For instance, cellular proliferation and mitogen sensing are actively regulated in somatic cells, and the promoters of genes involved in these processes are generally H3K4 trimethylated and primed for expression34. The earliest phenotypic and molecular changes following OSKM expression include an immediate increase in proliferation that is decoupled from normal homeostatic growth control23. Because MYC is broadly expressed across cell types, its role during reprogramming probably stems from high-level ectopic expression from a strong constitutive promoter; in pluripotent cells, MYC participates in a regulatory network that is largely separate from that of OCT4 and SOX228,35. Upon OSKM induction, MYC preferentially targets open, accessible sites, primarily at core promoter sequences, where it promotes the transition from initiating to elongating forms of RNA polymerase II (Pol II) to enhance transcription at responsive cell cycle genes27,28,36,37. Similarly, the early response to OSKM includes a metabolic switch from oxidative phosphorylation to glycolysis that is directed, in part, by MYC and is a commonly observed feature of transformed cells, such as cancer cells, as well as of pluripotent and adult stem cells38,39,40,41,42.

OSKM induction also triggers immediate downregulation of somatic identity genes, including those that are characteristic of mesenchymal cells (Fig. 2). The sensitivity of somatic genes to OSKM corresponds with the disassembly of distal enhancers and is a general characteristic of reprogramming cells, regardless of somatic origin, making it unclear whether this preliminary de-differentiation proceeds through the direct action of OSKM or through some other, indirect mechanism18,27. Following general, and reversible, gene suppression, mesenchymal cells activate a previously silent epithelial programme that shares some features with early embryonic cells and their in vitro counterparts25,43. A crucial step in gastrulation and mesodermal germ layer differentiation, as well as in later developmental stages, is the conversion of polarized, non-motile epithelial cells to a mesenchymal phenotype44. This epithelial-to-mesenchymal transition (EMT) is actively imposed by transcriptional regulators such as SNAIL1, SNAIL2, zinc-finger E box-binding homeobox 1 (ZEB1) and ZEB2, which repress promoters of epithelial genes such as CDH1 (E-cadherin) to support a mesenchymal expression programme44. Additionally, signalling pathways such as the transforming growth factor-β (TGFβ) pathway promote EMT, in part by activating transcriptional regulators through SMAD phosphorylation45. During reprogramming, active repression of the epithelial programme is relieved by the loss of mesenchymal-supporting genes, particularly through OCT4- and SOX2-mediated inhibition of Snail transcription and through the induction of Zeb2-targeting microRNAs25,46. Simultaneously, KLF4 directly induces epithelial genes such as Cdh1 (Ref. 43). The somatic cell response to ectopic OSKM expression cooperates with exogenous growth factors: ectopic MYC downregulates TGFβ receptors, whereas bone morphogenetic proteins (BMPs) promote early induction of mesenchymal gene-suppressing microRNAs25,43.
Inhibiting TGFβ signalling enhances reprogramming from mesenchymal cell types and can substitute for the role of MYC and SOX2 in suppressing EMT-supporting factors47,48. The early induction of epithelial genes therefore reflects relief from repression by a genetic programme that is continuously enforced through growth factor signalling. Accordingly, epithelialization represents an inherently accessible potential that differs from the activation of the true pluripotency network. In keeping with this idea, many epithelial genes, including several low-stringency markers that are expressed in, but not specific to, pluripotent cells, can be induced independently and with far higher efficiency during early reprogramming20,25,43,49.

Induction and consolidation of pluripotency. Pluripotency itself is stabilized late in the reprogramming process and leads to independence from the expression of ectopic factors50 (Figs 1,2). Accumulating evidence suggests that this final transition occurs in a switch-like manner following activation of a few key genes, which subsequently re-establish a complete, self-sustaining regulatory network. Transcriptional analysis of single reprogramming cells divides the process into an initial, ectopic OSKM-dependent stochastic period, followed by a more deterministic phase established upon endogenous Sox2 activation51. Subsequently, the stepwise re-establishment of the pluripotency network proceeds through the induction of a minimal set of genes and requires simultaneous silencing of ectopic OSKM expression52. The endogenous activation of this gene subset may therefore represent the theorized rate-limiting step that must be overcome in order for reprogramming cells to transition fully into a stably self-renewing pluripotent state53.
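To make the idea of a switch-like transition to a self-sustaining network concrete, the sketch below uses a deliberately minimal toy model. It is a hypothetical illustration, not a model taken from the cited studies: a single variable x stands in for endogenous pluripotency network activity, which activates itself through a steep Hill function (positive autoregulation), and s(t) stands in for ectopic OSKM input. All parameter values are arbitrary choices that make the system bistable; with them, the unforced system has an unstable threshold at x = 0.5 and a stable 'on' state near x ≈ 0.92.

```python
def self_activation(x: float, k: float = 0.5, n: int = 4) -> float:
    """Hill-type positive autoregulation of the endogenous network (hypothetical parameters)."""
    return x**n / (k**n + x**n)


def simulate(pulse_duration: float, ectopic_level: float = 0.3,
             total_time: float = 40.0, dt: float = 0.01) -> float:
    """Forward-Euler integration of dx/dt = s(t) + self_activation(x) - x.

    s(t) equals ectopic_level while t < pulse_duration and 0 afterwards,
    mimicking transient ectopic factor expression. Returns x at total_time.
    """
    x = 0.0  # start in the somatic (network-off) state
    for step in range(int(round(total_time / dt))):
        t = step * dt
        s = ectopic_level if t < pulse_duration else 0.0
        x += dt * (s + self_activation(x) - x)
    return x


if __name__ == "__main__":
    for duration in (1.0, 10.0):
        print(f"ectopic input for {duration} time units -> final x = {simulate(duration):.2f}")
```

With these values, a brief pulse leaves x well below the threshold, so activity decays back to the 'off' state once the input is withdrawn, loosely analogous to factor-dependent intermediates. A longer pulse pushes x past the threshold, after which the self-activating loop holds x near 0.92 with no further input, loosely analogous to factor independence.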

Although the activation of key effectors may be a determining event in the generation of iPSCs, it is insufficient to render cells functionally pluripotent. Pluripotent cells not only self-renew but must also maintain the ability to respond to multiple developmental cues and generate all organismal cell types, as well as sustain an epigenetic memory of this potential as they divide. Establishing functional pluripotency appears to require additional downstream events beyond the primary induction of target genes, including molecular features that cannot be evaluated by transcriptional output alone54,55,56. These include the erasure of somatic DNA methylation signatures18, activation of the silent X chromosome in female cells57, and the re-establishment of bivalent histone modifications at developmental genes18,58 (Fig. 2). Notably, the re-establishment of epigenetic modifications associated with pluripotency requires additional cell divisions in the absence of ectopic OSKM expression, such that alternative differentiation states may be acquired by culturing reprogramming cells in the presence of specific exogenous growth factors59,60,61,62,63.

Molecular features of pluripotency

Successful reprogramming culminates in the establishment of stable, self-renewing and functionally pluripotent stem cell lines58,64. Although pluripotency exists only transiently during early embryonic development in vivo, the derivation of PSC lines that can be stably propagated in vitro has provided a powerful model for many developmental processes, including the mechanisms regulating downstream lineage restriction, commitment and eventual maintenance of a terminally differentiated state65. The crucial transcriptional regulators of pluripotency, OCT4 and NANOG, in cooperation with SOX2, were identified through genetic studies demonstrating their role in embryonic development and pluripotency maintenance both in vivo and in vitro65,66,67,68. Subsequent genome-wide localization and interaction analyses revealed that OCT4, SOX2 and NANOG bind to and regulate pluripotency-specific genes, often co-occupying the same target loci to form a regulatory circuit consisting of both feedforward and autoregulatory loops that maintain their own expression, as well as that of other key genes50. Unlike somatic cell states, this network must also maintain an extensive and unbiased differentiation potential while remaining sensitive enough to integrate opposing differentiation cues efficiently and robustly. Active suppression of lineage specification is a key feature of pluripotency that is only partially controlled through the direct action of OCT4, SOX2 and NANOG, or their immediate effectors50. The consolidation of additional sequence-specific transcription factors, signalling pathways and chromatin modifiers during the final stages of direct reprogramming can be parsed into stepwise molecular pathways to understand the unique developmental potential of pluripotent cells and how it is restored.

Bivalence of developmental genes. Pluripotent cells are unique in that they must suppress multiple developmental pathways comprising thousands of genes while preserving their responsiveness to specific differentiation cues. Canonically, this dual regulation converges on the opposing functions of repressive Polycomb group (PcG) and transcription-associated Trithorax group (TrxG) proteins at CpG island-containing promoters, which establish chromatin with bivalent H3K27me3 and H3K4me3 modifications, respectively69,70,71,72. Within pluripotent cells, bivalent domains are prevalent at developmental gene promoters and provide a molecular correlate of cellular potential, as most subsequently resolve to either an expressed, TrxG-regulated or a repressed, PcG-regulated state, according to developmental trajectory72,73,74. Bivalent chromatin is functionally relevant to proper development: interfering with the machinery that maintains these dual modifications often results in aberrant or impaired differentiation75. In the G1 phase of the cell cycle, sensitivity to extracellular developmental cues favours imbalanced enrichment of H3K4me3 and, eventually, induction of gene expression76. In PSCs, many distal enhancers exist in a 'poised' state (marked by H3K4me1 or H3K4me2) and interact with cognate developmental gene promoters, subsequently acquiring repressive H3K27me3 owing to interactions with PcG at the CpG island77,78. When cells are triggered to differentiate, p300 acetylates H3K27 (H3K27ac), which stabilizes Pol II and H3K4me3 at the promoter, destabilizes PcG and activates the gene77.
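The promoter states described above can be summarized as a small mapping from histone marks to chromatin state. The sketch below is a hypothetical simplification: it assumes binary presence or absence calls for each mark (in practice, such calls would come from thresholded ChIP-seq signal), the gene names are placeholders, and co-occurrence of both marks is only a chromatin-level definition that may not by itself capture functional bivalence.

```python
from enum import Enum


class PromoterState(Enum):
    """Simplified chromatin states at a CpG island-containing developmental promoter."""
    BIVALENT = "H3K4me3 + H3K27me3: repressed but poised"
    ACTIVE = "H3K4me3 only: TrxG-regulated, expressed"
    REPRESSED = "H3K27me3 only: PcG-regulated, repressed"
    UNMARKED = "neither mark called"


def classify_promoter(has_h3k4me3: bool, has_h3k27me3: bool) -> PromoterState:
    """Map binary calls for the two histone marks to a promoter state."""
    if has_h3k4me3 and has_h3k27me3:
        return PromoterState.BIVALENT
    if has_h3k4me3:
        return PromoterState.ACTIVE
    if has_h3k27me3:
        return PromoterState.REPRESSED
    return PromoterState.UNMARKED


if __name__ == "__main__":
    # Hypothetical mark calls (H3K4me3, H3K27me3) before and after differentiation.
    before = {"gene_a": (True, True), "gene_b": (True, True), "gene_c": (True, True)}
    after = {"gene_a": (True, False), "gene_b": (False, True), "gene_c": (True, True)}
    for gene in before:
        print(gene, classify_promoter(*before[gene]).name, "->",
              classify_promoter(*after[gene]).name)
```

In this toy example, bivalent promoters resolve to either the H3K4me3-only (expressed) or the H3K27me3-only (repressed) state, mirroring the resolution that accompanies lineage commitment.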

Bivalent signatures do not appear to be re-established at developmental genes until late in the reprogramming process18, an observation consistent with the idea that they serve as molecular markers for functional pluripotency79. However, many lineage-specifying genes and those with dual roles in development and pluripotency gradually accumulate H3K4 methylation directly at their CpG island-containing promoters during reprogramming, resulting in local epigenetic remodelling at previously repressed, H3K27me3-only loci27,56. Evidence suggests that preliminary local remodelling may be carried out by binding of reprogramming factors to hypomethylated, distal cis-regulatory sequences, followed by H3K4me3 deposition at corresponding, constitutively unmethylated promoters, seemingly without Pol II recruitment or gene expression27,80. Local and reciprocal depletion of H3K27me3 is carried out by UTX, suggesting that even terminally bivalent genes may require transient activation to restore their inductive potential27,33. The assembly of bivalent domains during reprogramming remains incompletely understood, but it represents a valuable assay to molecularly characterize the establishment of unrestricted developmental potential that defines pluripotent cells.

Recent studies have added multiple tiers of regulatory parameters to bivalent domains beyond their original characterization as dually H3K4- and H3K27-methylated chromatin. PSCs cultured with two small-molecule kinase inhibitors (2i) and leukaemia inhibitory factor (LIF) are broadly depleted of H3K27me3 at bivalent promoters without any change in the repression or induction potential of the corresponding genes81. Under these specific culture conditions, H3K27me3 appears to diffuse over the genome, possibly to compensate for global DNA hypomethylation (see below)81,82. Embryonic stem cells (ES cells) lacking the Polycomb repressive complex 2 (PRC2) factor EED also do not show impaired repression of developmental genes in the 2i/LIF condition, but they do exhibit spontaneous differentiation and mis-expression of early lineage-specifying factors in less-defined culture conditions or in ES cells derived from the post-implantation epiblast83,84,85. It now seems that the presence of dual modifications alone may not be sufficient to define functional bivalence (the ability to stably maintain multiple developmental pathways in a repressed but labile state), and that such bivalence may be more specific to later developmental stages of pluripotency that are primed for differentiation85,86,87 (Fig. 3a,b). Alternatively, H3K4me3 and H3K27me3 may primarily serve as indicators of a larger regulatory module of poised repression with specific functions in the terminal stages of early lineage commitment88,89,90,91,92 (see Supplementary information S3 (box)).

Figure 3: Molecular features of pluripotent cells.

a | Pluripotent cells include several distinguishable molecular states: naive or ground state cells propagated in 2i/LIF (two small-molecule kinase inhibitors and leukaemia inhibitory factor) share features with the inner cell mass or early epiblast, whereas primed state cells cultured in fibroblast growth factor (FGF) and activin resemble the late epiblast. Both states are functionally pluripotent, but they can be distinguished at the molecular level by multiple criteria and by their supporting growth conditions. b | Pluripotent cells exhibit unbiased developmental potential, which is epigenetically encoded by balancing activating and repressive inputs at developmental gene promoters. Mixed lineage leukaemia (MLL) complexes (which are the mammalian orthologues of Trithorax) and Polycomb repressive complex 2 (PRC2) deposit histone H3 Lys4 trimethylation (H3K4me3) and H3K27me3, respectively, to hold promoters in a facultative state (middle). H3K27me3 recruits canonical PRC1 complexes to monoubiquitylate H2AK119 (H2AK119ub). As pluripotent cells differentiate, such bivalent domains are modified to favour either the H3K4me3-only or H3K27me3-only states (right). H3K27me3 is more diffusely distributed over the genome in naive state cells (not shown). However, bivalent domains can be established de novo by the direct recruitment of non-canonical PRC1 complexes (PRC1*) to unmethylated CpG islands by the CXXC domain-containing subunit Lys-specific demethylase 2B (KDM2B), which may maintain local repression under these conditions (left). c | Naive cells can be distinguished from primed cells by their capacity for stable self-renewal in the absence of genome-wide epigenetic repressors, such as DNA or H3K27 methylation, although differentiation is then impeded. When these repressors are removed, non-naive cells acutely die. d | Many species-specific endogenous retroviruses (ERVs) are dynamically regulated in pluripotent cells.
Left: some retroelements are expressed through the binding of OCT4 and SRY-box 2 (SOX2) to cis-regulatory motifs within their long terminal repeat (LTR) promoter. Middle: sequence-specific Krüppel-associated box (KRAB) zinc-finger proteins (ZFPs) interact with tripartite motif-containing protein 28 (TRIM28) and SETDB1 to establish repressive H3K9me3, which is subsequently recognized by the H3.3 chaperone complex ATRX–DAXX to support silencing. Many ERVs exhibit dynamic DNA methylation carried out by the de novo DNA methyltransferase DNMT3A and by the non-catalytic cofactor DNMT3L. Sequence-specific de novo ERV repression is largely specific to the pluripotent state. Right: following lineage commitment, maintenance of gene silencing is carried out epigenetically. Pol II, RNA polymerase II.

The prevalence of unmethylated CpG islands at poised promoters suggests that this feature may serve as the essential cis-regulatory template to instruct assembly of a bivalent epigenetic state93,94. Indeed, the PcG and TrxG complexes are recruited to CpG island-containing promoters of developmental genes by subunits that recognize unmethylated CpGs94,95. In the canonical model of PcG-based repression, PRC2 deposits H3K27me3, which functions as an epigenetic template for chromobox (CBX)-containing PRC1 complexes to trigger H2AK119 monoubiquitylation (H2AK119ub) and chromatin compaction96. However, H2AK119ub may also have a role in recruiting PRC2 and initiating de novo silencing, providing at least one alternative pathway of PcG-based repression that does not depend on upstream H3K27me3 deposition. In this model, non-canonical PRC1 complexes directly bind to unmethylated CpG sequences through the CXXC domain-containing protein Lys-specific demethylase 2B (KDM2B) to establish bivalency before PRC2 recruitment and activity97,98,99,100 (Fig. 3b). How these mechanisms are reassembled during reprogramming, and in what order, is yet to be explored systematically.

Self-renewal in the absence of epigenetic repressors. The pluripotent state is also distinguished by the robust maintenance of its transcriptional network and self-renewal capacity in the absence of repressive chromatin modifications, whereas early post-implantation embryos and non-pluripotent cell types lacking these modifications are inviable79,101 (Fig. 3c). This unusual insensitivity is unique to one of two developmentally distinct classes of pluripotent states that, until recently, had only been isolated in mouse102. Naive or ground state cells cultured in 2i/LIF conditions broadly resemble cells of the inner cell mass (ICM) or early epiblast and are canonically associated with mouse ES cells. By contrast, primed state cells more generally exhibit features associated with the post-implantation epiblast and share molecular similarities with human ES cells103,104. The unique tolerance of naive pluripotent cells for the loss of epigenetic repressors can be used to select for iPSCs by eliminating cells that have not been successfully reprogrammed, or to facilitate late-stage reprogramming events by establishing pluripotency-like global chromatin features18,105. Notably, repression-deficient PSCs do not show substantial changes to their self-sustaining transcriptional network, but their developmental potential is severely affected and hence their functional pluripotency is lost83,106,107,108,109,110,111. When DNA methylation is globally depleted in mouse ES cells, they cannot differentiate into embryonic germ layers but gain the capacity to differentiate into extra-embryonic tissues112,113. Similarly, mouse ES cells lacking the histone H3K9 methyltransferase SETDB1 are unstable owing to spontaneous extra-embryonic differentiation, which may reflect derepression of endogenous retroviral elements that function as enhancers or promoters for placental genes114,115,116,117.
Despite the widespread and profound epigenetic changes that are induced by loss of epigenetic repressors, the decoupling of functional developmental potential from self-renewal can be reversible; restoring repressors often rescues the ability to differentiate, further highlighting the necessity of epigenetic silencing mechanisms in this process108,109.

PSCs cultured in 2i/LIF also seem to lose or redistribute repressive epigenetic modifications, specifically global DNA methylation and H3K27me3 (Refs 81, 84, 118, 119, 120, 121), although how these changes affect differentiation is not yet completely understood. Some of the earliest transcriptional responses to 2i/LIF withdrawal include induction of genes encoding epigenetic repressors such as DNA methyltransferase 3b (Dnmt3b), Dnmt3l and jumonji and AT-rich interaction domain-containing 2 (Jarid2), indicating the rapid establishment of a differentiation-competent epigenome81,84,118,119,120,121. Consistent with the requirement for epigenetic repression in somatic cells, reprogramming in the presence of 2i from the outset is severely limited, whereas switching to 2i conditions at later time points can promote iPSC generation, similar to what is observed following the targeted depletion of global repressors122. Culturing reprogramming cells with the glycogen synthase kinase 3β (GSK3β) inhibitor component of 2i, and with ascorbic acid, which promotes global DNA demethylation, results in rapid, homogeneous iPSC generation from somatic or factor-dependent intermediate states123,124. Intriguingly, human ES cells and mouse epiblast stem cells do not tolerate the global loss of DNA methylation, indicating that there are fundamental regulatory differences between PSCs that correspond to distinct developmental periods, despite their similar self-renewal properties and shared expression of many canonical pluripotency regulators85,87.

Dynamically regulated retrotransposons. In their original reprogramming screen, Takahashi and Yamanaka introduced transcription factors into fibroblasts using retroviral vectors derived from the Moloney murine leukaemia virus (M-MuLV)1. This approach specifically exploited a key distinguishing feature between somatic and pluripotent cell states: the targeted silencing of retroviral long terminal repeats (LTRs) in early embryos and ES cells125 (Fig. 3d; see Supplementary information S4 (box)). With this strategy, emerging iPSC colonies intrinsically switch off transgene expression and propagate indefinitely without further support from the OSKM factors.

The targeted silencing of LTRs in pluripotent cells has only recently been appreciated as reflecting a fundamental principle of genome regulation during mammalian development126. Cumulatively, approximately 40% of the mouse and human genomes are of retrotransposon origin, primarily derived from long and short interspersed nuclear elements (LINEs and SINEs, respectively) or LTR-containing endogenous retroviruses (ERVs)127,128. Although each class exhibits unique modes of retrotransposition and has evolved exceedingly divergent, species-specific elements, their regulation seems to be largely conserved129. Generally, primary silencing is initiated by sequence-specific, Krüppel-associated box domain-containing zinc-finger proteins (KRAB-ZFPs), which interface with SETDB1 to direct repression130. Interestingly, KRAB-ZFPs evolve continuously to counteract the emergence of new retro-elements and represent a general mechanism for genomic surveillance117,130,131,132,133. Downstream of primary targeting, numerous germline-associated genes expressed in mouse ES cells establish epigenetically heritable repressive states that can be maintained in differentiated cells, where sequence-specific regulators may be absent113.

Although the pluripotent state is usually associated with the active repression of retrotransposons, their expression is surprisingly dynamic and involves interactions between activating and repressive inputs that are not restored during reprogramming until after iPSCs emerge134,135. In both mouse and human, binding motifs for OCT4, SOX2 or NANOG encoded within the promoters of species-specific ERVs restrict their activity to early embryonic states, which supports their progressive radiation to new genomic positions through the host germ line136,137. As new integrations arise, these elements can subsequently be co-opted to function as enhancers or alternative promoters for numerous genes, including a majority of state-specific non-coding RNAs, some of which are essential for pluripotency and have been included experimentally to improve reprogramming efficiency135,138,139,140. DNA is only heterogeneously methylated at loci of many repetitive element classes, and this methylation depends on the continuous activity of de novo DNMTs113. In mouse ES cells, specific classes of ERVs are expressed within a small population of cells that also exhibit low global levels of repressive chromatin modifications and demonstrate an expanded extra-embryonic potential when injected into pre-implantation embryos141. Notably, this extra-embryonic-contributing, retrotransposon-expressed state and the embryonic-restricted, retrotransposon-repressed state are interconvertible, similar to the dynamic cellular heterogeneity of many pluripotency-associated transcription factors142,143,144,145,146. This reversibility appears to reflect the fluidity of locus-specific chromatin architecture in pluripotent cells, as ERV induction corresponds to higher global levels of chromatin containing the DNA replication-independent histone H3 variant 3.3 (H3.3)147. Reciprocally, ERV repression is, in part, directed through the H3.3-specific ATRX–DAXX chaperone complex (Fig. 3d), indicating that DNA replication-independent histone exchange and retrotransposon expression may be required to initiate silencing148,149,150. Thus, the dynamic regulation of genomic repetitive elements represents another specific feature of the pluripotent state that has a direct impact on the developmental potential of each cell within the population.

Developmental barriers to reprogramming

Differentiation involves inactivation of the pluripotency transcriptional network and activation of lineage-specifying transcriptional programmes. As cells proceed through this process, they become responsive to extracellular cues to specify early embryonic lineages. As part of this transition, distally located, poised enhancers are activated by cofactor-directed recruitment of the histone acetyltransferase p300 to instruct the resolution of bivalent promoters77,78. Simultaneously, promoters and enhancers of the pluripotency state are shut down and heterochromatinized151,152. Unlike bivalent, differentiation-associated genes, which are held in an activation-poised state, many pluripotency-associated genes have different promoter features and utilize distinct silencing mechanisms that affect their reactivation potential during reprogramming79.

The removal of exogenous pluripotency-supporting culture conditions triggers the downregulation of key factors and destabilizes the self-sustaining pluripotency network, thereby initiating the major transition through which cells become committed to differentiate153. Most bivalent promoters resolve to either active H3K4me3-only or repressive H3K27me3-only states, according to their specific regulatory programme72. Meanwhile, stable shutdown of the pluripotency network is ensured by silencing core regulators such as Oct4 and Nanog through targeted H3K9 methylation followed by DNA methylation113,151,154. The more permanent modes of epigenetic silencing at specific pluripotency gene promoters, including many germline genes, may guard against their accidental re-activation in somatic tissues, an event with potentially oncogenic consequences155. Alternatively, genes such as Sox2 and Klf4, which may remain expressed in a lineage-dependent manner or be re-induced in later developmental programmes, are instead silenced by deposition of H3K27me3 at their CpG island-containing promoters8,156,157. DNA methylation is highly dynamic during differentiation in mouse and human ES cells, in particular during the first transition from pluripotency to a lineage-committed state73,74,158. Early commitment is also accompanied by a global shift in nuclear organization, including focal accumulation of heterochromatin protein 1 (HP1) and H3K9me3-modified heterochromatin, leading to extensive chromatin compaction and a reduction in nucleosome turnover159. Substantial changes in metabolic programmes and cell cycle regulation are also coordinated during this transition, resulting in a prolonged G1 phase42.

Dynamic regulation and disassembly of active enhancers. In PSCs, the activity of OCT4, SOX2 and NANOG is concentrated at super-enhancers that interface with target genes through the Mediator and cohesin complexes to control their expression70,160,161 (Fig. 4a). The super-enhancer architecture is acutely sensitive to cellular state, and disruption of Mediator activity or downregulation of master pluripotency factors leads to a rapid loss of expression of associated genes161. Transcriptionally permissive chromatin at pluripotency-associated enhancers is progressively disassembled in a stepwise manner, beginning with the removal of activating histone modifications, followed by encroachment of nucleosomes, and culminating in the methylation of repression-associated histone residues and of DNA152,162 (Fig. 4b,c).

Figure 4: Differentiation establishes epigenetic barriers to reprogramming.

a | Within pluripotent cells, multiple enhancers bound by OCT4, SRY-box 2 (SOX2) and NANOG form a super-enhancer that engages a target promoter and is held in this topological configuration by the Mediator and cohesin complexes. Transcription factor binding preserves a nucleosome-free region, with adjacent phased nucleosomes acetylated by p300. In the pluripotent state, these enhancers also recruit the repressive nucleosome remodelling and deacetylase (NuRD) complex, which contains the histone H3 Lys4 (H3K4) demethylase LSD1, the core component methyl CpG-binding domain protein 3 (MBD3) and histone deacetylases. This balance of activating and repressive inputs favours gene activation and preserves the expression of the target gene. b | Removal of exogenous pluripotency-supporting conditions leads to the downregulation of OCT4, SOX2 and NANOG and triggers super-enhancer disassembly. The loss of transcription factor binding strongly favours the activity of repressive inputs. At the enhancer, NuRD (or possibly other ATP-dependent chromatin remodellers) establishes nucleosomes at the previous nucleosome-free region and removes histone acetylation and methylation modifications associated with transcriptional activity. Loss of enhancer–promoter association results in a similar disassembly of promoter-associated activating modifications, such as H3K4 trimethylation (H3K4me3) and H3K36me3, by Polycomb repressive complex 2 (PRC2)-associated histone demethylases KDM5A and KDM2B (see also Supplementary information S3 (box)). c | Repressive epigenetic modifications are established following the erasure of the previous chromatin state. At enhancers, these repressive modifications include H3K9me2 and possibly H3K9me3 deposition by G9A in complex with G9A-like protein (GLP) and SETDB1, respectively, as well as DNA methylation by DNA methyltransferase 3A (DNMT3A). 
At promoters, PRC2 deposits H3K27me3, which recruits the canonical PRC1 complex to monoubiquitylate H2AK119 (H2AK119ub) and initiate chromatin compaction. These new modifications then serve as epigenetic templates to preserve repression during subsequent cell divisions. The schematic depicts silencing pathways both at a distal regulatory element and in the associated CpG island-containing promoter of a pluripotency gene, such as Sox2 or Krüppel-like factor 2 (Klf2). Many essential pluripotency genes, such as Oct4 and Nanog, do not have CpG island-containing promoters and, in these cases, the mechanisms of promoter silencing largely resemble those observed for low-CpG density, distal regulatory elements. K27ac, Lys27 acetylation; Pol II, RNA polymerase II.


The sensitivity of pluripotency enhancers to disassembly seems to reflect opposing inputs between activating, transcription factor-guided recruitment of histone acetyltransferases and the repressive activity of the nucleosome remodelling and deacetylase (NuRD) complex163,164,165. Diminished OCT4 binding in the earliest stages of differentiation favours decommissioning of pluripotency-associated enhancers through the activity of the NuRD subunit LSD1, an H3K4 demethylase162. The NuRD complex has a central role in the removal of permissive chromatin marks and in local chromatin remodelling, which temporally precedes the establishment of heterochromatic modifications166,167. Methyl CpG-binding domain protein 3 (MBD3) seems to be an essential component for NuRD assembly and recruitment162,168. Following the loss of activating epigenetic modifications, ATP-dependent chromatin remodellers — either NuRD or members of the BRG1-associated factor (BAF) chromatin remodelling complex (which is the mammalian equivalent of SWI/SNF) — change the architecture of the enhancer to occlude previously nucleosome-free DNA into chromatin as a template for repressive modifications, such as methylation of H3K9 and DNA152,169.

When initially introduced into somatic cells, OSKM seem to be insufficient to re-establish the balance between the activity of repressive and permissive chromatin modifiers that regulate target loci in pluripotent cells. The ability of regulatory elements to support the transcriptional activity of pluripotency genes is initiated by OSK, which can find and engage select sequences in closed chromatin (see below). However, epigenetic repressors that are expressed in both somatic and pluripotent cells, specifically NuRD, may be inadvertently recruited and impede transcriptional activation of target loci during early reprogramming170. Indeed, perturbing the NuRD complex can substantially accelerate the kinetics and improve the efficiency of iPSC generation, presumably by eliminating counterproductive and premature recruitment of this repressive input at a stage when opposing activators remain absent170,171,172,173. Once pluripotency is acquired, additional essential cofactors may override this repression to support the assembly of active enhancers and stably maintain target gene expression.

Transcription factor binding at inert chromatin. In stably propagating cell types, most transcription factors bind within open chromatin, indicated by the presence of a nucleosome-free region surrounded by highly dynamic, phased nucleosomes174,175,176. However, pioneer factors can directly bind to their cognate DNA motifs, even in compact chromatin, to evict nucleosomes and initiate enhancer activation, whereas additional cofactors may be required to induce transcription following primary locus engagement177. During normal developmental transitions, multiple DNA-binding factors, including those with pioneer activities, are expressed in a coordinated fashion, often leading to near-simultaneous binding, chromatin modification and transcriptional changes that make it difficult to assign each contributing factor to a clear, linear molecular pathway178. By contrast, direct reprogramming introduces a minimal set of transcription factors into a nuclear environment in which the majority of pluripotent state-specific enhancers are chromatinized and their target loci repressed, which allows intermediate molecular states of locus activation to be isolated and characterized26,27,28,56,173.

Ectopically expressed OSKM must engage a somatic genome in which the majority of their target enhancers are epigenetically silenced (Fig. 5a). Consequently, OSKM initially bind to only a minimal number of pluripotency-associated targets, and they seem to do so cooperatively at nucleosomal DNA that lacks obvious histone modifications and contains recognition motifs for OCT4, SOX2 and KLF4 (Refs 28, 179). Similar loci containing repressive modifications seem to be refractory to factor binding179. Canonical pioneer factors such as Forkhead box protein A2 (FOXA2; also known as HNF3β) identify their binding motifs within chromatin and possess a DNA-binding domain with structural similarity to linker histones that outcompetes nucleosomes to initiate a region of open chromatin177. Although OSKM lack similar structural features, recent evidence indicates that OCT4, SOX2 and KLF4 co-bind to shared somatic targets through a cooperative, pioneer-like activity involving combinatorial binding to outwardly facing partial motif sequences within the nucleosome180 (Fig. 5b). Binding initiates preliminary chromatin remodelling through the deposition of H3K4me1 and H3K4me2 and, through additional steps that are not yet clear, eventual induction of target genes179,180.

Figure 5: Transcriptional activation of silent pluripotency genes.

a | A schematic example of a CpG island-containing, pluripotency-associated gene as it is activated by a cognate distal regulatory element (enhancer) that includes recognition motifs for OCT4, SRY-box 2 (SOX2) and Krüppel-like factor 4 (KLF4), collectively known as OSK. In somatic cells, CpG island-containing promoters of some pluripotency genes are epigenetically modified with histone H3 Lys27 trimethylation (H3K27me3) and H2AK119 monoubiquitylation (H2AK119ub). Many distal regulatory elements are highly DNA methylated, whereas others have lower methylation levels but are nonetheless occluded with nucleosomes that lack obvious histone modifications associated with repression. b | At select enhancers, OCT4, SOX2 and KLF4 cooperate to bind to partial motif sequences to direct pioneer factor-like activity. The exact kinetics of nucleosome exchange and the identity of additional cofactors are not fully understood, but these sites do undergo preliminary epigenetic remodelling, including H3K4me1 and H3K4me2 deposition. Initial binding also increases H3K4 methylation at the promoter through interactions with the mixed lineage leukaemia (MLL) component WD repeat-containing protein 5 (WDR5) and co-occurs with local depletion of H3K27me3 by the histone demethylase UTX. During this stage, transcriptional induction of the target gene is not observed, in part because OSK factors interact with the nucleosome remodelling and deacetylase (NuRD) repressor complex, which suppresses efficient reprogramming in the absence of sufficient activating inputs by inhibiting nucleosome eviction or preventing stable acetylation. The nature of the topological interactions between enhancer and promoter during cell reprogramming, and the stability of preliminary, OSK-bound nucleosome-free regions, are unknown and are highlighted by dashed bidirectional arrows.
c | In the final steps of gene activation, additional, unknown factors cooperate with OSK to stably evict nucleosomes and establish a canonical enhancer signature, which includes H3K4me2 and H3K27 acetylation (H3K27ac) and tight topological linkage to the cognate promoter through the Mediator and cohesin complexes. Nucleosome eviction may be stabilized by embryonic stem cell-specific BAF (esBAF), which is a pluripotent state-specific version of the ATP-dependent chromatin remodelling BRG1-associated factor (BAF) complex. At this point, activating inputs override repressive inputs, although the NuRD complex remains associated with enhancers in pluripotent cells. Once select enhancers are sufficiently activated and direct the expression of their target genes, the reprogramming process follows a deterministic consolidation. LSD1, Lys-specific demethylase 1; MBD3, methyl CpG-binding domain protein 3; Pol II, RNA polymerase II.


The majority of reprogramming-related regulatory regions that are targeted by OCT4, SOX2 and KLF4 in PSCs remain unbound during most of the reprogramming process28,179. It remains to be seen whether the few loci at which cooperative binding is observed represent a sufficient subset to permit entry into the pluripotent state upon gene induction, or how enhancers that do not contain an appropriate configuration of motifs are nevertheless activated. A cooperative pioneering activity could explain notable improvements to reprogramming efficiencies when additional pluripotency-associated transcription factors (such as NANOG, SAL-like protein 4 (SALL4), oestrogen-related receptor-β (ESRRB), nuclear receptor subfamily 5 group A member 2 (NR5A2) or zinc-finger protein GLIS1) are ectopically expressed, which may broaden the number of cis-regulatory elements that can be accessed in somatic cells53,181,182,183,184,185. Additionally, this model may explain why other members of the POU, SOX or KLF transcription factor families can substitute for the canonical member, as some share highly similar recognition motifs180,186,187.

Initial binding by OSK within somatic cells appears to be population-wide and represents only the earliest step in re-activating the endogenous pluripotency network. Moreover, the extended latency between binding and the induction of cognate genes indicates that these preliminary interactions are themselves insufficient. Transcription factor-bound loci can recruit chromatin remodellers such as cell type-specific BAF complexes that phase nucleosomes around the site of transcription factor binding to stabilize a nucleosome-depleted region169,177,188. The composition of these complexes varies considerably among cell types, and overexpressing components of the ES cell-specific BAF (esBAF) complex during reprogramming enhances iPSC generation189,190,191,192. There is also some evidence that this primary genomic engagement by OSK may still depend on underlying chromatin status, as OCT4 binding in somatic cells occurs preferentially at distal cis-regulatory sequences that lack DNA methylation but are nevertheless nucleosomal152. The constrained contexts that dictate nucleosomal OSK binding in differentiated cells resemble priming by 'fragile nucleosomes' at cis-regulatory elements in yeast, which can be rapidly displaced by transcription factors to direct immediate transcriptional responses193. Similarly, suppressing chromatin assembly factor 1 (CAF1), which is an H3.1-specific, DNA replication-dependent chaperone complex, improves the efficiency and kinetics of iPSC generation194. Presumably, the diminishing presence of H3.1 within heterochromatin expands the number of cis-regulatory sequences that are occluded within more labile H3.3-containing nucleosomes, which must be nonspecifically incorporated as compensation. Following binding, OCT4 initiates chromatin modification at the enhancer and seems to interact with corresponding CpG island-containing promoters to direct targeted H3K4 methylation and H3K27 demethylation27,33,80,195 (Fig. 5b).
The exact nature of these interactions is still unclear, but they may correspond to a key intermediate state of enhancer assembly within reprogramming populations that precedes transcriptional activation in the few cells that do generate iPSCs27,179,196.

These preliminary steps in enhancer activation must be followed by the recruitment of co-regulators, including histone methyltransferases and acetyltransferases, the assembly of super-enhancers and the Mediator-dependent topological juxtaposition of enhancers and promoters, although not necessarily in that order (Fig. 5c). Many of these downstream events may require support from DNA-binding factors that are not present until late in the reprogramming process49. Eventually, a full complement of factors must recruit and assemble Pol II pre-initiation complexes and proceed to the stabilization of transcription elongation. Much of the transcriptional reprogramming that is required to consolidate pluripotency post-induction remains to be characterized, although current efforts have illuminated a number of notable molecular features that were previously opaque.

Conclusion

The application of genomic technologies to the study of direct reprogramming has raised substantial new considerations that must be taken into account when defining or manipulating cell states. Whereas cellular identity was previously determined empirically according to functional criteria, the ability to modulate it in a controlled and measurable fashion has reframed these phenotypic observations into a precise molecular definition that includes the composition of regulators as they control or constrain specific genetic programmes. Although reprogramming experiments tend to focus on the reversibility of somatic identity, their results reveal the remarkable resistance of cell state to perturbation: substantial barriers are imposed by cooperative interactions between self-sustaining transcriptional networks as they operate within stable epigenetic landscapes. Similarly, measurable improvements to reprogramming kinetics or efficiency following the modulation of epigenetic features highlight the extent to which chromatin state influences chromatin–transcription factor interactions and target gene expression. Early SCNT experiments revealed the role of epigenetic regulation in determining reprogramming outcomes, but the use of ectopic transcription factors in vitro has provided a more dynamic description of the regulators that coordinate the induction of silent genes. Additionally, the changes in developmental potential imposed during iPSC generation can be used to define the pluripotent state molecularly according to multiple parameters, and to measure these parameters as a sufficient set of master regulators becomes endogenously activated. Although complete descriptions of how these additional features are first initiated and subsequently consolidated have yet to be achieved, direct reprogramming has proven to be an extraordinarily tractable and insightful tool for the systematic dissection of developmental potential, differentiation and cellular states.