Introduction

Embryonic stem cells (ESCs) possess the unique properties of pluripotency: they can both self-renew and differentiate into a myriad of other cell types, including an entire animal, upon receiving the appropriate stimulus. Remarkably, these same properties can be conferred upon somatic cells by the overexpression of a few proteins to generate induced pluripotent stem cells (iPSCs) [1]. The observation that cell fate can be reversed to a “blank” state has multiple ramifications for regenerative therapy. If ESCs and iPSCs are truly equivalent, iPSCs can be used to overcome the practical and ethical concerns of using ESCs for such therapy. Much progress has been made in delivering the reprogramming factors by integration-free methods to make the process more translatable for human therapeutic use [2–4]. However, the mechanism of reprogramming remains elusive, mainly because even in a culture where every cell has incorporated the reprogramming factors, only about 2–4 % of cells complete the process, and they do so at variable timescales [5–8]. This inherent heterogeneity makes the process challenging to study.

The reprogramming of somatic cells to iPSCs is a profound change in cell identity. It encompasses erasure of the somatic cell program, including the mesenchymal-to-epithelial transition (MET) [9, 10] and changes in cell cycle [11], metabolism [12, 13] and chromatin structure [14], together with activation of the pluripotency network [9, 10, 15–17]. Herein, we discuss the significant strides that have been made in elucidating whether and when these changes occur during reprogramming, and how they lead up to activation of the pluripotency genes.

The Order of Events: Tools and Timescales

Almost all mechanistic studies thus far have been carried out with mouse embryonic fibroblasts (MEFs) as the starting somatic cells for reprogramming. While these studies have improved the overall understanding of somatic cell reprogramming, the lack of studies using diverse cell types means that some of this knowledge may be specific to MEFs (e.g. the role of MET in the reprogramming of mesenchymal cells). These studies usually take one of two broad approaches: (1) molecular profiling of either heterogeneous reprogramming cultures or clonal intermediates, and (2) loss or gain of function of specific candidates or libraries of genes to test whether they affect reprogramming efficiency [9, 18, 19, 20•, 21–26]. In the past year these approaches have been extended to purified intermediate reprogramming populations [27••, 28••, 29••, 30••], early stages of human reprogramming, and single cell expression profiling of a cohort of important genes.

Previous studies have established a timescale for the sequence of events in reprogramming. Early in the process almost 80 % of the initial MEFs lose somatic cell surface markers (Thy1) [15, 16]. During the middle phase of the process, cells undergoing reprogramming gain intermediate markers of pluripotency such as SSEA1 [15, 16], followed by activation of a pluripotency gene such as Oct4 or Nanog. While these initial studies were performed with the reprogramming factors transduced as lentiviruses or retroviruses, more recent studies have taken advantage of secondary systems [5–8, 31–33] in which every cell in the reprogramming culture can inducibly express the reprogramming factors. Using the methods and signposts of reprogramming described above, a temporal picture has emerged in the last year of when changes in MET, cell cycle, metabolism, the epigenome and activation of pluripotency loci occur. These changes are discussed in greater detail below (Fig. 1).

Fig. 1

Major events and barriers in reprogramming. Events that take place in the early (day 0–3), middle (day 3–9) and late (day 9 onwards) stages of reprogramming are indicated. Days for each phase are according to Polo et al. [27••]. Typical cell populations identified in each phase include: early, 30 % Thy1− [15]; intermediate, up to 55 % Thy1− with 27 % SSEA1+; and late, 90–99 % Thy1− with SSEA1 expression variable depending on establishment of iPSCs [15, 27••]. Events or gene expression (genes italicized) recorded in red are down-regulated, while those recorded in green are up-regulated. Events spanning multiple stages of reprogramming are contained within triangles indicating when they occur. Underlined chromatin events occur at select loci, while non-underlined events are global. DNAme is both up- and down-regulated. MET, cell cycle and metabolism data are summarized from references [27••, 28••, 29••], chromatin data from references [20•, 27••, 30••, 52, 59], and activation of the pluripotency network from references [29••, 63••]. MET, mesenchymal-to-epithelial transition; MEF, mouse embryonic fibroblast; iPSC, induced pluripotent stem cell; Pre-iPSC, pre-induced pluripotent stem cell; DNAme, DNA methylation; GDNF/RET, glial cell line derived neurotrophic factor/RET pathway

The Mesenchymal-to-Epithelial Transition

During development, the cells of the embryo undergo an epithelial-to-mesenchymal transition (EMT) to delaminate from the epiblast so as to traverse the primitive streak and differentiate into mesoderm and endoderm [34, 35]. Since embryonic stem cells (ESCs) are isolated from the embryo before this event occurs, they maintain an epithelial nature in culture, including expression of Cdh1 (E-cadherin) on their cell surface. MEFs, however, are mesenchymal in nature, and a reversal to the iPSC state involves a mesenchymal-to-epithelial transition (MET). MET is described as one of the first events to occur in the process of forming iPSCs [15, 16]. The Hochedlinger and Krijgsveld groups [27••, 28••] isolated populations of cells that expressed different cell surface markers (Thy1, SSEA1) or were positive for Oct4-GFP expression at various time points [day 3 (d3), d6, d9 and d12] and performed transcriptomic, ChIP-Seq and proteomic analyses to thoroughly characterize these intermediate stages. At both the transcriptional and proteomic levels, they observed that genes associated with maintaining the mesenchymal state, such as Snai1, are down-regulated by day 3 of reprogramming, while epithelial-associated genes such as Cdh1 are up-regulated [27••, 28••]. The Jaenisch group [29••] took a complementary approach. Instead of characterizing the purified intermediate populations, they chose select genes involved in MET, cell cycle and pluripotency and performed a single cell transcriptome analysis to record reprogramming in individual cells. They demonstrated that at the earliest time point they analyzed (d6), cells contain both Snai1 and Cdh1 transcripts, indicating that the process is a transition in which a cell does not have to completely lose mesenchymal gene expression before gaining epithelial gene expression [29••].
While there is no apparent difference in the number of Cdh1 transcripts per cell when comparing between the d6 and d12 single cell transcriptomes, by d12, no cell expressing Cdh1 has transcripts for Snai1, indicating a complete silencing of the mesenchymal somatic cell program at this time [29••].

Interestingly, a ChIP-Seq analysis of the reprogramming factors in human cells within 48 h of their induction showed that binding sites were enriched for the functional category of MET [30••], implying that MET genes are direct targets of the reprogramming factors.

Cell Cycle

The cell cycle of pluripotent stem cells is distinct from that of somatic cells (reviewed by White and Dalton [36]). With short G1 and G2/M phases but no reduction in the time taken to transit through S phase, pluripotent cells proliferate rapidly and divide symmetrically to produce equivalent daughter cells [36], while somatic cells exhibit extended division times. In the process of reprogramming somatic cells to iPSCs, a reversion to the pluripotent cell cycle occurs, with iPSCs exhibiting cell cycle profiles highly similar to those of embryonic stem cells [11, 37]. Rapid cell cycling is known to be important for the efficient formation of iPSCs from human somatic cells, and inducing cell cycle arrest prevents the formation of iPSCs [38]. As a testament to the importance of the cell cycle to reprogramming, the removal or reduction of p53 or the loss of Ink4/Arf increases the number of reprogrammed cells that are obtained [21–26]. However, this may be because a greater number of cell divisions provides more opportunities for reprogramming-relevant stochastic changes to take place [39]. Like MET, cell cycle changes seem to occur early in the reprogramming process. Ccnb1 (Cyclin B1), an essential gene for the control of the G2/M phase of cell division, Cdk1 (a serine/threonine kinase essential for the G1/S and G2/M phase transitions) and Plk1 (a serine/threonine protein kinase performing numerous functions throughout the M phase of the cell cycle) are all rapidly upregulated in all cells, including the ones that will remain non-reprogrammed. However, at later time points in the reprogramming process (d6, d9, d12 and iPSCs), these genes are clearly expressed at higher levels in reprogramming cells than in their “non-reprogramming” counterparts [27••]. Whether the down-regulation of cell cycle genes is a root cause of reprogramming failure or a consequence of it remains to be examined.

These results are corroborated by the single cell studies, in which a trio of genes that interact for mitotic maintenance, Bub1, Cdc20 and Mad2l1, are upregulated by day 2–4 compared to MEFs [29••]. Gene ontology analysis of both the transcriptional changes and the proteins upregulated at d3 reveals a plethora of terms associated with cell cycle, cell division, mitosis, DNA replication and repair [27••, 28••]. c-Myc binds to Cdkn2d (p19), a potent inhibitor of Cdk4 and Cdk6 that negatively regulates the cell cycle and reduces proliferative capacity [30••]. At the transcriptional level, there is an immediate down-regulation of Cdkn2b (p15), which belongs to the same family as Cdkn2d and affects the cell cycle in a similar manner [27••]. Interestingly, as mentioned above, knockdown of another Cdkn2 family member, Cdkn2a (also known as Ink4a), improves reprogramming efficiency [24]. It is interesting to speculate that c-Myc binding to Cdkn2 family members reduces their expression, which in turn removes the block in cell division that Cdkn2b and Cdkn2d impose and results in increased proliferation. In addition, transcription factor binding data indicate that c-Myc could be partially responsible for removing the blocks imposed on the cell cycle in somatic cells [17]. However, reprogramming is also successful when c-Myc is omitted from the reprogramming cocktail. While these studies utilized all four of the Yamanaka reprogramming factors, it will be interesting to perform these experiments with just Oct4, Sox2 and Klf4 to test whether similar timelines are maintained.

Metabolism

The role of metabolic changes in reprogramming is only now being explored. Like MET, changes in metabolism seem to be initiated early and to accumulate in the middle and late stages of reprogramming. Both the transcriptomic and proteomic studies identified a number of changes that suggest a decrease in oxidative phosphorylation in the earliest stages of reprogramming [27••, 28••], followed by the up-regulation of glycolysis during the middle phase [28••]. There is no indication why these proteins accumulate rather than appear abruptly after transcription initiation. However, the changes that must occur in rearranging mitochondria into a state for glycolysis are extensive [13]. To date there is no evidence of what drives the changes in metabolism, or of which factor or combination of factors results in the dramatic reversal of metabolism seen in the formation of iPSCs. This will undoubtedly be the focus of future investigations. Another facet of metabolism is the emerging evidence that ESCs depend on the amino acid threonine [40]. Threonine catabolism is important for the regeneration of S-adenosylmethionine (SAM), a necessary cofactor for protein methylation. Similarly, acetyl-CoA, a metabolite involved in producing reducing equivalents such as NAD, is also produced from threonine. Unlike the glycolytic enzymes, the enzymes involved in threonine metabolism are upregulated late in the reprogramming process (d9–12 at the protein level) [28••, 41]. Metabolic changes may not only be related to a rapid production of energy but may also provide intermediates for pluripotency-related chromatin changes [42].

Epigenetic Changes

Histone Modifications

The epigenome of a cell refers to heritable changes that do not involve changes in genomic sequence but affect the phenotype of a cell. These epigenomic marks include post-translational modifications (PTMs) on histones and covalent methylation of DNA. It is well established that iPSCs regain an ESC-like epigenome upon reprogramming [14]. ESCs are also known to have fewer heterochromatic spots in their nuclei, i.e. a different and more permissive chromatin architecture [43], and a more transcriptionally active genome [44], which is thought to increase their plasticity in responding to differentiation cues. That chromatin plays an important role in reprogramming is confirmed by experiments in which modulating the levels and functions of enzymes and proteins that participate in chromatin structure affects reprogramming efficiency [45–51]. Changes in levels of histone modifications occur at both the global and locus-specific levels as measured by chromatin immunoprecipitation. Various studies over the last decade have established that specific histone modifications at regulatory regions of genes correlate with their expression or lack thereof. It has previously been established that one early epigenomic event is the marking of pluripotency-specific enhancers with histone H3 lysine 4 dimethylation (H3K4me2) long before the corresponding genes are expressed [52]. Two well-studied modifications that change during reprogramming are the “activating” H3K4me3 and the “repressive” H3K27me3. Loci that contain both of these modifications are called bivalent and are thought to represent “poised” chromatin, since upon receiving differentiation cues these bivalent genes can rapidly retain one or the other mark, consolidating the gene expression pattern [53]. It is important to note that bivalent genes are more prevalent in, but not unique to, pluripotent cells [53, 54].
Several genes that are marked with H3K4me3 in MEFs gain H3K27me3 and acquire bivalency after transitioning to the iPSC state [14]. Polo et al. [27••] found that there is a sudden accumulation of bivalent marks in the initial stage of reprogramming, followed by gradual accumulation from day 3 through the middle and late stages. This may enhance the plasticity of these intermediate stages. Taking a more global approach, Mattout et al. performed immunofluorescence for various histone modifications and found that those usually associated with active transcription, such as H3K9ac, increase as reprogramming progresses [55]. Supporting this observation, general histone deacetylase inhibitors have been shown to improve reprogramming efficiency [45, 46, 50, 51]. Although Mattout et al. did not find changes in H3K27me3 at the global level [55], it is interesting that the loss of an H3K27me3 demethylase, Utx, prevents reprogramming [56]. This was partially due to the failure of some pluripotency genes, such as Sall4 and Utf1, to lose their repressive H3K27me3. This suggests that for some chromatin changes, locus-specific effects rather than global effects are required for reprogramming to occur.

There are now several reports suggesting that modulating the levels of enzymes associated with histone modifications improves reprogramming [47–49]. The loss of Dot1L, which performs H3K79 methylation, a mark associated with transcription elongation, increases reprogramming efficiency [18]. This is a surprising finding given that pluripotent cells are usually thought to be transcriptionally hyperactive [44]. High H3K9me3 levels form a barrier both to the initial docking of reprogramming factors at pluripotency loci and at intermediate stages [20•, 30••]. Taken together, these results suggest that epigenomic changes at both global and local levels affect the reprogramming process.

DNA Methylation

An important feature of the epigenome is direct covalent methylation of DNA. While reprogramming from intermediate stages is improved by the addition of 5-azacytidine, which decreases DNA methylation [57], the two de novo DNA methyltransferases are dispensable for acquisition of an iPSC state [58]. Nonetheless, the loss or gain of DNA methylation at specific promoters is a late event in reprogramming [15]. Recently, a new modification in DNA, 5-hydroxymethylcytosine (5hmC), has been described. The conversion of 5-methylcytosine (5mC) to 5hmC offers a new route to demethylate DNA. The enzymes responsible for this modification are Tet1, Tet2 and Tet3. At early stages of reprogramming, 5hmC levels increase at the regulatory regions of Nanog, an important pluripotency gene, and the loss of Tet2 abolishes reprogramming [59], implying an important role for this modification. It has also been suggested that the long time frame of reprogramming is required for passive DNA demethylation. The discovery of 5hmC raises the intriguing possibility that there may be an active conversion of 5mC to 5hmC during the reprogramming process.

Activation of the Pluripotency Network

Activation of the naïve pluripotency network, with a cessation in reliance upon the exogenous reprogramming factors, is one of the final steps to occur in the formation of iPSCs. The early and middle events set the stage for the activation of the regulatory network to occur. Polo et al. [27••] observed large transcriptional changes at day 3 (compared to MEFs) and day 12 (compared to d9) in reprogramming cells. During the intervening 9 days there were no dramatic changes in transcriptional activity until activation of the pluripotency network. However, even in the absence of dramatic transcriptional changes during the middle phase of reprogramming, Polo and colleagues observed changes in transcript levels of various differentiation-related genes [27••]. Interestingly, a recent paper reports that the reprogramming factors can convert human fibroblasts into a plastic state from which they can subsequently be directed to differentiate toward angioblast-like cells prior to establishing pluripotency [60•]. It is interesting to speculate that the authors were able to capitalize on the erasure of the somatic epigenome, the accumulation of bivalent genes and the changeability in the expression of differentiation-related genes to generate the plasticity observed in their study. Other studies have established that intermediates stalled at this intervening stage, in which the somatic genome is silenced but activation of the pluripotency network has not occurred, are commonly recovered from reprogramming cultures starting with MEFs, NPCs and B cells [17, 57, 61, 62]. These studies reveal that the activation of the pluripotency network can be a barrier to reprogramming and that there are ways for this barrier to be overcome.

Previous reports suggested a two-step process in the activation of the pluripotency network, termed “maturation” and “stabilization”, primarily involving the activation of early and late markers of pluripotency [9]. Maturation involved the upregulation of Nanog, Sall4, Esrrb and Rex1, while stabilization included the upregulation of Lin28, Utf1 and Pecam [9]. A recent report by Golipour and colleagues delves further into the intricacies of the transition of cells from the maturation to the stabilization phase [63••]. During this transition, reprogrammed cells become independent of the transgenes, while failure to down-regulate the transgenes results in a failure to proceed to the stabilization phase. Interestingly, siRNA screening revealed that two groups of genes are necessary for the transition from the maturation to the stabilization phase. The first group consists of genes typically associated with pluripotency, including Oct4, Nanog and Sox2. Loss of these genes not only disrupts maintenance of pluripotency but also prevents the transition to the stabilization phase in reprogramming cells. A second cohort of genes was identified whose expression is required for the transition to occur, but which are not typically associated with maintenance of pluripotency. These genes appear to form a separate but overlapping network that is enriched for processes such as signaling pathways (GDNF/RET) and cell cycle. These studies imply that maturation and stabilization represent real checkpoints in the reprogramming process and that certain processes need to be upregulated to allow the final transition between the two. It is interesting to note that cell cycle changes continue to be of vital importance well into the late stages of reprogramming.

Past investigations have relied on the activation of reporters linked to Nanog or Oct4 regulatory regions, but recent studies have revealed that there may be a hierarchy of events in the activation of pluripotency genes and Oct4 or Nanog activation may not represent the completion of reprogramming. For example, the Oct4 locus is activated in single cells undergoing reprogramming, even in clones that remain partially reprogrammed [29••]. Other pluripotency genes such as Esrrb, Utf1 and Sall4 are upregulated in a subset of cells that have undergone MET as early as d6 of reprogramming [29••], and could represent early markers of cells that will actually complete the process. However, Esrrb and Utf1 represent better markers of reprogramming cells than Sall4, as Sall4 expression is found in a number of cells that have not upregulated Cdh1 expression, a necessary step for reprogramming. This is valuable insight because detection of both Esrrb and Utf1 at such an early time during reprogramming may represent an important landmark of the intermediate stage of the process. This is especially intriguing given the fact that Esrrb can functionally replace Nanog [64, 65].

Work conducted by Buganim et al. seems to indicate that establishing endogenous Sox2 expression may initiate the transition to the stabilization phase and the ensuing activation of the pluripotency network. Establishing Sox2 expression leads to subsequent upregulation of Lin28, Sall4, Fgf4, Fbx15 and Dnmt3b, placing Sox2 at the top of a number of genes associated with the endogenous pluripotency network [29••]. While events prior to Sox2 activation appear to occur stochastically, Sox2, once activated, seems to drive a feedforward pathway. Thus, activation of Sox2 can lead to the establishment of the endogenous pluripotency network and may mark cells that have entered the stabilization phase. This study also highlights the power of single cell studies. While Golipour et al. [63••] were able to identify Sox2 as necessary for the transition to stabilization in their assessment of bulk reprogramming cultures, the studies by Buganim and colleagues were able to place Sox2 at the top of a hierarchy of genes leading to establishment of the naïve pluripotency network.

Conclusion

The recent transcriptome, proteome and epigenome analyses presented here have increased our understanding of the temporal specifics of the reprogramming process. However, a number of questions remain. For example, while MET is an important process for reprogramming from MEFs, it will be interesting to explore how cells that are already epithelial in nature traverse the landscape to gain pluripotency. Similarly, it is surprising that the single cell studies mark activation of Sox2 as an important trigger, given that reprogramming from NPCs, which express endogenous Sox2, results in rapid accumulation of pre-iPSCs [61]. It will be interesting to investigate whether the activation of Sox2 from MEFs correlates with a particular epigenetic change that cannot be detected by transcription profiling and that is the dominant event causing entry into the stabilization phase.

Do the metabolic changes have to precede the activation of pluripotency loci or chromatin changes? Given that many important metabolic intermediates are utilized by histone modifying enzymes as substrates, these connections are worth exploring. The data sets from the reprogramming intermediates are likely to prove treasure troves that could be mined further with the potential to find better markers of faithful reprogramming than currently reported. These exciting recent studies pave the way for many more investigations into the mechanism of reprogramming.