Sox2 levels regulate the chromatin occupancy of WNT mediators in epiblast progenitors responsible for vertebrate body formation

Blassberg, Robert; Patel, Harshil; Watson, Thomas; Gouti, Mina; Metzis, Vicki; Delás, M. Joaquina; Briscoe, James

doi:10.1038/s41556-022-00910-2

Article
Open access
Published: 12 May 2022

Volume 24, pages 633–644, (2022)

Abstract

WNT signalling has multiple roles. It maintains pluripotency of embryonic stem cells, assigns posterior identity in the epiblast and induces mesodermal tissue. Here we provide evidence that these distinct functions are conducted by the transcription factor SOX2, which adopts different modes of chromatin interaction and regulatory element selection depending on its level of expression. At high levels, SOX2 displaces nucleosomes from regulatory elements with high-affinity SOX2 binding sites, recruiting the WNT effector TCF/β-catenin and maintaining pluripotent gene expression. Reducing SOX2 levels destabilizes pluripotency and reconfigures SOX2/TCF/β-catenin occupancy to caudal epiblast expressed genes. These contain low-affinity SOX2 sites and are co-occupied by T/Bra and CDX. The loss of SOX2 allows WNT-induced mesodermal differentiation. These findings define a role for Sox2 levels in dictating the chromatin occupancy of TCF/β-catenin and reveal how context-specific responses to a signal are configured by the level of a transcription factor.

Main

Producing the variety of cell types that compose a multicellular organism requires the spatial and temporal regulation of gene expression, controlled by extrinsic signals. But there are relatively few signals compared with the number of cell types, and these signals are re-used over the course of ontogeny. The molecular mechanisms responsible for context-dependent responses to signals that generate cellular diversity remain incompletely understood.

WNT signalling, through its transcriptional effector TCF/β-catenin, has multiple functions^1,2. In mouse embryonic stem cells (mESCs), WNT/β-catenin signalling promotes pluripotency^3,4. Later, it upregulates CDX transcription factors (TFs) and assigns posterior identity in the forming caudal epiblast⁵. A subset of the cells within the caudal lateral epiblast (CLE) are neuromesodermal progenitors (NMPs) that generate the neural and mesodermal tissue responsible for the elongation of the axis⁶. In NMPs, WNT/β-catenin signalling promotes differentiation to mesodermal tissue at the expense of spinal cord neural differentiation^7,8,9,10,11,12,13.

Alongside WNT signalling, the TF SOX2 also plays a central role. SOX2 is expressed at high levels in mESCs and epiblast cells where it maintains the undifferentiated pluripotent state^14,15. Then, as the embryo regionalizes, SOX2 expression drops in the CLE and remains expressed at low levels in NMPs together with genes conferring primitive streak identity such as T/BRA^16,17,18. Upregulation of SOX2 is associated with the allocation of NMPs to neural progenitors^9,19. By contrast, SOX2 expression is lost upon commitment of NMPs to mesodermal lineages¹⁷.

The correlation between SOX2 levels and the changes in the function of WNT signalling raises the possibility that SOX2 influences the response of cells to WNT signalling. In mESCs, SOX2 acts with β-catenin and the TFs TCF7L1, OCT4 and NANOG to promote the expression of WNT target genes that maintain pluripotency^20,21,22. By contrast, SOX2, T/BRA and β-catenin co-occupy a large number of mesodermal cis-regulatory elements (CREs) in NMPs¹². This suggested that SOX2 contributes to maintaining the undifferentiated state of CLE progenitors by directly counteracting WNT signalling activity and inhibiting mesoderm gene expression. Nevertheless, direct evidence of whether and how SOX2 is responsible for the different developmental responses to WNT signalling has been lacking.

In this Article, to test the causal role of SOX2 in the response of cells to WNT signalling, we decoupled SOX2 expression from its developmental regulation. This revealed that SOX2 controls the WNT response by adopting different modes of chromatin interaction and occupying distinct genomic locations depending on its level of expression. Together the results provide insight into the mechanisms that determine the context-specific response of cells to an extrinsic signal that generates the diversity of outcomes necessary for tissue development.

Results

SOX2 levels alter the response of pluripotent cells

We took advantage of an in vitro model of the caudal epiblast. mESCs differentiated for 48 h in the presence of FGF and LGK974, an inhibitor of WNT secretion (henceforth ‘FL medium’), acquire an epiblast-like cell (EpiLC) identity (Fig. 1a), recapitulating post-implantation epiblast gene expression changes (Extended Data Fig. 1a). Similar to their in vivo counterparts, EpiLCs acquired caudal epiblast-like cell (CEpiLC) identity in response to WNT signalling, activated by 24 h exposure to the GSK3 inhibitor CHIR99021 (henceforth ‘FLC medium’) (Fig. 1a). This resulted in reduced expression of SOX2, upregulation of the primitive streak marker T/BRA (Extended Data Fig. 1b) and expression of Cdx and posterior Hox genes (Extended Data Fig. 1c,d)⁹.

**Fig. 1: High SOX2 levels inhibit WNT-induced differentiation of epiblast-like cells.**

We ablated endogenous Sox2 and introduced a Sox2 transgene under the control of doxycycline (Dox) (Fig. 1b). Addition of Dox to these SOX2^TetON cells generated levels of SOX2 expression similar to wild type (WT) (Extended Data Fig. 2a), and these cells (henceforth SOX2-ON) could be propagated in naive pluripotent ‘2i’ medium⁴ (Extended Data Fig. 2b). Removal of Dox from SOX2^TetON (henceforth SOX2-OFF) resulted in a gradual decrease in SOX2 protein levels (Extended Data Fig. 2c), flattening of colonies (Extended Data Fig. 2d), loss of expression of pluripotency markers OCT4 and NANOG (Extended Data Fig. 2e) and the induction of T/BRA (Extended Data Fig. 2f).

Differentiation of ESCs to CEpiLC identity involves exiting naive pluripotency and transitioning to EpiLC identity before activation of WNT signalling⁹ (Fig. 1a). Both SOX2-OFF and SOX2-ON cells differentiated in FL medium showed the gene expression changes characteristic of the transition to EpiLC identity: downregulating Nanog and upregulating Fgf5 while continuing to express Pou5f1 (Fig. 1c–e). In SOX2-OFF cells, this was due to Sox3 upregulation (Extended Data Fig. 2g–i), consistent with SOX3 being sufficient to maintain EpiLCs in the absence of SOX2 (ref. ²³). Upon transfer to FLC medium, only limited T/BRA induction was observed in SOX2-ON cells (Fig. 1f), consistent with SOX2 acting as a repressor of primitive streak identity^11,12,24,25. By contrast, SOX2-OFF cells in FLC medium, which express low levels of both SOX2 and Sox3 (Extended Data Fig. 2j,k), induced T/BRA to similar levels as WT CEpiLC (Fig. 1g,h).

Both WT and SOX2-OFF cells differentiated for 24 h in FLC medium induced genes characteristic of the primitive streak, yet a set of genes associated with the caudal epiblast were not upregulated in SOX2-OFF cells (Fig. 2a). These included the caudal epiblast determinants Cdx2 and Cdx4 and posterior Hox genes. Anterior Hox and paraxial mesoderm marker expression was reduced (Fig. 2a and Extended Data Fig. 3a,b) and genes characteristic of earlier more anterior mesoderm and endoderm were increased in SOX2-OFF cells (Fig. 2a,b and Extended Data Fig. 3c). Thus, loss of SOX2 disrupts the induction of the caudal epiblast gene expression programme in response to WNT signalling and instead leads to the differentiation of an earlier, more anterior primitive streak identity.

**Fig. 2: SOX2 dynamics configure the WNT response of epiblast-like cells.**

We next determined the identity of high-SOX2-expressing SOX2-ON cells cultured in FLC medium. As predicted from the reduction of T/BRA expression (Fig. 1f), paraxial mesoderm differentiation was inhibited (Extended Data Fig. 3d). Moreover, the majority of WNT-induced genes associated with CEpiLC identity were repressed in SOX2-ON cells (Fig. 2c). SOX2-ON cells in FLC medium did not differentiate to neural identity (Extended Data Fig. 3e), and instead re-expressed genes associated with naive pluripotency (Fig. 2d and Extended Data Fig. 3f). Thus, sustaining high levels of SOX2 in the presence of WNT signalling appeared to revert cells to a naive-like pluripotent state.

We reasoned that removing the WNT agonist from SOX2-ON cells should destabilize the pluripotent state and permit differentiation to neural identity. Consistent with this, a decline in pluripotency marker expression was accompanied by the onset of neural differentiation following WNT-agonist withdrawal from SOX2-ON cells (Fig. 2e,f). Importantly, posterior Hox genes, typical of spinal cord neural progenitors, were not induced (Fig. 2g). Taken together, these data indicate that a reduction of SOX2 levels is necessary to prevent cells from initiating a pluripotent WNT response, whereas premature elimination of SOX2 abrogates the ability of WNT signalling to promote caudal identity. This raises the question of how SOX2 alters the response of epiblast progenitors to WNT signalling.

SOX2 downregulation reconfigures β-catenin occupancy

SOX2 is found with WNT signal transducers at a large number of CREs in both CEpiLCs¹² and naive pluripotent ESCs^20,21,22,26. To determine whether SOX2 levels configure distinct transcriptional responses to WNT signalling by altering the binding profile of β-catenin, we performed chromatin immunoprecipitation followed by sequencing (ChIP–seq) for SOX2 and β-catenin from naive ESCs cultured in 2i and from CEpiLCs, SOX2-ON and SOX2-OFF cells cultured in FLC medium (Fig. 3a). Consensus peaks included 76% of SOX2 and 89% of β-catenin peaks independently identified in WT CEpiLCs¹², plus an additional 110,893 SOX2 and 81,832 β-catenin peaks (Extended Data Fig. 4a,b).
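The recovery figures above (for example, 76% of previously identified SOX2 peaks found among the consensus peaks) are overlap fractions between two peak sets. The overlap criterion is not stated in this section, so the following is only a sketch under stated assumptions: intervals are half-open and BED-style, a reference peak counts as recovered if it overlaps any consensus peak by at least 1 bp, and the consensus peaks on each chromosome have already been merged so that they do not overlap one another. The helper name `fraction_recovered` is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def fraction_recovered(reference, consensus):
    """Fraction of reference peaks overlapping >= 1 bp of any consensus peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates (BED style).
    Assumption: consensus peaks on a chromosome are merged (non-overlapping).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in consensus:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in by_chrom.items()}

    hits = 0
    for chrom, start, end in reference:
        intervals = by_chrom.get(chrom, [])
        # Last consensus peak that starts before the reference peak ends.
        i = bisect_right(starts.get(chrom, []), end - 1) - 1
        # With non-overlapping sorted intervals, only this candidate can overlap.
        if i >= 0 and intervals[i][1] > start:
            hits += 1
    return hits / len(reference) if reference else 0.0
```

Binary search keeps each lookup logarithmic in the number of consensus peaks, which matters when the consensus set holds hundreds of thousands of intervals; dedicated tools such as bedtools perform the same kind of interval overlap at scale.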

Differential analysis between naive pluripotent ESCs and CEpiLCs revealed a dynamic pattern of SOX2 binding (Fig. 3b). SOX2 occupancy was reduced at 5,943 sites in CEpiLCs, but increased at 5,754 locations, despite its lower expression levels. Higher SOX2 occupancy in naive ESCs included peaks associated with pluripotency genes, whereas peaks exhibiting higher SOX2 occupancy in CEpiLCs were associated with primitive streak and trunk identity genes (Fig. 3b). Strikingly, β-catenin exhibited a coordinated reconfiguration in its occupancy at sites differentially occupied by SOX2 (Fig. 3c). The majority of peaks differentially occupied by β-catenin reflected the altered SOX2 occupancy at these sites in CEpiLCs compared with naive ESCs (5,851/5,908; 99%) (Extended Data Fig. 4c). This indicated that the changes in SOX2 levels accompanying the transition from pluripotency to CEpiLC might reconfigure the transcriptional response to WNT signalling by redistributing β-catenin occupancy.

To test whether changes in SOX2 levels account for the reconfiguration of SOX2/β-catenin occupancy during the transition from pluripotency to CEpiLC, we assayed the effect of manipulating SOX2 levels in SOX2^TetON cells cultured under CEpiLC differentiation conditions. Of the 9,727 β-catenin peaks differentially occupied between CEpiLC, SOX2-ON and SOX2-OFF, 4,399 (45%) overlapped with peaks differentially occupied by SOX2 (Fig. 3d). By contrast, differentially occupied β-catenin peaks showed a markedly lower association with chromatin-associated factors identified by ENCODE (1.6–9.5% overlap) (Extended Data Fig. 4d). Moreover, of the overlapping SOX2 and β-catenin peaks differentially occupied in response to experimental manipulation of SOX2 levels, 1,976 (45%) of the SOX2 and 2,355 (53%) of the β-catenin peaks were also differentially occupied between CEpiLC and naive ESCs. Taken together, these observations suggest that the redistribution of β-catenin that occurs during the transition from pluripotent to CEpiLC identity might be mechanistically related to differential SOX2 occupancy in high- and low-SOX2-expressing cells.

We further investigated the relationship between SOX2 and β-catenin by clustering SOX2-occupied CREs on the basis of their cell-type-specific occupancy. As observed during the transition from pluripotent to CEpiLC identity, changes in β-catenin occupancy mirrored those of SOX2 (Fig. 3e,f and Extended Data Fig. 4e,f). Moreover, both SOX2 and β-catenin occupancy were similar between SOX2-ON and naive progenitors, which both express high levels of SOX2 (R = 0.75 SOX2, R = 0.76 β-catenin). Likewise, SOX2 and β-catenin occupancy were similar in CEpiLC and SOX2-OFF cells, which both express low levels of SOX2 (R = 0.87 SOX2, R = 0.90 β-catenin). Notably, a group of SOX2/β-catenin-bound regions was most highly occupied in SOX2-OFF cells (cluster 2). The detection of SOX2 in SOX2-OFF conditions is probably due to the perdurance of low levels of SOX2 protein after removal of Dox (Extended Data Fig. 2j). These data therefore provide evidence of a profound and coordinated genome-wide alteration in the binding site occupancy of both SOX2 and β-catenin that is dependent on the level of SOX2 expression.
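The R values above compare occupancy at the same set of CREs between two conditions. A minimal sketch of a Pearson correlation on log-transformed signal follows; the log2(x + 1) transform, the example values and the helper names are assumptions for illustration, since the normalization behind the reported values is not described here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def log_signal(values, pseudocount=1.0):
    """log2(value + pseudocount); an assumed transform to damp very strong peaks."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical normalized occupancy at the same five CREs in two conditions.
sox2_on = [12.0, 3.5, 40.2, 0.8, 7.1]
naive = [10.4, 4.1, 35.0, 1.2, 5.9]
r = pearson_r(log_signal(sox2_on), log_signal(naive))
```

Log-transforming before correlating keeps a handful of very strong peaks from dominating the coefficient; a rank-based (Spearman) correlation would be an alternative with the same motivation.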

SOX2 levels configure TCF/LEF occupancy

TCF/LEF factors are differentially expressed between high- and low-SOX2-expressing cell types (Fig. 2c). We performed ChIP–seq for TCF7L1, TCF7L2 and LEF1 in ESCs, CEpiLCs, SOX2-OFF and SOX2-ON cells and found that differentially occupied TCF/LEF peaks overlapped with differentially occupied SOX2 (Extended Data Fig. 5a–c) and β-catenin sites (Extended Data Fig. 4d–f), suggesting that these factors occupy the same CREs. Indeed, TCF/LEF factors exhibited a similar pattern of cell-state-specific occupancy to SOX2/β-catenin (compare Fig. 3e,f with Extended Data Fig. 5g,h,i; Extended Data Fig. 4e,f with Extended Data Fig. 5j–l). These data indicate that the reduction in SOX2 levels during the transition from pluripotency to CEpiLC identity drives the coordinated reconfiguration of SOX2/TCF/β-catenin co-occupancy across the genome.

TCF/β-catenin occupancy and transcriptional responses

We asked whether the changes in SOX2/β-catenin binding could explain the distinct gene expression programmes of cells expressing different levels of SOX2. In line with the known positive effect of β-catenin on transcription, differentially expressed genes neighbouring differential SOX2 peaks were, on average, positively correlated with changes in SOX2/β-catenin occupancy for each of the cell-state clusters (Fig. 4).

**Fig. 4: SOX2/β-catenin co-occupancy correlates with cell-type-specific gene expression.**

Genes in cluster 1, which are specifically occupied by SOX2/β-catenin in CEpiLCs (Figs. 4a and 3e,f), are increased in expression in CEpiLCs (Fig. 4a,b). These were enriched for biological processes related to anterior–posterior patterning (Fig. 3f and Extended Data Fig. 6a). Genes in cluster 2 were occupied by SOX2/β-catenin most highly in SOX2-OFF cells, showed greatest expression in these cells (Figs. 4c and 3e,f) and included early primitive streak and mesendodermal genes (Fig. 3f). Cluster 3 genes showed greatest SOX2/β-catenin occupancy and expression in SOX2-ON cells (Figs. 4e and 3e,f) and comprised genes associated with pluripotency and anterior neural identity (Fig. 3f and Extended Data Fig. 6b). Genes in cluster 4, which are occupied by SOX2/β-catenin most highly in WT CEpiLCs and SOX2-OFF cells (Figs. 4g and 3e,f), were enriched for genes expressed in caudal epiblast, primitive streak and mesoderm (Fig. 3f) and showed highest expression in these conditions (Fig. 4h). Thus, differential SOX2/β-catenin occupancy driven by changes in SOX2 levels correlates with cell-state-specific gene expression patterns and points to a positive role for SOX2 in promoting WNT-dependent gene activation by β-catenin.

Differentially expressed genes associated with SOX2/β-catenin occupancy in SOX2-ON cells in FLC medium and in naive progenitors (clusters 3 and 5) included a number of pluripotency factors as well as genes involved in nervous system development (Fig. 3f and Extended Data Fig. 6b). Neural genes were expressed at comparatively low levels (Extended Data Fig. 6c), consistent with the priming of these genes in ESCs²⁷. Thus, the establishment of a naive-like SOX2/TCF/LEF configuration appears to underlie the re-expression of pluripotency factors in SOX2-ON cells stimulated with WNT agonist, and the repression of the post-implantation epiblast WNT response. This supports the idea that a reduction in SOX2 levels is necessary to reconfigure SOX2 and β-catenin occupancy and thereby ensure a caudal epiblast gene regulatory programme in response to WNT activity.

High SOX2 levels maintain chromatin accessibility

SOX2 has been proposed to act as a pioneer factor^28,29,30,31, suggesting that SOX2 may direct cell-state-specific TCF/β-catenin binding by altering chromatin accessibility. To test this, we performed assay for transposase-accessible chromatin using sequencing (ATAC–seq). This revealed distinct relationships between chromatin accessibility and SOX2 occupancy in different cell states. Cluster 3 CREs (pluripotency and neural), occupied by SOX2 in SOX2-ON and pluripotent progenitors, were only accessible in cell states with high SOX2 (Fig. 5a,b and Extended Data Fig. 7a). By contrast, CREs in cluster 1, which were occupied by SOX2 specifically in CEpiLC, were accessible in all cell states (Fig. 5a and Extended Data Fig. 7a). Cluster 2 CREs (SOX2-OFF-specific and early streak) and cluster 4 CREs (primitive streak and paraxial mesoderm) exhibited a more complex pattern of accessibility, with comparable average accessibility in SOX2-OFF, SOX2-ON and pluripotent progenitors, but less accessibility in CEpiLCs (Fig. 5a and Extended Data Fig. 7a). Thus, whereas changes in chromatin accessibility may explain the specific occupancy of SOX2/TCF/β-catenin complexes at cluster 3 CREs in SOX2-ON and pluripotent cells, which express high levels of SOX2, they do not explain the cell-state-specific occupancy at other clusters.

**Fig. 5: SOX2 promotes chromatin accessibility at high-affinity sites.**

To exclude the possibility that accessibility at SOX2-occupied cluster 3 CREs might be an indirect consequence of the WNT-dependent naive pluripotent cell state, we analysed EpiLCs cultured in the absence of CHIR, which express high levels of SOX2 but do not express naive pluripotency genes (Fig. 2d). SOX2 occupancy was higher at CREs associated with pluripotency and neural differentiation genes, and lower at CREs associated with CEpiLC genes, in EpiLCs compared with CEpiLCs (Fig. 5c). Average SOX2 occupancy and accessibility at cluster 3 CREs were also higher in EpiLCs (Fig. 5d,e and Extended Data Fig. 7b), supporting the idea that high SOX2 levels promote accessibility at these sites in the absence of WNT signalling.

We explored whether the increased ATAC–seq signal at cluster 3 CREs in SOX2-ON cells could be driven directly by SOX2 occupancy by analysing the nucleosome landscape at sites of SOX2 binding. NucleoATAC analysis³² revealed that average nucleosome occupancy at the centre of cluster 3 SOX2 binding peaks was markedly depleted in high-SOX2-expressing SOX2-ON cells compared with low-SOX2-expressing SOX2-OFF cells and WT CEpiLCs (Fig. 5f,g). By contrast, the average nucleosome density at the centres of cluster 4 SOX2 peaks was largely independent of SOX2 levels (Fig. 5f). We conclude that nucleosome eviction at cluster 3 sites is driven directly by SOX2 binding, rather than being a secondary consequence of increased accessibility in neighbouring regions.
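Fig. 5f summarises nucleosome occupancy as an average around SOX2 peak centres. NucleoATAC infers nucleosome occupancy from ATAC–seq fragment data; the sketch below does not reproduce that inference and only illustrates the subsequent averaging step, assuming a per-base occupancy track is already available as Python lists. The function name and input layout are hypothetical.

```python
def aggregate_profile(signal, centres, flank):
    """Average a per-base track in a +/- flank window around peak centres.

    signal:  dict mapping chrom -> list of per-base values
             (for example, a nucleosome occupancy track)
    centres: list of (chrom, position) tuples, 0-based positions
    flank:   bases on each side of the centre
    Returns a list of length 2 * flank + 1; windows that run off the end
    of a chromosome, or fall on a missing chromosome, are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for chrom, pos in centres:
        track = signal.get(chrom)
        if track is None or pos - flank < 0 or pos + flank >= len(track):
            continue
        window = track[pos - flank: pos + flank + 1]
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        return [float("nan")] * width
    return [total / used for total in totals]
```

Comparing the central value of such profiles between conditions (for example, SOX2-ON versus SOX2-OFF at the same cluster 3 centres) is one way to express the depletion described above.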

We hypothesized that the distinct relationship between SOX2 levels and nucleosome occupancy at cluster 3 peaks may reflect the affinity of SOX2 binding sites within the underlying sequence. Motif analysis identified both more SOX2 sites and a greater proportion of peaks with high-affinity SOX2 motifs in cluster 3 (Fig. 5h and Extended Data Fig. 7c,d). Within this cluster, motif score correlated with SOX2 occupancy and nucleosome depletion in SOX2-ON cells (Extended Data Fig. 7e). What then explains the change in SOX2 binding in CEpiLCs?
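Motif "affinity" in analyses like this is commonly approximated by a position weight matrix (PWM) log-odds score: higher scores correspond to closer matches to the consensus and, as a proxy, to stronger predicted binding. The motif tool and SOX2 matrix used in the study are not specified in this section, so the sketch below uses a toy count matrix built from the sequence ACAAAG purely for illustration (not a curated SOX2 matrix), a uniform 0.25 background and a pseudocount of 0.5, all assumptions. It scans both strands of an uppercase sequence and reports the best-scoring window.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts (list of dicts) into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_motif_score(seq, matrix):
    """Best (score, forward-strand 0-based start, strand) over both strands."""
    k = len(matrix)
    best = (-math.inf, None, None)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - k + 1):
            window = s[i:i + k]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other symbols
            score = sum(col[b] for col, b in zip(matrix, window))
            # Map minus-strand hits back to forward-strand coordinates.
            start = i if strand == "+" else len(seq) - i - k
            if score > best[0]:
                best = (score, start, strand)
    return best
```

Classifying sites as high- or low-affinity would then require a score threshold, for example a fraction of the maximum attainable score; the cut-offs behind Fig. 5h are not given here.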

SOX2 occupies low-affinity sites with cell-specific factors

We reasoned that SOX2 occupancy at constitutively accessible sites might require additional cell-state-specific co-factors^33,34. Motif enrichment analysis (Fig. 6a) revealed that cluster 1 peaks (CEpiLC-specific SOX2 binding) were enriched for motifs of CDX/HOX factors, which are expressed specifically in CEpiLCs (Fig. 2a,f). Cluster 4 sites, which are occupied by SOX2/TCF/β-catenin in both CEpiLCs and SOX2-OFF primitive streak progenitors, were enriched for motifs for T/BRA, which is repressed in SOX2-ON cells. In addition, cluster 2 sites were enriched for motifs for the Nodal signalling mediator FOXH1, indicating that elevated Nodal signalling may contribute to the regulation of the distinct early primitive streak WNT response in SOX2-OFF cells.

**Fig. 6: SOX2 associates with cell-type-specific factors at low-affinity sites.**

Consistent with these motif enrichment results, analysis of ChIP–seq data from CEpiLCs indicated that T/BRA was enriched, along with SOX2 and β-catenin, at both cluster 2 and cluster 4 sites (Fig. 6b,c and Extended Data Fig. 8a). Similarly, CDX2 ChIP–seq from CEpiLCs confirmed an increased CDX2 co-occupancy with SOX2 and β-catenin at cluster 1 sites (Fig. 6d,e and Extended Data Fig. 8a). Moreover, a larger proportion of low-affinity SOX2 motifs were found within close proximity (<100 bp) to CDX binding motifs within cluster 1 peaks than in clusters 2–4 (Extended Data Fig. 8b). Similarly, a larger proportion of low-affinity SOX2 motifs within cluster 2 and 4 peaks were located closer to T/BRA binding motifs than in other clusters (Extended Data Fig. 8c). This suggested that CDX and T/BRA may act as co-factors to promote cell-type-specific recruitment of SOX2 or β-catenin.
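The proximity analysis above asks what fraction of low-affinity SOX2 motifs lie within 100 bp of a CDX (or T/BRA) motif. A sketch follows, assuming motif instances have already been called within each peak and are represented by their centre coordinates; the strict "< 100 bp" comparison follows the text, while the function name is hypothetical.

```python
from bisect import bisect_left


def fraction_near(query_positions, partner_positions, max_distance=100):
    """Fraction of query motif centres lying < max_distance bp from any partner motif.

    Positions are integer motif centres on the same peak or chromosome.
    """
    if not query_positions:
        return 0.0
    partners = sorted(partner_positions)
    near = 0
    for q in query_positions:
        i = bisect_left(partners, q)
        # In a sorted list, the closest partner sits at index i or i - 1.
        candidates = partners[max(i - 1, 0): i + 1]
        if any(abs(q - p) < max_distance for p in candidates):
            near += 1
    return near / len(query_positions)
```

Computing this fraction separately for each cluster, with SOX2 motifs as queries and CDX or T/BRA motifs as partners, gives the kind of per-cluster comparison summarised in Extended Data Fig. 8b,c.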

To test directly whether cell-type-specific TFs such as CDX and T/BRA mediate SOX2/TCF/β-catenin occupancy, we performed ChIP–seq for SOX2, β-catenin and LEF1 in CEpiLCs derived from ESCs either mutant for T/Bra (T/BraKO) or triple mutant for Cdx1,2,4 (CdxKO) (ref. ³⁵). Strikingly, both SOX2 and β-catenin exhibited changes in binding in the absence of either CDX factors or T/BRA, and tended to be reduced at peaks adjacent to transcriptional targets of CDX (Fig. 6f–h) and T/BRA (Extended Data Fig. 8d–f). CdxKO cells showed reduced SOX2, β-catenin and LEF1 occupancy across cluster 1 CEpiLC-specific CREs co-occupied by SOX2 and CDX2 (Extended Data Fig. 8g–i), and reduced β-catenin occupancy at cluster 4 sites normally occupied by CDX factors (Extended Data Fig. 8j–l). These data support a direct role for CDX factors in configuring the CEpiLC WNT response. Similar results were observed for T/BraKO in clusters 2 (Extended Data Fig. 8m–o) and 4 (Extended Data Fig. 8p–r). As these changes in SOX2/β-catenin/LEF1 occupancy were not accompanied by discernible changes in nucleosome occupancy in either CdxKO or T/BraKO cells (Extended Data Fig. 8s), these data suggest that CDX and T/BRA regulate cell-type-specific WNT target-gene expression by directing the recruitment of SOX2 and TCF/β-catenin to constitutively accessible CREs.

SOX2 levels control CDX2 enhancer activity

WNT-induced CDX2 expression is constrained to a specific range of SOX2 levels (Fig. 2a and Extended Data Fig. 9a). A previously identified regulatory element within the CDX2 intron^36,37 displayed a cell-type-specific pattern of SOX2/β-catenin co-occupancy that correlated with SOX2 levels (Fig. 7a). We generated fluorescent reporter lines harbouring the intronic sequence (Fig. 7b). Both CDX2 expression and reporter activity were higher in CEpiLCs cultured in FLC medium than in pluripotent ESCs, which express high levels of SOX2, and activin-induced early primitive streak cells (Fig. 7c–e and Extended Data Fig. 9b) that express little if any SOX2 or Sox3 (Extended Data Fig. 9c,d). To test whether SOX2 regulates the CDX2 intronic enhancer, we scrambled all SOX2 binding sites in the reporter (Sox2del) (Fig. 7b). Activity of the Sox2del reporter was substantially reduced in CEpiLCs (Fig. 7f and Extended Data Fig. 9e,f) consistent with the idea that SOX2 occupancy promotes the induction of CDX2 by WNT signalling, and that its repression in ESCs is indirect.
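The Sox2del reporter was built by scrambling all SOX2 binding sites in the intronic sequence, but how the sites were defined and scrambled is not described in this section. The sketch below assumes a single illustrative SOX-style core sequence, ACAAAG, matched on both strands, and shuffles each match in place, which preserves base composition, re-scanning until no match remains (a shuffle can recreate a match with its neighbours). Real site definitions would normally come from a PWM scan; the motif, seed and helper name are illustrative assumptions.

```python
import random
import re


def scramble_sites(seq, motif="ACAAAG", seed=0, max_tries=100):
    """Shuffle every forward or reverse-complement match of `motif` in `seq`.

    Base composition is preserved. Raises RuntimeError if matches persist
    after `max_tries` rounds of shuffling.
    """
    rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    # Zero-width lookahead so overlapping matches are all reported.
    pattern = re.compile(f"(?=({re.escape(motif)}|{re.escape(rc)}))")
    rng = random.Random(seed)  # fixed seed makes the design reproducible
    bases = list(seq)
    for _ in range(max_tries):
        hits = [m.start() for m in pattern.finditer("".join(bases))]
        if not hits:
            return "".join(bases)
        for start in hits:
            site = bases[start:start + len(motif)]
            rng.shuffle(site)
            bases[start:start + len(motif)] = site
    raise RuntimeError("could not remove all motif matches")
```

Preserving composition keeps GC content of the reporter unchanged, so differences in reporter activity are less likely to reflect global sequence properties; the designed sequence would still need checking against a full motif scan before synthesis.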

**Fig. 7: Cdx2 induction requires low-level SOX2/SOX3 expression.**

SOX2 and CDX2 are repressed by Nodal signalling in early primitive streak progenitors^38,39. Nodal expression is elevated in SOX2-OFF primitive streak progenitors (Fig. 3d). Inhibiting Nodal signalling in SOX2-OFF cells concurrently with WNT pathway activation reduced expression of both the general primitive streak marker T/Bra and the early primitive streak markers Eomes, Mixl1 and Nanog (Extended Data Fig. 9g), upregulated Sox3 (Fig. 7g) and rescued Cdx2 expression (Fig. 7h). We conclude that moderate levels of SOX2 in CEpiLCs promote posterior identity both by positively regulating CDX2 expression and by restraining the Nodal-driven induction of early primitive streak identity.

Discussion

Here we show that the level of SOX2 expression determines its genome-wide occupancy, which underpins distinct WNT-driven transcriptional programmes at successive stages of pluripotent stem cell differentiation. We found that β-catenin frequently co-occupies genomic sites with SOX2. Perturbations to SOX2 levels led to coordinated changes in the genomic location of SOX2 and β-catenin binding. During the transition from pluripotency to caudal epiblast identity, a reduction in global SOX2 levels resulted in a reduction of SOX2 occupancy at a set of CREs accompanied by a corresponding reduction in β-catenin occupancy. Many of these CREs were associated with genes expressed in pluripotent epiblast or neural ectoderm, cell types that require high levels of SOX2 expression to maintain their identity^14,24,27,40. Surprisingly, the reduction in global SOX2 levels also resulted in an increase in SOX2 and β-catenin co-occupancy at a set of CREs. These were associated with WNT-responsive genes expressed in caudal epiblast progenitors, many of which are responsible for posterior patterning and mesoderm differentiation. Artificially increasing or decreasing SOX2 expression redistributed SOX2/β-catenin and prevented the transition to a CLE identity. Thus, SOX2 levels configure the WNT response of epiblast progenitors and shape the transcriptional changes accompanying the differentiation of pluripotent cells to CLE.

Different levels of TF expression have been found to control differential gene expression programmes in several systems^41,42,43,44,45. In many cases, the mechanistic basis for this has been unclear. Here we provide evidence that, for SOX2, the level of expression has a marked effect on the selection of CREs to which it binds, providing an explanation for the different gene expression responses. At high levels of expression, SOX2 remains bound to a set of CREs associated with neural and pluripotency genes. For this set of CREs, SOX2 binding correlated with chromatin accessibility. This is consistent with the known role of SOX2 as a pioneer factor and its ability to bind and open inaccessible CREs^28,29,30,31.

The decrease in SOX2 levels resulted in a repositioning of SOX2 to CREs associated with genes involved in posterior patterning and mesoderm induction. Despite the lower levels of SOX2, these CREs contained lower-affinity SOX2 binding sites than the CREs bound by SOX2 in cell types with high SOX2 expression levels. Moreover, these CREs were accessible in pluripotent conditions as well as in CLE. Co-factor-mediated recruitment to low-affinity sites has been implicated in cell-type-specific CRE activity and gene expression^33,34,46,47. For SOX2, we found evidence of the involvement of CDX2 and T/BRA in directing binding to low-affinity sites. These observations suggest that SOX2 adopts different modes of chromatin interaction and CRE selection depending on its level of expression (Fig. 7i). This resolves a paradox. Despite its pioneering activity and ability to bind and activate condensed chromatin, the distribution of SOX2 occupancy on chromatin differs between cell types. Our data provide further evidence that SOX2 acts as a pioneer factor in pluripotent cells when expressed at high levels, but collaborates with other TFs to select lower-affinity binding sites when expressed at lower levels. This is consistent with previous studies of how TFs gain access to their genomic targets^29,48,49,50 and provides an explanation for the distinct gene expression programmes regulated at different TF expression levels.

There was a positive correlation between SOX2 binding and the activation of WNT-responsive genes in CLE cells. Consistent with this, using a CRE from CDX2, we found that SOX2 occupancy is required for CDX2 activation by WNT signalling, providing direct evidence of an activator role for SOX2 in the regulation of β-catenin target genes. This suggests a self-reinforcing mechanism for cell-type specificity of WNT signalling. Downregulation of SOX2 in CEpiLCs leads to reconfiguration of the chromatin state and prevents re-expression of the pluripotent transcriptional programme. This relieves repression of CLE-specific WNT target genes such as T/BRA and CDX2 (Figs. 2 and 7)^51,52. Consequently, SOX2 and TCF/β-catenin are recruited to CREs associated with CDX and T/BRA target genes, inducing gene expression programmes characteristic of posterior identity and primitive streak to paraxial mesoderm differentiation. Then, as SOX2 levels are further reduced during the differentiation of CLE progenitors to mesoderm progenitors, CDX and T/BRA expression decreases^17,53.

A consequence of the mechanism that establishes the primary body axis is that anterior and posterior structures derive from distinct epiblast progenitor pools³⁵. Anterior tissues, including the forebrain and heart, are established early in embryonic development from pluripotent epiblast progenitors that co-express OCT4, NANOG and high levels of SOX2^16,18,38,54,55,56,57,58,59. Caudal epiblast progenitors retain low levels of SOX2 expression, and this is required to assign trunk identity to both the mesoderm and spinal cord by establishing CDX/HOX expression in response to WNT signalling. As CREs associated with neural genes require high SOX2 to maintain accessibility, neural differentiation is restrained in CLE progenitors independently of inhibitory activity of OCT4/NANOG. By contrast, the initiation of mesoderm differentiation is controlled by regulation of T/BRA by WNT/Nodal signalling, independently of chromatin remodelling.

A division between the ontogeny of head and trunk tissue is also apparent in arthropods. Reminiscent of the CLE, homologues of SOX2 and CDX2 are co-expressed in a posterior progenitor pool that fuels WNT-dependent axis elongation. Moreover, the SOX orthologues have been shown to participate in the assignment of posterior identity within these progenitors^60,61,62. A collaboration between WNT signalling and SOX2 in the regulation of CDX factors therefore appears to be an evolutionarily conserved mechanism that establishes the primary bilaterian axis and allocates cells to trunk tissues.

Methods

Cell lines

All cell lines were maintained and experiments performed at 37 °C with 5% CO₂. All ESC lines used were derived from the XY HM1 TetON line⁶⁶, which was used as the WT control. Sox2 TetON, Cdx2 intron CRE reporter and Cdx2 intron CRE reporter Sox2del lines were generated as described below. Cell lines were validated by DNA sequencing and flow cytometry, and routinely tested for mycoplasma.

Sox2 TetON

Sox2 TetON was generated by introducing a silent G > A mutation 54 bp into the open reading frame of Sox2 complementary DNA (cDNA) using site-directed mutagenesis, to ablate the protospacer adjacent motif site targeted by the guide Sox2_CRISPR_1. The Sox2_CRISPR_1-insensitive Sox2 cDNA was then subcloned into pBI2 using a SalI/NotI restriction digest and subsequently cloned into the HPRT-locus targeting vector Hprt2 as described previously in ref. ⁶⁶. The HPRT_TetON-SOX2_54G > A construct was electroporated into HM1 TetON ESCs, and integrants were selected by culturing for 10 days in hypoxanthine-aminopterin-thymidine (HAT)-containing ESC medium. Construct integration was confirmed by genotyping, and transgene expression was confirmed by flow cytometry. SOX2_54G > A targeted cells were then adapted to 2i culture, the transgene was induced with Dox (1 μg ml⁻¹), and cells were electroporated with Sox2_CRISPR_1 using an Amaxa Nucleofector to ablate endogenous Sox2 expression. Electroporated cells were seeded at clonal density on gelatin-coated plates in 2i medium and, the following day, were selected by culturing in puromycin (1 μg ml⁻¹) for 36 h. Resistant clones were grown, picked, expanded and screened by flow cytometry for complete loss of SOX2 expression following Dox withdrawal. After ablation of endogenous SOX2, Sox2 TetON cells were stably maintained in pluripotency conditions by adding 1 μg ml⁻¹ Dox to serum + leukaemia inhibitory factor (LIF) ESC culture medium. Oligonucleotide sequences are detailed in Supplementary Table 1.

Sox2 TetON, Sox3⁻

Sox3 was ablated from Sox2 TetON by electroporating Sox3_CRISPR_1 and selecting edited clones using the approach described above for Sox2 ablation. Functional ablation of the single Sox3 allele was confirmed by genotyping to identify clones with frameshift mutations, and subsequent functional analysis. Oligonucleotide sequences are detailed in Supplementary Table 1.

Cdx2 intron CRE reporter

GeneBlock oligonucleotides encoding a 1.6 kb fragment of WT sequence containing the SOX2 peaks within Cdx2 intron 1, or the same region in which seven predicted SOX2 motifs (JASPAR)⁶⁷ were scrambled (Sox2del), were cloned by Gibson assembly into a pENTR11 backbone upstream of an hsp68 minimal promoter driving expression of a Venus-H2B transgene. Additionally, the Gateway cassette from FuTetO-GW (Addgene) was cloned by Gibson assembly into the AscI/PmeI sites of Hprt2 (ref. ⁶⁶) to yield HPRT_GW. LR clonase (Invitrogen) was used to induce recombination between the pENTR reporter construct and HPRT_GW, yielding HPRT-locus targeting constructs, which were used to generate stable lines in HM1 TetON ESCs as described for Sox2 TetON. Oligonucleotide sequences are detailed in Supplementary Table 1.

ESC culture and differentiation

All mESCs were propagated on mitotically inactivated mouse embryonic fibroblasts (feeders) in DMEM knockout medium supplemented with 1,000 U ml⁻¹ LIF, 10% cell-culture-validated foetal bovine serum, penicillin–streptomycin and 2 mM L-glutamine (Gibco). To obtain EpiLCs and CEpiLCs, ESCs were differentiated as previously described⁹, with the porcupine inhibitor LGK974 added to all culture media. Briefly, ESCs were dissociated with 0.05% trypsin and plated on tissue-culture-treated plates in ESC medium for two sequential 20-min periods. During these periods the feeder cells adhere to the plastic, which separates them from the ESCs. To start the differentiation, cells remaining in the supernatant were pelleted by centrifugation, counted and resuspended in N2B27 medium containing 10 ng ml⁻¹ bFGF + 5 μM LGK974, and 50,000 cells were plated per 35 mm gelatin-coated CellBIND dish (Corning). N2B27 medium contained a 1:1 ratio of DMEM/F12:Neurobasal medium (Gibco) supplemented with 1× N2 (Gibco), 1× B27 (Gibco), 2 mM L-glutamine (Gibco), 40 mg ml⁻¹ BSA (Sigma), penicillin–streptomycin and 0.1 mM 2-mercaptoethanol. To generate EpiLCs, the cells were grown for 72 h in N2B27 + 10 ng ml⁻¹ bFGF + 5 μM LGK974 (FL medium). To generate CEpiLCs, cells were cultured in N2B27 + 10 ng ml⁻¹ bFGF + 5 μM LGK974 for 48 h, then in N2B27 + 10 ng ml⁻¹ bFGF + 5 μM LGK974 + 5 μM CHIR99021 (FLC medium) for a further 24 h (day 3 in ref. ⁹). CEpiLCs were differentiated to spinal cord neural progenitors by removing bFGF and CHIR from the culture medium at 72 h. They were differentiated to paraxial mesoderm by removing bFGF and maintaining 5 μM CHIR from 72 h onwards. When investigating the activity of Nodal signalling, either 10 ng ml⁻¹ recombinant activin or 10 μM of the ALK inhibitor SB-431542 was included in bFGF/CHIR-containing medium.
Experiments conducted in 2i medium were initiated by separating serum/LIF-grown ESCs from feeders as described above and plating onto gelatin-coated CellBIND dishes in N2/B27-containing basal medium supplemented with 3 μM CHIR and 500 nM PD0325901. For all experiments described, cells were cultured for 48 h before changing medium. Medium changes were then made every 24 h. Details of key compounds are described in Supplementary Table 1.

Immunofluorescence

Cells were washed in PBS and fixed in 4% paraformaldehyde in PBS for 15 min at 4 °C, followed by two washes in PBS and one wash in PBST (0.1% Triton X-100 diluted in PBS). Primary antibodies were diluted in filter-sterilized blocking solution (1% BSA diluted in PBST) and applied overnight at 4 °C. Cells were washed three times in PBST and incubated with secondary antibodies for 1 h at room temperature. Cells were washed three times in PBST, incubated with DAPI in PBS for 5 min and washed twice before mounting with ProLong Gold (Invitrogen). Cells were imaged on a Zeiss Imager.Z2 microscope using the ApoTome.2 structured illumination platform. Z stacks were acquired using Zeiss Zen software and represented as maximum intensity projections using ImageJ software. The same settings were applied to all images. Immunofluorescence was performed on a minimum of two biological replicates, from independent experiments. Secondary antibodies used were anti-mouse AlexaFluor 488 (Thermo Fisher), anti-rabbit AlexaFluor 488 (Thermo Fisher), anti-rabbit AlexaFluor 647 (Thermo Fisher) and anti-goat AlexaFluor 647 (Thermo Fisher). Details of primary antibodies are described in Supplementary Table 1.

Intracellular flow cytometry

Cells were washed in PBS and dissociated with a minimal volume of Accutase (Gibco). Once detached, cells were resuspended in N2B27, collected into 1.5 ml Eppendorf tubes and pelleted. Cells were resuspended in PBS, pelleted and resuspended in 4% paraformaldehyde in PBS. Following a 15 min incubation at 4 °C, cells were centrifuged at 700g, resuspended in PBS and stored at 4 °C for future analysis. On the day of flow cytometry, cells were counted and equal cell numbers were transferred for staining in V-bottom 96-well plates. Samples were pelleted and resuspended in 5 μl FACS block (PBS + 0.2% Triton + 3% BSA). After a 10 min incubation at room temperature, antibodies were added to the samples, which were incubated overnight at 4 °C. Cells were pelleted at 700g for 5 min and resuspended in 50 μl FACS block. One additional wash was performed before acquisition on a Fortessa flow cytometer (BD) using FACSDiva software. Analysis was performed using the R package flowCore⁶⁸ and data were graphed using ggplot2 (ref. ⁶⁹). A representative figure illustrating the gating strategy is provided in Extended Data Fig. 10. Details of antibodies are described in Supplementary Table 1.

Quantification of flow cytometry data

To determine the relative response of the Cdx2 intron CRE reporter to WNT pathway activation in CEpiLCs, early streak and naïve progenitors, the median fluorescence intensity was determined and normalized against the value obtained for unstimulated EpiLCs. The same approach was taken when investigating the consequence of SOX2 binding site deletion in the Sox2del line.
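The normalization above reduces to a ratio of two medians. The following is an illustrative Python sketch only, since the authors analysed flow data with flowCore in R. The function name and the per-cell intensities are hypothetical.

```python
from statistics import median


def normalized_mfi(sample_intensities, reference_intensities):
    """Median fluorescence intensity (MFI) of a sample divided by the MFI of a
    reference population (here, unstimulated EpiLCs)."""
    reference_mfi = median(reference_intensities)
    if reference_mfi == 0:
        raise ValueError("reference MFI is zero; cannot normalize")
    return median(sample_intensities) / reference_mfi


# Hypothetical per-cell Venus intensities (arbitrary units), for illustration only.
unstimulated_epilc = [100, 120, 80, 110, 90]
cepilc = [400, 520, 380, 450, 410]
print(normalized_mfi(cepilc, unstimulated_epilc))  # 4.1
```

A value of 1 means the reporter is no more active than in unstimulated EpiLCs. The same function applies unchanged to the Sox2del reporter line.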

RNA extraction

RNA used for quantitative PCR (qPCR) or RNA sequencing (RNA-seq) was extracted from cells lysed in RLT buffer using a QIAGEN RNeasy kit, following the manufacturer’s instructions. Extracts were digested with DNase I to eliminate genomic DNA.

cDNA synthesis and qPCR analysis

First-strand cDNA synthesis was performed using SuperScript III (Invitrogen) with random hexamers, and the cDNA was amplified using PowerUp SYBR Green Master Mix (Applied Biosystems). qPCR was performed using the Applied Biosystems QuantStudio Real-Time PCR system and analysed with Applied Biosystems QuantStudio 12K Flex software. PCR primers were designed using the online GenScript qPCR primer design tool. Two technical replicates were obtained for each sample and averaged before normalization and statistical analysis. Relative expression values for each gene were calculated by normalization against β-actin, using the delta–delta CT method. qPCR analysis was performed on samples obtained from a minimum of three independent experiments for every primer pair analysed. Data were graphed and statistical tests were performed using GraphPad Prism software. Primer sequences are detailed in Supplementary Table 1.
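The delta–delta CT calculation mentioned above can be written out explicitly. This Python sketch is illustrative. The function name and CT values are hypothetical. It assumes roughly 100% amplification efficiency, an assumption built into the 2^−ΔΔCT formula.

```python
from statistics import mean


def relative_expression(target_sample, reference_sample, target_control, reference_control):
    """Fold change of a target gene by the delta-delta CT (2^-ΔΔCT) method.

    Each argument is a list of technical-replicate CT values, which are averaged
    first. The reference gene plays the role of β-actin in the text.
    """
    delta_ct_sample = mean(target_sample) - mean(reference_sample)
    delta_ct_control = mean(target_control) - mean(reference_control)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: relative to the reference gene, the target crosses the
# threshold 3 cycles earlier in the sample than in the control (2^3 = 8-fold higher).
fold = relative_expression([22.0, 22.5], [15.0, 15.5], [25.0, 25.5], [15.0, 15.5])
print(fold)  # 8.0
```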

RNA-seq

Libraries were prepared using the KAPA mRNA HyperPrep kit (Roche) and sequenced as 76 bp single-end, strand-specific reads on the Illumina HiSeq 4000 platform (Francis Crick Institute).

RNA-seq analysis

Adapter trimming was performed with cutadapt (version 1.16)⁷⁰ with parameters ‘--minimum-length=25 --quality-cutoff=20 -a AGATCGGAAGAGC’, and for paired-end data ‘-A AGATCGGAAGAGC’ was appended to the command. The RSEM package (version 1.3.0)⁷¹ in conjunction with the STAR alignment algorithm (version 2.5.2a)⁷² was used for the mapping and subsequent gene-level counting of the sequenced reads with respect to mm10 RefSeq genes downloaded from the UCSC Table Browser⁷³ on 11 December 2017. The parameters passed to the ‘rsem-calculate-expression’ command were ‘--star --star-gzipped-read-file --star-output-genome-bam --forward-prob 0’, and for paired-end data ‘--paired-end’ was appended to the command. Differential expression analysis was performed with the DESeq2 package (version 1.16.1)⁷⁴ within the R programming environment (version 3.4.1). An adjusted P value ≤0.05 was used as the significance threshold for the identification of differentially expressed genes.

RNA-seq clustering

The R ‘kmeans’ function was used to cluster standardized (z-transformed) FPKM values across biological conditions before plotting with the ‘heatmap.2’ function (R package gplots). The lowest value of k able to partition gross trends in the data was chosen.
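The z-transformation and clustering steps can be sketched in Python as follows. This is not the authors' code. R's `kmeans` defaults to the Hartigan–Wong algorithm with a random start. This sketch instead uses plain Lloyd iterations seeded with the first k rows, so the result is deterministic. The gene names and FPKM values are hypothetical.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Standardize each gene (row) across conditions (columns): subtract the row
    mean and divide by the sample standard deviation (constant rows become 0)."""
    standardized = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        standardized.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return standardized


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical FPKM values for four genes across three conditions.
fpkm = {
    "up1": [1.0, 5.0, 10.0],
    "down1": [10.0, 5.0, 1.0],
    "up2": [2.0, 10.0, 20.0],
    "down2": [20.0, 10.0, 2.0],
}
labels = kmeans(zscore_rows(list(fpkm.values())), k=2)
print(dict(zip(fpkm, labels)))  # {'up1': 0, 'down1': 1, 'up2': 0, 'down2': 1}
```

Standardizing first makes `up1` and `up2` indistinguishable despite their tenfold and twofold absolute differences. The clustering therefore reflects the shape of each profile rather than its magnitude.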

RNA-seq associating differential gene expression with differential SOX2 occupancy

HOMER ‘annotatePeaks.pl’ was used to associate consensus SOX2 ChIP peaks with the nearest gene promoters. SOX2-associated genes were then filtered on the basis of their differential expression, determined with DESeq2, in pairwise comparisons between CEpiLC, SOX2-OFF and SOX2-ON; between WT CEpiLC and Cdx1,2,4⁻/⁻ (CdxKO) CEpiLC; or between WT CEpiLC and T/Bra⁻/⁻ (T/BraKO) CEpiLC. Mean FPKM values from triplicate samples were z-transformed across the three experimental conditions to standardize fold change in expression and plotted using ggplot2.

RNA-seq GO enrichment

The online functional annotation tool of the DAVID bioinformatics resource https://david.ncifcrf.gov/summary.jsp was used with default parameters to identify statistically enriched biological process annotations within sets of gene IDs associated with differentially expressed transcripts, and to calculate associated Benjamini–Hochberg adjusted P values.
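DAVID computes the Benjamini–Hochberg correction internally. To show what the adjusted values represent, here is a minimal Python sketch of the standard step-up procedure. The P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Sort P values ascending, scale the i-th smallest by m/i, then enforce
    monotonicity by taking cumulative minima from the largest P value down
    (capped at 1), as in R's p.adjust(method="BH").
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment P values for four GO terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```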

RNA-seq comparison of in vitro to in vivo epiblast differentiation

Principal component analysis was performed using the R function prcomp on mRNA-seq data from duplicate 2i samples and triplicate ICM, E4.5 epiblast and E5.5 epiblast samples from ref. ⁶⁵. PC1 aligned with developmental time, whereas PC2 separated in vitro (2i) from in vivo (ICM, E4.5 and E5.5) derived samples. The top 300 genes contributing most positively and most negatively to PC1 were selected to represent the gene expression dynamics of epiblast differentiation in vivo. The expression dynamics of these genes during in vitro differentiation of ESCs to EpiLCs were represented by plotting standardized (z-transformed) FPKM values using heatmap.2.
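Selecting genes by their PC1 loadings can be sketched without external libraries. The authors used `prcomp` in R. This Python sketch instead assumes a small dense matrix and approximates the first principal axis by power iteration on the mean-centred data. Provided the largest variance component is strictly larger than the second, this converges to the same axis up to sign. Gene names and values are hypothetical, and `n` stands in for the 300 genes used in the text.

```python
import math


def pc1_loadings(matrix, n_iter=200):
    """Approximate the PC1 loadings (one weight per gene) by power iteration.

    matrix: rows are samples, columns are genes. Columns are mean-centred but
    not scaled, matching prcomp() defaults. The sign of a principal component is
    arbitrary, so 'positive' and 'negative' must be oriented, e.g. along time.
    """
    n_samples, n_genes = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n_samples for j in range(n_genes)]
    x = [[row[j] - col_means[j] for j in range(n_genes)] for row in matrix]
    v = [1.0 / (j + 1) for j in range(n_genes)]  # arbitrary non-uniform start
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        v = [sum(x[i][j] * scores[i] for i in range(n_samples)) for j in range(n_genes)]  # X^T X v
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data; choose another")
        v = [w / norm for w in v]
    return v


def top_genes(genes, loadings, n):
    """The n genes with the most positive and the n with the most negative loadings."""
    ranked = sorted(zip(loadings, genes))
    most_positive = [g for _, g in reversed(ranked[-n:])]
    most_negative = [g for _, g in ranked[:n]]
    return most_positive, most_negative


# Hypothetical expression matrix: 4 samples (ordered in time) x 4 genes.
genes = ["up", "down", "weak_up", "flat"]
samples = [
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 2.0, 0.5, 1.0],
    [2.0, 1.0, 1.0, 1.0],
    [3.0, 0.0, 1.5, 1.0],
]
loadings = pc1_loadings(samples)
print(top_genes(genes, loadings, n=1))  # (['up'], ['down'])
```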

ChIP–seq

Adherent cells were washed three times with PBS, fixed with gentle agitation for 45 min at room temperature with fresh 2 mM di(N-succinimidyl) glutarate (Sigma) in PBS, washed a further three times with PBS, then fixed for 10 min at room temperature with 1% molecular-biology-grade paraformaldehyde in PBS. Fixation was quenched by addition of 250 mM glycine for 5 min, followed by additional washing with PBS. Plates were cooled, and cells were scraped into tubes in a low volume of PBS 0.02% Triton X-100 and pelleted by centrifugation at 100g for 5 min at 4 °C before snap freezing in liquid nitrogen and storing at −80 °C. Approximately 5 × 10⁶ cells were transferred to a Diagenode TPX tube and resuspended in ice-cold shearing buffer containing 0.3% SDS and protease inhibitors (Sigma). Chromatin was sheared using a Bioruptor Plus (25 cycles of 30 s on/30 s off on the high setting), and lysates were then diluted to 0.15% SDS and cleared by centrifugation at 14,000 r.p.m. for 10 min at 4 °C. Then, 1/20 of the chromatin from ~1 × 10⁷ cells was set aside and frozen for subsequent use as input control. The remainder was incubated overnight at 4 °C under rotation with 100 μl of Protein G Dynabeads (Invitrogen), pre-loaded for 4 h at room temperature with 5 μg of ChIP antibodies diluted in shearing buffer containing 0.15% SDS. Beads were magnetically immobilized and the unbound supernatant was discarded. The beads were then washed sequentially under rotation for 5 min per wash: twice with Wash Buffer 1, once with Wash Buffer 2, once with Wash Buffer 3 and twice with Wash Buffer 4, with magnetic capture between washes. Chromatin was eluted from beads by incubating twice at 65 °C for 10 min in 100 μl elution buffer on a shaking heat block, capturing beads between elution steps and then pooling the eluted fractions.
Input samples were made up to 200 μl with elution buffer, 6.4 μl of 5 M NaCl was added to each input or immunoprecipitated sample, and all samples were de-crosslinked overnight at 65 °C. Samples were incubated for 2 h at 37 °C with 0.2 μg ml⁻¹ PureLink RNase A (Invitrogen). They were then supplemented with 5 mM EDTA and incubated for a further 2 h at 45 °C with 0.2 μg ml⁻¹ proteinase K (Thermo Scientific), after which DNA was purified with QIAGEN PCR clean-up columns. DNA fragmentation of IP and input samples was confirmed by Agilent TapeStation before library preparation using the NEBNext Ultra II DNA Library Prep Kit. Biological triplicates were obtained for all conditions from separate experiments. Libraries were sequenced as single-end, 76 bp reads on the Illumina HiSeq 4000 platform (Francis Crick Institute). The composition of buffers and details of antibodies are described in Supplementary Table 1.

ChIP-seq analysis

The nf-core/chipseq pipeline (version 1.1.0; https://doi.org/10.5281/zenodo.3529400)⁷⁵, written in the Nextflow domain-specific language (version 19.10.0)⁷⁶, was used to perform the primary analysis of the samples in conjunction with Singularity (version 2.6.0)⁷⁷. The command used was ‘nextflow run nf-core/chipseq --input design.csv --genome mm10 --gtf refseq_genes.gtf --single_end --narrow_peak --min_reps_consensus 2 -profile crick -r 1.1.0’. In summary, the pipeline performs the following steps:
- adapter trimming (Trim Galore!; https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/);
- read alignment (BWA⁷⁸);
- filtering (SAMtools⁷⁹, BEDTools⁸⁰, BamTools⁸¹, pysam (https://github.com/pysam-developers/pysam) and picard-tools (http://broadinstitute.github.io/picard));
- normalized coverage track generation (BEDTools⁸⁰ and bedGraphToBigWig⁸²);
- peak calling (MACS⁸³, default q-value threshold <0.05) and annotation relative to gene features (HOMER⁸⁴);
- consensus peak set creation (BEDTools⁸⁰);
- differential binding analysis (featureCounts⁸⁵ and DESeq2⁷⁴);
- extensive quality control and version reporting (MultiQC⁸⁶, FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), preseq, deepTools⁸⁷ and phantompeakqualtools⁸⁸).

A peak was included in the consensus peak set only if MACS called it in at least two of three biological replicates from any of the four experimental conditions (CEpiLC, SOX2-OFF, SOX2-ON and naive ESCs). In all analyses except those in Fig. 3c and Extended Data Fig. 4c, the consensus peak set was derived from SOX2 peaks. For Fig. 3c and Extended Data Fig. 4c, the consensus peak set comprised peaks from SOX2, β-catenin, TCF7L1 and LEF1. All data were processed relative to the mouse UCSC mm10 genome⁷³ downloaded from AWS iGenomes (https://github.com/ewels/AWS-iGenomes). Peak annotation was performed relative to the same GTF gene annotation file used for the RNA-seq analysis.
Tracks illustrating representative peaks were visualized using the IGV genome browser⁸⁹.
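The consensus rule described above, set by `--min_reps_consensus 2`, can be expressed as a small predicate. This Python sketch assumes that peaks from all samples have already been merged into common intervals, which the pipeline does with BEDTools. For each merged interval it records whether MACS called that interval in each replicate. The function and variable names are hypothetical.

```python
def passes_consensus(calls_by_condition, min_reps=2):
    """True if a merged peak was called in at least `min_reps` biological
    replicates of at least one condition.

    calls_by_condition maps each condition to one boolean per replicate that
    records whether MACS called the peak in that replicate.
    """
    return any(sum(calls) >= min_reps for calls in calls_by_condition.values())


# Hypothetical replicate calls for two merged peaks.
peak_a = {"CEpiLC": [True, False, False], "SOX2-OFF": [True, True, False],
          "SOX2-ON": [False, False, False], "naive ESC": [False, False, False]}
peak_b = {"CEpiLC": [True, False, False], "SOX2-OFF": [False, True, False],
          "SOX2-ON": [False, False, True], "naive ESC": [True, False, False]}
print(passes_consensus(peak_a), passes_consensus(peak_b))  # True False
```

The contrast between the two peaks is the point of the example. Peak b is called in four replicates in total but never twice within the same condition, so it is excluded.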

ChIP–seq peak clustering

SOX2 peaks were manually assigned to six clusters on the basis of differential occupancy between WT, SOX2-OFF and SOX2-ON samples. Peaks in clusters 1, 2 and 3 had the highest mean read counts across biological triplicate samples in WT, SOX2-OFF or SOX2-ON, respectively. These peaks were also statistically different (false discovery rate (FDR) <0.05, as determined by DESeq2) from both other experimental conditions. Cluster 4, 5 and 6 peaks were statistically different from only one of the other experimental conditions. Browser Extensible Data (BED) files of genomic intervals defined by SOX2 peaks within these clusters had two uses. With deepTools, they were used to plot metaplots and heat maps from the BigWig files generated by the nf-core/chipseq and nf-core/atacseq pipelines. They also served as input for motif enrichment analysis and motif scanning.
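The rule that defines clusters 1–3 can be made explicit. The Python sketch below is one reading of the description above, not the authors' code, since peaks were assigned manually. It interprets "statistically different to only one of the other experimental conditions" as the top condition differing from exactly one other. It does not attempt the split into clusters 4, 5 and 6, which the text does not specify. All names and numbers are hypothetical.

```python
FDR = 0.05
TOP_CLUSTER = {"WT": 1, "SOX2-OFF": 2, "SOX2-ON": 3}


def assign_cluster(mean_counts, padj, fdr=FDR):
    """Classify one SOX2 peak.

    mean_counts: condition -> mean normalized read count over triplicates.
    padj: frozenset({cond1, cond2}) -> DESeq2 adjusted P value for that pair.
    Returns 1-3 when the top condition differs from both others, the string
    "4-6" when it differs from only one, and None when it differs from neither.
    """
    top = max(mean_counts, key=mean_counts.get)
    others = [c for c in mean_counts if c != top]
    n_significant = sum(padj[frozenset((top, o))] < fdr for o in others)
    if n_significant == len(others):
        return TOP_CLUSTER[top]
    if n_significant == 1:
        return "4-6"
    return None


# Hypothetical pairwise adjusted P values, shared by the two example peaks.
padj = {
    frozenset(("WT", "SOX2-OFF")): 0.001,
    frozenset(("WT", "SOX2-ON")): 0.5,
    frozenset(("SOX2-OFF", "SOX2-ON")): 0.002,
}
print(assign_cluster({"WT": 10, "SOX2-OFF": 60, "SOX2-ON": 12}, padj))  # 2
print(assign_cluster({"WT": 30, "SOX2-OFF": 5, "SOX2-ON": 28}, padj))   # 4-6
```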

ChIP–seq motif enrichment

Motifs enriched within each SOX2 peak cluster were identified using HOMER⁸⁴ findMotifsGenome.pl with default parameters. Region size was 200 bp (±100 bp around the peak centre).

ChIP–seq motif scoring with FIMO

Regions ±100 bp around SOX2 ChIP–seq peak centres were used as inputs for the motif scanning tool Find Individual Motif Occurrences (FIMO) http://meme-suite.org/tools/fimo (ref. ⁹⁰). The SOX2 motif MA0143.3 (JASPAR)⁶⁷ was used as the target. The P-value threshold was set to P < 0.1 to include low-scoring SOX2 motifs present within peak sets. Cluster 3 peaks were ranked on the basis of the total score of all motifs within each region with a score greater than −20, which represents up to two mismatches compared with the consensus. All ±100 bp regions within cluster 1–6 peaks contained at least one motif with a score greater than −20.
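The ranking of cluster 3 peaks can be sketched as a per-region sum over FIMO hits. This Python sketch assumes that the FIMO output has already been parsed into (region, score) pairs. That input format and the function name are hypothetical.

```python
from collections import defaultdict


def rank_by_total_motif_score(hits, min_score=-20.0):
    """Sum, per region, the scores of motif hits above `min_score`; return
    (region, total) pairs sorted from highest to lowest total.

    hits: iterable of (region_id, score) pairs, e.g. parsed from FIMO output.
    Regions without any hit above the threshold are omitted.
    """
    totals = defaultdict(float)
    for region, score in hits:
        if score > min_score:
            totals[region] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical motif hits; the -25.0 hit in peak3 falls below the threshold.
hits = [("peak1", 12.5), ("peak1", 3.0), ("peak2", 14.0), ("peak3", -25.0), ("peak3", 8.0)]
print(rank_by_total_motif_score(hits))  # [('peak1', 15.5), ('peak2', 14.0), ('peak3', 8.0)]
```

Because scores are summed rather than maximized, a region with several moderate motifs (peak1) can outrank a region with one strong motif (peak2).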

ChIP–seq peak intersection

BEDTools⁸⁰ intersectBed was used to identify genomic intervals overlapping by >10% in BED files listing coordinates of consensus and differentially occupied peak sets for each immunoprecipitated factor.
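The >10% overlap criterion can be illustrated with a brute-force intersection. The text does not state whether the fraction refers to the first interval, the second or both. This Python sketch assumes the first, analogous to the `-f` option of `bedtools intersect`. It compares every pair of intervals, which is only practical for small inputs. Interval coordinates are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of interval a covered by interval b.

    Intervals are (chrom, start, end) tuples in BED's 0-based, half-open
    coordinates, so the length of a is end - start.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    return max(0, overlap) / (a[2] - a[1])


def intersect(a_intervals, b_intervals, min_fraction=0.10):
    """Intervals in a that overlap any interval in b by more than min_fraction of a."""
    return [a for a in a_intervals
            if any(overlap_fraction(a, b) > min_fraction for b in b_intervals)]


sox2_peaks = [("chr1", 100, 200), ("chr1", 295, 395), ("chr2", 100, 200)]
tcf_peaks = [("chr1", 180, 300)]
print(intersect(sox2_peaks, tcf_peaks))  # [('chr1', 100, 200)]
```

The second SOX2 peak shares only 5 of its 100 bp with the TCF peak (5%), so it falls below the threshold.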

ATAC–seq

ATAC–seq sample preparation was performed as described in ref. ³⁵. Briefly, adherent cells were treated with StemPro Accutase (Thermo Fisher) to obtain a single-cell suspension, counted and resuspended in ice-cold PBS at 50,000 cells per sample. Cells were pelleted and resuspended in lysis buffer (10 mM Tris–HCl pH 7.4, 10 mM NaCl, 3 mM MgCl₂ and 0.1% IGEPAL). Following a 10 min centrifugation at 4 °C, the nuclei were resuspended in transposition buffer and incubated for 30 min at 37 °C. The DNA was then purified using a QIAGEN MinElute PCR Purification kit following the manufacturer’s instructions. Transposed DNA was eluted in a 10 μl volume and amplified by PCR with Nextera primers to generate single-indexed libraries. Libraries were sequenced as paired-end, 101 bp reads on the Illumina HiSeq 4000 platform (Francis Crick Institute).

ATAC–seq analysis

The nf-core/atacseq pipeline (version 1.0.0; https://doi.org/10.5281/zenodo.2634133)⁷⁵, written in the Nextflow domain-specific language (version 19.10.0)⁷⁶, was used to perform the primary analysis of the samples in conjunction with Singularity (version 2.6.0)⁷⁷. The command used was ‘nextflow run nf-core/atacseq --design design.csv --genome mm10 --gtf refseq_genes.gtf -profile crick -r 1.0.0’. The nf-core/atacseq pipeline uses processing steps similar to those described for the nf-core/chipseq pipeline in the previous section. It adds steps specific to ATAC–seq analysis, including removal of mitochondrial reads.

Nucleosome analysis

The NucleoATAC package (version 0.3.4)³² was run in default mode. Analysis was performed on all genomic intervals called as peaks from ATAC–seq data as described above. Metaplots of the occ.bedgraph files for each experimental condition were plotted using deepTools to score the average nucleosome occupancy within each peak cluster. Tracks of the occ.bedgraph and nucleoatac_signal.smooth.bedgraph files were visualized using the IGV genome browser⁸⁹ to illustrate the occupancy and position of nucleosomes at genomic intervals of interest.

Statistics and reproducibility

No statistical method was used to pre-determine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. Software used for statistical analysis is detailed in Supplementary Table 2.

For all statistical analyses, data were obtained from a minimum of three independent experiments. Details of replicate numbers, quantification and statistics for each experiment are specified in the figure legends.

Availability of unique biological material

All embryonic stem cell lines described for the first time in this study are available from James Briscoe upon request.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Deep-sequencing (ChIP–seq, ATAC–seq and RNA-seq) data generated during this study have been deposited in the Gene Expression Omnibus (GEO) under the accession code GSE162774. Previously published ChIP–seq, ATAC–seq and RNA-seq data that were re-analysed during this study are available under accession codes GSE64059, GSE84899, GSE93524, E-MTAB-2268, E-MTAB-2958 and E-MTAB-6337. Details of individual samples re-analysed are described in Supplementary Table 3. Source data for Figs. 1, 2, 4, 5 and 7 and Extended Data Figs. 1, 2, 3 and 9 are provided with this paper. All other data supporting the findings of this study are provided in the supplementary information or are available from the corresponding author on reasonable request.

Code availability

All data were processed using published nf-core pipelines as detailed in Methods.

References

Steinhart, Z. & Angers, S. Wnt signaling in development and tissue homeostasis. Development 145, dev146589 (2018).
Cadigan, K. M. & Waterman, M. L. TCF/LEFs and Wnt signaling in the nucleus. Cold Spring Harb. Perspect. Biol. 4, a007906 (2012).
Madeja, Z. E., Hryniewicz, K., Orsztynowicz, M., Pawlak, P. & Perkowska, A. WNT/β-catenin signaling affects cell lineage and pluripotency-specific gene expression in bovine blastocysts: prospects for bovine embryonic stem cell derivation. Stem Cells Dev. 24, 2437–2454 (2015).
Ying, Q.-L. et al. The ground state of embryonic stem cell self-renewal. Nature 453, 519–523 (2008).
Deschamps, J. & van Nes, J. Developmental regulation of the Hox genes during axial morphogenesis in the mouse. Development 132, 2931–2942 (2005).
Henrique, D., Abranches, E., Verrier, L. & Storey, K. G. Neuromesodermal progenitors and the making of the spinal cord. Development 142, 2864–2875 (2015).
Martin, B. L. & Kimelman, D. Canonical Wnt signaling dynamically controls multiple stem cell fate decisions during vertebrate body formation. Dev. Cell 22, 223–232 (2012).
Tsakiridis, A. et al. Distinct Wnt-driven primitive streak-like populations reflect in vivo lineage precursors. Development 141, 1209–1221 (2014).
Gouti, M. et al. In vitro generation of neuromesodermal progenitors reveals distinct roles for Wnt signalling in the specification of spinal cord and paraxial mesoderm identity. PLoS Biol. 12, e1001937 (2014).
Garriock, R. J. et al. Lineage tracing of neuromesodermal progenitors reveals novel Wnt-dependent roles in trunk progenitor cell maintenance and differentiation. Development 142, 1628–1638 (2015).
Gouti, M. et al. A gene regulatory network balances neural and mesoderm specification during vertebrate trunk development. Dev. Cell 41, 243–261.e7 (2017).
Koch, F. et al. Antagonistic activities of Sox2 and brachyury control the fate choice of neuro-mesodermal progenitors. Dev. Cell 42, 514–526.e7 (2017).
Veenvliet, J. V. et al. Mouse embryonic stem cells self-organize into trunk-like structures with neural tube and somites. Science 370, eaba4937 (2020).
Avilion, A. A. et al. Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 17, 126–140 (2003).
Masui, S. et al. Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat. Cell Biol. 9, 625–635 (2007).
Wood, H. B. & Episkopou, V. Comparative expression of the mouse Sox1, Sox2 and Sox3 genes from pre-gastrulation to early somite stages. Mech. Dev. 86, 197–201 (1999).
Wymeersch, F. J. et al. Position-dependent plasticity of distinct progenitor types in the primitive streak. eLife 5, e10042 (2016).
Mulas, C. et al. Oct4 regulates the embryonic axis and coordinates exit from pluripotency and germ layer specification in the mouse embryo. Development 145, dev159103 (2018).
Kinney, B. A. et al. Sox2 and canonical Wnt signaling interact to activate a developmental checkpoint coordinating morphogenesis with mesoderm fate acquisition. Cell Rep. 33, 108311 (2020).
Boyer, L. A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
Cole, M. F., Johnstone, S. E., Newman, J. J., Kagey, M. H. & Young, R. A. Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells. Genes Dev. 22, 746–755 (2008).
Yi, F. et al. Opposing effects of Tcf3 and Tcf1 control Wnt stimulation of embryonic stem cell self-renewal. Nat. Cell Biol. 13, 762–770 (2011).
Corsinotti, A. et al. Distinct SoxB1 networks are required for naïve and primed pluripotency. eLife 6, e27746 (2017).
Wang, Z., Oron, E., Nelson, B., Razis, S. & Ivanova, N. Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10, 440–454 (2012).
Thomson, M. et al. Pluripotency factors in embryonic stem cells regulate differentiation into germ layers. Cell 145, 875–889 (2011).
Zhang, X., Peterson, K. A., Liu, X. S., McMahon, A. P. & Ohba, S. Gene regulatory networks mediating canonical Wnt signal directed control of pluripotency and differentiation in embryo stem cells. Stem Cells 31, 2667–2679 (2013).
Bergsland, M. et al. Sequentially acting Sox transcription factors in neural lineage development. Genes Dev. 25, 2453–2464 (2011).
Soufi, A. et al. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell 161, 555–568 (2015).
Dodonova, S. O., Zhu, F., Dienemann, C., Taipale, J. & Cramer, P. Nucleosome-bound SOX2 and SOX11 structures elucidate pioneer factor function. Nature 580, 669–672 (2020).
Soufi, A., Donahue, G. & Zaret, K. S. Facilitators and impediments of the pluripotency reprogramming factors’ initial engagement with the genome. Cell 151, 994–1004 (2012).
Malik, V. et al. Pluripotency reprogramming by competent and incompetent POU factors uncovers temporal dependency for Oct4 and Sox2. Nat. Commun. 10, 3477 (2019).
Schep, A. N. et al. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 25, 1757–1770 (2015).
Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011).
Crocker, J. et al. Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160, 191–203 (2015).
Metzis, V. et al. Nervous system regionalization entails axial allocation before neural differentiation. Cell 175, 1105–1118.e17 (2018).
Gaunt, S. J., Drage, D. & Trubshaw, R. C. cdx4/lacZ and cdx2/lacZ protein gradients formed by decay during gastrulation in the mouse. Int. J. Dev. Biol. 49, 901–908 (2005).
Wang, W. C. H. & Shashikant, C. S. Evidence for positive and negative regulation of the mouse Cdx2 gene. J. Exp. Zool. B 308B, 308–321 (2007).
Mendjan, S. et al. NANOG and CDX2 pattern distinct subtypes of human mesoderm during exit from pluripotency. Cell Stem Cell 15, 310–325 (2014).
Teo, A. K. K. et al. Pluripotency factors regulate definitive endoderm specification through eomesodermin. Genes Dev. 25, 238–250 (2011).
Bylund, M., Andersson, E., Novitch, B. G. & Muhr, J. Vertebrate neurogenesis is counteracted by Sox1–3 activity. Nat. Neurosci. 6, 1162–1168 (2003).
Huang, A. & Saunders, T. E. in Current Topics in Developmental Biology Vol. 137 (eds. Small, S. & Briscoe, J.), Ch. 3, 79–117 (Academic Press, 2020).
Jacob, J. et al. Retinoid acid specifies neuronal identity through graded expression of Ascl1. Curr. Biol. 23, 412–418 (2013).
Chen, A. I., de Nooij, J. C. & Jessell, T. M. Graded activity of transcription factor runx3 specifies the laminar termination pattern of sensory axons in the developing spinal cord. Neuron 49, 395–408 (2006).
Sansom, S. N. & Livesey, F. J. Gradients in the brain: the control of the development of form and function in the cerebral cortex. Cold Spring Harb. Perspect. Biol. 1, a002519 (2009).
Urbán, N. et al. Return to quiescence of mouse neural stem cells by degradation of a proactivation protein. Science 353, 292–295 (2016).
Farley, E. K. et al. Suboptimization of developmental enhancers. Science 350, 325–328 (2015).
Geusz, R. J. et al. Sequence logic at enhancers governs a dual mechanism of endodermal organ fate induction by FOXA pioneer factors. Nat. Commun. 12, 6636 (2021).
Li, G. & Widom, J. Nucleosomes facilitate their own invasion. Nat. Struct. Mol. Biol. 11, 763–769 (2004).
Polach, K. J. & Widom, J. Mechanism of protein access to specific DNA sequences in chromatin: a dynamic equilibrium model for gene regulation. J. Mol. Biol. 254, 130–149 (1995).
Workman, J. L. & Kingston, R. E. Nucleosome core displacement in vitro via a metastable transcription factor-nucleosome complex. Science 258, 1780–1784 (1992).
Kalkan, T. et al. Complementary activity of ETV5, RBPJ, and TCF3 drives formative transition from naive pluripotency. Cell Stem Cell 24, 785–801.e7 (2019).
Chen, L. et al. Cross-regulation of the Nanog and Cdx2 promoters. Cell Res. 19, 1052–1061 (2009).
Javali, A. et al. Co-expression of Tbx6 and Sox2 identifies a novel transient neuromesoderm progenitor cell state. Development 144, 4522–4529 (2017).
Iwafuchi-Doi, M. et al. Transcriptional regulatory networks in epiblast cells and during anterior neural plate development as modeled in epiblast stem cells. Development 139, 3926–3937 (2012).
Hart, A. H., Hartley, L., Ibrahim, M. & Robb, L. Identification, cloning and expression analysis of the pluripotency promoting Nanog genes in mouse and human. Dev. Dyn. 230, 187–198 (2004).
Mesnard, D., Guzman-Ayala, M. & Constam, D. B. Nodal specifies embryonic visceral endoderm and sustains pluripotent cells in the epiblast before overt axial patterning. Development 133, 2497–2505 (2006).
Morgani, S., Nichols, J. & Hadjantonakis, A.-K. The many faces of pluripotency: in vitro adaptations of a continuum of in vivo states. BMC Dev. Biol. 17, 7 (2017).
Osorno, R. et al. The developmental dismantling of pluripotency is reversed by ectopic Oct4 expression. Development 139, 2288–2298 (2012).
Tam, P. P., Parameswaran, M., Kinder, S. J. & Weinberger, R. P. The allocation of epiblast cells to the embryonic heart and other mesodermal lineages: the role of ingression and tissue movement during gastrulation. Development 124, 1631–1642 (1997).
Copf, T., Schröder, R. & Averof, M. Ancestral role of caudal genes in axis elongation and segmentation. PNAS 101, 17711–17715 (2004).
Clark, E. & Peel, A. D. Evidence for the temporal regulation of insect segmentation by a conserved sequence of transcription factors. Development 145, dev155580 (2018).
Bonatto Paese, C. L. Investigating the Roles of Hes and Sox Genes during Embryogenesis of the Spider P. tepidariorum. PhD thesis, Oxford Brookes Univ. (2018).
Kearns, N. A. et al. Functional annotation of native enhancers with a Cas9–histone demethylase fusion. Nat. Methods 12, 401–403 (2015).
Amin, S. et al. Cdx and T brachyury co-activate growth signaling in the embryonic axial progenitor niche. Cell Rep. 17, 3165–3177 (2016).
Boroviak, T. et al. Lineage-specific profiling delineates the emergence and progression of naive pluripotency in mammalian embryogenesis. Dev. Cell 35, 366–382 (2015).
Serafimidis, I., Rakatzi, I., Episkopou, V., Gouti, M. & Gavalas, A. Novel effectors of directed and ngn3-mediated differentiation of mouse embryonic stem cells into endocrine pancreas progenitors. Stem Cells 26, 3–16 (2008).
Khan, A. et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266 (2018).
Hahne, F. et al. flowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinformatics 10, 106 (2009).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, 2016).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 38, 276–278 (2020).
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Kurtzer, G. M., Sochat, V. & Bauer, M. W. Singularity: scientific containers for mobility of compute. PLoS ONE 12, e0177459 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Barnett, D. W., Garrison, E. K., Quinlan, A. R., Strömberg, M. P. & Marth, G. T. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691–1692 (2011).
Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Landt, S. G. et al. ChIP–seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Bailey, T. L. et al. MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).

Acknowledgements

We are grateful to T. Frith, K. Ivanovitch and M. Melchionda for experimental support and critical feedback during manuscript preparation. We also thank A. Sagner and other members of the lab for their generosity sharing insight, expertise and reagents, and the Crick Science Technology Platforms, in particular the Advanced Sequencing Facility, Flow Cytometry Facility, and the Bioinformatics and Biostatistics group. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK, the UK Medical Research Council and Wellcome Trust (all under FC001051); and by the European Research Council under European Union (EU) Horizon 2020 research and innovation program grant 742138. This research was funded in whole, or in part, by the Wellcome Trust (FC001051). For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Vicki Metzis
Present address: Institute of Clinical Sciences, Imperial College London, London, UK

Authors and Affiliations

The Francis Crick Institute, London, UK
Robert Blassberg, Harshil Patel, Thomas Watson, Vicki Metzis, M. Joaquina Delás & James Briscoe
Stem Cell Modelling of Development & Disease Group, Max Delbrück Center for Molecular Medicine, Berlin, Germany
Mina Gouti

Contributions

R.B. and J.B. conceived the project, interpreted the data and wrote the manuscript with input from all authors. R.B. designed and performed experiments and data analysis. H.P. performed bioinformatic analysis. T.W. generated reagents. M.G. shared reagents, protocols and data unpublished at the time of initiating this study. V.M. provided advice and assistance with ATAC experiments. M.J.D. assisted with ATAC experiments and bioinformatic analyses and edited the manuscript.

Corresponding author

Correspondence to James Briscoe.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cell Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 In vitro differentiated CEpiLCs recapitulate in vivo epiblast cell gene expression programmes.

(a) Epiblast-like cells differentiated in FL medium recapitulate in vivo gene-expression dynamics⁶⁵. Illustrative genes for each cluster are highlighted. Gene expression profiles from 3 biological replicates are shown. See Methods for details of the genes plotted. (b) Flow cytometry analysis shows that culture in FLC medium reduces SOX2 levels and induces T/BRA. (c) Measurement of relative mRNA expression by RT-qPCR shows that FLC medium induces the caudal epiblast markers Cdx1, Cdx2 and Cdx4, and (d) posterior Hox genes. Normalised mean expression is shown relative to peak expression. Line is a loess fit to the data. The number of biological replicates averaged for each point is given in the source numerical data, available in source data.
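The normalisation described for panels c and d (mean expression shown relative to peak expression) can be illustrated with a minimal Python sketch. It assumes relative expression values are already grouped by time point; the function name and input values are hypothetical, and the loess fit is not reproduced.

```python
from statistics import mean


def normalise_to_peak(replicates_by_timepoint):
    """Average replicate measurements at each time point, then divide by the
    largest of those averages so that peak expression equals 1.0.

    replicates_by_timepoint: dict mapping a time point (hypothetical units)
    to a list of relative expression values from biological replicates.
    """
    means = {t: mean(values) for t, values in replicates_by_timepoint.items()}
    peak = max(means.values())
    if peak <= 0:
        raise ValueError("peak mean expression must be positive")
    return {t: m / peak for t, m in means.items()}


# Hypothetical values for one gene at three time points (hours).
print(normalise_to_peak({0: [1.0, 3.0], 24: [4.0, 4.0], 48: [1.0, 1.0]}))
# {0: 0.5, 24: 1.0, 48: 0.25}
```

Scaling to the peak mean (rather than to a fixed reference sample) puts genes with very different absolute levels on a common 0 to 1 axis, which is what allows their temporal profiles to be compared in one panel.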

Extended Data Fig. 2 SOX2 maintains pluripotency of SOX2-TetON ES cells.

(a) Flow cytometry analysis shows that SOX2-TetON cells cultured in the presence of doxycycline (SOX2-ON) express SOX2 at levels comparable to pluripotent stem cells. (b) Brightfield micrograph showing that SOX2-ON cells maintain an undifferentiated morphology in ‘2i’ medium. (c) Flow cytometry analysis shows that SOX2 levels are progressively reduced following removal of Dox from SOX2-TetON cells (SOX2-OFF). (d) Brightfield micrograph showing that SOX2-OFF cells cultured in ‘2i’ medium lose their undifferentiated morphology. (e) Immunofluorescence images showing that SOX2-OFF cells cultured in ‘2i’ lose expression of the pluripotency markers OCT4 and NANOG and (f) induce the primitive streak marker T/BRA in the absence of SOX2 expression. Images shown in panels B, D, E and F are representative of 3 independent experiments. Scale bars, 75 µm. (g) Measurement of relative mRNA expression by RT-qPCR shows that SOX2-OFF cells induce expression of the EpiLC marker Sox3 at levels similar to WT EpiLCs. Measurement of relative mRNA expression by RT-qPCR shows that ablation of Sox3 in SOX2-OFF cells (SOX2-OFF, SOX3⁻) results in loss of expression of the pluripotency marker Pou5f1 (h) and the EpiLC marker Fgf5 (i). (j) FACS analysis shows that SOX2 levels in SOX2-OFF cells are intermediate between SOX2-low CEpiLCs and SOX2-negative paraxial mesoderm progenitors. (k) Measurement of relative mRNA expression by RT-qPCR shows that Sox3 levels are reduced in CHIR-stimulated SOX2-OFF cells compared to WT CEpiLCs. Each data point in G, H, I and K represents an individual biological replicate. Bars denote mean ± s.e.m. P values calculated for differences of mean expression by two-tailed Student’s t-test are shown. In G, n = 12 for ESC, n = 4 for EpiLC and n = 4 for SOX2-OFF samples. In H, n = 6 for EpiLC, n = 4 for SOX2-OFF and n = 6 for SOX2-OFF, SOX3⁻ samples. In I, n = 10 for EpiLC, n = 4 for SOX2-OFF and n = 6 for SOX2-OFF, SOX3⁻ samples. In K, n = 4 for CEpiLC and n = 4 for SOX2-OFF samples. Source numerical data are available in source data.
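Panels G–K report mean ± s.e.m. and P values from two-tailed Student’s t-tests. Purely to illustrate these statistics, the sketch below computes the s.e.m. and a pooled-variance (equal-variance) two-sample t-test with the Python standard library; the two-tailed P value is obtained from the regularised incomplete beta function, evaluated by a continued fraction. The function names are hypothetical and this is not the authors’ analysis code.

```python
import math
from statistics import mean, stdev, variance


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularised_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def students_t_test(x, y):
    """Two-tailed two-sample Student's t-test with pooled (equal) variances.

    Needs at least two values per group and a non-zero pooled variance.
    Returns (t statistic, two-tailed P value).
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    # P(|T| > |t|) for T ~ t(df) equals I_{df / (df + t^2)}(df / 2, 1 / 2).
    p = regularised_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p
```

The pooled-variance form corresponds to the “Student’s” test named in the caption; Welch’s test would instead use unpooled variances with Welch–Satterthwaite degrees of freedom.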

Extended Data Fig. 3 SOX2-OFF and SOX2-ON adopt distinct identities in response to WNT signaling.

Measurement of relative mRNA expression by RT-PCR shows that expression of (a) the paraxial mesoderm marker (Tbx6), and (b) the somitic mesoderm marker (Meox1), is reduced in SOX2-OFF compared to CEpiLCs. (c) GO analysis reveals that genes specifically upregulated in SOX2-OFF compared to EpiLC and CEpiLC (from clustering in Fig. 2c) are enriched for biological processes indicative of an early/anterior streak identity. (d) Measurement of relative mRNA expression by RT-qPCR shows that induction of the paraxial mesoderm marker (Tbx6) is repressed in SOX2-ON cultured in FLC medium. (e) Measurement of relative mRNA expression by RT-qPCR shows that Sox1 is not induced in SOX2-ON cultured in FLC medium. (f) SOX2-ON re-express markers associated with pluripotency when cultured in FLC medium. Triplicate FPKM values for each gene are shown. In A, B, D, E each data point represents an individual biological replicate. Bars denote mean ± s.e.m. P values calculated for differences of mean expression by two-tailed Student’s t-test are shown. In A n = 14 for 96hr samples. In B n = 4 for paraxial mesoderm and SOX2-OFF. In D n = 14 for CEpiLC and n = 8 for SOX2-ON at 96hr. In E n = 12 for Spinal cord, n = 14 for paraxial mesoderm, and n = 8 for SOX2-ON at 96hr. Source numerical data are available in source data.

Extended Data Fig. 4 SOX2 and β-catenin co-occupy differential peaks.

(a) Comparison of SOX2 and β-catenin ChIP-seq peaks identified in this study with those identified in published data¹². (b) Our consensus SOX2 and β-catenin peak sets include the majority of the peaks identifiable from the published data¹², plus a large number of additional peaks. (c) Peaks differentially occupied by β-catenin (FDR < 0.05) between ES cells and CEpiLCs exhibit a correlated change in SOX2 occupancy. FDR was determined by the DESeq2 padj metric from n = 3 biological replicates; log2FC was calculated by DESeq2. (d) Differentially occupied β-catenin peaks overlap with differentially occupied SOX2 peaks to a greater extent than with peaks occupied by other chromatin-associated proteins profiled by ENCODE. The number of peaks in each set is shown. (e) Metaplots of triplicate SOX2 and (f) β-catenin ChIP-seq signals at cell-type-specific SOX2-bound CREs classified in Fig. 3e. Metaplots show mean ± s.e.m.
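Differential occupancy throughout these captions is called at FDR < 0.05 using the DESeq2 padj value. By default DESeq2 computes padj with the Benjamini–Hochberg procedure, after an independent-filtering step that this sketch omits. The minimal Python sketch below shows the Benjamini–Hochberg adjustment and the thresholding step; the function names, peak identifiers and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    padj_(i) = min over j >= i of p_(j) * n / j, capped at 1, where p_(j)
    is the j-th smallest p-value.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each value is a cumulative minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differential_peaks(peak_ids, pvalues, fdr=0.05):
    """Return peak identifiers whose adjusted p-value is below the FDR threshold."""
    padj = benjamini_hochberg(pvalues)
    return [pid for pid, q in zip(peak_ids, padj) if q < fdr]


# Hypothetical peaks and raw p-values.
print(differential_peaks(["peak1", "peak2", "peak3", "peak4"],
                         [0.01, 0.04, 0.03, 0.005], fdr=0.03))
# ['peak1', 'peak4']
```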

Extended Data Fig. 5 TCF/LEF are redistributed with SOX2 and β-catenin.

The majority of (a) LEF1, (b) TCF7L1, and (c) TCF7L2 differential peaks (FDR <0.05) overlap SOX2 differential peaks. FDR was determined by DESeq2 padj metric from n=3 biological replicates. The majority of (d) LEF1, (e) TCF7L1, and (f) TCF7L2 differential peaks (FDR <0.05) overlap β-catenin differential peaks. FDR was determined by DESeq2 padj metric from n=3 biological replicates. (g) LEF1, (h) TCF7L1, and (i) TCF7L2 exhibit a similar cell-type specific pattern of occupancy as β-catenin at SOX2 differential peaks classified in Fig. 3e (compare to Fig. 3e,f). Metaplots of triplicate (j) LEF1, (k) TCF7L1 and (l) TCF7L2 ChIP-seq signals at cell-type specific SOX2 bound peaks classified in Fig. 3e. Metaplots show mean ± s.e.m.
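Statements such as “the majority of LEF1 differential peaks overlap SOX2 differential peaks” reduce to the fraction of intervals in one peak set that intersect any interval in another. BEDTools, cited in the references, is a standard tool for this operation; the sketch below is a standard-library stand-in that assumes BED-style half-open (chrom, start, end) coordinates and a minimum overlap of 1 bp, neither of which is specified in this caption. All names and coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Group half-open (chrom, start, end) intervals by chromosome, sorted by
    start, with a running maximum of end coordinates for fast overlap tests."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_end, running = [], float("-inf")
        for _, e in ivs:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)
    return index


def overlaps(peak, index):
    """True if the peak shares at least 1 bp with any indexed interval."""
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    k = bisect_left(starts, end)  # intervals [0, k) start before this peak ends
    return k > 0 and max_end[k - 1] > start


def fraction_overlapping(query_peaks, reference_peaks):
    """Fraction of query peaks that overlap at least one reference peak."""
    index = build_index(reference_peaks)
    hits = sum(overlaps(p, index) for p in query_peaks)
    return hits / len(query_peaks)


ref = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
query = [("chr1", 150, 160), ("chr1", 200, 300), ("chr1", 550, 700), ("chr3", 0, 10)]
print(fraction_overlapping(query, ref))  # 0.5
```

Because coordinates are half-open, the query (chr1, 200, 300) only touches (chr1, 100, 200) and is not counted as overlapping.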

Extended Data Fig. 6 SOX2-ON express spinal-cord markers at low levels.

(a) Differentially expressed genes associated with CEpiLC specific cluster 1 CREs are enriched for biological processes underlying the patterning of the anterior-posterior axis. (b) Differentially expressed genes associated with SOX2-ON specific cluster 3 CREs are enriched for biological processes related to nervous system development. Differential expression criteria for genes analysed in A and B = FDR < 0.05 as determined by DESeq2 padj metric from n=3 biological replicates. (c) Cluster 3 associated genes with GO terms related to nervous system development from (b) that are expressed at high levels in spinal-cord (SC) neural progenitors exhibit comparatively low expression in SOX2-ON, ES cells, EpiLCs and CEpiLCs. Illustrative genes for each cluster are highlighted. Spinal cord progenitor data reanalysed from⁹.
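The GO enrichment statements in panels a and b rest on testing whether genes annotated to a term are over-represented in a gene list relative to a background. The tool and background used are described in the Methods, not in this caption; as an assumption, the sketch below uses the one-sided hypergeometric test that is commonly applied to this problem, with hypothetical function and argument names.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """One-sided enrichment P value, P(X >= k), for X ~ Hypergeometric.

    N: genes in the background; K: background genes annotated to the GO term;
    n: genes in the query list; k: query genes annotated to the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 40 of 200 query genes carry a term that annotates
# 500 of 20,000 background genes (about 5 expected by chance).
print(hypergeom_sf(40, 200, 500, 20000))
```

Python’s arbitrary-precision integers keep the binomial coefficients exact, so only the final division is rounded.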

Extended Data Fig. 7 SOX2 promotes nucleosome eviction from peaks containing high scoring motifs.

(a) Metaplots of ATAC-seq data at cell-type-specific SOX2-bound CREs classified in Fig. 3e. Naïve ATAC-seq data reanalysed from⁶³. Metaplots show mean ± s.e.m. (b) SOX2 occupancy in CEpiLCs and EpiLCs at cell-type-specific CREs classified in Fig. 3e. (c) Percentage of peaks in each cluster with at least one SOX2 motif with the indicated number of mismatches. (d) Average number of motifs per peak in each cluster with at most the indicated number of mismatches. (e) Nucleosome occupancy and SOX2 ChIP occupancy profiles of individual cluster 3 peaks ranked by their total FIMO motif score. Higher-scoring peaks show a higher intensity of SOX2 ChIP signal and a greater degree of nucleosome depletion at SOX2 peak centres in SOX2-ON cells.
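Panels c and d summarise, per cluster, how many peaks carry a SOX2 motif within a given number of mismatches and how many such motifs each peak contains. The scoring in panel e used FIMO (MEME Suite) with position weight matrices; the sketch below instead illustrates the simpler consensus-with-mismatches count on both strands. The 7-mer consensus, the toy sequences and all function names are hypothetical and chosen for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_motif_hits(seq, consensus, max_mismatches):
    """Count windows matching the consensus on either strand with at most
    max_mismatches mismatches (a window matching both strands counts once)."""
    seq = seq.upper()
    k = len(consensus)
    rc = reverse_complement(consensus)
    hits = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if min(hamming(window, consensus), hamming(window, rc)) <= max_mismatches:
            hits += 1
    return hits


def summarise_cluster(peak_seqs, consensus, max_mismatches):
    """Return (% of peaks with >= 1 hit, mean hits per peak) for one cluster."""
    counts = [count_motif_hits(s, consensus, max_mismatches) for s in peak_seqs]
    pct_with_hit = 100.0 * sum(c > 0 for c in counts) / len(counts)
    mean_hits = sum(counts) / len(counts)
    return pct_with_hit, mean_hits


# Hypothetical consensus and toy peak sequences, for illustration only.
SOX_CONSENSUS = "CATTGTT"
peaks = ["GGCATTGTTGG", "AACAATGCC", "GGGGGGGGG"]
for mm in (0, 1, 2):
    print(mm, summarise_cluster(peaks, SOX_CONSENSUS, mm))
```

Allowing more mismatches admits progressively weaker matches, which is one simple proxy for the lower-affinity sites discussed in the text.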

Extended Data Fig. 8 SOX2 and β-catenin co-occupy cell-type specific CREs with T/BRA and CDX2.

(a) SOX2 and β-catenin occupancy is correlated with T/BRA and CDX2 occupancy in CEpiLCs at cell-type-specific CREs classified in Fig. 3e. Metaplots show mean ± s.e.m. (b) A subset of low-affinity SOX2 motifs within cluster 1 SOX2-bound CREs are located in close proximity to CDX motifs. (c) A subset of low-affinity SOX2 motifs within cluster 2 and cluster 4 SOX2-bound CREs are located in close proximity to T/BRA motifs. Differential occupancy of (d) SOX2 and (e) β-catenin at T/BRA co-occupied peaks in T/BraKO progenitors compared to CEpiLCs. Peaks coloured red are differentially occupied between conditions (FDR < 0.05). FDR was determined by the DESeq2 padj metric from n = 3 biological replicates. Labelled peaks are adjacent to genes differentially expressed in T/BraKO progenitors. (f) SOX2 and β-catenin occupancy is specifically reduced at CREs adjacent to paraxial mesoderm determinants in T/BraKO progenitors. Average SOX2 (g), β-catenin (h) and LEF1 (i) occupancy is reduced at cluster 1 CDX2 co-occupied peaks. Average SOX2 (j), β-catenin (k) and LEF1 (l) occupancy at cluster 4 CDX2 co-occupied peaks. Average (m) SOX2, (n) β-catenin and (o) LEF1 occupancy at T/BRA co-occupied cluster 2 peaks in CEpiLCs and T/BraKO progenitors. Average (p) SOX2, (q) β-catenin and (r) LEF1 occupancy at T/BRA co-occupied cluster 4 peaks in CEpiLCs and T/BraKO progenitors. (s) Average nucleosome occupancy across all clusters of SOX2 differential peaks is unchanged in CdxKO and T/BraKO progenitors compared to CEpiLCs (reanalysed from³⁵).
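The metaplots in these panels show mean ± s.e.m. across peaks. Assuming each peak’s signal has already been extracted into the same number of bins centred on the peak (for example with deepTools, cited in the references), the per-bin summary is a column-wise mean and s.e.m., sketched below with a hypothetical function name. Whether replicates were averaged before or after this step is not stated in the caption.

```python
import math
from statistics import mean, stdev


def metaplot(profiles):
    """Per-bin (mean, s.e.m.) across peaks.

    profiles: list of equal-length per-peak signal vectors, each centred on
    the peak summit (binning is assumed to have been done upstream).
    """
    if len(profiles) < 2:
        raise ValueError("need at least two peaks to estimate an s.e.m.")
    n_bins = len(profiles[0])
    if any(len(p) != n_bins for p in profiles):
        raise ValueError("all profiles must have the same number of bins")
    summary = []
    for b in range(n_bins):
        column = [p[b] for p in profiles]
        summary.append((mean(column), stdev(column) / math.sqrt(len(column))))
    return summary


# Two hypothetical peaks, three bins each.
for bin_mean, bin_sem in metaplot([[0.0, 2.0, 0.0], [2.0, 4.0, 2.0]]):
    print(f"{bin_mean:.2f} ± {bin_sem:.2f}")
```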

Extended Data Fig. 9 Repression of SOX2, Sox3 and CDX2 expression in early-streak progenitors.

(a) Flow cytometry analysis shows that CDX2 expression is uniformly absent from SOX2-OFF and SOX2-ON cells cultured in FLC medium. (b) Measurement of relative mRNA expression by RT-qPCR shows that early primitive streak markers are induced to comparable levels by the addition of activin (10 ng/ml) to FLC medium and in SOX2-OFF cells, relative to CEpiLCs. (c) Flow cytometry analysis shows SOX2 levels in EpiLCs, CEpiLCs, pluripotent cells and early-streak progenitors. CHIR concentration was 5 μM in 2i+ conditions. (d) Measurement of relative mRNA expression by RT-qPCR shows that Sox3 levels are reduced in early-streak progenitors induced by the addition of activin to FLC medium. (e) Flow cytometry analysis shows that the distribution of CDX2 expression is similar, whereas (f) Venus fluorescence is reduced, in two Sox2del subclones analysed in Fig. 7f compared to the parental Cdx2 CRE reporter line. (g) Measurement of relative mRNA expression by RT-PCR shows that primitive streak marker expression is reduced in CEpiLCs and SOX2-OFF cells cultured in FLC medium plus SB-431542. In B, D and G each data point represents an individual biological replicate. Bars denote mean ± s.e.m. P values calculated for differences of mean expression by two-tailed Student’s t-test are shown. In B, for Bra n = 26 for CEpiLCs and n = 8 for early streak progenitors; for Eomes, MixL1 and Nanog n = 14 for CEpiLCs and n = 8 for early streak progenitors. In D, n = 5 for CEpiLCs and early streak progenitors. In G, for Bra n = 29 for CEpiLCs, n = 6 for SB-treated CEpiLCs, n = 20 for SOX2-OFF and n = 6 for SB-treated SOX2-OFF; for Eomes, MixL1 and Nanog n = 20 for CEpiLCs, n = 6 for SB-treated CEpiLCs, n = 12 for SOX2-OFF and n = 6 for SB-treated SOX2-OFF. Source numerical data are available in source data.

Extended Data Fig. 10 Representative FACS gating strategy used throughout the study.

Strategy applied to gate single cells plotted in Fig. 1f,h; Extended Data Figures 1B, 2A, 2C, 2J, 9A, 9C, 9E, 9F; and for analysis of average reporter activity in Fig. 7d–f.

Supplementary information

Reporting Summary

Peer Review File

Supplementary Table

Supplementary Tables 1–10.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Blassberg, R., Patel, H., Watson, T. et al. Sox2 levels regulate the chromatin occupancy of WNT mediators in epiblast progenitors responsible for vertebrate body formation. Nat Cell Biol 24, 633–644 (2022). https://doi.org/10.1038/s41556-022-00910-2

Received: 08 January 2021
Accepted: 29 March 2022
Published: 12 May 2022
Issue Date: May 2022
DOI: https://doi.org/10.1038/s41556-022-00910-2
Springer Nature Limited
