Embryonic stem cells (ESCs), the pluripotent stem cells derived from the inner cell mass of a blastocyst, provide a vital tool for studying the regulation of early embryonic development and cell fate decision and hold the promise for regenerative medicine [1]. The past few years have witnessed remarkable progress in understanding the ESC fate decision, i.e. either pluripotency maintenance (self-renewal) or proper differentiation [2]. The underlying mechanisms have been largely expanded from the core pluripotent transcription factors (TFs) [3], signaling pathways [4,5,6,7,8,9], specific microRNAs [10, 11], and long non-coding RNAs [12] to alternative pre-messenger RNA (mRNA) splicing (AS) [13, 14], histone modifications (HMs) [15,16,17,18,19], and cell-cycle machinery [20]. These emerging mechanisms suggest their intricate interrelations and potential joint contributions to ESC pluripotency and differentiation, which, however, remain unknown.

Alternative splicing (AS) is one of the most important pre-mRNA processing to increase the diversity of transcriptome and proteome in tissue-dependent and development-dependent manners [21]. The estimates based on RNA-sequencing (RNA-seq) revealed that up to 94%, 60%, and 25% of genes in human, Drosophila melanogaster, and Caenorhabditis elegans, respectively, undergo AS [21,22,23,24,25]. AS also provides a powerful mechanism to control the developmental decision in ESCs [26,27,28]. Specific isoforms are necessary to maintain both the identity and activity of stem cells and switching to different isoforms ensures proper differentiation [29]. In particular, the AS of TFs plays major roles in ESC fate determination, such as FGF4 [30] and FOXP1 [13] for hESC, and Tcf3 [14] and Sall4 [31] for mouse ESCs (mESCs). Understanding the precise regulations on AS would contribute to the elucidation of ESC fate decision and has attracted extensive efforts [32]. For many years, studies aiming to shed light on this process focused on the RNA level, characterizing the manner by which splicing factors (SFs) and auxiliary proteins interact with splicing signals, thereby enabling, facilitating, and regulating RNA splicing. These cis-acting RNA elements and trans-acting SFs have been assembled into splicing code [33], revealing a number of AS regulators critical for ESC differentiation, such as MBNL [34] and SON [28]. However, these genetic controls are far from sufficient to explain the faithful regulation of AS [35], especially in some cases that tissue-specific AS patterns exist despite the identity in sequences and ubiquitous expression of involved SFs [36, 37], indicating additional regulatory layers leading to specific AS patterns. As expected, we are increasingly aware that splicing is not an isolated process; rather, it occurs co-transcriptionally and is presumably also regulated by transcription-related processes. Emerging provocative studies have unveiled that AS is subject to extensive controls not only from genetic but also epigenetic mechanisms due to its co-transcriptional occurrence [38]. The epigenetic mechanisms, such as HMs, benefit ESCs by providing an epigenetic memory for splicing decisions so that the splicing pattern could be passed on during self-renewal and be modified during differentiation without the requirement of establishing new AS rules [38].

HMs have long been thought to play crucial roles in ESC maintenance and differentiation by determining what parts of the genome are expressed. Specific genomic regulatory regions, such as enhancers and promoters, undergo dynamic changes in HMs during ESC differentiation to transcriptionally permit or repress the expression of genes required for cell fate decision [15]. For example, the co-occurrence of the active (H3K4me3) and repressive (H3K27me3) HMs at the promoters of developmentally regulated genes defines the bivalent domains, resulting in the poised states of these genes [39]. These poised states will be dissolved upon differentiation to allow these genes to be active or more stably repressed depending on the lineage being specified, which enables the ESCs to change their identities [40]. In addition to above roles in determining transcripts abundance, HMs are emerging as major regulators to define the transcripts structure by determining how the genome is spliced when being transcribed, adding another layer of regulatory complexity beyond the genetic splicing code [41]. A number of HMs, such as H3K4me3 [42], H3K9me3 [43], H3K36me3 [44, 45], and hyperacetylation of H3 and H4 [46,47,48,49,50], have been proven to regulate AS by either directly recruiting chromatin-associated factors and SFs or indirectly modulating transcriptional elongation rate [38]. Together, these studies reveal that HMs determine not only what parts of the genome are expressed, but also how they are spliced. However, few studies focused on the detailed mechanisms, i.e. epigenetic regulations on AS in the context of cell fate decision.

Additionally, cell-cycle machinery dominates the mechanisms underlying ESC pluripotency and differentiation [20, 51]. Changes of cell fates require going through the cell-cycle progression. Studies in mESCs [52] and hESCs [53, 54] found that the cell fate specification starts in the G1 phase when ESCs can sense differentiation signals. Cell fate commitment is only achieved in G2/M phases when pluripotency is dissolved through cell-cycle-dependent mechanisms. However, whether the HMs and AS and their interrelations are involved in these cell-cycle-dependent mechanisms remains unclear. Therefore, it is intuitive to expect that HMs could contribute to ESC pluripotency and differentiation by regulating the AS of genes required for specific processes, such cell-cycle progression. Nevertheless, we do not even have a comprehensive view of how HMs relate to AS outcome at a genome-wide level during ESC differentiation. Therefore, further studies are required to elucidate the extent to which the HMs are associated with specific splicing repertoire and their joint contributions to ESC fate decision between self-renewal and proper differentiation.

To address these gaps in current knowledge, we performed genome-wide association studies between transcriptome and epigenome of the differentiation from the hESCs (H1 cell line) to five differentiated cell types [15]. These cells cover three germ layers for embryogenesis, adult stem cells, and adult somatic cells, representing multiple lineages of different developmental levels (Additional file 1: Figure S1A). This carefully selected dataset enabled our understanding of AS epigenetic regulations in the context of cell fate decision. First, we identified several thousands of AS events that are differentially spliced between the hESCs and differentiated cells, including 3513 mutually exclusive exons (MXE) and 3678 skipped exons (SE) which were used for further analyses. These hESC differentiation-related AS events involve ~ 20% of expressed genes and characterize the multiple lineage differentiation. Second, we profiled 16 HMs with chromatin immunoprecipitation sequencing (ChIP-seq) data available for all six cell types, including nine types of acetylation and seven types of methylation. Following the observation that the dynamic changes of most HMs are enriched in AS exons and significantly different between inclusion-gain and inclusion-loss exons, we found that three of the 16 investigated HMs (H3K36me3, H3K27ac, and H4K8ac) are strongly associated with 52.8% of hESC differentiation-related AS exons. We then linked the association between HMs and AS to cell-cycle progression based on the additional discovery that the AS genes predominantly function in cell-cycle progression. More intriguingly, we found that HMs and AS are associated in G2/M phases and involved in ESC fate decision through promoting pluripotency state dissolution, repressing self-renewal, or both. In particular, with experimental valuations, we demonstrated an H3K36me3-regulated isoform switch from PBX1a to PBX1b, which is implicated in hESC differentiation by attenuating the activity of the pluripotency regulatory network. Collectively, we presented a mechanism conveying the HM information into cell fate decision through the regulation of AS, which will drive extensive studies on the involvements of HMs in cell fate decision via determining the transcript structure rather than only the transcript abundance.


AS characterizes hESC differentiation

The role of AS in the regulation of ES cell fates adds another notable regulatory layer to the known mechanisms that govern stemness and differentiation [55]. To screen the AS events associated with ES cell fate decision, we investigated a panel of RNA-seq data during hESC (H1 cell line) differentiation [15]. We considered four cell types directly differentiated from H1 cells, including trophoblast-like cells (TBL), mesendoderm (ME), neural progenitor cells (NPC), and mesenchymal stem cells (MSC). We also considered IMR90, a cell line for primary human fetal lung fibroblast, as an example of terminally differentiated cells. These cells represent five cell lineages of different developmental levels (Additional file 1: Figure S1A). We identified thousands of AS events of all types with their changes of “per spliced in” (ΔPSIs) are > 0.1 (inclusion-loss) or < − 0.1 (inclusion-gain), and with the false discovery rates (FDRs) are < 0.05 based on the measurement used by rMATS [56] (Additional file 1: Figure S1B and Table S1, see “Methods”). We implemented further analyses only on the most common AS events, including 3513 MXEs and 3678 SEs, which are referred to as hESC differentiation-associated AS exons (Additional file 1: Figure S1C and Additional file 2: Table S2).

These hESC differentiation-related AS exons possess typical properties, as previously described [57, 58], as follows: (1) most of their hosting genes are not differentially expressed between hESCs and differentiated cells (Additional file 1: Figure S1D); (2) they tend to be shorter with much longer flanking introns compared to the average length of all exons and introns (RefSeq annotation), respectively (Additional file 1: Figure S2A, B); (3) the arrangement of shorter AS exons surrounded by longer introns is consistent across cell lineages and AS types (Additional file 1: Figure S2C, D); and (4) the lengths of AS exons are more often divisible by three to preserve the reading frame (Additional file 1: Figure S2E).

During hESC differentiation, about 20% of expressed genes undergo AS (2257 genes for SE and 2489 genes for MXE), including previously known ESC-specific AS genes, such as the pluripotency factor FOXP1 [13] (Fig. 1a) and the Wnt/β-catenin signalling component CTNND1 [14] (Fig. 1b). These hESC differentiation-related AS genes include many TFs, transcriptional co-factors, chromatin remodelling factors, housekeeping genes, and bivalent domain genes implicated in ESC pluripotency and development [39] (Fig. 1c and Additional file 1: Figure S1C). Enrichment analysis based on a stemness gene set [59] also shows that hESC differentiation-related AS genes are enriched in the regulators or markers that are most significantly associated with stemness signatures of ESCs (Additional file 1: Figure S3A, see “Methods”).

Fig. 1
figure 1

AS characterizes the hESC differentiation. a, b Sashimi plots show two AS events of previously known ESC-specific AS events, FOXP1 (a) and CTNND1 (b). Inset histograms show the PSIs (Ψ) of the AS exons in all cell types based on the MISO estimation. c The bar graph shows that the number of total AS events and lineage-specific AS events increase coordinately with the developmental levels. Higher developmental level induces more (lineage-specific) AS events. MXE.sp. and SE.sp. indicate the percentage of lineage-specific AS events. d Heat maps show the differential “percent splice in” (ΔPSIs) of SE (left) and MXE (right) AS events (rows) for each cell lineage (columns). For MXE event, the ΔPSIs are of the upstream exons. e, f The hosting genes of MXE (e) and SE (f) AS events characterize cell lineages. Black and white bars refer to the common AS genes shared by all cell lineages, while the colour bars indicate the lineage-specific AS genes. The length of the colour bars is proportional to the percentage of lineage-specific genes. Dark fills indicate the inclusion-gain events, while light fills indicate the inclusion-loss events. The numbers in the bars are the proportion of corresponding parts; the numbers in the parentheses are the numbers of common AS genes or lineage-specific AS genes of each lineage. Gain or loss for MXE events refers to the upstream exons. Also see Additional file 1: Figures S1–S3

Clustering on AS events across cell lineages show lineage-dependent splicing patterns (Fig. 1d). Upon hESC differentiation, the SE exons tend to lose their inclusion levels (inclusion-loss), while the upstream exons of MXE events are likely to gain their inclusion levels (inclusion-gain) (Fisher’s exact test, p = 3.83E-107). The numbers of AS events increase accordingly with the developmental level following hESC differentiation (Fig. 1c). For example, the differentiation to ME involves the fewest AS events and ME presents the most stem-cell-like AS profiles, while the IMR90 has the most AS events and exhibits the most similar AS profiles to adult cells (Fig. 1c, d). Inter-lineage comparisons show, on average, that 42.0% of SE and 56.4% of MXE events (Fig. 1c, d and Additional file 1: Figure S3B, C), involved in 29.6% and 38.6% of AS hosting genes (Fig. 1e, f and Additional file 1: Figure S3D, E), are lineage-specific. In contrast, only 0.65% of SE and 0.14% of MEX events (Additional file 1: Figure S3B, C), involved in 0.49% and 1.52% of AS hosting genes, are shared by all lineages (Fig. 1e, f and Additional file 1: Figure S3D, E). Similar trends are observed from pairwise comparisons (Additional file 1: Figure S3F). Furthermore, one-third of AS genes (n = 881) have both MXE and SE events (Additional file 1: Figure S3G). Only four genes are common across all cell lineages and AS types, of which the AS events of Ctnnd1 and Mbd1 have been reported to regulate mESC differentiation [14]. Together, these results demonstrate that AS depicts lineage-dependent and developmental level-dependent characterizations of hESC differentiation.

Dynamic changes of HMs predominantly occur in AS exons

In ESCs, epigenetic mechanisms contribute mainly to maintaining the expression of pluripotency genes and the repression of lineage-specific genes in order to avoid exiting from stemness. Upon differentiation, epigenetic mechanisms orchestrate the expression of developmental programs spatiotemporally to ensure the heritability of existing or newly acquired phenotypic states. Though epigenetic signatures are mainly found to be enriched in promoters and enhancers, it has become increasingly clear that they are also present in gene bodies, especially in exon regions, implying a potential link of epigenetic regulation to pre-mRNA splicing [60, 61]. Consistent with previous reports [36, 37, 62], we also observed that few involved SFs are differentially expressed during H1 cells differentiation (Additional file 1: Figure S3H, see “Methods”), which confirms the existence of an additional layer of epigenetic regulations on AS. However, the extents to which the AS is epigenetically regulated and how these AS genes contribute to the cell fate decision are poorly understood. We focused on 16 HMs, including nine histone acetylation and seven histone methylation that have available data in all six cell types (see “Methods”) and aimed to reveal their associations with AS genes during hESC differentiation.

To investigate whether the dynamic changes of these HMs upon cell differentiation prefer the AS exons consistently (Fig. 2a, b), we profiled the differential HM patterns of around the hESC differentiation-associated AS exons and the same number of randomly selected constitutive splicing (CS) exons of the same AS genes for each differentiation lineage. We compared the changes of ChIP-seq reads count (normalized Δ reads count, see “Methods”) in ± 150-bp regions around the splice sites upon hESC differentiation (Fig. 2c and Additional file 1: Figure S4, see “Methods”). Except for a small part of cases (with black dots or boxes in Fig. 2d), most HMs changed more significantly around AS exons than around constitutive exons upon hESC differentiation (Mann–Whitney–Wilcoxon test, p ≤ 0.05, Fig. 2d and Additional file 1: Figure S4). Nevertheless, some HMs displayed strong links to AS, such as H3K79me1 and H3K36me3, while others only had weak link strengths, such as H3K27me3 and H3K9me3 (Fig. 2d). This result is consistent with the fact that the former are involved in active expression and AS regulation [38, 44, 63], while the latter are the epigenetic marks of repressed regions and heterochromatin [64]. The link strengths are presented as the -log10 p values to test whether the HM changes consistently prefer the AS exons across different cell lineages and AS types (Fig. 2d sidebar graph, see “Methods”). Taken together, these results, from a global view, revealed a potential regulatory link from HMs to RNA splicing, of which some are strong while the others are weak.

Fig. 2
figure 2

Dynamic changes of HMs predominantly occur in AS exons. a, b Genome browser views of representative H3K36me3 changes in MXE (exemplified as FGFR2) and SE (exemplified as CDC27) events, respectively, showing that the changes of H3K36me3 around the AS exons (blue shading) are more significant than around the flanking constitutive exons (gray shading) in 4 H1-derived cell types and IMR90. The tracks of H1 are duplicated as yellow shadings overlapping with other tracks of the derived cells (green) for a better comparison. c Representative profiles of HM changes (normalized Δ reads number) around the AS exons and randomly selected constitutive splicing (CS) exons upon hESC differentiation, shown as the average of all cell lineages pooled together. The ± 150-bp regions (exons and flanking introns) of the splice sites were considered and 15 bp-binned to produce the curves. It shows that the changes of HMs are more significant around AS exons than around constitutive exons, especially in exonic regions (gray shading). The p values, Mann–Whitney–Wilcoxon test. d The statistic significances for changes of all 16 HMs in all cell lineages and pooling them together (pooled), represented as the -log10 p values based on Mann–Whitney–Wilcoxon test. The detailed profiles are provided in Additional file 1: Figure S4. Black boxes indicate the cases that HMs around constitutive exons change more significantly than around AS exons, corresponding to the red-shaded panels in Additional file 1: Figure S4. Sidebars represent the significances whether the changes of HMs are consistently enriched in AS exons across cell lineages, showing the link strength between AS and HMs and represented as the -log10 p value based on Fisher’s exact test. The yellow vertical line indicates the significance cutoff of 0.05. Also see Additional file 1: Figure S4

Three HMs are significantly associated with AS upon hESC differentiation

To quantitatively associate the HMs with AS, all ChIP-seq data were processed for narrow peak calling using MACS2 [65]. For each AS exon of each differentiation lineage, we then quantified the differential inclusion levels, i.e. the changes of “percent splice in” (ΔPSIs, Additional file 1: Figure S1B), and the differential HMs signals, i.e. the changes of normalized narrow peak height of ChIP-seq (ΔHMs, Additional file 1: Figure S5A, see “Methods”) between H1 and differentiated cells. We observed significant differences in all HM profiles (except H3K27me3, Additional file 1: Figure S5B) between the inclusion-gain and inclusion-loss exons across cell lineages and AS types (Mann–Whitney–Wilcoxon test, p ≤ 0.05) (Fig. 3a and Additional file 1: Figure S5B). However, three independent correlation tests showed only weak global quantitative associations between the ΔPSIs and ΔHMs for some HMs (Fig. 3c and Additional file 1: Figure S5C), including eight HMs for MXE AS exons and eight HMs for SE AS exons. The weak associations may indicate that only subsets of AS exons are strongly associated with HMs and vice versa, which is consistent with a recent report [66].

Fig. 3
figure 3

A subset of HMs and AS are strongly associated upon hESC differentiation. a Representative profiles of HM (H3K36me3) changes (normalized Δ reads number) around the inclusion-gain (red lines) and inclusion-loss (blue lines) AS exons, as well as randomly selected constitutive splicing (CS) exons (black lines) for both MXE (left) and SE (right) AS events. It shows that HM changes are significantly different between inclusion-gain and inclusion-loss AS exons (p values, Mann–Whitney–Wilcoxon test). Additional file 1: Figure S5B provides the whole significances of all HMs across AS types and cell lineages. b Pearson correlation test between differential HM signals (ΔHMs) and differential inclusion levels (ΔPSIs), taking H3k36me3 as an example. Additional file 1: Figure S5C provides the correlation test results of other HMs based on two more tests. c A representative k-means cluster shows a subset of SE AS events having a negative correlation between the ΔPSIs and the ΔHMs of H3K36me3. Additional file 1: Figures S5D and S6 provide all the clustering results. d Scatter plot shows that HM-associated AS events display significant correlations between the ΔPSIs and the ΔHMs upon hESC differentiation, taking H3K27ac–associated (positively) MXE events as an example. Also see Additional file 1: Figures S5, S6

To explore the subsets of highly associated AS exons and corresponding HMs, we performed k-means clustering on the sets of inclusion-gain and inclusion-loss exons of SE and MXE events, separately, taking the ΔHMs of eight identified HMs as epigenetic features (Fig. 3c and Additional file 1: Figures S5D and S6, see “Methods”). We obtained three subsets of HM-associated SE exons and three subsets of HM-associated MXE exons (Additional file 3: Table S3). The three HM-associated SE subsets include 180, 664, and 1062 exons and are negatively associated with H4K8ac (Additional file 1: Figure S6), negatively associated with H3K36me3 (Fig. 3c), and positively associated with H3K36me3 (Additional file 1: Figure S6), respectively. The three HM-associated MXE subsets include 99, 821, and 971 exons and are positively associated with H3K27ac (Fig. 3d), negatively associated with H3K36me3 (Additional file 1: Figure S6), and positively associated with H3K36me3 (Additional file 1: Figure S6), respectively. The exons of each subset show significant correlations between their ΔPSIs and ΔHMs upon hESC differentiation (Fig. 3d). These HM-associated AS exons account for an average of 52.8% of hESC differentiation-related AS events, on average (Additional file 1: Figure S5E).

Of the three AS-associated HMs, H3K36me3 has both positive and negative correlations with AS exons. This is consistent with the fact that H3K36me3 has dual functions in regulating AS through two different chromatin-adapter systems, PSIP1/SRSF1 [45] and MRG15/PTBP1 [44]. The former increases the inclusion levels of targeting AS exons, whereas the latter decreases the inclusion levels [38]. As expected, 139 and 11 of our identified H3K36me3-associated AS genes have been reported to be regulated by SRSF1 [67, 68] (Additional file 1: Figure S5F) and PTBP1 [69] (Additional file 1: Figure S5G), respectively. Taken together, our analysis showed that more than half (52.8%) of hESC differentiation-associated AS events are significantly associated with three of 16 HMs during hESC differentiation, including H3K36me3, H3K27ac, and H4K8ac.

HM-associated AS genes predominantly function in G2/M phases to facilitate hESC differentiation

Epigenetic mechanisms have been proposed to be dynamic and play crucial roles in human ESC differentiation [15, 16]. Given the aforementioned associations between HMs and AS, and the well-established links between AS and hESC differentiation, we hypothesized that the three HMs (H3K36me3, H3K27ac, and H4K8ac) may contribute to stem cell differentiation through their associated AS events. To test our hypothesis and gain more insights into the differences between the HM-associated and HM-unassociated AS events, we performed comparative function analyses between their hosting genes, revealing that HMs are involved in alternatively splicing the core components of cell-cycle machinery and related pathways to regulate stem cell pluripotency and differentiation.

We found that HMs prefer to be associated with even shorter AS exons (Additional file 1: Figure S7A, p < 0.001, Student’s t-test), though AS exons are shorter than the average length of all exons (Additional file 1: Figure S2A). HM-associated genes (n = 2125) show more lineage specificity, i.e. more genes (49.76% vs 29.6% of MXE or 38.6% of SE genes) are lineage-specific (Additional file 1: Figures S7B and S3D, E), regardless of whether IMR90 is included or not (Additional file 1: Figure S7C). Only a few HM-associated genes are shared by different cell lineages, even in pairwise comparisons (Additional file 1: Figure S7D); the most common shared genes are lineage-independent housekeeping genes (Additional file 1: Figure S7E). These suggest that HM-associated AS genes contribute more to lineage specificity. In addition, the HM-associated AS genes (966 of 2125) are more enriched in stemness signatures than unassociated AS genes (429 of 1057) (Fig. 4a). TF binding enrichment analysis shows that HM-associated AS genes are likely to be regulated by TFs involved in cell differentiation, whereas HM-unassociated AS genes are likely to be regulated by TFs involved in cell proliferation and growth (Fig. 4b). All these results suggest that HM-associated and HM-unassociated AS genes function differently during hESC differentiation.

Fig. 4
figure 4

HM-associated AS genes predominantly function in G2/M cell-cycle phases contributing to hESC differentiation. a HM-associated AS genes are enriched more significantly in stemness signatures than HM-unassociated AS genes. b TF binding enrichment shows that HM-associated AS genes prefer to be regulated by TFs involved in cell differentiation, while the HM-unassociated AS genes are prone to be regulated by TFs involved in cell proliferation and growth. c GO enrichment analysis shows that HM-associated AS genes are enriched more significantly in cell-cycle progression than HM-unassociated AS genes, shown as the -log10 p values after FDR (≤ 0.05) adjustment. d The significant enrichment of HM-associated AS genes in the cell cycle are consistent across cell lineages, with the MSC as an exception that no significant enrichment was observed. e The top 20 enriched functions show that HM-associated AS genes involved in cell-cycle progression prefer to function in G2/M phases and DNA damage response. f The canonical pathway enrichment shows that AMT/ATR-mediated DNA damage response is the top enriched pathway of HM-associated AS genes. The vertical lines (yellow) indicate the significance cutoff of 0.05. Also see Additional file 1: Figures S7, S8

Gene Ontology (GO) enrichment analysis shows that more than half of the HM-associated AS genes (1120 of 2125) function in cell-cycle progression and exhibit more significant enrichment than do HM-unassociated AS genes (376 of 1057, Fig. 4c, d and Additional file 1: Figure S8A). The significance of the top enriched GO term (GO:0007049, cell cycle) is consistent across cell lineages, although HM-associated AS genes exhibit more lineage specificity and few of them are shared among lineages (Additional file 1: Figures S7B–D and S8B). These results suggest the involvement of HMs in AS regulation of the cell-cycle machinery that has been reported to be exploited by stem cells to control their cell fate decision [20].

Further study of the top enriched cell-cycle AS genes (Fig. 4d and Additional file 1: Figure S8A) shows that HM-associated (n = 282) and HM-unassociated AS genes (n = 150) play roles in different cell-cycle phases and related pathways. The former is prone to function in G2/M phases and DNA damage response (Fig. 4e, f). This indicates that HMs contribute to cell differentiation, at least partially, via AS regulations in these phases, which is consistent with the fact that inheritance of HMs in daughter cells occurs during the G2 phases [20]. The latter play roles in G1 phase, cell-cycle arrest, and Wnt/β-catenin signalling (Additional file 1: Figure S8C, D). Since cell fate choices seem to occur or at least be initiated during G1/S transition [53], while cell fate commitment is achieved in G2/M [54], it could be rational for stem cells to change their identity during the G2 phase when HMs are reprogrammed [20].

Intriguingly, the top enriched pathway of HM-associated AS genes is “ATM/ATR-mediated DNA damage response,” which is activated in S/G2 phases and has been recently reported as a gatekeeper of the pluripotency state dissolution (PSD) that participates in allowing hESC differentiation [54]. Together with our previous results [19], it suggests the presence of a combinational mechanism involving HMs and AS, wherein HMs facilitate the PSD and cell fate commitment by alternatively splicing the key components of the ATM/ATR pathway. Additionally, many cell-cycle TF genes are involved in the top enriched HM-associated AS gene set. The pre-B-cell leukaemia transcription factor 1 (PBX1) is one of these genes that contribute to cell-cycle progression and is discussed later in next section. Taken together, we suggest that three of 16 HMs function in positive or negative ways affect the AS of subsets of genes and further contribute to hESC differentiation in a cell-cycle phase-dependent manner. The results suggest a potential mechanistic model connecting the HMs, AS regulations, and cell-cycle progression with the cell fate decision.

Splicing of PBX1 links H3K36me3 to hESC fate decision

The past few years have identified key factors required for maintaining the pluripotent state of hESCs [70, 71], including NANOG, OCT4 (POU5F1), SOX2, KLF4, and c-MYC, the combination of which was called Yamanaka factors and sufficient to reprogram somatic cells into induced pluripotent stem cells (iPSCs) [72]. These factors appear to activate a transcriptional network that endows cells with pluripotency [73]. The above integrative analyses showed strong links between three HMs and RNA splicing, revealing a group of epigenetic regulated AS genes involved in cell-cycle machinery. PBX1 was one of the genes that their ASs are positively associated with H3K36me3 (Fig. 5a, b). Its protein is a member of the TALE (three-amino acid loop extension) family homeodomain transcription factors [74, 75] and well-known for its functions in lymphoblastic leukaemia [76,77,78,79] and several cancers [80,81,82,83,84,85,86,87,88,89]. PBX1 also plays roles in regulating developmental gene expression [90], maintaining stemness and self-renewal [80, 91, 92], and promoting the cell-cycle transition to the S phase [93]. Additionally, multiple lines of evidence obtained from in vivo and in vitro highlighted its functions as a pioneer factor [86, 94]. However, few studies have distinguished the distinct functions of its different isoforms.

Fig. 5
figure 5

Isoform switch from PBX1a and PBX1b during hESC differentiation. a Genome browser view shows the AS event and H3K36me3 signals of PBX1 upon hESC differentiation. The green horizontal bars below the ChIP-seq tracks indicate the narrow peaks called by MACS2. b The inclusion level for exon 7 of PBX1 is significantly correlated to the H3K36me3 signals over this exon across cell lineages. c The sequence difference of three protein isoforms of PBX1 and the main functional domains. d The relative expressions of PBX1a and PBX1b in 56 cells/tissues, representing the differential expressions of two isoforms in three groups based on their developmental states. e The expression levels of NANOG and OCT4 genes are negatively correlated with the expression of PBX1b. f The expression levels of PSIP1 and SRSF1 show significant positive correlations with the expression level of PBX1a. Also see Additional file 1: Figures S9, S10

PBX1 has three isoforms [95], including PBX1a, PBX1b, and PBX1c (Fig. 5c and Additional file 1: Figure S9A). PBX1a and PBX1b are produced by the AS of exon 7 (Fig. 5a) and attract most of the research attention of PBX1. PBX1b retains the same DNA-binding homeodomain as PBX1a, but changes 14 amino acids (from 334 to 347) and truncates the remaining 83 amino acids at the C-terminus of PBX1a (Fig. 5c and Additional file 1: Figure S9A). This C-terminal alteration of PBX1a has been reported to affect its cooperative interactions with HOX partners [96], which may impart different functions to these two isoforms. We here revealed its H3K36me3-regulated isoform switch between PBX1a and PBX1b, which functions at the upstream of pluripotency transcriptional network to link H3K36me3 with ESC fate decision.

We first observed differential transcript expressions of these two isoforms between the hESCs and differentiated cells, wherein PBX1a was predominantly transcribed in hESCs, while PBX1b was predominantly induced in differentiated cells (Fig. 5a and Additional file 1: Figure S9B). The same trend was also observed in an extended dataset of 56 human cell lines/tissues (Fig. 5d) from the Roadmap [97] and ENCODE [98] projects (Additional file 4: Table S4). Additionally, we did not observe significantly different expression of the total PBX1 and three other PBX family members across cell types (Additional file 1: Figure S9C, fold change < 2), indicating that the isoform switch of PBX1, rather than the differential expression of its family members, plays more important roles during hESC differentiation. To further test the possible mechanism by which PBX1b contributes to stem cell differentiation, we investigated the transcription levels of Yamanaka factors. Of these TFs, the NANOG is activated by direct promoter binding of PBX1 and KLF4, which is essential for stemness maintenance [91, 99]. Consistently, all these core factors are repressed in differentiated cells where PBX1b is highly expressed (Additional file 1: Figure S9D–G), even though the PBX1a is expressed. Based on the 56 human cell lines/tissues, we also observed significant negative correlations between expression of most important pluripotent factors (NANOG and OCT4) and PBX1b (Fig. 5e), as well as positive correlations between these two factors and PBX1a (or inclusion level of exon 7, Additional file 1: Figure S10A, B). Consistent with previous reports showing that the PBX1a and PBX1b differ in their ability to activate or repress the expression of reporter genes [100, 101], we hypothesize that PBX1a promotes the activity of the pluripotent regulatory network by promoting the expression of NANOG, whereas PBX1b may attenuate this activity by competitively binding and regulating the same target gene, since PBX1b retains the same DNA-binding domain as PBX1a. These observations are strongly suggestive that the switch from PBX1a to PBX1b is a mechanism by which PBX1 contributes to hESC differentiation via regulating the pluripotency regulatory network.

Exon 7 of PBX1 shows significantly positive correlations between its inclusion levels (PSIs) and the surrounding epigenetic signals of H3K36me3 in hESCs and differentiated cells (Fig. 5b). It suggests a potential role of H3K36me3 in regulating the isoform switch between PBX1a and PBX1b. To investigate the regulatory role of H3K36me3, we focused on two previously proved chromatin-adaptor complexes, MRG15/PTBP1 [44] and PSIP1/SRSF1 [45], which endow dual functions to H3K36me3 in AS regulation [38]. Based on the 56 cell lines/tissues from the Roadmap/ENCODE projects, we first found significant positive correlations between the expressions of PBX1a (or inclusion level of exon 7) and PSIP1/SRSF1 (Fig. 5f), but not with MRG15/PTBP1 (Additional file 1: Figure S10C, D). This result suggests that the AS of PBX1 is epigenetic regulated by H3K36me3 through the PSIP1/SRSF1 adaptor system, which was strongly supported by a recent report using the HeLa cell lines [67]. The overexpression of SRSF1 in Hela cells introduces a PSI increase of 0.18 for exon 7 of PBX1 (chr1: 164789308–164,789,421 based on NCBI37/hg19 genome assembly) based on the RNA-seq (Table S1 of [67]). Additionally, this exon was one of the 104 AS exons that were further validated using radioactive reverse transcription polymerase chain reaction (RT-PCR) (Table S2 of [67]). Their results showed that exon 7 of PBX1 is indeed a splicing target of SRSF1, supporting our conclusions.

We then validated the above hypotheses on MSCs and IM90 cells, since these two cells types show the most significant difference from H1 cells regarding our hypotheses (Fig. 5b). We cultured H1 cells, IMR90 cells, and induced H1 cells to differentiate into MSCs (H1-MSCs, see “Methods” for details). Additionally, we also included other two sources of MSCs, including one derived from human bone marrow (hBM-MSCs) and the other derived from adipose tissue (hAT-MSCs) (see “Methods” for details). Consistent with the results from RNA-seq, the same expression patterns of Yamanaka factors in H1, MSCs, and IMR90 cells were observed using quantitative RT-PCR (qRT-PCR) and western blot (Fig. 6a), which confirmed the pluripotent state of H1 cells and the differentiated states of other cell types. We then detected the isoform switch from PBX1a to PBX1b in our cultured cells, which are consistent both in mRNA and protein levels (Fig. 6b and Additional file 1: Figure S10E) and further confirmed by the western blot using PBX1b-specific antibody (anti-PBX1b) (Fig. 6b bottom and Additional file 1: Figure S10E iii). These results have verified that the PBX1b was significantly induced in differentiated cells, where the PBX1a was significantly reduced.

Fig. 6
figure 6

Isoform switch of PBX1 links H3K36me3 to hESC fate decision. a qRT-PCR and western blot show the expression levels of Yamanaka factors in H1, MSC, and IMR90 cells. Whiskers denote the standard deviations of three replicates. b RT-PCR and western blot show the isoform switches between PBX1a and PBX1b from H1 cells to differentiated cells. c i. ChIP-PCR shows the differential binding of PBX1b to NANOG promoter in H1 cells and differentiated cells; ii. ChIP-PCR shows the reduced H3K36me3 signal in differentiated cells; iii. ChIP-PCR shows the differential recruitment of PSIP1 to exon 7 of PBX1. d RIP-PCR show the differential recruitment of SRSF1 around exon 7 of PBX1. e Co-IP shows the overall physical interaction between PSIP1 and SRSF1 in all studied cell types. f The mechanism by which H3K36me3 is linked to cell fate decision by regulating the isoform switch of PBX1, which functions upstream of the pluripotency regulatory network. Also see Additional file 1: Figures S9, S10

We also validated the mechanism by which the splicing of PBX1 links H3K36me3 to stem cell fate decision. We first confirmed that PBX1b also binds to the promoter of NANOG at the same region where PBX1a binds to and the binding signals (ChIP-PCR) were high in the differentiated cells but very low in H1 stem cells (Fig. 6c i and Additional file 1: Figure S10F i). Consistent with the results from ChIP-seq, we also observed reduced H3K36 tri-methylation around exon 7 of PBX1 based on ChIP-PCR assay (Fig. 6c ii and Additional file 1: Figure S10F ii). Furthermore, the chromatin factor PSIP1 only showed high binding signal in H1 stem cells (Fig. 6c iii and Additional file 1: Figure S10F iii), which recruit the SF SRSF1 to the PBX1 exclusively in H1 stem cells (Fig. 6d and Additional file 1: Figure S10G) even though the physical binding between these two factors were universally detected in all cell types (Fig. 6e). All these experimental results suggested that, upon differentiation, stem cells reduced the H3K36 tri-methylation and may attenuate the recruitment of PSIP1/SRSF1 adaptor around exon 7 of PBX1, leading to the exclusion of exon 7 and highly expressed PBX1b in differentiated cells. High expression of PBX1b may attenuate the activity of PBX1a in promoting the pluripotency regulatory network.

Taken together, we suggested that H3K36me3 regulates the AS of PBX1 via the PSIP1/SRSF1 adaptor system, leading the isoform switch from PBX1a to PBX1b during hESC differentiation. Subsequently, PBX1b competitively binds to NANOG and abolishes the bindings of PBX1a. This competitive binding attenuates the pluripotency regulatory network to repress self-renewal and consequently facilitate differentiation (Fig. 6f). These findings revealed how the PBX1 contributes to cell fate decision and exemplify the mechanism by which AS links HMs to stem cell fate decision.


ESCs provide a vital tool for studying the regulation of early embryonic development and cell fate decision. [1]. In addition to the core pluripotency regulatory network, emerging evidence revealed other processes regulating ESC pluripotency and differentiation, including HMs, AS, cell-cycle machinery, and signalling pathways [54]. Here, we connected these previously separate avenues of investigations, beginning with the discovery that three of 16 HMs are significantly associated with more than half of AS events upon hESC differentiation. Further analyses implicated the association of HMs, AS regulation, and cell-cycle progression with hESC fate decision. Specifically, HMs orchestrate a subset of AS outcomes that play critical roles in cell-cycle progression via the related pathways, such as ATM/ATR-mediated DNA response [19], and TFs, such as PBX1 (Additional file 1: Figure S10H). In this way, HMs, AS regulation, and signalling pathways are converged into the cell-cycle machinery that has been claimed to rule the pluripotency [20].

Although epigenetic signatures, such as HMs, are mainly enriched in promoters and other intergenic regions, it has become increasingly clear that they are also present in the gene body, especially in exon regions. This indicates a potential link between epigenetic regulation and the predominantly co-transcriptional occurrence of AS. Thus far, H3K36me3 [44, 45], H3K4me3 [42], H3K9me3 [43], and the acetylation of H3 and H4 [46,47,48,49,50] have been revealed to regulate AS, either via the chromatin-adapter systems or by altering Pol II elongation rate. Here, we investigated the extent to which the HMs could associate with AS by integrative analyses on both transcriptome and epigenome data during hESC differentiation. We found that three HMs are significantly associated with about half of AS events. By contrast, a recent report showed that only about 4% of differentially regulated exons among five human cell lines are positively associated with three promoter-like epigenetic signatures, including H3K9ac, H3K27ac, and H3K4m3 [66]. Like that report, we also found a positive association of H3K27ac with a subset of AS events. However, our results differ regarding the other two HMs that we identified to be associated with AS.

In our study, H3K36me3 is associated with the most identified HM-associated AS events, either positively or negatively. It is reasonable since H3K36me3 is a mark for actively expressed genomes [63] and it has been reported to have dual roles in AS regulations through two different chromatin-adapter systems, PSIP1/SRSF1 [44] and MRG15/PTBP1 [45]. SRSF1 is a SF which will increase the inclusion of targeted AS exons and PTBP1 will decrease the inclusion levels of the regulated AS exons. Therefore, the exons that are regulated by the PSIP1/SRSF1 adapter system will show positive correlations with the H3K36me3, while the exons regulated by MRG15/PTBP1 will show negative correlations. Our results are consistent with this fact and show both direction correlations between different sets of AS events and H3K36me3. Many of these AS events in our study have been validated by other studies (Additional file 1: Figure S5F, G).

H4K8ac is associated with the fewest number of AS events in our results. Although rarely studied, H4K8ac is known to act as a transcriptional activator in both the promoters and transcribed regions [102]. Its negative association with AS is supported by the finding that it recruits proteins involved in increasing the Pol II elongation rate [103]. This suggests that H4K8ac may function in AS regulation by altering the Pol II elongation rate, rather than via the chromatin-adaptor systems. However, further studies are required. Collectively, possible reasons for the different results between this study and others [66] could be that: (1) we considered differentiation from hESCs to five different cell types, which covered more inter-lineage AS events than the previous report; or (2) different sets of epigenetic signatures were considered, which may lead to relatively biased results in both studies. Obviously, the inclusion of more cell lineages and epigenetic signatures may reduce this bias. Therefore, an extended dataset of 56 cell lines/tissues was included in our study and the observations support our results.

Our study also extended the understanding that HMs contribute to cell fate decision via determining not only what parts of the genome are expressed, but also how they are spliced [38]. We demonstrated that the HM-associated AS events have a significant impact on cell fate decision in a cell-cycle-dependent manner. The most intriguing discovery is that the HM-associated genes are enriched in G2/M phases and predominantly function in ATM/ATR-mediated DNA response. Evidentially, the ATM/ATR-mediated checkpoint has been recently revealed to attenuate pluripotency state dissolution and serves as a gatekeeper of the pluripotency state through the cell cycle [54]. The cell cycle has been considered the hub machinery for cell fate decision [20] since all commitments will go through the cell-cycle progression. Our study expanded such machinery by linking the HMs and AS regulation to cell-cycle pathways and TFs, which, together, contribute to cell fate decision (Additional file 1: Figure S10H).

We also exemplified our hypothesized mechanism by an H3K36me3-regulated isoforms switch of PBX1. In addition to its well-known functions in lymphoblastic leukaemia [76,77,78,79] and a number of cancers [80,81,82,83,84,85,86,87,88,89], PBX1 was also found to promote hESC self-renewal by corporately binding to the regulatory elements of NANOG with KLF4 [99]. We found that the transcriptions of two isoforms of PBX1, PBX1a and PBX1b, are regulated by H3K36me3 during hESC differentiation. Their protein isoforms competitively bind NANOG and the binding of PBX1b will abolish the binding of PBX1a, which further attenuates the activity of the core pluripotency regulatory network composed of Yamanaka factors. The switch from PBX1a to PBX1b is modulated by H3K36me3 via the PSIP1/SRSF1 adapter system [45]. Our results were also supported by an extended dataset of 56 cell lines/tissues from the Roadmap/ENCODE projects. Collectively, our findings expanded understanding of the core transcriptional network by adding a regulatory layer of HM-associated AS (Fig. 6f).

A very recent report showed that the switch in Pbx1 isoforms was regulated by Ptbp1 during neuronal differentiation in mice [104], indicating a contradiction that the AS of PBX1 should be negatively regulated by H3K36me3 via the MRG15/PTBP1 [44]. Our study also included the neuronal lineage and showed that differentiation to NPC is an exception, distinct from other lineages. If NPC is considered separately, the results are consistent with the recent report [104] showing that NPCs and mature neurons express increasing levels of PBX1a rather than PBX1b (Additional file 1: Figure S9B). Another recent report showed that PBX1 was a splicing target of SRSF1 in the HeLa cell line [67], which strongly supports our findings. Taken together, these evidence suggests that there are two parallel mechanisms regulating PBX1 isoforms in embryonic development, in which neuronal differentiation adopts a mechanism that is different from other lineages.

Finally, it is worth noting that both our work and other studies [66] reported that HMs cannot explain all AS events identified either during ESC differentiation or based on pairwise comparisons between cell types. Moreover, bidirectional communication between HMs and AS has been widely reported. For instance, the AS can enhance the recruitment of H3K36 methyltransferase HYPB/Set2, resulting in significant differences in H3K36me3 around the AS exons [66]. These findings increased the complexity of defining the cause and effect between HMs and AS. Nevertheless, our findings suggest that at least a subset of AS events are regulated epigenetically, similar to the way that epigenetic states around the transcription start sites define what parts of the genome are expressed. Additionally, as we described in our previous study, the AS outcomes may be estimated more precisely by combining splicing cis-elements and trans-factors (i.e. genetic splicing code) and HMs (i.e. epigenetic splicing code), as an “extended splicing code” [19]. Taken together, we presented a mechanism conveying the HM information into cell fate decision through the AS of cell-cycle factors or the core components of pathways that controlling cell-cycle progression (Additional file 1: Figure S10H).


We performed integrative analyses on transcriptome and epigenome data of the hESCs, H1 cell line, and five differentiated cell types, demonstrating that three of 16 HMs were strongly associated with half of AS events upon hESC differentiation. We proposed a potential epigenetic mechanism by which some HMs contribute to ESC fate decision through the AS regulation of specific pathways and cell-cycle genes. We experimentally validated one cell-cycle-related transcription factor, PBX1, which demonstrated that AS provides a mechanism conveying the HM information into the regulation of cell fate decisions (Fig. 6f). Our study will have a broad impact on the field of epigenetic reprogramming in mammalian development involving splicing regulations and cell-cycle machinery.


Identification of AS exons upon hESC differentiation

Aligned BAM files (hg18) for all six cell types (H1, ME, TBL, MSC, NPC, and IMR90) were downloaded from the source provided in reference. Two BAM files (replicates) of each cell type were analyzed using rMATS (version 3.0.9) [56] and MISO (version 0.4.6) [105]. The rMATS was used to identify AS exons based on the differential PSI (Ψ) values between each differentiated cell type and H1 cells (Additional file 1: Figure S1B). The splicing changes (ΔPSIs or ΔΨ) are used to identify the AS events between H1 and other cell types. A higher cutoff is always helpful in reducing the false positive while compromising the sensitivity. The cutoff, |ΔΨ| ≥ 0.1 or |ΔΨ| ≥ 10%, is widely accepted and used in AS identification [25, 106,107,108]. Many other studies even used 0.05 as the cutoff [109,110,111,112,113]. We did additional correlation analyses based on different ΔPSI cutoffs (0.1, 0.2, 0.3, 0.4, and 0.5). With the increase of the cutoffs, the number of AS events was significantly reduced (Additional file 1: Figure S11A); however, the correlations ware only slightly increased between AS and some HMs (Additional file 1: Figure S11B, C, upper panels), i.e. no consistent impacts of the cutoffs on the correlations were observed. Similarly, the correlation significances were also not consistently affected (Additional file 1: Figure S11B, C, lower panels). Therefore, in our study, only AS exons which hold the |ΔPSI| ≥ 0.1, p value ≤ 0.01, and FDR ≤ 5%, were considered as final hESC differentiation-related AS exons.

All identified AS event types are summarized in Additional file 1: Table S1. Finally, two types of AS exons, namely SE and MXE, which are most common and account for major AS events, were used for subsequent analysis (Additional file 2: Table S2). MISO was used to estimate the confidence interval of each PSI and generate Sashimi graphs [114] (see Figs. 1a, b and 5a). To match the ChIP-seq analysis, genomic coordinates of identified AS events were lifted over to hg19 using LiftOver tool downloaded from UCSC.

ChIP-seq data process and HM profiling

ChIP-seq data (aligned BAM files, hg19) were downloaded from Gene Expression Omnibus (GEO, accession ID: GSE16256). This dataset includes the ChIP-seq reads of up to 24 types of HMs for six cell types (H1, ME, TBL, MSC, NPC, and IMR90). Among these, nine histone acetylation modifications (H2AK5ac, H2BK120ac, H2BK5ac, H3K18ac, H3K23ac, H3K27ac, H3K4ac, H3K9ac, and H4K8ac) and seven histone methylation modifications (H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me1, and H3K9me3) are available for all six cell types and therefore were used for our analyses.

To generate global differential profiles of HM changes between AS exons and constitutive exons upon hESC differentiation, for each MXE and SE AS events, we first randomly selected the constitutive splicing (CS) exons from the same genes, composing a set of CS exons. We then considered the HM changes in a ± 150-bp region flanking both splice sites of each AS and CS exon, i.e. a 300-bp exon-intron boundary region. Each region was 15 bp-binned. Alternatively, for a few cases where the exon or intron is < 150 bps, the entire exonic or intronic region was evenly divided into 10 bins. This scaling allows combining all regions of different lengths to generate uniform profiles of HM changes around the splice sites (see Fig. 2c and Additional file 1: Figure S4). To this end, we calculated the sequencing depth-normalized Δ reads number for each binned region between H1 cells and differentiated cells as follows:

$$ \varDelta\ reads\ number=\frac{\mathrm{reads}\ \mathrm{number}\ \mathrm{of}\ \mathrm{H}1\ \mathrm{cells}-\mathrm{reads}\ \mathrm{number}\ \mathrm{of}\ \mathrm{differentiated}\ \mathrm{cells}}{\mathrm{bin}\ \mathrm{size}\ \mathrm{in}\ \mathrm{bps}} $$

Each region is assigned a value representing the average Δ reads number between H1 cells and differentiated cells for each HM. We also compared HM profiles between the inclusion-gain and inclusion-loss exons (Fig. 3a and Additional file 1: Figure S5B) using the same strategy. The statistical results (Fig. 2c and Additional file 1: Figure S5B) are presented as the p values based on Mann–Whitney–Wilcoxon tests (using the R package).

To quantitatively link HMs to AS upon hESC differentiation, the ChIP-seq data were further processed by narrow peak calling. For each histone ChIP-seq dataset, the MACS v2.0.10 peak caller ( [65, 115] was used to compare ChIP-seq signal to a corresponding whole cell extract (WCE) sequenced control to identify narrow regions of enrichment (narrow peak) that pass a Poisson p value threshold of 0.01. All other parameters (options) were set as default. We then compared the HM signals between H1 cells and differentiated cells. We defined the “differential HM signals (ΔHMs)” as the difference of the normalized peak signals (i.e. the heights of the narrow peaks) between H1 and the differentiated cells. Because the 3′ splice sites (3′ ends of the introns) determine the inclusion of the downstream AS exons [116] and the distances from the peaks to their target sites affect their regulatory effects [117], we normalize the peak signals against the distance (in kb) between the peak summits and 3′ splice sites (Additional file 1: Figure S5A). Since there is no evidence showing that distal HMs could regulate the AS, we only considered local peaks with at least 1 bp overlapping on either side of the AS exon. For exons without overlapping peaks, peak signals of these exons were set to zero. For the exons there are more than one overlapping peaks, the peak signals of these exons were set to the greater ones. For MXE events, only the upstream AS exons were considered due to their exclusiveness in inclusion level between these two exons, i.e. the sum of the PSIs for 2 exons of an MXE event is always 1.

Association studies and k-means clustering

To quantitatively estimate associations between HMs and AS, we first used three independent correlation tests, including Pearson correlation (PC), multiple linear regression (MLR), and logistic regression (LLR), to test global correlations between AS events and each of 16 HMs based on differential inclusions (ΔPSIs) and differential HM signals (ΔHMs). PC was performed using the R package (stats, cor.test(),method = ‘pearson’). MLR and LLR were calculated using Weka 3 [118], wherein the ΔHMs are independent variables and ΔPSIs are response variables. The results show that only some HMs correlate with AS, and most correlations are weak (Additional file 1: Figure S5C). HMs that have significant correlations (p ≤ 0.05) with AS were used for further clustering analysis, through which we identified six subsets of HM-associated AS events (Additional file 3: Table S3).

K-means clustering was performed separately on inclusion-gain and inclusion-loss AS events of MXE and SE, based on the selected HM signatures (Additional file 1: Figure S5C, checked with a “√”). K was set to 6 for all clustering processes (Additional file 1: Figure S5D), which produced the minimal root mean square error (RMSE) for most cases based on a series of clustering with k in the range of 2–8 (data not shown). Then the two clusters that generate mirror patterns, of which one was from inclusion-gain events and one was from inclusion-loss events, were combined to be considered as a subset of HM-associated AS events (Additional file 1: Figure S6). Finally, we identified six subsets of HM-associated AS events displaying significantly positive or negative correlations with three HMs, respectively.

Gene expression quantification

For each cell type, two aligned BAM files (replicates) were used to estimate the expression level of genes using Cufflinks [119]. Default parameters (options) were used. The expression level of each gene was presented as FPKM for each cell type. Differentially expressed genes (DEGs) were defined as those genes whose fold changes between the differentiated cells and hESCs are > 2. Specifically for DEG analysis of SF genes, we collected a list of 47 ubiquitously expressed SFs with “RNA splicing” in their GO annotations from AmiGO 2 [120]. The enrichment significances in Additional file 1: Figures S1D and S3H are shown as the p values based on hypergeometric tests, using the DEGs of all expressed genes (with the FPKM ≥ 1 in at least hESCs or one of the differentiated cell types) as background. We found that the AS genes are generally not differentially expressed between hESCs and differentiated cells, indicating that they function in hESC differentiation via isoform switches rather than expression changes. Few SF genes show differential expression between hESCs and differentiated cells, indicating the existence of epigenetic control of AS, rather than the direct control on SFs expression.

Genome annotations

Since the RNA-seq reads (BAM) files and ChIP-seq read (BAM) files downloaded from the public sources were mapped to different human genome assemblies, NCBI36/hg18 (Mar. 2006) and GRCh37/hg19 (February 2009), respectively, we downloaded two version of gene annotations (in GTF formats) from the UCSC Table Browser [121]. The hg18 GTF file was used for rMATS and MISO to identify AS during the differentiation from H1 ESCs into five differentiated cells. The hg19 GTF file was used to define the genome coordinates of AS exons and further for ChIP-seq profile analysis (Figs. 2a–c, 3a, and Additional file 1: Figures S4 and S5A). We compared exonic and intronic lengths based on hg18 annotation (Additional file 1: Figure S2).

Gene Ontology enrichment analysis

The GO enrichment analysis was performed using ToppGene [122] by searching the HGNC Symbol database under default parameters (p value method: probability density function). Overrepresented GO terms for the GO domain “biological process” were used to generate data shown in Fig. 4c–e and Additional file 1: Figure S8A–C using either the FDR (0.05) adjusted p value or the enriched gene numbers (Additional file 1: Figure S8A).

Canonic pathway enrichment analysis

Both the HM-associated (n = 282) and HM-unassociated (150) AS genes from the top enriched GO term (GO:0007049) were used to perform canonic pathway enrichment (Fig. 4f and Additional file 1: Figure S8D) analysis through Ingenuity Pathway Analysis (IPA,

Stemness signature and TF binding enrichment analysis

StemChecker [59], a web-based tool to discover and explore stemness signatures in gene sets, was used to calculate the enrichment of AS genes in stemness signatures. Significances were tested (hypergeometric test) using the gene set from human (Homo sapiens) of this database as the background. For all AS genes (n = 3865), 2979 genes were found in this dataset. Of 2979 genes, 1395 were found in the stemness signature gene set, most of which (n = 813) are ESC signature genes. Additional file 1: Figure S3A shows the enrichment significance as the -log10 p values (Bonferroni adjusted). For HM-associated AS genes (n = 2125), 1992 genes were found in this dataset. Of 1992 genes, 966 were found in the stemness signature gene set, most of which (n = 562) are ESC signature genes. For HM-unassociated genes (n = 1057), 987 genes were found in this dataset. Of 987 genes, 429 were found in the stemness signature gene set, most of which (n = 251) are ESC signature genes. The significances are shown as -log10 p values (Bonferroni adjusted) in Fig. 4a.

FunRich (v2.1.2) [123, 124], a stand-alone functional enrichment analysis tool, was used for TF binding enrichment analysis to get the enriched TFs that may regulate the query genes. The top six enriched TFs of HM-associated and HM-unassociated AS genes are presented and shown as the proportion of enriched AS genes. It shows that HM-associated AS genes are more likely to be regulated by TFs involved in cell differentiation and development, while the HM-unassociated AS genes are more likely to be regulated by TFs involved in cell proliferation and renewal (Fig. 4b).

Roadmap and ENCODE data analysis

All raw data are available from the GEO accession IDs GSE18927 and GSE16256. The individual sources of RNA-seq data for 56 cell lines/tissues from Roadmap/ENCODE projects are listed in Additional file 4: Table S4. The RNA-seq data (BAM files) were used to calculate the PSI of exon 7 for PBX1 in each cell line/tissue and to estimate the expression levels of all gene (FPKM), based on aforementioned strategies. The relative expression levels of PBX1a and PBX1b shown in Fig. 5 and Additional file 1: Figure S10 were calculated as the individual PFKM value of each divided by their total FPKM values.

Statistical analyses and tests

Levels of significance were calculated with the Mann–Whitney–Wilcoxon test for Figs. 2c, d, 3a, and Additional file 1: Figure S4 and S5B, with Fisher’s exact test for Figs. 1d, 2d, and Additional file 1: Figure S5B, with Student’s t-test for Fig. 5d and Additional file 1: Figure S7A, and with a hypergeometric test for Additional file 1: Figures S1D and S3H. Levels of correlation significance were calculated with PC, MLR, and LLR for Fig. 3c, d, and Additional file 1: Figure S5C. MLR and LLR were performed using Weka 3 [118], whereas all other tests were performed using R packages. The p values for the enrichment analyses (Fig. 4, Additional file 1: Figures S3A and S8) were adjusted either by PDR or Bonferroni (refer to the corresponding method sections for details). The statistical analyses of the ChIP, RIP, and western blotting assays were shown in Additional file 1: Figure S10E–G. Three replicates were conducted for each assay and the quantitatively statistical analyses were performed on the relative band intensities normalized by control genes (ß-Actin) or input signals. ImageJ ( was used to quantify the band intensities. ANOVA was used for statistical significance tests.

Cell culture and MSC induction

Human embryonic stem cell (H1cell line) line was purchased from the WiCell Research Institute (product ID: WA01). H1 cells were cultured on the matrigel-coated six-well culture plates (BD Bioscience) in the defined mTeSR1 culture medium (STEMCELL Technologies). The culture medium was refreshed daily. The H1-derived mesenchymal stem cells (H1-MSC) were differentiated from H1 cells as described previously [125]. Briefly, small H1 cell aggregates were cultured on a monolayer of mouse OP9 bone-marrow stromal cell line (ATCC) for nine days. After depleting the OP9 cell layer, the cells were then cultured in semi-solid colony-forming serum-free medium supplemented with fibroblast growth factor 2 and platelet-derived growth factor BB for two weeks. The mesodermal colonies were selected and cultured in mesenchymal serum-free expansion medium with FGF2 to generate and expand H1-MSCs. hAT-MSCs were derived from the subcutaneous fats provided by the National Disease Research Interchange (NDRI) using the protocol described previously [126]. Briefly, the adipose tissue was mechanically minced and digested with collagenase Type II (Worthington Bio, Lakewood, NJ, USA). The resulted single cell suspension was cultured in α-minimal essential medium with 5% human platelet lysate (Cook Regentec, Indianapolis, IN, USA), 10 μg/mL gentamicin (Life Technologies, CA, USA), and 1× Glutamax (Life Technologies). After reaching ~ 70% confluence, the adherent cells were harvested at the passage 1 (P1) hAT-MSC/AEY239. hBM-MSCs were isolated from commercially available fresh human bone marrow (hBM) aspirates (AllCells, Emeryville, CA, USA) and expanded following a standard protocol [127]. Briefly, hBM-MSC were cultured in α-minimal essential medium supplemented with 17% fetal bovine serum (FBS; lot-selected for rapid growth of MSC; Atlanta Biologicals, Norcross, GA, USA), 100 units/mL penicillin (Life Technologies, CA, USA), 100 μg/mL streptomycin (Life Technologies), and 2 mM L-glutamine (Life Technologies). After reaching ~ 70% confluence, the adherent cells were harvested at the passage 1 (P1) hBM-MSC-5204(G). The human IMR-90 cell line was purchased from the American Tissue Type Culture Collection (Manassas, VA, USA) and cultured in Eagle’s Minimum Essential Medium (Thermo Scientific, Logan, UT, USA) supplemented 10% fetal calf serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 μg/mL streptomycin at 37 °C in humidified 5% CO2.

RNA isolation and RT-PCR

Total RNA was extracted from cells using the RNeasy Mini plus kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. RT reactions for first-strand complementary DNA (cDNA) synthesis were performed with 2 μg of total RNA in 25 uL reaction mixture containing 5 μL of 5× first-strand synthesis buffer, 0.5 mM dNTP, 0.5 μg oligo(dT)12-18mer (Invitrogen), 200 units of M-MLV reverse transcriptase (Promega, Madison, WI, USA), and 25 units of RNase inhibitor (Invitrogen). The mixture was then incubated at 42 °C for 50 min and 52 °C for 10 min. The reaction was stopped by heating at 70 °C for 15 min. The PCR amplifications were carried out in 50 μL reaction solution containing 1 μL of RT product, 5 μL of 10× PCR buffer, 0.15 mM MgCl2, 0.2 mM dNTP, 0.2 mM sense and antisense primers, and 2.5 U Taq polymerase (Bohringer, Mannheim, Germany). The sequences for the upstream and downstream primers of PBX1 and β-actin are listed in Additional file 1: Table S5. PCR reaction solution was denatured initially at 95 °C for 3 min, followed by 35 cycles at 95 °C for 1 min, 55 °C for 40 s, and 72 °C 40 s. The final extension step was at 72 °C for 10 min. The PCR products were resolved in a 2% ethidium bromide-containing agarose gel and visualized using ChemiDoc MP Imager (Bio-Rad).

Quantitative RT-PCR

The qPCR amplification was done in a 20-uL reaction mixture containing 100 ng of cDNA, 10 uL 2× All-in-One™ qPCR mix (GeneCopoeia, Rockville, MD, USA), 0.3 mM of upstream and downstream primers, and nuclear-free water. The PCR reaction was conducted with 1 cycle at 95 °C for 10 min, 40 cycles at 95 °C for 15 s, 40 °C for 30 s, and 60 °C for 1 min, followed by dissociation curve analysis distinguishing PCR products. The expression level of a gene was normalized with the endogenous control gene β-actin. The relative expressions of genes were calculated using the 2-ΔΔCT method, normalized by β-actin and presented as mean ± SD (n = 3) (Fig. 6a). The sequences of the paired sense and antisense primers for human SOX2, NANOG, OCT4, c-MYC, KLF4, and β-actin are listed in Additional file 1: Table S5.

Western blotting

Cells were lysed with 1× RIPA buffer supplemented with protease and phosphatase inhibitor cocktail (Roche Applied Science, Indianapolis, IN, USA) and stored in aliquots at − 20 °C until use. Twenty micrograms of cell lysates were mixed with an equal volume of Laemmli sample buffer, denatured by boiling, and separated by SDS-PAGE. The separated proteins were then transferred to PVDF membranes (Bio-Rad, Hercules, CA, USA). The membranes were blocked using 5% BSA for 1 h at room temperature and incubated with the first antibodies against 1% BSA overnight. SOX2, OCT4, NANOG, KLF4, c-MYC, and β-actin antibodies were from Cell Signalling Technology (Beverly, MA, USA). PBX1 antibody was from Abcam (Cambridge, MA, USA) and PBX1b antibody from Santa Cruz Technology (Dallas, TX, USA). After incubation with IgG horseradish peroxidase-conjugated secondary antibodies (Cell Signalling) for 2 h at room temperature, the immunoblots were developed using SuperSignal West Pico PLUS Chemiluminescent reagent (Thermo Fisher Scientific, Waltham, MA, USA) and imaged using ChemiDoc MP Imager (Bio-Rad).

Co-immunoprecipitation (Co-IP)

Co-IP was used to validate the PSIP1-SRSF1 interaction (Fig. 6e). Cells were lysed with ice-cold non-denaturing lysis buffer containing 20 mM Tris HCl (pH 8), 137 mM NaCl, 1% Nonidet P-40, 2 mM EDTA, and proteinase inhibitor cocktails. A total of 200 μg protein were pre-cleared with 30 uL protein G magnetic beads (Cell Signalling) for 30 min at 4 °C on a rotator to reduce non-specific protein binding. The pre-cleared lysate was immunoprecipitated with the 5 μL anti-SRSF1 antibody (Invitrogen) overnight and then incubated with protein G magnetic beads for 4 h at 4 °C. Beads without anti-SRSF1 antibody were used as an IP control. The protein G magnetic beads were washed five times with lysis buffer and the precipitated protein collected. The SRSF1-bound PSIP1 protein (also known as LEDGF) level was determined with the PSIP1 antibody (Human LEDGF Antibody, R&D Systems) using western blot as described above. Three specific bands (one for p75 and two for p52) were detected (Fig. 6e) as indicated on the R&D Systems website (

Chromatin immunoprecipitation (ChIP)

ChIP assay was performed using a SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling Technology) according to the manufacturer’s instructions. Briefly, 2 × 107 cells were cross-linked with 1% formaldehyde and lysed with lysis buffer. Chromatin was fragmented by partial digestion with Micrococcal Nuclease Chromatin. The protein–DNA complex was precipitated with ChIP-Grade Protein G Magnetic Beads (Cell Signalling) and ChIP-validated antibodies against H3K36me3 (Abcam), PSIP1 (Novus), and PBX1b (Santa Cruz). Normal mouse IgG and normal rabbit IgG were used as negative controls. After reversal of protein–DNA cross-links, the DNA was purified using DNA purification spin columns. The purified DNA fragments were then amplified with the appropriate primers on T100 thermal cycler (Bio-Rad). The primer pairs used for PCR are listed in Additional file 1: Table S5. The H3K36me3-immunoprecipitated and PSIP1-immunoprecipitated DNA fragments surrounding exon 7 of PBX1b and the promoter region of NANOG in the PBX1b-immunoprecipitated DNA fragments were PCR-amplified. The ChIP-PCR products were revealed by electrophoresis on a 2% agarose gel (Fig. 6c).

RNA immunoprecipitation (RIP)

RIP assay was performed using Magna RIP™ RNA-Binding Protein Immunoprecipitation Kit (Millipore Sigma) following the manufacturer’s instructions. Briefly, the cells were lysed with RIP lysis buffer with RNase and protease inhibitors. In total, 500 μg of total protein was precleared with protein G magnetic beads for 30 min. The protein G magnetic beads were preincubated with 5 μg of mouse monoclonal anti-SRSP1 antibody or normal mouse IgG for 2 h at 4 °C. The antibody-coated beads were then incubated with precleared cell lysates at 4 °C overnight with rotation. The RNA/protein/beads conjugates were washed five times with RIP was buffer and the RNA–protein complexes were eluted from the protein G magnetic beads on the magnetic separator. The SRSP1-bound RNA was extracted using acid phenol-chloroform and precipitated with ethanol. The RNA was then reverse-transcribed and the expression levels of PBX1a and PBX1b in the immunoprecipitated and non-immunoprecipitated (input) samples were analyzed using RT-PCR (Fig. 6d).