Introduction

The development and pathology of acute myeloid leukemia (AML) can be caused by a number of genetic alterations, although the molecular basis of AML is not yet thoroughly understood. Chromosomal translocations and variations such as t(15;17), t(8;21), inv(16), t(9;21) and t(9;11) are characteristic of AML, and suggest that genetic events play a key role in leukemogenesis [1]. However, nearly 50% of AML cases have a normal karyotype and lack major chromosome abnormalities. In an effort to elucidate the genetic basis of these cases, next-generation genome sequencing methods have been successfully used in recent years to identify many novel leukemogenic genes [2]. From these analyses, recurrent mutations in genes encoding subunits of the cohesin complex emerged in AML genomes. Several studies have now revealed that mutations in the cohesin complex are strongly associated with AML, and furthermore, that cohesin mutations are also found at high frequency in other related myeloid malignancies. Cohesin mutations could therefore represent a potential new molecular mechanism underpinning oncogenesis. Cohesin has multiple functions, including roles in cell division, nuclear architecture, DNA damage repair, development and transcription, and these functions have been the subject of several comprehensive recent reviews [3–9]. In this concise, focused review, we will discuss molecular functions of cohesin that have potential to influence the etiology and progression of AML and other myeloid malignancies.

Cohesin biology and cancer development

Cohesin is a large ring-shaped protein complex consisting of four major subunits: SMC1A, SMC3, RAD21 and STAG1/2 [7] (Figure 1). While best known for its role in mediating sister chromatid cohesion from S phase until M phase [7], cohesin also plays crucial roles in DNA damage repair and gene expression [10], human development [11] and cancer [12, 13]. Cohesin mutations are found in several cancer types [14–16]; however, their contribution to oncogenesis is unclear, with both overexpression and mutation of cohesin subunits being implicated in cancer. For example, overexpression of cohesin subunit RAD21 in breast cancer is associated with poor prognosis and resistance to chemotherapy [17]. Cohesin mutations must necessarily lead to reduced, but not absent, function, since complete loss of cohesin function blocks mitosis and results in cell death [18, 19]. Therefore, the cohesin mutations found in cancer are usually heterozygous or hypomorphic. The mechanisms by which cohesin mutations contribute to cancer probably involve multiple molecular pathways reflecting its non-mitotic molecular roles [12, 13].

Figure 1
Frequency of cohesin mutations in AML. Cohesin is a multi-subunit protein complex that is involved in chromosome pairing, DNA repair and transcription regulation. Mutations within the individual protein components of cohesin occur at significant frequency in AML. Data from references 20–28 were combined to determine the mutation frequency (boxes) in each of the cohesin subunits (SMC1A, SMC3, STAG1/2, and RAD21). Details of mutations found in each study are presented in Table S1.

Cohesin mutations in myeloid malignancies

The association of cohesin mutations with myeloid malignancy is particularly striking. Data from the Cancer Genome Atlas Research Network (TCGA) revealed that a significant proportion of AMLs had mutations in subunits of cohesin [20]. Somatic variants in cohesin genes were identified in 26/200 cases of AML subjected to exome or whole genome sequencing [20, 21]. Sequencing of a separate set of AML samples by Welch et al. identified cohesin mutations in 7/108 cases [21]. Cohesin mutations occurred primarily in French-American-British (FAB) M1 and M2 cases in the TCGA cohort, and Welch et al. found cohesin mutations exclusively in M1 cases [21]. The predominance of cohesin lesions in the most immature forms of AML suggests they were initiating events rather than passenger mutations [21]. Cohesin mutations co-occurred with NPM1, DNMT3A, TET2, or RUNX1 mutations in 17/19 cases [21], implying cooperation with other leukemogenic pathways. Mutations in cohesin genes represented one of just nine categories of mutations thought to actively contribute to leukemogenesis [20]. Our calculation of the rate of cohesin mutations in AML using the TCGA data [20, 21] and other published studies to date [22–28] indicates that the total rate of cohesin mutation in AML is around 9% (Figure 1, Table 1). Further details of the contributing studies can be found in the accompanying supplementary table (see Additional file 1: Table S1).

Table 1 Key findings from selected studies identifying cohesin mutations in myeloid malignancies

The emergence of cohesin mutations in AML prompted Thol et al. [24] to sequence cohesin complex genes in 389 AML samples, yielding a total of 23 mutations (5.9%). Mutations in cohesin subunits were mutually exclusive, and most mutations were found in karyotypically normal samples. A strong correlation was observed between mutations in cohesin and the known AML-associated gene nucleophosmin (NPM1), with NPM1-mutated patients twice as likely to also harbor a cohesin mutation compared with NPM1-normal. Cohesin mutation status was not prognostically informative, nor did it correlate with any differences in clinical features. Allelic burden analysis suggested that cohesin mutations occurred as an early event during leukemogenesis [24].
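As a rough illustration of how per-study counts combine into a pooled mutation frequency, the short Python sketch below uses only the three cohort counts quoted in the text (26/200, 7/108 and 23/389). It is not the calculation behind the ~9% estimate, which draws on additional studies [22–28] listed in Table S1; the cohort labels and the function name `pooled_frequency` are our own assumptions for illustration.

```python
# Illustrative sketch only: counts are those quoted in the text, not the
# full data set (references 20-28) used for the ~9% estimate.
cohorts = {
    "TCGA [20, 21]": (26, 200),     # (cases with cohesin mutation, cases sequenced)
    "Welch et al. [21]": (7, 108),
    "Thol et al. [24]": (23, 389),
}


def pooled_frequency(counts):
    """Return total mutated cases divided by total sequenced cases."""
    mutated = sum(m for m, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return mutated / total


if __name__ == "__main__":
    for name, (m, n) in cohorts.items():
        print(f"{name}: {m}/{n} = {100 * m / n:.1f}%")
    print(f"Pooled (these three cohorts only): {100 * pooled_frequency(cohorts):.1f}%")
```

Pooling by summed counts weights each study by its cohort size, so larger cohorts such as that of Thol et al. contribute more to the overall estimate than a simple average of per-study percentages would allow.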

While most evidence for cohesin mutations in myeloid leukemia currently comes from AML, cohesin mutation is also implicated in related myeloid disorders. For example, Kon et al. [25] reported frequent mutations in cohesin components in a variety of myeloid neoplasms, including AML, myelodysplastic syndromes (MDS), chronic myelomonocytic leukemia (CMML), chronic myelogenous leukemia (CML) and classical myeloproliferative neoplasms (MPN). Deep sequencing revealed that the majority of cohesin mutations existed in the major tumor populations, indicating they arose early in neoplasia. Strikingly, despite cohesin’s known role in sister chromatid cohesion, myeloid malignancies with cohesin mutations were no more likely to be aneuploid than leukemias harboring other mutations [25]. Kon et al. conclude that, owing to their early origin and frequency in myeloid neoplasms, cohesin mutations actively contribute to leukemogenesis [25].

Further evidence of cohesin’s involvement in myeloid malignancies emerged from a recent study by Haferlach et al. showing that approximately 15% of patients with MDS harbor cohesin mutations [29]. The high proportion of cohesin mutations in MDS, combined with the fact that STAG2 and SMC1A mutations were significantly associated with poor survival outcome, strongly suggests that cohesin mutation is central to the development and prognosis of MDS [29].

Yoshida et al. identified a striking association of cohesin mutation with another myeloid dysplasia, DS-AMKL [30]. Down’s Syndrome (DS) patients can present with transient abnormal myelopoiesis (TAM) that is self-limiting in most cases. TAM is a myeloid proliferation resembling AML, and 10% of TAM progresses to non-self-limiting acute megakaryoblastic leukemia (AMKL) in DS patients (DS-AMKL). Deep sequencing revealed that 53% of DS-AMKL samples had acquired cohesin mutations that were not found in somatic cells or the original TAM [30]. The high frequency of lesions in cohesin raises the strong possibility that cohesin mutation is instrumental to progression to DS-AMKL.

Despite the prevalence of cohesin mutations in myeloid dysplasia, the exact mechanism by which cohesin lesions contribute to cancer development is unclear. Accumulating evidence argues that cohesin mutation is an early event in myeloid oncogenesis. Welch et al. and TCGA showed that cohesin mutations mainly occur in the most immature AML subtypes [20, 21]; clonal analysis by Kon et al. [25] and allelic burden analysis by Thol et al. [24] suggest that cohesin mutations occur as early events in leukemogenesis. What is the mechanism by which these mutations lead to cancer? In solid tumors with cohesin mutations, chromosome instability and aneuploidy have been suggested as the mechanisms by which cohesin mutation facilitates neoplasia [16, 31, 32], although other evidence argues against this idea [33]. For myeloid malignancies, a clear theme is emerging: heterozygous cohesin mutations do not cause chromosome instability [21, 22, 24, 25, 30] (for details, see Additional file 1: Table S1). This suggests that, at least in myeloid cancers, it is cohesin’s non-mitotic roles that contribute to oncogenesis.

Cohesin regulates gene transcription

Cohesin’s role in gene expression has been intensively investigated over the last 15 years [10]. Several examples of cohesin-dependent gene regulation have been found, including regulatory roles at developmental genes [5] and in stem cells [34]. One of the potential mechanisms by which cohesin regulates gene transcription is through mediating long-range communication events that form DNA loops, which regulate transcription [6]. Enhancers (which promote transcription) and insulators (which usually block transcription) are located in conserved regulatory elements (CREs) on chromosomes, and need not be close to the gene(s) they regulate. Cohesin is thought to physically connect distant CREs with gene promoters, in a cell type-specific manner, to modulate transcriptional outcomes [6] (Figure 2). Therefore, mutations in cohesin could impede cohesin binding to CREs, thereby altering their interaction with promoters, and subsequently gene activity. Similarly, mutations in the CREs that affect cohesin binding could alter transcription of the gene target(s) of that CRE.

Figure 2
Cohesin regulates gene expression by controlling CRE-promoter interactions. CREs can regulate gene expression by physically contacting a promoter, but are often located at a distance (tens of kilobases and sometimes megabases) from the promoter. Cohesin is involved in the establishment and maintenance of CRE-promoter interactions and can thereby control gene expression. Loss of cohesin can lead to loss of CRE-promoter interactions, resulting in inappropriate gene repression, or gene activation.

In addition to connecting CREs with promoters, cohesin has an important role in organizing global genomic architecture. Cohesin binding of DNA together with CCCTC-binding factor (CTCF) helps to partition the genome into megabase-sized regions known as topologically associated domains (TADs) [35–37]. TADs are demarcated by boundaries that are characterized by the presence of cohesin and CTCF, housekeeping genes, tRNAs and short interspersed element (SINE) retrotransposons [35]. Within TADs are regions of local chromosome interactions, which allow CREs to come into physical proximity with gene promoters to modulate gene expression [38–40]. While TAD boundaries are conserved between cell types, the chromosome interactions within TADs vary, and provide a means for enabling cell type-specific transcription [35, 38, 40].

Although cohesin and CTCF frequently colocalize on chromosomes [41–43], they appear to have distinct roles in genome architecture [40, 44, 45]. Cohesin influences gene expression by coordinating interactions between CREs and promoters within TADs [38, 40], while CTCF is important for preventing interactions between TADs [40]. Cohesin deficiency reduces the number of chromosome interactions within TADs and leads to altered expression of many genes [38–40]; these genes differ from those dysregulated upon CTCF depletion [40]. Because of the cell type specificity of CRE-promoter interactions within TADs, cohesin deficiency could result in an abnormal transcriptional profile for a particular tissue type (Figure 3). Moreover, there are several genomic sites where cohesin binds independently of CTCF, in combination with tissue-specific transcription factors [44, 45]. For example, in mouse primary liver cells and hepatocellular carcinoma cells (HepG2), CTCF-independent cohesin binding sites are associated with expression of liver-specific genes [44].

Figure 3
Model for cohesin’s role in AML and other myeloid malignancies. Cohesin has an important function in the nucleus: it mediates chromosome interactions within topologically associated domains (TADs). Within TADs, cohesin connects conserved regulatory elements (CREs) with promoters, thereby regulating gene transcription. When cohesin function is compromised by a heterozygous mutation, as in AML, this leads to loss of CRE-promoter communication at specific hematopoietic genes, such as RUNX1. The result is dysregulation of hematopoietic transcription programs, which could facilitate the development of AML. In addition, loss of tissue-specific sub-domain structures affects the global hematopoietic transcription program.

Strikingly, only a modest reduction in chromatin-bound cohesin is sufficient to cause changes in gene expression [46]. In human cells and mice, heterozygous mutations in the cohesin-loading factor NIPBL or in cohesin subunit SMC1A affect the expression of numerous genes [47, 48]. In Drosophila, halving the gene dose of cohesin components robustly affects gene expression [49, 50]. Therefore, leukemias with heterozygous cohesin mutations are also likely to be affected by the dysregulation of many genes.

Altered cohesin function has potential to perturb hematopoietic gene expression

A number of hematopoietic transcription factors are regulated by cohesin binding to CREs or promoters. For example, the hematopoietic transcription factor TAL1 is regulated at the transcriptional level by a chromatin hub containing cohesin and CTCF [51]. In further examples, the GATA2 gene contains an intronic +9.5 kb enhancer that is important for its expression [52], while the +85 ERG stem cell enhancer contains binding sites for a heptad of hematopoietic transcription factors and is thought to propagate a hematopoietic stem cell-like transcription profile [53]. Our survey of publicly available ENCODE data revealed that cohesin binds both the GATA2 and ERG enhancers in hematopoietic cells (K562). Therefore, it is possible that cohesin mutation could alter the activity of these enhancers and their target genes in a leukemogenic setting.

RUNX1 transcription is altered by cohesin deficiency

The developmental transcription factor RUNX1 plays a particularly important role in myeloid malignancies. RUNX1 function is central to early myeloid differentiation and is absolutely required for definitive hematopoiesis [54, 55]. RUNX1 is involved in chromosomal translocations, such as t(12;21) in acute lymphoblastic leukemia in childhood and t(8;21) in acute myeloid leukemia, and is also targeted by point mutations and deletions [56]. Leukemic alterations of RUNX1 lead to abnormal protein function and thus dysregulation of RUNX1 target genes. The importance of RUNX1 function in hematopoiesis and leukemia has generated great interest in determining the factors that regulate its expression.

It is interesting that DS-AMKL leukemias contain three copies of the RUNX1 gene (owing to trisomy 21), as well as having a remarkably high frequency of cohesin mutation (53%) [30]. Evocatively, data from zebrafish provided the first evidence that cohesin regulates tissue-specific Runx1 transcription. In developing zebrafish embryos, a null mutation in the rad21 subunit of cohesin blocked runx1 expression in hematopoietic mesoderm, but not in Rohon-Beard neurons [57]. That cohesin ablation affected hematopoietic progenitors, but not neurons, indicates that the transcriptional role of cohesin is tissue-specific in hematopoietic precursors.

In mouse, a CRE enhancer resides in an intron between the P1 (distal) and P2 (proximal) promoters of Runx1. This enhancer, termed +23 [58] or alternatively, +24 [59], is active only in precursors of hematopoietic stem cells where Runx1 is endogenously expressed [58, 59]. Cohesin subunit Rad21 binds the Runx1 +23/24 mouse enhancer region, which is also conserved in human [60]. ENCODE data from the K562 leukemia cell line indicate that the equivalent human CRE/enhancer also recruits cohesin subunits, together with CTCF [60].

In zebrafish, Marsman et al. showed that cohesin depletion altered the activity of intronic runx1 CREs [60]. Multiple binding sites were identified for cohesin and CTCF in the zebrafish runx1 gene, coinciding with active CREs in the intron between P1 and P2. Cohesin and CTCF determine the spatial distribution of runx1 transcripts in the zebrafish embryo at the onset of runx1 expression, likely by controlling intronic CRE activity and CRE-promoter interactions. CTCF appears to restrict the expression pattern of runx1, consistent with insulator activity, while cohesin is necessary for its expression in a specific subpopulation of hematopoietic progenitors [57, 60].

Interestingly, Marsman et al. also showed that siRNA knockdown of cohesin (but not CTCF) in HL-60 myelocytic leukemia cells enhanced RUNX1 transcription [60], indicating that cohesin’s transcriptional role is conserved in human cells. It is tempting to speculate that cohesin mutation leading to an increase in RUNX1 transcription might exacerbate myeloid malignancies that already have excess RUNX1; for example, DS-AMKL [30].

In summary, it appears that cohesin has a crucial role in cell type-specific regulation of Runx1, likely by mediating interactions between CREs and the promoters of Runx1. In support of this idea, ChIA-PET data generated in K562 cells using RNA polymerase II demonstrated that the two promoters of human RUNX1 are in physical proximity with each other, and with CREs in the intron between the two promoters [61]. It is not yet known whether these interactions regulate RUNX1, or whether they are cohesin-dependent. While formal proof of this kind of mechanism for cohesin regulation of Runx1 is still to come, the link between cohesin mutation and spatiotemporal Runx1 transcription may explain cohesin’s contribution to AML pathogenesis and other myeloid malignancies.

Conclusions

Mutations in cohesin comprise a novel genetic pathway significantly associated with the development of AML and related leukemias. While several types of cancer do have cohesin mutations, most cancers also harbor many additional mutations in multiple gene categories [15]. By contrast, AML genomes contain fewer mutations than other cancer types, with only 23 genes significantly mutated [20]. Four of these genes correspond to cohesin subunits [15, 20], indicating that cohesin mutations are particularly important to the progression of AML.

Why do myeloid disorders have a high prevalence of cohesin mutations in particular? The answer could reside in cohesin’s potential to mediate global transcriptional activity in a way that is also exquisitely cell type-specific.

Evidence that cohesin regulates cell type-specific global gene transcriptional programs, and in particular, expression of the AML-associated transcription factor, RUNX1, could explain why cohesin mutations are so prevalent in myeloproliferative disorders. Perhaps correct differentiation along the myeloid pathway relies on accurate expression of key genes (such as RUNX1) that can only respond to a full complement of cohesin. When cohesin function is impaired, differentiation of myeloid precursors might be prevented, facilitating dysplasia. These notions support previous hypotheses that cohesin is likely to play an important role in hematopoiesis [57, 62].

Remarkably, cohesin binds to a majority of cell type-specific transcription factor binding sites, even when transcription factors themselves are evicted during mitosis [63]. In this manner, cohesin binding may ‘bookmark’ transcription factor binding sites to re-establish transcriptional programs after cell division [63], including sites for hematopoietic transcription factors.

Further research will be necessary to understand exactly how cohesin functions in normal and abnormal hematopoiesis, and how cohesin mutations cooperate with other genetic events to drive leukemia progression.