Haematopoiesis represents one of the best studied models of adult stem cell development and differentiation [1, 2]. Powerful techniques allow purification and in vitro as well as in vivo functional assays of small subsets of cells, from haematopoietic stem cells (HSCs) via a plethora of intermediate progenitors to fully mature cell types. Transcription factors (TFs) directly regulate gene expression and thus control cellular phenotypes. It is no surprise, therefore, that TFs have emerged as some of the most powerful regulators of both normal development and disease.

TFs play important roles during haematopoiesis, from stem cell maintenance to lineage commitment and differentiation. However, relatively little is known about the way in which regulatory information is encoded in the genome, and how individual TFs are integrated into wider regulatory networks. Based on the recent analysis of large-scale efforts to reconstruct tissue-specific regulatory networks, it has been suggested that transcriptional regulatory networks are characterised by a high degree of connectivity between TFs and transcriptional cofactors. Extensive cross- and autoregulatory links therefore create densely connected regulatory circuits that control the large numbers of tissue-specific effector proteins (enzymes, structural proteins) [3, 4] (Figure 1). To understand the functionality of large mammalian regulatory networks, it will therefore be important to identify downstream target genes of specific TFs as well as gain insight into combinatorial TF interactions. This in turn will not only provide fundamental insights into normal development, but also advance our understanding of how deregulation of networks contributes to pathology.

Figure 1

Transcription factor networks control cellular phenotypes. Transcription factors (TFs) together with cofactors (Co-TF) form densely connected regulatory networks that define cellular phenotypes by regulating large numbers of effector genes coding for cell-type-specific structural proteins and enzymes.

The cis-regulatory regions of a gene locus can be thought of as different modules, each playing an important role, such as driving expression of the gene to a specific subset of cells or a specific tissue type. The activity of each regulatory region is controlled by a distinct set of upstream regulators. The individual regulatory regions within a given gene locus may have overlapping or very distinct upstream regulators, and it is the combined activity of all these regions that ultimately controls gene expression. Comprehensive identification and characterisation of true functional cis-regulatory regions therefore represent an essential prerequisite to integrate important regulatory genes into wider transcriptional networks. Traditionally, DNaseI mapping was performed to identify regions of open/accessible chromatin. More recently, comparative genomic sequence analysis has been used to identify highly conserved sequences, which were taken to represent candidate regulatory elements based on the premise that sequence conservation indicated an important function [5-7]. The most recent development has been that of whole genome re-sequencing, which when coupled with chromatin immunoprecipitation assays allows genome-wide mapping of the chromatin status for a given histone modification [8]. Though more predictive than previous approaches, these techniques still require functional validation of candidate elements, which involves in vivo and in vitro experiments to assess the true function of a given candidate regulatory region.

Several gene loci coding for TFs essential for haematopoiesis have been characterised using a combination of the above techniques. Collectively, these studies provided important insights into TF hierarchies and regulatory network core circuits [9-11]. This review will specifically focus on three haematopoietic loci, encoding the key haematopoietic regulators Scl/Tal1, Lmo2 and Gfi1.

Transcriptional regulation of Scl

The basic helix-loop-helix TF Scl/Tal1 is a key regulator of haematopoiesis with additional important roles in the development of the vascular and central nervous systems [12-16]. Within the haematopoietic system, Scl is essential for the development of HSCs as well as further differentiation into the erythroid and megakaryocytic lineages [17].

Since correct spatio-temporal expression of Scl is crucial for the appropriate execution of its biological functions, much effort has been invested into understanding how Scl is regulated. Using a combination of long-range comparative sequence analysis and both in vitro and in vivo functional analysis, multiple cis-regulatory elements have been identified in the murine Scl locus, each of which directs expression to a subdomain of endogenous Scl expression when tested in transgenic mice (Figure 2). Scl has three promoters located in different exons (exons 1a, 1b and exon 4), none of which displayed haematopoietic activity when tested in transgenic mice. A search for additional cis-regulatory elements led to the identification of three haematopoietic enhancers (-4, +19 and +40 kb). The -4 Scl enhancer, characterized by the presence of five Ets sites, drives expression to endothelium and fetal blood progenitors [18]. The +19 Scl enhancer was shown to drive expression of Scl in HSCs, haematopoietic progenitors and endothelial cells [19-21] and critically depended on an Ets/Ets/GATA composite motif shown to be bound in vivo by Elf-1, Fli-1 and Gata2 [22]. Of note, the +19 enhancer was flanked by a nearby hypersensitive site (+18 Scl element), which did not function as an enhancer but contains a mammalian interspersed repeat that is essential for its ability to 'boost' activity of the +19 element [23]. The +40 Scl enhancer drives expression to erythroid cells [24, 25] as well as midbrain and is characterized by the presence of two Gata/E-box motifs. Mutation or deletion of a single one of these motifs leads to a loss of function of the enhancer [24, 25].

Figure 2

Scl cis-regulatory elements. The genomic locus of the murine Scl gene and adjacent genes are drawn schematically in the top panel (boxes represent exons and arrowheads indicate gene orientation). The middle diagram shows a Vista plot illustrating sequence conservation between the mouse and human Scl locus. Functional Scl cis-regulatory elements are highlighted in red. Bottom panels show whole mount LacZ staining of embryonic day 12.5 transgenic embryos and relevant histological sections for each individual Scl cis-regulatory element. The -4 Scl and +18/19 Scl enhancers target endothelium and haematopoietic progenitors; promoter 1a and the +23 Scl enhancer target ventral midbrain; promoter 1b targets hindbrain and spinal cord, and the +40 Scl enhancer targets midbrain and erythroid cells [18-21, 23-25].

Taken together, these studies have highlighted the presence of three haematopoietic enhancers within the murine Scl locus, with distinct yet overlapping regulatory codes that contribute to the overall correct spatio-temporal expression of Scl. Interestingly, a recent study comparing the functionality of the mouse Scl enhancers with their corresponding chicken counterparts suggested that elements shared by mammals and lower vertebrates exhibit functional differences and binding site turnover between widely separated cis-regulatory modules [26]. Remarkably, however, the regulatory inputs and overall expression patterns remain the same across different species. This in turn suggested that significant regulatory changes may be widespread, and not only apply to genes with altered expression patterns, but also to those where expression is highly conserved.

Transcriptional regulation of Lmo2

The Lim domain only 2 gene (Lmo2) encodes a transcriptional cofactor that is essential for haematopoiesis [27, 28]. The Lmo2 protein does not bind to DNA directly but rather participates in the formation of multipartite DNA-binding complexes with other TFs, such as Ldb1, Scl/Tal1, E2A and Gata1 or Gata2 [29-31]. Lmo2 is widely expressed across haematopoiesis with the exception of mature T-lymphoid cells where aberrant expression of Lmo2 results in T-cell leukaemias [32].

Lmo2 contains three promoters: the proximal promoter, which drives the majority of expression in endothelial cells [33]; the distal promoter, which is active in the fetal liver and specific T-cell acute lymphoblastic leukemia (T-ALL) cell lines [34]; and the intermediate promoter, which was detected in CD34+ cells and was implicated in mediating LMO2 expression in T-ALL patients where high levels of LMO2 were present in the absence of any translocation involving the LMO2 locus [35]. However, none of the three promoters on their own displayed robust expression when tested in transgenic mice [33, 36], which led to the identification of eight enhancer elements dispersed over 100 kb that could recapitulate the expression of Lmo2 in normal haematopoiesis [36]. Of note, while individual elements augmented endothelial expression of the proximal promoter, robust haematopoietic expression was only observed when they were combined together (Figure 3). This type of combinatorial collaboration between regulatory elements to obtain haematopoietic activity has been seen for other gene loci, such as Endoglin [37], suggesting a process of step-wise and modular activation of the locus during the development of blood and endothelial cells from their common precursor.

Figure 3

Combinatorial interactions of distinct enhancers are critical to recapitulate the endogenous expression of Lmo2. (a) The Lmo2 gene locus is drawn to scale. Exons are shown as black rectangles. Regulatory elements (-75/-70/-25/-12/pP/+1) are highlighted using shapes and distinct colours (-75 = orange diamond; -70 = green octagon; -25 = blue oval; dP = red rectangle; -12 = red triangle; +1 = purple triangle). (b) Transgenic animals were generated with many different combinations of the identified regulatory elements. The -75 enhancer and pP showed strong expression in endothelium, circulating erythrocytes and fetal liver. The -70 enhancer together with pP showed weak staining in endothelium and haematopoietic progenitor cells. The -25 or the -12 enhancer together with pP showed strong expression in endothelium and fetal liver. The +1 enhancer with pP gave rise to lacZ staining in the tail, apical ridge of the limbs, fetal liver and strong endothelium. Only when these elements were coupled together was a staining pattern corresponding to endogenous expression of Lmo2 seen [36]. Strength of staining is indicated: +++, very strong; ++, intermediate; +, weak; -, not present.

Transcriptional regulation of Gfi1

The Growth factor independence 1 gene (Gfi1) was originally identified in a retroviral screen designed to identify regulatory pathways that could initiate interleukin-2 independence in T cells [38]. Within the haematopoietic system Gfi1 is expressed in HSCs [39], specific subsets of T cells [40], granulocytes, monocytes, and activated macrophages [41]. Gfi1-/- mice lack neutrophils [41, 42] and Gfi1-/- HSCs are unable to maintain long-term haematopoiesis because elevated levels of proliferation lead to eventual exhaustion of the stem cell pool [39, 43]. Outside the haematopoietic system, Gfi1 is also specifically expressed in sensory epithelia, the lungs, neuronal precursors, the inner ear, intestinal epithelia and during mammary gland development [44-47].

A recent study used a combination of comparative genomics, locus-wide chromatin immunoprecipitation assays and functional validation within cell lines and transgenic animals to identify cis-regulatory regions within the Gfi1 locus [48]. Four regulatory regions (-3.4 kb min pro, -1.2 kb min pro, +5.8 kb enhancer and +35 kb enhancer) were shown to recapitulate endogenous expression patterns of Gfi1 in the central nervous system, gut, limbs and developing mammary glands but no haematopoietic staining was observed. However, a recent genome-wide ChIP-Seq experiment [49] revealed binding of Scl/Tal1 to a region situated 35 kb upstream of the Gfi1 promoter within the last intron of its 5' flanking gene, Evi5. This element was subsequently validated in transgenic assays, which demonstrated lacZ staining at multiple sites of haematopoietic stem/progenitor cell emergence (vitelline vessels, fetal liver, and dorsal aorta).

Moreover, the element was also shown to be bound by TFs known to be critical for haematopoiesis, including Scl/Tal1, Pu.1/Sfpi1, Runx1, Erg, Meis1, and Gata2, thus integrating Gfi1 into the wider HSC regulatory network. This study therefore supports the notion that important regulatory elements can be located at a significant distance from the gene they control (Figure 4), and thus emphasises the need for careful interpretation of genome-wide TF binding datasets [49, 50].

Figure 4

Combinatorial transcription factor binding identified the Gfi1 -35 kb regulatory region. Raw ChIP-Seq read data from [50] were transformed into a density plot for each transcription factor and loaded into the UCSC genome browser as custom tracks above the UCSC tracks for gene structure and mammalian homology. A discrete binding event for all ten TFs (Scl/Tal1, Lyl1, Lmo2, Gata2, Runx1, Meis1, Pu.1, Fli1, Erg and Gfi1b) can be seen in the last intron of the 5' flanking gene, Evi5 (indicated by an asterisk). This region was subsequently shown to drive expression in early haematopoietic cells in transgenic mouse embryos [48].
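The idea of locating a region bound by many TFs at once can be illustrated with a minimal sketch. It assumes that peak calls for each factor are already available as half-open (start, end) intervals on a single chromosome. The function name shared_regions, its min_factors parameter and the coordinates in the usage note are hypothetical and are not taken from the cited studies.

```python
def shared_regions(peaks_by_tf, min_factors=None):
    """Return (start, end) intervals covered by peaks of at least
    `min_factors` different TFs (default: all TFs supplied).

    peaks_by_tf maps a TF name to a list of half-open (start, end)
    peak intervals on one chromosome.
    """
    if min_factors is None:
        min_factors = len(peaks_by_tf)

    events = []
    for peaks in peaks_by_tf.values():
        # Merge overlapping or touching peaks of the same factor so that
        # each TF contributes at most 1 to the depth at any position.
        merged = []
        for start, end in sorted(peaks):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            events.append((start, 1))   # a peak of this TF opens
            events.append((end, -1))    # a peak of this TF closes

    # Sweep left to right; at equal coordinates, closings (-1) sort before
    # openings (+1), which matches half-open interval semantics.
    events.sort()
    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_factors <= depth:
            region_start = pos
        elif depth < min_factors <= previous:
            regions.append((region_start, pos))
    return regions
```

With toy input such as {"Scl": [(100, 300)], "Gata2": [(150, 350), (900, 1000)], "Runx1": [(200, 400)]}, the only stretch covered by all three factors is (200, 300). Relaxing min_factors to 2 widens it to (150, 350). As with the real data, any region found this way is only a candidate that would still need functional validation.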

Transcriptional regulation of other key haematopoietic transcription factors

The transcriptional control of several other TFs known to play important roles within haematopoiesis has also been investigated. Runx1 has been shown to be transcribed from two promoter elements, both of which collaborate with the Runx1 +23 kb enhancer to drive expression of Runx1 to sites of HSC emergence [51-53]. Moreover, the Runx1 +23 kb region was shown to be regulated by important haematopoietic TFs (Gata2, Fli1, Elf1, Pu.1, Scl, Lmo2, Ldb1 and Runx1 itself) [53, 54]. Lyl1 is known to contain a promoter region that can be divided into two separate promoter elements that are responsible for driving the expression of Lyl1 within endothelial, haematopoietic progenitor, and megakaryocytic cells [55]. These promoter elements were shown to contain conserved Ets and Gata motifs that were bound in vivo by Fli1, Elf1, Erg, Pu.1, and Gata2. Multiple elements within the Gata2 locus have been identified (-77 kb, -3.9 kb, -3 kb, -2.8 kb, -1.8 kb, +9.5 kb and 1S promoter) [56-58] with the -1.8 kb region being essential for maintaining Gata2 repression in terminally differentiating cells [58]. Elf1 contains four promoter elements (-55 kb, -49 kb, -21 kb and proximal), which are used in a cell-type-specific manner in combination with a lineage-specific -14 kb enhancer element [59]. Enhancer elements utilising the Ets/Ets/Gata regulatory code, originally defined in the Scl +19 enhancer, were also identified in the Fli1, Gata2, Hhex/Prh and Smad6 gene loci [5, 57]. The picture emerging, therefore, is that transcriptional control of important haematopoietic TF loci is achieved through multiple regulatory elements but the number of upstream regulators may be relatively small. The same binding motifs are repeatedly found, but it is the precise arrangement within a single element as well as the interactions between elements that ultimately control expression.
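To make the notion of a composite regulatory code more concrete, the following minimal sketch scans a DNA sequence for two Ets cores and one Gata core that fall within a short window. It is an illustration under stated assumptions, not the procedure used in the cited studies. The core sequences (GGAA for Ets and GATA for Gata factors, searched on both strands) are commonly cited consensus cores. The 40 bp window, the function name find_ets_ets_gata and the example sequence are hypothetical, and no spacing or orientation constraint from the original work is implied.

```python
import re
from itertools import combinations

CORE_LEN = 4
# Assumed core motifs, matched on either strand via their reverse complements.
# Lookaheads are used so that overlapping occurrences are all reported.
ETS_CORE = re.compile(r"(?=GGAA|TTCC)")
GATA_CORE = re.compile(r"(?=GATA|TATC)")


def find_ets_ets_gata(seq, window=40):
    """Return sorted (start, end) spans that contain two Ets cores and one
    Gata core and are no longer than `window` bp (hypothetical default).

    For each Gata core, the tightest span with any two Ets cores is kept.
    The pairwise search is quadratic in the number of Ets cores, which is
    acceptable for enhancer-sized sequences but not for whole genomes.
    """
    seq = seq.upper()
    ets_sites = [m.start() for m in ETS_CORE.finditer(seq)]
    gata_sites = [m.start() for m in GATA_CORE.finditer(seq)]

    hits = set()
    for g in gata_sites:
        best = None
        for e1, e2 in combinations(ets_sites, 2):
            start = min(g, e1, e2)
            end = max(g, e1, e2) + CORE_LEN
            length = end - start
            if length <= window and (best is None or length < best[1] - best[0]):
                best = (start, end)
        if best is not None:
            hits.add(best)
    return sorted(hits)
```

For the toy sequence "ACGTACGTACGT" + "GGAA" + "CA" + "GGAA" + "CTGC" + "GATA" + "ACGTACGTACGT", the function reports a single span, (12, 30). A scan of this kind only nominates candidates: as the text stresses, it is the precise arrangement of sites and the interactions between elements that determine activity, so matches would still need to be confirmed by binding and functional assays.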

Conclusion

Recent analysis of gene regulatory networks controlling pluripotency in embryonic stem cells suggests that a finite number of major combinatorial interactions are critical in controlling cellular phenotypes [60, 61]. Identification and subsequent functional characterisation of specific regulatory elements provides a powerful route into deciphering these combinatorial regulatory interactions. Whilst traditional methods of identifying regulatory elements should not be overlooked, it is essential to integrate new genome-wide methods to ensure that regulatory elements outside traditional gene loci boundaries are not missed. With the genome-wide mapping of TF binding events now eminently feasible, the importance of sequence conservation as a primary technique for identification of regulatory elements will diminish.

Nevertheless, genome-wide mapping of binding events is descriptive and therefore no substitute for conventional functional assays, which are likely to remain an important component of any research programme aimed at elucidating transcriptional control mechanisms.

Note

This article is part of a review series on Epigenetics and regulation. Other articles in the series can be found online at http://stemcellres.com/series/epigenetics