Summary

The Drosophila Groucho (Gro) protein was the founding member of the family of transcriptional co-repressor proteins that now includes the transducin-like enhancer of split (TLE) and Gro-related gene (Grg) proteins in vertebrates. Gro family proteins do not bind DNA directly, but are recruited by a diverse range of transcription factors, including members of the Hes, Runx, Nkx, LEF1/Tcf, Pax, Six and c-Myc families. The primary structure of Gro proteins includes five identifiable regions, of which the most highly conserved are the amino-terminal glutamine-rich Q domain and the carboxy-terminal WD-repeat domain. The Q domain contains two coiled-coil motifs that facilitate oligomerization into tetramers and binding to some transcription factors. The WD domain folds to form a β-propeller, which mediates protein-protein interactions. Many transcription factors interact with the WD domain via a short peptide motif that falls into one of two classes: WRPW and related tetrapeptides; and the 'eh1' motif (FxIxxIL). Gro family proteins are broadly expressed during development and in the adult. They have essential functions in many developmental pathways (including Notch and Wnt signaling) and are implicated in the pathogenesis of some cancers. The molecular mechanisms through which Gro proteins act to repress transcription are not yet well understood. It is becoming clear that Gro proteins have different modes of action in vivo depending on biological context, and these include direct and indirect modification of chromatin structure at target genes.

Gene organization and evolutionary history

The groucho (gro) gene family is found only in metazoa and is named after the phenotype of the first identified mutation in the family: gro1 mutant Drosophila melanogaster display clumps of extra bristles above the adult eyes that resemble the distinctive bushy eyebrows of the American film star and comedian Groucho Marx [1]. Subsequently, human homologs were identified, but were named Transducin-Like Enhancer of split (TLE) proteins because of apparent structural similarities to β-transducin and the adjacency of Drosophila gro to the Enhancer of split (E(spl)) complex [2, 3]. To complicate nomenclature further, when homologs were first isolated from mouse, they were named Groucho-related-gene (Grg) proteins [4], and the Caenorhabditis elegans gro homolog is known as unc-37 [5]. TLE and Grg have often been used interchangeably for vertebrate orthologs in the literature and in sequence databases. For simplicity, we shall use the term Gro proteins to refer to the entire family.

Drosophila and C. elegans each contain a single Gro protein. There are two in the tunicate Ciona, four in birds and mammals, and six in teleost fish (Figure 1 and see also [6]). It has been proposed that the evolution of the multiple Gro proteins found in Chordata involved several independent duplication events [6].

Figure 1

A phylogenetic tree of the WD domains from Groucho/TLE/Grg family members. The protein sequences of known Gro family members were extracted from RefSeq [56], and searched using BLAT [57] against the current UCSC genome browser [58] releases of the assembled genomes of mosquito (ag), honeybee (am), dog (cf), Ciona intestinalis (ci), Ciona savignyi (cs), Drosophila melanogaster (dm), zebrafish (dr), chicken (gg), human (hs), opossum (md), mouse (mm), medaka (ol), Tetraodon (tn), and Xenopus tropicalis (xt). The matching regions of the genomes were extracted and aligned against known RefSeq sequences, using Wise2 [59], to derive orthologous protein sequences. The WD-domain regions were aligned using ClustalX 2.0 [60] and bootstrapped neighbor-joining trees [61] were generated and visualized with NJPlot [62]. The branch lengths are proportional to the amount of inferred evolutionary change, and numbers between internal nodes indicate bootstrap values as percentages of 100 replications. Accession numbers for the sequences are in Additional data file 1.
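To make the tree-building step in the legend more concrete, the sketch below implements the standard neighbor-joining criterion on a small distance matrix. It is a generic textbook formulation, not the ClustalX 2.0 implementation used for Figure 1. The function name `neighbor_joining`, the five taxon labels and the distance values are hypothetical toy data, not Gro sequence distances. Bootstrapping is not shown; it would resample alignment columns, rebuild the tree many times and count how often each grouping recurs.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted tree (Newick string) built by neighbor joining.

    names: list of taxon labels; dist: symmetric matrix (list of lists)
    of pairwise distances in the same order as names.
    """
    # Distances keyed by node label; merged nodes are labeled by their Newick text.
    dm = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each active node from all other active nodes.
        r = {a: sum(dm[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to their new parent node u.
        li = dm[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distance from the new node u to every remaining node.
        dm[u] = {}
        for k in nodes:
            if k not in (i, j):
                dm[u][k] = dm[k][u] = (dm[i][k] + dm[j][k] - dm[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    return f"({a}:{dm[a][b]:.3f},{b});"


# Hypothetical toy distances for five taxa (not Gro data).
names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, D))
```

Ties in the Q criterion are broken here by taking the first pair encountered, so different programs may return differently arranged but equally valid trees from the same matrix.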

Despite the divergent names, Gro proteins show a great deal of sequence conservation, especially in the carboxy-terminal WD domain, where most share at least 86% amino-acid identity (see Figure 1 and [6]). More sequence changes are observed in the Q domains of these proteins. However, the groupings of orthologs in a phylogenetic tree based on Q-domain amino-acid sequence are essentially the same as those based on the WD domain (Additional data file 1).
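Percent-identity figures such as the 86% quoted above depend on how identity is counted. The sketch below assumes two sequences that have already been aligned (gaps written as '-') and counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the example sequences are hypothetical. Other tools divide by the full alignment length or by the length of the shorter sequence, so their values can differ from this convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip any column with a gap in either sequence
        compared += 1
        identical += x.upper() == y.upper()  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 4 of 5 gap-free columns are identical.
print(percent_identity("ACDE-G", "ACDF-G"))
```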

The carboxy-terminal WD domain of Drosophila and vertebrate Gro proteins also shows significant conservation with the WD domain of the yeast TUP1 co-repressor protein [7, 8]. The sequences outside this region are very divergent, however, so TUP1 is not generally considered a bona fide member of the Gro family, although it probably represents an ancestral form.

Characteristic structural features

The primary structure of Gro proteins includes five regions defined by their evolutionary conservation: they are, in order, Q, GP, CcN, SP and WD (Figure 2). The amino-terminal Q domain and the carboxy-terminal WD-repeat domain are the most highly conserved and rigorously characterized features of this protein family.

Figure 2

Domains within Groucho/TLE/Grg family proteins. Gro/TLE/Grg proteins are characterized by five evolutionarily conserved and distinct domains. The amino-terminal Q domain contains two predicted amphipathic α-helices (AH1 and AH2) and mediates oligomerization and protein-protein interactions. The three central domains, GP, CcN and SP, are less well conserved across evolution and their structures are not known. The WD domain is highly conserved across evolution, folds to form a seven-bladed β-propeller and mediates protein-protein interactions.

Sequences within the glutamine-rich Q domain are predicted to form two amphipathic α-helical motifs, referred to as AH1 and AH2, which facilitate oligomerization into tetramers and binding to some transcription factors (for example, LEF1/TCF, FoxA and c-Myc [9-11]).

The glycine/proline-rich GP domain has been implicated in the recruitment of histone deacetylases [12], and the central CcN domain contains a nuclear localization signal and potential regulatory phosphorylation sites [2]. Although the role of the serine/proline-rich SP domain is not well characterized, it has been implicated in repression [13].

The crystal structure of the WD domain of human TLE1 has been determined and shown to form a β-propeller with seven blades [8]. Because the WD domain from TLE1 shares a high degree of amino acid sequence identity with other members of the family (>85% for Drosophila and vertebrate orthologs), this structure can be used as a representative model. Many transcription factors interact with the WD domain through short peptide motifs that fall into one of two classes: WRPW and related tetrapeptides in Hes and Runx family proteins; and the eh1 motif FxIxxIL (where x can be any amino acid) in Engrailed, Goosecoid, Pax, Nkx and FoxD. (For a more comprehensive list of WRPW- and eh1-containing factors see [14]). The WRPW motif forms a very compact structure when bound to Gro/TLE, whereas the eh1 motif adopts a helical conformation [15]. A combination of genetic, biochemical and structural studies has shown that both these distinct motifs bind across the central pore of the β-propeller (Figure 3 and see [15]).
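The two motif classes described above can be written as simple sequence patterns, which is one way to scan a candidate transcription factor for possible Gro-recruiting sites. The sketch below is a hypothetical helper (`find_motifs` and the test sequence are illustrative, not from the source); it matches only the exact WRPW tetrapeptide, not the related tetrapeptides, and the eh1 pattern FxIxxIL with x as any residue. A sequence match is only a candidate: as noted later for Dorsal, a weak eh1 motif may need additional factors to bind Gro.

```python
import re

# Patterns from the text: the exact WRPW tetrapeptide and eh1, FxIxxIL
# (x = any amino acid). Lookaheads let overlapping matches be reported.
MOTIFS = {
    "WRPW": re.compile(r"(?=(WRPW))"),
    "eh1": re.compile(r"(?=(F.I..IL))"),
}


def find_motifs(protein: str):
    """Return (motif name, 0-based start, matched residues), sorted by start."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(protein.upper()):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical sequence containing the eh1 peptide from Figure 3 and a
# carboxy-terminal WRPW.
print(find_motifs("MKFSIDNILAAWRPW"))
```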

Figure 3

Binding of the WRPW and eh1 peptide motifs to the WD domain. Model showing the structure of the WD domain (rainbow ribbons) bound to the WRPW and eh1 (FSIDNIL) peptide motifs (light-gray sticks). Although these peptide motifs fold to form quite different structures, they bind overlapping sites on the same surface across the central pore of the β-propeller. For a more detailed structural analysis see [15].

Two additional proteins have been identified in vertebrates that are closely related to the Gro family in amino-acid sequence. First, AES (amino-terminal Enhancer of split)/Grg5 contains just the amino-terminal Q and GP domains. This protein acts as a negative regulator of Gro proteins in some contexts [16-18]. Second, TLE6/Grg6 (found only in mammals) contains a WD domain closely related to those of Gro proteins, but with a highly divergent amino-terminal region [19]. TLE6/Grg6 has been shown to compete with TLE1 for binding to FoxG1/BF-1. TLE6/Grg6 does not repress transcription when targeted to DNA and instead antagonizes TLE1-mediated repression.

Localization and function

Gro proteins are broadly expressed and play numerous key roles during animal development. Consistent with their function as transcriptional co-repressors, they show nuclear localization. A single report documents the presence of non-nuclear TLE1/Grg1 in a subset of neuronal dendrites in the adult rat brain [20], but the biological function of this is not yet known.

The biological roles of this family have been most intensively studied in fruit-fly development, facilitated by the numerous (almost 100) mutant alleles isolated so far in the D. melanogaster gro gene [1, 21]. Drosophila Gro is expressed ubiquitously throughout development [22]. It acts in key developmental signaling pathways, including those mediated by Notch, Wnt, Hedgehog, and Decapentaplegic/bone morphogenetic proteins (Dpp/BMP), and has well characterized roles in anterior-posterior segmentation, neural development, sex determination and patterning of the imaginal discs [23] (reviewed in [13, 14, 24, 25]).

The expression patterns of Grg1-4 have been systematically documented during avian development [26]. These proteins have largely overlapping patterns of expression in the primitive streak and Hensen's node, and later in the anterior central nervous system, ventricular zone of the neural tube, notochord, paraxial mesoderm, myotome, dermomyotome and limb buds. The expression pattern of all six family members in a fish, the medaka, has also been characterized, in particular during ear development [27].

Expression patterns in other vertebrate model organisms have been documented in a more piecemeal fashion; patterns of expression similar to those in birds and fish have been reported for some TLE/Grg proteins in Xenopus laevis [28] and mouse [4, 29-32]. Northern blots from adult human tissue revealed that transcripts corresponding to all four TLEs were expressed to some degree in all tissues examined (heart, brain, placenta, lung, liver, muscle, kidney and pancreas); however, the abundance of each transcript varied, depending on the tissue [2].

Studies of Gro protein function in vertebrate systems have been hampered by the presence of multiple family members expressed in overlapping domains, making loss-of-function and genetic analyses extremely difficult. Overexpression, cell culture, and expression-pattern studies have, however, indicated that vertebrate Gro proteins act in many processes, including neural development, somitogenesis, establishment of left/right asymmetry, osteogenesis and hematopoiesis [33]. Recently, it has been revealed that the expression of human TLE proteins is significantly altered in several types of tumor and that overexpression of Grg1 in the mouse induces lung adenocarcinoma [18]. Thus, Gro proteins may contribute to the pathogenesis of some cancers. The various known functions of Gro proteins in vertebrate development and disease have been summarized in more detail in two recent reviews [14, 33].

Mechanism

It is well established that Gro proteins act as transcriptional co-repressors; they do not interact with DNA directly, but are recruited to the regulatory region of target genes by DNA-binding transcription factors. However, it is not known how Gro proteins then act to switch off transcription, and several different models have been proposed. These models involve either direct or indirect chromatin modifications or interactions with the core transcriptional machinery. The co-repressor activity of Gro can also be altered by various posttranslational modifications. The confusing, and at times conflicting, observations about Gro-mediated repression can be reconciled if, as now seems most likely, Gro proteins repress transcription through more than one distinct molecular mechanism, depending on context.

Several observations point to Gro proteins being able to interact with and modify chromatin directly to cause transcriptional repression, although the mode of this regulation remains unclear. It has been shown that the amino-terminal region of Drosophila Gro, which lacks the WD domain, is necessary and sufficient for binding to histones, and that Gro binds to all four core histones, with a preference for histone H3 [7]. Grg3 is also reported to stably bind nucleosome arrays assembled in vitro and appears to have an intrinsic chromatin-modifying function [11]. Chromatin binding by Grg3 enables transcription factor recruitment and induces closed, DNase I-resistant chromatin spanning three to four nucleosomes. In contrast to the previously reported requirements for the amino-terminal domains in Drosophila, however, the WD domain made the major contribution to chromatin binding in this system [11]. Thus, it is not clear whether these two sets of results reveal the same or complementary modes of Gro-mediated repression.

In addition to its direct interaction with chromatin, Drosophila Gro has been shown to interact, via its GP domain, with a histone deacetylase, HDAC1 (encoded by the rpd3 gene in flies), and this interaction augments Gro-mediated repression in tissue culture cells [12]. rpd3 mutants show segmentation defects, consistent with Gro's known roles in segmentation. However, rpd3 embryos do not share many of the other distinctive characteristics of gro mutant embryos, including the strong neurogenic phenotype. Thus, either Gro can recruit additional HDACs, or HDAC activity is only essential in some developmental contexts.

Gro proteins may also interact directly with the core transcription machinery to repress transcription. A genetic interaction has been established between unc-37 and genes encoding components of the Mediator complex in C. elegans [34], although formally this interaction may reflect indirect effects.

Results from studies in which Gro is ectopically expressed in Drosophila cultured cells and larvae had indicated that oligomerization of Gro proteins is necessary for repression [35, 36]. This led to a model in which Gro inhibits transcription by 'spreading' along chromatin to impose repressive chromatin structure. However, in vivo analysis of a Gro mutated in the Q domain that is unable to oligomerize demonstrates that this is not always the case. Such mutant embryos do not show a null gro phenotype, and Gro-mediated repression is affected to different extents depending on the context [21].

The interaction between Gro proteins and the recruiting DNA-binding transcription factor is a potential point of regulation by posttranslational mechanisms. For example, TLE1/Grg1 has been isolated in a protein complex that includes poly(ADP-ribose) polymerase 1 (PARP-1), topoisomerase IIβ, nucleolin, nucleophosmin, and Rad50 [37]. This study in rat neural stem cells also revealed that activation of PARP-1 by Ca2+/calmodulin-dependent kinase II (CaMKIIδ) leads to poly(ADP-ribosyl)ation of TLE1/Grg1 and associated factors, resulting in dissociation from Hes1 and the relief of repression.

Phosphorylation of Gro proteins can have either positive or negative effects on repression. In Drosophila, mitogen-activated protein kinase (MAPK) phosphorylates Gro at two sites in response to signaling via the epidermal growth factor receptor (EGFR): Thr308 in the SP domain and Ser510 within the WD domain [38]. These residues are conserved in vertebrate family members. Phosphorylation by MAPK has been shown to downregulate Gro's activity, in particular diminishing repression by the E(spl) basic helix-loop-helix proteins (E(spl)bHLHs; members of the Hes family), which are important effectors of Notch signaling. This provides one mechanism by which EGFR signaling can antagonize Notch signaling during development, and implicates Gro as an important junction between these key developmental pathways. More recently, it has been shown that Gro is phosphorylated by MAPK in response to other receptor tyrosine kinase pathways and that this phosphorylation persists after deactivation of MAPK [39]. Similarly, phosphorylation of residues within the CcN domain of Drosophila Gro by HIPK2 reduces binding to the Eyeless (Pax6) transcription factor and HDAC1, resulting in loss of Eyeless-mediated repression [40]. During mitosis, the CcN domain is also phosphorylated by the kinase Cdc2 [41]. This has been proposed to alleviate interactions with chromatin during cell division. However, phosphorylation of the CcN domain of TLE1 by the CK2 kinase promotes the association of transcription factors and chromatin, enhancing repressive activity [42].

Gro activity is also modulated by the binding of accessory factors. In Drosophila, the Runx family member Lozenge requires the Cut homeodomain protein to form a stable complex with Gro and mediate repression [43]. Similarly, binding of Gro to a weak eh1 motif in Dorsal requires the presence of additional transcription factors [44].

Frontiers

It is perhaps surprising that the molecular mechanism of Gro family-mediated repression is so poorly understood 14 years after the first report of Gro acting as a transcriptional co-repressor [23]. Furthermore, many of the biological functions of vertebrate Gro family members are yet to be characterized.

It has become apparent that Gro proteins must repress transcription by various molecular mechanisms in vivo. Thus, the repression mechanism must be considered on a case-by-case basis, dependent on the recruiting transcription factor and biological context, until themes linking mechanism and context become clearly apparent. There are many questions to answer. Does each particular DNA-binding transcription factor always lead to repression via the same molecular mechanism or is the mechanism dependent on the identity of other factors recruited to the target promoters? What is the role of tetramerization? How far along the DNA from the recruitment site do Gro proteins directly and indirectly affect chromatin structure? Can transcription factors that recruit via the Q domain initiate the same repression mechanisms as those recruiting via the WD domain? Is the mechanism for temporary repression (for example, that induced by the highly dynamic expression of E(spl)bHLHs in Drosophila neurogenesis) the same as for more stable repression (for example, the 'long range' repression mediated by Hairy on a modified rhomboid neuro-ectodermal enhancer element [45])? Do all four vertebrate TLE/Grg proteins repress transcription by the same profile of mechanisms?

Further research is also needed into the roles of Gro proteins in vertebrate development. There is a pressing need for mouse strains with conditional knockouts of the TLE/Grg proteins to fully appreciate their roles during mammalian development. Once these strains are available, it will be possible to characterize the individual and combined contribution of each of the four TLE/Grgs to development and determine if they have any specificity of function. A flood of recent papers has shown correlations between TLE/Grg expression and specific human cancers [46-51]. It is well established that Gro proteins act as important effectors of Notch signaling through their interactions with Hes proteins, and the deregulation of Notch signaling has been implicated in the pathogenesis of some cancers [52, 53]. In addition, Gro family proteins are known to interact with other transcription factors that influence tumorigenesis, including Runx family proteins, LEF1/TCF and c-Myc [9, 10, 54, 55]. All these observations, taken together with the results of experiments demonstrating that overexpression of Grg1 in adult mice leads to lung adenocarcinoma [18], make a compelling case for further research into the role of Gro proteins in cancer.

Additional data files

Additional data file 1 contains a phylogenetic tree made using sequences from the Q domain only and accession numbers and further details of the Gro/TLE/Grg proteins used to make the phylogenetic trees.