Gene organization and evolutionary history

Sp1 was identified in the early 1980s and was one of the first transcription factors to be purified and cloned from, and characterized in, mammalian cells [1,2]. Sp1 was shown to recognize and specifically bind to GC-rich sites within the simian virus 40 (SV40) promoter via three Cys2His2 zinc-finger motifs. A similar DNA-binding domain had been found in many developmental regulators, including the Drosophila embryonic pattern regulator Krüppel [2]. Subsequently, other transcription factors were identified that had zinc-finger motifs highly similar to those of Sp1, thereby defining a novel class of Sp1-like proteins or Krüppel-like factors (KLFs) [3,4,5,6,7,8,9]. Because many members of the Sp1-like/KLF family have acquired multiple names over time, the nomenclature for these proteins is currently being revised and standardized. In this article, we follow the current nomenclature of Sp1-Sp6, with the remainder of the family called KLFs, and we refer to other names of each protein on first mention.

Sp1-like/KLF proteins are present in species ranging from the nematode Caenorhabditis elegans to humans and appear to have evolved through multiple gene-duplication events [10,11,12,13,14]. The fruitfly, for instance, has three Sp1-like proteins [12,13], whereas up to 21 Sp1-like/KLF genes have been identified in humans by a variety of cloning approaches. So far, homologs of 17 of the 21 human Sp1-like/KLF proteins have been found in mouse, and 11 have been found in rat. Other species, such as zebrafish, have fewer members of the Sp1-like family (Table 1). To date, no systematic comparisons have been made of the structural and functional properties of Sp1-like/KLF proteins in humans and other species.

Table 1 Summary of the functional features of Sp1-like/KLF family members

Like Sp1, factors of the Sp1-like/KLF family can bind various GC-rich DNA elements and regulate transcription. Furthermore, many Sp1-like transcription factors both in mammals and in invertebrates are involved in processes regulating cell growth and control morphogenetic pathways [15,16]. Unlike other important developmental regulators, such as Hox transcription factors or factors containing Krüppel-associated (KRAB) boxes [17,18,19], that are encoded in gene clusters, the genes encoding Sp1-like proteins are randomly dispersed throughout the genome and their products are thought to function independently; the exception is one locus that contains two genes, namely those coding for KLF2 (LKLF) and KLF1 (EKLF). The gene structures of Sp1-like/KLF proteins have not been studied in detail.

Within the Sp1-like/KLF family, several subgroups have been defined on the basis of sequence and functional similarities (Figure 1a). The factors that are most highly related to Sp1 are named Sp1-Sp6 and form one subgroup (the 'Sp' proteins or subgroup I). The other Sp1-like/KLF proteins make up two additional subgroups (subgroups II and III). According to the rules of the new nomenclature, these proteins are numbered as KLF factors, corresponding to the approximate order in which the genes were described (KLF1-KLF16; see Table 1).

Characteristic structural features

To function as site-specific transcription factors, proteins require at least three domains: a DNA-binding domain, a nuclear localization signal, and a transcriptional regulatory domain. The defining feature of Sp1-like/KLF proteins is a highly conserved DNA-binding domain (more than 65% sequence identity among family members) at the carboxyl terminus that has three tandem Cys2His2 zinc-finger motifs (Figure 1b,2). In addition to DNA binding, the zinc-finger motifs may also function in protein-protein interactions that modulate DNA-binding specificity [20,21]. The amino-terminal regions of the Sp1-like/KLF proteins are much more variable and contain transcriptional activation or repression domains. In addition, Sp1-like/KLF proteins have nuclear localization sequences, which can occur immediately adjacent to, or within, the zinc-finger motifs [22,23].

Figure 1

Mammalian members of the Sp1-like/KLF family. (a) A phylogenetic tree of human Sp1-like/KLF proteins and mouse Sp5 and Sp6 (mSp5 and mSp6) identifies three general subgroups. Subgroup I consists of the proteins most highly related to Sp1 (Sp1-Sp6). The other Sp1-like/KLF proteins are divided into two additional groups (subgroups II and III). The tree was generated using Genetic Computer Group (GCG) sequence analysis software. (b) Sequence alignment of the zinc-finger domains of Sp1-like/KLF protein family members. The sequence of the zinc-finger motifs of human Sp1 was compared with the corresponding regions of previously identified human Sp1-like proteins and with mouse Sp5. The consensus zinc fingers (ZF1, ZF2 and ZF3) are indicated below the sequences and the amino-acid residues predicted to interact with DNA according to the Klevit model [58] are indicated by arrows. Identical residues are in black, similar residues in gray and different residues in lower case. The percentage similarity between the Sp1 and the other Sp1-like/KLF zinc-finger domains is indicated on the right. Note that the amino acids predicted to make contact with DNA within the first (KHA), second (RER) and third (RHK) zinc-finger domains of Sp1 are nearly identical to the corresponding regions of other members of the Sp1-like/KLF family. All sequences are available in the NCBI human genome database [57].

Figure 2

Structural properties of Sp1-like/KLF proteins. Sp1-like/KLF proteins have highly homologous carboxy-terminal DNA-binding domains, characterized by three Cys2His2 zinc-finger motifs that recognize GC-rich DNA elements, and variant amino termini. The members of the family can be classified into subgroups on the basis of common structural and functional features of the amino termini; these correlate well with the subgroups predicted by sequence similarities in Figure 1a. Some members of subgroup I (Sp1, Sp2, Sp3, and Sp4) contain glutamine-rich (Q) and serine/threonine-rich (S/T) amino-terminal transcription activation domains. Two members of subgroup III, KLF10 and KLF11, are TGFβ-inducible repressors and have three conserved amino-terminal repression domains, including the Sin3 interaction domain (SID), which mediates interaction with the corepressor mSin3A. Three other members of subgroup III, KLF9, KLF13 and KLF16, are also characterized by a functional SID. KLF1, KLF2 and KLF4, which belong to subgroup II, are characterized by amino-terminal acidic activation domains, inhibitory regions adjacent to the zinc fingers and a conserved nuclear localization signal (NLS) sequence. In addition, KLF13 contains a similar nuclear localization sequence. Other members of subgroup II, KLF3, KLF8 and KLF12, have a conserved repression motif (PVALS/T) that interacts with the corepressor CtBP2.

The Sp1-like/KLF zinc-finger domain

Each Sp1-like zinc-finger motif conforms to the Cys2His2 zinc-finger consensus sequence C-X2-5-C-X3-(F/Y)-X5-ψ-X2-H-X3-5-H (in the single-letter amino-acid code), where X represents any amino acid and ψ is a hydrophobic residue [24]. The overall amino-acid similarity between the zinc-finger motifs of Sp1 and those of other members of the Sp1-like/KLF family is at least 66.7% (Figure 1b), and the length of each motif is invariant: the first two zinc-finger motifs are each 23 amino acids long and the third is 21 amino acids long. The linkers separating the zinc fingers are seven amino acids long and are also highly conserved, with the consensus TGE(R/K)(P/k/r)(F/y)X.
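The consensus above can be written as a regular expression, which makes the spacing rules concrete. The following is a minimal sketch rather than a validated motif finder. The set of residues counted as hydrophobic (ψ), the function name find_zinc_fingers and the toy peptide are all assumptions made for illustration. The toy peptide is synthetic and is not a real Sp1-like/KLF sequence. It is built with serine as filler so that the first two fingers are 23 residues long, the third is 21, and each linker has the form TGE(R/K)(P)(F)X.

```python
import re

# Assumption: residues treated as "hydrophobic" for the psi position.
HYDROPHOBIC = "AVILMFWY"

# C-X2-5-C-X3-(F/Y)-X5-psi-X2-H-X3-5-H, with X = any amino acid.
ZINC_FINGER = re.compile(
    r"C[A-Z]{2,5}C[A-Z]{3}[FY][A-Z]{5}[" + HYDROPHOBIC + r"][A-Z]{2}H[A-Z]{3,5}H"
)


def find_zinc_fingers(protein):
    """Return (start, matched_sequence) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(protein.upper())]


# Synthetic toy peptide (hypothetical, for illustration only).
finger_23 = "C" + "SSSS" + "C" + "SSS" + "F" + "SSSSS" + "L" + "SS" + "H" + "SSS" + "H"
finger_21 = "C" + "SS" + "C" + "SSS" + "Y" + "SSSSS" + "I" + "SS" + "H" + "SSS" + "H"
linker = "TGEKPFA"  # one instance of TGE(R/K)(P/k/r)(F/y)X
toy_protein = finger_23 + linker + finger_23 + linker + finger_21

hits = find_zinc_fingers(toy_protein)
lengths = [len(seq) for _, seq in hits]
print(lengths)
```

On the toy peptide the three matches have lengths 23, 23 and 21, mirroring the motif lengths described in the text. A pattern match only shows that a sequence is compatible with the consensus. It says nothing about whether the motif folds or binds DNA.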

The DNA-binding domains of Sp1-like/KLF proteins have not yet been studied using X-ray crystallography, but crystal structures of the related zinc-finger proteins TFIIIA, a general transcription factor, and Zif268, the product of an immediate-early gene, have provided some information about how Sp1-like/KLF proteins bind DNA [25,26,27], as the structures of TFIIIA and Zif268 can be used as templates in molecular modeling experiments. Preliminary results from studies of this type in our laboratory indicate that Sp1-like/KLF proteins bind DNA in a manner similar to TFIIIA and Zif268 (J.K., T.C. and R.U., unpublished observations). These studies, however, have not yet been able to reveal subtle structural features that could correlate with some observed biochemical differences between members of the Sp1-like/KLF family. Thus, more exhaustive analyses using direct biophysical methodology, such as nuclear magnetic resonance or crystallography, combined with molecular dynamics simulations, are necessary to address this problem.

Biochemical DNA-binding studies have shown that most Sp1-like/KLF proteins have similar affinities for different GC-rich sites [28,29,30,31]. Importantly, the amino acids that are predicted to interact with DNA are identical among several members (Figure 1b), and competition for DNA binding has been shown for some of these members; for example, Sp1 and Sp3 compete for the same sites, as do Sp1 and KLF9 (BTEB1), Sp1 and KLF13 (BTEB3), Sp1 and KLF4 (GKLF), and KLF1 and KLF3 (BKLF) [9,30,32,33,34]. It is worth noting that where these key residues differ between members of the family, the DNA-binding specificity is frequently altered. For example, Sp2, which has a leucine residue within the first zinc-finger motif in place of the histidine found in the corresponding region of Sp1, preferentially recognizes the GT box (5'-GGTGTGGGG-3'), found in many different promoters, rather than the GC box [9,35]. In addition, subgroup II proteins, with the exception of KLF6 (CPBP), contain a leucine instead of a lysine in the third zinc-finger motif and preferentially bind the 5'-CACCC-3' element, which is found, for example, in the β-globin gene promoter [5,8]. Previous studies also suggested that the linkers between the zinc-finger motifs contribute to high-affinity binding of certain zinc-finger proteins [24]. The linker regions of members of the Sp1-like/KLF family have several potentially relevant differences, but it is currently not known whether all of these contribute to differences in DNA-binding activity.
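To illustrate how the two elements quoted above could be located in a promoter sequence, the sketch below searches both strands of a DNA string for exact copies of the GT box and the CACCC element. It is a deliberate simplification: real binding sites are degenerate and would normally be modeled with position weight matrices rather than exact strings. The function names and the short test sequences are hypothetical and do not come from any real promoter.

```python
# Elements as quoted in the text (written 5' to 3').
MOTIFS = {
    "GT box": "GGTGTGGGG",
    "CACCC element": "CACCC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def scan_sites(dna):
    """Return (motif_name, strand, start) for every exact match.

    Strand "+" means the motif reads 5' to 3' on the given sequence;
    strand "-" means it lies on the complementary strand. Start positions
    are 0-based and refer to the given sequence.
    """
    dna = dna.upper()
    hits = []
    for name, motif in MOTIFS.items():
        for strand, target in (("+", motif), ("-", reverse_complement(motif))):
            start = dna.find(target)
            while start != -1:
                hits.append((name, strand, start))
                start = dna.find(target, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical toy sequences, for illustration only.
toy_promoter = "AAGGTGTGGGGTTCACCCAA"
print(scan_sites(toy_promoter))
print(scan_sites("TTGGGTGTT"))
```

The second call shows why both strands are scanned: the sequence GGGTG on the given strand corresponds to a CACCC element on the complementary strand.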

Transcriptional regulatory domains

Despite the high degree of similarity in the DNA-binding activities of the Sp1-like/KLF proteins, different family members vary broadly in their ability to regulate transcription, and thus to regulate morphogenetic processes. Sp1, for instance, is one of the most potent transcriptional activators characterized to date [9,36], whereas KLF11 (TIEG2) functions as a potent transcriptional repressor [37,38]. In addition, several Sp1-like/KLF proteins can function as either activators or repressors, depending on the cellular context in which they function and the promoters they bind [20,23,28,29,33,34,37,38,39,40,41,42,43].

The amino-terminal domains of Sp1-like/KLF proteins are highly variable, and recent studies have revealed that several members of the family regulate transcription by interacting with coactivators and corepressors via specific amino-terminal activation and repression domains (Figure 2). Glutamine-rich regions within the amino termini of Sp1 and Sp3 interact with components of the general transcription factor TAFII130 to activate transcription [44], whereas a PVALS/T motif within the amino termini of KLF3 (BKLF), KLF8 (BKLF3) and KLF12 (AP2-rep) associates with corepressors belonging to the C-terminal binding protein (CtBP) family to mediate transcriptional repression [6,45].

TIEG proteins (TIEG1 and TIEG2; or KLF10 and KLF11) and BTEB proteins (BTEB1, BTEB3 and BTEB4; or KLF9, KLF13 and KLF16), all of which belong to subgroup III of the Sp1-like/KLF family, also share a conserved repression motif, an α-helical domain highly related to the Sin3 interaction domain (SID) of the transcriptional repressor Mad1 (Figure 2). This SID-like domain is sufficient to mediate repression by interacting with the histone deacetylase corepressor complex mSin3A [29,34,38], and this function can be modified by cell-signaling events. For instance, phosphorylation of four residues in a region adjacent to the KLF11 SID-like domain by the signaling pathway involving extracellular signal-regulated kinase 2 (ERK2) disrupts interaction with mSin3A and results in a significant loss of repressor function [46].

Localization and function

Expression and gene-knockout studies are beginning to reveal that most, if not all, Sp1-like/KLF proteins are involved in growth-regulatory or developmental processes of a large number of tissues [4] (Table 1). Sp1, for instance, is ubiquitously expressed in murine cells, and the knockout of this gene leads to gross global morphological defects very early in development [16]. In contrast, other members, such as KLF1 and KLF2, are expressed specifically in erythroid cells and T lymphocytes, respectively, suggesting a more cell-type-specific function for these factors. Indeed, the knockout of KLF1 results in selective defects in erythropoiesis [47,48], whereas KLF2 is involved specifically in T-cell quiescence and survival [49]. In Drosophila and Xenopus, homologs of Sp1 appear to be important for development [11,12,13].

Because many Sp1-like/KLF proteins regulate cell growth in a variety of cell types, it is not surprising that some members of the family also appear to participate in mechanisms leading to carcinogenesis. Sp1 expression and activity are increased in epithelial carcinomas compared with benign tumors, such as papillomas, suggesting that Sp1 may be involved in tumor progression [50]. Similarly, KLF4 appears to promote cell growth in some cancers, such as breast cancer [51,52], and is downregulated in a manner similar to tumor suppressor genes in other cancers [53], suggesting that this gene may play distinct roles in carcinogenesis in different contexts. KLF6 was also recently reported as a candidate tumor suppressor gene that is mutated in prostate cancer [54], and the ability of KLF6 to inhibit cell growth was reduced by mutations within its transcriptional regulatory domain.

Together, these results indicate that while some Sp1-like/KLF proteins play a ubiquitous role in mammalian cell physiology, others have more cell-type-restricted functions. All of them appear to participate in morphogenetic pathways, however. It is therefore important to begin to understand both the similarities and differences between family members that direct their individual functions.

Mechanism

Several members of the Sp1-like/KLF family can regulate transcription by interacting with corepressors or coactivators. Recent evidence suggests that histone acetylation and deacetylation, which are associated with activation and repression of transcription, respectively, may serve as a switch for Sp1-like/KLF proteins to function as activators or repressors. KLF13, for instance, activates several promoters (such as those of SV40, the C-C chemokine RANTES, and γ-globin [40,55,56]) but represses others (such as that of the cytochrome P450 CYP1A1 [29,34]). This indicates that the trans-regulatory activity of KLF13 is in part promoter-dependent. In the case of activation, Song et al. [20] have shown that the coactivators CREB-binding protein (CBP) and its homolog p300 and the CBP/p300-associated factor (PCAF) bind to and acetylate the zinc-finger domain of KLF13, thereby stimulating KLF13 DNA-binding activity. In addition, a region within the amino terminus of KLF13 has been reported to function as an activation domain, although no coactivator has been shown to associate with this region of the protein. More recently, we [29,38] have also identified three unique repression domains, the SID, R2 and R3 domains, within the amino-terminal region of KLF13 that interact with the mSin3A histone deacetylase corepressor complex and have shown that this interaction allows KLF13 to repress expression of CYP1A1. The residues of the SID of KLF13 overlap with the amino-terminal region, which can also function as an activation domain [23]. Thus, it will be interesting to ascertain what mechanisms dictate whether KLF13 functions as an activator or a repressor for a given gene. For example, does the histone deacetylase activity recruited by the amino terminus of KLF13 remove the acetyl groups from its zinc-finger domain? It will be important to see how these transcriptional regulatory mechanisms function to dictate different developmental processes.

Frontiers

Several lines of investigation are needed to further our understanding of how the many members of the Sp1-like/KLF protein family regulate gene expression in a promoter-, cell-, and tissue-specific manner; whether they antagonize each other's functions to fine-tune specific cellular processes; and whether they participate in a hierarchical cascade of gene expression. The identification of the structural features that correlate with either similarities or differences between family members is necessary not only for a better understanding of the biochemistry of Sp1-like/KLF proteins but also for constructing a theoretical framework for the development of specific small-molecule antagonists, which could be used to manipulate Sp1-like/KLF proteins both in vitro and in vivo.

Another important area of research pertains to the transcriptional regulatory function of the amino-terminal domains of these proteins. What are the molecular mechanisms that regulate how these domains interact with coregulatory complexes and thereby repress or activate gene expression? Further insights into the functions of different Sp1-like/KLF proteins have the potential to change the partially informative classification of these proteins, which is currently based on primary structure. Lastly, because these proteins are important in morphogenesis, they are likely to play a significant role in the mechanisms underlying human diseases that are characterized by aberrant growth and differentiation, such as cancer. Future studies of the Sp1-like/KLF proteins have great potential for defining the machinery that not only regulates physiological processes but may also modulate human diseases.