Gene organization and evolutionary history

Gene organization

The prototypical Fgf genes contain three coding exons (Figure 1), with exon 1 containing the initiation methionine, but several Fgf genes (for example, Fgf2 and Fgf3) have additional 5' transcribed sequence that initiates from upstream CUG codons [1,2]. The size of the coding portion of Fgf genes ranges from under 5 kb (in Fgf3 and Fgf4) to over 100 kb (in Fgf12). In several Fgf subfamilies, exon 1 is subdivided into between two and four alternatively spliced sub-exons (denoted 1A-1D in the case of Fgf8). In these Fgf genes, a single initiation codon (ATG) in exon 1A is used. This gene organization is conserved in humans, mouse and zebrafish, but its functional consequences are poorly understood. Other subfamilies of Fgfs (such as Fgf11-14) have alternative amino termini, which result from the use of alternative 5' exons. It is not known whether a common 5' untranslated exon splices to these exons or whether alternative promoter and regulatory sequences are used.

Figure 1

Gene structure of selected members of the Fgf family. Only the portion of each gene containing coding exons is shown. Constitutively expressed exons are in black; alternatively spliced exons are in gray. Fgfs 1, 2, 4 and 9 contain the prototypic three-exon organization. For Fgf1, 5' untranslated exons are not shown; inclusion of these exons extends the gene by approximately 69 kb [78]. Fgf8 is an example of a gene with 5' alternative splicing, and Fgf13 demonstrates alternatively used 5' exons separated by over 30 kb. References: Fgf1 [78]; Fgf2 [79]; Fgf4 [80]; Fgf8 [52]; Fgf9 [81]; Fgf13 [76].

Most Fgf genes are found scattered throughout the genome. In human, 22 FGF genes have been identified and the chromosomal locations of all except FGF16 are known (Table 1) [3,4,5,6,7]. Several human FGF genes are clustered within the genome. FGF3, FGF4 and FGF19 are located on chromosome 11q13 and are separated by only 40 and 10 kb, respectively; FGF6 and FGF23 are located within 55 kb on chromosome 12p13; and FGF17 and FGF20 map to chromosome 8p21-p22. These gene locations indicate that the FGF gene family was generated both by gene and chromosomal duplication and translocation during evolution. Interestingly, a transcriptionally active portion of human FGF7, located on chromosome 15q13-q22, has been amplified to about 16 copies, which are dispersed throughout the human genome [8].
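The clustering described above can be tabulated with a few lines of Python. This is only an illustrative sketch: the dictionary holds the cytogenetic bands quoted in this paragraph, and the helper name group_by_band is an assumption introduced here. Sharing a band is a coarse proxy; it does not by itself establish the kilobase-scale linkage reported for the 11q13 and 12p13 clusters.

```python
from collections import defaultdict

# Cytogenetic bands quoted in the paragraph above (human genes only).
HUMAN_FGF_LOCATIONS = {
    "FGF3": "11q13",
    "FGF4": "11q13",
    "FGF19": "11q13",
    "FGF6": "12p13",
    "FGF23": "12p13",
    "FGF17": "8p21-p22",
    "FGF20": "8p21-p22",
    "FGF7": "15q13-q22",
}


def group_by_band(locations):
    """Return {band: sorted gene names} for bands that hold more than one gene."""
    bands = defaultdict(list)
    for gene, band in locations.items():
        bands[band].append(gene)
    return {band: sorted(genes) for band, genes in bands.items() if len(genes) > 1}


clusters = group_by_band(HUMAN_FGF_LOCATIONS)
for band, genes in sorted(clusters.items()):
    print(band, ", ".join(genes))
```

FGF7 does not appear in the result because no other gene in this short list shares its band.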

Table 1 Chromosomal localizations of FGFs in human and mouse

In the mouse, there are at least 22 Fgf genes [3,9], and the locations of 16 have been identified (Table 1). Many of the mouse Fgf genes are scattered throughout the genome, but, as with FGF3, FGF4 and FGF19 in the human, Fgf3, Fgf4 and Fgf15 are closely linked (within 80 kb on chromosome 7F) and Fgf6 and Fgf23 are closely linked on chromosome 6F3-G1.

Evolutionary history

Fgfs have been identified in both invertebrates and vertebrates [3]. Interestingly, an Fgf-like gene is also encoded in the nuclear polyhedrosis virus genome [10]. Fgf-like sequences have not been found in unicellular organisms such as Escherichia coli and Saccharomyces cerevisiae. Although the Drosophila and Caenorhabditis elegans genomes have been sequenced, only one Fgf gene (branchless) has been identified in Drosophila [11] and two (egl-17 and let-756) have been identified in C. elegans [12,13], in contrast to the large number of Fgf genes identified in vertebrates. The evolutionary relationship between invertebrate and vertebrate Fgfs is shown in Figure 2a.

Figure 2

Evolutionary relationships within the FGF family. (a) Apparent evolutionary relationships between FGFs from vertebrates, invertebrates and a virus. Amino-acid sequences of nine representative FGFs were chosen from human and compared with FGFs from Drosophila, C. elegans, zebrafish and Autographa californica nuclear polyhedrosis virus. (b) Apparent evolutionary relationships of the 22 known human and murine FGFs. Sequences were aligned using Genetyx sequence analysis software and trees were constructed from the alignments using the neighbor-joining method.

The Fgf gene expansion has been hypothesized to be coincident with a phase of global gene duplications that took place during the period leading to the emergence of vertebrates [14]. Across species, most orthologous FGF proteins are highly conserved and share greater than 90% amino-acid sequence identity (except human FGF15 and mouse Fgf19; see below). To date, four Fgfs (Fgf3, 8, 17 and 18) have been identified in zebrafish, six (Fgf3, Fgf(i), Fgf(ii), Fgf8, 9 and 20) in Xenopus (Fgf(i) and Fgf(ii) are most closely related to Fgf4 and Fgf6 [15]) and seven (Fgf2, 4, 8, 12, 14, 18 and 19) in chicken [3].

The apparent evolutionary relationships of the 22 known human FGFs are shown in Figure 2b. Vertebrate FGFs can be classified into several subgroups or subfamilies. Members of a subgroup of FGFs share increased sequence similarity and biochemical and developmental properties. For example, members of the FGF8 subfamily (FGF8, FGF17, and FGF18) have 70-80% amino acid sequence identity, similar receptor-binding properties and some overlapping sites of expression (for example, the midbrain-hindbrain junction) [16,17]. Members of FGF subgroups are not closely linked in the genome, however, indicating that the subfamilies were generated by gene-translocation or by genome-duplication events, not by local duplication events.
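The identity figures quoted in this section (for example, 70-80% within the FGF8 subfamily) are pairwise measures taken from an alignment. A minimal sketch of one common way to compute such a value is given below. It is not the procedure used to build Figure 2, the function name percent_identity is an assumption, and the two strings are a made-up toy alignment rather than real FGF sequence. Conventions differ on the denominator; here only columns without a gap in either sequence are counted.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, NOT real FGF sequence.
toy_a = "MKLV-TGRAY"
toy_b = "MKIVQTG-AY"
print(percent_identity(toy_a, toy_b))  # 7 identical of 8 ungapped columns
```

Subtracting such a value from 100 gives a simple pairwise distance of the kind that a distance-based method such as neighbor joining takes as input, although dedicated packages usually apply a substitution-model correction first.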

Human FGF15 and mouse Fgf19 have not been identified. Human FGF19 is evolutionarily most closely related to mouse Fgf15 (51% amino acid identity; Figure 2b) [18] and both the human FGF19 and mouse Fgf15 genes are closely linked to the human and mouse Fgf3 and Fgf4 genes on orthologous regions of human chromosome 11q13 and mouse chromosome 7F (N.I., unpublished observations). These findings indicate that human FGF19 may be the human ortholog of mouse Fgf15. Because all other Fgf orthologs share greater than 90% amino acid identity, it remains possible that the true orthologs of these genes have not been identified, have been lost or have diverged during vertebrate evolution.

Characteristic structural features

FGFs range in molecular weight from 17 to 34 kDa in vertebrates, whereas the Drosophila FGF is 84 kDa. Most FGFs share an internal core region of similarity, with 28 highly conserved and six identical amino-acid residues [19]. Ten of these highly conserved residues interact with the FGF receptor (FGFR) [20]. Structural studies on FGF1 and FGF2 identify 12 antiparallel β strands in the conserved core region of the protein (Figure 3) [21,22]. FGF1 and FGF2 have a β trefoil structure that contains four-stranded β sheets arranged in a triangular array (Figure 3b; reviewed in [23]). Two β strands (strands β10 and β11) contain several basic amino-acid residues that form the primary heparin-binding site on FGF2. Regions thought to be involved in receptor binding are distinct from regions that bind heparin (Figure 3) [21,22,23,24].

Figure 3

(a) Structural features of the FGF polypeptide. The amino terminus of some FGFs contains a signal sequence (shaded). All FGFs contain a core region that contains conserved amino-acid residues and conserved structural motifs. The locations of β strands within the core region are numbered and shown as black boxes. The heparin-binding region (pink) includes residues in the loop between β strands 1 and 2 and in β strands 10 and 11. Residues that contact the FGFR are shown in green (the region contacting Ig-domain 2 of the receptor), blue (contacting Ig-domain 3) and red (contacting the alternatively spliced region of Ig-domain 3). Amino-acid residues that contact the linker region are shown in gray [20]. (b) Three-dimensional structure of FGF2, a prototypical member of the FGF family. A ribbon diagram of FGF2 is shown; β strands are labeled 1-12 and regions of contact with the FGFR and heparin are color-coded as in (a) [22,24]. Image provided by M. Mohammadi.

Localization and function

Localization

Subcellular localization and secretion

Most FGFs (FGFs 3-8, 10, 15, 17-19, and 21-23) have amino-terminal signal peptides and are readily secreted from cells. FGFs 9, 16 and 20 lack an obvious amino-terminal signal peptide but are nevertheless secreted [25,26,27]. FGF1 and FGF2 also lack signal sequences, but, unlike FGF9, are not secreted; they can, however, be found on the cell surface and within the extracellular matrix. FGF1 and FGF2 may be released from damaged cells or could be released by an exocytotic mechanism that is independent of the endoplasmic-reticulum-Golgi pathway [28]. FGF9 has been shown to contain a non-cleaved amino-terminal hydrophobic sequence that is required for secretion [29,30]. A third subset of FGFs (FGF11-14) lack signal sequences and are thought to remain intracellular [31,32,33,34]. It is not known whether these FGFs interact with known FGFRs or function in a receptor-independent manner within the cell. FGF2 and FGF3 have high-molecular-weight forms that arise from initiation from upstream CUG codons [2,14,35]. The additional amino-terminal sequence in these proteins contains nuclear-localization signals, and the proteins can be found in the nucleus; the biological function of nuclear-localized FGF is unclear.
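The secretion classes described above can be written down as a small lookup table, which also makes it easy to confirm that every FGF number is assigned to exactly one class. The sketch below is illustrative only: the class labels are paraphrases of this paragraph, and the names expand, SECRETION_CLASSES and classify are assumptions introduced here. The numbers run from 1 to 23 because FGF15 and FGF19 are both listed, although the family is described as having 22 members in any one species.

```python
def expand(spec):
    """Expand a spec such as '3-8, 10' into a set of integers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            numbers.update(range(int(low), int(high) + 1))
        else:
            numbers.add(int(part))
    return numbers


# Labels paraphrase the paragraph above; the member lists are copied from it.
SECRETION_CLASSES = {
    "signal peptide, secreted": expand("3-8, 10, 15, 17-19, 21-23"),
    "no obvious signal peptide, secreted": expand("9, 16, 20"),
    "no signal sequence, not conventionally secreted": expand("1, 2"),
    "no signal sequence, thought to remain intracellular": expand("11-14"),
}


def classify(fgf_number):
    """Return the class label for an FGF number, or raise KeyError if unlisted."""
    for label, members in SECRETION_CLASSES.items():
        if fgf_number in members:
            return label
    raise KeyError(f"FGF{fgf_number} is not listed in the text")


all_members = set().union(*SECRETION_CLASSES.values())
print(len(all_members), classify(9))
```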

Developmental expression patterns and function

The 22 members of the mammalian FGF family are differentially expressed in many, if not all, tissues, but the patterns and timing of expression vary. Subfamilies of FGFs tend to have similar patterns of expression, although each FGF also appears to have unique sites of expression. Some FGFs are expressed exclusively during embryonic development (for example, Fgf3, 4, 8, 15, 17 and 19), whereas others are expressed in embryonic and adult tissues (for example, Fgf1, 2, 5-7, 9-14, 16, 18, and 20-23).

Function

The expression patterns of FGFs (see above) suggest that they have important roles in development. FGFs often signal directionally and reciprocally across epithelial-mesenchymal boundaries [36]. The integrity of these signaling pathways requires extremely tight regulation of FGF activity and receptor specificity. For example, in vertebrate limb development, mesenchymally expressed Fgf10 in the lateral-plate mesoderm induces the formation of the overlying apical ectodermal ridge; the ridge subsequently expresses Fgf8, which signals back to the underlying mesoderm [37]. This directional signaling initiates feedback loops and, along with other signaling molecules, regulates the outgrowth and patterning of the limb. Importantly, the differential expression of the alternative splice forms of the receptors in the apical ectodermal ridge and underlying mesoderm is such as to limit or prevent autocrine signaling within a given compartment.

Studies of the biochemical activities of FGFs have focused on the specificity of interactions between FGFs and FGFRs, on factors that affect the stability of FGFs and on the composition and mechanism of the active FGF-FGFR signaling complex.

Specificity of FGFs for FGF receptors

The FGFR tyrosine kinase receptors contain two or three immunoglobulin-like domains and a heparin-binding sequence [38,39,40]. Alternative mRNA splicing of the FGFR gene specifies the sequence of the carboxy-terminal half of immunoglobulin-domain III, resulting in either the IIIb or the IIIc isoform of the FGFR [41,42,43]. This alternative-splicing event is regulated in a tissue-specific manner and dramatically affects ligand-receptor binding specificity [44,45,46,47,48]. Exon IIIb is expressed in epithelial lineages and exon IIIc tends to be expressed in mesenchymal lineages [44,46,47,48]. In vitro patterns of binding specificity have been determined for each splice form of FGFR1-3 and for FGFR4, which is not alternatively spliced [49,50,51]. Ligands specific for these receptor splice forms are expressed in adjacent tissues, resulting in directional epithelial-mesenchymal signaling. For example, epithelially expressed FGFR2b (that is, FGFR2 IIIb isoform) can be activated by FGF7 and FGF10, ligands produced in mesenchymal tissue [49,50,51]. These ligands show no activity towards mesenchymally expressed FGFR2c. Conversely, FGF8 is expressed in epithelial tissue and activates FGFR2c but shows no activity towards FGFR2b ([49,52] and our unpublished observations). Notably, FGF8 expression is often restricted to epithelial tissue such as the apical ectodermal ridge of the developing limb bud [53,54].
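The logic of directional signaling in this paragraph can be made explicit with a small model. The sketch below is a simplification under stated assumptions: it encodes only the FGFR2 examples given above, it treats the IIIb and IIIc isoforms as strictly epithelial and mesenchymal (the text says IIIc only tends to be mesenchymal), and the names LIGAND_SOURCE, RECEPTOR_SITE, ACTIVATES and signaling_mode are introduced here for illustration.

```python
# Tissue compartment in which each ligand is produced, as stated in the text.
LIGAND_SOURCE = {"FGF7": "mesenchyme", "FGF10": "mesenchyme", "FGF8": "epithelium"}

# Simplifying assumption: each FGFR2 isoform is confined to one compartment.
RECEPTOR_SITE = {"FGFR2b": "epithelium", "FGFR2c": "mesenchyme"}

# Ligand-receptor pairs reported as active in the paragraph above.
ACTIVATES = {("FGF7", "FGFR2b"), ("FGF10", "FGFR2b"), ("FGF8", "FGFR2c")}


def signaling_mode(ligand, receptor):
    """Classify a ligand-receptor pair as paracrine, autocrine or inactive."""
    if (ligand, receptor) not in ACTIVATES:
        return "no activity"
    if LIGAND_SOURCE[ligand] == RECEPTOR_SITE[receptor]:
        return "autocrine"
    return "paracrine"


for ligand in sorted(LIGAND_SOURCE):
    for receptor in sorted(RECEPTOR_SITE):
        print(ligand, receptor, signaling_mode(ligand, receptor))
```

Under these assumptions every active pair crosses the epithelial-mesenchymal boundary, and the pairs that would be autocrine are exactly the ones reported as inactive, which is the arrangement the text describes as limiting signaling within a single compartment.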

Interaction with heparin or heparan sulfate proteoglycans

An important feature of FGF biology involves the interaction between FGF and heparin or heparan sulfate (HS) proteoglycan (HSPG) [19]. These interactions stabilize FGFs to thermal denaturation and proteolysis and may severely limit their diffusion and release into interstitial spaces [55,56]. FGFs must saturate nearby HS-binding sites before exerting an effect on tissue further away, or else must be mobilized by heparin/HS-degrading enzymes. The interaction between FGFs and HS results in the formation of dimers and higher-order oligomers [57,58,59]. Although the biologically active form of FGF is poorly defined, it has been established that heparin is required for FGF to effectively activate the FGFR in cells that are deficient in or unable to synthesize HSPG or in cells pretreated with heparin/HS-degrading enzymes or inhibitors of sulfation [60,61,62]. Genetic studies have also shown that mutations in enzymes involved in HS biosynthesis affect FGF signaling pathways during development [19,63]. Additional studies have shown that heparin and/or HS act to increase the affinity and half-life of the FGF-FGFR complex (reviewed in [40,64]).

A minimal complex containing one FGF molecule per FGFR can form in the absence of HS [24]. Structural studies suggest that HS may bridge FGF2 and the FGFR by binding to a groove formed by the heparin-binding sites of both the ligand and the receptor [24,65]. Binding studies with soluble chimeric FGFRs have identified a second potential FGF-binding site that, in some cases, can interact cooperatively with the primary FGF-binding site [66].

Important mutants

Many members of the Fgf family have been disrupted by homologous recombination in mice. The phenotypes range from very early embryonic lethality to subtle phenotypes in adult mice. The major phenotypes observed in Fgf knockout mice are shown in Table 2. Because FGFs within a subfamily have similar receptor-binding properties and overlapping patterns of expression, functional redundancy is likely to occur. This has been demonstrated for Fgf17 and Fgf8, which cooperate to regulate neuroepithelial proliferation in the midbrain-hindbrain junction [17]. In the case of Fgf knockouts resulting in early lethality, other functions later in development will need to be addressed by constructing conditional alleles that can be targeted at specific times and places in development. For example, Fgf8-/- mice die by embryonic day 9.5 [67]. A conditional allele for Fgf8 targeted to the apical ectodermal ridge has been used to demonstrate an essential role for Fgf8 in early limb development [68,69].

Table 2 FGF knockout mice

Several mutations in Fgf genes have been identified in C. elegans, Drosophila, zebrafish, mouse and human. The C. elegans gene egl-17 is required for sex myoblast migration [12], and a null allele of let-756 causes developmental arrest of the early larva [13]. The Drosophila branchless gene is required for tracheal branching and cell migration [11]. In zebrafish, acerebellar (ace) embryos lack the cerebellum and the midbrain-hindbrain boundary organizer. The ace gene encodes the zebrafish homolog of Fgf8 [70]. Interestingly, zebrafish aussicht mutant embryos, which overexpress Fgf8, also have defects in development of the central nervous system [71].

In the mouse, the angora mutation, which affects hair growth, was found to be allelic with Fgf5 [72]. A mouse mutant with a Crouzon-syndrome-like craniofacial dysmorphology phenotype was found to result from an insertional mutation in the Fgf3/Fgf4 locus [73]. Recently, positional cloning of the autosomal dominant hypophosphataemic rickets gene identified missense mutations in human FGF23 [74]. A recent paper demonstrates that this disease is caused by a gain-of-function mutation [75]. The chromosomal location (Xq26) and tissue-specific expression pattern of Fgf13 (also called Fhf2) suggest that it may be a candidate gene for Borjeson-Forssman-Lehmann syndrome, an X-linked mental retardation syndrome [76].

Frontiers

Issues most studied

FGFs have been intensely studied for nearly 30 years. Most of the early work focused on the mechanisms that regulate stability, secretion, export and interactions with heparin and on the mechanisms and consequences of signal transduction in various types of cells. More recent work has focused on the mechanisms regulating receptor specificity and receptor activation, the structure of the FGF-FGFR-HS complex, and the identification of new members of the FGF family. Functional studies have begun to address the role of FGFs in cell biology, development and physiology. Initial studies focused on the regulation of cell proliferation, migration and differentiation; more recent work has addressed the negative effect of FGFs and FGFRs on proliferation of some cell types, which was surprising as FGFs were thought to promote proliferation. In vitro studies have now been complemented by gene targeting in mice. The knockout approach has been fairly successful in identifying primary phenotypes but will be challenged by the need to address redundancy amongst the 22 FGFs and to study their developmental and physiological functions after the point of lethality of the null allele.

Unresolved questions

A major unresolved question concerns the mechanism(s) regulating FGF activity in vivo in the presence of cell-surface and extracellular-matrix HSPG. Current hypotheses predict that tissue-specific heparan fragments of defined sequence (and particularly of defined sulfation pattern) will differentially regulate FGFs by controlling their diffusion in the extracellular matrix and their ability to activate specific receptors [77]. These issues will be resolved by determining the sequence of tissue-specific HS and by demonstrating whether specific HS sequences can modulate the binding specificity of FGFs beyond that determined by the specific FGFR and its alternative splice form in the presence of heparin.

A second area of research will aim to elucidate the developmental roles of all the FGFs, first alone and then in various combinations. This will include determining whether a single FGF with a defined developmental function interacts with one or multiple FGFRs. A third major frontier will be to elucidate the physiological roles of FGFs that are expressed in adult tissues. This will again involve testing combinations of FGFs in cases in which knockouts are viable and designing conditional alleles in cases of embryonic lethality. Major areas being considered include neuronal and cardiovascular physiology, neuronal regeneration and homeostasis and tissue repair.

The last major frontier will be to elucidate the primary roles of FGFs in genetic diseases and cancer. Several FGFs were initially cloned from human and animal tumors. Future work will be required to determine whether FGF activation is itself an etiological agent in primary human tumors or whether it is a progression factor in the pathogenesis of cancer. As functional roles for FGFs are elucidated in embryonic development, it is expected that various human birth defects and genetic diseases will be attributed to mutations in Fgf genes. These studies will probably lead to the development of pharmacogenetic agents to treat these diseases. Because a large number of skeletal diseases are caused by mutations in Fgfr genes, it is anticipated that mutations in some Fgf genes will also be involved in skeletal pathology.