Gene organization and evolutionary history

Annexins were discovered approximately 25 years ago. The first to be described as an isolated, purified protein was human annexin A7 (then known as synexin) [1], and the first to be cloned were human annexins A1 and A2 (formerly known as lipocortin and calpactin respectively) [2, 3]. The name 'annexin' was proposed for the superfamily in 1990, and the 12 annexins common to vertebrates were recently classified in the annexin A family and named as annexins A1-A13 (or ANXA1-ANXA13), leaving A12 unassigned in the official nomenclature [4]. Annexins outside vertebrates are classified into families B (in invertebrates), C (in fungi and some groups of unicellular eukaryotes), D (in plants), and E (in protists); at least 40 additional subfamilies await formal classification into these families.

Most eukaryotic species have 1-20 annexin (ANX) genes (Table 1, Figure 1); even the primitive unicellular protist Giardia has at least seven (which are in the annexin E family) [5, 6]. The annexin genes have duplicated extensively and independently in several eukaryotic lineages, as seen from their molecular phylogeny, their gene structures and their chromosomal positions. Plant annexins (the D family) make up a monophyletic cluster whose members generally lack amino-terminal domains and functional calcium-binding sites in their second and third repeats [7] (see below). They originated approximately 1,000 million years ago from the one to three founding members in mosses, ferns and gymnosperms. Up to 17 additional gene subfamilies have emerged in flowering plants (angiosperms) as a result of gene or genome duplication events, accruing additional annexins in individual lineages (for example, one more in lilies and ten more in the barrel medic). The eight and nine annexin genes in the complete genomes of Arabidopsis and rice, respectively, include only two true orthologs; the rest are products of lineage-specific duplications after the separation of dicotyledonous and monocotyledonous plants about 200-250 million years ago.

Table 1 Annexin genes in different groups of organisms
Figure 1

The phylogenetic distribution of annexins. A tree showing the classification of annexins into five families, ANXA to ANXE, which correspond with different eukaryotic lineages that originated at different periods over the past 1,200 million years (Mya, million years ago). Names of the vertebrate annexins are shown, but those of other members of the superfamily are omitted for simplicity.

The annexin C family consists of diverse members in unicellular organisms, represented by fungi, mycetozoa (slime molds) and the newly defined kingdom of chromalveolates (a grouping of chromist stramenopiles, including brown algae and diatoms, and alveolates, including ciliates and dinoflagellates). Individual species in these groups may have no annexins (yeasts), one to three (other fungi), or up to six (potato rot). Members of the annexin B family, found in both protostome and deuterostome invertebrates, have also undergone many lineage-specific duplications, leading to more than 20 subfamilies whose gene organization, protein structures and chromosomal maps differ between clades and from vertebrate annexins. Insect annexins exemplify the complex pattern of duplication and loss in individual lineages: tsetse flies and mosquitoes have four annexins, whereas Drosophila, honeybees and silkmoths have only three, of which only one or two are clear orthologs between species. The early-branching deuterostomes - sea urchins, tunicates and lancelets - have 5-12 annexins; these include close relatives of annexins A13, A7 and A11, the founder genes of vertebrate annexins [8, 9]. Although this establishes the invertebrate ancestry of vertebrate annexins, none of the 12 annexins in the vertebrate A family have (yet) been assigned a true invertebrate ortholog.

The vertebrate A family includes the 12 annexins that have been confirmed to make up the complete family in mammals, but the number of annexins may vary in other classes of vertebrates as genes have been gained and lost. Ancient polyploidization events in bony fish, and more recent genome duplications in pseudotetraploid frogs (Xenopus), have duplicated many of the annexin genes. Thus, annexin A1 has undergone two successive duplications to yield up to four copies in some fish, amphibians and birds. Mammalian ANXA6 is a compound gene, probably derived from the fusion of duplicated ANXA5 and ANXA10 genes in early vertebrate evolution (the two halves of the encoded protein are indicated as 5'ANX6 and 3'ANX6 in Figure 1). Annexins A7, A8 and A10 have not yet been detected in fish, although genes similar to annexin A7 have been found in earlier-diverging species such as the sea urchin, the earthworm and Hydra. The reasons for the tendency of annexin genes (or their chromosomal regions) to duplicate, their successful preservation, and the extent to which they contribute to vertebrate complexity are as yet unknown.

The 12 human annexin genes range in size from 15 kb (ANXA9) to 96 kb (ANXA10) and are dispersed throughout the genome on chromosomes 1, 2, 4, 5, 8, 9, 10 and 15 [10]. Annexin genes from other vertebrates may vary slightly in size and chromosomal linkage, but orthologs are grossly similar in their sequence and splicing patterns (Figure 2a).

Figure 2

Gene structures, protein domains and signature logos of vertebrate annexins. (a) The organization of the regions of family-A annexin genes encoding the core carboxy-terminal region. Exon numbers are shown above each gene; introns are indicated by vertical lines and homologous intron positions by dotted lines. The structures of the nine human annexin genes not shown are the same as that of ANXA11. ANXA13 is thought to have the gene structure closest to the ancestral vertebrate annexin gene; ANXA7 is intermediate between ANXA13 and the others, and most closely resembles ANXA11 in its amino-terminal half and ANXA13 in its carboxy-terminal half. (b) Annexin proteins generally consist of a unique amino-terminal region (of 0-191 amino acids in vertebrates, for example) and a carboxy-terminal 'core region' of four homologous repeats, each 68-69 amino acids long and containing five α helices and a type-2 calcium-binding site with the sequence GxGT-[38 residues]-D/E. The indicated residues Glu89 and Arg265 are considered key components of the putative calcium channel function. (c) Sequence logo for the core domain of vertebrate annexins, derived from a hidden Markov model [11] generated from an alignment of 311 amino acids from 200 sequences representing the 12 subfamilies in 50 vertebrate species. The full height of each residue stack reflects the conservation level at that position; the height of symbols within the stack indicates the relative frequency of each amino acid [12]. The two parts of the calcium-binding motif (GxGT and D/E) are indicated by asterisks. The four repeats are aligned to the right on their calcium-binding motifs.

Characteristic structural features

All annexins share a core domain made up of four similar repeats, each approximately 70 amino acids long. Each repeat is made up of five α helices and usually contains a characteristic 'type 2' motif for binding calcium ions with the sequence 'GxGT-[38 residues]-D/E' (in the single-letter amino-acid code; see Figure 2b). Animal and fungal annexins also have variable amino-terminal domains.
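
The motif pattern quoted above can be searched for directly in a protein sequence. The following is a minimal sketch, assuming the pattern is read literally as G, any residue, G, T, exactly 38 residues of any kind, then D or E. The function name and the toy sequence are hypothetical and are not taken from any annexin database; real sites can deviate from the consensus, so a literal scan of this kind is only a first-pass filter.

```python
import re

# Type-2 motif as written in the text: G-x-G-T, then 38 residues, then D or E.
# The lookahead wrapper allows overlapping candidate sites to be reported.
TYPE2_MOTIF = re.compile(r"(?=(G.GT.{38}[DE]))")

def find_type2_sites(sequence):
    """Return 1-based (first_G, acidic_residue) positions for each literal match."""
    seq = sequence.upper()
    hits = []
    for match in TYPE2_MOTIF.finditer(seq):
        first_g = match.start() + 1            # 1-based position of the first G
        acidic = match.start() + 4 + 38 + 1    # 1-based position of the D/E
        hits.append((first_g, acidic))
    return hits

# Synthetic toy sequence (not a real annexin): the motif starts at position 3.
toy = "AA" + "GKGT" + "A" * 38 + "D" + "AA"
print(find_type2_sites(toy))
```

Applied to each of the four repeats of a core domain, a scan of this kind would indicate which repeats retain the consensus and which have lost it.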

Amino-acid site conservation in a sequence alignment of vertebrate annexins can be analyzed statistically using hidden Markov models [11, 12] to generate a signature logo [13, 14] (Figure 2c). The levels of conservation and the frequency of amino acids at each site reflect both evolutionary selection on the site and its functional importance. The similarity in sequence between individual repeats (especially repeats 2 and 4) is evident. The calcium-binding motif is most conserved in repeat 2; in annexin A10, the motif in repeat 2 is the only such motif. The exon splice patterns and alternating intron phases in annexin genes do not correspond to domains within the proteins, a feature that has effectively precluded exon shuffling. The high level of conservation of features such as the four carboxy-terminal repeats contrasts with unique amino termini in different annexins, and there are also smaller differences between them in specific parts of the proteins (Figure 3). Annexins bind to a wide variety of other proteins (Table 2). Many annexins have posttranslational modifications, such as phosphorylation and myristoylation (Figure 3); such modifications and surface remodeling of individual members presumably account for much of the subfamily specificity in annexin interactions. An intriguing, recurring connection between annexins and structural proteins, such as actin, is suggested by genetic anomalies such as the fusion in Drosophila of an annexin gene with a dynein intermediate-chain gene [15] and in Ciona intestinalis of a gene similar (on the basis of its exon-splicing pattern) to ANXA11 with an intermediate-filament gene [16, 17].
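
The relationship between stack height and symbol height in a logo can be illustrated with the classical Shannon form, in which the stack height of a column is log2(20) minus the column entropy and each symbol is scaled by its relative frequency. This is a simplified sketch only: the logo in Figure 2c is derived from a hidden Markov model, which is not reproduced here, and no small-sample correction or background distribution is applied. The function name and the example columns are hypothetical.

```python
import math
from collections import Counter

def logo_column(residues, alphabet_size=20):
    """Stack height (bits) and per-residue symbol heights for one alignment column."""
    counts = Counter(r for r in residues if r != "-")   # gap characters are ignored
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {aa: n / total for aa, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    stack = math.log2(alphabet_size) - entropy           # conservation of the column
    heights = {aa: f * stack for aa, f in freqs.items()} # symbol height = frequency x stack
    return stack, heights

# An invariant column gives the maximum stack height; a 50:50 column is one bit lower.
print(logo_column("GGGG"))
print(logo_column("DDEE"))
```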

Figure 3

Domain structures of representative annexin proteins. Orthologs of the 12 human annexins shown in other vertebrates have the same structures, with strict conservation of the four repeats in the core region (black) and variation in length and sequence in the amino-terminal regions (shaded). Human ANXA1 and ANXA2 are shown as dimers, with the member of the S100 protein family that they interact with. Domain structures for other model organisms are derived from public data made available by the relevant genome-sequencing projects. Features: S100Ax, sites for attachment of the indicated member of the S100 family of calcium-binding proteins; P, known phosphorylation sites; K, KGD synapomorphy (a conserved, inherited characteristic of proteins); I, codon insertions (+x denotes the number of codons inserted); S-A/b, nonsynonymous coding polymorphisms (SNPs) with the amino acid in the major variant (A) and that in the minor variant (b); N, putative nucleotide-binding sites; D, codon deletions (-x denotes the number of codons deleted); A, alternatively spliced exons; Myr, myristoylation. The total length of each protein is indicated on the right.

Table 2 Proteins that interact with vertebrate annexins

The core domains of most vertebrate annexins have been analyzed by X-ray crystallography, revealing conservation of their secondary and tertiary structures [18-20] despite only 45-55% amino-acid identity among individual members. Each annexin repeat is folded into five α helices (Figure 2b,c), and these in turn are wound into a right-handed super-helix. The four repeats pack into a structure that resembles a flattened disc, with a slightly convex surface on which the Ca2+-binding loops are located and a concave surface at which the amino and carboxyl termini come into close apposition (Figure 4). An innovative and powerful approach to associating protein structural domains with intrinsic function or functional divergence involves the incorporation of evolutionary information into three-dimensional models [21-23]. The family sequence logo defines universally conserved sites (Figure 2), which can be mapped as a color or shading scheme onto the surface-exposed atoms of a crystallographic model [24] to reveal localized domains of probable relevance to the basic function of the protein (Figure 4a). A more complicated problem in the study of large protein families is the determination of which structural differences are responsible for functional specificity in each subfamily. Because functional constraint guides evolutionary selection, the sites that change in an evolutionarily significant way can be inferred to be responsible for functional divergence. Thus, shifts in site-specific evolutionary rates during speciation (computed from site variations in multiple sequence alignments derived from a broad range of species) or a conserved change in an amino-acid property at a critical location may consolidate a functional change at that site.
We present such a comparative analysis of annexins A1, A2 and A5 (Figure 4b) to identify the sites at which each differs significantly from other annexins and which, in terms of evolutionary dynamics, may indicate a key structure-function adaptation. For example, annexin A2 orthologs incorporate additional basic residues into a group of amino acids (positions 48-56) that is accessible on the concave (cytosolic) face of the molecule (Figure 4b). The functional significance of this divergence is consistent with a possible nuclear-localization signal in annexin A2, although any such hypothesis requires empirical testing. An extreme case of adaptive evolution is that of annexin A9, an ancient duplicated relative of annexin A2, in which all four calcium-binding sites on the convex surface have been eradicated by evolutionary selection. Comparable divergence patterns in annexins A1 and A5, which are in distinct evolutionary clades of the A-family annexins, reveal patterns of structural divergence between subfamilies that localize principally to sites exposed on the protein surface that are most likely to be involved in intermolecular interactions, such as the KGD motif that is an inherited characteristic of the clade containing annexins A1, A2, and A9. These contrast with the universally conserved sites common to all annexins, which are confined to the central, interior portion of the molecule (Figure 4a). This approach, involving differentiation between universally conserved sites (important for the general function of a family) and discrete rate changes (which affect binding to other proteins and which may be responsible for the properties of individual annexins) may eventually help to resolve the molecular basis for the multifaceted functional profiles of individual annexin proteins.
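
The idea of a conserved change at a site can be made concrete with a deliberately simple sketch. It flags alignment columns in which each of two subfamilies is internally conserved but the dominant residues differ. This is not the likelihood-based rate-shift method used for Figure 4b, which estimates site-specific rates on a phylogeny; it is a crude frequency filter, and the function names, the 0.9 threshold and the toy alignments are all hypothetical.

```python
from collections import Counter

def dominant(column):
    """Most common non-gap residue in a column and its relative frequency."""
    counts = Counter(r for r in column if r != "-")
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    residue, n = counts.most_common(1)[0]
    return residue, n / total

def conserved_but_different(group_a, group_b, min_freq=0.9):
    """1-based columns conserved within each group but differing between groups."""
    sites = []
    columns = zip(zip(*group_a), zip(*group_b))   # pair up columns of the two alignments
    for i, (col_a, col_b) in enumerate(columns, start=1):
        res_a, freq_a = dominant(col_a)
        res_b, freq_b = dominant(col_b)
        if (res_a and res_b and res_a != res_b
                and freq_a >= min_freq and freq_b >= min_freq):
            sites.append((i, res_a, res_b))
    return sites

# Synthetic, pre-aligned toy sequences (not real annexin data).
subfamily_a = ["MKGDA", "MKGDA", "MRGDA"]
subfamily_b = ["MSGEA", "MSGEA", "MSGEA"]
print(conserved_but_different(subfamily_a, subfamily_b))
```

Sites flagged in this way would still need to be placed on the three-dimensional structure to ask whether they cluster on an exposed surface, as described above for annexins A1, A2 and A5.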

Figure 4

Surface mapping of important sites onto the three-dimensional structure of annexins. All panels show the crystal structure of the core region of the pig annexin A1 protein (Protein Data Bank code: 1MCX [19]), viewed frontally (left) and laterally inverted (right) as a space-filling model rendered by the RasTop 2.0 version of RasMol [24]. Residues are numbered as in Figure 2, and the approximate positions of the conserved repeats are indicated with Roman numerals. (a) Functionally important sites common to all annexins. The level of evolutionary conservation in clusters of residues is indicated by lighter or darker shading. This is derived from a maximum-likelihood analysis of a multiple sequence alignment from 200 vertebrate A-family annexins using CONSURF [22, 23]. NT, amino terminus. (b) Sites that are functionally divergent between annexin subfamilies are shown with different shading for ANXA1, ANXA2 or ANXA5, three annexins for which the differences are especially significant. The sites were assessed by 'rate-shift analysis' of subfamily sequence alignments using DIVERGE [21], RATE4SITE and CONSURF [22, 23]. Calcium atoms are indicated by Ca.

Localization and function

Annexins are generally cytosolic proteins, with pools of both a soluble form and a form stably or reversibly associated with components of the cytoskeleton or proteins that mediate interactions between the cell and the extracellular matrix (matricellular proteins). Some, such as annexins A11 and A2, have been found in the nucleus under particular circumstances [25, 26]. In certain instances, annexins may be expressed at the cell surface, despite the absence of any secretory signal peptide; for example, annexin A1 translocates from the cytosol to the cell surface following exposure of cells to glucocorticoids [27], and annexin A2 is constitutively expressed at the surface of vascular endothelial cells where it functions in the regulation of blood clotting [28]. The expression level and tissue distribution of annexins span a broad range, from abundant and ubiquitous (annexins A1, A2, A4, A5, A6, A7, A11) to selective (such as annexin A3 in neutrophils and annexin A8 in the placenta and skin) or restrictive (such as annexin A9 in the tongue, annexin A10 in the stomach and annexin A13 in the small intestine).

The presence of multiple annexins in all higher eukaryotic cell types suggests fundamental roles in cell biology [5], even though prokaryotes and yeasts appear to tolerate their absence. The apparent functional diversity within the family nevertheless remains perplexing. The development of knockout mice has provided insight into the functions of annexins A1, A2, A5, A6 and A7. Loss of ANXA1 leads to changes in the inflammatory response and the effects of glucocorticoids [29], whereas the ANXA2 knockout mouse has defects in neovascularization and fibrin homeostasis [30]. The ANXA5 and ANXA6 knockout mice have subtler phenotypes and need further investigation [31, 32], and two independently derived ANXA7 null mutant mouse strains are either embryonic lethal [33] or show changes in calcium homeostasis [34]. The diversity of phenotype in the annexin knockout mice is consistent with these proteins having largely independent functions. Roles for annexins that have been established from studies using cultured cells are not always reflected in phenotypic abnormalities in the corresponding knockout mice, suggesting that functional redundancy may, in some instances, obscure the full range of functions of these multifunctional proteins. In mice that lack an overt phenotype, there is now the opportunity to test molecular theories of annexin function, such as the proposed calcium channel activity of annexin A5.

Frontiers

The definition of the biological processes in which annexins are involved has progressed through the use of gene knockouts and imaging. On the basis of studies using live cell imaging and targeted gene disruption, roles have now been unequivocally established for annexin A1 in inflammation, annexin A2 in vesicle traffic and annexin A7 in regulation of cell growth. The ubiquity and stability of annexins suggest some fundamental role of the unique core domain in cellular physiology, possibly involving adhesion mechanics, membrane traffic, signal transduction and/or developmental processes. To the extent that annexins may have adapted to the particular needs of their host species, molecular-evolution studies offer some insight into which structural changes may be responsible for their functional diversity, but biological data remain scant for nonvertebrate annexins. Transcript expression studies using microarrays and RNA interference offer new experimental approaches that could implicate annexins in some defined cellular process or pathway.

Long-standing problems also remain to be addressed. Do individual annexins have different functions in different cell types? How are annexins secreted? Can annexins be classified into groups with integrated functions, or are they functionally independent of each other? These and many other questions, and perhaps most importantly the need to understand mechanism, will occupy annexin biologists for years to come. The discovery of annexins with negligible calcium-binding capacity and growing evidence for interactions with other proteins may make the traditional definition of annexins as calcium-dependent phospholipid-binding proteins superfluous in the near future.