Introduction

Biocalcification refers to the physiological and molecular process by which living systems produce mineralized structures based on calcium salts, mainly calcium carbonate. From an evolutionary perspective, biocalcified structures represent a true innovation in the evolutionary history of metazoans. Biocalcitic structures fulfil a wide range of functions [1] such as: support and protection (e.g. sponge spicules, coral and bryozoan exoskeletons, mollusc and brachiopod shells, mineralized carapaces of crustaceans, calcified tubes of annelids, and the test, spines and teeth of echinoderms); organs for spatial equilibration (e.g. statoliths in jellyfishes, otoliths in fishes); calcium storage structures (e.g. gastroliths in crustaceans); organs for reproduction (e.g. gastropod love darts); desiccation prevention (e.g. avian eggshells); detoxification (e.g. calcium granules in molluscs). Despite the diversity and abundance of metazoan biocalcified structures, and the relevance of their evolutionary origins to our understanding of how metazoan life diversified, the molecular processes that underlie their formation remain generally poorly understood. Nevertheless, many recent molecular studies focused on a broad phylogenetic range of biomineralising animals have revealed that carbonic anhydrase (CA) is a key conserved enzyme involved in the process of biocalcification [2]-[22].

The carbonic anhydrase super-family

Carbonic anhydrases constitute a group of metallo-enzymes of the carbon-oxygen lyase sub-class and of the hydro-lyase infra-class. They catalyse the reversible hydration of CO2 to form one bicarbonate ion and one proton: H2O + CO2 ⇔ H2CO3 ⇔ HCO3- + H+. In most cases, the active site of these enzymes is composed of one zinc(II) ion coordinated by three histidine residues. CAs occur in all three domains of life (Eubacteria, Archaea and Eukaryota), and play roles in diverse metabolic pathways. The CA superfamily is subdivided into five families: α-CA, β-CA, γ-CA, δ-CA and ζ-CA [23],[24]. Interestingly, large-scale phylogenetic analyses reveal no homology between these five families [25], and they are therefore thought to be the result of convergent evolution, i.e. there was no common ancestor from which these five families evolved. This viewpoint is supported by the fact that CA conformations differ between families: while monomers and dimers are the active forms of α-CAs [26], γ-CAs are active in homotrimeric form [27], and β-CAs can be active in dimeric, tetrameric, hexameric or octameric forms. The zinc ion is coordinated by three histidine residues or, in the case of β-CAs, by two cysteine residues and one histidine residue (e.g. in the dicotyledonous plant Pisum sativum [28]).

Despite their similar enzymatic properties and distinct evolutionary histories, the presence of a given CA family in a certain lineage does not exclude the presence of another. For example, α-CAs are present in bacteria, algae, in the cytoplasm of green plants and fungi, and in metazoans [29]-[31], while the β-CAs are present in bacteria, fungi, algae and in the chloroplasts of monocotyledons and dicotyledons [32],[33]; γ-CAs are found in archaea and bacteria [34]-[36], suggesting that they are the oldest class, while the δ-CAs and the ζ-CAs are present in some marine diatoms [37]-[39]. All CA families are characterized by the presence of a zinc ion, the exception being the ζ family, in which cadmium replaces zinc, for example in marine diatoms such as Thalassiosira weissflogii [37],[40]. Because α-CAs have most often been associated with biocalcification roles in various metazoans, we focus our attention here on this family.

The α-carbonic anhydrase family

In 1933 Meldrum and Roughton [41], and simultaneously Stadie and O’Brien [42], discovered the first α-CA in vertebrate erythrocytes via its enzymatic activity. Today, with the advances in molecular techniques and the development of high-throughput sequencing methods, hundreds of α-CAs in many isoforms have been described for vertebrate and invertebrate metazoans.

Different α-CA isozymes are produced by distinct tissues. In mammals, they have diverse subcellular localizations: cytosolic (isozymes CAI, II, III, VII and XIII), membrane-bound (isozyme CAIV, GPI-linked), transmembrane (isozymes CAIX, XII and XIV), mitochondrial (isozymes CAVA and VB) or extracellular (isozyme CAVI). They are involved in many biological functions including acid–base balance, CO2 transport, urea cycling, gluconeogenesis, fatty acid/amino acid synthesis and calcification/decalcification [43]. It is very likely that the primitive function of the α-CAs was the intracellular regulation of acid–base balance and/or the metabolism of CO2 [44]. However, during evolution, the function of the enzyme was co-opted for other physiological processes, such as biomineralization. In some metazoan groups, α-CAs have been described as calcium-dissolving enzymes, as in the bones of vertebrates and the spicules of calcareous sponges such as Sycon raphanus [45]. Recently, eight CAs produced by ameloblasts in mice were shown to be involved in biomineralization during tooth development [46]. In corals, α-CA isoforms interact in the carbon cycle of the endosymbiotic zooxanthellae, playing an indirect role in skeletal formation [19]. In Table 1, we summarize the presence of α-CAs potentially involved in the deposition of CaCO3 biominerals in calcifying species. For some groups, such as brachiopods, data are currently missing; for others, such as sponges, crustaceans or cnidarians, they are largely incomplete [7],[9],[14],[47]. In molluscs, sequence data are relatively abundant, but are focused on a limited number of species, such as the pearl oyster (Pinctada sp.) or the giant owl limpet (Lottia gigantea) [11],[21]. Within the deuterostomes, most of the available α-CA data in echinoderms are related to the genome sequencing of the sea urchin Strongylocentrotus purpuratus [48]. The urochordates and vertebrates also possess CaCO3 structures (e.g. spicules, otoliths, eggshell). Indeed, α-CAs were identified in the calcifying tissues that produce those structures [3],[4],[6],[49].

Table 1 Biomineralization types and the presence of α-CA in various metazoan phyla

The α-CA family contains numerous isozymes, corresponding to the expression of a set of paralogous genes that are the product of gene duplication and speciation events that occurred throughout the Phanerozoic [65]-[67]. All genes encoding α-CAs are likely derived from a single ancestral gene, and are therefore homologous [35].

To date, very few molecular phylogenetic studies of α-CAs have been conducted in relation to biocalcification processes [7],[9],[10],[18],[19]. These few previous studies were performed with CA sequences of interest from poriferan, cnidarian and molluscan representatives in addition to various invertebrate/vertebrate datasets (Figure 1). The first phylogenetic study of α-CAs [35] revealed a clear dichotomy between membrane-associated/secreted and cytosolic/mitochondrial α-CAs, and this pattern is generally found in subsequent studies (Figure 1) [7],[9],[10],[18]. In all of these studies poriferan α-CAs form a monophyletic clade, and occupy an early-branching position (Figure 1). In other respects α-CA tree topology can differ significantly: examples include the position of vertebrate GPI-linked α-CAs as a sister group to the poriferan α-CAs [68], or the early-branching position of the molluscan nacreins [10]. One significant difficulty that all phylogenetic analyses of the α-CAs face is the lack of a full genomic complement of α-CAs from a taxonomically diverse range of metazoan representatives. Owing to the ever-growing availability of whole genomes and transcriptomes from non-model organisms, this issue is being addressed, which will allow more complete analyses to be performed.

Figure 1

Simplified phylogenetic trees from four studies of α-CAs derived from metazoan taxa. These trees were modified from Jackson et al., 2007 (A), Moya et al., 2008 (B), Jackson et al., 2010 (C) and Moya et al., 2012 (D)[7],[9],[10],[18]. Membrane-associated/secreted CAs are indicated in blue and cytosolic/mitochondrial CAs are indicated in green. Bootstrap or posterior probability values are indicated.

Evolution of metazoan α-CAs

We explored the evolutionary relationships of metazoan α-CAs using two datasets compiled from publicly available genomic, EST and RNASeq databases (i.e. JGI, SpBase and NCBI [48],[54],[69]) derived from several metazoan species. In compiling these datasets we endeavoured to employ only those species with complete, or likely close to complete, genomic complements of α-CAs. We explicitly excluded α-CA sequences from incomplete genomes, as well as incomplete and/or highly divergent α-CA sequences that were insufficiently conserved for alignment. We performed two phylogenetic analyses using Maximum Likelihood [70],[71] and Bayesian methods [72],[73] (Additional file 1) that generated largely congruent tree topologies. The first analysis was conducted using 138 α-CA sequences (6 α-CAs from non-metazoans as outgroups, and 132 α-CAs from metazoans, Figure 2, Additional files 2 and 3: Figure S1 and S2), and the second using 98 α-CA sequences (3 α-CAs from protists and 95 α-CAs from metazoans with a focus on molluscan α-CAs likely to be involved in biocalcification, Figure 3, Additional file 4: Figure S3).

Figure 2

Phylogenetic reconstruction of metazoan α-CA relationships. The phylogeny was reconstructed using PhyML and Bayesian inference methods; the topology presented is that resulting from the PhyML method (see Additional file 2: Figure S1 for the Bayesian topology). Node support values indicate aLRT values and Bayesian posterior probabilities expressed as a percentage, respectively. Only values above 50 are indicated. Sequence data were derived from 6 vertebrate species (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus, Oncorhynchus mykiss and Xenopus tropicalis, in black); 1 cephalochordate (Branchiostoma floridae, in grey); 1 sea urchin (Strongylocentrotus purpuratus, in purple); 1 annelid (Capitella teleta, in green); 1 mollusc (Lottia gigantea, in blue); 1 coral (Acropora millepora, in red); 1 placozoan (Trichoplax adhaerens, in pink); 2 sponges (Amphimedon queenslandica and Astrosclera willeyana, in orange); 1 fungus (Colletotrichum orbiculare, in black); 2 chromalveolates (Emiliania huxleyi and Vaucheria litorea, in black); 1 chlorophyte (Chlamydomonas reinhardtii, in black); and 2 bacteria (Nostoc sp. and Neisseria gonorrhoeae, in black). The α-CAs involved in biocalcification are in bold/underlined. CARPs: Carbonic Anhydrase Related-Proteins.

Figure 3

Phylogenetic reconstruction of metazoan α-CAs with a focus on molluscan sequences. The phylogeny was reconstructed using PhyML and Bayesian inference methods, the topology presented is that resulting from the Bayesian method (see Additional file 4: Figure S3 for the PhyML topology). Node support values indicate aLRT values, and Bayesian posterior probabilities expressed as a percentage respectively. Only values above 50 are indicated. Sequence data was derived from 2 vertebrate species (Homo sapiens and Mus musculus, in black); 1 cephalochordate (Branchiostoma floridae, in grey); 6 bivalve molluscs (Crassostrea gigas, Crassostrea nippona, Patinopecten yessoensis, Pinctada fucata, Pinctada margaritifera and Pinctada maxima, in blue); 4 gastropod molluscs (Haliotis gigantea, Haliotis tuberculata, Lottia gigantea, Turbo marmoratus, in blue); 2 sponges (Amphimedon queenslandica and Astrosclera willeyana, in orange); and 3 unicellular organisms (Nostoc sp., Chlamydomonas reinhardtii and Neisseria gonorrhoeae, in black). The α-CAs involved in biocalcification are in bold/underlined. Nacreins and nacrein-likes are indicated by a red star. Well supported aLRT/bootstrap values (>90) are indicated by a green point due to lack of space. CARPs: Carbonic Anhydrase Related-Proteins.

In these trees, as previously mentioned, α-CA isozymes can be located in one of four broad positions: cytosolic, membrane-associated (i.e. transmembrane, membrane-bound, GPI-linked), secreted and mitochondrial. The location of a given α-CA seems to be closely tied to its function; for example, the majority of molluscan α-CAs known to be involved in shell formation are located extracellularly [2],[15],[21], while CAV members are thought to be involved in insulin regulation in the pancreas [74]. The fact that the ancient (>550 MYA) relationships of α-CA family members are largely reflected in where they are located highlights the important role that gene duplication played in allowing this class of enzymes to diversify and acquire novel functions.

In the tree of Figure 2, we observe that the seven poriferan α-CAs share a monophyletic relationship, and that they occupy an early-branching position. This finding is congruent with earlier publications [7],[10],[18] (see Figure 1 A, C and D), and confirms a recent analysis reported by Moya et al. [9]. While we do not observe a monophyletic clade of secreted and membrane-bound α-CAs, we do recover a highly supported clade of cytosolic and mitochondrial α-CAs. It is also puzzling that the poriferan α-CAs are excluded from the membrane-associated/secreted cluster in the different phylogenetic reconstructions [7],[9],[10],[18]. Among the seven poriferan α-CAs, six are membrane-associated or secreted while one is putatively cytosolic (CAII of Amphimedon queenslandica, GenBank:DAA06051) [7]. The presence of membrane-associated/secreted α-CAs in both poriferans and eumetazoans suggests that the ancestral metazoan α-CA was membrane-associated or secreted. The single putative cytosolic α-CA in the poriferan cluster may be explained by either the internalization of an ancestral membrane-associated/secreted enzyme, or the early evolution of a membrane-associated/secreted CA in the poriferan lineage. Nevertheless, the dearth of α-CA sequence data from the phylum Porifera clearly limits any interpretation of the evolution of metazoan α-CAs.

The distribution of metazoan α-CAs in relation to biocalcification

Our second phylogenetic analysis (Figure 3) includes 44 α-CAs from molluscan species, and sheds light on the evolution of α-CAs involved in molluscan biocalcification. One key-feature of this analysis is that molluscan mantle-secreted α-CAs and nacrein/nacrein-like sequences clearly group together. In their 2010 paper, Jackson et al. [10] described a basal position of molluscan nacreins relative to cytosolic/mitochondrial and secreted/membrane-associated α-CAs. Our phylogenetic reconstruction uses a more complete α-CA dataset, and generates a nacrein-type α-CA clade in a more derived position (Figure 3). Together with other molluscan α-CAs (Lottia gigantea, JGI:239188, 238082, Crassostrea gigas, GenBank:EKC18733, EKC41232, EKC34226, EKC34889, EKC22661, EKC34890, Haliotis tuberculata, GenBank:AEL22201, AEL22200, Haliotis gigantea, GenBank:BAH58350, BAH58349, Turbo marmoratus, GenBank:BAB91157), the nacrein/nacrein-like cluster is the sister group of a clade containing vertebrate related-protein α-CAs (CARPs: CAX and CAXI). Nacreins are clustered with other molluscan α-CAs likely to be involved in biocalcification (e.g. Lottia gigantea, JGI:239188, 238082).

With regard to its complement of 17 α-CAs, the giant owl limpet, L. gigantea, is unusual. Currently, this represents the largest known repertoire of α-CAs for an invertebrate. Sixteen of these sequences were used in our phylogenetic reconstruction, with 8 of these located within the membrane-associated/secreted clade, 6 within the vertebrate cytosolic/mitochondrial α-CAs and CARPs, and 2 (JGI:239188, 238082) within the molluscan cluster containing nacreins (Figure 3). Two recent proteomic analyses confirmed the presence of these 2 α-CAs (JGI:239188, 238082) in the shell of L. gigantea [17],[21]. However, among the 8 sequences that are distributed in the membrane-associated/secreted cluster, 3 have a signal peptide (JGI:239341, 202179, 172731) while 5 are incomplete sequences with missing N-termini. We also note the presence of a putatively cytosolic α-CA (JGI:239347) in both trees (Figures 2 and 3), within the membrane-associated/secreted α-CAs. If these sequences are genuine cytosolic α-CAs, this may be explained by the evolutionary internalisation of a formerly secreted α-CA, as may be the case for the poriferan DAA06051 sequence (see above). Alternatively, these proteins may be secreted by non-canonical secretory pathways, or may reflect incomplete sequencing, assembly or gene prediction of the N-terminal region of the protein. Moreover, the location of the majority of the invertebrate α-CAs is yet to be experimentally validated, and the groupings that we observe in our phylogenetic analysis must therefore be considered in that light. Despite these uncertainties, the overall pattern of molluscan α-CA evolution appears to be one that took place largely independently of vertebrate α-CA evolution (see dashed square in Figure 3).

As in L. gigantea, the 10 α-CAs from the purple sea urchin Strongylocentrotus purpuratus and the 8 from the stony coral Acropora millepora are spread across the membrane-associated/secreted cluster (6 A. millepora α-CAs and 6 S. purpuratus α-CAs) and the cytosolic/mitochondrial cluster (2 A. millepora α-CAs and 4 S. purpuratus α-CAs; Figure 2). One of the S. purpuratus α-CAs (JGI:SPU_012518, also called Sp-CAra7LA) has been identified in the extracellular matrix of the adult urchin test [12]. In our phylogenetic tree, Sp-CAra7LA is located in the same cluster that contains the 2 α-CAs involved in the shell biocalcification of L. gigantea (JGI:239188, 238082) [17],[69] and the 2 others involved in the skeleton precipitation of A. millepora (GenBank:JR995761 and JT014580) [18]. Despite the large phylogenetic distance between these three organisms, they share a common α-CA ancestor that could have been independently recruited in the purple sea urchin, the giant owl limpet and the stony coral for their specific biocalcification processes.

The dispersion of α-CAs involved in biocalcification in cnidarians, molluscs and echinoderms (Figures 2 and 3) supports the hypothesis of the independent evolution of calcifying matrices in these different metazoan lineages. This is also true for vertebrate α-CAs taking part in calcification processes (e.g. otoliths, eggshell). Our finding is congruent with proteomic analyses of skeletal matrices, which show that calcifying secretory repertoires are highly divergent in different metazoan phyla [22],[75], and even within one clade such as molluscs [10],[15],[21],[68],[76].

The position of α-CAs involved in (1) the calcification of eggshell of the red junglefowl Gallus gallus (GenBank:XP_415893, [77]), of otoliths of the rainbow trout Oncorhynchus mykiss (GenBank:BAD36835, BAD36836, [49],[64]) and (2) the decalcification of bones of the human Homo sapiens (GenBank:NP_000058, [78]), still remains inside vertebrate α-CA clades (Figure 2). Nevertheless, they are not monophyletic. Among these α-CAs playing a role in the biocalcification, CAII of H. sapiens and CAI and II of O. mykiss are clustered with cytosolic/mitochondrial α-CAs while CAIV of G. gallus is included within membrane-associated/secreted α-CAs. The use of different α-CA isoforms in these species could be explained by an independent recruitment of α-CAs for biocalcification functions inside the vertebrate subphylum.

The phylogenetic distribution of α-CAs involved in metazoan biocalcification (Figure 2) reveals a complex evolutionary history that requires more whole-genome data in order to clarify how the different α-CA isoforms evolved. Obviously, broader taxon sampling from groups such as sponges would serve to shed light on the evolutionary history of these biomineral forming mechanisms.

Conserved domains in α-CAs involved in the metazoan biocalcification processes

The catalytic domain of all α-CAs is highly conserved. In order to identify the most conserved domains in α-CAs involved in biocalcification, we calculated the substitution rate per site using the methodology described in Petit et al. (2006) [79] and Martin et al. (2009) [80]. Briefly, the topology of the tree provided by the Maximum Likelihood method with MEGA 5.1 (using one alignment of 34 sequences of α-CAs involved in the calcification process and another alignment of 33 sequences of cytosolic/mitochondrial α-CAs in eumetazoans, Additional file 5: Table S1) was used as the user tree to run the parsimony program Protpars included in the PHYLIP v3.69 package [81]. A moving average (window size = 5) of the number of substitutions per site was calculated, and then plotted in parallel with the structural domains of α-CAs (Figure 4). The major reversed peaks correspond to the catalytic sites that contain histidine residues, but also to conserved domains that maintain the conformation of the catalytic site required for correct activity. Seven short conserved α-CA domains (from 4 to 8 residues) spread along the sequence were identified: QSPI, LHVH, GSEH, EAHL, FVVVGVFL, GSLTTP and ESVLW. These conserved regions represent about 25% of the overall sequences used for the analysis. These domains contain residues essential for the regeneration of the active site [51],[82]-[86], and an amino acid involved in the proton shuffling process during catalytic activity. Two of the poriferan α-CAs (Astrosclerin 1 and 2: GenBank:ABR53885, ABR53886) are characterized by the replacement of one of the three zinc-coordinating histidine residues by a glutamine. In accordance with this, Astrosclerin 1 was reported to be catalytically inactive; recombinant Astrosclerin 2, however, could not be generated [7]. Such mutations in the active site of α-CAs are likely to cause a weakening or elimination of zinc-binding capacity, and therefore of catalytic function [87].
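The smoothing step described above can be illustrated with a minimal sketch. It assumes a centered window that is truncated at the ends of the alignment (the text specifies only the window size of 5), and the per-site counts and the threshold of 2.0 are hypothetical values chosen for illustration, not data or parameters from this analysis.

```python
def moving_average(values, window=5):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical substitution counts per alignment site (illustrative only).
counts = [6, 5, 7, 1, 0, 0, 1, 6, 8, 7]
smoothed = moving_average(counts, window=5)

# Sites whose smoothed value falls below an arbitrary threshold are the
# "reversed peaks", i.e. candidate conserved positions.
conserved = [i for i, value in enumerate(smoothed) if value <= 2.0]
print(conserved)
```

With these toy counts, the low-substitution stretch in the middle of the list is what the smoothed curve singles out; on real Protpars output the same logic would be applied to the full alignment.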

Figure 4

Substitution rates within metazoan α-CAs. The number of substitutions per site between poriferan and vertebrate sequences was calculated for an alignment of amino acids derived from 34 α-CA sequences involved in biocalcification (A), and an alignment of 33 cytosolic/mitochondrial α-CAs (B). Sequences used for the two analyses are listed in the Additional file 5: Table S1. The first position begins with the first α-CA conserved peptide (QSPI). Blue peptides are involved in the CA conformation and green peptides contain histidine residues that coordinate the Zn2+ ion in the active site. The black arrowhead indicates the substitution rate of the proton shuffling amino acid.

The substitution rate is about two times lower in the α-CAs involved in biocalcification (1.6 to 8.3; graph A, Figure 4) than in the cytosolic/mitochondrial α-CAs (4 to 16.8; graph B, Figure 4). In the cytosolic/mitochondrial α-CAs, the substitution rate in non-conserved regions is higher than in the corresponding regions of graph A (Figure 4). In general, α-CAs involved in the biocalcification process appear to be highly conserved when compared to the cytosolic/mitochondrial ones. The independent recruitment of these α-CAs to the biocalcification process thus seems to have been followed by a higher conservation of amino acid sequences than in α-CAs involved in other functions. This pressure is probably linked to the biocalcification function, which requires a highly conserved α-CA activity, as can be observed in the cluster of molluscan α-CAs linked to the formation of the shell (e.g. Figure 3). The catalytic activity could act at different levels of this process. Indeed, CAs can provide HCO3- ions and/or CO2 depending on the direction of the reaction. The catalytic hydration of CO2 produces protons in addition to HCO3-, and these protons have been suggested to promote the uptake of Ca2+ during larval shell formation in the freshwater gastropod Lymnaea stagnalis [88]. α-CAs may also play structural roles in biomineralization; the enzyme is occluded in the extracellular organic matrix of the biomineral structure in some coral, molluscan and urchin species (e.g. A. millepora, U. pictorum, L. gigantea, S. purpuratus).

Low Complexity Domains (LCDs) in α-CAs related to CaCO3 biomineralization

In silico sequence analysis (Additional file 1) focused on the identification and characterization of LCDs present in α-CA sequences. We selected 39 α-CAs, among which 33 have been demonstrated to be present in calcifying tissues or in the CaCO3 biomineral itself (e.g. nacreins in the shell of pearl oysters; [2]). Thus, we assume that these α-CAs are directly involved in the biocalcification process.

The full-length sequences of these biomineral-associated α-CAs vary from 278 (A. willeyana, GenBank:ABR53885) to 678 amino acids (C. gigas, GenBank:EKC34889). This disparity in size is primarily due to the presence/absence of LCDs of variable length. Among the 39 selected biomineralization-associated α-CA sequences, 25 contain LCDs, which can make up to half the length of the sequence. Figure 5 presents a schematic representation of the primary structure of these 39 CAs.

Figure 5

LCDs in α-CAs that are presumably associated with CaCO3 biominerals. Schematic representations of 39 α-CA sequences that are membrane-associated (MA), secreted (S) or cytosolic (C).

With three exceptions (C. gigas, GenBank:EKC34889; L. gigantea, JGI:239188 and S. purpuratus, JGI:SPU_012518), all of these LCDs are located in the C-terminal half of the protein. The homology of these domains is difficult to discern as they differ in length, and are composed of a variety of amino acid sequences (Figure 5). Moreover, our phylogenetic analyses indicate that α-CAs containing these LCDs are not necessarily orthologous, and that the LCDs were likely acquired independently from lineage to lineage. The most common LCD is the GXN repeat found in molluscan nacreins [2],[89]. Within the bivalves, the homology of such GXN domains is hinted at by the degree of nucleic acid similarity (Additional file 6: Table S2). This similarity does not extend to the GN repeat found in the nacrein of the gastropod Turbo marmoratus (Figure 6). Besides the issue of LCD homology, there is the question of origin. In bivalves, the repetition of the GXN motif may be the result of a partial gene duplication followed by a series of unequal recombination events, as has been suggested for other proteins such as mucins [90]-[92]. Such events result in the successive, tandem addition of short motifs. Transposable elements may also have generated such repetitive LCDs (RLCDs) in molluscan biomineralizing proteins. To test this idea we searched for signatures of transposases and the presence of inverted repeats in RLCD-containing α-CAs using the RepeatMasker software [93]. We could not find evidence of either in these sequences.
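To make the notion of a GXN repeat concrete, the following minimal sketch scans a protein sequence for runs of consecutive G-X-N tripeptides using a regular expression. This is an illustration only and not the procedure used for the in silico analysis (described in Additional file 1); the function name `find_gxn_runs`, the minimum of three repeats and the toy sequence are all assumptions made for the example.

```python
import re


def find_gxn_runs(sequence, min_repeats=3):
    """Return (start, end, matched_text) for each run of G-X-N tripeptides.

    X stands for any residue; min_repeats is an arbitrary illustrative
    cut-off for calling a stretch a repeat domain.
    """
    pattern = re.compile(r"(?:G[A-Z]N){%d,}" % min_repeats)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence)]


# Made-up toy sequence (not a real nacrein): a short GXN run flanked by
# two of the conserved α-CA peptides mentioned in the text.
toy_sequence = "MKTLHVHGSNGDNGENGNNAAESVLW"
runs = find_gxn_runs(toy_sequence)
print(runs)
```

The reported coordinates are 0-based and end-exclusive, following Python slicing conventions, so `toy_sequence[start:end]` returns the matched repeat.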

Figure 6

Details of 17 LCDs of α-CAs/nacreins from invertebrate species. Three categories sharing likely homology are indicated (black boxes): C-term basic LCDs in the sponge A. willeyana (highlighted in pink), acidic GXN repeated domains in bivalves (highlighted in blue) and acidic LCDs in the two species of gastropods Haliotis (highlighted in green).

There is strong proteomic evidence that some nacreins are occluded in the calcified molluscan shell, and consequently belong to the so-called “calcifying matrix” [15],[94]-[97]. The GXN repeat may display different functions related to calcification. (1) It may interact with calcium ions in solution by behaving as a low-affinity, high-capacity calcium-binding domain, and could therefore be involved in CaCO3 nucleation [98],[99]. This suggests that nacreins work both as enzymes, converting CO2 into bicarbonate, and as CaCO3-nucleating polymers. (2) The GXN repeat domain is also known to interact with CaCO3 in vitro. Miyamoto et al. [5] conducted in vitro calcification assays with wild-type and truncated recombinant nacrein proteins. According to that work, the GXN domain inhibits the activity of nacrein and could act as a negative regulator of CaCO3 precipitation during shell formation. The reason for such an inhibitory role in an α-CA is as yet unknown.

Our in silico analysis also reveals the occurrence of acidic domains similar to the nacrein GXN repeat in the longer α-CA of L. gigantea (JGI:239188) and in the spicule-specific α-CA [12] of S. purpuratus (JGI:SPU_012518). In L. gigantea, an acidic DG-rich peptide is located at the C-terminal end, and another more basic peptide is present just downstream of the signal peptide. In the purple sea urchin, the G-rich domain is less acidic than the nacrein GXN-rich domain. The position of this domain is also uncommon because it is upstream of the active α-CA site, in the N-terminal half of the protein.

Other short LCDs are strikingly different from the nacrein GXN repeat. Some are basic, i.e. positively charged under standard pH conditions, and are represented by the C-terminal R-rich, S-rich and LS-rich ends in four α-CAs (GenBank:ABR53886, GenBank:ABR53887, GenBank:ACE95141, JGI:SPU_012518). In A. willeyana, the basic K-rich domain exhibits the KKRKRR motif, which also contains several arginine residues, similarly to the C-termini of two families of proteins present in the prism matrix of the shell of the pearl oyster Pinctada fucata: shematrins and K-rich mantle proteins [100]-[102]. The presence of such basic domains suggests two possible functions. (1) An interaction with negatively charged bicarbonate ions: if so, the basic domain may concentrate bicarbonate ions in the vicinity of the nucleation site. (2) An electrostatic interaction with acidic (D/E-rich) domains of other proteins of the calcifying matrix: in this case, the basic domains may anchor the acid-soluble proteins to the three-dimensional framework of the matrix. Here again, these basic domains indicate that the corresponding α-CAs display at least two functions: the enzymatic activity and the interaction with bicarbonate and/or acidic macromolecules.
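The distinction between basic and acidic LCDs can be illustrated with a crude net-charge count. The sketch below is a simplification and an assumption of this example, not a method from the analysis: it scores lysine and arginine as +1 and aspartate and glutamate as -1 at roughly neutral pH, and ignores histidine, the termini and any pKa shifts. Apart from KKRKRR, which is quoted in the text, the peptides are hypothetical.

```python
def net_charge(peptide):
    """Approximate net charge near neutral pH: K/R count +1, D/E count -1."""
    positive = sum(peptide.count(residue) for residue in "KR")
    negative = sum(peptide.count(residue) for residue in "DE")
    return positive - negative


def classify(peptide):
    """Label a peptide as basic, acidic or neutral from its net charge."""
    charge = net_charge(peptide)
    if charge > 0:
        return "basic"
    if charge < 0:
        return "acidic"
    return "neutral"


# KKRKRR is the motif cited for A. willeyana; GDNGENGDN is a made-up
# acidic GXN-like stretch used only for contrast.
for peptide in ("KKRKRR", "GDNGENGDN", "GSLTTP"):
    print(peptide, net_charge(peptide), classify(peptide))
```

A more realistic estimate would use residue pKa values and the actual pH of the calcifying compartment, which this sketch deliberately leaves out.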

Although the way by which the "mosaic" α-CAs are evolutionarily constructed is not known, exon shuffling can be envisaged as one possible mechanism [103]. Such a mechanism can generate new functions through the combination of different pre-existing functional domains into one protein. This "genetic tinkering" is well documented for proteins present in vertebrate extracellular matrices [104],[105]. It is believed that this mechanism is subject to elevated rates of evolution, in comparison to those observed for intracellular proteins [106].

Conclusion

The α-CA family is a key family for understanding how the molecular mechanisms of metazoan biocalcification strategies evolved. Independent lines of evidence (phylogenetic relationships of biomineral associated α-CAs and patterns of LCD insertions) suggest that the recruitment of α-CAs into biocalcification roles likely occurred independently in different metazoan lineages. A major limitation of this work is the difficulty in reconstructing a robust phylogeny of metazoan α-CAs. The primary causes of this difficulty are likely to include the deep phylogenetic history of the α-CA family, the fact that many independent duplications have generated a variety of isoforms, and that these isoforms have subsequently been recruited to a wide variety of physiological roles. We remain optimistic that both broader and deeper taxon sampling (i.e. complete genome sequences) may bring more resolution to these relationships, and we anticipate that in the coming years such data will make these analyses more accurate. Such complete genome data may also provide additional lines of evidence for assigning CA orthology via gene synteny. In addition, physiological, biochemical and functional studies aimed at understanding the role of specific α-CAs in biomineralization processes will provide insight into the evolutionary history of mineral-relevant α-CA isozymes. A rich field of research regarding the origins of the LCDs associated with mineralizing α-CAs also awaits further investigation. Understanding how such LCDs are generated, how and why they come to be associated with mineralizing α-CAs, and what their functions are remain open and intriguing questions.

Additional files