Introduction

Alpha-2-fucosyltransferases (α2FTs) are enzymes required for the biosynthesis of the terminal glycan motif Fucα2Galβ-R found in ABH and Lewis histo-blood group antigens. The biologic functions of such carbohydrate motifs are not yet completely clear, but their expression mainly at the surface of epithelial cells that constitute portals of entry for pathogens, as well as on soluble mucins present in these epithelia, suggests that they might have functions related to interactions with microorganisms, pathogenic or not (Marionneau et al. 2001). Consistent with this view, Helicobacter pylori strains, Campylobacter jejuni, uropathogenic strains of Escherichia coli, lactobacilli strains, and several strains of caliciviruses are known to attach to α2-fucosylated glycan structures (Boren et al. 1993; Le Pendu et al. 2006; Ruiz-Palacios et al. 2003; Uchida et al. 2006). Additional cellular functions, such as involvement in the development of the olfactory system, angiogenesis, interaction between dendritic cells and the vascular endothelium, and regulation of apoptosis (Garcia-Vallejo et al. 2008; Halloran et al. 2000; Moehler et al. 2008; St. John et al. 2006), have recently been suggested. Like other glycosyltransferases, α2FTs are type II membrane proteins anchored in the Golgi apparatus. They present a short intracytoplasmic tail and a transmembrane domain at the N-terminus, followed by a stem region and the catalytic domain, which can be subdivided into two subdomains, the N- and C-terminal subdomains.

In mammalian genomes, three α2FT genes are located in tandem and designated as FUT1, FUT2, and Sec1 (Oriol et al. 2000). The coding sequence of each of the three genes is contained within a single exon, and it has been suggested that this monoexonic structure results from an L1-retrotransposition event that occurred within the α2FT mammalian ancestor gene (Saunier et al. 2001). Their tandem localization and earlier sequence comparisons using only primates suggested that the FUT1, FUT2, and Sec1 genes originated from two successive duplications. The first one would have given rise to FUT1 and to the ancestor of both FUT2 and Sec1, whilst the second duplication event would have generated FUT2 and Sec1.

Gene duplication is generally considered important for adaptation because it allows advantageous mutations in one of the duplicates to promote a new role without impairing the original function exerted by the other duplicate (Ohno 1970). However, theoretical studies suggest that one of the duplicates has a high probability of becoming silenced rapidly (Walsh 1995), which would be consistent with the silencing of Sec1 in catarrhines (Apoil et al. 2000). In addition, it has been proposed that functional diversification is a rare event because gene conversion tends to homogenize the variation between duplicated genes (Walsh 1987). For that reason, gene conversion is considered the main mechanism of concerted evolution of gene families. Previous studies have not detected gene conversion in primates because each of the FUT1, FUT2, and Sec1 genes appeared as a separate cluster (Apoil et al. 2000); however, very recently the occurrence of gene conversion between FUT2 and Sec1 was reported in a Sec1-FUT2-Sec1 human allele (Soejima et al. 2008).

Here, with the aim of improving our understanding of the evolution and the biologic role of the α2FT gene family, we addressed two main questions. First, what are the evolutionary relations between FUT1, FUT2, and Sec1 when looking at a broader phylogenetic context? Second, is there evidence of gene-conversion events between these three genes?

Materials and Methods

Complete coding sequences of the α2FT genes FUT1, FUT2, and Sec1 were retrieved from GenBank and aligned with Clustal W (Thompson et al. 1994), followed by visual inspection (see Table 1 for a list of species used, abbreviations, and GenBank accession numbers).

Table 1 List of the coding sequences of the α2FTs genes, FUT1, FUT2 and Sec1, retrieved from GenBank and used in this study

Phylogenetic relations between mammalian FUT1, FUT2, and Sec1 genes were analyzed using the entire catalytic domain, corresponding to nucleotide positions 235 to 1083, 184 to 1026, and 193 to 1033 of the human FUT1, FUT2, and Sec1 sequences, respectively. Only the catalytic domains of these enzymes were used because the three genes are highly divergent in the transmembrane and stem regions, which could not be aligned with confidence. The optimal model of sequence evolution was estimated using the ModelTest web server (Posada 2006). This model was then used to estimate a maximum likelihood (ML) tree using Phyml (Guindon and Gascuel 2003). The software GARD (Kosakovsky-Pond et al. 2006a, b) was used to detect possible phylogenetic incongruences, such as those due to gene conversion. ML trees were estimated for each of the segments identified by GARD and compared using the Shimodaira–Hasegawa (SH) test (Shimodaira and Hasegawa 1999) implemented in PAUP* (Swofford 2000).
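The coordinates quoted above are 1-based and inclusive, which is easy to mishandle when slicing sequences programmatically. The following minimal Python sketch shows one way to extract such a region and to check its length. The function names are hypothetical, and the sketch is not the procedure used to build the alignment.

```python
# Catalytic-domain coordinates (1-based, inclusive) in the human coding
# sequences, as quoted in the text.
CATALYTIC_DOMAIN = {
    "FUT1": (235, 1083),
    "FUT2": (184, 1026),
    "Sec1": (193, 1033),
}


def domain_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


def extract_domain(cds: str, start: int, end: int) -> str:
    """Return the 1-based inclusive slice [start, end] of a coding sequence."""
    if start < 1 or end > len(cds) or start > end:
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return cds[start - 1:end]


if __name__ == "__main__":
    for gene, (start, end) in CATALYTIC_DOMAIN.items():
        print(f"{gene}: {domain_length(start, end)} nt")
```

With these coordinates the human FUT1 region is 849 nt long, which matches the 849 aligned bases reported in the caption of Table 2.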

In addition, the program Geneconv (Sawyer 1989) was used to confirm the conversion events inferred by visual inspection of the phylogenetic trees recovered for each fragment. Geneconv looks for aligned segments in which pairs of sequences are similar enough to be suggestive of past gene conversion. The program finds and ranks the highest-scoring fragments globally for the entire alignment (“global” fragments) and, if specified, also for each sequence pair (“pairwise” fragments). p-values are obtained by permutation (in this case 10,000 permutations); however, whereas global p-values compare each fragment with all possible fragments for the entire alignment, pairwise p-values compare each fragment with the maximum that might have been expected for that sequence pair in the absence of gene conversion. Global fragments have p-values that are multiple comparison–corrected for all possible sequence pairs, whereas pairwise fragments have a built-in multiple comparison–correction for the length of the alignment. The program also distinguishes between “inner” fragments, i.e., gene-conversion events between ancestors of two sequences in the alignment, and “outer” fragments, i.e., evidence of past gene-conversion events that may have originated from outside of the alignment. A mismatch penalty was allowed (gscale = 1); therefore, conversion fragments did not have to be identical. ML trees were also obtained with Phyml from the amino acid sequences for each of the segments identified by GARD using the best-fit model suggested by ProtTest (Abascal et al. 2005).
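The logic of a permutation-based pairwise p-value can be illustrated with a deliberately simplified Python sketch. It is not Geneconv's algorithm. It scores a sequence pair by the longest run of matching polymorphic sites, allows no mismatches (unlike the gscale = 1 setting described above), and obtains a p-value by shuffling the order of the polymorphic sites. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def polymorphic_columns(alignment: Sequence[str]) -> List[int]:
    """Indices of alignment columns in which at least two sequences differ."""
    length = len(alignment[0])
    return [i for i in range(length)
            if len({seq[i] for seq in alignment}) > 1]


def longest_match_run(matches: Sequence[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for is_match in matches:
        current = current + 1 if is_match else 0
        best = max(best, current)
    return best


def pairwise_permutation_pvalue(alignment: Sequence[str], i: int, j: int,
                                n_perm: int = 10000,
                                seed: int = 0) -> Tuple[int, float]:
    """Observed longest matching run for sequences i and j, and its p-value.

    Simplifying assumption: only polymorphic sites are considered and no
    mismatch is tolerated inside a fragment.
    """
    cols = polymorphic_columns(alignment)
    matches = [alignment[i][c] == alignment[j][c] for c in cols]
    observed = longest_match_run(matches)
    rng = random.Random(seed)
    shuffled = list(matches)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the order of the polymorphic sites
        if longest_match_run(shuffled) >= observed:
            hits += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A call such as `pairwise_permutation_pvalue(alignment, 0, 1)` returns the observed run length and its permutation p-value for the first two sequences. Any correction for multiple sequence pairs would have to be applied separately.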

Results

Phylogenetic analysis of the nucleotide sequences corresponding to the catalytic domain (Supplementary data) yielded evolutionary relations somewhat different from those previously published based on full protein sequences (Apoil et al. 2000; Barreaud et al. 2000; Bureau et al. 2001). The FUT1 sequences appeared as a well-supported basal clade. However, FUT2 and Sec1 sequences did not form clearly separated groups. For example, rabbit, rat, and mouse FUT2 sequences clustered, with high support, with their corresponding Sec1 sequences. This was also the case for the pig FUT2 and Sec1 sequences, but here the pair was embedded inside the main FUT2 group.

GARD analyses indicated two highly supported recombination break points at positions 720 and 912 (Fig. 1). The resulting fragments were named segment A (nucleotide position 184–720); segment B (nucleotide position 721–912); and segment C (913–1026 human FUT2 nucleotide positions). According to the SH test, the resulting phylogenetic trees for each segment were significantly different (p < 0.001). In segment A, as for the complete sequences, FUT1 appears as a highly supported basal clade. Interestingly, the FUT2 and Sec1 sequences clustered by gene in primates and by species in nonprimates (Fig. 2a). For segment B, the FUT1 sequences did not form a clade (Fig. 2b). Although the major FUT2 and Sec1 groups were established, all of the rabbit sequences (FUT1, FUT2 and Sec1) formed a well-supported group, and the pig Sec1 sequence clustered with the pig FUT2 sequence inside the main FUT2 group. In segment C, the α2FTs formed three distinct clades (FUT1, (FUT2, Sec1)), although they did so with low bootstrap values (Fig. 2c).
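The partition into segments follows directly from the two break points. As a minimal sketch, the Python snippet below derives the segment boundaries and lengths from the human FUT2 coordinates given above and converts a FUT2 position into the position axis of Fig. 1, where position 1 corresponds to nucleotide 184. The function names are hypothetical.

```python
from typing import List, Tuple

DOMAIN_START, DOMAIN_END = 184, 1026  # human FUT2 catalytic domain (from the text)
BREAK_POINTS = [720, 912]             # break points reported by GARD (from the text)


def segments_from_breakpoints(start: int, end: int,
                              breakpoints: List[int]) -> List[Tuple[int, int]]:
    """Split the inclusive interval [start, end] after each break point."""
    segments = []
    seg_start = start
    for bp in sorted(breakpoints):
        if not start <= bp < end:
            raise ValueError(f"break point {bp} lies outside the region")
        segments.append((seg_start, bp))
        seg_start = bp + 1
    segments.append((seg_start, end))
    return segments


def to_graph_position(fut2_position: int, offset: int = DOMAIN_START) -> int:
    """Convert a human FUT2 nucleotide position to the Fig. 1 position axis."""
    return fut2_position - offset + 1


if __name__ == "__main__":
    segs = segments_from_breakpoints(DOMAIN_START, DOMAIN_END, BREAK_POINTS)
    for name, (s, e) in zip("ABC", segs):
        print(f"segment {name}: {s}-{e} ({e - s + 1} nt)")
```

Under these coordinates, segments A, B, and C are 537, 192, and 114 nt long, and the two break points fall at positions 537 and 729 of the Fig. 1 axis.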

Fig. 1 cAIC model-averaged support for recombination break points as detected by GARD. Nucleotide position 1 in the graph corresponds to nucleotide position 184 of the human FUT2 sequence
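As a hedged illustration of what a model-averaged support value generally involves, the Python sketch below computes the small-sample corrected Akaike information criterion and Akaike weights for a set of candidate models, and then sums the weights of the models that contain a given break point. The function names and the toy scores are hypothetical. That the values plotted in Fig. 1 are obtained exactly in this way is an assumption and is not taken from the GARD software.

```python
import math
from typing import Dict, FrozenSet, List


def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Small-sample corrected Akaike information criterion."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("sample too small for the AICc correction")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)


def akaike_weights(scores: List[float]) -> List[float]:
    """Relative weight of each model given its information-criterion score."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def breakpoint_support(models: Dict[FrozenSet[int], float]) -> Dict[int, float]:
    """Sum Akaike weights over all models that include each break point.

    `models` maps a set of break-point positions to that model's AICc score.
    """
    keys = list(models)
    weights = akaike_weights([models[k] for k in keys])
    support: Dict[int, float] = {}
    for key, weight in zip(keys, weights):
        for bp in key:
            support[bp] = support.get(bp, 0.0) + weight
    return support
```

For example, if two equally scored models are compared, one with a break point at 537 only and one with break points at 537 and 729, the support is 1.0 for position 537 and 0.5 for position 729.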

Fig. 2 ML trees for (a) segment A (nucleotides 184–720), (b) segment B (nucleotides 721–912), and (c) segment C (nucleotides 913–1032). Nucleotide positions are according to human FUT2 sequence

Geneconv found 52 globally significant inner fragments (Table 2) and 6 additional pairwise inner fragments (Table 3). The length of the estimated gene-conversion tracts ranged from 177 to 521 base pairs (bp). Inclusion or exclusion of the outgroup resulted in similar inferences. Among the 52 inner fragments, Geneconv detected many significant conversion events that were consistent with the phylogenetic partition suggested by GARD and the corresponding trees (Fig. 2). Taking into account both sources of information, i.e., the phylogenetic incongruences and the Geneconv output, several gene-conversion events appear to have occurred between FUT2 and Sec1 in segment A. We can infer a gene-conversion event in segment A (nucleotides [nt] 208 to 559) before the diversification of primates. Note that all of the FUT2/Sec1 primate pairs in the Geneconv output (Table 2) group together. Additional but independent FUT2/Sec1 conversion events also seem to have occurred in this segment for cow (nt 193–608), rat (nt 208–728), and mouse (nt 307–728). Another FUT2/Sec1 conversion could have occurred in segment A before the rat and mouse split (nt 241–728). It is also possible to infer two events that involve both segments A and B in pig (nt 208–929) and segments A, B and C in the rabbit (nt 184–881). These two events are obvious from the trees but only appear in the pairwise inner fragment list provided by Geneconv (Table 3), probably as a consequence of their large mismatch penalties or because overlapping events occurred in segment B. In addition, we can infer two more events in rabbit, a FUT1-to-Sec1 (nt 755–931) and a FUT1-to-FUT2 (nt 755–1023) conversion. Some of these events are highlighted in grey in the amino acid alignment shown in Fig. 3.
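The tract lengths implied by the nucleotide ranges quoted in this paragraph can be checked with a few lines of Python. The grouping of events into the two dictionaries below is one reading of the paragraph, not a transcription of Tables 2 and 3, and the labels are hypothetical.

```python
# Inclusive nucleotide ranges quoted in the text.
TRACTS = {
    "primates FUT2/Sec1": (208, 559),
    "cow FUT2/Sec1": (193, 608),
    "rat FUT2/Sec1": (208, 728),
    "mouse FUT2/Sec1": (307, 728),
    "rat-mouse ancestor FUT2/Sec1": (241, 728),
    "rabbit FUT1-to-Sec1": (755, 931),
    "rabbit FUT1-to-FUT2": (755, 1023),
}

# Events described as appearing only in the pairwise fragment list.
PAIRWISE_ONLY = {
    "pig FUT2/Sec1": (208, 929),
    "rabbit FUT2/Sec1": (184, 881),
}


def tract_length(start: int, end: int) -> int:
    """Length in bp of an inclusive nucleotide range."""
    return end - start + 1


def length_range(tracts: dict) -> tuple:
    """Shortest and longest tract length among the given ranges."""
    lengths = [tract_length(s, e) for s, e in tracts.values()]
    return min(lengths), max(lengths)


if __name__ == "__main__":
    for name, (s, e) in {**TRACTS, **PAIRWISE_ONLY}.items():
        print(f"{name}: {tract_length(s, e)} bp")
    print("range for TRACTS:", length_range(TRACTS))
```

Under this grouping, the first set of tracts spans 177 to 521 bp, in agreement with the range stated above, whereas the pig and rabbit events found only in the pairwise list are longer (722 and 698 bp).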

Table 2 List of global inner fragments (484 polymorphisms and 849 aligned bases) obtained with Geneconv where inner fragments are runs of matching sites with penalties
Table 3 Additional pairwise fragments obtained by Geneconv with BC Pairwise SimPval <0.05 or listed global fragments with significantly better BC SimPval (≤3 pairwise fragments considered per pair)
Fig. 3 Amino acid alignment of the catalytic domain of the α2FT proteins: (a) the N-terminal region and (b) the C-terminal region. The homologous regions between FUT1, FUT2, and Sec1 that may have resulted from gene conversion are highlighted in light grey. Amino acid substitutions shared between FUT1 and FUT2, but not shared with Sec1, are highlighted in black. Closed square = stop codon; dashes = alignment gaps; dots = identity with the consensus sequence; and question marks = positions at which the consensus sequence represents nonconsensual amino acids. The consensus sequence is based on the amino acid sequences of the α2FT proteins of the mammals presented

ML trees were estimated at the peptide level for each of the A, B, and C segments of the catalytic domain (amino acids 62–342 of the human FUT2 enzyme). The trees obtained for each of the segments (data not shown) were consistent with the ones obtained at the nucleotide level. However, when examining the amino acid alignment (Fig. 3), we noted that the number of sites that are identical between FUT1 and FUT2, but that differ in Sec1, is greater in the C-terminal subdomain (which includes both B and C segments, amino acid positions 237–342) than in the N-terminal subdomain (positions 62–236), with the exception of the Sus and Oryctolagus sequences, which underwent gene conversion.
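The comparison described above amounts to counting, for each subdomain, the alignment columns in which FUT1 and FUT2 carry the same residue while Sec1 carries a different one. A minimal Python sketch follows. The function names and the short toy sequences in the usage note are hypothetical and are not taken from Fig. 3.

```python
from typing import Dict, Tuple


def shared_site_counts(fut1: str, fut2: str, sec1: str,
                       gap: str = "-") -> Tuple[int, int]:
    """Return (columns where FUT1 == FUT2 != Sec1, ungapped columns compared)."""
    if not (len(fut1) == len(fut2) == len(sec1)):
        raise ValueError("aligned sequences must have equal length")
    shared = compared = 0
    for a, b, c in zip(fut1, fut2, sec1):
        if gap in (a, b, c):
            continue  # skip columns containing an alignment gap
        compared += 1
        if a == b and a != c:
            shared += 1
    return shared, compared


def proportions_by_subdomain(fut1: str, fut2: str, sec1: str,
                             n_terminal_columns: int) -> Dict[str, float]:
    """Proportion of FUT1 == FUT2 != Sec1 columns in each subdomain.

    `n_terminal_columns` is the number of alignment columns assigned to the
    N-terminal subdomain; the remaining columns form the C-terminal one.
    """
    result = {}
    parts = (("N-terminal", slice(0, n_terminal_columns)),
             ("C-terminal", slice(n_terminal_columns, None)))
    for name, part in parts:
        shared, compared = shared_site_counts(fut1[part], fut2[part], sec1[part])
        result[name] = shared / compared if compared else float("nan")
    return result
```

With the toy sequences `"MKLVAGHTRE"`, `"MKLIAGHTRE"`, and `"MRLVAGQSRD"` and a boundary after column 5, the proportions are 0.2 for the N-terminal part and 0.6 for the C-terminal part.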

Discussion

Our results confirm the evolutionary scenario for the origin of α2FT genes previously reported for primates (Apoil et al. 2000). As proposed by Apoil et al., two duplication events could explain the emergence of these genes: an ancestral duplication event gave rise to FUT1 and the ancestor of FUT2 and Sec1, and this ancestor then duplicated to give rise to the FUT2 and Sec1 genes.

The idea that in mammals gene conversion between α2FT genes is rare (Apoil et al. 2000) is severely challenged by our results. When only primates are considered, gene conversion is not apparent because FUT1, FUT2, and Sec1 cluster by gene, creating three independent clusters (Apoil et al. 2000); however, gene conversion could still be detected with statistical methods such as those implemented in Geneconv. Indeed, when we include α2FT genes from other mammals and use GARD and Geneconv, we can readily see that gene conversion between FUT2 and Sec1 has been common. An inspection of the trees, with distinct clades formed by conspecific FUT2 and Sec1 sequences, and the different segments detected by Geneconv, suggests that multiple independent events of gene conversion occurred in the evolution of the α2FT gene family in mammals. The conversion events have different lengths and can span the three different segments previously detected with the GARD software. Note that the two phylogenetic break points detected maximize the phylogenetic disagreement and do not mark the exact limits of the conversion events.

In our analyses, multiple gene-conversion fragments involving primates were detected by Geneconv in segment A (30 of 52). Given that all of the Sec1 and FUT2 primate sequences form a single group in the tree, although defined by a very short branch with a small bootstrap value, all of these fragments might be parsimoniously explained by a single gene-conversion event in the ancestor of primates. Evolution after this conversion event, with accumulation of specific mutations in each gene, would explain why they cluster by gene. The clustering of all the other mammals by species and not by gene suggests an ongoing gene-conversion process between FUT2 and Sec1 within species. In addition, the position of the opossum sequence in segment C within the nonprimate FUT2 clade was not expected, although the use of a small segment (120 nt) and the low bootstrap values suggest that this particular result may not be reliable.

The location of the three genes within <80 kb on the same chromosome is likely to favor gene conversion. In addition, the gene-conversion events may be related to the biologic role of the α2FTs. The Sec1 gene is inactivated in many primate species, both in Old World and New World lineages, by a premature stop codon (Apoil et al. 2000; Borges et al. 2008). This gene has also been shown to be inactivated in pig and mouse (Iwamori and Domino 2004), and we recently observed similar evidence in rabbits, in which some Sec1 alleles show residual enzyme activity but most are inactive (Guillon et al. 2009). In these species, however, no premature stop codon was observed. Altogether, these observations suggest that Sec1 either is a pseudogene or is on the way to pseudogenisation. In contrast, both FUT1 and FUT2 are active in all mammalian species tested so far (Oriol et al. 2000). The fact that the proportion of sites identical between FUT1 and FUT2, but different in Sec1, is higher in the C-terminal subdomain than in the N-terminal subdomain, and the fact that for most of the species gene conversion is limited to the N-terminal subdomain, suggest that the enzymes must maintain the ancestral characteristics of the C-terminal subdomain (this part of the enzymes most likely resembles the ancestral enzyme that gave rise to this protein family), probably because they require some structural identity to preserve their functionality.

The structure and mechanisms of fucosyltransferases are as yet unknown. Nevertheless, based on comparisons with many other glycosyltransferases, some predictions have been made (Breton et al. 1998, 2006). According to the models, the C-terminal region would correspond to the nucleotide-binding domain, whereas the N-terminal part would correspond to the acceptor-binding domain. The latter is generally more variable than the former because it should accommodate a number of acceptor substrates much larger than the number of donor substrates. In the case of α2FTs, there is a single possible donor substrate, GDP-Fuc, whereas the number of acceptor substrates can be quite large. Indeed, albeit with different affinities, these enzymes use various acceptor substrates such as Galβ3GlcNAcβ-R, Galβ4GlcNAcβ-R, Galβ3GalNAcα-R, Galβ3GalNAcβ-R, and Galβ4Glcβ-R, where R represents the highly variable subjacent chains of glycolipids and of O-linked or N-linked glycan chains of glycoproteins. The redundant nature of Sec1 and the different functional constraints on the two regions of the catalytic domain of FUT1 and FUT2 would explain a higher similarity between these enzymes in the C-terminal part.

A comparison of the synonymous and nonsynonymous divergences in both subdomains indicated that for the three proteins, dN/dS ratios were <1, suggesting that they are under purifying selection. Nevertheless, dN/dS ratios for the C-terminal subdomain are lower than for the N-terminal subdomain, consistent with our hypothesis of a higher functional constraint on the C-terminal subdomain. Concerning Sec1, dN/dS ratios were higher than for either FUT1 or FUT2 but still <1, which is at odds with the idea of Sec1 being a pseudogene. Indeed, if Sec1 were a pseudogene, it would be evolving neutrally, and dN/dS should be close to 1. Its deviation from neutrality suggests some functional constraints maintained by purifying selection. Nevertheless, these constraints appear weaker than those on FUT1 and FUT2 (Table 1, Supplementary material).
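The distinction between synonymous and nonsynonymous changes that underlies dN/dS can be illustrated with a minimal Python sketch that classifies codon differences between two aligned coding sequences using the standard genetic code. It returns raw counts only. A dN/dS estimate additionally requires normalization by the numbers of synonymous and nonsynonymous sites and a correction for multiple substitutions, neither of which is attempted here. The function name is hypothetical, and this is not the method used to obtain the ratios discussed above.

```python
from typing import Tuple

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_differences(seq1: str, seq2: str) -> Tuple[int, int]:
    """Count (synonymous, nonsynonymous) codon differences.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq1), 3):
        codon1 = seq1[pos:pos + 3].upper()
        codon2 = seq2[pos:pos + 3].upper()
        if codon1 not in CODON_TABLE or codon2 not in CODON_TABLE:
            continue  # gap or ambiguous base
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

For instance, comparing `"ATGGCTAAA"` with `"ATGGCCAGA"` gives one synonymous difference (GCT and GCC both encode alanine) and one nonsynonymous difference (AAA encodes lysine, AGA arginine).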

In humans, both FUT1 and FUT2 present polymorphisms with null alleles encoding inactive or nearly inactive enzymes, responsible for the Bombay and nonsecretor phenotypes, respectively (Oriol et al. 2000). However, the frequencies of these alleles differ. Likewise, the cell types expressing each enzyme can vary in a species-specific manner. In humans, FUT1 null alleles are extremely rare (Wagner and Flegel 1997), and FUT1 is expressed in many cell types, including erythrocytes, the vascular endothelium, some neurons, and epithelial cells (Ravn and Dabelsteen 2000). In contrast, FUT2 null alleles are almost as frequent as functional alleles in humans, and it has been shown that the gene undergoes balancing selection to maintain both types of alleles at high frequency in various human populations (Koda et al. 2001). FUT1 has been shown to be involved in some cellular functions, such as adhesion of leukocytes to the vascular endothelium, angiogenesis, and development of the olfactory bulb (Amin et al. 2008; Garcia-Vallejo et al. 2008; Moehler et al. 2008; St. John et al. 2006). In contrast, FUT2 is mainly expressed in epithelial cells lining the surface of the digestive tract, the upper respiratory tract, and the lower urinary and genital tracts, i.e., in cells in contact with the external environment and potential pathogens (Marionneau et al. 2001). The secretor/nonsecretor polymorphism determined by FUT2 has been shown to be associated with sensitivity or resistance to various pathogens, including uropathogenic strains of E. coli, BabA-expressing strains of H. pylori, and various strains of norovirus (Azevedo et al. 2008; Le Pendu et al. 2006; Stapleton et al. 1995). The involvement of FUT1 in cellular functions and that of FUT2 in interactions with pathogens may explain the high frequency of FUT2 null alleles in contrast to the rare occurrence of such FUT1 alleles.

Classical studies on the evolution of duplicated genes indicate that the persistence of both duplicates requires their functional differentiation. In the absence of such differentiation, one of the duplicates should rapidly become a pseudogene (Teshima and Innan 2004; Walsh 2003). This is consistent with the inactivation of Sec1 in most primate species and with our observation of a limited and most likely ancient gene-conversion event in this lineage. As discussed previously, FUT1 and FUT2 have become functionally differentiated. Gene conversion involving Sec1 after its inactivation may no longer be observed because it would be deleterious to FUT2 or FUT1. In other species, inactivation of Sec1 may be recent or not yet complete; therefore, many gene-conversion events between FUT2 and Sec1 can still be detected. The situation is clearly different in rabbit, in which the three genes are involved in gene-conversion events. In this species, Sec1 might have acquired a function distinct from those of the two other α1,2-fucosyltransferase genes; however, that function remains to be defined.

In conclusion, the gross evolutionary history of α2FTs (FUT1, FUT2, and Sec1) seems clear, but the evolution of these genes involved many gene-conversion events that can only be partially characterized and enumerated. It will be difficult to describe the exact phylogenetic relations for each species and gene because these gene conversions differ in position and in length and because the several histories embedded in the sequence alignment obscure the true evolutionary relations. The degree of concerted evolution of the three α1,2-fucosyltransferase genes appears to be species-specific, possibly related to the functional differentiation of these genes.