Prediction of sites under adaptive evolution in flavin-containing monooxygenases: Selection pattern revisited

Flavin-containing monooxygenase (FMO), like cytochrome P450 (CYP), is a monooxygenase that uses the reducing equivalents of NADPH to reduce one atom of molecular oxygen to water, while the other atom is used to oxidize the substrate. Recently, it was shown that some CYP isoforms have been subject to positive selection. However, it is unknown whether the highly conserved phase I detoxification enzyme, FMO, has undergone similar positive Darwinian selection. We used maximum-likelihood models of codon substitution, evolutionary fingerprinting, and cross species comparison to investigate the occurrence of adaptive evolution in FMO sequences. We used recent genomic data from a range of species, including vertebrates and invertebrates. We present the evidence for the occurrence of adaptive evolution in mammalian FMO 3, 4, 5, and fugu FMOs but not in mammalian FMO 1, FMO 2, frog FMOs, other fish FMOs and invertebrate FMOs. The sites under adaptive evolution were significantly associated with the insertion domain in mammalian FMO 5. We identified specific amino acid sites in FMOs 3–5 that are likely targets for selection based on the patterns of parallel amino acid change. The most likely role of adaptive evolution is the repair of mutations that permitted optimal NADP+ binding and improved catalytic efficiency. The occurrence of positive selection during the evolution of phase I detoxification enzymes such as FMOs 3–5 and fugu FMO suggests the occurrence of both high selection pressure acting on species within their unique habitats and significant changes in intensity and direction (forms of xenobiotics and drugs) resulting from changes in microhabitat and food.

Flavin-containing monooxygenases (FMOs, E.C. 1.14.13.8) oxygenate the nucleophilic O, N, S and Se atoms of a wide range of substrates, including amines, amides, thiols and sulfides. FMO, like cytochrome P450 (CYP), is a monooxygenase that uses the reducing equivalents of NADPH to reduce one atom of molecular oxygen to water, while the other atom is used to oxidize the substrate. FMO and CYP also share similar tissue and cellular distributions, molecular weights, and substrate specificities, and both exist as multiple enzymes that are under developmental control. The human FMO functional gene family is much smaller (5 families each with a single member) than the CYP family. Furthermore, FMO does not require a reductase to transfer electrons from NADPH, and the catalytic cycles of the two monooxygenases also differ significantly. The FMOs associated with liver microsomes are divided into 3 classes: FMOs, N-hydroxylating monooxygenases, and Baeyer-Villiger monooxygenases (BVMOs). BVMO and FMO may be distinguished by their signature sequences FXGXXXHXXXW(P/D) and FXGXXXHXXX(Y/F), respectively. In general, CYP is the primary contributor to oxidative xenobiotic metabolism. However, an increasing number of drugs and xenobiotic substances are known to be metabolized by FMOs [1,2]. FMO and CYP have overlapping substrate specificities, but often yield distinct metabolites with potentially significant toxicological/pharmacological consequences.
*Corresponding authors (email: hao@djtu.edu.cn; xiaopg@public.bta.net.cn)
An increasing number of researchers study gene function and regulation using population genetics and phylogenomics techniques. This approach is aided by the rapidly expanding effort towards genome sequencing and genotyping. The evolution of a gene's sequence among human populations or species reflects, in part, adaptations to changing physiology or environment. Recently, Qiu et al. [3] investigated the evolution of the CYP3 genomic loci using genomic sequences from 16 species, and found 2 recent episodes of particularly strong positive selection acting on primate CYP3A4 and 3A7 protein-coding sequences. Zawaira et al. [4] used maximum-likelihood models of codon substitution to investigate the role of adaptive evolution in the evolution of CYP sequences, and found evidence for the occurrence of adaptive evolution in the evolution of rat CYP2C, rabbit CYP2C, rat CYP2D, human CYP3A, and rabbit CYP4A. In addition, Chen et al. [5] conducted a comprehensive study of the nucleotide diversity and haplotype structure of the CYP3A locus and concluded that CYP3A4 and 3A7 had recently undergone, or were undergoing, a selective sweep in all 3 populations under study. However, CYP3A43 and 3A5 were undergoing a selective sweep in non-Africans and Caucasians, respectively.
In contrast to the depth of information on the CYP family, little is known about the evolution of FMOs. Allerston et al. [6] analyzed the genetic diversity within FMO 3 and found an excess of intermediate-frequency SNPs and haplotypes, a ragged pairwise mismatch distribution, and an excess of replacement polymorphisms, providing evidence that FMO 3 has been the subject of balancing selection. Hao et al. [7] reconstructed a phylogenetic tree for the FMO family and characterized the long-term evolution and functional divergence followed by members of this family. The authors noted that there is extensive silent divergence at the nucleotide level suggesting that this family has been subject to strong purifying selection at the protein level. However, recently published genomic data from a range of species, including vertebrates and invertebrates, have not yet been fully evaluated with respect to determining the evolution of FMOs. The whole genome sequencing of a rapidly increasing number of species facilitates studies of the evolution of gene families using a comparative genomic approach. Given the similarities between FMO and CYP, we hypothesize that FMO is also subject to positive Darwinian selection. We test this hypothesis using a maximum-likelihood approach and cross species sequence comparison.
The protein database predicted from the S. purpuratus genome (release 2.1) [8] was searched using hidden Markov models (HMMs) of FMOs constructed with Hmmer 2.3 [9]. The HMM was constructed with FMOs from humans, mice, rats, and fish. Both global and local multidomain HMMs were constructed and used to search the predicted protein database. Predicted proteins were aligned with known FMOs using Clustal W2 [10]. Examination of the S. purpuratus genome assembly using Blast searching confirmed the FMO protein sequence predictions. Genome sequences were obtained from GenBank, Ensembl, and the Joint Genome Institute. Some preliminary gene predictions were performed using Genewise [11] and Genscan [12].

Sequence alignment and phylogeny reconstruction
DNA sequence and codon alignments were performed with RevTrans [13] (http://www.cbs.dtu.dk/services/RevTrans/) and Clustal W2. For amino acid sequences, we used the neighbor-joining (NJ) method [14] to reconstruct the phylogenetic tree with MEGA4 [15]. The best model, JTT + G, was identified using ProtTest [16]. To determine whether the outcome was dependent on this choice, we conducted a phylogenetic inference analysis by reconstructing a Bayesian tree (MrBayes 3.1.2) [17] with 4 Markov chain Monte Carlo chains run for 1×10⁶ generations. For the nucleotide sequences, we used a Bayesian analysis and the maximum likelihood (ML) method (GARLI) [18] to infer the phylogenetic trees. The best model, GTR + I + G, was selected using ModelTest 3.8. Bayesian probabilities were obtained under this model using a Markov chain Monte Carlo simulation (MCMC: 4 chains, each run for 4×10⁶ generations). We used random trees as the starting point and sampled every 500th generation. To test the reliability of the obtained topologies, we calculated bootstrap probability (BP) and posterior probability (PP) values for each internal branch, assuming that BP≥80% and PP≥95% were statistically significant. S. pombe FMO [19] and the FMO from Saccharomyces cerevisiae [20] were assigned as outgroups in the reconstructions.
Table 2 (note): The occurrence of positive selection in FMO 2 was not supported by the evolutionary fingerprint analysis (Figure 2) and parallel amino acid substitution (Table S3).

Detection of positively selected sites and evolutionary fingerprint analyses
We tested for evidence of positive selection by comparing the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS). If a gene is evolving neutrally, ω = dN/dS is expected to equal 1, whereas ω>1 is considered strong evidence that a gene has experienced positive selection. We used several ML approaches to test for evidence of positive selection on FMOs. The first approach, developed by Yang et al. [21] (referred to as Yang models), involves comparisons of a neutral codon substitution model with ω constrained to be ≤1 to a selection model where a class of sites has ω>1. Because neutral models are nested within the corresponding selection models, a likelihood ratio test (LRT) can be used to compare the two. The test statistic -2ΔlnL (ΔlnL = the difference in log likelihoods of the 2 models) follows the χ² distribution with degrees of freedom (df) equal to the difference in number of parameters between models. In the specific models implemented, ω varies between codons according to a beta distribution (neutral: M7, M8a; selection: M8). We implemented models M7, M8a, and M8 with the codeml program in PAML4 [22]. Because Yang models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates, we also implemented the MEC (Mechanistic Empirical Combination) model [23]. This takes into account not only the transition-transversion bias and the nonsynonymous/synonymous ratio, but also the different amino acid replacement probabilities as specified in empirical amino acid matrices. Because the LRT is applicable only when two models are nested and is therefore not suitable for comparing MEC and M8a models, we used the second-order AIC (AICc) for comparisons [23]. Those sites that are most likely to be in the positive selection class (ω>1) are identified as likely targets of selection.
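The likelihood ratio test and the AICc comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the codeml or MEC implementation: the function names (`chi2_sf`, `lrt`, `aicc`), the log-likelihood values, and the choice of df = 2 for an M7 versus M8 comparison are assumptions made for the example, and the χ² tail probability is computed only for df = 1 or even df, where simple closed forms exist.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        half = x / 2.0
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch handles df = 1 or even df only")


def lrt(lnl_null, lnl_alt, df):
    """Return (test statistic, p-value) for a nested null/alternative pair.

    The statistic is 2 * (lnL_alt - lnL_null), i.e. -2 * (lnL_null - lnL_alt).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


def aicc(lnl, k, n):
    """Second-order AIC for k free parameters and sample size n."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical log likelihoods, for illustration only.
stat, p = lrt(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"statistic = {stat:.2f}, p = {p:.4f}")
```

For non-nested models, the model with the lower `aicc` value would be preferred. What counts as the sample size n (for example, the alignment length) is an assumption that has to be fixed before the values are compared.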
Although the Yang models allow for variation in the nonsynonymous substitution rate, the synonymous rate is fixed across the sequence. A number of methods have been proposed for detecting positive selection that allow for variation in the synonymous rate (e.g., fixed effects methods and random effects methods). The fixed effect likelihood (FEL) method [24] estimates ω on a site-by-site basis, uses ML estimation, and treats shared parameters (branch lengths, tree topology, and nucleotide substitution rates) as fixed. The random effects likelihood (REL) method is similar to the Yang model M3. However, both nonsynonymous and synonymous rates vary as gamma distributions with 3 rate classes [24]. We implemented the REL, FEL, and SLAC methods using the web interface DATAMONKEY [25].
Over time, natural selection molds every gene into a unique mosaic of sites that evolve rapidly or resist change, an "evolutionary fingerprint" of the gene. Pond et al. [26] developed a novel model for coding sequence evolution that uses a general bivariate discrete parameterization of the evolutionary rates. This approach provides a better fit to the data using a smaller number of parameters than existing models. We used the probability distributions generated using this method to represent the evolutionary fingerprints of the FMOs under consideration.
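To make the rate-class reading of a fingerprint concrete, the sketch below represents a fingerprint as a small discrete distribution over pairs of synonymous (α) and nonsynonymous (β) rates, each with a weight, and reports the classes in which β exceeds α (ω = β/α > 1). The class values and weights are invented for illustration and are not estimates from any FMO data set; `RateClass` and `positive_classes` are hypothetical names, not part of any published tool.

```python
from typing import List, NamedTuple


class RateClass(NamedTuple):
    alpha: float   # synonymous rate
    beta: float    # nonsynonymous rate
    weight: float  # proportion of sites assigned to this class


def positive_classes(classes: List[RateClass]) -> List[RateClass]:
    """Return the classes with beta > alpha, i.e. omega > 1 (also covers alpha = 0)."""
    return [c for c in classes if c.beta > c.alpha]


# Invented three-class fingerprint; the weights sum to 1.
fingerprint = [
    RateClass(alpha=0.8, beta=0.1, weight=0.70),
    RateClass(alpha=1.2, beta=0.9, weight=0.27),
    RateClass(alpha=0.5, beta=1.5, weight=0.03),
]

for c in positive_classes(fingerprint):
    print(f"omega = {c.beta / c.alpha:.2f}, weight = {c.weight:.2f}")
```

In this toy input only the third class lies above the α = β diagonal, so it is the only one reported.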

Parallel amino acid substitutions
Parallel and convergent evolution refer to independent acquisitions of the same character state on more than one occasion during evolution. The distinction between parallelism and convergence is that the former refers to the situation in which the ancestral states were identical among independent lineages, whereas the latter requires different ancestral states [27]. To identify parallel amino acid substitutions, we performed ML reconstructions of ancestral sequences and individual mutation events using PAML4 (baseml and pamp). The marginal reconstruction approach [28] compares the probabilities of different character assignments to an interior node at a site and selects the character that has the highest PP. We then calculated the number of independent changes, Grantham's distance [29] between the starting and ending amino acid, the universal evolutionary index (EI) [30], and the possible alternative amino acid substitutions for the identified parallel amino acid substitutions.
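The distinction between parallel and convergent change can be written as a small rule applied to one site on two independent lineages, given the reconstructed ancestral state and the descendant state on each lineage. The sketch below is only an illustration of the definition; `classify_change` is a hypothetical helper and is not part of PAML.

```python
def classify_change(anc_a, desc_a, anc_b, desc_b):
    """Classify a pair of independent changes at one site on two lineages."""
    both_changed = anc_a != desc_a and anc_b != desc_b
    if not both_changed or desc_a != desc_b:
        # No shared derived state acquired independently on both lineages.
        return "neither"
    # Same derived state: identical ancestors -> parallel, different -> convergent.
    return "parallel" if anc_a == anc_b else "convergent"


# Site 360 of FMO 3: Q -> L in tarsier and in the lesser hedgehog tenrec.
print(classify_change("Q", "L", "Q", "L"))  # parallel
```

A real analysis would apply this rule to every pair of branches in the tree, using the marginal reconstruction at each interior node.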

Statistical significance of the overlap between sites under adaptive evolution and defined insertion domain
S. pombe FMO (447 aa) consists of two structural domains [19]. Residues 176-291 form a small structural domain, i.e., the insertion domain (ID), with the remainder of the polypeptide chain forming a larger single domain. The prokaryotic FMO from Methylophaga sp. strain SK1 is made up of 2 distinct domains [31], a larger FAD-binding domain (residues 1-169 and 281-461) and a smaller NADP-binding domain (residues 170-280, corresponding to the ID of the yeast FMO). The positions of the ID in mammalian FMOs were assigned according to the sequence alignment with yeast and bacterial FMOs (Figure 1). In addition, we defined the active sites (ASs) of mammalian FMOs by extending the yeast and bacterial FMO ASs by 3 amino acid residues on both sides (Figures 1 and S1).
We constructed a model for analyzing the statistical significance of the overlap between the sites under adaptive evolution and the ID as follows. Let X be the total number of sites (for analysis of a given gene or set of genes as listed in the first column of Tables 1 and S1) predicted to be under adaptive evolution that lie in the ID. We used the binomial distribution to model X. To test the statistical significance of the overlap between the model-predicted sites and the ID, we tested the null hypothesis that the probability of a predicted site lying in the ID (which we refer to as P) is half, versus the alternative hypothesis that P is greater than half, that is, $H_0: P = 1/2$ versus $H_1: P > 1/2$. If the null hypothesis is accepted, the observed proportions support the conclusion that the model-predicted sites are equally likely to lie inside or outside the ID, implying a weak association between the predicted sites and the ID. In contrast, rejection of the null hypothesis supports the conclusion that there is an association between the predicted sites and the ID, i.e., the observed overlap between sites under adaptive evolution and the ID is greater than that expected from chance alone (where the adaptive evolution sites are equally likely to occur inside or outside the ID). If n is the number of predicted sites in a given analysis, then under $H_0$ the probability that X takes the value r [denoted by P(X=r)] is given by the binomial distribution as follows: $P(X=r)=\binom{n}{r}\left(\frac{1}{2}\right)^{r}\left(\frac{1}{2}\right)^{n-r}=\binom{n}{r}\left(\frac{1}{2}\right)^{n}$, for r = 0, 1, ..., n. We performed a one-tailed test with a significance level of 10%. This involved finding the smallest integer a such that P(X≥a) < 0.1. For a given study, the value of a defines the rejection/acceptance criteria as follows: if X<a, then X does not lie in the critical region and the null hypothesis is accepted; otherwise it is rejected and the alternative hypothesis accepted.
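The one-tailed binomial test described above can be reproduced with the Python standard library. The sketch below is an illustration under the stated null hypothesis (P = 1/2) and a 10% level; `upper_tail` and `critical_value` are hypothetical helper names, and the counts are those quoted in the text (sites in the ID out of all predicted sites), not values taken from Table S2.

```python
from math import comb


def upper_tail(n, a, p=0.5):
    """P(X >= a) for X ~ Binomial(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(a, n + 1))


def critical_value(n, level=0.1):
    """Smallest integer a with P(X >= a) < level under H0, or None if none exists."""
    for a in range(n + 1):
        if upper_tail(n, a) < level:
            return a
    return None


# (label, sites in the ID, total predicted sites), as quoted in the text.
cases = [("FMO 5", 14, 16), ("fugu FMOs (MEC)", 4, 4), ("FMO 4", 7, 13)]
for label, x, n in cases:
    a = critical_value(n)
    decision = "reject H0" if a is not None and x >= a else "do not reject H0"
    print(f"{label}: P(X>={x}) = {upper_tail(n, x):.4f}, a = {a}, {decision}")
```

For 14 of 16 sites the upper-tail probability is 137/65536 (about 0.002) and for 4 of 4 sites it is 1/16 (0.0625), both below 0.1, whereas 7 of 13 sites gives 0.5.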

Maximum-likelihood analysis of orthologous and paralogous FMO sequences
The positively selected sites of the orthologous and paralogous sequences obtained from the maximum-likelihood inferences are shown in Tables 1, 2 and S1, respectively. We identified positively selected sites in the FMOs 3, 4, and 5 orthologs and in the fugu FMO paralogs. Based on the evolutionary fingerprint analysis, both FMOs 4 and 5 have 5 evolutionary rate classes, fugu FMOs have 4 rate classes, and FMO 3 has 3 rate classes (Figure 2). The presence of one rate class with ω (β/α) >1 in these 4 groups suggests the occurrence of positive selection. The SLAC, FEL, REL, M8 and MEC models suggested that one or several FMO 1 sites may have been subject to positive selection. However, the evolutionary fingerprint analysis did not yield evidence of positive selection for FMO 1 (Figure 2). Similarly, none of the rate classes supported the occurrence of positive selection in FMO 2 (Figure 2), although the FEL, M8 and MEC models provided support for positive selection at one or several FMO 2 sites. The discrepancies between the different models are a function of their underlying mathematical and statistical relationships. To reduce the potential for bias, we used a range of methods to predict the presence of positive selection. All Europeans and Asians tested were homozygous for a non-functional FMO 2 variant (FMO2*2A) that contains a premature stop codon caused by a single nucleotide change in exon 9 (g. 23238C>T) [32]. This is exactly as would be expected under the birth-and-death evolution model [33] and implies nonfunctionalization [34] after gene duplication. To be stringent, we do not consider FMOs 1 and 2 to be under positive selection. In addition, we found no evidence for positive selection at sites in the sea urchin, medaka, tetraodon, and other non-mammalian data sets, except fugu FMOs (Table S1).
Figure 1 (caption, as recovered): Sequence identities to human FMO 3 are 31% for meFMO, 27% for spFMO, 42% for fugu FMO, 52% for human FMO 4, and 52% for human FMO 5. The ID was defined by profile alignment using ClustalW2 and its boundary is shown with "start" and "end", corresponding to the alignment positions 189 and 390, respectively. We defined active sites (AS) by extending the Methylophaga FMO active sites by three amino acids on both sides. ASs 3 (alignment positions 188-194) and 4 (alignment positions 236-246) are indicated with "^" and black "*", respectively; ASs 1, 2, 5, and 6 were not within the ID region and are shown in Figure S1. "+", positively selected sites detected by the M8 model and lying within the ID region. The alignment positions of the PS sites are as follows: 269, 274, 280, 284, 308, 335, and 382 of FMO 4; 226, 280, 283, 284, 289, 290, 292, 293, 296, 298, 300, 301, 303, and 308 of FMO 5; 290 of fugu FMO. "------" and "===" represent the fingerprint sequences of FMOs that are similar to the BVMO-identifying sequence motif and the second Rossmann fold for NADPH binding, respectively. Common SNPs located within the ID region are labeled with "#", "%", and "@" for human FMOs 3, 4, and 5, respectively. "&" indicates FMO 3 residues that are mutated in patients affected by TMAU.

Parallel amino acid substitutions
The pattern of amino acid change based on ML ancestral sequence reconstruction provides further evidence that FMOs 3-5 evolved under positive selection. Twenty-seven amino acid sites in FMO 3, 34 sites in FMO 4, and 32 sites in FMO 5 changed independently to the same amino acid in 2 or more mammalian species (Table S3). For example, site 360 in FMO 3 changed from the polar residue glutamine (Q) to the nonpolar leucine (L) in tarsier and the lesser hedgehog tenrec (Figure 3(a)). In FMO 4, site 235 changed from the nonpolar valine (V) to the negatively charged aspartate (D) in bushbabies and dolphins (Figure 3(b)). This change was classified as radical based on Grantham's distance, which takes into account amino acid size, hydrophobicity, charge, and polarity. This classification was supported by the universal evolutionary index (EI, Table S3). Changes from a noncharged to a charged residue were found in FMOs 4 and 5. Changes at 2 sites in FMO 3, 6 sites in FMO 4, and 2 sites in FMO 5 were neither conservative nor moderately conservative, as defined by changes in charge or by Grantham's distance. There was no significant difference in the distribution of the four types of amino acid change among FMOs 3-5 (chi-square test, χ² = 3.86, P = 0.695). Such nonconservative changes occur much less frequently than is expected under neutrality [35]. Thus, the nonconservative changes observed are more likely to have consequences for enzyme structure and/or function. Parallel evolution at the amino acid sequence level can be interpreted as evidence of adaptive evolution [27]. Thus, sites that have changed in parallel are likely targets of selection, in addition to those sites identified using the ML approach. Our results also suggest that the number of available pathways of adaptive evolution may be constrained.
Figure 2 Phylogenetic relationships among FMO proteins and the evolutionary fingerprint of each FMO lineage. The reconstruction was carried out using the JTT + G model. The tree/branch style is shown as a circle. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances are in the units of the number of amino acid substitutions per site. The concept of an evolutionary fingerprint is formulated based on the site-to-site probability distribution of synonymous (α) and nonsynonymous (β) substitution rates in an alignment [26]. Methods that exploit the ω = β/α ratio as an indicator of positive or negative selection have become a mainstay of modern evolutionary analyses. An accurate estimate of the strength and extent of selective forces acting on a gene, represented by a distribution of α and β, is undoubtedly more informative than a simple binary test for any kind of positive selection. The panel beside each FMO lineage shows the approximate sampling distributions, which are drawn on the log scale that is consistent between all 10 data sets, with α on the x axis and β on the y axis, and the diagonal line corresponding to values α = β, a proxy for neutral evolution. The color intensity reflects the density assigned to a particular square of the rate distribution. The ellipses are centered on the approximate sampling means of the corresponding rate estimates, and each axis is drawn as 1.96× the sampling standard deviation for α (horizontal) or β (vertical) rates.

Assignment of predicted sites under adaptive evolution to the map of the insertion domain and active sites
The predicted sites under adaptive evolution (listed in Tables 2 and S1) were projected onto the map of the ID and AS (i.e., onto the structure of yeast and bacterial FMOs) [31] using the profile alignment strategy. The projections shown in Figure 1 were carried out for FMOs 3-5 and fugu FMO where adaptive evolution was observed (Figure 2).

Statistical significance of the overlap between sites under adaptive evolution and the insertion domain of the FMOs
We used the binomial distribution to model the total number of sites predicted to be under adaptive evolution that lie either inside (Figure 1) or outside the ID. The statistical analysis is summarized in Table S2. A significant number (14 of 16) of positively selected sites in FMO 5 were located in the ID. Similarly, a significant number (4 of 4) of the positively selected sites in the fugu FMOs detected by the MEC model were located in the ID. These sites may have contributed to the fugu's adaptation to specific aquatic environments. None of the positively selected sites in FMOs 3 and 4 were located in ASs 1-6. In addition, only one positively selected site (465L) in FMO 5 was in AS-6 (Figure S1). In contrast, Zawaira et al. [4] found a significant association between sites under adaptive evolution and Gotoh's substrate recognition sites (SRSs) in rat and rabbit CYP2C, human CYP3A, and rat CYP2D. Furthermore, 2 of 3 positively selected sites in the primate CYP3A were in SRS-6 [3]. The method for assessing the statistical significance of the overlap between the sites under adaptive evolution and the ID assumes that all positions in the non-ID region are functionally relevant, either for substrate specificity, binding NADP+, or enzyme kinetics. This assumption should be supported or modified based on experimental evidence. Our map of the ID and ASs is complete given current data, but should be updated when information on substrate-binding sites from FMO-substrate complexes and mutagenesis studies becomes available.

Discussion and conclusions
FMOs play an important role in mediating interactions between organisms and their chemical environment due to their involvement in the metabolism of xenobiotics and endobiotics. Thus, they are likely targets for natural selection. Previously, we suggested that FMOs are subject to strong purifying selection [7]. Allerston et al. [6] provided population genetic evidence that human FMO 3 has been subject to balancing selection, but not positive selection. To further our understanding of the evolutionary history of the FMO loci, we analyzed mammalian and non-mammalian DNA and protein sequence variation within a phylogenetic framework. For the first time, we unambiguously identified a number of positively selected sites in FMOs 3-5 and showed that the loci have been subject to adaptive evolution.
FMOs belong to the flavoenzyme class of single component flavoprotein monooxygenases. They use reducing equivalents from NADPH to reduce the FAD cofactor which, in turn, becomes capable of reacting with molecular oxygen to yield the C4a-hydroperoxy FAD intermediate [31]. A defining characteristic of this reaction is that the presence of NADP+ is essential for intermediate stabilization. Furthermore, NADP+ remains consistently bound to the insertion domain of the enzyme throughout the catalytic cycle, being the last product to be released. Interestingly, we observed a significant overlap between the FMO 5 sites under adaptive evolution (PAML-predicted) and the insertion domain. The most likely role of adaptive evolution is to repair the mutations that permitted optimal NADP+ binding and improved catalytic efficiency toward the expanding substrate repertoire during long-term evolution of FMO 5. Compared to FMOs 1-4, FMO 5 is ancient in origin [7]. FMO 5 is the most prominent form of FMO in the fetal liver. FMO 5 and FMO 3 transcripts are both prevalent in human adult liver. FMO 5 is also the most abundant FMO transcript present in the human small intestine and may contribute to intestinal first-pass metabolism [36]. FMO 5 does not oxygenate typical FMO substrates (i.e., methimazole, ranitidine, or cimetidine) [37,38], but has been reported to S-oxygenate thioethers with a proximal carboxylic acid, a somewhat unique FMO substrate activity [39]. The dietary anticarcinogen quercetin enhanced the action of genes involved in phases I and II metabolism, including FMO 5 [40]. E7016, an inhibitor of poly(ADP-ribose) polymerase, undergoes FMO 5-mediated Baeyer-Villiger oxidation in liver microsomes [41]. The specificity of FMO 5 toward catalyzing this Baeyer-Villiger oxidation was confirmed and may be mediated by the amino acid residues in the ID that are under adaptive evolution, as these positively selected sites are adjacent to the well-known signature motif FXGXXXHXXX(Y/F), which is very similar to that of the active site of the BVMOs. As more specific substrates are discovered, FMO 5 may show selective functional activity. Thus, we speculate that FMO 5 differentiated at an early stage, following the split from the common ancestor of FMOs 1-4, to fulfill a critical role in the liver and intestine. The evolutionary fate of FMO 5 is consistent with the neofunctionalization model, which predicts that one of the duplicated genes acquires a new function because of positive Darwinian selection. Whether the positively selected sites found in this study are related to the distinct monooxygenation profile of FMO 5 for a large repertoire of xenobiotic agents requires further experimental study.
Seven out of 13 positively selected FMO 4 sites are located in the ID, 5 in the C-terminal region and only 1 in the highly conserved N-terminal region. This is reasonable biologically because genetic variability at the FMO 4 locus has presumably evolved to detoxify novel types of xenobiotics and drugs. FMO 4 is the least studied FMO isoform. The loci for FMOs 1-4 diverged from the ancestral FMO locus a long time ago and now have different functions and metabolize different substrates [7]. Whereas other FMOs displayed a significant, dominant tissue-specific mRNA profile (i.e., FMO 1 in kidney, FMO 2 in lung, FMOs 3 and 5 in adult liver), FMO 4 mRNA was observed more broadly at relatively comparable levels in the liver, kidney, lung, and small intestine [36]. Real time quantitative RT-PCR confirmed that 20 μmol/L p-NO-ASA (NO-donating aspirin) upregulated FMO 4 by 4.5±1.67-fold [42]. Taken together, these observations illustrate FMO subfunctionalization which, in addition to neofunctionalization and nonfunctionalization [34] following FMO gene duplication, increases the complexity of the evolutionary scenario.
Although neofunctionalization may not be important for the evolution of FMOs 1 (the predominant FMO in the fetal liver) and 2 (which tends toward nonfunctionalization), it is reasonable to postulate that, given a constantly changing environment, some positively selected sites were involved in the neofunctionalization of FMOs 3-5. Following the relatively recent split of FMO 4 and the common ancestor of FMOs 1 and 3 (Figure 2), these detoxification enzymes are currently in the process of specialization. Due to limited knowledge regarding substrate specificity and the lack of information on the 3-D structures of mammalian FMOs, we are unable to adequately explain why positively selected sites were observed in FMOs 3-5, but not in FMOs 1 and 2. Mammalian FMO 3 oxygenates a variety of nucleophilic primary, secondary, and tertiary amines, as well as sulfur and other heteroatom-containing chemicals and drugs [1]. Because FMO 3 is not readily induced or inhibited, variation in the functional activity is derived from genetic variability arising largely from common SNPs (Figures 1 and S1). Conversely, individuals with rare FMO 3 mutations that cause defective TMA N-oxygenation suffer from trimethylaminuria (TMAU). Allerston et al. [6] found no evidence for positive selection when only considering the intraspecific FMO 3 sequence variation. Conversely, we found two positively selected sites using the M8 model by comparing interspecific sequence variation, which suggests that adaptive evolution has had a role in the long-term, but not short-term, evolution of FMO 3. One of the positively selected sites is 360Leu, a polymorphic site. A small number of individuals (i.e., 1%) possessed the Pro360-FMO 3 variant, which is catalytically more efficient than the wild-type FMO 3 [43]. P360-FMO 3 oxygenated mercaptoimidazole, TMA, and 10-(N,N-dimethylaminopentyl)-2-(trifluoromethyl)phenothiazine approximately 3-, 5-, and 2-fold more efficiently, respectively, than wild-type FMO 3. For these substrates, the Km values for P360-FMO 3 and the wild-type enzyme were similar, but the Vmax values were greater. It is possible that, being adjacent to the insertion domain, P360-FMO 3 is able to facilitate desorption of NADP+ or dehydration of FAD pseudobase water and, thus, speed up the FMO reaction [44]. Another possibility is that P360-FMO 3 modifies the structure of FMO 3. It is known that substitution of Pro into an enzyme can change protein structure. Analogously, the change in amino acid from Q to L, or other changes at site 360, may have adjusted the spatial relationship and/or modified the affinity between the insertion domain and NADP+, thus optimizing FMO 3 reactions with newly encountered substrates in the environment.
In conclusion, we present the first evidence for the occurrence of adaptive evolution in mammalian FMOs 3, 4, 5, and fugu FMOs. The adaptive evolution signal was not detected in mammalian FMOs 1 and 2, frog FMOs, other fish FMOs, and invertebrate FMOs. Our results show distinct patterns of evolution for mammalian FMOs 1 and 3, both of which have a recent origin. Furthermore, the sites under adaptive evolution are significantly associated with the insertion domain in mammalian FMO 5. Whether the identified positive selection and the parallel amino acid substitution alter the product profile in the respective species deserves further study. The finding of positive selection during the evolution of phase I detoxification enzymes such as FMOs 3-5 and fugu FMO is counterintuitive in the light of the supposed limited room for change in these molecules. Our results support expectations of both high selection pressure acting on the various species within their unique habitats and significant changes in intensity and direction (kinds of xenobiotics and drugs) resulting from changes in microhabitat and food. Future studies of enzyme structure, substrate specificity, and catalytic mechanism would benefit from site-directed mutagenesis based on our current results.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
The supporting information is available online at csb.scichina.com and www.springerlink.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.