Functional Significance May Underlie the Taxonomic Utility of Single Amino Acid Substitutions in Conserved Proteins
We hypothesized that some amino acid substitutions in conserved proteins that are strongly fixed by critical functional roles would show lineage-specific distributions. As an example of an archetypal conserved eukaryotic protein we considered the active site of β-tubulin. Our analysis identified one amino acid substitution—β-tubulin F224—which was highly lineage specific. Investigation of β-tubulin for other phylogenetically restricted amino acids identified several with apparent specificity for well-defined phylogenetic groups. Intriguingly, none showed specificity for “supergroups” other than the unikonts. To understand why, we analysed the β-tubulin Neighbor-Net and demonstrated a fundamental division between core β-tubulins (plant-like) and divergent β-tubulins (animal and fungal). F224 was almost completely restricted to the core β-tubulins, while divergent β-tubulins possessed Y224. Thus, our specific example offers insight into the restrictions associated with the co-evolution of β-tubulin during the radiation of eukaryotes, underlining a fundamental dichotomy between F-type, core β-tubulins and Y-type, divergent β-tubulins. More broadly our study provides proof of principle for the taxonomic utility of critical amino acids in the active sites of conserved proteins.
Keywords: Eukaryote, Tubulin, Phylogeny, SNP, Substitution, Unikont
Phylogeny reconstruction methods aim to establish common descent between taxa, and advances in molecular sequence data generation, as well as in phylogenetic analysis tools and techniques, have led to considerable progress in the quest to better understand evolution. Unfortunately, phylogenies reconstructed from different markers can exhibit discrepancies. These can arise when the underlying marker has undergone complex evolutionary processes, such as parallel or convergent evolution, that render the signals left behind ambiguous. Among many conceivable markers, those that are based on “rare molecular events” have the potential to be good markers for phylogeny reconstruction. Those used to date include gene fusion, recombination, insertion or deletion, SINEs and LINEs (Rokas and Holland 2000). These can have advantages over traditional phylogenetic comparisons, which sometimes conflict depending on the sequence chosen (Edgcomb et al. 2001; Stechmann and Cavalier-Smith 2002; Steenkamp et al. 2006; Van de Peer et al. 2000). Once found, such rare event markers give rise to straightforward and readily interpretable phylogenetic analyses. Confounding such analyses are the risks of multiple similar events rather than a single instance, and the possibility of reversion. These risks are reduced as more such events are discovered and analyzed, but such markers remain difficult to identify and are by definition scarce.
Single amino acid polymorphisms in even the most evolutionarily conserved proteins are not rare; substitutions in coding sequences are the coin of protein evolution. The selection pressure on particular amino acids critical for structure and function to remain invariant can, though, be enormous making transitions in some amino acids in the functional regions of conserved proteins rare. When such amino acids do overcome this intense selection pressure and mutate to a different amino acid, particularly when the change is not a conservative one, it may only be possible by way of commensurate and contemporaneous changes elsewhere in the protein or its partner proteins, potentially enabled by alternative strong selective pressures. In this study, we focussed initially on the active site of the canonical eukaryotic cytoskeletal GTPase β-tubulin as a prospective source of amino acids under intense conservative selective pressure. We considered whether transition of amino acids within the active site would show lineage-specific distributions useful for phylogenetic analysis and therefore might be associated with functional differences observed between taxa including microtubule dynamics and pharmacological susceptibility.
Microtubules are a defining feature of eukaryotes representing key components of the cytoskeleton and mitotic spindle. They are built from repeating αβ-tubulin heterodimers and under physiological conditions α- and β-tubulin bind one molecule of GTP each. While the nucleotide bound to α-tubulin is non-exchangeable, the intrinsic GTPase activity of β-tubulin catalyses the hydrolysis of GTP bound to this site. This enzymatic activity is a prerequisite for the dynamic instability of microtubules, which in turn influences their biological function (Downing and Nogales 1999). As a consequence, GTP analogues which bind to β-tubulin and inhibit its activity cause disruption of microtubule function (Muraoka et al. 1999).
Alignment of the β-tubulin 222-232 peptide
In order to determine the distribution of the F224 substitution, we examined a subset of β-tubulins from 15 discrete lineages which encompass most of the eukaryotic diversity so far described (Keeling et al. 2005). We found that only green plants, discicristates, jakobids, haptophytes and cryptophytes possessed F224 (Table 1) and that the unikonts, which comprise animals, fungi, choanozoa and amoebozoa, did not. Since this lineage-specific distribution made position 224 a potentially valuable marker for resolving relationships deep in the eukaryotic phylogeny, we tested whether it held more generally by interrogating the Uniprot database, which provides a complete and nonredundant database of known, distinct β-tubulin sequences. Uniprot now contains over 2,000 distinct β-tubulin sequences spanning the major phyla of eukaryotes. Importantly, for organisms which possess multiple β-tubulin isotypes, all of the isotypes encoded by the same nucleus were found to have the same amino acid at position 224.
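The kind of tally described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study: the function names are hypothetical, and the five-residue fragments are made-up placeholders in which column 2 merely stands in for position 224.

```python
from collections import Counter, defaultdict


def residue_by_lineage(records, column):
    """Count which residue each lineage carries at one alignment column.

    records: iterable of (lineage, aligned_sequence) pairs.
    column:  0-based index into the aligned sequences.
    """
    counts = defaultdict(Counter)
    for lineage, seq in records:
        counts[lineage][seq[column]] += 1
    return {lineage: dict(c) for lineage, c in counts.items()}


def lineage_specific(counts):
    """Return the lineages in which every sequence shares one residue."""
    return {lin: next(iter(c)) for lin, c in counts.items() if len(c) == 1}


# Toy, made-up fragments (NOT real tubulin sequences).
toy = [
    ("green plants", "AAFGG"),
    ("green plants", "AAFGA"),
    ("animals", "AAYGG"),
    ("fungi", "AAYGG"),
    ("fungi", "AAHGG"),
]

counts = residue_by_lineage(toy, column=2)
fixed = lineage_specific(counts)
print(counts)
print(fixed)
```

A lineage appears in `fixed` only when a single residue occupies the column in all of its sequences, which is the sense of "lineage specific" used in the text.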
An interesting case was noted, however: the cryptomonad Guillardia theta, which has two nuclei as a result of a presumed secondary endosymbiosis of a red algal cell into a non-photosynthetic host. The minor nucleus, or nucleomorph, retains its own complement of actin and tubulin genes, the function of which is enigmatic (Douglas et al. 2001; Keeling et al. 1999). The β-tubulin genes of the G. theta host nucleus encode an F224-type protein while the nucleomorph β-tubulin genes encode a Y224-type protein, consistent with different eukaryotic lineages for the host and endosymbiont.
The change of state from tyrosine to phenylalanine can be accomplished by a single nonsynonymous point mutation (nsSNP), so the fact that identity of the β-tubulin 224 amino acid does not vary stochastically across the eukaryotic tree is remarkable. It implies considerable evolutionary restraint which is consistent with the critical functional role of this amino acid for the correct binding of GTP. However, β-tubulins are extremely well conserved across their entire sequence and thus a similar situation may occur with other amino acids. We therefore investigated whether other amino acids of β-tubulin might be lineage specific.
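The single-step relationship between tyrosine and phenylalanine can be checked directly against the standard genetic code, in which Tyr is encoded by TAT and TAC and Phe by TTT and TTC. The short Python sketch below (helper names are illustrative only) enumerates the codon pairs that differ at exactly one nucleotide.

```python
from itertools import product

# Standard genetic code codons for the two residues found at position 224.
CODONS = {
    "Y": ["TAT", "TAC"],  # tyrosine
    "F": ["TTT", "TTC"],  # phenylalanine
}


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_paths(aa_from, aa_to):
    """Codon pairs for the two amino acids that differ by one nucleotide."""
    return [
        (a, b)
        for a, b in product(CODONS[aa_from], CODONS[aa_to])
        if hamming(a, b) == 1
    ]


paths = single_step_paths("Y", "F")
print(paths)  # [('TAT', 'TTT'), ('TAC', 'TTC')]
```

Both routes are a change of A to T at the second codon position, so either tyrosine codon can reach a phenylalanine codon in one step.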
Outside of the green plants, discicristates, jakobids, haptophytes and cryptophytes, all β-tubulins currently sequenced (including all the bacterial β-tubulins, BtubBs, of Prosthecobacter; Schlieper et al. 2005) express Y224, with just five exceptions, all of which are found within the Fungi. In three cases (Geotrichum candidum, Microbotryum violaceum and Sporidiobolus pararoseus), H224 is expressed in place of Y224; in a fourth, Yarrowia lipolytica, I224 replaces Y224. In one case, a symbiotic fungus of plant roots does express a β-tubulin with an F224 substitution. This fungus is Paxillus fumigatus, a common ectomycorrhizal fungus, which is particularly associated with mutualistic growth on the roots of forest plants.
The anomalous presence of F224 in the β-tubulin of P. fumigatus, combined with its close relationship with plant roots, led us to consider the possibility that the β-tubulin was acquired horizontally from plants. In basic local alignment searches (BLAST) (Altschul et al. 1990) of the P. fumigatus β-tubulin sequence against the Uniprot and nr databases, the top hundred hits were restricted to fungi, animals and choanozoa, indicating a likely descent from fungal rather than plant β-tubulins. Plants and plant roots are associated with the production of anti-microtubule drugs, in particular taxols, vinca alkaloids and colcemids, all of which bind in close proximity or juxtaposition to the F224 site (Downing and Nogales 1999) and would therefore represent a powerful selective force on a root endosymbiont to evolve a more plant-like β-tubulin. Although the P. fumigatus protein is most similar to other fungal β-tubulins, if only those residues which show an evolutionary bias in distribution are considered when comparing P. fumigatus to other fungi and to plants, the β-tubulin of P. fumigatus is plant-like at some key residues. For instance, the amino acids at positions 224, 231, 259, 260 and 315 are normally Y, V, M, V, V in fungi, choanozoa and animals but F, I, L, I, A in plants; P. fumigatus has F, I, L, I, A at these residues.
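The residue comparison in the preceding paragraph can be expressed as a simple signature match. The Python sketch below uses only the five positions and residues quoted in the text; the scoring rule (a count of matching positions) and all names are assumptions made for illustration, not a method taken from the study.

```python
# Positions and residues exactly as quoted in the text.
POSITIONS = (224, 231, 259, 260, 315)
FUNGAL_ANIMAL = dict(zip(POSITIONS, "YVMVV"))  # fungi, choanozoa, animals
PLANT = dict(zip(POSITIONS, "FILIA"))          # plants


def signature_matches(residues, signature):
    """Number of signature positions at which the residues agree."""
    return sum(residues.get(pos) == aa for pos, aa in signature.items())


def classify(residues):
    """Label a residue profile by whichever signature it matches more often."""
    plant = signature_matches(residues, PLANT)
    other = signature_matches(residues, FUNGAL_ANIMAL)
    if plant > other:
        return "plant-like"
    if other > plant:
        return "fungal/animal-like"
    return "ambiguous"


# P. fumigatus residues at the five positions, as reported in the text.
p_fumigatus = dict(zip(POSITIONS, "FILIA"))
print(signature_matches(p_fumigatus, PLANT))  # 5
print(classify(p_fumigatus))                  # plant-like
```

A count over five sites is deliberately crude; it only restates the observation that P. fumigatus agrees with the plant residues at every one of the listed positions.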
To gain more insight into the two distinct parts of the network in Fig. 4, we repeated our Neighbor-Net analysis, excluding either all divergent β-tubulins (supplemental Fig. 2a) or all core β-tubulins (supplemental Fig. 2b). Not surprisingly, the Neighbor-Net of the divergent β-tubulins remains highly netted; the Neighbor-Net of the core β-tubulins, however, now resolves. In common with the aforementioned amino acid analysis, both Neighbor-Nets recover several of the recognized groups. Notably, within the divergent β-tubulins (supplemental Fig. 2b) the animal, choanozoa and fungi groups are recovered correctly both individually and as the well-defined opisthokont group. Within the core group of β-tubulins all groups are recovered correctly by the network, while F-type β-tubulins group to the exclusion of Y-type β-tubulins (supplemental Fig. 2a). Interestingly, except for the unikonts, the supergroups are separated. In particular, the rhizaria are clearly split between the foraminifera, which are divergent, and the cercozoa, which are core. Similarly, the plantae are split: the red algae are divergent, while the green plants are core.
One F-type β-tubulin (Goniomonas) segregates with the divergent β-tubulins indicating that F224 is not an absolute restraint on β-tubulin diversification. It is important to note, however, that this β-tubulin does not associate with any of the other groups in the divergent part of the network and that it has a clear relationship with the other cryptophyte in our analysis (Guillardia theta—host) which groups with the core tubulins. Indeed, it is even possible to recover F-type β-tubulins as a group including Goniomonas. If this is done and the Neighbor-Net analysis is repeated on F-type and Y-type β-tubulins respectively, then recovery of phylogenetic groupings is as with the subanalyses of core and divergent β-tubulins (supplemental Figs. 3a and 3b).
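Neighbor-Net operates on a matrix of pairwise distances, so the F-type and Y-type subanalyses amount to partitioning the alignment by the residue at one column and building a distance matrix for each subset. The Python sketch below illustrates that partitioning step under stated assumptions: the text does not say which distance measure was used, so the uncorrected p-distance here is a stand-in, the function names are hypothetical, and the sequences are made-up placeholders.

```python
def p_distance(a, b):
    """Uncorrected proportion of differing sites, skipping gapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def split_by_residue(seqs, column):
    """Partition {name: aligned_sequence} by the residue at one column."""
    groups = {}
    for name, seq in seqs.items():
        groups.setdefault(seq[column], {})[name] = seq
    return groups


def distance_matrix(seqs):
    """Return (sorted names, square matrix of pairwise p-distances)."""
    names = sorted(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix


# Toy, made-up fragments; column 2 stands in for position 224.
toy = {"t1": "AAFGG", "t2": "AAFGA", "t3": "AAYGG", "t4": "A-YCG"}

groups = split_by_residue(toy, column=2)
f_names, f_matrix = distance_matrix(groups["F"])
y_names, y_matrix = distance_matrix(groups["Y"])
print(f_names, f_matrix)
print(y_names, y_matrix)
```

Each resulting matrix could then be passed to a network-construction program; that step is not shown here.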
A major feature of the tree proposed by Keeling (2007) and Keeling et al. (2005) is that its root is unresolved, implying uncertainty regarding how the supergroups are related. This uncertainty correlates with the nettedness of the β-tubulin Neighbor-Net (Fig. 4), which is indicative of conflicting information relating to heredity. Such conflict is potentially driven by environmental pressures on the β-tubulin molecule driving convergent and parallel evolution. Nevertheless, group recovery is remarkably good, and the clear resolution of more and less divergent groups of β-tubulins tempts speculation that the ancestral state may lie closer to the core β-tubulins than the divergent ones and, indeed, that within the core group some groups such as the jakobids may possess β-tubulin with a sequence rather closer to the ancestral state than the β-tubulins of other groups, although this would need to be explored using alternative means.
Taken together, our data suggest a fundamental dichotomy in β-tubulins, with core F-type β-tubulins being possessed by plant-like lineages and divergent Y-type β-tubulins by animals and fungi. The functional importance of this dichotomy is yet to be experimentally tested across the predicted range of organisms. Two areas in particular suggest themselves in which these differences may have functional consequences. First, the dynamic instability of microtubules is a characteristic closely linked to GTP hydrolysis. Microtubules incorporating core F-type β-tubulin may be more dynamic than those which incorporate divergent Y-type β-tubulins (Hush et al. 1994; Moore et al. 1997; Shaw et al. 2003). Second, some drug susceptibility profiles may also correlate. For instance, some core F-type β-tubulins (plant and trypanosome) are more refractory to colcemids than Y-type (animal) β-tubulins; the colcemid binding site has been mapped to this area of β-tubulin, and substitution of nearby amino acids (213, 226, 236) has already been associated with colcemid resistance (Hari et al. 2003).
In conclusion, our analysis identified an amino acid substitution in the active site of a conserved protein with good discriminative power. It can be argued that single amino acid substitutions which occur very rarely, because they are under strong selective pressure not to occur, are likely to evolve convergently under a strong functional positive pressure rather than neutrally, so that in such cases any instances of loss of homoplasy may be informative. In the case of the identified F224 substitution, homoplasy arose in a root symbiont and it is interesting to speculate whether other closely symbiotic and lichenized fungi will prove similar exceptions. As far as we are aware, our analysis of the distribution of F224 is not inconsistent with the rare molecular events elucidated to date, such as the TS-DHFR gene fusion (giving rise to the opisthokonts) and the EF1α 12 bp insertion. However, a tree based on these and the F224 transition as a single rare event would split the haptophytes and cryptophytes from the rest of the chromoalveolata and rhizaria, the jakobids and euglenoids from other members of the excavatae, and the green and brown algae from the red algae. Thus F224 may have arisen independently in each of these superkingdoms, perhaps as a result of convergent selection pressure exerted in similar niches on cell size, shape, motility or mode of replication. In this context it is interesting to note that none of the groups with F-type β-tubulins have centrioles or defined spindle poles, a phenotypic aspect which could certainly be envisaged as a fulcrum for selective pressure in some ecological niches (Delattre and Felix 2009). To investigate this possibility we performed a Neighbor-Net analysis of the analyzed eukaryotic β-tubulins, and the inferred network clearly segregated these into more and less divergent subsets. Interestingly, although lineages were conserved on the network, several of the supergroups were clearly separated by it.
Those which were more divergent were almost exclusively Y224, which may indicate that the presence of F224 in the GTP binding site acts as a restraint on β-tubulin diversification that can only be overcome by substitution. If this is the case and F224 were the ancestral state, then environmental pressure might act as a driver for convergent evolution at this residue in diverse lineages. Additional corroborating datasets from other highly evolutionarily restricted single amino acid polymorphisms or rare molecular events should help to discriminate these possibilities in the future. Similar analyses of other essential and evolutionarily conserved proteins are, of course, likely to reveal single amino acid polymorphisms which may provide an important but finite pool of markers for further dissection of the tree of life. Regardless of the eventual resolution of the tree of life, the observation of very stable and lineage-specific biochemical adaptation in the functional site of β-tubulin remains an important one in our understanding of the evolution of one of the defining structural proteins of eukaryotes.
Thanks to Keith Gull for his initial input into the analyses and to Simon Topp whose MSc thesis provided context to the work. Thanks also to Richard Luduena and Tom Cavalier-Smith for helpful discussion, to Vincent Moulton, Francisco Ayala, Bill Wickstead, Enrico Coen and Clive Lloyd for critical reading of the manuscript. QW was supported by a UEA School of Computing Sciences studentship during her PhD studies from which most of her contribution was drawn.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.