
Phylogeny and Sequence Space: A Combined Approach to Analyze the Evolutionary Trajectories of Homologous Proteins. The Case Study of Aminodeoxychorismate Synthase

  • Regular Article
  • Acta Biotheoretica

Abstract

During the course of evolution, variation in a protein sequence is an ongoing phenomenon, but one limited by the need to maintain the protein's structural and functional integrity. Deciphering the evolutionary path of a protein is thus of fundamental interest. With the development of new methods for visualizing high-dimensional spaces and the improvement of phylogenetic analysis tools, it is now possible to study the evolutionary trajectories of proteins in sequence space. Using the data-driven high-dimensional scaling method, we show that potential evolutionary trajectories can be predicted and represented by mapping phylogenetic trees onto a 3D projection of the sequence space. Taking the case of aminodeoxychorismate synthase, an enzyme involved in folate synthesis, we show that this representation raises interesting questions about the complexity of the evolution of a given biological function, in particular concerning its capacity to explore the sequence space.
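As a rough illustration of what a 3D projection of sequence space involves, the sketch below embeds a few aligned sequences in three dimensions so that distances between points approximate distances between sequences. It is not the data-driven high-dimensional scaling method used in the article: classical (Torgerson) multidimensional scaling is swapped in as a simpler stand-in, the p-distance (fraction of differing aligned positions) is an assumed dissimilarity, and the function names and toy sequences are hypothetical.

```python
import math


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances (an assumed dissimilarity)."""
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def classical_mds(dist, dims=3, iters=200):
    """Classical MDS: double-center the squared distances, then keep the
    leading eigenvectors, found here by power iteration with deflation."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Gram matrix B = -1/2 * J * D^2 * J, with J the centering matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Deterministic, non-constant starting vector.
        v = [math.sin(i + 1 + k) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        # Negative eigenvalues (non-Euclidean distances) contribute nothing.
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


if __name__ == "__main__":
    # Toy aligned sequences: two pairs of close relatives.
    toy = ["MKTAYIAKQR", "MKTAYIAKQK", "MSLVWHGEDP", "MSLVWHGEDA"]
    for seq, xyz in zip(toy, classical_mds(distance_matrix(toy))):
        print(seq, [round(c, 3) for c in xyz])
```

In a projection of this kind, the leaves of a phylogenetic tree would correspond to the embedded points. Placing internal nodes and branches in the same space is a further step that this sketch does not attempt.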



Acknowledgements

This work was supported by the French National Research Agency (ANR-10-LABEX-04 GRAL Labex, Grenoble Alliance for Integrated Structural Cell Biology; ANR-11-BTBR-0008 Océanomics; ANR-15-IDEX-02 GlycoAlps and “Origin Of Life” Cross Disciplinary Projects of the Univ. Grenoble-Alpes; ANR-17-EURE-0003).

Author information

Corresponding author

Correspondence to Olivier Bastien.


About this article

Cite this article

Lespinats, S., De Clerck, O., Colange, B. et al. Phylogeny and Sequence Space: A Combined Approach to Analyze the Evolutionary Trajectories of Homologous Proteins. The Case Study of Aminodeoxychorismate Synthase. Acta Biotheor 68, 139–156 (2020). https://doi.org/10.1007/s10441-019-09352-0

  • DOI: https://doi.org/10.1007/s10441-019-09352-0
