From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees

  • Chapter
Evolutionary Biology

Abstract

Phylogenetic reconstruction aims to find plausible hypotheses about the evolutionary history of genes or species from genomic sequence information. Identifying orthologous genes (genes that share a common ancestor and diverged through a speciation event) is crucial and lies at the heart of many genomic studies. However, existing methods that rely only on 1:1 orthologs to infer species trees are restricted to a small set of admissible genes that provide information about the species tree. Using larger gene sets that also include non-orthologous genes (e.g., paralogous or xenologous genes) considerably increases the information available about the evolutionary history of the respective species. In this work, we introduce a novel method to compute species phylogenies from sequence data that include orthologs, paralogs, or even xenologs.

Author information

Corresponding author

Correspondence to Marc Hellmuth.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Hellmuth, M., Wieseke, N. (2016). From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees. In: Pontarotti, P. (eds) Evolutionary Biology. Springer, Cham. https://doi.org/10.1007/978-3-319-41324-2_21
