Abstract
Phylogenetic reconstruction aims to find plausible hypotheses about the evolutionary history of genes or species based on genomic sequence information. Distinguishing orthologous genes (genes that share a common ancestor and diverged through a speciation event) is crucial and lies at the heart of many genomic studies. However, existing methods that rely only on 1:1 orthologs to infer species trees are restricted to the small set of genes that qualify, which limits the information available about the species tree. Using larger gene sets that also contain non-orthologous genes (e.g., paralogous or xenologous genes) considerably increases the information about the evolutionary history of the respective species. In this work, we introduce a novel method to compute species phylogenies from sequence data that include orthologs, paralogs, or even xenologs.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Hellmuth, M., Wieseke, N. (2016). From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees. In: Pontarotti, P. (eds) Evolutionary Biology. Springer, Cham. https://doi.org/10.1007/978-3-319-41324-2_21
DOI: https://doi.org/10.1007/978-3-319-41324-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41323-5
Online ISBN: 978-3-319-41324-2
eBook Packages: Biomedical and Life Sciences (R0)