Abstract
Tree comparison costs are used to compare phylogenetic hypotheses, to compare the results of different reconstruction methods, and to evaluate the robustness of a tree to perturbations of the data. The Robinson-Foulds distance is a widely used measure for comparing the topologies of two trees, but it is highly sensitive to tree error. Consequently, tree differences may be overestimated, leading to incorrect inference. One approach that overcomes this shortcoming is the Cluster Affinity distance, a refinement of the Robinson-Foulds distance. Both distances are symmetric and are thus designed to compare trees of the same type. However, it is common to compare trees of different types, for example gene trees with species trees, or the trees from different datasets with the supertree that integrates them; such comparisons are inherently asymmetric. Here, we introduce the asymmetric Cluster Affinity cost, a relaxation of the original Cluster Affinity distance for comparing heterogeneous trees. We demonstrate that the characteristics of this cost are similar to those of the symmetric Cluster Affinity distance. Further, for the asymmetric Cluster Affinity cost we describe efficient algorithms, derive the exact diameters, and use them to standardize the cost so that it is applicable in practice.
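The contrast between the two kinds of measure can be illustrated with a small sketch. It is not the chapter's algorithm: the abstract does not state the definition of the asymmetric cost, so the function `asymmetric_affinity` below assumes that each cluster of the source tree is matched to the target cluster with the smallest symmetric difference and that these minima are summed. The nested-tuple tree encoding, the restriction to clusters of internal nodes, and all function names are likewise assumptions made for illustration. The Robinson-Foulds computation uses the standard rooted form, the number of clusters present in exactly one tree.

```python
def clusters(tree):
    """Return the leaf sets (clusters) below every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is a plain string label
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)                   # record the cluster of this internal node
        return below

    leaves(tree)
    return found


def robinson_foulds(t1, t2):
    """Rooted Robinson-Foulds distance: clusters that occur in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


def asymmetric_affinity(source, target):
    """Assumed form of a one-directional affinity cost (see the hedged lead-in):
    sum, over source clusters, of the smallest symmetric difference to a target cluster."""
    target_clusters = clusters(target)
    return sum(min(len(a ^ b) for b in target_clusters) for a in clusters(source))


# Two caterpillar trees that differ only in the position of leaf "a".
t1 = ((((("a", "b"), "c"), "d"), "e"), "f")
t2 = ((((("b", "c"), "d"), "e"), "f"), "a")

print(robinson_foulds(t1, t2))        # 8
print(asymmetric_affinity(t1, t2))    # 5
print(asymmetric_affinity(t2, t1))    # 4
```

In this toy case a single misplaced leaf makes every non-root cluster differ, so the Robinson-Foulds distance is 8, while the assumed affinity cost charges most clusters only for the one leaf by which they differ. The two directions give different values (5 and 4), which is the sense in which such a cost is asymmetric.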
Acknowledgements
We thank the reviewers for their constructive and valuable comments. This work was supported in part by the U.S. Department of Agriculture (USDA) Agricultural Research Service (ARS project numbers 5030-32000-231-000-D and 5030-32000-231-095-S). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer. PG was supported by National Science Centre grant 2017/27/B/ST6/02720.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wagle, S., Markin, A., Górecki, P., Anderson, T., Eulenstein, O. (2023). The Asymmetric Cluster Affinity Cost. In: Jahn, K., Vinař, T. (eds) Comparative Genomics. RECOMB-CG 2023. Lecture Notes in Computer Science, vol 13883. Springer, Cham. https://doi.org/10.1007/978-3-031-36911-7_9
DOI: https://doi.org/10.1007/978-3-031-36911-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36910-0
Online ISBN: 978-3-031-36911-7
eBook Packages: Computer Science, Computer Science (R0)