The \(\ell^\infty\)-Cophenetic Metric for Phylogenetic Trees As an Interleaving Distance

  • Elizabeth Munch
  • Anastasios Stefanou
Chapter
Part of the Association for Women in Mathematics Series book series (AWMS, volume 17)

Abstract

Comparing phylogenetic trees is a fundamental task in computational biology, and many metrics are available for it. In this paper, we focus on one such metric, the \(\ell^\infty\)-cophenetic metric introduced by Cardona et al. This metric represents a phylogenetic tree with n labeled leaves as a point in \(\mathbb {R}^{n(n+1)/2}\), known as the cophenetic vector, and then compares the two resulting Euclidean points using the \(\ell^\infty\) distance. Meanwhile, the interleaving distance is a formal categorical construction generalized from the definition of Chazal et al., originally introduced to compare persistence modules arising in topological data analysis. We show that the \(\ell^\infty\)-cophenetic metric is an example of an interleaving distance. To do this, we define phylogenetic trees as a category of merge trees with some additional structure, namely labelings on the leaves together with a requirement that morphisms respect these labels. We then use the definition of a flow on this category to obtain an interleaving distance. Finally, we show that, because of the additional structure of the categories defined, the map sending a labeled merge tree to its cophenetic vector is in fact an isometric embedding, thus proving that the \(\ell^\infty\)-cophenetic metric is an interleaving distance.
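The two-step recipe in the abstract (build a cophenetic vector, then take an \(\ell^\infty\) distance) can be illustrated with a minimal Python sketch. It assumes the convention of Cardona et al., in which the entry for a pair of leaves is the depth of their lowest common ancestor and the entry for a single leaf is its own depth; the labeled merge trees of the chapter may use a different height convention. The tree encoding (a `parent` map plus per-node edge weights), the function names, and the two example trees are all hypothetical choices made for illustration, not taken from the chapter.

```python
from itertools import combinations_with_replacement


def depths(parent, weight):
    """Distance from the root to every node; the root has parent None."""
    memo = {}

    def depth(v):
        if v not in memo:
            p = parent[v]
            memo[v] = 0.0 if p is None else depth(p) + weight[v]
        return memo[v]

    for v in parent:
        depth(v)
    return memo


def ancestors(parent, v):
    """Path from v up to the root, with v itself included."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path


def cophenetic_vector(parent, weight, leaves):
    """One entry per unordered pair of leaves (i <= j): n(n+1)/2 entries."""
    d = depths(parent, weight)
    vec = []
    for a, b in combinations_with_replacement(leaves, 2):
        anc_a = set(ancestors(parent, a))
        # The first ancestor of b that is also an ancestor of a is their LCA.
        lca = next(v for v in ancestors(parent, b) if v in anc_a)
        vec.append(d[lca])
    return vec


def linf_distance(u, v):
    """l-infinity distance between two vectors of equal length."""
    return max(abs(x - y) for x, y in zip(u, v))


# Two hypothetical trees on the same leaf labels A, B, C.
leaves = ["A", "B", "C"]
parent1 = {"r": None, "x": "r", "A": "x", "B": "x", "C": "r"}
weight1 = {"x": 1.0, "A": 1.0, "B": 1.0, "C": 2.0}
parent2 = {"r": None, "x": "r", "B": "x", "C": "x", "A": "r"}
weight2 = {"x": 0.5, "B": 1.5, "C": 1.5, "A": 2.0}

v1 = cophenetic_vector(parent1, weight1, leaves)
v2 = cophenetic_vector(parent2, weight2, leaves)
print(v1, v2, linf_distance(v1, v2))
```

Both trees must carry the same leaf labels, listed in the same order, so that the coordinates of the two vectors correspond to the same pairs of leaves. The lowest-common-ancestor lookup here is quadratic in tree height per pair, which keeps the sketch short but is not meant for large trees.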

Keywords

Topological data analysis; Labeled merge tree; Phylogenetic tree; Interleaving distance; Category with a flow

Notes

Acknowledgements

The authors gratefully thank two anonymous reviewers whose feedback substantially increased the quality of the paper. The work of EM was supported in part by NSF Grant Nos. DMS-1800446 and CMMI-1800466. AS was partially supported both by the National Science Foundation through grant NSF-CCF-1740761 TRIPODS TGDA@OSU and by the Mathematical Biosciences Institute at the Ohio State University.

References

  1. P.K. Agarwal, K. Fox, A. Nath, A. Sidiropoulos, Y. Wang, Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms 14(2), 1–20 (2018). https://doi.org/10.1145/3185466
  2. R. Alberich, G. Cardona, F. Rosselló, G. Valiente, An algebraic metric for phylogenetic trees. Appl. Math. Lett. 22(9), 1320–1324 (2009). https://doi.org/10.1016/j.aml.2009.03.003
  3. A. Babu, Zigzag coarsenings, mapper stability and gene network analyses, Ph.D. thesis, Stanford University, 2013
  4. U. Bauer, X. Ge, Y. Wang, Measuring distance between Reeb graphs, in Annual Symposium on Computational Geometry - SOCG 14 (ACM Press, New York, 2014). https://doi.org/10.1145/2582112.2582169
  5. U. Bauer, E. Munch, Y. Wang, Strong equivalence of the interleaving and functional distortion metrics for Reeb graphs, in 31st International Symposium on Computational Geometry (SoCG 2015), Leibniz International Proceedings in Informatics (LIPIcs), vol. 34, pp. 461–475 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, 2015). https://doi.org/10.4230/LIPIcs.SOCG.2015.461. http://drops.dagstuhl.de/opus/volltexte/2015/5146
  6. U. Bauer, B. Di Fabio, C. Landi, An edit distance for Reeb graphs (2016). https://doi.org/10.6092/unibo/amsacta/4705
  7. K. Beketayev, D. Yeliussizov, D. Morozov, G.H. Weber, B. Hamann, Measuring the distance between merge trees, in Mathematics and Visualization (Springer, Cham, 2014), pp. 151–165. https://doi.org/10.1007/978-3-319-04099-8_10
  8. S. Biasotti, D. Giorgi, M. Spagnuolo, B. Falcidieno, Reeb graphs for shape analysis and applications. Theor. Comput. Sci. Comput. Algebraic Geom. Appl. 392(13), 5–22 (2008). https://doi.org/10.1016/j.tcs.2007.10.018. http://www.sciencedirect.com/science/article/pii/S0304397507007396
  9. L.J. Billera, S.P. Holmes, K. Vogtmann, Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001). https://doi.org/10.1006/aama.2001.0759
  10. H.B. Bjerkevik, M.B. Botnan, Computational complexity of the interleaving distance, in 34th International Symposium on Computational Geometry (SoCG 2018) (Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Wadern, 2018)
  11. D. Bryant, J. Tsang, P.E. Kearney, M. Li, Computing the quartet distance between evolutionary trees, in Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’00, pp. 285–286 (Society for Industrial and Applied Mathematics, Philadelphia, 2000). http://dl.acm.org/citation.cfm?id=338219.338264
  12. P. Bubenik, J.A. Scott, Categorification of persistent homology. Discret. Comput. Geom. 51(3), 600–627 (2014). https://doi.org/10.1007/s00454-014-9573-x
  13. P. Bubenik, V. de Silva, J. Scott, Metrics for generalized persistence modules. Found. Comput. Math. 15(6), 1501–1531 (2014). https://doi.org/10.1007/s10208-014-9229-5
  14. G. Cardona, A. Mir, F. Rosselló, L. Rotger, D. Sánchez, Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf. BMC Bioinforma. 14(1), 3 (2013). https://doi.org/10.1186/1471-2105-14-3
  15. M. Carrière, S. Oudot, Structure and stability of the one-dimensional mapper. Found. Comput. Math. (2017). https://doi.org/10.1007/s10208-017-9370-z
  16. F. Chazal, D. Cohen-Steiner, M. Glisse, L.J. Guibas, S.Y. Oudot, Proximity of persistence modules and their diagrams, in Proceedings of the 25th Annual Symposium on Computational Geometry, SCG ’09, pp. 237–246 (ACM, New York, 2009). https://doi.org/10.1145/1542362.1542407. http://doi.acm.org/10.1145/1542362.1542407
  17. F. Chazal, V. de Silva, M. Glisse, S. Oudot, The Structure and Stability of Persistence Modules (Springer, New York, 2016). https://doi.org/10.1007/978-3-319-42545-0
  18. J. Curry, Sheaves, cosheaves and applications, Ph.D. thesis, University of Pennsylvania, 2014
  19. V. de Silva, E. Munch, A. Patel, Categorified Reeb graphs. Discret. Comput. Geom. 1–53 (2016). https://doi.org/10.1007/s00454-016-9763-9
  20. V. de Silva, E. Munch, A. Stefanou, Theory of interleavings on categories with a flow. Theory Appl. Categories 33(21), 583–607 (2018). http://www.tac.mta.ca/tac/volumes/33/21/33-21.pdf
  21. B. Di Fabio, C. Landi, The edit distance for Reeb graphs of surfaces. Discrete Comput. Geom. 55(2), 423–461 (2016). https://doi.org/10.1007/s00454-016-9758-6
  22. P.W. Diaconis, S.P. Holmes, Matchings and phylogenetic trees. Proc. Natl. Acad. Sci. 95(25), 14600–14602 (1998). http://www.pnas.org/content/95/25/14600.abstract
  23. J. Eldridge, M. Belkin, Y. Wang, Beyond Hartigan consistency: merge distortion metric for hierarchical clustering, in Proceedings of The 28th Conference on Learning Theory, ed. by P. Grünwald, E. Hazan, S. Kale. Proceedings of Machine Learning Research, vol. 40, pp. 588–606 (PMLR, Paris, 2015). http://proceedings.mlr.press/v40/Eldridge15.html
  24. H. Fernau, M. Kaufmann, M. Poths, Comparing trees via crossing minimization. J. Comput. Syst. Sci. 76(7), 593–608 (2010). https://doi.org/10.1016/j.jcss.2009.10.014
  25. F.W. Lawvere, Metric spaces, generalized logic, and closed categories. Rendiconti del seminario matematico e fisico di Milano 43(1), 135–166 (1973). Republished in: Reprints in Theory and Applications of Categories, No. 1 (2002), pp. 1–37
  26. B. Lin, A. Monod, R. Yoshida, Tropical foundations for probability & statistics on phylogenetic tree space (2018). arXiv:1805.12400v2
  27. T. Mailund, C.N.S. Pedersen, QDist–quartet distance between evolutionary trees. Bioinformatics 20(10), 1636–1637 (2004). https://doi.org/10.1093/bioinformatics/bth097
  28. D. Morozov, K. Beketayev, G. Weber, Interleaving distance between merge trees, in Proceedings of TopoInVis (2013)
  29. V. Moulton, T. Wu, A parsimony-based metric for phylogenetic trees. Adv. Appl. Math. 66, 22–45 (2015). https://doi.org/10.1016/j.aam.2015.02.002
  30. E. Munch, B. Wang, Convergence between categorical representations of Reeb space and mapper, in 32nd International Symposium on Computational Geometry (SoCG 2016), ed. by S. Fekete, A. Lubiw. Leibniz International Proceedings in Informatics (LIPIcs), vol. 51, pp. 53:1–53:16 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, 2016). https://doi.org/10.4230/LIPIcs.SoCG.2016.53. http://drops.dagstuhl.de/opus/volltexte/2016/5945
  31. M. Owen, Computing geodesic distances in tree space. SIAM J. Discret. Math. 25(4), 1506–1529 (2011). https://doi.org/10.1137/090751396
  32. G. Reeb, Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique. C.R. Acad. Sci. 222, 847–849 (1946)
  33. E. Riehl, Category Theory in Context (Courier Dover Publications, New York, 2017)
  34. D. Robinson, L. Foulds, Comparison of weighted labelled trees, in Combinatorial Mathematics VI (Springer, Berlin, 1979), pp. 119–126. https://doi.org/10.1007/BFb0102690
  35. D. Robinson, L. Foulds, Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981). https://doi.org/10.1016/0025-5564(81)90043-2
  36. G. Singh, F. Mémoli, G.E. Carlsson, Topological methods for the analysis of high dimensional data sets and 3D object recognition, in SPBG, pp. 91–100 (2007)
  37. A. Stefanou, Dynamics on categories and applications, Ph.D. thesis, University at Albany, State University of New York, 2018
  38. G. Valiente, An efficient bottom-up distance between trees, in SPIRE (IEEE, Piscataway, 2001), p. 0212

Copyright information

© The Author(s) and the Association for Women in Mathematics 2019

Authors and Affiliations

  1. Department of Computational Mathematics, Science and Engineering; Department of Mathematics, Michigan State University, East Lansing, USA
  2. Mathematical Biosciences Institute, Department of Mathematics, The Ohio State University, Columbus, USA