Abstract
The construction of a dendrogram on a set of individuals is a key component of a genome-wide association study. However, even with modern sequencing technologies, the distances between the individuals required for constructing such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input for dendrogram construction that consists of only partial distance information, which raises the following fundamental question: for which (proper) subsets of a dendrogram’s leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on the elements of such a subset? By formalizing a dendrogram as an edge-weighted, rooted phylogenetic tree on a pre-given finite set X with |X|≥3 whose edge-weighting is equidistant, and the subsets Y of X for which the distance between every pair of elements in Y is known as sets of 2-subsets of X, we investigate this problem from the perspective of when such a tree is lassoed, that is, uniquely determined by the elements of such a set of 2-subsets. For this, we consider four different formalizations of the idea of “uniquely determining”, giving rise to four distinct types of lasso. We present characterizations of all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that, if the tree in question is binary, all four types of lasso coincide.
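The question posed in the abstract can be illustrated with a small sketch. It is not the characterization developed in the article (which works with child-edge graphs); it only shows, on a three-leaf tree, how two different equidistant edge-weightings of the same tree can induce identical distances on a set of 2-subsets. The tree encoding, the function names, and the term `cords` for the 2-subsets are assumptions made for this illustration.

```python
from itertools import combinations

# Hypothetical encoding: a leaf is a string; an interior vertex is a pair
# (height, [children]). With an equidistant edge-weighting every leaf is
# equally far from the root, so each interior vertex has a well-defined
# height and d(x, y) = 2 * height(lca(x, y)).

def leaves(node):
    """Return the set of leaf labels below a node."""
    if isinstance(node, str):
        return {node}
    _, children = node
    return set().union(*(leaves(child) for child in children))

def induced_distances(node, dist=None):
    """Return {frozenset({x, y}): distance} for all leaf pairs below node."""
    if dist is None:
        dist = {}
    if isinstance(node, str):
        return dist
    height, children = node
    for child in children:
        induced_distances(child, dist)
    # Leaves in different child subtrees have this vertex as their lca.
    for c1, c2 in combinations(children, 2):
        for x in leaves(c1):
            for y in leaves(c2):
                dist[frozenset((x, y))] = 2 * height
    return dist

def restrict(dist, cords):
    """Keep only the distances of the pairs listed in cords."""
    return {cord: dist[cord] for cord in cords}

# Same tree shape ((a, b), c), two different equidistant weightings:
# only the height of the root differs.
tree_a = (2.0, [(1.0, ["a", "b"]), "c"])
tree_b = (3.0, [(1.0, ["a", "b"]), "c"])

cords_partial = {frozenset("ab")}                # only d(a, b) is known
cords_more = {frozenset("ab"), frozenset("bc")}  # d(a, b) and d(b, c) known

dist_a = induced_distances(tree_a)
dist_b = induced_distances(tree_b)

same_on_partial = restrict(dist_a, cords_partial) == restrict(dist_b, cords_partial)
same_on_more = restrict(dist_a, cords_more) == restrict(dist_b, cords_more)

print(same_on_partial)  # the single pair {a, b} cannot tell the weightings apart
print(same_on_more)     # adding {b, c} separates these two weightings
```

In this toy case the distance between a and b alone says nothing about the height of the root, so that one pair cannot determine the weighting; the sketch makes no claim about which sets of pairs are lassos in general.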
Notes
The definition of a topological lasso for an unrooted phylogenetic tree on X is the same as that of a topological lasso for an X-tree, but with the requirement dropped that the two proper edge-weightings mentioned in that definition are equidistant.
Acknowledgement
A.-A. Popescu thanks the Norwich Research Park (NRP) for support. The authors thank the referees for their helpful comments.
About this article
Cite this article
Huber, K.T., Popescu, AA. Lassoing and Corralling Rooted Phylogenetic Trees. Bull Math Biol 75, 444–465 (2013). https://doi.org/10.1007/s11538-013-9815-8
DOI: https://doi.org/10.1007/s11538-013-9815-8