Efficient Enumeration of the Directed Binary Perfect Phylogenies from Incomplete Data
We study a character-based phylogeny reconstruction problem in which the given data set is incomplete. More specifically, we consider the directed perfect phylogeny setting with binary characters, where the states of some characters are missing for some species. Our main objective is to give an efficient algorithm that enumerates (lists) all perfect phylogenies obtainable by completing the missing entries. While a simple branch-and-bound algorithm (B&B) shows theoretically good performance, we propose another approach based on a zero-suppressed binary decision diagram (ZDD). Experimental results on randomly generated data show that the ZDD approach outperforms B&B. We also prove that counting the phylogenetic trees consistent with given data is #P-complete, which provides evidence that efficient random sampling is hard.
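To make the problem statement concrete, the following is a minimal brute-force sketch, not the B&B or ZDD method of the paper. It relies on the standard characterization that a complete binary matrix admits a directed perfect phylogeny exactly when the species sets of its characters are pairwise nested or disjoint (a laminar family). The names `MISSING`, `is_laminar`, and `enumerate_completions` are hypothetical, and the sketch lists consistent completions of the matrix rather than the trees themselves.

```python
from itertools import product

MISSING = None  # hypothetical marker for an unknown character state


def is_laminar(matrix):
    """Return True if the character columns, viewed as sets of species,
    are pairwise nested or disjoint."""
    if not matrix:
        return True
    n_chars = len(matrix[0])
    # For each character, collect the species (row indices) in state 1.
    sets = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(n_chars)
    ]
    for a in range(n_chars):
        for b in range(a + 1, n_chars):
            s, t = sets[a], sets[b]
            # Overlapping sets must be nested; otherwise there is a conflict.
            if s & t and not (s <= t or t <= s):
                return False
    return True


def enumerate_completions(matrix):
    """Yield every 0/1 completion of the missing entries that passes the
    laminarity test. Exponential in the number of missing entries."""
    holes = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is MISSING
    ]
    for bits in product((0, 1), repeat=len(holes)):
        filled = [list(row) for row in matrix]
        for (i, j), bit in zip(holes, bits):
            filled[i][j] = bit
        if is_laminar(filled):
            yield filled


if __name__ == "__main__":
    # Rows are species, columns are characters; one entry is unknown.
    data = [
        [1, 1],
        [1, MISSING],
        [0, 1],
    ]
    for completion in enumerate_completions(data):
        print(completion)
```

In the small example, setting the unknown entry to 0 leaves the two characters overlapping on the first species without nesting, so only the completion with the entry set to 1 is listed. The exhaustive loop over all completions is what a more careful enumeration algorithm would aim to avoid.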