Abstract
To infer a phylogenetic tree from a set of DNA sequences, a multiple alignment is typically computed first to identify homologous bases. The inferred phylogeny can be very sensitive to how the alignment was created. We develop tools for analyzing the robustness of the phylogeny to perturbations of the alignment parameters in the Needleman-Wunsch (NW) algorithm. Our main tool is parametric alignment, with novel improvements that are of general interest in parametric inference. Using parametric alignment and a Gaussian distribution on the alignment parameters, we derive probabilities of optimal alignment summaries and of inferred phylogenies. We apply our method to intronic sequences from Drosophila flies. We show that phylogeny estimates can be sensitive to the choice of alignment parameters, and that parametric alignment elucidates the relationship between alignment parameters and reconstructed trees.
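The sensitivity described in the abstract can be illustrated at the smallest scale with a pairwise global alignment. The sketch below is not the authors' parametric alignment method. It is a plain Needleman-Wunsch dynamic program under an assumed scoring scheme (matches cost 0, each mismatch and each gap character has a fixed cost, and the total cost is minimized). The function name `nw_summary` and the choice of summary (mismatch count and gap count) are hypothetical, introduced here only for illustration.

```python
def nw_summary(a, b, mismatch, gap):
    """Return (cost, mismatches, gaps) for one minimum-cost global alignment.

    Assumed scoring: match = 0, mismatch = `mismatch`, gap character = `gap`.
    Ties between equal-cost alignments are broken by tuple comparison.
    """
    n, m = len(a), len(b)
    # cell[i][j] holds (cost, mismatches, gaps) for aligning a[:i] with b[:j].
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i * gap, 0, i)  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        cell[0][j] = (j * gap, 0, j)  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = int(a[i - 1] != b[j - 1])
            c, x, s = cell[i - 1][j - 1]
            diag = (c + diff * mismatch, x + diff, s)  # pair a[i-1] with b[j-1]
            c, x, s = cell[i - 1][j]
            up = (c + gap, x, s + 1)                   # a[i-1] against a gap
            c, x, s = cell[i][j - 1]
            left = (c + gap, x, s + 1)                 # b[j-1] against a gap
            cell[i][j] = min(diag, up, left)
    return cell[n][m]


if __name__ == "__main__":
    # The same two toy sequences under two parameter choices.
    for mismatch, gap in [(1, 3), (3, 1)]:
        cost, x, s = nw_summary("ACGT", "CGTA", mismatch, gap)
        print(f"mismatch={mismatch} gap={gap}: cost={cost}, "
              f"mismatches={x}, gaps={s}")
```

For the toy pair `ACGT` and `CGTA`, expensive gaps favor the ungapped alignment with four mismatches, while cheap gaps favor an alignment with two gap characters and no mismatches. The set of homologous positions handed to a tree-building step therefore depends on the parameters. Integer costs are used so that ties are compared exactly.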
Cite this article
Malaspinas, AS., Eriksson, N. & Huggins, P. Parametric Analysis of Alignment and Phylogenetic Uncertainty. Bull Math Biol 73, 795–810 (2011). https://doi.org/10.1007/s11538-010-9610-8
DOI: https://doi.org/10.1007/s11538-010-9610-8