Abstract
The problem of reconstructing the duplication tree of a set of tandemly repeated sequences, which are assumed to have arisen by unequal recombination, was first introduced by Fitch (1977) and has recently received considerable attention. In this paper, we deal with the restricted problem of reconstructing single copy duplication trees. We describe an exact, polynomial, distance-based algorithm for this problem, whose parsimony version has previously been shown to be NP-hard (like most evolutionary tree reconstruction problems). The algorithm is based on the minimum evolution principle, and thus selects the shortest tree as the correct duplication tree. After presenting the mathematical concepts underlying the minimum evolution principle and some of its benefits (such as consistency), we provide a new recurrence equation for estimating the tree length using ordinary least squares, given a matrix of pairwise distances between the copies. We then show how this equation naturally provides the dynamic programming framework on which our algorithm is based, and give an implementation that runs in O(n³) time and O(n²) space, where n is the number of copies.
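To make the two ingredients named above concrete (an ordinary least-squares tree length computed from pairwise distances, and minimum evolution as "pick the shortest tree"), here is a minimal sketch on the smallest non-trivial case: four sequences and their three unrooted topologies. It is not the authors' recurrence or dynamic program. It solves the OLS normal equations directly for each candidate tree and compares all candidates by brute force, which is exactly the exhaustive search that an O(n³) dynamic program is meant to avoid for larger n. The function names (`ols_tree_length`, `quartet_splits`, `minimum_evolution_quartet`) and the encoding of a tree as one leaf split per edge are hypothetical choices made for illustration.

```python
from itertools import combinations


def ols_tree_length(n, splits, d):
    """OLS length of an unrooted tree on leaves 0..n-1 (illustrative sketch).

    splits: one frozenset per edge, holding the leaves on one side of that edge.
    d: symmetric n x n matrix of pairwise distances (list of lists).
    """
    pairs = list(combinations(range(n), 2))
    m = len(splits)
    # Topology matrix: A[p][e] = 1 when edge e lies on the path joining pair p,
    # i.e. when the edge's split separates the two leaves of the pair.
    A = [[1.0 if (i in s) != (j in s) else 0.0 for s in splits] for i, j in pairs]
    y = [float(d[i][j]) for i, j in pairs]
    # Augmented normal equations [A^T A | A^T y], one row per edge.
    M = [[sum(row[r] * row[c] for row in A) for c in range(m)]
         + [sum(row[r] * yp for row, yp in zip(A, y))]
         for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting; A has full column rank
    # for a binary tree, so every pivot is nonzero.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    edge_lengths = [M[r][m] / M[r][r] for r in range(m)]
    # The OLS tree length is the sum of the OLS edge-length estimates.
    return sum(edge_lengths)


def quartet_splits(x, y):
    """Edges of the unrooted quartet xy|zw on leaves 0..3 (hypothetical encoding)."""
    pendant = [frozenset([k]) for k in range(4)]
    return pendant + [frozenset([x, y])]


# The three unrooted topologies on four leaves, keyed by a readable label.
TOPOLOGIES = {"01|23": (0, 1), "02|13": (0, 2), "03|12": (0, 3)}


def minimum_evolution_quartet(d):
    """Minimum evolution by brute force: return the topology with the smallest OLS length."""
    lengths = {name: ols_tree_length(4, quartet_splits(*pair), d)
               for name, pair in TOPOLOGIES.items()}
    best = min(lengths, key=lengths.get)
    return best, lengths


# Distances that fit the tree 01|23 exactly: pendant edges 1, 2, 3, 4; internal edge 5.
d = [[0, 3, 9, 10],
     [3, 0, 10, 11],
     [9, 10, 0, 7],
     [10, 11, 7, 0]]
best, lengths = minimum_evolution_quartet(d)
print(best)  # 01|23
```

For these distances the OLS length of 01|23 is 15 (the sum of the generating edge lengths), while 02|13 and 03|12 both come out at 17.5, so selecting the shortest tree recovers the generating topology. The normal equations are solved in pure Python only to keep the sketch dependency-free; in practice a library least-squares solver such as `numpy.linalg.lstsq` would be the usual choice.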
References
Ohno, S.: Evolution by gene duplication. Springer Verlag, New York (1970)
Smith, G.: Evolution of repeated DNA sequences by unequal crossover. Science 191 (1976) 528–535
Fitch, W.: Phylogenies constrained by the crossover process as illustrated by human hemoglobins and a thirteen-cycle, eleven-amino-acid repeat in human apolipoprotein A-I. Genetics 86 (1977) 623–644
Jeffreys, A., Harris, S.: Processes of gene duplication. Nature 296 (1982) 9–10
Elemento, O., Gascuel, O., Lefranc, M.P.: Reconstruction de l’histoire de duplication de gènes répétés en tandem [Reconstructing the duplication history of tandemly repeated genes]. In: Actes des Journées Ouvertes Biologie Informatique Mathématiques. (2001) 9–11
Elemento, O., Gascuel, O., Lefranc, M.P.: Reconstructing the duplication history of tandemly repeated genes. Molecular Biology and Evolution 19 (2002) 278–288
Benson, G., Dong, L.: Reconstructing the duplication history of a tandem repeat. In: Lengauer, T., Schneider, R., Bork, P., Brutlag, D., Glasgow, J., Mewes, H.W., Zimmer, R., eds.: Proceedings of Intelligent Systems in Molecular Biology (ISMB’99). (1999) 44–53
Tang, M., Waterman, M., Yooseph, S.: Zinc finger gene clusters and tandem gene duplication. In: El-Mabrouk, N., Lengauer, T., Sankoff, D., eds.: Proceedings of RECOMB 2001. (2001) 297–304
Tang, M., Waterman, M., Yooseph, S.: Zinc finger gene clusters and tandem gene duplication. Journal of Computational Biology 9 (2002) 429–446
Jaitly, D., Kearney, P., Lin, G., Ma, B.: Methods for reconstructing the history of tandem repeats and their application to the human genome. Journal of Computer and System Sciences 65 (2002) 494–507
Zhang, J., Nei, M.: Evolution of antennapedia-class homeobox genes. Genetics 142 (1996) 295–303
Wang, L., Gusfield, D.: Improved approximation algorithms for tree alignment. Journal of Algorithms 25 (1997) 255–273
Kidd, K., Sgaramella-Zonta, L.: Phylogenetic analysis: concepts and methods. American Journal of Human Genetics 23 (1971) 235–252
Rzhetsky, A., Nei, M.: Theoretical foundation of the minimum-evolution method of phylogenetic inference. Molecular Biology and Evolution 10 (1993) 1073–1095
Denis, F., Gascuel, O.: On the consistency of the minimum evolution principle of phylogenetic inference. Computational Molecular Biology Series, Issue IV. Discrete Applied Mathematics 127 (2003) 63–77
Felsenstein, J.: Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27 (1978) 401–410
Vardi, I.: Computational Recreations in Mathematica. Addison-Wesley (1991)
Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4 (1987) 406–425
Vach, W.: Least-squares approximation of additive trees. In: Opitz, O., ed.: Conceptual and Numerical Analysis of Data. Springer, Heidelberg (1989) 230–238
Gascuel, O.: Concerning the NJ algorithm and its unweighted version, UNJ. In: Mirkin, B., McMorris, F., Roberts, F., Rzhetsky, A., eds.: Mathematical Hierarchies and Biology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Amer. Math. Society, Providence (1997) 149–170
Barthélemy, J., Guénoche, A.: Trees and proximity representations. Wiley and Sons (1991)
Elemento, O., Gascuel, O.: A fast and accurate distance-based algorithm to reconstruct tandem duplication trees. Bioinformatics 18 (2002) S92–S99. Proceedings of the European Conference on Computational Biology (ECCB 2002)
Fitch, W., Margoliash, E.: Construction of phylogenetic trees. Science 155 (1967) 279–284
Felsenstein, J.: An alternating least squares approach to inferring phylogenies from pairwise distances. Systematic Biology 46 (1997) 101–111
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elemento, O., Gascuel, O. (2003). An Exact and Polynomial Distance-Based Algorithm to Reconstruct Single Copy Tandem Duplication Trees. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_8
DOI: https://doi.org/10.1007/3-540-44888-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive