Automated Segmentation of DNA Sequences with Complex Evolutionary Histories

  • Broňa Brejová
  • Michal Burger
  • Tomáš Vinař
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6833)


Most algorithms for reconstruction of evolutionary histories involving large-scale events such as duplications, deletions or rearrangements work on sequences of predetermined markers, for example protein-coding genes or other functional elements. However, markers defined in this way ignore information contained in non-coding sequences, are prone to annotation errors, and may even introduce artifacts due to partial gene copies or chimeric genes.

We propose the problem of sequence segmentation, where the goal is to automatically select suitable markers based on sequence homology alone. We design an algorithm for this problem that tolerates a certain amount of inaccuracy in the input alignments and still produces a segmentation of the sequence into markers with high coverage and accuracy. We test our algorithm on several artificial and real data sets representing complex clusters of segmental duplications. Our software is available at
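To make the segmentation idea concrete, the following minimal sketch cuts a sequence at the endpoints of pairwise local alignments and merges endpoints that lie close together, so that slightly inconsistent alignment boundaries do not produce tiny spurious markers. This is an illustration under stated assumptions, not the algorithm from the paper: the function name `segment`, the `tolerance` parameter, the half-open coordinate convention, and the tuple layout `(start1, end1, start2, end2)` are all hypothetical, and the sketch does not project boundaries from one aligned copy onto the other.

```python
def segment(seq_len, alignments, tolerance=0):
    """Cut [0, seq_len) at alignment endpoints (illustrative sketch only).

    alignments: iterable of (start1, end1, start2, end2) tuples, assumed to be
    half-open intervals on the same sequence (a hypothetical input format).
    tolerance: endpoints at most this far from the previous endpoint are
    treated as the same boundary (an assumed way to absorb small inaccuracies).
    """
    # Collect every alignment boundary that lies strictly inside the sequence.
    points = sorted(
        p
        for (s1, e1, s2, e2) in alignments
        for p in (s1, e1, s2, e2)
        if 0 < p < seq_len
    )

    # Greedily chain neighbouring boundaries into clusters.
    clusters = []
    for p in points:
        if clusters and p - clusters[-1][-1] <= tolerance:
            clusters[-1].append(p)
        else:
            clusters.append([p])

    # Use the middle element of each cluster as its representative cut.
    cuts = [c[len(c) // 2] for c in clusters]

    # Consecutive cuts delimit the candidate markers.
    bounds = [0] + cuts + [seq_len]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Two alignments whose left copies have boundaries that disagree by 1 bp.
    example = [(10, 30, 50, 70), (11, 29, 80, 98)]
    for start, end in segment(100, example, tolerance=2):
        print(start, end)
```

With `tolerance=0` every distinct endpoint becomes its own cut, so the two nearly identical boundaries in the example yield one-base segments; a small positive tolerance merges them into a single boundary.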


Segmental Duplication; Automate Segmentation; Pairwise Alignment; True Segmentation; Sequence Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Broňa Brejová (1)
  • Michal Burger (1)
  • Tomáš Vinař (1)
  1. Faculty of Mathematics, Physics, and Informatics, Comenius University, Bratislava, Slovakia
