Automated Segmentation of DNA Sequences with Complex Evolutionary Histories
Most algorithms for reconstructing evolutionary histories that involve large-scale events, such as duplications, deletions, or rearrangements, work on sequences of predetermined markers, for example protein-coding genes or other functional elements. However, markers defined in this way ignore the information contained in non-coding sequences, are prone to annotation errors, and may even introduce artifacts due to partial gene copies or chimeric genes.
We propose the problem of sequence segmentation, where the goal is to automatically select suitable markers based on sequence homology alone. We design an algorithm for this problem that can tolerate a certain amount of inaccuracy in the input alignments and still produce a segmentation of the sequence into markers with high coverage and accuracy. We test our algorithm on several artificial and real data sets representing complex clusters of segmental duplications. Our software is available at http://compbio.fmph.uniba.sk/atomizer/
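To make the segmentation problem concrete, the following is a minimal sketch of one naive way to derive markers from alignment endpoints. It is not the algorithm of the paper: it only collects the endpoints of local self-alignments, merges endpoints that lie within a tolerance of each other (a crude stand-in for tolerating inaccurate alignment boundaries), and cuts the sequence at the merged positions. It does not map breakpoints through alignments between copies, which a realistic method would need to do. The names `cluster_breakpoints`, `segment`, `tolerance`, and `min_len`, as well as the alignment tuple layout, are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical alignment record: two half-open intervals on the same sequence,
# (start_a, end_a, start_b, end_b), meaning [start_a, end_a) aligns to [start_b, end_b).
Alignment = Tuple[int, int, int, int]


def cluster_breakpoints(points: List[int], tolerance: int) -> List[int]:
    """Merge breakpoints lying within `tolerance` of the previous one.

    Each cluster is represented by its middle element (upper median).
    """
    clusters: List[List[int]] = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= tolerance:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return [c[len(c) // 2] for c in clusters]


def segment(seq_len: int, alignments: List[Alignment],
            tolerance: int = 10, min_len: int = 1) -> List[Tuple[int, int]]:
    """Cut [0, seq_len) at clustered alignment endpoints (illustrative only)."""
    points = [0, seq_len]
    for a0, a1, b0, b1 in alignments:
        points.extend([a0, a1, b0, b1])
    bounds = cluster_breakpoints(points, tolerance)
    # Keep the true sequence ends even if clustering moved them.
    bounds[0], bounds[-1] = 0, seq_len
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e - s >= min_len]


# Two noisy alignments of the same duplicated region (made-up coordinates).
example_alignments = [(100, 300, 600, 800), (105, 298, 700, 893)]
segments = segment(1000, example_alignments, tolerance=10)
```

In this toy input, the endpoints 100 and 105 (and likewise 298 and 300) fall within the tolerance and collapse into a single boundary, so small disagreements between alignments do not produce very short spurious segments.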