A Linear-Time Algorithm for Studying Genetic Variation
The study of variation in DNA sequences, for instance within the framework of phylogeny or population genetics, is one of the most important subjects in modern genomics. Here we present a new linear-time algorithm for finding maximal k-regions in alignments of three sequences. It can be used to detect segments featuring a certain degree of similarity, as well as the boundaries of distinct genomic environments such as gene clusters or haplotype blocks. A k-region is a region that has a center sequence whose Hamming distance from every alignment row is at most k; determining such regions in the general case is known to be NP-hard.
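The definition of a k-region can be illustrated with a minimal sketch. The code below is not the linear-time algorithm presented in the paper: it is a brute-force check, exponential in the number of varying columns, that tests whether a given block of aligned rows has a center sequence within Hamming distance k of every row. The function names `hamming` and `has_center` are hypothetical and chosen for illustration only, and the sketch assumes the rows are plain strings of equal length.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_center(rows, k):
    """Return True if some sequence lies within Hamming distance k of every row.

    Brute force: at each column the candidate center only needs to try the
    characters that actually occur in that column, because any other
    character mismatches every row at that position.
    """
    columns = list(zip(*rows))
    choices = [sorted(set(col)) for col in columns]
    for candidate in product(*choices):
        if all(hamming(candidate, row) <= k for row in rows):
            return True
    return False


if __name__ == "__main__":
    block = ["ACGT", "ACGA", "TCGT"]
    print(has_center(block, 1))  # the column-wise majority "ACGT" is a center
    print(has_center(block, 0))  # the rows are not identical, so no center
```

A maximal k-region would then be a region satisfying this test that cannot be extended in either direction without violating it. Enumerating candidates this way is only practical for very short blocks, which is why a linear-time method for the three-sequence case is of interest.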