Conservative Extensions of Linkage Disequilibrium Measures from Pairwise to Multi-loci and Algorithms for Optimal Tagging SNP Selection

  • Ryan Tarpine
  • Fumei Lam
  • Sorin Istrail
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6577)


We present results on two classes of problems. The first result addresses the long-standing open problem of finding unifying principles for Linkage Disequilibrium (LD) measures in population genetics (Lewontin 1964 [10], Hedrick 1987 [8], Devlin and Risch 1995 [5]). Two desirable properties have been proposed in the extensive literature on this topic, and the mutual consistency between these properties has remained at the heart of statistical and algorithmic difficulties with haplotype and genome-wide association study analysis. The first axiom is (1) the ability to extend LD measures to multiple loci as a conservative extension of pairwise LD. All widely used LD measures are pairwise measures. Despite significant attempts, it is not clear how to extend these measures naturally to multiple loci, leading to a “curse of the pairwise”. The second axiom is (2) the interpretability of intermediate values. In this paper, we resolve this mutual consistency problem by introducing a new LD measure, directed informativeness \(\overrightarrow{\mathcal{I}}\) (the directed graph theoretic counterpart of the informativeness measure introduced by Halldórsson et al. [6]), and show that it satisfies both of the above axioms. We also show that the maximum informative subset of tagging SNPs based on \(\overrightarrow{\mathcal{I}}\) can be computed exactly in polynomial time for realistic genome-wide data. Furthermore, we present polynomial time algorithms for optimal genome-wide tagging SNP selection for a number of commonly used LD measures, under the bounded neighborhood assumption for linked pairs of SNPs. A further problem in the area is the search for a quality measure for tagging SNP selection that unifies the LD-based methods such as LD-select (implemented in Tagger, de Bakker et al. 2005 [4], Carlson et al. 2004 [3]) and the information-theoretic ones such as informativeness.
We show that the objective function of the LD-select algorithm is the Minimal Dominating Set (MDS) on \(r^2\)-SNP graphs and show that we can compute MDS in polynomial time for this class of graphs. Although in LD-select the “maximally informative” solution is obtained through a greedy algorithm, and therefore better referred to as “locally maximally informative,” we show that in fact, Tagger (LD-select) performs very close to the global maximally informative optimum.
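
To make the dominating-set view of tag selection concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the toy haplotype data, and the \(r^2 \ge 0.8\) linkage threshold are all hypothetical choices made for this example. It computes pairwise \(r^2\) from phased 0/1 haplotypes, links SNPs whose \(r^2\) meets the threshold, and then compares a greedy selection in the spirit of LD-select with a brute-force minimum dominating set. The brute-force search is exponential and only usable on toy inputs; it stands in for, and does not reproduce, the polynomial time method described in the abstract.

```python
from itertools import combinations

def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs, each a 0/1 list over haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treated as 0 here (assumption)
    d = pab - pa * pb
    return d * d / denom

def ld_graph(snps, threshold=0.8):
    """Adjacency sets of the r^2-SNP graph; the 0.8 threshold is an assumed value."""
    adj = {i: {i} for i in range(len(snps))}  # every SNP covers itself
    for i, j in combinations(range(len(snps)), 2):
        if r_squared(snps[i], snps[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def greedy_tags(adj):
    """Greedy heuristic: repeatedly pick the SNP covering the most uncovered SNPs."""
    uncovered = set(adj)
    tags = []
    while uncovered:
        best = max(sorted(adj), key=lambda v: len(adj[v] & uncovered))
        tags.append(best)
        uncovered -= adj[best]
    return tags

def exact_tags(adj):
    """Brute-force minimum dominating set (exponential; toy inputs only)."""
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if set().union(*(adj[v] for v in subset)) == set(nodes):
                return list(subset)
    return []

# Hypothetical toy data: 5 SNPs (rows) typed on 8 haplotypes (columns).
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],  # identical to SNP 0, so r^2 = 1
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],  # identical to SNP 2
    [1, 1, 0, 0, 0, 0, 1, 1],  # uncorrelated with all the others
]
adj = ld_graph(snps)
greedy = greedy_tags(adj)
exact = exact_tags(adj)
print(greedy, exact)
```

On this toy input both strategies select three tags, one per group of mutually linked SNPs. On larger graphs the greedy choice can use more tags than the optimum, which is the gap the abstract's comparison between LD-select and the global optimum refers to.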


Conservative Extension; Single Single Nucleotide Polymorphism; Linkage Disequilibrium Measure; Single Nucleotide Polymorphism Selection; Dynamic Programming Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Ardlie, K., Kruglyak, L., Seielstad, M.: Patterns of linkage disequilibrium in the human genome. Nature Reviews Genetics 3, 299–309 (2002)
  2. Bafna, V., Halldórsson, B.V., Schwartz, R., Clark, A.G., Istrail, S.: Haplotypes and informative SNP selection algorithms: don’t block out information. In: RECOMB, pp. 19–27 (2003)
  3. Carlson, C., Eberle, M., Rieder, M., Yi, Q., Kruglyak, L., Nickerson, D.: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74, 106–120 (2004)
  4. de Bakker, P., Yelensky, R., Pe’er, I., Gabriel, S., Daly, M., Altshuler, D.: Efficiency and power in genetic association studies. Nature Genetics 37, 1217–1223 (2005)
  5. Devlin, B., Risch, N.: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29, 311–322 (1995)
  6. Halldórsson, B., Bafna, V., Lippert, R., Schwartz, R., De La Vega, F., Clark, A., Istrail, S.: Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies. Genome Research 14, 1633–1640 (2004)
  7. Halldórsson, B.V., Bafna, V., Edwards, N., Lippert, R., Yooseph, S., Istrail, S.: Combinatorial problems arising in SNP and haplotype analysis. In: Calude, C.S., Dinneen, M.J., Vajnovszki, V. (eds.) DMTCS 2003. LNCS, vol. 2731, pp. 26–47. Springer, Heidelberg (2003)
  8. Hedrick, P.: Gametic disequilibrium measures: Proceed with caution. Genetics 117, 331–341 (1987)
  9. Lancia, G., Bafna, V., Istrail, S., Lippert, R., Schwartz, R.: SNPs problems, complexity, and algorithms. In: Meyer auf der Heide, F. (ed.) ESA 2001. LNCS, vol. 2161, pp. 182–193. Springer, Heidelberg (2001)
  10. Lewontin, R.: On measures of gametic disequilibrium. Genetics 120, 849–852 (1988)
  11. Pritchard, J.K., Przeworski, M.: Linkage disequilibrium in humans: Models and data. The American Journal of Human Genetics 69, 1–14 (2001)
  12. Schwartz, R., Clark, A.G., Istrail, S.: Methods for inferring block-wise ancestral history from haploid sequences. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 44–59. Springer, Heidelberg (2002)
  13. Schwartz, R., Halldórsson, B.V., Bafna, V., Clark, A.G., Istrail, S.: Robustness of inference of haplotype block structure. Journal of Computational Biology 10, 13–20 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ryan Tarpine (1)
  • Fumei Lam (2)
  • Sorin Istrail (1)
  1. Center for Computational Molecular Biology, Department of Computer Science, Brown University, Providence, USA
  2. Department of Computer Science, University of California, Davis, USA
