Bulletin of Mathematical Biology, Volume 73, Issue 3, pp 459–494

The Site-Frequency Spectrum of Linked Sites

  • Xiaohui Xie
Open Access
Original Article


The site-frequency spectrum, the distribution of allele frequencies at a set of polymorphic sites, is a commonly used summary statistic in population genetics. Explicit forms of the spectrum are known for models both with and without selection, provided that sites are assumed to be independent. These explicit forms have made maximum likelihood estimation of selection possible, first in the Poisson random field model of Sawyer and Hartl, which is now the primary method for estimating selection directly from DNA sequence data. The independence assumption, which amounts to assuming free recombination between sites, is, however, a limiting case for many population genetics models. Here, we extend the site-frequency spectrum theory to the case where the sites are completely linked. We use the diffusion approximation to calculate the joint distribution of the allele frequencies of linked sites for models without selection and for models with equal-coefficient selection. The joint distribution is derived by first constructing Green’s functions corresponding to multiallele diffusion equations. We show that the site-frequency spectrum is highly correlated between frequencies that are complementary (i.e., sum to 1), and that positive selection significantly elevates this correlation. The results presented here can be used to extend the Poisson random field model to estimate selection at correlated sites. More generally, the Green’s function construction should aid in studying the genetic drift of multiple alleles in other cases.
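As a reference point for the independent-sites case mentioned above, the following Python sketch evaluates the single-site spectrum in the form usually attributed to the Poisson random field model, with density θ(1 − e^{−2γ(1−x)}) / ((1 − e^{−2γ}) x(1 − x)) and neutral limit θ/x. It does not reproduce the linked-site results of this article. The function names, the midpoint integration rule, and the scaling of the mutation parameter `theta` and selection parameter `gamma` are assumptions made for illustration; scaling conventions differ between references.

```python
import math


def sfs_density(x, theta, gamma=0.0):
    """Expected density of independent sites at derived-allele frequency x, 0 < x < 1.

    theta: scaled mutation rate (assumed convention).
    gamma: scaled selection coefficient (assumed convention); 0 means neutral.
    """
    if gamma == 0.0:
        return theta / x
    # expm1 keeps 1 - exp(-z) accurate when z is small.
    numerator = -math.expm1(-2.0 * gamma * (1.0 - x))
    denominator = -math.expm1(-2.0 * gamma)
    return theta * numerator / (denominator * x * (1.0 - x))


def expected_sample_sfs(n, theta, gamma=0.0, grid=20000):
    """Expected number of sites with i derived copies in a sample of n, for i = 1..n-1.

    Integrates the density against binomial sampling probabilities
    using a simple midpoint rule on (0, 1).
    """
    h = 1.0 / grid
    spectrum = []
    for i in range(1, n):
        coeff = math.comb(n, i)
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) * h  # midpoints never touch 0 or 1
            total += sfs_density(x, theta, gamma) * coeff * x**i * (1.0 - x) ** (n - i)
        spectrum.append(total * h)
    return spectrum


if __name__ == "__main__":
    neutral = expected_sample_sfs(n=10, theta=1.0)
    selected = expected_sample_sfs(n=10, theta=1.0, gamma=2.0)
    for i, (a, b) in enumerate(zip(neutral, selected), start=1):
        print(f"i={i}: neutral={a:.4f}  gamma=2: {b:.4f}")
```

In the neutral case the integral has the closed form θ/i, which gives a quick check on the numerical integration. With positive `gamma`, the high-frequency classes are enriched relative to the neutral spectrum.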


Site-frequency spectrum; Wright–Fisher model; Diffusion approximation; Green’s function


  1. Abramowitz, M., Stegun, I., 1965. Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Courier Dover, New York.
  2. Adams, A., Hudson, R., 2004. Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms. Genetics 168(3), 1699.
  3. Barbour, A., Ethier, S., Griffiths, R., 2000. A transition function expansion for a diffusion model with selection. Ann. Appl. Probab., 123–162.
  4. Baxter, G., Blythe, R., McKane, A., 2007. Exact solution of the multi-allelic diffusion model. Math. Biosci. 209(1), 124–170.
  5. Braverman, J., Hudson, R., Kaplan, N., Langley, C., Stephan, W., 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140(2), 783.
  6. Bustamante, C., Wakeley, J., Sawyer, S., Hartl, D., 2001. Directional selection and the site-frequency spectrum. Genetics 159(4), 1779.
  7. De, A., Durrett, R., 2007. Stepping-stone spatial structure causes slow decay of linkage disequilibrium and shifts the site frequency spectrum. Genetics 176(2), 969.
  8. Drake, J., Bird, C., Nemesh, J., Thomas, D., Newton-Cheh, C., Reymond, A., Excoffier, L., Attar, H., Antonarakis, S., Dermitzakis, E., et al., 2006. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat. Genet. 38(2), 223–227.
  9. Durrett, R., 2008. Probability Models for DNA Sequence Evolution. Springer, Berlin.
  10. Etheridge, A., Griffiths, R., 2009. A coalescent dual process in a Moran model with genic selection. Theor. Popul. Biol.
  11. Evans, S., Shvets, Y., Slatkin, M., 2007. Non-equilibrium theory of the allele frequency spectrum. Theor. Popul. Biol. 71(1), 109–119.
  12. Ewens, W., 1979. Mathematical Population Genetics. Springer, New York.
  13. Fay, J., Wu, C., 2000. Hitchhiking under positive Darwinian selection. Genetics 155(3), 1405.
  14. Fisher, R., 1930. The distribution of gene ratios for rare mutations. In: Proc. R. Soc. Edinb., vol. 50, pp. 204–219.
  15. Fu, Y., 1995. Statistical properties of segregating sites. Theor. Popul. Biol. 48(2), 172–197.
  16. Griffiths, R., 1979. A transition density expansion for a multi-allele diffusion model. Adv. Appl. Probab. 11(2), 310–325.
  17. Griffiths, R., 2003. The frequency spectrum of a mutation, and its age, in a general diffusion model. Theor. Popul. Biol. 64(2), 241–251.
  18. Griffiths, R., Li, W., 1983. Simulating allele frequencies in a population and the genetic differentiation of populations under mutation pressure. Theor. Popul. Biol. 23(1), 19.
  19. Griffiths, R., Tavaré, S., 1998. The age of a mutation in a general coalescent tree. Stoch. Models 14(1), 273–295.
  20. Hill, W., Robertson, A., 2009. The effect of linkage on limits to artificial selection. Genet. Res. 8(03), 269–294.
  21. Karlin, S., Taylor, H., 1981. A Second Course in Stochastic Processes. Academic Press, New York.
  22. Kim, Y., Stephan, W., 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160(2), 765.
  23. Kimura, M., 1955. Random genetic drift in multi-allelic locus. Evolution 9(4), 419–435.
  24. Kimura, M., 1956. Random genetic drift in a tri-allelic locus; exact solution with a continuous model. Biometrics 12(1), 57–66.
  25. Kimura, M., 1964. Diffusion models in population genetics. J. Appl. Probab. 1(2), 177–232.
  26. Kimura, M., 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61(4), 893.
  27. Li, W., 1977. Maintenance of genetic variability under mutation and selection pressures in population. Proc. Natl. Acad. Sci. USA 74(6), 2509–2513.
  28. Littler, R., 1975. Loss of variability at one locus in a finite population. Math. Biosci. 25(1–2), 151–163.
  29. Littler, R., Fackerell, E., 1975. Transition densities for neutral multi-allele diffusion models. Biometrics 31(1), 117–123.
  30. Marth, G., Czabarka, E., Murvai, J., Sherry, S., 2004. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166(1), 351.
  31. Nei, M., Maruyama, T., Chakraborty, R., 1975. The bottleneck effect and genetic variability in populations. Evolution 29(1), 1–10.
  32. Nielsen, R., 2005. Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.
  33. Ohta, T., Kimura, M., 1969. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutation. Genetics 63(1), 229.
  34. Polanski, A., Kimmel, M., 2003. New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165(1), 427.
  35. Przeworski, M., 2002. The signature of positive selection at randomly chosen loci. Genetics 160(3), 1179.
  36. Przeworski, M., Wall, J., Andolfatto, P., 2001. Recombination and the frequency spectrum in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18(3), 291.
  37. Roach, G., 1982. Green’s Functions. Cambridge University Press, Cambridge.
  38. Sawyer, S., Hartl, D., 1992. Population genetics of polymorphism and divergence. Genetics 132(4), 1161.
  39. Shimakura, N., 1977. Équations différentielles provenant de la génétique des populations. Tohoku Math. J. 29, 287.
  40. Tajima, F., 1989. The effect of change in population size on DNA polymorphism. Genetics 123(3), 597.
  41. Tavaré, S., 1984. Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 26(2), 119–164.
  42. Wakeley, J., Nielsen, R., Liu-Cordero, S., Ardlie, K., 2001. The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69(6), 1332–1347.
  43. Watterson, G., 1977. Heterosis or neutrality? Genetics 85(4), 789.
  44. Wright, S., 1938. The distribution of gene frequencies under irreversible mutation. Proc. Natl. Acad. Sci. USA 24(7), 253.
  45. Wright, S., 1942. Statistical genetics and evolution. Bull. Am. Math. Soc. 48(4), 223–246.

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Department of Computer Science, Center for Complex Biological Systems, Institute for Genomics and Bioinformatics, University of California, Irvine, USA