Computational Statistics

Volume 23, Issue 1, pp 111–129

Bayesian spatial modeling of genetic population structure

Original Paper

Abstract

Natural populations of living organisms often have complex histories consisting of phases of expansion and decline, and the migratory patterns within them may fluctuate over space and time. When parts of a population become relatively isolated, e.g., due to geographical barriers, stochastic forces reshape certain DNA characteristics of the individuals over generations so that they reflect the restricted migration and mating/reproduction patterns. Such populations are typically termed genetically structured, and they may be represented statistically as several clusters whose DNA variation differs clearly from one cluster to another. When detailed knowledge of the ancestry of a natural population is lacking, the DNA characteristics of a sample of current-generation individuals often provide a wealth of information about it. Several statistical approaches to model-based clustering of such data have been introduced, and the Bayesian approach to modeling the genetic structure of a population in particular has attracted keen interest among biologists. However, the possibility of using spatial information from sampled individuals in the inference about genetic clusters has been incorporated into such analyses only very recently. While standard Bayesian hierarchical modeling techniques based on Markov chain Monte Carlo simulation provide flexible means for describing even subtle patterns in data, they may also result in computationally challenging procedures in practical data analysis. Here we develop a method for modeling the spatial genetic structure using a combination of analytical and stochastic methods. We achieve this by extending a novel theory of Bayesian predictive classification with the available spatial information, described here in terms of a colored Voronoi tessellation over the sample domain. Our results for real and simulated data sets illustrate the benefits of incorporating spatial information into such an analysis.
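To make the notion of a colored Voronoi tessellation concrete, the following minimal sketch assumes that each sampled location acts as a site whose Voronoi cell inherits that sample's cluster label (its "color"). The function name, coordinates, and labels are hypothetical and chosen only for illustration. The sketch covers the geometric idea alone, not the Bayesian predictive classification or the inference procedure developed in the paper.

```python
import math

def nearest_site_color(point, sites, colors):
    """Return the color of the Voronoi cell that contains `point`.

    A point lies in the cell of its nearest site, so the colored
    tessellation can be evaluated without constructing cell polygons.
    """
    best = min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))
    return colors[best]

# Hypothetical sampling locations (x, y) and hypothetical cluster labels.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
colors = [0, 0, 0, 1]

# Query two arbitrary locations in the sample domain.
print(nearest_site_color((0.4, 0.3), sites, colors))
print(nearest_site_color((4.0, 4.5), sites, colors))
```

Under this assumption, neighboring cells that share a color merge into one spatially contiguous region, which is how a tessellation over the sample domain can encode a spatial partition into clusters.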

Keywords

Bayesian inference; Genetic structure; Spatial modeling; Statistical learning theory; Unsupervised classification

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
