Bayesian spatial modeling of genetic population structure
 Jukka Corander,
 Jukka Sirén,
 Elja Arjas
Abstract
Natural populations of living organisms often have complex histories consisting of phases of expansion and decline, and the migratory patterns within them may fluctuate over space and time. When parts of a population become relatively isolated, e.g., due to geographical barriers, stochastic forces reshape certain DNA characteristics of the individuals over generations such that they reflect the restricted migration and mating/reproduction patterns. Such populations are typically termed genetically structured, and they may be represented statistically as several clusters whose patterns of DNA variation differ clearly from one another. When detailed knowledge of the ancestry of a natural population is lacking, the DNA characteristics of a sample of current-generation individuals often provide a wealth of information in this respect. Several statistical approaches to model-based clustering of such data have been introduced, and in particular, the Bayesian approach to modeling the genetic structure of a population has attracted considerable interest among biologists. However, the possibility of utilizing spatial information from sampled individuals in the inference about genetic clusters has been incorporated into such analyses only very recently. While the standard Bayesian hierarchical modeling techniques based on Markov chain Monte Carlo simulation provide flexible means for describing even subtle patterns in data, they may also result in computationally challenging procedures in practical data analysis. Here we develop a method for modeling the spatial genetic structure using a combination of analytical and stochastic methods. We achieve this by extending a novel theory of Bayesian predictive classification with the spatial information available, described here in terms of a colored Voronoi tessellation over the sample domain. Our results for real and simulated data sets illustrate the benefits of incorporating spatial information into such an analysis.
 Journal

Computational Statistics
Volume 23, Issue 1, pp 111–129
 Cover Date
 2008-01-01
 DOI
 10.1007/s00180-007-0072-x
 Print ISSN
 0943-4062
 Online ISSN
 1613-9658
 Publisher
 Springer-Verlag
 Keywords

 Bayesian inference
 Genetic structure
 Spatial modeling
 Statistical learning theory
 Unsupervised classification
 Authors

 Jukka Corander ^{(1)}
 Jukka Sirén ^{(1)}
 Elja Arjas ^{(1)}
 Author Affiliations

 1. Department of Mathematics and Statistics, University of Helsinki, P. O. Box 68, 00014, Helsinki, Finland