Tree Genetics & Genomes, Volume 10, Issue 2, pp 273–285

Bayesian clustering analyses for genetic assignment and study of hybridization in oaks: effects of asymmetric phylogenies and asymmetric sampling schemes

Original Paper


Bayesian clustering methods have been widely used for studying species delimitation and genetic introgression. To test the effect of phylogenetic relationships and sampling scheme on the inferred clustering solution and on the performance of Bayesian clustering analysis, I simulated genotypes of the interfertile oak species Quercus robur, Quercus petraea, and Quercus pubescens and ran analyses using two popular software programs, STRUCTURE and BAPS. First, based on purebred simulations, I compared the clustering solutions resulting from different sample size configurations. While the clustering solution generally reflected the taxonomic relationships when equal samples of each species were included, STRUCTURE inferred a spurious partition when some species were represented by larger and others by smaller samples. In very unbalanced configurations, STRUCTURE failed to identify the three species, even when three subpopulations were assumed. By contrast, BAPS properly identified the three species under every sampling scheme. Second, based on simulations of purebreds and hybrids, I tested the performance of individual assignment with a variable number of loci. This analysis showed that STRUCTURE can detect introgressed individuals more efficiently than BAPS. However, BAPS could assign purebreds more efficiently with a lower number of loci. Method performance also depended on phylogenetic relationships. In the case of Q. petraea, Q. pubescens, and their hybrids, method performance was lower due to their phylogenetic affinity. Including three instead of two species in the analysis reduced performance and led to misclassification of hybrids, which often reflected the phylogenetic affinity between Q. petraea and Q. pubescens.
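The abstract does not spell out how the genotypes were generated, so the following is only a minimal sketch of the general idea, assuming independent loci and alleles drawn at random from species-level allele frequencies. The allele frequencies, the two-locus setup, and all function names are hypothetical and not taken from the study. It shows how an unbalanced purebred sample and a first-generation hybrid might be simulated before being passed to a clustering program.

```python
import random

# Hypothetical allele frequencies at two microsatellite loci
# (allele size -> frequency). Invented for illustration only.
FREQS = {
    "robur":   [{180: 0.6, 182: 0.3, 184: 0.1}, {210: 0.5, 214: 0.5}],
    "petraea": [{180: 0.1, 182: 0.3, 184: 0.6}, {210: 0.2, 214: 0.8}],
}


def draw_allele(locus_freqs, rng):
    """Draw one allele from a {allele: frequency} mapping."""
    alleles = list(locus_freqs)
    weights = [locus_freqs[a] for a in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]


def simulate_purebred(species, rng):
    """Diploid genotype: both alleles at each locus come from one species."""
    return [(draw_allele(f, rng), draw_allele(f, rng)) for f in FREQS[species]]


def simulate_f1(species_a, species_b, rng):
    """F1 hybrid: one allele per locus from each parental species."""
    return [
        (draw_allele(fa, rng), draw_allele(fb, rng))
        for fa, fb in zip(FREQS[species_a], FREQS[species_b])
    ]


def simulate_sample(sizes, rng):
    """Build a labelled sample; sizes maps species name -> number of purebreds."""
    return [
        (species, simulate_purebred(species, rng))
        for species, n in sizes.items()
        for _ in range(n)
    ]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
unbalanced = simulate_sample({"robur": 50, "petraea": 5}, rng)
hybrid = simulate_f1("robur", "petraea", rng)
```

Changing the counts passed to `simulate_sample` is enough to move between balanced and unbalanced sampling schemes, and later hybrid generations could be built by drawing one allele per locus from each of two simulated parents.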


Keywords: Bayesian clustering, Quercus, BAPS, STRUCTURE, Simulation, Microsatellites

Supplementary material

ESM 1: 11295_2013_680_MOESM1_ESM.pdf (83 kb)
ESM 2: 11295_2013_680_MOESM2_ESM.pdf (88 kb)
ESM 3: 11295_2013_680_MOESM3_ESM.pdf (292 kb)
ESM 4: 11295_2013_680_MOESM4_ESM.pdf (2 mb)
ESM 5: 11295_2013_680_MOESM5_ESM.pdf (172 kb)
ESM 6: 11295_2013_680_MOESM6_ESM.pdf (188 kb)
ESM 7: 11295_2013_680_MOESM7_ESM.pdf (120 kb)
ESM 8: 11295_2013_680_MOESM8_ESM.pdf (82 kb)
ESM 9: 11295_2013_680_MOESM9_ESM.pdf (269 kb)
ESM 10: 11295_2013_680_MOESM10_ESM.pdf (131 kb)


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Forest Research Institute (FVA) Baden-Württemberg, Freiburg, Germany
