Bayesian clustering analyses for genetic assignment and study of hybridization in oaks: effects of asymmetric phylogenies and asymmetric sampling schemes

  • Original Paper
Tree Genetics & Genomes

Abstract

Bayesian clustering methods have been widely used for studying species delimitation and genetic introgression. To test the effect of phylogenetic relationships and sampling scheme on the inferred clustering solution and on the performance of Bayesian clustering analysis, I simulated genotypes of the interfertile oak species Quercus robur, Quercus petraea, and Quercus pubescens and ran analyses using two popular software programs, STRUCTURE and BAPS. First, based on purebred simulations, I compared the clustering solutions resulting from different sample size configurations. While the clustering solution generally reflected the taxonomic relationships when equal samples of each species were included, STRUCTURE inferred a spurious partition when some species were represented by larger and others by smaller samples. In very unbalanced configurations, STRUCTURE failed to identify the three species even when three subpopulations were assumed. By contrast, BAPS properly identified the three species under any sampling scheme. Second, based on simulations of purebreds and hybrids, I tested the performance of individual assignment with a variable number of loci. This analysis showed that STRUCTURE detects introgressed individuals more efficiently than BAPS, whereas BAPS assigns purebreds more efficiently when fewer loci are available. Method performance also depended on phylogenetic relationships: for Q. petraea, Q. pubescens, and their hybrids, performance was lower because of their phylogenetic affinity. Including three instead of two species in the analysis reduced performance and led to misclassification of hybrids, which often reflected the phylogenetic affinity between Q. petraea and Q. pubescens.
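
As a rough illustration of what simulating purebred and hybrid genotypes can involve, the following minimal sketch draws diploid multilocus genotypes from per-species allele frequencies. It is not the procedure used in the study. The allele frequencies, allele labels, and the function names `draw_gamete`, `simulate_individual`, and `simulate_sample` are all hypothetical and chosen only for illustration, and only two species and two loci are shown.

```python
import random

# Hypothetical allele frequencies (illustrative values, not data from the study).
# One dict per locus, mapping allele label -> frequency.
ALLELE_FREQS = {
    "Q. robur":   [{"a1": 0.8, "a2": 0.2}, {"b1": 0.7, "b2": 0.3}],
    "Q. petraea": [{"a1": 0.1, "a2": 0.9}, {"b1": 0.2, "b2": 0.8}],
}


def draw_gamete(species, rng):
    """Draw one allele per locus according to the species' allele frequencies."""
    gamete = []
    for locus in ALLELE_FREQS[species]:
        alleles = list(locus)
        weights = [locus[a] for a in alleles]
        gamete.append(rng.choices(alleles, weights=weights, k=1)[0])
    return gamete


def simulate_individual(parent_a, parent_b, rng):
    """Return a diploid genotype as a list of (allele, allele) pairs, one per locus.

    Drawing both gametes from the same species gives a purebred;
    drawing them from two different species gives a first-generation hybrid.
    """
    return list(zip(draw_gamete(parent_a, rng), draw_gamete(parent_b, rng)))


def simulate_sample(n_per_class, seed=0):
    """Simulate n_per_class purebreds of each species plus n_per_class F1 hybrids."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    classes = {
        "pure Q. robur": ("Q. robur", "Q. robur"),
        "pure Q. petraea": ("Q. petraea", "Q. petraea"),
        "F1 hybrid": ("Q. robur", "Q. petraea"),
    }
    sample = []
    for label, (parent_a, parent_b) in classes.items():
        for _ in range(n_per_class):
            sample.append((label, simulate_individual(parent_a, parent_b, rng)))
    return sample


if __name__ == "__main__":
    for label, genotype in simulate_sample(n_per_class=2):
        print(label, genotype)
```

An unbalanced sampling scheme could be mimicked in the same spirit by giving each class its own sample size instead of a shared `n_per_class`.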

Acknowledgments

This work was supported by the European Regional Development Fund (ERDF), the regional government authority of Baden-Württemberg in Freiburg (Regierungspräsidium Freiburg; RPF), the National Office of Forests (Office National des Forêts; ONF) in France, and the Regional Directorate of Food, Agriculture and Forestry of Alsace (Direction Régionale de l'Alimentation, de l'Agriculture et de la Forêt d'Alsace; DRAAF) within the framework of the Interreg-IV project “The regeneration of the oaks in the Upper Rhine lowlands”. I express my gratitude to all the colleagues of the ONF, RPF, and the FVA who worked on sample collection and laboratory analyses, to Jukka Corander for kindly answering several questions about the BAPS software, and to two anonymous reviewers for providing valuable comments and suggestions.

Data archiving statement

Genotypic data for this study are available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.b64b4.

Author information

Corresponding author

Correspondence to Charalambos Neophytou.

Additional information

Communicated by D. Neale

Electronic supplementary material

The following files are provided as electronic supplementary material.

  • ESM 1 (PDF 82 kb)
  • ESM 2 (PDF 88 kb)
  • ESM 3 (PDF 292 kb)
  • ESM 4 (PDF 2 mb)
  • ESM 5 (PDF 171 kb)
  • ESM 6 (PDF 188 kb)
  • ESM 7 (PDF 120 kb)
  • ESM 8 (PDF 82 kb)
  • ESM 9 (PDF 268 kb)
  • ESM 10 (PDF 131 kb)

About this article

Cite this article

Neophytou, C. Bayesian clustering analyses for genetic assignment and study of hybridization in oaks: effects of asymmetric phylogenies and asymmetric sampling schemes. Tree Genetics & Genomes 10, 273–285 (2014). https://doi.org/10.1007/s11295-013-0680-2

  • DOI: https://doi.org/10.1007/s11295-013-0680-2
