Automated tetraploid genotype calling by hierarchical clustering

Abstract

Key message

New software for making tetraploid genotype calls from SNP array data uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage.

Abstract

SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, provided as the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population is used to group samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype assigned to each cluster is the mode of the training set samples in that cluster. A concordance metric, defined as the proportion of training set samples equal to the mode, can be used to eliminate unreliable markers and to compare different algorithms. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared to 5325 with the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
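The prediction-phase rule and the concordance metric described above can be illustrated with a minimal Python sketch. This is not the ClusterCall implementation (which is an R package) and does not reproduce its API. The function name `call_by_cluster_mode`, the sample names, and the toy cluster labels are all hypothetical, and the clustering step itself is assumed to have been done already. Ties for the mode are broken arbitrarily here, which the abstract does not specify.

```python
from collections import Counter, defaultdict


def call_by_cluster_mode(cluster_ids, training_dosage):
    """Assign each cluster the modal training-set dosage and compute concordance.

    cluster_ids: dict mapping every sample (training and prediction) to a cluster label.
    training_dosage: dict mapping training samples only to an allele dosage (0-4).
    Returns (calls, concordance), where calls maps every sample to a dosage,
    or to None when its cluster contains no training samples.
    """
    if not training_dosage:
        raise ValueError("at least one training sample is required")

    # Collect the training-set dosages that fall in each cluster.
    dosages_in_cluster = defaultdict(list)
    for sample, dosage in training_dosage.items():
        dosages_in_cluster[cluster_ids[sample]].append(dosage)

    # The genotype of a cluster is the mode of its training samples
    # (assumption: ties are resolved by first occurrence).
    cluster_call = {
        cluster: Counter(dosages).most_common(1)[0][0]
        for cluster, dosages in dosages_in_cluster.items()
    }

    # Concordance: proportion of training samples equal to their cluster's mode.
    matches = sum(
        1
        for sample, dosage in training_dosage.items()
        if cluster_call[cluster_ids[sample]] == dosage
    )
    concordance = matches / len(training_dosage)

    calls = {sample: cluster_call.get(cluster) for sample, cluster in cluster_ids.items()}
    return calls, concordance


# Toy data for one marker: five training samples (t*) and three prediction samples (p*).
cluster_ids = {"t1": 1, "t2": 1, "t3": 1, "t4": 2, "t5": 2, "p1": 1, "p2": 2, "p3": 3}
training_dosage = {"t1": 0, "t2": 0, "t3": 1, "t4": 2, "t5": 2}

calls, concordance = call_by_cluster_mode(cluster_ids, training_dosage)
print(calls["p1"], calls["p2"], calls["p3"], concordance)
```

In this toy example one training sample (`t3`) disagrees with the mode of its cluster, so the marker would fall below a 0.95 concordance threshold like the one used in the comparison above.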

Acknowledgements

Financial support was provided by the National Institute of Food and Agriculture, U.S. Department of Agriculture, Award Number 2014-67013-22418 and Hatch Project Number 1002731.

Author information

Corresponding author

Correspondence to Jeffrey B. Endelman.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Christine A. Hackett.

Electronic supplementary material

Below is the link to the electronic supplementary material.

SupplementaryMaterial.pdf, Figures S1 through S6 and Table S2 (PDF 845 KB)

AxS_theta.csv, Atlantic x Superior theta values from the 8303 SNP array (CSV 7793 KB)

AxS_r.csv, Atlantic x Superior r values from the 8303 SNP array (CSV 22632 KB)

WxL_theta.csv, Wauseon x Lenape theta values from the 8303 SNP array (CSV 9252 KB)

WxL_r.csv, Wauseon x Lenape r values from the 8303 SNP array for potato (CSV 27005 KB)

RGxP_r.csv, Rio Grande x Premier r values from the 8303 SNP array (CSV 8015 KB)

SolCAP_theta.csv, SolCAP diversity panel (n=187) theta values from the 8303 SNP array (CSV 8952 KB)

RGxP_theta.csv, Rio Grande x Premier theta values from the 8303 SNP array (CSV 7891 KB)

TableS1.csv, Linkage groups for Atlantic x Superior, Wauseon x Lenape, and Rio Grande x Premier (CSV 96 KB)

About this article

Cite this article

Schmitz Carley, C.A., Coombs, J.J., Douches, D.S. et al. Automated tetraploid genotype calling by hierarchical clustering. Theor Appl Genet 130, 717–726 (2017). https://doi.org/10.1007/s00122-016-2845-5

Keywords

  • Single Nucleotide Polymorphism Array
  • Genotype Call
  • Diversity Panel
  • Prediction Phase
  • Allele Dosage