Theoretical and Applied Genetics, Volume 130, Issue 4, pp 717–726

Automated tetraploid genotype calling by hierarchical clustering

  • Cari A. Schmitz Carley
  • Joseph J. Coombs
  • David S. Douches
  • Paul C. Bethke
  • Jiwan P. Palta
  • Richard G. Novy
  • Jeffrey B. Endelman
Original Article


Key message

We developed new software for making tetraploid genotype calls from SNP array data. It uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage.
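Calibrating dosage with F1 populations depends on the segregation ratios expected from the parental dosages. The sketch below is only an illustration and is not ClusterCall code. It computes the expected offspring dosage distribution for a tetraploid cross under a simplifying assumption: random chromosome segregation with no double reduction. Under that assumption, each gamete carries 2 of the parent's 4 homologs, drawn without replacement. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

PLOIDY = 4  # autotetraploid


def gamete_dosage_probs(parent_dosage: int) -> list[Fraction]:
    """P(gamete carries k copies of the allele), k = 0, 1, 2.

    Assumes random chromosome segregation without double reduction:
    a gamete draws 2 of the 4 homologs without replacement, so the
    allele count in the gamete is hypergeometric.
    """
    total = comb(PLOIDY, 2)  # 6 possible homolog pairs
    return [
        Fraction(comb(parent_dosage, k) * comb(PLOIDY - parent_dosage, 2 - k), total)
        for k in range(3)
    ]


def f1_dosage_probs(p1: int, p2: int) -> list[Fraction]:
    """Expected F1 offspring dosage distribution (dosage 0..4) for p1 x p2."""
    g1, g2 = gamete_dosage_probs(p1), gamete_dosage_probs(p2)
    probs = [Fraction(0)] * (PLOIDY + 1)
    # Offspring dosage is the sum of the two gamete dosages (a convolution).
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            probs[i + j] += a * b
    return probs


# Duplex x nulliplex cross: the expected 1:4:1 ratio over dosages 0, 1, 2.
print(", ".join(str(p) for p in f1_dosage_probs(2, 0)))  # → 1/6, 2/3, 1/6, 0, 0
```

In a training population, the number of distinct dosage classes with nonzero probability indicates how many intensity clusters to expect. Their relative sizes indicate how clusters might be matched to dosages. Double reduction, which does occur in potato, would shift these ratios slightly.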


SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, implemented in the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population groups samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype assigned to each cluster is the mode of the training set samples in that cluster. A concordance metric can be used to eliminate unreliable markers and to compare different algorithms. It is defined as the proportion of training set samples whose genotype equals the cluster mode. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared with 5325 for the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
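The prediction-phase rule and the concordance metric described above are simple enough to state in code. This is a minimal sketch and not ClusterCall's implementation. It assumes the joint hierarchical clustering has already produced one cluster label per sample. The function name and its inputs are hypothetical. Two further behaviors are assumptions rather than documented ClusterCall behavior: ties are broken by the first dosage encountered, and clusters without training samples receive no call.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def call_by_cluster_mode(
    clusters: Sequence[Hashable],
    training_calls: Sequence[Optional[int]],
) -> tuple[list[Optional[int]], float]:
    """Assign every sample the modal training-set dosage of its cluster.

    clusters[i]       -- cluster label of sample i from the joint clustering
    training_calls[i] -- dosage (0-4) for training samples, None for the
                         prediction set
    Returns (calls, concordance), where concordance is the proportion of
    training samples whose own dosage equals the mode of their cluster.
    """
    # Tally training-set dosages within each cluster.
    per_cluster: dict[Hashable, Counter] = {}
    for label, dosage in zip(clusters, training_calls):
        if dosage is not None:
            per_cluster.setdefault(label, Counter())[dosage] += 1

    # Mode per cluster; most_common breaks ties by first occurrence
    # (an assumption, not a documented ClusterCall rule).
    modes = {label: c.most_common(1)[0][0] for label, c in per_cluster.items()}

    # Clusters containing no training samples get no call (None).
    calls = [modes.get(label) for label in clusters]

    n_train = sum(d is not None for d in training_calls)
    n_agree = sum(
        d is not None and d == modes[label]
        for label, d in zip(clusters, training_calls)
    )
    concordance = n_agree / n_train if n_train else float("nan")
    return calls, concordance


calls, concordance = call_by_cluster_mode(
    ["a", "a", "a", "b", "b", "c"], [1, 1, 2, 3, None, None]
)
print(calls, concordance)  # → [1, 1, 1, 3, 3, None] 0.75
```

With the 0.95 threshold reported in the abstract, the marker in this toy example (concordance 0.75) would be discarded as unreliable.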





Financial support was provided by the National Institute of Food and Agriculture, U.S. Department of Agriculture, Award Number 2014-67013-22418 and Hatch Project Number 1002731.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

  • SupplementaryMaterial.pdf: Figures S1 through S6 and Table S2 (PDF, 845 KB)
  • AxS_theta.csv: Atlantic × Superior theta values from the 8303-SNP array (CSV, 7793 KB)
  • AxS_r.csv: Atlantic × Superior r values from the 8303-SNP array (CSV, 22632 KB)
  • WxL_theta.csv: Wauseon × Lenape theta values from the 8303-SNP array (CSV, 9252 KB)
  • WxL_r.csv: Wauseon × Lenape r values from the 8303-SNP array (CSV, 27005 KB)
  • RGxP_theta.csv: Rio Grande × Premier theta values from the 8303-SNP array (CSV, 7891 KB)
  • RGxP_r.csv: Rio Grande × Premier r values from the 8303-SNP array (CSV, 8015 KB)
  • SolCAP_theta.csv: SolCAP diversity panel (n = 187) theta values from the 8303-SNP array (CSV, 8952 KB)
  • TableS1.csv: Linkage groups for Atlantic × Superior, Wauseon × Lenape, and Rio Grande × Premier (CSV, 96 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Horticulture, University of Wisconsin, Madison, USA
  2. Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, USA
  3. USDA Agricultural Research Service, Madison, USA
  4. USDA–ARS Small Grains and Potato Germplasm Research Unit, Aberdeen, USA
