Theoretical and Applied Genetics, Volume 130, Issue 4, pp 717–726

Automated tetraploid genotype calling by hierarchical clustering

  • Cari A. Schmitz Carley
  • Joseph J. Coombs
  • David S. Douches
  • Paul C. Bethke
  • Jiwan P. Palta
  • Richard G. Novy
  • Jeffrey B. Endelman
Original Article

DOI: 10.1007/s00122-016-2845-5

Cite this article as:
Schmitz Carley, C.A., Coombs, J.J., Douches, D.S. et al. Theor Appl Genet (2017) 130: 717. doi:10.1007/s00122-016-2845-5


Key message

New software was developed to make tetraploid genotype calls from SNP array data. It uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage.
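Calibration against F1 populations depends on knowing which dosage classes a cross should produce, and in what proportions. The Python sketch below computes those expected segregation ratios for an autotetraploid F1 cross. It is only an illustration and is not code from ClusterCall. It assumes random chromosome segregation with no double reduction, and the function names `gamete_dist` and `f1_segregation` are hypothetical.

```python
from fractions import Fraction
from math import comb

PLOIDY = 4
GAMETE = PLOIDY // 2  # a tetraploid gamete carries two chromosome copies


def gamete_dist(dosage):
    """Gamete dosage probabilities for a parent carrying `dosage` copies of
    the alternate allele. Assumes random chromosome segregation and no
    double reduction, which gives a hypergeometric distribution."""
    total = comb(PLOIDY, GAMETE)
    dist = {}
    for k in range(GAMETE + 1):
        ways = comb(dosage, k) * comb(PLOIDY - dosage, GAMETE - k)
        if ways > 0:
            dist[k] = Fraction(ways, total)
    return dist


def f1_segregation(p1, p2):
    """Expected offspring dosage distribution for the F1 cross p1 x p2.
    Offspring dosage is the sum of the two gamete dosages."""
    out = {}
    for k1, a in gamete_dist(p1).items():
        for k2, b in gamete_dist(p2).items():
            out[k1 + k2] = out.get(k1 + k2, 0) + a * b
    return dict(sorted(out.items()))
```

For example, `f1_segregation(2, 0)` returns `{0: Fraction(1, 6), 1: Fraction(2, 3), 2: Fraction(1, 6)}`, which is the familiar 1:4:1 ratio for a duplex × nulliplex cross. A training step could then compare observed cluster sizes against ratios like these when it assigns dosages.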


SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, provided as the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population is used to group samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype for each cluster is the mode of the training set samples. A concordance metric, defined as the proportion of training set samples equal to the mode, can be used to eliminate unreliable markers and compare different algorithms. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared to 5325 with the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
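The prediction-phase rule and the concordance metric can be written out directly for a single marker. The Python sketch below is illustrative only and is not ClusterCall's implementation. It starts from cluster labels that have already been computed by clustering the training and prediction samples together. Three behaviours are assumptions of this sketch: ties for the mode go to the dosage seen first, clusters that contain no training samples give no call (`None`), and the name `predict_by_cluster` is hypothetical.

```python
from collections import Counter


def predict_by_cluster(clusters, training):
    """clusters: sample -> cluster label, for training and prediction samples
    clustered together. training: sample -> dosage called in the training phase.
    Returns (calls for prediction samples, concordance for this marker)."""
    # Collect the training-set dosages that fall in each cluster.
    members = {}
    for sample, label in clusters.items():
        if sample in training:
            members.setdefault(label, []).append(training[sample])

    # Each cluster's genotype is the mode of its training samples.
    cluster_geno = {}
    n_match = n_total = 0
    for label, dosages in members.items():
        mode, count = Counter(dosages).most_common(1)[0]
        cluster_geno[label] = mode
        n_match += count          # training samples equal to the mode
        n_total += len(dosages)

    # Concordance = proportion of training samples equal to their cluster mode.
    concordance = n_match / n_total if n_total else float("nan")

    # Prediction samples inherit their cluster's genotype (None if no training data).
    calls = {s: cluster_geno.get(lab) for s, lab in clusters.items() if s not in training}
    return calls, concordance


clusters = {"a1": 1, "a2": 1, "a3": 1, "b1": 2, "b2": 2, "p1": 1, "p2": 2, "p3": 3}
training = {"a1": 0, "a2": 0, "a3": 1, "b1": 2, "b2": 2}
calls, conc = predict_by_cluster(clusters, training)
```

In this toy example, four of the five training samples match their cluster's mode, so the concordance is 0.8. A marker like this would be dropped under the 0.95 threshold reported above. The calls are `{"p1": 0, "p2": 2, "p3": None}`.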

Supplementary material

  • SupplementaryMaterial.pdf: Figures S1 through S6 and Table S2 (PDF, 845 KB)
  • AxS_theta.csv: Atlantic x Superior theta values from the 8303 SNP array (CSV, 7793 KB)
  • AxS_r.csv: Atlantic x Superior r values from the 8303 SNP array (CSV, 22632 KB)
  • WxL_theta.csv: Wauseon x Lenape theta values from the 8303 SNP array (CSV, 9252 KB)
  • WxL_r.csv: Wauseon x Lenape r values from the 8303 SNP array for potato (CSV, 27005 KB)
  • RGxP_r.csv: Rio Grande x Premier r values from the 8303 SNP array (CSV, 8015 KB)
  • SolCAP_theta.csv: SolCAP diversity panel (n=187) theta values from the 8303 SNP array (CSV, 8952 KB)
  • RGxP_theta.csv: Rio Grande x Premier theta values from the 8303 SNP array (CSV, 7891 KB)
  • TableS1.csv: Linkage groups for Atlantic x Superior, Wauseon x Lenape, and Rio Grande x Premier (CSV, 96 KB)

Funding information

Funder: National Institute of Food and Agriculture
  • Grant 2014-67013-22418
  • Hatch 1002731

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Horticulture, University of Wisconsin, Madison, USA
  2. Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, USA
  3. USDA Agricultural Research Service, Madison, USA
  4. USDA–ARS Small Grains and Potato Germplasm Research Unit, Aberdeen, USA
