Automated tetraploid genotype calling by hierarchical clustering
New software was developed to make tetraploid genotype calls from SNP array data. It uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage.
SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, which is provided as the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population is used to group samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype for each cluster is the mode of the training set samples. A concordance metric, defined as the proportion of training set samples equal to the mode, can be used to eliminate unreliable markers and compare different algorithms. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared to 5325 with the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate, autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
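The prediction-phase rule and the concordance metric described above can be illustrated with a minimal Python sketch. It is not the ClusterCall implementation (ClusterCall is an R package). The function name `call_by_cluster_mode`, the input layout, and the toy sample names are hypothetical. The sketch assumes cluster assignments for one marker have already been produced by some hierarchical clustering step, and that concordance is computed over all training samples for that marker.

```python
from collections import Counter, defaultdict


def call_by_cluster_mode(clusters, training_dosage):
    """Assign one dosage per cluster and compute marker concordance.

    clusters: dict mapping sample name -> cluster id, covering both
        training and prediction samples (assumed to come from a prior
        hierarchical clustering of signal intensities).
    training_dosage: dict mapping training sample name -> dosage (0..4)
        assigned during the training phase.
    Returns (calls, concordance).
    """
    # Collect the training-set dosages that fall into each cluster.
    by_cluster = defaultdict(list)
    for sample, dosage in training_dosage.items():
        by_cluster[clusters[sample]].append(dosage)

    # The genotype of a cluster is the mode of its training samples.
    # Ties are broken by first occurrence, an arbitrary choice made here.
    cluster_call = {
        c: Counter(d).most_common(1)[0][0] for c, d in by_cluster.items()
    }

    # Each sample inherits its cluster's call; a cluster that contains
    # no training samples is left uncalled (None).
    calls = {s: cluster_call.get(c) for s, c in clusters.items()}

    # Concordance: proportion of training samples equal to the mode
    # of the cluster they were placed in.
    agree = sum(1 for s, d in training_dosage.items() if calls[s] == d)
    concordance = agree / len(training_dosage) if training_dosage else float("nan")
    return calls, concordance


# Toy data: five training samples (t*) and three prediction samples (p*).
clusters = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 1,
            "p1": 0, "p2": 1, "p3": 2}
training = {"t1": 1, "t2": 1, "t3": 2, "t4": 3, "t5": 3}

calls, concordance = call_by_cluster_mode(clusters, training)
print(calls["p1"], calls["p2"], calls["p3"], concordance)
```

In this toy case one training sample (`t3`) disagrees with the mode of its cluster, so the marker would fall below a 0.95 concordance threshold and could be filtered out as unreliable.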
Keywords: Single Nucleotide Polymorphism Array, Genotype Call, Diversity Panel, Prediction Phase, Allele Dosage
Financial support was provided by the National Institute of Food and Agriculture, U.S. Department of Agriculture, Award Number 2014-67013-22418 and Hatch Project Number 1002731.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.