Abstract
Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75 % of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was highest using a multi-breed reference panel (92.4 %) compared to a breed-specific reference panel (87.0 %) or a reference panel containing no breeds overlapping with the target panel (74.9 %). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1 %) compared to Beagle; Pearson correlation coefficients were slightly higher than the corresponding concordance values for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog with appropriate breed overlap between the target and reference panels.
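The two accuracy measures reported above, genotype concordance and the Pearson correlation coefficient, can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that missing calls are marked with None; the function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from math import sqrt


def _paired_calls(actual, imputed):
    """Keep only positions where both the actual and imputed calls are present."""
    if len(actual) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(a, i) for a, i in zip(actual, imputed)
             if a is not None and i is not None]
    if not pairs:
        raise ValueError("no overlapping non-missing genotypes")
    return pairs


def genotype_concordance(actual, imputed):
    """Fraction of non-missing genotypes where the imputed call equals the actual call."""
    pairs = _paired_calls(actual, imputed)
    return sum(a == i for a, i in pairs) / len(pairs)


def pearson_correlation(actual, imputed):
    """Pearson r between actual and imputed allele counts (or dosages)."""
    pairs = _paired_calls(actual, imputed)
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((a - mean_a) * (i - mean_i) for a, i in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_a == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_a * var_i)


# Toy example (hypothetical data): eight SNPs for one dog.
actual_calls = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_calls = [0, 1, 2, 1, 1, 0, 2, 2]

print(genotype_concordance(actual_calls, imputed_calls))
print(pearson_correlation(actual_calls, imputed_calls))
```

Concordance counts only exact matches, whereas the correlation also reflects how far a wrong call lies from the actual genotype, so the two measures need not agree numerically.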
Acknowledgments
SGF is supported by a National Institutes of Health T32 training award (5T32OD011130-07). Funding for whole genome sequencing was provided in part by the Poodle Club of America Foundation and the American Kennel Club Canine Health Foundation. Some whole genome sequencing data were graciously contributed by Drs. Leigh Anne Clark (13 dogs), Natasha J. Olby and Thierry Olivry (11 dogs), and Joshua A. Stern (2 dogs).
Authors' contributions
SGF collected samples, designed the study, analyzed the data, and wrote the manuscript. KMM collected samples and supervised the study. All authors have read and edited the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Cite this article
Friedenberg, S.G., Meurs, K.M. Genotype imputation in the domestic dog. Mamm Genome 27, 485–494 (2016). https://doi.org/10.1007/s00335-016-9636-9
DOI: https://doi.org/10.1007/s00335-016-9636-9