Effect of sample stratification on dairy GWAS results
Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effects of sample stratification and of stratification correction on the results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method, which accounts for correlations among all individuals; a generalized least squares (GLS) method based on the half-sib intraclass correlation; and a principal component analysis (PCA) approach.
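A minimal sketch of a half-sib GLS test for one SNP is shown below. It assumes the model y = μ + βg + e with genotypes g coded 0/1/2, a residual correlation ρ between cows that share a sire, and no correlation between sire families, with ρ estimated by one-way ANOVA. The function names `anova_intraclass_corr` and `gls_snp_test` are hypothetical, and this is not the authors' implementation. Because each family's correlation block has compound symmetry, its inverse has a closed form, so no general matrix inversion is needed.

```python
import math
from collections import defaultdict


def anova_intraclass_corr(y, family):
    """One-way ANOVA estimate of the phenotypic intraclass correlation among
    half-sibs. y: phenotypes; family: sire ID of each cow. Negative estimates
    are truncated to 0 (a choice made for this sketch)."""
    groups = defaultdict(list)
    for yi, f in zip(y, family):
        groups[f].append(yi)
    n, k = len(y), len(groups)
    grand = sum(y) / n
    means = {f: sum(v) / len(v) for f, v in groups.items()}
    ssb = sum(len(v) * (means[f] - grand) ** 2 for f, v in groups.items())
    ssw = sum((yi - means[f]) ** 2 for f, v in groups.items() for yi in v)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    # Average family size, adjusted for unequal family sizes.
    n0 = (n - sum(len(v) ** 2 for v in groups.values()) / n) / (k - 1)
    return max(0.0, (msb - msw) / (msb + (n0 - 1) * msw))


def gls_snp_test(y, g, family, rho):
    """GLS test of one SNP in y = mu + beta*g + e, with Var(e) = s2 * V and V
    block diagonal: 1 on the diagonal, rho between half-sibs, 0 between
    families. Requires 0 <= rho < 1. Returns (beta_hat, t_statistic)."""
    groups = defaultdict(list)
    for i, f in enumerate(family):
        groups[f].append(i)
    # Each block is (1 - rho) I + rho J, whose inverse is (I - c J) / (1 - rho)
    # with c = rho / (1 - rho + n_f * rho). Accumulate X'V^-1 X and X'V^-1 y
    # for the design X = [1, g].
    a11 = a12 = a22 = b1 = b2 = 0.0
    coef = {}
    for f, idx in groups.items():
        nf = len(idx)
        c = rho / (1 - rho + nf * rho)
        coef[f] = c
        sg = sum(g[i] for i in idx)
        sy = sum(y[i] for i in idx)
        a11 += (nf - c * nf * nf) / (1 - rho)
        a12 += (sg - c * nf * sg) / (1 - rho)
        a22 += (sum(g[i] ** 2 for i in idx) - c * sg * sg) / (1 - rho)
        b1 += (sy - c * nf * sy) / (1 - rho)
        b2 += (sum(g[i] * y[i] for i in idx) - c * sg * sy) / (1 - rho)
    # Solve the 2 x 2 normal equations for (mu, beta).
    det = a11 * a22 - a12 * a12
    mu = (a22 * b1 - a12 * b2) / det
    beta = (a11 * b2 - a12 * b1) / det
    # Residual variance s2 = r'V^-1 r / (n - 2), evaluated family by family.
    quad = 0.0
    for f, idx in groups.items():
        r = [y[i] - mu - beta * g[i] for i in idx]
        quad += (sum(v * v for v in r) - coef[f] * sum(r) ** 2) / (1 - rho)
    s2 = quad / (len(y) - 2)
    se = math.sqrt(s2 * a11 / det)  # a11 / det is the (2,2) element of (X'V^-1 X)^-1
    return beta, beta / se
```

With ρ = 0 the test reduces to ordinary least squares. EMMAX follows the same GLS logic but builds the covariance from genomic relationships among all cows rather than from the half-sib family structure.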
Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced back through approximately 10–15 generations of ancestors. Genome and phenotype stratifications overlapped strikingly with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows, and also contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction, whereas the PCA method using 20 principal components and the GLS method gave similar significance levels. Removing the elite cows from the analysis without stratification correction eliminated many of the effects that the three correction methods also removed, indicating that stratification correction could have removed some true effects attributable to the elite cows. SNP effects with good consensus between the different methods and with the effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, and a SNP 45 kb upstream of PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.
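The PCA correction can be sketched in the same style: principal components of the standardized genotype matrix are computed by power iteration and regressed out of both phenotype and genotype before the SNP is tested. The names `top_pcs` and `pca_adjusted_test`, the scaling by sqrt(2p(1-p)), and the fixed iteration count are assumptions of this sketch, which is meant for small illustrative inputs rather than a data set of 1,654 cows; the analysis described above used 20 principal components.

```python
import math
import random


def top_pcs(genotypes, k, iters=100, seed=1):
    """Top-k principal components (one score per cow) of an n x m genotype
    matrix coded 0/1/2. SNP columns are centered and scaled by sqrt(2p(1-p));
    eigenvectors of the n x n matrix Z Z'/m are found by power iteration,
    keeping each new vector orthogonal to those already found."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)
        sd = math.sqrt(2 * p * (1 - p)) or 1.0  # monomorphic SNP: column stays 0
        for i in range(n):
            z[i][j] = (genotypes[i][j] - 2 * p) / sd
    grm = [[sum(a * b for a, b in zip(z[i], z[l])) / m for l in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    pcs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(grm[i][l] * v[l] for l in range(n)) for i in range(n)]
            for u in pcs:  # remove components along earlier PCs
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # fewer than k nonzero eigenvalues
                return pcs
            v = [a / norm for a in w]
        pcs.append(v)
    return pcs


def pca_adjusted_test(y, g, pcs):
    """Single-SNP test with PCs as covariates. The PCs from top_pcs are
    orthonormal and orthogonal to the intercept, so regressing y and g on
    [1, PCs] and then regressing residual on residual gives the same SNP
    effect as the full model. Returns (beta_hat, t_statistic), n - k - 2 df."""
    n, k = len(y), len(pcs)

    def residualize(x):
        mean = sum(x) / n
        r = [a - mean for a in x]
        for u in pcs:
            d = sum(a * b for a, b in zip(r, u))
            r = [a - d * b for a, b in zip(r, u)]
        return r

    ry, rg = residualize(y), residualize(g)
    sgg = sum(a * a for a in rg)
    beta = sum(a * b for a, b in zip(rg, ry)) / sgg
    sse = sum((a - beta * b) ** 2 for a, b in zip(ry, rg))
    se = math.sqrt(sse / (n - k - 2) / sgg)
    return beta, beta / se
```

Including the PCs as covariates and residualizing on them give the same SNP effect (the Frisch-Waugh-Lovell theorem), so the residual form avoids solving a (k + 2)-dimensional linear system for every SNP.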
Genetic selection and extensive use of artificial insemination contributed to overlapping genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
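Screening for alleles that are rare in the remaining cows but common in an elite cluster only requires allele frequencies computed separately in the two groups. The sketch below assumes 0/1/2 genotype coding and a Boolean cluster label per cow; the function names and the 0.10 and 0.50 frequency thresholds are hypothetical and illustrative, not taken from the study.

```python
def allele_freqs(genotypes, in_group):
    """Frequency of the counted allele (genotypes coded 0/1/2) at each SNP,
    computed separately for cows in the group and for the remaining cows."""
    def freq(rows):
        n = len(rows)
        return [sum(r[j] for r in rows) / (2 * n) for j in range(len(rows[0]))]

    group = [g for g, e in zip(genotypes, in_group) if e]
    rest = [g for g, e in zip(genotypes, in_group) if not e]
    return freq(group), freq(rest)


def enriched_in_group(genotypes, in_group, max_rest=0.10, min_group=0.50):
    """Indices of SNPs whose allele is rare in the remaining cows but common
    in the group. Both thresholds are illustrative defaults."""
    p_group, p_rest = allele_freqs(genotypes, in_group)
    return [j for j, (pg, pr) in enumerate(zip(p_group, p_rest))
            if pr <= max_rest and pg >= min_group]
```

For example, a SNP whose allele has frequency 5/6 in three elite cows and 0 in four other cows is flagged, while SNPs with similar frequencies in both groups are not.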
- Open Access
- Online Date: October 2012
- Publisher: BioMed Central
- Author Affiliations
- 1. Department of Animal Science, University of Minnesota, St. Paul, Minnesota, USA
- 2. Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA
- 3. Bovine Functional Genomics Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA
- 4. Holstein Association USA, Brattleboro, Vermont, USA
- 5. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA