Abstract
Key message
Building on a simplified FaST-LMM in which genomic variance is replaced with heritability, we significantly improve computational efficiency by using the fast R/fastLmPure function to statistically infer the genetic effects of tested SNPs and by focusing only on SNPs with large effects or high significance, as identified by the EMMAX algorithm.
Abstract
For genome-wide mixed-model association analysis, we introduce fastLmPure, a barebones linear model fitting function from the R/RcppArmadillo package, for the rapid estimation of single nucleotide polymorphism (SNP) effects and of the maximum likelihood values of factored spectrally transformed linear mixed models (FaST-LMM). Starting from the genomic heritability of a quantitative trait estimated under a null model without quantitative trait nucleotides, maximum likelihood estimation of the polygenic heritability for every candidate marker takes about as long as four rounds of genome-wide regression scans. When the analysis focuses only on SNPs with large effects or high significance levels, as estimated by the efficient mixed-model association expedited (EMMAX) algorithm, the run time of genome-wide mixed-model association analysis is reduced to at most two rounds of genome-wide regression scans. We have developed a novel software application called Single-RunKing that transforms nonlinear mixed-model association analyses into barebones linear regression scans. Based on a realised relationship matrix calculated from genome-wide markers, Single-RunKing significantly reduces computation time compared with a FaST-LMM that optimises the ratio of polygenic variance to residual variance using the R/lm function.
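The core idea of the abstract, a regression scan at a fixed heritability, can be illustrated with a minimal sketch. It is not the Single-RunKing implementation: it is written in plain Python rather than with R/fastLmPure, the function names `fit_two_columns` and `scan_fixed_h2` are hypothetical, and it assumes that the phenotype, the intercept column and every SNP column have already been rotated by the eigenvectors of the relationship matrix, so that the variance of observation i is proportional to h2 * s_i + (1 - h2), where s_i is the i-th eigenvalue. Under those assumptions, rescaling each row by the inverse square root of that variance turns the mixed model into an ordinary least-squares fit per SNP.

```python
import math


def fit_two_columns(y, c, x):
    """Least squares of y on the columns c and x (no implicit intercept).

    Returns the coefficient of x and its standard error.
    """
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx  # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("SNP column is collinear with the intercept column")
    b_c = (sxx * scy - scx * sxy) / det
    b_x = (scc * sxy - scx * scy) / det
    rss = sum((yi - b_c * ci - b_x * xi) ** 2 for yi, ci, xi in zip(y, c, x))
    se_x = math.sqrt(rss / (len(y) - 2) * scc / det)
    return b_x, se_x


def scan_fixed_h2(y_rot, c_rot, snps_rot, eigenvalues, h2):
    """Scan all SNPs with the heritability h2 held fixed (0 <= h2 < 1).

    All inputs are assumed to be in the rotated (spectrally transformed)
    space; eigenvalues are those of the relationship matrix.
    """
    # Row scaling that makes the residual variance constant across rows.
    scale = [1.0 / math.sqrt(h2 * s + (1.0 - h2)) for s in eigenvalues]
    y_w = [w * v for w, v in zip(scale, y_rot)]
    c_w = [w * v for w, v in zip(scale, c_rot)]
    results = []
    for snp in snps_rot:
        x_w = [w * v for w, v in zip(scale, snp)]
        results.append(fit_two_columns(y_w, c_w, x_w))
    return results


if __name__ == "__main__":
    # Toy values for illustration only; they are not a real spectral rotation.
    eigenvalues = [2.0, 1.0, 0.5, 0.5]
    y_rot = [1.0, 2.0, 2.0, 4.0]
    c_rot = [1.0, 1.0, 1.0, 1.0]
    snps_rot = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 2.0]]
    for effect, se in scan_fixed_h2(y_rot, c_rot, snps_rot, eigenvalues, 0.5):
        print(effect, se)
```

Holding h2 at its null-model estimate corresponds to the single scan described for the EMMAX-style screen; re-estimating h2 for each retained SNP would repeat the same weighted fit over a small set of candidate h2 values and keep the one with the highest likelihood.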
Acknowledgements
This work was supported by the Special Scientific Research Funds for Central Non-profit Institutes, Chinese Academy of Fishery Sciences (2017A001), and China Marine Fish Industry Technology System (CARS-47-G02).
Author information
Contributions
RQY conceived the proposed method and designed the associated computer software. JG and ZYH developed the source code. XFZ and LJ collected real datasets and participated in simulations and case analyses. All authors read and approved the final manuscript.
Additional information
Communicated by Mikko J. Sillanpaa.
About this article
Cite this article
Gao, J., Zhou, X., Hao, Z. et al. Genome-wide barebones regression scan for mixed-model association analysis. Theor Appl Genet 133, 51–58 (2020). https://doi.org/10.1007/s00122-019-03439-5
DOI: https://doi.org/10.1007/s00122-019-03439-5