Molecular Genetics and Genomics, Volume 292, Issue 4, pp 923–934

A fast algorithm for Bayesian multi-locus model in genome-wide association studies

  • Weiwei Duan
  • Yang Zhao
  • Yongyue Wei
  • Sheng Yang
  • Jianling Bai
  • Sipeng Shen
  • Mulong Du
  • Lihong Huang
  • Zhibin Hu
  • Feng Chen
Methods Paper

Abstract

Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits. A recently developed linear mixed model for estimating heritability by simultaneously fitting all SNPs suggests that common variants can explain a substantial fraction of heritability, which points to the low power of the single-variant analysis typically used in GWAS. Consequently, many multi-locus shrinkage models have been proposed under a Bayesian framework. However, most rely on Markov chain Monte Carlo (MCMC) algorithms, which are time-consuming and challenging to apply to GWAS data. Here, we propose a fast algorithm for the Bayesian adaptive lasso based on variational inference (BAL-VI). Extensive simulations and real data analysis indicate that our model outperforms the well-known Bayesian lasso and Bayesian adaptive lasso models in both accuracy and speed. BAL-VI can complete a simultaneous analysis of a lung cancer GWAS dataset with ~3400 subjects and ~570,000 SNPs in about half a day.

Keywords

  • Genome-wide association studies
  • Multi-locus model
  • Bayesian adaptive lasso
  • Variational inference
  • Variable selection

Notes

Acknowledgements

We thank the participants and staff for their important contributions to this study. Special thanks to reviewers for their insightful and helpful suggestions. This research is supported by the National Natural Science Foundation of China (81373102, 81402764, 81473070, and 81530088), Research and Innovation Project for College Graduates of Jiangsu Province of China (KYLX16_1123), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Compliance with ethical standards

Conflict of interest

Authors WD, YZ, YW, SY, JB, SS, MD, LH, ZH, and FC declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional ethics committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Supplementary material

Supplementary material 1: 438_2017_1322_MOESM1_ESM.docx (DOCX, 2136 kB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Weiwei Duan (1, 2, 3, 4)
  • Yang Zhao (1, 2, 3, 4)
  • Yongyue Wei (1, 2, 3, 4)
  • Sheng Yang (1, 2, 3, 4)
  • Jianling Bai (1, 2, 3, 4)
  • Sipeng Shen (1, 2, 3, 4)
  • Mulong Du (1, 2, 3, 4)
  • Lihong Huang (1, 2, 3, 4)
  • Zhibin Hu (2, 5, 6)
  • Feng Chen (1, 2, 3, 4)

  1. Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
  2. The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
  3. Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health/Harvard School of Public Health, Nanjing, China
  4. Key Laboratory of Biomedical Big Data, Nanjing Medical University, Nanjing, China
  5. Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing, China
  6. Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China
