A fast algorithm for Bayesian multi-locus model in genome-wide association studies
Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits. A recently developed linear mixed model that estimates heritability by fitting all SNPs simultaneously suggests that common variants can explain a substantial fraction of heritability, which hints at the low power of the single-variant analysis typically used in GWAS. Consequently, many multi-locus shrinkage models have been proposed under a Bayesian framework. However, most rely on Markov chain Monte Carlo (MCMC) algorithms, which are time-consuming and challenging to apply to GWAS data. Here, we propose a fast algorithm for the Bayesian adaptive lasso based on variational inference (BAL-VI). Extensive simulations and real data analysis indicate that our model outperforms the well-known Bayesian lasso and Bayesian adaptive lasso models in accuracy and speed. BAL-VI can complete a simultaneous analysis of a lung cancer GWAS dataset with ~3400 subjects and ~570,000 SNPs in about half a day.
Keywords: Genome-wide association studies; Multi-locus model; Bayesian adaptive lasso; Variational inference; Variable selection
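To make the idea in the abstract concrete, the following is a minimal sketch of how variational inference can replace MCMC sampling for an adaptive-lasso prior. It is not the authors' BAL-VI implementation. It assumes a quantitative trait, a known residual variance `sigma2`, and the standard scale-mixture form of the adaptive lasso (beta_j ~ N(0, tau2_j), tau2_j ~ Exponential(rate lam2_j / 2), lam2_j ~ Gamma(a0, b0)). The function names `bal_cavi` and `simulate` and the hyperparameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def bal_cavi(X, y, sigma2=1.0, a0=1.0, b0=1.0, n_iter=100):
    """Coordinate-ascent variational inference under an adaptive-lasso prior.

    Assumed model: y ~ N(X beta, sigma2 * I), beta_j ~ N(0, tau2_j),
    tau2_j ~ Exp(rate = lam2_j / 2), lam2_j ~ Gamma(shape a0, rate b0).
    Returns the means and variances of the factors q(beta_j) = N(mu_j, s2_j).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in c) for c in cols]
    mu = [0.0] * p
    s2 = [1.0] * p
    e_inv_tau2 = [1.0] * p       # E_q[1 / tau2_j]
    e_lam2 = [a0 / b0] * p       # E_q[lam2_j], started at the prior mean
    resid = list(y)              # y - X mu, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            c = cols[j]
            # x_j' (y - sum_{k != j} x_k mu_k), computed from the residual
            xr = sum(c[i] * resid[i] for i in range(n)) + xtx[j] * mu[j]
            # Gaussian factor q(beta_j)
            s2[j] = 1.0 / (xtx[j] / sigma2 + e_inv_tau2[j])
            new_mu = s2[j] * xr / sigma2
            delta = new_mu - mu[j]
            for i in range(n):
                resid[i] -= c[i] * delta
            mu[j] = new_mu
            # q(tau2_j) is generalized inverse Gaussian; both moments
            # needed below are available in closed form
            e_beta2 = mu[j] ** 2 + s2[j]
            e_inv_tau2[j] = math.sqrt(e_lam2[j] / e_beta2)
            e_tau2 = math.sqrt(e_beta2 / e_lam2[j]) + 1.0 / e_lam2[j]
            # q(lam2_j) is Gamma(a0 + 1, b0 + E[tau2_j] / 2)
            e_lam2[j] = (a0 + 1.0) / (b0 + 0.5 * e_tau2)
    return mu, s2


def simulate(n=200, p=20, seed=0):
    """Toy data: three causal predictors, the rest null (illustrative only)."""
    rng = random.Random(seed)
    beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0)
         for row in X]
    return X, y, beta


X, y, beta_true = simulate()
mu, s2 = bal_cavi(X, y)
print([round(m, 2) for m in mu[:5]])
```

Each sweep costs on the order of n × p operations and involves no sampling, which is the general reason variational schemes of this kind scale better than MCMC. A real GWAS analysis would additionally need a model for case-control outcomes, an estimate of the residual variance, and a convergence check rather than a fixed number of sweeps.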
We thank the participants and staff for their important contributions to this study. Special thanks to reviewers for their insightful and helpful suggestions. This research is supported by the National Natural Science Foundation of China (81373102, 81402764, 81473070, and 81530088), Research and Innovation Project for College Graduates of Jiangsu Province of China (KYLX16_1123), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Compliance with ethical standards
Conflict of interest
Authors WD, YZ, YW, SY, JB, SS, MD, LH, ZH, and FC declare that they have no conflict of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional ethics committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.