Abstract
A major challenge in epistasis detection is the sheer volume of data: the number of candidate SNP combinations grows combinatorially with the number of markers. This study focuses on a two-stage approach that searches for epistasis only among single nucleotide polymorphisms (SNPs) that show some marginal effect. We present this two-stage approach, based on the fusion of two criteria (TwoFC), to detect epistatic interactions. As the scoring function, we fuse the G² test and the absolute probability difference function to measure the strength of association between SNPs and disease status. The fused scoring function serves as an effective measure of the strength of such an association, and the two-stage strategy greatly reduces the computational load of epistasis detection. We evaluate the method on both simulated data sets and a real disease data set. On the simulated data sets, TwoFC exhibits high power and sample efficiency; on the real disease data set, it performs well even at large scale.
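The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact form of the absolute probability difference or of the fusion rule, so the code assumes a sum of per-genotype differences between case and control frequencies, and a simple rank-sum fusion. All function names (`contingency`, `g2`, `apd`, `fused_scores`, `two_stage`) and the `keep` parameter are hypothetical. Genotypes are assumed to be coded 0/1/2, disease status 0 (control) or 1 (case), and both groups are assumed non-empty.

```python
from itertools import combinations

import numpy as np


def contingency(genotypes, status, n_levels):
    """Count table: rows = genotype level, columns = control (0) / case (1)."""
    table = np.zeros((n_levels, 2))
    np.add.at(table, (genotypes, status), 1)
    return table


def g2(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    mask = table > 0
    return 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))


def apd(table):
    """ASSUMED form: sum over genotypes of |P(g | case) - P(g | control)|."""
    freq = table / table.sum(axis=0, keepdims=True)
    return np.abs(freq[:, 1] - freq[:, 0]).sum()


def ranks(values):
    """Rank of each entry, 0 = smallest."""
    return np.argsort(np.argsort(values))


def fused_scores(X, y):
    """ASSUMED fusion rule: add the ranks obtained under the two criteria."""
    tables = [contingency(X[:, j], y, 3) for j in range(X.shape[1])]
    g = np.array([g2(t) for t in tables])
    a = np.array([apd(t) for t in tables])
    return ranks(g) + ranks(a)


def two_stage(X, y, keep):
    """Stage 1: keep the top-`keep` SNPs by fused marginal score.
    Stage 2: exhaustive pairwise G^2 search among the retained SNPs only."""
    kept = np.argsort(fused_scores(X, y))[::-1][:keep]
    best = None
    for i, j in combinations(sorted(kept), 2):
        joint = 3 * X[:, i] + X[:, j]  # 9 joint genotype levels
        score = g2(contingency(joint, y, 9))
        if best is None or score > best[0]:
            best = (score, (int(i), int(j)))
    return best
```

With `m` SNPs, an exhaustive pairwise scan tests `m * (m - 1) / 2` pairs, whereas stage 2 here tests only `keep * (keep - 1) / 2`, which is where the reduction in computational load comes from. The price is that pairs with no marginal effect are never examined.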
Acknowledgments
We thank Dr. Bing Han for providing the EpiBN code. This work was supported by the Program for New Century Excellent Talents in University (Grant NCET-10-0365), the National Natural Science Foundation of China (Grants 60973082, 11171369, 61272395, 61370171), the National Natural Science Foundation of Hunan Province (Grant 12JJ2041), and the Planned Science and Technology Project of Hunan Province (Grants 2009FJ3195, 2012FJ2012).
Cite this article
Liao, Z., Zeng, Q., Liao, B. et al. A Novel Two-Stage Approach for Epistasis Detection in Genome-Wide Case–Control Studies. Biochem Genet 52, 403–414 (2014). https://doi.org/10.1007/s10528-014-9656-7
DOI: https://doi.org/10.1007/s10528-014-9656-7