Abstract
To develop medical treatments and preventive measures, the associations between diseases and genetic variants need to be identified. The main goal of a genome-wide association study (GWAS) is to discover the underlying causes of vulnerability to disease and to use this knowledge to develop prevention and treatment strategies. Although many methods have been proposed for the search for epistasis, there is no standard approach for detecting it, and detection remains difficult because of limited statistical power. GenEpi is a Python package that uses a machine learning model with a two-level workflow to detect within-gene and cross-gene epistasis. This protocol chapter demonstrates the usage of GenEpi with example data. The package follows a three-step procedure: it reduces dimensionality, selects within-gene epistasis, and then selects cross-gene epistasis. The package also provides a means to build prediction models that combine genetic features with environmental influences.
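The three-step procedure can be illustrated with a toy sketch in plain Python. This is not GenEpi's API or its actual algorithm: every function name, the duplicate-SNP filter standing in for dimensionality reduction, the case/control frequency-difference score, and the toy genotype data are assumptions made only to show how within-gene candidates feed the cross-gene search.

```python
from itertools import combinations


def assoc_score(feature, phenotype):
    """Hypothetical score: absolute difference in carrier frequency, cases vs. controls."""
    cases = [f for f, y in zip(feature, phenotype) if y == 1]
    controls = [f for f, y in zip(feature, phenotype) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def interaction(a, b):
    """Element-wise product: 1 only when a sample carries both features."""
    return [x * y for x, y in zip(a, b)]


def reduce_dimensionality(genotypes, gene_to_snps):
    """Step 1 (crude stand-in): within each gene, drop SNPs identical to an earlier SNP."""
    reduced = {}
    for gene, snps in gene_to_snps.items():
        seen, kept = set(), []
        for snp in snps:
            key = tuple(genotypes[snp])
            if key not in seen:
                seen.add(key)
                kept.append(snp)
        reduced[gene] = kept
    return reduced


def select_within_gene(genotypes, gene_to_snps, phenotype, top_k=2):
    """Step 2: per gene, rank single SNPs and SNP pairs, keep the top_k features."""
    selected = {}
    for gene, snps in gene_to_snps.items():
        candidates = {(s,): genotypes[s] for s in snps}
        for s1, s2 in combinations(snps, 2):
            candidates[(s1, s2)] = interaction(genotypes[s1], genotypes[s2])
        ranked = sorted(
            candidates,
            key=lambda k: assoc_score(candidates[k], phenotype),
            reverse=True,
        )
        selected[gene] = {k: candidates[k] for k in ranked[:top_k]}
    return selected


def select_cross_gene(selected, phenotype, top_k=1):
    """Step 3: combine the per-gene features across gene pairs and rank the products."""
    scored = []
    for g1, g2 in combinations(selected, 2):
        for k1, f1 in selected[g1].items():
            for k2, f2 in selected[g2].items():
                scored.append((k1 + k2, assoc_score(interaction(f1, f2), phenotype)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (invented): 8 samples, binary carrier indicators per SNP.
genotypes = {
    "A1": [1, 1, 1, 1, 0, 0, 0, 0],
    "A1_dup": [1, 1, 1, 1, 0, 0, 0, 0],  # redundant copy of A1
    "A2": [1, 0, 1, 0, 1, 0, 1, 0],
    "B1": [1, 1, 0, 0, 1, 1, 0, 0],
    "B2": [0, 1, 1, 0, 0, 1, 1, 0],
}
gene_to_snps = {"GENE_A": ["A1", "A1_dup", "A2"], "GENE_B": ["B1", "B2"]}
# A sample is a case only when it carries both A1 and B1 (a cross-gene interaction).
phenotype = interaction(genotypes["A1"], genotypes["B1"])

reduced = reduce_dimensionality(genotypes, gene_to_snps)
within = select_within_gene(genotypes, reduced, phenotype)
cross = select_cross_gene(within, phenotype)
print(reduced)
print(cross)
```

Restricting the cross-gene search to features that survive the within-gene step keeps the number of candidate combinations small, which is the general motivation for a two-level design; the scoring and selection rules used by GenEpi itself may differ from this sketch.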
Acknowledgments
The work described in this paper was substantially supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region ([CityU 11203217] and [CityU 11200218]) and the funding from the Hong Kong Institute for Data Science (HKIDS) at the City University of Hong Kong. The work described in this paper was partially supported by a grant from the City University of Hong Kong (CityU 11202219).
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Petinrin, O.O., Wong, KC. (2021). Protocol for Epistasis Detection with Machine Learning Using GenEpi Package. In: Wong, KC. (eds) Epistasis. Methods in Molecular Biology, vol 2212. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0947-7_18
DOI: https://doi.org/10.1007/978-1-0716-0947-7_18
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0946-0
Online ISBN: 978-1-0716-0947-7
eBook Packages: Springer Protocols