Brief Survey on Machine Learning in Epistasis

Abstract

In biology, the term “epistasis” denotes the effect of the interaction between one gene and another. A gene can interact with an independently assorting gene, located far away on the same chromosome or on an entirely different chromosome, and this interaction can strongly affect the function of both genes. These changes can in turn alter the outcome of biological processes, influencing the organism’s phenotype. Machine learning is an area of computer science that develops statistical methods able to recognize patterns in data. A typical machine learning algorithm consists of a training phase, in which the model learns to recognize specific trends in the data, and a test phase, in which the trained model applies what it has learned to recognize trends in previously unseen data. Scientists have applied machine learning to epistasis problems multiple times, especially to identify gene–gene interactions from genome-wide association study (GWAS) data. In this brief survey, we report and describe the main scientific articles published on data mining applied to epistasis. Our survey confirms the effectiveness of machine learning in this subfield of genetics.
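
As a rough illustration of the training and test phases mentioned in the abstract, the following minimal Python sketch simulates a toy data set in which the phenotype depends on the interaction of two SNPs, learns a lookup table for every SNP pair on a training split, and then evaluates the best pair on held-out samples. Everything here is an assumption made for illustration: the function names (simulate, train, accuracy, best_pair), the binary carrier coding, the XOR-like interaction model, the sample sizes, and the noise rate are hypothetical and are not taken from any of the surveyed articles.

```python
import itertools
import random
from collections import Counter, defaultdict


def simulate(n_samples, n_snps, causal=(0, 1), noise=0.05, seed=0):
    """Simulate binary genotypes (1 = carries the minor allele) and a phenotype.

    Hypothetical model: the phenotype is 1 when exactly one of the two causal
    SNPs is carried (XOR-like), so neither SNP is informative on its own in
    expectation. A fraction `noise` of labels is flipped at random.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        genotype = tuple(rng.choice((0, 1)) for _ in range(n_snps))
        label = int(genotype[causal[0]] != genotype[causal[1]])
        if rng.random() < noise:
            label = 1 - label
        data.append((genotype, label))
    return data


def train(train_set, pair):
    """Training phase: majority phenotype for each genotype combination of a pair."""
    counts = defaultdict(Counter)
    for genotype, label in train_set:
        counts[(genotype[pair[0]], genotype[pair[1]])][label] += 1
    return {combo: c.most_common(1)[0][0] for combo, c in counts.items()}


def accuracy(model, pair, dataset, default=0):
    """Fraction of samples whose phenotype the lookup table predicts correctly."""
    hits = 0
    for genotype, label in dataset:
        combo = (genotype[pair[0]], genotype[pair[1]])
        hits += model.get(combo, default) == label
    return hits / len(dataset)


def best_pair(train_set, n_snps):
    """Exhaustively score all SNP pairs, using the training set only."""
    scored = []
    for pair in itertools.combinations(range(n_snps), 2):
        model = train(train_set, pair)
        scored.append((accuracy(model, pair, train_set), pair, model))
    return max(scored, key=lambda t: t[0])


data = simulate(n_samples=1000, n_snps=8)
train_set, test_set = data[:700], data[700:]  # training split and held-out test split

train_acc, pair, model = best_pair(train_set, n_snps=8)
test_acc = accuracy(model, pair, test_set)    # test phase: unseen samples only
print("selected pair:", pair)
print("training accuracy:", round(train_acc, 3))
print("test accuracy:", round(test_acc, 3))
```

The exhaustive pairwise search is affordable only because the toy data set has eight SNPs; with genome-wide marker counts the number of pairs grows quadratically, which is one reason heuristic and machine learning approaches are of interest for this problem.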

Acknowledgements

The authors thank Ka Chun Wong (City University of Hong Kong) for having curated this book.

Author information

Corresponding author

Correspondence to Davide Chicco.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Chicco, D., Faultless, T. (2021). Brief Survey on Machine Learning in Epistasis. In: Wong, KC. (eds) Epistasis. Methods in Molecular Biology, vol 2212. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0947-7_11

  • DOI: https://doi.org/10.1007/978-1-0716-0947-7_11

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0946-0

  • Online ISBN: 978-1-0716-0947-7

  • eBook Packages: Springer Protocols
