Abstract
Motivation
Linear and nonlinear interactions among multiple single-nucleotide polymorphisms (SNPs) play an important role in the genetic basis of complex human diseases. However, the combinatorial search over a high-dimensional SNP space makes detecting multiorder SNP interactions extremely challenging. Most classic approaches can perform only one task per run, that is, detect interactions of a single fixed order k. Because prior knowledge of a complex disease is usually unavailable, the appropriate value of k is difficult to determine in advance.
Methods
A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions. It is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel. For each task, two subpopulations are generated, one evaluated with the Bayesian network-based K2-score and the other with the Jensen–Shannon divergence (JS-score), to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G-test is applied to the candidate solutions to verify their authenticity and reduce the error rate.
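The abstract names the three statistics but does not give their formulas. The following is a minimal sketch of how they could be computed for one candidate SNP combination, assuming the standard definitions: the log of the Cooper–Herskovits K2 metric with a binary disease node, the Jensen–Shannon divergence between the genotype-combination distributions of cases and controls, and the likelihood-ratio G statistic on the genotype-by-class contingency table. The function names, the 0/1/2 genotype coding, and the toy data are illustrative assumptions, not taken from the MTACO-DMSI repository, and the sign convention of the K2-score used by the authors may differ.

```python
import math
from collections import Counter


def contingency(genotypes, phenotypes, snps):
    """Count (genotype combination, class) cells for the chosen SNP columns."""
    table = Counter()
    for row, y in zip(genotypes, phenotypes):
        table[(tuple(row[s] for s in snps), y)] += 1
    return table


def k2_log_score(table):
    """Log K2 metric (assumed form); higher means a better fit."""
    score = 0.0
    for combo in {c for c, _ in table}:
        n0, n1 = table[(combo, 0)], table[(combo, 1)]
        # log[(r-1)! / (n_i + r - 1)!] + sum_j log(n_ij!) with r = 2 classes
        score += math.lgamma(2) - math.lgamma(n0 + n1 + 2)
        score += math.lgamma(n0 + 1) + math.lgamma(n1 + 1)
    return score


def js_score(table):
    """Jensen-Shannon divergence (base 2) between case and control profiles."""
    cells = {c for c, _ in table}
    n_ctrl = sum(table[(c, 0)] for c in cells)
    n_case = sum(table[(c, 1)] for c in cells)
    js = 0.0
    for c in cells:
        p = table[(c, 1)] / n_case
        q = table[(c, 0)] / n_ctrl
        m = 0.5 * (p + q)
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) and its degrees of freedom."""
    cells = {c for c, _ in table}
    total = sum(table.values())
    col = {y: sum(table[(c, y)] for c in cells) for y in (0, 1)}
    g = 0.0
    for c in cells:
        row = table[(c, 0)] + table[(c, 1)]
        for y in (0, 1):
            obs = table[(c, y)]
            if obs > 0:  # empty cells contribute nothing to the sum
                g += 2.0 * obs * math.log(obs * total / (row * col[y]))
    dof = (len(cells) - 1) * (2 - 1)
    return g, dof


# Toy data (hypothetical): SNP 0 and SNP 1 interact, SNP 2 is pure noise.
genotypes = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)]
phenotypes = [(a + b) % 2 for a, b, _ in genotypes]

pair = contingency(genotypes, phenotypes, (0, 1))
noise = contingency(genotypes, phenotypes, (2,))
print("K2 (log):", k2_log_score(pair), "JS:", js_score(pair))
print("G, dof:", g_statistic(pair))
```

On this toy data the interacting pair scores higher than the noise SNP under all three statistics. Turning G into a p-value requires the chi-square survival function with `dof` degrees of freedom, for example `scipy.stats.chi2.sf(g, dof)` if SciPy is available.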
Results
Three multiorder simulated disease models with different interaction effects and three real datasets, for age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulated disease models than traditional single-task algorithms. The results on the real AMD, RA and T1D datasets indicate that MTACO-DMSI can detect multiorder SNP interactions at a genome-wide scale.
Availability and implementation
https://github.com/shouhengtuo/MTACO-DMSI/
Acknowledgements
Shouheng Tuo first proposed the idea of the MTACO-DMSI algorithm, guided the writing of the paper, and revised the manuscript in detail. Chao Li proposed ideas for improving the performance of MTACO-DMSI, performed all the experiments, and wrote and revised the paper. Fan Liu, Yanling Zhu, Tianrui Chen, Zhenyu Feng, Haiyan Liu, and Aimin Li also contributed many ideas to this work. This study uses data generated by the Wellcome Trust Case–Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113, 085475 and 090355.
Funding
This work was supported in part by the Natural Science Foundation of Shaanxi Provincial Education Department (Grant No. 18JK0165) and Natural Science Foundation of China under Grant No. 62002289.
Contributions
A novel multitasking ant colony optimization algorithm is proposed for the detection of multiorder SNP interactions. A new multipopulation mechanism is presented to enhance the global search ability. Two complementary score functions are employed to improve the discrimination ability for various disease models. An effective knowledge transfer mechanism is designed to share interacting SNPs among tasks, which helps each task detect multiorder SNP interactions more rapidly.
Ethics declarations
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Tuo, S., Li, C., Liu, F. et al. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci Comput Life Sci 14, 814–832 (2022). https://doi.org/10.1007/s12539-022-00530-2
DOI: https://doi.org/10.1007/s12539-022-00530-2