Evaluation of associative classification-based multifactor dimensionality reduction in the presence of noise

  • Suneetha Uppu
  • Aneesh Krishna
Original Article


Advances in genetic epidemiology have increasingly focused on understanding the associations and functional relationships among genes. Identifying susceptibility genes and their interaction effects on complex traits remains statistically and computationally challenging. An associative classification-based multifactor dimensionality reduction method (MDRAC) was previously proposed to improve the identification of multi-locus interacting genes associated with a disease. That method was evaluated on one- to six-locus models while varying heritability, minor allele frequency, case–control ratio, and sample size, and the experimental results showed significant improvements in accuracy over previous methods. However, the performance of MDRAC in the presence of noise due to genotyping error, missing data, phenocopy, and genetic heterogeneity is unknown. The goal of this study is to evaluate MDRAC for identifying single nucleotide polymorphism (SNP) interactions in the presence of noise. Several experiments were conducted on simulated datasets and on a published dataset to demonstrate the performance of MDRAC. On average, MDRAC outperformed the previous MDR method across all models. However, its performance decreases in the presence of phenocopy and genetic heterogeneity, and when these are combined with other sources of noise.
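To make the baseline and the noise sources concrete, the Python sketch below illustrates the classic MDR pooling step that MDRAC is compared against. It is not the MDRAC algorithm itself, whose associative classification rules are not described in this abstract. In this step, each multi-locus genotype cell is labelled high risk when its case:control ratio meets or exceeds a threshold T, which is taken here as the overall case:control ratio. The sketch also shows two of the four noise sources, genotyping error and missing data, injected at hypothetical rates. Assumptions: genotypes are coded 0/1/2; missing calls are coded -1 and treated as their own genotype level; models are scored by balanced accuracy on the training data, without the cross-validation MDR normally uses. The toy dataset and all function names are hypothetical. Phenocopy and genetic heterogeneity are not simulated.

```python
import random
from collections import Counter
from itertools import combinations, product

MISSING = -1  # hypothetical code for a missing genotype call


def high_risk_cells(genotypes, labels, loci, threshold=None):
    """MDR pooling step: label a multi-locus genotype cell high risk when its
    case:control ratio meets or exceeds threshold T (default: the overall
    case:control ratio of the data)."""
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        threshold = sum(cases.values()) / max(sum(controls.values()), 1)
    high = set()
    for cell in set(cases) | set(controls):
        # A cell with cases but no controls has an unbounded ratio: high risk.
        if controls[cell] == 0 or cases[cell] / controls[cell] >= threshold:
            high.add(cell)
    return high


def balanced_accuracy(genotypes, labels, loci, high):
    """Mean of sensitivity and specificity of the high/low-risk prediction."""
    tp = fn = fp = tn = 0
    for row, y in zip(genotypes, labels):
        predicted_case = tuple(row[i] for i in loci) in high
        if y == 1 and predicted_case:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def best_combination(genotypes, labels, order=2):
    """Exhaustively score every `order`-locus combination (no cross-validation)."""
    n_snps = len(genotypes[0])
    scores = {}
    for loci in combinations(range(n_snps), order):
        high = high_risk_cells(genotypes, labels, loci)
        scores[loci] = balanced_accuracy(genotypes, labels, loci, high)
    return max(scores, key=scores.get), scores


def add_genotyping_error(genotypes, rate, rng):
    """With probability `rate`, replace a genotype by a different 0/1/2 value."""
    noisy = []
    for row in genotypes:
        new_row = []
        for g in row:
            if rng.random() < rate:
                g = rng.choice([x for x in (0, 1, 2) if x != g])
            new_row.append(g)
        noisy.append(new_row)
    return noisy


def add_missing_data(genotypes, rate, rng):
    """With probability `rate`, mark a genotype as missing (kept as its own level)."""
    return [[MISSING if rng.random() < rate else g for g in row] for row in genotypes]


def toy_dataset(copies=10):
    """Hypothetical 3-SNP data: a subject is a case iff (SNP0 + SNP1) is odd."""
    genotypes, labels = [], []
    for g0, g1, g2 in product((0, 1, 2), repeat=3):
        for _ in range(copies):
            genotypes.append([g0, g1, g2])
            labels.append((g0 + g1) % 2)
    return genotypes, labels


if __name__ == "__main__":
    genotypes, labels = toy_dataset()
    print(best_combination(genotypes, labels)[0])  # (0, 1)
    rng = random.Random(42)
    noisy = add_missing_data(add_genotyping_error(genotypes, 0.05, rng), 0.05, rng)
    print(best_combination(noisy, labels)[1])
```

On the clean toy data, the interacting pair (0, 1) reaches a balanced accuracy of 1.0 and the other pairs reach 0.65. Printing the scores after noise injection shows how the signal degrades. A real evaluation would typically add cross-validation and repeated simulations rather than scoring a single dataset on its own training data.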


Keywords: Epistasis · Multifactor dimensionality reduction · Genotyping error · Missing data · Phenocopy · Genetic heterogeneity



We thank John Wallace of the Ritchie Lab, Pennsylvania State University, for his expert assistance in simulating the datasets in the presence of common sources of noise. We appreciate the generosity of Dr. Jason Moore and his colleagues at Dartmouth Medical School in making the MDR software tool and Java source code available. We also thank Dr. Juan R. González and his colleagues for developing the SNPassoc package, available for the R environment along with its datasets.



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Department of Computing, Curtin University, Bentley, Perth, Australia
