
Defining and Discovering Interactive Causes

Advances in Biomedical Informatics

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 137)


Abstract

The problem of learning causal influences from passive data has attracted a good deal of attention in the past 30 years, and many techniques have been developed and tested. These techniques assume the composition property, which entails that they cannot, in general, learn interactive causes that have little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, which is the interaction of two or more genetic loci to affect a phenotype. Often the individual loci exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Although efforts have recently been made to develop algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop two algorithms. The first is an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets. The second is a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to seven algorithms that learn genetic epistasis from high-dimensional datasets, and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real late-onset Alzheimer’s disease (LOAD) dataset and obtain results that substantiate previous research, as well as new results. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset and learn interactions that agree with the judgements of a breast cancer oncologist.
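The abstract does not give the formula for IS. The sketch below assumes a two-cause reading that is consistent with the properties stated above. It is not taken from the chapter. The assumed information gain is IG(Z; X, Y) = I(Z; X, Y) − I(Z; X) − I(Z; Y), normalized as IS = IG / I(Z; X, Y). The subtracted marginal terms are non-negative, so under this reading IS is at most 1, and it equals 1 exactly when both marginal terms are zero. The function names are hypothetical.

The XOR target used here is the standard case that breaks the composition property. Z is independent of X and of Y separately, yet it is fully determined by the pair.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Empirical I(A; B) in bits from a list of equally weighted (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
    return sum((c / n) * log2(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())


def interaction_strength(rows):
    """Hypothetical helper: (IG, IS) for rows of (x, y, z) under the ASSUMED two-cause definition."""
    i_x = mutual_information([(x, z) for x, _, z in rows])
    i_y = mutual_information([(y, z) for _, y, z in rows])
    i_xy = mutual_information([((x, y), z) for x, y, z in rows])
    ig = i_xy - i_x - i_y  # information about Z beyond the two marginal effects
    is_ = ig / i_xy if i_xy > 0 else 0.0  # assumption: IS = 0 when the pair carries no information
    return ig, is_


# Pure epistasis: Z = X XOR Y, so neither cause has a marginal effect.
xor_rows = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]
print(interaction_strength(xor_rows))       # (1.0, 1.0)

# Purely marginal effect: Z = X, and Y is irrelevant.
marginal_rows = [(x, y, x) for x, y in product([0, 1], repeat=2)]
print(interaction_strength(marginal_rows))  # (0.0, 0.0)
```

Under this assumed definition, IG can be negative when the causes carry redundant information. That is why the bound holds only from above.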
Our algorithms are directly applicable only to problems in which there is a specified target and a set of candidate causes. However, they could be used for general causal learning by serving as a front end to a standard causal learning algorithm.
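With a specified target and a small set of candidate causes, the candidate interactions can be enumerated exhaustively. The sketch below is a simplified, hypothetical illustration and is not the chapter's Exhaustive-IGain. It differs in three ways:

- It enumerates only pairs of candidates.
- It scores each pair with an assumed two-cause definition, IS = [I(Z; X, Y) − I(Z; X) − I(Z; Y)] / I(Z; X, Y).
- It replaces the Bayesian network scoring used in the chapter with a plain mutual-information cutoff. The cutoff is `min_joint_mi`, a hypothetical parameter.

The simulated target is Z = X1 XOR X3 among four binary candidates.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def mutual_information(pairs):
    """Empirical I(A; B) in bits from a list of equally weighted (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum((c / n) * log2(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())


def exhaustive_pair_search(data, target, candidates, min_joint_mi=0.05):
    """Hypothetical helper: score every pair of candidate causes of `target` by an ASSUMED IS.

    data maps each variable name to a list of discrete values (all the same length).
    Returns a list of ((a, b), IS, I(Z; a, b)) sorted by decreasing IS.
    """
    z = data[target]
    results = []
    for a, b in combinations(candidates, 2):
        i_ab = mutual_information(list(zip(zip(data[a], data[b]), z)))
        if i_ab < min_joint_mi:
            continue  # the pair tells us (almost) nothing about the target
        i_a = mutual_information(list(zip(data[a], z)))
        i_b = mutual_information(list(zip(data[b], z)))
        results.append(((a, b), (i_ab - i_a - i_b) / i_ab, i_ab))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Simulated data: all 16 settings of four binary candidates; the target is X1 XOR X3.
names = ["X1", "X2", "X3", "X4"]
rows = list(product([0, 1], repeat=len(names)))
data = {name: [row[i] for row in rows] for i, name in enumerate(names)}
data["Z"] = [x1 ^ x3 for x1, x3 in zip(data["X1"], data["X3"])]

ranked = exhaustive_pair_search(data, "Z", names)
print(ranked)  # [(('X1', 'X3'), 1.0, 1.0)]
```

The number of k-subsets grows as C(n, k), so exhaustive enumeration suits only the low-dimensional setting. High-dimensional data call for a heuristic search such as MBS-IGain.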


Notes

  1. Adam Brufsky, MD, PhD, Professor of Medicine at the University of Pittsburgh School of Medicine.


Acknowledgements

Funding

This work was supported by National Library of Medicine grants R00LM010822, R01LM011663, and R01LM011962.

Author information

Correspondence to Xia Jiang.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Jiang, X., Neapolitan, R. (2018). Defining and Discovering Interactive Causes. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_4

  • DOI: https://doi.org/10.1007/978-3-319-67513-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67512-1

  • Online ISBN: 978-3-319-67513-8

  • eBook Packages: Engineering, Engineering (R0)
