Abstract
The problem of learning causal influences from passive data has attracted considerable attention over the past 30 years, and many techniques have been developed and tested. These techniques assume the composition property, which means that they cannot in general learn interacting causes that have little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, the interaction of two or more genetic loci to affect phenotype; often the individual genes exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Although efforts have recently been made to develop new algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets, and a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to 7 algorithms that learn genetic epistasis from high-dimensional datasets, and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real late-onset Alzheimer’s disease (LOAD) dataset, and obtain results that substantiate previous research as well as new results. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset, and learn interactions that agree with the judgements of a breast cancer oncologist.
Our algorithms are directly applicable only to problems in which a target and its candidate causes are specified. However, they could be used for general causal learning by serving as a front end to a standard causal learning algorithm.
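As a minimal illustration of what "interactive causes with no marginal effects" means, the sketch below builds a toy dataset in which a binary target is the exclusive-or of two binary causes and estimates information gain from the data. It is not the chapter's definition of IS or either IGain algorithm; the function names, the toy data, and the final ratio are assumptions introduced only for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(target, causes):
    """Estimate H(target) - H(target | causes) from data.

    `causes` is a list of equal-length columns (hypothetical helper,
    not an API from the chapter).
    """
    n = len(target)
    groups = {}
    for row, t in zip(zip(*causes), target):
        groups.setdefault(row, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Toy data: every combination of two binary causes, each seen once.
x1, x2 = zip(*product([0, 1], repeat=2))
target = [a ^ b for a, b in zip(x1, x2)]  # pure XOR interaction

gain_x1 = information_gain(target, [x1])         # marginal gain of cause 1
gain_x2 = information_gain(target, [x2])         # marginal gain of cause 2
gain_joint = information_gain(target, [x1, x2])  # gain of both causes together

# Share of the joint gain not explained by the best single cause.
# This ratio is illustrative only; it is not the chapter's IS formula.
unexplained_share = (gain_joint - max(gain_x1, gain_x2)) / gain_joint
```

Here each cause taken alone tells us nothing about the target, while the two together determine it completely. A search that adds causes one at a time on the basis of marginal gain would therefore never select either cause, which is the failure mode the abstract attributes to methods that assume the composition property.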
Notes
1. Adam Brufsky, MD, PhD, Professor of Medicine at the University of Pittsburgh School of Medicine.
Acknowledgements
Funding
This work was supported by National Library of Medicine grant numbers R00LM010822, R01LM011663, and R01LM011962.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Jiang, X., Neapolitan, R. (2018). Defining and Discovering Interactive Causes. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_4
DOI: https://doi.org/10.1007/978-3-319-67513-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67512-1
Online ISBN: 978-3-319-67513-8
eBook Packages: Engineering, Engineering (R0)