Defining and Discovering Interactive Causes

Jiang, Xia; Neapolitan, Richard

doi:10.1007/978-3-319-67513-8_4

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 137)

Abstract

The problem of learning causal influences from passive data has attracted a good deal of attention in the past 30 years, and techniques have been developed and tested. These techniques assume the composition property, which entails that they cannot in general learn interactive causes that exhibit little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, which is the interaction of two or more genetic loci to affect phenotype; often the genes exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Even though efforts have recently been made towards developing new algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets, and a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to 7 algorithms that learn genetic epistasis from high-dimensional datasets and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real late-onset Alzheimer’s disease (LOAD) dataset and obtain results that substantiate previous research, along with new findings. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset and learn interactions that agree with the judgements of a breast cancer oncologist.
Our algorithms are only directly applicable to problems where we have a specified target and its candidate causes. However, they could be used for general causal learning by serving as a front end to a standard causal learning algorithm.
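
The abstract states the properties of IS but not its formula. As an illustration only, the following Python sketch assumes one definition with those properties, built from Shannon mutual information: for a target Z and two candidate causes A and B, the information gain is IG(Z;A,B) = I(Z;A,B) - I(Z;A) - I(Z;B), and IS(Z;A,B) = IG(Z;A,B) / I(Z;A,B). Since mutual information is non-negative, IG never exceeds I(Z;A,B), so IS is at most 1, and it equals 1 exactly when I(Z;A) = I(Z;B) = 0, that is, when neither cause has a marginal effect. All function names are hypothetical, probabilities are plug-in estimates from counts, and `exhaustive_pairs` merely ranks every pair of candidates by IS; it does not reproduce the BN scoring used by Exhaustive-IGain or the heuristic search of MBS-IGain.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Plug-in Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(target, *causes):
    """I(Z; X1..Xk) = H(Z) + H(X1..Xk) - H(Z, X1..Xk), estimated from counts."""
    joint_causes = list(zip(*causes))
    joint_all = list(zip(target, *causes))
    return entropy(target) + entropy(joint_causes) - entropy(joint_all)


def information_gain(target, a, b):
    """IG = I(Z;A,B) - I(Z;A) - I(Z;B); negative values indicate redundancy."""
    return (mutual_information(target, a, b)
            - mutual_information(target, a)
            - mutual_information(target, b))


def interaction_strength(target, a, b, eps=1e-12):
    """Assumed IS = IG / I(Z;A,B); at most 1, and 1 only with no marginal effects."""
    joint = mutual_information(target, a, b)
    if joint <= eps:  # A and B jointly carry (numerically) no information about Z
        return 0.0
    return information_gain(target, a, b) / joint


def exhaustive_pairs(target, candidates):
    """Hypothetical exhaustive search: score every pair of named candidates by IS."""
    scored = [(interaction_strength(target, candidates[x], candidates[y]), x, y)
              for x, y in combinations(sorted(candidates), 2)]
    return sorted(scored, reverse=True)  # strongest interaction first


if __name__ == "__main__":
    # Balanced toy design with Z = A XOR B: neither locus alone is associated with Z.
    a = [0, 0, 1, 1] * 25
    b = [0, 1, 0, 1] * 25
    z = [x ^ y for x, y in zip(a, b)]
    print(mutual_information(z, a), mutual_information(z, b))  # 0.0 0.0
    print(interaction_strength(z, a, b))  # 1.0
    print(interaction_strength(a, a, b))  # 0.0 (Z = A: purely marginal effect)
```

The XOR toy data mimic pure epistasis: each locus alone carries zero information about Z, so a learner relying on the composition property has no marginal signal to pick up either one, yet together they determine Z. Because IS is a ratio, a pair whose joint information about Z is tiny but non-zero can also score close to 1 on small or unbalanced samples, so in practice a ratio like this is best paired with a second criterion, such as a BN score or a minimum joint information.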

Notes

1. Adam Brufsky, MD, PhD, Professor of Medicine at the University of Pittsburgh School of Medicine.

Acknowledgements

Funding

This work was supported by National Library of Medicine grants R00LM010822, R01LM011663, and R01LM011962.

Author information

Authors and Affiliations

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Xia Jiang
Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Richard Neapolitan

Corresponding author

Correspondence to Xia Jiang.

Editor information

Editors and Affiliations

Dept. of Statistics & Applied Probability, University of California Santa Barbara, Santa Barbara, California, USA
Dawn E. Holmes
KES International, Adelaide, South Australia, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Jiang, X., Neapolitan, R. (2018). Defining and Discovering Interactive Causes. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_4

DOI: https://doi.org/10.1007/978-3-319-67513-8_4
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67512-1
Online ISBN: 978-3-319-67513-8
eBook Packages: Engineering (R0)
