Methods for computational causal discovery in biomedicine

  • Invited Paper
Behaviormetrika

Abstract

With the development of high-throughput technology in the past twenty years, it has become easier and cheaper to measure tens of thousands of molecules in biological systems simultaneously. One of the major challenges is how to extract knowledge from these high-dimensional datasets and infer the underlying mechanisms of the system. In this review, we discuss several topics related to causal discovery from biomedical data, including causal structure learning from observational and experimental data, estimation of causal effects, and the use of causal information for predictive modeling.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. Here we assume that all variables in the system are observed (see Definition 6), and we consider the smallest possible \(\varvec{Pa_i}\).

  2. Let \(a_{|V|}\) denote the number of all possible DAGs over |V| vertices. \(a_{|V|}\) can be computed through the recursion \(a_{|V|}=\sum_{k=1}^{|V|}(-1)^{k+1}\binom{|V|}{k}2^{k(|V|-k)}a_{|V|-k}\), with \(a_0=1\) (Robinson 1978).

  3. Such a link can also be established through the Markov condition, but d-separation is more intuitive and useful, as pointed out in chapter 3 of Spirtes et al. (2000).

  4. Here we are discussing the general case. Certain manipulations, namely those that do not affect P(T|predictors), would not change the predictive performance of the model. A more detailed discussion can be found in Tillman and Spirtes (2008).

  5. There could still be unoriented edges in the local causal neighborhood, since certain causal structures cannot be identified from observational data. By “completely” we mean that the orientation of the local neighborhood obtained by PCD-by-PCD is the same as that obtained from a global causal discovery algorithm.
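
The recursion in note 2 can be evaluated directly. The Python sketch below is illustrative only (the function name `num_dags` is ours, not from the article); it uses the base case \(a_0=1\) and memoizes earlier terms. It shows how quickly the space of candidate structures grows with |V|: there are 1, 3, 25, 543 and 29281 DAGs over one to five vertices.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n vertices, via Robinson's (1978) recursion."""
    if n == 0:
        return 1  # base case: the empty graph on zero vertices
    # Alternating sum over k, the number of vertices with no incoming edges
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, num_dags(n))
```

Because Python integers are unbounded, the sketch stays exact even once the counts exceed the range of fixed-width integers (which happens by |V| = 10).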
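
Note 3 refers to d-separation as the link between graph structure and conditional independence. The minimal Python sketch below is not taken from the article or from Spirtes et al. (2000); it uses the standard equivalent criterion that X and Y are d-separated by Z in a DAG exactly when, after deleting Z, they are disconnected in the moralized graph of the ancestors of X ∪ Y ∪ Z. The parent-dictionary representation and the function names are illustrative assumptions, and X, Y and Z are assumed to be disjoint.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs.

    `parents` maps each node to an iterable of its parents (nodes without
    parents may be omitted). Uses the moralized ancestral graph criterion.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: link each child to its parents
    # and "marry" every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        pas = [p for p in parents.get(child, ()) if p in keep]
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(pas):
            for b in pas[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Delete the conditioning set and search for any path from xs to ys.
    seen = set(xs - zs)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                queue.append(w)
    return True
```

For the chain A → B → C, `d_separated({"B": ["A"], "C": ["B"]}, {"A"}, {"C"}, {"B"})` returns `True`; for the collider A → C ← B, conditioning on C connects A and B, so `d_separated({"C": ["A", "B"]}, {"A"}, {"B"}, {"C"})` returns `False`.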

References

  • Alekseyenko AV, Lytkin NI, Ai J, Ding B, Padyukov L, Aliferis CF, Statnikov A (2011) Causal graph-based analysis of genome-wide association data in rheumatoid arthritis. Biol Direct 6(1):1

  • Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part I: Algorithms and empirical evaluation. J Mach Learn Res 11:171–234

  • Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part II: Analysis and extensions. J Mach Learn Res 11:235–284

  • Aliferis CF, Tsamardinos I, Statnikov A (2003) HITON: a novel Markov blanket algorithm for optimal variable selection. In: AMIA Annual Symposium Proceedings. Am Med Inform Assoc 2003:21

  • Angrist JD, Kuersteiner GM (2011) Causal effects of monetary shocks: Semiparametric conditional independence tests with a multinomial propensity score. Rev Econ Stat 93(3):725–747

  • Baba K, Shibata R, Sibuya M (2004) Partial correlation and conditional correlation as measures of conditional independence. Australian New Zealand J Stat 46(4):657–664

  • Bareinboim E, Pearl J (2013) A general algorithm for deciding transportability of experimental results. J Causal Infer 1(1):107–134

  • Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16(3):199–231

  • Brown LE, Tsamardinos I, Aliferis CF (2004) A novel algorithm for scalable and accurate Bayesian network learning. Medinfo 11(Pt 1):711–715

  • Buntine W (1991) Theory refinement on Bayesian networks. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 52–60

  • Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, Di Bernardo M, Di Bernardo D, Cosma MP (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137(1):172–181

  • Chickering DM (2002) Learning equivalence classes of Bayesian-network structures. J Mach Learn Res 2:445–498

  • Chickering DM (2002) Optimal structure identification with greedy search. J Mach Learn Res 3:507–554

  • Colombo D, Maathuis MH, Kalisch M, Richardson TS (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann Stat 294–321

  • Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9(4):309–347

  • Danks D (2002) Learning the causal structure of overlapping variable sets. In: International Conference on Discovery Science. Springer, pp 178–191

  • De Smet R, Marchal K (2010) Advantages and limitations of current network inference methods. Nature Rev Microbiol 8(10):717–729

  • Dodge Y, Rousson V (2001) On asymmetric properties of the correlation coefficient in the regression setting. Am Stat 55(1):51–54

  • Duda S, Aliferis C, Miller R, Statnikov A, Johnson K (2005) Extracting drug-drug interaction articles from MEDLINE to improve the content of drug databases. In: AMIA Annual Symposium Proceedings. Am Med Inform Assoc 2005:216

  • Feelders A, Van der Gaag LC (2006) Learning Bayesian network parameters under order constraints. Int J Approx Reason 42(1):37–53

  • Fisher RA (1924) The distribution of the partial correlation coefficient. Metron 3:329–332

  • Fisher RA (1936) Design of experiments. Br Med J 1(3923):554

  • Fisher RA (1950) Statistical methods for research workers. Biological monographs and manuals, no. V. Oliver and Boyd, Edinburgh and London

  • Friedman N, Nachman I, Pe’er D (1999) Learning Bayesian network structure from massive datasets: the sparse candidate algorithm. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 206–215

  • Geiger D, Heckerman D (1994) Learning Gaussian networks. In: Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 235–243

  • Guyon I, Aliferis C, Elisseeff A (2007) Causal feature selection. In: Computational methods of feature selection, pp 63–86

  • Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  • Guyon I, Elisseeff A (2006) An introduction to feature extraction. In: Feature extraction. Springer, pp 1–25

  • He YB, Geng Z (2008) Active learning of causal networks with intervention experiments and optimal designs. J Mach Learn Res 9:2523–2547

  • Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20(3):197–243

  • Heckerman D, Geiger D (2013) Learning Bayesian networks: a unification for discrete and Gaussian domains. arXiv preprint arXiv:1302.4957

  • Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK et al (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nature Methods 13(4):310–318

  • Hyttinen A, Eberhardt F, Hoyer PO (2010) Causal discovery for linear cyclic models with latent variables. on Probabilistic Graphical Models, p 153

  • Hyttinen A, Eberhardt F, Hoyer PO (2012) Learning linear cyclic causal models with latent variables. J Mach Learn Res 13:3387–3439

  • Hyttinen A, Eberhardt F, Järvisalo M (2015) Do-calculus when the true graph is unknown. In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence

  • Hyttinen A, Hoyer PO, Eberhardt F, Järvisalo M (2013) Discovering cyclic causal models with latent variables: a general SAT-based procedure. arXiv preprint arXiv:1309.6836

  • Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S (2004) Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. J Bioinform Comput Biol 2(1):77–98

  • Isci S, Dogan H, Ozturk C, Otu HH (2014) Bayesian network prior: network analysis of biological data using external knowledge. Bioinformatics 30(6):860–867

  • JMLR Workshop and Conference Proceedings: Volume 3. http://www.jmlr.org/proceedings/papers/v3/. Accessed 2016-11-23

  • John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 338–345

  • Karstoft KI, Galatzer-Levy IR, Statnikov A, Li Z, Shalev AY (2015) Bridging a translational gap: using machine learning to improve the prediction of PTSD. BMC Psychiatry 15(1):1

  • Lagani V, Athineou G, Farcomeni A, Tsagris M, Tsamardinos I (2016) Feature selection with the R package MXM: discovering statistically-equivalent feature subsets. arXiv preprint arXiv:1611.03227

  • Lagani V, Triantafillou S, Ball G, Tegnér J, Tsamardinos I (2016) Probabilistic computational causal discovery for systems biology. In: Uncertainty in Biology. Springer, pp 33–73

  • Lemeire J (2007) Learning causal models of multivariate systems and the value of it for the performance modeling of computer programs. ASP/VUBPRESS/UPA

  • Lemeire J, Maes S, Meganck S, Dirkx E (2006) The representation and learning of equivalent information in causal models. Tech. rep., Technical Report IRIS-TR-0099, Vrije Universiteit Brussel

  • Li H, Lu L, Manly KF, Chesler EJ, Bao L, Wang J, Zhou M, Williams RW, Cui Y (2005) Inferring gene transcriptional modulatory relations: a genetical genomics approach. Human Mol Genet 14(9):1119–1125

  • Li J, Wang ZJ (2009) Controlling the false discovery rate of the association/causality structure learned with the PC algorithm. J Mach Learn Res 10:475–514

  • Liu H, Motoda H (2007) Computational methods of feature selection. CRC Press

  • Lytkin NI, McVoy L, Weitkamp JH, Aliferis CF, Statnikov A (2011) Expanding the understanding of biases in development of clinical-grade molecular signatures: a case study in acute respiratory viral infections. PLoS One 6(6):e20662

  • Ma S, Kemmeren P, Aliferis CF, Statnikov A (2016) An evaluation of active learning causal discovery methods for reverse-engineering local causal pathways of gene regulation. Sci Rep 6

  • Ma S, Kemmeren P, Gresham D, Statnikov A (2014) De-novo learning of genome-scale regulatory networks in S. cerevisiae. PLoS One 9(9):e106479

  • Maathuis MH, Colombo D, Kalisch M, Bühlmann P (2010) Predicting causal effects in large-scale systems from observational data. Nature Methods 7(4):247–248

  • Maathuis MH, Kalisch M, Bühlmann P (2009) Estimating high-dimensional intervention effects from observational data. Ann Stat 37(6A):3133–3164

  • Maathuis MH, Nandy P (2015) A review of some recent advances in causal inference. arXiv preprint arXiv:1506.07669

  • Marbach D, Schaffter T, Mattiussi C, Floreano D (2009) Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol 16(2):229–239

  • Meganck S, Leray P, Manderick B (2006) Learning causal Bayesian networks from observations and experiments: a decision theoretic approach. In: International Conference on Modeling Decisions for Artificial Intelligence. Springer, pp 58–69

  • Murphy KP (2001) Active learning of causal Bayes net structure. Tech. rep., UC Berkeley

  • Nandy P, Maathuis MH, Richardson TS (2014) Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. arXiv preprint arXiv:1407.2451

  • Olsen C, Fleming K, Prendergast N, Rubio R, Emmert-Streib F, Bontempi G, Haibe-Kains B, Quackenbush J (2014) Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 103(5):329–336

  • Ott S, Imoto S, Miyano S (2004) Finding optimal models for small gene networks. In: Pacific Symposium on Biocomputing 9:557–567

  • Pearl J, Bareinboim E (2014) External validity: from do-calculus to transportability across populations. Stat Sci 29(4):579–595

  • Pearl J (2009) Causality. Cambridge University Press

  • Pearson K (1900) On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50(302):157–175

  • Pe’er D, Regev A, Elidan G, Friedman N (2001) Inferring subnetworks from perturbed expression profiles. Bioinformatics 17(suppl 1):S215–S224

  • Ramsey J (2006) A PC-style Markov blanket search for high dimensional datasets. Tech. rep., Technical Report No. CMU-PHIL-177

  • Ramsey JD (2014) A scalable conditional independence test for nonlinear, non-Gaussian data. arXiv preprint arXiv:1401.5031

  • Ramírez-Gallego S, García S, Mouriño-Talín H, Martínez-Rego D, Bolón-Canedo V, Alonso-Betanzos A, Benítez JM, Herrera F (2016) Data discretization: taxonomy and big data challenge. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6(1):5–21

  • Richardson T, Spirtes P (2002) Ancestral graph Markov models. Ann Stat 962–1030

  • Robinson R (1978) Counting labeled acyclic digraphs. In: Harary F (ed) New directions in the theory of graphs. Academic Press, New York and London, pp 239–273

  • Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529

  • Sachs K, Itani S, Fitzgerald J, Schoeberl B, Nolan G, Tomlin C (2013) Single timepoint models of dynamic systems. Interface Focus 3(4):20130019

  • Schadt EE (2009) Molecular networks as sensors and drivers of common human diseases. Nature 461(7261):218–223

  • Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, GuhaThakurta D, Sieberts SK, Monks S, Reitman M, Zhang C et al (2005) An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genet 37(7):710–717

  • Scheines R, Eberhardt F, Hoyer PO (2010) Combining experiments to discover linear cyclic models with latent variables. Tech. rep., CMU, Pittsburgh, US

  • Shimizu S, Bollen K (2014) Bayesian estimation of causal direction in acyclic structural equation models with individual-specific confounder variables and non-Gaussian distributions. J Mach Learn Res 15(1):2629–2652

  • Shimizu S, Kano Y (2008) Use of non-normality in structural equation modeling: Application to direction of causation. J Stat Plann Infer 138(11):3483–3491

  • Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A (2006) A linear non-Gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003–2030

  • Shimizu S, Inazumi T, Sogawa Y, Hyvärinen A, Kawahara Y, Washio T, Hoyer PO, Bollen K (2011) DirectLiNGAM: a direct method for learning a linear non-Gaussian structural equation model. J Mach Learn Res 12:1225–1248

  • Simpson EH (1951) The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological) pp 238–241

  • Sokal RR, Rohlf FJ (1981) Biometry: the principles and practice of statistics in biological research. Freeman, New York

  • Spirtes P, Glymour C, Scheines R, Kauffman S, Aimale V, Wimberly F (2000) Constructing Bayesian network models of gene expression networks from microarray data. Tech. rep., CMU

  • Spirtes P (2001) An anytime algorithm for causal inference. In: AISTATS

  • Spirtes P, Glymour CN, Scheines R (2000) Causation, prediction, and search. MIT Press

  • Spirtes P, Glymour CN, Scheines R (1990) Causality from probability. In: Conference Proceedings: Advanced Computing for the Social Sciences, Williamsburgh

  • Spirtes P, Meek C, Richardson T (1995) Causal inference in the presence of latent variables and selection bias. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 499–506

  • Statnikov A, Lytkin NI, McVoy L, Weitkamp JH, Aliferis CF (2010) Using gene expression profiles from peripheral blood to identify asymptomatic responses to acute respiratory viral infections. BMC Res Notes 3(1):264

  • Statnikov A, McVoy L, Lytkin N, Aliferis CF (2010) Improving development of the molecular signature for diagnosis of acute respiratory viral infections. Cell Host Microbe 7(2):100

  • Statnikov A, Aliferis CF (2010) Analysis and computational dissection of molecular signature multiplicity. PLoS Comput Biol 6(5):e1000790

  • Statnikov A, Lytkin NI, Lemeire J, Aliferis CF (2013) Algorithms for discovery of multiple Markov boundaries. J Mach Learn Res 14:499–566

  • Statnikov A, Ma S, Henaff M, Lytkin N, Efstathiadis E, Peskin ER, Aliferis CF (2015) Ultra-scalable and efficient methods for hybrid observational and experimental local causal pathway discovery. J Mach Learn Res

  • Stolovitzky G, Monroe D, Califano A (2007) Dialogue on reverse-engineering assessment and methods: the dream of high-throughput pathway inference. Ann New York Acad Sci 1115:1

  • Su L, White H (2007) A consistent characteristic function-based test for conditional independence. J Econ 141(2):807–834

  • Su L, White H (2008) A nonparametric Hellinger metric test for conditional independence. Econ Theory 24(4):829–864

  • Sun X, Janzing D, Schölkopf B, Fukumizu K (2007) A kernel-based causal learning algorithm. In: Proceedings of the 24th International Conference on Machine Learning. ACM, pp 855–862

  • Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, Van De Rijn M, Jeffrey SS et al (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceed Nat Acad Sci 98(19):10869–10874

  • Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, Miyano S (2003) Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics 19(suppl 2):ii227–ii236

  • Tan M, Alshalalfa M, Alhajj R, Polat F (2011) Influence of prior knowledge in constraint-based learning of gene regulatory networks. IEEE/ACM Trans Comput Biol Bioinform 8(1):130–142. doi:10.1109/TCBB.2009.58

  • Tan M, AlShalalfa M, Alhajj R, Polat F (2008) Combining multiple types of biological data in constraint-based learning of gene regulatory networks. In: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB’08). IEEE, pp 90–97

  • Tillman RE, Eberhardt F (2014) Learning causal structure from multiple datasets with similar variable sets. Behaviormetrika 41(1):41–64

  • Tillman RE, Spirtes P (2008) When causality matters for prediction: investigating the practical tradeoffs. In: Proceedings of the 2008th International Conference on Causality: Objectives and Assessment - Volume 6, COA’08, pp 137–146. JMLR.org. http://dl.acm.org/citation.cfm?id=2996801.2996811

  • Tong S, Koller D (2001) Active learning for structure in Bayesian networks. In: International Joint Conference on Artificial Intelligence, vol 17, pp 863–869. Lawrence Erlbaum Associates Ltd

  • Triantafillou S, Tsamardinos I (2015) Constraint-based causal discovery from multiple interventions over overlapping variable sets. J Mach Learn Res 16:2147–2205

  • Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65(1):31–78

  • Tsamardinos I, Lagani V, Pappas D (2012) Discovering multiple, equivalent biomarker signatures. In: 7th Conference of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB12). Heraklion

  • Uusitalo L (2007) Advantages and challenges of Bayesian networks in environmental modelling. Ecol Model 203(3):312–318

  • van ’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536

  • Wang M, Benedito VA, Zhao PX, Udvardi M (2010) Inferring large-scale gene regulatory networks using a low-order constraint-based algorithm. Mol BioSyst 6(6):988–998

  • Werhli AV, Husmeier D (2007) Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 6(1)

  • Yin J, Zhou Y, Wang C, He P, Zheng C, Geng Z (2008) Partial orientation and local structural learning of causal networks for prediction. In: WCCI Causation and Prediction Challenge, pp 93–105

  • Zhang K, Peters J, Janzing D, Schoelkopf B (2011) Kernel-based conditional independence test and application in causal discovery. In: Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-11), pp 804–813. AUAI Press, Corvallis, Oregon

  • Zhang K, Peters J, Janzing D, Schölkopf B (2012) Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775

Acknowledgements

The authors are grateful to Roshan Tourani for helpful comments on the manuscript.

Author information

Correspondence to Sisi Ma.

Additional information

Communicated by Shohei Shimizu.

About this article

Cite this article

Ma, S., Statnikov, A. Methods for computational causal discovery in biomedicine. Behaviormetrika 44, 165–191 (2017). https://doi.org/10.1007/s41237-016-0013-5

  • DOI: https://doi.org/10.1007/s41237-016-0013-5
