Abstract
With the development of high-throughput technologies over the past twenty years, it has become easier and cheaper to measure tens of thousands of molecules in a biological system simultaneously. A major challenge is how to extract knowledge from these high-dimensional datasets and infer the underlying mechanisms of the system. In this review, we discuss several topics related to causal discovery from biomedical data, including causal structure learning from observational and experimental data, estimation of causal effects, and the use of causal information for predictive modeling.
Notes
Here we assume that all variables in the system are observed (see Definition 6), and we consider the smallest possible \(\varvec{Pa_i}\).
Let \(a_{|V|}\) denote the number of possible DAGs over \(|V|\) labeled vertices. With \(a_0=1\), \(a_{|V|}\) can be obtained through the following recursion: \(a_{|V|}=\sum _{k=1}^{|V|}{(-1)}^{k+1}\binom{|V|}{k}2^{k(|V|-k)}a_{|V|-k}\) (Robinson 1978).
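To give a sense of how quickly this count grows, the recursion can be evaluated directly. The following is a minimal sketch, not code from the article; the function name `count_dags` is hypothetical, and the base case \(a_0=1\) (the empty graph) is the standard convention assumed here.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n vertices, via the recursion in the note above."""
    if n == 0:
        return 1  # assumed base case: a_0 = 1 (the empty graph)
    # Alternating sum over k = 1..n, with binomial(n, k) * 2^(k(n-k)) * a_(n-k).
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_dags(n))  # 1, 3, 25, 543, 29281
```

Already at five vertices there are 29281 candidate DAGs, which illustrates why exhaustive search over structures is infeasible for datasets with thousands of variables.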
Such a link can also be established through the Markov condition, but d-separation is more intuitive and useful, as pointed out in Chapter 3 of Spirtes et al. (2000).
Here we are talking about the general case. Certain manipulations, namely those that do not affect P(T|predictors), would not change the predictive performance of the model. A more detailed discussion can be found in Tillman and Spirtes (2008).
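The distinction drawn in this note can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a hypothetical linear Gaussian chain X → T → Y with made-up coefficients, and all function names are invented for the example. A model that predicts T from its cause X keeps its error when X is manipulated, because P(T|X) is unchanged, whereas a model that predicts T from its effect Y degrades when Y is manipulated, because the manipulation cuts the edge T → Y.

```python
import random


def simulate(n, rng, manipulate=None):
    """Sample (x, t, y) from the assumed chain X -> T -> Y, optionally manipulating X or Y."""
    rows = []
    for _ in range(n):
        # Manipulating X changes its distribution but not the mechanism producing T.
        x = rng.gauss(2.0, 1.0) if manipulate == "X" else rng.gauss(0.0, 1.0)
        t = 1.5 * x + rng.gauss(0.0, 1.0)
        # Manipulating Y sets it exogenously, so it no longer depends on T.
        y = rng.gauss(0.0, 1.0) if manipulate == "Y" else 2.0 * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, rows, predictor_index):
    """Mean squared error of predicting T (column 1) from the chosen column."""
    slope, intercept = model
    return sum((r[1] - (slope * r[predictor_index] + intercept)) ** 2 for r in rows) / len(rows)


def run_demo(seed=0, n=5000):
    rng = random.Random(seed)
    train = simulate(n, rng)
    ts = [r[1] for r in train]
    cause_model = fit_line([r[0] for r in train], ts)   # predict T from X
    effect_model = fit_line([r[2] for r in train], ts)  # predict T from Y
    return {
        "cause_obs": mse(cause_model, simulate(n, rng), 0),
        "cause_manip": mse(cause_model, simulate(n, rng, "X"), 0),
        "effect_obs": mse(effect_model, simulate(n, rng), 2),
        "effect_manip": mse(effect_model, simulate(n, rng, "Y"), 2),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the effect-based model is the better predictor on observational data, yet it is the one that fails under manipulation, which is the practical tradeoff the note refers to.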
There could still be unoriented edges in the local causal neighborhood, since certain causal structures cannot be discovered from observational data. By “completely” we mean that the orientation of the local neighborhood produced by PCD-by-PCD is the same as that obtained from a global causal discovery algorithm.
References
Alekseyenko AV, Lytkin NI, Ai J, Ding B, Padyukov L, Aliferis CF, Statnikov A (2011) Causal graph-based analysis of genome-wide association data in rheumatoid arthritis. Biol Direct 6(1):1
Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation. J Mach Learn Res 11:171–234
Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and markov blanket induction for causal discovery and feature selection for classification part ii: Analysis and extensions. J Mach Learn Res 11:235–284
Aliferis CF, Tsamardinos I, Statnikov A (2003) Hiton: a novel markov blanket algorithm for optimal variable selection. In: AMIA Annual Symposium Proceedings. Am Med Inform Assoc 2003:21
Angrist JD, Kuersteiner GM (2011) Causal effects of monetary shocks: Semiparametric conditional independence tests with a multinomial propensity score. Rev Econ Stat 93(3):725–747
Baba K, Shibata R, Sibuya M (2004) Partial correlation and conditional correlation as measures of conditional independence. Australian New Zealand J Stat 46(4):657–664
Bareinboim E, Pearl J (2013) A general algorithm for deciding transportability of experimental results. J Causal Infer 1(1):107–134
Breiman L (2001) Statistical modeling: The two cultures (with comments and a rejoinder by the author). Stat Sci 16(3):199–231
Brown LE, Tsamardinos I, Aliferis CF (2004) A novel algorithm for scalable and accurate bayesian network learning. Medinfo 11(Pt 1):711–715
Buntine W (1991) Theory refinement on bayesian networks. In: Proceedings of the Seventh conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp 52–60
Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, Di Bernardo M, Di Bernardo D, Cosma MP (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 137(1):172–181
Chickering DM (2002) Learning equivalence classes of bayesian-network structures. J Mach Learn Res 2:445–498
Chickering DM (2002) Optimal structure identification with greedy search. J Mach Learn Res 3:507–554
Colombo D, Maathuis MH, Kalisch M, Richardson TS (2012) Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann Stat 294–321
Cooper GF, Herskovits E (1992) A bayesian method for the induction of probabilistic networks from data. Mach Learn 9(4):309–347
Danks D (2002) Learning the causal structure of overlapping variable sets. In: International Conference on Discovery Science. Springer, pp 178–191
De Smet R, Marchal K (2010) Advantages and limitations of current network inference methods. Nature Rev Microbiol 8(10):717–729
Dodge Y, Rousson V (2001) On asymmetric properties of the correlation coefficient in the regression setting. Am Stat 55(1):51–54
Duda S, Aliferis C, Miller R, Statnikov A, Johnson K (2005) Extracting drug-drug interaction articles from medline to improve the content of drug databases. In: AMIA annual symposium proceedings. Am Med Inform Assoc 2005:216
Feelders A, Van der Gaag LC (2006) Learning bayesian network parameters under order constraints. Int J Approx Reason 42(1):37–53
Fisher RA et al (1924) The distribution of the partial correlation coefficient. Metron 3:329–332
Fisher RA (1936) Design of experiments. Br Med J 1(3923):554
Fisher RA et al (1950) Statistical methods for research workers. Biological monographs and manuals. No. V. Oliver and Boyd, Edinburgh and London
Friedman N, Nachman I, Pe’er D (1999) Learning bayesian network structure from massive datasets: the sparse candidate. In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc. pp 206–215
Geiger D, Heckerman D (1994) Learning Gaussian networks. In: Proceedings of the Tenth international conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc. pp 235–243
Guyon I, Aliferis C, Elisseeff A (2007) Causal feature selection. Computational methods of feature selection pp 63–86
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Guyon I, Elisseeff A (2006) An introduction to feature extraction. In: Feature extraction. Springer pp 1–25
He YB, Geng Z (2008) Active learning of causal networks with intervention experiments and optimal designs. J Mach Learn Res 9:2523–2547
Heckerman D, Geiger D, Chickering DM (1995) Learning bayesian networks: The combination of knowledge and statistical data. Mach Learn 20(3):197–243
Heckerman D, Geiger D (2013) Learning bayesian networks: a unification for discrete and gaussian domains. arXiv preprint arXiv:1302.4957
Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK et al (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nature Methods 13(4):310–318
Hyttinen A, Eberhardt F, Hoyer PO (2010) Causal discovery for linear cyclic models with latent variables. on Probabilistic Graphical Models, p 153
Hyttinen A, Eberhardt F, Hoyer PO (2012) Learning linear cyclic causal models with latent variables. J Mach Learn Res 13:3387–3439
Hyttinen A, Eberhardt F, Järvisalo M (2015) Do-calculus when the true graph is unknown. In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence
Hyttinen A, Hoyer PO, Eberhardt F, Järvisalo M (2013) Discovering cyclic causal models with latent variables: A general sat-based procedure. arXiv preprint arXiv:1309.6836
Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S (2004) Combining microarrays and biological knowledge for estimating gene networks via bayesian networks. J Bioinform Comput Biol 2(01):77–98
Isci S, Dogan H, Ozturk C, Otu HH (2014) Bayesian network prior: network analysis of biological data using external knowledge. Bioinformatics 30(6):860–867
JMLR workshop and conference proceedings: Volume 3. http://www.jmlr.org/proceedings/papers/v3/. Accessed: 2016-11-23
John GH, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc. pp 338–345
Karstoft KI, Galatzer-Levy IR, Statnikov A, Li Z, Shalev AY (2015) Bridging a translational gap: using machine learning to improve the prediction of ptsd. BMC Psychiatry 15(1):1
Lagani V, Athineou G, Farcomeni A, Tsagris M, Tsamardinos I (2016) Feature selection with the r package mxm: Discovering statistically-equivalent feature subsets. arXiv preprint arXiv:1611.03227
Lagani V, Triantafillou S, Ball G, Tegnér J, Tsamardinos I (2016) Probabilistic computational causal discovery for systems biology. In: Uncertainty in Biology. Springer pp 33–73
Lemeire J (2007) Learning causal models of multivariate systems and the value of it for the performance modeling of computer programs. ASP/VUBPRESS/UPA
Lemeire J, Maes S, Meganck S, Dirkx E (2006) The representation and learning of equivalent information in causal models. Tech. rep., Technical Report IRIS-TR-0099, Vrije Universiteit Brussel
Li H, Lu L, Manly KF, Chesler EJ, Bao L, Wang J, Zhou M, Williams RW, Cui Y (2005) Inferring gene transcriptional modulatory relations: a genetical genomics approach. Human Mol Genet 14(9):1119–1125
Li J, Wang ZJ (2009) Controlling the false discovery rate of the association/causality structure learned with the pc algorithm. J Mach Learn Res 10:475–514
Liu H, Motoda H (2007) Computational methods of feature selection. CRC Press
Lytkin NI, McVoy L, Weitkamp JH, Aliferis CF, Statnikov A (2011) Expanding the understanding of biases in development of clinical-grade molecular signatures: a case study in acute respiratory viral infections. PLoS One 6(6):e20662
Ma S, Kemmeren P, Aliferis CF, Statnikov A (2016) An evaluation of active learning causal discovery methods for reverse-engineering local causal pathways of gene regulation. Sci Rep 6
Ma S, Kemmeren P, Gresham D, Statnikov A (2014) De-novo learning of genome-scale regulatory networks in s. cerevisiae. PLoS One 9(9):e106479
Maathuis MH, Colombo D, Kalisch M, Bühlmann P (2010) Predicting causal effects in large-scale systems from observational data. Nature Methods 7(4):247–248
Maathuis MH, Kalisch M, Bühlmann P et al (2009) Estimating high-dimensional intervention effects from observational data. Ann Stat 37(6A):3133–3164
Maathuis MH, Nandy P (2015) A review of some recent advances in causal inference. arXiv preprint arXiv:1506.07669
Marbach D, Schaffter T, Mattiussi C, Floreano D (2009) Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol 16(2):229–239
Meganck S, Leray P, Manderick B (2006) Learning causal bayesian networks from observations and experiments: a decision theoretic approach. In: International Conference on Modeling Decisions for Artificial Intelligence. Springer, pp 58–69
Murphy KP (2001) Active learning of causal bayes net structure. Tech. rep, UC Berkeley
Nandy P, Maathuis MH, Richardson TS (2014) Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. arXiv preprint arXiv:1407.2451
Olsen C, Fleming K, Prendergast N, Rubio R, Emmert-Streib F, Bontempi G, Haibe-Kains B, Quackenbush J (2014) Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 103(5):329–336
Ott S, Imoto S, Miyano S (2004) Finding optimal models for small gene networks. In: Pacific symposium on biocomputing, Citeseer 9:557–567
Pearl J, Bareinboim E et al (2014) External validity: From do-calculus to transportability across populations. Stat Sci 29(4):579–595
Pearl J (2009) Causality. Cambridge University Press
Pearson K (1900) X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50(302):157–175
Pe’er D, Regev A, Elidan G, Friedman N (2001) Inferring subnetworks from perturbed expression profiles. Bioinformatics 17(suppl 1):S215–S224
Ramsey J (2006) A pc-style markov blanket search for high dimensional datasets. Tech. rep., Technical Report No. CMU-PHIL-177
Ramsey JD (2014) A scalable conditional independence test for nonlinear, non-gaussian data. arXiv preprint arXiv:1401.5031
Ramírez-Gallego S, García S, Mouriño-Talín H, Martínez-Rego D, Bolón-Canedo V, Alonso-Betanzos A, Benítez JM, Herrera F (2016) Data discretization: taxonomy and big data challenge. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6(1):5–21
Richardson T, Spirtes P (2002) Ancestral graph markov models. Ann Stat 962–1030
Robinson R (1978) Counting labeled acyclic digraphs. In: Harary F (ed) New directions in the theory of graphs. Academic Press, New York and London, pp 239–273
Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529
Sachs K, Itani S, Fitzgerald J, Schoeberl B, Nolan G, Tomlin C (2013) Single timepoint models of dynamic systems. Interface Focus 3(4):20130019
Schadt EE (2009) Molecular networks as sensors and drivers of common human diseases. Nature 461(7261):218–223
Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, GuhaThakurta D, Sieberts SK, Monks S, Reitman M, Zhang C et al (2005) An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genet 37(7):710–717
Scheines R, Eberhardt F, Hoyer PO (2010) Combining experiments to discover linear cyclic models with latent variables. Tech. rep, CMU, Pittsburgh, US
Shimizu S, Bollen K (2014) Bayesian estimation of causal direction in acyclic structural equation models with individual-specific confounder variables and non-gaussian distributions. J Mach Learn Res 15(1):2629–2652
Shimizu S, Kano Y (2008) Use of non-normality in structural equation modeling: Application to direction of causation. J Stat Plann Infer 138(11):3483–3491
Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A (2006) A linear non-gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003–2030
Shimizu S, Inazumi T, Sogawa Y, Hyvärinen A, Kawahara Y, Washio T, Hoyer PO, Bollen K (2011) Directlingam: A direct method for learning a linear non-gaussian structural equation model. J Mach Learn Res 12:1225–1248
Simpson EH (1951) The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological) pp 238–241
Sokal RR, Rohlf FJ (1981) Biometry: the principles and practice of statistics in biological research. Freeman, New York
Spirtes P, Glymour C, Scheines R, Kauffman S, Aimale V, Wimberly F (2000) Constructing bayesian network models of gene expression networks from microarray data. Tech. rep, CMU
Spirtes P (2001) An anytime algorithm for causal inference. In: AISTATS. Citeseer
Spirtes P, Glymour CN, Scheines R (2000) Causation, prediction, and search. MIT Press
Spirtes P, Glymour C, Scheines R (1990) Causality from probability. In: Conference Proceedings: Advanced Computing for the Social Sciences, Williamsburg
Spirtes P, Meek C, Richardson T (1995) Causal inference in the presence of latent variables and selection bias. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc. pp 499–506
Statnikov A, Lytkin NI, McVoy L, Weitkamp JH, Aliferis CF (2010) Using gene expression profiles from peripheral blood to identify asymptomatic responses to acute respiratory viral infections. BMC Res Notes 3(1):264
Statnikov A, McVoy L, Lytkin N, Aliferis CF (2010) Improving development of the molecular signature for diagnosis of acute respiratory viral infections. Cell Host Microbe 7(2):100
Statnikov A, Aliferis CF (2010) Analysis and computational dissection of molecular signature multiplicity. PLoS Comput Biol 6(5):e1000790
Statnikov A, Lytkin NI, Lemeire J, Aliferis CF (2013) Algorithms for discovery of multiple markov boundaries. J Mach Learn Res 14:499–566
Statnikov A, Ma S, Henaff M, Lytkin N, Efstathiadis E, Peskin ER, Aliferis CF (2015) Ultra-scalable and efficient methods for hybrid observational and experimental local causal pathway discovery. J Mach Learn Res
Stolovitzky G, Monroe D, Califano A (2007) Dialogue on reverse-engineering assessment and methods: the dream of high-throughput pathway inference. Ann New York Acad Sci 1115:1
Su L, White H (2007) A consistent characteristic function-based test for conditional independence. J Econ 141(2):807–834
Su L, White H (2008) A nonparametric hellinger metric test for conditional independence. Econ Theory 24(04):829–864
Sun X, Janzing D, Schölkopf B, Fukumizu K (2007) A kernel-based causal learning algorithm. In: Proceedings of the 24th international conference on Machine learning. ACM pp 855–862
Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, Van De Rijn M, Jeffrey SS et al (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceed Nat Acad Sci 98(19):10869–10874
Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, Miyano S (2003) Estimating gene networks from gene expression data by combining bayesian network model with promoter element detection. Bioinformatics 19(suppl 2):ii227–ii236
Tan M, Alshalalfa M, Alhajj R, Polat F (2011) Influence of prior knowledge in constraint-based learning of gene regulatory networks. IEEE/ACM Trans Comput Biol Bioinform 8(1):130–142. doi:10.1109/TCBB.2009.58
Tan M, Alshalalfa M, Alhajj R, Polat F (2008) Combining multiple types of biological data in constraint-based learning of gene regulatory networks. In: 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB’08). IEEE, pp 90–97
Tillman RE, Eberhardt F (2014) Learning causal structure from multiple datasets with similar variable sets. Behaviormetrika 41(1):41–64
Tillman RE, Spirtes P (2008) When causality matters for prediction: Investigating the practical tradeoffs. In: Proceedings of the 2008 International Conference on Causality: Objectives and Assessment - Volume 6, COA’08, pp 137–146. JMLR.org. http://dl.acm.org/citation.cfm?id=2996801.2996811
Tong S, Koller D (2001) Active learning for structure in bayesian networks. In: International joint conference on artificial intelligence, vol 17. Lawrence Erlbaum Associates Ltd, pp 863–869
Triantafillou S, Tsamardinos I (2015) Constraint-based causal discovery from multiple interventions over overlapping variable sets. J Mach Learn Res 16:2147–2205
Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing bayesian network structure learning algorithm. Mach Learn 65(1):31–78
Tsamardinos I, Lagani V, Pappas D (2012) Discovering multiple, equivalent biomarker signatures. In: 7th Conference of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB12). Heraklion
Uusitalo L (2007) Advantages and challenges of bayesian networks in environmental modelling. Ecol Model 203(3):312–318
Van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
Wang M, Benedito VA, Zhao PX, Udvardi M (2010) Inferring large-scale gene regulatory networks using a low-order constraint-based algorithm. Mol BioSyst 6(6):988–998
Werhli AV, Husmeier D (2007) Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 6(1)
Yin J, Zhou Y, Wang C, He P, Zheng C, Geng Z (2008) Partial orientation and local structural learning of causal networks for prediction. In: WCCI Causation and Prediction Challenge, pp 93–105
Zhang K, Peters J, Janzing D, Schölkopf B (2011) Kernel-based conditional independence test and application in causal discovery. In: Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press, Corvallis, Oregon, pp 804–813
Zhang K, Peters J, Janzing D, Schölkopf B (2012) Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775
Acknowledgements
The authors are grateful to Roshan Tourani for helpful comments on the manuscript.
Additional information
Communicated by Shohei Shimizu.
About this article
Cite this article
Ma, S., Statnikov, A. Methods for computational causal discovery in biomedicine. Behaviormetrika 44, 165–191 (2017). https://doi.org/10.1007/s41237-016-0013-5
DOI: https://doi.org/10.1007/s41237-016-0013-5