An efficient algorithm for large-scale causal discovery
Causal discovery is a fundamental problem in scientific research. Although many researchers have worked on finding causal relationships from observational data, large-scale causal discovery remains a major challenge. In this paper, a new approach for large-scale causal discovery is proposed, based on a split-and-merge strategy. The method first splits a given dataset into small subdatasets using a graph-partitioning method and then applies an effective algorithm to infer the causal structure of each subdataset. The causal structure of the entire dataset is obtained by merging the causal structures inferred for the subdatasets. The experimental results show that the proposed approach is effective and scalable for large-scale causal discovery problems.
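The split-and-merge idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's method: connected components of an undirected dependency graph stand in for the graph-partitioning step, `toy_learner` is a hypothetical placeholder for the per-subdataset causal inference algorithm, and the merge step is a plain union of directed edges.

```python
from itertools import combinations


def partition_variables(adjacency):
    """Split variable indices into connected components of an undirected graph.

    Assumption: connected components stand in for the graph-partitioning
    step; the actual partitioning method is not specified here.
    """
    n = len(adjacency)
    seen, parts = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in range(n):
                if adjacency[node][neighbour] and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(sorted(component))
    return parts


def split_and_merge(adjacency, local_learner):
    """Run a local learner on each part and union the directed edges."""
    edges = set()
    for part in partition_variables(adjacency):
        edges |= set(local_learner(part))
    return edges


def toy_learner(part):
    """Hypothetical placeholder: orients every pair from lower to higher index."""
    return {(a, b) for a, b in combinations(part, 2)}


# Variables 0-1-2 are linked, 3-4 are linked, and nothing links the two groups.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
parts = partition_variables(adjacency)
edges = split_and_merge(adjacency, toy_learner)
```

In practice, `toy_learner` would be replaced by a real causal inference routine run on the columns of each subdataset, and the merge step would need to reconcile edges that conflict across subdatasets; neither is attempted in this sketch.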
Keywords: Causation discovery, Causal network, Additive noise model
This work was supported by the Science and Technology Planning Project of Guangdong Province, China (2015A030401101, 2015B090922014), and by the National Natural Science Foundation of China (61572144).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.