Abstract
In this work, we consider the identifiability assumption of Gaussian linear structural equation models (SEMs), in which each variable is determined by a linear function of its parents plus normally distributed error. Prior work has shown that linear Gaussian SEMs are fully identifiable if all error variances are equal or known. In contrast, this work proves the identifiability of Gaussian SEMs with both homogeneous and heterogeneous unknown error variances. Our new identifiability condition exploits not only error variances but also edge weights; hence, it is strictly weaker than the conditions in prior identifiability results. Based on the new condition, we further provide a structure learning algorithm that is statistically consistent and computationally feasible. The proposed algorithm assumes that all relevant variables are observed, but it requires neither causal minimality nor faithfulness. We verify our theoretical findings through simulations and real multivariate data, and compare our algorithm to the state-of-the-art PC, GES, and GDS algorithms.
References
Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42, 2526–2556.
Chickering, D. M. (1996). Learning Bayesian networks is NP-complete. In Learning from data (pp. 121–130). Springer.
Chickering, D. M. (2003). Optimal structure identification with greedy search. The Journal of Machine Learning Research, 3, 507–554.
Chickering, D. M., Geiger, D., & Heckerman, D. (1994). Learning Bayesian networks is NP-hard. Technical Report.
Doya, K. (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge: MIT Press.
Edwards, D. (2012). Introduction to graphical modelling. New York: Springer.
Friedman, N., Linial, M., Nachman, I., & Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of computational biology, 7, 601–620.
Ghoshal, A., & Honorio, J. (2017). Learning identifiable Gaussian Bayesian networks in polynomial time and sample complexity. In Advances in Neural Information Processing Systems (pp. 6457–6466).
Harary, F. (1973). New directions in the theory of graphs. Technical Report. Michigan Univ. Ann Arbor Dept. of Mathematics.
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. In Advances in neural information processing systems, pp. 689–696.
Kalisch, M., & Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8, 613–636.
Kephart, J. O., & White, S. R. (1991). Directed-graph epidemiological models of computer viruses. In Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy (pp. 343–359). IEEE.
Lauritzen, S. L. (1996). Graphical models. Oxford: Oxford University Press.
Loh, P. L., & Bühlmann, P. (2014). High-dimensional learning of linear causal networks via inverse covariance estimation. The Journal of Machine Learning Research, 15, 3065–3105.
Mooij, J., Janzing, D., Peters, J., & Schölkopf, B. (2009). Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 745–752). ACM.
Park, G., & Park, H. (2019). Identifiability of generalized hypergeometric distribution (GHD) directed acyclic graphical models. In Proceedings of Machine Learning Research (pp. 158–166). PMLR.
Park, G., & Raskutti, G. (2015). Learning large-scale Poisson DAG models based on overdispersion scoring. In Advances in Neural Information Processing Systems (pp. 631–639).
Park, G., & Raskutti, G. (2018). Learning quadratic variance function (QVF) DAG models via overdispersion scoring (ODS). Journal of Machine Learning Research, 18, 1–44.
Pearl, J. (2014). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Amsterdam: Elsevier.
Peters, J., & Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101, 219–228.
Peters, J., Mooij, J., Janzing, D., & Schölkopf, B. (2012). Identifiability of causal graphs using functional models. arXiv preprint arXiv:1202.3757 .
Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003–2030.
Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., et al. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225–1248.
Spirtes, P. (1995). Directed cyclic graphical representations of feedback models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 491–498). Morgan Kaufmann.
Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. Cambridge: MIT Press.
Tsamardinos, I., & Aliferis, C. F. (2003). Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics. Key West, FL: Morgan Kaufmann.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65, 31–78.
Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the faithfulness assumption in causal inference. The Annals of Statistics, 41, 436–463.
Zhang, J., & Spirtes, P. (2016). The three faces of faithfulness. Synthese, 193, 1011–1027.
Acknowledgements
This work was supported by the 2018 Research Fund of the University of Seoul.
Appendix
1.1 Proof of Theorem 2
Proof
Without loss of generality, we assume that the true ordering \(\pi = (\pi _1,...,\pi _p)\) is unique. For simplicity, we define \(X_{1:j} = (X_{\pi _1},X_{\pi _2},\ldots ,X_{\pi _j})\) and \(X_{1:0} = \emptyset \). We restate the identifiability assumption of Gaussian SEMs.
Assumption 5
(Identifiability) For any node \(m \in V\), let \(j = \pi _{m}\) and \(k \in V {\setminus } \text{ Nd }(j)\). The conditional variance of \(X_j\) given its parents is smaller than the conditional variance of \(X_k\) given the variables before j in ordering \(\pi \):
$$ \text{Var}\left( X_j \mid X_{\text{Pa}(j)} \right) < \text{Var}\left( X_k \mid X_{1:(m-1)} \right) . $$
Now, we prove identifiability of Gaussian SEMs using mathematical induction.
Step (1) By Assumption 5, for any node \(k \in V {\setminus } \{\pi _1\}\), we have
$$ \text{Var}\left( X_{\pi _1} \right) = \text{Var}\left( X_{\pi _1} \mid X_{\text{Pa}(\pi _1)} \right) < \text{Var}\left( X_k \mid X_{1:0} \right) = \text{Var}\left( X_k \right) , $$
since \(\pi _1\) has no parents and \(X_{1:0} = \emptyset \).
Therefore, \(\pi _1\) can be correctly identified.
Step (m−1) For the \((m-1)\)th element of the ordering, assume that the first \(m-1\) elements of the ordering and their parents are correctly estimated.
Step (m) Now, we consider the \(m\)th element of the causal ordering and its parents. By Assumption 5, for \(k \in \{ \pi _{m+1},\ldots , \pi _{p} \}\),
$$ \text{Var}\left( X_{\pi _m} \mid X_{1:(m-1)} \right) = \text{Var}\left( X_{\pi _m} \mid X_{\text{Pa}(\pi _m)} \right) < \text{Var}\left( X_k \mid X_{1:(m-1)} \right) , $$
where the equality holds because \(\text{Pa}(\pi _m) \subseteq \{\pi _1,\ldots ,\pi _{m-1}\}\). Hence, we can choose the true \(m\)th element of the ordering \(\pi _m\).
In terms of the parents search, it is clear that the conditional independence relations encoded by the factorization (1) imply causal minimality (see details in Pearl 2014; Peters and Bühlmann 2014). In our settings, causal minimality states that for any node \(j \in V\) and one of its parents \(k \in \text{ Pa }(j)\),
$$ X_j \not \perp X_k \mid X_{\text{Pa}(j) \setminus \{k\}} . $$
Therefore, we can choose the correct parents of \(\pi _m\). By mathematical induction, this completes the proof. \(\square \)
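The inductive argument above suggests a simple greedy procedure: at each step, pick the remaining variable whose conditional variance given the already-selected variables is smallest. The following Python sketch illustrates this identifiability argument on sample data; it is our own illustration (the function name and the regression-based residual-variance estimator are our choices), not the authors' exact algorithm.

```python
import numpy as np

def estimate_ordering(X):
    """Greedy causal-ordering recovery for a Gaussian linear SEM.

    At each step, choose the remaining variable with the smallest
    conditional variance given the variables already selected,
    estimated as the residual variance of a linear regression.
    """
    X = X - X.mean(axis=0)  # center so regressions need no intercept
    n, p = X.shape
    remaining = list(range(p))
    ordering = []
    while remaining:
        best, best_var = None, np.inf
        for j in remaining:
            if ordering:
                # residual variance of X_j after regressing on chosen variables
                Z = X[:, ordering]
                beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
                resid = X[:, j] - Z @ beta
            else:
                resid = X[:, j]  # no conditioning set yet: marginal variance
            v = resid.var()
            if v < best_var:
                best, best_var = j, v
        ordering.append(best)
        remaining.remove(best)
    return ordering
```

For instance, on data simulated from the chain \(X_1 \rightarrow X_2 \rightarrow X_3\) with unit error variances and edge weights 0.8 (which satisfy the identifiability condition), the procedure recovers the ordering (1, 2, 3).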
1.2 Identifiability for three-node chain graph
Consider a Gaussian SEM, \(X_1 \rightarrow X_2 \rightarrow X_3\), where \(X_1 = \epsilon _1\), \(X_2 = \beta _1 X_1 + \epsilon _2\), and \(X_3 =\beta _2 X_2 + \epsilon _3\) with \(\epsilon _j \sim N(0, \sigma _j^2)\) for all \(j \in \{1,2,3\}\). Then the first element of the ordering can be determined by comparing the variances of nodes:
$$ \text{Var}(X_1) = \sigma _1^2< \text{Var}(X_2) = \beta _1^2 \sigma _1^2 + \sigma _2^2 \quad \text{and} \quad \text{Var}(X_1) < \text{Var}(X_3) = \beta _2^2 \left( \beta _1^2 \sigma _1^2 + \sigma _2^2 \right) + \sigma _3^2 , $$
as long as \(\sigma _2^2/ \sigma _1^2 > (1 - \beta _1^2)\) and \(\sigma _3^2/ \sigma _1^2 > (1 - \beta _2^2)\).
The second element of the ordering can also be recovered by comparing the expectation of the conditional variance of the remaining variables given the estimated first element of the ordering:
$$ E\left[ \text{Var}(X_2 \mid X_1) \right] = \sigma _2^2 < E\left[ \text{Var}(X_3 \mid X_1) \right] = \beta _2^2 \sigma _2^2 + \sigma _3^2 , $$
as long as \(\sigma _3^2/ \sigma _2^2 > (1 - \beta _2^2)\).
Under causal minimality and the Markov condition, we also have the (conditional) dependence relations \(X_2 \not \perp X_1\) and \(X_3 \not \perp X_2 \mid X_1\), together with \(X_3 \perp X_1 \mid X_2\). Therefore, the true graph can be recovered.
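As a quick sanity check, the population variance comparisons above can be verified numerically. The parameter values \(\beta _1 = \beta _2 = 0.8\) and unit error variances are our own illustrative choices; they satisfy the stated conditions since, e.g., \(\sigma _2^2/\sigma _1^2 = 1 > 1 - \beta _1^2 = 0.36\).

```python
# Population-level check of the variance comparisons for the chain
# X1 -> X2 -> X3, with illustrative values beta1 = beta2 = 0.8 and
# unit error variances (our choice, satisfying the identifiability condition).
beta1, beta2 = 0.8, 0.8
s1, s2, s3 = 1.0, 1.0, 1.0  # error variances sigma_j^2

var1 = s1                      # Var(X1)
var2 = beta1**2 * var1 + s2    # Var(X2)
var3 = beta2**2 * var2 + s3    # Var(X3)

# First element: X1 has the smallest marginal variance.
assert var1 < var2 and var1 < var3

# Second element: conditional variances given X1.
cvar2 = s2                     # Var(X2 | X1)
cvar3 = beta2**2 * s2 + s3     # Var(X3 | X1)
assert cvar2 < cvar3
```

Both assertions pass, matching the inequalities derived above.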
Park, G., Kim, Y. Identifiability of Gaussian linear structural equation models with homogeneous and heterogeneous error variances. J. Korean Stat. Soc. 49, 276–292 (2020). https://doi.org/10.1007/s42952-019-00019-7