
Identifiability of Gaussian linear structural equation models with homogeneous and heterogeneous error variances

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

In this work, we consider the identifiability assumption of Gaussian linear structural equation models (SEMs), in which each variable is determined by a linear function of its parents plus a normally distributed error. It has been shown that linear Gaussian SEMs are fully identifiable if all error variances are the same or known. Extending this result, this work proves the identifiability of Gaussian SEMs with both homogeneous and heterogeneous unknown error variances. Our new identifiability assumption exploits not only the error variances but also the edge weights; hence, it is strictly milder than the assumptions of prior identifiability results. Based on the new assumption, we further provide a structure learning algorithm that is statistically consistent and computationally feasible. The proposed algorithm assumes that all relevant variables are observed, but it requires neither causal minimality nor faithfulness. We verify our theoretical findings through simulations and real multivariate data, and compare our algorithm to the state-of-the-art PC, GES, and GDS algorithms.


References

  • Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42, 2526–2556.

  • Chickering, D. M. (1996). Learning Bayesian networks is NP-complete. In Learning from data (pp. 121–130). Springer.

  • Chickering, D. M. (2003). Optimal structure identification with greedy search. The Journal of Machine Learning Research, 3, 507–554.

  • Chickering, D. M., Geiger, D., & Heckerman, D. (1994). Learning Bayesian networks is NP-hard. Technical Report. Citeseer.

  • Doya, K. (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge: MIT Press.

  • Edwards, D. (2012). Introduction to graphical modelling. New York: Springer.

  • Friedman, N., Linial, M., Nachman, I., & Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7, 601–620.

  • Ghoshal, A., & Honorio, J. (2017). Learning identifiable Gaussian Bayesian networks in polynomial time and sample complexity. In Advances in Neural Information Processing Systems (pp. 6457–6466).

  • Harary, F. (1973). New directions in the theory of graphs. Technical Report. Michigan Univ. Ann Arbor Dept. of Mathematics.

  • Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems (pp. 689–696).

  • Kalisch, M., & Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8, 613–636.

  • Kephart, J. O., & White, S. R. (1991). Directed-graph epidemiological models of computer viruses. In Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy (pp. 343–359). IEEE.

  • Lauritzen, S. L. (1996). Graphical models. Oxford: Oxford University Press.

  • Loh, P. L., & Bühlmann, P. (2014). High-dimensional learning of linear causal networks via inverse covariance estimation. The Journal of Machine Learning Research, 15, 3065–3105.

  • Mooij, J., Janzing, D., Peters, J., & Schölkopf, B. (2009). Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 745–752). ACM.

  • Park, G., & Park, H. (2019). Identifiability of generalized hypergeometric distribution (GHD) directed acyclic graphical models. In Proceedings of Machine Learning Research (pp. 158–166). PMLR.

  • Park, G., & Raskutti, G. (2015). Learning large-scale Poisson DAG models based on overdispersion scoring. In Advances in Neural Information Processing Systems (pp. 631–639).

  • Park, G., & Raskutti, G. (2018). Learning quadratic variance function (QVF) DAG models via overdispersion scoring (ODS). Journal of Machine Learning Research, 18, 1–44.

  • Pearl, J. (2014). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Amsterdam: Elsevier.

  • Peters, J., & Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101, 219–228.

  • Peters, J., Mooij, J., Janzing, D., & Schölkopf, B. (2012). Identifiability of causal graphs using functional models. arXiv preprint arXiv:1202.3757.

  • Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003–2030.

  • Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., et al. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225–1248.

  • Spirtes, P. (1995). Directed cyclic graphical representations of feedback models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 491–498). Morgan Kaufmann Publishers Inc.

  • Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. Cambridge: MIT Press.

  • Tsamardinos, I., & Aliferis, C. F. (2003). Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics. Key West, FL, USA: Morgan Kaufmann Publishers.

  • Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65, 31–78.

  • Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the faithfulness assumption in causal inference. The Annals of Statistics, 41, 436–463.

  • Zhang, J., & Spirtes, P. (2016). The three faces of faithfulness. Synthese, 193, 1011–1027.


Acknowledgements

This work was supported by the 2018 Research Fund of the University of Seoul.

Author information


Corresponding author

Correspondence to Gunwoong Park.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 2

Proof

Without loss of generality, we assume that the true ordering \(\pi = (\pi _1,...,\pi _p)\) is unique. For simplicity, we define \(X_{1:j} = (X_{\pi _1},X_{\pi _2},\ldots ,X_{\pi _j})\) and \(X_{1:0} = \emptyset \). We restate the identifiability assumption of Gaussian SEMs.

Assumption 5

(Identifiability) For any \(m \in V\), let \(j = \pi_m\) and \(k \in V \setminus \text{Nd}(j)\). The conditional variance of \(X_j\) given its parents is smaller than the expected conditional variance of \(X_k\) given the variables preceding \(j\) in the ordering \(\pi\):

$$\begin{aligned} \sigma_j^2 < \sigma_k^2 + \mathbb{E}\big( \text{Var}\big( \mathbb{E}(X_k \mid X_{\text{Pa}(k)} ) \mid X_{1:(m-1)} \big) \big). \end{aligned}$$
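To make the assumption concrete, the following is a minimal numerical sketch (not part of the paper) that checks the inequality for a small linear Gaussian SEM whose variables are already listed in a causal order. The weight matrix, error variances, and the `conditional_variance` helper are hypothetical choices for illustration; for jointly Gaussian variables the conditional variance is constant, so it coincides with its expectation.

```python
import numpy as np

def conditional_variance(w, Sigma, S):
    """Var(w^T X | X_S) for zero-mean jointly Gaussian X with covariance Sigma.
    Because a Gaussian conditional variance is constant, this also equals
    E[ Var(w^T X | X_S) ]."""
    total = w @ Sigma @ w
    if len(S) == 0:
        return total
    cross = Sigma[S] @ w                       # Cov(X_S, w^T X)
    return total - cross @ np.linalg.solve(Sigma[np.ix_(S, S)], cross)

# Hypothetical chain X1 -> X2 -> X3 with heterogeneous error variances.
B = np.array([[0.0, 0.0, 0.0],                 # B[j, k] = weight of edge k -> j
              [0.8, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
sigma2 = np.array([1.0, 0.7, 0.9])
Ainv = np.linalg.inv(np.eye(3) - B)            # X = (I - B)^{-1} eps
Sigma = Ainv @ np.diag(sigma2) @ Ainv.T

# Assumption 5: for each position m and each later node k,
# sigma_{pi_m}^2 < sigma_k^2 + E( Var( E(X_k | X_Pa(k)) | X_{1:(m-1)} ) ),
# where E(X_k | X_Pa(k)) = B[k, :] @ X. All inequalities hold here.
for m in range(3):
    S = list(range(m))
    for k in range(m + 1, 3):
        rhs = sigma2[k] + conditional_variance(B[k], Sigma, S)
        print(f"m={m + 1}, k={k + 1}: {sigma2[m]:.3f} < {rhs:.3f}")
```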

Now, we prove identifiability of Gaussian SEMs using mathematical induction.

Step (1) By Assumption 5, for any node \(k \in V {\setminus } \{\pi _1\}\), we have

$$\begin{aligned} \text{ Var }(X_{\pi _1}) = \sigma _{\pi _1}^2 < \sigma _k^2 + \text{ Var }( \mathbb {E}(X_k \mid X_{\text{ Pa }(k)} ) ) = \text{ Var }(X_k). \end{aligned}$$

Therefore, \(\pi _1\) can be correctly identified.

Step (m−1) As the induction hypothesis, assume that the first \(m-1\) elements of the ordering and their parents are correctly estimated.

Step (m) Now, we consider the mth element of the causal ordering and its parents. By Assumption 5, for \(k \in \{ \pi_{m+1},\ldots, \pi_{p} \}\),

$$\begin{aligned} \mathbb{E}( \text{Var}( X_{\pi_m} \mid X_{1:(m-1)} ) ) = \sigma_{\pi_m}^2&< \sigma_k^2 + \mathbb{E}( \text{Var}( \mathbb{E}(X_k \mid X_{\text{Pa}(k)} ) \mid X_{1:(m-1)}) ) \\&= \mathbb{E}( \text{Var}(X_{k} \mid X_{1:(m-1)}) ). \end{aligned}$$

Hence, we can choose the true \(m^{th}\) element of the ordering \(\pi _m\).
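The induction argument suggests a population-level procedure: repeatedly select the remaining node whose expected conditional variance given the already-selected nodes is smallest. The sketch below is only an illustration of this idea, not the authors' implementation; it assumes the true covariance matrix is available, whereas in practice a sample covariance estimate would be plugged in.

```python
import numpy as np

def recover_ordering(Sigma):
    """Greedy causal-ordering recovery for a Gaussian SEM: at each step pick
    the remaining node with the smallest conditional variance given the nodes
    selected so far (for Gaussians this equals E[Var(X_k | X_selected)])."""
    p = Sigma.shape[0]
    order, remaining = [], list(range(p))
    while remaining:
        cond_vars = {}
        for k in remaining:
            if order:
                cross = Sigma[order, k]        # Cov(X_order, X_k)
                cond_vars[k] = Sigma[k, k] - cross @ np.linalg.solve(
                    Sigma[np.ix_(order, order)], cross)
            else:
                cond_vars[k] = Sigma[k, k]
        best = min(cond_vars, key=cond_vars.get)
        order.append(best)
        remaining.remove(best)
    return order

# Example with the hypothetical 3-node chain used above; expect [0, 1, 2].
B = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.5, 0.0]])
Ainv = np.linalg.inv(np.eye(3) - B)
Sigma = Ainv @ np.diag([1.0, 0.7, 0.9]) @ Ainv.T
print(recover_ordering(Sigma))
```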

In terms of the parents search, the conditional independence relations naturally encoded by the factorization (1) imply causal minimality (see details in Pearl 2014; Peters and Bühlmann 2014). In our setting, causal minimality states that for any node \(j \in V\) and any of its parents \(k \in \text{Pa}(j)\),

$$\begin{aligned} X_j \not\perp X_k \mid X_{\text{Pa}(j) \setminus \{k\}}. \end{aligned}$$

Therefore, we can choose the correct parents of \(\pi_m\). By mathematical induction, this completes the proof. \(\square\)
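Once an ordering is fixed, the parents search alluded to in the proof can be illustrated at the population level: in a linear Gaussian SEM, the coefficients of the regression of \(X_{\pi_m}\) on its predecessors equal the corresponding edge weights, which under causal minimality are nonzero exactly on \(\text{Pa}(\pi_m)\). The `recover_parents` helper below is a hypothetical sketch with an arbitrary numerical tolerance; a finite-sample version would instead rely on significance tests or sparse regression.

```python
import numpy as np

def recover_parents(Sigma, order, tol=1e-8):
    """Read off the parents of each node from the population regression of
    X_j on its predecessors in the given ordering: nonzero coefficients
    correspond to parents in a linear Gaussian SEM under causal minimality."""
    parents = {order[0]: []}
    for m in range(1, len(order)):
        j, pred = order[m], order[:m]
        coef = np.linalg.solve(Sigma[np.ix_(pred, pred)], Sigma[pred, j])
        parents[j] = [pred[i] for i, c in enumerate(coef) if abs(c) > tol]
    return parents

# Continuing the 3-node chain example: expect {0: [], 1: [0], 2: [1]}.
B = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.5, 0.0]])
Ainv = np.linalg.inv(np.eye(3) - B)
Sigma = Ainv @ np.diag([1.0, 0.7, 0.9]) @ Ainv.T
print(recover_parents(Sigma, [0, 1, 2]))
```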

1.2 Identifiability for a three-node chain graph

Consider a Gaussian SEM, \(X_1 \rightarrow X_2 \rightarrow X_3\), where \(X_1 = \epsilon_1\), \(X_2 = \beta_1 X_1 + \epsilon_2\), and \(X_3 = \beta_2 X_2 + \epsilon_3\) with \(\epsilon_j \sim N(0, \sigma_j^2)\) for all \(j \in \{1,2,3\}\). Then the first element of the ordering can be determined by comparing the variances of the nodes:

$$\begin{aligned} \text{Var}( X_2 )&= \mathbb{E}( \text{Var}( X_2 \mid X_1) ) + \text{Var}( \mathbb{E}( X_2 \mid X_1) ) = \sigma_2^2 + \beta_1^2 \sigma_1^2 > \sigma_1^2 = \text{Var}(X_1), \\ \text{Var}( X_3 )&= \mathbb{E}( \text{Var}( X_3 \mid X_2) ) + \text{Var}( \mathbb{E}( X_3 \mid X_2) ) \\&= \sigma_3^2 + \beta_2^2 \sigma_2^2 + \beta_2^2 \beta_1^2 \sigma_1^2 > \sigma_1^2 = \text{Var}(X_1), \end{aligned}$$

as long as \(\sigma_2^2/\sigma_1^2 > 1 - \beta_1^2\) and \(\sigma_3^2/\sigma_1^2 > 1 - \beta_2^2\).

The second element of the ordering can also be recovered by comparing the expected conditional variances of the remaining variables given the estimated first element of the ordering:

$$\begin{aligned} \mathbb{E}( \text{Var}( X_3 \mid X_1) )&= \mathbb{E}( \mathbb{E}( \text{Var}( X_3 \mid X_2) \mid X_1) ) + \mathbb{E}( \text{Var}( \mathbb{E}( X_3 \mid X_2) \mid X_1 ) )\\&= \sigma_3^2 + \beta_2^2 \sigma_2^2 > \sigma_2^2 = \mathbb{E}( \text{Var}( X_2 \mid X_1 ) ), \end{aligned}$$

as long as \(\sigma_3^2/\sigma_2^2 > 1 - \beta_2^2\).

Under causal minimality and the Markov condition, we also have the (conditional) dependence relations \(X_2 \not\perp X_1\) and \(X_3 \not\perp X_2 \mid X_1\), together with the conditional independence \(X_3 \perp X_1 \mid X_2\). Therefore, the true graph can be recovered.
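As a sanity check, the following short simulation (with illustrative parameter values chosen here, not taken from the paper; they satisfy the three conditions above even though the error variances differ) estimates these quantities empirically: the marginal variance of \(X_1\) should be the smallest, and the residual variance of \(X_2\) given \(X_1\) should be smaller than that of \(X_3\) given \(X_1\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1, beta2 = 0.8, 0.5
s1, s2, s3 = 1.0, 0.7, 0.9                    # error variances sigma_j^2

x1 = rng.normal(0.0, np.sqrt(s1), n)
x2 = beta1 * x1 + rng.normal(0.0, np.sqrt(s2), n)
x3 = beta2 * x2 + rng.normal(0.0, np.sqrt(s3), n)

# Step 1: marginal variances -- Var(X1) should be the smallest.
print(np.var(x1), np.var(x2), np.var(x3))     # approx 1.00, 1.34, 1.24

def residual_variance(y, x):
    """Empirical E[ Var(Y | X) ] for jointly Gaussian (Y, X): the residual
    variance of the least-squares regression of y on x."""
    slope = np.cov(y, x)[0, 1] / np.var(x)
    return np.var(y - slope * x)

# Step 2: expected conditional variances given X1 -- X2's should be smaller.
print(residual_variance(x2, x1), residual_variance(x3, x1))   # approx 0.70, 1.08
```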


About this article


Cite this article

Park, G., Kim, Y. Identifiability of Gaussian linear structural equation models with homogeneous and heterogeneous error variances. J. Korean Stat. Soc. 49, 276–292 (2020). https://doi.org/10.1007/s42952-019-00019-7
