Abstract
In this work, we consider the identifiability assumption of Gaussian linear structural equation models (SEMs), in which each variable is determined by a linear function of its parents plus normally distributed error. Prior work has shown that linear Gaussian SEMs are fully identifiable if all error variances are equal or known. In contrast, this work proves the identifiability of Gaussian SEMs with both homogeneous and heterogeneous unknown error variances. Our new identifiability condition exploits not only error variances but also edge weights; hence, it is strictly weaker than the conditions in prior identifiability results. Based on the new condition, we further provide a structure learning algorithm that is statistically consistent and computationally feasible. The proposed algorithm assumes that all relevant variables are observed, but it requires neither causal minimality nor faithfulness. We verify our theoretical findings through simulations and real multivariate data, and compare our algorithm to the state-of-the-art PC, GES, and GDS algorithms.
References
Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42, 2526–2556.
Chickering, D. M. (1996). Learning Bayesian networks is NP-complete. In Learning from data (pp. 121–130). Springer.
Chickering, D. M. (2003). Optimal structure identification with greedy search. The Journal of Machine Learning Research, 3, 507–554.
Chickering, D. M., Geiger, D., & Heckerman, D. (1994). Learning Bayesian networks is NP-hard. Technical Report.
Doya, K. (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge: MIT Press.
Edwards, D. (2012). Introduction to graphical modelling. New York: Springer.
Friedman, N., Linial, M., Nachman, I., & Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of computational biology, 7, 601–620.
Ghoshal, A., & Honorio, J. (2017). Learning identifiable Gaussian Bayesian networks in polynomial time and sample complexity. In Advances in Neural Information Processing Systems (pp. 6457–6466).
Harary, F. (1973). New directions in the theory of graphs. Technical Report. Michigan Univ. Ann Arbor Dept. of Mathematics.
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. In Advances in neural information processing systems, pp. 689–696.
Kalisch, M., & Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8, 613–636.
Kephart, J. O., & White, S. R. (1991). Directed-graph epidemiological models of computer viruses. In Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy (pp. 343–359). IEEE.
Lauritzen, S. L. (1996). Graphical models. Oxford: Oxford University Press.
Loh, P. L., & Bühlmann, P. (2014). High-dimensional learning of linear causal networks via inverse covariance estimation. The Journal of Machine Learning Research, 15, 3065–3105.
Mooij, J., Janzing, D., Peters, J., & Schölkopf, B. (2009). Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 745–752). ACM.
Park, G., & Park, H. (2019). Identifiability of generalized hypergeometric distribution (GHD) directed acyclic graphical models. In Proceedings of Machine Learning Research (pp. 158–166). PMLR.
Park, G., & Raskutti, G. (2015). Learning large-scale Poisson DAG models based on overdispersion scoring. In Advances in Neural Information Processing Systems (pp. 631–639).
Park, G., & Raskutti, G. (2018). Learning quadratic variance function (QVF) DAG models via overdispersion scoring (ODS). Journal of Machine Learning Research, 18, 1–44.
Pearl, J. (2014). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Amsterdam: Elsevier.
Peters, J., & Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101, 219–228.
Peters, J., Mooij, J., Janzing, D., & Schölkopf, B. (2012). Identifiability of causal graphs using functional models. arXiv preprint arXiv:1202.3757 .
Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003–2030.
Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., et al. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225–1248.
Spirtes, P. (1995). Directed cyclic graphical representations of feedback models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 491–498). Morgan Kaufmann.
Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. Cambridge: MIT Press.
Tsamardinos, I., & Aliferis, C. F. (2003). Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics. Key West, FL: Morgan Kaufmann.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65, 31–78.
Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the faithfulness assumption in causal inference. The Annals of Statistics, 41, 436–463.
Zhang, J., & Spirtes, P. (2016). The three faces of faithfulness. Synthese, 193, 1011–1027.
Acknowledgements
This work was supported by the 2018 Research Fund of the University of Seoul.
Appendix
1.1 Proof of Theorem 2
Proof
Without loss of generality, we assume that the true ordering \(\pi = (\pi _1,...,\pi _p)\) is unique. For simplicity, we define \(X_{1:j} = (X_{\pi _1},X_{\pi _2},\ldots ,X_{\pi _j})\) and \(X_{1:0} = \emptyset \). We restate the identifiability assumption of Gaussian SEMs.
Assumption 5
(Identifiability) For any node \(m \in V\), let \(j = \pi _{m}\) and \(k \in V {\setminus } \text{ Nd }(j)\). The conditional variance of \(X_j\) given its parents is smaller than the conditional variance of \(X_k\) given the variables before j in ordering \(\pi \):
$$ \text{Var}\left( X_j \mid X_{\text{Pa}(j)} \right) < \text{Var}\left( X_k \mid X_{1:(m-1)} \right) . $$
Now, we prove identifiability of Gaussian SEMs using mathematical induction.
Step (1) By Assumption 5, for any node \(k \in V {\setminus } \{\pi _1\}\), we have
$$ \text{Var}\left( X_{\pi _1} \right) = \text{Var}\left( X_{\pi _1} \mid X_{\text{Pa}(\pi _1)} \right) < \text{Var}\left( X_k \mid X_{1:0} \right) = \text{Var}\left( X_k \right) , $$
since \(\pi _1\) has no parents and \(X_{1:0} = \emptyset \).
Therefore, \(\pi _1\) can be correctly identified.
Step (m−1) For the \((m-1)\)th element of the ordering, assume that the first \(m-1\) elements of the ordering and their parents are correctly estimated.
Step (m) Now, we consider the \(m\)th element of the causal ordering and its parents. By Assumption 5, for \(k \in \{ \pi _{m+1},\ldots , \pi _{p} \}\),
$$ \text{Var}\left( X_{\pi _m} \mid X_{1:(m-1)} \right) = \text{Var}\left( X_{\pi _m} \mid X_{\text{Pa}(\pi _m)} \right) < \text{Var}\left( X_k \mid X_{1:(m-1)} \right) , $$
where the equality holds because \(\text{Pa}(\pi _m) \subseteq \{\pi _1,\ldots ,\pi _{m-1}\}\). Hence, we can choose the true \(m\)th element of the ordering \(\pi _m\).
In terms of the parents search, it is clear that the conditional independence relations encoded by the factorization (1) imply causal minimality (see details in Pearl 2014; Peters and Bühlmann 2014). In our settings, causal minimality states that for any node \(j \in V\) and one of its parents \(k \in \text{ Pa }(j)\),
$$ X_j \not \perp X_k \mid X_{\text{Pa}(j) \setminus \{k\}} . $$
Therefore, we can choose the correct parents of \(\pi _m\). By mathematical induction, this completes the proof. \(\square \)
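The inductive argument above suggests a simple greedy procedure: at each step, pick the remaining variable whose conditional variance given the already-selected variables is smallest. The following Python sketch illustrates this identifiability argument on sample data; it is our own illustration (the function name and the regression-based residual-variance estimator are our choices), not the authors' exact algorithm.

```python
import numpy as np

def estimate_ordering(X):
    """Greedy causal-ordering recovery for a Gaussian linear SEM.

    At each step, choose the remaining variable with the smallest
    conditional variance given the variables already selected,
    estimated as the residual variance of a linear regression.
    """
    X = X - X.mean(axis=0)  # center so regressions need no intercept
    n, p = X.shape
    remaining = list(range(p))
    ordering = []
    while remaining:
        best, best_var = None, np.inf
        for j in remaining:
            if ordering:
                # residual variance of X_j after regressing on chosen variables
                Z = X[:, ordering]
                beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
                resid = X[:, j] - Z @ beta
            else:
                resid = X[:, j]  # no conditioning set yet: marginal variance
            v = resid.var()
            if v < best_var:
                best, best_var = j, v
        ordering.append(best)
        remaining.remove(best)
    return ordering
```

For instance, on data simulated from the chain \(X_1 \rightarrow X_2 \rightarrow X_3\) with unit error variances and edge weights 0.8 (which satisfy the identifiability condition), the procedure recovers the ordering (1, 2, 3).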
1.2 Identifiability for three-node chain graph
Consider a Gaussian SEM, \(X_1 \rightarrow X_2 \rightarrow X_3\), where \(X_1 = \epsilon _1\), \(X_2 = \beta _1 X_1 + \epsilon _2\), and \(X_3 =\beta _2 X_2 + \epsilon _3\) with \(\epsilon _j \sim N(0, \sigma _j^2)\) for all \(j \in \{1,2,3\}\). Then the first element of the ordering can be determined by comparing the variances of nodes:
$$ \text{Var}(X_1) = \sigma _1^2< \text{Var}(X_2) = \beta _1^2 \sigma _1^2 + \sigma _2^2 \quad \text{and} \quad \text{Var}(X_1) < \text{Var}(X_3) = \beta _2^2 \left( \beta _1^2 \sigma _1^2 + \sigma _2^2 \right) + \sigma _3^2 , $$
as long as \(\sigma _2^2/ \sigma _1^2 > (1 - \beta _1^2)\) and \(\sigma _3^2/ \sigma _1^2 > (1 - \beta _2^2)\).
The second element of the ordering can also be recovered by comparing the expectation of the conditional variance of the remaining variables given the estimated first element of the ordering:
$$ E\left[ \text{Var}(X_2 \mid X_1) \right] = \sigma _2^2 < E\left[ \text{Var}(X_3 \mid X_1) \right] = \beta _2^2 \sigma _2^2 + \sigma _3^2 , $$
as long as \(\sigma _3^2/ \sigma _2^2 > (1 - \beta _2^2)\).
Under causal minimality and the Markov condition, we also have the (conditional) dependence relations \(X_2 \not \perp X_1\) and \(X_3 \not \perp X_2 \mid X_1\), together with \(X_3 \perp X_1 \mid X_2\). Therefore, the true graph can be recovered.
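As a quick sanity check, the population variance comparisons above can be verified numerically. The parameter values \(\beta _1 = \beta _2 = 0.8\) and unit error variances are our own illustrative choices; they satisfy the stated conditions since, e.g., \(\sigma _2^2/\sigma _1^2 = 1 > 1 - \beta _1^2 = 0.36\).

```python
# Population-level check of the variance comparisons for the chain
# X1 -> X2 -> X3, with illustrative values beta1 = beta2 = 0.8 and
# unit error variances (our choice, satisfying the identifiability condition).
beta1, beta2 = 0.8, 0.8
s1, s2, s3 = 1.0, 1.0, 1.0  # error variances sigma_j^2

var1 = s1                      # Var(X1)
var2 = beta1**2 * var1 + s2    # Var(X2)
var3 = beta2**2 * var2 + s3    # Var(X3)

# First element: X1 has the smallest marginal variance.
assert var1 < var2 and var1 < var3

# Second element: conditional variances given X1.
cvar2 = s2                     # Var(X2 | X1)
cvar3 = beta2**2 * s2 + s3     # Var(X3 | X1)
assert cvar2 < cvar3
```

Both assertions pass, matching the inequalities derived above.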
Park, G., Kim, Y. Identifiability of Gaussian linear structural equation models with homogeneous and heterogeneous error variances. J. Korean Stat. Soc. 49, 276–292 (2020). https://doi.org/10.1007/s42952-019-00019-7