Estimating linear causality in the presence of latent variables

Cluster Computing

Abstract

Learning causal relationships from data is known as the causal discovery problem, an important and relatively new field. Many applications involve latent variables, and ignoring them entirely can seriously bias the estimation results. This paper proposes a method that combines exploratory factor analysis and path analysis (EFA-PA) to infer causality in the presence of latent variables. The method incorporates latent variables, together with their linear causal relationships with the observed variables, into the model, which improves the accuracy of the causal model. Such models can be regarded as the simplest possible causal models for continuous data. EFA-PA closely resembles structural equation modeling, but the theoretical model posited in structural equation modeling must be modified repeatedly during data fitting until an acceptable model is obtained. The model obtained by EFA-PA both avoids this subjectivity and reduces estimation complexity. EFA-PA thus provides a basis for correctly estimating the causal relationships among observed variables in the presence of latent variables. In the experiment, the model estimated by EFA-PA outperforms the other models considered, including the structural equation model.
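
The two-stage idea described in the abstract, factor extraction followed by path analysis, can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated data, the loading values, the principal-component style extraction, and the helper names `standardize`, `extract_factors`, and `path_coefficients` are all assumptions made for illustration. A single latent variable drives three indicators, and the outcome `y` depends linearly on that latent variable and on one observed variable `w`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating model (an assumption, not taken from the paper):
# one latent variable drives three observed indicators and the outcome y.
latent = rng.normal(size=n)
load = np.array([0.9, 0.8, 0.7])
indicators = latent[:, None] * load + rng.normal(size=(n, 3)) * np.sqrt(1 - load**2)
w = rng.normal(size=n)                      # an observed cause of y
y = 0.8 * latent + 0.3 * w + 0.3 * rng.normal(size=n)


def standardize(x):
    """Center each column and scale it to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def extract_factors(z, n_factors):
    """Stage 1: principal-component style factor extraction.

    Returns the loading matrix (variables x factors) and
    regression-method factor scores (samples x factors).
    """
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]     # keep the largest ones
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    # Eigenvector signs are arbitrary; orient each factor so that its
    # loadings sum to a positive number.
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0
    loadings = loadings * signs
    scores = z @ np.linalg.solve(corr, loadings)
    return loadings, scores


def path_coefficients(predictors, target):
    """Stage 2: standardized path coefficients by ordinary least squares."""
    coef, *_ = np.linalg.lstsq(standardize(predictors), standardize(target), rcond=None)
    return coef


z = standardize(indicators)
loadings, scores = extract_factors(z, n_factors=1)
coef = path_coefficients(np.column_stack([scores[:, 0], w]), y)

print("factor loadings:", loadings.ravel().round(2))
print("path coefficients (latent factor, w):", coef.round(2))
```

In this toy setting the estimated factor score stands in for the unobserved variable in the path model. Dropping it from the regression would attribute all explained variation in `y` to `w` alone, which is the kind of bias from ignored latent variables that the abstract refers to.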

Acknowledgements

We thank the editor and referees for comments that led to improvements in the manuscript. The research is supported by the National Natural Science Foundation of China (61573266).

Author information

Corresponding author

Correspondence to Nina Fei.

Appendix

1. See Table 5.

Table 5 The specific meaning of the 16 observed variables

2. See Fig. 6.

A scree plot serves as an auxiliary aid for judging the number of latent variables.

Fig. 6 Scree plot used to help determine the number of latent variables

3. See Table 6.

Table 6 Factor loading matrix after orthogonal rotation
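
The appendix items above refer to a scree plot and an orthogonally rotated loading matrix. As a rough sketch of how such quantities are commonly computed, the code below derives the scree-plot eigenvalues from a correlation matrix, picks the number of factors, and applies a varimax rotation. Several things here are assumptions rather than details from the source: the simulated six-variable data set (the paper uses 16 observed variables), the eigenvalue-greater-than-one rule used in place of visual inspection of the plot, the choice of varimax as the orthogonal rotation, and the helper names `correlation_eigen` and `varimax`.

```python
import numpy as np


def correlation_eigen(data):
    """Eigenvalues and eigenvectors of the correlation matrix, largest first.

    The sorted eigenvalues are the heights plotted in a scree plot.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Matrix whose SVD gives the next orthogonal rotation.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                                   # no further improvement
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical data: two independent latent variables, three indicators each.
rng = np.random.default_rng(1)
n = 1000
factors = rng.normal(size=(n, 2))
pattern = np.array([[0.9, 0.0], [0.9, 0.0], [0.9, 0.0],
                    [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
noise_sd = np.sqrt(1 - (pattern**2).sum(axis=1))
data = factors @ pattern.T + rng.normal(size=(n, 6)) * noise_sd

eigvals, eigvecs = correlation_eigen(data)
n_factors = int((eigvals > 1.0).sum())              # eigenvalue-greater-than-one rule
unrotated = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
rotated = varimax(unrotated)

print("scree eigenvalues:", eigvals.round(2))
print("number of factors retained:", n_factors)
print("rotated loadings:")
print(rotated.round(2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of the loadings across factors changes, which is what makes the rotated matrix easier to interpret.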

About this article

Cite this article

Fei, N., Yang, Y. Estimating linear causality in the presence of latent variables. Cluster Comput 20, 1025–1033 (2017). https://doi.org/10.1007/s10586-017-0824-5

  • DOI: https://doi.org/10.1007/s10586-017-0824-5
