Abstract
Learning causality from data is known as the causal discovery problem, an important and relatively new field. Many applications involve latent variables, and ignoring them completely can seriously bias the estimation results. In this paper, a method combining exploratory factor analysis and path analysis (EFA-PA) is proposed to infer causality in the presence of latent variables. Our method extends the model with latent variables and their linear causal relationships with the observed variables, which improves the accuracy of the causal model. Such a model can be thought of as the simplest possible causal model for continuous data. EFA-PA is very similar to the structural equation model, but the theoretical model specified in structural equation modeling has to be modified repeatedly during data fitting until an ideal model is obtained. The model obtained by EFA-PA avoids this subjectivity and also reduces estimation complexity. The EFA-PA estimation model is found to be superior to the other models considered. EFA-PA provides a basis for correctly estimating the causal relationships between observed variables in the presence of latent variables. The experiment shows that EFA-PA performs better than the structural equation model.
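The bias caused by an ignored latent variable can be illustrated with a small simulation. The sketch below is not the EFA-PA procedure itself: the data-generating model, the coefficient values, and the use of an indicator average as a crude stand-in for an exploratory factor score are all assumptions made for illustration. It compares the slope of y on x when the latent variable is ignored with the slope obtained after adding the proxy factor score as a second regressor.

```python
import random

random.seed(0)
n = 20000
b_true = 0.5  # hypothetical causal effect of x on y (assumed for this sketch)

xs, ys, scores = [], [], []
for _ in range(n):
    latent = random.gauss(0, 1)                   # unobserved common cause
    x = latent + random.gauss(0, 1)               # x depends on the latent variable
    y = b_true * x + latent + random.gauss(0, 1)  # y depends on x and the latent variable
    # Three noisy indicators of the latent variable (hypothetical measurements).
    indicators = [latent + random.gauss(0, 0.5) for _ in range(3)]
    xs.append(x)
    ys.append(y)
    scores.append(sum(indicators) / 3)            # crude proxy for a factor score


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Latent variable ignored: simple regression of y on x.
# Population value here is b_true + Cov(x, latent) / Var(x) = 0.5 + 1/2 = 1.0.
naive_slope = cov(xs, ys) / cov(xs, xs)

# Latent variable approximated: regress y on x and the proxy score by solving
# the 2x2 normal equations for the coefficient on x.
sxx, sff, sxf = cov(xs, xs), cov(scores, scores), cov(xs, scores)
sxy, sfy = cov(xs, ys), cov(scores, ys)
adjusted_slope = (sff * sxy - sxf * sfy) / (sxx * sff - sxf ** 2)

print("true effect:", b_true)
print("slope ignoring the latent variable:", round(naive_slope, 3))
print("slope with proxy factor score:", round(adjusted_slope, 3))
```

Because the proxy score still carries measurement error, the adjusted slope does not recover the true effect exactly, but it lies much closer to it than the slope that ignores the latent variable.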
Acknowledgements
We thank the editor and referees for comments that led to improvements in the manuscript. The research is supported by the National Natural Science Foundation of China (61573266).
Cite this article
Fei, N., Yang, Y. Estimating linear causality in the presence of latent variables. Cluster Comput 20, 1025–1033 (2017). https://doi.org/10.1007/s10586-017-0824-5
DOI: https://doi.org/10.1007/s10586-017-0824-5