Abstract
Understanding causal mechanisms is a central goal in the behavioral, developmental, and social sciences. When causal effects are estimated and probed using observational data, covariate adjustment is crucial for removing dependencies between focal predictors and the error term. Covariate selection, however, is challenging because availability alone is not an adequate criterion for deciding whether a covariate should be included in the statistical model. The present study introduces a non-Gaussian method for covariate selection and provides a forward selection algorithm for linear models (non-Gaussian forward selection; nGFS) that selects appropriate covariates from a set of potential control variables in order to avoid inconsistent and biased estimators of the causal effect of interest. Further, we demonstrate that the forward selection algorithm has properties compatible with principles of direction of dependence, that is, it can be used to probe whether the causal target model is correctly specified with respect to the causal direction of effects. Results of a Monte Carlo simulation study suggest that the selection algorithm performs well, in particular when sample sizes are large (i.e., n ≥ 250) and the data deviate strongly from Gaussianity (e.g., distributions with skewness beyond 1.5). An empirical example is given for illustrative purposes.
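The general idea of forward covariate selection under non-Gaussianity can be illustrated with a minimal sketch. This is not the authors' nGFS implementation (their code is available at the OSF link given under Data Availability). It assumes a greedy rule that, at each step, adds the candidate covariate that most reduces a kernel dependence measure (a biased HSIC statistic with a median-heuristic bandwidth) between the focal predictor and the regression residuals. The function names, the stopping tolerance `tol`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def gram(v):
    """Gaussian kernel matrix with a median-heuristic bandwidth."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    d2 = (v - v.T) ** 2
    return np.exp(-d2 / (2.0 * np.median(d2[d2 > 0])))

def hsic(a, b):
    """Biased HSIC statistic; larger values indicate stronger dependence."""
    K, L = gram(a), gram(b)
    # Double-center K, then take the elementwise product with L.
    Kc = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    return float((Kc * L).sum()) / len(K) ** 2

def residuals(y, x, Z):
    """OLS residuals of y regressed on an intercept, x, and the columns of Z."""
    X = np.column_stack([np.ones(len(y)), x, Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def forward_select(y, x, Z, tol=0.5):
    """Greedy forward selection (illustrative stopping rule, an assumption).

    Adds the covariate that yields the smallest HSIC between x and the
    residuals; stops when the relative reduction falls below `tol`.
    """
    selected = []
    current = hsic(x, residuals(y, x, Z[:, selected]))
    path = [current]
    while len(selected) < Z.shape[1]:
        scores = {j: hsic(x, residuals(y, x, Z[:, selected + [j]]))
                  for j in range(Z.shape[1]) if j not in selected}
        best = min(scores, key=scores.get)
        if scores[best] > (1.0 - tol) * current:
            break
        selected.append(best)
        current = scores[best]
        path.append(current)
    return selected, path

# Simulated example: z confounds x and y, w is an irrelevant covariate.
rng = np.random.default_rng(1)
n = 300
z = rng.exponential(size=n) - 1.0          # skewed confounder
w = rng.exponential(size=n) - 1.0          # skewed noise covariate
x = z + (rng.exponential(size=n) - 1.0)
y = 0.5 * x + z + 0.5 * (rng.exponential(size=n) - 1.0)
Z = np.column_stack([z, w])

selected, path = forward_select(y, x, Z)
print("selected columns:", selected)
print("HSIC path:", path)
```

In this sketch, the OLS residuals are uncorrelated with x by construction, so the selection relies on higher-order dependence that is only detectable when the data are non-Gaussian. A practical implementation would likely replace the ad hoc tolerance with a formal independence test.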
Data Availability
An artificial dataset and the algorithm code are available at https://osf.io/qwr69/.
Acknowledgement
The authors are indebted to Dr. Ingrid Koller for providing the data used for illustrative purposes.
About this article
Cite this article
Zhang, B., Wiedermann, W. Covariate selection in causal learning under non-Gaussianity. Behav Res (2023). https://doi.org/10.3758/s13428-023-02217-y
DOI: https://doi.org/10.3758/s13428-023-02217-y