Abstract
Estimating dependence relationships between variables is a crucial issue in many applied domains, psychology in particular. When several variables are entertained, they can be organized into a network which encodes their set of conditional dependence relations. Typically, however, the underlying network structure is completely unknown or can only be partially specified; accordingly, it should be learned from the available data, a process known as structure learning. In addition, data arising from social and psychological studies are often of different types, as they can include categorical, discrete and continuous measurements. In this paper, we develop a novel Bayesian methodology for structure learning of directed networks which applies to mixed data, i.e., data possibly containing continuous, discrete, ordinal and binary variables simultaneously. Whenever available, our method can easily incorporate known dependence structures among variables, represented by paths or edge directions that can be postulated in advance based on the specific problem under consideration. We evaluate the proposed method through extensive simulation studies, with appreciable performance in comparison with current state-of-the-art alternatives. Finally, we apply our methodology to well-being data from a social survey promoted by the United Nations, and to mental health data collected from a cohort of medical students. R code implementing the proposed methodology is available at https://github.com/FedeCastelletti/bayes_networks_mixed_data.
References
Andersson, S. A., Madigan, D., & Perlman, M. D. (1997). On the Markov equivalence of chain graphs, undirected graphs, and acyclic digraphs. Scandinavian Journal of Statistics, 24, 81–102.
Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian networks of mixed variables. International Journal of Data Science and Analytics, 14, 2–18.
Andrews, B., Ramsey, J., & Cooper, G. F. (2019). Learning high-dimensional directed acyclic graphs with mixed data-types. In Proceedings of Machine Learning Research (Vol. 104, pp. 4–21). PMLR.
Barbieri, M. M., & Berger, J. O. (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870–897.
Bhadra, A., Rao, A., & Baladandayuthapani, V. (2018). Inferring network structure in non-normal and mixed discrete-continuous genomic data. Biometrics, 74, 185–195.
Borsboom, D. (2008). Psychometric perspectives on diagnostic systems. Journal of Clinical Psychology, 64, 1089–1108.
Borsboom, D., Deserno, M. K., Rhemtulla, M., Epskamp, S., Fried, E. I., McNally, R. J., Robinaugh, D. J., Perugini, M., Dalege, J., Costantini, G., Isvoranu, A.-M., Wysocki, A. C., van Borkulo, C. D., van Bork, R., & Waldorp, L. J. (2021). Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers, 1, 58.
Briganti, G., Scutari, M., & McNally, R. (2022). A tutorial on Bayesian networks for psychopathology researchers. Psychological Methods, Advance online publication.
Cao, X., Khare, K., & Ghosh, M. (2019). Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. The Annals of Statistics, 47, 319–348.
Carrard, V., Bourquin, C., Berney, S., Schlegel, K., Gaume, J., Bart, P.-A., Preisig, M., Mast, M. S., & Berney, A. (2022). The relationship between medical students’ empathy, mental health, and burnout: A cross-sectional study. Medical Teacher, 44, 1392–1399.
Castelletti, F., & Consonni, G. (2021). Bayesian causal inference in probit graphical models. Bayesian Analysis, 16, 1113–1137.
Castelletti, F., & Consonni, G. (2023). Bayesian graphical modeling for heterogeneous causal effects. Statistics in Medicine, 42, 15–32.
Castelletti, F., Consonni, G., Della Vedova, M., & Peluso, S. (2018). Learning Markov equivalence classes of directed acyclic graphs: An objective Bayes approach. Bayesian Analysis, 13, 1231–1256.
Castelletti, F., & Mascaro, A. (2022). BCDAG: An R package for Bayesian structure and causal learning of Gaussian DAGs. arXiv preprint arXiv:2201.12003.
Castelletti, F., & Peluso, S. (2021). Equivalence class selection of categorical graphical models. Computational Statistics & Data Analysis, 164, 107304.
Cheng, J., Li, T., Levina, E., & Zhu, J. (2017). High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics, 26, 367–378.
Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2, 445–498.
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhalter, D. J. (1999). Probabilistic networks and expert systems. Springer.
Cramer, A. O. J., Waldorp, L. J., Van der Maas, H. L. J., & Borsboom, D. (2010). Comorbidity: A network perspective. Behavioral and Brain Sciences, 33, 137–150.
Cui, R., Groot, P., & Heskes, T. (2018). Learning causal structure from mixed data with missing values using Gaussian copula models. Statistics and Computing, 29, 311–333.
Cui, R., Groot, P., & Heskes, T. (2016). Copula PC algorithm for causal discovery from mixed data. In P. Frasconi, N. Landwehr, G. Manco, & J. Vreeken (Eds.), Machine learning and knowledge discovery in databases (pp. 377–392). Springer International Publishing.
Dobra, A., & Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5, 969–993.
Edwards, D. (2000). Introduction to graphical modelling. Springer.
Epskamp, S., Kruis, J., & Marsman, M. (2017). Estimating psychopathological networks: Be careful what you wish for. PLOS ONE, 12, 1–13.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.
Godsill, S. J. (2001). On the relationship between Markov chain Monte Carlo methods for model uncertainty. Journal of Computational and Graphical Statistics, 10, 230–248.
Hambleton, R. K., & Cook, L. L. (1977). Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 14, 75–96.
Harris, N., & Drton, M. (2013). PC algorithm for nonparanormal graphical models. Journal of Machine Learning Research, 14, 3365–3383.
Haslbeck, J. M. B., & Waldorp, L. J. (2018). How well do network models predict observations? On the importance of predictability in network models. Behavior Research Methods, 50, 853–861.
He, Y., Zhang, X., Wang, P., & Zhang, L. (2017). High dimensional Gaussian copula graphical model with FDR control. Computational Statistics & Data Analysis, 113, 457–474.
Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 197–243.
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382–401.
Hoff, P. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1, 265–283.
Ickstadt, K., Bornkamp, B., Grzegorczyk, M., Wieczorek, J., Sheriff, M. R., Grecco, H. E., & Zamir, E. (2011). Nonparametric Bayesian networks. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, & M. West (Eds.), Bayesian Statistics 9 (pp. 283–316). Oxford University Press.
Isvoranu, A., Epskamp, S., Waldorp, L. J., & Borsboom, D. (2022). Network psychometrics with R: A guide for behavioral and social scientists. Routledge.
Kalisch, M., & Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8, 613–636.
Lauritzen, S. L. (1996). Graphical models. Oxford University Press.
Lauritzen, S. L., & Wermuth, N. (1989). Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, 17, 31–57.
Lee, J., & Hastie, T. (2013). Structure learning of mixed graphical models. In C. M. Carvalho & P. Ravikumar (Eds.), Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (Vol. 31, pp. 388–396). PMLR.
Lee, K. H., Chen, Q., DeSarbo, W. S., & Xue, L. (2022). Estimating finite mixtures of ordinal graphical models. Psychometrika, 87, 83–106.
Liu, H., Lafferty, J., & Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10, 2295–2328.
Maathuis, M., & Nandy, P. (2016). A review of some recent advances in causal inference. In P. Bühlmann, P. Drineas, M. Kane, & M. van der Laan (Eds.), Handbook of big data (pp. 387–408). Chapman and Hall/CRC.
Marsman, M., Huth, K., Waldorp, L. J., & Ntzoufras, I. (2022). Objective Bayesian edge screening and structure selection for Ising networks. Psychometrika, 87, 47–82.
Marsman, M., & Rhemtulla, M. (2022). Guest Editors’ introduction to the special issue “Network Psychometrics in Action”: Methodological innovations inspired by empirical problems. Psychometrika, 87, 1–11.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34, 1436–1462.
Mohammadi, A., Abegaz, F., van den Heuvel, E., & Wit, E. C. (2017). Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66, 629–645.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411.
Müller, D., & Czado, C. (2019). Dependence modelling in ultra high dimensions with vine copulas and the graphical lasso. Computational Statistics & Data Analysis, 137, 211–232.
Ni, Y., Baladandayuthapani, V., Vannucci, M., & Stingo, F. C. (2022). Bayesian graphical models for modern biological applications. Statistical Methods & Applications, 31, 197–225.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge University Press.
Peluso, S., & Consonni, G. (2020). Compatible priors for model selection of high-dimensional Gaussian DAGs. Electronic Journal of Statistics, 14, 4110–4132.
Peterson, C., Stingo, F. C., & Vannucci, M. (2015). Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association, 110, 159–174.
Rodríguez, A., Lenkoski, A., & Dobra, A. (2011). Sparse covariance estimation in heterogeneous samples. Electronic Journal of Statistics, 5, 981–1014.
Scott, J. G., & Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, 38, 2587–2619.
Spearman, C. (1904). General intelligence, objectively determined and measured. The American Journal of Psychology, 15, 201–292.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). The MIT Press.
Wirth, R., & Edwards, M. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
Acknowledgements
We thank the Editor and three reviewers for constructive comments that helped improve the paper. We gratefully acknowledge valuable suggestions from Maarten Marsman (University of Amsterdam) during the revision of our manuscript. Work carried out within MUR-PRIN grant 2022 SMNNKY - CUP J53D23003870008, funded by the European Union - Next Generation EU. The views and opinions expressed are only those of the authors and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. Partial support from UCSC (D1 and 2019-D.3.2 research grants) is also acknowledged.
Appendix
1.1 Comparison with Copula PC
In this section, we compare our methodology with the benchmark Copula PC method of Cui et al. (2016); see also Cui et al. (2018). Copula PC is a two-step approach that can be applied to mixed data comprising categorical (binary and ordinal), discrete and continuous variables. It first estimates a correlation matrix in the space of latent variables (each associated with one of the observed variables), which is then used to test conditional independencies as in the standard PC algorithm. The first step adopts the same Gibbs sampling scheme introduced by Hoff (2007), based on data augmentation with latent observations. Conditional independence tests are implemented at significance level \(\alpha \), which we vary in \(\{0.01,0.05,0.10\}\); lower values of \(\alpha \) imply a higher expected level of sparsity in the estimated graph. We refer to the three benchmarks as Copula PC 0.01, 0.05 and 0.10, respectively. The output of Copula PC is a completed partially directed acyclic graph (CPDAG) representing the estimated equivalence class. For our method, we summarize the MCMC output with a single graph estimate, namely the CPDAG representing the equivalence class of the estimated median probability DAG model. Each estimate is finally compared with the true CPDAG by means of the structural Hamming distance (SHD) between the two graphs.
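As a concrete reference, the SHD between two graph estimates can be computed by comparing edge statuses over all vertex pairs. The following is a minimal sketch, not the paper's implementation; the adjacency encoding of undirected edges and the toy graphs are our own illustrative assumptions:

```python
import numpy as np

def shd(A_true, A_est):
    """Structural Hamming distance between two graphs given as 0/1
    adjacency matrices (A[i, j] = 1 iff i -> j; an undirected edge
    i - j is encoded by A[i, j] = A[j, i] = 1).

    Each vertex pair whose edge status differs (absent, undirected,
    i -> j, or j -> i) contributes one unit to the distance."""
    A_true, A_est = np.asarray(A_true), np.asarray(A_est)
    q = A_true.shape[0]
    dist = 0
    for i in range(q):
        for j in range(i + 1, q):
            if (A_true[i, j], A_true[j, i]) != (A_est[i, j], A_est[j, i]):
                dist += 1
    return dist

# Toy example: true graph has 0 -> 1 only; the estimate makes that
# edge undirected and adds a spurious edge 1 -> 2.
A_true = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
A_est  = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 0]])
print(shd(A_true, A_est))  # 2: one wrong edge status, one extra edge
```

Counting differing *pairs* (rather than differing matrix entries) keeps a single mis-oriented edge at cost one, which matches the usual definition for CPDAGs.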
Results for Scenario Free, variable types Binary, Ordinal, Count and Mixed, and each sample size \(n\in \{100,200,500,1000,2000\}\) are summarized in Fig. 9, which reports the distribution of the SHD across \(N=40\) simulations. All methods improve as the sample size n increases. In addition, structure learning is more difficult in the Binary case and generally easier for Ordinal and Count data. Moreover, Copula PC 0.01 (light grey) performs better than Copula PC 0.05 and 0.10 (middle and dark grey, respectively). Our method clearly outperforms the three benchmarks in the Binary scenario, a gap that widens for large sample sizes. It also performs better than Copula PC 0.05 and 0.10 most of the time under the remaining settings and remains highly competitive with Copula PC 0.01, with an overall better performance in terms of average SHD under almost all sample sizes for Scenarios Ordinal and Count.
1.2 Comparison with Bayesian Parametric Strategy
Our methodology is based on a semi-parametric strategy which separately models the dependence parameter, corresponding to a DAG-dependent covariance matrix, and the marginal distributions of the observed variables, the latter estimated using a rank-based nonparametric approach.
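The rank-based treatment of the margins can be sketched as follows: each observed variable is mapped to latent Gaussian scores through its rescaled empirical c.d.f., in the spirit of rank-based copula approaches such as Hoff (2007). This is an illustrative sketch only; the function name and the rescaling constant \(n+1\) are our own assumptions:

```python
import numpy as np
from scipy.stats import rankdata, norm

def latent_gaussian_scores(x):
    """Map one observed (possibly ordinal/discrete) variable to latent
    Gaussian scores via the rescaled empirical c.d.f.; tied observations
    (e.g. repeated ordinal levels) share an average rank."""
    n = len(x)
    u = rankdata(x, method="average") / (n + 1)  # ranks rescaled to (0, 1)
    return norm.ppf(u)                           # Gaussian quantile transform

rng = np.random.default_rng(1)
x = rng.poisson(3.0, size=200)        # a count variable
z = latent_gaussian_scores(x)
print(np.all(np.isfinite(z)))         # True: rescaling avoids infinite scores
```

Dividing by \(n+1\) keeps the transformed values strictly inside \((0,1)\), so \(\Phi ^{-1}\) never returns \(\pm \infty \); the transform is monotone, preserving the ordering of the raw observations.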
Alternatively, one can adopt suitable parametric families for modeling the various mixed types of variables, as in a generalized linear model (glm) framework. To implement this parametric strategy, we generalize the latent Gaussian DAG-model in (1) to accommodate a nonzero marginal mean for the latent variables. Specifically, we assume
$$\varvec{z}\,\vert \,\varvec{\mu },\varvec{\Omega }\sim \mathcal {N}_q\left( \varvec{\mu },\varvec{\Omega }^{-1}\right) ,$$
with \(\varvec{\mu }\in \mathbb {R}^{q}\) and \(\varvec{\Omega }\in \mathcal {P}_{\mathcal {D}}\), the space of all s.p.d. precision matrices Markov w.r.t. DAG \(\mathcal {D}\). The allied structural equation model (SEM) representation of such a model is given by \(\varvec{L}^\top \varvec{z}-\varvec{\eta }= \varvec{\varepsilon }, \varvec{\varepsilon }\sim \mathcal {N}_q(\varvec{0},\varvec{D})\), or equivalently, in terms of node-distributions,
$$z_j\,\vert \,\varvec{z}_{\text {pa}_{\mathcal {D}}(j)}\sim \mathcal {N}\left( \eta _j-\varvec{L}_{\prec j \, ]}^{\top }\varvec{z}_{\text {pa}_{\mathcal {D}}(j)},\varvec{D}_{jj}\right) \quad (22)$$
for each \(j=1,\ldots ,q\), with \( \varvec{D}_{jj} = \varvec{\Sigma }_{j\,\vert \,\text {pa}_{\mathcal {D}}(j)}, \varvec{L}_{\prec j \, ]} = -\varvec{\Sigma }^{-1}_{\prec j \succ }\varvec{\Sigma }_{\prec j \, ]}, \eta _j = \mu _j + \varvec{L}^{\top }_{\prec j \, ]} \varvec{\mu }_{\text {pa}_{\mathcal {D}}(j)}. \) Importantly, each equation in (22) now resembles the structure of a linear “regression” model with a nonzero intercept term \(\eta _j\). A Normal-DAG-Wishart prior can then be assigned to \((\varvec{\eta },\varvec{D},\varvec{L})\); see Castelletti and Consonni (2023, Supplement, Section 1) for full details. Under such a prior, the posterior distribution of \((\varvec{\eta },\varvec{D},\varvec{L})\) given independent (latent) Gaussian data \(\varvec{Z}\) is still Normal-DAG-Wishart, and the marginal data distribution is also available in closed-form expression. Therefore, we can adapt the MCMC scheme of Sect. 4 to this more general framework, with the update of \((\varvec{D},\varvec{L},\mathcal {D})\) in Sect. 4.1 replaced by an update of \((\varvec{\eta }, \varvec{D},\varvec{L},\mathcal {D})\).
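Under this node-wise representation, latent data can be generated by visiting nodes in topological order and drawing each \(z_j\) from its Gaussian conditional given the already-sampled parents. A minimal simulation sketch; the toy DAG and all parameter values are hypothetical:

```python
import numpy as np

def sample_sem(eta, L, D_diag, parents, rng):
    """Draw one latent vector z from the node-wise SEM: visiting nodes
    in topological order, z_j ~ N(eta_j - L[pa(j), j] @ z[pa(j)], D_jj).
    `parents[j]` lists the parents of node j; nodes are assumed ordered
    so that every parent precedes its children."""
    q = len(eta)
    z = np.zeros(q)
    for j in range(q):
        pa = parents[j]
        mean_j = eta[j] - L[pa, j] @ z[pa]   # empty dot product gives 0.0
        z[j] = rng.normal(mean_j, np.sqrt(D_diag[j]))
    return z

# Toy DAG 0 -> 1 -> 2 (all values illustrative)
parents = [np.array([], dtype=int), np.array([0]), np.array([1])]
eta = np.array([0.0, 1.0, -0.5])
L = np.zeros((3, 3)); L[0, 1] = 0.8; L[1, 2] = -0.4
D_diag = np.array([1.0, 0.5, 0.5])
rng = np.random.default_rng(0)
Z = np.array([sample_sem(eta, L, D_diag, parents, rng) for _ in range(5000)])
print(Z.mean(axis=0).round(2))  # close to the implied means (0, 1, -0.1)
```

The implied marginal means follow by recursion: \(\mathbb {E}(z_0)=0\), \(\mathbb {E}(z_1)=\eta _1-L_{01}\mathbb {E}(z_0)=1\), \(\mathbb {E}(z_2)=\eta _2-L_{12}\mathbb {E}(z_1)=-0.1\), which the Monte Carlo averages approximate.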
Consider now the observed variables \(X_1,\ldots ,X_q\), where each \(X_j\sim F_j(\cdot )\), a suitably specified parametric family for \(X_j\), e.g., Bernoulli, Poisson, or Binomial; see also Sect. 4.1. As in a glm framework, we assume that
$$\mathbb {E}\left( X_j\,\vert \,\varvec{z}_{\text {pa}_{\mathcal {D}}(j)}\right) = h^{-1}\left( \eta _j-\varvec{L}_{\prec j \, ]}^{\top }\varvec{z}_{\text {pa}_{\mathcal {D}}(j)}\right) , \quad (23)$$
where \(h^{-1}(\cdot )\) is a suitable inverse-link function, so that \(\eta _j - \varvec{L}_{\prec j \, ]}^\top \varvec{z}_{\text {pa}_{\mathcal {D}}(j)}\) plays the role of the linear predictor in the glm model for \(X_j\). Specifically, we take \(h(\cdot )=\text {logit}(\cdot )\) and \(h(\cdot )=\log (\cdot )\) for \(X_j \sim \text {Bern}(\pi _j)\) and \(X_j\sim \text {Pois}(\lambda _j)\), respectively. Moreover, for \(X_j \sim \text {Bin}(n_j,\pi _j)\) we take \(h(\pi _j)=\text {logit}(\pi _j)\) while fixing \(n_j=\max \{x_{i,j}, i=1,\ldots ,n\}\). From (5) we then have \( Z_j = \Phi ^{-1}\left\{ F_j(X_j\,\vert \,\varvec{z}_{\text {pa}_{\mathcal {D}}(j)})\right\} , \) with \(\Phi (\cdot )\) the standard normal c.d.f. and with \(F_j\) implicitly depending on the DAG parameters \((\eta _j,\varvec{L}_{\prec j \, ]})\) through (23). The update of \(\varvec{Z}\) in Sect. 4.2 conditionally on the DAG parameters is then replaced by computing \(z_{i,j}=\Phi ^{-1}\left\{ F_j(x_{i,j}\,\vert \,\varvec{z}_{\text {pa}_{\mathcal {D}}(j)})\right\} \) iteratively for each \(i=1,\ldots ,n\) and \(j=1,\ldots ,q\).
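For instance, for a Poisson node with log link the latent update \(z_{i,j}=\Phi ^{-1}\{F_j(x_{i,j}\,\vert \,\varvec{z}_{\text {pa}_{\mathcal {D}}(j)})\}\) can be sketched as below. The clipping constant guarding \(\Phi ^{-1}\) at the boundaries and all parameter values are our own illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm, poisson

def latent_update_poisson(x_j, z_pa, eta_j, L_j):
    """Recompute latent scores for a Poisson node j given its parents'
    latent values: z_ij = Phi^{-1}{ F_j(x_ij | z_pa_i) }, where F_j is
    the Poisson c.d.f. with rate lambda_i = exp(eta_j - L_j @ z_pa_i)
    (log link)."""
    lin_pred = eta_j - z_pa @ L_j          # n-vector of linear predictors
    lam = np.exp(lin_pred)                 # inverse log link
    u = poisson.cdf(x_j, lam)
    u = np.clip(u, 1e-10, 1 - 1e-10)       # keep Phi^{-1} finite at u = 0, 1
    return norm.ppf(u)

rng = np.random.default_rng(2)
z_pa = rng.normal(size=(100, 2))           # latent values of 2 parents
eta_j, L_j = 0.5, np.array([0.3, -0.2])
x_j = rng.poisson(np.exp(eta_j - z_pa @ L_j))   # counts consistent with the glm
z_j = latent_update_poisson(x_j, z_pa, eta_j, L_j)
print(np.all(np.isfinite(z_j)))            # True
```

The Bernoulli and Binomial cases are analogous, with the logit link and the corresponding discrete c.d.f. in place of the Poisson one.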
We consider the same simulation settings as in the Balanced Scenario, with the four types of variables and the Free class of DAG structures; see Sect. 5.1. We compare the performance of the parametric strategy introduced above with our original method. Specifically, from the MCMC output provided by each method we first recover a CPDAG estimate, and we then compare true and estimated graphs in terms of structural Hamming distance (SHD); see also Sect. 5.2 for details.
Results are summarized in the box-plots of Fig. 10, representing the distribution of the SHD (across the 40 independent simulations) obtained from our original method (light blue) and its parametric version (dark blue) under the various scenarios. The parametric version of our method outperforms the original semi-parametric model in the Binary Scenario, while it is clearly outperformed under all other scenarios for small-to-moderate sample sizes; the two approaches tend to perform similarly as the sample size n increases.
1.3 MCMC Diagnostics of Convergence and Computational Time
Our methodology relies on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of the parameters. Accordingly, convergence of the resulting MCMC output to the target distribution should be checked before any posterior analysis. In the following, we include a few results relating to the application of our method to the well-being data presented in Sect. 6.1.
As a first diagnostic tool, we monitor the running estimate of the posterior expectation of each correlation coefficient across iterations: at MCMC iteration s, each quantity is computed from the sampled values collected up to step s, for \(s=1,\ldots ,\)25,000. Based on the results, reported for selected pairs of variables \((X_u,X_v)\) in Fig. 11, we discard the initial \(B=5000\) draws as a burn-in period. The behavior of each traceplot suggests an appreciable degree of convergence of each parameter estimate to its posterior mean.
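The running estimate used in this diagnostic is simply a cumulative average of the sampled values. A small sketch, where a synthetic chain stands in for the sampled correlation coefficients (all values illustrative):

```python
import numpy as np

def running_mean(draws):
    """Running posterior mean: at iteration s, the average of all sampled
    values collected up to step s (monitored to assess stabilization)."""
    draws = np.asarray(draws, dtype=float)
    return np.cumsum(draws) / np.arange(1, len(draws) + 1)

# Synthetic MCMC chain for one correlation coefficient centered at 0.4
rng = np.random.default_rng(3)
chain = 0.4 + 0.05 * rng.standard_normal(25_000)
rm = running_mean(chain)
burn_in = 5_000
post_mean = chain[burn_in:].mean()   # posterior mean after discarding burn-in
print(abs(rm[-1] - 0.4) < 0.01)      # True: the running mean has stabilized
```

In practice one plots `rm` against the iteration index; a flat trajectory after the burn-in suggests the chain has reached its stationary regime for that parameter.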
As a further diagnostic, we run two independent MCMC chains of length \(S=\) 25,000, again discarding a burn-in period of \(B=5000\) draws, with randomly chosen DAGs used to initialize each chain. The estimated posterior probabilities of edge inclusion computed from the two chains, reported in the heatmaps of Fig. 12, show close agreement between the two outputs.
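Posterior probabilities of edge inclusion are the frequencies with which each directed edge appears across the retained MCMC draws; agreement between chains can then be checked by comparing the two resulting matrices. A sketch under simulated adjacency draws (all names and values are illustrative, not the paper's sampler):

```python
import numpy as np

def edge_inclusion_probs(adj_samples):
    """Posterior probability of edge inclusion, estimated as the frequency
    of each directed edge across sampled adjacency matrices
    (shape: S x q x q, one 0/1 matrix per retained MCMC draw)."""
    return np.mean(adj_samples, axis=0)

# Simulate two independent "chains" of adjacency draws around the same
# underlying edge probabilities.
rng = np.random.default_rng(4)
q, S = 4, 2000
true_p = rng.uniform(size=(q, q)); np.fill_diagonal(true_p, 0.0)
chain1 = (rng.uniform(size=(S, q, q)) < true_p).astype(int)
chain2 = (rng.uniform(size=(S, q, q)) < true_p).astype(int)
p1, p2 = edge_inclusion_probs(chain1), edge_inclusion_probs(chain2)
print(np.max(np.abs(p1 - p2)) < 0.1)  # True: the two chains agree closely
```

The matrices `p1` and `p2` are what a heatmap comparison visualizes; large entrywise discrepancies between chains would signal a convergence problem.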
Finally, we investigate the computational time of our algorithm as a function of the number of variables q and the sample size n. Plots in Fig. 13 summarize the average running time per iteration (over 40 replicates), as a function of \(q\in \{5,10,20,50,100\}\) for \(n = 500\), and as a function of \(n\in \{50,100,200,500,1000\}\) for \(q = 20\). Results were obtained on a PC with an Intel(R) Core(TM) i7-8550U 1.80 GHz processor.
About this article
Cite this article
Castelletti, F. Learning Bayesian Networks: A Copula Approach for Mixed-Type Data. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09969-2