This is an interesting paper that combines structure learning in Bayesian networks (BNs) with kernel methods in a quest to produce more flexible distributional assumptions. Conditional (linear) Gaussian Bayesian networks (CGBNs) have been well explored in the literature for some time, to the point that they now appear in many recent textbooks (Koller and Friedman 2009; Scutari and Denis 2021; Kjærulff and Madsen 2013). The authors address one of the key limitations of CGBNs, namely that they can only capture linear dependencies between the continuous variables they contain, and remove it by replacing (mixtures of) linear regression models with more general kernel densities. Dependencies between discrete variables were already flexible, because the conditional probability tables that parametrise them essentially act as a saturated model (Rijmen 2008). It is not obvious that more flexibility will produce better models for whatever task we have in mind: it can also lead to overfitting, instability and hyperparameter tuning problems. However, the accuracy of reconstruction demonstrated by the proposed Hybrid Semiparametric BNs (HSPBNs) is encouraging.
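To make the gain in flexibility concrete, here is a minimal sketch (not the authors' implementation) that contrasts a linear Gaussian local distribution with a kernel-based conditional density estimate for a continuous node with one continuous parent; the data-generating process, the rule-of-thumb bandwidths and the train/test split are all illustrative assumptions.

```python
# Minimal illustration: linear Gaussian vs kernel conditional density for Y | X.
# Bandwidths use a simple rule of thumb and are purely illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=300)
y = np.sin(2 * x) + 0.2 * rng.normal(size=300)            # nonlinear dependence
x_tr, y_tr, x_te, y_te = x[:200], y[:200], x[200:], y[200:]

# Linear Gaussian: Y | X = x ~ N(b0 + b1 * x, sigma^2), fitted by least squares.
b1, b0 = np.polyfit(x_tr, y_tr, 1)
resid = y_tr - (b0 + b1 * x_tr)
ll_lg = norm.logpdf(y_te, loc=b0 + b1 * x_te, scale=resid.std()).sum()

# Kernel conditional density: f(y | x) = f(x, y) / f(x), both estimated with
# product Gaussian kernels over the training points.
hx = 1.06 * x_tr.std() * len(x_tr) ** -0.2
hy = 1.06 * y_tr.std() * len(y_tr) ** -0.2
wx = norm.pdf((x_te[:, None] - x_tr[None, :]) / hx)       # kernel weights on X
wy = norm.pdf((y_te[:, None] - y_tr[None, :]) / hy) / hy  # kernel density on Y
ll_ckde = np.log((wx * wy).sum(axis=1) / wx.sum(axis=1)).sum()

print(f"held-out log-likelihood: linear Gaussian {ll_lg:.1f}, kernel {ll_ckde:.1f}")
```

On nonlinear data such as these, the kernel estimate should achieve a markedly higher held-out log-likelihood; on genuinely linear data the ranking can reverse, which is the overfitting and instability concern mentioned above.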
Like all good papers, it raises interesting questions in its construction.
How to measure structural distances? Common distributional assumptions in BNs, including CGBNs, assign the same type of distribution to a node in all possible network structures. Therefore, the presence of an arc denotes the same general type of dependence in all possible structures as well. However, this is no longer the case in HSPBNs because continuous nodes can have either parametric or nonparametric characterisations for the same arcs. The authors acknowledge this by complementing the Structural Hamming Distance (SHD; Tsamardinos et al. 2006) with a Type Hamming Distance (THD) in their experimental evaluation. Should we combine them into a single measure by adding colours to the arcs and extending the SHD to count different colours as errors? And how should we weight such errors compared to false positive and false negative arcs? Then there is also the question of whether we should update the definition of equivalence classes: it is used in constructing the SHD and it has wide implications for our interpretation of BNs. In a CGBN, continuous variables are assumed to be jointly distributed as a multivariate normal: that ensures that arcs that are not compelled can be oriented in either direction while producing networks in the same equivalence class. It is not obvious that this is the case with nonparametric nodes. We would also assume that any single node must have the same distribution in all the BNs in the same equivalence class, which means that the characterisation of equivalence classes should be extended to consider arc colours as well.
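To fix ideas, the following sketch (purely an illustration, not a proposal from the paper) combines the two distances by treating node types as colours: it computes a plain SHD over directed arcs, ignoring the equivalence-class subtleties just discussed, and adds a weighted count of type mismatches; the weight w_type is a hypothetical knob, since how to trade type errors against false positive and false negative arcs is exactly the open question.

```python
# A "coloured" SHD that also penalises node-type mismatches. Graphs are sets of
# directed arcs; types_* maps each continuous node to "LG" (linear Gaussian) or
# "CKDE" (kernel). w_type is a hypothetical weight for type errors.
def coloured_shd(arcs_true, arcs_learned, types_true, types_learned, w_type=1.0):
    shd = 0
    seen = set()
    for u, v in arcs_true | arcs_learned:
        if frozenset((u, v)) in seen:
            continue
        seen.add(frozenset((u, v)))
        in_true = (u, v) in arcs_true or (v, u) in arcs_true
        in_learned = (u, v) in arcs_learned or (v, u) in arcs_learned
        if in_true != in_learned:
            shd += 1                           # false positive or false negative arc
        elif ((u, v) in arcs_true) != ((u, v) in arcs_learned):
            shd += 1                           # arc present in both but reversed
    # type errors: same node, different characterisation
    thd = sum(1 for node in types_true
              if types_learned.get(node) != types_true[node])
    return shd + w_type * thd

# Example: one reversed arc and one node with the wrong type.
arcs_true = {("A", "B"), ("B", "C")}
arcs_learned = {("A", "B"), ("C", "B")}
types_true = {"A": "LG", "B": "CKDE", "C": "LG"}
types_learned = {"A": "LG", "B": "LG", "C": "LG"}
print(coloured_shd(arcs_true, arcs_learned, types_true, types_learned))  # 2.0
```

Setting w_type to zero recovers the plain SHD, while counting mismatches over a coloured CPDAG rather than over the DAG would require the extended notion of equivalence class discussed above.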
How to implement constraint-based learning with kernel-based nonparametric local distributions? Since the heuristic algorithms used in constraint-based learning are distribution-agnostic, the question becomes how to define a suitable conditional independence test. The answer is not trivial because we cannot easily construct likelihood-ratio tests from the likelihoods used in the cross-validated score \(S^k_{\mathrm{CV}}(\mathcal{D}, \mathcal{G})\). Firstly, those likelihoods are only defined for continuous variables conditional on discrete ones, which is problematic when testing the independence of discrete variables conditional on continuous ones. Secondly, it is unclear what the degrees of freedom of the test would be: computing effective degrees of freedom from the kernel transform is possible (Hastie et al. 2009), but not obviously appropriate. One option is to look for inspiration to existing kernel tests that have been extended to test conditional independence, such as the Hilbert–Schmidt independence criterion (HSIC; Gretton et al. 2008; Doran et al. 2014).
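As a rough starting point for this option, the sketch below implements a marginal (unconditional) HSIC test calibrated by permutation, in the spirit of Gretton et al. (2008); the conditional versions required for constraint-based learning, such as the scheme of Doran et al. (2014), additionally constrain the permutations so that the conditioning set is preserved, which is not shown here. Kernels and bandwidths are illustrative choices.

```python
# Permutation-calibrated HSIC independence test for two continuous variables.
import numpy as np

def rbf_gram(x, bandwidth=None):
    """Gaussian kernel Gram matrix, median heuristic as the default bandwidth."""
    d2 = (x[:, None] - x[None, :]) ** 2
    if bandwidth is None:
        bandwidth = np.median(np.sqrt(d2[d2 > 0]))
    return np.exp(-d2 / (2 * bandwidth ** 2))

def hsic(K, L):
    """Biased HSIC estimate from two Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                 # centring matrix
    return np.trace(H @ K @ H @ L) / n ** 2

def hsic_perm_test(x, y, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    K, L = rbf_gram(x), rbf_gram(y)
    stat = hsic(K, L)
    null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(y))                   # break any dependence on x
        null[b] = hsic(K, L[np.ix_(idx, idx)])
    return stat, (1 + np.sum(null >= stat)) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
stat, p = hsic_perm_test(x, np.sin(x) + 0.1 * rng.normal(size=200))
print(f"HSIC = {stat:.4f}, permutation p-value = {p:.3f}")  # small p: dependence
```

Extending this to mixed discrete and continuous variables would also require a suitable kernel on the discrete domain, which leads naturally to the second option below.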
A second option would be to look at BN structure learning approaches, such as that of Handhayani and Cussens (2020), that use kernels for both continuous and discrete variables. The resulting flexibility in defining the nature of conditional independence relationships in the BN could also address the remaining limitation of CGBNs that is still present in HSPBNs: the constraint that continuous nodes cannot be parents of discrete nodes. The impact of this limitation cannot be overstated: it prevents CGBNs and HSPBNs from being used as causal models in the general case, because the direction of arcs connecting discrete and continuous variables is fixed and has nothing to do with the cause–effect relationships present in the data we learn such BNs from. In turn, the directions of adjacent arcs may not necessarily reflect cause–effect relationships either, because of the cascading effects of incorrect arc inclusions in structure learning.
References
Doran G, Muandet K, Zhang K, Schölkopf B (2014) A permutation-based kernel conditional independence test. In: Zhang NL, Tian J (eds) Uncertainty in artificial intelligence. AUAI Press, pp 132–141
Gretton A, Fukumizu K, Teo C-H, Song L, Schölkopf B, Smola AJ (2008) A kernel statistical test of independence. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems. MIT Press, pp 585–592
Handhayani T, Cussens J (2020) Kernel-based approach for learning causal graphs from mixed data. Proc Mach Learn Res 138:221–232
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer
Kjærulff UB, Madsen AL (2013) Bayesian networks and influence diagrams: a guide to construction and analysis, 2nd edn. Springer, New York
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Rijmen F (2008) Bayesian networks with a logistic regression model for the conditional probabilities. Int J Approx Reason 48(2):659–666
Scutari M, Denis J-B (2021) Bayesian networks with examples in R, 2nd edn. Chapman & Hall, Boca Raton
Tsamardinos I, Brown LE, Aliferis CF (2006) The Max–Min Hill–Climbing Bayesian network structure learning algorithm. Mach Learn 65(1):31–78
This comment refers to the invited paper available at: https://doi.org/10.1007/s11749-022-00812-3