Abstract
We are taking a peek “under the hood” of constraint-based learning of graphical models such as Bayesian Networks. This mainstream approach to learning is founded on performing statistical tests of conditional independence. In all prior work, however, the tests employed for categorical data are only asymptotically correct, i.e., they converge to the exact p-value in the sample limit. In this paper we present, evaluate, and compare exact tests, based on standard, adjustable, and semi-parametric Monte-Carlo permutation testing procedures appropriate for small sample sizes. It is demonstrated that (a) permutation testing is calibrated, i.e., the actual Type I error matches the significance level α set by the user; this is not the case with asymptotic tests, (b) permutation testing leads to more robust structural learning, and (c) permutation testing allows learning networks from multiple datasets sharing a common underlying structure but different distribution functions (e.g., continuous vs. discrete); we name this problem the Bayesian Network Meta-Analysis problem. In contrast, asymptotic tests may lead to erratic learning behavior in this task (error increasing with total sample size). The semi-parametric permutation procedure we propose is a reasonable approximation of the basic procedure using 5000 permutations, while being only 10-20 times slower than the asymptotic tests for small sample sizes. Thus, this test should be practical in most graphical learning problems and could substitute asymptotic tests. The conclusions of our studies have ramifications for learning not only Bayesian Networks but other graphical models too, and for related causal-based variable selection algorithms, such as HITON. The code is available at mensxmachina.org.
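To illustrate the basic procedure the abstract refers to, the sketch below shows a Monte-Carlo permutation test of conditional independence for categorical data: the G² (likelihood-ratio) statistic is computed on the observed data, then recomputed after permuting X within each stratum of the conditioning variable Z, which preserves the marginals implied by the null hypothesis X ⟂ Y | Z. This is a minimal illustration written for this summary, not the authors' implementation; the function names, the add-one p-value correction, and the choice of G² as the statistic are assumptions for the sketch.

```python
import numpy as np

def g2_statistic(x, y, z):
    """G^2 (likelihood-ratio) statistic for X independent of Y given Z,
    summed over the contingency tables of each stratum of Z."""
    g2 = 0.0
    for zval in np.unique(z):
        mask = z == zval
        xs, ys = x[mask], y[mask]
        n = len(xs)
        for xi in np.unique(xs):
            for yi in np.unique(ys):
                observed = np.sum((xs == xi) & (ys == yi))
                expected = np.sum(xs == xi) * np.sum(ys == yi) / n
                if observed > 0:
                    g2 += 2 * observed * np.log(observed / expected)
    return g2

def permutation_p_value(x, y, z, n_perm=1000, seed=None):
    """Exact-style Monte-Carlo p-value: permute X within each stratum of Z
    (preserving the null's conditional marginals) and count how often the
    permuted statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = g2_statistic(x, y, z)
    exceed = 0
    for _ in range(n_perm):
        xp = x.copy()
        for zval in np.unique(z):
            idx = np.flatnonzero(z == zval)
            xp[idx] = rng.permutation(xp[idx])
        if g2_statistic(xp, y, z) >= observed:
            exceed += 1
    # add-one correction keeps the estimate in (0, 1]
    return (exceed + 1) / (n_perm + 1)
```

In contrast to the asymptotic χ² approximation of the G² statistic, the p-value here is estimated directly from the permutation distribution, which is why such tests remain calibrated at small sample sizes; the adjustable and semi-parametric variants discussed in the paper reduce the number of permutations this basic procedure requires.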
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Tsamardinos, I., Borboudakis, G. (2010). Permutation Testing Improves Bayesian Network Learning. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science(), vol 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8