Abstract
A fundamental step in the PC causal discovery algorithm is testing for (conditional) independence. When the number of data records is very small, a classical statistical independence test is typically unable to reject the null hypothesis of independence. In this paper, we compare two conflicting pieces of advice from the literature for the case of too few data records: (1) assume dependence and (2) assume independence. Our results show that assuming independence is the safer strategy for minimizing the structural distance between the causal structure that generated the data and the discovered structure. We also propose a simple improvement to the PC algorithm, which we call blacklisting. We demonstrate that blacklisting can save orders of magnitude in computation by avoiding unnecessary independence tests.
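To make the two rules concrete, the following is a minimal sketch and not the authors' implementation. It assumes two binary variables with no conditioning set, a Pearson chi-square test on a 2x2 table of counts, and a hypothetical cutoff `min_records` below which the test is treated as unreliable. The names `FewDataRule`, `chi_square_2x2`, `is_independent`, and the cutoff value are illustrative assumptions that do not appear in the paper.

```python
import math
from enum import Enum


class FewDataRule(Enum):
    """What to conclude when there are too few records to trust the test."""
    ASSUME_DEPENDENCE = "dependence"
    ASSUME_INDEPENDENCE = "independence"


def chi_square_2x2(table):
    """Return (statistic, p_value) of Pearson's chi-square test for a 2x2 table.

    The test has one degree of freedom, for which the chi-square survival
    function equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    if n == 0:
        return 0.0, 1.0
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


def is_independent(table, rule, alpha=0.05, min_records=20):
    """Decide independence, falling back to `rule` when data are insufficient.

    `min_records` is a hypothetical threshold chosen for illustration only.
    """
    n = sum(sum(row) for row in table)
    if n < min_records:
        # Too few records: skip the test and apply the chosen rule.
        return rule is FewDataRule.ASSUME_INDEPENDENCE
    _, p_value = chi_square_2x2(table)
    # Failing to reject the null hypothesis is read as independence.
    return p_value > alpha


small = [[2, 1], [1, 2]]      # 6 records, below the cutoff
large = [[40, 10], [10, 40]]  # 100 records, strongly associated
print(is_independent(small, FewDataRule.ASSUME_DEPENDENCE))
print(is_independent(small, FewDataRule.ASSUME_INDEPENDENCE))
print(is_independent(large, FewDataRule.ASSUME_INDEPENDENCE))
```

With the small table the outcome is determined entirely by the chosen rule, while with the large table both rules give the same answer because the test itself is run. In a constraint-based search, a verdict of independence removes the edge between the two variables, so the choice of rule directly shapes the discovered structure.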
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
de Jongh, M., Druzdzel, M.J. (2014). Evaluation of Rules for Coping with Insufficient Data in Constraint-Based Search Algorithms. In: van der Gaag, L.C., Feelders, A.J. (eds) Probabilistic Graphical Models. PGM 2014. Lecture Notes in Computer Science, vol 8754. Springer, Cham. https://doi.org/10.1007/978-3-319-11433-0_13
DOI: https://doi.org/10.1007/978-3-319-11433-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11432-3
Online ISBN: 978-3-319-11433-0
eBook Packages: Computer Science (R0)