Abstract
Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data. The Markov boundary (MB) provides a complete picture of the local causal structure around a variable and is a theoretically optimal solution for the feature selection problem. Available algorithms for MB discovery have focused on various challenges such as scalability and data-efficiency. However, current approaches do not provide guarantees in terms of false discoveries in the MB.
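To make the notion of a Markov boundary concrete, here is a minimal sketch under two assumptions that do not hold in the discovery setting: the causal DAG is already known, and the distribution is faithful to it. In that case the MB of a target is its parents, its children, and the other parents of its children (its spouses). The function name `markov_boundary` and the toy graph are hypothetical and only serve as an illustration; MB discovery algorithms must instead infer this set from data.

```python
def markov_boundary(parents, target):
    """Return the Markov boundary of `target` in a known DAG.

    `parents` maps every node to the set of its parents.
    Assumes faithfulness, so MB = parents + children + spouses.
    """
    pa = set(parents.get(target, ()))
    # Children are the nodes that list `target` among their parents.
    children = {v for v, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = set()
    for child in children:
        spouses |= set(parents[child])
    spouses.discard(target)
    return pa | children | spouses


# Toy DAG (hypothetical): A -> T -> C <- S, C -> D, and E is isolated.
toy_dag = {
    "A": set(),
    "T": {"A"},
    "S": set(),
    "C": {"T", "S"},
    "D": {"C"},
    "E": set(),
}

print(sorted(markov_boundary(toy_dag, "T")))
```

In the toy graph, `S` belongs to the MB of `T` only because the two share the child `C`; `D` and `E` are excluded.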
In this paper, we introduce a novel algorithm for the MB discovery problem with rigorous guarantees on the Family-Wise Error Rate (FWER), that is, the probability of reporting at least one false positive. Our algorithm uses Rademacher averages, a key concept from statistical learning theory, to properly account for the multiple-hypothesis testing problem arising in MB discovery. Our evaluation on simulated data shows that our algorithm controls the FWER, while widely used algorithms do not provide guarantees on false discoveries even when correcting for multiple-hypothesis testing. Our experiments also show that our algorithm identifies meaningful relations in real-world data.
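As a rough illustration of the quantity involved, the sketch below estimates an empirical Rademacher average by Monte Carlo sampling for a finite family of functions evaluated on a fixed sample. It is not the authors' algorithm: the function name `mc_empirical_rademacher`, the input layout, and the number of trials are assumptions chosen for illustration. The empirical Rademacher average is the expectation, over uniformly random signs, of the largest signed sample mean achieved by any function in the family; bounds built on it hold simultaneously for every function in the family, which is why it is suited to multiple-hypothesis settings.

```python
import random


def mc_empirical_rademacher(values, trials=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average.

    values[j][i] is the value of function j on sample point i.
    Estimates E_sigma[ max_j (1/n) * sum_i sigma_i * values[j][i] ],
    where each sigma_i is +1 or -1 with equal probability.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(trials):
        # Draw one vector of independent random signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Largest signed mean over the whole family for this draw.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / n for row in values
        )
    return total / trials


# Hypothetical family: one function and its negation on four sample points.
f = [0.5, -1.0, 0.25, 1.0]
family = [f, [-v for v in f]]
estimate = mc_empirical_rademacher(family)
print(estimate)
```

A richer family can correlate better with random signs and yields a larger average, which translates into a more conservative uniform bound.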
Notes
1. N counts the total number of possible conditional independencies between any pair of variables, taking into account the symmetry of independence tests: testing the (conditional) independence of X from Y is equivalent to testing that of Y from X.
2. Code and appendix are available at https://github.com/VandinLab/RAveL.
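The count described in note 1 can be illustrated with a short sketch. It assumes that every subset of the remaining variables is an admissible conditioning set, which the note does not state explicitly; under that assumption, d variables give C(d, 2) unordered pairs and 2^(d-2) conditioning sets per pair. The function names are hypothetical.

```python
from itertools import combinations


def count_ci_tests(variables):
    """Enumerate tests 'X independent of Y given Z', counting {X, Y} once."""
    count = 0
    # Unordered pairs: symmetry means (X, Y) and (Y, X) are the same test.
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        # Assumption: any subset of the other variables may be conditioned on.
        for size in range(len(rest) + 1):
            for _ in combinations(rest, size):
                count += 1
    return count


def closed_form(d):
    """C(d, 2) * 2^(d - 2), valid for d >= 2."""
    return d * (d - 1) // 2 * 2 ** (d - 2)


print(count_ci_tests(list("ABCD")), closed_form(4))
```

The exponential growth in d is what makes the multiple-hypothesis correction demanding even for a moderate number of variables.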
Acknowledgements
This work is supported, in part, by the Italian Ministry of Education, University and Research (MIUR), under PRIN Project n. 20174LF3T8 “AHeAD” (efficient Algorithms for HArnessing networked Data) and the initiative “Departments of Excellence” (Law 232/2016), and by University of Padova under project “SID 2020: RATED-X”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Simionato, D., Vandin, F. (2023). Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_16
DOI: https://doi.org/10.1007/978-3-031-26419-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26418-4
Online ISBN: 978-3-031-26419-1
eBook Packages: Computer Science, Computer Science (R0)