Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages

Simionato, Dario; Vandin, Fabio

doi:10.1007/978-3-031-26419-1_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13717)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data. The Markov boundary (MB) provides a complete picture of the local causal structure around a variable and is a theoretically optimal solution for the feature selection problem. Available algorithms for MB discovery have focused on various challenges such as scalability and data efficiency. However, current approaches do not provide guarantees on false discoveries in the MB.
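As background for the paragraph above: when the data are generated by a Bayesian network faithful to a DAG, the Markov boundary of a target is the set of its parents, its children, and the other parents of its children (its spouses). The sketch below reads the MB off a known DAG given as a dictionary from each node to its set of parents. The function name, the `dag` example, and the representation are hypothetical; MB discovery algorithms cannot see the graph and must instead infer this set from observational data through conditional independence tests.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov boundary of `target` in a known DAG.

    `parents` maps every node to the set of its parents (hypothetical
    representation). The MB is parents | children | spouses.
    """
    pa = set(parents.get(target, set()))
    # Children: nodes that list the target among their parents.
    children = {v for v, ps in parents.items() if target in ps}
    # Spouses: other parents of the target's children.
    spouses: Set[str] = set()
    for c in children:
        spouses |= parents[c]
    spouses.discard(target)
    return pa | children | spouses


# Illustrative DAG: A -> T, T -> C, B -> C, C -> D, and an isolated node E.
dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
    "E": set(),
}
print(sorted(markov_boundary(dag, "T")))  # ['A', 'B', 'C']
```

Note that D (a grandchild) and E (unconnected) are excluded: given A, B, and C, they carry no further information about T.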

In this paper we introduce a novel algorithm for the MB discovery problem with rigorous guarantees on the Family-Wise Error Rate (FWER), that is, the probability of reporting any false positive. Our algorithm uses Rademacher averages, a key concept from statistical learning theory, to properly account for the multiple-hypothesis testing problem arising in MB discovery. Our evaluation on simulated data shows that our algorithm properly controls the FWER, while widely used algorithms do not provide guarantees on false discoveries even when correcting for multiple-hypothesis testing. Our experiments also show that our algorithm identifies meaningful relations in real-world data.
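The abstract names Rademacher averages as the tool for handling the many tests performed during MB discovery. As generic background only (this is not the authors' algorithm, whose details are in the paper and its appendix), the empirical Rademacher average of a finite family of functions on a sample of size m measures how strongly the family can correlate with random signs; bounds built on it hold uniformly over the whole family, which makes them a natural alternative to per-test corrections. The sketch below estimates it by Monte Carlo sampling of sign vectors. The function name, the matrix representation `values[j][i]`, the trial count, and the random example data are all assumptions made for illustration.

```python
import random
from typing import Sequence


def mc_rademacher_average(values: Sequence[Sequence[float]],
                          n_trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the empirical Rademacher average.

    values[j][i] is the j-th function of a finite family evaluated on the
    i-th of m sample points (every row has length m). The estimated quantity
    is E_sigma[max_j (1/m) * sum_i sigma_i * values[j][i]], where the sigma_i
    are independent, uniform {-1, +1} (Rademacher) variables.
    """
    rng = random.Random(seed)
    m = len(values[0])
    total = 0.0
    for _ in range(n_trials):
        # Draw one vector of random signs.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sign-weighted empirical mean.
        total += max(sum(s * v for s, v in zip(sigma, row)) / m for row in values)
    return total / n_trials


# Hypothetical data: one function versus a family of 50 functions on m = 200 points.
data_rng = random.Random(42)
m = 200
one_fn = [[data_rng.random() for _ in range(m)]]
many_fns = one_fn + [[data_rng.random() for _ in range(m)] for _ in range(49)]
print(mc_rademacher_average(one_fn), mc_rademacher_average(many_fns))
```

With the same seed, both calls draw identical sign vectors, so the richer family can only produce a larger (or equal) estimate; this growth with the size of the family is what a Rademacher-based bound pays for testing many hypotheses at once.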

Notes

1. N, in fact, counts the total number of possible conditional independencies between any pair of variables, taking into account the symmetry of independence tests: testing the (conditional) independence of X from Y is equivalent to testing that of Y from X.
2. Code and appendix available at https://github.com/VandinLab/RAveL.
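Footnote 1 does not give a formula for N in this excerpt. As a hedged illustration, assume each test involves an unordered pair of variables {X, Y} and a conditioning set that may be any subset of the remaining n − 2 variables; symmetry then gives C(n, 2) · 2^(n−2) distinct tests, half the count obtained with ordered pairs. The sketch below computes this under that assumption, together with the Bonferroni per-test level α/N, a classical (union-bound) way of keeping the FWER at most α over N tests. The function names are hypothetical, and this is not presented as the paper's definition of N or its procedure.

```python
from math import comb


def count_ci_tests(n_vars: int, symmetric: bool = True) -> int:
    """Number of (pair, conditioning set) tests among n_vars variables.

    ASSUMPTION: conditioning sets range over all subsets of the other
    n_vars - 2 variables; this is an illustrative reading of footnote 1.
    """
    if n_vars < 2:
        return 0
    # With symmetry, testing X vs. Y given Z is the same as Y vs. X given Z,
    # so only unordered pairs are counted.
    pairs = comb(n_vars, 2) if symmetric else n_vars * (n_vars - 1)
    return pairs * 2 ** (n_vars - 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that bounds the FWER by alpha."""
    return alpha / n_tests


N = count_ci_tests(4)
print(N, count_ci_tests(4, symmetric=False))  # 24 48
print(bonferroni_threshold(0.05, N))
```

Because N grows exponentially in the number of variables under this assumption, the Bonferroni level shrinks quickly, which illustrates why a per-test correction can become very conservative in MB discovery.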

Acknowledgements

This work is supported, in part, by the Italian Ministry of Education, University and Research (MIUR), under PRIN Project n. 20174LF3T8 “AHeAD” (efficient Algorithms for HArnessing networked Data) and the initiative “Departments of Excellence” (Law 232/2016), and by University of Padova under project “SID 2020: RATED-X”.

Author information

Authors and Affiliations

Department of Information Engineering, University of Padova, Padua, Italy
Dario Simionato & Fabio Vandin

Corresponding author

Correspondence to Fabio Vandin.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d’Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

Electronic supplementary material

Supplementary material 1 (pdf 464 KB)

About this paper

Cite this paper

Simionato, D., Vandin, F. (2023). Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_16

DOI: https://doi.org/10.1007/978-3-031-26419-1_16
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26418-4
Online ISBN: 978-3-031-26419-1
eBook Packages: Computer Science; Computer Science (R0)