
Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data. The Markov boundary (MB) provides a complete picture of the local causal structure around a variable and is a theoretically optimal solution for the feature selection problem. Available algorithms for MB discovery have focused on various challenges such as scalability and data-efficiency. However, current approaches do not provide guarantees in terms of false discoveries in the MB.

In this paper we introduce a novel algorithm for the MB discovery problem with rigorous guarantees on the Family-Wise Error Rate (FWER), that is, the probability of reporting at least one false positive. Our algorithm uses Rademacher averages, a key concept from statistical learning theory, to properly account for the multiple-hypothesis testing problem arising in MB discovery. Our evaluation on simulated data shows that our algorithm controls the FWER, while widely used algorithms do not provide guarantees on false discoveries even when correcting for multiple-hypothesis testing. Our experiments also show that our algorithm identifies meaningful relations in real-world data.
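To make the FWER definition above concrete, the following is a minimal simulation sketch. It is not the Rademacher-based algorithm of the paper: it uses the classical Bonferroni correction as a stand-in, and it assumes independent tests whose null hypotheses are all true, so that every p-value is uniform on [0, 1]. The function name `estimate_fwer` and all parameter values are illustrative assumptions.

```python
import random


def estimate_fwer(num_tests, alpha, num_trials, bonferroni, seed=0):
    """Estimate the FWER: the fraction of trials with at least one false positive.

    Assumption (illustration only): all null hypotheses are true and the tests
    are independent, so each p-value is drawn uniformly from [0, 1].
    """
    rng = random.Random(seed)
    # Bonferroni tests each hypothesis at level alpha / num_tests.
    threshold = alpha / num_tests if bonferroni else alpha
    trials_with_false_positive = 0
    for _ in range(num_trials):
        p_values = [rng.random() for _ in range(num_tests)]
        # Any rejection is a false positive, because every null is true.
        if min(p_values) <= threshold:
            trials_with_false_positive += 1
    return trials_with_false_positive / num_trials


uncorrected = estimate_fwer(20, 0.05, 5000, bonferroni=False)
corrected = estimate_fwer(20, 0.05, 5000, bonferroni=True)
print(f"FWER without correction: {uncorrected:.3f}")
print(f"FWER with Bonferroni:    {corrected:.3f}")
```

With 20 independent tests at level 0.05, the uncorrected FWER is close to 1 - 0.95^20 (about 0.64), whereas the Bonferroni threshold keeps it at or below roughly 0.05. This only illustrates the quantity being bounded; the guarantees claimed in the abstract concern the full MB discovery procedure, not a single batch of independent tests.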


Notes

  1. N counts, in fact, the total number of possible conditional independencies between any pair of variables, taking into account the symmetry of independence tests: testing the (conditional) independence of X from Y is equivalent to testing that of Y from X.

  2. Code and appendix available at https://github.com/VandinLab/RAveL.


Acknowledgements

This work is supported, in part, by the Italian Ministry of Education, University and Research (MIUR), under PRIN Project n. 20174LF3T8 “AHeAD” (efficient Algorithms for HArnessing networked Data) and the initiative “Departments of Excellence” (Law 232/2016), and by University of Padova under project “SID 2020: RATED-X”.

Author information

Correspondence to Fabio Vandin.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Simionato, D., Vandin, F. (2023). Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_16


  • DOI: https://doi.org/10.1007/978-3-031-26419-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26418-4

  • Online ISBN: 978-3-031-26419-1

  • eBook Packages: Computer Science (R0)
