Markov Blanket Ranking Using Kernel-Based Conditional Dependence Measures

  • Eric V. Strobl
  • Shyam Visweswaran
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Developing feature selection algorithms that move beyond a purely correlational analysis of observational data toward a more causal one is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step that variables must pass in order to be included in the conditioning set. As a result, these algorithms may not consider all possible conditional multivariate combinations. We improve on this limitation by proposing a backward elimination method that uses a kernel-based conditional dependence measure to identify the Markov blanket in a fully multivariate fashion. The algorithm is easy to implement and compares favorably to other methods on synthetic and real datasets.
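The backward elimination idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it substitutes a plain (unconditional) HSIC score for the kernel-based conditional dependence measure the chapter develops, and all function names here are invented for the sketch. At each step, the feature whose removal leaves the kernel dependence with the target highest (i.e., the feature contributing the least) is dropped, yielding a ranking over all features.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix of an RBF kernel on the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between a feature block X and target Y
    (stand-in for the conditional dependence measure in the chapter)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = H @ rbf_gram(X, sigma) @ H
    L = H @ rbf_gram(Y, sigma) @ H
    return np.trace(K @ L) / (n - 1) ** 2

def backward_eliminate(X, y, sigma=1.0):
    """Rank features (most important first) by backward elimination:
    repeatedly drop the feature whose removal leaves the kernel
    dependence between the remaining features and the target highest."""
    Y = np.asarray(y, dtype=float).reshape(-1, 1)
    active = list(range(X.shape[1]))
    dropped = []                                  # least important first
    while len(active) > 1:
        scores = [hsic(X[:, [j for j in active if j != i]], Y, sigma)
                  for i in active]
        victim = active[int(np.argmax(scores))]   # least-needed feature
        active.remove(victim)
        dropped.append(victim)
    dropped.append(active[0])
    return dropped[::-1]                          # most important first
```

Because every feature is scored against the full remaining multivariate block rather than passing a per-variable forward test, no conditional combination is excluded up front, which is the limitation the chapter's method addresses.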


Keywords: Feature ranking · Markov blanket · Machine learning



We thank Dr. Subramani Mani for providing the U.S. Linked Infant Birth and Death 1991 dataset. This research was funded by the National Library of Medicine grant T15 LM007059-24 to the University of Pittsburgh Biomedical Informatics Training Program and the National Institute of General Medical Sciences grant T32 GM008208 to the University of Pittsburgh Medical Scientist Training Program.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, USA
