Learning Markov Blankets for Continuous or Discrete Networks via Feature Selection

Part of the Studies in Computational Intelligence book series (SCI, volume 373)


Learning Markov Blankets is important for classification and regression, causal discovery, and Bayesian network learning. We present an argument that ensemble masking measures can provide an approximate Markov Blanket. Consequently, an ensemble feature selection method can be used to learn Markov Blankets for either discrete or continuous networks (without linear, Gaussian assumptions). We use masking measures for redundancy and statistical inference for feature selection criteria. We compare our performance in the causal structure learning problem to a collection of common feature selection methods. We also compare to Bayesian local structure learning. These results can also be easily extended to other causal structure models such as undirected graphical models.
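
To make the learning target concrete: in a Bayesian network, the Markov blanket of a variable consists of its parents, its children, and the other parents of its children. The sketch below is not the ensemble feature selection method described above. It only reads the blanket off a graph that is already known, which is the ground truth a learned blanket would be compared against. The function name `markov_blanket` and the toy graph are illustrative assumptions.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its direct parents.
    The blanket is: parents, children, and co-parents of children.
    """
    blanket = set(parents.get(target, set()))  # direct parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:             # `node` is a child of the target
            blanket.add(node)
            blanket |= node_parents            # other parents of that child
    blanket.discard(target)                    # the target is not in its own blanket
    return blanket


# Hypothetical toy network: A -> T -> C <- B, and C -> D.
toy_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

print(sorted(markov_blanket(toy_dag, "T")))
```

In this toy graph, D is a descendant of T but lies outside its blanket, so a feature selection method that recovers the blanket would treat D as redundant for predicting T once C is known.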


Keywords: Feature Selection; Bayesian Network; Causal Structure; Feature Selection Method; Structure Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Arizona State University, Tempe, USA
  2. Intel, Chandler, USA
