Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data

  • Konstantinos Sechidis
  • Gavin Brown
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)


The importance of Markov blanket discovery algorithms is twofold: they are the main building block of constraint-based algorithms for learning Bayesian network structure, and they provide a principled way to derive the optimal feature set in filter feature selection. Equally, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully to partially supervised scenarios is a challenging problem. While there are many algorithms for deriving the Markov blanket of fully supervised nodes, the partially labelled problem is far more challenging, and the literature lacks principled approaches to it. Our work derives a generalization of the conditional tests of independence for partially labelled binary target variables that can handle the two main partially labelled scenarios: positive-unlabelled and semi-supervised. The result is a significantly deeper understanding of how to control false negative errors in Markov blanket discovery procedures and of how unlabelled data can help.


Keywords: Markov blanket discovery · Partially labelled · Positive-unlabelled · Semi-supervised · Mutual information
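In the fully supervised setting, the conditional tests of independence that the paper generalizes are commonly G-tests built from the empirical conditional mutual information I(X;Y|Z), with G = 2N·I(X;Y|Z) compared against a chi-squared distribution. As a rough illustration of that fully supervised baseline only (not the paper's partially labelled generalization), a minimal sketch for discrete variables:

```python
from collections import Counter
from math import log

def conditional_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) in nats, from three equal-length discrete sequences."""
    n = len(x)
    nxyz = Counter(zip(x, y, z))
    nxz = Counter(zip(x, z))
    nyz = Counter(zip(y, z))
    nz = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in nxyz.items():
        # p(x,y,z) * log[ p(x,y|z) / (p(x|z) p(y|z)) ], written in raw counts
        cmi += (c / n) * log(c * nz[zi] / (nxz[(xi, zi)] * nyz[(yi, zi)]))
    return cmi

def g_statistic(x, y, z):
    """G = 2 * N * I(X;Y|Z); under independence this is approximately
    chi-squared with (|X|-1)(|Y|-1)|Z| degrees of freedom."""
    return 2 * len(x) * conditional_mutual_information(x, y, z)
```

When X is independent of Y given Z the statistic is near zero; when X determines Y it grows linearly with the sample size, which is what drives the false-negative (power) analysis the abstract refers to.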





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Computer Science, University of Manchester, Manchester, UK
