A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

  • Sergio Rodrigues de Morais
  • Alex Aussem
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5212)

Abstract

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel, scalable and data-efficient Markov boundary learning algorithm that is correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
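
To make the notion of a Markov boundary concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the network structure is already known and faithful, in which case the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (spouses). The paper's setting is harder: the boundary must be learned from data. The function name `markov_boundary`, the `toy_dag` network and its variable names are hypothetical and chosen for illustration only.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov boundary of `target` in a known DAG.

    `parents` maps each node to the set of its parents. Under faithfulness,
    the boundary is the union of parents, children, and spouses of `target`.
    """
    # Children are the nodes that list the target among their parents.
    children = {node for node, pa in parents.items() if target in pa}

    # Spouses are the other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]

    boundary = set(parents.get(target, set())) | children | spouses
    boundary.discard(target)  # the target is not part of its own boundary
    return boundary


# Hypothetical toy network: A -> C -> D <- B, D -> E, with C as the class variable.
toy_dag = {
    "A": set(),
    "B": set(),
    "C": {"A"},
    "D": {"C", "B"},
    "E": {"D"},
}

print(sorted(markov_boundary(toy_dag, "C")))  # ['A', 'B', 'D']
```

In this toy network, E is excluded from the boundary of C: once D is known, E carries no further information about C, which is why the Markov boundary is a natural target for feature subset selection.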



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sergio Rodrigues de Morais (1)
  • Alex Aussem (2)
  1. INSA-Lyon, LIESP, Villeurbanne, France
  2. Université de Lyon 1, LIESP, Villeurbanne, France
