Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers

Part of the Studies in Computational Intelligence book series (SCI, volume 373)


PC and TPDA are robust, well-known prototype algorithms that take a constraint-based approach to causal discovery. However, neither algorithm scales to high-dimensional data, that is, data with more than a few hundred features. This chapter presents a hybrid of correlation-based and causal feature selection for ensemble classifiers to address this problem. Redundant features are first removed by correlation-based feature selection, and irrelevant features are then eliminated by causal feature selection. The number of eliminated features, accuracy, area under the receiver operating characteristic curve (AUC), and false negative rate (FNR) of the proposed algorithms are compared with those of correlation-based feature selection algorithms (FCBF and CFS) and causal feature selection algorithms (PC, TPDA, GS, IAMB).
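
The two-stage idea in the abstract (remove redundant features first, then eliminate irrelevant ones) can be illustrated with a minimal sketch. This is not the chapter's implementation: stage 1 uses a plain pairwise-correlation threshold as a simplified stand-in for FCBF/CFS, and stage 2 uses first-order partial correlation as a simplified stand-in for the conditional-independence tests of constraint-based methods such as PC or TPDA. The function names, the thresholds `redundancy_threshold` and `min_dependence`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def drop_redundant(X, y, redundancy_threshold=0.9):
    """Stage 1 (assumed stand-in for FCBF/CFS): rank features by |corr| with y,
    then keep a feature only if it is not highly correlated with one already kept."""
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    kept = []
    for j in np.argsort(-relevance):  # most relevant first
        if all(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy_threshold
            for k in kept
        ):
            kept.append(int(j))
    return kept


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ac = np.corrcoef(a, c)[0, 1]
    r_bc = np.corrcoef(b, c)[0, 1]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def drop_irrelevant(X, y, candidates, min_dependence=0.1):
    """Stage 2 (assumed stand-in for constraint-based tests): drop a feature if it
    looks independent of y, marginally or given any single other candidate."""
    selected = []
    for j in candidates:
        if abs(np.corrcoef(X[:, j], y)[0, 1]) < min_dependence:
            continue  # marginally independent of the class
        others = [k for k in candidates if k != j]
        if any(
            abs(partial_corr(X[:, j], y, X[:, k])) < min_dependence for k in others
        ):
            continue  # independent of the class given another feature
        selected.append(j)
    return selected


# Synthetic data (hypothetical): x1 is a near copy of x0, x2 is pure noise,
# and the class depends only on x0 and x3.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
y = (x0 + x3 > 0).astype(float)

after_stage1 = drop_redundant(X, y)
selected = drop_irrelevant(X, y, after_stage1)
print("after redundancy removal:", sorted(after_stage1))
print("after irrelevance removal:", sorted(selected))
```

Running the redundancy stage first matters in this sketch: if both `x0` and its near copy `x1` reached stage 2, each would appear conditionally independent of the class given the other, and both would be dropped. It also reduces the number of conditional-independence tests the second stage has to perform, which is the scaling concern the abstract raises.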


Keywords: Feature Selection, False Negative Rate, Irrelevant Feature, Redundant Feature, Causal Discovery





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, United Kingdom
