Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers

Duangsoithong, Rakkrit; Windeatt, Terry

doi:10.1007/978-3-642-22910-7_6

Part of the book series: Studies in Computational Intelligence (SCI, volume 373)

Abstract

PC and TPDA are robust, well-known prototype algorithms that take a constraint-based approach to causal discovery. However, neither algorithm scales to high-dimensional data, that is, data with more than a few hundred features. This chapter presents a hybrid of correlation-based and causal feature selection for ensemble classifiers to address this problem. Redundant features are first removed by correlation-based feature selection, and irrelevant features are then eliminated by causal feature selection. The number of eliminated features, accuracy, area under the receiver operating characteristic curve (AUC), and false negative rate (FNR) of the proposed algorithms are compared with those of correlation-based feature selection algorithms (FCBF and CFS) and causal feature selection algorithms (PC, TPDA, GS, IAMB).
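
The three classification measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = positive, 0 = negative); the function names, the 0.5 decision threshold, and the toy data are hypothetical and are not taken from the chapter. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is one common way to obtain it and not necessarily the procedure the authors used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FN / (FN + TP): share of positive samples predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives followed by three negatives.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("FNR:", false_negative_rate(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy and FNR depend on the chosen threshold, whereas AUC depends only on how the scores rank the samples, which is one reason to report all three when comparing feature selection methods.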

Author information

Authors and Affiliations

Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, United Kingdom, GU2 7XH
Rakkrit Duangsoithong & Terry Windeatt

Editor information

Editors and Affiliations

University of Malmo, Stora Trädgårdsgatan 20, läg 1601, 21128, Malmö, Sweden
Oleg Okun
Department of Computer Science, University of Milan, Via Comelico 39, 20135, Milano, Italy
Giorgio Valentini
Department of Computer Science, University of Milan, via Comelico 39/41, 20135, Milano, Italia
Matteo Re

Cite this chapter

Duangsoithong, R., Windeatt, T. (2011). Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers. In: Okun, O., Valentini, G., Re, M. (eds) Ensembles in Machine Learning Applications. Studies in Computational Intelligence, vol 373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22910-7_6

DOI: https://doi.org/10.1007/978-3-642-22910-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22909-1
Online ISBN: 978-3-642-22910-7
eBook Packages: Engineering, Engineering (R0)
