Redundant Feature Elimination by Using Approximate Markov Blanket Based on Discriminative Contribution
Text data sets are high dimensional, and many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous work addresses this problem using pair-wise feature similarities, which ignore the discriminative contribution of each feature that the label information provides. Here we define an Approximate Markov Blanket (AMB) based on the metric of DIScriminative Contribution (DISC) to eliminate redundant features, and propose the AMB-DISC algorithm. Experimental results on the Reuters-21578 data set show that AMB-DISC substantially outperforms previous state-of-the-art feature selection algorithms that account for feature redundancy, in terms of micro-averaged F1 and macro-averaged F1.
Keywords: Feature Selection, Discriminative Feature, Feature Selection Algorithm, Redundant Feature, Markov Blanket
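The abstract does not give the exact DISC metric or the AMB test, so the following is only a minimal sketch of the general idea under stated assumptions: a feature's discriminative contribution is approximated by its absolute correlation with the label, and one feature is treated as an approximate Markov blanket for another when their pairwise correlation exceeds a threshold. Features are scanned from most to least discriminative, and a feature is dropped when an already-kept, more discriminative feature subsumes it. The names `disc_score` and `amb_disc` are illustrative, not from the paper.

```python
import numpy as np

def _abs_corr(a, b):
    # Absolute Pearson correlation between two vectors; 0 if either is constant.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return 0.0 if denom == 0 else abs(float((a * b).sum()) / denom)

def disc_score(x, y):
    # Hypothetical proxy for the paper's DISC metric: how strongly a single
    # feature column discriminates the (binary) label vector.
    return _abs_corr(x, y)

def amb_disc(X, y, redundancy_threshold=0.9):
    """Greedy redundant-feature elimination sketch (not the paper's exact rule).

    A feature is dropped when some already-selected feature is highly
    correlated with it (an approximate Markov blanket) and has a higher
    discriminative score, so the dropped feature adds little label
    information. Returns sorted indices of the selected features.
    """
    n_features = X.shape[1]
    scores = [disc_score(X[:, j], y) for j in range(n_features)]
    # Visit features from most to least discriminative, so the keeper of any
    # redundant pair is always the more discriminative one.
    order = sorted(range(n_features), key=lambda j: -scores[j])
    selected = []
    for j in order:
        # Drop j if a kept feature approximately subsumes it.
        if any(_abs_corr(X[:, i], X[:, j]) >= redundancy_threshold
               for i in selected):
            continue
        selected.append(j)
    return sorted(selected)
```

For example, if one column duplicates another, the duplicate is eliminated while an uncorrelated column survives even when it is weakly discriminative on its own.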