The Use of Multi-Criteria in Feature Selection to Enhance Text Categorization
Feature selection remains an active problem in text categorization. Previous work on feature selection has mostly used the filter model, in which features are ranked by a single measure and then selected according to a given threshold. In this paper, we present a novel approach to feature selection based on multiple criteria for each feature. Instead of a single criterion, several criteria per feature are used, and a selection procedure based on a threshold for each criterion is proposed. This framework appears well suited to text data and is applied to text categorization. Experimental results on the Reuters-21578 benchmark show that our approach is promising and improves the performance of a text categorization system.
Keywords: Feature Selection, Mutual Information, Text Categorization, Baseline Method, Optimal Subset
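To make the contrast in the abstract concrete, the following is a minimal sketch of the filter model (rank by one measure, keep features above a threshold) next to one possible multi-criteria variant. The abstract does not specify the paper's procedure, so the combination rule used here (keep a term only if it passes the threshold of every criterion) is an assumption for illustration only. The toy corpus, the choice of document frequency as a second criterion, and all function names are likewise hypothetical.

```python
import math
from collections import Counter


def doc_frequency(docs):
    """Count, for each term, the number of documents containing it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def mutual_information(docs, category):
    """Score each term by log( P(t, c) / (P(t) * P(c)) ) for one category."""
    n = len(docs)
    n_c = sum(1 for _tokens, label in docs if label == category)
    if n_c == 0:
        raise ValueError("category does not occur in the corpus")
    df = doc_frequency(docs)
    df_c = Counter()
    for tokens, label in docs:
        if label == category:
            df_c.update(set(tokens))
    scores = {}
    for term, n_t in df.items():
        n_tc = df_c[term]
        # A term never seen with the category gets the lowest possible score.
        scores[term] = math.log((n_tc * n) / (n_t * n_c)) if n_tc else float("-inf")
    return scores


def filter_select(scores, threshold):
    """Filter model: keep every term whose score reaches the threshold."""
    return {term for term, score in scores.items() if score >= threshold}


def multi_criteria_select(criteria, thresholds):
    """ASSUMED rule (not taken from the paper): a term is kept only if it
    passes the threshold of every criterion."""
    selected = None
    for name, scores in criteria.items():
        passed = filter_select(scores, thresholds[name])
        selected = passed if selected is None else selected & passed
    return selected if selected is not None else set()


# Hypothetical toy corpus of (tokens, category) pairs.
docs = [
    (["wheat", "grain", "export"], "grain"),
    (["wheat", "price", "export"], "grain"),
    (["oil", "price", "export"], "crude"),
    (["oil", "barrel"], "crude"),
]

mi = mutual_information(docs, "grain")
df = dict(doc_frequency(docs))

single = filter_select(mi, threshold=0.5)
multi = multi_criteria_select({"mi": mi, "df": df}, {"mi": 0.5, "df": 2})
print(sorted(single), sorted(multi))
```

On this toy corpus the single-criterion filter keeps a rare term that happens to occur only in the target category, while the combined rule also requires a minimum document frequency and drops it. Other combination rules (weighted sums, rank aggregation) would be equally plausible readings of "multi-criteria".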