Conditional Mutual Information Based Feature Selection for Classification Task

  • Jana Novovičová
  • Petr Somol
  • Michal Haindl
  • Pavel Pudil
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)


We propose a sequential forward feature selection method for finding a subset of features that are most relevant to the classification task. Our approach uses a novel estimate of the conditional mutual information between a candidate feature and the classes, given the subset of already selected features; this estimate serves as a classifier-independent criterion for evaluating feature subsets. The proposed mMIFS-U algorithm is applied to a text classification problem and compared with the MIFS method proposed by Battiti and the MIFS-U method proposed by Kwak and Choi. In experiments on high-dimensional Reuters textual data, our feature selection algorithm outperforms both MIFS and MIFS-U.


Pattern classification, feature selection, conditional mutual information, text categorization
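To make the sequential forward search concrete, the following is a minimal sketch of the MIFS baseline attributed to Battiti, which greedily adds the feature f maximizing I(C; f) − β Σ I(f; s) over the already selected features s. It is not the proposed mMIFS-U algorithm, whose conditional mutual information estimate is not given in the abstract. The function names, the toy data, the plug-in estimator for discrete features, and the value of `beta` are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mifs_forward_selection(features, labels, k, beta=1.0):
    """Greedy forward selection with the MIFS criterion (illustrative sketch).

    features: dict mapping a feature name to a sequence of discrete values.
    Returns the names of up to k features in the order they were selected.
    """
    # Relevance of each feature to the class labels, computed once.
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            # Penalize information shared with already selected features.
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected)
            return relevance[name] - beta * redundancy
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the class is the sum of two independent binary features.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
toy_features = {
    "a": a,
    "a_copy": list(a),                    # fully redundant with "a"
    "b": b,
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],    # independent of the class
}
toy_labels = [x + y for x, y in zip(a, b)]
selected = mifs_forward_selection(toy_features, toy_labels, k=2, beta=1.0)
print(selected)
```

On this toy data the redundancy penalty keeps the search from picking both `a` and its duplicate: once one of them is selected, the other scores lower than `b`, which carries complementary information about the class.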


  1. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning, pp. 56–63 (2003)
  2. Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering - a filter solution. In: Proceedings of the Second International Conference on Data Mining, pp. 115–122 (2002)
  3. Kohavi, R., John, G.: Wrappers for feature subset selection. Artificial Intelligence 97, 273–324 (1997)
  4. Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering 17(3), 491–502 (2005)
  5. Dash, M., Liu, H.: Consistency-based search in feature selection. Artificial Intelligence 151(1-2), 155–176 (2003)
  6. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 4–37 (2000)
  7. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5, 537–550 (1994)
  8. Kwak, N., Choi, C.H.: Input feature selection for classification problems. IEEE Transactions on Neural Networks 13(1), 143–159 (2002)
  9. Cover, T., Thomas, J.: Elements of Information Theory, 1st edn. John Wiley & Sons, Chichester (1991)
  10. Fleuret, F.: Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research 5, 1531–1555 (2004)
  11. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
  12. Fano, R.: Transmission of Information: A Statistical Theory of Communications. M.I.T. Press and John Wiley & Sons (1961)
  13. Kwak, N., Choi, C.: Improved mutual information feature selector for neural networks in supervised learning. In: Proceedings of the IJCNN 1999, 10th International Joint Conference on Neural Networks, pp. 1313–1318 (1999)
  14. Forman, G.: An experimental study of feature selection metrics for text categorization. Journal of Machine Learning Research 3, 1289–1305 (2003)
  15. McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. In: Proceedings of the AAAI-1998 Workshop on Learning for Text Categorization (1998)
  16. Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
  17. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
  18. Yang, Y.: A study on thresholding strategies for text categorization. In: Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001), New Orleans, Louisiana, USA (September 9-12, 2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jana Novovičová (1, 2)
  • Petr Somol (1, 2)
  • Michal Haindl (1, 2)
  • Pavel Pudil (2, 1)
  1. Dept. of Pattern Recognition, Institute of Academy of Sciences of the Czech Republic
  2. Faculty of Management, Prague University of Economics, Czech Republic
