A Comparative Performance Study of Feature Selection Methods for the Anti-spam Filtering Domain

  • J. R. Méndez
  • F. Fdez-Riverola
  • F. Díaz
  • E. L. Iglesias
  • J. M. Corchado
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4065)

Abstract

In this paper we analyse the strengths and weaknesses of the most commonly used feature selection methods in text categorization when they are applied to the spam filtering domain. Several experiments combining different feature selection methods with content-based filtering techniques are carried out and discussed. The Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods are analysed in conjunction with Naïve Bayes, boosting trees, Support Vector Machines and ECUE models in different scenarios. From these experiments, the underlying ideas behind the feature selection methods are identified and applied to improve the feature selection process of SpamHunting, a novel anti-spam filtering system able to accurately classify suspicious e-mails.
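The four feature selection criteria named above all score a term from the same statistics: how many spam and ham messages contain it. The sketch below is a minimal illustration, assuming the common text-categorization definitions (document-level term presence, a two-class spam/ham split, Mutual Information maximized over the two classes). It is not the paper's exact formulation or SpamHunting's code, and every function name in it is hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch only; all names are hypothetical, not taken from the paper.

def class_doc_freqs(docs, labels):
    """Count, per class, how many documents contain each term (each doc counts once)."""
    freqs = {"spam": Counter(), "ham": Counter()}
    for tokens, label in zip(docs, labels):
        freqs[label].update(set(tokens))
    return freqs


def _entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(k / total * math.log2(k / total) for k in counts if k > 0)


def _mi(joint, class_total, term_total, n):
    """log2(P(t,c) / (P(t) P(c))); -inf when the term never occurs in the class."""
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n / (class_total * term_total))


def term_scores(term, freqs, n_spam, n_ham):
    """Score one term with DF, IG, chi-square and MI from its 2x2 contingency table."""
    n = n_spam + n_ham
    a = freqs["spam"][term]  # spam documents containing the term
    b = freqs["ham"][term]   # ham documents containing the term
    c = n_spam - a           # spam documents without the term
    d = n_ham - b            # ham documents without the term

    # Information gain = H(class) - expected H(class | term present / absent)
    ig = (_entropy([n_spam, n_ham])
          - (a + b) / n * _entropy([a, b])
          - (c + d) / n * _entropy([c, d]))

    # Chi-square statistic of the 2x2 table (same value for spam and ham with two classes)
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0

    # Mutual information, keeping the larger of the two per-class values
    mi = max(_mi(a, n_spam, a + b, n), _mi(b, n_ham, a + b, n))

    return {"DF": a + b, "IG": ig, "CHI2": chi2, "MI": mi}


def select_features(docs, labels, method, k):
    """Return the k highest-scoring terms under `method` (ties broken alphabetically)."""
    freqs = class_doc_freqs(docs, labels)
    n_spam = labels.count("spam")
    n_ham = labels.count("ham")
    vocab = sorted(set(freqs["spam"]) | set(freqs["ham"]))
    ranked = sorted(vocab,
                    key=lambda t: term_scores(t, freqs, n_spam, n_ham)[method],
                    reverse=True)
    return ranked[:k]
```

On a toy corpus of two spam messages (`["cheap", "viagra", "offer"]`, `["cheap", "offer", "now"]`) and two ham messages (`["meeting", "agenda", "offer"]`, `["meeting", "notes"]`), Document Frequency ranks "offer" first because it appears in three messages of both classes, whereas Information Gain ranks "cheap" and "meeting" first because each perfectly separates the classes. With this estimator, Mutual Information gives every term that occurs in only one class the same score, log2(N/N_c), regardless of how many messages contain it, so the singleton "viagra" ties with "cheap".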


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • J. R. Méndez (1)
  • F. Fdez-Riverola (1)
  • F. Díaz (2)
  • E. L. Iglesias (1)
  • J. M. Corchado (3)
  1. Dept. Informática, University of Vigo, Ourense, Spain
  2. Dept. Informática, University of Valladolid, Segovia, Spain
  3. Dept. Informática y Automática, University of Salamanca, Salamanca, Spain
