Neural Recognition and Genetic Features Selection for Robust Detection of E-Mail Spam

  • Dimitris Gavrilis
  • Ioannis G. Tsoulos
  • Evangelos Dermatas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


This paper presents a method for feature selection and classification of e-mail spam messages. Features are selected in two steps: an initial selection based on the entropy of each feature, followed by a fine-tuning selection implemented with a genetic algorithm. For classification, a Radial Basis Function Network is used to ensure a robust classification rate even when the cluster structure is complex. The results show that two-level feature selection achieves better accuracy than one-stage selection, and that using a lemmatizer or a stop-word list gives only minimal classification improvement. The proposed method achieves 96-97% average accuracy using only 20 features out of 15000.
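The two-step selection can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, since the abstract does not give these details: the entropy criterion is taken to be the information gain of a binary term indicator, the genetic algorithm uses truncation selection, one-point crossover and single-bit mutation, and the fitness is a leave-one-out 1-nearest-neighbour accuracy that stands in for the Radial Basis Function Network. All function names and the toy data are hypothetical.

```python
import math
import random

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(column, labels):
    """Drop in label entropy from knowing a binary feature (assumed criterion)."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [lab for x, lab in zip(column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def entropy_stage(X, y, keep):
    """Stage 1: rank features by information gain, keep the top `keep` indices."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:keep]

def loo_accuracy(X, y, features):
    """Stand-in fitness (not an RBF network): leave-one-out 1-NN accuracy."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum(row[j] != X[k][j] for j in features))
        correct += (y[nearest] == y[i])
    return correct / len(X)

def genetic_stage(X, y, candidates, generations=20, pop_size=12, seed=0):
    """Stage 2: evolve bit masks over `candidates` (needs at least 2 candidates)."""
    rng = random.Random(seed)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loo_accuracy(X, y, decode(mask))

    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(len(child))] ^= 1    # single-bit mutation
            children.append(child)
        population = parents + children
    return decode(max(population, key=fitness))

# Toy corpus: rows are messages, columns are binary term indicators.
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = legitimate

shortlist = entropy_stage(X, y, keep=3)    # entropy-based shortlist
selected = genetic_stage(X, y, shortlist)  # subset refined by the genetic algorithm
print(shortlist, selected)
```

In this toy corpus the first term separates the two classes perfectly, so the entropy stage ranks it first. The genetic stage then searches only over subsets of the shortlist, which is what keeps the second step affordable when the initial vocabulary is large.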


Keywords: Feature Selection; Classification Error; Feature Selection Method; Radial Basis Function Network; Latent Semantic Analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dimitris Gavrilis (1)
  • Ioannis G. Tsoulos (2)
  • Evangelos Dermatas (1)

  1. Electrical & Computer Engineering, University of Patras, Greece
  2. Computer Science Department, University of Ioannina, Greece
