Neural Recognition and Genetic Features Selection for Robust Detection of E-Mail Spam
In this paper, a method for feature selection and classification of e-mail spam messages is presented. Feature selection is performed in two steps: an initial selection is made by measuring the entropy of the features, and a fine-tuning selection is then carried out using a genetic algorithm. For classification, a Radial Basis Function Network is used to ensure a robust classification rate even in the case of a complex cluster structure. The results show that two-level feature selection achieves better accuracy than one-stage selection, and that the use of a lemmatizer or a stop-word list gives minimal classification improvement. The proposed method achieves 96-97% average accuracy when using only 20 features out of 15000.
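The first, entropy-based selection stage can be illustrated with a minimal sketch. The abstract does not state which entropy measure the authors use, so the code below assumes information gain over binary term presence, a common choice in text classification. The function names (`entropy`, `information_gain`, `select_top_terms`) and the toy corpus are hypothetical and are not taken from the paper.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class count distribution."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Drop in class entropy from knowing whether `term` occurs in a message.

    Assumption: this is one plausible reading of "measuring entropy";
    the paper's actual criterion may differ.
    """
    n = len(docs)
    spam_total = sum(labels)
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    h_before = entropy(spam_total, n - spam_total)
    h_after = 0.0
    for subset in (with_term, without_term):
        if subset:
            spam = sum(subset)
            h_after += len(subset) / n * entropy(spam, len(subset) - spam)
    return h_before - h_after


def select_top_terms(docs, labels, k):
    """Rank every vocabulary term by information gain and keep the best k."""
    vocabulary = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Toy corpus (invented for illustration): each message is a set of tokens,
# label 1 = spam, label 0 = legitimate.
docs = [
    {"free", "offer", "click"},
    {"free", "prize", "click"},
    {"meeting", "agenda", "monday"},
    {"project", "meeting", "notes"},
]
labels = [1, 1, 0, 0]
top_terms = select_top_terms(docs, labels, 3)
print(top_terms)
```

On this toy corpus the three terms that perfectly separate the classes ("click", "free", "meeting") rank first. In a two-stage scheme like the one described, a ranking of this kind would only shrink the candidate pool; the genetic algorithm would then search within that pool for the final small feature subset.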
Keywords: Feature Selection, Classification Error, Feature Selection Method, Radial Basis Function Network, Latent Semantic Analysis