Using Biased Discriminant Analysis for Email Filtering
- Juan Carlos Gomez, affiliated with ITESM
- Marie-Francine Moens, affiliated with Katholieke Universiteit Leuven
This paper reports on email filtering based on content features. We test the validity of a novel statistical feature extraction method that relies on dimensionality reduction to retain the most informative and discriminative features of messages. The approach, named Biased Discriminant Analysis (BDA), seeks a transformation of the feature space that tightly clusters the positive examples while pushing the negative ones away. BDA extends Linear Discriminant Analysis (LDA) but uses a different transformation to improve the separation between classes, and it has not previously been applied to text mining tasks.
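The abstract describes BDA only at a high level. As a hedged illustration, the sketch below assumes the commonly used formulation in which BDA maximizes the ratio of the negatives' scatter around the positive mean to the positives' scatter around their own mean, solved as a generalized eigenproblem with a small ridge term. The helper names (`bda_fit`, `bda_objective`, `_scatters`), the regularization parameter `reg`, and the synthetic data are hypothetical; this is not the authors' implementation, and which class plays the role of "positive" (for example, spam or legitimate mail) is left open.

```python
import numpy as np


def _scatters(X_pos, X_neg, reg):
    """Hypothetical helper returning the two scatter matrices used by BDA.

    S_pos: scatter of positive examples around the positive mean (to shrink),
           plus a ridge term reg * I so it stays invertible in high dimensions.
    S_neg: scatter of negative examples around the *positive* mean (to grow).
    """
    m_pos = X_pos.mean(axis=0)
    Dp = X_pos - m_pos
    Dn = X_neg - m_pos
    d = X_pos.shape[1]
    S_pos = Dp.T @ Dp + reg * np.eye(d)
    S_neg = Dn.T @ Dn
    return S_pos, S_neg


def bda_fit(X_pos, X_neg, n_components, reg=1e-3):
    """Return (W, eigvals): top solutions of S_neg w = lam * S_pos w."""
    S_pos, S_neg = _scatters(X_pos, X_neg, reg)
    # Whiten with the Cholesky factor S_pos = L L^T to obtain a symmetric problem.
    L = np.linalg.cholesky(S_pos)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_neg @ L_inv.T
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    W = L_inv.T @ vecs[:, order]                 # map back: w = L^{-T} v
    return W, vals[order]


def bda_objective(w, X_pos, X_neg, reg=1e-3):
    """Ratio (w^T S_neg w) / (w^T S_pos w) that this BDA sketch maximizes."""
    S_pos, S_neg = _scatters(X_pos, X_neg, reg)
    return float(w @ S_neg @ w) / float(w @ S_pos @ w)


# Toy usage on synthetic data (not from the paper's corpora).
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 0.1, size=(50, 5))       # tight cluster of positives
X_neg = rng.normal(0.0, 1.0, size=(80, 5))       # scattered negatives
X_neg[:, 0] += rng.choice([-3.0, 3.0], size=80)  # push negatives along feature 0
W, eigvals = bda_fit(X_pos, X_neg, n_components=2)
Z_pos, Z_neg = X_pos @ W, X_neg @ W              # reduced 2-D representations
```

In this formulation the objective is asymmetric, which is the main contrast with LDA: LDA measures the scatter of both classes around their own means, whereas here only the positives are asked to be compact and the negatives are scored by their distance from the positive mean, so they need not form a single cluster.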
We test BDA successfully under two schemes. The first is a traditional classification scenario using 10-fold cross-validation on four standard ground-truth corpora: LingSpam, SpamAssassin, the Phishing corpus, and a subset of the TREC 2007 spam corpus. In the second scheme, we test the anticipatory properties of the statistical features on the TREC 2007 spam corpus.
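The two evaluation settings can be sketched as ways of splitting message indices. This is a minimal illustration under stated assumptions: the abstract only says that the first setting uses 10-fold cross-validation and that the second probes anticipatory properties on TREC 2007, so the second is assumed here to mean training on earlier messages and testing on later ones. The function names, the seeded random shuffle, and the `train_fraction` parameter are hypothetical.

```python
import numpy as np


def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n messages."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # k disjoint, near-equal folds
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def chronological_split(timestamps, train_fraction=0.5):
    """Hypothetical 'anticipatory' split: earliest messages train, later ones test."""
    order = np.argsort(timestamps, kind="stable")   # indices sorted by arrival time
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]


# Toy usage.
folds = list(kfold_indices(n=100, k=10))
timestamps = np.array([5, 1, 4, 2, 3, 0])           # toy arrival times
train_idx, test_idx = chronological_split(timestamps, train_fraction=0.5)
# train_idx is array([5, 1, 3]); test_idx is array([4, 2, 0])
```

Shuffling before splitting mixes messages from all periods into every fold, which is exactly what the chronological split avoids: there, every training message precedes every test message.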
The contribution of this work is evidence that BDA offers better discriminative features for email filtering, gives stable classification results regardless of the number of features chosen, and yields features that robustly retain their discriminative value over time.
- Book Title: Knowledge-Based and Intelligent Information and Engineering Systems
- Book Subtitle: 14th International Conference, KES 2010, Cardiff, UK, September 8-10, 2010, Proceedings, Part I
- Pages: 566-575
- Series Title: Lecture Notes in Computer Science
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: Springer-Verlag Berlin Heidelberg
- Editor Affiliations:
  - School of Engineering, The Parade, Cardiff University
  - Dept. of Computer Science and Software Engineering, Buckingham Building, Lion Terrace, University of Portsmouth
  - KES International
  - School of Electrical and Information Engineering, University of South Australia
- Author Affiliations:
  - ITESM, Eugenio Garza Sada 2501, Monterrey, NL, 64849, Mexico
  - Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001, Heverlee, Belgium