A Computational Study of Naïve Bayesian Learning in Anti-spam Management
It has been argued that Bayesian learning can be used to filter unsolicited junk e-mail (“spam”) and can outperform other anti-spam methods, such as heuristic approaches. We develop a Bayesian learning system and conduct a computational study on a corpus of 10,000 e-mails to evaluate its performance and robustness, in particular how performance varies with training-corpus size and with the use of multi-grams. Based on the computational results, we conclude that the Bayesian approach is promising for anti-spam management at the client side compared with other methods, but may need additional work to be viable at the corporate level in practice.
Keywords: Classification Accuracy, Heuristic Approach, Client Side, Bayesian Learning, Training Size
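As an illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a naïve Bayes spam filter over word n-grams. It is not the authors' system: the class name `NaiveBayesFilter`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the toy messages are all assumptions made for this example. Setting `n=2` gives a rough idea of what using multi-grams (here, word bigrams) instead of single words could look like.

```python
import math
from collections import Counter


def ngrams(text, n=1):
    """Return the word n-grams of the lowercased text (assumed tokenizer: whitespace split)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NaiveBayesFilter:
    """Hypothetical two-class naive Bayes filter with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self, n=1):
        self.n = n                                    # n-gram length (1 = single words)
        self.counts = {lab: Counter() for lab in self.LABELS}
        self.docs = {lab: 0 for lab in self.LABELS}

    def train(self, text, label):
        """Add one labeled message to the training counts."""
        self.counts[label].update(ngrams(text, self.n))
        self.docs[label] += 1

    def log_score(self, text, label):
        """Log prior plus summed log likelihoods of the message's n-grams."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = sum(self.counts[label].values())
        # Smoothed class prior, so an untrained class does not cause log(0).
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in ngrams(text, self.n):
            # Add-one smoothing keeps unseen n-grams from zeroing the product.
            score += math.log((self.counts[label][tok] + 1) / (total + vocab))
        return score

    def classify(self, text):
        """Return the label with the higher log score."""
        return max(self.LABELS, key=lambda lab: self.log_score(text, lab))


# Toy usage with made-up messages (not data from the study).
flt = NaiveBayesFilter(n=1)
flt.train("win cash prize now", "spam")
flt.train("cheap pills win money", "spam")
flt.train("meeting agenda for monday", "ham")
flt.train("lunch on monday with the team", "ham")
print(flt.classify("win money now"))
print(flt.classify("monday meeting agenda"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. Varying the number of `train` calls is a simple way to explore how training-corpus size affects such a filter.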