SSPR/SPR 2004: Structural, Syntactic, and Statistical Pattern Recognition, pp. 824-830
A Computational Study of Naïve Bayesian Learning in Anti-spam Management
Abstract
It has been argued that Bayesian learning can be used to filter unsolicited junk e-mail (“spam”) and can outperform other anti-spam methods, such as heuristic approaches. We develop a Bayesian learning system and conduct a computational study on a corpus of 10,000 e-mails to evaluate its performance and robustness, in particular how performance varies with training-corpus size and with the use of multi-grams. Based on the computational results, we conclude that, compared with other methods, the Bayesian approach is promising for anti-spam management at the client side, but may need additional work to be viable at the corporate level in practice.
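The kind of classifier the abstract describes can be illustrated with a minimal sketch. This is not the authors' system: the names `ngrams` and `NaiveBayesSpamFilter`, the whitespace tokenisation, the add-one smoothing, and the zero log-odds decision threshold are all assumptions chosen for illustration. The `max_n` parameter stands in for the "multi-grams" setting, and calling `train` on more or fewer messages corresponds to varying the training-corpus size.

```python
import math
from collections import Counter


def ngrams(text, max_n=2):
    """Return word n-grams of length 1..max_n (hypothetical 'multi-gram' features)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


class NaiveBayesSpamFilter:
    """Illustrative multinomial naive Bayes filter; not the paper's implementation."""

    def __init__(self, max_n=2):
        self.max_n = max_n
        self.counts = {"spam": Counter(), "ham": Counter()}  # n-gram counts per class
        self.docs = {"spam": 0, "ham": 0}                    # messages seen per class

    def train(self, text, label):
        self.counts[label].update(ngrams(text, self.max_n))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """log P(spam | text) - log P(ham | text) under the naive independence assumption."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_docs = sum(self.docs.values())
        score = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            # Class prior, add-one smoothed (an assumption of this sketch).
            logp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for gram in ngrams(text, self.max_n):
                # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
                logp += math.log((self.counts[label][gram] + 1) / (total + vocab + 1))
            score[label] = logp
        return score["spam"] - score["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0
```

A filter built this way would be trained by calling `train(text, "spam")` or `train(text, "ham")` on labelled messages and then queried with `is_spam(text)`. Setting `max_n=1` restricts the features to single words, which gives a simple way to compare unigram and multi-gram variants on the same data.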
Keywords
Classification Accuracy; Heuristic Approach; Client Side; Bayesian Learning; Training Size