Application and Evaluation of Bayesian Filter for Chinese Spam

  • Conference paper
Information Security and Cryptology (Inscrypt 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4318)

Abstract

Recently, statistical filtering based on Bayes' theorem, known as Bayesian filtering, gained attention after it was described in the paper “A Plan for Spam” by Paul Graham, and it has become a popular mechanism for distinguishing spam from legitimate email. Many modern mail programs use Bayesian spam filtering techniques. Implementations of Bayesian filtering for email written in English and Japanese have already been developed. In contrast, little work has been done on implementing Bayesian spam filtering for Chinese email. In this paper, we first adopt a statistical filter called bsfilter and modify it to filter Chinese email. Using Chinese emails as the experimental target, we then analyze the relation between the filter parameters and the spam-judgement accuracy, and consider the optimal parameter values.
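
As a rough illustration of the kind of filter the abstract refers to, the following Python sketch applies the token-scoring and combining rules from Graham's “A Plan for Spam”. It is not bsfilter's code and not the authors' modified version. The character-bigram tokenizer, the class and function names, and the toy training messages are all assumptions made for this example. The constructor arguments (`min_count`, `unknown`, `top_n`, `threshold`) are hypothetical stand-ins for the kind of parameters whose effect on accuracy the paper studies, with defaults taken from Graham's essay.

```python
from collections import Counter
from math import prod


def bigrams(text):
    """Tokenize into overlapping character bigrams (an assumed tokenizer)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class GrahamFilter:
    """Toy filter using the scoring rules from Graham's 'A Plan for Spam'."""

    def __init__(self, min_count=5, unknown=0.4, top_n=15, threshold=0.9):
        self.min_count = min_count    # ignore tokens seen fewer times than this
        self.unknown = unknown        # probability assigned to unseen tokens
        self.top_n = top_n            # number of most "interesting" tokens used
        self.threshold = threshold    # score above which a message is spam
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0  # number of training messages per class

    def train(self, text, is_spam):
        """Add one labelled message to the token counts."""
        if is_spam:
            self.spam.update(bigrams(text))
            self.n_spam += 1
        else:
            self.ham.update(bigrams(text))
            self.n_ham += 1

    def token_prob(self, tok):
        """Estimate the probability that a message containing tok is spam."""
        g = 2 * self.ham[tok]  # Graham doubles ham counts to bias against false positives
        b = self.spam[tok]
        if g + b < self.min_count or not (self.n_spam and self.n_ham):
            return self.unknown
        bf = min(1.0, b / self.n_spam)
        gf = min(1.0, g / self.n_ham)
        return max(0.01, min(0.99, bf / (gf + bf)))

    def score(self, text):
        """Combine the top_n token probabilities farthest from 0.5."""
        probs = sorted((self.token_prob(t) for t in set(bigrams(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:self.top_n]
        s = prod(probs)
        h = prod(1.0 - p for p in probs)
        return s / (s + h)

    def is_spam(self, text):
        return self.score(text) > self.threshold


def demo():
    """Train on four invented messages; min_count=1 only because the corpus is tiny."""
    f = GrahamFilter(min_count=1)
    f.train("免费发票代开", is_spam=True)
    f.train("发票代开优惠", is_spam=True)
    f.train("明天开会讨论", is_spam=False)
    f.train("会议记录已发", is_spam=False)
    return f


if __name__ == "__main__":
    f = demo()
    print(f.is_spam("代开发票"), f.is_spam("明天开会"))
```

Character bigrams are used here only because they need no dictionary. A word segmenter could be substituted for `bigrams` without changing the rest of the sketch, and which tokenization the paper actually uses is not stated in the abstract.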

References

  1. Graham, P.: A Plan For Spam (August 2002)

  2. Bsfilter, http://bsfilter.org/

  3. CCERT Data Sets of Chinese Emails, http://www.ccert.edu.cn/spam/sa/datasets.htm

  4. Robinson, G.: A statistical approach to the spam problem. Linux Journal 107 (2003)

  5. Graham, P.: Better bayesian filtering. In: Spam Conference (2003)

  6. Zhang, L., Zhu, J., Yao, T.: An Evaluation of Statistical Spam Filtering Techniques. ACM Transactions on Asian Language Information Processing 3(4), 243–269 (2004)

  7. Maosong, S., Dayang, S., Changning, H.: CSeg&Tag1.0: A Practical Word Segmenter and POS Tagger for Chinese Texts, A97-1018, A Digital Archive of Research Papers in Computational Linguistics

  8. Hovold, J.: Naive Bayes Spam Filtering Using Word-Position-Based Attributes. In: Second Conference on Email and Anti-Spam, CEAS 2005 (2005)

  9. Iwanaga, M., Tabata, T., Sakurai, K.: Comparison with Implementations of Bayesian Filtering for Anti-spam. In: SCIS 2004, vol. 2, pp. 1025–1028 (2004) (in Japanese)

  10. Ohfuku, H., Matsuura, K.: Optimization of Bayesian filtering for Anti-spam. In: SCIS 2005, vol. 1, pp. 199–204 (2005) (in Japanese)

  11. http://www.statsoft.com/textbook/stnaiveb.html

  12. Support Vector Machine, http://www.support-vector.net/

  13. Boosting, http://www.boosting.org/

  14. Markov Chain, http://www.taygeta.com/rwalks/node7.html

  15. Nie, J.-Y., Ren, F.: Chinese Information Retrieval: Using Characters or Words? Information Processing and Management 35(4), 443–462 (1999)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Z., Hori, Y., Sakurai, K. (2006). Application and Evaluation of Bayesian Filter for Chinese Spam. In: Lipmaa, H., Yung, M., Lin, D. (eds) Information Security and Cryptology. Inscrypt 2006. Lecture Notes in Computer Science, vol 4318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11937807_20

  • DOI: https://doi.org/10.1007/11937807_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49608-3

  • Online ISBN: 978-3-540-49610-6

  • eBook Packages: Computer Science, Computer Science (R0)
