
Abstract

Email has become one of the fastest and most economical forms of communication. However, the growth in the number of email users has led to a dramatic increase in spam over the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian classifier, and J48 classifier). The experiments were run with different data sizes and different feature sizes. The final classification result is ‘1’ if the email is spam and ‘0’ otherwise. This paper shows that the simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified by a binary tree.
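As a rough illustration of the binary labelling described above, the following sketch trains a word-count Naïve Bayes model on a handful of messages and returns 1 for spam and 0 otherwise. It is not the paper's implementation: the toy corpus, the function names `train_naive_bayes` and `classify`, and the use of Laplace smoothing are all assumptions made for this example, and the paper's actual features and tooling are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): (text, label), 1 = spam, 0 = not spam.
TRAIN = [
    ("win cash prize now", 1),
    ("cheap pills win money", 1),
    ("claim your free prize", 1),
    ("meeting agenda for monday", 0),
    ("project report attached", 0),
    ("lunch on monday", 0),
]

def train_naive_bayes(examples):
    """Collect per-class word counts, class counts, and the vocabulary."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return 1 if the message scores higher as spam, otherwise 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

model = train_naive_bayes(TRAIN)
print(classify("win a free prize", model))        # 1 (spam)
print(classify("monday project meeting", model))  # 0 (not spam)
```

Varying the number of training messages or restricting the vocabulary to a subset of words would correspond, loosely, to the data-size and feature-size settings the abstract mentions.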




Copyright information

© 2007 Springer

About this paper

Cite this paper

Youn, S., McLeod, D. (2007). A Comparative Study for Email Classification. In: Elleithy, K. (eds) Advances and Innovations in Systems, Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6264-3_67


  • DOI: https://doi.org/10.1007/978-1-4020-6264-3_67

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6263-6

  • Online ISBN: 978-1-4020-6264-3

  • eBook Packages: Engineering, Engineering (R0)
