Abstract
Email has become one of the fastest and most economical forms of communication. However, the growing number of email users has led to a dramatic increase in spam over the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian Classifier, and J48 classifier). The experiments were performed with different data sizes and different feature sizes. The final classification result is ‘1’ if the email is spam and ‘0’ otherwise. This paper shows that a simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified by a binary tree.
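The binary labelling described above can be illustrated with a minimal sketch. It is not the authors' experiment and not J48 itself (J48 is Weka's implementation of C4.5, which grows a full, pruned tree); it is a one-level decision tree that picks the word whose presence best separates spam from non-spam by information gain. The toy emails, labels, vocabulary, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_labels(emails, labels, word):
    """Partition labels by whether `word` occurs in each email."""
    present = [y for text, y in zip(emails, labels) if word in text.split()]
    absent = [y for text, y in zip(emails, labels) if word not in text.split()]
    return present, absent


def best_split_word(emails, labels, vocabulary):
    """Return (word, gain) for the word with the highest information gain."""
    base = entropy(labels)
    best_word, best_gain = None, -1.0
    for word in vocabulary:
        present, absent = split_labels(emails, labels, word)
        remainder = (
            len(present) * entropy(present) + len(absent) * entropy(absent)
        ) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best_word, best_gain = word, gain
    return best_word, best_gain


def majority(labels, default):
    """Most common label, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(emails, labels, vocabulary):
    """Fit a one-level tree: (split word, label if present, label if absent)."""
    word, _ = best_split_word(emails, labels, vocabulary)
    present, absent = split_labels(emails, labels, word)
    overall = majority(labels, 0)
    return word, majority(present, overall), majority(absent, overall)


def classify(stump, text):
    """Return 1 for spam and 0 otherwise, following the single split."""
    word, if_present, if_absent = stump
    return if_present if word in text.split() else if_absent


# Hypothetical toy data: 1 = spam, 0 = not spam.
EMAILS = [
    "win free prize now",
    "free offer click now",
    "meeting agenda for monday",
    "project report attached",
    "claim your free prize",
    "lunch on monday",
]
LABELS = [1, 1, 0, 0, 1, 0]
VOCAB = ["free", "monday", "now", "report"]  # the "feature size" here is 4

stump = train_stump(EMAILS, LABELS, VOCAB)
print(stump)
print(classify(stump, "free tickets inside"))
print(classify(stump, "see you monday"))
```

On data like this, where a single feature already separates the two classes, one binary split is enough, which is the kind of situation in which a tree-based classifier would be expected to do well.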
© 2007 Springer
About this paper
Cite this paper
Youn, S., McLeod, D. (2007). A Comparative Study for Email Classification. In: Elleithy, K. (eds) Advances and Innovations in Systems, Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6264-3_67
DOI: https://doi.org/10.1007/978-1-4020-6264-3_67
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6263-6
Online ISBN: 978-1-4020-6264-3
eBook Packages: Engineering, Engineering (R0)