Classifying E-Mails Via Support Vector Machine

  • Lidan Shou
  • Bin Cui
  • Gang Chen
  • Jinxiang Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification technique. Our work treats E-mail messages as semi-structured documents consisting of a set of fields with predefined semantics and a number of variable-length free-text contents. The main contributions of this paper are twofold. First, we present a Support Vector Machine (SVM) based model that incorporates Principal Component Analysis (PCA) to reduce both the size of the data and the dimensionality of the input feature space. As a result, the input data become classifiable with fewer features, and the training process converges faster. Second, we build the classification model using both the \(\mathcal{C}\)-support vector machine and \(\nu\)-support vector machine algorithms. Various control parameters for performance tuning are studied in an extensive set of experiments. The results of our performance evaluation indicate that the proposed technique is effective in E-mail classification.
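The pipeline described in the abstract (reduce the feature space with PCA, then train an SVM on the reduced features) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four term-count features, the toy messages, the helper names (`pca_fit`, `pca_transform`, `train_linear_svm`, `predict`), and the parameter values. PCA is computed by power iteration with deflation, and the classifier is a linear soft-margin (C-type) SVM trained by plain subgradient descent on the hinge loss, which stands in for a real SVM solver. The \(\nu\)-SVM variant and the paper's actual feature extraction are not covered.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pca_fit(X, k, iters=200, seed=0):
    """Top-k principal components via power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])  # eigenvalue estimate
        components.append(v)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components

def pca_transform(X, mean, components):
    """Project centered rows onto the principal components."""
    return [[dot([x - m for x, m in zip(row, mean)], c) for c in components]
            for row in X]

def train_linear_svm(Z, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C * sum(hinge loss) by subgradient descent."""
    d = len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for z, t in zip(Z, y):
            if t * (dot(w, z) + b) < 1.0:  # margin violated
                for j in range(d):
                    grad_w[j] -= C * t * z[j]
                grad_b -= C * t
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(Z, w, b):
    return [1 if dot(w, z) + b >= 0 else -1 for z in Z]

# Hypothetical term counts per message: ["free", "win", "meeting", "report"].
X = [[3, 2, 0, 0], [2, 3, 0, 1], [4, 1, 1, 0],   # junk
     [0, 0, 3, 2], [0, 1, 2, 3], [1, 0, 4, 1]]   # legitimate
y = [1, 1, 1, -1, -1, -1]                        # +1 = junk, -1 = legitimate

mean, comps = pca_fit(X, k=2)        # 4 features reduced to 2
Z = pca_transform(X, mean, comps)
w, b = train_linear_svm(Z, y)
print(predict(pca_transform([[3, 3, 0, 0], [0, 0, 3, 3]], mean, comps), w, b))
```

In this sketch the SVM only ever sees the two projected features, which mirrors the abstract's point that the data become classifiable with fewer features. The penalty parameter `C` is one example of the kind of control parameter the abstract says is tuned experimentally.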


Support Vector Machine; Support Vector Machine Model; Principal Component Analysis Method; Body Feature; Optimal Hyperplane
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lidan Shou (1)
  • Bin Cui (2)
  • Gang Chen (1)
  • Jinxiang Dong (1)
  1. College of Computer Science, Zhejiang University, Hangzhou, P.R. China
  2. School of Computing, National University of Singapore, Singapore
