PERC: A Personal Email Classifier

  • Shih-Wen Ke
  • Chris Bowerman
  • Michael Oakes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3936)


Improving the accuracy of assigning new email messages to small folders can reduce the likelihood of users creating duplicate folders for some topics. In this paper we present a hybrid classification model, PERC, and use the Enron Email Corpus to investigate the performance of kNN, SVM and PERC in a simulation of a real-time situation. Our results show that PERC is significantly better at assigning messages to small folders. We also discuss the effects of different parameter settings for the classifiers.


Keywords: Support Vector Machine; Support Vector Machine Parameter; Thresholding Strategy; Rare Category; Small Folder
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shih-Wen Ke (1)
  • Chris Bowerman (1)
  • Michael Oakes (1)
  1. School of Computing and Technology, University of Sunderland, Sunderland, UK
