
Detection of Illegitimate Emails Using Boosting Algorithm

  • Sarwat Nizamani
  • Nasrullah Memon
  • Uffe Kock Wiil
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

In this paper, we report on experiments to detect illegitimate emails using a boosting algorithm. We call an email illegitimate if it is not useful to the receiver or to society. We divide the problem into two major areas of illegitimate email detection: suspicious email detection and spam email detection. Boosting can raise the accuracy of traditional classification algorithms, but it requires choosing a suitable weak learner as well as the number of boosting iterations. We therefore propose suitable weak learners and parameter settings for the boosting algorithm for this task. We first analyze the problem using base learners and then apply the boosting algorithm with suitable weak learners and parameter settings, such as the number of boosting iterations. We propose the Naive Bayes classifier as a suitable weak learner; it reaches its maximum performance with very few boosting iterations.
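
The abstract's core idea, boosting a Naive Bayes weak learner for a small number of iterations, can be sketched as follows. This is a minimal illustration assuming scikit-learn's AdaBoostClassifier and MultinomialNB on toy bag-of-words features; it is not the authors' experimental setup, and the corpus and parameter values shown are hypothetical.

```python
# Minimal sketch: AdaBoost with a Naive Bayes weak learner for classifying
# emails as legitimate or illegitimate. The toy corpus, bag-of-words features,
# and the iteration count (n_estimators=10) are illustrative assumptions,
# not values reported in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical labelled corpus: 1 = illegitimate (suspicious or spam), 0 = legitimate.
emails = [
    "meeting rescheduled to noon tomorrow",
    "project report attached for review",
    "cheap meds available click this link now",
    "you have won a lottery send your bank details",
    "lunch on friday with the visiting team",
    "urgent transfer funds to this account immediately",
]
labels = [0, 0, 1, 1, 0, 1]

# Bag-of-words term counts, a common feature representation for email text.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Boosted Naive Bayes: per the abstract, only a few boosting iterations are
# needed for this weak learner to reach its peak performance.
clf = AdaBoostClassifier(MultinomialNB(), n_estimators=10)
clf.fit(X, labels)

# Score a new, unseen message with the boosted ensemble.
new_email = vectorizer.transform(["click now to claim your lottery prize"])
print(clf.predict(new_email))  # 1 -> flagged as illegitimate
```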

Keywords

Support Vector Machine · Base Learner · Training Instance · Weak Learner · Decision Tree Algorithm


Copyright information

© Springer-Verlag/Wien 2011

Authors and Affiliations

  • Sarwat Nizamani (1, 2)
  • Nasrullah Memon (1, 3)
  • Uffe Kock Wiil (1)
  1. Counterterrorism Research Lab, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  2. University of Sindh, Jamshoro, Pakistan
  3. Hellenic American University, Manchester, USA
