An Intelligent Tool for Detection of Phishing Messages
Phishing messages are a common web-based attack that results in the theft of user information. Finding a solution to this problem is difficult because phishers are very creative, and it is often hard even for a human to distinguish legitimate from malicious content. The goal of this project was to develop an intelligent tool for phishing detection that uses only local information (the full content of the message) and does not rely on external (usually commercial) sources or blacklists.
The major focus of this paper is the selection of appropriate features to discriminate between ordinary and phishing messages and the choice of an efficient classifier. The system can dynamically update the feature list and quickly adapt to new trends in phishing attacks. The proposed tool is suitable for deployment in email accounts or any other social network or communication channel. It is intended to reduce the workload of human experts who would otherwise need to go through hundreds of messages every day to verify their authenticity.
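As a rough illustration of selecting features from message content alone, the following minimal sketch ranks words by how differently they occur in phishing and ordinary messages and keeps the top few as binary features. The scoring criterion (difference in document frequency), the function names, and the toy messages are all assumptions made for this example; they are not taken from the paper. Re-running the selection on newer messages is one simple way a feature list could be refreshed, and the resulting vectors could then be passed to a classifier such as a Random Forest.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words only, no stemming or stop-word removal.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Return the set of distinct lowercase word tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


def select_features(phishing, legitimate, k=5):
    """Keep the k tokens whose document frequency differs most between classes.

    This criterion is an assumption for illustration, not the paper's method.
    """
    df_phish = Counter(tok for msg in phishing for tok in tokenize(msg))
    df_legit = Counter(tok for msg in legitimate for tok in tokenize(msg))
    vocab = set(df_phish) | set(df_legit)

    def score(tok):
        return abs(df_phish[tok] / len(phishing) - df_legit[tok] / len(legitimate))

    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


def vectorize(text, features):
    """Encode a message as 0/1 flags, one per selected feature."""
    tokens = tokenize(text)
    return [1 if f in tokens else 0 for f in features]


# Toy data, invented for this example.
phishing = [
    "Verify your account now or it will be suspended",
    "Urgent: confirm your password to verify your account",
]
legitimate = [
    "Meeting moved to Thursday, see the agenda attached",
    "Here are the notes from the meeting",
]

features = select_features(phishing, legitimate, k=5)
vector = vectorize("Please verify your password", features)
print(features)  # ['account', 'meeting', 'the', 'verify', 'your']
print(vector)    # [0, 0, 0, 1, 1]
```

Calling `select_features` again on a more recent batch of labeled messages yields an updated feature list without any external source or blacklist.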
Keywords: Phishing messages, Text mining, Feature selection, Random Forest
This research work is funded by National Funds through the FCT - Foundation for Science and Technology, in the context of the project UID/CEC/00127/2013.