European Symposium on Research in Computer Security

ESORICS 2012: Computer Security – ESORICS 2012, pp 824–841

Detecting Phishing Emails the Natural Language Way

  • Rakesh Verma
  • Narasimha Shashidhar
  • Nabil Hossain

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7459)

Abstract

Phishing causes billions of dollars in damage every year and poses a serious threat to the Internet economy. Email is still the most commonly used medium to launch phishing attacks [1]. In this paper, we present a comprehensive natural-language-based scheme to detect phishing emails using features that are invariant and fundamentally characterize phishing. Our scheme utilizes all the information present in an email, namely the header, the links, and the text in the body. Although it is obvious that a phishing email is designed to elicit an action from the intended victim, none of the existing detection schemes use this fact to identify phishing emails. Our detection protocol is designed specifically to distinguish between “actionable” and “informational” emails. To this end, we incorporate natural language techniques in phishing detection. We also utilize contextual information, when available, to detect phishing: we study the problem of phishing detection within the contextual confines of the user’s email box and demonstrate that context plays an important role in detection. To the best of our knowledge, this is the first scheme that utilizes natural language techniques and contextual information to detect phishing. We show that our scheme outperforms existing phishing detection schemes. Finally, our protocol detects phishing at the email level rather than detecting masqueraded websites. This is crucial to prevent the victim from clicking any harmful links in the email. Our implementation, called PhishNet-NLP, operates between a user’s mail transfer agent (MTA) and mail user agent (MUA) and processes each arriving email for phishing attacks before it reaches the inbox.
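To make the "actionable" versus "informational" distinction concrete, the following is a minimal illustrative sketch and not the PhishNet-NLP implementation, whose details the abstract does not give. It assumes a hand-picked verb list (`ACTION_VERBS`), a simple rule that a link is "foreign" when its host does not match the sender's domain, and single-part plain-text messages; all function names are hypothetical. It touches the three sources the abstract names: the header, the links, and the body text.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Assumed verb list for illustration only; not taken from the paper.
ACTION_VERBS = {"click", "verify", "confirm", "update", "follow", "reply"}


def is_actionable(text):
    """True if the prose (URLs removed) contains an assumed action verb."""
    prose = URL_RE.sub(" ", text).lower()
    words = set(re.findall(r"[a-z]+", prose))
    return bool(words & ACTION_VERBS)


def link_hosts(text):
    """Host names of all http(s) URLs found in the text."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def sender_domain(from_header):
    """Domain part of the address in a From header."""
    address = parseaddr(from_header)[1]
    return address.rpartition("@")[2].lower()


def looks_like_phish(raw_email):
    """Flag an email that asks for an action and links outside the sender's domain."""
    msg = message_from_string(raw_email)
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart messages are not handled in this sketch
        return False
    domain = sender_domain(msg.get("From", ""))
    foreign = [h for h in link_hosts(body)
               if h != domain and not h.endswith("." + domain)]
    return is_actionable(body) and bool(foreign)


ACTIONABLE = ("From: Bank <support@bank.example>\nSubject: Alert\n\n"
              "Please click http://evil.example/login to verify your account.\n")
INFORMATIONAL = ("From: News <news@shop.example>\nSubject: Catalog\n\n"
                 "Our new catalog is online at http://shop.example/catalog\n")

print(looks_like_phish(ACTIONABLE))
print(looks_like_phish(INFORMATIONAL))
```

A keyword list like this is far cruder than the natural language techniques the abstract refers to; it only shows how an action cue in the text can be combined with header and link evidence.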

Keywords

  • Natural Language Processing
  • Context Analysis
  • Word Sense Disambiguation
  • Stopword Removal
  • Natural Language Processing Technique

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Parno, B., Kuo, C., Perrig, A.: Phoolproof Phishing Prevention. In: Di Crescenzo, G., Rubin, A. (eds.) FC 2006. LNCS, vol. 4107, pp. 1–19. Springer, Heidelberg (2006)

  2. Irani, D., Webb, S., Giffin, J., Pu, C.: Evolutionary study of phishing. In: 3rd Anti-Phishing Working Group eCrime Researchers Summit (2008)

  3. Ludl, C., McAllister, S., Kirda, E., Kruegel, C.: On the Effectiveness of Techniques to Detect Phishing Sites. In: Hämmerli, B.M., Sommer, R. (eds.) DIMVA 2007. LNCS, vol. 4579, pp. 20–39. Springer, Heidelberg (2007)

  4. Sheng, S., Wardman, B., Warner, G., Cranor, L., Hong, J., Zhang, C.: An empirical analysis of phishing blacklists. In: Proc. 6th Conf. on Email and Anti-Spam (2009)

  5. Zhang, Y., Hong, J., Cranor, L.: Cantina: a content-based approach to detecting phishing web sites. In: Proc. 16th Int’l Conf. on World Wide Web, pp. 639–648. ACM (2007)

  6. Xiang, G., Hong, J., Rose, C.P., Cranor, L.: Cantina+: A feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. 14, 21:1–21:28 (2011)

  7. Whittaker, C., Ryner, B., Nazif, M.: Large-scale automatic classification of phishing pages. In: Proc. of 17th NDSS (2010)

  8. Garera, S., Provos, N., Chew, M., Rubin, A.: A framework for detection and measurement of phishing attacks. In: Proc. 2007 ACM Workshop on Recurring Malcode, pp. 1–8 (2007)

  9. Chen, J., Guo, C.: Online detection and prevention of phishing attacks. In: First Int’l Conf. on Communications and Networking in China, ChinaCom 2006, pp. 1–7. IEEE (2006)

  10. Fette, I., Sadeh, N., Tomasic, A.: Learning to detect phishing emails. In: Proc. 16th Int’l Conf. on World Wide Web, pp. 649–656. ACM (2007)

  11. Chandrasekaran, M., Narayanan, K., Upadhyaya, S.: Phishing email detection based on structural properties. In: NYS CyberSecurity Conf. (2006)

  12. Bergholz, A., Chang, J., Paaß, G., Reichartz, F., Strobel, S.: Improved phishing detection using model-based features. In: Proc. Conf. on Email and Anti-Spam, CEAS (2008)

  13. Basnet, R., Mukkamala, S., Sung, A.: Detection of phishing attacks: A machine learning approach. In: Soft Computing Applications in Industry, pp. 373–383 (2008)

  14. Bergholz, A., Beer, J.D., Glahn, S., Moens, M.F., Paaß, G., Strobel, S.: New filtering approaches for phishing email. Journal of Computer Security 18(1), 7–35 (2010)

  15. Gansterer, W.N., Pölz, D.: E-Mail Classification for Phishing Defense. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 449–460. Springer, Heidelberg (2009)

  16. Abu-Nimeh, S., Nappa, D., Wang, X., Nair, S.: A comparison of machine learning techniques for phishing detection. In: Proc. Anti-phishing Working Group’s 2nd Annual eCrime Researchers Summit, pp. 60–69. ACM (2007)

  17. Yu, W., Nargundkar, S., Tiruthani, N.: Phishcatch - a phishing detection tool. In: 33rd IEEE Int’l Computer Software and Applications Conf., pp. 451–456 (2009)

  18. Jakobsson, M., Myers, S.: Phishing and countermeasures: understanding the increasing problem of electronic identity theft. Wiley-Interscience (2006)

  19. James, L.: Phishing exposed. Syngress Publishing (2005)

  20. Ollmann, G.: The phishing guide. Next Generation Security Software Ltd. (2004)

  21. Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc. (1986)

  22. Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

  23. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press (1998)

  24. Richens, T.: Anomalies in the wordnet verb hierarchy. In: COLING, pp. 729–736 (2008)

  25. Mihalcea, R., Csomai, A.: Senselearner: Word sense disambiguation for all words in unrestricted text. In: ACL (2005)

  26. Mihalcea, R., Tarau, P.: Textrank: Bringing order into text. In: EMNLP, pp. 404–411 (2004)

  27. Hansen, T., Crocker, D., Hallam-Baker, P.: Domainkeys identified mail (dkim) service overview (2009), http://www.dkim.org/specs/rfc5585.html

  28. Wong, M., Schlitt, W.: Sender policy framework (spf) for authorizing use of domains in e-mail (2006), http://tools.ietf.org/html/rfc4408

  29. Verma, R., Shashidhar, N., Hossain, N.: Two-pronged phish snagging. In: Seventh International Conference on Availability, Reliability and Security (ARES). IEEE (2012)

  30. Nazario, J.: The online phishing corpus (2004), http://monkey.org/~jose/wiki/doku.php

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Houston, USA

    Rakesh Verma

  2. Department of Computer Science, Sam Houston State University, USA

    Narasimha Shashidhar

  3. Division of Science, Mathematics, and Computing, Bard College, USA

    Nabil Hossain

Editor information

Editors and Affiliations

  1. Dipartimento di Informatica, Università degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy

    Sara Foresti

  2. Computer Science Department, Columbia University, 1214 Amsterdam Avenue, 10025, New York, NY, US

    Moti Yung

  3. Institute of Informatics and Telematics, Information Security Group, National Research Council, Pisa Research Area, Via G. Moruzzi 1, 56125, Pisa, Italy

    Fabio Martinelli

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verma, R., Shashidhar, N., Hossain, N. (2012). Detecting Phishing Emails the Natural Language Way. In: Foresti, S., Yung, M., Martinelli, F. (eds) Computer Security – ESORICS 2012. ESORICS 2012. Lecture Notes in Computer Science, vol 7459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33167-1_47

  • DOI: https://doi.org/10.1007/978-3-642-33167-1_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33166-4

  • Online ISBN: 978-3-642-33167-1

  • eBook Packages: Computer Science (R0)
