Reading Between the Lines: Content-Agnostic Detection of Spear-Phishing Emails

  • Hugo GasconEmail author
  • Steffen Ullrich
  • Benjamin Stritter
  • Konrad Rieck
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11050)


Spear-phishing is an effective attack vector for infiltrating companies and organisations. Based on the multitude of personal information available online, an attacker can craft seemingly legit emails and trick his victims into opening malicious attachments and links. Although anti-spoofing techniques exist, their adoption is still limited and alternative protection approaches are needed. In this paper, we show that a sender leaves content-agnostic traits in the structure of an email. Based on these traits, we develop a method capable of learning profiles for a large set of senders and identifying spoofed emails as deviations thereof. We evaluate our approach on over 700,000 emails from 16,000 senders and demonstrate that it can discriminate thousands of senders, identifying spoofed emails with 90% detection rate and less than 1 false positive in 10,000 emails. Moreover, we show that individual traits are hard to guess and spoofing only succeeds if entire emails of the sender are available to the attacker.


Spear-phishing Email spoofing Targeted attack detection 


  1. 1.
    Amin, R.M.: Detecting targeted malicious email through supervised classification of persistent threat and recipient oriented features. Ph.D. thesis, George Washington University, Washington, DC, USA (2010). aAI3428188Google Scholar
  2. 2.
    Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: International Conference on Machine Learning (ICML), pp. 97–104 (2006)Google Scholar
  3. 3.
    Buildwith technology lookup. Accessed November 2017
  4. 4.
    Callas, J., Donnerhacke, L., Finney, H., Shaw, D., Thayer, R.: OpenPGP Message Format. RFC 4880 (Proposed Standard), November 2007. Updated by RFC 5581
  5. 5.
    Caputo, D.D., Pfleeger, S.L., Freeman, J.D., Johnson, M.E.: Going spear phishing: exploring embedded training and awareness. IEEE Secur. Priv. 12(1), 28–38 (2014)CrossRefGoogle Scholar
  6. 6.
    Chen, P., Desmet, L., Huygens, C.: A study on advanced persistent threats. In: De Decker, B., Zúquete, A. (eds.) CMS 2014. LNCS, vol. 8735, pp. 63–72. Springer, Heidelberg (2014). Scholar
  7. 7.
    Crocker, D., Hansen, T., Kucherawy, M.: DomainKeys Identified Mail (DKIM) Signatures. RFC 6376 (Internet Standard), September 2011.
  8. 8.
    Lawrence, N.D., Schölkopf, B.: Estimating a kernel fisher discriminant in the presence of label noise. In: ICML, vol. 1, pp. 306–313 (2001)Google Scholar
  9. 9.
    Duda, R., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2001)zbMATHGoogle Scholar
  10. 10.
    Duman, S., Cakmakci, K.K., Egele, M., Robertson, W., Kirda, E.: EmailProfiler: spearphishing filtering with header and stylometric features of emails. In: COMPSAC (2016)Google Scholar
  11. 11.
    Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. JMLR 9, 1871–1874 (2008)zbMATHGoogle Scholar
  12. 12.
    Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Foster, I.D., Larson, J., Masich, M., Snoeren, A.C., Savage, S., Levchenko, K.: Security by any other name: on the effectiveness of provider based email security. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 450–464. ACM, New York (2015).
  14. 14.
    Freed, N., Borenstein, N.: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. RFC 2045 (Draft Standard), November 1996. Updated by RFCs 2184, 2231, 5335, 6532
  15. 15.
    Freed, N., Moore, K.: MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations. RFC 2231 (Proposed Standard), November 1997.
  16. 16.
    Gupta, S., Singhal, A., Kapoor, A.: A literature survey on social engineering attacks: phishing attack. In: 2016 International Conference on Computing, Communication and Automation (ICCCA), pp. 537–540. IEEE (2016)Google Scholar
  17. 17.
    Han, F., Shen, Y.: Accurate spear phishing campaign attribution and early detection. In: SAC, pp. 2079–2086 (2016)Google Scholar
  18. 18.
    Hardy, S., et al.: Targeted threat index: characterizing and quantifying politically-motivated targeted malware. In: USENIX Security, pp. 527–541 (2014)Google Scholar
  19. 19.
    Ho, G., et al.: Detecting credential spearphishing attacks in enterprise settings. In: USENIX Security Symposium (2017)Google Scholar
  20. 20.
    Trend Micro Incorporated: Spear-Phishing Email: Most Favored APT Attack Bait. Technical report, Trend Micro Inc. (2012)Google Scholar
  21. 21.
    Joachims, T.: Text categorization with support vector machines: learning with many relevant features. Technical report 23, LS VIII, University of Dortmund (1997)Google Scholar
  22. 22.
    Joachims, T.: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers (2002)Google Scholar
  23. 23.
    Josefsson, S.: The Base16, Base32, and Base64 Data Encodings. RFC 4648 (Proposed Standard), October 2006.
  24. 24.
    Kitterman, S.: Sender Policy Framework (SPF) for Authorizing Use of Domains in Email, Version 1. RFC 7208 (Proposed Standard), April 2014. Updated by RFC 7372
  25. 25.
    Kucherawy, M., Zwicky, E.: Domain-based Message Authentication, Reporting, and Conformance (DMARC). RFC 7489 (Informational), March 2015.
  26. 26.
    Le Blond, S., Uritesc, A., Gilbert, C.: A look at targeted attacks through the lense of an NGO. In: USENIX Security, pp. 543–558 (2014)Google Scholar
  27. 27.
    Lin, E., Aycock, J., Mannan, M.: Lightweight client-side methods for detecting email forgery. In: Lee, D.H., Yung, M. (eds.) WISA 2012. LNCS, vol. 7690, pp. 254–269. Springer, Heidelberg (2012). Scholar
  28. 28.
    Mori, T., Sato, K., Takahashi, Y., Ishibashi, K.: How is e-mail sender authentication used and misused? In: Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2011, pp. 31–37. ACM, New York (2011).
  29. 29.
    Ramsdell, B., Turner, S.: Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Message Specification. RFC 5751 (Proposed Standard), January 2010.
  30. 30.
    Resnick, P.: Internet Message Format. RFC 5322 (Draft Standard), October 2008. Updated by RFC 6854
  31. 31.
    Rieck, K., Wressnegger, C., Bikadorov, A.: Sally: a tool for embedding strings in vector spaces. J. Mach. Learn. Res. (JMLR) 13(Nov), 3247–3251 (2012)MathSciNetzbMATHGoogle Scholar
  32. 32.
    Salton, G., Wong, A., Yang, C.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)CrossRefGoogle Scholar
  33. 33.
    Stringhini, G., Thonnard, O.: That ain’t you: blocking spearphishing through behavioral modelling. In: Almgren, M., Gulisano, V., Maggi, F. (eds.) DIMVA 2015. LNCS, vol. 9148, pp. 78–97. Springer, Cham (2015). Scholar
  34. 34.
    Wang, J., Herath, T., Chen, R., Vishwanath, A., Rao, H.R.: Research article phishing susceptibility: an investigation into the processing of a targeted spear phishing email. IEEE Trans. Prof. Commun. 55(4), 345–362 (2012)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Hugo Gascon
    • 1
    Email author
  • Steffen Ullrich
    • 2
  • Benjamin Stritter
    • 3
  • Konrad Rieck
    • 1
  1. 1.TU BraunschweigBraunschweigGermany
  2. 2.Genua GmbHKirchheim bei MünchenGermany
  3. 3.Friedrich Alexander-Universität Erlangen-NürnbergErlangenGermany

Personalised recommendations