Reading Between the Lines: Content-Agnostic Detection of Spear-Phishing Emails

Part of the Lecture Notes in Computer Science book series (LNSC, volume 11050)


Spear-phishing is an effective attack vector for infiltrating companies and organisations. Drawing on the wealth of personal information available online, an attacker can craft seemingly legitimate emails and trick victims into opening malicious attachments and links. Although anti-spoofing techniques exist, their adoption is still limited, and alternative protection approaches are needed. In this paper, we show that a sender leaves content-agnostic traits in the structure of an email. Based on these traits, we develop a method capable of learning profiles for a large set of senders and identifying spoofed emails as deviations from these profiles. We evaluate our approach on over 700,000 emails from 16,000 senders and demonstrate that it can discriminate thousands of senders, identifying spoofed emails with a 90% detection rate and less than 1 false positive in 10,000 emails. Moreover, we show that individual traits are hard to guess and that spoofing only succeeds if entire emails of the sender are available to the attacker.


  • Spear-phishing
  • Email spoofing
  • Targeted attack detection
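
The idea in the abstract, learning a per-sender profile from structural traits and flagging deviations, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the chosen traits (header presence, header order, MIME type, transfer encoding, mail client string), the function names `structural_traits`, `build_profile`, `similarity` and `looks_spoofed`, the set-overlap score, and the `0.8` threshold. It is not the feature set or classifier used in the paper.

```python
from __future__ import annotations

from email import message_from_string


def structural_traits(raw: str) -> set[str]:
    """Collect traits from an email's structure, ignoring subject and body text."""
    msg = message_from_string(raw)
    names = [name.lower() for name in msg.keys()]
    traits: set[str] = set()
    # Which headers are present, and which header directly follows which.
    traits.update(f"has:{n}" for n in names)
    traits.update(f"order:{a}>{b}" for a, b in zip(names, names[1:]))
    # MIME layout: content type and transfer encoding of every part.
    for part in msg.walk():
        traits.add(f"type:{part.get_content_type()}")
        encoding = part.get("Content-Transfer-Encoding", "none").lower()
        traits.add(f"enc:{encoding}")
    # Mail client string, if the client announces itself.
    agent = msg.get("User-Agent") or msg.get("X-Mailer")
    if agent:
        traits.add(f"agent:{agent.strip()}")
    return traits


def build_profile(raw_emails: list[str]) -> set[str]:
    """Illustrative profile: the union of traits seen in a sender's known emails."""
    profile: set[str] = set()
    for raw in raw_emails:
        profile |= structural_traits(raw)
    return profile


def similarity(profile: set[str], raw: str) -> float:
    """Fraction of the new email's traits that the profile has seen before."""
    traits = structural_traits(raw)
    return len(traits & profile) / len(traits) if traits else 0.0


def looks_spoofed(profile: set[str], raw: str, threshold: float = 0.8) -> bool:
    """Flag an email whose structure deviates from the claimed sender's profile."""
    return similarity(profile, raw) < threshold
```

In this sketch an email that copies the sender's address but is composed with a different client, header order, or encoding shares only part of its traits with the profile and falls below the threshold. A set-overlap score is used only because it is easy to read; a trained classifier over the same kind of traits would be the more realistic choice.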

  • DOI: 10.1007/978-3-030-00470-5_4
  • Chapter length: 23 pages



Author information

Corresponding author

Correspondence to Hugo Gascon.

A Appendix

Tables 4, 5 and 6 provide an overview of the different traits characterizing the behavior, composition and transport of emails, respectively.

Table 4. List of behavior features.
Table 5. List of composition features.
Table 6. List of transport features.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Gascon, H., Ullrich, S., Stritter, B., Rieck, K. (2018). Reading Between the Lines: Content-Agnostic Detection of Spear-Phishing Emails. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham.

  • DOI: 10.1007/978-3-030-00470-5_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00469-9

  • Online ISBN: 978-3-030-00470-5

  • eBook Packages: Computer Science, Computer Science (R0)