Lightweight Client-Side Methods for Detecting Email Forgery

  • Eric Lin
  • John Aycock
  • Mohammad Mannan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7690)


We examine a problem related to, but distinct from, spam detection. Instead of trying to decide whether email is spam or ham, we try to determine whether email purporting to be from a known correspondent actually comes from that person; this may be seen as a way to address a class of targeted email attacks. We propose two methods: geolocation and stylometry analysis. The efficacy of geolocation was evaluated using over 73,000 emails collected from real users; stylometry, for comparison with related work from the area of computer forensics, was evaluated using selections from the Enron corpus. Both methods show promise for addressing the problem, and both are complementary to existing anti-spam techniques. Neither requires global changes to email infrastructure, and both run on the email client side, a practical means of empowering end users with respect to security. Furthermore, both methods are lightweight in the sense that they leverage existing information and software in new ways, rather than requiring massive deployments of untried applications.
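As an illustration only, the geolocation idea can be sketched as a client-side check that compares where a message appears to originate against where earlier mail from the same correspondent originated. The abstract does not give the authors' algorithm, so everything below is an assumption: the names `GEO_TABLE`, `locate`, and `SenderHistory` are hypothetical, the lookup table uses documentation-only address ranges with made-up region labels in place of a real geolocation database, and treating an unlocatable address as suspicious is one possible policy among several.

```python
import ipaddress
from collections import defaultdict

# Hypothetical stand-in for a geolocation database: documentation-only
# address ranges mapped to made-up region labels (not real locations).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "region-A",
    ipaddress.ip_network("198.51.100.0/24"): "region-B",
    ipaddress.ip_network("203.0.113.0/24"): "region-C",
}


def locate(ip):
    """Return the region label for an IP address string, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, region in GEO_TABLE.items():
        if addr in network:
            return region
    return None


class SenderHistory:
    """Remembers the regions from which each correspondent has sent mail."""

    def __init__(self):
        self._regions = defaultdict(set)

    def record(self, sender, ip):
        """Add the region of a trusted message to the sender's history."""
        region = locate(ip)
        if region is not None:
            self._regions[sender].add(region)

    def is_suspicious(self, sender, ip):
        """True when the sender has history and this region is not part of it.

        A sender with no history is never flagged, since there is nothing
        to compare against. An address that cannot be located is flagged
        (an assumed policy; a real client might choose otherwise).
        """
        known = self._regions.get(sender)
        if not known:
            return False
        return locate(ip) not in known


history = SenderHistory()
history.record("alice@example.org", "192.0.2.10")
print(history.is_suspicious("alice@example.org", "192.0.2.77"))
print(history.is_suspicious("alice@example.org", "203.0.113.5"))
```

In this sketch the first query stays within the region already seen for the sender, while the second comes from a region absent from that sender's history. How the originating address is extracted from message headers, and how history is built up, are left out because the abstract does not describe them.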


Forgery Detection; Authorship Analysis; Geolocation Database; Enron Email; Legitimate Email
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Eric Lin (1)
  • John Aycock (1)
  • Mohammad Mannan (2)
  1. Department of Computer Science, University of Calgary, Calgary, Canada
  2. Concordia Institute for Information Systems Engineering, Faculty of Engineering and Computer Science, Concordia University, Montreal, Canada
