Social Network Analysis and Mining, Volume 2, Issue 1, pp 5–16

Profiling phishing activity based on hyperlinks extracted from phishing emails

  • John Yearwood
  • Musa Mammadov
  • Dean Webb
Original Article


Phishing activity has recently been focused on social networking sites as a more effective way of exploiting not only the technology but also the trust that may exist between members in a social network. In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails. In this paper, we concentrate on profiling as distinct from detection of phishing emails. We formulate the profiling problem as a multi-label classification problem using the hyperlinks in the phishing emails as features and structural properties of emails along with whois (i.e. DNS) information on hyperlinks as profile classes. Further, we generate profiles based on the classifier predictions. Thus, classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are further utilized to generate complete profiles of these emails. Results show that profiling can be done with quite high accuracy using hyperlink information.
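The formulation in the abstract (hyperlinks as features, profile classes as labels, one email possibly carrying several labels) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the label names (`ip_host`, `brand_paypal`, `brand_bank`), the toy emails, and the function names are hypothetical, and the per-label token-count vote is a deliberately simple stand-in for the AdaBoost and SVM classifiers used in the paper, not a reproduction of them.

```python
import re
from collections import Counter, defaultdict

# Matches http/https hyperlinks up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hyperlink_features(email_text):
    """Return the set of lowercase alphanumeric tokens found in the email's hyperlinks."""
    tokens = set()
    for url in URL_RE.findall(email_text):
        tokens.update(t for t in re.split(r"[^a-z0-9]+", url.lower()) if t)
    return tokens


def train_binary_relevance(emails, label_sets):
    """For each label, count how often each token occurs with and without that label.

    This treats the multi-label problem as one independent binary problem per label.
    """
    model = defaultdict(lambda: {"pos": Counter(), "neg": Counter()})
    all_labels = set().union(*label_sets)
    for text, labels in zip(emails, label_sets):
        feats = hyperlink_features(text)
        for label in all_labels:
            side = "pos" if label in labels else "neg"
            model[label][side].update(feats)
    return dict(model)


def predict_profile(model, email_text):
    """Collect every label whose token vote is positive; the resulting set is the profile."""
    feats = hyperlink_features(email_text)
    profile = set()
    for label, counts in model.items():
        score = sum(counts["pos"][t] - counts["neg"][t] for t in feats)
        if score > 0:
            profile.add(label)
    return profile


# Hypothetical toy data: four emails, each tagged with a set of profile classes.
EMAILS = [
    "Verify at http://192.168.10.5/paypal/login.php",
    "Update http://secure-paypal.example.com/login",
    "Your bank: http://10.0.0.7/bank/signin.php",
    "Click http://mybank.example.org/signin",
]
LABELS = [
    {"ip_host", "brand_paypal"},
    {"brand_paypal"},
    {"ip_host", "brand_bank"},
    {"brand_bank"},
]

model = train_binary_relevance(EMAILS, LABELS)
profile = predict_profile(model, "Confirm http://10.1.1.9/paypal/login.php")
print(sorted(profile))  # ['brand_paypal', 'ip_host']
```

The point of the sketch is the shape of the problem rather than the classifier: each email maps to a set of predicted classes, and that set is what the abstract calls a profile. Replacing the token vote with a boosted or margin-based learner per label would keep the same interface.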


Phishing, Profiling phishing emails, Multi-label classification


Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Graduate School of ITMS, University of Ballarat, Ballarat, Australia
