Malicious URL detection by dynamically mining patterns without pre-defined elements

World Wide Web

Abstract

Detecting malicious URLs is an essential task in network security intelligence. In this paper, we make two new contributions beyond the state-of-the-art methods for malicious URL detection. First, instead of using any pre-defined features or fixed delimiters for feature selection, we propose to dynamically extract lexical patterns from URLs. Our novel model of URL patterns provides new flexibility and capability for capturing malicious URLs that are algorithmically generated by malicious programs. Second, we develop a new method to mine our novel URL patterns. Because these patterns are not assembled from any pre-defined items, they cannot be mined by any existing frequent pattern mining method. Our extensive empirical study, using real data sets from Fortinet, a leader in the network security industry, clearly shows the effectiveness and efficiency of our approach.
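The abstract does not spell out the mining algorithm. The Python sketch below is only a rough illustration of what a URL pattern "not assembled from pre-defined items" can look like. It aligns two URLs character by character using a longest-common-subsequence (LCS) table, keeps the shared characters as literals, and collapses every stretch where the URLs disagree into a single wildcard. No delimiter set such as `/`, `.` or `=` is given in advance. The use of LCS here is our assumption, made for illustration, and is not presented as the authors' method. The helpers `lcs_pattern`, `render` and `matches`, and the example URLs, are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical representation: a pattern is a list of parts, each either a
# literal string or None, where None stands for a wildcard gap.
Pattern = List[Optional[str]]


def lcs_pattern(a: str, b: str) -> Pattern:
    """Hypothetical helper: derive a wildcard pattern from two URLs.

    Characters on a longest common subsequence of a and b are kept literally;
    each stretch where the strings disagree becomes one gap (None).
    No tokenizer or pre-defined delimiter set is used.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    parts: Pattern = []

    def emit_gap() -> None:
        # Collapse consecutive skipped characters into a single wildcard.
        if not parts or parts[-1] is not None:
            parts.append(None)

    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Shared character: extend the current literal run or start one.
            if parts and parts[-1] is not None:
                parts[-1] += a[i]
            else:
                parts.append(a[i])
            i += 1
            j += 1
        else:
            emit_gap()
            # Skip a character from whichever string preserves the longer LCS.
            if dp[i + 1][j] >= dp[i][j + 1]:
                i += 1
            else:
                j += 1
    if i < n or j < m:
        emit_gap()  # leftover tail in one of the two strings
    return parts


def render(parts: Pattern) -> str:
    """Human-readable form of a pattern: '*' marks a wildcard gap."""
    return "".join("*" if p is None else p for p in parts)


def matches(parts: Pattern, url: str) -> bool:
    """True if url can be obtained by filling each gap with any string."""
    regex = "".join(".*" if p is None else re.escape(p) for p in parts)
    return re.fullmatch(regex, url) is not None


if __name__ == "__main__":
    p = lcs_pattern("http://x7kq2.example.com/load.php?id=1834",
                    "http://p9zr4.example.com/load.php?id=9920")
    print(render(p))  # http://*.example.com/load.php?id=*
    print(matches(p, "http://m3vt8.example.com/load.php?id=5521"))  # True
    print(matches(p, "http://www.example.org/index.html"))  # False
```

This pairwise alignment costs O(|a|·|b|) time and memory. It can also keep spurious shared characters, for example a digit that happens to occur in two random tokens. Computing an exact LCS over many URLs at once is NP-hard in general. The sketch therefore only illustrates delimiter-free patterns and is not a scalable mining procedure.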

Author information

Correspondence to Jian Pei.

About this article

Cite this article

Huang, D., Xu, K. & Pei, J. Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17, 1375–1394 (2014). https://doi.org/10.1007/s11280-013-0250-4

  • DOI: https://doi.org/10.1007/s11280-013-0250-4