Malicious URL detection by dynamically mining patterns without pre-defined elements

Huang, Da; Xu, Kai; Pei, Jian

doi:10.1007/s11280-013-0250-4

Published: 10 August 2013

Volume 17, pages 1375–1394, (2014)

World Wide Web

Da Huang¹,², Kai Xu² & Jian Pei¹

Abstract

Detecting malicious URLs is an essential task in network security intelligence. In this paper, we make two new contributions beyond the state-of-the-art methods for malicious URL detection. First, instead of using any pre-defined features or fixed delimiters for feature selection, we propose to dynamically extract lexical patterns from URLs. Our novel model of URL patterns provides new flexibility and capability in capturing malicious URLs algorithmically generated by malicious programs. Second, we develop a new method to mine our novel URL patterns, which are not assembled from any pre-defined items and thus cannot be mined using any existing frequent pattern mining methods. Our extensive empirical study using real data sets from Fortinet, a leader in the network security industry, clearly shows the effectiveness and efficiency of our approach.
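The idea of extracting lexical patterns without fixed delimiters can be illustrated with a small sketch. This is not the mining method of the paper, which is not described on this page. It only assumes that a "pattern" is an ordered list of substrings shared by two URLs, with arbitrary characters allowed in the gaps. The function names, the `min_len` threshold, and the sample URLs are all hypothetical, and the matching is done with Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher
from typing import List


def common_pattern(url_a: str, url_b: str, min_len: int = 3) -> List[str]:
    """Return the ordered substrings shared by both URLs.

    No delimiter such as '/', '.', or '?' is assumed: segment
    boundaries come only from where the two strings agree.
    """
    matcher = SequenceMatcher(None, url_a, url_b, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # Skip very short matches (and the zero-length terminator block).
        if block.size >= min_len:
            segments.append(url_a[block.a:block.a + block.size])
    return segments


def matches(segments: List[str], url: str) -> bool:
    """Check whether a URL contains all segments in order, with any gaps."""
    if not segments:
        return False
    body = ".*".join(re.escape(s) for s in segments)
    return re.search(body, url, re.DOTALL) is not None


# Hypothetical URLs whose host prefix looks algorithmically generated.
a = "http://abc123.example.com/login.php?id=77"
b = "http://xyz789.example.com/login.php?id=12"
segments = common_pattern(a, b)
```

In this sketch the varying parts of the two URLs fall into the gaps between segments, so a third URL generated by the same hypothetical scheme would still match, while a URL that lacks the shared segments would not.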

Author information

Authors and Affiliations

Simon Fraser University, Burnaby, Canada
Da Huang & Jian Pei
Fortinet Inc., Burnaby, Canada
Da Huang & Kai Xu

Corresponding author

Correspondence to Jian Pei.

About this article

Cite this article

Huang, D., Xu, K. & Pei, J. Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17, 1375–1394 (2014). https://doi.org/10.1007/s11280-013-0250-4

Received: 22 June 2012
Revised: 04 March 2013
Accepted: 24 July 2013
Published: 10 August 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11280-013-0250-4
