Abstract
Detecting malicious URLs is an essential task in network security intelligence. In this paper, we make two new contributions beyond state-of-the-art methods for malicious URL detection. First, instead of using any pre-defined features or fixed delimiters for feature selection, we propose to dynamically extract lexical patterns from URLs. Our novel model of URL patterns provides new flexibility and capability for capturing malicious URLs that are algorithmically generated by malicious programs. Second, we develop a new method to mine our novel URL patterns, which are not assembled from any pre-defined items and thus cannot be mined using existing frequent pattern mining methods. Our extensive empirical study using real data sets from Fortinet, a leader in the network security industry, clearly shows the effectiveness and efficiency of our approach.
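To make the idea of delimiter-free lexical patterns more concrete, the following is a minimal illustrative sketch and not the mining method of the paper, which the abstract does not specify. It assumes that a pattern can be approximated by the ordered substrings two URLs share, found with Python's standard `difflib.SequenceMatcher`. The function names `shared_segments` and `matches`, the `min_len` threshold, and the example URLs are all hypothetical.

```python
from difflib import SequenceMatcher


def shared_segments(url_a: str, url_b: str, min_len: int = 3) -> list[str]:
    """Return the substrings shared by both URLs, in order of appearance.

    No delimiter such as '/' or '.' is assumed: segments are whatever
    character runs the two strings happen to have in common.
    """
    matcher = SequenceMatcher(None, url_a, url_b, autojunk=False)
    return [
        url_a[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_len  # also drops the zero-length sentinel block
    ]


def matches(segments: list[str], url: str) -> bool:
    """Check whether all segments occur in the URL in the given order."""
    pos = 0
    for segment in segments:
        idx = url.find(segment, pos)
        if idx < 0:
            return False
        pos = idx + len(segment)
    return True


# Hypothetical URLs with an algorithmically varied host label and parameter.
url_a = "http://a1b2.example.test/login.php?id=123"
url_b = "http://z9y8.example.test/login.php?id=987"

pattern = shared_segments(url_a, url_b)
print(pattern)
print(matches(pattern, "http://q7w6.example.test/login.php?id=555"))
print(matches(pattern, "http://example.test/about"))
```

In this toy setting, the varying parts of the URLs fall between the shared segments, so a third URL generated in the same way would still match, while an unrelated URL would not. A real system would need to mine such patterns over many URLs and weigh them against benign traffic.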
Cite this article
Huang, D., Xu, K. & Pei, J. Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17, 1375–1394 (2014). https://doi.org/10.1007/s11280-013-0250-4
DOI: https://doi.org/10.1007/s11280-013-0250-4