DURLD: Malicious URL Detection Using Deep Learning-Based Character Level Representations

Srinivasan, Sriram; Vinayakumar, R.; Arunachalam, Ajay; Alazab, Mamoun; Soman, KP

doi:10.1007/978-3-030-62582-5_21

Abstract

Cybercriminals widely use malicious URLs, also known as malicious websites, as a primary mechanism to host unsolicited content such as spam, malicious advertisements, phishing, and drive-by exploits. Previous studies used blacklisting, regular expressions, and signature matching to detect malicious URLs. However, these approaches are limited in their ability to detect variants of existing malicious URLs or newly generated ones. Over the last decade, classical machine learning techniques have also been used to detect malicious URLs. In this work, we evaluate various state-of-the-art deep learning-based character-level embedding methods for malicious URL detection. To build on the performance improvement these methods offer, we propose DeepURLDetect (DURLD), in which raw URLs are encoded using character-level embedding. To capture several types of information in a URL, we use the hidden layers of deep learning architectures to extract features from the character-level embedding and then apply a non-linear activation function to estimate the probability that the URL is malicious. Experimental evaluation demonstrates that DURLD can detect variants of malicious URLs and that it is computationally inexpensive compared with other relevant deep learning-based character-level embedding methods.
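The pipeline described in the abstract (character-level encoding, an embedding lookup, feature extraction, and a non-linear output that yields a probability) can be illustrated with a minimal sketch. This is not the DURLD architecture: the vocabulary, sequence length, and embedding size are hypothetical, mean pooling stands in for the hidden layers of a deep architecture, and the weights are random and untrained, so the printed probability carries no meaning.

```python
import math
import random

# Hypothetical vocabulary and sizes; the abstract does not state the chapter's settings.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}  # 0 = padding, 1 = unknown
MAX_LEN = 64
EMBED_DIM = 8


def encode_url(url, max_len=MAX_LEN):
    """Map a raw URL to a fixed-length list of character ids (truncate, then pad)."""
    ids = [CHAR_TO_ID.get(ch, 1) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def init_params(seed=0):
    """Create random, untrained parameters: embedding table, output weights, bias."""
    rng = random.Random(seed)
    vocab_size = len(VOCAB) + 2
    embedding = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
                 for _ in range(vocab_size)]
    weights = [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    return embedding, weights, 0.0


def predict_proba(url, params):
    """Return a sigmoid score in (0, 1) for the URL.

    Mean pooling over character embeddings is an assumption standing in
    for the hidden layers of a deep learning architecture.
    """
    embedding, weights, bias = params
    ids = [i for i in encode_url(url) if i != 0]  # ignore padding positions
    count = max(len(ids), 1)
    pooled = [sum(embedding[i][d] for i in ids) / count for d in range(EMBED_DIM)]
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # non-linear activation


params = init_params()
print(predict_proba("http://example.com/login?user=1", params))
```

In a trained model the embedding table and output weights would be learned from labeled URLs, and the pooling step would be replaced by whichever hidden layers the chosen architecture uses.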

Acknowledgements

This work was supported by the Department of Corporate and Information Services, Northern Territory Government of Australia, and in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support provided to the research grant. We are also grateful to the Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, for encouraging this research.

Author information

Authors and Affiliations

Center for Computational Engineering and Networking, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Sriram Srinivasan & KP Soman
Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
R. Vinayakumar
Centre for Applied Autonomous Sensors Systems (AASS), Örebro University, Örebro, Sweden
Ajay Arunachalam
Charles Darwin University, Darwin, Australia
Mamoun Alazab

Corresponding author

Correspondence to Sriram Srinivasan.

Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, USA
Mark Stamp
College of Engineering, IT & Environment, Charles Darwin University, Darwin, NT, Australia
Mamoun Alazab
Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Gjøvik, Norway
Andrii Shalaginov

About this chapter

Cite this chapter

Srinivasan, S., Vinayakumar, R., Arunachalam, A., Alazab, M., Soman, K. (2021). DURLD: Malicious URL Detection Using Deep Learning-Based Character Level Representations. In: Stamp, M., Alazab, M., Shalaginov, A. (eds) Malware Analysis Using Artificial Intelligence and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-62582-5_21

DOI: https://doi.org/10.1007/978-3-030-62582-5_21
Published: 21 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62581-8
Online ISBN: 978-3-030-62582-5
eBook Packages: Computer Science, Computer Science (R0)
