Malicious Domain Detection Using Random Indexing and Machine Learning

Gowri Raghavendra Narayan, Kurmala; Rajendra Prasath, R.; Odelu, Vanga

doi:10.1007/978-981-99-8438-1_39

Part of the book series: Algorithms for Intelligent Systems (AIS)

Included in the following conference series:

International Conference on Engineering, Applied Sciences and System Modeling

Abstract

In this paper, we describe the use of distributed representations for malicious domain detection using Random Indexing and machine learning. The proposed approach first uses Random Indexing to build distributed representations of the context accumulated from the domain, subdomains, and path of each URL in a given set, and then applies machine learning classifiers to separate malicious domains from benign ones. To measure classification performance, we built five classifiers: Logistic Regression, Decision Tree, \(k\)-Nearest Neighbors, Support Vector Machines, and Random Forest. We used two datasets: one of malicious domains collected from 360 Netlab and one of benign domains collected from Alexa’s top 1 million domains. We compared the performance of an existing malicious domain detection approach with the proposed Random Indexing and machine learning-based approach on different splits of the data into training and test sets. The proposed approach with the Random Forest classifier identifies malicious URLs with a precision of 99.5%.
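As a rough illustration of the Random Indexing step, the following Python sketch builds a fixed-length vector for each URL. It is not the authors' implementation: the tokenization rule, the vector dimensionality (`DIM`), the number of nonzero entries (`NONZERO`), the context window size, and the example URLs are all assumptions chosen for illustration, since the abstract does not specify them. Each token receives a sparse ternary index vector, each token's context vector accumulates the index vectors of its neighbors, and a URL is represented by the sum of the context vectors of its tokens.

```python
import random
import re

DIM = 64      # vector dimensionality (assumed value, not from the paper)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed value)


def tokenize(url):
    """Split a URL into lowercase tokens taken from its host labels and path."""
    url = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.lower())  # drop the scheme
    return [t for t in re.split(r"[^a-z0-9]+", url) if t]


def index_vector(token, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary index vector derived from the token itself."""
    rng = random.Random(token)  # seeding with the token makes it reproducible
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_context_vectors(urls, window=2, dim=DIM):
    """Accumulate, for every token, the index vectors of its neighbors."""
    context = {}
    for url in urls:
        tokens = tokenize(url)
        for i, tok in enumerate(tokens):
            ctx = context.setdefault(tok, [0] * dim)
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for k, v in enumerate(index_vector(tokens[j], dim)):
                        ctx[k] += v
    return context


def url_vector(url, context, dim=DIM):
    """Represent a URL as the sum of the context vectors of its tokens."""
    vec = [0] * dim
    for tok in tokenize(url):
        for k, v in enumerate(context.get(tok, [0] * dim)):
            vec[k] += v
    return vec


# Hypothetical example URLs, used only to exercise the functions above.
urls = [
    "http://login.example.com/account/verify",
    "https://www.example.org/index.html",
]
context = build_context_vectors(urls)
features = [url_vector(u, context) for u in urls]
```

The resulting `features` list holds one fixed-length vector per URL, which could then be passed, together with malicious/benign labels, to any of the classifiers named in the abstract, for example a Random Forest.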

Notes

1. APWG, https://apwg.org/trendsreports (last accessed: 22 November 2021).
2. Alexa top 1 million sites, https://www.alexa.com/topsites.
3. Qihoo 360 Technology Co., Ltd, https://data.netlab.360.com/dga.

Author information

Authors and Affiliations

Computer Science and Engineering Group, Indian Institute of Information Technology Sri City, Chittoor, 630 Gnan Marg, Sri City, 517646, Andhra Pradesh, India
Kurmala Gowri Raghavendra Narayan, R. Rajendra Prasath & Vanga Odelu

Corresponding author

Correspondence to R. Rajendra Prasath.

Editor information

Editors and Affiliations

Faculty of Innovation and Technology, Taylor’s University, Subang Jaya, Selangor, Malaysia
David Asirvatham
University of Southeast Norway, Notodden, Norway
Francisco M. Gonzalez-Longatt
Gdansk University of Technology, Gdańsk, Poland
Przemyslaw Falkowski-Gilski
Professor of Computer Engineering, Papua New Guinea University of Technology, Lae, Papua New Guinea
R. Kanthavel

About this paper

Cite this paper

Gowri Raghavendra Narayan, K., Rajendra Prasath, R., Odelu, V. (2024). Malicious Domain Detection Using Random Indexing and Machine Learning. In: Asirvatham, D., Gonzalez-Longatt, F.M., Falkowski-Gilski, P., Kanthavel, R. (eds) Evolutionary Artificial Intelligence. ICEASSM 2017. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-8438-1_39

DOI: https://doi.org/10.1007/978-981-99-8438-1_39
Published: 14 March 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8437-4
Online ISBN: 978-981-99-8438-1
eBook Packages: Intelligent Technologies and Robotics (R0)
