Abstract
Recent developments in information technology have brought numerous benefits but have also created new risks for information security. One notable threat is the domain generation algorithm (DGA) technique used by botnets, which allows them to automatically generate and register many domains in order to evade detection and control by network security systems. To address this issue, we studied domain classification models for botnet-generated domains. We developed three domain classification models: a bigram-based model, a long short-term memory (LSTM) network, and a model combining LSTM with one-hot encoding. We then implemented an ensemble of these models in a domain classification system named UIT-DGADetector. To optimize the system, we used Apache Kafka to queue and streamline incoming requests, thereby reducing the load on the classification server. The deployed system operates well and achieves high accuracy in predicting domain types. However, the model still has limitations in detecting word-based DGA botnets, and the request-handling process must be further optimized to reduce waiting time in the queue. This study aims to contribute to network security and information protection, particularly by addressing the issue of DGA botnets.
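The abstract names the model inputs (character bigrams and one-hot encoded characters) and an ensemble, but does not spell out the preprocessing or the combination rule. The following is a minimal, standard-library-only sketch of what such a pipeline might look like, assuming lowercase ASCII domain names, a hypothetical 38-symbol alphabet, a fixed sequence length `max_len=16`, equal-weight soft voting, and a 0.5 decision threshold. None of these choices come from the paper. The trained bigram and LSTM models themselves, as well as the Kafka queue, are out of scope here, so the per-model scores are passed in as plain numbers.

```python
from collections import Counter
from typing import List, Optional, Sequence
import string

# Hypothetical character vocabulary for domain names (an assumption for this sketch).
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def char_bigrams(domain: str) -> Counter:
    """Count overlapping character bigrams, e.g. 'abc' -> {'ab': 1, 'bc': 1}."""
    d = domain.lower()
    return Counter(d[i:i + 2] for i in range(len(d) - 1))


def one_hot(domain: str, max_len: int = 16) -> List[List[int]]:
    """Encode each character as a one-hot row, truncated or zero-padded to max_len rows.

    Characters outside ALPHABET and padding positions become all-zero rows,
    giving a fixed-size matrix that a sequence model such as an LSTM could consume.
    """
    rows = []
    for ch in domain.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        if ch in CHAR_INDEX:
            row[CHAR_INDEX[ch]] = 1
        rows.append(row)
    pad = max_len - len(rows)
    for _ in range(pad):
        rows.append([0] * len(ALPHABET))
    return rows


def soft_vote(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of per-model P(DGA) scores (soft voting)."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores) or sum(weights) <= 0:
        raise ValueError("need one weight per score and a positive weight sum")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def classify(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Label a domain 'dga' when the ensemble score reaches the (assumed) threshold."""
    return "dga" if soft_vote(scores) >= threshold else "benign"


if __name__ == "__main__":
    # Toy scores standing in for the outputs of three separately trained models.
    print(classify([0.9, 0.2, 0.7]))  # prints "dga"
```

Soft voting is only one common way to combine classifiers; a weighted average, majority vote, or a stacked meta-classifier would fit the same interface by changing `soft_vote`, and the abstract does not state which rule UIT-DGADetector uses.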
Data availability
The data supporting the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This research was supported by The VNUHCM-University of Information Technology’s Scientific Research Support Fund.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by NTC and NNM. The first draft of the manuscript was written by both authors, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cam, N.T., Man, N.N. Uit-DGAdetector: detect domains generated by algorithms using machine learning. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04363-0
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10586-024-04363-0