Abstract
Botnets are evolving, and their covert modus operandi, based on cloud technologies such as virtualisation and dynamic fast-flux addressing, has proved challenging for classic intrusion detection systems and even for so-called next-generation firewalls. Moreover, dynamic addressing has been spotted in the wild in combination with pseudo-random domain generation algorithms (DGAs), ultimately leading to an extremely accurate and effective disguise technique. Although these concealment methods have been exposed and analysed to a great extent in the past decade, the literature lacks some important conclusions and common-ground knowledge, especially when it comes to Machine Learning (ML) solutions. This research horizontally navigates the state of the art, aiming to polish the feature discovery process, which is the single most time-consuming part of any ML approach. Results show that only a minor fraction of the defined features are indeed practical and informative, especially when considering 0-day botnet identification. The contributions described in this article will ease the detection process, ultimately enabling improved and more scalable solutions for DGA-based botnet detection.
Notes
Including four features (NLP-L-x, NLP-R-NUM-x, NLP-R-VOW-x, NLP-R-CON-x) for each domain name level: the FQDN, the Second Level Domain Name (2LD), or all the other sub-levels as a whole (OLD).
According to the ICANN specifications, the minimum length of a domain name, without considering the Top Level Domain (TLD), is three characters. The maximum, including symbols and extensions, is 255 characters, with a maximum length of 63 characters per level.
The IG is purely theoretical: it does not consider any particular classification algorithm.
By experimentally demonstrating that users’ data are not strictly required to recognise malware in the wild. See Sect. 3.3.
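The per-level features and the IG criterion mentioned in the notes above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the notes: the feature names are read as length (L) and ratios of digits (R-NUM), vowels (R-VOW) and consonants (R-CON); the last label is taken as the TLD without any public-suffix handling; the FQDN is measured with the dots removed; and IG is taken to denote the entropy-based information gain of a discrete feature. The helper names `split_levels`, `nlp_features`, `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2

DIGITS = set("0123456789")
VOWELS = set("aeiou")
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def split_levels(fqdn):
    """Split a domain name into the FQDN, 2LD and OLD levels.

    Assumptions: the last label is the TLD, the label before it is the 2LD,
    and all remaining labels are concatenated into the OLD. Dots are dropped.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    return {
        "FQDN": "".join(labels),
        "2LD": labels[-2] if len(labels) >= 2 else "",
        "OLD": "".join(labels[:-2]),
    }


def nlp_features(fqdn):
    """Compute length and character-class ratios for each level."""
    features = {}
    for level, text in split_levels(fqdn).items():
        n = len(text)

        def ratio(charset):
            # An empty level yields 0.0 rather than a division by zero.
            return sum(c in charset for c in text) / n if n else 0.0

        features[f"NLP-L-{level}"] = n
        features[f"NLP-R-NUM-{level}"] = ratio(DIGITS)
        features[f"NLP-R-VOW-{level}"] = ratio(VOWELS)
        features[f"NLP-R-CON-{level}"] = ratio(CONSONANTS)
    return features


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((k / total) * log2(k / total) for k in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(labels) - sum over v of p(v) * H(labels | value = v).

    Only the data are used, no classifier is involved. Continuous features
    would need to be discretised first (not shown here).
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

For instance, `nlp_features("mail.example1.com")["NLP-R-NUM-2LD"]` evaluates to 0.125, since one of the eight characters of `example1` is a digit. A feature that perfectly separates two balanced classes has an information gain of 1 bit, whereas a feature that is independent of the class has an information gain of 0.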
Acknowledgements
This study was funded by a predoctoral and a postdoctoral INCIBE grant within the “Ayudas para la Excelencia de los Equipos de Investigación Avanzada en Ciberseguridad” program, with codes INCIBEI-2015-27353 and INCIBEI-2015-27352.
Ethics declarations
Conflict of interest
The authors declare that they do not have any conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by B. B. Gupta.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zago, M., Gil Pérez, M. & Martínez Pérez, G. Scalable detection of botnets based on DGA. Soft Comput 24, 5517–5537 (2020). https://doi.org/10.1007/s00500-018-03703-8
DOI: https://doi.org/10.1007/s00500-018-03703-8