
Scalable detection of botnets based on DGA

Efficient feature discovery process in machine learning techniques
  • Mattia Zago
  • Manuel Gil Pérez
  • Gregorio Martínez Pérez
Focus

Abstract

Botnets are evolving, and their covert modus operandi, based on cloud technologies such as virtualisation and dynamic fast-flux addressing, has proved challenging for classic intrusion detection systems and even so-called next-generation firewalls. Moreover, dynamic addressing has been spotted in the wild in combination with pseudo-random domain generation algorithms (DGAs), ultimately leading to an extremely accurate and effective disguise technique. Although these concealment methods have been exposed and analysed to a great extent over the past decade, the literature still lacks some important conclusions and common-ground knowledge, especially when it comes to Machine Learning (ML) solutions. This research surveys the state of the art horizontally, aiming to refine the feature discovery process, which is the single most time-consuming part of any ML approach. Results show that only a minor fraction of the defined features are actually practical and informative, especially when considering 0-day botnet identification. The contributions described in this article will ease the detection process, ultimately enabling improved and more scalable solutions for DGA-based botnet detection.
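To make the idea of feature discovery on domain names more concrete, the following is a minimal sketch of a few lexical features that are commonly computed on a domain label. It is an illustration only and is not the feature set evaluated in the article: the function name `lexical_features`, the choice of features, and the simplification of inspecting only the first label of the name are all assumptions made for this example.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features of a domain name.

    Assumption: only the first (leftmost) label is inspected, which is a
    simplification chosen for this sketch.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        raise ValueError("empty domain label")

    # Shannon entropy of the character distribution, in bits per character.
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
    }
```

Pseudo-randomly generated labels often show higher character entropy and a lower vowel ratio than dictionary-like names, although how informative such features are in practice is precisely the kind of question the article examines.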

Keywords

Botnet, Domain generation algorithm, DGA, Machine Learning, Natural language processing

Notes

Acknowledgements

This study was funded by a predoctoral and a postdoctoral INCIBE grant within the “Ayudas para la Excelencia de los Equipos de Investigación Avanzada en Ciberseguridad” program, with codes INCIBEI-2015-27353 and INCIBEI-2015-27352.

Compliance with ethical standards

Conflict of interest

The authors declare that they do not have any conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Communications and Information Engineering, Faculty of Computer Science, University of Murcia, Murcia, Spain
