Algorithmically Generated Domain Detection and Malware Family Classification

  • Chhaya Choudhary
  • Raaghavi Sivaguru
  • Mayana Pereira
  • Bin Yu
  • Anderson C. Nascimento
  • Martine De Cock
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 969)


In this paper, we compare the performance of several machine-learning-based approaches on two tasks: detecting algorithmically generated malicious domains and categorizing domains according to their malware family. The datasets used for model comparison were provided by the shared task on Detecting Malicious Domain names (DMD 2018). Our models ranked first on two of the four test datasets provided in the competition.


Domain Generation Algorithms, Malware, Supervised learning, Deep learning, Random forest



We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Chhaya Choudhary (1)
  • Raaghavi Sivaguru (1)
  • Mayana Pereira (2)
  • Bin Yu (2)
  • Anderson C. Nascimento (1)
  • Martine De Cock (1, 3), corresponding author
  1. University of Washington Tacoma, Tacoma, USA
  2. Infoblox Inc., Santa Clara, USA
  3. Ghent University, Ghent, Belgium
