
DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection


Abstract

Contemporary malware families typically use domain generation algorithms (DGAs) to circumvent DNS blacklists, sinkholing, and other types of security systems. A compromised system uses a DGA to generate a large number of pseudo-random domain names from a seed and then uses a subset of those names to contact the command and control server (C2C). To block this communication channel, security organizations reverse engineer the malware samples and their seeds to identify the corresponding DGA. The resulting lists of reverse-engineered domain names are then sinkholed and preregistered in a DNS blacklist. This task is tedious, and a DNS blacklist can detect only DGA-based domain names that are already known. Such a system can also be circumvented easily by DGA malware authors. An alternative approach is to intercept DNS packets and classify each domain name based on statistical features. This type of system uses contextual data such as passive DNS and NXDomain responses. Developing a DGA detector based on contextual data is difficult because all of the data must be aggregated, which is costly in a real-time environment; moreover, obtaining contextual information on an endpoint system is often difficult under real-world constraints. More recently, methods that detect DGA domain names on a per-domain basis have been adopted. These methods do not rely on any external information and use only the full domain name. Many works detect DGAs on a per-domain basis, using either manual feature engineering with classical machine learning (CML) algorithms or automatic feature engineering with deep learning architectures. Methods based on deep learning architectures outperform those based on CML algorithms. Additionally, deep learning based DGA detection methods are more robust in an adversarial environment than CML classifiers.
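As a minimal sketch of the seed-based generation described above, the following toy generator derives domain names deterministically from a seed and a date. It is an illustration only and does not reproduce any real malware family: the function name `toy_dga`, the use of SHA-256, the date component, and the `.com` suffix are all assumptions made for this example.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domain names from a seed and a date.

    Hypothetical toy generator: the same (seed, day) pair always yields
    the same list, which is what lets a bot and its operator agree on
    rendezvous domains without communicating.
    """
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (SHA-256 digest size)")
    domains = []
    for i in range(count):
        # Mix the seed, the date, and a counter into one hash input.
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        # Map each of the first `length` digest bytes to a lowercase letter.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2020, 1, 1)):
        print(name)
```

Because the output depends only on the seed and the date, a defender who recovers both can precompute the same list, which is the basis of the blacklist approach; changing the seed yields an entirely different list, which is why such blacklists are easy to evade.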
However, deep learning architectures are vulnerable to the multiclass imbalance problem, which is becoming increasingly important in DGA domain detection, mainly because many DGA families have very few samples in the training data set. In this work, we propose DeepDGA-MINet, which collects DNS information inside an Ethernet LAN and uses cost-sensitive deep learning architectures to handle the multiclass imbalance problem. This is done by introducing cost items into the backpropagation procedure to capture the relative importance of each DGA family. The performance of the cost-sensitive deep learning architectures is evaluated on the AmritaDGA benchmark data set. The cost-sensitive deep learning architectures performed well when compared to the original deep learning architectures.
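One common way to introduce per-class costs into backpropagation is to scale the cross-entropy loss, and therefore its gradient, by a cost attached to the true class. The sketch below illustrates that general idea in plain Python. The abstract does not state how the chapter derives its cost items, so the inverse-frequency formula in `class_costs` is an assumption, as are all function names and the example class counts.

```python
import math
from typing import List, Sequence, Tuple


def class_costs(counts: Sequence[int]) -> List[float]:
    """Assumed cost scheme: cost_c = N / (K * n_c).

    N is the total number of samples, K the number of classes and n_c
    the number of samples in class c, so rare classes get larger costs.
    """
    total = sum(counts)
    k = len(counts)
    return [total / (k * n) for n in counts]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cost_sensitive_loss_and_grad(
    logits: Sequence[float], label: int, costs: Sequence[float]
) -> Tuple[float, List[float]]:
    """Cost-weighted cross-entropy and its gradient w.r.t. the logits.

    loss = -cost[label] * log(p[label])
    grad = cost[label] * (p - one_hot(label))
    The gradient is what backpropagation would push into the network,
    so errors on high-cost classes produce larger weight updates.
    """
    probs = softmax(logits)
    cost = costs[label]
    loss = -cost * math.log(probs[label])
    grad = [cost * (p - (1.0 if i == label else 0.0))
            for i, p in enumerate(probs)]
    return loss, grad


if __name__ == "__main__":
    # Hypothetical class counts: one large class and two small DGA families.
    costs = class_costs([900, 90, 10])
    for label in range(3):
        loss, grad = cost_sensitive_loss_and_grad([0.0, 0.0, 0.0], label, costs)
        print(label, round(costs[label], 3), round(loss, 3))
```

With identical predictions, a mistake on the rarest class contributes a proportionally larger loss and gradient than a mistake on the majority class, which counteracts the tendency of an unweighted network to ignore small families.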

Notes

  1. https://dgarchive.caad.fkie.fraunhofer.de/

  2. https://vinayakumarr.github.io/AmritaDGA/

  3. https://vinayakumarr.github.io/AmritaDGA/

  4. https://github.com/vinayakumarr/DMD2018

Acknowledgements

This research was supported in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support provided through a research grant. We are also grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.

Author information

Corresponding author

Correspondence to R. Vinayakumar.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Vinayakumar, R., Soman, K.P., Poornachandran, P. (2020). DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection. In: Gupta, B., Perez, G., Agrawal, D., Gupta, D. (eds) Handbook of Computer Networks and Cyber Security. Springer, Cham. https://doi.org/10.1007/978-3-030-22277-2_37

  • DOI: https://doi.org/10.1007/978-3-030-22277-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22276-5

  • Online ISBN: 978-3-030-22277-2

  • eBook Packages: Computer Science, Computer Science (R0)
