DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection

Vinayakumar, R.; Soman, K. P.; Poornachandran, Prabaharan

doi:10.1007/978-3-030-22277-2_37

Chapter
First Online: 01 January 2020

Abstract

Contemporary malware families typically use domain generation algorithms (DGAs) to circumvent DNS blacklists, sinkholing, and other types of security systems. A compromised system uses a DGA, driven by a seed, to generate a large number of pseudo-random domain names and then uses a subset of those names to contact the command and control (C2C) server. To block this communication point, security organizations reverse engineer the malware samples, based on a seed, to identify the corresponding DGA. The resulting lists of reverse-engineered domain names are then sinkholed and preregistered in a DNS blacklist. This task is tedious, and a DNS blacklist can detect only DGA-based domain names that already exist. Additionally, this type of system can be easily circumvented by DGA malware authors. An alternative way to detect DGA domain names is to intercept DNS packets and identify the nature of each domain name based on statistical features. This type of system uses contextual data such as passive DNS and NXDomain responses. Developing a DGA detection system based on contextual data is difficult because all of the data must be aggregated, which increases cost in a real-time environment; moreover, obtaining contextual information on an endpoint system is often difficult due to real-world constraints. More recent methods detect DGA domain names on a per-domain basis. Such a method does not rely on any external information and uses only the full domain name. Many works detect DGAs on a per-domain basis, using either manual feature engineering with classical machine learning (CML) algorithms or automatic feature engineering with deep learning architectures. Methods based on deep learning architectures perform better than the CML algorithms. Additionally, deep learning based DGA detection methods are more robust in an adversarial environment than CML classifiers.
However, deep learning architectures are vulnerable to the multiclass imbalance problem, and this problem is becoming increasingly important in DGA domain detection, mainly because many DGA families have very few samples in the training data set. In this work, we propose DeepDGA-MINet, which collects DNS information inside an Ethernet LAN and uses Cost-Sensitive deep learning architectures to handle the multiclass imbalance problem. This is done by introducing cost items into the backpropagation procedure to reflect the relative importance of each DGA family. The performance of the Cost-Sensitive deep learning architectures is evaluated on the AmritaDGA benchmark data set. The Cost-Sensitive deep learning architectures performed well when compared to the original deep learning architectures.
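
The idea of introducing cost items into backpropagation can be illustrated with a minimal sketch. The abstract does not give the chapter's cost formulation, so the inverse-frequency scheme below is an assumption chosen for illustration, and the names `class_costs` and `cost_sensitive_loss_and_grad` are hypothetical. The sketch shows a per-class cost scaling both the cross-entropy loss and the gradient that would be propagated back through a network, so that errors on rare DGA families weigh more than errors on frequent classes.

```python
import math
from collections import Counter


def class_costs(labels):
    """Assumed cost scheme (not taken from the chapter): inverse class
    frequency, scaled so that the mean cost over all samples is 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss_and_grad(logits, target, cost):
    """Cross-entropy for one sample, scaled by the cost of its true class.

    Returns the loss and its gradient with respect to the logits. The
    gradient is the usual (softmax - one_hot) term multiplied by the cost,
    which is where the cost item enters backpropagation.
    """
    probs = softmax(logits)
    loss = -cost * math.log(probs[target])
    grad = [cost * (p - (1.0 if i == target else 0.0))
            for i, p in enumerate(probs)]
    return loss, grad


# Hypothetical imbalanced label set: class 0 is frequent, class 2 is rare.
labels = [0] * 90 + [1] * 9 + [2] * 1
costs = class_costs(labels)

# The same uninformative prediction is penalised more for the rare class.
loss_rare, _ = cost_sensitive_loss_and_grad([0.0, 0.0, 0.0], 2, costs[2])
loss_common, _ = cost_sensitive_loss_and_grad([0.0, 0.0, 0.0], 0, costs[0])
print(loss_rare > loss_common)
```

Deep learning frameworks typically expose the same idea through per-class weights on the loss function, so a hand-written gradient like this is needed only for illustration.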

Acknowledgements

This research was supported in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support to the research grant. We are also grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.

Author information

Authors and Affiliations

Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
R. Vinayakumar & K. P. Soman
Centre for Cyber Security Systems and Networks, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
R. Vinayakumar, K. P. Soman & Prabaharan Poornachandran

Authors

R. Vinayakumar
K. P. Soman
Prabaharan Poornachandran

Corresponding author

Correspondence to R. Vinayakumar.

Editor information

Editors and Affiliations

Department of Computer Engineering, National Institute of Technology Kurukshetra, Kurukshetra, India
Brij B. Gupta
Department of Computer Science, University of Murcia, Catedrático de Universidad, Murcia, Spain
Gregorio Martinez Perez
Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, USA
Dharma P. Agrawal
LoginRadius Inc., Vancouver, BC, Canada
Deepak Gupta

About this chapter

Cite this chapter

Vinayakumar, R., Soman, K.P., Poornachandran, P. (2020). DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection. In: Gupta, B., Perez, G., Agrawal, D., Gupta, D. (eds) Handbook of Computer Networks and Cyber Security. Springer, Cham. https://doi.org/10.1007/978-3-030-22277-2_37

DOI: https://doi.org/10.1007/978-3-030-22277-2_37
Published: 01 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22276-5
Online ISBN: 978-3-030-22277-2
eBook Packages: Computer Science, Computer Science (R0)
