Abstract
Contemporary malware families typically use domain generation algorithms (DGAs) to circumvent DNS blacklists, sinkholing, and other security mechanisms. A compromised system uses a DGA, driven by a seed, to generate a large number of pseudo-random domain names and then tries a subset of them to contact the command and control (C2) server. To block this communication channel, security organizations reverse engineer malware samples to recover the DGA and its seed. The resulting lists of domain names are then sinkholed and preregistered in a DNS blacklist. This task is tedious, and a DNS blacklist can detect only DGA-based domain names that are already known, so DGA malware authors can circumvent it easily. An alternative is to intercept DNS packets and classify each domain name based on statistical features. Such systems use contextual data such as passive DNS and NXDomain responses. Building a DGA detector on contextual data is difficult: all of the data must be aggregated, which is costly in a real-time environment, and contextual information is often hard to obtain on an endpoint system because of real-world constraints. More recently, methods that detect DGA domain names on a per-domain basis have been adopted. These methods do not rely on any external information and use only the full domain name. Many works address per-domain DGA detection, using either manual feature engineering with classical machine learning (CML) algorithms or automatic feature engineering with deep learning architectures. Methods based on deep learning architectures perform better than those based on CML algorithms. Additionally, deep learning based DGA detection methods are more robust in an adversarial environment than CML classifiers.
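As a rough illustration of the seed-driven generation and per-domain input described above, the following Python sketch derives a few pseudo-random names from a seed and a date, then converts a name into a fixed-length sequence of character indices of the kind a character-level deep learning model could consume. The function names, the hash-based construction, the reserved `.example` suffix, and the vocabulary are all assumptions made for illustration; they do not correspond to any real malware family or to the chapter's implementation.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random domain names from a seed and a date (toy scheme)."""
    domains = []
    for i in range(count):
        # Hash seed, date, and counter; the same inputs always give the same name.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map the first `length` digest bytes to lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = 24) -> list[int]:
    """Character-level integer encoding, truncated or zero-padded to `max_len`."""
    vocab = "abcdefghijklmnopqrstuvwxyz0123456789-._"
    index = {ch: i + 1 for i, ch in enumerate(vocab)}  # 0 is reserved for padding
    unknown = len(vocab) + 1                           # id for out-of-vocabulary characters
    ids = [index.get(ch, unknown) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    for name in toy_dga("demo-seed", date(2020, 1, 1)):
        print(name, encode_domain(name))
```

Because both the malware and the defender can reproduce the list only when they share the seed and the algorithm, a blacklist built this way covers known families only, which is the limitation that motivates classifying each domain name on its own.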
However, deep learning architectures are vulnerable to the multiclass imbalance problem, which is becoming increasingly important in DGA domain detection, mainly because many DGA families have very few samples in the training data set. In this work, we propose DeepDGA-MINet, which collects DNS information inside an Ethernet LAN and uses cost-sensitive deep learning architectures to handle the multiclass imbalance problem. This is done by introducing cost items into the backpropagation procedure so that the relative importance of each DGA family is taken into account. The performance of the cost-sensitive deep learning architectures is evaluated on the AmritaDGA benchmark data set. The cost-sensitive deep learning architectures performed well compared with the original deep learning architectures.
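One common way to introduce class costs into backpropagation is to scale each sample's cross-entropy loss, and therefore its gradient, by a cost attached to its true class. The sketch below assumes inverse-frequency costs and a softmax output layer; the abstract does not specify the chapter's cost scheme, so both choices and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def class_costs(labels):
    """Assumed cost scheme: inverse class frequency, averaging to 1 over the samples."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss_and_grad(logits, target, costs):
    """Cost-weighted cross-entropy for one sample and its gradient w.r.t. the logits."""
    probs = softmax(logits)
    w = costs[target]
    loss = -w * math.log(probs[target])
    # The usual softmax gradient (p - one_hot) is scaled by the class cost,
    # so errors on rare families produce larger weight updates.
    grad = [w * (p - (1.0 if j == target else 0.0)) for j, p in enumerate(probs)]
    return loss, grad


if __name__ == "__main__":
    # Hypothetical imbalanced training labels: class 2 is a rare DGA family.
    labels = [0] * 90 + [1] * 9 + [2] * 1
    costs = class_costs(labels)
    print(costs)
    print(weighted_loss_and_grad([0.0, 0.0, 0.0], 2, costs))
```

With these costs, a misclassified sample from a rare family contributes a larger loss and gradient than one from a frequent family, which counteracts the tendency of the unweighted loss to favor majority classes.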
Acknowledgements
This research was supported in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support provided under its research grant. We are also grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Vinayakumar, R., Soman, K.P., Poornachandran, P. (2020). DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection. In: Gupta, B., Perez, G., Agrawal, D., Gupta, D. (eds) Handbook of Computer Networks and Cyber Security. Springer, Cham. https://doi.org/10.1007/978-3-030-22277-2_37
DOI: https://doi.org/10.1007/978-3-030-22277-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22276-5
Online ISBN: 978-3-030-22277-2
eBook Packages: Computer Science, Computer Science (R0)