Abstract
Domain Name System (DNS) exfiltration is an activity in which an infected device sends data to the attacker’s server by encoding it in DNS request messages. Because DNS exfiltration is frequently used for malicious purposes, its detection has gained attention from the research community, which has proposed several methods, predominantly based on machine learning. The majority of previous studies used publicly available DNS exfiltration tools with their default configuration parameters, resulting in datasets in which exfiltration requests are usually significantly longer, have more DNS name labels, and have higher character entropy than average regular DNS requests. This in turn led to overly optimistic detection rates. In this paper, we explore some of the strategies an attacker could use to avoid exfiltration detection. First, we explore the impact of varying the DNS exfiltration tools’ parameters on the exfiltration detection accuracy. Second, we modify the DNSExfiltrator tool to produce exfiltration requests with significantly lower character entropy. This approach proved capable of deceiving classifiers based on single DNS request features: only around 1% of modified DNS requests of 9 bytes or fewer, and less than one third of DNS exfiltration requests in the overall population, were correctly detected. In addition, we present a methodology and an aggregated feature set (including inter-request timing statistics) which can be used for accurate DNS exfiltration detection in this kind of adversarial setting.
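To make the distinction between single-request and aggregated features concrete, the following minimal Python sketch computes the three per-request quantities named above (request length, number of DNS name labels, character entropy) and a few aggregated statistics over a group of requests, including inter-request timing. It is an illustration only and not the feature set used in the paper: the function names, the choice to compute entropy over the whole query name without dots, and the specific aggregates are assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def request_features(qname: str) -> Dict[str, float]:
    """Features of a single DNS query name (hypothetical feature names)."""
    name = qname.rstrip(".")  # drop the trailing root dot, if present
    labels = name.split(".") if name else []
    return {
        "length": len(name),
        "num_labels": len(labels),
        # Assumption: entropy is taken over all characters except the dots.
        "entropy": shannon_entropy(name.replace(".", "")),
    }


def aggregated_features(requests: List[Tuple[float, str]]) -> Dict[str, float]:
    """Aggregates over a group of (timestamp_in_seconds, qname) pairs.

    How requests are grouped (for example per client and per domain) is
    left to the caller and is an assumption of this sketch.
    """
    if not requests:
        raise ValueError("at least one request is required")
    ordered = sorted(requests)  # sort by timestamp
    feats = [request_features(qname) for _, qname in ordered]
    times = [t for t, _ in ordered]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(ordered),
        "mean_length": mean(f["length"] for f in feats),
        "mean_entropy": mean(f["entropy"] for f in feats),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    group = [(0.0, "a.example.com"), (1.0, "b.example.com"), (3.0, "c.example.com")]
    print(request_features("www.example.com."))
    print(aggregated_features(group))
```

A low-entropy exfiltration request can look unremarkable under `request_features` alone, whereas the group-level statistics still reflect how many requests were sent and how regularly, which is the intuition behind using aggregated features in the adversarial setting.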
Availability of data and material
The data are publicly available at [31].
References
New FrameworkPOS variant exfiltrates data via DNS requests (2014), G Data blog, https://www.gdatasoftware.com/blog/2014/10/23942-new-frameworkpos-variant-exfiltrates-data-via-dns-requests, Accessed on March 6 2023
Krebs, B.: Deconstructing the 2014 Sally Beauty Breach (2015), Krebs on Security, https://krebsonsecurity.com/2015/05/deconstructing-the-2014-sally-beauty-breach/, Accessed on March 6th 2023
Netlab blog, New Threat: B1txor20, A Linux Backdoor Using DNS Tunnel, https://blog.netlab.360.com/b1txor20-use-of-dns-tunneling_en/, accessed on March 16th (2023)
Marinho, R.: Translating Saitama’s DNS tunneling messages, SANS Infosec handlers diary, https://isc.sans.edu/diary/Translating+Saitama%27s+DNS+tunneling+messages/28738, Accessed on March 16th (2023)
Yunakovsky, S., Pomerantsev, I.: Denis and Co, Securelist by Kaspersky, https://securelist.com/denis-and-company/83671/, 2018, Accessed on March 6 (2023)
Tuna, O.F., Catak, F.O., Eskil, M.T.: TENET: a new hybrid network architecture for adversarial defense. Int. J. Inf. Secur. (2023). https://doi.org/10.1007/s10207-023-00675-1
Sabir, B., Ullah, F., Babar, M.A., Gaire, R.: Machine learning for detecting data exfiltration. ACM Comput. Surv. 54(3), 1–47 (2021). https://doi.org/10.1145/3442181
Wang, Y., Zhou, A., Liao, S., Zheng, R., Hu, R., Zhang, L.: A comprehensive survey on DNS tunnel detection. Comput. Netw. 197, 108322 (2021). https://doi.org/10.1016/j.comnet.2021.108322
Ishikura, N., Kondo, D., Vassiliades, V., Iordanov, I., Tode, H.: DNS tunneling detection by cache-property-aware features. IEEE Trans. Netw. Service Manag. 18(2), 1203–1217 (2021). https://doi.org/10.1109/TNSM.2021.3078428
Zhan, M., Li, Y., Yu, G., Li, B., Wang, W.: Detecting DNS over HTTPS based data exfiltration. Comput. Netw. 209, 108919 (2022). https://doi.org/10.1016/j.comnet.2022.108919
Ahmed, J., Gharakheili, H.H., Raza, Q., Russell, C., Sivaraman, V.: Real-time detection of DNS exfiltration and tunneling from enterprise networks. IFIP/IEEE Sympos. Integrat. Netw. Service Manag. (IM) 2019, 649–653 (2019)
Tatang, D., Quinkert, F., Holz, T.: Below the radar: spotting DNS tunnels in newly observed hostnames in the wild. APWG Sympos. Electron. Crime Res. (ECrime) 2019, 1–15 (2019). https://doi.org/10.1109/eCrime47957.2019.9037595
CIC-Bell-DNS-EXF-2021 Dataset, A collaborative project with Bell Canada (BC) Cyber Threat Intelligence (CTI), https://www.unb.ca/cic/datasets/dns-exf-2021.html, Accessed on October 22, (2022)
Wang, S., Sun, L., Qin, S., Li, W., Liu, W.: KRTunnel: DNS channel detector for mobile devices. Comput. Secur. 120, 102818 (2022). https://doi.org/10.1016/j.cose.2022.102818
Liu, J., Li, S., Zhang, Y., Xiao, J., Chang, P., Peng, C.: Detecting DNS tunnel through binary-classification based on behavior features. Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 11th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Conference on Embedded Software and Systems, 339-346. https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.256 (2017)
Bai, H., Liu, W., Liu, G., Dai, Y., Huang, S.: Application behavior identification in DNS tunnels based on spatial-temporal information. IEEE Access 9, 80639–80653 (2021). https://doi.org/10.1109/ACCESS.2021.3085500
Xu, K., Butler, P., Saha, S., Yao, D.: DNS for massive-scale command and control. IEEE Trans. Dependable Secure Comput. 10(3), 143–153 (2013). https://doi.org/10.1109/TDSC.2013.10
Jovanović, Ɖ., Vuletić, P.: Analysis and characterization of IoT malware command and control communication. Telfor Journal 12(2), 80–85 (2020). https://doi.org/10.5937/telfor2002080J
Paxson, V., Christodorescu, M., Javed, M., Rao, J., Sailer, R., Schales, D.L., Stoecklin, M., Thomas, K., Venema, W., Weaver, N.: Practical Comprehensive Bounds on Surreptitious Communication over DNS. 22nd USENIX Security Symposium (USENIX Security 13), 17-32. https://www.usenix.org/conference/usenixsecurity13/technical-sessions/presentation/paxson (2013)
Almusawi, A., Amintoosi, H.: DNS tunneling detection method based on multilabel support vector machine. Security and Commun. Netw. 2018, 1–9 (2018). https://doi.org/10.1155/2018/6137098
Nadler, A., Aminov, A., Shabtai, A.: Detection of malicious and low throughput data exfiltration over the DNS protocol. Comput. Secur. 80, 36–53 (2019). https://doi.org/10.1016/j.cose.2018.09.006
Aiello, M., Mongelli, M., Papaleo, G.: Basic classifiers for DNS tunneling detection. Proceedings - International Symposium on Computers and Communications 880–885, (2013). https://doi.org/10.1109/ISCC.2013.6755060
Chen, S., Lang, B., Liu, H., Li, D., Gao, C.: DNS covert channel detection method using the LSTM model. Comput. Secur. 104, 102095 (2021). https://doi.org/10.1016/j.cose.2020.102095
Homem, I., Papapetrou, P., Dosis, S.: Information-Entropy-Based DNS Tunnel Prediction pp. 127-140. https://doi.org/10.1007/978-3-319-99277-8_8 (2018)
Steadman, J., Scott-Hayward, S.: DNSxD: Detecting Data Exfiltration over DNS. 2018 IEEE Conference on Network Function Virtualization and Software Defined Networks, NFV-SDN 2018, 2013, 1-6. (2018). https://doi.org/10.1109/NFV-SDN.2018.8725640
Shafieian, S., Smith, D., Zulkernine, M.: Detecting DNS Tunneling Using Ensemble Learning (pp. 112-127). https://doi.org/10.1007/978-3-319-64701-2_9 (2017)
D’Angelo, G., Castiglione, A., Palmieri, F.: DNS tunnels detection via DNS-images. Inf. Process. Manage. 59(3), 102930 (2022). https://doi.org/10.1016/j.ipm.2022.102930
Steadman, J., Scott-Hayward, S.: DNSxP: Enhancing data exfiltration protection through data plane programmability. Comput. Netw. 195, 108174 (2021). https://doi.org/10.1016/j.comnet.2021.108174
Hu, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D., Hoffman, P.: Specification for DNS over Transport Layer Security (TLS), IETF RFC 7858, ISSN: 2070-1721
https://github.com/kristijanziza/dns , Accessed on March 20th, (2023)
Ziza, K., Vuletić, P., Tadić, P.: DNS Exfiltration Dataset, Mendeley Data, v2 https://doi.org/10.17632/c4n7fckkz3.2 (2022)
DNS Exfiltration classifiers, https://github.com/ptadic/dns-exfiltration, Accessed on March 4th, (2023)
Sagi, O., Rokach, L.: Ensemble learning: a survey. Wiley Interdisciplin. Rev.: Data Mining and Knowl. Dis. 8(4), e1249 (2018). https://doi.org/10.1002/widm.1249
Rincy, T.N., Gupta, R.: Ensemble learning techniques and its efficiency in machine learning: A survey. 2nd International Conference on Data, Engineering and Applications (IDEA), 1-6. https://doi.org/10.1109/IDEA49133.2020.9170675 (2020)
James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to statistical learning with applications in R, Second Edition. Springer Science+Business Media, LLC. ISBN 978-1-0716-1417-4. https://doi.org/10.1007/978-1-0716-1418-1
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15(1), 3133–3181 (2014)
Wainberg, M., Alipanahi, B., Frey, B.J.: Are random forests truly the best classifiers? J. Mach. Learn. Res. 17(1), 3837–3841 (2016)
Géron, A.: Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc. ISBN 978-1-492-03264-9 (2019)
Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785-794) (2016)
https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions , Accessed on March 28th, (2023)
Ho, T.K.: Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition (Vol. 1, pp. 278-282). IEEE. https://doi.org/10.1109/icdar.1995.598994 (1995)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998). https://doi.org/10.1109/34.709601
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/a:1010933404324
Pedregosa, Fabian, et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition). Springer, Berlin (2009)
iodine DNS exfiltration tool, https://code.kryo.se/iodine/, accessed on May 27th, (2023)
DNSExfiltrator, https://github.com/Arno0x/DNSExfiltrator, Accessed on May 27th, (2023)
Funding
This work has been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia under Grant Agreement No. 451-03-68/2022-14/200103, and under projects TR32038 and III42007.
Author information
Contributions
Kristijan Žiža: Data curation, Investigation, Methodology, Software, Writing. Predrag Tadić: Data curation, Investigation, Validation, Software, Writing - review and editing. Pavle Vuletić: Conceptualisation, Methodology, Investigation, Resources, Writing - review and editing, Validation, Supervision.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare. All co-authors have seen and agreed with the contents of the manuscript. We certify that the submission is original work and is not under review at any other publication.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendix A
Because the datasets, tools and full descriptions of the classification methodologies used in previous studies are unavailable, it is not possible to compare different detection strategies in a fair and unbiased way. To still provide some comparison with the classification methodologies used in previous research, we have analysed 9 additional classification methods found there. Table 7 lists the classification results obtained for 11 machine learning models: logistic regression (LR), support vector machine with the Gaussian kernel (G-SVM), support vector machine with a linear kernel (L-SVM), naïve Bayes (NB), decision tree (DT), random forest (RF), extremely randomised trees (ERT), AdaBoost (AB), histogram-based gradient boosting (HBG), multi-layer perceptron (MLP) and XGBoost (XGB). We report the accuracy and the F1-score averaged over the two classes (exfiltration or legitimate request). We analysed the classifiers in three different cases: (1) original examples (regular requests plus attacks generated by the unmodified exfiltrator), (2) modified DNSExfiltrator examples, and (3) all examples (both previous groups taken together). To speed up training, we used a smaller training set, composed of around 3.5 M datapoints randomly chosen from the original dataset and 17k modified requests. We used both individual and aggregated features. The test set had the same number of original requests as the training set and 25k modified examples. XGBoost achieved the best metrics in all three cases, although the differences between most models are quite small.
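The two reported metrics can be illustrated with a short, dependency-free Python sketch. It computes the accuracy and the F1-score averaged over the two classes (macro-averaged F1) from true and predicted labels. The function names and the label encoding (1 for exfiltration, 0 for legitimate) are assumptions made for this illustration; this is not the evaluation code used to produce Table 7.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1-score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class is never seen or predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Assumed encoding: 1 = exfiltration request, 0 = legitimate request.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
```

Averaging F1 over both classes matters when one class is far larger than the other, as in the training set described above: a classifier that misses most requests of the smaller class can still reach a high accuracy, while its macro-averaged F1 drops noticeably.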
About this article
Cite this article
Žiža, K., Tadić, P. & Vuletić, P. DNS exfiltration detection in the presence of adversarial attacks and modified exfiltrator behaviour. Int. J. Inf. Secur. 22, 1865–1880 (2023). https://doi.org/10.1007/s10207-023-00723-w
DOI: https://doi.org/10.1007/s10207-023-00723-w