DNS exfiltration detection in the presence of adversarial attacks and modified exfiltrator behaviour

  • Regular Contribution
  • International Journal of Information Security

Abstract

Domain Name System (DNS) exfiltration is an activity in which an infected device sends data to the attacker’s server by encoding it in DNS request messages. Because DNS exfiltration is frequently used for malicious purposes, its detection has gained attention from the research community, which has proposed several predominantly machine learning-based methods. The majority of previous studies used publicly available DNS exfiltration tools with their default configuration parameters, resulting in datasets in which exfiltration requests are usually significantly longer, have more DNS name labels, and have higher character entropy than average regular DNS requests. This in turn led to overly optimistic detection rates. In this paper, we explore some of the strategies an attacker could use to avoid exfiltration detection. First, we explore the impact of varying the parameters of DNS exfiltration tools on detection accuracy. Second, we modify the DNSExfiltrator tool to produce exfiltration requests with significantly lower character entropy. This approach proved capable of deceiving classifiers based on single DNS request features: only around 1% of modified DNS requests shorter than or equal to 9 bytes, and less than one third of DNS exfiltration requests in the overall population, were accurately detected. In addition, we present a methodology and an aggregated feature set (including inter-request timing statistics) that can be used for accurate DNS exfiltration detection in this kind of adversarial setting.
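
The single-request features named above (request length, number of DNS name labels, character entropy) and an aggregate that adds inter-request timing statistics can be sketched as follows. This is only an illustrative sketch, not the feature set used in the paper: the function names, the choice of Shannon entropy over the characters of the name, and the specific aggregates (means, plus mean and standard deviation of inter-request gaps) are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the characters in text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def request_features(qname: str) -> dict:
    """Hypothetical single-request features for one queried DNS name."""
    name = qname.rstrip(".")  # drop the trailing root dot, if any
    labels = name.split(".") if name else []
    return {
        "length": len(name),
        "labels": len(labels),
        # Entropy is computed over the label characters only (dots excluded).
        "entropy": char_entropy(name.replace(".", "")),
    }


def aggregate_features(requests: list) -> dict:
    """Hypothetical aggregated features over (timestamp_seconds, qname)
    pairs, for example all requests seen from one client in one window."""
    if not requests:
        raise ValueError("at least one request is required")
    times = sorted(t for t, _ in requests)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    feats = [request_features(q) for _, q in requests]
    return {
        "count": len(requests),
        "mean_length": mean(f["length"] for f in feats),
        "mean_entropy": mean(f["entropy"] for f in feats),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }
```

A low-entropy exfiltration request can look unremarkable under `request_features` alone, which is why the aggregate also looks at how a group of requests behaves over time.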

Availability of data and material

Data is publicly available at [31].

Code availability

Code is publicly available at [30, 32].

References

  1. New FrameworkPOS variant exfiltrates data via DNS requests (2014), G Data blog, https://www.gdatasoftware.com/blog/2014/10/23942-new-frameworkpos-variant-exfiltrates-data-via-dns-requests, Accessed on March 6 2023

  2. Krebs, B.: Deconstructing the 2014 Sally Beauty Breach (2015), Krebs on Security, https://krebsonsecurity.com/2015/05/deconstructing-the-2014-sally-beauty-breach/, Accessed on March 6th 2023

  3. Netlab blog, New Threat: B1txor20, A Linux Backdoor Using DNS Tunnel, https://blog.netlab.360.com/b1txor20-use-of-dns-tunneling_en/, accessed on March 16th (2023)

  4. Marinho, R.: Translating Saitama’s DNS tunneling messages, SANS Infosec handlers diary, https://isc.sans.edu/diary/Translating+Saitama%27s+DNS+tunneling+messages/28738, Accessed on March 16th (2023)

  5. Yunakovsky, S., Pomerantsev, I.: Denis and Co, Securelist by Kaspersky, https://securelist.com/denis-and-company/83671/, 2018, Accessed on March 6 (2023)

  6. Tuna, O.F., Catak, F.O., Eskil, M.T.: TENET: a new hybrid network architecture for adversarial defense. Int. J. Inf. Secur. (2023). https://doi.org/10.1007/s10207-023-00675-1

  7. Sabir, B., Ullah, F., Babar, M.A., Gaire, R.: Machine learning for detecting data exfiltration. ACM Comput. Surv. 54(3), 1–47 (2021). https://doi.org/10.1145/3442181

  8. Wang, Y., Zhou, A., Liao, S., Zheng, R., Hu, R., Zhang, L.: A comprehensive survey on DNS tunnel detection. Comput. Netw. 197, 108322 (2021). https://doi.org/10.1016/j.comnet.2021.108322

  9. Ishikura, N., Kondo, D., Vassiliades, V., Iordanov, I., Tode, H.: DNS tunneling detection by cache-property-aware features. IEEE Trans. Netw. Service Manag. 18(2), 1203–1217 (2021). https://doi.org/10.1109/TNSM.2021.3078428

  10. Zhan, M., Li, Y., Yu, G., Li, B., Wang, W.: Detecting DNS over HTTPS based data exfiltration. Comput. Netw. 209, 108919 (2022). https://doi.org/10.1016/j.comnet.2022.108919

  11. Ahmed, J., Gharakheili, H.H., Raza, Q., Russell, C., Sivaraman, V.: Real-time detection of DNS exfiltration and tunneling from enterprise networks. IFIP/IEEE Sympos. Integrat. Netw. Service Manag. (IM) 2019, 649–653 (2019)

  12. Tatang, D., Quinkert, F., Holz, T.: Below the radar: spotting DNS tunnels in newly observed hostnames in the wild. APWG Sympos. Electron. Crime Res. (ECrime) 2019, 1–15 (2019). https://doi.org/10.1109/eCrime47957.2019.9037595

  13. CIC-Bell-DNS-EXF-2021 Dataset, A collaborative project with Bell Canada (BC) Cyber Threat Intelligence (CTI), https://www.unb.ca/cic/datasets/dns-exf-2021.html, Accessed on October 22, (2022)

  14. Wang, S., Sun, L., Qin, S., Li, W., Liu, W.: KRTunnel: DNS channel detector for mobile devices. Comput. Secur. 120, 102818 (2022). https://doi.org/10.1016/j.cose.2022.102818

  15. Liu, J., Li, S., Zhang, Y., Xiao, J., Chang, P., Peng, C.: Detecting DNS tunnel through binary-classification based on behavior features. Proceedings - 16th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 11th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Conference on Embedded Software and Systems, 339-346. https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.256 (2017)

  16. Bai, H., Liu, W., Liu, G., Dai, Y., Huang, S.: Application behavior identification in DNS tunnels based on spatial-temporal information. IEEE Access 9, 80639–80653 (2021). https://doi.org/10.1109/ACCESS.2021.3085500

  17. Xu, K., Butler, P., Saha, S., Yao, D.: DNS for massive-scale command and control. IEEE Trans. Dependable Secure Comput. 10(3), 143–153 (2013). https://doi.org/10.1109/TDSC.2013.10

  18. Jovanović, Ɖ., Vuletić, P.: Analysis and characterization of IoT malware command and control communication. Telfor Journal 12(2), 80–85 (2020). https://doi.org/10.5937/telfor2002080J

  19. Paxson, V., Christodorescu, M., Javed, M., Rao, J., Sailer, R., Schales, D.L., Stoecklin, M., Thomas, K., Venema, W., Weaver, N.: Practical Comprehensive Bounds on Surreptitious Communication over DNS. 22nd USENIX Security Symposium (USENIX Security 13), 17-32. https://www.usenix.org/conference/usenixsecurity13/technical-sessions/presentation/paxson (2013)

  20. Almusawi, A., Amintoosi, H.: DNS tunneling detection method based on multilabel support vector machine. Security and Commun. Netw. 2018, 1–9 (2018). https://doi.org/10.1155/2018/6137098

  21. Nadler, A., Aminov, A., Shabtai, A.: Detection of malicious and low throughput data exfiltration over the DNS protocol. Comput. Secur. 80, 36–53 (2019). https://doi.org/10.1016/j.cose.2018.09.006

  22. Aiello, M., Mongelli, M., Papaleo, G.: Basic classifiers for DNS tunneling detection. Proceedings - International Symposium on Computers and Communications 880–885, (2013). https://doi.org/10.1109/ISCC.2013.6755060

  23. Chen, S., Lang, B., Liu, H., Li, D., Gao, C.: DNS covert channel detection method using the LSTM model. Comput. Secur. 104, 102095 (2021). https://doi.org/10.1016/j.cose.2020.102095

  24. Homem, I., Papapetrou, P., Dosis, S.: Information-Entropy-Based DNS Tunnel Prediction pp. 127-140. https://doi.org/10.1007/978-3-319-99277-8_8 (2018)

  25. Steadman, J., Scott-Hayward, S.: DNSxD: Detecting Data Exfiltration over DNS. 2018 IEEE Conference on Network Function Virtualization and Software Defined Networks, NFV-SDN 2018, 2013, 1-6. (2018). https://doi.org/10.1109/NFV-SDN.2018.8725640

  26. Shafieian, S., Smith, D., Zulkernine, M.: Detecting DNS Tunneling Using Ensemble Learning (pp. 112-127). https://doi.org/10.1007/978-3-319-64701-2_9 (2017)

  27. D’Angelo, G., Castiglione, A., Palmieri, F.: DNS tunnels detection via DNS-images. Inf. Process. Manage. 59(3), 102930 (2022). https://doi.org/10.1016/j.ipm.2022.102930

  28. Steadman, J., Scott-Hayward, S.: DNSxP: Enhancing data exfiltration protection through data plane programmability. Comput. Netw. 195, 108174 (2021). https://doi.org/10.1016/j.comnet.2021.108174

  29. Hu, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D., Hoffman, P.: Specification for DNS over Transport Layer Security (TLS), IETF RFC 7858, ISSN: 2070-1721

  30. https://github.com/kristijanziza/dns , Accessed on March 20th, (2023)

  31. Ziza, K., Vuletić, P., Tadić, P.: DNS Exfiltration Dataset, Mendeley Data, v2 https://doi.org/10.17632/c4n7fckkz3.2 (2022)

  32. DNS Exfiltration classifiers, https://github.com/ptadic/dns-exfiltration, Accessed on March 4th, (2023)

  33. Sagi, O., Rokach, L.: Ensemble learning: a survey. Wiley Interdisciplin. Rev.: Data Mining and Knowl. Dis. 8(4), e1249 (2018). https://doi.org/10.1002/widm.1249

  34. Rincy, T.N., Gupta, R.: Ensemble learning techniques and its efficiency in machine learning: A survey. 2nd International Conference on Data, Engineering and Applications (IDEA), 1-6. https://doi.org/10.1109/IDEA49133.2020.9170675 (2020)

  35. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to statistical learning with applications in R, Second Edition. Springer Science+Business Media, LLC. ISBN 978-1-0716-1417-4. https://doi.org/10.1007/978-1-0716-1418-1

  36. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15(1), 3133–3181 (2014)

  37. Wainberg, M., Alipanahi, B., Frey, B.J.: Are random forests truly the best classifiers? J. Mach. Learn. Res. 17(1), 3837–3841 (2016)

  38. Géron, A.: Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc. ISBN 978-1-492-03264-9 (2019)

  39. Chen, T., Guestrin, C.: Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm SIGKDD international conference on knowledge discovery and data mining (pp. 785-794) (2016)

  40. https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions , Accessed on March 28th, (2023)

  41. Ho, T.K.: Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition (Vol. 1, pp. 278-282). IEEE. https://doi.org/10.1109/icdar.1995.598994 (1995)

  42. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998). https://doi.org/10.1109/34.709601

  43. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/a:1010933404324

  44. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  45. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition). Springer, Berlin (2009)

  46. iodine DNS exfiltration tool, https://code.kryo.se/iodine/, accessed on May 27th, (2023)

  47. DNSexfiltrator, https://github.com/Arno0x/DNSExfiltrator, Accessed on May 27th, (2023)

Funding

This work has been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia under Grant Agreement No. 451-03-68/2022-14/200103, and under projects TR32038 and III42007.

Author information

Contributions

Kristijan Žiža: Data curation, Investigation, Methodology, Software, Writing. Predrag Tadić: Data curation, Investigation, Validation, Software, Writing - review and editing. Pavle Vuletić: Conceptualisation, Methodology, Investigation, Resources, Writing - review and editing, Validation, Supervision.

Corresponding author

Correspondence to Pavle Vuletić.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare. All co-authors have seen and agreed with the contents of the manuscript. We certify that the submission is original work and is not under review at any other publication.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Because the datasets, tools and full descriptions of the classification methodologies used in previous studies are unavailable, it is not possible to compare different detection strategies in a fair and unbiased way. To still provide some comparison with the classification methodologies used in previous research, we analysed 9 additional classification methods found in that research. Table 7 lists the classification results obtained for 11 machine learning models: logistic regression (LR), support vector machine with the Gaussian kernel (G-SVM), support vector machine with a linear kernel (L-SVM), naïve Bayes (NB), decision tree (DT), random forest (RF), extremely randomised trees (ERT), AdaBoost (AB), histogram-based gradient boosting (HBG), multi-layer perceptron (MLP) and XGBoost (XGB). We report the accuracy and the F1-score averaged over the two classes (exfiltration or legitimate request). We analysed the classifiers in three cases: (1) original examples (regular requests plus attacks generated by the unmodified exfiltrator), (2) modified DNSExfiltrator examples, and (3) all examples (both previous groups taken together). To speed up training, we used a smaller training set composed of around 3.5 M datapoints randomly chosen from the original dataset and 17k modified requests. We used both individual and aggregated features. The test set had the same number of original requests as the training set and 25k modified examples. XGBoost achieved the best metrics in all three categories, although the differences between most models are quite small.
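
As a reading aid for the two reported metrics, the sketch below computes accuracy and the F1-score averaged over the two classes for a toy set of labels. It is a minimal illustration only: the label encoding (1 for exfiltration, 0 for legitimate), the function names, and the unweighted (macro) averaging of the per-class F1-scores are assumptions made for the example, and the toy labels are not data from the study.

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: list, y_pred: list, cls: int) -> float:
    """F1-score when `cls` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class never occurs
    # in either the true or the predicted labels.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: list, y_pred: list, classes=(0, 1)) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (assumed encoding): 1 = exfiltration, 0 = legitimate request.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
```

Averaging the per-class scores without weighting keeps the much larger legitimate class from dominating the result, which matters when modified exfiltration requests are a small share of the test set.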

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Žiža, K., Tadić, P. & Vuletić, P. DNS exfiltration detection in the presence of adversarial attacks and modified exfiltrator behaviour. Int. J. Inf. Secur. 22, 1865–1880 (2023). https://doi.org/10.1007/s10207-023-00723-w

  • DOI: https://doi.org/10.1007/s10207-023-00723-w
