International Journal of Information Security, Volume 18, Issue 6, pp 761–785

DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation

  • Muhammad Aamir
  • Syed Mustafa Ali Zaidi
Regular contribution


This paper applies an organized flow of feature engineering and machine learning to detect distributed denial-of-service (DDoS) attacks. Feature engineering focuses on obtaining datasets of different dimensions that retain significant features, using three feature selection methods: backward elimination, chi2, and information gain scores. Different supervised machine learning models are applied to the feature-engineered datasets to demonstrate how well the datasets adapt to machine learning when parameters are tuned optimally within given sets of values. The results show that substantial feature reduction is possible, making DDoS detection faster and more efficient with a minimal performance hit. The paper proposes a strategic-level framework that incorporates the necessary elements of feature engineering and machine learning with a defined flow of experimentation. The models are also validated with cross-validation and evaluated through area-under-curve analyses. The framework provides comprehensive solutions that can be trusted to avoid overfitting and collinearity problems in the data while detecting DDoS attacks. In the case study of DDoS datasets, the K-nearest neighbors algorithm exhibits the best overall performance, followed by the support vector machine, whereas low-dimensional datasets of discrete feature types perform better under the Random Forest model than high-dimensional datasets with numerical features. The accuracy scores of the dataset with the lowest number of features remain competitive with those of the other datasets under all machine learning models, leading to substantially reduced processing overhead. The experiments show that an approximately 68% reduction in the feature space is possible with an impact of only about 0.03% on accuracy.
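
As a rough illustration of the score-based feature selection named in the abstract, the sketch below ranks discrete features by a chi-squared statistic and by information gain, then keeps the top k. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the scores are computed, so the textbook Pearson chi-squared statistic on a contingency table and entropy-based information gain are used here. The function names, the feature names, and the eight-row toy dataset are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the label entropy conditioned on a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def chi2_score(feature, labels):
    """Pearson chi-squared statistic of the feature-versus-label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    score = 0.0
    for value, f_count in feature_counts.items():
        for label, l_count in label_counts.items():
            expected = f_count * l_count / total
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, score_fn, k):
    """Return the names of the k highest-scoring features, best first."""
    ranked = sorted(columns, key=lambda name: score_fn(columns[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = attack, 0 = benign. Not taken from the paper.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "syn_flag": [1, 1, 1, 1, 0, 0, 0, 0],   # separates the classes perfectly
    "small_pkt": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "odd_port": [1, 0, 1, 0, 1, 0, 1, 0],   # independent of the label
}

for score_fn in (chi2_score, information_gain):
    print(score_fn.__name__, select_top_k(columns, labels, score_fn, k=2))
```

On this toy data both scores agree on the ranking. On real traffic features they can disagree, which may be one reason to compare several selection methods, as the paper does.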


Cyber security, DDoS attacks, Denial-of-service, Feature engineering, Feature selection, Machine learning, Neural network


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Shaheed Zulfikar Ali Bhutto Institute of Science & Technology (SZABIST), Karachi, Pakistan
