Abstract
This paper applies an organized flow of feature engineering and machine learning to detect distributed denial-of-service (DDoS) attacks. Feature engineering has a focus to obtain the datasets of different dimensions with significant features, using feature selection methods of backward elimination, chi2, and information gain scores. Different supervised machine learning models are applied on the feature-engineered datasets to demonstrate the adaptability of datasets for machine learning under optimal tuning of parameters within given sets of values. The results show that substantial feature reduction is possible to make DDoS detection faster and optimized with minimal performance hit. The paper proposes a strategic-level framework which incorporates the necessary elements of feature engineering and machine learning with a defined flow of experimentation. The models are also validated with cross-validation and evaluated for area-under-curve analyses. It provides comprehensive solutions which can be trusted to avoid the overfitting and collinearity problems of data while detecting DDoS attacks. In the case study of DDoS datasets, K-nearest neighbors algorithm overall exhibits the best performance followed by support vector machine, whereas low-dimensional datasets of discrete feature types perform better under the Random Forest model as compared to high dimensions with numerical features. The accuracy scores of dataset with the lowest number of features remain competitive with other datasets under all machine learning models, leading to a substantially reduced processing overhead. The experiments show that approximately 68% reduction in the feature space is possible with an impact of only about 0.03% on accuracy.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
The wrapper method applies a regressor on the identified feature and validates its significance with accuracies of regression. Another approach is the filter method which directly selects the features based on statistical scores without any validation approach of regression.
The data curves are supposed to be normal distributions where sample size > 30. As the population mean is not known and we are only dealing with samples, the t-statistic test is applied
Epoch is a round of completion when all the records of a dataset have been fed to the neural network. If epoch No. 1 is just completed, all the records will again be fed in epoch No. 2 and so on. It is not necessary that all the records are fed simultaneously or sequentially in one batch for an epoch. This part is driven by the batch size parameter which is configurable for ANN.
Making a change in kernel according to the underlying data is termed as ‘kernel trick.’
CART is one of the decision tree algorithms. There is a bunch of others including ID3, C4.5, MARS (multivariate adaptive regression splines) etc.
References
Mitrokotsa, A., Douligeris, C.: Denial of Service Attacks, Network Security: Current Status and Future Directions, pp. 117–134. Wiley, Hoboken (2006)
Zhang, L., Yu, S., Wu, D., Watters, P.: A survey on latest botnet attack and defense. In: 10th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, pp. 53–60 (2011)
State of the Internet Security—Q4 2017, Report from Akamai, 4(4), (2018)
Nagesh, K., Sumathy, R., Devakumar, P., Sathiyamurthy, K.: A survey on denial of service attacks and preclusions. In: International conference on informatics and analytics, p. 118 (2016)
KDD Cup 1999 Dataset. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
CAIDA DDoS Attack 2007 Dataset. http://www.caida.org/data/passive/ddos-20070804_dataset.xml
CAIDA Anonymized Internet Traces 2008 Dataset. http://www.caida.org/data/passive/passive_2008_dataset.xml
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: Symposium on Computational Intelligence for Security and Defense Applications (CISDA), IEEE, pp. 1–6 (2009)
ISOT Botnet Dataset. https://www.uvic.ca/engineering/ece/isot/datasets/index.php
The Honeynet Project. http://www.honeynet.org/chapters/france
Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31(3), 357–374 (2012)
Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military Communications and Information Systems Conference (MilCIS), pp. 1–6 (2015)
Gao, Y., Feng, Y., Kawamoto, J., Sakurai, K.: A machine learning based approach for detecting DRDoS attacks and its performance evaluation. In: 11th Asia Joint Conference on Information Security (AsiaJCIS), pp. 80–86 (2016)
Singh, N.A., Singh, K.J., De, T.: Distributed denial of service attack detection using Naive Bayes classifier through info gain feature selection. In: International Conference on Informatics and Analytics, p. 54 (2016)
Azab, A., Alazab, M., Aiash, M.: Machine learning based botnet identification traffic. In: Trustcom/BigDataSE/I SPA, IEEE, pp. 1788–1794 (2016)
Yusof, A.R., Udzir, N.I., Selamat, A., Hamdan, H., Abdullah, M.T.: Adaptive feature selection for denial of services (DoS) attack. In: IEEE Conference on Application, Information and Network Security (AINS), IEEE, pp. 81–84 (2017)
Singh, K.J., De, T.: Efficient classification of DDoS attacks using an ensemble feature selection algorithm. J. Intell. Syst (2017). https://doi.org/10.1515/jisys-2017-0472
Khan, S., Gani, A., Wahab, A.W.A., Singh, P.K.: Feature selection of Denial-of-Service attacks using entropy and granular computing. Arab. J. Sci. Eng. 43(2), 499–508 (2018)
Alejandre, F.V., Corts, N.C., Anaya, E.A.: Feature selection to detect botnets using machine learning algorithms. In: International Conference on Electronics, Communications and Computers (CONIELECOMP), pp. 1–7 (2017)
Al-Hawawreh, M.S.: SYN flood attack detection in cloud environment based on TCP/IP header statistical features. In: 8th International Conference on Information Technology (ICIT), pp. 236–243 (2017)
Li, J., Liu, Y., Gu, L.: DDoS attack detection based on neural network. In: 2nd International Symposium on Aware Computing (ISAC), pp. 196–199 (2010)
Agrawal, P.K., Gupta, B.B., Jain, S., Pattanshetti, M.K.: Estimating Strength of a DDoS Attack in Real Time Using ANN Based Scheme, Computer Networks and Intelligent Computing, pp. 301–310. Springer, Berlin (2011)
Gupta, B.B., Joshi, R.C., Misra, M., Jain, A., Juyal, S., Prabhakar, R., Singh, A.K.: Predicting Number of Zombies in a DDoS Attack Using ANN Based Scheme, Information Technology and Mobile Communication, pp. 117–122. Springer, Berlin (2011)
Bansal, A., Mahapatra, S.: A comparative analysis of machine learning techniques for botnet detection. In: 10th International Conference on Security of Information and Networks, pp. 91–98 (2017)
Lu, L., Feng, Y., Sakurai, K.: C&C session detection using random forest. In: 11th International Conference on Ubiquitous Information Management and Communication, p. 34 (2017)
Zekri, M., El Kafhali, S., Aboutabit, N., Saadi, Y.: DDoS attack detection using machine learning techniques in cloud computing environments. In: 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), pp. 1–7 (2017)
Yuan, X., Li, C., Li, X.: DeepDefense: identifying DDoS attack via deep learning. In: International Conference on Smart Computing (SMARTCOMP), IEEE, pp. 1–8 (2017)
Alkasassbeh, M., Al-Naymat, G., Hassanat, A.B., Almseidin, M.: Detecting distributed denial of service attacks using data mining techniques. Int. J. Adv. Comput. Sci. Appl. 7(1), 436–445 (2016)
Singh, K., Singh, P., Kumar, K.: Application layer HTTP-GET flood DDoS attacks: research landscape and challenges. Comput. Secur. 65, 344–372 (2017)
Tripathi, N., Hubballi, N.: Slow rate denial of service attacks against HTTP/2 and detection. Comput. Secur. 72, 255–272 (2018)
Jonker, M., King, A., Krupp, J., Rossow, C., Sperotto, A., Dainotti, A.: Millions of targets under attack: a macroscopic characterization of the DoS ecosystem. In: Internet Measurement Conference, pp. 100–113 (2017)
Aamir, M., Zaidi, M.A.: A survey on DDoS attack and defense strategies: from traditional schemes to current techniques. Interdiscip. Inf. Sci. 19(2), 173–200 (2013)
Shakeel, F., Sabhitha, A.S., Sharma, S.: Exploratory review on class imbalance problem: an overview. In: 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–8 (2017)
Idhammad, M., Afdel, K., Belouch, M.: Semi-supervised machine learning approach for DDoS detection. Appl. Intell. 48, 1–16 (2018)
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)
Miller, S., Busby-Earle, C.: The role of machine learning in botnet detection. In: 11th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 359–364 (2016)
Kirubavathi, G., Anitha, R.: Botnet detection via mining of traffic flow characteristics. Comput. Electr. Eng. 50, 91–101 (2016)
Osanaiye, O., Choo, K.-K.R., Dlodlo, M.: Analysing feature selection and classification techniques for DDoS detection in cloud. In: Proceedings of Southern Africa Telecommunication (2016)
Larose, D.T., Larose, C.D.: k-Nearest neighbor algorithm. Discovering Knowledge in Data: an Introduction to Data Mining, 2nd edn, pp. 149–164. John Wiley & Sons (2014)
Wu, X., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
Suthaharan, S.: Support Vector Machine, Machine Learning Models and Algorithms for Big Data Classification, pp. 207–235. Springer, Berlin (2016)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015). http://neuralnetworksanddeeplearning.com/
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
scikit-learn: Data science library for Python. https://pypi.org/project/scikit-learn/
TensorFlow: Open source ML platform. https://www.tensorflow.org/
Loh, W.-Y.: Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1(1), 14–23 (2011)
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30(7), 1145–1159 (1997)
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Aamir, M., Zaidi, S.M.A. DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation. Int. J. Inf. Secur. 18, 761–785 (2019). https://doi.org/10.1007/s10207-019-00434-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10207-019-00434-1