A Detailed Analysis of the CICIDS2017 Data Set

  • Iman Sharafaldin
  • Arash Habibi Lashkari (Email author)
  • Ali A. Ghorbani
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 977)

Abstract

With the exponential growth of computer networks and the Internet, the likelihood of suffering damage from attacks has risen accordingly. Intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are among the most important defensive tools against increasingly sophisticated and frequent network attacks. Anomaly-based intrusion detection research, however, suffers from inaccurate deployment, analysis, and evaluation due to the lack of adequate datasets. Researchers have used a number of datasets, such as DARPA98, KDD99, ISCX2012, and ADFA13, to evaluate the performance of their proposed intrusion detection and prevention approaches. Based on our study of 16 datasets published since 1998, many are out of date and unreliable, exhibiting various shortcomings: a lack of traffic diversity and volume, incomplete attack coverage, anonymized packet information and payloads that do not reflect current reality, or missing feature sets and metadata. This paper focuses on CICIDS2017, the most recently updated IDS dataset, which contains benign traffic and seven common attack types as network flows, meets real-world criteria, and is publicly available. It also evaluates the effectiveness of a set of network traffic features and machine learning algorithms in order to identify the best feature set for detecting each attack category. Furthermore, we define the concept of superfeatures: high-quality derived features produced by a dimension-reduction algorithm. We show that random forest, one of our best-performing algorithms, achieves better results with superfeatures than with the top selected features.
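The comparison described above can be sketched in scikit-learn. This is a minimal illustration, not the paper's actual pipeline: synthetic data stands in for the CICIDS2017 flows, univariate F-scores stand in for the paper's feature-selection step, and PCA stands in for whatever dimension-reduction algorithm produces the superfeatures.

```python
# Sketch: random forest trained on top-ranked raw features vs. on
# dimension-reduction-derived ("superfeature"-style) components.
# Synthetic data is a stand-in for labeled CICIDS2017 flow records.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Baseline: keep the top-k raw features by univariate ANOVA F-score.
sel = SelectKBest(f_classif, k=8).fit(X_tr, y_tr)
rf_top = RandomForestClassifier(n_estimators=100, random_state=0)
rf_top.fit(sel.transform(X_tr), y_tr)
f1_top = f1_score(y_te, rf_top.predict(sel.transform(X_te)))

# "Superfeatures": derived components from a dimension-reduction
# step (PCA here, as a placeholder for the paper's algorithm).
pca = PCA(n_components=8, random_state=0).fit(X_tr)
rf_super = RandomForestClassifier(n_estimators=100, random_state=0)
rf_super.fit(pca.transform(X_tr), y_tr)
f1_super = f1_score(y_te, rf_super.predict(pca.transform(X_te)))

print(f"F1, top selected features: {f1_top:.3f}")
print(f"F1, derived features:      {f1_super:.3f}")
```

Whether the derived features win depends on the data and the reduction method; the point of the sketch is only the evaluation protocol (fit the selector/reducer on training data, transform both splits, compare a common metric).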

Keywords

Intrusion detection · IDS dataset · DoS · Web attack · Infiltration · Brute force · Superfeature

Notes

Acknowledgements

The authors acknowledge the generous funding from the Atlantic Canada Opportunities Agency (ACOA) through the Atlantic Innovation Fund (AIF) and through grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Dr. Ghorbani.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Iman Sharafaldin¹
  • Arash Habibi Lashkari¹ (Email author)
  • Ali A. Ghorbani¹

  1. Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Fredericton, Canada