A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets

  • Monowar H. Bhuyan
  • Dhruba K. Bhattacharyya
  • Jugal K. Kalita
Part of the Computer Communications and Networks book series (CCN)


To evaluate a network anomaly detection or prevention, it is essential to test using benchmark network traffic datasets. This chapter aims to provide a systematic hands-on approach to generate real-life intrusion dataset. It is organized in three major sections. Section 3.1 provides the basic concepts. Section 3.2 introduces several benchmark and real-life datasets. Finally, Sect. 3.3 provides a systematic approach toward generation of an unbiased real-life intrusion datasets. We establish the importance of intrusion datasets in the development and validation of a detection mechanism or a system, identify a set of requirements for effective dataset generation, and discuss several attack scenarios.


  1. 1.
    Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: RODD: an effective reference-based outlier detection technique for large datasets. In: Advanced Computing, vol. 133, pp. 76–84. Springer (2011)Google Scholar
  2. 2.
    Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Surveying port scans and their detection methodologies. Comp. J. 54(10), 1565–1581 (2011)CrossRefGoogle Scholar
  3. 3.
    CACE Technologies: Winpcap.
  4. 4.
    CAIDA: The cooperative analysis for internet data analysis. (2011)
  5. 5.
    Cemerlic, A., Yang, L., Kizza, J.: Network intrusion detection based on Bayesian networks. In: Proceedings of the 20th International Conference on Software Engineering and Knowledge Engineering, SEKE’08, pp. 791–794. KSI, San Francisco (2008)Google Scholar
  6. 6.
    Cole, E.: Hackers Beware: Defending Your Network from the Wiley Hacker. New Riders Publishing, Thousand Oaks (2001)Google Scholar
  7. 7.
    Dainotti, A., Pescape, A.: Plab: a packet capture and analysis architecture (2004). Http:// Google Scholar
  8. 8.
    Defcon: The Shmoo group. (2011)
  9. 9.
    Delooze, L.: Applying soft-computing techniques to intrusion detection. Ph.D. thesis, Computer Science Department, University of Colorado, Colorado Springs (2005)Google Scholar
  10. 10.
    Denning, D.E.: An intrusion-detection model. IEEE Trans. Softw. Eng. 13(2), 222–232 (1987)CrossRefGoogle Scholar
  11. 11.
    Ghorbani, A.A., Lu, W., Tavallaee, M.: Network Intrusion Detection and Prevention: Concepts and Techniques. Advances in Information Security. Springer, US (2009)Google Scholar
  12. 12.
    Gogoi, P., Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Packet and flow-based network intrusion dataset. In: Proc. of the 5th International Conference on Contemporary Computing, vol. LNCS-CCIS 306, pp. 322–334. Springer (2012)Google Scholar
  13. 13.
    Jacobson, V., Leres, C., McCanne, S.: tcpdump. URL
  14. 14.
    KDDcup99: Knowledge discovery in databases DARPA archive. (1999)
  15. 15.
    Kendall, K.: A database of computer attacks for the evaluation of intrusion detection systems. Master’s thesis, MIT (1999)Google Scholar
  16. 16.
    Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., Srivastava, J.: A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the 3rd SIAM International Conference on Data mining. SIAM (2003)Google Scholar
  17. 17.
    LBNL: Lawrence Berkeley National Laboratory and ICSI, LBNL/ICSI Enterprise Tracing Project. (2005)
  18. 18.
    Lippmann, R.P., Fried, D.J., Graf, I., Haines, J.W., Kendall, K.R., McClung, D., Weber, D., Webster, S.E., Wyschogrod, D., Cunningham, R.K., Zissman, M.A.: Evaluating intrusion detection systems: the 1998 DARPA offline intrusion detection evaluation. In: DARPA Information Survivability Conference and Exposition, vol. 2, pp. 12–26 (2000)Google Scholar
  19. 19.
    Mahoney, M.V., Chan, P.K.: An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection. In: Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection, pp. 220–237. Springer (2003)Google Scholar
  20. 20.
    McCanne, S., Jacobson, V.: The BSD packet filter: a new architecture for user level packet capture. In: Proceedings of the Winter 1993 USENIX Conference, pp. 259–269. USENIX Association (1993)Google Scholar
  21. 21.
    McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans. Inf. Syst. Secur. 3(4), 262–294 (2000)CrossRefGoogle Scholar
  22. 22.
    Mell, P., Hu, V., Lippmann, R., Haines, J., Zissman, M.: An overview of issues in testing intrusion detection systems. (2003)
  23. 23.
    MIT Lincoln Lab, Information Systems Technology Group: DARPA intrusion detection data sets. (2000)
  24. 24.
    Muda, Z., Yassin, W., Sulaiman, M.N., Udzir, N.I.: A K-means and naive-bayes learning approach for better intrusion detection. Inf. Technol. J. 10(3), 648–655 (2011)CrossRefGoogle Scholar
  25. 25.
    NSL-KDD: NSL-KDD data set for network-based intrusion detection systems. (2009)
  26. 26.
    Otey, M.E., Ghoting, A., Parthasarathy, S.: Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Disc. 12(2–3), 203–228 (2006)MathSciNetCrossRefGoogle Scholar
  27. 27.
    Pang, R., Allman, M., Bennett, M., Lee, J., Paxson, V., Tierney, B.: A first look at modern enterprise traffic. In: Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, pp. 2–2. USENIX Association, Berkeley (2005)Google Scholar
  28. 28.
    Pang, R., Allman, M., Paxson, V., Lee, J.: The devil and packet trace anonymization. SIGCOMM Comput. Commun. Rev. 36(1), 29–38 (2006)CrossRefGoogle Scholar
  29. 29.
    Portnoy, L., Eskin, E., Stolfo, S.: Intrusion detection with unlabeled data using clustering. In: Proceedings of the ACM CSS Workshop on on Data Mining Applied to Security, Philadelphia, pp. 5–8 (2001)Google Scholar
  30. 30.
    Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Towards developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31(3), 357–374 (2012)CrossRefGoogle Scholar
  31. 31.
    Song, J., Takakura, H., Okabe, Y.: Description of Kyoto University Benchmark Data. (2006)
  32. 32.
    Song, J., Takakura, H., Okabe, Y., Nakao, K.: Toward a more practical unsupervised anomaly detection system. Inf. Sci. 231 (2011).
  33. 33.
    Sperotto, A., Sadre, R., Vliet, F., Pras, A.: A labeled data set for flow-based intrusion detection. In: Proceedings of the 9th IEEE International Workshop on IP Operations and Management, IPOM ’09, pp. 39–50. Springer, Venice (2009)Google Scholar
  34. 34.
    Stolfo, S.J., Fan, W., Lee, W., Prodromidis, A., Chan, P.K.: Cost-based modeling for fraud and intrusion detection: results from the JAM project. In: Proceedings of the DARPA Information Survivability Conference and Exposition, vol. 2, pp. 130–144. IEEE CS (2000)Google Scholar
  35. 35. Symantec security response.
  36. 36.
    Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: Proceedings of the 2nd IEEE International Conference on Computational Intelligence for Security and Defense Applications, pp. 53–58. IEEE Press (2009)Google Scholar
  37. 37.
    Thomas, C., Sharma, V., Balakrishnan, N.: Usefulness of DARPA dataset for intrusion detection system evaluation. In: Proceedings of the Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, 6973. SPIE, Orlando (2008)Google Scholar
  38. 38.
    UNIBS: University of Brescia Dataset. (2009)
  39. 39.
    Xu, J., Shelton, C.R.: Intrusion detection using continuous time Bayesian networks. J. Artif. Intell. Res. 39, 745–774 (2010)MathSciNetzbMATHGoogle Scholar
  40. 40.
    Zhang, C., Zhang, G., Sun, S.: A mixed unsupervised clustering-based intrusion detection model. In: Proceedings of the 3rd International Conference on Genetic and Evolutionary Computing, pp. 426–428. IEEE CS (2009)Google Scholar
  41. 41.
    Zhang, Y.F., Xiong, Z.Y., Wang, X.Q.: Distributed intrusion detection based on clustering. In: Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 4, pp. 2379–2383 (2005)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Monowar H. Bhuyan
    • 1
  • Dhruba K. Bhattacharyya
    • 2
  • Jugal K. Kalita
    • 3
  1. 1.Kaziranga UniversityJorhatIndia
  2. 2.Tezpur UniversityNapaamIndia
  3. 3.University of ColoradoColorado SpringsUSA

Personalised recommendations