A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets

Network Traffic Anomaly Detection and Prevention

Abstract

To evaluate a network anomaly detection or prevention system, it is essential to test it using benchmark network traffic datasets. This chapter aims to provide a systematic hands-on approach to generating real-life intrusion datasets. It is organized into three major sections. Section 3.1 provides the basic concepts. Section 3.2 introduces several benchmark and real-life datasets. Finally, Sect. 3.3 provides a systematic approach toward the generation of unbiased real-life intrusion datasets. We establish the importance of intrusion datasets in the development and validation of a detection mechanism or system, identify a set of requirements for effective dataset generation, and discuss several attack scenarios.

Notes

  1. http://www.iscx.ca/NSL-KDD/

  2. http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html

  3. http://cctf.shmoo.com/data/

  4. http://www.caida.org/home/

  5. http://www.takakura.com/kyoto_data

  6. http://agnigarh.tezu.ernet.in/~dkb/resources.html

  7. A demilitarized zone (DMZ) is a network segment located between a secure local network and insecure external networks (the Internet). A DMZ usually contains hardened servers that provide services to users on the external network, such as Web, mail, and DNS servers. Typically, two firewalls are installed to form the DMZ.

  8. http://packetstormsecurity.com/

  9. http://nmap.org/

  10. http://rnmap.sourceforge.net/

  11. http://www.securitytube-tools.net/

  12. http://sourceforge.net/projects/loic/

  13. http://staff.washington.edu/corey/gulp/

  14. http://www.packet-craft.net/

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K. (2017). A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets. In: Network Traffic Anomaly Detection and Prevention. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-65188-0_3

  • DOI: https://doi.org/10.1007/978-3-319-65188-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65186-6

  • Online ISBN: 978-3-319-65188-0

  • eBook Packages: Computer Science (R0)
