Abstract
To evaluate a network anomaly detection or prevention system, it is essential to test it using benchmark network traffic datasets. This chapter aims to provide a systematic hands-on approach to generating real-life intrusion datasets. It is organized in three major sections. Section 3.1 provides the basic concepts. Section 3.2 introduces several benchmark and real-life datasets. Finally, Section 3.3 provides a systematic approach toward the generation of unbiased real-life intrusion datasets. We establish the importance of intrusion datasets in the development and validation of a detection mechanism or system, identify a set of requirements for effective dataset generation, and discuss several attack scenarios.
Notes
- 7. A demilitarized zone (DMZ) is a network segment located between a secure local network and insecure external networks such as the Internet. A DMZ usually contains hardened servers that provide services to users on the external network, such as Web, mail, and DNS servers. Typically, two firewalls are installed to form the DMZ.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K. (2017). A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets. In: Network Traffic Anomaly Detection and Prevention. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-65188-0_3
DOI: https://doi.org/10.1007/978-3-319-65188-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65186-6
Online ISBN: 978-3-319-65188-0
eBook Packages: Computer Science, Computer Science (R0)