A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets

Bhuyan, Monowar H.; Bhattacharyya, Dhruba K.; Kalita, Jugal K.

doi:10.1007/978-3-319-65188-0_3

Part of the book series: Computer Communications and Networks (CCN)

Abstract

To evaluate a network anomaly detection or prevention system, it is essential to test it on benchmark network traffic datasets. This chapter provides a systematic hands-on approach to generating real-life intrusion datasets. It is organized in three major sections. Section 3.1 provides the basic concepts. Section 3.2 introduces several benchmark and real-life datasets. Finally, Section 3.3 provides a systematic approach to generating unbiased real-life intrusion datasets. We establish the importance of intrusion datasets in the development and validation of a detection mechanism or system, identify a set of requirements for effective dataset generation, and discuss several attack scenarios.

Notes

1. http://www.iscx.ca/NSL-KDD/
2. http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html
3. http://cctf.shmoo.com/data/
4. http://www.caida.org/home/
5. http://www.takakura.com/kyoto_data
6. http://agnigarh.tezu.ernet.in/~dkb/resources.html
7. A demilitarized zone (DMZ) is a network segment located between a secure local network and insecure external networks (the Internet). A DMZ usually contains hardened servers that provide services to users on the external network, such as Web, mail, and DNS servers. Typically, two firewalls are installed to form the DMZ.
8. http://packetstormsecurity.com/
9. http://nmap.org/
10. http://rnmap.sourceforge.net/
11. http://www.securitytube-tools.net/
12. http://sourceforge.net/projects/loic/
13. http://staff.washington.edu/corey/gulp/
14. http://www.packet-craft.net/

Author information

Authors and Affiliations

Monowar H. Bhuyan: Kaziranga University, Jorhat, India
Dhruba K. Bhattacharyya: Tezpur University, Napaam, India
Jugal K. Kalita: University of Colorado, Colorado Springs, Colorado, USA

About this chapter

Cite this chapter

Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K. (2017). A Systematic Hands-On Approach to Generate Real-Life Intrusion Datasets. In: Network Traffic Anomaly Detection and Prevention. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-65188-0_3

DOI: https://doi.org/10.1007/978-3-319-65188-0_3
Published: 05 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65186-6
Online ISBN: 978-3-319-65188-0
eBook Packages: Computer Science, Computer Science (R0)
