A Novel Approach for Generating Synthetic Datasets for Digital Forensics

Göbel, Thomas; Schäfer, Thomas; Hachenberger, Julien; Türr, Jan; Baier, Harald

doi:10.1007/978-3-030-56223-6_5

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 589)

Included in the following conference series:

IFIP International Conference on Digital Forensics

Abstract

Increases in the quantity and complexity of digital evidence necessitate the development and application of advanced, accurate and efficient digital forensic tools. Digital forensic tool testing helps assure the veracity of digital evidence, but it requires appropriate validation datasets. The datasets are crucial to evaluating reproducibility and improving the state of the art. Datasets can be real-world or synthetic. While real-world datasets have the advantage of relevance, the interpretation of results can be difficult because reliable ground truth may not exist. In contrast, ground truth is easily established for synthetic datasets.

This chapter presents the hystck framework for generating synthetic datasets with ground truth. The framework supports the automated generation of synthetic network traffic, as well as operating system and application artifacts, by simulating human-computer interactions. The generated data can be indistinguishable from data resulting from normal human-computer interactions. The modular structure of the framework simplifies the incorporation of extensions that simulate new applications and generate new types of network traffic.

Author information

Authors and Affiliations

Darmstadt University of Applied Sciences, Darmstadt, Germany
Thomas Göbel & Harald Baier
National Research Center for Applied Cybersecurity, Darmstadt, Germany
Thomas Göbel, Thomas Schäfer & Harald Baier
Fraunhofer Institute for Secure Information Technology, Darmstadt, Germany
Julien Hachenberger
Technical University Darmstadt, Darmstadt, Germany
Jan Türr

Corresponding author

Correspondence to Thomas Göbel.

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH, USA
Gilbert Peterson
Tandy School of Computer Science, University of Tulsa, Tulsa, OK, USA
Sujeet Shenoi

About this paper

Cite this paper

Göbel, T., Schäfer, T., Hachenberger, J., Türr, J., Baier, H. (2020). A Novel Approach for Generating Synthetic Datasets for Digital Forensics. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_5

DOI: https://doi.org/10.1007/978-3-030-56223-6_5
Published: 06 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56222-9
Online ISBN: 978-3-030-56223-6
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Federation for Information Processing