Abstract
Increases in the quantity and complexity of digital evidence necessitate the development and application of advanced, accurate and efficient digital forensic tools. Digital forensic tool testing helps assure the veracity of digital evidence, but it requires appropriate validation datasets. The datasets are crucial to evaluating reproducibility and improving the state of the art. Datasets can be real-world or synthetic. While real-world datasets have the advantage of relevance, the interpretation of results can be difficult because reliable ground truth may not exist. In contrast, ground truth is easily established for synthetic datasets.
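To make concrete why ground truth matters for tool testing, the following is a minimal sketch that scores a hypothetical forensic tool's output against a known set of artifacts using precision and recall. The `Score` class, the `score_tool` function and the artifact identifiers are illustrative assumptions. They are not part of hystck or of the chapter, and precision/recall is simply one common way to quantify agreement with ground truth.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Detection quality of a forensic tool relative to known ground truth."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Fraction of reported artifacts that really exist in the ground truth.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Fraction of ground-truth artifacts that the tool actually found.
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 0.0


def score_tool(ground_truth: set[str], reported: set[str]) -> Score:
    """Compare artifact identifiers reported by a tool with the ground truth."""
    tp = len(ground_truth & reported)
    return Score(
        true_positives=tp,
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )
```

For example, with ground truth `{"https://example.org/login", "mail:alice->bob", "file:/home/user/report.docx"}` and a tool that reports `{"https://example.org/login", "file:/home/user/report.docx", "file:/tmp/cache.bin"}`, `score_tool` finds two true positives, one false positive and one false negative, so precision and recall are both 2/3. Without reliable ground truth, as is often the case with real-world datasets, the false negatives could not be counted at all.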
This chapter presents the hystck framework for generating synthetic datasets with ground truth. The framework supports the automated generation of synthetic network traffic as well as operating system and application artifacts by simulating human-computer interactions. The generated data can be indistinguishable from data produced by normal human-computer interactions. Because the framework is modular, it can readily be extended to simulate new applications and generate new types of network traffic.
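The abstract does not describe hystck's internal interfaces. The following is only a minimal sketch, assuming a plugin-style design, of how a modular simulator can record ground truth while it drives simulated user actions. Every name here (`ApplicationModule`, `BrowserModule`, `Simulator`, `GroundTruthEvent`, the artifact strings) is hypothetical, and the browser module is a stub that logs expected traces instead of automating a real browser.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class GroundTruthEvent:
    """One simulated user action and the artifacts it is expected to leave."""
    timestamp: str
    application: str
    action: str
    expected_artifacts: list[str]


class ApplicationModule(ABC):
    """Hypothetical extension point: one subclass per simulated application."""
    name: str = "abstract"

    @abstractmethod
    def perform(self, action: str, **params: str) -> list[str]:
        """Drive the application (stubbed here) and return expected artifacts."""


class BrowserModule(ApplicationModule):
    name = "browser"

    def perform(self, action: str, **params: str) -> list[str]:
        if action == "visit":
            # A real module would automate a browser; this stub only
            # reports the traces such a visit would be expected to create.
            return [f"history:{params['url']}", f"dns:{params['host']}"]
        raise ValueError(f"unsupported browser action: {action}")


@dataclass
class Simulator:
    modules: dict[str, ApplicationModule] = field(default_factory=dict)
    log: list[GroundTruthEvent] = field(default_factory=list)

    def register(self, module: ApplicationModule) -> None:
        # New applications are added by registering another module,
        # without changing the simulator core.
        self.modules[module.name] = module

    def run(self, application: str, action: str, **params: str) -> None:
        # Perform the action and record what it should leave behind.
        artifacts = self.modules[application].perform(action, **params)
        self.log.append(GroundTruthEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            application=application,
            action=action,
            expected_artifacts=artifacts,
        ))

    def export_ground_truth(self) -> str:
        # Serialize the ground-truth log so it can accompany the dataset.
        return json.dumps([asdict(e) for e in self.log], indent=2)
```

Usage: `sim = Simulator()`, `sim.register(BrowserModule())`, then `sim.run("browser", "visit", url="https://example.org/", host="example.org")`; `sim.export_ground_truth()` returns a JSON list describing the visit and its expected history and DNS traces. Keeping the ground-truth log inside the simulator core, rather than in each module, means every new extension gets ground-truth recording for free.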
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Göbel, T., Schäfer, T., Hachenberger, J., Türr, J., Baier, H. (2020). A Novel Approach for Generating Synthetic Datasets for Digital Forensics. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_5
DOI: https://doi.org/10.1007/978-3-030-56223-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56222-9
Online ISBN: 978-3-030-56223-6
eBook Packages: Computer Science, Computer Science (R0)