A Novel Approach for Generating Synthetic Datasets for Digital Forensics

  • Conference paper
Advances in Digital Forensics XVI (DigitalForensics 2020)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 589)

Abstract

Increases in the quantity and complexity of digital evidence necessitate the development and application of advanced, accurate and efficient digital forensic tools. Digital forensic tool testing helps assure the veracity of digital evidence, but it requires appropriate validation datasets. The datasets are crucial to evaluating reproducibility and improving the state of the art. Datasets can be real-world or synthetic. While real-world datasets have the advantage of relevance, the interpretation of results can be difficult because reliable ground truth may not exist. In contrast, ground truth is easily established for synthetic datasets.

This chapter presents the hystck framework for generating synthetic datasets with ground truth. The framework supports the automated generation of synthetic network traffic and operating system and application artifacts by simulating human-computer interactions. The generated data can be indistinguishable from data generated by normal human-computer interactions. The modular structure of the framework enhances the ability to incorporate extensions that simulate new applications and generate new types of network traffic.



Author information

Corresponding author

Correspondence to Thomas Göbel.

Copyright information

© 2020 IFIP International Federation for Information Processing

About this paper

Cite this paper

Göbel, T., Schäfer, T., Hachenberger, J., Türr, J., Baier, H. (2020). A Novel Approach for Generating Synthetic Datasets for Digital Forensics. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_5

  • DOI: https://doi.org/10.1007/978-3-030-56223-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-56222-9

  • Online ISBN: 978-3-030-56223-6

  • eBook Packages: Computer Science, Computer Science (R0)
