A Digital Media Similarity Measure for Triage of Digital Forensic Evidence

  • Conference paper
Advances in Digital Forensics XVI (DigitalForensics 2020)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 589)

Abstract

As the volume of potential digital evidence increases, digital forensic practitioners are challenged to determine the best allocation of their limited resources. While automation will continue to partially mitigate this problem, the preliminary question about which media should be examined by human or machine remains largely unsolved. This chapter describes and validates a methodology for assessing digital media similarity to assist with digital media triage decisions. The application of the methodology is predicated on the idea that unexamined media is likely to be relevant or interesting to a practitioner if the media is similar to other media that were previously determined to be relevant or interesting. The methodology builds on prior work using sector hashing and the Jaccard index of similarity. These two methods are combined in a novel manner and the accuracy of the resulting methodology is demonstrated using a collection of hard drive images with known ground truth. The work goes beyond interesting file and file fragment matching. Specifically, it assesses the overall similarity of digital media to identify systems that might share applications and thus be related, even if common files of interest are encrypted, deleted or otherwise unavailable. In addition to triage decisions, digital media similarity may be used to infer links and associations between disparate entities.
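As a rough illustration of how sector hashing and the Jaccard index can be combined, the following minimal sketch hashes fixed-size sectors of two images and compares the resulting hash sets. It is not the authors' implementation: the 512-byte sector size, the SHA-256 hash, the decision to skip all-zero sectors, and the function names `sector_hashes` and `jaccard` are all assumptions made for this example.

```python
import hashlib

SECTOR_SIZE = 512  # assumed sector size in bytes


def sector_hashes(image, sector_size=SECTOR_SIZE):
    """Return the set of SHA-256 digests of the sectors in a raw image (bytes)."""
    hashes = set()
    for offset in range(0, len(image), sector_size):
        # A trailing partial sector, if any, is hashed as-is.
        sector = image[offset:offset + sector_size]
        # Assumption: all-zero sectors carry no information, so skip them.
        if sector.count(0) == len(sector):
            continue
        hashes.add(hashlib.sha256(sector).digest())
    return hashes


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_sector(value):
    """Hypothetical helper: build one sector filled with a single byte value."""
    return bytes([value]) * SECTOR_SIZE


# Two toy "images" that share two of their three sectors.
image_a = make_sector(1) + make_sector(2) + make_sector(3)
image_b = make_sector(2) + make_sector(3) + make_sector(4)

similarity = jaccard(sector_hashes(image_a), sector_hashes(image_b))
print(similarity)
```

The two toy images share two distinct sectors out of four overall, so the printed similarity is 0.5. Because the comparison uses sets, the measure ignores both where a sector sits on the media and how often it repeats.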

Author information

Corresponding author

Correspondence to Myeong Lim.

Copyright information

© 2020 IFIP International Federation for Information Processing

About this paper

Cite this paper

Lim, M., Jones, J. (2020). A Digital Media Similarity Measure for Triage of Digital Forensic Evidence. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_7

  • DOI: https://doi.org/10.1007/978-3-030-56223-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-56222-9

  • Online ISBN: 978-3-030-56223-6

  • eBook Packages: Computer Science (R0)
