Abstract
As the volume of potential digital evidence grows, digital forensic practitioners must decide how best to allocate their limited resources. Automation will continue to mitigate this problem in part, but the preliminary question of which media should be examined, whether by a human or a machine, remains largely unsolved. This chapter describes and validates a methodology for assessing digital media similarity to support digital media triage decisions. The methodology rests on the premise that unexamined media are likely to be relevant or interesting to a practitioner if they are similar to media previously determined to be relevant or interesting. It builds on prior work on sector hashing and the Jaccard index of similarity, combining the two methods in a novel manner. The accuracy of the resulting methodology is demonstrated using a collection of hard drive images with known ground truth. The work goes beyond matching files and file fragments of interest: it assesses the overall similarity of digital media to identify systems that might share applications and thus be related, even if common files of interest are encrypted, deleted or otherwise unavailable. In addition to informing triage decisions, digital media similarity may be used to infer links and associations between disparate entities.
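The abstract names two building blocks, sector hashing and the Jaccard index, without stating how the chapter combines them. As a rough illustration only, the sketch below hashes fixed-size sectors of two byte strings and computes the plain Jaccard index |A ∩ B| / |A ∪ B| over the resulting hash sets. The 512-byte sector size, the choice of MD5, and the function names sector_hashes and jaccard are assumptions made for this example, not details taken from the chapter, and the chapter's actual similarity measure may differ.

```python
import hashlib

# Assumed sector size; the abstract does not specify one.
SECTOR_SIZE = 512


def sector_hashes(data, sector_size=SECTOR_SIZE):
    """Return the set of MD5 hex digests of each fixed-size sector in data.

    MD5 is an illustrative choice; a trailing partial sector is hashed as-is.
    """
    hashes = set()
    for offset in range(0, len(data), sector_size):
        sector = data[offset:offset + sector_size]
        hashes.add(hashlib.md5(sector).hexdigest())
    return hashes


def jaccard(a, b):
    """Return the Jaccard index |a & b| / |a | b| of two sets.

    Two empty sets are treated as having similarity 0.0 (a convention
    chosen for this sketch).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

For example, two three-sector images that share two distinct sectors have four distinct sector hashes between them, so jaccard(sector_hashes(image_a), sector_hashes(image_b)) would be 2/4. In practice a disk image would be read in chunks rather than held in memory, and frequently occurring sectors (such as all-zero sectors) would likely need special handling; both are outside the scope of this sketch.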
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Lim, M., Jones, J. (2020). A Digital Media Similarity Measure for Triage of Digital Forensic Evidence. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_7
DOI: https://doi.org/10.1007/978-3-030-56223-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56222-9
Online ISBN: 978-3-030-56223-6
eBook Packages: Computer Science, Computer Science (R0)