Abstract
Digital forensic investigators frequently have to search for relevant files in massive digital corpora, a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only detects files whose hash values (fingerprints) exactly match those of known files. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images, of Windows XP, Windows 7 and Ubuntu 12.04 systems, to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification; in one case, it increased the identification rate from 1.82% to 23.76%.
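The contrast between exact and approximate matching can be illustrated with a small sketch. It flips a single bit in a file-like byte string, then compares the original and the modified version twice: once by SHA-256 digest (exact matching) and once by a toy resemblance score, the Jaccard similarity of the two files' sets of overlapping 8-byte substrings. The helper names (`sha256_hex`, `shingles`, `resemblance`), the shingle length of 8 and the sample data are assumptions chosen for illustration; this is not presented as the approximate matching method used in the paper's experiments, and practical tools store a compact digest rather than full substring sets.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Exact matching: identical files give identical SHA-256 digests."""
    return hashlib.sha256(data).hexdigest()


def shingles(data: bytes, k: int = 8) -> set[bytes]:
    """All overlapping k-byte substrings of the input (toy feature set)."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def resemblance(a: bytes, b: bytes, k: int = 8) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two inputs shorter than k are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical "known file" and a copy of it with one flipped bit.
known = b"The quick brown fox jumps over the lazy dog. " * 20
modified = bytearray(known)
modified[100] ^= 0x01  # changes a single byte ('b' becomes 'c')
suspect = bytes(modified)

exact_match = sha256_hex(known) == sha256_hex(suspect)
score = resemblance(known, suspect)

print("exact match:", exact_match)  # False: the digests differ completely
print("resemblance:", round(score, 2))  # still close to 1.0
```

Exact hashing reports no match at all, whereas the similarity score stays high because only the few substrings that overlap the altered byte differ, which is the property that lets approximate matching recognize modified versions of known files.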
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Breitinger, F., Winter, C., Yannikos, Y., Fink, T., Seefried, M. (2014). Using Approximate Matching to Reduce the Volume of Digital Data. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics X. DigitalForensics 2014. IFIP Advances in Information and Communication Technology, vol 433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44952-3_11
DOI: https://doi.org/10.1007/978-3-662-44952-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44951-6
Online ISBN: 978-3-662-44952-3
eBook Packages: Computer Science, Computer Science (R0)