IFIP International Conference on Digital Forensics

DigitalForensics 2014: Advances in Digital Forensics X, pp. 149–163

Using Approximate Matching to Reduce the Volume of Digital Data

  • Frank Breitinger
  • Christian Winter
  • York Yannikos
  • Tobias Fink
  • Michael Seefried

Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 433)

Abstract

Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%.
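
The contrast between exact and approximate matching can be illustrated with a small sketch. It is not the method evaluated in the paper and not how ssdeep works internally; it assumes a toy similarity measure (the Jaccard similarity of byte n-gram sets) and a hypothetical "known" file held in memory. The function names and the n-gram length of 4 are illustrative choices only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Cryptographic fingerprint: any change to the input changes the digest."""
    return hashlib.sha256(data).hexdigest()


def ngrams(data: bytes, n: int = 4) -> set:
    """Set of all overlapping n-byte substrings of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def resemblance(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets, in the range [0.0, 1.0]."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    if not sa and not sb:
        return 1.0  # two inputs too short to have n-grams are treated as equal
    return len(sa & sb) / len(sa | sb)


# Hypothetical "known" file and a copy with a single byte changed.
known = b"The quick brown fox jumps over the lazy dog. " * 20
modified = known[:100] + b"X" + known[101:]

# Exact matching: the digests differ, so the modified file is not identified.
exact_match = sha256_hex(known) == sha256_hex(modified)

# Approximate matching (toy version): the score stays close to 1.0.
score = resemblance(known, modified)

print("exact hash match:", exact_match)
print("n-gram resemblance:", round(score, 3))
```

In this sketch the one-byte change defeats the SHA-256 comparison while the resemblance score remains high. Practical tools compare compact fingerprints rather than raw file contents, which is what makes searching a large corpus feasible; the sketch leaves that step out.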

Keywords

  • File identification
  • Approximate matching
  • ssdeep

Author information

Authors and Affiliations

  1. Darmstadt University of Applied Sciences, Darmstadt, Germany

    Frank Breitinger, Tobias Fink & Michael Seefried

  2. Center for Advanced Security Research Darmstadt, Darmstadt, Germany

    Frank Breitinger

  3. Fraunhofer Institute for Secure Information Technology, Darmstadt, Germany

    Christian Winter & York Yannikos

Editor information

Editors and Affiliations

  1. Air Force Institute of Technology, Wright-Patterson Air Force Base, 45433-7765, OH, USA

    Gilbert Peterson

  2. University of Tulsa, 74104-3189, Tulsa, OK, USA

    Sujeet Shenoi

Copyright information

© 2014 IFIP International Federation for Information Processing

About this paper

Cite this paper

Breitinger, F., Winter, C., Yannikos, Y., Fink, T., Seefried, M. (2014). Using Approximate Matching to Reduce the Volume of Digital Data. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics X. DigitalForensics 2014. IFIP Advances in Information and Communication Technology, vol 433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44952-3_11

  • DOI: https://doi.org/10.1007/978-3-662-44952-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44951-6

  • Online ISBN: 978-3-662-44952-3

  • eBook Packages: Computer Science, Computer Science (R0)
