EVALUATION OF NETWORK TRAFFIC ANALYSIS USING APPROXIMATE MATCHING ALGORITHMS

  • Conference paper
Advances in Digital Forensics XVII (DigitalForensics 2021)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 612)

Abstract

Approximate matching has become indispensable in digital forensics because practitioners often have to search for relevant files in massive digital corpora. The research community has developed a variety of approximate matching algorithms. However, not only data at rest but also data in motion can benefit from approximate matching. Firewalls and data loss prevention systems that examine traffic flows in modern networks are key to preventing security compromises.

This chapter discusses the current state of research, use cases, validations and optimizations related to applications of approximate matching algorithms to network traffic analysis. For the first time, the efficacy of prominent approximate matching algorithms at detecting files in network packet payloads is evaluated, and the best candidates, namely TLSH, ssdeep, mrsh-net and mrsh-cf, are adapted to this task. The individual algorithms are compared, their strengths and weaknesses are highlighted, and their detection rates are evaluated in gigabit-range, real-world scenarios. The results are promising: a detection rate of 97% is achieved while maintaining a throughput of 4 Gbps when processing a large forensic file corpus. An additional contribution is the public sharing of optimized prototypes of the most promising algorithms.

Author information

Corresponding author

Correspondence to Thomas Göbel.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Göbel, T., Uhlig, F., Baier, H. (2021). EVALUATION OF NETWORK TRAFFIC ANALYSIS USING APPROXIMATE MATCHING ALGORITHMS. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVII. DigitalForensics 2021. IFIP Advances in Information and Communication Technology, vol 612. Springer, Cham. https://doi.org/10.1007/978-3-030-88381-2_5

  • DOI: https://doi.org/10.1007/978-3-030-88381-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88380-5

  • Online ISBN: 978-3-030-88381-2

  • eBook Packages: Computer Science, Computer Science (R0)
