Abstract
Approximate matching has become indispensable in digital forensics as practitioners often have to search for relevant files in massive digital corpora. The research community has developed a variety of approximate matching algorithms. However, not only data at rest, but also data in motion can benefit from approximate matching. Firewalls and data loss prevention systems, which examine network traffic flows in modern networks, are key to preventing security compromises.
This chapter discusses the current state of research, use cases, validations and optimizations related to applications of approximate matching algorithms to network traffic analysis. For the first time, the efficacy of prominent approximate matching algorithms at detecting files in network packet payloads is evaluated, and the best candidates, namely TLSH, ssdeep, mrsh-net and mrsh-cf, are adapted to this task. The individual algorithms are compared, strengths and weaknesses highlighted, and detection rates evaluated in gigabit-range, real-world scenarios. The results are very promising, including a detection rate of 97% while maintaining a throughput of 4 Gbps when processing a large forensic file corpus. An additional contribution is the public sharing of optimized prototypes of the most promising algorithms.
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Göbel, T., Uhlig, F., Baier, H. (2021). Evaluation of Network Traffic Analysis Using Approximate Matching Algorithms. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVII. DigitalForensics 2021. IFIP Advances in Information and Communication Technology, vol 612. Springer, Cham. https://doi.org/10.1007/978-3-030-88381-2_5
DOI: https://doi.org/10.1007/978-3-030-88381-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88380-5
Online ISBN: 978-3-030-88381-2
eBook Packages: Computer Science, Computer Science (R0)