Abstract
Deep packet inspection typically uses MD5 whitelists/blacklists or regular expressions to identify viruses, malware and certain internal files in network traffic. Fuzzy hashing, also referred to as context-triggered piecewise hashing, can be used to compare two files and determine their level of similarity. This chapter presents the stream fuzzy hash algorithm, which can hash files on the fly regardless of whether the input is unordered, incomplete or of initially undetermined length. The algorithm, which generates a signature of appropriate length using a one-way process, reduces the computational complexity from \(O(n \log n)\) to \(O(n)\). In a typical deep packet inspection scenario, the algorithm hashes files at a rate of 68 MB/s per CPU core and consumes no more than 5 KB of memory per file. The effectiveness of the stream fuzzy hash algorithm is evaluated using a publicly available dataset. The results demonstrate that, unlike other fuzzy hash algorithms, the precision and recall of the stream fuzzy hash algorithm are not compromised when processing unordered and incomplete inputs.
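For readers unfamiliar with context-triggered piecewise hashing, the following minimal Python sketch illustrates the general idea in the spirit of spamsum-style tools. It is not the stream fuzzy hash algorithm described in this chapter: it assumes a complete, in-order input. The window width, block size, FNV-1a chunk hash, base64-style alphabet and the function names `ctph_signature` and `similarity` are all illustrative assumptions.

```python
import random
from difflib import SequenceMatcher

# All constants below are illustrative assumptions, not values from the chapter.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7        # width of the rolling window in bytes
BLOCK_SIZE = 64   # a chunk boundary is triggered roughly every 64 bytes


def fnv1a(chunk: bytes) -> int:
    """32-bit FNV-1a hash of one chunk."""
    h = 0x811C9DC5
    for b in chunk:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def ctph_signature(data: bytes) -> str:
    """Emit one character per chunk; chunk ends are chosen by the content."""
    sig = []
    window_sum = 0   # rolling hash: sum of the last WINDOW bytes
    start = 0        # start offset of the current chunk
    for i, b in enumerate(data):
        window_sum += b
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]
        # The context trigger depends only on the last WINDOW bytes.
        if window_sum % BLOCK_SIZE == BLOCK_SIZE - 1:
            sig.append(ALPHABET[fnv1a(data[start:i + 1]) % 64])
            start = i + 1
    if start < len(data):  # trailing bytes after the last trigger
        sig.append(ALPHABET[fnv1a(data[start:]) % 64])
    return "".join(sig)


def similarity(sig_a: str, sig_b: str) -> float:
    """Similarity of two signatures in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


# Usage: flip one byte in a pseudo-random buffer and compare signatures.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = bytearray(original)
edited[4000] ^= 0xFF
modified = bytes(edited)
print(similarity(ctph_signature(original), ctph_signature(modified)))
```

Because chunk boundaries depend only on nearby bytes, a local edit changes only the few signature characters around it, so the two signatures remain largely the same. A sketch like this needs the whole file in order, which is the limitation the abstract says the stream fuzzy hash algorithm is designed to remove.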
© 2018 IFIP International Federation for Information Processing
Cite this paper
Zheng, C., Li, X., Liu, Q., Sun, Y., Fang, B. (2018). Hashing Incomplete and Unordered Network Streams. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XIV. DigitalForensics 2018. IFIP Advances in Information and Communication Technology, vol 532. Springer, Cham. https://doi.org/10.1007/978-3-319-99277-8_12
DOI: https://doi.org/10.1007/978-3-319-99277-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99276-1
Online ISBN: 978-3-319-99277-8
eBook Packages: Computer Science, Computer Science (R0)