Reducing the Time Required for Hashing Operations
Because the volume of data that must be analyzed in digital forensic investigations keeps growing, suspect files need to be recognized automatically and non-relevant files filtered out. To achieve this, digital forensic practitioners employ hashing algorithms to classify files as known-good, known-bad or unknown. However, a typical personal computer may store hundreds of thousands of files, which makes the task extremely time-consuming. This paper attempts to address the problem using a framework that speeds up processing with multiple threads. Unlike a typical multithreading approach, in which the threads are used only to run the hashing algorithm, the proposed framework incorporates a dedicated prefetcher thread that reads files from the device. Experimental results demonstrate a runtime efficiency gain of nearly 40% over single threading.
Keywords: File hashing, runtime performance, file handling, prefetching
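The prefetching idea described in the abstract can be illustrated with a minimal producer-consumer sketch. This is not the authors' implementation: the function names (`hash_files`, `_prefetch`, `_hash_worker`), the choice of SHA-1, the bounded queue, and the whole-file reads are all assumptions made for illustration. One thread does nothing but read files from the device, while separate worker threads hash the buffered contents.

```python
import hashlib
import queue
import threading

_DONE = object()  # sentinel telling a worker that no more files will arrive


def _prefetch(paths, buffer, num_workers):
    """Dedicated reader: the only thread that touches the storage device."""
    for path in paths:
        try:
            with open(path, "rb") as handle:
                data = handle.read()  # assumption: each file fits in memory
        except OSError:
            data = None  # unreadable file: report it without stopping the run
        buffer.put((path, data))  # blocks while the bounded buffer is full
    for _ in range(num_workers):
        buffer.put(_DONE)  # one sentinel per worker so every worker exits


def _hash_worker(buffer, results, lock):
    """Hashing thread: consumes prefetched file contents from the buffer."""
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        path, data = item
        # SHA-1 is an illustrative choice, not taken from the abstract.
        digest = None if data is None else hashlib.sha1(data).hexdigest()
        with lock:
            results[path] = digest


def hash_files(paths, num_workers=2, buffer_size=8):
    """Return {path: hex digest, or None if the file could not be read}."""
    buffer = queue.Queue(maxsize=buffer_size)
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=_hash_worker, args=(buffer, results, lock))
        for _ in range(num_workers)
    ]
    prefetcher = threading.Thread(
        target=_prefetch, args=(list(paths), buffer, num_workers)
    )
    for thread in workers + [prefetcher]:
        thread.start()
    for thread in [prefetcher] + workers:
        thread.join()
    return results
```

A call such as `hash_files(["a.bin", "b.bin"], num_workers=4)` returns a dictionary mapping each path to its digest. The bounded queue keeps the reader from running arbitrarily far ahead of the hashing threads, which caps memory use. Any speedup depends on the device and the hash function, so this sketch makes no claim about reproducing the reported figure.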