
Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees

  • David Lillis
  • Frank Breitinger
  • Mark Scanlon
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 216)

Abstract

Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way.

In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.
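The pruning idea behind a hierarchical Bloom filter tree can be illustrated with a minimal sketch. This is not the authors' implementation and not MRSH-v2 itself: fixed-size chunks stand in for MRSH-v2's feature extraction, and the names `BloomFilter`, `Node`, `chunks` and `candidates`, as well as the filter size and hash count, are assumptions made for illustration. Each node holds a Bloom filter covering every file beneath it, so a lookup descends only into subtrees whose filter reports a possible match.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit set (illustrative)."""

    def __init__(self, size_bits=4096, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions from a single SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def chunks(data, size=8):
    """Fixed-size chunking: an assumed stand-in for real feature extraction."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Node:
    """Tree node whose filter covers all files in its subtree."""

    def __init__(self, names, corpus):
        self.names = names
        self.filter = BloomFilter()
        for name in names:
            for chunk in chunks(corpus[name]):
                self.filter.add(chunk)
        self.children = []
        if len(names) > 1:
            mid = len(names) // 2
            self.children = [Node(names[:mid], corpus), Node(names[mid:], corpus)]


def candidates(node, chunk):
    """Return names of files that may contain the chunk, pruning whole subtrees."""
    if chunk not in node.filter:
        return []  # nothing below this node can match
    if not node.children:
        return list(node.names)
    found = []
    for child in node.children:
        found.extend(candidates(child, chunk))
    return found


# Hypothetical "known" collection; a chunk from a seized file is looked up in the tree.
corpus = {
    "a": b"AAAAAAAABBBBBBBB",
    "b": b"CCCCCCCCDDDDDDDD",
    "c": b"EEEEEEEEFFFFFFFF",
    "d": b"GGGGGGGGHHHHHHHH",
}
root = Node(sorted(corpus), corpus)
print(candidates(root, b"CCCCCCCC"))
```

In this sketch the returned names are only candidates: Bloom filters admit false positives, so a full pairwise comparison would still be needed for the files that survive the tree lookup, but only for those files rather than for the whole collection.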

Keywords

Approximate matching, Hierarchical Bloom filter trees, MRSH-v2


Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018

Authors and Affiliations

  1. Forensics and Security Research Group, School of Computer Science, University College Dublin, Dublin, Ireland
  2. Cyber Forensics Research and Education Group, Tagliatela College of Engineering, ECECS, University of New Haven, West Haven, USA
