A New Approach for Creating Forensic Hashsets

Ruback, Marcelo; Hoelz, Bruno; Ralha, Celia

doi:10.1007/978-3-642-33962-2_6

Conference paper

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 383)

Abstract

The large volume of data that forensic investigators must process and analyze is a growing challenge. Using hashsets of known files to identify and filter irrelevant files in forensic investigations is not as effective as it could be, especially in non-English-speaking countries. This paper describes the application of data mining techniques to identify irrelevant files from a sample of computers from a country or geographical region. The hashsets corresponding to these files are augmented with an optimized subset of effective hash values chosen from a conventional hash database. Experiments using real evidence demonstrate that the resulting augmented hashset yields 30.69% better filtering results than a conventional hashset, although it has approximately half as many (51.83%) hash values.
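
The general idea of known-file filtering with an augmented hashset can be illustrated with a minimal Python sketch. It is not the method from the paper: the abstract does not specify the data mining technique, so a plain frequency threshold across sampled computers stands in for it here. All function names, the `min_share` parameter, and the sample data are hypothetical, and SHA-1 is an assumed choice of hash function.

```python
import hashlib
from collections import Counter


def file_hash(data: bytes) -> str:
    """Return the SHA-1 hex digest of a file's contents (assumed hash choice)."""
    return hashlib.sha1(data).hexdigest()


def build_regional_hashset(machines, min_share=0.5):
    """Collect hashes present on at least `min_share` of the sampled computers.

    `machines` is a list of sets of hash values, one set per computer.
    The frequency threshold is a stand-in for the paper's data mining step.
    """
    counts = Counter(h for machine in machines for h in machine)
    threshold = min_share * len(machines)
    return {h for h, c in counts.items() if c >= threshold}


def prune_conventional(conventional, machines):
    """Keep only conventional hash values that actually occur in the sample."""
    seen = set().union(*machines) if machines else set()
    return conventional & seen


def filter_known(evidence, hashset):
    """Return names of evidence files whose hash is NOT in the hashset."""
    return [name for name, h in evidence.items() if h not in hashset]


# Hypothetical sample: three computers from the same region.
a, b, c, d, e = (file_hash(x) for x in (b"a", b"b", b"c", b"d", b"e"))
machines = [{a, b, c}, {a, b, d}, {a, e}]

# Hypothetical conventional hash database; only `c` occurs in the sample.
conventional = {c, file_hash(b"x"), file_hash(b"y")}

regional = build_regional_hashset(machines, min_share=0.5)
augmented = regional | prune_conventional(conventional, machines)

# Evidence files that survive filtering still need manual review.
evidence = {"f1": a, "f2": c, "f3": file_hash(b"z")}
unknown = filter_known(evidence, augmented)
```

In this toy data the augmented hashset holds three values, the same size as the conventional set, yet it filters two of the three evidence files where the conventional set alone would filter one. The numbers are illustrative only and say nothing about the results reported in the abstract.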

Author information

Authors and Affiliations

University of Brasilia, Brasilia, Brazil
Marcelo Ruback, Bruno Hoelz & Celia Ralha
National Institute of Criminalistics, Brazilian Federal Police, Brasilia, Brazil
Marcelo Ruback & Bruno Hoelz

Editor information

Editors and Affiliations

Air Force Institute of Technology, Wright-Patterson Air Force Base, OH 45433-7765, USA
Gilbert Peterson
University of Tulsa, Tulsa, OK 74104-3189, USA
Sujeet Shenoi

About this paper

Cite this paper

Ruback, M., Hoelz, B., Ralha, C. (2012). A New Approach for Creating Forensic Hashsets. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics VIII. DigitalForensics 2012. IFIP Advances in Information and Communication Technology, vol 383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33962-2_6

DOI: https://doi.org/10.1007/978-3-642-33962-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33961-5
Online ISBN: 978-3-642-33962-2
eBook Packages: Computer Science, Computer Science (R0)
