Abstract
Associations between drive images can be important in many forensic investigations, particularly those involving organizations, conspiracies, or contraband. This work investigated metrics for comparing drives based on the distributions of 18 types of clues. The clues were email addresses, phone numbers, personal names, street addresses, possible bank-card numbers, GPS data, files in zip archives, files in rar archives, IP addresses, keyword searches, hash values on files, words in file names, words in file names of Web sites, file extensions, immediate directories of files, file sizes, weeks of file creation times, and minutes within weeks of file creation. Using a large corpus of drives, we computed distributions of document association using the cosine similarity TF/IDF formula and Kullback-Leibler divergence formula. We provide significance criteria for similarity based on our tests that are well above those obtained from random distributions. We also compared similarity and divergence values, investigated the benefits of filtering and sampling the data before measuring association, examined the similarities of the same drive at different times, and developed useful visualization techniques for the associations.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Abe, H., Tsumoto, S.: Text categorization with considering temporal patterns of term usages. In: Proceedings of IEEE International Conference on Data Mining Workshops, pp. 800–807 (2010)
Beverly, R., Garfinkel, S., Cardwell, G.: Forensic caving of network packets and associated data structures. Digital Invest. 8, S78–S89 (2011)
Borgatti, S., Everett, M.: Models of core/periphery structures. Soc. Netw. 21(4), 375–395 (2000)
Bulk Extractor 1.5: Digital Corpora: Bulk Extractor [software] (2013). digitalcorpora.org/downloads/bulk_extractor. 6 Feb 2015
Catanese, S., Fiumara, G., A visual tool for forensic analysis of mobile phone traffic. In: Proceedings ACM Workshop on Multimedia in Forensics, Security, and Intelligence, Firenze, Italy, October 2010, pp. 71–76 (2010)
Flaglien, Anders, Franke, Katrin, Arnes, Andre: Identifying Malware Using Cross-Evidence Correlation. In: Peterson, Gilbert, Shenoi, Sujeet (eds.) DigitalForensics 2011. IAICT, vol. 361, pp. 169–182. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24212-0_13
Forman, G., Eshghi, K., Chiocchetti, S.: Finding similar files in large document repositories. In: Proceedings of 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, US, August 2005, pp. 394–400 (2005)
Garfinkel, S.: Forensic feature extraction and cross-drive analysis. Digital Invest. 3S, S71–S81 (2006)
Garfinkel, S., Farrell, P., Roussev, V., Dinolt, G.: Bringing science to digital forensics with standardized forensic corpora. Digital Invest. 6, S2–S11 (2009)
Jones, A., Valli, C., Dardick, C., Sutherland, I., Dabibi, G., Davies, G.: The 2009 analysis of information remaining on disks offered for sale on the second hand market. J. Digital Forensics Secur. Law 5(4) (2010). Article 3
Mohammed, H., Clarke, N., Li, F.: An automated approach for digital forensic analysis of heterogeneous big data. J. Digital Forensics, Secur. Law 11(2) (2016). Article 9
Nassif, L., Hruschka, E.: Document clustering for forensic analysis: an approach for improving computer inspection. IEEE Trans. Inf. Forensics Secur. 8(1), 46–54 (2013)
Pateriya, P., Lakshmi, Raj, G.: A pragmatic validation of stylometric techniques using BPA. In: Proceedings of International Conference on The Next Generation Information Technology: Confluence, pp. 124–131 (2014)
Patterson, J., Hargreaves, C.: The potential for cross-drive analysis using automated digital forensic timelines. In: Proceedings of 6th International Conference on Cybercrime Forensics and Training, Canterbury, NZ, October 2012 (2012)
Raghavan, S., Clark, A., Mohay, G.: FIA: an open forensic integration architecture for composing digital evidence. In: Proceedings of International Conference of Forensics in Telecommunications, Information and Multimedia, pp. 83–94 (2009)
Rowe, N.: Identifying forensically uninteresting files in a large corpus. EAI Endorsed Trans. Secur. Safety 16(7) (2016). Article e2
Rowe, N.: Finding and rating personal names on drives for forensic needs. In: Proceedings of 9th EAI International Conference on Digital Forensics and Computer Crime, Prague, Czech Republic, October 2017
Rowe, N., Schwamm, R., McCarrin, M., Gera, R.: Making sense of email addresses on drives. J. Digital Forensics Secur. Law 11(2), 153–173 (2016)
Sippl, M., Scheraga, H.: Solution of the embedding problem and decomposition of symmetric matrices. In: Proceedings of National Academy of Sciences, USA, vol. 82, pp. 2197–2201, April 1985
Sun, M., Xu, G., Zhang, J., Kim, D.: Tracking you through DNS traffic: Linking user sessions by clustering with Dirichlet mixture model. In: Proceedings of 20th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems, Miami, FL, US, November 2017, pp. 303–310 (2017)
Tabish, S., Shafiq, M., Farooq, M., Malware detection using statistical analysis of byte-level file content. In: Proceedings of ACM Workshop on Cybersecurity and Intelligence, Paris, France, June 2009, pp. 23–31 (2009)
Van Bruaene, J.: Large scale cross-drive correlation of digital media. M.S. thesis, U.S. Naval Postgraduate School, March 2016
Whissell, J., Clarke, C.: Effective measures for inter-document similarity. In: Proceedings of 22nd ACM International Conference on Information and Knowledge Management, pp. 1361–1370 (2013)
Woods, K., Lee, C., Garfinkel, S., Dittrich, D., Russell, A., Kearton, K.: Creating realistic corpora for security and forensic education. In: Proceedings of ADFSL Conference on Digital Forensics, Security, and Law, pp. 123–134 (2011)
Zhao, S., Yu, L., Cheng, B.: Probabilistic community using link and content for social networks. IEEE. Access PP(99), 27189–27202 (2017)
Zhou, D., Manavoglu, E., Li, J., Giles, C., Zha, H.: Probabilistic models for discovering e-communities. In: Proceedings of WWW Conference, 23–26 May 2006, Edinburgh, Scotland, pp. 173–182 (2006)
Acknowledgements
This work was supported by the Naval Research Program at the Naval Postgraduate School under JON W7B27. The views expressed are those of the author and do not represent the U.S. Government. Edith Gonzalez-Reynoso and Sandra Falgout helped.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Rowe, N.C. (2019). Associating Drives Based on Their Artifact and Metadata Distributions. In: Breitinger, F., Baggili, I. (eds) Digital Forensics and Cyber Crime. ICDF2C 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 259. Springer, Cham. https://doi.org/10.1007/978-3-030-05487-8_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-05487-8_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05486-1
Online ISBN: 978-3-030-05487-8
eBook Packages: Computer ScienceComputer Science (R0)