Associating Drives Based on Their Artifact and Metadata Distributions

  • Conference paper

Abstract

Associations between drive images can be important in many forensic investigations, particularly those involving organizations, conspiracies, or contraband. This work investigated metrics for comparing drives based on the distributions of 18 types of clues. The clues were email addresses, phone numbers, personal names, street addresses, possible bank-card numbers, GPS data, files in zip archives, files in rar archives, IP addresses, keyword searches, hash values on files, words in file names, words in file names of Web sites, file extensions, immediate directories of files, file sizes, weeks of file creation times, and minutes within weeks of file creation. Using a large corpus of drives, we computed distributions of document association using the cosine-similarity TF/IDF formula and the Kullback-Leibler divergence formula. Based on our tests, we provide significance criteria for similarity that are well above the values obtained from random distributions. We also compared similarity and divergence values, investigated the benefits of filtering and sampling the data before measuring association, examined the similarities of the same drive at different times, and developed useful visualization techniques for the associations.
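
The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy email-address counts, the plain logarithmic inverse-document-frequency weight, and the additive smoothing constant are all assumptions made for illustration, and the paper's exact weighting and normalization may differ.

```python
import math
from collections import Counter

def tfidf_cosine(counts_a, counts_b, doc_freq, n_drives):
    """Cosine similarity of two drives' clue counts under TF-IDF weighting.

    doc_freq[clue] is the number of drives on which the clue occurs
    (hypothetical bookkeeping, not taken from the paper).
    """
    def weights(counts):
        # Term frequency times log inverse document frequency.
        return {clue: n * math.log(n_drives / doc_freq[clue])
                for clue, n in counts.items()}

    wa, wb = weights(counts_a), weights(counts_b)
    dot = sum(w * wb[clue] for clue, w in wa.items() if clue in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def kl_divergence(counts_p, counts_q, smoothing=1e-6):
    """Kullback-Leibler divergence D(P || Q) between two clue distributions.

    Additive smoothing (an assumption of this sketch) keeps the ratio
    finite when a clue is missing from one drive.
    """
    vocab = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + smoothing * len(vocab)
    total_q = sum(counts_q.values()) + smoothing * len(vocab)
    divergence = 0.0
    for clue in vocab:
        p = (counts_p.get(clue, 0) + smoothing) / total_p
        q = (counts_q.get(clue, 0) + smoothing) / total_q
        divergence += p * math.log(p / q)
    return divergence

# Toy data: email-address counts on three hypothetical drives.
drives = {
    "A": Counter({"alice@example.com": 5, "bob@example.com": 2}),
    "B": Counter({"alice@example.com": 4, "carol@example.com": 1}),
    "C": Counter({"dave@example.com": 3}),
}
# Number of drives on which each address appears.
doc_freq = Counter(clue for counts in drives.values() for clue in counts)
n_drives = len(drives)

sim_ab = tfidf_cosine(drives["A"], drives["B"], doc_freq, n_drives)
sim_ac = tfidf_cosine(drives["A"], drives["C"], doc_freq, n_drives)
div_ab = kl_divergence(drives["A"], drives["B"])
div_ac = kl_divergence(drives["A"], drives["C"])
```

In this toy data, drives A and B share one address and so receive a cosine similarity strictly between 0 and 1, while A and C share nothing and score 0. Similarity is symmetric and rises with association; divergence is asymmetric and falls toward 0 as the distributions converge, which is one reason the two values are worth comparing.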

Acknowledgements

This work was supported by the Naval Research Program at the Naval Postgraduate School under JON W7B27. The views expressed are those of the author and do not represent the U.S. Government. Edith Gonzalez-Reynoso and Sandra Falgout helped.

Corresponding author

Correspondence to Neil C. Rowe.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Rowe, N.C. (2019). Associating Drives Based on Their Artifact and Metadata Distributions. In: Breitinger, F., Baggili, I. (eds) Digital Forensics and Cyber Crime. ICDF2C 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 259. Springer, Cham. https://doi.org/10.1007/978-3-030-05487-8_9

  • DOI: https://doi.org/10.1007/978-3-030-05487-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05486-1

  • Online ISBN: 978-3-030-05487-8

  • eBook Packages: Computer Science, Computer Science (R0)
