Measuring Accuracy of Automated Parsing and Categorization Tools and Processes in Digital Investigations

  • Conference paper
Digital Forensics and Cyber Crime (ICDF2C 2013)

Abstract

This work presents a method for measuring the accuracy of evidential artifact extraction and categorization tasks in digital forensic investigations. Instead of measuring accuracy and errors only in the functions of digital forensic tools, it applies information retrieval measurement techniques that account for errors introduced by both tools and analysis processes. The method uses a ‘gold standard’: the collection of evidential objects that a digital investigator determines from suspect data whose ground truth is unknown. The accuracy of tools and investigation processes can then be evaluated against this derived gold standard using the common precision and recall measures. Two example case studies show how the accuracy of automated analysis tools is measured against an in-depth analysis by an expert. Such measurement allows investigators to track changes in the accuracy of their processes over time and to determine whether a change is caused by their tools or by their knowledge.
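
As a minimal sketch of the comparison described above, the following Python treats the gold standard and the output of a tool or process as sets of artifact identifiers and computes precision, recall, and the F-measure. The function name `score_against_gold`, the `Scores` type, and the sample identifiers are hypothetical and not taken from the paper. Returning 0 for an empty denominator is also an assumption.

```python
from typing import Hashable, Iterable, NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def score_against_gold(found: Iterable[Hashable], gold: Iterable[Hashable]) -> Scores:
    """Compare artifacts reported by a tool or process with the gold standard."""
    found_set, gold_set = set(found), set(gold)
    # Artifacts that were reported and are also part of the gold standard.
    hits = len(found_set & gold_set)

    # Assumption: an empty denominator yields 0.0 instead of raising an error.
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f_measure)


# Hypothetical artifact identifiers, for illustration only.
gold_standard = {"a1", "a2", "a3", "a4"}
tool_output = {"a1", "a2", "x9"}
result = score_against_gold(tool_output, gold_standard)
print(result)
```

Using sets keeps the comparison independent of the order in which artifacts were found, and it would work with any hashable identifier, for example a file hash or a path.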

Author information

Corresponding author

Correspondence to Joshua I. James.

Appendices

Appendix

A Case 1: Results of Precision of Investigation vs. the Gold Standard

Examination 1:

See Table 6.

Table 6. Examination 1 artifacts identified compared to the gold standard
$$P=\tfrac{4}{6}=0.67\;R=\tfrac{4}{12}=0.33\;F=2\cdot \tfrac{0.67 \cdot 0.33}{0.67 + 0.33} = 0.44$$
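
For reference, the values in this appendix appear to follow the standard information retrieval definitions of precision, recall, and the F-measure. The set symbols $T$ (artifacts identified in an examination) and $G$ (artifacts in the gold standard) are notation assumed here for illustration and do not come from the paper:

$$P=\frac{|T \cap G|}{|T|}\qquad R=\frac{|T \cap G|}{|G|}\qquad F=2\cdot \frac{P \cdot R}{P + R}$$

Under this reading, Examination 1 has $|T \cap G| = 4$, $|T| = 6$ and $|G| = 12$.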

Examination 2:

See Table 7.

Table 7. Examination 2 artifacts identified compared to the gold standard
$$P=\tfrac{0}{5}=0\;R=\tfrac{0}{1}=0\;F=0\ \text{(taken as 0, since } 2\cdot \tfrac{0 \cdot 0}{0 + 0} \text{ is undefined)}$$

Examination 3:

See Table 8.

Table 8. Examination 3 artifacts identified compared to the gold standard
$$P=\tfrac{200}{200}=1\;R=\tfrac{200}{200}=1\;F=2\cdot \tfrac{1 \cdot 1}{1 + 1} = 1$$

Examination 4:

See Table 9.

Table 9. Examination 4 artifacts identified compared to the gold standard
$$P=\tfrac{16}{216}=0.07\;R=\tfrac{16}{30}=0.53\;F=2\cdot \tfrac{0.07 \cdot 0.53}{0.07 + 0.53} = 0.12$$

Examination 5:

See Table 10.

Table 10. Examination 5 artifacts identified compared to the gold standard
$$P=\tfrac{4}{26}=0.15\;R=\tfrac{4}{34}=0.12\;F=2\cdot \tfrac{0.15 \cdot 0.12}{0.15 + 0.12} = 0.13$$
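
The five calculations above can be reproduced with a short Python sketch. The counts are read directly from the equations in this appendix. The names `EXAMINATIONS`, `f_measure`, and `summarize` are hypothetical, and returning 0 when a denominator is zero is an assumed convention. The sketch reports F twice: once from P and R rounded to two decimals, as the equations above do, and once from unrounded values.

```python
from typing import Dict, Tuple

# Counts read from the equations above:
# (correct artifacts, artifacts identified, artifacts in the gold standard)
EXAMINATIONS: Dict[int, Tuple[int, int, int]] = {
    1: (4, 6, 12),
    2: (0, 5, 1),
    3: (200, 200, 200),
    4: (16, 216, 30),
    5: (4, 26, 34),
}


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def summarize(correct: int, identified: int, gold: int) -> Tuple[float, float, float, float]:
    """Return (P, R, F from rounded P and R, F from unrounded P and R), each to two decimals."""
    p = correct / identified if identified else 0.0
    r = correct / gold if gold else 0.0
    f_rounded_inputs = f_measure(round(p, 2), round(r, 2))  # mirrors the appendix arithmetic
    f_exact = f_measure(p, r)
    return round(p, 2), round(r, 2), round(f_rounded_inputs, 2), round(f_exact, 2)


results = {exam: summarize(*counts) for exam, counts in EXAMINATIONS.items()}
for exam, (p, r, f_rounded, f_exact) in results.items():
    print(f"Examination {exam}: P={p:.2f} R={r:.2f} F={f_rounded:.2f} (unrounded inputs: {f_exact:.2f})")
```

With these counts, the two F values agree to two decimals for every examination except Examination 4, where rounding P and R first gives 0.12 and unrounded inputs give 0.13.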

B Case 2: Results of Precision of Investigation vs. the Gold Standard

See Tables 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30.

Table 11. Results of a full examination on media number 1
Table 12. Results of preliminary analysis on media number 1 from five examiners
Table 13. Preliminary analysis object identification error rates for media number 1
Table 14. Preliminary analysis accuracy rates for media number 1
Table 15. Results of a full examination on media number 2
Table 16. Results of preliminary analysis on media number 2 from five examiners
Table 17. Preliminary analysis object identification error rates for media number 2
Table 18. Preliminary analysis accuracy rates for media number 2
Table 19. Results of a full examination on media number 3
Table 20. Results of preliminary analysis on media number 3 from five examiners
Table 21. Preliminary analysis object identification error rates for media number 3
Table 22. Preliminary analysis accuracy rates for media number 3
Table 23. Results of a full examination on media number 4
Table 24. Results of preliminary analysis on media number 4 from five examiners
Table 25. Preliminary analysis object identification error rates for media number 4
Table 26. Preliminary analysis accuracy rates for media number 4
Table 27. Results of a full examination on media number 5
Table 28. Results of preliminary analysis on media number 5 from five examiners
Table 29. Preliminary analysis object identification error rates for media number 5
Table 30. Preliminary analysis accuracy rates for media number 5

Copyright information

© 2014 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

James, J.I., Lopez-Fernandez, A., Gladyshev, P. (2014). Measuring Accuracy of Automated Parsing and Categorization Tools and Processes in Digital Investigations. In: Gladyshev, P., Marrington, A., Baggili, I. (eds) Digital Forensics and Cyber Crime. ICDF2C 2013. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 132. Springer, Cham. https://doi.org/10.1007/978-3-319-14289-0_11

  • DOI: https://doi.org/10.1007/978-3-319-14289-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14288-3

  • Online ISBN: 978-3-319-14289-0

  • eBook Packages: Computer Science, Computer Science (R0)
