Cluster Computing, Volume 19, Issue 2, pp 723–740

Big forensic data reduction: digital forensic images and electronic evidence



An issue that continues to affect digital forensics is the increasing volume of data and the growing number of devices. One proposed way to deal with the problem of “big digital forensic data” (the volume, variety, and velocity of digital forensic data) is to reduce the volume of data at either the collection stage or the processing stage. We have developed a novel approach that significantly improves on current practice, and in this paper we outline our data volume reduction process, which focuses on imaging a selection of key files and data such as registry, documents, spreadsheets, email, internet history, communications, logs, pictures, videos, and other relevant file types. When applied to test cases, the process reduced the original media volume a hundredfold. When applied to real-world cases from an Australian Law Enforcement Agency, the data volume was reduced further, to a small percentage of the original media volume, whilst key evidential files and data were retained. Experienced investigators and detectives reviewed the output of the reduction process for a range of real-world cases, and their review showed that evidential data was present in the reduced forensic subset files. A data reduction approach is applicable in a range of areas, including digital forensic triage, analysis, review, intelligence analysis, presentation, and archiving. In addition, the process can be applied using common digital forensic hardware and software available in appropriately equipped digital forensic labs, without the purchase of additional software or hardware. It can be applied to a wide variety of cases, such as terrorism and organised crime investigations, and is intended to provide a capability to process data rapidly, gain an understanding of the information and/or locate key evidence or intelligence in a timely manner.
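The file-selection step described in the abstract can be illustrated as a filter over a mounted evidence file system. The following is only a minimal Python sketch under stated assumptions, not the authors' procedure (which relies on existing forensic hardware and software): the extension lists, the `NAMED_ARTEFACTS` table, and the functions `classify`, `sha256_of`, and `build_subset` are hypothetical, only some of the listed categories are covered, and the subset is written to a plain directory rather than a forensic container. It assumes the source media is already mounted read-only, copies each matching file, verifies each copy by SHA-256, and reports a reduction ratio measured over file bytes rather than whole-media size.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical selection rules: extensions for some of the file categories
# named in the abstract. This is NOT the authors' actual selection list.
KEY_EXTENSIONS: Dict[str, Set[str]] = {
    "documents": {".doc", ".docx", ".pdf", ".rtf", ".odt", ".txt"},
    "spreadsheets": {".xls", ".xlsx", ".ods", ".csv"},
    "email": {".pst", ".ost", ".eml", ".msg", ".mbox"},
    "logs": {".log", ".evt", ".evtx"},
    "pictures": {".jpg", ".jpeg", ".png", ".gif", ".bmp"},
    "videos": {".mp4", ".avi", ".mov", ".wmv"},
}

# Artefacts recognised by exact file name (case-insensitive). Matching on
# the name alone is a simplification; a real tool would also check the
# file's location and signature.
NAMED_ARTEFACTS: Dict[str, str] = {
    "ntuser.dat": "registry",
    "usrclass.dat": "registry",
    "sam": "registry",
    "security": "registry",
    "software": "registry",
    "system": "registry",
    "history": "internet_history",        # Chrome history database
    "places.sqlite": "internet_history",  # Firefox history database
    "index.dat": "internet_history",      # legacy Internet Explorer
}


def classify(path: Path) -> Optional[str]:
    """Return the key-file category for a path, or None to exclude it."""
    category = NAMED_ARTEFACTS.get(path.name.lower())
    if category is not None:
        return category
    suffix = path.suffix.lower()
    for name, extensions in KEY_EXTENSIONS.items():
        if suffix in extensions:
            return name
    return None


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_subset(
    source: Path, dest: Path
) -> Tuple[List[Tuple[str, str, str]], int, int, float]:
    """Copy key files from `source` into `dest` (which must lie outside
    `source`). Returns a manifest of (relative path, category, SHA-256),
    the total bytes seen, the bytes kept, and the ratio total / kept."""
    manifest: List[Tuple[str, str, str]] = []
    total_bytes = kept_bytes = 0
    # sorted() materialises the listing before any copying starts.
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total_bytes += size
        category = classify(path)
        if category is None:
            continue
        relative = path.relative_to(source)
        target = dest / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also tries to keep timestamps
        source_hash = sha256_of(path)
        if sha256_of(target) != source_hash:
            raise OSError(f"hash mismatch after copying {relative}")
        manifest.append((relative.as_posix(), category, source_hash))
        kept_bytes += size
    ratio = total_bytes / kept_bytes if kept_bytes else float("inf")
    return manifest, total_bytes, kept_bytes, ratio
```

Called as `manifest, total, kept, ratio = build_subset(Path("/mnt/evidence"), Path("/cases/subset"))` (both paths hypothetical), a `ratio` of 100 would correspond to a hundredfold reduction of the kind the abstract reports for test cases. Hashing both the source and the copy keeps each item in the subset verifiable against the original media.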


Keywords: Digital forensics; Big data; Big forensic data; Data reduction; Forensic computing; Forensic challenges; Intelligence analysis



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. University of South Australia, Adelaide, Australia