Abstract
Investigators and analysts increasingly encounter large, even terabyte-sized, data sets when conducting digital investigations. State-of-the-art digital investigation tools and processes are efficiency-constrained from both system and human perspectives because they continue to rely on overly simplistic data reduction and mining algorithms. Extending data mining research to the digital forensic science discipline will have some or all of the following benefits: (i) reduced system and human processing time associated with data analysis; (ii) improved information quality associated with data analysis; and (iii) reduced monetary costs associated with digital investigations. This paper introduces data mining and reviews the limited extant literature on the application of data mining to digital investigations and forensics. Finally, it offers suggestions for applying data mining research to digital forensics.
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Beebe, N., Clark, J. (2006). Dealing with Terabyte Data Sets in Digital Investigations. In: Pollitt, M., Shenoi, S. (eds) Advances in Digital Forensics. DigitalForensics 2005. IFIP — The International Federation for Information Processing, vol 194. Springer, Boston, MA. https://doi.org/10.1007/0-387-31163-7_1
DOI: https://doi.org/10.1007/0-387-31163-7_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30012-2
Online ISBN: 978-0-387-31163-0
eBook Packages: Computer Science, Computer Science (R0)