Abstract
The reliability attributed to scientific knowledge is deemed to rest on constructions of science that change only slowly over time. A potential paradigm shift based on big data is looming: many researchers believe that massive volumes of data have enough substance to capture knowledge without the theories needed in earlier epochs. Patterns in big data are deemed sufficient to make predictions about the future and, as a form of understanding, about the past. This chapter uses an argument developed by Calude and Longo [6] to critically examine the belief system of the proponents of data-driven knowledge, especially as it applies to digital forensic science.
From Ramsey theory it follows that, if data is large enough, knowledge is imbued in the domain represented by the data purely based on the size of the data. The chapter concludes that it is generally impossible to distinguish between true domain knowledge and knowledge inferred from spurious patterns that must exist purely as a function of data size. In addition, what is deemed a significant pattern may be refuted by a pattern that has yet to be found. Hence, evidence based on patterns found in big data is tenuous at best. Digital forensics should therefore proceed with caution if it wants to embrace big data and the paradigms that evolve from and around big data.
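The claim that patterns must exist purely as a function of size can be made concrete with the smallest classical Ramsey result, R(3,3) = 6: however the 15 edges of a complete graph on six vertices are coloured with two colours, a single-coloured triangle always appears, whereas five vertices still allow a colouring without one. The brute-force check below is an illustrative sketch only; it is not taken from the chapter, and the function names are hypothetical.

```python
from itertools import combinations, product


def has_monochromatic_triangle(n, colouring):
    """Return True if some triangle has all three edges the same colour.

    `colouring` maps each edge (i, j) with i < j to colour 0 or 1.
    """
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_colouring_has_triangle(n):
    """Exhaustively test all 2-colourings of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for colours in product((0, 1), repeat=len(edges)):
        if not has_monochromatic_triangle(n, dict(zip(edges, colours))):
            return False  # found a colouring with no forced pattern
    return True


if __name__ == "__main__":
    for n in (5, 6):
        print(n, every_colouring_has_triangle(n))
```

In this toy setting the triangle carries no information about how the edges were coloured; it is present only because the structure is large enough, which is the sense in which the abstract calls such patterns spurious.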
References
C. Anderson, The end of theory: The data deluge makes the scientific method obsolete, Wired, June 23, 2008.
N. Beebe, Digital forensic research: The good, the bad and the unaddressed, in Advances in Digital Forensics V, G. Peterson and S. Shenoi (Eds.), Springer, Heidelberg, Germany, pp. 17–36, 2009.
P. Blair, P. Fleming, D. Bensley, I. Smith, C. Bacon and E. Taylor, Plastic mattresses and sudden infant death syndrome, Lancet, vol. 345(8951), p. 720, 1995.
R. Blanch, Report of the Inquiry into the Convictions of Kathleen Megan Folbigg, State of New South Wales, Parramatta, Australia (www.folbigginquiry.justice.nsw.gov.au/Documents/Report%20of%20the%20Inquiry%20into%20the%20convictions%20of%20Kathleen%20Megan%20Folbigg.pdf), 2019.
J. Buolamwini and T. Gebru, Gender shades: Intersectional accuracy disparities in commercial gender classification, Proceedings of Machine Learning Research, vol. 81, pp. 77–91, 2018.
C. Calude and G. Longo, The deluge of spurious correlations in big data, Foundations of Science, vol. 22(3), pp. 595–612, 2017.
J. Clemens, Automatic classification of object code using machine learning, Digital Investigation, vol. 14(S1), pp. S156–S162, 2015.
K. Crawford and T. Paglen, Excavating AI: The Politics of Training Sets for Machine Learning, Excavating AI (www.excavating.ai), September 19, 2019.
S. D’Agostino, The architect of modern algorithms, Quanta Magazine, November 20, 2019.
England and Wales Court of Appeal (Criminal Division), Regina v. Sally Clark, EWCA Crim 54, Case No. 1999/07495/Y3, Royal Courts of Justice, London, United Kingdom, October 2, 2000.
England and Wales Court of Appeal (Criminal Division), Regina v. Sally Clark, EWCA Crim 1020, Case No. 2002/03824/Y3, Royal Courts of Justice, London, United Kingdom, April 11, 2003.
M. Kestemont, M. Tschuggnall, E. Stamatatos, W. Daelemans, G. Specht and B. Potthast, Overview of the author identification task at PAN-2018: Cross-domain authorship attribution and style change detection, in Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum, L. Cappellato, N. Ferro, J. Nie and L. Soulier (Eds.), Volume 2125, CEUR-WS.org, RWTH Aachen University, Aachen, Germany, 2018.
W. Knight, Facebook’s head of AI says the field will soon “hit the wall,” Wired, December 4, 2019.
P. Langley, The changing science of machine learning, Machine Learning, vol. 82(3), pp. 275–279, 2011.
R. Meadow, Fatal abuse and smothering, in ABC of Child Abuse, R. Meadow (Ed.), BMJ Publishing Group, London, United Kingdom, pp. 27–29, 1997.
F. Mitchell, The use of artificial intelligence in digital forensics: An introduction, Digital Evidence and Electronic Signature Law Review, vol. 7, pp. 35–41, 2010.
F. Mitchell, An overview of artificial intelligence based pattern matching in a security and digital forensic context, in Cyberpatterns, C. Blackwell and H. Zhu (Eds.), Springer, Cham, Switzerland, pp. 215–222, 2014.
M. Pollitt and A. Whitledge, Exploring big haystacks, in Advances in Digital Forensics II, M. Olivier and S. Shenoi (Eds.), Springer, Boston, Massachusetts, pp. 67–76, 2006.
I. Raji and J. Buolamwini, Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products, Proceedings of the AAAI/ACM Conference on AI, Ethics and Society, pp. 429–435, 2019.
F. Ramsey, On a problem of formal logic, Proceedings of the London Mathematical Society, vol. s2-30(1), pp. 264–286, 1930.
Royal Statistical Society, Royal Statistical Society concerned by issues raised in Sally Clark case, News Release, London, United Kingdom, October 23, 2001.
J. Smeaton, Reports of the Late John Smeaton, F.R.S., Made on Various Occasions, in the Course of his Employment as a Civil Engineer, Volume II, Longman, London, United Kingdom, 1812.
J. Wulff, Artificial intelligence and law enforcement, Australasian Policing, vol. 10(1), pp. 16–23, 2018.
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Olivier, M. (2020). Digital Forensics and the Big Data Deluge — Some Concerns Based on Ramsey Theory. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVI. DigitalForensics 2020. IFIP Advances in Information and Communication Technology, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-030-56223-6_1
DOI: https://doi.org/10.1007/978-3-030-56223-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56222-9
Online ISBN: 978-3-030-56223-6
eBook Packages: Computer Science, Computer Science (R0)