Abstract
This paper considers recent attempts within archaeology to create, integrate and interpret digital data on an unprecedented scale—a movement that resonates with the much wider so-called big data phenomenon. Using the example of our work with a particularly large and complex dataset collated for the purpose of the English Landscape and Identities project (EngLaID), Oxford, UK, and drawing on insights from social scientists’ studies of information infrastructures much more broadly, we make the following key points. Firstly, alongside scrutinising and homogenising digital records for research purposes, it is vital that we continue to appreciate the broader interpretative value of ‘characterful’ archaeological data (those that have histories and flaws of various kinds). Secondly, given the intricate and pliable nature of archaeological data and the substantial challenges faced by researchers seeking to create a cyber-infrastructure for archaeology, it is essential that we develop interim measures that allow us to explore the parameters and potentials of working with archaeological evidence on an unprecedented scale. We also consider some of the practical and ethical consequences of working in this vein.
Notes
The term ‘events’ is used throughout the text to mean archaeological fieldwork investigations.
Acknowledgments
This study was carried out as part of a 5-year European Research Council funded research project. It also draws on the findings of work undertaken separately as part of an English Heritage-commissioned investigation aimed at developing a new information access strategy for England. The data upon which the study is based were provided by 75 separate HER Officers, English Heritage, the Archaeological Investigations Project and the Portable Antiquities Scheme. It goes without saying that our work would not have been possible without the support and expertise of the professionals involved in curating and extracting these data for us. We are particularly grateful to Sally Croft (Cambridgeshire HER), Simon Crutchley (English Heritage), Rebecca Loader (Isle of Wight HER), Dan Pett (PAS) and Emma Trevarthen (Cornwall HER) who kindly gave us permission to publish images of their data. Simon Crutchley, Martin Newman and Roger Thomas facilitated access to the NRHE data and have offered thoughtful guidance during our endeavour to get to grips with our various datasets. Ehren Milner gave us advice about accessing AIP data, and Dan Pett provided the PAS data. Letty ten Harkel, Zena Kamash and Laura Morley undertook the 100-km² test exercise along with us. Miranda Creswell, Duncan Garrow, Chris Gosden, Letty ten Harkel, Zena Kamash and Dan Stansbie provided helpful comments on an earlier draft of the paper. The input of four anonymous reviewers substantially improved the version of this paper that was originally submitted for publication.
Appendices
Appendix 1: URLs for cited datasets
NRHE: http://www.pastscape.org.uk/
HERs: http://www.heritagegateway.org.uk/gateway/
AIP: http://csweb.bournemouth.ac.uk/aip/aipintro.htm
OASIS: http://archaeologydataservice.ac.uk/archsearch/
PAS: http://finds.org.uk/
EMC: http://www.fitzmuseum.cam.ac.uk/dept/coins/emc/
Appendix 2: Specificities of the datasets provided to the EngLaID project
HER data (including several UADs)
Seventy-five of the 84 HERs and UADs in England provided data for the EngLaID project. Most of these datasets use a monument and event structure: events associated with a monument are linked to it and, presumably, an event that is not linked to an existing monument generates a new monument when it takes place. Nottinghamshire is an exception in having an intervening third layer, which can be thought of as a ‘feature’. In that case, events are linked to features, and features are then linked to monuments, either where a pre-existing site is known or once they become important enough to merit consideration as a ‘monument’. The majority of HERs, but not all, also record sources/bibliography. Many record find details, especially amongst HBSMR users. However, such details tend to be added on an ad hoc basis.
NRHE
NRHE data were supplied by English Heritage as shapefiles and associated PDF documents that contained the most important attributes of each record. These PDF files had to be processed with a script to extract the relevant attribute data. The process is slightly imperfect because some monument types run across multiple lines, which makes it impossible for an automated process to tell where a term finishes. As a result, the output is a ‘stream of consciousness’ list of monument types for each period, with one running into another, e.g. Roman: villa bathhouse barn round house. This makes some queries hard to perform, but the extracted data are largely functional.
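To illustrate the ambiguity, the following minimal sketch shows one possible way of re-segmenting a run-together list of monument types by greedy longest match against a controlled vocabulary. It is not the script used by the project: the vocabulary `KNOWN_TYPES` and the functions `split_monument_types` and `parse_period_line` are hypothetical, and the approach only helps where the terms concerned appear in the vocabulary.

```python
# Hypothetical controlled vocabulary of monument types (illustrative only).
KNOWN_TYPES = {"villa", "bathhouse", "barn", "round house", "round barrow"}


def split_monument_types(stream, vocabulary=KNOWN_TYPES):
    """Split a run-together string of monument types by greedy longest match.

    Words that match no vocabulary term are kept as single-word terms,
    so that nothing is silently dropped.
    """
    words = stream.lower().split()
    # Longest term in the vocabulary, measured in words.
    max_words = max((len(term.split()) for term in vocabulary), default=1)
    terms = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in vocabulary or n == 1:
                terms.append(candidate)
                i += n
                break
    return terms


def parse_period_line(line):
    """Parse a line of the form 'Period: type type ...' into (period, [types])."""
    period, _, stream = line.partition(":")
    return period.strip(), split_monument_types(stream)


if __name__ == "__main__":
    print(parse_period_line("Roman: villa bathhouse barn round house"))
```

Greedy matching still cannot resolve every case: if both ‘round’ and ‘round house’ were valid terms, the sketch would always prefer the longer one, which may not reflect the original record.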
AIP
AIP data were downloaded by EngLaID from the AIP Web site. This is not ideal because a restriction in the database software used by the AIP makes it impossible to download more than a certain (undetermined) number of records at one time. Some categories of data are thus not extractable, because they exceed this limit even when filtered as narrowly as possible.
PAS
PAS data were supplied directly by the PAS in the form of a single large CSV spreadsheet. PAS data map reasonably well onto HER finds recording formats, but contain a lot more detail.
Appendix 3: Additional methodological details for the 100-km² exercise
Since testing identified a negligible degree of overlap between records held in the EMC and those within the PAS, and since their structure and content are broadly compatible, these datasets are combined and treated as one.
One-hundred-square-kilometre test areas were selected so as to contain a reasonable, but not overly onerous, number of records, in order to characterise as accurately as possible the nature of the dataset relationships. When tested specifically, almost all these squares included an above-average record density for the HER in question. If the HER area concerned included distinctly different landscape zones that were likely to produce particular kinds of records (e.g. Norfolk, where the Fens produce very few PAS records, but the upland Breckland zone produces many), the 100-km² area was positioned, as far as possible, to include both/all such zones. In two cases, West Berkshire and North East Lincolnshire, it was not possible to fit a full set of 100 1 × 1-km cells within the case study area, so smaller areas were tested.
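The density comparison described above can be made concrete with a small sketch that bins records into 1 × 1-km cells and compares the record density of a candidate test area with that of the wider HER area. It is only an illustration under stated assumptions, not the procedure the researchers followed: it assumes that records carry eastings and northings in metres, and that the candidate area is a square of 10 × 10 cells. The function names and sample coordinates are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 1000  # side length of a 1 x 1-km cell, in metres


def cell_of(easting, northing):
    """Return the (column, row) index of the 1-km cell containing a point."""
    return (int(easting // CELL_SIZE_M), int(northing // CELL_SIZE_M))


def count_by_cell(records):
    """Count records per 1-km cell; records are (easting, northing) pairs."""
    return Counter(cell_of(e, n) for e, n in records)


def square_density(counts, origin_cell, side_cells=10):
    """Mean number of records per cell within a square block of cells.

    origin_cell is the south-west cell of the block (an assumption of
    this sketch); empty cells count as zero.
    """
    col0, row0 = origin_cell
    total = sum(
        counts.get((col0 + dc, row0 + dr), 0)
        for dc in range(side_cells)
        for dr in range(side_cells)
    )
    return total / (side_cells * side_cells)


def area_density(counts, area_km2):
    """Mean number of records per square kilometre across a whole area."""
    return sum(counts.values()) / area_km2


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    sample = [(450100, 210900), (450900, 210100), (451500, 210500), (470000, 230000)]
    counts = count_by_cell(sample)
    print(square_density(counts, (450, 210)), area_density(counts, 400))
```

A candidate square would count as ‘above average’ in this sketch when `square_density` exceeds `area_density` for the HER concerned. Including several landscape zones would still require a separate, qualitative judgement.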
The findings of this exercise should be treated with some caution. All four EngLaID researchers involved in carrying out this exercise used the same broad methodology, and we tried at all times to maintain a consistent approach. However, the challenges involved in identifying links between records in different datasets varied considerably from test area to test area. Consequently, researchers necessarily developed their own ways of dealing with the particular ambiguities they faced in conducting the test.
Cite this article
Cooper, A., Green, C. Embracing the Complexities of ‘Big Data’ in Archaeology: the Case of the English Landscape and Identities Project. J Archaeol Method Theory 23, 271–304 (2016). https://doi.org/10.1007/s10816-015-9240-4
DOI: https://doi.org/10.1007/s10816-015-9240-4