
Embracing the Complexities of ‘Big Data’ in Archaeology: the Case of the English Landscape and Identities Project

Journal of Archaeological Method and Theory

Abstract

This paper considers recent attempts within archaeology to create, integrate and interpret digital data on an unprecedented scale—a movement that resonates with the much wider so-called big data phenomenon. Using the example of our work with a particularly large and complex dataset collated for the purpose of the English Landscape and Identities project (EngLaID), Oxford, UK, and drawing on insights from social scientists’ studies of information infrastructures much more broadly, we make the following key points. Firstly, alongside scrutinising and homogenising digital records for research purposes, it is vital that we continue to appreciate the broader interpretative value of ‘characterful’ archaeological data (those that have histories and flaws of various kinds). Secondly, given the intricate and pliable nature of archaeological data and the substantial challenges faced by researchers seeking to create a cyber-infrastructure for archaeology, it is essential that we develop interim measures that allow us to explore the parameters and potentials of working with archaeological evidence on an unprecedented scale. We also consider some of the practical and ethical consequences of working in this vein.

Notes

  1. The term ‘events’ is used throughout the text to mean archaeological fieldwork investigations.

Acknowledgments

This study was carried out as part of a 5-year research project funded by the European Research Council. It also draws on the findings of work undertaken separately as part of an English Heritage-commissioned investigation aimed at developing a new information access strategy for England. The data upon which the study is based were provided by 75 separate HER Officers, English Heritage, the Archaeological Investigations Project and the Portable Antiquities Scheme. It goes without saying that our work would not have been possible without the support and expertise of the professionals involved in curating and extracting these data for us. We are particularly grateful to Sally Croft (Cambridgeshire HER), Simon Crutchley (English Heritage), Rebecca Loader (Isle of Wight HER), Dan Pett (PAS) and Emma Trevarthen (Cornwall HER), who kindly gave us permission to publish images of their data. Simon Crutchley, Martin Newman and Roger Thomas facilitated access to the NRHE data and have offered thoughtful guidance during our endeavour to get to grips with our various datasets. Ehren Milner gave us advice about accessing AIP data, and Dan Pett provided the PAS data. Letty ten Harkel, Zena Kamash and Laura Morley undertook the 100-km² test exercise along with us. Miranda Creswell, Duncan Garrow, Chris Gosden, Letty ten Harkel, Zena Kamash and Dan Stansbie provided helpful comments on an earlier draft of the paper. The input of four anonymous reviewers substantially improved the version of this paper that was originally submitted for publication.

Author information

Correspondence to Anwen Cooper.

Appendices

Appendix 1: URLs for cited datasets

NRHE: http://www.pastscape.org.uk/

NMP: http://www.english-heritage.org.uk/professional/research/landscapes-and-areas/national-mapping-programme/

HERs: http://www.heritagegateway.org.uk/gateway/

AIP: http://csweb.bournemouth.ac.uk/aip/aipintro.htm

OASIS: http://archaeologydataservice.ac.uk/archsearch/

PAS: http://finds.org.uk/

EMC: http://www.fitzmuseum.cam.ac.uk/dept/coins/emc/

Appendix 2: Specificities of the datasets provided to the EngLaID project

HER data (including several UADs)

Seventy-five of the 84 HERs and UADs in England provided data for the EngLaID project. Most of these datasets use a monument and event structure: events associated with a monument are linked to it, and events not linked to an existing monument presumably generate a new monument when they take place. Nottinghamshire is an exception in having an intervening third layer, which can be thought of as a ‘feature’. In that case, events are linked to features, and features are then linked to monuments where a preexisting site is known or once they become important enough to merit consideration as a ‘monument’. Most, but not all, HERs also record sources/bibliography. Many record details of finds, especially amongst HBSMR users. However, such details tend to be added on an ad hoc basis.
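The monument and event structure described above, together with the Nottinghamshire three-layer variant, can be sketched as a small data model. This is only an illustration: the class and field names (`Monument`, `Feature`, `Event`, `all_events`) are hypothetical and do not reflect the schema of any actual HER.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A fieldwork investigation (hypothetical fields)."""
    event_id: str
    description: str = ""


@dataclass
class Feature:
    """Intervening layer, as described for Nottinghamshire."""
    feature_id: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Monument:
    monument_id: str
    # Two-layer structure: events linked directly to the monument.
    events: List[Event] = field(default_factory=list)
    # Three-layer variant: events reach the monument via features.
    features: List[Feature] = field(default_factory=list)


def all_events(monument: Monument) -> List[Event]:
    """Collect events linked directly and those linked through features."""
    linked = list(monument.events)
    for feature in monument.features:
        linked.extend(feature.events)
    return linked
```

A query written against `all_events` treats both structures alike, which is one possible way of comparing a three-layer record with the more common two-layer ones.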

NRHE

NRHE data were supplied by English Heritage as shapefiles and associated PDF documents that contained the most important attributes of each record. These PDF files had to be scanned using a script to extract the relevant attribute data. This process is slightly imperfect because some monument types run across multiple lines, which makes it impossible for any automated (digital) process to tell when a term finishes. As such, the output is a ‘stream of consciousness’ list of monument types for each period, with one running into another, e.g. Roman: villa bathhouse barn round house. This makes some queries hard to perform, but the output is largely functional.
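A minimal sketch of why such an undelimited list is awkward to query, using the example string given above. The functions `parse_stream` and `contains_term` are hypothetical helpers written for this illustration; they are not the script used by the project.

```python
def parse_stream(line: str):
    """Split 'Period: word word ...' into (period, list of words)."""
    period, _, rest = line.partition(":")
    return period.strip(), rest.split()


def contains_term(words, term: str) -> bool:
    """True if the words of `term` occur as a contiguous run in `words`."""
    target = term.lower().split()
    if not target:
        return False
    lowered = [w.lower() for w in words]
    n = len(target)
    return any(lowered[i:i + n] == target for i in range(len(lowered) - n + 1))


period, words = parse_stream("Roman: villa bathhouse barn round house")

# A genuine multi-word type can still be found...
found_round_house = contains_term(words, "round house")
# ...but so can a run of words that straddles two separate types,
# because the stream carries no term boundaries.
found_spurious = contains_term(words, "barn round")
```

Both lookups succeed, which illustrates the trade-off: searching for known terms works, but the list alone cannot confirm where one monument type ends and the next begins.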

AIP

AIP data were downloaded by EngLaID from the AIP website. This approach is not ideal because a restriction in the database software used by the AIP means that it is not possible to download more than a certain (undetermined) number of records at one time. Some categories of data are thus not extractable, because they exceed this limit even when filtered as narrowly as possible.

PAS

PAS data were supplied directly by the PAS in the form of a single large CSV spreadsheet. PAS data map reasonably well onto HER finds recording formats, but contain considerably more detail.

Appendix 3: Additional methodological details for the 100-km² exercise

Since testing identified a negligible degree of overlap between records held in the EMC and those within the PAS, and since their structure and content are broadly compatible, these datasets are combined and treated as one.

One-hundred-square-kilometre test areas were selected to contain a reasonable, but not overly onerous, number of records, in order to characterise the nature of the dataset relationships as accurately as possible. When tested specifically, almost all these squares included an above-average record density for the HER in question. If the HER area concerned included distinctly different landscape zones, which were likely to produce particular kinds of records (e.g. Norfolk, where the Fens produce very few PAS records, but the upland Breckland zone produces many), the 100-km² area was positioned, as far as possible, to include both/all such zones. In two cases, West Berkshire and North East Lincolnshire, it was not possible to fit a full set of 100 1 × 1-km cells within the case study area, so smaller areas were tested.
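As a rough illustration of how records might be allocated to 1 × 1-km cells, the sketch below assumes that each record has an easting and northing in metres on a projected grid and that the 100 cells form a 10 × 10 square with a known south-west corner. Both assumptions, and the names `cell_index` and `count_per_cell`, are ours for the purpose of the example; the paper does not specify how the cells were laid out.

```python
from collections import Counter

CELL_SIZE_M = 1_000    # 1 x 1 km cells
CELLS_PER_SIDE = 10    # assumed 10 x 10 layout = 100 cells


def cell_index(easting, northing, origin_e, origin_n):
    """Return (col, row) of the cell containing the point, or None if outside."""
    col = int((easting - origin_e) // CELL_SIZE_M)
    row = int((northing - origin_n) // CELL_SIZE_M)
    if 0 <= col < CELLS_PER_SIDE and 0 <= row < CELLS_PER_SIDE:
        return col, row
    return None


def count_per_cell(points, origin_e, origin_n):
    """Count records per cell; points outside the test area are ignored."""
    counts = Counter()
    for easting, northing in points:
        idx = cell_index(easting, northing, origin_e, origin_n)
        if idx is not None:
            counts[idx] += 1
    return counts
```

Per-cell counts of this kind would allow the record density of a test square to be compared with the average for the wider HER area, in the spirit of the density check mentioned above.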

The findings of this exercise should be treated with some caution. All four EngLaID researchers involved in carrying out this exercise used the same broad methodology, and we tried at all times to maintain a consistent approach. However, the challenges involved in identifying links between records in different datasets varied considerably from test area to test area. Consequently, researchers necessarily developed their own ways of dealing with the particular ambiguities they faced in conducting the test.

About this article

Cite this article

Cooper, A., Green, C. Embracing the Complexities of ‘Big Data’ in Archaeology: the Case of the English Landscape and Identities Project. J Archaeol Method Theory 23, 271–304 (2016). https://doi.org/10.1007/s10816-015-9240-4

  • DOI: https://doi.org/10.1007/s10816-015-9240-4
