Embracing the Complexities of ‘Big Data’ in Archaeology: the Case of the English Landscape and Identities Project

Cooper, Anwen; Green, Chris

doi:10.1007/s10816-015-9240-4

Published: 25 February 2015

Volume 23, pages 271–304 (2016)

Journal of Archaeological Method and Theory

Anwen Cooper¹ &
Chris Green¹

Abstract

This paper considers recent attempts within archaeology to create, integrate and interpret digital data on an unprecedented scale—a movement that resonates with the much wider so-called big data phenomenon. Using the example of our work with a particularly large and complex dataset collated for the purpose of the English Landscape and Identities project (EngLaID), Oxford, UK, and drawing on insights from social scientists’ studies of information infrastructures much more broadly, we make the following key points. Firstly, alongside scrutinising and homogenising digital records for research purposes, it is vital that we continue to appreciate the broader interpretative value of ‘characterful’ archaeological data (those that have histories and flaws of various kinds). Secondly, given the intricate and pliable nature of archaeological data and the substantial challenges faced by researchers seeking to create a cyber-infrastructure for archaeology, it is essential that we develop interim measures that allow us to explore the parameters and potentials of working with archaeological evidence on an unprecedented scale. We also consider some of the practical and ethical consequences of working in this vein.

Notes

The term ‘events’ is used throughout the text to mean archaeological fieldwork investigations.

Acknowledgments

This study was carried out as part of a 5-year European Research Council funded research project. It also draws on the findings of work undertaken separately as part of an English Heritage-commissioned investigation aimed at developing a new information access strategy for England. The data upon which the study is based were provided by 75 separate HER Officers, English Heritage, the Archaeological Investigations Project and the Portable Antiquities Scheme. It goes without saying that our work would not have been possible without the support and expertise of the professionals involved in curating and extracting these data for us. We are particularly grateful to Sally Croft (Cambridgeshire HER), Simon Crutchley (English Heritage), Rebecca Loader (Isle of Wight HER), Dan Pett (PAS) and Emma Trevarthen (Cornwall HER) who kindly gave us permission to publish images of their data. Simon Crutchley, Martin Newman and Roger Thomas facilitated access to the NRHE data and have offered thoughtful guidance during our endeavour to get to grips with our various datasets. Ehren Milner gave us advice about accessing AIP data, and Dan Pett provided the PAS data. Letty ten Harkel, Zena Kamash and Laura Morley undertook the 100-km² test exercise along with us. Miranda Creswell, Duncan Garrow, Chris Gosden, Letty ten Harkel, Zena Kamash and Dan Stansbie provided helpful comments on an earlier draft of the paper. The input of four anonymous reviewers improved substantially the version of this paper that was originally submitted for publication.

Author information

Authors and Affiliations

Institute of Archaeology, University of Oxford, 36 Beaumont Street, Oxford, OX1 2PG, UK
Anwen Cooper & Chris Green

Corresponding author

Correspondence to Anwen Cooper.

Appendices

Appendix 1: URLs for cited datasets

NRHE: http://www.pastscape.org.uk/

NMP: http://www.english-heritage.org.uk/professional/research/landscapes-and-areas/national-mapping-programme/

HERs: http://www.heritagegateway.org.uk/gateway/

AIP: http://csweb.bournemouth.ac.uk/aip/aipintro.htm

OASIS: http://archaeologydataservice.ac.uk/archsearch/

PAS: http://finds.org.uk/

EMC: http://www.fitzmuseum.cam.ac.uk/dept/coins/emc/

Appendix 2: Specificities of the datasets provided to the EngLaID project

HER data (including several UADs)

Seventy-five of the 84 HERs and UADs in England provided data for the EngLaID project. Most of these datasets use a monument and event structure: events associated with a monument are linked to it and, presumably, an event that is not linked to an existing monument generates a new monument when it takes place. Nottinghamshire is an exception in having an intervening third layer, which can be thought of as a ‘feature’. In that case, events are linked to features, and features are then linked to monuments, either where a pre-existing site is known or once the features become important enough to merit consideration as a ‘monument’. The majority of HERs, but not all, also record sources/bibliography. Many also record details of finds, especially amongst HBSMR users. However, such details tend to be added on an ad hoc basis.
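The monument and event structure described above can be illustrated with a minimal sketch. It is not the schema of any actual HER or of HBSMR: the class names, field names and the `MON-n` identifier pattern are hypothetical, and the rule that an unlinked event creates a new monument is the presumed behaviour noted in the text, not a documented one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Monument:
    monument_id: str
    event_ids: List[str] = field(default_factory=list)


@dataclass
class Event:
    event_id: str
    monument_id: Optional[str] = None  # set once the event is linked


class HerRegister:
    """Toy monument/event register; every name here is hypothetical."""

    def __init__(self) -> None:
        self.monuments: Dict[str, Monument] = {}
        self.events: Dict[str, Event] = {}

    def record_event(self, event_id: str,
                     monument_id: Optional[str] = None) -> Monument:
        # Link the event to the named monument if it already exists;
        # otherwise create a new monument (the presumed behaviour).
        if monument_id is None or monument_id not in self.monuments:
            monument_id = monument_id or f"MON-{len(self.monuments) + 1}"
            self.monuments[monument_id] = Monument(monument_id)
        monument = self.monuments[monument_id]
        monument.event_ids.append(event_id)
        self.events[event_id] = Event(event_id, monument_id)
        return monument
```

A three-layer arrangement such as the Nottinghamshire one could be sketched in the same way by inserting a hypothetical `Feature` record between `Event` and `Monument`.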

NRHE

NRHE data were supplied by English Heritage as shapefiles and associated PDF documents that contained the most important attributes of each record. These PDF files had to be scanned using a script to extract the relevant attribute data. This process is slightly imperfect because some monument types run across multiple lines, which makes it impossible for an automated process to tell where a term finishes. The resulting output is therefore a ‘stream of consciousness’ list of monument types for each period, with one term running into another, e.g. Roman: villa bathhouse barn round house. This makes some queries hard to perform, but the output is largely functional.
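To show why such a flattened list is usable but imperfect, the following sketch queries the example string for whole-word terms. It is an illustration only and not the script used by the project; the function name `contains_term` is hypothetical.

```python
def contains_term(flattened: str, term: str) -> bool:
    """Return True if the words of `term` occur consecutively, as whole
    words, anywhere in the flattened monument-type string."""
    words = flattened.lower().split()
    target = term.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Example string taken from the text above.
flattened = "villa bathhouse barn round house"

# A genuine multi-word term is found.
found_round_house = contains_term(flattened, "round house")

# A shorter term also matches, although only 'round house' may have
# been recorded: a possible false positive.
found_house = contains_term(flattened, "house")

# A spurious 'term' spanning two neighbouring entries matches too,
# because the boundaries between terms have been lost.
found_barn_round = contains_term(flattened, "barn round")
```

All three queries return `True`, which is the difficulty described above: without term boundaries, a match cannot distinguish a recorded monument type from an accidental run of adjacent words.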

AIP

AIP data were downloaded by EngLaID from the AIP website. This is not ideal, because a restriction in the database software used by the AIP means that it is not possible to download more than a certain (undetermined) number of records at one time. Some categories of data are thus not extractable, because they exceed this limit even when filtered down to the finest level.

PAS

PAS data were supplied directly by the PAS in the form of a single large CSV spreadsheet. PAS data map reasonably well onto HER finds recording formats, but contain much more detail.

Appendix 3: Additional methodological details for the 100-km² exercise

Upon testing, a negligible degree of overlap was identified between records held in the EMC and those within the PAS. Since their structure and content are also broadly compatible, these datasets are combined and treated as one.

One-hundred-square-kilometre test areas were selected to contain a reasonable, but not overly onerous, number of records, so as to characterise the nature of the dataset relationships as accurately as possible. When tested specifically, almost all of these squares had an above-average record density for the HER in question. If the HER area concerned included distinctly different landscape zones that were likely to produce particular kinds of records (e.g. Norfolk, where the Fens produce very few PAS records but the upland Breckland zone produces many), the 100-km² area was positioned, as far as possible, to include both/all such zones. In two cases, West Berkshire and North East Lincolnshire, it was not possible to fit a full set of 100 1 × 1-km cells within the case study area, so smaller areas were tested.
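The record density of a test square can be sketched as follows. The paper does not describe how density was calculated, so this is one plausible approach under stated assumptions: record locations are point coordinates in metres (for instance, National Grid eastings and northings), the square is a 10 × 10 block of 1 × 1-km cells, and all function names are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 1_000   # 1 x 1 km cells
CELLS_PER_SIDE = 10   # 10 x 10 cells = 100 km^2


def cell_index(easting, northing, origin_e, origin_n):
    """Return (col, row) of the 1-km cell holding a point, or None if
    the point lies outside the 100-km^2 test square."""
    col = int((easting - origin_e) // CELL_SIZE_M)
    row = int((northing - origin_n) // CELL_SIZE_M)
    if 0 <= col < CELLS_PER_SIDE and 0 <= row < CELLS_PER_SIDE:
        return col, row
    return None


def records_per_cell(points, origin_e, origin_n):
    """Count records in each 1-km cell of the test square."""
    counts = Counter()
    for easting, northing in points:
        idx = cell_index(easting, northing, origin_e, origin_n)
        if idx is not None:
            counts[idx] += 1
    return counts


def density_per_km2(points, origin_e, origin_n):
    """Mean number of records per square kilometre within the square."""
    counts = records_per_cell(points, origin_e, origin_n)
    return sum(counts.values()) / (CELLS_PER_SIDE ** 2)
```

The resulting figure could then be compared with the density for the whole HER area (total records divided by area in square kilometres) to check whether a candidate square is above average.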

The findings of this exercise should be treated with some caution. All four EngLaID researchers involved in carrying out this exercise used the same broad methodology, and at all times we tried to maintain a consistent approach. However, the challenges involved in identifying links between records in different datasets varied considerably from test area to test area. Consequently, researchers necessarily developed their own ways of dealing with the particular ambiguities they faced in conducting the test.

About this article

Cite this article

Cooper, A., Green, C. Embracing the Complexities of ‘Big Data’ in Archaeology: the Case of the English Landscape and Identities Project. J Archaeol Method Theory 23, 271–304 (2016). https://doi.org/10.1007/s10816-015-9240-4

Published: 25 February 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10816-015-9240-4
