In one sense, archaeology deals with the biggest dataset of all: the entire material record of human history, from the earliest human origins c. 2.2 million years Before Present (BP) to the present day. However, this dataset is, by its nature, incomplete, fragmentary, and dispersed. Archaeology therefore brings a very particular kind of challenge to the concept of big data. Rather than real-time analyses of the shifting digital landscape of data produced by the day-to-day transactions of millions of people and billions of devices, approaches to big data in archaeology refer to the sifting and reverse-engineering of masses of data derived from both primary and secondary investigation into the history of material culture.
Big Data and the Archaeological Research Cycle
Whether derived from excavation, post-excavation analysis, experimentation, or simulation, archaeologists have only tiny fragments of the “global” dataset that represents the material record, or even the record of any specific time period or region. If one takes the definition of “Big Data” as it is generally understood (a corpus of information too massive for desktop-based or manual analysis or manipulation), no single archaeological dataset is likely to have these attributes of size and scale. The significance of Big Data for archaeology lies not so much in the analysis and manipulation of single or multiple vast datasets but rather in the bringing together of multiple data, created at different times, for different purposes, and according to different standards, and in the interpretive and critical frameworks needed to create knowledge from them. Archaeology is “Big Data” in the sense that it is “data that is bigger than the sum of its parts.”
Those parts are massively varied. Data in archaeology can be conventional photographic images, images and data from remote sensing, tabular data of information such as artifact findspots, numerical databases, or text. It should also be noted that the act of generating archaeological data is rarely, if ever, the end of the investigation or project. Any dataset produced in the field or the lab typically forms part of a larger interpretation and interpolation process and – crucially – archaeological data is often not published in a consistent or interoperable manner; although approaches to so-called Grey Literature, which constitutes reports from archaeological surveys and excavations that typically do not achieve a wide readership, are discussed below. This fits with a general characteristic of Big Data, as opposed to the “e-Science/Grid Computing” paradigm of the 2000s. Whereas the latter was primarily concerned with “big infrastructure,” anticipating the need for scientists to deal with a “deluge” of monolithic data emerging from massive projects such as the Large Hadron Collider, as described by Tony Hey and Anne Trefethen, Big Data is concerned with the mass of information which grows organically as the result of the ubiquity of computing in everyday life and in everyday science. In the case of archaeology, it may be considered more as a “complexity deluge,” where small data, produced on a daily basis, forms part of a bigger picture.
There are exceptions: Some individual projects in archaeology are concerned with terabyte-scale data. The most obvious example in the UK is the North Sea Palaeolandscapes project, led by the University of Birmingham, which has reconstructed the Early Holocene landscape of the bed of the North Sea, an inhabitable landscape until its inundation between 20,000 and 8,000 BP – so-called Doggerland. As Vince Gaffney and others describe, drawing on 3D seismic data gathered during the process of oil prospection, this project has used large-scale data analytics and visualization to reconstruct the topography of the preinundation land surface spanning an area larger than the Netherlands, and thus to allow inferences as to what environmental factors might have shaped human habitation of it; although it must be stressed that there is no direct evidence at all of that human occupation. While such projects demonstrate the potential of Big Data technologies for conducting large-scale archaeological research, they remain the exception. Most applications in archaeology remain relatively small scale, at least in terms of the volume of data that is produced, stored, and preserved.
However, this is not to say that approaches which are characteristic of Big Data are not changing the picture significantly in archaeology, especially in the field of landscape studies. Data from geophysics, the science of scanning subterranean features using techniques such as magnetometry and resistivity, typically form relatively large datasets, which require holistic analysis in order to be understood and interpreted. This trend is accentuated by the rise of more sophisticated data capture techniques in the field, which is increasing the volume of data that can be gathered and analyzed. Although still not “big” in the literal sense of “Big Data,” this class of material undoubtedly requires the kinds of approaches in thinking and interpretation familiar from elsewhere in the Big Data agenda. Recent applications in landscape archaeology have highlighted the need both for large capacity and interoperation. For example, integration of data in the Stonehenge Hidden Landscapes Project, also directed by Gaffney, provides for “seamless” capture of reams of geophysical data from remote sensing, visualizing the Neolithic landscape beneath modern Wiltshire to a degree of clarity and comprehensiveness that would hitherto have been possible only with expensive and laborious manual survey. Due to improved capture techniques, this project succeeded in gathering, in its first two weeks, a quantity of data equivalent to that of the landmark Wroxeter survey project in the 1990s.
These early achievements of big data in an archaeological context fall against a background of falling hardware costs, lower barriers to usage, and the availability of generic web-based platforms on which large-scale distributed research can be conducted. This combination of affordability and usability is bringing about a revolution in applications such as those described above, opening remote sensing to new concepts and applications. For example, coverage of freely available satellite imagery is now near-total; graphical resolution is finer for most areas than ever before (1 m or less); and pre-georeferenced satellite and aerial images are delivered to the user’s desktop, removing the costly and highly specialized process of locating imagery of the Earth’s surface. Such platforms also allow access to imagery of archaeological sites in regions which are practically very difficult or impossible to survey, such as Afghanistan, where declassified CORONA spy satellite data are now being employed to construct inventories of the region’s (highly vulnerable) archaeology. If these developments cannot be said to have removed the boundaries within which archaeologists can produce, access, and analyze data, they have certainly made them more porous.
As in other domains, strategies for the storage and preservation of data in archaeology have a fundamental relationship with relevant aspects of the Big Data paradigm. Much archaeological information lives on the local servers of institutions, individuals, and projects; this has always constituted an obvious barrier to its integration into a larger whole. However, weighing against this is the ethical and professional obligation to share, especially in a discipline where the process of gathering the data (excavation) destroys its material context. National strategies and bodies encourage the discharge of this obligation. In the UK, as well as data standards and collections held by English Heritage, the main repository for archaeological data is the Archaeology Data Service (ADS), based at the University of York. The ADS considers for accession any archaeological data produced in the UK in a variety of formats. This includes most of the data formats used in day-to-day archaeological workflows: Geographic Information System (GIS) databases and shapefiles, images, numerical data, and text. In the latter case, particular note should be given to the “Grey Literature” library of archaeological reports from surveys and excavations, which typically present archaeological information and data in a format suitable for rapid publication, rather than for the linking and interoperation of that data. Currently, the Library contains over 27,000 such reports, and the total volume of the ADS’s collections stands at 4.5 TB (I thank Michael Charno for this information). While this could be considered “big” in terms of any collection of data in the humanities, it is not of a scale which would overwhelm most analysis platforms; what is key here is that it is most unlikely that any “global”-scale analysis across the entire collection would be useful.
The individual datasets therein relate to each other only inasmuch as they are “archaeological.” In the majority of cases, there is only fragmentary overlap in terms of content, topic, and potential use. A 2007 ADS/English Heritage report on the challenges of Big Data in archaeology identified four types of data format potentially relevant to Big Data in the field: LIDAR (Light Detection and Ranging or Laser Imaging Detection and Ranging) data, which models terrain elevation from airborne sensors; 3D laser scanning; maritime survey; and digital video. At first glance this appears to underpin an assumption that the primary focus is data formats which convey larger individual data objects, such as images and geophysics data, with the report noting that “many formats have the potential to be Big Data, for example, a digital image library could easily be gigabytes in size. Whilst many of the conclusions reached here would apply equally to such resources this study is particularly concerned with Big Data formats in use with technologies such as lidar surveys, laser scanning and maritime surveys.”
However, the report also acknowledges that “If long term preservation and reuse are implicit goals data creators need to establish that the software to be used or toolsets exist to support format migration where necessary.” It is true that any “Big Data” which is created from an aggregation of “small data” must interoperate. In the case of “social data” from mobile devices, for example, location is a common and standardizable attribute that can be used to aggregate TB-scale datasets: heat maps of mobile device usage can be created which show concentrations of particular kinds of activity in particular places at particular times. In more specific contexts, hashtags can be used to model trends and exchanges between large groups. Similarly intuitive attributes that can be used for interoperation, however, elude archaeological data, although there is much emerging interest in Linked Data technologies, which allow the creation of linkages between web-exposed databases, provided they conform (or can be configured to conform) to predefined specifications in descriptive languages such as RDF (Resource Description Framework). Such applications have proved immensely successful in areas of archaeology concerned with particular data types, such as geodata, where there is a consistent base reference (such as latitude and longitude). However, this raises a question which is fundamental to archaeological data in any sense. Big Data approaches here, even if the data is not “Big” relative to the social and natural sciences, potentially allow an “n=all” picture of the data record. As noted above, however, this record represents only a tiny fragment of the entire picture. A key question, therefore, is whether “Big Data” thinking risks technological determinism, constraining what questions can be asked. This is a point which has concerned archaeologists since the very earliest days of computing in the discipline.
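The principle of location as a shared, standardizable attribute can be illustrated with a minimal sketch. The example below is purely hypothetical: the field names, coordinates, and datasets are invented for illustration and are not drawn from any real ADS collection. It groups records from two independently created datasets into common grid cells by coordinate, the same basic operation that underlies heat maps of device usage or the linking of archaeological geodata by latitude and longitude:

```python
from collections import defaultdict

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (roughly 1 km at UK latitudes)."""
    return (round(lat / cell_deg) * cell_deg,
            round(lon / cell_deg) * cell_deg)

def aggregate_by_location(*datasets, cell_deg=0.01):
    """Merge records from separately created datasets into one
    location-indexed view, using coordinates as the only shared attribute."""
    merged = defaultdict(list)
    for records in datasets:
        for rec in records:
            merged[grid_cell(rec["lat"], rec["lon"], cell_deg)].append(rec)
    return dict(merged)

# Two hypothetical datasets recorded by different projects with
# different schemas, sharing nothing but their coordinates.
findspots = [
    {"lat": 51.1789, "lon": -1.8262, "artifact": "flint scraper"},
    {"lat": 51.1790, "lon": -1.8261, "artifact": "antler pick"},
]
geophysics = [
    {"lat": 51.1788, "lon": -1.8260, "anomaly": "pit cluster"},
]

merged = aggregate_by_location(findspots, geophysics)
# All three records fall into the same ~1 km cell and can now be
# examined together, despite their incompatible schemas.
```

In practice, Linked Data approaches express such linkages as RDF triples against shared vocabularies rather than ad hoc grouping, but the underlying requirement is the same: a base reference that all contributing datasets can be mapped onto.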
In 1975, a skeptical Sir Moses Finley noted that “It would be a bold archaeologist who believed he could anticipate the questions another archaeologist or a historian might ask a decade or a generation later, as the result of new interests or new results from older researchers. Computing experience has produced examples enough of the unfortunate consequences … of insufficient anticipation of the possibilities at the coding stage.”
Such questions probably cannot be predicted, but Big Data is (also) not about predicting questions. The kind of critical framework that Big Data is advancing, in response to the ever-more linkable mass of pockets of information, each themselves becoming larger in size as hardware and software barriers lower, allows us to go beyond what is available “just” from excavation and survey. We can look at the whole landscape in greater detail and at new levels of complexity. We can harvest public discourse about cultural heritage in social media and elsewhere and ask what that tells us about that heritage’s place in the contemporary world. We can examine the fundamental building blocks of our knowledge about the past and ask what we gain, as well as lose, by putting them into a form that the World Wide Web can read.
- Archaeology Data Service. http://archaeologydataservice.ac.uk. Accessed 25 May 2017.
- Austin, T., & Mitcham, J. (2007). Preservation and management strategies for exceptionally large data formats: ‘Big Data’. York: Archaeology Data Service & English Heritage, 28 Sept 2007.
- Gaffney, V., Thompson, K., & Finch, S. (2007). Mapping Doggerland: The Mesolithic landscapes of the Southern North Sea. Oxford: Archaeopress.
- Gaffney, C., Gaffney, V., Neubauer, W., Baldwin, E., Chapman, H., Garwood, P., Moulden, H., Sparrow, T., Bates, R., Löcker, K., Hinterleitner, A., Trinks, I., Nau, W., Zitz, T., Floery, S., Verhoeven, G., & Doneus, M. (2012). The Stonehenge Hidden Landscapes Project. Archaeological Prospection, 19(2), 147–155.