Introduction

The potential value of palaeoecological and geological databases has increased considerably in recent years, driven by growing volumes of data and the use of vegetation models to study the past and forecast the future (Miller et al. 2008; Sitch et al. 2008). Databases such as the European Pollen Database (EPD, http://www.europeanpollendatabase.net) are now considerably more than just long-term data repositories, having become important tools in multi-disciplinary research projects. The large body of European Quaternary pollen data is widely dispersed in the literature, but when organised into a common format, it becomes accessible for research into broad-scale vegetation dynamics and their interactions with climate and the long-term development of human societies. This paper aims (1) to review the development of the EPD and (2) to highlight efforts made and solutions found to improve this data archive so that it may better serve the scientific community in the future. The decision to write such a paper was driven by the combined aims of giving a general overview of past and current developments as a background to the future improvement of the EPD, and presenting the ongoing work of the Mapping and Data Accuracy Support Group. This paper is not intended as a definitive review, and parts may be biased by the personal views of the authors.

Role of the EPD

Pollen stratigraphies are probably the most spatially extensive data available for the reconstruction of past changes in terrestrial and aquatic vegetation composition. In addition to using pollen records to investigate vegetation dynamics at individual sites through time, palaeoecologists have used the large amount of information stored in the database to address a range of scientific questions at regional or continental scales, such as (1) the reconstruction of patterns of past climate change through time and space (Davis et al. 2003), which in turn is important in studies of general circulation models in the past (Bonfils et al. 2004); (2) studies of the spread of plants, especially trees, since the last glaciation (Brewer et al. 2002; Terhürne-Berson et al. 2004; Giesecke and Bennett 2004; Conedera et al. 2004; Krebs et al. 2004; Van der Knaap et al. 2005; Magri 2008); (3) reconstructions of past plant distribution patterns, which allow testing of our understanding of the factors limiting these distributions and of the models that attempt to capture them (Giesecke et al. 2007; Liepelt et al. 2008); and (4) increased precision in past land-cover reconstructions (Gaillard et al. 2008; Broström et al. 2008; Caseldine et al. 2008). In addition, knowledge of pollen-inferred past land-cover changes allows evaluation of the consequences and legacies of past land-use and provides information on the dynamic responses of vegetation to a constantly changing environment. This may allow us to evaluate threats to our natural environment and define aims for the conservation and management of Europe’s landscape (Anderson et al. 2006; Willis et al. 2007).

The increasing understanding of these topics, and the need to analyse spatial patterns, makes it necessary to draw information from more than a single pollen record. Many of these investigations require the availability of datasets because only limited information can be extracted from printed pollen diagrams. Therefore, data archives such as the EPD and other global databases play an important role in the collation and archiving of data at the extensive spatial scales needed for analyses at regional or continental scales. The storage of original pollen count data and their associated metadata is important for between-site comparisons and spatial analyses. The EPD has become the main archive providing these functions for Quaternary pollen analytical results from western Eurasia. In addition to serving as a data archive for extensive spatial analyses, the EPD also plays an important role for individual data contributors in mitigating the inevitable metadata loss (“information-entropy”, sensu Mitchener et al. 1997) that occurs through time (Fig. 1). Both count data and metadata have a natural decay function through time as a result of fading memory, accidental data loss, changes in storage media and subsequent incompatibility, and retirement or death of the original investigators.

Fig. 1

Decay function in information and data associated with pollen sequences through time (adapted from Mitchener et al. 1997)

A short history of the EPD

Towards the end of the 1980s the IGCP 158b working group led by B. E. Berglund and M. Ralska-Jasiewiczowa, and palynologists involved in EU research programs on palaeoclimatology (A. Pons, W. A. Watts, B. Huntley) both needed improved maps of late Quaternary palaeovegetation based on the large sets of pollen data acquired since the 1960s. The work of Huntley and Birks (1983) had demonstrated the value of spatial analysis of pollen data, and had a great influence on many European palynologists. These groups realised the need for a pollen database (the EPD). In 1988 the first EPD-related discussions and meetings took place involving, amongst others, J. Guiot, I. C. Prentice, B. Huntley, B. E. Berglund and G. L. Jacobson. These discussions resulted in a meeting in 1989, convened by B. E. Berglund, to discuss the organisation of the database, which was attended by representatives of 18 European countries. The proposal by A. Pons to host the EPD in Arles was accepted. Two subsequent workshops in Wilhelmshaven (1990) and Arles (1991) led to the definition of the database structure, the administrative structure (an Advisory Board representative of the different European regions and an Executive Committee of three persons invited to meet every year to review EPD progress) and the “protocol” ruling the rights and the duties of data contributors and users. The EPD and the North American Pollen Database (NAPD) were established simultaneously through active collaboration with E. Grimm and J. Keltner. This was done to ensure compatibility and to contribute towards the ultimate goal of a Global Pollen Database. In 1990, thanks to European and French national funding, R. Cheddadi was appointed as EPD Manager to work alongside J. Guiot, J.-L. de Beaulieu and later S. Hicks, who served for many years as the Executive Committee Chair. In January 1991 the first newsletter was sent to European Quaternary palynologists asking them to contribute their data to the EPD.

The EPD was set up to provide an inclusive and permanent archiving facility to all palynologists for storing the basic data that had been generated within European research. It was anticipated that the EPD would also become a tool by means of which further research on biogeographical, palaeoclimatological and palaeoecological problems could be addressed, at a variety of different spatial and temporal scales. The role of past environmental archives in the understanding of global climatic change was clear from the early 1980s; contemporary societal concerns about climatic change have resulted in an even greater role for archives of past environments.

The 1990s were very busy times for the EPD, with numerous training courses organised in Arles and elsewhere. A major task was to harmonize the pollen taxonomy and nomenclature, which was first undertaken by B. Huntley and his group, and later developed and finalized in 1993 by a working group coordinated by M. J. Gaillard and joined by S. Peglar, J. van Leeuwen, L. Wick and J.-L. de Beaulieu. The BIOME6000 initiative stimulated palynologists to share pollen data, and much of the data from the former Soviet Union and Mongolia was compiled as part of this project, facilitated by funds obtained from the EU (INTAS) to promote the participation of these partners (Prentice et al. 1996; Tarasov et al. 1998). Unfortunately, the team hosting the EPD (IMEP) did not succeed in securing a permanent position for a database manager from their French administration. From 1995 to 2003 funding for the further development and management of the EPD was dependent upon collaboration with forest geneticists involved in EU research projects (FAIROAK, CYTOFOR and FOSSILVA) that used the EPD as a tool for linking phylogeography and palaeobiogeography of forest trees (Petit et al. 2002; Cheddadi et al. 2006; Magri et al. 2006). When the FOSSILVA project ended in the early 2000s the EPD was unfunded. It managed to survive thanks to the altruistic contribution of R. Cheddadi, whose position was now supported by other projects, thus limiting his ability to commit time to the EPD. The EPD became a relict database, with no development or incorporation of new data. At the end of 2006 IMEP obtained a permanent position for a new database manager (M. Leydet) from the University of Aix-Marseille and data compilation resumed, with the support of the NOE EVOLTREE project.

In May 2007 a special open meeting to discuss the future of the EPD was convened by R. Bradshaw and J.-L. de Beaulieu in Arbois (France) under the auspices of a EuroCLIMATE workshop with sponsorship from EVOLTREE. The workshop, attended by 78 European palynologists, had a range of outputs that can be reviewed on the EPD website (http://www.europeanpollendatabase.net). The most important result was the foundation of a range of support and working groups to help maintain and update the database. The Mapping and Data Accuracy working group (MADCAP) is one of the EPD support groups founded at this meeting. Another such group, the taxonomy and nomenclature working group, already existed and has provided essential services for the database in past years. At the 2007 open meeting, all support groups were initially populated with volunteers present or represented by colleagues. Some groups have subsequently grown in the number of their members. All support groups are open to anybody interested in supporting the EPD. In September 2008 these working groups reported to a well-attended open meeting of the EPD at the International Palynological Congress, Bonn, at which a new administrative structure for the EPD was proposed and accepted. It was decided that the EPD would be managed by a board comprising an elected chairman and the spokespersons of the working groups. The term of office of the chairman would be 4 years and support groups elect their spokespersons as they see fit. Richard Bradshaw was subsequently elected as chairman.

Status of the EPD

Since its establishment in 1992, many European palynologists have submitted pollen counts to the EPD and in January 2009 a total of 1,032 pollen sequences from 877 sites were held in the database; together these comprise well over one million individual pollen counts. Of these sequences, referred to in the database as entities, 668 have an associated chronology that is in most cases based on an age-depth model that makes use of radiocarbon-dated samples from a series of known depths (Figs. 2, 3, 4). The full original information on age determinations, most commonly radiocarbon, is stored in archival tables of the database. Age-depth models derived from this information are stored in research tables and several age-depth models may be stored for any individual entity. Currently these models are generally based on the uncalibrated radiocarbon time-scale. However, efforts are being made to construct age-depth models using calibrated radiocarbon age determinations. The EPD also gives palynologists the option to archive datasets but restrict access to the count data. This option may be used by authors who are still in the process of publishing their data but at the same time wish to highlight its existence because this may lead to the establishment of new collaborations. The EPD currently holds data for 143 sites that have a restricted status.

Fig. 2

Distribution of dated (red dots) and undated (white dots) pollen sequences in the EPD in western Eurasia

Fig. 3

Distribution of dated (red dots) and undated (white dots) pollen sequences in the EPD in eastern Eurasia

Fig. 4

Temporal distribution of pollen samples with an allocated age within the EPD for the last 21,000 years; samples with older ages were cut off

As a result of the considerable collaborative effort from the outset, the database is largely compatible with other continental databases such as the African Pollen Database, the Latin American Pollen Database, the North American Pollen Database and the Global Pollen Database. It currently contains 57 tables divided into five categories: archival; look-up; research; system; and views. Of these, the archival tables contain the original data (for example raw counts), the look-up tables contain reference information (e.g. plant taxonomy) and the research tables contain information relating to analyses of the data (for example age-depth models). The database is currently managed using Paradox, although it has also been transferred to PostgreSQL to allow web access and there is general agreement to migrate to a new database system. The complex table structure of the EPD was designed to make full use of the power of a relational database, so that all entries in the database can be queried concurrently. It is thus possible, for example, to find all sites with chronological information that have more than 1% Plantago lanceolata pollen 5,000 radiocarbon years ago. In order to execute such a query, the user needs to download the full database. Users less experienced with Paradox or PostgreSQL databases may find it easier to work with the database in MS Access, and an Access version of the database has been provided for download from the EPD web site.
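The kind of query described above can be illustrated with a minimal sketch. It uses an in-memory SQLite database rather than Paradox or PostgreSQL, and the three tables (`site`, `sample`, `pollen_count`), their columns and the demonstration values are hypothetical simplifications introduced here for illustration; they are not the actual EPD table structure. Percentages are calculated against the sum of all counts stored for a sample, which is an assumption, as is the tolerance window around the target age.

```python
import sqlite3

# Hypothetical, simplified schema; the real EPD has 57 tables with other names.
SCHEMA = """
CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, site_id INTEGER, age_bp INTEGER);
CREATE TABLE pollen_count (sample_id INTEGER, taxon TEXT, n INTEGER);
"""

# Percentage of one taxon per dated sample, restricted to an age window.
QUERY = """
SELECT s.name,
       sa.age_bp,
       100.0 * SUM(CASE WHEN c.taxon = :taxon THEN c.n ELSE 0 END) / SUM(c.n) AS pct
FROM site AS s
JOIN sample AS sa ON sa.site_id = s.site_id
JOIN pollen_count AS c ON c.sample_id = sa.sample_id
WHERE sa.age_bp BETWEEN :age_min AND :age_max
GROUP BY s.name, sa.sample_id, sa.age_bp
HAVING 100.0 * SUM(CASE WHEN c.taxon = :taxon THEN c.n ELSE 0 END) / SUM(c.n) > :threshold
ORDER BY s.name, sa.age_bp
"""


def build_demo_db():
    """Create an in-memory database with invented demonstration data."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO site VALUES (?, ?)",
                    [(1, "Site A"), (2, "Site B"), (3, "Site C")])
    # Site C is undated: its sample has no age (NULL) and cannot match the window.
    con.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                    [(10, 1, 5000), (20, 2, 5100), (30, 3, None)])
    con.executemany("INSERT INTO pollen_count VALUES (?, ?, ?)", [
        (10, "Plantago lanceolata", 12), (10, "Alnus", 488),  # 2.4 %
        (20, "Plantago lanceolata", 2), (20, "Alnus", 498),   # 0.4 %
        (30, "Plantago lanceolata", 50), (30, "Alnus", 450),  # 10 %, but undated
    ])
    return con


def sites_above_threshold(con, taxon, age, tolerance, threshold):
    """Return (site name, sample age, percentage) rows exceeding the threshold."""
    params = {"taxon": taxon, "age_min": age - tolerance,
              "age_max": age + tolerance, "threshold": threshold}
    return con.execute(QUERY, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in sites_above_threshold(db, "Plantago lanceolata", 5000, 250, 1.0):
        print(row)
```

In this sketch the undated site is excluded automatically because its sample age is missing, which mirrors the restriction to sites with chronological information.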

Users who are interested in working with a small number of sites or particular sites may find it easier to use the online facilities. Currently the data in the EPD can be browsed, queried, visualised and selected datasets downloaded using the ‘Fossil Pollen Database Viewer’ developed by Nicolas Garnier at MEDIAS-France in close co-operation with the African Pollen Database (managed by A. M. Lezine).

Developments in approaches to storing and accessing palaeoecological datasets will result in changes to the underlying structure of the EPD in the near future. There is a need to migrate the EPD to a modern SQL-compatible database structure and to change the table structure in line with other palaeoecological databases. These developments will offer the opportunity to combine other types of palaeoecological data into the database. For example, the growing macrofossil databases with European data could be easily incorporated. A downloadable Access version of the database will become available with a range of different example queries that allow inexperienced database users to gain full use of all aspects of the database. These developments will make it simpler for all palaeoecologists to access and use the database. It is hoped that greater use of the EPD will also lead to positive feedback and user-led improvements; for example, improved age-depth models generated within research projects should be submitted. Maintenance of the database in both the current and projected future format requires substantial effort from the database manager. Collaborative community effort through new initiatives such as the EPD support groups will inevitably lead to a database that is better able to serve the needs of European palynologists and other potential users.

MADCAP activities

The MADCAP support group aims to make the data in the EPD more available to the scientific community and thus to enhance its use. The key goal of the group is to produce a new web-based version of a European palaeo-vegetation atlas that provides maps of past pollen percentages for visualisation, teaching purposes and as a basis for data-model comparisons. In order to achieve this goal the group has undertaken a systematic review of the data currently held in the EPD with the aim of identifying problems with individual site records, and of flagging any errors within the database for correction. This process has followed a standardized protocol. Data have been downloaded from the EPD, pollen diagrams constructed and, wherever possible, checked against the original publications. In the first instance, sites that have some chronological control were chosen and age-depth models were included in the review process, as these will form the basis for the palaeovegetation atlas. The age-depth models for each site within the database have also been checked. Members of the group combine different regional expertise so that diagrams from most European regions were checked by a person with knowledge of their regional vegetation history. Where data handling errors have been identified, these have been fed back to the database manager, who is a member of the group, for flagging or, where possible, correction. MADCAP members have checked 711 pollen sequences. In compiling maps of the past, collaboration with the Taxonomy and Nomenclature Support Group (PTN) is essential. While MADCAP focuses on the metadata and count data, the PTN ensures that the identifications of different analysts are comparable and that different taxonomic levels can be resolved. Already, the pollen taxonomy and nomenclature of all pollen data entered in the database has been checked and harmonized by the PTN group.

Generation of the new palaeovegetation atlas is in progress using the dated sites from within the EPD. At the present time age-depth models are being constructed. The final dataset used to compile the atlas will be made available to the wider community following completion of the project, as will a grid-plan of results for each taxonomic unit.

Errors within the EPD

The EPD evolved at a time when personal computing was an emerging growth area rather than a matter of routine. As a consequence there are a number of errors within the EPD that result from data entry, handling, and conversion. Errors are manifest in both the metadata and the raw count data. Although these errors are more common in older datasets, they still occur in datasets that have been submitted more recently and are to some extent inevitable. The most severe errors in the metadata are incorrect latitude/longitude information that may lead to site location errors of up to hundreds of kilometres, and incorrect or missing site references. These errors were encountered in 1.1% and 6.6% of checked sites, respectively (Table 1). A less severe error for the production of a palaeovegetation atlas is the incorrect or missing elevation of a site. With regional knowledge, these errors are generally easy to detect and correct.

Table 1 Types and numbers of errors encountered in the EPD

Errors within the raw counts are typically the result of the process of conversion between different file types or importing the dataset into the EPD. Errors may be systematic within sites (for example, switching of count data between taxa A and B), or random (one taxon in an individual sample swapped for another). These errors are usually obvious: the count for Artemisia may be switched with that for Alnus for a single sample, for example, resulting in an isolated high value for one associated with an atypical low value for the other. In very few cases, entire samples were switched (for example, switching of depths between samples X and Y) or assigned an incorrect depth. These errors can be confirmed by checking against the original publication. It is possible that count data for individual taxa are missed through the data conversion. In such cases the original data contributor may need to be contacted to re-submit the dataset, where this is still possible.
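The single-sample swap described above (an isolated high value in one taxon paired with an atypical low value in another) can be screened for automatically. The following is a minimal sketch of one possible heuristic, not the procedure followed by MADCAP: for each interior sample and each pair of taxa, it asks whether exchanging the two counts would bring both much closer to the mean of their neighbouring samples. The function name, the `min_gain` parameter and the example counts are hypothetical. Any sample flagged in this way would still need to be confirmed against the original publication.

```python
def find_suspect_swaps(counts, min_gain=0.5):
    """Flag samples where swapping two taxa would fit the neighbours far better.

    counts: dict mapping taxon name -> list of counts ordered by depth
            (all lists must have the same length).
    min_gain: required fractional reduction in misfit (0.5 means the swapped
              arrangement must at least halve the misfit).
    Returns a list of (sample_index, taxon_a, taxon_b) tuples.
    """
    taxa = sorted(counts)
    n_samples = len(counts[taxa[0]])
    suspects = []
    # First and last samples have only one neighbour and are skipped.
    for i in range(1, n_samples - 1):
        for pos, a in enumerate(taxa):
            for b in taxa[pos + 1:]:
                # Expected value of each taxon: mean of the adjacent samples.
                exp_a = (counts[a][i - 1] + counts[a][i + 1]) / 2
                exp_b = (counts[b][i - 1] + counts[b][i + 1]) / 2
                as_stored = abs(counts[a][i] - exp_a) + abs(counts[b][i] - exp_b)
                if_swapped = abs(counts[b][i] - exp_a) + abs(counts[a][i] - exp_b)
                if as_stored > 0 and if_swapped < (1 - min_gain) * as_stored:
                    suspects.append((i, a, b))
    return suspects


if __name__ == "__main__":
    # Invented counts: Alnus and Artemisia appear exchanged in the third sample.
    demo = {
        "Alnus": [120, 118, 4, 121, 119],
        "Artemisia": [3, 5, 117, 4, 2],
        "Pinus": [60, 62, 61, 59, 63],
    }
    print(find_suspect_swaps(demo))
```

A screen of this kind will also flag genuine abrupt vegetation changes that happen to affect two taxa in opposite directions, so it is best regarded as a way of prioritising samples for manual checking.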

The errors described so far are critical and unfortunate, because they are situated in the archival tables. Less serious for the database, but important for users, are errors, misjudgements or misinterpretations in the construction of age-depth models. Age-depth models that are based on very few age determinations often interpolate and/or extrapolate over many thousands of years and have very large but rarely quantified uncertainties (Parnell et al. 2008). This can result in errors that may only be obvious, for example, where a late-glacial pollen spectrum is assigned to a Holocene age or vice versa. MADCAP members have, to date, suggested changes to the chronologies, or already made new ones, for a number of sequences (Table 1). Chronologies based on only one or two radiocarbon dates have been flagged due to their potentially low precision. However, users are encouraged to be critical of the available age-depth models and where necessary to construct their own. Users who construct new age-depth models are encouraged to submit these to the EPD where they will be stored in research tables.
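To make the interpolation and extrapolation issue concrete, the sketch below builds the simplest possible age-depth model, a linear interpolation between dated depths on the uncalibrated radiocarbon time-scale. Linear interpolation is chosen here only for illustration and is not necessarily the method used for the models stored in the EPD. The function names and example values are hypothetical, and the sketch assumes that all control points lie at distinct depths. It reports whether a sample age was extrapolated beyond the dated interval and flags chronologies resting on only one or two dates, following the criterion mentioned above.

```python
def is_low_precision(controls):
    """True if the chronology rests on only one or two age determinations."""
    return len(controls) <= 2


def interpolate_age(depth, controls):
    """Estimate the age at a depth by linear interpolation between dates.

    controls: list of (depth_cm, age_bp) pairs with distinct depths;
              at least two are needed to define a line.
    Returns (age, extrapolated), where extrapolated is True if the depth
    lies outside the dated interval.
    """
    points = sorted(controls)
    if len(points) < 2:
        raise ValueError("at least two dated depths are required")
    shallowest, deepest = points[0][0], points[-1][0]
    if depth <= shallowest:
        (d0, a0), (d1, a1) = points[0], points[1]
    elif depth >= deepest:
        (d0, a0), (d1, a1) = points[-2], points[-1]
    else:
        # Find the pair of dated depths that brackets the sample.
        for (d0, a0), (d1, a1) in zip(points, points[1:]):
            if d0 <= depth <= d1:
                break
    age = a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    return age, depth < shallowest or depth > deepest


if __name__ == "__main__":
    # Invented chronology with only two radiocarbon dates.
    dates = [(100, 2000), (300, 6000)]
    print(is_low_precision(dates))
    print(interpolate_age(200, dates))  # within the dated interval
    print(interpolate_age(400, dates))  # extrapolated below the lowest date
```

A model of this kind returns a single age with no uncertainty estimate, which is exactly the limitation noted above; users requiring uncertainties would need a more formal approach.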

All contributors and users of the EPD are strongly encouraged to report any errors which they encounter to the database manager. Each error should be clearly described and, if possible, suggestions made as to how it may be resolved.

Concluding remarks: the benefits of submitting data to and building the EPD

Submission of pollen data offers a range of benefits to individual European palynologists and to the community as a whole. Over time, memories fade, storage media evolve and data loss occurs. The EPD provides an archive that can reduce the chance of at least part of this loss. Submission of data may act as an additional tool for wider dissemination of findings, especially in cases where the data are published in books or regional journals. In the past, submission of data has led to increased collaboration between data contributors, resulting in innovative projects, joint publications and enhanced citation of work. In order to further facilitate this, the international peer-reviewed journal Grana now offers short publications of pollen diagrams that have been submitted to the EPD in a new ‘Contributions to the European Pollen Database’ section (Bradshaw 2007); recent examples include Jankovská et al. (2007) and Stefanova et al. (2008). The EPD also has dedicated space on the website for a wiki (http://www.europeanpollendatabase.net/wiki/). This space has been developed to foster discussion, community building, participation and knowledge exchange amongst European pollen analysts.

The precision of spatial analyses of pollen data is related to the number of data points (sites) that are available for analysis. Such studies, and their application to data from the EPD outlined above, are made possible by the willingness of European pollen analysts to share their data in public access archives. The original aims of the EPD were to provide an archiving facility for the pollen analytical community with the understanding that data would be accessible and open for use by all. This remains true, and the importance of such an archive has been seen in recent trends towards using observations of past environmental changes and their consequences to improve forecasts of future changes and their potential impacts. We wish to take this opportunity to encourage the submission of pollen analytical results as a routine part of research. Only in this way will we be able to gain a better understanding of the past and contribute to a sustainable future.