1 Introduction

The natural environment is under persistent and increasingly complex challenge from anthropogenic contaminants including radionuclides, heavy metals, organic pesticides, endocrine disruptors and mimetics. Increasingly these are found to interact to generate a combinatorial challenge: the multi-stressor problem (Vanhoudt et al., 2012). Consequently environmental protection, arguably inseparable from that of the human population, depends on accurate data and sample collection, the production of theoretical models on which safety assessments may be based, and improvements in our understanding of how these agents compromise the integrity of the biotic ecosphere (Mothersill et al., 2018, 2019; Salbu, 2009).

Longitudinal, geospatial and niche-specific collection of both data and material for analysis is critical in our attempts to achieve these aims. However, studies are internationally distributed, and there is currently no coherent or common platform or clearing house through which investigators and regulators can discover study details, datasets and material that may often, especially in the case of environmental samples, be unique. In this way the domain of ecology, more so than many others, is dependent on the archiving and discovery of, and access to, data and samples in order to carry out comparisons, data aggregation/integration, and novel modes of analysis. Data and sample reuse are therefore critical to the field.

The issues of data archiving and reuse have been under considerable scrutiny in recent years, resulting in the formulation of the FAIR guidelines for Open Data (Wilkinson et al., 2016). Resulting from extensive consultation between funding agencies, journals and scientists, these guidelines have been adopted by many major funding agencies, the European Commission and formally by the countries of the OECD and G20 group of nations (Arzberger et al., 2004; Mons et al., 2017). Findability, Accessibility, Interoperability, and Reusability represent the four principles of Open Data and are essential for effective data governance and management (Sansone et al., 2018). The advantages of data sharing are overwhelming, amongst which are improved reproducibility, accountability, and the added value, both scientific and financial, of reusing data for purposes for which it was not originally intended, such as aggregating with other datasets or conducting novel analysis in the light of new methods or paradigms (Cook-Degan, 2007). For the individual this also provides increased recognition and often collaborations or further developments of studies that they had not anticipated.

There has always been acceptance in the ecological sciences that release of primary data by investigators is an important norm and, while this is not always respected, it is fair to say that this community has an excellent track record in comparison to many (Michener, 2015). The development of very large datasets in recent years has increased the willingness to share, though there are still some issues, common with other disciplines (e.g. Blumenthal et al., 2006), which inhibit full and free sharing. These include protectionism, concerns that flaws in analysis might be revealed, lack of time, expertise or funding for preparation of data for upload, lack of appropriate sharing platforms, concerns over intellectual property protection, and loss of “ownership”. These are common to many disciplines and, although radioecologists were a small category of respondents in a recent survey of data sharing in a large European radiobiology project, CONCERT, the responses received broadly reflected these common findings (Madas & Schofield 2019).

Responsibility for encouragement of data sharing rests significantly on funding agencies and journals. Funder policies vary but, although data sharing is encouraged, mandatory sharing is not stipulated by most agencies funding ecological research. A notable exception is the UK NERC, and the European Commission is currently undertaking a data sharing pilot study (Horizon 2020 guidelines, 2016). A summary of many funding agency and journal policies can be found on the FAIRsharing website (https://fairsharing.org/; McQuilton et al., 2016).

Journals are increasingly developing policies to conform to the criteria laid out in the TOP (Transparency and Openness Promotion) guidelines (Nosek et al., 2015) with the aspiration of requiring mandatory sharing of data and resources as a condition of publication. So far however, with the exception of PLoS journals (Bloom et al., 2014), there seems to have been little impact on the availability of primary publication data (Federer et al., 2018). In the domain of radioecology, data sharing is encouraged by many journals, often by implementing a publisher’s blanket policy such as that of Elsevier, but we have been unable to identify a journal in the area of radioecology or radiation biology for which deposition into a public database is mandatory.

The problem of data availability has recently been raised by Beresford et al. (2020) and reflects a common issue about the provision of summary data alone, or in some cases no primary data at all. Withholding of primary data not only slows the progress of science, for example withholding unique contamination datasets, but also makes intercomparison and aggregation of datasets impossible. This adds to the uncertainty about reliability of conclusions where it is impossible to replicate the analysis. Where this kind of problem impacts on regulatory activities and safety assessments, with potentially huge implications for humans and environmental safety, as well as major economic impacts, there is an additional imperative for the community to ensure that the highest standards are met.

2 Environmental and Ecological Data

2.1 Environmental Information Data Centre

Several structured databases have been established specifically for the general domain of ecology. The UK Natural Environment Research Council’s Environmental Information Data Centre is hosted by the Centre for Ecology & Hydrology (CEH) and provides access to data and tools related to integrated research in terrestrial and freshwater ecosystems and their interaction with the atmosphere. It is a well-structured data resource with a high degree of FAIR compliance (NERC Data Centre; http://eidc.ceh.ac.uk/). This database currently contains 15 radioecology datasets (accessed 5.11.18) from a wide range of studies.

2.2 The Radioecology Exchange

The European Radioecology Alliance (ALLIANCE) was created in September 2012, and has developed a framework and strategic plan for radioecology research which is now continued under COMET (COordination and iMplementation of a pan-European instrumenT for radioecology), a Coordination and Support Action funded by the EC/Euratom FP7. As part of the COMET infrastructure the Radioecology Exchange was created to act as a portal for radioecological data (Muikku et al., 2018). The Radioecology Exchange contains a wide range of datasets from six European countries and Japan from the STAR Network of Excellence, and is an important resource for radioecology (https://radioecology-exchange.org/content/radioecology-data).

2.3 Other Dedicated Databases

The FREDERICA database (Copplestone et al., 2008) contains data on the effects of radiation on non-human biota curated from around 1200 papers with approximately 30,000 data points. The data contains details of exposures, biological effects, environmental conditions, life cycle and pathway of exposure.

The Wildlife Transfer Database (Copplestone et al. 2013) (https://www.wildlifetransferdatabase.org/) provides parameter values for use in environmental radiological assessments to estimate the transfer of radioactivity to non-human biota.

The PROBA UIAR database contains radionuclide spatial distribution data from the Chernobyl exclusion zone (Kashparov et al., 2018) and can be found both in the NERC data centre (Kashparov et al., 2017) and the STORE database (see below: https://doi.org/10.20348/STOREDB/1087).

The US Earth Observing System Data and Information System, EOSDIS (https://earthdata.nasa.gov/), which supports “discovery and processing of earth science data from satellite, aircraft and field campaigns”, is also a source of some radiation ecology associated data but is much less used by the community.

RadNet is the United States environmental radiation monitoring service (Wolbarst et al., 2008), which is run by the US Environmental Protection Agency (EPA). It monitors the radionuclide content of air, precipitation and drinking water in the environment, in some cases in real time, and has historical records of ambient environmental radiation going back to the 1940s. Further information may be found at https://www.epa.gov/radnet/radnet-databases-and-reports.

3 Biological and Inorganic Sample Archives

3.1 Radioecology Exchange Samples Register

Biomaterials from non-human biota and inorganic matter including water and air are generally archived as part of specific data gathering, often over protracted periods of time with the aim of gathering longitudinal data from the same site. This means that samples are scattered across the community and discovery of relevant material depends on familiarity with published studies. In an attempt to produce a clearing house for such sample collections the Radioecology Alliance has collected lists of available samples on its website, mainly derived from European studies. These include samples derived from air (mainly filters), water, soil and building materials, as well as biological material. The data records for these archives may be found at https://radioecology-exchange.org/content/sample-archives along with the appropriate contact details. Work is underway to curate these collections for the STORE database in order to improve accessibility and discovery for other investigators.

3.2 Sample Bank of Fukushima Animals, Japan

Following the Fukushima Daiichi Nuclear Power Plant (FNPP) accident, a sample bank of affected animals was established. Organs of domestic livestock in the evacuation zone, within a 20-km radius of FNPP, were sampled between August 29, 2011 and March 21, 2013. Organs (1270) and peripheral blood samples (200) from 302 exposed cows have been archived, and analysis of radionuclide content carried out (Fukuda et al., 2013). Organs were either stored as formalin fixed, paraffin embedded blocks or frozen at −80 °C (Takahashi et al., 2015). More recently the sample bank has been augmented by the collection of organs from more than 400 Japanese macaques (Urushihara et al., 2018 and M. Fukumoto, Pers. Comm.). Detailed environmental dosimetry, geographical distribution and other data are available on request (manabu.fukumoto.a8@tohoku.ac.jp).

4 STORE DB; a Database for Radiobiology, Radioecology and Epidemiology

While there already exist public databases dedicated to particular domains or data types, such as ArrayExpress (RNA expression studies; Kolesnikov et al., 2015) and PRIDE (proteomics; Jarnuczak & Vizcaino, 2017), domain-specific databases which carry a wide range of data types relevant to studies on one theme, e.g. Mouse Genome Informatics (MGI; genomic, variant and phenotypic data on mice; Eppig, 2017), are much rarer. There are huge advantages in domain-specific databases, notably that of expert curation, data structure and, specifically, domain metadata. As long as domain-specific metadata are consistent with recognised standards and therefore allow data discovery and integration with other datasets, such databases can be important resources for a community.

Following the development of the ERA database between 1999 and 2011 (Gerber et al., 1996; 2006; Gerber & Wick, 2004; Tapio et al., 2008; Birschwilks et al., 2011), with the aim of sustaining legacy data from very large scale animal exposure experiments, it became apparent that there was a need for a database that would be available for the deposition and sharing of contemporary as well as legacy data that could be accessed by anyone in the community.

In response to this need, the STORE database was initiated under European Commission funding in 2009 and has been sustained through successive grants until the present. STORE provides a platform for all types of data, organised on a project basis. The “Study” provides a root directory into which datasets and individual data items can be loaded in a hierarchical fashion, in principle allowing the multiple outputs of a project (protocols, raw data, processed data, etc.) to be filed in order to document a complete project if desired. This structured clustering of data has advantages over the approaches taken by commercial data-agnostic repositories that are centred only on the data entry itself. Although it is not mandatory to structure data entries in this way, it is very helpful for large integrated projects, which can use STORE as a repository in which research methods and outputs are archived and shared. This is particularly helpful when referencing data and protocols in publications, as STORE generates stable accession identifiers and digital object identifiers (DOIs) which can be referenced in publications rather than depositing information as journal supplementary data. STORE is also used to archive links to datasets in other databases, and is completely integrated with the ERA database. Where collections of biological or inorganic material are entered, any web presence, database of material or other formal point of contact such as a curator may be recorded, and these collections are described together with any publications.
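The Study, dataset and data item hierarchy described above can be illustrated with a minimal sketch. This is not the STORE schema: the class names, field names and the "kind" labels are hypothetical and chosen only to show how the outputs of one project might be nested under a single root.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItem:
    """A single file or record, e.g. a protocol or a raw data table."""
    name: str
    kind: str  # hypothetical labels such as "protocol", "raw", "processed"


@dataclass
class Dataset:
    """A group of related data items within a study."""
    title: str
    items: List[DataItem] = field(default_factory=list)


@dataclass
class Study:
    """Root of the hierarchy: one study per project."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)

    def count_items(self) -> int:
        """Total number of data items across all datasets."""
        return sum(len(d.items) for d in self.datasets)

    def outline(self) -> List[str]:
        """Indented text outline of the study, one line per node."""
        lines = [self.title]
        for dataset in self.datasets:
            lines.append("  " + dataset.title)
            for item in dataset.items:
                lines.append(f"    {item.name} [{item.kind}]")
        return lines


# Illustrative content only; the titles and file names are invented.
example = Study(
    title="Example soil contamination study",
    datasets=[
        Dataset("Sampling", [DataItem("sampling_protocol.pdf", "protocol")]),
        Dataset("Measurements", [
            DataItem("activity_raw.csv", "raw"),
            DataItem("activity_summary.csv", "processed"),
        ]),
    ],
)
```

Calling `example.outline()` returns the nested listing as lines of text, which mirrors how a complete project could be documented under one root entry.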

Data and datasets are tagged with metadata terms taken from the Ontology for Biomedical Investigations (Bandrowski et al., 2016) and the Experimental Factor Ontology (Malone et al., 2010). There are ongoing efforts to add further terms to these ontologies for radiation biology, but this is work in progress. Use of established semantic standards for data tagging will become important in making STORE FAIR, and in allowing programmatic access and data discovery in the future. STORE provides persistent digital object identifiers and accession IDs which use a persistent namespace formally registered with identifiers.org at the EBI. It is recognised by the FAIRsharing initiative (McQuilton et al., 2016) and re3data (Pampel et al., 2013).

Although curatorial help and training are currently available from the STORE team, users can create an account using their ORCID identifier and then upload, tag and describe their data themselves in an intuitive GUI. Deposition of and access to data are free to individual investigators and institutions. Data are kept live for a guaranteed period of 7 years after the most recent access, and each new access renews this period for a further 7 years. If data are not accessed for longer than 7 years they will be taken offline and archived, which ensures that STORE has sufficient capacity to take new data while retaining everything that has ever been entered. STORE is available at http://www.storedb.org (Fig. 4.1) through an HTML interface, and programmatic access is planned in the near future as recommended in the FAIR guidelines (Wilkinson et al., 2018).
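The retention rule above can be sketched as a short calculation. This is an illustration only and not STORE's implementation: the function names are hypothetical, and it assumes that the 7-year period is counted in calendar years from the date of the most recent access.

```python
from datetime import date

LIVE_YEARS = 7  # guaranteed live period after the most recent access (from the text)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return d.replace(year=d.year + years, day=28)


def live_until(last_access: date) -> date:
    """Last day on which the data are guaranteed to be held live."""
    return add_years(last_access, LIVE_YEARS)


def status(last_access: date, today: date) -> str:
    """'live' within the window, otherwise 'archived offline'."""
    return "live" if today <= live_until(last_access) else "archived offline"
```

Because the window is measured from the most recent access, any access simply moves `last_access` forward and so extends the live period; nothing is deleted in either state.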

Fig. 4.1 A screenshot of the front page of the STORE database; http://www.storedb.org. 30.1.19

The database is physically located within the secure BfS network platform and the BfS has undertaken to maintain the database indefinitely which means that data will be secure and accessible for the foreseeable future. Currently STORE contains more than 3000 data objects across a wide range of data types and has 95 registered data depositors.

5 Database and Bioresource Sustainability

The adoption of open data policies by journals and funding agencies does not seem to have made a significant impact on sharing, and there remains considerable resistance even in communities where sharing is regarded as a community norm (Piwowar, 2011; Savage & Vickers, 2009; Tenopir et al., 2011, 2015). Even when data is deposited, for example in journal supplementary information sites, it is often not complete or, due to missing information, effectively unusable. In a recent study 56% of the data purportedly available from sampled ecological studies was found to be inadequate and effectively unusable (Roche et al., 2015; Roche, 2017).

Journal supplementary information sites, developed over the last 20 years, are no longer regarded as adequate repositories for primary data: many are unstructured and lose data (Alsheikh-Ali et al., 2011; Anderson et al., 2006), the data may be undiscoverable, or data may not actually have been submitted, in contradiction to explicit journal policies (Federer et al., 2018). It is clear therefore that stable repositories, such as those provided by STORE and other public databases, constitute an essential part of biological data infrastructure.

Sustaining the infrastructure for data sharing is a major challenge for which there is as yet no satisfactory formula applicable to all communities (Chandras et al., 2009; Kaiser, 2016; Reiser et al., 2016; Sansone et al., 2018; Schofield et al., 2010). A recent study (Attwood et al., 2015) of the long-term sustainability of databases reported that 62.3% of 326 databases listed in a 1997 directory, DBCat, were no longer operating after 18 years. Database longevity was strongly associated with long-term sources of financial support from local institutions or central government funding, and databases which failed to be sustained were funded by short-term grant support. The competition for funding between infrastructure and primary research is often cited as the major problem in sustaining databases, and particularly biomaterial collections and archives. This essential conflict is found even in funding agencies with avowed infrastructure funding programmes, which nevertheless limit funding periods to at most five years and often two or three. It is relatively straightforward to get a new database off the ground or raise money for the establishment of a materials collection; it is quite another matter to sustain it into the future, even with extensive use and value, as recently demonstrated by the major reduction of funding to five important model organism databases by the National Human Genome Research Institute of the United States National Institutes of Health (Check-Hayden, 2016; Kaiser, 2016).
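As a rough check on the scale of the attrition figure quoted above, the percentage can be converted back into approximate counts. The counts below are derived here from the two numbers in the text, by rounding to the nearest whole database, and are not taken from the cited study.

```python
TOTAL_DATABASES = 326     # databases listed in the 1997 directory (from the text)
FRACTION_DEFUNCT = 0.623  # proportion no longer operating after 18 years (from the text)

# Derived values: rounding to the nearest whole database is an assumption.
defunct = round(TOTAL_DATABASES * FRACTION_DEFUNCT)
surviving = TOTAL_DATABASES - defunct

print(f"defunct: {defunct}, surviving: {surviving}")
```

On these figures only a little over a third of the listed databases were still operating after 18 years.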

We have been fortunate with STORE that the BfS has taken responsibility for the maintenance of data, but resources for running the database, curating data and conducting training still need funding to permit ongoing activities. One model for international funding, to our knowledge unique in the biosciences, is the United Kingdom BBSRC and the United States NSF collaborative funding scheme, which coordinates funding strategies and procedures for international projects and infrastructures and supports researchers wishing to apply for UK-US collaborative research funding. In another, European, model, many international databases and resources are being supported by the ELIXIR intergovernmental organisation, which integrates and sustains infrastructure and resources for bioinformatics across Europe (Durinx et al., 2016). The ELIXIR model requires national funders to contribute to resources, provides additional funds to coordinate and integrate them nationally, and offers a potential model for the funding of databases like STORE which are of international value. Alternative models where users pay for access, such as the commercial sharing resources, explicitly discriminate against nationals of countries insufficiently wealthy to access the data. This model was discussed and to a degree implemented in a multi-tier model by the Arabidopsis database community (International Arabidopsis Informatics 2010, Reiser et al., 2016), but with serious concern about the exclusion of some significant stakeholders, for example from third world countries. There would be similar issues with radioecology data, as many countries with serious radioecological challenges have poorly funded science.

6 Conclusions

The foresight shown by the European Commission and the BfS in supporting sharing infrastructures in the overall field of radiation biology has resulted in the discipline being somewhat ahead of the pack in having a dedicated sharing platform open to the whole community. Similarly the ALLIANCE consortium has been able to amply demonstrate the importance of coordinating access to ecological data and materials in the Radioecology Exchange. Nevertheless, the field faces the challenge of failure to share data and materials. In many cases this is an issue of training in data management and community norms, or financial constraints on distribution of materials. However, it is also clear that in some cases active decisions are being made to withhold primary data. Addressing these issues of training and culture would seem to be some of the most important that we face and require support from the community, funding agencies and stakeholders to improve the reproducibility of analyses and realise the added value gained from access to the results of publicly funded data. Sustainability of the platforms is another issue to be addressed urgently, and while different models for sustainability exist it is clear that without the coordinated input of governmental and other funding agencies this is not likely to be viable in the long run.

Investigators in the area of radioecology and radiation protection have additional responsibilities to those in many other areas of the biosciences. The maintenance of public trust in our research and our inputs into the activities of regulators and public policy bodies are a critical element in our work. The safety of humans and the ecosphere within which we live is directly dependent on maintaining this trust and openness.