Diversity, fragmentation, and connectivity across the UK amphibian and reptile data management landscape

Large-scale biodiversity monitoring remains a challenge in science and policy. ‘Biodiversity Observation Networks’ provide an integrated infrastructure for monitoring biodiversity through timely discovery, access, and re-use of data, but their establishment relies on an in-depth understanding of existing monitoring effort. We performed a scoping review and network analysis to assess the scope of available data on amphibians and reptiles in the UK and catalogue the mobilisation of information across the data landscape, thereby highlighting existing gaps. The monitoring portfolio has grown rapidly in recent decades, with over three times as many data sources now available as there are amphibian and reptile species in the UK. We identified 45 active sources of ‘FAIR’ (‘Findable’, ‘Accessible’, ‘Interoperable’ and ‘Reusable’) data. The taxonomic, geographic and temporal coverage of datasets appears largely uneven and no single source is currently suitable for producing robust multispecies assessments on large scales. A dynamic and patchy exchange of data occurs between different recording projects, recording communities and digital data platforms. The National Biodiversity Network Atlas is a highly connected source but the scope of its data (re-)use is potentially limited by insufficient accompanying metadata. The emerging complexity and fragmented nature of this dynamic data landscape is likely to grow without a concerted effort to integrate existing activities. The factors driving this complexity extend beyond the UK and to other facets of biodiversity. We recommend integration and greater stakeholder collaboration behind a coordinated infrastructure for data collection, storage and analysis, capable of delivering comprehensive assessments for large-scale biodiversity monitoring.


Introduction
Human activity is influencing biodiversity turnover across the globe (Dornelas et al. 2014; Keil et al. 2015; Kaarlejärvi et al. 2021). Monitoring biodiversity at large spatial and temporal scales is central to understanding the magnitude of change, and is important in conservation planning and resource allocation by decision-makers (Parr et al. 2002; Petersen et al. 2021; Thornhill et al. 2021). To understand species status and monitor change requires high-quality data. Data must provide sufficient taxonomic, temporal and geographic coverage to reliably inform evidence-based conservation (Wetzel et al. 2018). Given the economic and logistical constraints involved with monitoring, our understanding of biodiversity turnover would likely benefit from combining data originating from different sources. Modern statistical advancements can facilitate the integration of data to produce accurate assessments on the state of biodiversity (Isaac et al. 2020). However, effective monitoring over the long term requires streamlining disparate efforts in the collection, storage and analysis of biodiversity data.
Historically, biological recording has been coordinated by institutions and carried out by volunteer recorders (Roy et al. 2012; Pocock et al. 2015). Nowadays, the popular practice of engaging volunteers in a scientific project is commonly regarded as 'citizen science' (Cohn 2008), which can generate 'community contributed data'. Citizen (or 'community') scientists can assist with the collection of biodiversity data on large spatial and temporal scales that would otherwise not be feasible using small-scale studies (Cohn 2008; Pocock et al. 2017; Dobson et al. 2020; Thornhill et al. 2021). Citizen science projects can vary in their objectives and methodological approaches, and, as with any dataset, can be subject to errors and biases imposed by observational processes (Oliveira et al. 2016; Dobson et al. 2020). Once the characteristics of datasets are understood, constituent data can be handled for bias mitigation (Roy et al. 2012; Isaac et al. 2014; Dobson et al. 2020). For instance, sophisticated analytical tools have emerged in recent years to assist data users to assess (e.g., Boyd et al. 2021) and address (e.g., Bird et al. 2014; Geldmann et al. 2016) sampling errors and biases within observation datasets. Datasets derived from citizen science projects can therefore complement small-scale systematic monitoring and are important sources of observation data, usable by scientists and resource-managers for monitoring biodiversity change (Roy et al. 2012; Bonney et al. 2014; Burgess et al. 2017; McKinley et al. 2017; Tredick et al. 2017; Thornhill et al. 2021).
The value of citizen science, combined with a surge in technological innovation, has increased the diversity of projects that generate data in recent years (Kosmala et al. 2016; McKinley et al. 2017; Pocock et al. 2017; Thornhill et al. 2021). Data management has also evolved with the increased wealth of biodiversity data. Data flows from individuals (e.g. recorders, project organisers, stakeholder groups, consultants) to digital data platforms (e.g. web-based apps, e-infrastructure data portals, multi-dataset repositories) via an array of quality assurance techniques (e.g. automated data verification tools, validated datasets) applied at various levels before reaching data users (James 2011; Roy et al. 2012; Pocock et al. 2015). Collectively, such diversification has resulted in a rich data-gathering landscape. Despite these advances, significant challenges remain for collating and analysing data from different sources. These include issues associated with data confidentiality and fears that open data could endanger sensitive species and their habitats (e.g. persecution, or accidental damage to sites by naturalists wanting to see species), as well as a reluctance to share data that could be used for commercial purposes (Griffiths et al. 2015; Fox et al. 2019). Whilst a rich array of biodiversity data now exists, there remain significant barriers to ensuring that it informs decision-making efficiently. Nonetheless, at a time when there has never been a greater wealth of data available, recent advances in computing mean that there is now also an accompanying suite of statistical tools available to researchers to maximise the re-use of multiple datasets. Hence, the potential to widen the reach of existing biodiversity monitoring efforts and increase the re-use of data through integration is rapidly building momentum.
Effective integration requires a framework that unifies disparate monitoring efforts (König et al. 2019). By its very nature, data integration is dependent upon data sharing. The sheer magnitude of all biodiversity on Earth means that no single institution can know more than a tiny proportion of it at any given time (Walters and Scholes 2017). Stakeholder collaboration in the collection, sharing and analysis of biodiversity data therefore has clear benefits for biodiversity monitoring. Whilst innovative technology can enhance data collection, database design, and data sharing (Roy et al. 2012), large-scale monitoring is difficult without a coordinated infrastructure (Walters and Scholes 2017). 'Biodiversity Observation Networks' (e.g., see Wetzel et al. 2018) are collaborative organisational structures for monitoring biodiversity through the sharing of data, resources and expertise by stakeholders (Walters and Scholes 2017). A network can thus increase the mobilisation of data for research and resource-management. Accordingly, more precise tracking of biodiversity across space and time is possible, thereby enriching the contribution of disparate monitoring efforts (Constable et al. 2010; Jetz et al. 2019). However, the success of a network relies on uptake of common practices across the sector (König et al. 2019). Examples of relevant practices include data standards such as Darwin Core (Wieczorek et al. 2012) for data storage and guiding principles such as the FAIR data principles (Wilkinson et al. 2016) to ensure that data may be 'Findable', 'Accessible', 'Interoperable' and 'Reusable', thus enabling the mobilisation of usable biodiversity data.
While advances in technology will facilitate sharing of 'best practice' for handling and mobilising biodiversity data, there will still be challenges associated with maximising the quality of data and overcoming biases. Understanding the differences amongst data sources; their taxonomic, temporal and geographic scope; and the flow of data between sources can identify gaps in existing monitoring portfolios. In turn, this can illuminate ways in which monitoring and data may be integrated (Petersen et al. 2021). For instance, the monitoring of amphibians and reptiles in the United Kingdom (UK) is carried out in a number of ways by a diverse recording community. There are thirteen species of amphibians and reptiles native to the UK, many of which have experienced recent declines (Humphreys et al. 2011; Wilkinson and Arnell 2013; Beebee and Ratcliffe 2018; Gardner et al. 2019). Whilst climate change and habitat degradation threaten the UK populations (Dunford and Berry 2012; Turner and Maclean 2022) and may be implicated in declines, formal assessments of status and national trends have largely relied on anecdotal evidence (Hayhow et al. 2019). The lack of empirical evidence is surprising given that such a limited range of species should be relatively simple to identify, and that several are also subject to legal protection with mandatory reporting requirements. Adopting an integrated, network approach is therefore likely to enhance the monitoring and conservation effort for these species.
As an essential first step towards understanding the value of existing data and opportunities for integration, we surveyed the UK amphibian and reptile data management landscape. We used a scoping review and network analysis to characterise and track the mobilisation of information across the data landscape. To our knowledge, this study is the first to use this approach in the context of biodiversity data exploration. We identified an array of existing sources of amphibian and reptile data, characterised the scope of data sources using the available metadata associated with each source, and highlighted limitations in the existing monitoring portfolio for tracking species' national status and population trends. The network analysis illuminated the dynamics at play within an unrealised Biodiversity Observation Network. To this end, the aims of this review were to:
(1) Identify existing sources of UK amphibian and reptile observation data.
(2) Characterise the taxonomic, geographic and temporal scope of UK amphibian and reptile data sources and their corresponding sampling and dataset quality assurance procedures.
(3) Catalogue the mobilisation of data between sources of UK amphibian and reptile observation data.
(4) Identify gaps in the existing UK amphibian and reptile data management landscape.
(5) Provide recommendations for achieving an integrated Biodiversity Observation Network.

Search strategy
We used a scoping review framework (see Arksey and O'Malley 2005; Levac et al. 2010) to survey sources of amphibian and reptile observation data. This enabled us to identify sources that were not strictly locked within academic literature, thus reflecting the type of data suitable for integration across a network. We performed searches between 7 November 2020 and 15 January 2021. First, data sources were identified through consultations with three stakeholder organisations: the Amphibian and Reptile Group (ARG UK) network, the Amphibian and Reptile Conservation (ARC) Trust, and the British Trust for Ornithology (BTO). The ARG UK network and the ARC Trust are leading amphibian and reptile conservation charities that support various projects monitoring and conserving native amphibian and reptile populations across the UK. The BTO is a mainstream UK conservation charity leading on the conservation and research of birds and other British wildlife. The BTO Garden BirdWatch scheme is one of the largest biodiversity monitoring citizen science projects in the UK and generates thousands of amphibian and reptile observations annually. Following initial consultations, we performed a series of desk-based searches of electronic databases, internet search engines, registries of biological records data, and the grey literature. We searched the Google search engine (www.google.co.uk) using the key terms "UK reptile amphibian data" and "UK biodiversity recording database", respectively. The first 100 results of each search were reviewed as potential sources of reptile and/or amphibian observation data. Next, we searched the National Biodiversity Network (NBN) Atlas (www.nbnatlas.org) using the 'advanced search' function to filter for relevant data partners and datasets using the [Species/Taxon (any)] field and searching the key term "reptile OR amphibian". We then searched the UK Environmental Observation Framework Catalogue (www.ukeof.org.uk) for relevant datasets using the key term "reptile OR amphibian". Relevant platforms were also identified from manual interrogation of those listed on the National Forum for Biological Recording (www.

Data source selection
Data source selection was an iterative process which involved searching for sources of data, refining the search strategy, and reviewing sources for inclusion (Levac et al. 2010) (see Fig. 1). Where appropriate, we grouped sources according to their overarching 'umbrella' organisations or collectives as these were analogous in their purpose and operations. The inclusion criteria used in this review focused on capturing FAIR datasets (see Wilkinson et al. 2016).

Data charting
We collated information on the characteristics of datasets from data source websites and through consultations with data publishers, where this could be arranged (including the ARC Trust, ARG UK, BTO, Royal Society for the Protection of Birds, Froglife, BRC, and The Woodland Trust). We abstracted the metadata and sampling event information for each data source using a data charting technique to synthesise and interpret the information. Where available, we captured the following information using a standardised form: source name; publisher/organiser; background information and purpose; recorder characteristics; type of available data; temporal coverage (i.e., year of establishment and/or year that source became involved in amphibian and/or reptile data collection, storage and/or management); geographic coverage; taxonomic coverage; data quality assurance procedures; data transfer activity.
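The standardised charting form described above can be pictured as one structured record per data source. The following is a minimal Python sketch under stated assumptions: the class name, field names and the example entry are hypothetical and chosen only to mirror the listed items; they are not taken from the form used in the review.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSourceRecord:
    """One charted data source; all field names are illustrative assumptions."""
    source_name: str
    publisher: Optional[str] = None
    purpose: Optional[str] = None
    recorder_characteristics: Optional[str] = None
    data_type: Optional[str] = None
    year_established: Optional[int] = None
    geographic_coverage: List[str] = field(default_factory=list)
    taxonomic_coverage: List[str] = field(default_factory=list)
    quality_assurance: Optional[str] = None
    data_transfers_to: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of items that could not be charted ("where available")."""
        return [name for name, value in vars(self).items()
                if value is None or value == []]


# Hypothetical entry, not a source from the review.
example = DataSourceRecord(
    source_name="Example Pond Survey",
    year_established=2005,
    taxonomic_coverage=["common frog"],
)
print(example.missing_fields())
```

Recording unavailable items explicitly, rather than leaving them out, makes gaps in metadata visible when sources are later compared.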

Methodological appraisal
We summarised data sources according to their characteristics, data generation procedures and dataset attributes. This included an assessment of the taxonomic, geographic and temporal scope of the dataset, recorder characteristics and any data quality assurance techniques used, as derived from the available metadata associated with each data source. We categorised data sources based on the structural traits of their equivalent datasets using a five-point Likert scale ranging from highly structured (A) to unstructured (E) data (see electronic supplementary information, S2).

Fig. 1 Flowchart illustrating the process of searching for data sources and the selection and grouping of sources meeting criteria for inclusion in this scoping review

Data analysis
We performed a network analysis to visualise the mobilisation of data between sources and identify prominent sources in the network. Data sources were represented in a network as nodes and data transfers were mapped as links, plotted using the GGally (Schloerke et al. 2021), ggplot2 (Wickham 2016), network (Butts 2015), and igraph (Csardi and Nepusz 2016) R packages. Network metrics were computed using the 'networkD3' (Allaire et al. 2017) R package. The 'degree' of nodes reflected the number of directional links with other nodes. The average number of links to pass through a node was calculated as 'betweenness'. 'Betweenness centrality' was computed as the number of instances in which a node fell on the shortest path between two other nodes, thus facilitating data transfer between sources. To identify influential sources in the network, eigenvector values were computed which took account of the 'degree' of nodes and their connectedness to other well-connected nodes. Nodes with high eigenvector values were centralised in the network and were, therefore, largely influential in the mobilisation of data across the data landscape.
All data analysis procedures were carried out using R v4.0.2 (R Core Team 2021) in RStudio.
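To make the three metrics concrete, the sketch below computes degree, betweenness centrality and eigenvector centrality for a small toy network. It is an illustration only, written in plain Python and not the R workflow used in the study. The node names and links are hypothetical, betweenness follows Brandes' algorithm on directed links, and eigenvector centrality is computed on the undirected version of the network and scaled so that the top node scores 1; both of these choices are assumptions, as the text does not specify them.

```python
from collections import deque

# Hypothetical directed data transfers (sender, receiver); not the study's data.
TRANSFERS = [
    ("project_a", "platform_x"), ("community_b", "platform_x"),
    ("platform_x", "platform_y"), ("platform_y", "platform_x"),
    ("platform_y", "project_c"),
]
NODES = sorted({node for link in TRANSFERS for node in link})


def degree(nodes, links):
    """Number of directional links (incoming plus outgoing) per node."""
    counts = {n: 0 for n in nodes}
    for sender, receiver in links:
        counts[sender] += 1
        counts[receiver] += 1
    return counts


def betweenness(nodes, links):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, directed)."""
    out = {n: [] for n in nodes}
    for sender, receiver in links:
        out[sender].append(receiver)
    scores = {n: 0.0 for n in nodes}
    for source in nodes:
        order, preds = [], {n: [] for n in nodes}
        n_paths = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in out[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {n: 0.0 for n in nodes}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    return scores


def eigenvector(nodes, links, iterations=200):
    """Power iteration on the undirected network, scaled so the maximum is 1."""
    nbrs = {n: set() for n in nodes}
    for sender, receiver in links:
        nbrs[sender].add(receiver)
        nbrs[receiver].add(sender)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Adding the node's own score (A + I) avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in nbrs[n]) for n in nodes}
        top = max(new.values())
        score = {n: value / top for n, value in new.items()}
    return score


deg = degree(NODES, TRANSFERS)
bc = betweenness(NODES, TRANSFERS)
ec = eigenvector(NODES, TRANSFERS)
for node in NODES:
    print(node, deg[node], bc[node], round(ec[node], 2))
```

In this toy network the hub platform scores highest on all three metrics, but the metrics need not agree in general: a node with few links can still have high betweenness if it is the only bridge between two groups of sources.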

Data source attributes, contributors and quality assurance
We identified 45 sources of UK amphibian and reptile observation data from the scoping review (see Table 1). These sources clustered into three typologies: 'recording projects' (n = 26), 'recording communities' (n = 4) and 'digital data platforms' (n = 15). Recording projects reflected a coordinated data collection activity that followed a defined methodology (e.g. systematic or semi-structured monitoring, see Table 1) with a discrete taxonomic, geographic or temporal focus. Recording communities were organised groups of individuals that carried out the collection of data and coordinated the storage and sharing of data. Recording communities typically organised and participated in sampling events, though were not defined by methodological constraints, and hence we identified recording communities that were associated with several datasets. Digital data platforms represented online tools for the direct capture, storage, or export of records. The structure of datasets varied across data sources. Heterogeneous datasets (Group C) were the most widely available across sources (42%), particularly for digital data platforms, with component records an aggregation of verified and unverified opportunistic sightings and systematic survey data collected by a variety of recorders. There was a comparable abundance of highly structured (Group A, 24%) and semi-structured datasets (Group B, 22%). Generally, these datasets consisted of validated and verified records that had been collected using pre-defined (semi-)systematic methodologies.
The recorders contributing to the various data sources ranged from novice citizen scientists to experienced species surveyors and a combination thereof. Nine recording projects solely recruited citizen scientists (of any ability) to collect data. Seven recording projects only recruited experienced (often licenced) species surveyors to collect data, particularly when European-protected species were the taxonomic focus of monitoring. When these species were the taxonomic focus of citizen science-based recording projects, citizen scientists accompanied licenced species surveyors to collect data on systematic surveys and received training in species survey methodologies and identification. Across all citizen science-based recording projects, six of the project organisers provided only identification guides, whereas five of the projects did not issue identification guides or any formal training to citizen scientists. Most digital data platforms included verification and validation procedures for quality control purposes. Typically, this encompassed verification by species experts (e.g. 'County Recorders'). Record verification for some platforms also relied on community knowledge, whereby online communities of wildlife recorders provided identification suggestions to each other's observations, and automated computer checks to flag (likely) errors to recorders entering data or to verifiers after records had been entered. Further information on the characteristics of each data source is provided in the electronic supplementary information (S3).

A = Source datasets appeared highly structured and included detailed information content for records. Records were collected using pre-defined sampling procedures, often with repeated surveys at defined intervals and using specialist survey equipment. Grid references were validated for accuracy and the recorders had expertise in recording species data
B = The data source generated semi-structured datasets and included some information content for records. Records had either been verified by an expert or the species identifications were likely to be accurate based on the methods of identification or the expertise of the recorder. Records may not have been collected according to a defined, repeat survey methodology or with specialist survey equipment, but often at least semi-structured surveys had been used. The recorders may or may not have been trained in survey methods or identification
C = Datasets generated by the data source appeared heterogeneous in nature. Datasets were aggregations of records generated from opportunistic sightings and systematic surveys. Verified data were likely to be accurate, though unverified data may be less accurate. The expertise of the recorder, the sampling event procedures and species identification checks varied across records or were unknown
D = Records comprised entirely opportunistic sightings, with little-to-no information pertaining to the sampling event, the recorder or metadata. Records may have had little or no verification and validation. Photographs of observations may be available for some records
E = Records comprised entirely opportunistic sightings that were not collected according to a pre-defined sampling methodology. There were no defined record verification or validation procedures and there were no details on the expertise of data recorders, nor information pertaining to species identification

Taxonomic coverage
Sources of data for all native species of UK amphibians and reptiles were identified. As illustrated in Fig. 2, multispecies datasets, particularly for widespread species, featured extensively across sources. Recording communities only generated multispecies data. Digital data platforms also typically captured multispecies data, whilst a targeted taxonomic focus was more common amongst recording projects. Amphibians were the taxonomic focus of data sources more frequently than reptiles. Common frog had the best coverage across all data sources, as approximately three-quarters of all sources captured observation data for this species. Sources of data for great crested newts were also widely available and seven sources included eDNA records. European adder, sand lizard and grass snake were the focal species most frequently targeted amongst the reptile recording projects. No source specifically targeted the sole collection of data for palmate newt, slow-worm, smooth newt or viviparous lizard, though data for these species were captured by multispecies sources and can be obtained from sources targeting species with legislative reporting requirements.

Geographic coverage
There was division in the geographic availability of data sources included in this review (see electronic supplementary information, S5). Data sources pertained mostly to England (n = 41), followed by Wales (n = 34) and Scotland (n = 30). Northern Ireland had the fewest sources of data (n = 24). Data available through digital data platforms generally had the largest (national) geographic coverage. We note, however, that while many data sources had the potential for national coverage, their actual spatial extents were often more restricted to a number of targeted sites (see electronic supplementary information, S3). This was particularly evident for recording projects and for recording communities that were bound by a local (e.g. Vice-county) perimeter of operation (see electronic supplementary information, S3).

Temporal coverage
The number of data sources has fluctuated widely over the last century (see Fig. 3). Records of human observations were available from 1900 whilst eDNA records emerged from 2013 onwards. One digital data platform was active since the start of the 1900s. Recording communities emerged in the 1910s and the first (still active) recording projects emerged in, or shortly prior to, the 1950s. The largest increase in the total number of active sources was observed between the 2000s and the 2010s, rising from 12 sources at the end of the 1990s to 44 active sources by 2019. Despite the majority of sources emerging later towards the 2000s, digital data platforms usually included historical records prior to the platform's establishment. Recording projects generally ranged in duration between 3 (IQ1) and 31 (IQ3) years, with a mode of 3 years. Historical records were also available through the ARC Reserves Surveys, reflecting some of the earliest records available through a recording project. The Natterjack Toad Monitoring Programme and the Sand Lizard Monitoring Programme had the longest periods of continuous monitoring of any recording project, spanning 40 and 37 years respectively. Recording communities also typically had extensive periods of activity, on average 66 years.

Network analysis
The network analysis illustrated the flow of data between sources (see Fig. 4). The analysis indicated that the UK amphibian and reptile monitoring portfolio is a dynamic and fragmented data landscape. Two isolated nodes, with no links to any other source, were identified in the network. All other sources had at least one link, but some appeared to only receive data and did not export data to other sources. The degree ('g') of nodes averaged 4.6 links per data source, though 53% of sources had two or fewer links. Digital data platforms generally had the highest number of connections. Overall, the NBN Atlas had the highest number of connections (g = 21), followed by the LERCs (g = 19) and the Living ARCive (g = 17). The ARGs/RAGs (g = 13) and Great Crested Newt Level 1 Licence Returns (g = 6) sources were the most connected recording community and recording project in the network, respectively. On average, data mobilised across 2.4 links between sources within the network. Digital data platforms often fell on the shortest path between other nodes in the network (betweenness centrality, 'bc') and were highly influential over the mobilisation of data across the network (eigenvector centrality, 'ec'). The NBN Atlas (bc = 309), Living ARCive (bc = 281), and LERCs (bc = 172) had the highest bc across all sources, indicating that these sources, particularly the NBN Atlas, most frequently bridged the transfer of data between two other sources in the network. The LERCs had the highest ec overall, indicating that these were the most centralised sources of data within the network, with high connectedness to other centralised sources. Other centralised sources (ec > 0.60) in the network included the NBN Atlas (ec = 0.90), iRecord (ec = 0.67), ARGs/RAGs (ec = 0.65), and Record Pool (ec = 0.64).
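Structural weak points of the kind reported above (isolated sources, sources that only receive data, and the share of two-way exchanges) can be read directly from a list of directed links. The sketch below is a hedged Python illustration with hypothetical node names and links; the function name `summarise_links` and the definition of reciprocity used here (the fraction of links whose reverse link also exists) are assumptions, not the study's procedure.

```python
def summarise_links(nodes, links):
    """Return isolated nodes, receive-only nodes, and link reciprocity."""
    link_set = set(links)
    senders = {sender for sender, _ in link_set}
    receivers = {receiver for _, receiver in link_set}
    # Isolated: never sends and never receives.
    isolated = sorted(n for n in nodes if n not in senders and n not in receivers)
    # Receive-only: data flows in but is never passed on.
    receive_only = sorted(n for n in nodes if n in receivers and n not in senders)
    # Reciprocity: share of links that are matched by a link in the other direction.
    mutual = sum(1 for sender, receiver in link_set if (receiver, sender) in link_set)
    reciprocity = mutual / len(link_set) if link_set else 0.0
    return isolated, receive_only, reciprocity


# Hypothetical toy network; not the sources analysed in the review.
toy_nodes = ["platform_x", "platform_y", "project_c", "project_d"]
toy_links = [("platform_x", "platform_y"), ("platform_y", "platform_x"),
             ("platform_y", "project_c")]

isolated, receive_only, reciprocity = summarise_links(toy_nodes, toy_links)
print(isolated, receive_only, round(reciprocity, 2))
```

Passing the node list separately from the links matters here, because isolated sources never appear in any link and would otherwise be missed.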

Discussion
Integrated biodiversity monitoring may enhance the (re)usability of available data and enable more precise tracking of biodiversity over large spatial and temporal extents. In this review, we explored the scope of existing sources of FAIR (see Wilkinson et al. 2016) amphibian and reptile data for assessing species status and national trends in the UK. Recognising that individual datasets were collated with specific purposes in mind, we did not seek to ascertain which were "the best" sources of data. Rather, we sought to illustrate the heterogeneity of the data landscape and to identify taxonomic, temporal and geographic gaps in the existing monitoring portfolio. Whilst diversity can enhance monitoring capabilities, we observed an emerging problem of complexity and fragmentation that is likely to amplify under ongoing technological innovation (e.g., see August et al. 2015). Collectively, datasets may provide comprehensive information for all species and regions but, without integrating disparate monitoring efforts, the ongoing complexity and fragmentation of the evidence base is only likely to increase. Many of the factors driving this situation are pertinent to biodiversity monitoring more widely, so the problems and solutions are likely to be general. The integration of data in a unifying network infrastructure that streamlines fragmented monitoring may offer more precise, up-to-date biodiversity assessments over multiple scales.
The UK amphibian and reptile monitoring portfolio is a diverse data landscape comprising recording projects, recording communities and digital data platforms that collect, curate, and share data for all native species. The large number of sources is testament to a growing conservation community and should be celebrated. However, this diversity presents challenges for synthesis in research and decision-making processes at national scales. Digital data platforms are key for the mobilisation of data, particularly the LERCs and the NBN Atlas, which are highly connected and centralised sources in the data landscape. Collectively, the LERCs interacted with other important sources more frequently than the NBN Atlas alone, which led to their aggregated position as the most centralised sources in the data landscape. However, the mobilisation of data from some LERCs can sometimes be restricted by paywalls, formatting incompatibility, or constraints on data sensitivity and confidentiality. At its inception, the NBN Atlas sought to become "the best wildlife information management structure", by capturing, enhancing and mobilising wildlife data, making information widely available and engaging people about wildlife (NBN Trust 2014). We found that the NBN Atlas is highly connected to other data sources and is a central distributor of information, frequently bridging the transfer of data between other sources. This suggests that the NBN Atlas has been reasonably successful towards achieving its aims in collating and making data widely available. However, the full vision of the NBN Atlas may not yet be realised as we found that it lacked detailed metadata on the sampling protocols used to generate datasets. This information is essential for reusing data in other contexts. Taken together, our findings suggest that whilst the NBN Atlas is the most publicly accessible source and has the potential to reach its objective of becoming "the best wildlife information management structure", it currently falls short due to insufficient metadata and lower rates of data sharing with other important data distributors than could be achieved.
It is important to stress that the high centrality metric used in our analysis does not directly relate to the "best" data source. Instead, we used this metric to highlight which sources are influential in the mobilisation of biodiversity data (Zhao and Zhang 2020). There are many advantages to diversity in species recording and data management. Multiple organisations working together can address more facets of biodiversity monitoring beyond the capacity of any standalone organisation. A variety of stakeholders also fulfil different roles within a nature conservation network, from bottom-up primary data generators, with detailed regional or taxonomic expertise, to top-down statutory monitoring and governance. It is encouraging that we observed a high reciprocity of data transfer between sources as this suggests that many organisations are promoting a FAIR and open data landscape. However, high mobilisation of data may affect the quality of available data as there are multiple levels at which information may be lost through data manipulation and interpretation by data users. We identified isolated sources and one-way links which may pose significant weaknesses in the network. Catastrophic data loss could occur for some species and regions if an organisation collapses or ceases to collect data in the future. Poorly connected sources may also be less likely to contribute to wider biodiversity conservation efforts than well-connected sources. Hence, sources of this nature may limit the mobilisation of data across a network, hampering future integration efforts and restricting the information available for research and for informing national policy.
Currently, none of the existing sources of UK amphibian and reptile data appear to provide sufficient baseline information for national monitoring of all species, though some sources may have adequate foundations to build on for specific species, regions and time periods. Digital data platforms and recording communities generally have wide taxonomic scope, acting as "catch-all buckets" for any available data across large temporal and spatial extents. However, data made available through digital data platforms tend to lack sufficient quality to make reliable inferences of biodiversity dynamics (Bayraktarov et al. 2019) as they typically contain presence-only records. National platforms with information-rich abundance data are lacking, but incorporation of standards such as the Darwin Core 'Event' category (Wieczorek et al. 2012), which formalise the capture and presentation of sampling information across heterogeneous datasets, would make this possible. Nonetheless, the large datasets of presence-only records available through digital data platforms can complement systematic surveys, filling some of the spatial and temporal gaps often associated with small-scale studies (Isaac et al. 2020). In isolation, however, these datasets usually contain a variety of data biases (Petrovan et al. 2020), which can lead to misleading conclusions if not recognised and accounted for (Isaac and Pocock 2015).
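As an illustration of what event-based capture adds over presence-only records, the sketch below links occurrence records to a sampling event using Darwin Core term names (eventID, eventDate, samplingProtocol, samplingEffort, occurrenceID, scientificName, occurrenceStatus, individualCount). All values, identifiers and the helper function `counts_for_event` are hypothetical; this is a simplified picture of the idea, not a complete or validated Darwin Core archive.

```python
# A sampling event described once, with protocol and effort recorded explicitly.
event = {
    "eventID": "site01-2021-04-12",          # hypothetical identifier
    "eventDate": "2021-04-12",
    "samplingProtocol": "torchlight survey",  # hypothetical protocol label
    "samplingEffort": "2 observers, 30 minutes",
}

# Occurrences point back to the event, so counts and non-detections
# can be interpreted against a known method and effort.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "site01-2021-04-12",
     "scientificName": "Rana temporaria", "occurrenceStatus": "present",
     "individualCount": 14},
    {"occurrenceID": "occ-2", "eventID": "site01-2021-04-12",
     "scientificName": "Bufo bufo", "occurrenceStatus": "absent",
     "individualCount": 0},
    {"occurrenceID": "occ-3", "eventID": "site02-2021-04-13",
     "scientificName": "Rana temporaria", "occurrenceStatus": "present",
     "individualCount": 3},
]


def counts_for_event(event, occurrences):
    """Map species name to count for records that belong to the given event."""
    return {
        record["scientificName"]: record["individualCount"]
        for record in occurrences
        if record["eventID"] == event["eventID"]
    }


print(counts_for_event(event, occurrences))
```

A presence-only dataset would hold just the first and third records without the event context, so the recorded absence and the survey effort behind each count would be lost.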
Structured datasets arising through systematic monitoring of multiple species can enable standalone assessments of biodiversity. We observed that systematic monitoring is often restricted to a selection of sites within regions (i.e., via convenience sampling as sites are managed by project coordinators). Systematic monitoring currently favours amphibians over reptiles and most structured datasets are limited to species with legislative reporting requirements. For instance, most of the existing suite of structured amphibian and reptile datasets are single-species and arise from the systematic monitoring of European-protected species coordinated solely by conservation organisations. We did find that citizen science-based recording projects frequently generated multispecies datasets, usually for species that are widespread in their occurrence. However, such data may contain sampling biases of varying degrees, particularly as large heterogeneous collections of records (Isaac and Pocock 2015), and there have been few empirical analyses of these datasets with regard to amphibians and reptiles (though see Humphreys et al. 2011; Wilkinson and Arnell 2013). Where recording projects have focussed on single-species monitoring, however, there have been relatively more empirical outcomes. For instance, common frog (Scott et al. 2008), great crested newt (Beebee 1997; Denoël 2012), common toad (Petrovan and Schmidt 2016), and adder (Gardner et al. 2019) have all featured in empirical studies, and we observed that these species are a popular focus in recording projects. In contrast, quantitative assessments for palmate newt, smooth newt, slow-worm and viviparous lizard are largely lacking, and we found that these species had the lowest rates of monitoring of all widespread species. In the case of palmate newt, this could in part be due to difficulties with identification or lower rates of occupancy nationally. Likewise, we observed a clear geographic bias for England and lower rates for Northern Ireland, although differences in human population densities and regional taxonomic prevalence could explain these findings.
Advances in computing are likely to have led to a variety of means for collecting, validating, and verifying data, and have therefore likely contributed to an increase in the uptake of citizen science approaches in biodiversity monitoring in recent years (August et al. 2015; Pocock et al. 2017). For instance, eDNA has emerged as a viable tool for amphibian monitoring (Biggs et al. 2015), and we found seven sources of eDNA data in our search, all initiated since 2013. Amphibian and reptile surveillance is also a primary frontier in several emerging ecological remote-sensing techniques such as camera trapping (Welbourne et al. 2017). In line with other accounts (e.g. James 2011; Roy et al. 2012; Pocock et al. 2015), we observed that an array of data quality assurance techniques can be imposed on datasets before they are made available to data users. We caution, however, that excessive manipulation of data by publishers may reduce the quality of available metadata, depending on the format in which it is published. As is typical for sources of biodiversity data (Roy et al. 2012; Dobson et al. 2020; Thornhill et al. 2021), many UK amphibian and reptile datasets reflect heterogeneous collections of records originating from opportunistic and systematic surveys. However, we found that extracting specific data collection procedures from these sources was either challenging or impossible. By restricting the availability of sampling event information associated with datasets, the potential for reuse of existing biodiversity data may be constrained for several large data sources.
Biodiversity and conservation science is in the midst of adopting more formal and systematic approaches to evidence synthesis. Historically, evidence reviews in biodiversity science have had lower standards of reproducibility (Grames and Elphick 2020). We adapted the traditional scoping review framework (Arksey and O'Malley 2005) to suit the needs of this review, providing a rigorous and transparent approach to mapping a baseline account of FAIR UK amphibian and reptile observation data and allowing gaps in the current monitoring portfolio to come to light. We hope that this study may serve as a template for summarising sources of biodiversity data, enabling comparable assessments and appraisals of existing data for other taxa and environments. We grouped and evaluated some sources as collective units, as the evaluation of their separate entities was not feasible. These represented branched organisations that operated as independent groups. Therefore, it is important to note that not every independent branch of grouped sources (i.e. the 'LERCs', 'ARGs/RAGs', 'Local Nature Partnerships', and 'Wildlife Trusts') will necessarily have links to all of the data sources identified in the network analysis. Nonetheless, we expect that mapping them in this way provides a typical depiction of the characteristics and wider mobilisation of data across a biodiversity data management landscape.
Effective large-scale biodiversity monitoring requires integration of localised and fragmented monitoring efforts, thereby extending the capacity of any stand-alone programme, to address pressing science and conservation issues (Kühl et al. 2020). We recommend integration of datasets and coordinated monitoring for more comprehensive status and trends assessments. A discussion on the complementarities amongst data sources in this review is provided in the electronic supplementary information (see S4). To achieve an integrated monitoring portfolio, stakeholder collaboration within a unified infrastructure, such as a network, is paramount. Aligning pathways in shared, interoperable formats, combined with core monitoring, allows for robust analyses on the patterns of large-scale biodiversity change (Kühl et al. 2020). We conclude this review by providing recommendations to improve on current practice and achieve an integrated biodiversity monitoring portfolio.
First, to improve transparency and allow data to be used more widely, data publishers should seek to improve the 'interoperability' and 'reusability' of datasets by providing data in clear, interoperable formats [e.g., 'Darwin Core' (Wieczorek et al. 2012)] that align with data standards, and by ensuring that important sampling event metadata accompany records in datasets. Second, we urge data publishers to provide clarity on how information is disseminated and shared between recorders, scheme organisers, scientists and decision-makers. As a minimum, this would provide information about the level of data duplication when combining datasets in a single analytical framework. Data should be presented in a way that allows records to be traced back to their origin and allows data users to ascertain how the data were collected. Examining all facets of network communication was not within the scope of this review, but clear channels of communication will be essential to enable an integrated network to generate and share information more effectively. Future work should explore current practice for sharing information and evidence between data publishers and government bodies so that clear channels for sharing data and information can permeate across the landscape. Third, the development of a validated tool to assess the 'structure' of datasets would likely enable more timely identification of fit-for-purpose datasets. Finally, we advocate for the establishment of an effective, realised Biodiversity Observation Network, co-developed by stakeholders, and the enhancement of existing centralised data infrastructures that take account of these recommendations for collating, characterising, and sharing biodiversity data.
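As a hedged illustration of the second recommendation, the sketch below shows one way that records shared between datasets might be flagged when sources are combined. The field names follow Darwin Core terms (`datasetName`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`), but the records, dataset names, matching key and coordinate precision are all assumptions made for illustration, not a method used in this review.

```python
from collections import defaultdict

# Hypothetical records using Darwin Core term names; values are invented.
records = [
    {"datasetName": "Platform X", "scientificName": "Bufo bufo",
     "eventDate": "2019-03-14", "decimalLatitude": 51.2801, "decimalLongitude": 1.0789},
    {"datasetName": "Project Y", "scientificName": "bufo bufo ",
     "eventDate": "2019-03-14", "decimalLatitude": 51.28012, "decimalLongitude": 1.07893},
    {"datasetName": "Project Y", "scientificName": "Vipera berus",
     "eventDate": "2020-05-02", "decimalLatitude": 50.9, "decimalLongitude": -1.4},
]


def duplicate_key(record, precision=3):
    """Assumed matching key: species, date and coordinates rounded to `precision` decimals."""
    return (
        record["scientificName"].strip().lower(),
        record["eventDate"],
        round(record["decimalLatitude"], precision),
        round(record["decimalLongitude"], precision),
    )


def find_shared_records(records, precision=3):
    """Return matching keys that occur in more than one dataset, with the datasets involved."""
    datasets_by_key = defaultdict(set)
    for record in records:
        datasets_by_key[duplicate_key(record, precision)].add(record["datasetName"])
    return {key: sorted(names) for key, names in datasets_by_key.items() if len(names) > 1}


shared = find_shared_records(records)
```

A key of this kind can only suggest likely duplicates. Where publishers supply stable record identifiers and provenance, duplicates could be resolved directly rather than inferred.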

Fig. 3 The number of active sources of United Kingdom amphibian and reptile observation data per decade from 1900 to 2021. The number of sources is depicted as counts of digital data platforms, recording communities and recording projects since the source became active in amphibian and reptile data collection, storage and/or management

Table 1
Sources of United Kingdom amphibian and reptile observation data