Introduction

An understanding of the status and abundance of species, their habitats, the threats and pressures they face, and the progress of work undertaken for their conservation is essential for effective project management and decision-making (Robinson et al. 2018; Hu et al. 2019; Stephenson and Stengel 2020). However, for a variety of reasons, species monitoring is largely inadequate and often neglected, meaning the necessary data are unavailable to decision makers (Amano et al. 2016; Stephenson et al. 2017a, 2022). Moreover, scientific knowledge of species is constrained by the taxonomic and spatial biases of the available data (Amano et al. 2016; Troudet et al. 2017; dos Santos et al. 2020).

Taxonomic biases in biodiversity research and monitoring at the global level tend to result in an under-representation of invertebrates in data sets (Leather 2009; Hochkirch et al. 2021) and an over-representation of vertebrates, especially mammals and birds (Bonnet et al. 2002; McRae et al. 2017; Moussy et al. 2022). Geographic biases mean the tropical regions housing the most biodiversity and the most threatened species are the least studied (Pimm et al. 2006; Amano and Sutherland 2013; Titley et al. 2017). Reasons for this inequality are diverse and poorly understood (Stephenson et al. 2017a, 2022; Stephenson 2019; Hochkirch et al. 2021) but may be a key factor in the low use of data in national biodiversity reports (Bubb et al. 2011).

It is necessary to understand and address the gaps and biases in biodiversity knowledge if we are to enhance conservation management. The urgency is particularly acute in Africa where a large proportion of the population is directly dependent on ecosystem services for their livelihoods, yet increasing habitat loss from the expansion of agricultural land and urban areas is exacerbating biodiversity loss (Craigie et al. 2010; Seto et al. 2012; IPBES 2018).

The aim of our research was therefore to conduct a case study in East Africa to answer the question: What are the taxonomic and geographic gaps and biases in the biodiversity data available for species considered priorities and how are they affecting conservation action? The project set out to identify trends in data availability and place them in the context of recognised conservation priorities at national, regional and global levels. We conducted an analysis of global databases to identify overall trends in data availability for the region, and to discover to what extent regional biases reflected global biases. In addition, to gain an understanding of data user needs and the root causes and consequences of data gaps, we conducted a survey of practitioners and policymakers. Our ultimate aim was to understand the factors blocking the flow of biodiversity data to conservation decision makers in East Africa.

We focused on assessing the biodiversity data needs and challenges in eleven countries in East Africa: Burundi, Comoros Islands, Ethiopia, Kenya, Madagascar, Malawi, Mozambique, Rwanda, Somalia, Tanzania and Uganda. This region was selected as it has a wide range of habitats (including rainforests, mangroves, mountains, freshwater lakes, coral reefs and seagrass beds), and several priority ecoregions (e.g. African Rift Lakes, Coastal East Africa, Coastal East Africa Marine, and Madagascar; Olson and Dinerstein 2002) and associated biodiversity hotspots (Eastern Afromontane, Coastal Forests of Eastern Africa, and Madagascar and the Indian Ocean; Mittermeier et al. 2004). As a result, the region has been the focus of conservation efforts by many national governments and national and international conservation agencies dating back several decades (Huxley 1961). Although there are increasing efforts to identify global floral and fungal conservation priorities (e.g. Bachman et al. 2018; Gonçalves et al. 2021), governments and NGOs in East Africa primarily focus their attention on their faunas and so our study looked at animal species only.

Many biodiversity monitoring systems fail to take account of user needs (Stephenson et al. 2015b, 2017b). While exploring data on threatened species can be useful, the level of extinction risk is not the same as the level of priority for conservation action (Fitzpatrick et al. 2007; Mace et al. 2007; Le Berre et al. 2019). The most threatened species are often the most difficult and the most expensive to conserve (Le Berre et al. 2019) and some taxa will be priorities for conservation action even if they are not listed as threatened (IUCN 2012). Therefore, in this study regional data needs were determined by defining priority species from national laws and governmental, inter-governmental and NGO conservation plans. Data availability and access challenges were identified by assessing East African data in three global databases commonly used to monitor delivery of national contributions to global biodiversity goals: the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (IUCN 2021), the WWF/ZSL Living Planet Index (Loh et al. 2005; Living Planet Index 2021) and GBIF—the Global Biodiversity Information Facility (GBIF 2021). We assessed the results against socio-economic factors and surveyed data users to better understand the reasons and consequences of data access issues in the region.

Methods

Data collection

Threatened species

Threatened animal species in the 11 countries were identified through the IUCN Red List of Threatened Species (hereafter the IUCN Red List; Fig. 1). The IUCN Red List (IUCN 2021) is an inventory of the global conservation status of biodiversity (animals, plants and fungi). Using a set of quantitative criteria, it assesses the risk of extinction of species, ranging from Least Concern (for species that are still abundant in the wild and under little threat) to Extinct (IUCN 2012). Threatened species are those assessed to be in categories Vulnerable, Endangered and Critically Endangered. Data on range, population size, habitat and ecology, use and trade, threats, and conservation actions are also listed in the database. Data downloaded for this study included: scientific name, taxonomy, Red List category, population trend (stable, decreasing, increasing or unknown), assessment ID, assessment date, year published, system (e.g. marine, terrestrial, freshwater), countries where each species was present, and potential threats. References and DOIs for the Red List data downloaded can be accessed in the supplementary material (Appendix A, Tables S1 and S2, respectively). Data files containing each species’ countries and threats were downloaded separately and modified using the Python programming language (version 3.6; Van Rossum and Drake 2009) to allow for better visualisation of the data and to make it easier to conduct analyses. The total numbers of species listed in the IUCN Red List by target country and species class in all 11 target countries were also recorded for comparison with the numbers of threatened and priority species (see below) obtained for each target country and species class in the study area.
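As an illustration only, the reshaping of the per-species country file described above could look like the following minimal Python sketch. The column names (scientific_name, country) and the sample rows are hypothetical; they do not reproduce the Red List export schema or the scripts used in the study.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format excerpt: one row per species-country pair.
# Column names and rows are illustrative assumptions only.
LONG_CSV = """scientific_name,country
Species alpha,Kenya
Species alpha,Tanzania
Species beta,Madagascar
Species beta,Madagascar
"""


def countries_by_species(handle):
    """Collapse a long species-country table into {species: sorted countries}."""
    grouped = defaultdict(set)
    for row in csv.DictReader(handle):
        grouped[row["scientific_name"].strip()].add(row["country"].strip())
    return {name: sorted(countries) for name, countries in grouped.items()}


def species_count_by_country(species_to_countries):
    """Count how many distinct species are listed for each country."""
    counts = defaultdict(int)
    for countries in species_to_countries.values():
        for country in countries:
            counts[country] += 1
    return dict(counts)


wide = countries_by_species(io.StringIO(LONG_CSV))
per_country = species_count_by_country(wide)
```

Grouping into sets removes duplicate species-country rows before counting, so a species listed twice for the same country is counted once.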

Fig. 1

Map of Africa showing the relative number of threatened animal species in each East African country studied: Burundi, Comoros Islands, Ethiopia, Kenya, Madagascar, Malawi, Mozambique, Rwanda, Somalia, Tanzania and Uganda. The darker the colour, the greater the number of threatened species

Priority species

While many people consider threatened species to be important, data are more likely to be collected for those that are considered a priority by key stakeholder groups. Therefore, we defined a priority species as a threatened species identified for protection by law or by action by a government, international convention or non-governmental organisation (NGO) (Fig. 2). We focused on priority species because most conservationists do not need to monitor all species in every taxon; it is infeasible (especially for very speciose groups like invertebrates) and unnecessary, since data are only required for species that are of concern or are the focus of conservation action (i.e., priority species).

Fig. 2

Workflow used in the present study. The number of threatened animal species was first identified from the IUCN Red List. Of these, species protected by governments (GOs), species listed on international conventions (CI), i.e. the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and the Convention on the Conservation of Migratory Species of Wild Animals (CMS), and species prioritised by NGOs operating in the selected countries were listed as "priority" species. Data on these priority species were then collected from the Global Biodiversity Information Facility (GBIF) and the Living Planet Index (LPI). The number of priority species for which data were available in each of the two databases is shown in the yellow rectangles. In parallel, a survey of practitioners and policy makers was conducted

After identifying threatened species, we identified which ones were priorities through a review of national laws and strategies, Multilateral Environmental Agreements (MEAs), reports to the Convention on Biological Diversity (CBD; www.cbd.int), and NGO strategies. Each country's National Biodiversity Strategy and Action Plan (NBSAP) was reviewed, as were National Reports to the CBD. Lists of species protected by law were found on ECOLEX (IUCN, UNEP and FAO 2021) and the FAOLEX Database (FAO 2021). Both protected and partially protected species were considered priorities. Names and dates of the laws found for each country are listed in the supplementary material (Appendix A, Table S3). The protected species lists of two international conventions, the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and the Convention on the Conservation of Migratory Species of Wild Animals (CMS), were also used to identify priority species in target countries (Fig. 2). All 11 countries studied are parties to both conventions and so, by definition, see their goals and priorities as national priorities. The lists of species protected by the conventions were downloaded from the Species+ website (UNEP-WCMC 2021) for CITES appendices I, II and III and CMS appendices I and II.

Through web searches we identified the main international NGOs operating in the target region and for which at least some information was available online, which included: African Wildlife Foundation (AWF), Alliance for Zero Extinction (AZE), Conservation International, Re:wild (formerly Global Wildlife Conservation), Wildlife Conservation Society (WCS), World Wide Fund for Nature (WWF), and the Zoological Society of London (ZSL). Priority species per country for these NGOs were then identified from their respective websites in April and May 2021. No priority species were found for Conservation International. A list of priority species per country was then created. For a species to be considered a priority it had to be: i) present in the country, ii) considered a priority by the government of the country or by one of the NGOs working in that country or on the country's list of one of the two international conventions (CITES or CMS).
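The two-part rule for priority species can be expressed as a short predicate. The following Python sketch is one possible reading of that rule, not the procedure used in the study; the class, field names and example record are hypothetical, and the input is assumed to be a species already assessed as threatened.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatenedSpecies:
    """Hypothetical record for a species already assessed as threatened."""
    name: str
    present_in: set = field(default_factory=set)            # countries of occurrence
    protected_by_law_in: set = field(default_factory=set)   # national legislation
    ngo_priority_in: set = field(default_factory=set)       # NGO priority lists
    convention_listed_in: set = field(default_factory=set)  # CITES or CMS, per country


def is_priority(species, country):
    """Criterion (i): present in the country; criterion (ii): flagged by any source."""
    if country not in species.present_in:
        return False
    sources = (
        species.protected_by_law_in,
        species.ngo_priority_in,
        species.convention_listed_in,
    )
    return any(country in source for source in sources)


example = ThreatenedSpecies(
    name="Species alpha",
    present_in={"Kenya", "Uganda"},
    convention_listed_in={"Kenya"},
)
```

Under this reading, the example species is a priority in Kenya (present and convention-listed) but not in Uganda (present, but not flagged by any source).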

Data availability

We then assessed data availability for priority species by interrogating the Global Biodiversity Information Facility (GBIF; www.gbif.org) and the Living Planet Index (LPI; www.livingplanetindex.org) (Fig. 2). Citations for each data download can be found in the supplementary material (Appendix A, Table S4). These global databases bring together several types of biodiversity data and were chosen for this study as they are the most extensive biodiversity databases currently available, and are used to measure indicators by the Convention on Biological Diversity (CBD; Brooks et al. 2015; McRae et al. 2017), to which all of the East African countries are signatories. To assess whether there had been increased monitoring in response to the CBD adoption of the Strategic Plan for Biodiversity 2011–2020 (Convention on Biological Diversity 2010), data collected since 2010 were analysed separately for the GBIF and LPI databases.

GBIF is a global database of species presence records collected from multiple sources ranging from museum specimens to geo-referenced photographs and records from amateur naturalists. It is linked to, and holds data from, other global databases, including eBird (https://ebird.org). In December 2021, 1,902,873,733 occurrences were recorded in the database. GBIF records of East African priority species were downloaded via the GBIF website (GBIF 2021) in August and October 2021 (Fig. 2). Because the number of occurrences per species differed depending on whether the ‘country’ filter was applied directly on the GBIF website or after downloading the occurrences, all occurrences found for every species were downloaded and subsequently sorted by country. Data extracted included the number of occurrences recorded by country, the date of the last record and the basis of the last record (e.g. "human observation", "preserved specimen"). The final GBIF data file can be accessed in supplementary Table S4, Appendix A.
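A minimal sketch of the per-country sorting step is given below. It assumes a tab-separated occurrence file with columns named after Darwin Core terms (countryCode, year, basisOfRecord); the sample rows are invented, and treating "since 2010" as year ≥ 2010 is an assumption rather than a detail stated in the text.

```python
import csv
import io

# Invented rows for one species; column names are assumed to follow
# Darwin Core terms as used in GBIF occurrence downloads.
SAMPLE_TSV = (
    "species\tcountryCode\tyear\tbasisOfRecord\n"
    "Species alpha\tKE\t1998\tPRESERVED_SPECIMEN\n"
    "Species alpha\tKE\t2015\tHUMAN_OBSERVATION\n"
    "Species alpha\tTZ\t2009\tHUMAN_OBSERVATION\n"
    "Species alpha\tKE\t\tPRESERVED_SPECIMEN\n"
)


def summarise_by_country(handle, since=2010):
    """Count occurrences per country and keep the year and basis of the latest record."""
    summary = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = summary.setdefault(
            row["countryCode"],
            {"occurrences": 0, "since": 0, "last_year": None, "last_basis": None},
        )
        entry["occurrences"] += 1
        if not row["year"]:
            continue  # undated records count as occurrences but cannot be dated
        year = int(row["year"])
        if year >= since:
            entry["since"] += 1
        if entry["last_year"] is None or year > entry["last_year"]:
            entry["last_year"] = year
            entry["last_basis"] = row["basisOfRecord"]
    return summary


summary = summarise_by_country(io.StringIO(SAMPLE_TSV))
```

Filtering after download, as here, keeps every record for the species in one file, so the country assignment is applied consistently across species.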

The LPI is a database of population trends of vertebrate species managed by the Zoological Society of London, which enables the assessment of changes in biodiversity over time. The database holds time-series data for over 27,000 populations of more than 4300 species. These population trend data are collated from different sources such as published literature, grey literature, government reports and online databases. Data on population trends of priority species were downloaded from the LPI data portal (Fig. 2; Living Planet Index 2021). Data extracted included the number of time series recorded by species by country and the date of the last record. The final LPI data file can be accessed in supplementary Table S5, Appendix A.

Socio-economic factors: international tourism receipts, gross domestic product and political stability

Some studies (e.g., Moussy et al. 2022) have shown that socio-economic variables may influence biodiversity data availability, and so economic data on target countries were collected. A web search investigated potential socio-economic data that might be of relevance in understanding biodiversity trends in East Africa. The most complete data sets found were for three socio-economic variables: gross domestic product (GDP), international tourism receipts and political stability.

GDP figures in billions of US dollars were obtained from the International Monetary Fund database (International Monetary Fund 2021). Data on international tourism receipts for each target country from 2014 to 2018 were obtained from the World Tourism Organization (UNWTO) Tourism Highlights Reports from 2016 to 2020 (UNWTO 2021). Somalia had no data on international tourism receipts. As these values fluctuate from year to year, an average of GDP (in millions of US dollars) and international tourism receipts (in millions of US dollars) from 2014 to 2018 was calculated for each country. Country population size data were collected from UN data (www.data.un.org) so that tourism income and GDP could be expressed per capita in the analyses. The Political Stability Index, measured as “perceptions of the likelihood of political instability and/or politically-motivated violence, including terrorism”, was obtained for each country from 2014 to 2018 from the World Bank database (www.databank.worldbank.org). These country-specific socio-economic data can be found in supplementary material Appendix A Table S6.
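The averaging and per capita conversion can be sketched as follows. The function name, the figures and the population size are hypothetical, and skipping years with no reported value is an assumption about how missing data might be handled.

```python
from statistics import mean


def per_capita_average(values_musd_by_year, population):
    """Average annual values (million USD) and express the mean in USD per person.

    Years with no reported value (None) are skipped; if no year has data,
    as for a country without tourism receipts, None is returned.
    """
    available = [v for v in values_musd_by_year.values() if v is not None]
    if not available:
        return None
    return mean(available) * 1_000_000 / population


# Hypothetical tourism receipts (million USD) for 2014 to 2018.
receipts = {2014: 100, 2015: 120, 2016: 110, 2017: 130, 2018: 140}
receipts_per_capita = per_capita_average(receipts, population=10_000_000)

# A country with no reported receipts in any year.
no_data = per_capita_average({year: None for year in range(2014, 2019)}, 5_000_000)
```

With these invented figures the five-year mean is 120 million USD, or 12 USD per person.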

Survey of data users

An online survey of conservation practitioners and policy makers was conducted to understand data user needs in East Africa and the root causes and consequences of data gaps. Questionnaire surveys have been used extensively in conservation biology (e.g., Tenopir et al. 2011; Chen et al. 2019; Danovaro et al. 2020) and allow direct input from stakeholders (Sanders et al. 2021) to help provide answers that may not be found in the literature, where data is scarce or missing (Meyer and Booker 2001).

While the global databases interrogated primarily provide data of use in measuring species presence (GBIF) or population trends (LPI), conservation managers ultimately need data to measure a much broader range of indicators on the state of biodiversity, the pressures species and habitats face and the conservation responses made to reduce pressures (Tittensor et al. 2014; Stephenson 2019; Stephenson et al. 2022). Therefore, in addition to surveying user needs for data of the type captured in the target global databases, we also included questions around other indicators to gain a more complete understanding of the issues. Forty data types were identified for inclusion in the survey, based on variables used for Red List assessments (IUCN 2012) and common metrics for project management (Table 1).

Table 1 The 40 types of data assessed in the survey

A confidential online survey for practitioners and policy makers was designed using Qualtrics survey software (Qualtrics, Provo, UT) and emailed to 65 conservation agencies: the environmental authority of each target country and 54 NGOs working in at least one of the countries. The survey was open from 8 September to 11 October 2021. Twenty-one questions were asked (in English and French) to obtain information on respondents’ conservation projects (recent, current or upcoming), the data they need to carry out their projects, data collection, difficulties encountered in accessing data, the impact of a lack of data on their projects, and data sharing. The questionnaire survey is available in supplementary Table S7, Appendix A.

Data analyses

All data analyses were carried out using the R statistical programming language version 4.1.1 (R Core Team 2021). First, we assessed the taxonomic bias in data availability. Using Pearson's Chi-Squared tests (“prop.test” function in stats package), we compared, both between invertebrates and vertebrates and among vertebrate classes, (i) the proportion of threatened species considered as a priority, (ii) the proportion of priority species with IUCN population trends data, (iii) the proportion of priority species with GBIF data and (iv) the proportion of priority species with GBIF data since 2010. We did not test for differences in the proportion of priority species with LPI data because these data were lacking for most taxonomic groups (except birds and mammals). We applied Bonferroni corrections to account for multiple comparisons.
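For readers who want to see the arithmetic behind a two-group comparison of proportions, the following Python sketch computes a Pearson chi-squared statistic for a 2 × 2 table. It is not the R code used in the study. The continuity correction is included on the assumption that the default setting of prop.test was kept, which the text does not state, and the number of comparisons passed to the Bonferroni helper is purely illustrative. The counts are those reported in the Results for priority species with GBIF data (554 of 602 vertebrates, 55 of 79 invertebrates).

```python
import math


def two_proportion_chi2(success_a, total_a, success_b, total_b, correct=True):
    """Pearson chi-squared test (1 df) comparing two proportions.

    With correct=True a Yates continuity correction is applied.
    Returns the statistic and its p-value.
    """
    a, b = success_a, total_a - success_a
    c, d = success_b, total_b - success_b
    n = total_a + total_b
    diff = abs(a * d - b * c)
    if correct:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


chi2, p = two_proportion_chi2(554, 602, 55, 79)
adjusted_p = bonferroni(p, n_tests=4)  # 4 comparisons is an illustrative assumption
```

With these counts the sketch gives a statistic of about 34.75, in line with the value reported in the Results for this comparison.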

Secondly, we tested if the number of species considered as a conservation priority was linked to tourism receipts per capita, GDP per capita or political stability. Six different models were fitted using the glmmTMB R package (v1.1.2.3; Brooks et al. 2017), each using a different measure of the number of priority species by country as response variable: (i) the number of priority species in each country, (ii) the number of priority species per country having population trend data in the IUCN Red List, and the number of priority species per country for which data are available (iii) in GBIF, (iv) in GBIF since 2010, (v) in LPI, or (vi) in LPI since 2010. Because these response variables are count data, we used generalized linear models (GLMs) with a Poisson or negative binomial distribution (the negative binomial distribution was used if there was overdispersion in the model’s residuals). The validity of the models was assessed using the R package DHARMa (v0.4.4; Hartig 2021). The final models can be found in Appendix A, Table S8. Explanatory variables (tourism receipts per capita, GDP per capita and political stability indexes) were log-scaled in the models. The effect of each explanatory variable was tested using an analysis of variance (“Anova” function in car package, v3.0–11; Fox and Weisberg 2019).
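The choice between a Poisson and a negative binomial distribution depends on whether the counts are overdispersed. The sketch below shows one simple way to check this, using the Pearson dispersion statistic of an intercept-only model. It is a stand-in for illustration and is not the simulation-based residual check performed by DHARMa; the counts and the 1.5 threshold are hypothetical.

```python
from statistics import mean


def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-squared divided by residual degrees of freedom.

    Values well above 1 suggest more variance than a Poisson model expects.
    """
    resid_df = len(counts) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return pearson / resid_df


def choose_family(dispersion, threshold=1.5):
    """Pick a count distribution; the threshold is an arbitrary rule of thumb."""
    return "negative binomial" if dispersion > threshold else "Poisson"


# Hypothetical numbers of priority species for 11 countries.
counts = [12, 15, 9, 80, 14, 11, 95, 10, 13, 70, 16]

# Intercept-only model: every fitted value equals the overall mean.
fitted = [mean(counts)] * len(counts)
dispersion = pearson_dispersion(counts, fitted, n_params=1)
family = choose_family(dispersion)
```

In a full analysis the fitted values would come from the model with its explanatory variables rather than from the overall mean.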

Thirdly, responses to the survey of practitioners and policy makers were analysed. The relationship between the importance of data for successful conservation projects (i.e., the proportion of respondents who think the data are important) and the difficulty of accessing the same data (i.e., the proportion of respondents who think the data are difficult to access) was tested. As our response variables are percentages, binomial models were computed using the glmmTMB R package (v1.1.2.3; Brooks et al. 2017). The 40 data types (Table 1) were divided into seven categories: population parameters, demographic parameters, habitat, threats, associated species, already ongoing efforts, and socio-economic and cultural value of the species. These categories were accounted for in the analyses as a random effect. Post hoc multiple comparisons (with Bonferroni correction) were conducted to assess the differences between groups (“cld” function, package multcomp, version 1.4–17, Piepho 2004). The validity of our models was assessed using R package DHARMa (v0.4.4; Hartig 2021) and the effect of the explanatory variable was tested using an analysis of variance (“Anova” function, car package, v3.0–11; Fox and Weisberg 2019).

Results

Data availability for threatened and priority species

In the 11 study countries, data on 11,071 animal species were available in the IUCN Red List, of which 1674 species (15.1%) were assessed as threatened. There are at least 55,683 unique animal species names recorded in the region (GBIF.org [28 August 2022] GBIF Occurrence Download), meaning that the 11,071 species assessed in the Red List represent about 19.9% of the total fauna. Of the 1674 threatened species, 681 were considered priority species, with 293 species protected by governments, 364 species listed as protected on international conventions (CMS and CITES), and 336 considered a priority by one or more NGOs (Figs. 2, 3 and 4).

Fig. 3

Number of threatened and priority animal species per country. The numbers in brackets are the total number of animal species per country assessed by the IUCN Red List (critically endangered, endangered, vulnerable, lower risk; conservation dependent, near threatened, least concern, data deficient). Left panel: Number of threatened species (in light green) according to the IUCN Red List and of these the number of species considered as priorities (in dark blue) according to international conventions, governments and NGOs. Right panel: number of priority species (dark blue) per country, number of priority species with population trend estimates in the IUCN Red List (red), and the number of priority species that have (i) ≥ 1 occurrence in GBIF (green), (ii) ≥ 1 occurrence in GBIF since 2010 (hatched green), (iii) population trends data in LPI (yellow) and (iv) in LPI since 2010 (hatched yellow) per country. x axis of the right panel is log scaled

Fig. 4

Number of threatened and priority animal species per class in East Africa. The numbers in brackets are the total number of extant animal species per class in the 11 target countries assessed by the IUCN Red List (critically endangered, endangered, vulnerable, lower risk; conservation dependent, near threatened, least concern, data deficient). Left panel: Number of threatened species (in light green) per class in the 11 target countries according to the IUCN Red List and of these the number of species considered as priorities (in dark blue) according to international conventions, governments and NGOs. Right panel: number of priority species (dark blue) per class in the 11 target countries, the number of priority species with population trend estimates in the IUCN Red List (red) and the number of priority species that have (i) ≥ 1 occurrence in GBIF (green), (ii) ≥ 1 occurrence in GBIF since 2010 (hatched green), (iii) population trends data in LPI (yellow) and (iv) in LPI since 2010 (hatched yellow) per class in the 11 target countries. x axis of the right panel is log scaled

Taxonomic bias in data availability

Four of the nine invertebrate classes, Anthozoans (sea anemones and corals), Holothurians (sea cucumbers), insects and Malacostracan crustaceans (crabs, lobsters, shrimps, etc.), had priority species, but these represented only 79 (11.6%) of the total priority species for the region. In comparison, all seven vertebrate classes with threatened species in East Africa included at least one priority species (Fig. 3). The proportion of threatened species considered as a priority was significantly higher for vertebrates (0.476) than for invertebrates (0.193) (Pearson's Chi-Squared test, χ² = 101.22, df = 1, p < 0.00001). However, Holothurians (sea cucumbers) had the highest proportion (90%) of priority species among threatened species of any vertebrate or invertebrate class. The proportion of threatened species considered as priority varied among vertebrate classes (Pearson's Chi-Squared test, χ² = 417.03, df = 6, p < 0.0001). Most threatened Chondrichthyes species (cartilaginous fishes) were listed as priorities while Actinopterygian species (ray-finned fishes) were rarely listed as priorities (see Fig. 4 and supplementary Table S9 for details).

The IUCN Red List provides an assessment of the population trends for some species. The proportion of threatened species having population trends data on the IUCN Red List was significantly higher for vertebrates (0.740) than for invertebrates (0.413) (Pearson's Chi-Squared test, χ² = 145.59, df = 1, p < 0.0001). The same applies to priority species, with proportions of 0.875 for vertebrates and 0.430 for invertebrates (Pearson's Chi-Squared test, χ² = 92.24, df = 1, p < 0.0001). The proportion of priority species having population trends data on the IUCN Red List varied among vertebrate classes (Pearson's Chi-Squared test, χ² = 39.94, df = 6, p < 0.0001). Birds had a higher proportion, and reptiles and Actinopterygians (ray-finned fishes) a lower proportion, of priority species with population trends data on the IUCN Red List than most other classes of vertebrates (see Fig. 4 and supplementary Table S9 for details).

Of the 681 priority species, 609 (89.4%) had one or more occurrence records in GBIF, 561 (82.4%) had population trend data on the IUCN Red List and 49 (7.2%) had population trend data in the LPI. When only data from 2010 onwards were considered (during implementation of the CBD Strategic Plan for Biodiversity 2011–2020), data were available for 397 priority species in GBIF (a 35% reduction) and 11 priority species in the LPI (a 78% reduction).
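The percentages above follow directly from the reported counts, as the short Python sketch below shows. The helper names are hypothetical; the counts are those given in the text.

```python
def percent(part, whole, digits=1):
    """Share of `whole` represented by `part`, as a rounded percentage."""
    return round(100 * part / whole, digits)


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, rounded to a whole percentage."""
    return round(100 * (before - after) / before)


PRIORITY_TOTAL = 681

coverage = {
    "GBIF": percent(609, PRIORITY_TOTAL),
    "IUCN trend": percent(561, PRIORITY_TOTAL),
    "LPI": percent(49, PRIORITY_TOTAL),
}

# Species with data overall versus species with data since 2010.
reduction = {
    "GBIF": percent_reduction(609, 397),
    "LPI": percent_reduction(49, 11),
}
```

Here coverage evaluates to 89.4, 82.4 and 7.2% for GBIF, the Red List and the LPI, and the reductions to 35 and 78% for GBIF and the LPI.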

Data existed in GBIF for 554 out of 602 (92.0%) priority vertebrate species and 55 out of 79 (69.6%) priority invertebrate species. However, since 2010 there are data for only 379 (63.0%) priority vertebrates and 18 (22.8%) priority invertebrates. The proportion of priority species with data in GBIF was significantly higher for vertebrates (0.920) than for invertebrates (0.696) (Pearson's Chi-Squared test, χ² = 34.75, df = 1, p < 0.0001). The same applies to priority species with data in GBIF since 2010 (Pearson's Chi-Squared test, χ² = 44.72, df = 1, p < 0.0001). Among vertebrate classes, there were no significant differences in the proportion of priority species with data in GBIF. However, when focusing on data produced since 2010, the proportion of priority species with data in GBIF varied among vertebrate classes (Pearson's Chi-Squared test, χ² = 78.87, df = 6, p < 0.0001). Birds were the best-represented vertebrate class in GBIF since 2010, while amphibians were the least represented (see Fig. 4 and supplementary Table S9 for details).

Geographic and socio-economic bias in data availability

The number of priority species per country was positively correlated with per capita international tourism receipts (Table 2; Fig. S1), as were the number of priority species having population estimates in the LPI and the numbers having ≥ 1 occurrence record in GBIF and in GBIF since 2010. The number of priority species with LPI data was also positively correlated with per capita GDP. There was no correlation between each country’s Political Stability Index and the number of priority species in each country or database (Table 2).

Table 2 Assessment of data against socio-economic factors: tourism receipts, GDP and political stability

Data users’ perceptions

We received 33 completed responses to the online survey of species conservation experts (a response rate of 51%). These included three responses from environmental authorities (in the governments of Comoros, Malawi and Rwanda), and 30 responses from NGOs (see acknowledgements). Based on responses received, the top five factors affecting the accessibility and usability of data were excessive expense, technological challenges, lack of capacity, the incompleteness of data, and the poor quality of available data (Fig. 5). The top five types of data required by decision makers related to species abundance, habitat extent, habitat quality, levels of human-wildlife conflict and conservation responses for site protection (Fig. 6a). Difficulty in accessing data was positively correlated with the importance of the data for conservation projects (Fig. 6a) and differed by data type (Fig. 6b) (binomial GLM: χ² = 4.8, df = 1, p = 0.028 and χ² = 27.41, df = 6, p = 0.0001, respectively). The data most important for respondents were the most difficult for them to access.

Fig. 5

Factors affecting data accessibility and usability

Fig. 6

Relationship between the difficulty of access to data and the importance of these data for carrying out conservation projects. a Correlation between the importance of data and the difficulty of accessing these data. b Differences between data group means ranked with letters. Group means sharing a letter are not significantly different. Data type group 1: population parameters, group 2: demographic parameters, group 3: habitat, group 4: threats, group 5: associated species, group 6: already ongoing conservation efforts, group 7: socio-economic and cultural value of the species

Discussion

Priority species

This study is unique in that it considered not just the availability of biodiversity data but the availability of data for species that have been defined as priorities by governments and the NGOs supporting them. “Allocating resources solely to the most endangered species will typically not minimize the number of extinctions in the long-term, as this does not account for the risk of less endangered species going extinct in the future” (Wilson et al. 2011). Therefore, we looked at not just the most threatened animals in East Africa, but those considered priorities by the conservation community. However, identifying national priorities was often a difficult task. In general, countries did not specify priority species in their NBSAPs or in their reports to the CBD. NGO priorities were also sometimes difficult to identify from the organisations’ websites, with details buried in hard-to-find webpages or inaccessible plans and reports. The conservation organisations often gave only examples of priorities instead of full lists. Other studies in Africa have found that conservation plans need taxonomic and spatially explicit details to facilitate effective delivery and monitoring (e.g., Balmford 2003; Stephenson and Ntiamoa-Baidu 2010). In the future, targeted conservation action and monitoring might be improved if governments and NGOs specified, better communicated and reported on the taxa they focus their conservation attention on.

Taxonomic data gaps and biases

As expected, across East Africa more vertebrate species are considered priorities than invertebrate species. However, Holothurians (sea cucumbers) and Anthozoans (corals and their relatives) had relatively high proportions of threatened species considered priorities. This may reflect the value of these taxonomic groups to national and local economies due to the exploitation of sea cucumbers for food and of coral reefs for fisheries and tourism. East Africa is also important for many threatened and priority marine vertebrate species such as sea turtles, dugongs, dolphins, rays and many fish species (Sievers et al. 2019), and National Reports to the CBD of the Comoros, Ethiopia, Kenya, Madagascar and Tanzania (Convention on Biological Diversity 2021) state that actions are being taken to meet the Aichi Target 10 on ocean ecosystems.

Although data were available in GBIF for more than two thirds of the regional priority invertebrates, data availability was significantly lower across invertebrate taxa than for vertebrates. The findings in East Africa reflect historic global trends in taxonomic data biases towards vertebrates (Loh et al. 2005; Troudet et al. 2017), partly reflecting an over-representation of mammals and birds within vertebrate research and monitoring (Bonnet et al. 2002; McRae et al. 2017; Christie et al. 2021; Moussy et al. 2022). However, even within these classes, there are gaps with, for example, fewer data on African small mammals than on African large mammals (Stephenson et al. 2021).
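The comparison of data coverage between taxonomic groups can be sketched as a simple tally. The snippet below is only an illustration: the function name, the input layout and the example rows are hypothetical placeholders, not the study's data or workflow. It counts, for each group, how many priority species have at least one occurrence record.

```python
from collections import defaultdict


def coverage_by_group(species):
    """Return {group: (n_with_data, n_total, proportion)} for priority species.

    `species` is an iterable of (name, group, record_count) tuples, where
    record_count is the number of occurrence records found for that species.
    """
    totals = defaultdict(int)
    with_data = defaultdict(int)
    for _name, group, record_count in species:
        totals[group] += 1
        if record_count > 0:
            with_data[group] += 1
    return {g: (with_data[g], totals[g], with_data[g] / totals[g]) for g in totals}


# Hypothetical placeholder rows, NOT values from the study.
example = [
    ("species A", "vertebrate", 120),
    ("species B", "vertebrate", 3),
    ("species C", "invertebrate", 0),
    ("species D", "invertebrate", 15),
    ("species E", "invertebrate", 0),
]

summary = coverage_by_group(example)
for group, (n_data, n_total, share) in sorted(summary.items()):
    print(f"{group}: {n_data}/{n_total} priority species with records ({share:.0%})")
```

A tally of this kind only shows whether any record exists; it says nothing about how recent, well distributed or reliable the records are.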

Actinopterygians (ray-finned fishes) were well represented in GBIF and the LPI, which may be explained partly by the fact that East African countries are dependent on fisheries for food and economic income (Garibaldi 2014), and exploited fish stocks are likely to be better monitored (McRae et al. 2017). The trend for significantly fewer data on invertebrates, especially insects, is seen by many as worrying given the importance of these species-rich taxa for a variety of ecosystems and ecosystem services (Gascon et al. 2015; Cardoso and Leather 2019).

There were fewer data on priority vertebrate species in the LPI than in GBIF. This may partly reflect the fact that it is easier to collect an occurrence record for a species (e.g., using museum collection records or citizen science observations) than it is to conduct a survey of the population with enough detail to calculate trends in abundance over time. Nonetheless, time-series data were found for only 11 East African priority species (mostly mammals) since 2010. Previous assessments of the LPI have acknowledged its geographic biases, with the majority of data collected in temperate regions rather than high-biodiversity tropical regions (Collen et al. 2009). Reasons for low data input may include the lack of research conducted to obtain data on population trends and a lack of data sharing when data are collected (McRae et al. 2017). Eight people surveyed in this study explicitly stated that they were not aware of the existence of the LPI database, reflecting other findings that some practitioners are not aware of global databases or confident of the relevance of database content (Bowles-Newark et al. 2015). This further confirms the need to raise awareness of global databases at national level.

Troudet et al. (2017) suggested that public opinion guides biodiversity data gathering. Since many of the European and North American cultures that provide a high proportion of the resources for global databases usually perceive vertebrates such as large mammals to be the most charismatic species (Colléony et al. 2017; Krause and Robinson 2017; Albert et al. 2018), it may not be surprising that vertebrates dominate those databases. There is also often an assumption that, if larger, wider-ranging species such as vertebrates are conserved, then smaller species in the same habitats, such as invertebrates, will also be conserved. This may explain some of the data collection biases. However, the use of such surrogates has its pitfalls (Cardoso et al. 2011) and may mean that taxa not monitored decline without us knowing. This is why calls have been made to enhance data collection for neglected taxa (Hochkirch et al. 2021; Stephenson et al. 2022).

Of course, the presence of data in a global database, whether occurrence records or trends in populations, does not in itself mean the data are useful for monitoring or reporting. The records need to be appropriate for the users (e.g., accessible, timely, relevant, easily understood). However, several studies have shown the potential value of disaggregating global data sets to obtain national trends (Han et al. 2014; Stephenson et al. 2015a). “Global datasets cannot always replace local or national data” (Bowles-Newark et al. 2015) but certain indicators such as population trends lend themselves to national analyses (Stephenson et al. 2015a) and the LPI has been used to create several national reports (e.g., van Strien et al. 2016; WWF-Canada 2020). In future, conservationists in East Africa need to explore further the use of global data sets to enhance their monitoring and reporting on biodiversity. If they are to ramp up the monitoring of their contributions to the CBD’s post-2020 global biodiversity framework, countries will not only need more and better data on species occurrence and population trends, but also on the pressures faced by species and the conservation responses made (Tittensor et al. 2014; Stephenson 2019), including the links between ecological systems and social and governance systems (Mastrángelo et al. 2019). However, our findings give us a proxy measure of the availability of useful data at a global level, and they suggest there remain some significant gaps in East Africa.
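To illustrate what disaggregating a global data set into a national trend might involve, the sketch below filters population time series by country and computes a simplified LPI-style index (the mean annual change on a log10 scale, chained from a baseline of 1.0). It is a minimal sketch under strong assumptions: the record layout, the function name and the example series are hypothetical, every series is assumed to be complete and strictly positive, and the smoothing, weighting and zero-handling steps of the published LPI method are omitted.

```python
import math


def national_index(records, country):
    """Simplified LPI-style index for one country.

    `records` is a list of dicts with keys "species", "country" and "counts",
    where "counts" holds positive abundance estimates for consecutive years
    (all series the same length). Returns index values starting at 1.0.
    """
    series = [r["counts"] for r in records if r["country"] == country]
    if not series:
        raise ValueError(f"no time series for {country}")
    n_years = len(series[0])
    if any(len(s) != n_years for s in series):
        raise ValueError("all series must cover the same years")

    index = [1.0]
    for t in range(1, n_years):
        # Mean annual rate of change on a log10 scale across populations.
        mean_log_change = sum(math.log10(s[t] / s[t - 1]) for s in series) / len(series)
        index.append(index[-1] * 10 ** mean_log_change)
    return index


# Hypothetical placeholder series, NOT data from the LPI or from this study.
records = [
    {"species": "species A", "country": "KE", "counts": [100, 50, 25]},
    {"species": "species B", "country": "KE", "counts": [10, 20, 40]},
    {"species": "species C", "country": "TZ", "counts": [80, 40, 20]},
]

print(national_index(records, "TZ"))
```

With very few series per country, as reported here for East Africa, an index of this kind would be dominated by individual populations and should be interpreted with caution.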

Filling data gaps

The absence of data on some priority species may reflect a lack of data collection or a lack of sharing of collected data. It may also reflect the fact that only one in five countries worldwide uses national indicators based on those recommended by the CBD (Bhatt et al. 2020). Conservation project managers often report more on activities and outputs than on impacts (Stephenson 2019), and one review found only 19% of species-focused conservation projects submitted data on population trends (Badalotti et al. 2022). This suggests a more fundamental need to improve impact monitoring in conservation projects in East Africa and beyond.

More data need to be collected on many species considered as conservation priorities, especially invertebrates, amphibians and reptiles, as well as marine species. Efforts are already underway to incorporate data on invertebrates into the LPI, as has been started with European grassland butterflies (WWF 2020). But these efforts will only succeed if more fundamental improvements are made in species monitoring at national level.

As proposed by Hochkirch et al. (2021), measures and tools to enter research results systematically into global databases should be put in place, such as the obligation by scientific journals to share population trend or occurrence data through databases such as GBIF and the LPI. To motivate such behaviour, data contributed to global databases need to be recognised as a credible publication (Costello et al. 2013) with mechanisms put in place for data citation to be comparable to other scientific publications (Costello 2009). The IUCN Red List already allows species assessments to be saved and allocated a digital object identifier, which promotes the sharing of data by making each assessment a citable publication. Other databases should follow suit.

Citizen science, the engagement of people without scientific training in the collection of data, has traditionally been focused on Western Europe and North America, but in recent years has expanded in Africa (Wotton et al. 2020; Stephenson et al. 2021). While this may provide additional opportunities for monitoring (see Chandler et al. 2017), efforts will need to be made to ensure data-deficient species are included (Theobald et al. 2015; Troudet et al. 2017). As well as exploring options for citizen science, more effort is needed to tap into indigenous knowledge (Sitati and Ipara 2012); many local communities in East Africa will know better than universities, NGOs or governments if a certain species occurs near them or not.

Addressing factors affecting data availability and use

The importance of biodiversity data for effective biodiversity conservation management has been well documented (Reichman et al. 2011; Stephenson and Stengel 2020) but our study shows for the first time that the data considered by practitioners as the most useful are the data that are most difficult to access. The data needed most are related to species abundance and the extent and quality of habitat (Fig. 6), which are common measures of biodiversity state. The three main challenges identified for data access and use revolved around inadequate resources, tools and capacity, issues raised in other studies (Amano and Sutherland 2013; Thapa et al. 2014; Tittensor et al. 2014; Stephenson et al. 2017a, 2021; Rounsevell et al. 2020).

We used three global databases (IUCN Red List, GBIF and LPI) as proxies for the taxonomic and geographic biases of biodiversity data relevant to East Africa. However, we note that other databases (at global and local levels) are needed to support countries with tracking other indicators not related to species presence or population trends. Therefore, a follow-up study would be useful to determine the availability and biases of other data rated as important, such as habitat cover, habitat quality and human-wildlife conflict.

It is notable that, while the databases reviewed are openly and freely accessible, two of the greatest challenges to data use were excessive expense and a lack of resources to process and analyse data. This suggests that open-access databases have associated costs for users, perhaps in terms of developing the capacity necessary to process and analyse data. It could also suggest such databases are not useful for meeting local needs. Bowles-Newark et al. (2015) found a lot of uncertainty among national CBD focal points on the accessibility and applicability of global data sets. Issues of access and perceived relevance may be compounded by the fact that data users need at least 40 types of data (not only species presence or population trends), and that global databases do not exist for all of these variables.

With multiple challenges to data use uncovered by our user survey, multiple options need to be considered to help remove those challenges and make data more readily available. While more funding for monitoring is obviously key and was the main blockage noted by respondents to our survey, other solutions include allocating more of existing budgets to monitoring (Stephenson 2019; Badalotti et al. 2022). Furthermore, some studies suggest that starting biodiversity monitoring programmes in Africa could require as little as US$ 30,000–50,000 per country per year (Pereira et al. 2010; Wotton et al. 2020). Nonetheless, this will not avoid the fact that, in some cases, other conservation work is deemed more urgent than monitoring. For example, in Kenya there is some evidence that, at least in protected areas, managers often have to prioritize park security, anti-poaching, and the monitoring of illegal activity over the monitoring of species status (Stephenson et al. 2021). This further underlines the resource challenges and decisions facing many managers. The pervasive poverty in East Africa, both in terms of financial income and access to education and health, is a major constraint to the development of conservation and monitoring projects in this region (Kinzig and McShane 2015).

Globally, per capita GDP has been shown to be correlated with the number of species monitoring programmes in each country (Moussy et al. 2022) but GDP did not have a significant relationship with data availability in East Africa. However, our study found a positive correlation between per capita international tourism income and the number of priority species and species with data in GBIF and the LPI. Africa is a leading destination for nature and wildlife experiences (Higginbottom 2004) and the most established tourism products in Africa are wildlife related, such as safaris for the “Big Five” mammals, gorilla tourism, birdwatching and scuba diving (World Tourism Organization 2014). As discussed by Nyaupane and Poudel (2011), tourism development generates income for conservation and conversely biodiversity conservation makes places attractive to tourists. Although we can only infer causality, tourism revenue may explain better than GDP the number of priority species and the availability of data in East Africa because of the wildlife focus of so much tourism and the investments made in conserving species that attract those tourists. It would be interesting to conduct more detailed case studies and explore the links between tourism and data availability within and across regions.
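A country-level association of this kind can be examined with a rank correlation. The sketch below is illustrative only: the statistical test used in the study is not restated in this section, so Spearman's rank correlation is an assumption, and the function names and example values are hypothetical placeholders rather than the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values for five countries, NOT data from the study.
tourism_income_per_capita = [10, 40, 25, 80, 5]
species_with_data = [3, 9, 7, 8, 2]

rho = spearman_rho(tourism_income_per_capita, species_with_data)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient computed from so few countries carries wide uncertainty, and a positive value would describe an association only, consistent with the caution about causality above.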

Our findings underline the point raised by others that, given the high levels of species endemism and diversity and the low levels of GDP in sub-Saharan Africa, it is especially important for the international conservation community of donors and NGOs to ramp up support for species monitoring (Stephenson et al. 2017a, 2020, 2021). The second most important challenge for data users in East Africa was the use of technology, yet the use of remote sensing and other modern techniques such as DNA-based approaches will be essential for improving species monitoring across Africa (Stephenson et al. 2020, 2021). More guidance and capacity building support for monitoring should therefore be provided to project managers (Stephenson et al. 2015b, 2017a, b; Schmeller et al. 2017; Stephenson 2019; Badalotti et al. 2022). Capacity building would be further enhanced if conservationists implemented more pilots and case studies on using global data for national reporting that are then communicated to help share lessons (Bowles-Newark et al. 2015; Stephenson et al. 2015b).

Regional schemes for biodiversity monitoring that should be promoted and supported in order to fill data gaps include the Global Coral Reef Monitoring Network (Obura et al. 2017), and the GlobWetland Africa Project (Gardner et al. 2015). As well as sharing data with global databases, regional databases in Africa should also be expanded, including FishBase for Africa (http://www.fishbaseforafrica.org/), the Albertine Rift Conservation Society Biodiversity Management Information System (http://arbmis.arcosnetwork.org/), and the Africa Marine Atlas and African portal on the Ocean Biogeographic Information System (http://www.iobis.org/). The GBIF Secretariat (2019) is helping African countries to create networks of data holders and users and to digitize existing data. National efforts should also be ramped up to collect, use and share data, building on successful models like SANBI (2022) and the Endangered Wildlife Trust (2022).

Conclusions

East African biodiversity is under threat and in need of successful conservation action to preserve species, habitats and ecosystem services for current and future generations. However, effective conservation action and sustainable development require data for adaptive management and, as we demonstrated, this poses severe challenges. While global databases provide data for most priority species, taxonomic and geographic biases exist. Furthermore, many conservationists face capacity and resource challenges in accessing and using the data they need, and the data decision makers need most are the data that are hardest for them to access.

Based on our findings, we propose a series of actions to enhance data availability for key decision makers. Priorities include: the development by governments and their academic and NGO partners of long-term monitoring programmes for priority species (taking into account the need to counter identified data biases); the mobilisation of more financing and a larger proportion of existing conservation financing for species monitoring; the development of capacity for data collection, including for the use of the latest remote sensing and DNA technologies; increased engagement of citizen scientists to help governments and NGOs collect data; improved data sharing between national and global databases; and improved communications and case studies on the accessibility and uses of global databases for national monitoring.

These actions will require collaboration between governments and civil society, and more support from wealthier countries to those with more biodiversity. If this support is forthcoming, and the motivation to demonstrate national contributions towards the post-2020 global biodiversity framework and the Sustainable Development Goals helps create the appropriate enabling environment, we hope to see an increase in the capacity of East African states to collect, share and use biodiversity data and enhance the impact of conservation action on the ground.