Introduction

Gastropods are molluscs of incredible diversity both in form and in function (Bouchet et al. 2017; Ponder et al. 2019). Likely stemming from univalved molluscs of the Ediacaran to early Cambrian, current gastropods comprise the snails, slugs, whelks, and limpets that are distinguished from all other molluscs by the occurrence of a single shell, an operculum for most, and larval torsion at least once in their ontogeny (Aktipis et al. 2008; Parkhaev 2008). Their success in radiating into marine, freshwater, and terrestrial ecosystems has made them ecologically important and cosmopolitan components in most trophic interactions as well as in nutrient and carbon cycles (Ponder et al. 2019). In the human context, gastropods have been a source of food, pharmaceutical compounds, and cultural identity but have also posed medical problems (e.g., as intermediate hosts to parasitic flatworms) and been agricultural nuisances (Barker 2002; Olivera et al. 2014; Dang et al. 2015; Giannelli et al. 2016).

Despite their ubiquity, a definitive global assessment of the diversity of modern marine gastropods is still lacking, and with reason. The wide range of niches and environments that gastropods have been observed to occupy requires an equivalent range of taxonomic specialisation (and institutional support) to study them. It is also hypothesised that the majority of gastropod diversity would come from the generally unsampled and lesser-studied small and cryptic (i.e., hidden) molluscs, making them even harder to discover (Bouchet et al. 2002). But it is perhaps the sheer number of species within this group, backdropped by their complicated taxonomic history and the perennial reorganisation of their phylogeny, that contributes most to the difficulty of determining how many species of marine gastropods there actually are. Current estimates of total gastropod diversity reach 150,000 species (Ponder and Lindberg 2008; Appeltans et al. 2012). Up to 45,000 of these are theorised to be undescribed species that are stored in specimen collections, whilst 60,000 are morphospecies yet to be sampled and discovered. Around 63,000 extant gastropod species have been named and described (Aktipis et al. 2008; Bouchet et al. 2017), and these numbers continue to grow. Advancements in imaging and sequencing technologies have accelerated the description of new species and the detection of non-monophyletic taxa and cryptic species (Meyer 2003; Giribet 2008; Duda et al. 2008; Puillandre et al. 2011, 2014, 2015; Golding et al. 2014; Zapata et al. 2014; Varney et al. 2021; Kantor et al. 2022). At the same time, the increased number of workers and research projects in historically under-studied regions and water depths, as well as the use of various sampling techniques, has steadily increased the rates of species discovery (Bouchet et al. 2002, 2016, 2023; Appeltans et al. 2012; Thaler and Amon 2019; Cunha et al. 2023).

Nevertheless, empirical information on the current overall diversity of any taxon is important for understanding its ecology and evolution and can facilitate developing strategies for its conservation and management. With the on-going efforts in digitising natural history collections (Yesson et al. 2007; Nelson and Ellis 2019) and the increased participation of citizen scientists (Dickinson et al. 2012; Kosmala et al. 2016; Bouchet et al. 2016), collaboration, and accessibility to research data (Costello 2009; Bingham et al. 2017; Escribano et al. 2018), accurately determining global patterns of biodiversity has become more achievable. For this, the Global Biodiversity Information Facility (GBIF) can be an invaluable source of information. GBIF is an international network and data facility that is dedicated to aggregating and distributing open-access, standardised biodiversity data. It houses centuries' worth of specimen-based and observation-based occurrence records compiled from museums, governmental environmental monitoring surveys, and research-grade community science platforms, among others. Although not without issues of data quality and of taxonomic and regional bias, GBIF has been at the forefront among data infrastructures for primary biodiversity metadata, which can still prove powerful and essential (Edwards 2004; Yesson et al. 2007; Maldonado et al. 2015; Kosmala et al. 2016; Nelson and Ellis 2019), especially for cosmopolitan, extensively collected, and diverse taxa such as marine gastropods.

As essential as determining the taxonomic diversity of a group is knowing the status of available reference sequences that are used for genetic identification and characterisation of species and genetic diversity. Despite the expanding applications of DNA barcoding and metabarcoding to ecological, environmental, and health research, genetic methods for taxon identification are constrained by the availability of reference gene sequences (i.e., barcodes) representing the taxa to be identified. Lack of reference sequences, in metabarcoding studies for example, results in the use of broadly assigned molecular operational taxonomic units (MOTUs) which could limit the derivation of ecological insights (Blaxter et al. 2005; Schmidt et al. 2015; Múrria et al. 2020). Determining species-rich regions and taxonomic groups deficient in genetic resources, particularly of complete mitochondrial genomes and cytochrome c oxidase I (COX1 or COI) gene sequences, could be helpful in prioritising areas and taxa for the augmentation of reference sequence databases. In this study, we used occurrence records from GBIF, with annotation of authoritative taxonomy, to evaluate the taxonomic richness of marine gastropods and to identify putative global species hotspots for this group. Additionally, the availability of mitogenomes and COI barcodes of species was reviewed to assess the state of genetic resources for these marine molluscs and detect potentially important regions where reference sequences for identifying these organisms are wanting.

Methods

A dataset compiling all recorded, present (i.e., non-absent) occurrences of gastropods (N = 11,085,172) was downloaded from GBIF on 19 Sep 2023 (GBIF.org 2023; https://doi.org/10.15468/dl.qgsn2b) and analysed in R v4.2.2 (R Core Team 2022). The dataset was filtered to retain only records identified at least to species level. Subspecies, form, and variety identifications were included in the dataset but were analysed only at the species level. The dataset was further filtered to records that have country assignments, irrespective of whether these were originally in the record or estimated by GBIF based on geographic coordinates provided by the publishing organisation (Fig. 1).
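The filtering described above can be sketched as follows. The study itself was carried out in R; this is only a minimal Python illustration, not the authors' script. The field names `taxonRank`, `species`, and `countryCode` follow GBIF download column conventions, whereas the set of accepted ranks, the helper `filter_occurrences`, and the four toy records are assumptions made for illustration.

```python
# Ranks at or below species level. Infraspecific ranks are kept but are
# tallied under their parent species (assumed set, for illustration only).
SPECIES_LEVEL_RANKS = {"SPECIES", "SUBSPECIES", "VARIETY", "FORM"}


def filter_occurrences(records):
    """Keep records identified to species level or below that carry a country."""
    kept = []
    for rec in records:
        if rec.get("taxonRank") not in SPECIES_LEVEL_RANKS:
            continue  # genus-level or coarser identification
        if not rec.get("species"):
            continue  # no species name to tally
        if not rec.get("countryCode"):
            continue  # neither a supplied nor a GBIF-derived country
        kept.append(rec)
    return kept


# Hypothetical toy records, not taken from the actual download.
records = [
    {"taxonRank": "SPECIES", "species": "Conus textile", "countryCode": "PH"},
    {"taxonRank": "SUBSPECIES", "species": "Littorina saxatilis", "countryCode": "GB"},
    {"taxonRank": "GENUS", "species": None, "countryCode": "AU"},
    {"taxonRank": "SPECIES", "species": "Haliotis asinina", "countryCode": None},
]
filtered = filter_occurrences(records)
species_tally = {rec["species"] for rec in filtered}  # analysed at species level
```

In this toy run, the genus-level record and the record without a country are dropped, and the subspecies record contributes only its parent species name to the tally.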

Fig. 1 Flowchart of the data filtering steps and the main research questions of this study

All occurrence events were taxon-matched via the LifeWatch Species Information Backbone (www.LifeWatch.eu) and were annotated with authoritative taxonomy based on the World Register of Marine Species (WoRMS) with the R package ‘worrms’ v0.4.3 (Chamberlain and Vanhoorne 2023). Fuzzy matches were manually corrected based on verbatim scientific names and authorities. Ambiguous scientific names with no attached authors were not considered for taxon matching and were removed. Any record found unmatched with a valid Aphia ID was likewise removed. Events based on fossil specimens in GBIF and records of species flagged as extinct in WoRMS were filtered out. To retain exclusively marine taxa, records containing species categorised in WoRMS to inhabit freshwater and terrestrial environments were excluded from downstream tallying. Records containing species categorised to inhabit brackish environments were retained only if they were flagged to be marine as well. The total diversity of marine gastropods at different taxonomic levels was evaluated by tallying distinct and accepted families, genera, and species (Table 1).
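One possible reading of the habitat rules above is sketched below in Python (the study used R with ‘worrms’; this is not that code). The flag names `isMarine`, `isBrackish`, `isFreshwater`, `isTerrestrial`, and `isExtinct` mirror the habitat fields of a WoRMS Aphia record, but the helper `is_extant_marine`, the dict-based input, and the choice to drop taxa that carry any freshwater or terrestrial flag are assumptions.

```python
def is_extant_marine(aphia):
    """Return True for extant taxa flagged marine and not freshwater/terrestrial.

    `aphia` is a dict holding 0/1/None flags named after WoRMS habitat fields.
    """
    if aphia.get("isExtinct") == 1:
        return False  # flagged as extinct
    if aphia.get("isFreshwater") == 1 or aphia.get("isTerrestrial") == 1:
        return False  # assumption: any non-marine habitat flag excludes the taxon
    # Brackish taxa pass only when they are also flagged marine, so the
    # marine flag alone decides what remains at this point.
    return aphia.get("isMarine") == 1


# Hypothetical flag combinations for illustration.
marine_only = {"isMarine": 1, "isBrackish": 0, "isFreshwater": 0, "isTerrestrial": 0}
marine_brackish = {"isMarine": 1, "isBrackish": 1, "isFreshwater": 0, "isTerrestrial": 0}
brackish_only = {"isMarine": 0, "isBrackish": 1, "isFreshwater": 0, "isTerrestrial": 0}
extinct_marine = {"isMarine": 1, "isExtinct": 1}
```

Under this reading, taxa with no habitat flags at all are also dropped, which is a conservative choice rather than something the text specifies.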

Table 1 Global taxonomic richness, mitogenome availability (NCBI and GenBank), and COI barcoding coverage (BOLD) of marine gastropods, covering 234 countries and territories, based on GBIF occurrence data (n = 3,904,314 records; 1662–2023)

To examine the availability of genetic reference sequences for the resulting taxa, we mined publicly accessible online databases for complete mitochondrial genome sequences and curated COI barcodes. Metadata of all available full gastropod mitogenomes were downloaded from NCBI (query: “Gastropoda[Organism] AND mitochondrion, complete genome[Title]”, 05 October 2023). Mitogenomes that are not yet integrated into the NCBI RefSeq database (i.e., from INSDC GenBank) were included in this study. COI barcode information was downloaded from the Barcode of Life Data System (BOLD) using the R package ‘bold’ v1.3.0 (Dubois and Chamberlain 2023). Percentages of families, genera, and species that are represented by at least one full mitogenome or COI barcode in the filtered, taxon-matched dataset were calculated under each gastropod order.
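The coverage calculation can be illustrated with a short Python sketch, assuming a species-to-order lookup and a set of species names that have at least one reference sequence. The function name `coverage_by_order` and the toy inputs are hypothetical; in particular, the barcoded set below is made up and does not reflect actual BOLD or NCBI holdings.

```python
from collections import defaultdict


def coverage_by_order(species_to_order, sequenced_species):
    """Percentage of species per order that have at least one reference sequence."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for species, order in species_to_order.items():
        totals[order] += 1
        if species in sequenced_species:
            covered[order] += 1
    return {order: 100.0 * covered[order] / totals[order] for order in totals}


# Toy inputs for illustration only.
species_to_order = {
    "Conus textile": "Neogastropoda",
    "Conus geographus": "Neogastropoda",
    "Littorina littorea": "Littorinimorpha",
    "Chromodoris annae": "Nudibranchia",
}
sequenced = {"Conus textile", "Littorina littorea"}  # hypothetical set
coverage = coverage_by_order(species_to_order, sequenced)
```

The same function works for genera or families if the lookup and the sequenced set are built at that rank instead.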

Records with geographic coordinates were subset for global mapping. Data points with identical latitude and longitude values were assumed to be erroneous and removed (n = 165). Occurrences found in landlocked countries were treated as clerical errors in georeferencing (e.g., GBIF country estimation from wrong geographic coordinates) and were removed from mapping. A map was produced and further land masking was performed based on world country polygons from Natural Earth using the R package ‘rnaturalearth’ v0.3.4 (Massicotte et al. 2023). Lastly, to exclude records whose metadata may carry erroneous specimen origins, we removed records lying far outside a species’ average geographic distribution: for each occurrence, absolute z-scores of latitude and longitude were calculated from the species’ means and standard deviations, and a threshold of z-score = 3 was set for both latitude and longitude. The resulting data points with z-scores < 3 (n = 1,602,331) were binned into 5° × 5° grid cells and summarised by the number of unique species present in each cell. Regions considered to be “species hotspots” (i.e., grid cells that hold higher numbers of reported species) were highlighted in the map by heuristically setting a minimum threshold of 500 species based on quantile values at Q90 (400) and Q95 (667). The species hotspots found were grouped by the marine realms and provinces sensu Spalding et al. (2007). Barcode reference coverage per region was also assessed by mapping, for each grid cell, the percentage of species from georeferenced occurrences that have a COI barcode.
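The outlier filter, grid binning, and hotspot threshold can be sketched in Python as below. This is a hedged illustration rather than the authors' R workflow: the function names, the use of the sample standard deviation, the decision to keep species with fewer than two records or with zero spread, and the convention of labelling each cell by its south-west corner are all assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def drop_outliers(records, z_max=3.0):
    """Keep records whose |z| is below z_max in both latitude and longitude.

    Means and sample standard deviations are computed per species.
    """
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec["species"]].append(rec)

    kept = []
    for recs in by_species.values():
        if len(recs) < 2:
            kept.extend(recs)  # assumption: too few points to score, so keep
            continue
        stats = {}
        for axis in ("lat", "lon"):
            values = [r[axis] for r in recs]
            stats[axis] = (mean(values), stdev(values))
        for r in recs:
            ok = True
            for axis, (mu, sd) in stats.items():
                # assumption: with zero spread nothing can be flagged
                if sd > 0 and abs(r[axis] - mu) / sd >= z_max:
                    ok = False
            if ok:
                kept.append(r)
    return kept


def richness_per_cell(records, cell=5):
    """Count unique species in each cell-by-cell degree grid square."""
    cells = defaultdict(set)
    for r in records:
        key = (math.floor(r["lat"] / cell) * cell,
               math.floor(r["lon"] / cell) * cell)
        cells[key].add(r["species"])
    return {key: len(spp) for key, spp in cells.items()}


def hotspots(richness, min_species=500):
    """Return grid cells holding at least min_species species."""
    return sorted(key for key, n in richness.items() if n >= min_species)


# Toy data: 20 clustered records of species "A" plus one far-off record,
# and three records of species "B" falling in the same grid cell.
toy = [{"species": "A", "lat": 10 + 0.1 * i, "lon": 120 + 0.1 * i} for i in range(20)]
toy.append({"species": "A", "lat": 60.0, "lon": -30.0})
toy += [{"species": "B", "lat": 12 + 0.5 * i, "lon": 122 + 0.5 * i} for i in range(3)]

clean = drop_outliers(toy)
richness = richness_per_cell(clean)
```

Note that with a sample standard deviation, the largest attainable |z| among n records is (n − 1)/√n, so a threshold of 3 can only ever flag records of species with at least 11 occurrences; sparsely recorded species pass through this filter unchanged.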

Finally, temporal patterns in occurrence events were explored in a subset comprising records that have event dates (i.e., date-stamped dataset; n = 2,712,886). Here, data points were binned into five-year intervals. Changes in the relative contribution of the different data publishing organisations and their associated countries through time were explored. Each publishing organisation was classified into 12 categories (Table 2) sensu Groom et al. (2017). The relative contributions of each organisation category to the dataset were likewise explored.
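A minimal Python sketch of the temporal binning follows. The helper names, the alignment of bins to multiples of five (e.g., 2010–2014), and the category labels in the toy records are assumptions; the source specifies only that five-year intervals and 12 organisation categories were used.

```python
from collections import Counter, defaultdict


def five_year_bin(year, width=5):
    """Return the first year of the interval containing `year`."""
    return (year // width) * width


def relative_contributions(records):
    """Share of records per publisher category within each five-year bin."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[five_year_bin(rec["year"])][rec["category"]] += 1
    shares = {}
    for bin_start, counter in counts.items():
        total = sum(counter.values())
        shares[bin_start] = {cat: n / total for cat, n in counter.items()}
    return shares


# Hypothetical records for illustration.
events = [
    {"year": 2011, "category": "museum"},
    {"year": 2014, "category": "citizen science"},
    {"year": 2015, "category": "citizen science"},
]
shares = relative_contributions(events)
```

Expressing each category as a share within its bin, rather than as a raw count, keeps periods with very different record volumes comparable.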

Table 2 Relative contributions of the different publishing organisation categories to species occurrence records of marine gastropods in GBIF

Results

Systematic division of marine gastropod taxa

A filtered dataset of 3,904,314 occurrence records was obtained from GBIF, all taxon-matched and annotated with valid species names and Aphia IDs from WoRMS. The dataset consists of specimens collected or observations made from March 1662 until September 2023. Six gastropod subclasses were represented with widely varied relative occurrences (Fig. 2a). Caenogastropoda makes up the bulk of the dataset and accounts for 64.7% of the records, followed by Heterobranchia at 17.8%. Vetigastropoda accounts for 12.4% of the records and the rest are shared by Patellogastropoda (3.7%), Neritimorpha (1.2%), and Neomphaliones (< 0.1%).

Fig. 2 Systematic division of marine gastropods found in the GBIF dataset. a Relative abundance of occurrence records by gastropod subclass. b Number of species within each gastropod order found in the GBIF dataset. Thin bars represent the number of gastropod species documented in WoRMS/MolluscaBase. c Comparison of current totals of accepted extant marine species from WoRMS and this study’s dataset from GBIF

At least 33,268 unique and valid marine species (33,987 when including subspecies, varieties, and forms) under 3291 genera belonging to 380 gastropod families were reported in the dataset (Table 1, Online Resource 1). About 64.5% of the listed species are Caenogastropoda, over half of which belong to the order Neogastropoda (Fig. 2b). Neogastropoda, also the most speciose of the orders in the dataset, accounts for 13,014 of the species. It is followed by Littorinimorpha (5922), Trochida (2245), and Nudibranchia (2100). These four most speciose orders are also the most genus-rich and account for 70% of all reported genera. Nudibranchia has the greatest number of reported families with 71, followed by Littorinimorpha, 67, and Neogastropoda with 62 families.

The total number of species represented in the GBIF dataset covers 83.3% of the current total number of accepted extant marine gastropod species curated in WoRMS (n = 39,992; Fig. 2c). By taxonomic order, the relative abundance of species in the dataset is found to generally correlate with WoRMS totals (Fig. 2b, Table 1).

Mitogenome and COI barcode availability

We obtained a total of 1055 full mitochondrial genomes of Gastropoda from NCBI RefSeq and INSDC (GenBank), which represent 292 genera of marine, brackish, freshwater, and terrestrial gastropods. Out of the 380 reported marine families in the GBIF dataset, 103 (27.1%) have at least one species with a sequenced full mitogenome (Table 1). Fifty (48.5%) of the mitogenome-represented families are Caenogastropoda, whilst 35 (33.9%) are Heterobranchia. All six gastropod subclasses have at least one full mitogenome sequenced, although the heterobranch orders Pteropoda and Umbraculida have none as of writing. Among the 33,268 reported species in the dataset, only 4011 (12.1%) have been barcoded according to BOLD (Table 1).

Data sources and publishing institutions

Museums supply almost 60% of species occurrences in the filtered, taxon-matched dataset (Table 2). Until the 2010s, museums had been the major component of the recorded species occurrences (Fig. 3). However, in the last decade, GBIF submissions by citizen science platforms (e.g., iNaturalist, Seasearch) increased significantly and became the dominant source of marine gastropod observations during the period (Fig. 3, Online Resource 2).

Fig. 3 Relative contribution of the 12 categories of all publishing organisations to species occurrences recorded from 1662 to 2023, demonstrating the shift from museum-based collections towards citizen science observations

The geo-tagged, z-limited dataset, although its country assignments are unverified, comprises at least 199 observation countries and territories (Online Resource 3), with occurrence record counts ranging from two in Monaco to 272,225 in Australia. It incorporates 1430 different datasets submitted by 334 publishing organisations, which are registered in one of 55 countries or are produced by international initiatives (e.g., Conservation of Arctic Flora and Fauna, International Barcode of Life Consortium) (Online Resource 2).

Reported species hotspots and COI barcoding cold spots

Retaining only georeferenced, land-filtered, non-outlying observations yields 21,433 distinct accepted species that are widely, albeit unevenly, distributed across all oceans (Online Resource 4). When mapped, non-empty grid cells each contain up to 60,280 observations and up to 2981 distinct species.

Species hotspots, defined in this study as 5° × 5° grid cells containing at least 500 species, were found in 93 cells and assigned into 28 marine provinces across nine marine realms (Fig. 4a, Table 3, Online Resource 5). Disparity in species numbers is observed among species hotspots, between provinces, and between realms. The greatest numbers of reported species are found in the two marine provinces that comprise the Coral Triangle, the islands of New Caledonia in the Tropical Southwestern Pacific, the Caribbean Islands of the Tropical Northwestern Atlantic, and around Madagascar and Réunion Island of the Western Indian Ocean. Reported species numbers are also high in the Tropical Eastern Pacific, in the southern end of the Kuroshio, and in the tropical to temperate range of the Eastern Australian Shelf.

Fig. 4 Maps showing a the distribution of the 93 species hotspot cells based on georeferenced occurrences (numbered 1–28 by marine province) and b percentages of COI barcoding coverage for the recorded species within each 5° × 5° species hotspot cell

Table 3 Cumulative number of reported occurrences, total valid species, and species with COI barcodes in the 93 grid cells considered as species hotspots and grouped by marine province

Gaps in COI barcode availability were also revealed for the reported species within each cell, within hotspots, and within their corresponding provinces and realms. Whilst the less species-rich, non-hotspot grid cells are fairly well covered (Online Resource 6), COI/species ratios show a general shortage in barcode sequences for genetic identification of marine gastropod species. COI barcode coverage ranges from 12.3 to 62.3% (M = 38.0%, SD = 12.2%) among hotspot cells, which have 505–2981 reported species each (Fig. 4b, Online Resource 5). This general deficit in COI coverage and the disparity between regions are echoed at the broader scale of marine provinces (Table 3). Hotspot provinces cumulatively hold 523–4464 species each and have COI coverage of 16.3–58.3% (M = 35.1%, SD = 10.7%). Disparity in COI coverage was also revealed among the more speciose provinces. A notable, if counterintuitive, example is the contrast between the Southeast Australian Shelf and the tropical and subtropical provinces of the western Atlantic, where coverage is only 16.3–21.2% and a given sampled species is therefore less likely to have a COI barcode, and the even more species-rich provinces of the Central Indo-Pacific, which have better COI coverage (27.9–58.3%).

Discussion

Distribution of marine species richness and barcoding coverage

This study provides the first baseline information on the global taxonomic richness of marine gastropods based on GBIF species occurrence records, totalling up to 33,268 valid species. Species hotspots based on the reduced z-limited dataset (comprising 21,433 species) were found to be unevenly spread throughout the globe, with 54 out of the 93 hotspots unsurprisingly concentrated between Central Indo-Pacific and Temperate Australasia whilst the rest of the provinces present 1–9 hotspot cells each. To a certain extent, these results could be reflective of actual patterns of biodiversity distribution. In general, diversity tends to increase towards the lower latitudes with various hypotheses posed (e.g., mid-domain effect, species-energy hypothesis, effect of climate stability or harshness on species persistence, faster rates of microevolution in the tropics, etc.), although why both terrestrial and aquatic biodiversity peak nearer the equator continues to be one of the major questions in biogeography. In the marine realm, this gradient is long-observed to peak longitudinally at the Coral Triangle, an Indo-Pacific region comprising six archipelagic nations (Hoeksema 2007). The origin of this megadiverse region has been associated with several, possibly synergistic hypotheses: as a (1) centre of speciation, (2) centre of geographic overlap of Indian and Pacific Ocean fauna, (3) centre of accumulation of expanding geographic ranges of species, or as a (4) centre of surviving old lineages (Bellwood and Meyer 2009; Barber and Meyer 2015).

The geo-tagged, z-limited dataset, however, also exhibited high disparity in the number of records distributed throughout the globe. In a broader context, in addition to showing the geographic distribution of gastropod diversity in nature, the dataset more clearly reveals the uneven levels of sampling effort made throughout history, which may have confoundingly shaped the biodiversity patterns perceived in this study. For example, as it stands, there are only nine occurrence records of marine gastropods found for Cameroon: eight of these were submitted by the Muséum National d'Histoire Naturelle (MNHN, Paris, France) pointing to occurrences of the borsoniid Genota mitriformis (W. Wood, 1828) and one from the zoological collections of Universität Ulm (Baden-Württemberg, Germany) for one horaiclavid Micropleurotoma melvilli (Sykes, 1906). Given the country’s 400-km coastline and knowledge of species range and ecology, it is highly implausible that Cameroon has just nine records of marine gastropods. Consequently, this means that the dataset in its current form, despite the millions of submitted records it contains, could have still been restrictive for certain countries and regions, like Cameroon, because of underreporting or an overall undersampling.

This bias in sampling effort is then further highlighted by the patchy geographic distribution not only of the overall counts of occurrence records themselves, but also of the observed number of species. With the degradation of marine habitats, backdropped by the continuously increasing ocean temperatures due to anthropogenic climate change (Hoegh-Guldberg et al. 2007; Peñaflor et al. 2009; Lough et al. 2018), it has become more urgent for the world to characterise baseline marine biodiversity for conservation in coral reefs, which, in turn, has stimulated more research in the tropical regions (Myers et al. 2000; Roberts et al. 2002). Exemplifying this are the brightest cells within the tropics, which may be traced to intensive sampling activities conducted in New Caledonia and the Philippines (Bouchet et al. 2002, 2009) where special emphasis was placed on various complementing sampling approaches to document the species richness of benthic molluscs. The work of Bouchet et al. (2002) of the MNHN detected 2738 molluscan species in New Caledonia (2187 of which were gastropods and around 90% of which were considered micromolluscs or molluscs sized 0.4–40.9 mm). Their results reflect both the potential yield from maximising sampling effort and the extent of undiscovered diversity, especially within undersampled environments and taxa (or size classes). This highly focused work can account for the steepness of some diversity gradients found in our results. Fittingly, the MNHN group has contributed a substantial share of the georeferenced records for New Caledonia (68.0%) and the overall Tropical Southwestern Pacific (57.9%), the Philippines (21.7%), and several other hotspot provinces where it has been involved, such as the Mediterranean Sea (50.4%), Western Indian Ocean (47.8%), Southeast Polynesia (33.6%), and the Eastern Coral Triangle (23.8%).

A few other hotspots could also be linked to where research funding is available (e.g., Japan, Australia), to where research has been prioritised (e.g., critically important but threatened regions like the Coral Triangle), or to old natural history collections linked with colonialism (e.g., the Caribbean, Australia). A deficit of locally generated occurrence records, or of biodiversity research in general, in previously colonised or economically poor countries has been observed before (Fontanilla et al. 2014; Titley et al. 2017; Berba and Matias 2022), which underscores the need for further, geographically systematic taxon sampling in these areas. Low-latitude regions outside the Coral Triangle or the Central Indo-Pacific may still be poised to house high levels of gastropod biodiversity and should be explored and investigated.

The important work ahead is further made evident by the limited availability of reference sequences that have been generated so far for marine gastropods. As high-throughput sequencing technologies have grown more efficient and affordable, so has de novo sequencing of genomes, and comparative phylogenomic analyses have become important approaches in studying the biology and ecology of organisms. Full mitochondrial genomes have become useful data in illuminating unique genomic architecture (Knudsen et al. 2006; Grande et al. 2008; Rawlings et al. 2010; Sun et al. 2018; Ghiselli et al. 2021), in providing evolutionary insights on morphology and adaptation (Medina et al. 2011; Osca et al. 2014; Du et al. 2020), and in supporting or rejecting long-troubled phylogenetic hypotheses from within families across most subclasses to between broader taxonomic groups (Grande et al. 2002; Cunha et al. 2009; Arquez et al. 2014; Uribe et al. 2016, 2017a, b, 2019; Jiang et al. 2019; Varney et al. 2021; Sanders et al. 2021). However, current research tends to sample and sequence already extensively studied, usually economically important or public health-related taxa, or the easier-to-access freshwater and terrestrial taxa, or the congeners and conspecifics that warrant taxonomic delineation (Lopes-Lima et al. 2021). As a result, even with the huge amount of research, only a little over a quarter (27.1%) of the marine gastropod families listed in this study are represented by at least one mitogenome. As of writing, the less diverse but equally important orders of pteropod and umbraculid gastropods are yet to be represented in NCBI RefSeq or in GenBank.

In terms of the availability of reference sequences for COI barcoding, whilst it is acknowledged that molluscs have become one of the most barcoded phyla among non-chordate metazoans (Kvist 2013; Mugnai et al. 2021), the enormous number of species within this group means an equivalent effort, if not much more, is needed to generate a comprehensive reference database. Since the inception of DNA barcoding as a tool for genetic identification of species (Hebert et al. 2003), it appears that only 12.1% of the total number of species of marine gastropods accounted for in this study (36.0% at the genus level) have been coupled with a COI barcode. Geographically, this disparity in barcoding coverage is made evident by the putative COI cold spots (e.g., species hotspots in the Tropical Atlantic, Tropical Eastern Pacific, and Southern Australia; Fig. 4b), where sometimes over 3600 species can be found but only about 17% of them are barcoded (Table 3). To address these barcoding resource inequities, we therefore continue to call for further taxon sampling, sequencing work, and prioritisation within the putative COI cold spot provinces. Moving forward, we also advocate for increased funding for and stronger involvement of local institutions within these regions by building capacity and actively following through on open, inclusive, and non-parachute collaborative biodiversity research (sensu Stefanoudis et al. 2021).

Open and inclusive science can pave the way for biodiversity research

The opening up of science and allowing access to research data have provided the opportunity to review global patterns of biodiversity. Open science allows for transparency and reproducibility and stimulates collaboration. Through this, the creation of many free and universally available biodiversity databases (e.g., GBIF), data standards (e.g., Darwin Core; Wieczorek et al. 2012), tissue and DNA repositories (e.g., Global Genome Biodiversity Network; Droege et al. 2014), and open-source tools (e.g., taxize; Chamberlain and Szöcs 2013) has been advanced. With the development of publicly accessible species occurrence repositories, data from centuries' worth of biodiversity collections and research have been and are continually being made available. These can be rich sources of biological insights that will be useful for making proper ecological inferences and conservation strategies. However, despite the substantial taxonomic richness that our dataset accounts for in this study, it inadequately represents the diversity of marine gastropods in regions where records are lacking. Thus, the promise of an accurate and definitive global evaluation of marine gastropod diversity, though not fully realised herein, is still on its way to fulfilment.

With the continuous curation being done through WoRMS and MolluscaBase, the study was able to minimise the occurrence of outdated taxonomy uploaded in the GBIF dataset, which is most helpful and critical for taxonomically dynamic groups of organisms such as gastropods. As we also foresee an explosive increase in species occurrence records derived from metabarcoding environmental samples, the maintenance, expansion, and further integration and cross-referencing between these different open technologies will be crucial. These data points will come from slurries derived from plankton tows, settlement plates, or gut (i.e., trophic) analyses, as well as the less invasive or destructive filtered water samples. The development and extension of GBIF’s capacity to track and version sequence variants (Abarenkov et al. 2023) will rapidly expand occurrence records beyond the current one-to-one specimen:record model. The ability to put species names to these sequence records will be significantly handicapped without augmented voucher-based reference libraries. Tools and benchmarks such as those developed herein can help steer expeditionary efforts and funding resources strategically to close our knowledge gaps in documenting life on the planet.

The analysis also reveals a noticeable shift in recent years in the categories of uploading institutions, from museums and natural history collections towards volunteer-based organisations and biodiversity data centres that monitor and collate data. We attribute this to two main reasons, the rising involvement of citizen science and the massive shift away from specimen-based sampling in biodiversity studies, both of which have been among science’s responses to the current biodiversity crisis (Troudet et al. 2018; Byrne 2023). Observation-based sampling, as opposed to collecting and keeping actual animal specimens, is viewed as a less destructive and logistically easier way of gathering, storing, and sharing diversity data and has been favoured in recent years (although counterarguments signifying the importance and continued relevance of specimen-based sampling are also discussed by Gropp (2018) and Nachman et al. (2023)). Observation-based occurrences have also been the capital of participatory citizen science projects such as iNaturalist, which allows anyone with a camera to upload an observation of an organism and have a community of avid naturalists and scientists improve the record by vouching for or amending its taxon identification. The inclusion in GBIF of research-grade contributions resulting from this process (i.e., species occurrences that are well-photographed, taxonomically vouched, and georeferenced) has indeed supplemented species coverage, particularly for gastropods, as we found in this study.

Gathering the ten countries with the highest numbers of citizen science contributions (i.e., USA, Australia, UK, Netherlands, New Zealand, Canada, France, Indonesia, Norway, Philippines) shows an increase in uploaded underwater observations on such platforms, presumably as a consequence of the recent accessibility of underwater photography among recreational and professional SCUBA divers and the general interest in marine biota content on social media (Retka et al. 2019; Ruiz-Frau et al. 2020; Roberts et al. 2023). Nudibranchs in particular have been a favourite among underwater photographers due to the diversity of their colours, textures, and anatomy (Witabora and Homan 2021) and have had a significant presence on social media platforms such as Instagram (Hoffman et al. 2022). Their rise in popularity may have been pushed further by the mainstreaming of science communication, especially on biodiversity, ecology, and conservation (Burns et al. 2003; López-Goñi and Sánchez-Angulo 2018; Lamb et al. 2018; Heathcote 2021; Habibi and Salim 2021), alongside recent internet memes and videos on nudibranchs (e.g., Ze Frank 2020). We surmise that all these may have helped boost the species occurrence records of non-shelled gastropods like nudibranchs, particularly on citizen science platforms, which would otherwise have been inadequately represented within museum collections that are predominated by shells (Fig. 5).

Fig. 5 Relative occurrence of nudibranch and non-nudibranch gastropod observations made in the ten countries that have the most citizen science contributions

For a clearer picture of gastropod or molluscan taxonomy, diversity, and distribution, we believe that a more widely open and inclusive science will be key. It is estimated that 40% of the total morphospecies of gastropods are yet to be discovered and sampled from the wild. It has also been shown that the waning numbers of taxonomists and the difficulties of sampling gastropods in more niche environments have become the main limiting factors in the discovery and description of more new species (Bouchet et al. 2016). In recent years, aside from the submission of species occurrence reports, the involvement of “amateur” naturalists and citizen scientists in biodiversity monitoring and research has contributed greatly to molluscan taxonomy through new species descriptions. In the study of Bouchet et al. (2016), 57% of new species descriptions from 2000 to 2014 were first-authored by citizen scientists. This suggests that an even more active collaboration of academics and non-academics in malacology and taxonomy could only be beneficial to molluscan science, as has already been demonstrated by citizen science-led studies on biodiversity monitoring and discovery (Sneha Chandran et al. 2017; Smith and Davis 2019; Chow et al. 2022) and on geographic range expansion of both native and non-indigenous taxa (Nimbs and Smith 2018; Kleitou et al. 2019; Smith and Nimbs 2022).

The ubiquity and diversity of gastropods have made them ecologically, economically, and culturally important components of the ecosphere. Their evolutionary history, the richness of their taxa, and their skeletal records have all placed them in a unique position whose potential can be made further useful in biomonitoring and research in the context of biodiversity loss amidst rapidly changing marine environments. By pinning potentially important regions of high species richness and barcode deficiency, this study hopes to guide future work and research agenda to advance more taxon sampling, further support for barcoding efforts, and the participation and empowerment of local institutions and literally anyone who can help.