Introduction

Natural History collections hold the records of the history of life on Earth through collections of fossils that attest to the existence of a particular life form at a given time, including its evolutionary history, paleobiology and paleoecology, as well as data about entire past ecosystems. Often, these life forms can be represented by a single whole specimen or a set of elements that can be assigned to a single specimen (e.g. a partial skeleton of a dinosaur). Constructing this record can be challenging, especially if the lineage of some fossils cannot be fully traced.

The Sociedade de História Natural (SHN), based in Torres Vedras (Portugal), is a non-profit scientific organization that manages an important paleontological collection, part of which was collected by an amateur palaeontologist. The process of discovery of these fossils and the lineage of the collection are not fully known, except for the location where they were collected, making it harder to determine and individualize specimens. The paleontological collection of the SHN corresponds to one of the most significant paleontological collections of the Iberian Peninsula, comprising numerous pieces of evidence of plants, invertebrates, and, especially, vertebrates from the Mesozoic of the Lusitanian Basin in Portugal, including fishes, mammal forms, lepidosauromorphs, crocodylomorphs, pterosaurs and dinosaurs. The researchers recognized the undeniable scientific value of the collection, which is already providing numerous contributions to the evolutionary history of Portugal's Late Jurassic-Early Cretaceous ecosystems, including some new species for science.

To better understand the characteristics of the collection, we developed a new management strategy based on the spatial location where the fossils were found and retrieved. This new approach was based on establishing a methodology for field collection and developing a spatial database to manage all stages of paleontological collections management. The new strategy proved to be extremely helpful in assisting the transfer of the collection to new storage facilities, supporting institutional partnerships and developing new research lines.

Geographic and geological context

The geographical area from where the fossils were collected consists of a stretch of approximately 90 km of the Portuguese West Coast spanning from Ericeira, in the south, to Nazaré, in the north (Fig. 1), corresponding roughly to the coastal area of the central sector of the Lusitanian Basin. Furthermore, the geology bears witness to past ecosystems, constituting what we propose to designate as a paleontological landscape by preserving testimonials other than large vertebrates. The paleontological collection of the SHN is mainly composed of fossils collected in the Mesozoic levels of the Lusitanian Basin from the Upper Jurassic deposits outcropping in this coastal area.

Fig. 1

A General view of the Upper Jurassic cliffs on the Portuguese West Coast near Cambelas (Torres Vedras municipality). B Geographic extent of the studied area

The Lusitanian Basin is in the west of the Iberian Peninsula and is related to the opening of the North Atlantic (Kullberg 2000). It began to be differentiated in the Triassic due to the fragmentation of Pangea and evolved during the Jurassic and Cretaceous (Kullberg et al. 2013). The Lusitanian Basin sedimentary sequence was deposited from the Middle Triassic (Ladinian-Carnian) (Rocha et al. 1996) to the Late Cretaceous (Turonian) (Rey 1999). The Upper Jurassic sedimentary sequence represents a third rifting episode (Rasmussen et al. 1998; Kullberg et al. 2013), marked by intense subsidence and an internal differentiation in three main sectors (Rocha and Soares 1984). From the lower Kimmeridgian to the top of the Upper Jurassic, the sedimentary sequence was dominated by abundant siliciclastic inputs, and the coastal environments became shallow as siliciclastics prograded into the basin, with the gradual development of fluvial environments (e.g., Hill 1988; Manuppella et al. 1999; Kullberg et al. 2013). During the last part of the Late Jurassic in the central sector of the Lusitanian Basin, which includes the studied area, distal fluvial-deltaic and coastal environments are dominant and give rise to the richest fossil record of terrestrial vertebrates in the Lusitanian Basin (e.g. Lapparent and Zbyszewski 1957; Dantas 1990; Antunes and Mateus 2003; Ortega et al. 2009). The stratigraphy of the Upper Jurassic sequence of the Lusitanian Basin is complex due to the profuse lateral heterogeneity and the proposal of several different stratigraphic approaches (e.g. Hill 1988; Leinfelder 1993; Manuppella et al. 1999; Schneider et al. 2009; Martinius and Gowland 2011; Kullberg et al. 2013; Taylor et al. 2014; Fürsich et al. 2022).

Late Jurassic fauna of the Lusitanian Basin

The Kimmeridgian-Tithonian deposits of the Lusitanian Basin are rich in fossil vertebrates, represented by numerous taxa of fishes, mammals, amphibians, and reptiles. Non-archosaur diapsid reptiles are represented by choristoderans, squamates and sphenodontians (Malafaia et al. 2010). Turtles are abundant and diverse and represented by plesiochelyids, non-plesiochelyid eucryptodirans and paracryptodiran pleurosternids (Pérez-García and Ortega 2011, 2014, 2022; Pérez-García 2015; Pérez-García et al. 2023). The Upper Jurassic of Portugal is known for its rich and diverse fossil record of archosaurs. Crocodyliforms are represented by some early-branching crocodyliforms, thalattosuchians, and neosuchian forms referred to Goniopholidae and Atoposauridae (Young et al. 2014; Guillaume et al. 2019). The pterosaur fossil record is sparse and represented only by a few isolated elements referred to Pterodactyloidea or indeterminate pterosaurs (Wiechmann and Gloy 2000; Bertozzo et al. 2021).

Dinosaurs are remarkably diverse and represented by various forms of Ornithischia, Sauropoda, and Theropoda. Ornithischia is represented by thyreophorans, including members of Stegosauria and Ankylosauria (e.g. Galton 1980; Escaso et al. 2007; Costa and Mateus 2019) and iguanodontian ornithopods that have been referred to camptosaurids, dryosaurids and a possible indeterminate neornithischian (Escaso 2014; Escaso et al. 2014; Rotatori et al. 2022). Sauropods are dominated by turiasaurs, diplodocids and macronarians, including camarasaurids, brachiosaurids and early branching somphospondylans (e.g. Mannion et al. 2012; Mateus et al. 2014; Mocho et al. 2014, 2016a, 2017a, b, c, 2019a). Theropods are mainly represented by medium to large-sized forms belonging to Ceratosauria or tetanurans, including members of Megalosauridae and Allosauroidea (e.g. Mateus 1998, 2006; Hendrickx and Mateus et al. 2014; Malafaia et al. 2010, 2015; 2017a, b; 2020), but some small theropods are also recorded. The relationship between the Late Jurassic faunas of Portugal and North America has been the subject of many studies. The Portuguese faunas are characterized by a complex array of exclusive and shared forms with other domains, mainly with North America, which is explained by the shared history of these territories (e.g. Pérez-Moreno et al. 1999; Escaso et al. 2007; Hendrickx and Mateus et al. 2014; Mocho et al. 2014, 2019a; Malafaia et al. 2020).

In recent years, the significance of the Portuguese West Coast for palaeontology has been further attested by the publication of previously unknown species (e.g. Pérez-García and Ortega 2011, 2014; Mocho et al. 2019a; Malafaia et al. 2020). In this regard, the SHN paleontological collections have supported the publication of many scientific papers in SCI journals, abstracts in national and international paleontological conferences, and PhD theses (Pérez-García 2012; Escaso 2014; Mocho 2016; Malafaia 2018). We highlight the importance of having a management strategy for the paleontological collections behind this research. The SHN paleontological collections currently host five holotype specimens of Late Jurassic vertebrates (Pérez-García and Ortega 2011, 2014; Escaso et al. 2014; Mocho et al. 2019a; Malafaia et al. 2020). They provided relevant data on the Mesozoic ecosystems of the Lusitanian Basin and helped to understand the evolutionary history of many vertebrate groups, especially dinosaurs. Herein, we briefly overview some of the most important findings.

An important line of research has been carried out on the Kimmeridgian-Tithonian collection of turtle remains of the SHN, including the establishment of the oldest European pleurosternid, Selenemys lusitanica Pérez-García and Ortega; the description of a new species of Hylaeochelys, Hylaeochelys kappa Pérez-García and Ortega 2014; and the first occurrence of Tropidemys in the Upper Jurassic of the Iberian Peninsula (Pérez-García and Ortega 2011, 2014; Pérez-García 2015; Pérez-García et al. 2023, 2024). A new pterosaurian specimen from the SHN was recently studied, providing new data on this poorly known group of archosaurs in the Upper Jurassic of Portugal and supporting the first occurrence of a dsungaripteroid pterosaur in the Iberian Peninsula (Bertozzo et al. 2021). Sauropods are the most abundant dinosaurian group in the collection, including many partial skeletons and isolated remains. The SHN team has recently collected some of these specimens in the Torres Vedras municipality. This fossil record shows a great diversity, including remains of turiasaurs, diplodocoids and macronarians (e.g. Mocho et al. 2016b, 2017b, d; 2019a, b) and the holotype specimen of Oceanotitan dantasi Mocho, Royo-Torres and Ortega, one of the oldest somphospondylans found worldwide (Mocho et al. 2019a). The study of the theropod remains of SHN has provided relevant data about the diversity of this clade in the Lusitanian Basin during the Late Jurassic (Malafaia et al. 2015, 2017a,b,c, 2019, 2020) and the establishment of Lusovenator santosi Malafaia, Mocho, Escaso and Ortega, the oldest carcharodontosaurian allosauroid yet discovered from Laurasia (Malafaia et al. 2020). Also relevant is the sample of ornithischian dinosaurs in the collection (Escaso 2014; Escaso et al. 2014), including the type specimen of the dryosaurid ornithopod Eousdryosaurus nanohallucis Escaso, Ortega, Dantas, Malafaia, Silva, Gasulla, Mocho, Narváez and Sanz.
SHN hosts an extensive collection of crocodylomorph and dinosaurian tracks with many morphotypes already identified by Castanera et al. (2020a, b). Numerous lines of research are still ongoing. They are mainly focused on phylogenetic systematics (e.g. fishes, turtles, pterosaurs, sauropods, theropods, ornithischians), paleobiology (e.g. paleohistology, paleopathologies), paleobiogeography (e.g. Escaso 2014; Escaso et al. 2018; Camilo 2019; Mocho et al. 2019b, 2022; Costa et al. 2021, 2022; Pérez-García and Ortega 2022; Rotatori et al. 2022; Pérez-García et al. 2023) and also on paleontological heritage management (e.g. Mano 2010; Escaso 2014; Escaso et al. 2018; Mocho et al. 2019b, 2022; Costa et al. 2021, 2022; Camilo et al. 2020, 2022).

The collection(s)

Over the last decades, professionals and amateurs have collected thousands of fossils in the Mesozoic sedimentary deposits of the Lusitanian Basin (Portugal). One such effort is the José Joaquim dos Santos Collection (JJS Collection), mainly composed of fossil remains collected in Upper Jurassic deposits of the aforementioned basin and many handwritten notes on most of these specimens.

The JJS Collection comprises thousands of fossils representative of hundreds of specimens (some of these specimens correspond to a set of several remains). The municipality of Torres Vedras, recognizing the scientific value of such a paleontological collection, decided to acquire it in 2008. Subsequently, it entrusted SHN with the management, conservation and study of the collection, which at the time was being kept in improvised personal facilities of José Joaquim dos Santos, near Torres Vedras (Fig. 2).

Fig. 2

The JJS collection original storage facilities

In addition to this collection, SHN also manages specimens collected through its own field campaigns, usually conducted in collaboration with national and international academic partners. These specimens have a more formal acquisition cycle but only correspond to a small portion of the SHN paleontological collections compared to those in the JJS Collection.

The SIGAP project

Natural History collections are usually organized around the concept of specimen – a sample attesting to the occurrence of a taxon or more at a particular time and place. Once the taxon is determined, it is associated with lithostratigraphic, paleobiogeographic and paleoecological data, related bibliography and other information that helps describe its contextual significance (Lane 1996). However, specimens whose lineage is unclear (i.e., not acquired using systematic methods and registration techniques) usually lack the link between the taxon and its context.

To understand the management strategy adopted by the SHN, it is essential to note that the fossils were stored in the best conditions that José Joaquim dos Santos could provide, albeit suboptimal. Unfortunately, the fossils were not prepared to be studied and classified accurately (i.e., they require preparation procedures to highlight morphological features), and it can be hard to determine which fossils belong together and could, therefore, correspond to the same individual, taxon or fossil assemblage. Apart from preservation issues, the most urgent issue about the collection was the lack of an inventory. The available information consisted solely of handwritten notes and the memory of José Joaquim dos Santos. These notes typically document the collection site and general anatomical characteristics when perceivable.

Before moving the JJS Collection to new storage facilities, the problems described above had to be addressed; however, establishing an inventory structured around the specimen corresponding to a single individual would take many years. Hence, the strategy chosen was to organize the collection by location and geological context because this information could be collected within a few months with the help of José Joaquim dos Santos's notes and memories. Assigning specimens to a single individual or taxonomic unit will follow in the coming years as research fellows study the collection.

The importance of space and location as descriptors of Natural History collections specimens is widely recognized (Reed et al. 2004); what makes this case different is the role spatial information takes as the structuring element around which the paleontological collections are organized.

Once a first (location-based) catalogue of the collection is available, it allows researchers to (a) group fossils by provenience and, from there, build specimen listings and (b) plan revisits to the locations from where collections were made to monitor whether erosion has uncovered further fossils, eventually requiring a systematic excavation. Such a strategy leads to a greater understanding of the fossil record and contributes to tackling challenges of Natural History collections, such as preservation and accessibility (Dorfman 2018).

Implementing a location-based approach to manage this paleontological collection took the form of the specially designed SIGAP project (GIS applied to Paleontology), an ongoing internal effort initiated in 2008 (footnote 1). Its main objective is to georeference the provenience locations of the fossil remains from the JJS Collection and build a spatial database that can support the management of the paleontological collections, from on-site collection to specimen registry once the specimen is determined. The following sections describe how this system was developed.

Methodology

Field collection

The first data-related task of the project was the acquisition of the locations where fossils were retrieved. These locations are designated as paleosites (“paleositio”), and they can take one of two types: a site (“jazida”) or a dispersion area (“disperso”). A site is recorded as a zero-dimensional geometry using a Global Positioning System (GPS) receiver to identify the exact location from where one or more fossils, representative of one or more individuals, were retrieved. Thus, a site location is known with enough accuracy to determine the exact layer from where the fossils were collected. By contrast, a dispersion area represents a fuzzier type of collection and is recorded as a two-dimensional geometry. Due to the effects of erosion or human activities, this type of register is associated with findings detached from the original sedimentary layer that once encapsulated the fossil.
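The distinction between the two paleosite types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's data model: the class names `Site` and `DispersionArea`, their fields, and the use of planar coordinates in metres are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Coordinates are (x, y) pairs in a projected system, in metres (assumption).
Coordinate = Tuple[float, float]


@dataclass
class Site:
    """A site ("jazida"): an exact find spot stored as a single point."""
    code: str
    location: Coordinate
    dimension: int = 0  # zero-dimensional geometry


@dataclass
class DispersionArea:
    """A dispersion area ("disperso"): a fuzzy find area stored as a polygon."""
    code: str
    ring: List[Coordinate]  # outer boundary, first vertex not repeated
    dimension: int = 2  # two-dimensional geometry

    def __post_init__(self) -> None:
        if len(self.ring) < 3:
            raise ValueError("a polygon needs at least three vertices")

    def area(self) -> float:
        """Planar area in square metres, computed with the shoelace formula."""
        total = 0.0
        for i, (x1, y1) in enumerate(self.ring):
            x2, y2 = self.ring[(i + 1) % len(self.ring)]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0
```

Keeping the two types as separate classes mirrors the text: a site carries one coordinate pair, while a dispersion area carries a boundary whose size reflects how uncertain the find location is.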

Field collection in the case of dispersion areas consists primarily of digitizing polygons over aerial imagery as background using QGIS open-source software (QGIS.org 2023) through visual image interpretation (Baker and Philipson 1997), focusing primarily on association properties. Association seeks proximity relationships between recognizable objects (a field, a road, a landscape feature, a building, etc.) to identify a location. It is a fast and efficient process that we find suitable considering the locational fuzziness of the findings. A drawback of such an approach is that there is no accuracy level to be expected; the resulting polygons vary widely in size and shape, reflecting both the dispersion areas and what memory allows one to reconstruct.

The case of sites, however, involves subtleties that should not be ignored. Fossil sites are exposed primarily due to the effects of erosion. Once erosion exposes a fossil and it is discovered, the fossil is collected. However, one cannot exclude the possibility that more fossils from the same or another specimen are buried deeper into the layer and near the collected remains. Future erosion or paleontological works will eventually answer that question, but in the meantime sites must be registered. Ideally, each site should be recorded with a spatial accuracy of under two meters so that the collection site can be monitored in the years to come if necessary.

This accuracy requirement, however, can be challenging to comply with. First, due to the harshness of the landscape (Fig. 1), very high-accuracy survey equipment like total stations or external antenna GPS systems would make the fieldwork slower and more dangerous. Secondly, most of the surveys were conducted at the bottom or halfway through the cliff, meaning degraded Dilution of Precision (DoP) index values and high reflectance errors – both well-known factors for loss of accuracy when using GPS measurements (Langley 1999; Sabatini and Palmerini 2008).

In such a scenario, the survey solution must rely on portable equipment to maximize acquisition speed and safety while supporting real-time differential corrections that minimize the loss of accuracy. The field collection used single-frequency handheld GPS receivers (Fig. 3) with support for two types of differential corrections: (i) a system of Virtual Reference Stations (VRS), a service provided by the Portuguese Army Geographic Institute (footnote 2) allowing sub-meter accuracy, and, as an alternative, (ii) the European Geostationary Navigation Overlay Service (EGNOS), allowing accuracy levels of around two meters. Both solutions eliminate the need to operate a second GPS receiver to serve as a base that corrects the rover measurements. However, the VRS network was the preferred differential method due to the higher accuracy offered.

Fig. 3

Field data collection at Serra do Bouro, Caldas da Rainha, Portugal, using a handheld GPS receiver

The VRS system relies on a series of Global Navigation Satellite System (GNSS) reference stations connected to a control centre. These stations collect information that can be used to correct the positioning of the rover GNSS (i.e. GPS) receiver once its position is known to the data control centre. This bi-directional communication relies on a data link, usually established using a cell phone network such as 3G or 4G (Lou and Chen 2008).

The drawback of this system is that a cellphone with data services must be paired with the GPS receiver via Bluetooth. Furthermore, if the area to survey has no 3G or 4G mobile network coverage, real-time differential corrections are lost. In such scenarios, EGNOS corrections are used.

The data resulting from the surveys and any secondary spatial datasets were projected to the ETRS89 / Portugal TM06 (EPSG:3763) system, Portugal's official spatial reference system. However, no vertical reference system was adopted for Z values; thus, despite elevation values being automatically registered by the GPS receiver, they are essentially ignored.

Lastly, there is also the case of measurements where no correction was made. This was the case when it was impossible to have either cell phone coverage to use the VRS or a lock on the geostationary satellites of EGNOS to get differential corrections. In such a scenario, the positional error is expected to exceed the 2 m requirement mentioned earlier (Table 1).

Table 1 Number of sites registered with GPS and expected accuracy levels

Differences in expected accuracy are essential to understanding the motivations for having different GNSS acquisition methods. Measurements that take advantage of VRS correction signals were always preferred, but in the study area, often, there are blind areas where no 3G or 4G coverage is available – a requirement to use VRS corrections. In such a scenario, the second preferred method was to use the EGNOS geostationary satellite corrections to achieve accuracies within the desirable two-meter threshold. If both differential methods cannot be used, the site must be registered using autonomous GPS.
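The order of preference described above (VRS, then EGNOS, then autonomous GPS) amounts to a simple fallback rule. The sketch below restates it in Python for illustration only; the function and enum names are hypothetical, and the accuracy descriptions are the qualitative levels quoted in the text rather than measured values.

```python
from enum import Enum


class Correction(Enum):
    """Acquisition methods, listed from most to least preferred."""
    VRS = "VRS"
    EGNOS = "EGNOS"
    AUTONOMOUS = "autonomous GPS"


# Expected accuracy as stated in the text (qualitative, not measured here).
EXPECTED_ACCURACY = {
    Correction.VRS: "sub-meter",
    Correction.EGNOS: "around two meters",
    Correction.AUTONOMOUS: "worse than two meters",
}


def choose_correction(has_mobile_data: bool, has_egnos_lock: bool) -> Correction:
    """Pick the best available method for one site measurement."""
    if has_mobile_data:  # VRS needs a 3G or 4G data link
        return Correction.VRS
    if has_egnos_lock:  # EGNOS needs a lock on its geostationary satellites
        return Correction.EGNOS
    return Correction.AUTONOMOUS


def needs_extra_documentation(method: Correction) -> bool:
    """Uncorrected fixes miss the two-meter target, so flag them for follow-up."""
    return method is Correction.AUTONOMOUS
```

For example, `choose_correction(False, True)` returns `Correction.EGNOS`, the case of a survey point at the foot of a cliff with no mobile coverage but a usable EGNOS signal.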

As a complement to the spatial registry, efforts were made to minimize difficulties in locating sites when revisiting them in the future, especially when the registry did not benefit from any differential correction. A photograph with a flag attached to a mast that sits at the location from where the fossil was collected, according to José Joaquim dos Santos's memory and notes, exists for every site related to the JJS Collection. These photos are as panoramic as possible, providing many reference points from the surroundings. For the JJS collection, 361 paleosites – 265 sites and 96 dispersed areas – have been registered using the abovementioned methods.

Secondary data

In parallel with field data, additional spatial datasets provided context and spatial relationships to the paleosites (Table 2).

Table 2 Secondary datasets used in the project

Topographic maps are paramount to understanding the spatial context and identifying any landscape features that might play a role when accessing the site. The geological maps provide information about the lithostratigraphic and chronostratigraphic units and structural features (e.g. faults) associated with a site. Orthophotos complement topographic maps because they are used for spatial context and visual interpretation. Finally, administrative boundaries allow for the retrieval of territorial units where the sites are located.

In the case of topographic maps, only the individual sheet number is used to refer to the paper-printed map, and in the case of the geological charts, they were scanned, georeferenced, and geometries were distilled via manual vectorization. In both cases, this workflow results from closed data policies that many Portuguese public institutions, unfortunately, still maintain (Publications Office of the European Union 2020).

A few datasets referenced in Table 2 are encoded in a reference system different from the one adopted for the project (ETRS89 / Portugal TM06). In such cases, datum transformations are required to bring those datasets in line with the project's horizontal spatial referencing. Datum transformations used the National Transformation version 2 (NTv2) grids developed by José Gonçalves (footnote 3) to minimize datum transformation errors introduced by standard 3 and 7 parameter transformation methods.

Management Strategy

Managing a paleontological collection comprises several stages (Frick and Greeff 2021) whose order and nature may vary depending on local circumstances. The methodological lineage of the JJS Collection is not fully known, especially in the earlier stages when the specimens were discovered and collected. Thus, we adopted the following management stages:

A) Spatial registry stage – the location where the fossils were found is registered, and all the spatial context associated with that location is also inferred (administrative unit, geology, etc.).

B) Specimen registry stage – the fossils are prepared in a laboratory and given a specimen identifier.

C) Specimen lineage stage – consists of keeping track of any preparation a specimen has undergone and publication works associated with that specimen. That includes keeping a record of who did those works.

D) Inventory stage – specimen preservation and storage policies, including specimen loans to partner institutions, are defined in this stage.

E) Landscape stage – a complementary phase that aims at planning and territory management; it includes the registration of geological landmarks, reporting incidents related to paleontological heritage, and information about susceptible areas.

For stage A (spatial registry stage), the spatial datasets described thus far allow spatial management of the collection, that is, of both the locations of the paleosites and the contextual information and spatial relationships that might be relevant. The spatial context involves knowing what other spatial objects intersect the sites and dispersion areas. From this intersection, relevant attributes should be automatically associated, a simple requirement that excludes any manual editing and thus ensures semantic consistency, an essential component of data quality.
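The idea of deriving context from intersections can be illustrated with a small, dependency-free Python sketch. It is not the project's implementation, which relies on a spatial database; the function names, the layer structure and the ray-casting test used here are assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Optional, Tuple

Coordinate = Tuple[float, float]
Ring = List[Coordinate]


def point_in_polygon(point: Coordinate, ring: Ring) -> bool:
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def infer_context(
    point: Coordinate, layers: Dict[str, List[Tuple[str, Ring]]]
) -> Dict[str, Optional[str]]:
    """For each context layer, return the value of the first polygon containing the point."""
    context: Dict[str, Optional[str]] = {}
    for attribute, features in layers.items():
        context[attribute] = next(
            (value for value, ring in features if point_in_polygon(point, ring)),
            None,
        )
    return context
```

Given a layer dictionary such as `{"municipality": [...], "geology": [...]}`, a call to `infer_context` fills every contextual attribute of a site from its coordinates alone, so nothing has to be typed by hand.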

Upon conclusion of the registration of Stage A, we proceed with Stages B (specimen registry stage), C (specimen lineage stage), and D (inventory stage), which correspond to the non-spatial part of the database. Stage E (landscape stage), the last one, is not yet implemented, although the database design assumes it will be developed soon. These stages do not necessarily follow a predetermined sequence as the alphabetical enumeration may suggest. The presented order is the typical sequence of management stages we follow, but concurrent stages over the same specimen are possible.

These stages of work are a generic framework in which the specific user and organizational needs and requirements are identified. Identifying these requirements is an essential step in the process of designing a database, and it should precede any implementation effort, ideally including input from the future users of the system or other relevant actors (Yeung 2007).

Spatial database

Developing a relational spatial database is the core of the GIS approach adopted by SHN. It is an ongoing effort based on PostgreSQL (footnote 4) with the PostGIS (footnote 5) spatial extension, which enables the handling of spatial objects. The blueprint of the database structure takes the form of a Unified Modeling Language (UML) class diagram capturing the requirements of the institution and use cases (Fig. 4; Appendix 1).

Fig. 4

Class diagram of the spatial database (see Appendix 1). Each class has one additional letter label to highlight the management stage the class is primarily used for. This representation option also has the advantage of making it easier to discuss functionality with the users

The database structure is a direct consequence of the management stages described previously. All the tables associated with the spatial registry stage, denoted with the A-label, refer to primary field data collection or spatial tables providing additional spatial context—this is the first body of data to be collected. Once the spatial registry is completed, the specimen registry stage can start – B label. This stage consists of non-spatial data about the specimen and any associated preparation and publishing.

Some of the tables relate to more than one stage. In the case of the specimen lineage stage – C label, the history of fossil preparation (including the description of conservation-restoration procedures) is kept in a separate table. However, the lineage of this information can also come in the form of publications that might also be relevant for the specimen registry stage. Once a specimen and its lineage are described, another set of tables keeps records related to inventory. That corresponds to the inventory stage – D label – which is the most downstream stage of the collection management and can only exist once a formal cycle of acquisition and study occurs. The last tables are those related to the landscape stage – E label. That is where the information related to the paleontological landscape concept is stored, including risk assessment and geomorphological features.

A Model-Driven Architecture (Siegel 2010) approach was adopted to translate the class diagram into a Platform Specific Model implementation in the form of SQL statements valid for PostgreSQL database server (a link to access the implementation script is available under the Supplementary Materials section).

A great effort is made to ensure data consistency, particularly in semantic accuracy, a fundamental condition for having queries return complete and consistent results. That is achieved via two mechanisms: the first is to set a predefined set of allowed values for a given column using foreign keys that point to tables with a fixed structure; the other is to use spatial triggers to automatically fill in any fields whose data is obtained from a spatial predicate or a spatial function (Fig. 5). The key idea is to reduce as much as possible the amount of data to be specified by the users of the database, a principle often mentioned among data quality experts.

Fig. 5

Data consistency mechanisms for the case of the paleosite table. On the left, the class view: only attributes in bold must be declared manually. The remaining attributes are determined by operations in the form of database triggers to ensure consistent objects (on the right)
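Both consistency mechanisms can be tried out with a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL/PostGIS, and a bounding-box comparison as a stand-in for a true spatial predicate; all table, column and trigger names are hypothetical and do not come from the project's implementation script.

```python
import sqlite3
from typing import Optional

CONSISTENCY_SCHEMA = """
-- Mechanism 1: allowed values live in a lookup table referenced by a foreign key.
CREATE TABLE paleosite_type (code TEXT PRIMARY KEY);
INSERT INTO paleosite_type VALUES ('jazida'), ('disperso');

-- Context layer; bounding boxes stand in for real polygons (assumption).
CREATE TABLE admin_area (
    name TEXT PRIMARY KEY,
    xmin REAL, ymin REAL, xmax REAL, ymax REAL
);

CREATE TABLE paleosite (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL REFERENCES paleosite_type (code),
    x REAL NOT NULL,
    y REAL NOT NULL,
    admin_area_name TEXT  -- never typed by the user, filled by the trigger
);

-- Mechanism 2: a trigger derives the contextual attribute from the coordinates.
CREATE TRIGGER paleosite_fill_admin_area
AFTER INSERT ON paleosite
BEGIN
    UPDATE paleosite
    SET admin_area_name = (
        SELECT a.name FROM admin_area AS a
        WHERE NEW.x BETWEEN a.xmin AND a.xmax
          AND NEW.y BETWEEN a.ymin AND a.ymax
        LIMIT 1
    )
    WHERE id = NEW.id;
END;
"""


def open_consistency_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    conn.executescript(CONSISTENCY_SCHEMA)
    return conn


def register_paleosite(
    conn: sqlite3.Connection, site_type: str, x: float, y: float
) -> Optional[str]:
    """Insert a paleosite and return the administrative area the trigger assigned."""
    cursor = conn.execute(
        "INSERT INTO paleosite (type, x, y) VALUES (?, ?, ?)", (site_type, x, y)
    )
    row = conn.execute(
        "SELECT admin_area_name FROM paleosite WHERE id = ?", (cursor.lastrowid,)
    ).fetchone()
    return row[0]
```

In this sketch the user supplies only the type and the coordinates. An unknown type is rejected with `sqlite3.IntegrityError`, and the administrative area is filled in without any manual input.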

To further enforce data consistency, very few users have DELETE and UPDATE rights on any given table. Instead, most roles only have the SELECT right, and anytime a user with editing privileges changes a table, the record(s) that were updated or deleted are stored in the corresponding history table. This mechanism opens the possibility of recovering a previous state at any given point of the data lineage.
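A history table of this kind can be sketched with two triggers. As before, the example uses Python's `sqlite3` module as a stand-in for PostgreSQL, and the `specimen` and `specimen_history` tables and their columns are hypothetical names chosen for illustration; role-based privileges are not modelled here.

```python
import sqlite3
from typing import List, Tuple

HISTORY_SCHEMA = """
CREATE TABLE specimen (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);

-- One row per overwritten or removed version of a specimen record.
CREATE TABLE specimen_history (
    id INTEGER NOT NULL,
    label TEXT NOT NULL,
    operation TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER specimen_keep_updated
AFTER UPDATE ON specimen
BEGIN
    INSERT INTO specimen_history (id, label, operation)
    VALUES (OLD.id, OLD.label, 'UPDATE');
END;

CREATE TRIGGER specimen_keep_deleted
AFTER DELETE ON specimen
BEGIN
    INSERT INTO specimen_history (id, label, operation)
    VALUES (OLD.id, OLD.label, 'DELETE');
END;
"""


def open_history_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(HISTORY_SCHEMA)
    return conn


def history_of(conn: sqlite3.Connection, specimen_id: int) -> List[Tuple[str, str]]:
    """Return the previous states of a specimen, oldest first."""
    rows = conn.execute(
        "SELECT label, operation FROM specimen_history WHERE id = ? ORDER BY rowid",
        (specimen_id,),
    )
    return rows.fetchall()
```

Because the triggers copy the old row before it disappears, any earlier state of a record can be read back from the history table.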

Along with data consistency, the database must also provide table structures the users expect. Several database views construct tables from join conditions (Fig. 6), thus sparing the users from understanding the data model and other intricacies behind the database.

Fig. 6

Example of a database view showing a predetermined set of attributes. The SQL code that produces database views is included in the implementation script (see the Supplementary Materials note)
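How a view hides the join logic from users can be shown with a short sketch. It again relies on Python's `sqlite3` module as a stand-in for PostgreSQL, with hypothetical table, column and view names that are not taken from the implementation script.

```python
import sqlite3
from typing import List, Tuple

VIEW_SCHEMA = """
CREATE TABLE paleosite (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,
    municipality TEXT
);

CREATE TABLE specimen (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    paleosite_id INTEGER NOT NULL REFERENCES paleosite (id)
);

-- The view exposes a flat table; users never write the join themselves.
CREATE VIEW specimen_overview AS
SELECT s.label AS specimen, p.code AS paleosite, p.municipality
FROM specimen AS s
JOIN paleosite AS p ON p.id = s.paleosite_id;
"""


def open_view_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(VIEW_SCHEMA)
    return conn


def specimen_overview(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Read the view exactly as a user would read an ordinary table."""
    return conn.execute(
        "SELECT specimen, paleosite, municipality FROM specimen_overview ORDER BY specimen"
    ).fetchall()
```

A researcher querying `specimen_overview` sees one row per specimen with its paleosite and municipality, without needing to know how the underlying tables are related.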

Web Services and Web GIS

The usability component should not be underestimated, and along with the views, user interfaces are necessary. The outreach and usability of a database depend on the interfaces and interoperability mechanisms built over it. Interoperability is ensured by using Open Geospatial Consortium (OGC) standard web service specifications. Two types of services are provided using QGIS Server: Web Map Services (WMS) (footnote 6) and Web Feature Services (WFS) (footnote 7). The first service distributes the data as a static representation and serves the cases where SHN wants to share data more restrictedly. On the other hand, the second allows vector and tabular data to be encoded in Geography Markup Language (GML) format. This second service is suitable for sharing data with research partners without giving them direct access to the database.
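From a partner's point of view, consuming the WFS comes down to composing a standard GetFeature request. The sketch below builds such a request URL with Python's standard library; the endpoint `https://example.org/qgisserver` and the layer name `paleosite` are placeholders, not the actual SHN service, and the parameters shown are the generic key-value pairs of WFS 1.1.0.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def wfs_getfeature_url(
    base_url: str,
    type_name: str,
    bbox: Optional[Sequence[float]] = None,
    srs_name: str = "EPSG:3763",
    max_features: Optional[int] = None,
) -> str:
    """Compose a WFS 1.1.0 GetFeature request as a key-value-pair URL."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "SRSNAME": srs_name,
    }
    if bbox is not None:
        # Bounding box as minx,miny,maxx,maxy in the requested reference system.
        params["BBOX"] = ",".join(str(value) for value in bbox)
    if max_features is not None:
        params["MAXFEATURES"] = str(max_features)
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

The resulting URL can be opened with any HTTP client or added directly as a WFS layer in desktop GIS software. The server answers with GML by default, so the partner never touches the database itself.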

Regarding user interfaces, we maintain a Web GIS based on Lizmap (footnote 8), which integrates well with QGIS and PostgreSQL. Lizmap also has the advantage of requiring little, if any, code development while still offering essential functionalities for a platform aimed mainly at researchers: advanced searches, photo visualization, measurement tools, and user authentication (Fig. 7).

Fig. 7

The SIGAP web application

Results

The SIGAP project is an ongoing effort whose outcomes are proving valuable for SHN. The immediate result is an initial inventory of fossils, which was necessary before transferring the SHN collection to new storage facilities (Fig. 8) without splitting sets of fossils. A set of elements can be treated as a single specimen if its elements were collected at the same location. To minimize possible data loss, we avoided separating the elements that make up those specimens during the transfer. Establishing sets of fossils, i.e., specimens, based on their spatial location in a collection with a poorly documented lineage was vital for identifying fossil assemblages and individuals, which in turn led to some already published studies (e.g. Mocho et al. 2019a; Malafaia et al. 2020).
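The idea of using location as the first grouping criterion can be sketched as follows. The record layout and the identifiers (`E1`, `PS-001`, and so on) are invented for illustration, and the result is only a list of candidate sets: whether a set really is one specimen still has to be checked against the features of the fossils.

```python
from collections import defaultdict


def group_by_paleosite(elements):
    """Group fossil elements into candidate sets by collection location."""
    groups = defaultdict(list)
    for element in elements:
        groups[element["paleosite"]].append(element["element_id"])
    return dict(groups)


# Hypothetical records; identifiers are placeholders.
elements = [
    {"element_id": "E1", "paleosite": "PS-001"},
    {"element_id": "E2", "paleosite": "PS-001"},
    {"element_id": "E3", "paleosite": "PS-002"},
]

candidates = group_by_paleosite(elements)
```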

Fig. 8

The new storage facilities for the paleontological collections

Along with the listing, SHN also has profile information about the collections. As an example, the collection can be characterized based on its distribution by genus (Table 3), geological background (Table 4), apparent anatomical parts (Table 5) and administrative context (Table 6).

Table 3 Taxon distribution by paleosite (subset)
Table 4 Geological distribution by paleosite (subset)
Table 5 (Apparent) anatomical part count (subset)
Table 6 Administrative distribution by paleosite (subset)
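Profile tables of this kind are simple tallies over the inventory. A minimal sketch, assuming each record is a (paleosite, genus) pair; the identifiers and genus labels below are placeholders, not data from the collection.

```python
from collections import Counter, defaultdict


def distribution_by_paleosite(records):
    """Count how many records of each genus were found at each paleosite."""
    table = defaultdict(Counter)
    for paleosite, genus in records:
        table[paleosite][genus] += 1
    # Convert to plain dicts so the result is easy to print or export.
    return {site: dict(counts) for site, counts in table.items()}


# Hypothetical records; labels are placeholders.
records = [
    ("PS-001", "Genus A"),
    ("PS-001", "Genus A"),
    ("PS-001", "Genus B"),
    ("PS-002", "Genus B"),
]

distribution = distribution_by_paleosite(records)
```

The same pattern applies to the geological, anatomical, and administrative profiles by swapping the second field of each record.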

These overviews do not constitute new scientific knowledge by themselves. However, they provide an essential preliminary characterization of the collection by delimiting groups based on their attributes, which helps establish priorities for (i) future lines of research, (ii) procedures for Conservation-Restoration and (iii) management policies for the collection. They can also take the form of cartographic products (Fig. 9) that can be used to plan field campaigns, apply for funding, and enhance internal and external communication.

Fig. 9

Example of a cartographic product: map of paleosites around Praia Azul (Torres Vedras municipality, Portugal)

From a spatial planning perspective, the SIGAP project facilitated the inclusion of paleontological heritage as a possible restriction on some human activities within the municipality of Torres Vedras. The inclusion of paleontological elements in the new Plano Director Municipal (i.e. Master plan; footnote 9) was based on the location data collected by SIGAP. Along with the case of the Municipality of Lourinhã (footnote 10), this inclusion sets an important precedent for the protection of paleontological heritage in Portugal by including a category designated Sitios de especial relevância paleontológica (i.e. Sites of particular paleontological relevance) as one of the categories of Património Natural (i.e. Natural heritage) of the Planta de Ordenamento (i.e. Ordinance plan; footnote 11). In practice, this means that future impacts on sites of paleontological interest will have to be mitigated or avoided whenever human activities or developments involve soil movements.

Last but not least, storing information consistently and being able to share it (via web services, Web GIS, or simply raw data) allows SHN to build partnerships more efficiently, thanks to the quality and reliability of the input.

Discussion

The spatially led approach to managing paleontological collections that we propose has proven to be a valid strategy for cases where the traceability of the collection and the storage conditions are not ideal or not entirely known, as is the case with the collection assembled by José Joaquim dos Santos. Our experience supports the call for more flexible collection management approaches, as Schindel and Cook (2018) suggested. Structuring the collection of José Joaquim dos Santos around spatial information not only provides a first form of inventory but can also, together with the fossils' features, help confirm what constitutes a specimen. Organizing the collection around the specimen is a defining characteristic of most Natural History collection management approaches, and our proposal seeks to facilitate this standard approach. Our approach therefore complements the existing one and should not be seen as a replacement.

From a technical perspective, open-source software saves scarce financial resources that can be invested in studying the collections; it is also a more sustainable and inclusive choice. By sustainable, we mean the definition provided by the Software Sustainability Institute: “[…]the software you use today will be available – and continue to be improved and supported – in the future” (footnote 12). In the case of QGIS and PostgreSQL, there is no reason to fear a loss of sustainability: the number of active contributors, the number of commits, and institutional adoption all confirm the solidity of these projects. Inclusiveness refers to the few legal barriers to the projects' use and distribution, which lowers the entry requirements for using the tools and the data.

Beyond implementation approaches, the principles guiding the development of a GIS are no less important to the project's success. The project is deeply rooted in the acquired know-how of SHN and supported the inventory of the paleontological collection of José Joaquim dos Santos, organizing thousands of fossils into specimens based on spatial location. The continued development of the SIGAP project at SHN therefore rests on this human capital of accumulated experience, which is the basis for defining the project's needs, objectives, and approaches.

Assessing the spatial-based approach is an exercise for the coming years, as there are few examples of GIS being applied to manage paleontological collections (Garcia Ortiz de Landaluce 2015). Nevertheless, mature examples of GIS supporting specimen collection in biology (Funk et al. 1999) or applied to broader spatial analysis in archaeological contexts (Chapman 2006) are promising indications of the value of GIS in managing collections where location and space are part of the equation.

Conclusion

A total of 371 paleosites had been recorded as of November 2020. They are mainly associated with the JJS Collection managed by the SHN, and an undetermined number of specimens, representing an equally undetermined number of taxa, were collected from them. The system developed in the scope of the SIGAP project has successfully kept track of a valuable collection by providing anchoring points (location) for the initial inventory and organization of the collections. The information managed by the database is expected to grow as stages B, C and D (specimen registry, specimen lineage, and inventory) take place. The most significant development for the near future is the construction of paleontological heritage risk models that will establish management strategies to tackle the challenges of the paleontological landscape concept we put forward. Previous experiences in developing erosion models in the scope of paleontological heritage and the mitigation of spoliation risks (Blasi et al. 2021) confirm the relevance of such an effort.