1 Background and rationale

Mozambique lies on the southeast coast of Africa, between latitudes 10°27′S and 26°52′S and longitudes 30°12′E and 40°51′E, bordered by Tanzania to the north, the Indian Ocean to the east, Zambia to the northwest, Malawi, Zimbabwe, and Eswatini to the west, and South Africa to the west and south. The country has a total area of 801,590 km² (Instituto Nacional de Estatística 2020), about half of which is covered by forests (The World Bank 2021; Cianciullo et al. 2023), with other woody vegetation scattered across the country. The climate is tropical and dry over most of the country, moderated by the influence of mountainous topography in the north-west and with sub-tropical conditions in the south (Mcsweeney et al. 2006). Geographically, Mozambique can be roughly divided into two main regions: a northern region, which consists of a large plateau, and a southern region, characterised by lowlands (Ministry for the Coordination of Environmental Affairs 2014). Similarly, the country's geology can be divided into two regions, with a dominance of sedimentary rocks in the south (Rutten et al. 2008) and ancient granite rocks in the north and west-central regions (Boyd et al. 2010). Such geological and climatic factors create high biogeographical complexity. According to Burgess et al. (2004), thirteen ecoregions are present in Mozambique (Fig. 1), resulting in high biodiversity.

Fig. 1

Mozambique ecoregions (Burgess et al. 2004; Olson 2020; OCHA 2022)

Floristically, 7099 taxa (5957 species, 605 subspecies, 537 varieties), 87% of which are native to Mozambique (Odorico et al. 2022), are known in the country. Knowledge of floristic diversity has grown rapidly in recent decades, owing to the recent increase in botanical exploration in the country (Cheek et al. 2018) and to the free access to international botanical data offered by online botanical databases. Mozambique’s faunal diversity, on the other hand, remains broadly unknown despite its richness. This lack of biodiversity knowledge can be related to the country’s political instability over the past decades, which hindered studies on biodiversity and prevented its documentation (Neves et al. 2018). Indeed, large areas of Mozambique remain unexplored (Branch et al. 2005), and more in-depth, systematic, and homogeneous investigations of the country’s biodiversity are required, especially in the north, where further surveys are recommended to properly identify conservation hotspots (Schneider et al. 2005). Currently, according to Neves et al. (2018), 217 terrestrial mammal species occur in Mozambique. As regards the ornithofauna, BirdLife International (2022) records 678 bird species. More than 3000 insect species, including 433 species of butterflies (Papilionoidea), are estimated in the country (Sandramo et al. 2021; Sáfián et al. 2022). As for the herpetofauna, Mozambique's was long among the most poorly known in Africa (Branch et al. 2005) and, although the situation is improving, a significant lack of knowledge about the occurrence and distribution of taxa in the country persists (Neves et al. 2019). Despite recent annotated checklists of the Mozambican herpetofauna (Portik et al. 2013), the country's herpetological knowledge remains limited to certain areas and is mostly incorporated in broader studies of the southern and eastern African herpetofauna; more than 100 amphibians and 294 reptiles are currently known in the country (Ceríaco et al. 2021). In light of these data, Mozambique's biodiversity remains underestimated and needs to be properly documented.

Considering the crucial role of biodiversity in supporting the sustainable development of Mozambique (MITADER 2015), a proper system to collect, organise and make available reliable biodiversity data is needed. Biodiversity data are widely unavailable to institutions, researchers and decision-makers because they are scattered across several repositories that differ in platform, structure and data semantics (Edwards et al. 2000; Schuurman and Leszczynski 2008; Heidorn 2011; Martellos and Attorre 2012). Such a system is therefore a vital pillar for effectively preventing species extinction and for implementing biodiversity conservation strategies (Niza et al. 2021). This particularly concerns African countries, which face several challenges, such as data collection and database access and management, that hinder the flow of biodiversity data (Stephenson et al. 2017).

Especially since the Earth Summit of Rio in 1992, electronic access to biodiversity information has become a priority task worldwide, leading to great progress in the field of Biodiversity Informatics that applies techniques from Information Technologies to improve the management, presentation, discovery, exploration, integration, and analysis of biodiversity data (Martellos and Attorre 2012; Costello et al. 2013; Gadelha et al. 2021; Heberling et al. 2021). This, amongst other relevant results, led to the creation of global federated networks for the aggregation of biodiversity data, such as the Global Biodiversity Information Facility (GBIF) (http://www.gbif.org/), which aims to make existing data on planetary biodiversity freely and universally available covering both observational and specimen data, and the Biological Collection Access Service (BioCASE) (http://www.biocase.org), which is focussed on specimen data from natural history collections (Holetschek et al. 2012).

Within this framework, one of the strategic needs expressed by the Mozambican institutions was the development of a national biodiversity data repository to aggregate, manage and make data available online. Such a tool is seen as a priority action in understanding the economic, social, and ecological value of biodiversity and elaborating effective conservation strategies (MITADER 2019). Furthermore, the establishment of national repositories of biodiversity data would overcome the limitations of a data collection too often carried out in the framework of international projects, at the end of which these data disappear (Pacifici et al. 2018).

Hence, it was planned to transfer to Mozambique previous experiences, such as the Italian National Biodiversity Network (Martellos et al. 2011) and the Biodiversity National Network of Albania (BioNNA) (Pacifici et al. 2018), aiming at the development of a sustainable infrastructure for the aggregation, organisation and sharing of primary biodiversity data (i.e. data obtained from floristic and faunistic observations and from specimens of biological collections): the Biodiversity Network of Mozambique (BioNoMo). The BioNoMo initiative involves many data providers, such as academies, institutions, research centres and related projects in Mozambique, and aggregates data collected from different sources. By converting such data to a standard format, BioNoMo aims to make them available to institutional decision-makers and planners, researchers, eco-tourists, and the general public, acting as a repository of primary biodiversity data generated by past research initiatives and the digitization of existing biological collections, as well as by future projects and citizen science initiatives.

In this paper, we present and describe the network of data providers, the architecture of BioNoMo, the data it currently aggregates, its WebGIS platform, and its impact and future perspectives on biodiversity conservation strategies in Mozambique.

2 Approach and methodology

2.1 The Network: training activities, data management and sharing

BioNoMo was established in 2016 as a joint initiative of Eduardo Mondlane University of Maputo (Department of Biological Sciences and Centre of Biotechnology) and Sapienza—University of Rome (Department of Biological Sciences) in the framework of SECOSUD II Project—Conservation and equitable use of biological diversity in the SADC region (http://www.secosud2project.com/), funded by the Italian Agency for Development Cooperation and jointly implemented by the two aforementioned universities. The engine of the initiative is the network of data providers: a cooperation between different institutions which generate, organise, and share primary biodiversity data, feeding the data repository and steering the initiative. The network currently includes:

  • The Department of Biological Sciences of the Eduardo Mondlane University (UEM DCB)

  • The Department of Environmental Biology of Sapienza University (DEB SUR)

  • The Ministry of Land and Environment of the Republic of Mozambique (MTA)

  • The Natural History Museum of Maputo (MHN)

  • The Herbarium of Eduardo Mondlane University (LMU)

  • The National Herbarium of Mozambique, hosted by the National Institute for Agricultural Research (LMA)

  • The Mozambican Institute for Fishery Research (IIP)

  • The Plant Health Department, Ministry of Agriculture and Rural Development of the Republic of Mozambique (MADER)

To transfer the knowledge and capacities required for the digitization and management of primary biodiversity data, training courses have been provided to selected technicians from all partner institutions: the trained personnel were then appointed as digital collection managers and dedicated teams have been created in each of the aforementioned institutions. Where needed, these teams have been complemented with research assistants trained and hired by the SECOSUD II project. The training courses covered basic concepts of database management systems, digital collections, international standards for mapping biological data, global biodiversity data sharing initiatives, and data munging and cleansing techniques. Digital collections management teams in partner institutions have been provided with the equipment required for data processing and storage (laptops and/or desktop computers, monitors, internet access, etc.) depending on the specific needs of each institution. The SECOSUD II project staff offers technical support for the setup of the data digitisation, management, and quality control process, as well as continued assistance during the preparation and final quality control of the data sets.

Each data provider in the network has full and direct control over the digital collections created from its own data, and decides whether to share them publicly and to what extent (the whole collection or part of it). Each provider is also involved in decision-making on participation in and contribution to parallel initiatives, fundraising, partnerships, inclusion of new data providers, etc.

2.2 Under the hood: nodes, data repository, portal and webGIS

Data providers in the Network are responsible for feeding and maintaining the server nodes in which primary biodiversity data are stored. Collection managers and data digitization teams are in charge of this task.

The preparation of a data set for sharing and publication involves the following steps:

  1. Digitization of the information (in case of biological collections or field notebook data).

  2. Georeferencing (if needed) through geocoding of location information and comparison with topographic maps (including historical ones).

  3. Control and update of the taxonomic information: scientific names and classification trees are updated according to a global taxonomic reference as follows:

    a. Tropicos and International Plant Name Index (IPNI) for vascular plants.

    b. FishBase and World Register of Marine Species (WoRMS) for aquatic vertebrates and crustaceans.

    c. National Center for Biotechnology Information (NCBI) and Catalogue of Life (COL) for all other groups.

  4. Quality control of the data (consistency and completeness of the information, absence of typing errors, etc.).

  5. Organisation of the data in a relational database, depending on the specific needs of each data provider.

  6. Mapping of the database fields according to international standards (see below).

  7. Upload of the mapped data to the central repository.
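As an illustration of the georeferencing step, the following minimal Python sketch converts coordinates written in degrees, minutes and seconds into signed decimal degrees. It assumes that some source labels or notebooks record coordinates in this notation; the function name `dms_to_decimal` and the accepted string format are hypothetical and do not describe the tools actually used by the digitization teams.

```python
import re

# Hypothetical parser for strings such as "10°27′S" or "40°51'30\"E".
_DMS = re.compile(
    r"""^\s*(?P<deg>\d{1,3})\s*°\s*
        (?:(?P<min>\d{1,2})\s*[′']\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*[″"]\s*)?
        (?P<hem>[NSEW])\s*$""",
    re.VERBOSE,
)


def dms_to_decimal(text: str) -> float:
    """Convert a degrees-minutes-seconds string to signed decimal degrees."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match["deg"])
    minutes = int(match["min"] or 0)
    seconds = float(match["sec"] or 0.0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal degrees.
    return -value if match["hem"] in "SW" else value


# The northern latitude limit of Mozambique quoted in the introduction.
print(round(dms_to_decimal("10°27′S"), 4))
```

Records converted in this way would still need to be checked against topographic maps, as described in step 2.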

Each data provider is responsible for creating, maintaining, and updating the databases related to its own resources (biological collection and other information), and has direct and full control over them through the server node located in its own head offices. The internal structure of the databases, as well as the software tools and the input routines used to populate them, may vary in the different nodes, according to the specific needs of the data provider. As a matter of fact, at the time of the setup of each node, the internal structure, and the software tools to be used have been tailored to each provider’s type of data and requirements. This ensures that the data digitisation benefits the provider in the first place, allowing for the digital collections’ data to be easily accessible, well organised, and reliable, which in turn guarantees long-term sustainability to the initiative, since maintaining high quality databases is in the best interest of each data provider. To standardise the data across all digital collections, international standards for biological data mapping are used: each collection is mapped using Access to Biological Collection Databases (ABCD) and Darwin Core (DwC) schemas, ensuring full interoperability with all the major sharing initiatives at the global level.
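The mapping of provider-specific fields to a shared standard can be sketched as follows. The provider column names in `FIELD_MAP` are invented for illustration, while the target names are Darwin Core terms; tagging coordinates with `geodeticDatum = "WGS84"` is an assumption of this sketch, not a documented BioNoMo rule.

```python
# Hypothetical provider column names (left) mapped to Darwin Core terms (right).
FIELD_MAP = {
    "specimen_no": "catalogNumber",
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collected_on": "eventDate",
    "collector": "recordedBy",
    "place": "locality",
}


def to_darwin_core(row: dict) -> dict:
    """Rename known fields to Darwin Core terms, dropping empty and unmapped ones."""
    record = {
        FIELD_MAP[key]: value
        for key, value in row.items()
        if key in FIELD_MAP and value not in ("", None)
    }
    # Assumption of this sketch: coordinates are stored as WGS84 decimal degrees.
    if "decimalLatitude" in record and "decimalLongitude" in record:
        record["geodeticDatum"] = "WGS84"
    return record


# Invented example row, not taken from any BioNoMo collection.
row = {"specimen_no": "A1", "species": "Afzelia quanzensis",
       "lat": -25.9, "lon": 32.6, "collector": "", "notes": "internal remark"}
print(to_darwin_core(row))
```

Keeping the mapping in a plain dictionary lets each node retain its own internal schema while exposing the same terms to the central repository.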

Mapped data are then uploaded to a cloud repository powered by a server running PostgreSQL 11 (The PostgreSQL Global Development Group 2021) and PostGIS 2.4. This database management system (DBMS) was chosen for its open source philosophy, recognised reliability, high security standards and wide compatibility with different operating systems and web applications, as well as for its ease of use and its ability to store and process geographic data.

The BioNoMo cloud repository aggregates primary biodiversity data and digital maps. Biodiversity data are constituted of observations of plants or animals with identification, geographic coordinates and date, complemented by several pieces of ancillary information that may vary depending on the type of observation. Primary biodiversity data are divided (by owner, taxonomic groups of reference and/or survey expedition) into “digital collections”. The following digital collections are currently hosted in the BioNoMo repository:

  • Mozambique National Forest Inventory 2006 (Marzoli 2007)—81,499 records

  • General Collection of the Natural History Museum of Maputo—11,769 records

  • Entomological Collection of the Natural History Museum of Maputo—7967 records

  • Ornithological Collection of the Natural History Museum of Maputo—3788 records

  • Mozambique Endemic and Near-Endemic Red Listed plant species (LMA)—2534 records

  • Herbarium of Eduardo Mondlane University (LMU)—1,059 records

  • Entomological Collection of the Plant Health Department (MADER)—3897 records

  • Occurrence and diversity of aquatic vertebrate species in Inhambane province—95,116 records

  • Occurrence and diversity of aquatic vertebrate species in Lake Niassa—28,435 records

  • Occurrence and diversity of aquatic vertebrate species in Zambezia province—37,108 records

The information hosted on the repository contains primary biodiversity data and digital maps accessible to the general public via a web portal and a webGIS. The BioNoMo portal (https://openscidata.org/bionomo) is a web application based on node.js (https://nodejs.org/en/), HTML5 and JavaScript. It allows users to retrieve, filter and download data from the digital collections hosted in the repository. The main page contains a “Basic search” field (Fig. 2) in which users can search observations by scientific name of the organism.

Fig. 2

Header of the main page and “Basic search” field

The “Advanced search” (Fig. 3) allows for more complex filtering based on any combination of date range, province of observation and taxonomy. When users submit a query, it is sent to all the digital collections in the repository. ABCD and DwC concepts are used to retrieve the relevant information across all the collections, and the results are displayed as a table of the fundamental information, separated by digital collection. For each occurrence, the catalogue number, identification, location, date, and elevation (when available) are reported (Fig. 4). It is possible to sort and filter the results using the tools in the header of the table, and to download the full dataset in CSV format for further processing in statistical analysis tools and spatial rendering in GIS applications.
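The filtering logic of such a search can be sketched in a few lines of Python. This is not the portal's code, which is a Node.js application; the function names, the use of Darwin Core keys in the records, and the three sample records are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Invented sample records using Darwin Core term names as keys.
SAMPLE = [
    {"catalogNumber": "X-001", "scientificName": "Brachystegia spiciformis",
     "family": "Fabaceae", "stateProvince": "Sofala", "eventDate": date(2006, 5, 1)},
    {"catalogNumber": "X-002", "scientificName": "Afzelia quanzensis",
     "family": "Fabaceae", "stateProvince": "Inhambane", "eventDate": date(2010, 3, 15)},
    {"catalogNumber": "X-003", "scientificName": "Oreochromis mossambicus",
     "family": "Cichlidae", "stateProvince": "Inhambane", "eventDate": date(2015, 8, 9)},
]


def advanced_search(records, *, start=None, end=None, province=None, family=None):
    """Return records matching every filter that is not None."""
    hits = []
    for rec in records:
        if start is not None and rec["eventDate"] < start:
            continue
        if end is not None and rec["eventDate"] > end:
            continue
        if province is not None and rec["stateProvince"] != province:
            continue
        if family is not None and rec["family"] != family:
            continue
        hits.append(rec)
    return hits


def to_csv(records, fields):
    """Serialise the selected fields of each record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


hits = advanced_search(SAMPLE, start=date(2009, 1, 1), province="Inhambane")
print(to_csv(hits, ["catalogNumber", "scientificName", "eventDate"]))
```

Filters left as `None` are ignored, which mirrors the "any combination" behaviour described above.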

Fig. 3

“Advanced search” interface

Fig. 4

Visualisation of search results

The webGIS (https://maps.openscidata.org/index.php/view/map/?repository=bionomo&project=Bionomo) allows for direct spatial rendering of the observation data from the digital collections in a web interface based on QGIS Server 3.16 (QGIS.org 2020) and LizMap 3.5 (Douchin and D’Hont 2021). Each digital collection is available as a layer and can be filtered and queried to retrieve the information related to single observations. Several additional layers related to administrative boundaries, topography, protected areas, biodiversity hotspots and ecosystem classification, as well as base maps from Google, are also available (Fig. 5). Since 2021, the number and quality of additional layers have further increased, as the BioNoMo webGIS has become the official repository of geographic data for the Clearing House Mechanism (CHM) of the Convention on Biological Diversity (CBD) (see “Impact and future perspectives”). The geographic data provided by the BioNoMo webGIS are also accessible via the Web Map Service (WMS) and Web Map Tile Service (WMTS) protocols, so that they can be viewed directly in a desktop GIS. In addition, users can generate permalinks for direct access to configurable map layouts.
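For readers unfamiliar with WMS, the sketch below assembles a standard WMS 1.3.0 `GetMap` request with the Python standard library. The endpoint `https://example.org/wms` and the layer name are placeholders, not the actual BioNoMo service address or layer identifiers, which should be taken from the service's `GetCapabilities` response.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap URL for a single layer in EPSG:4326."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:4326",
        # In WMS 1.3.0 with EPSG:4326 the axis order is latitude, longitude:
        # min_lat, min_lon, max_lat, max_lon.
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and layer; the bounding box approximates the extent of
# Mozambique given in the introduction.
url = wms_getmap_url("https://example.org/wms", "hypothetical_layer",
                     (-26.87, 30.2, -10.45, 40.85))
print(url)
```

Desktop GIS software normally builds such requests automatically once the service address has been registered.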

Fig. 5

WebGIS interface

As well as being a federated network of data providers, BioNoMo has also published data on GBIF since 2017 (see “Impact and future perspectives”), to make the biodiversity data of Mozambique available worldwide.

3 State of the art

BioNoMo is currently the largest aggregator of primary biodiversity data in Mozambique, and it is planned to grow further by aggregating new datasets.

To date, BioNoMo has aggregated a total of 273,172 records (Fig. 6), including 85,092 occurrence records of plants and 188,080 occurrence records of animals. Of all records, 41.2% are terrestrial and 58.8% aquatic. The total record set encompasses 7 phyla, 17 classes, 118 orders, 362 families, 1170 genera and 2296 species.

Fig. 6

Descriptive statistics of BioNoMo data. A Records by provider. B Proportion of aquatic and terrestrial observations. C Proportion of plants and animals observations
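The summary figures can be cross-checked against the per-collection record counts listed in Section 2.2. In the short Python check below, the record counts come from that list, whereas the plant/animal and aquatic/terrestrial labels are this sketch's own reading of the collection titles rather than metadata published by BioNoMo.

```python
# (collection, group, realm, records). Counts are those listed in Section 2.2;
# the group and realm labels are inferred from the collection titles.
COLLECTIONS = [
    ("Mozambique National Forest Inventory 2006", "plant", "terrestrial", 81_499),
    ("General Collection, Natural History Museum of Maputo", "animal", "terrestrial", 11_769),
    ("Entomological Collection, Natural History Museum of Maputo", "animal", "terrestrial", 7_967),
    ("Ornithological Collection, Natural History Museum of Maputo", "animal", "terrestrial", 3_788),
    ("Endemic and Near-Endemic Red Listed plant species (LMA)", "plant", "terrestrial", 2_534),
    ("Herbarium of Eduardo Mondlane University (LMU)", "plant", "terrestrial", 1_059),
    ("Entomological Collection, Plant Health Department (MADER)", "animal", "terrestrial", 3_897),
    ("Aquatic vertebrates, Inhambane province", "animal", "aquatic", 95_116),
    ("Aquatic vertebrates, Lake Niassa", "animal", "aquatic", 28_435),
    ("Aquatic vertebrates, Zambezia province", "animal", "aquatic", 37_108),
]


def count(column, label):
    """Sum the records of all collections whose given column equals label."""
    return sum(row[3] for row in COLLECTIONS if row[column] == label)


total = sum(row[3] for row in COLLECTIONS)
plants = count(1, "plant")
animals = count(1, "animal")
aquatic_share = 100 * count(2, "aquatic") / total

print(total, plants, animals, round(aquatic_share, 1))
# prints: 273172 85092 188080 58.8
```

Under these labels the totals reproduce the reported figures, and the 58.8% aquatic share is a proportion of all records rather than of animal records only.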

However, the aggregation of a new dataset into the federation is not an automated process. Each dataset must meet a minimum level of data quality and contain a minimum mandatory set of fields: a unique ID for each observation record, the scientific name of the taxon, latitude, and longitude. For consistency, coordinates should be expressed in decimal degrees in the WGS84 geographic coordinate system. Beyond the mandatory fields, many other data can be included. Those already implemented in the BioNoMo web application are: date of survey or collection, collector or observer, locality of collection or observation, community in which the organism was surveyed, and references. Furthermore, the use of ABCD, which lists ca. 1200 concepts, permits far more data for each observation record to be aggregated in the BioNoMo federation.
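A pre-aggregation check of the mandatory fields could look like the following Python sketch. The Darwin Core key names and the two helper functions are assumptions for illustration; the validation actually applied before a dataset enters the federation is not specified at this level of detail.

```python
from __future__ import annotations

from collections import Counter

# Mandatory fields named in the text, expressed here as Darwin Core terms.
MANDATORY = ("occurrenceID", "scientificName", "decimalLatitude", "decimalLongitude")


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found in one record (empty if it passes)."""
    problems = [f"missing {field}" for field in MANDATORY
                if rec.get(field) in (None, "")]
    # Decimal-degree coordinates must be numeric and within valid bounds.
    for field, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = rec.get(field)
        if value in (None, ""):
            continue  # already reported as missing
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not a number")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{field} out of range")
    return problems


def find_duplicate_ids(records: list[dict]) -> set[str]:
    """Return occurrence IDs that appear more than once in a dataset."""
    counts = Counter(rec.get("occurrenceID") for rec in records)
    return {key for key, n in counts.items() if key not in (None, "") and n > 1}


# Invented records for illustration.
good = {"occurrenceID": "1", "scientificName": "Afzelia quanzensis",
        "decimalLatitude": -25.9, "decimalLongitude": 32.6}
bad = {"occurrenceID": "2", "scientificName": "",
       "decimalLatitude": "abc", "decimalLongitude": 200}
print(validate_record(good), validate_record(bad))
```

A check of this kind only covers completeness and basic plausibility; taxonomic and spatial accuracy still require expert review.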

BioNoMo also stores data on sensitive taxa. Here, displaying exact occurrence locations carries the risk that the data are misused by parties with commercial interests. On the other hand, the exact location of sensitive taxa (e.g. species classified by the IUCN as threatened, that is, critically endangered, endangered or vulnerable) and of their key sites (e.g. important bat nursery colonies, bear dens, otter holts) is an important source of information for decision-makers, such as the MTA and the Administração Nacional das Áreas de Conservação (ANAC). Hence, sensitive data will not be displayed online but will be provided under a strict safeguard policy to avoid their misuse.
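One simple way to keep sensitive records out of public outputs is to separate them before publication, as in the Python sketch below. The field name `iucnCategory` and the rule of withholding every record of a threatened taxon are assumptions of this example; the actual safeguard policy may rely on other criteria.

```python
# IUCN Red List threatened categories: Critically Endangered, Endangered, Vulnerable.
THREATENED = {"CR", "EN", "VU"}


def split_public_and_withheld(records):
    """Separate records that may be shown online from those to be withheld."""
    public, withheld = [], []
    for rec in records:
        # "iucnCategory" is a hypothetical field name used for this sketch.
        if rec.get("iucnCategory") in THREATENED:
            withheld.append(rec)
        else:
            public.append(rec)
    return public, withheld


# Invented records for illustration.
records = [
    {"occurrenceID": "1", "iucnCategory": "LC"},
    {"occurrenceID": "2", "iucnCategory": "EN"},
    {"occurrenceID": "3"},
]
public, withheld = split_public_and_withheld(records)
print(len(public), len(withheld))
```

Withheld records would remain in the repository and be released only on request, under the safeguard policy described above.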

4 Impact and future perspectives

BioNoMo aims to support Mozambican institutions in managing and using data on the biodiversity of the country. Several stakeholders will potentially benefit from it, especially if it receives regular updates and grows with the planned aggregation of other datasets, such as more artisanal fishery data from IIP and the digital collections of the Edward O. Wilson Laboratory in Gorongosa National Park. Practitioners will benefit from knowing which taxa occur in the areas they manage, to plan species- or habitat-specific conservation measures. Detailed and up-to-date knowledge of the distribution of “problematic” taxa, e.g. elephants and big predators, will help reduce the conflicts between human needs and wildlife conservation. Mapping the areas in which charismatic species live will also improve tourism, providing economic benefits to local communities and hence increasing awareness of the value of wildlife conservation amongst local populations.

In 2017, BioNoMo partners participated, in association with the South African National Biodiversity Institute (SANBI), in a consortium for a project funded by the Biodiversity Information for Development (BID) programme promoted by GBIF. The project, named “Mobilizing primary biodiversity data for Mozambican species of conservation concern”, resulted in the publication of the first primary biodiversity datasets generated in Mozambique (https://www.gbif.org/project/6QF1fqTDq0GkkkSuwKq024/mobilizing-primary-biodiversity-data-for-mozambican-species-of-conservation-concern). In total, 8 datasets were published, aggregating about 180,000 single observations. These data represent a key resource for scientific research on biodiversity in Mozambique and create potential opportunities for collaboration between BioNoMo partners and other research institutions at the global level, as also demonstrated by the high number of citations (more than 450 as of July 2022) reached by the published datasets.

From 2018 to 2020, BioNoMo partners were also involved in several projects aimed at updating the IUCN Red List assessments for target species and ecosystems in Mozambique, identifying the Key Biodiversity Areas (KBAs) in the country. In these initiatives, coordinated by the Wildlife Conservation Society (WCS) and funded by the United States Agency for International Development (USAID), the primary biodiversity data provided by BioNoMo and the digital collection managers trained in the preliminary phases of the project played a key role in supporting the taxonomic experts group for the statistical analysis and spatial processing needed (Wildlife Conservation Society et al. 2021a, 2021b). The work developed through these initiatives culminated in 2021 with recognition from Mozambique’s Ministry of Land and Environment of 29 KBAs, covering a total of 139,947 km² (MTA 2022).

Since 2021, the BioNoMo webGIS has been the official repository of geographic data for the CBD Clearing House Mechanism website in Mozambique (Biodiversity Information System of Mozambique, SIBMOZ; https://sibmoz.gov.mz/), developed by WCS for the MTA (Wildlife Conservation Society 2022). The BioNoMo webGIS has consequently been integrated with additional layers of information on environmental protection areas, marine and terrestrial ecosystems, and important biodiversity areas. The provision of public access to biodiversity-related data is a major requirement for members of the Convention, and BioNoMo offered an easy and reliable solution to comply with this demand.

Academics will have a fundamental role in the development of BioNoMo. As part of its institutionalisation process, a pool of outstanding local researchers must be selected as the Scientific Committee of BioNoMo, with the aim of:

  1. Validating the datasets before aggregating them in the federated database system;

  2. Deciding whether to provide data on sensitive species to interested parties. Monitoring the use of the sensitive data should be mandatory, to ensure the protection of sensitive taxa;

  3. Promoting the use of BioNoMo to elaborate effective conservation strategies and defining the main strategies for its further development.

The outputs of the Scientific Committee will be received by the Network of data providers, which will in turn steer the activities of the digitization teams and inform the Committee of the progress, challenges and difficulties encountered in the implementation of such activities. During its early stages only, the Scientific Committee will be supported by international scientists to facilitate the transfer of biodiversity-related knowledge and technology and reduce the risk of “parachute” or “helicopter” science, where scientists from the global North do research in the global South without collaborating or sharing data, knowledge and/or skills with local scientists and authorities (Pettorelli et al. 2021; Stefanoudis et al. 2021). The collaboration with international scientists will also increase the opportunities for fundraising and for collaboration with relevant initiatives and projects on a larger scale.

In the near future, BioNoMo will focus on:

  1. Developing a priority system for data digitisation and publication, and selecting the most important resources which should be added to the system;

  2. Expanding the network of local data providers, estimating their consistency, scientific relevance and digitisation readiness (i.e. the effort required to digitise a resource);

  3. Expanding the functionalities of the system to combine species occurrence data (Petersen et al. 2021) with abundance and/or environmental data (Stephenson and Stengel 2020);

  4. Defining and implementing relevant training programmes where students from various disciplines, including botany, zoology, ecology, environmental science and anthropology, work together with students from mathematics, information technology and data science to create a working database management system to collect, capture, store, process, query, share and use biodiversity data.

In 2013, the Global Biodiversity Informatics Outlook stressed the importance of cooperative networks between researchers, policymakers, and other stakeholders to encourage data sharing, integration and synthesis to support better decisions in conservation management (Hobern et al. 2013). In this respect, BioNoMo represents an important node of the network of national platforms needed if Africa is to become a world leader in biodiversity conservation, as it safeguards some of the largest wilderness areas and intact ecosystems in the world (Stephenson et al. 2017).