
1 Introduction

In all areas of science, many ontologies (or, more broadly, semantic artefacts; see Footnote 1) are used to represent and annotate data in a standardized manner. Semantic artefacts have become a key element in achieving the FAIR Data Principles [1] and have been discussed as research objects that themselves need to be FAIR [2,3,4]. However, those semantic artefacts are spread out, in different formats, of different sizes, with different structures and from overlapping domains. Therefore, there is a need for common platforms to receive and host them, to serve them, to align them, and to enable their reuse in miscellaneous communities and use cases. In other words, with the explosion in the number of ontologies and semantic artefacts available, ontology repositories or, more broadly, semantic artefact catalogues are now indispensable.

Ontology repositories are usually developed to address the needs of certain communities. Their functionalities range from simple listings with metadata description (i.e., libraries) to rich platforms offering various advanced ontology-based services (i.e., repositories), including browsing, searching, visualizing, computing metrics, annotating, recommending, accessing data, assessing FAIRness, and sometimes even editing. More generally, ontology repositories help ontology users to deal with ontologies without asking the users to manage them or to engage in the long and complex process of developing them. Moreover, as with any other data, repositories help make ontologies FAIR (Findable, Accessible, Interoperable, and Re-usable) [5, 6].

The OntoPortal Alliance (https://ontoportal.org) is a consortium of several research and infrastructure teams and a company dedicated to promoting the development of ontology repositories—in science and other disciplines—based on the open, collaboratively developed OntoPortal open-source software. Teams in the Alliance develop and maintain several openly accessible ontology repositories and semantic artefact catalogues. These ontology repositories include BioPortal, the primary and historical source of OntoPortal code, but also AgroPortal, EcoPortal, MatPortal and more, as illustrated in Fig. 1. The OntoPortal Alliance's original motivation and vision [7] was to reuse outcomes and experiences obtained in the biomedical domain—an area where the use of ontologies has always been important—to serve and advance other scientific disciplines.

Fig. 1. Current public installations of OntoPortal. Not shown: installation(s) done by Cogni.zone (private), many private deployments, and new portals in 2023.

In this paper, we present the OntoPortal technology as a generic resource to build ontology repositories or, more broadly, semantic artefact catalogues that can simultaneously co-host and fully support resources that span from SKOS thesauri to OBO, RDF-S, and OWL ontologies. We briefly review the span of OntoPortal-generic features, from the ones originally developed and provided by BioPortal [8, 9], to the new ones developed in the context of other projects such as AgroPortal or EcoPortal [10]. Then, we present the OntoPortal Alliance, the consortium maintaining the software as an open-source collaborative project. As an “evaluation” of our technology, we list the current uses of the OntoPortal technology, focusing mainly on the current and coming public and open repositories built with the technology maintained by the Alliance.

2 Related Work on Semantic Artefact Catalogues

This section reuses and updates elements in the introduction chapter of [45].

2.1 From Ontology Libraries and Repositories to Semantic Artefact Catalogues

With the growing number of developed ontologies, ontology libraries and repositories have long been of interest to the semantic web community. In 2001, Ding & Fensel [11] presented a review of ontology libraries, which they defined as: “A system that offers various functions for managing, adapting and standardizing groups of ontologies. It should fulfill the needs for re-use of ontologies.” Ontology libraries usually register ontologies and provide metadata descriptions. The terms collection, listing or registry were later used to describe concepts similar to ontology libraries. All correspond to systems that help reuse or find ontologies by simply listing them (e.g., DAML, Protégé or DERI listings) or by offering structured metadata to describe them (e.g., FAIRSharing, BARTOC, Agrisemantics Map). But those systems do not support any services beyond description; in particular, they offer no services based on the content of the ontologies. In the biomedical domain, the OBO Foundry [12] is a reference library effort to help the biomedical and biological communities build their ontologies while enforcing design and reuse principles. A number of services and tools are built to work with this library of semantic artefacts.

In 2009, Hartman et al. [13] introduced the concept of ontology repository: “A structured collection of ontologies (…) by using an Ontology Metadata Vocabulary. References and relations between ontologies and their modules build the semantic model of an ontology repository. Access to resources is realized through semantically-enabled interfaces applicable for humans and machines.” Multiple ontology repositories have been developed since then, with advanced features such as search, metadata management, visualization, personalization, mappings, annotation and recommendation services, as well as application programming interfaces to query their content and services. Here again, the biomedical domain has seen a lot of resources (not necessarily synchronized), such as the NCBO BioPortal [8], OntoBee [14], the EBI Ontology Lookup Service [15] and AberOWL [16]. We have also seen repository initiatives such as the Linked Open Vocabularies [17], OntoHub [18], and the Marine Metadata Initiative’s Ontology Registry and Repository [19] and its earth science counterpart, the ESIP Federation's Community Ontology Repository. By the end of the 2000s, the topic was of high interest, as illustrated by the 2010 ORES workshop [20] and the 2008 Ontology Summit (Footnote 2). More recently, the SIFR BioPortal [21] prototype was built to develop a French Annotator and experiment with multilingual issues in BioPortal [22]. The first reuse of the OntoPortal technology to develop a free and open, community-driven ontology repository in the spirit of BioPortal, but for agri-food, was AgroPortal, started at the end of 2015 [23]. D’Aquin & Noy [24] and Naskar and Dutta [25] provided the latest reviews of ontology repositories.

In parallel, there have been efforts to index any semantic web data online (including ontologies) and to offer search engines such as Swoogle and Watson [26, 27]. We cannot consider these “semantic web indexes” as ontology libraries, even if they support some features of ontology repositories (e.g., search). Other similar products are terminology services or vocabulary servers, which are usually developed to host one or a few terminologies for a specific community (e.g., SNOMED-CT terminology server, UMLS-KS, CLARIN vocabulary services, OpenTheso, etc.); they are usually not semantic web compliant and do not handle the complexity of ontologies, although an increasing number of terminology services are becoming compliant with SKOS (Simple Knowledge Organization System) [28]. We can also cite the ARDC Research Vocabularies Australia (https://vocabs.ardc.edu.au), which uses multiple technologies such as PoolParty and SSISVoc.

In the following, we focus on ontology repositories, considering that they offer both ontology-focused services (i.e., services for ontologies) and ontology-based services (i.e., services using ontologies). From here on, we also call them semantic artefact catalogues, a term that emerged in the forums and discussions around building the European Open Science Cloud (e.g., [29]) and that conveys the idea that such catalogues are not only for ontologies but must offer common services for a wide range of semantic artefacts.

2.2 Generic Ontology Repository and Semantic Artefact Catalogue Technology

At the end of the 2000s, the Open Ontology Repository Initiative (OORI) [30] was a collaborative effort to develop a federated infrastructure of ontology repositories. The effort already reused the NCBO BioPortal technology [31], which was then the most advanced open-source technology for managing ontologies. Later, the initiative studied the OntoHub [18] technology for generalization, but the initiative is now discontinued.

In the context of our projects, to avoid building new ontology repositories from scratch, most of the authors considered which of the technologies cited above were reusable. There is a strong difference between “open source” (which most of them are) and “made to be reused”, and we think only the NCBO BioPortal and OLS were truly generic ontology repository candidates in terms of both their construction and documentation. OLS technology has always been open source, but some significant changes (e.g., the parsing of OWL) facilitating the reuse of the technology for other portals were made with OLS 3.0, released in December 2015. Until very recently (2022, in the context of the NFDI projects, https://terminology.tib.eu), we had not seen another public repository built with OLS. On the other hand, the NCBO BioPortal was developed from scratch as domain-independent, open-source software. Although it was reused very early by ad-hoc projects (e.g., at OORI, NCI, and MMI), it was only in 2012, with the release of BioPortal 4.0, that the technology, made of multiple components, was packaged as a virtual appliance: a virtual server machine embedding the complete code and deployment environment, allowing anyone to set up a local ontology repository and customize it. The technology has been called OntoPortal since 2018.

Skosmos [32] is another alternative originally built for reuse, but it only supports browsing and search for SKOS vocabularies. For instance, Finto (https://finto.fi) and Loterre (www.loterre.fr) have adopted Skosmos as their backend technology. Another example is VocPrez, an open-source technology developed by a company and adopted, for example, by the Geoscience Australia Vocabularies system (https://vocabs.ga.gov.au) and by the NERC Vocabulary Server (http://vocab.nerc.ac.uk). Another technology is ShowVoc, which is based on the same technological core as VocBench but appears to have drawn inspiration from OntoPortal in terms of its design and services.

A full comparison of the different semantic artefact catalogue technologies is not the subject of this paper, but we strongly believe the OntoPortal technology implements the highest number of features and requirements in our projects. Indeed, there are two other major motivations for reusing this technology: (i) to avoid re-developing tools that have already been designed and extensively used and instead contribute to long term support of a shared technology; and (ii) to offer the same tools, services and formats to multiple user communities, to facilitate the interface and interaction between domains and interoperability.

3 OntoPortal Technology

The OntoPortal virtual appliance is mainly made available as an OVF (Open Virtualization Format) file to deploy on a server. Amazon Machine Images (AMIs) are also available. Once installed, an OntoPortal instance provides an out-of-the-box semantic artefact catalogue web application with a wide range of features. A demo server can be visited at https://demo.ontoportal.org. Administrators of the platform can then include the desired semantic artefacts directly, reach out to their users to let them upload resources, or both. In the following, we review the OntoPortal architecture and default OntoPortal services; many of these have already been presented in the referenced publications on BioPortal or subsequent projects. Then, we describe the latest services and functionalities developed by members of the Alliance, which are being discussed and included step by step in the main code branch when relevant.

3.1 OntoPortal Standard/Default Technical Architecture

OntoPortal is a complex system composed of multiple, coherently connected stacks, depending on the services implemented. Most of the components (listed in Table 1) are developed in Ruby (www.ruby-lang.org). Sometimes, they rely on or reuse third-party technologies, especially in the storage layer.

Fig. 2. OntoPortal system architecture.

The OntoPortal system architecture is presented in Fig. 2. It is structured in several layers briefly described here:

  • The storage layer is mainly made of a triple-store, which saves the RDF content of each semantic artefact in a distinct graph, as well as other data (metadata records, mappings, projects, users, etc.). We have always used 4store (https://github.com/4store), a very efficient and scalable RDF database. As this technology is now outdated, we are transitioning to other triple-stores. This layer also uses: (i) Redis-based key-value storage for application caches and the Annotator dictionary datastore; (ii) the Solr search engine (https://solr.apache.org) to index and retrieve ontology content for the Search service.

  • The model layer implements all the models (objects) of the business logic and the mechanisms to parse the semantic artefact source files using the OWL-API (https://github.com/owlcs/owlapi) and persist/retrieve them from the triple-store using our built-in Object-Relational-Mapping-like library, called GOO.

  • The service layer, built with Ruby/Sinatra (https://sinatrarb.com), implements the core OntoPortal services working with the models: Search, Annotator and Recommender. When necessary, these services rely on specific storage components and external tools (e.g., Mgrep concept recognizer [33]). A command line administration tool is also integrated to monitor jobs and manage the integrity of the system.

  • The Application Programming Interface (API) layer implements a unified application programming interface for all the models (e.g., Group, Category, Class, Instance, Ontology, Submission, Mapping, Project, Review, Note, User) and services supported by OntoPortal. The API can return XML or custom formats, but the default and most-used output is JSON-LD, which uses JSON to encode RDF.

  • The user interface is a typical web application built mostly with Ruby On Rails (https://rubyonrails.org), a popular open-source framework written in Ruby. The user interface offers a set of views to display and use the services and components built in the API layer. The user interface is customized for logged-in users and for groups/organizations that display their own sub-set of resources using the slices feature. Administrators of the OntoPortal instance have access to an additional administration console to monitor and manage the content of the portal.
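
To illustrate what "JSON to encode RDF" means for the API layer's default output, the following minimal sketch reads a JSON-LD record and replaces its short keys with the IRIs declared in its `@context`. The sample record, its field names and its IRIs are hypothetical and are not taken from an actual OntoPortal response; only the simplest flat-context case is handled, whereas a full JSON-LD processor does much more.

```python
import json

# Hypothetical API response for one class; field names and IRIs are illustrative.
SAMPLE = """
{
  "@id": "http://example.org/onto/Wheat",
  "@type": "http://www.w3.org/2002/07/owl#Class",
  "prefLabel": "wheat",
  "synonym": ["Triticum"],
  "@context": {
    "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "synonym": "http://www.w3.org/2004/02/skos/core#altLabel"
  }
}
"""


def expand_terms(doc: dict) -> dict:
    """Replace short JSON keys with the IRIs declared in @context.

    Only a flat context (term -> IRI string) is supported in this sketch.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        # JSON-LD keywords such as @id and @type are kept unchanged.
        iri = key if key.startswith("@") else context.get(key, key)
        expanded[iri] = value
    return expanded


record = expand_terms(json.loads(SAMPLE))
print(record["http://www.w3.org/2004/02/skos/core#prefLabel"])
```

Once expanded, each key/value pair can be read as an RDF predicate and object about the subject given in `@id`, which is why a plain JSON client and an RDF-aware client can consume the same payload.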

Table 1. OntoPortal components code repositories (https://github.com/ontoportal).

3.2 Default OntoPortal Services

Ontology Public/Private Hosting, Grouping, Organization and Slices: When OntoPortal is installed as a publicly visible web application in its default configuration, end users (Footnote 3) can self-register and upload artefacts themselves to the repository. New artefacts are publicly visible by default, for anyone to find, use, and download. OntoPortal also allows private ontologies, which can be managed by or made visible to any number of other users. This allows ontology work to be performed without the ontology being publicly visible, or a subset of ontologies to be visible only within a certain community. In addition, OntoPortal allows logged-in users to specify the list of ontologies to display in their own user interface. Within an installation, semantic resources are organized in groups and/or categories that are specialized by each portal. Typically, groups associate ontologies from the same project or organization, whereas categories are about the ontology subjects/topics. OntoPortal also offers a “slice” mechanism to allow users to interact (both via API and UI) only with a subset of ontologies in an installation. When browsing a slice, all the portal features are restricted to the chosen subset, enabling users to focus on their specific use cases.

Library, Versioning and Search:

The primary mission of an OntoPortal installation is to host and serve ontologies and semantic artefacts. The portals accept resources in multiple knowledge representation languages: OWL, RDF-S, SKOS, OBO and UMLS-RRF (Footnote 4). Ontologies are semantically described with rich metadata (partially extracted from the source files), and a browsing user interface with faceted search allows users to quickly identify the ontologies of interest based on their metadata. The portal can also consider some resources as “views” of main ones. The technology is not a version control system like GitHub, which provides complementary services, but it stores all ontology versions (called “submissions”), whether manually submitted or automatically pulled (Footnote 5). Each version’s metadata record is saved (Footnote 6) and differences from one version to the next are computed, which enables a historical overview of the ontology as it evolves. Only the latest versions of ontologies are indexed and loaded in the backend, but all source files and diffs are available. Beyond the metadata record, OntoPortal loads each ontology’s content in a triple-store and indexes the content (classes, properties and values) with Solr to allow searching across the ontologies by keyword or identifier.
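
As an illustration of searching across ontologies by keyword, the sketch below only builds a search request URL; it sends nothing over the network. The `/search` path and the `q`, `ontologies` and `apikey` parameter names are assumptions modelled on the publicly documented BioPortal REST API and may differ in a given deployment, and the base URL is a placeholder.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    query: str,
    ontologies: Optional[Sequence[str]] = None,
    api_key: Optional[str] = None,
) -> str:
    """Build a keyword-search URL for a portal's REST API.

    Path and parameter names are assumptions (see the text above).
    """
    params = {"q": query}
    if ontologies:
        # Restrict the search to a subset of ontologies, given by acronym.
        params["ontologies"] = ",".join(ontologies)
    if api_key:
        params["apikey"] = api_key
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


# Placeholder host and acronyms, for illustration only.
url = build_search_url(
    "https://data.example-portal.org/",
    "bread wheat",
    ontologies=["ONTO_A", "ONTO_B"],
)
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to point the same client code at another portal by changing only the base URL.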

Ontology Browsing and Content Visualization:

OntoPortal lets users visualize a class/concept or property within its hierarchy, as well as see related information for this entity (as relations included in the source file) (Footnote 7). Some key properties (e.g., preferred labels, synonyms, definitions), even if encoded by custom properties in a given source file, are explicitly “mapped” (by the portal or the submitter) to a common model that offers a baseline for OntoPortal services. For each ontology, several web widgets (e.g., “Autocomplete jump-to term” or “Hierarchy tree”) are automatically provided and can be embedded in external web applications to facilitate the reuse/visualization of ontology entities.

Mappings:

Another key service of OntoPortal is a mapping repository that stores 1-to-1 mappings between classes or concepts. The mappings in OntoPortal are first-class citizens that can be identified, stored, described, retrieved and deleted. Mappings can be explicitly uploaded from external sources and reified as a resource described with simple provenance information and an explicit relation (e.g., owl:sameAs, skos:exactMatch). The portal automatically creates some mappings when two classes share the same URI (indicating reuse of that URI) or the same UMLS CUI, and generates simple “lexical mappings” with the LOOM algorithm [34]. Although the LOOM mappings are not semantic (they are based only on syntactic matching), they quickly indicate the overlap of an ontology with all the other ones in a portal, and suggest possible terms to investigate in other ontologies. OntoPortal does not yet support the new Simple Standard for Sharing Ontological Mappings (SSSOM) format [35], although steps toward this have started in the AgroPortal project.
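
The two kinds of automatically created mappings can be sketched as follows. This is a simplified illustration of syntactic label matching in the spirit of lexical mapping, not the LOOM implementation itself: the normalization rule and the `min_length` threshold are assumptions, and the input structure (class URI to list of labels) is hypothetical.

```python
import re
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase a label and drop whitespace and common delimiters."""
    return re.sub(r"[\s_\-()]+", "", label).lower()


def same_uri_mappings(onto_a: dict, onto_b: dict) -> list:
    """Class URIs present in both ontologies (reuse of the same URI)."""
    return sorted(set(onto_a) & set(onto_b))


def lexical_mappings(onto_a: dict, onto_b: dict, min_length: int = 3) -> list:
    """Pairs of classes sharing a normalized label (preferred name or synonym)."""
    index = defaultdict(set)
    for uri_b, labels in onto_b.items():
        for label in labels:
            index[normalize(label)].add(uri_b)
    pairs = set()
    for uri_a, labels in onto_a.items():
        for label in labels:
            key = normalize(label)
            if len(key) < min_length:
                continue  # very short labels produce too many false matches
            for uri_b in index.get(key, ()):
                pairs.add((uri_a, uri_b))
    return sorted(pairs)


# Hypothetical ontologies: class URI -> labels (preferred name and synonyms).
ONTO_A = {
    "http://a.example/Wheat": ["Wheat", "bread wheat"],
    "http://shared.example/Soil": ["soil"],
}
ONTO_B = {
    "http://b.example/c1": ["Bread_Wheat"],
    "http://b.example/c2": ["maize"],
    "http://shared.example/Soil": ["Soil"],
}

print(same_uri_mappings(ONTO_A, ONTO_B))
print(lexical_mappings(ONTO_A, ONTO_B))
```

Because the match is purely syntactic, the resulting pairs are candidates to investigate and carry no semantic guarantee.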

Community Feedback, Change Requests and Projects:

OntoPortal includes some community-oriented features [36] such as: (i) Ontology reviews: for each ontology, a review can be written by a logged-in user; this feature is currently being rebuilt and is thus deactivated by default. (ii) Notes can be attached in a forum-like mode to a specific artefact or class/concept, in order to discuss the ontology (its design, use, or evolution) or allow users to propose changes. (iii) Change requests can be submitted, and in some cases directly transferred to external systems such as a GitHub issue tracker. (iv) Projects can be defined, and their use of specific ontologies recorded, to materialize the ontology-project relation and demonstrate concrete uses of an ontology. Ontology developers (or any registered users) can subscribe to email notifications to be informed each time a user note or mapping is added to their ontologies of interest.

Ontology-Based Annotation with the Annotator:

OntoPortal features the Annotator, a domain-agnostic text annotation service that will identify semantic artefact classes or concepts inside any raw text [37]. The user can control which ontologies are used to perform the text annotation. The Annotator workflow is based on a highly efficient syntactic concept recognition tool (using concept names and synonyms) [33], and on a set of semantic expansion algorithms that leverage the semantics in ontologies (e.g., subclass relations and mappings). It is also used as a component of the system to recommend ontologies for given text input, as described hereafter.
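
The two-stage workflow (syntactic recognition, then semantic expansion) can be sketched as below. This is a toy illustration under stated assumptions, not the Annotator or the Mgrep tool: the dictionary, class identifiers, parent relations and the shape of the returned annotations are all hypothetical, and matching is a naive case-insensitive whole-word search.

```python
import re


def annotate(text: str, dictionary: dict, parents: dict = None) -> list:
    """Return direct and ancestor annotations found in `text`.

    dictionary: label (name or synonym) -> class identifier
    parents:    class identifier -> list of direct parent identifiers
    """
    parents = parents or {}
    annotations = []
    for label, class_id in dictionary.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            span = {"from": match.start(), "to": match.end()}
            # Stage 1: direct annotation from the syntactic match.
            annotations.append({"class": class_id, "type": "direct", **span})
            # Stage 2: semantic expansion along subclass relations.
            seen, stack = set(), list(parents.get(class_id, []))
            while stack:
                ancestor = stack.pop()
                if ancestor in seen:
                    continue
                seen.add(ancestor)
                annotations.append({"class": ancestor, "type": "ancestor", **span})
                stack.extend(parents.get(ancestor, []))
    return annotations


# Hypothetical dictionary and hierarchy.
DICTIONARY = {"melanoma": "ex:Melanoma"}
PARENTS = {"ex:Melanoma": ["ex:SkinCancer"], "ex:SkinCancer": ["ex:Cancer"]}

TEXT = "Patient diagnosed with Melanoma."
for annotation in annotate(TEXT, DICTIONARY, PARENTS):
    print(annotation)
```

Expansion through mappings would follow the same pattern, with a lookup from a matched class to its mapped classes in other ontologies instead of to its parents.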

Ontology Recommendation with the Recommender:

OntoPortal includes the Recommender, an ontology recommendation service [38] which suggests relevant semantic artefacts for a provided text or keyword list. The Recommender evaluates the relevance of an ontology according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the community (number of views in the portal); (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. The user can configure the weights of the four criteria, and can choose to rank the most relevant individual ontologies, or sets of ontologies. The OntoPortal Recommender is arguably the most powerful ontology discovery and recommendation tool available in public semantic repositories.
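
A minimal sketch of how user-configurable weights could combine the four criteria into a single ranking is given below. It assumes that each criterion has already been scored in [0, 1] and that the scores are aggregated by a normalized weighted sum; the aggregation rule, the weight values, the ontology acronyms and the scores are illustrative assumptions, not values taken from the Recommender.

```python
from dataclasses import dataclass


@dataclass
class CriteriaScores:
    coverage: float        # (1) how much of the input the ontology covers
    acceptance: float      # (2) acceptance in the community
    detail: float          # (3) level of detail of the covering classes
    specialization: float  # (4) specialization to the input's domain


# Illustrative weights only; a user would configure these.
EXAMPLE_WEIGHTS = {
    "coverage": 0.4,
    "acceptance": 0.2,
    "detail": 0.2,
    "specialization": 0.2,
}


def final_score(scores: CriteriaScores, weights: dict) -> float:
    """Normalized weighted sum of the four criteria (assumed aggregation)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * getattr(scores, name) for name, w in weights.items()) / total


def rank(candidates: dict, weights: dict = EXAMPLE_WEIGHTS) -> list:
    """Ontology acronyms ordered from most to least relevant."""
    return sorted(
        candidates,
        key=lambda acronym: final_score(candidates[acronym], weights),
        reverse=True,
    )


# Hypothetical candidates with made-up criterion scores.
CANDIDATES = {
    "ONTO_A": CriteriaScores(0.9, 0.2, 0.5, 0.4),
    "ONTO_B": CriteriaScores(0.5, 0.9, 0.6, 0.7),
}

print(rank(CANDIDATES))
print(rank(CANDIDATES, {"coverage": 1.0}))
```

Changing the weights changes the ranking: with the example weights the broadly accepted ontology comes first, while weighting coverage alone favours the ontology that covers more of the input.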

Automated Access: REST API and SPARQL Endpoint:

OntoPortal provides two different endpoints for accessing its content: (i) a REST web service API that returns JSON-LD; (ii) a SPARQL endpoint [39]. These endpoints are consistent across all the OntoPortal deployments, so software written originally to query a specific portal (most commonly BioPortal) can be used equally well to query any other OntoPortal deployment, assuming that deployment makes its endpoints accessible. The REST web service API provides access to all the resources (read/write) and services described above, and queries are highly customizable using various request parameters. To efficiently handle large result sets, pagination is available for the majority of the endpoints. The SPARQL web service provides direct read-only access to the OntoPortal triple store. Since OntoPortal is developed to work with semantic web technologies and artefacts, all of its content (ontologies, mappings, metadata, notes, and projects) is stored in an RDF triple store. For security, some OntoPortal installations (like BioPortal itself) choose not to make the primary triple store queryable. In these cases, a copy of the triple store can be made accessible [39].
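
Since most endpoints are paginated, client code typically iterates over pages until none is left. The sketch below assumes a response shape in which each page carries its items under `collection` and the URL of the following page under `links.nextPage`; these key names are assumptions modelled on BioPortal responses and should be checked against the deployment being queried. The HTTP call is injected as a function so that the same loop works with any client library, and the example runs against in-memory fake pages rather than a real portal.

```python
from typing import Callable, Iterator, Optional


def iter_collection(
    first_url: str, fetch_json: Callable[[str], dict]
) -> Iterator[dict]:
    """Yield every item of a paginated resource, following next-page links.

    `fetch_json` takes a URL and returns the decoded JSON page.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("collection", [])
        # Stop when the page has no link to a following page.
        url = (page.get("links") or {}).get("nextPage")


# In-memory stand-in for a portal, keyed by (fake) URL.
FAKE_PAGES = {
    "page1": {
        "collection": [{"acronym": "A"}, {"acronym": "B"}],
        "links": {"nextPage": "page2"},
    },
    "page2": {"collection": [{"acronym": "C"}], "links": {"nextPage": None}},
}

acronyms = [
    item["acronym"] for item in iter_collection("page1", FAKE_PAGES.__getitem__)
]
print(acronyms)
```

Because the endpoints are consistent across deployments, a loop of this kind written for one portal should need only a different starting URL to work with another.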

Many external applications developed by the biomedical semantics community to use BioPortal can be adapted to work with any other portal; examples include OntoMaton, OntoCAT, Zooma, Galaxy, REDCap, and FAIRsharing. More recently, we have seen tools developed directly with multiple OntoPortal endpoints in mind (e.g., https://github.com/cthoyt/ontoportal-client), and we are discussing federating search and other queries/services across all public OntoPortal systems.

3.3 Additional Features and Services Developed by the Alliance

The features described above were originally developed for BioPortal and were later adopted, and sometimes improved, by members of the Alliance; they now form the baseline of the OntoPortal technology. However, with new adopters and use cases, the Alliance has proposed new ideas and developed new functionalities. The additional features presented in this section have been developed in the context of the SIFR BioPortal [21], AgroPortal [23], and EcoPortal [10] projects. The Alliance is now incorporating some of these contributions into the core OntoPortal code.

Enhanced Mapping Features:

AgroPortal enhanced its mappings repository with several more advanced features. Originally, any mapping’s source and target object could only be in the local repository, but AgroPortal added the ability for the target entity to be in another instance of the OntoPortal technology (‘inter-portal’) or in any external semantic resource. AgroPortal can also import mappings in bulk from a JSON file (Footnote 8); submitting multiple mappings previously required multiple calls to the API. AgroPortal can also recognize SKOS mappings explicitly defined in semantic artefact source files, and can serve those mappings (both in UI and API) alongside all of the other ones in the mapping repository.

Enhanced Semantic Annotation Workflow:

The SIFR BioPortal offers natural language-based enhancements of the Annotator, first making it available for French text, and also adding three scoring methods to rank semantic annotations by order of importance [40]. It also introduced significant improvements that support clinical context detection [22]: in clinical text, the Annotator can detect negation, experiencer (the person experiencing the symptom or event) and temporality. These improvements were eventually made available for English in OntoPortal.

Extended Ontology Metadata Model and Instances:

To facilitate the ontology identification and selection process and promote FAIRness, AgroPortal implemented an extended metadata model based on MOD1.4 [41] to better support descriptions of ontologies and their relations. Such a model enabled multiple features in the portal [5] such as: additional filtering options when selecting ontologies, a Landscape page which shows synthesized metadata-based analytics for all the ontologies in the portal, and FAIRness assessment. AgroPortal also now supports OWL instances, in addition to the classes and properties in the standard OntoPortal, and displays the instances in the user interface.

Ontology FAIRness Assessment with O’FAIRe:

AgroPortal implemented the Ontology FAIRness Evaluation (O’FAIRe) methodology [6, 42] in a tool that automatically assesses the level of FAIRness (i.e., the degree to which a digital object adheres to the FAIR principles) of semantic artefacts within the portal. The assessment relies on 61 questions answered using the extended ontology metadata or the portal’s own services. When working on O’FAIRe, we demonstrated the importance of relying on ontology repositories to harmonize and harness unified metadata and thus allow FAIRness assessment.

Extended SKOS Support:

AgroPortal added new functions to support SKOS resources, as SKOS support in the standard OntoPortal code is still limited [43]. The new functions handle and represent SKOS concept schemes, collections and SKOS-XL elements if used in the displayed thesaurus. AgroPortal offers innovative browsing approaches to discover and navigate concepts in SKOS thesauri that make extensive use of schemes and collections.

Assigning DOIs and Connecting with VocBench:

EcoPortal development focused on improving the provenance aspects of OntoPortal and supporting the collaborative creation and maintenance of semantic artefacts, in particular SKOS thesauri. EcoPortal added the ability to: (i) graphically administer groups and categories of semantic artefacts and (ii) request a Digital Object Identifier (DOI) for resources hosted in the portal using DataCite services (https://datacite.org/dois.html). The DOI assignment depends on an editorial workflow that evaluates the ontology’s maturity and pertinence to the ecological domain. To support collaborative work on the portal’s semantic artefacts, EcoPortal integrated a connector to the VocBench 3 system (https://vocbench.uniroma2.it), which provides a web-based, multilingual, collaborative development capability.

4 OntoPortal Open-Source Project Organization

The OntoPortal Alliance has a main goal of synchronizing and sharing research and development efforts. The group’s motivations are: (i) to represent OntoPortal adopters and end users; (ii) to maximize the OntoPortal state-of-the-art service portfolio; (iii) to improve OntoPortal software while managing several parallel and different installations; (iv) to increase semantic uptake in science communities; and (v) to increase the ecosystem’s long-term support.

The Alliance is committed to being an open community and is working to ease participation by providing installation and deployment procedures, detailed documentation, and, in the future, training and tutorials for all stakeholders. We spend a considerable amount of time creating, supporting and documenting this resource for the community (e.g., an average of 4 support emails per day for BioPortal and 3 per week for AgroPortal). In 2022, we launched an annual 3-day workshop [44] that we see as key to fostering our growing community. The Alliance is setting up three documents (https://ontoportal.github.io/documentation) to reach multiple stakeholders: (i) A user guide documents domain- and portal-specific capabilities. This targets OntoPortal end users: either ontology developers who want to host an ontology on one of the portals, or users who want to access and reuse ontologies. The user guide can be specialized by each project in the Alliance to adjust to the specific needs of a community or to document a portal-specific feature. (ii) An admin guide documents how to set up the system and manage the content. This is typically addressed to the technical person involved in deploying, running and monitoring the server, but also to the content administrator who will supervise the semantic artefacts loaded, perform artefact curation, and provide outreach to the end users. (iii) A developer guide documents how to develop new features and make contributions to the core technology, thereby sharing work back with the rest of the Alliance.

The code packaged and running within the appliance is available on the OntoPortal GitHub (https://github.com/ontoportal) and licensed under BSD-2, so every administrator or developer can easily get the relevant branches or forks and redeploy the code in the appliance. We strongly encourage Alliance partners (and other open-source contributors) to fork the OntoPortal GitHub repositories to enable traceability and collaborative contributions via their pull requests. Besides code sharing, we use GitHub for issues, discussions, decision-making and overall project management. Wherever feasible, the OntoPortal project follows best practices for developing and supporting open-source projects. The documentation and the OntoPortal website materials are being enhanced and made more community-maintainable. Both have also been ported to GitHub and can be maintained by the community like any other technical project (Table 2). These mechanisms allow all community members to improve the public presence of both the software and the organization.

The key idea behind the deployment of community- or domain-specific semantic repositories is to provide closer and better connections to the end users in those communities. To address its user communities, the Alliance relies on the support provided by each project deploying an installation. We have implemented a free licensing system that we use to trace the many applications of the technology, identify potential collaborators and Alliance members, and maintain an ongoing connection to the adopters, so that we can notify them of timely improvements and get feedback. In some cases, if the Alliance collaborates with industry or the private sector, we discuss the appropriate terms and conditions and possible financial participation.

Table 2. OntoPortal project description repositories (https://github.com/ontoportal).

5 Usage of the OntoPortal Technology

In place of an evaluation section, we briefly present the uses of our technology, both in public and open repositories and in local, private and temporary deployments. We believe that the choice to reuse our technology, made by such a large variety of projects and use cases in multiple scientific domains, is the best assessment of its value.

5.1 Current Open Domain or Project Specific OntoPortal Installation

In September 2022, we conducted a survey among the 10 main Alliance participants to date (Table 3). We obtained a sense of a typical OntoPortal installation: a public (open and free) community repository where anyone can contribute ontologies, with on average 50 to 60 ontologies, of which more than 50% are exclusive to that repository (that is, unavailable in any other OntoPortal repository). The content is generally multilingual (77%), despite a lack of support for this in the core software, and mostly in OWL or SKOS format. Ontologies are mostly added by content administrators performing significant or moderate content curation, even when end users can also add ontologies. This tends to change with broader adoption of the portals. Indeed, a typical OntoPortal installation is used in practice by a few groups (dozens to hundreds of users) and run by a team of one to three people. If this team includes a developer, the portal tends to develop new functionalities.

In the survey, we requested and received detailed information about several facets of the projects. The most important reason people wanted to run an OntoPortal instance was the value of running a community-specific ontology repository, while the least important reason was BioPortal’s reliability. Most installations have not determined any policy for adding ontologies to the collection, and have done relatively little outreach (it is likely too soon for many projects). There is a lot of interest in adding diverse features, and several responses alluded to improving the ability to reuse ontologies in various ways. Many groups expressed interest in leading a shared development activity.

Table 3. Current members of the OntoPortal Alliance as of early 2023.

5.2 Other Running Installations of the OntoPortal Technology

Beyond the domain-specific portal reuses in the Alliance, the OntoPortal technology is deployed by many external parties with other objectives. For instance, hospitals reuse the technology in-house to use services such as the Annotator on sensitive data.

In the past, those uses of the OntoPortal technology were hard to track, since users provided no feedback or reports to the OntoPortal providers unless they needed explicit support. Through 2015, the virtual appliance file was downloaded or deployed from Amazon Machine Images more than 140 times. Since version 3.0 of the OntoPortal software in 2020, the appliance has incorporated a “call home” feature and a free registration solution that together help track the number and status of other OntoPortal installations. In the past 3 years, 98 unique accounts have registered 135 OntoPortal appliances. In 2022, 60 unique appliance IDs called home, including 19 running in Amazon Machine Instances. These numbers demonstrate the broad adoption of the OntoPortal technology beyond the Alliance and public repositories.

6 Perspectives and Discussion

The Alliance members are working on multiple technical improvements of the OntoPortal technology. These improvements include: (i) multilingual support for artefact content; (ii) internationalization of the user interface; (iii) a fully SSSOM-compliant mapping repository with enhanced mapping features (e.g., connections to third-party tools for ontology alignment); (iv) Docker container-based setup and installation of OntoPortal; (v) a SPARQL query editor and viewer; (vi) a consolidated and harmonized metadata model; (vii) a historical view of the evolution of a semantic artefact (metrics, differencing); (viii) decoupling of dependencies on the current triple-store backend to support alternative triple-stores; (ix) refactoring of the feedback and notes mechanism, and connection of modification requests to codebase repositories such as GitHub; (x) accelerated, simplified, and more transparent ontology submissions; (xi) federated search capabilities across multiple repositories; (xii) new and improved user interfaces; and (xiii) improved overall system performance.

With the proliferation of semantic artefact catalogues, whether or not they use the OntoPortal technology, the semantic ecosystem for scientific research becomes more complex for many end users. Because scientific domains and projects overlap, some ontologies might be hosted in multiple portals, or conversely a set of ontologies may be split across multiple portals, and different versions of the same ontology may be presented in different catalogues. Our philosophy is that ontology developers should decide how their ontology should be deployed, whether in one or many portals. On our side, we will work to provide the best possible federation of our portals. Our challenge is to coordinate better so that semantic artefact metadata and versions are synchronized, ontology developers are aware of where their resources are deployed (without having to explicitly deal with multiple portals), and our services federate the results of services like search, annotation and recommendation.

In the context of FAIR-IMPACT, a Horizon Europe project within the European Open Science Cloud, some members of the Alliance and other parties are investigating the life cycle of FAIR semantic artefacts. We are reviewing governance models for semantic artefacts and discussing the role the catalogues have to play in this governance. Within this project, we are also building on the Metadata for Ontology Description and Publication initiative (https://github.com/FAIR-IMPACT/MOD) to provide a DCAT-based vocabulary to describe semantic artefacts and their catalogues. We expect to make these descriptions available via a standard application programming interface that ontology repositories and semantic artefact catalogues, including the OntoPortal technology and also extending beyond it, can implement to improve their interoperability and ease the reuse of their resources.
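As a rough illustration of what a DCAT-based description of a catalogue and its semantic artefacts might look like, the following minimal Python sketch builds a JSON-LD document using only standard DCAT and Dublin Core terms. It is not the MOD vocabulary itself, whose specific terms are deliberately omitted here; the function name, the example URIs, and the choice of properties are assumptions made for illustration only.

```python
import json

# Namespaces of the W3C DCAT vocabulary and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}


def describe_catalogue(catalogue_uri, title, artefacts):
    """Return a JSON-LD dict describing a catalogue and its artefacts.

    Hypothetical helper: each artefact is a dict with the keys
    'uri', 'title' and 'license' (all illustrative).
    """
    return {
        "@context": CONTEXT,
        "@id": catalogue_uri,
        "@type": "dcat:Catalog",
        "dcterms:title": title,
        # dcat:dataset links a catalogue to the resources it lists.
        "dcat:dataset": [
            {
                "@id": artefact["uri"],
                "@type": "dcat:Dataset",
                "dcterms:title": artefact["title"],
                "dcterms:license": {"@id": artefact["license"]},
            }
            for artefact in artefacts
        ],
    }


# Example values are placeholders, not real catalogue content.
example = describe_catalogue(
    "https://example.org/catalogue",
    "Example semantic artefact catalogue",
    [
        {
            "uri": "https://example.org/ontologies/EX",
            "title": "Example ontology",
            "license": "https://creativecommons.org/licenses/by/4.0/",
        }
    ],
)
serialized = json.dumps(example, indent=2)
```

A catalogue exposing such a document through a common interface would let a client list artefacts and their licenses without knowing which software runs the catalogue, which is the kind of interoperability the paragraph above describes.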

The OntoPortal Alliance has many other development opportunities beyond the ongoing tasks described above. Each member of the Alliance brings a unique vision and potential for improvements to the software. These improvements will make the OntoPortal systems steadily more interoperable, more interconnected, more powerful, and easier to install, operate, and update. However, these development opportunities are not without challenges. Creating a common open-source project, together with an emerging organization and governance model to coordinate changes and evolutions, is only the beginning of the work needed for a robust collaborative software capability. Achieving successful results requires a combination of factors: community commitment to a common, yet improvable, technical solution; technical approaches for accepting modifications that allow each participating organization the autonomy to select its own system configuration; funding commitments that allow both the OntoPortal Alliance and its contributing members to thrive; and continuing buy-in to the common mission of the Alliance.

7 Conclusion

In the semantic web and linked open data world, the impact of BioPortal is easily illustrated by the famous Linked Open Data cloud diagram, which since 2017 has included ontologies imported from the NCBO BioPortal (most of the Life Sciences section). We would like to replicate this impact in multiple scientific domains. In [5, 6], we argued for the importance of ontology repositories in making semantic artefacts FAIR. In this paper, we have presented a domain-agnostic, open and collaboratively developed technology for building such repositories.

The demand for semantic services can be seen not just in the growing deployment of OntoPortal systems, but also in the increasing presence and capabilities of other semantic artefact catalogues, often developed independently with ad hoc technology and brand-new code. We believe the timing, and the maturity of the community, is right to invest some energy in a common, shared yet customizable and adaptable technology for semantic artefact catalogues. The OntoPortal Alliance, with its OntoPortal technology and worldwide base of members and system users, is uniquely positioned to move semantic artefact catalogues and ontology repositories to the next level of adoption and value to the research community. We anticipate accelerated progress and engagement from researchers, Alliance members, funders, and sponsors as we pursue the Alliance mission.

Resource Availability Statement