1 Introduction

The FAIR Guiding Principles articulate the behaviors expected from digital artifacts that are Findable, Accessible, Interoperable and Reusable by machines [1]. Although “FAIR” is arguably an established term already, the FAIR Principles do not explicitly consider the actual implementation choices that enable FAIR behaviors [2]. For example,

  • Principle F2 states that data should be described with rich metadata, but leaves the definition of “rich” and other findability requirements to the discretion of the domain community (the definition varies from one stakeholder, domain, and application to another; examples include CERIF, the DataCite Metadata Schema, and ISO 19115/ISO 19139);

  • Principle I1 requests that a formal, accessible, shared, and broadly applicable language for knowledge representation be used to embed machine-actionable semantics (e.g., RDF/OWL, RuleML, CycL) but it gives no recommendation on how to select the best option in any particular use case;

  • Principle R1.1 requests that data and metadata be released with clear and accessible usage licenses, but does not specify which of the many digital licensing schemes should actually be applied (e.g., Creative Commons Attribution 4.0 International Public License or Open Data Commons Public Domain Dedication and License).

In each case, the FAIR Principles leave implementation choices to the communities of practice, permitting maximum freedom to operate while at the same time ensuring a high degree of automated Findability, Accessibility, Interoperability, and Reusability. This freedom to operate, while necessary and desirable, has led to the development of a variety of technical solutions that carry the inherent risk of reducing compatibility between stakeholder communities. For example, although initiatives like the European Strategy Forum on Research Infrastructures (ESFRI) or the Research Data Alliance (RDA) are driving the adoption of FAIR practices, different domain communities nonetheless have their own, often well-established implementation preferences and priorities for data reuse. Hence, coordinating a broadly accepted, widely used FAIR implementation approach remains a global challenge.

In an effort to accelerate broad community convergence [3] on FAIR implementation options, the GO FAIR FIP Working Group [4] launched the development of the FAIR Convergence Matrix, a collaborative online resource consisting of all the FAIR implementation choices made by different domain communities [5]. This ongoing activity aims to create a machine-actionable description of the emerging FAIR implementation landscape. This will enable stakeholders to systematically optimise implementation choices with respect to, for example, more streamlined FAIR deployments while at the same time securing some guarantees on the FAIR maturity levels of those deployments and the degree of interoperation that can be expected with Resources created by other communities.

In this paper we first describe the different components of the FAIR Implementation Conceptual Model and the workflow for the creation of community-specific FAIR Implementation Profiles (Sect. 2). In Sect. 3 we discuss the potential benefits of this approach, how the FAIR Implementation Profile relates to FAIR Principle R1.3, and why this contribution is novel in relation to previous work. In Sect. 4 we conclude by describing upcoming activities and planned improvements to the ongoing work.

2 The FAIR Implementation Profile Conceptual Model and Its Supporting Components

2.1 FAIR Implementation Profiles

The FAIR Implementation Profile (FIP) conceptual model [6] is based on the developing GO FAIR Ontology [7] and is composed of two principal concepts: FAIR Implementation Community and the FAIR-Enabling Digital Resource.

By FAIR Implementation Community (Community) we mean a self-identified organization (composed of more than one person) sharing a common interest that aspires to the creation of FAIR data and services. Typically, a Community forms around a knowledge domain, around participation in a research infrastructure, or around commitment to a policy jurisdiction such as those found in a university, a hospital, a province or a county. As such, Communities can be formal (e.g., scholarly societies) or informal (e.g., working groups), large or small, influential or not, long-lived (e.g., industry associations) or temporary (e.g., funded projects). It may also be useful to identify sub-communities that may be related to specific repositories when dealing with different types of resources (e.g., sensors). In any case, a Community must itself be represented with FAIR (meta)data, by procuring a globally unique and persistent resolvable identifier (GUPRI), usually via a registration process. Every Community registers a Community Data Steward (a single person representing data stewards of the Community who provides a contact point for FIP creation and who likely works in a team of experts coordinating FIP development).

By FAIR-Enabling Digital Resource (Resource) we mean any digital object that provides a function needed to achieve some aspect of FAIRness and is explicitly linked to one or more FAIR Principles. Resources include, for instance, datasets, metadata, code, protocols, compute resources, computed work units, data policies, data management plans, identifier mechanisms, standards, FAIRification processes, FAIRness assessment criteria and methods, data repositories and/or supporting tools. We define an Implementation Choice as the decision of a Community to reuse a Resource from among existing implementations. If, however, none of these appear suitable, the Community may then accept the Implementation Challenge to create and implement a new solution to close the identified gap (note that every Resource that forms a Choice was itself once a Challenge). Choices and Challenges are made on the basis of Considerations that involve numerous Community-specific factors including FAIR Requirements and various sources of Constraints endemic to the Community.

Since early 2019, prototype FIPs have been created for roughly 50 communities (including ESFRIs [8] and projects like ENVRI-FAIR [9]) as a means to achieve practical development of the conceptual model and its representation. An advanced example of a FIP created by the GO FAIR Virus Outbreak Data Network can be found in both human-readable (PDF) [10] and machine-actionable (JSON) formats [11].

2.2 FAIR Implementation Questionnaire

Although Community Data Stewards may build a FIP de novo, in practice the task can be facilitated and standardized when they are prompted via a questionnaire to systematically list the implementation choices that correspond to each of the FAIR Principles. These choices are drawn from an accumulated listing of existing or proposed Resources. The GO FAIR FIP Working Group has developed the FIP questionnaire in a series of hackathons since January 2019, carefully aligning questions and accommodating the complex space of potential answers with the aim to ensure machine-actionable FIPs. The current version 4.0 questionnaire (with 21 questions covering the FAIR Principles) is accessible on GitHub [12].

The FIP questionnaire is currently implemented in the Data Stewardship Wizard (DSW) [13]. The DSW platform provides an efficient means to capture implementation Choices and Challenges by directly linking to canonical references for Resources issued in public registries, such as FAIRsharing.org (see Fig. 1). In turn, the DSW tool enables the FIP to be output in various file formats, both human- and machine-readable, and supports the development of custom export templates. In this case, the DSW has been repurposed from its original application as a data management/stewardship planning tool into a FIP capture tool by substituting the data stewardship knowledge model (i.e., an extensible and evolvable definition of a questionnaire) with a newly created one corresponding to the FIP questionnaire. As such, we refer to the new knowledge model and interface as the “FIP Wizard”, which is publicly accessible [14].

Fig. 1. An implementation choice of the VODAN Community in the FIP Wizard for F1

2.3 FIPs as FAIR Digital Objects

FIPs created in the FIP Wizard can be represented as collections of assertions having the form <Community><Chooses to reuse><Resource> or <Community><accepts the Challenge to build><Resource>. All assertions having the same Community as the subject compose the FIP for that Community. This graph structure, where a single subject has multiple predicate-object pairs, is called a Knowlet [15]. The Knowlet structure of the FIP can itself be encapsulated as a FAIR Digital Object (FDO) having GUPRIs, type specifications and other FAIR metadata components [16]. As new FIPs are created, and existing FIPs are revised with alternative choices or extended when novel technologies are introduced, the FIP FDO is updated and versioned with provenance trails. These features allow FIPs to have ownership/authorship and to be cited, and FIPs will therefore accumulate value for their creators. This will incentivise the ongoing curation and maintenance of the FIP by its Community and garner the reputation and trust that engender reuse by others when making their own FAIR implementation choices. Moreover, applications that perform automated inference over Knowlets will open a range of potential analyses assisting in the optimization of FIPs or clusters of FIPs with respect to well-defined convergence objectives. Because the FIP Wizard captures and outputs Community-specific FIPs as JSON, we have written custom pipelines to convert the FIP Wizard format to nanopublications [17] that can then be permanently published on the decentralized, federated nanopublication server network [18].
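The assertion and Knowlet structure described above can be illustrated with a minimal Python sketch. It is not the FIP Wizard JSON format or the nanopublication encoding: the class name, the predicate strings and the identifiers (such as community:A) are hypothetical placeholders chosen for illustration, where a real FIP would use GUPRIs.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One FIP assertion: <Community><predicate><Resource>."""
    community: str  # subject (placeholder identifier, not a real GUPRI)
    predicate: str  # hypothetical label for the two assertion forms
    resource: str   # object (placeholder identifier, not a real GUPRI)


def build_knowlets(assertions):
    """Group assertions by subject.

    All predicate-object pairs that share the same Community form
    that Community's Knowlet, i.e. its FIP.
    """
    knowlets = defaultdict(list)
    for a in assertions:
        knowlets[a.community].append((a.predicate, a.resource))
    return dict(knowlets)


# Hypothetical example data, for illustration only.
assertions = [
    Assertion("community:A", "chooses_to_reuse", "resource:identifier-service"),
    Assertion("community:A", "accepts_challenge_to_build", "resource:metadata-schema"),
    Assertion("community:B", "chooses_to_reuse", "resource:identifier-service"),
]

fips = build_knowlets(assertions)
for community, pairs in fips.items():
    print(community, pairs)
```

Grouping by subject is the only operation shown here; versioning, provenance and FDO metadata would sit on top of such a structure and are outside the scope of this sketch.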

2.4 The FIP Convergence Matrix

Over time, as numerous Communities independently create FIPs (whether manually or via tools such as the FIP Wizard) it will be possible to accumulate a comprehensive listing of FAIR-Enabling Resources reflecting the current technology landscape supporting FAIR data and services. Based on patterns of use and reuse of existing Resources, transparent strategies for optimal coordination in the revision of existing, or the creation of novel FIPs could be derived.

For example, Fig. 2 depicts an idealized repository of FIPs, each column representing a Community, each row a Resource linked to the appropriate FAIR Principle(s). The list of implementation choices for each principle might be tediously long but will be filterable on a variety of criteria including the frequency of its use in other research domains, its FAIR maturity level, or its endorsement by trusted organizations such as funding agencies.

Fig. 2. FIP Convergence Matrix with registered Community Choices regarding the use of FAIR-Enabling Resources, which are made available for reuse by other Communities. (Color figure online)

FIPs may be similar or divergent but, in any case, are likely to compose a unique ‘signature’ for each Community. In its simplest formulation, for each Resource listed in rows, a Community may choose to either use (1) or not use (0) that Resource. In this idealized ‘binary’ limit, the FIP could be represented as a bit string (for example, the FIP for Community C in Fig. 2 would be represented as {0,1,0,1,0,1,0,1}). In this binary vector representation, the FIP composes a community-specific ‘fingerprint’ that can be used to map the similarity distribution of FAIR implementation decisions (using, for example, vector matching techniques). As depicted here, Communities A-D have each created distinct FIPs. In contrast, Communities E-H have chosen to reuse the profile of Community C (red arrows). Community I has also adopted the FIP of Community C but, in this case, with 2 modifications (red circles for Resource 3 and Resource 5). Community J has adopted the exact FIP of Community D (blue arrow). FIP reuse leads to increasing similarity among FIPs in the Matrix, which can be taken as a metric for convergence. In a manner similar to the Knowlet representation of the FIP, the fingerprint can itself be treated as FAIR data, including its representation as a FAIR Digital Object.
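As a hedged illustration of how binary fingerprints could be compared, the following Python sketch computes two common similarity measures. The text only mentions “vector matching techniques” in general; the choice of Hamming and Jaccard similarity is an assumption made here, as is the direction of the two modifications for Community I (the bits for Resource 3 and Resource 5 are simply flipped).

```python
def hamming_similarity(a, b):
    """Fraction of Resources on which two fingerprints agree."""
    if len(a) != len(b):
        raise ValueError("fingerprints must cover the same Resources")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard_similarity(a, b):
    """Shared used Resources divided by Resources used by either Community."""
    if len(a) != len(b):
        raise ValueError("fingerprints must cover the same Resources")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


# Fingerprint of Community C as given in the text.
fip_c = (0, 1, 0, 1, 0, 1, 0, 1)

# Community I: copy of C with Resources 3 and 5 changed
# (assumption: each changed entry is flipped; numbering is 1-based).
modified = {3, 5}
fip_i = tuple(
    bit ^ 1 if pos in modified else bit
    for pos, bit in enumerate(fip_c, start=1)
)

print(hamming_similarity(fip_c, fip_c))  # identical profiles agree everywhere
print(hamming_similarity(fip_c, fip_i))  # 6 of 8 entries agree
print(jaccard_similarity(fip_c, fip_i))
```

Hamming similarity treats shared non-use as agreement, whereas Jaccard similarity counts only Resources that are actually used; which of the two better reflects convergence depends on the analysis objective.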

However, in practice, responses to the questionnaire are more nuanced than binary ‘use/do not use’ and require additional codes or in some cases even free-text responses (for example, from preliminary results working with roughly 50 research communities throughout Europe, it is clear that Community Data Stewards often wish to declare “we do not use this Resource yet, but have a project to implement in the next year”). Furthermore, alongside the FIP as a digital fingerprint, it is vital to also publish the Considerations (captured as free text) as a separate referenceable record, in order to make the reasons for the implementation choices and challenges intelligible to others and thereby make FIP reuse better fit for purpose.
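One possible way to accommodate such non-binary responses while still deriving a binary fingerprint is sketched below in Python. The three response codes and the count_planned switch are hypothetical; the actual answer codes of the FIP questionnaire are not specified in this text.

```python
from enum import Enum


class Response(Enum):
    """Hypothetical response codes; the questionnaire may define others."""
    NOT_USED = "not used"
    USED = "used"
    PLANNED = "not used yet, implementation planned"


def to_binary(responses, count_planned=False):
    """Project nuanced responses onto the idealized 0/1 fingerprint.

    count_planned decides whether a planned implementation is counted
    as use (1) or as non-use (0).
    """
    bits = []
    for r in responses:
        if r is Response.USED:
            bits.append(1)
        elif r is Response.PLANNED and count_planned:
            bits.append(1)
        else:
            bits.append(0)
    return bits


answers = [Response.USED, Response.PLANNED, Response.NOT_USED]
print(to_binary(answers))                      # current state
print(to_binary(answers, count_planned=True))  # anticipated state
```

Keeping the richer codes and projecting them only at analysis time preserves the information that a strictly binary record would discard.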

The ultimate goal of this analysis is to align FIPs from different Communities in order to achieve convergence on the reuse of existing Resources and interoperation between the FAIR data and services of each Community. Hence, we refer to a FIP repository as the FIP Convergence Matrix. Although we can be confident that the FIP Wizard and the Nanopublication Server Network which currently store FIPs are reliable repositories, the FIP Convergence Matrix should eventually be sustained by a global and trusted data-mandated organization as an Open and FAIR resource, whether it be a centralized registry or a distributed network of repositories.

2.5 An Emerging FIP Architecture and Workflow

The FIP conceptual model and its various supporting components that are in development by the GO FAIR FIP Working Group compose a workflow for FIP creation and reuse.

The process of FIP creation begins by defining the Community description itself as a Resource. This includes the creation of a corresponding GUPRI and designation of a Community Data Steward. This minimal Community template has been used in the Nanobench tool [19] to mint nanopublications for a Community with a GUPRI and metadata like its research domain, time/date and versioning information [20].

Following the completion of the FIP questionnaire in the FIP Wizard, all FAIR implementation choices can be linked to the Community, creating an unambiguous machine-actionable FIP. FIPs can then be exposed as FAIR Digital Objects which in turn can be collected in the FIP Convergence Matrix repository yielding an overview of the FAIR implementation landscape. The FIP Convergence Matrix composed of FIP fingerprints facilitates systematic analyses over these landscapes leading to FIP optimization.

3 Discussion

3.1 FIPs and FAIR Convergence

Entering the FIP Wizard, confronting a complicated questionnaire that is likely to exceed the expertise of any single person, and then researching and declaring FAIR implementation decisions is a tedious and costly investment. However, once made, the FIP FDOs are reusable and can be shared with others in a number of important ways. This has the potential to lead to the rapid convergence and scaling required to realise the Internet of FAIR data and services in short time frames. This is especially true for FIPs authored or sanctioned by trusted domain authorities such as scholarly societies, scientific unions, GO FAIR Implementation Networks, or industry associations. Shareable and reusable FIPs can be used as a ‘default setting’ to kick-start FIP creation by other communities that aspire to adopt FAIR practices. In addition, organizations having cross-disciplinary or administrative mandates - such as repositories and national archives, funding agencies or publishers - may define FIPs that would be seen as target implementation profiles by data producers. Likewise, data-related organizations, such as the GO FAIR Foundation, the Research Data Alliance, CODATA, and the World Data System, could also create and endorse FIPs as they do for other best practices. As more FIPs accumulate, it should be possible to harness positive feedback where FIPs inform the creation of other FIPs, potentially leading to easily reusable solutions and rapid convergence in this otherwise complex space. The reuse of carefully crafted FIPs has at least two important, and deeply related, applications:

First, Trusted FIPs as Defaults in the FIP Convergence Matrix:

The optimized FIPs composed, maintained and endorsed by trusted authorities can be offered in the Convergence Matrix as ‘one-click’ defaults for other communities to adopt and reuse, in whole or in part, as they see fit.

Second, Trusted FIPs as Defaults in Data Stewardship Plans:

Once a FIP has been published in the FIP Convergence Matrix, it can be seen as the FAIR component of any data management/stewardship plan. The FIP could even inform community-specific ‘autocomplete’ functions in data management/stewardship planning tools assisting the data steward.

Community-declared FIPs can be objectively evaluated on the basis of different attributes. For example, by inspecting each Resource listed in the FIP, it will be possible to calculate the degree to which the FIP maximises the reuse of existing Resources or the degree to which the FIP ensures interoperability. In addition, FIPs can be evaluated against various maturity indicators, while taking into account actual cost estimates for implementation.
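As one illustrative example of such an attribute, the degree of reuse could be approximated as the share of a FIP's assertions that are Choices rather than Challenges. The Python sketch below is an assumption made for illustration and not a metric defined by the FIP model; the predicate label and the resource identifiers are hypothetical.

```python
def reuse_fraction(fip):
    """Share of a FIP's assertions that reuse an existing Resource.

    fip is a list of (predicate, resource) pairs; the predicate label
    "chooses_to_reuse" is a hypothetical marker for a Choice, and any
    other predicate is treated as a Challenge.
    """
    if not fip:
        raise ValueError("FIP contains no assertions")
    reused = sum(1 for predicate, _ in fip if predicate == "chooses_to_reuse")
    return reused / len(fip)


# Hypothetical FIP with three Choices and one Challenge.
example_fip = [
    ("chooses_to_reuse", "resource:identifier-service"),
    ("chooses_to_reuse", "resource:usage-license"),
    ("chooses_to_reuse", "resource:repository"),
    ("accepts_challenge_to_build", "resource:metadata-schema"),
]

print(reuse_fraction(example_fip))  # 3 of 4 assertions are Choices
```

A fuller evaluation would also weight Resources by maturity or by implementation cost, as the preceding paragraph suggests.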

As such, the FIP can itself be systematically optimized through judicious consideration and revision of implementation choices. Given the potential economic impact of “going FAIR” [21], there will likely emerge sophisticated FIP optimisation applications that could even include machine learning approaches that offer “suggestions” on how to improve a FIP for a given purpose. Advanced stages of FIP analysis will eventually lead to the identification and examination of FAIR technology ‘gaps’, spurring innovation of next-generation FAIR technologies.

FAIR Principle R1.3 states that “(Meta)data meet domain-relevant Community standards”. This is the only explicit reference in the FAIR Principles to the role played by domain-specific communities in FAIR. It is interesting to note that an advanced, online, automated FAIR maturity evaluation system [22] did not attempt to implement a maturity indicator for FAIR Principle R1.3. It was not obvious during the development of the evaluator system how to test for “domain-relevant Community standards”, as there exists, in general, no venue where communities declare data and metadata standards and other FAIR practices publicly and in machine-readable formats. We propose that the existence of a valid, machine-actionable FIP be adopted as a maturity indicator for FAIR Principle R1.3.

3.2 Related Work

Although the FAIR Guiding Principles are widely cited (~3000 citations of [1]) and strongly supported by the EOSC initiative to push Europe towards a culture of open research, there are currently no broadly accepted FAIR solutions. Most current work addresses FAIR data assessment approaches, be it quantitative measurements with Maturity Indicator tests [22] or qualitative assessment tools like those from DANS [23], CSIRO [24] or the RDA FAIR Data Maturity Model Working Group [25]. Relevant work on the uptake of good FAIR practices is being driven by the FAIRsFAIR project, which recently issued FAIR semantic recommendations [26]. Also, standardization efforts such as the CoreTrustSeal certification procedures [27] will leverage the adoption of FAIR data management practices for trustworthy data repositories. As for tools, the authors of [28] analysed and commented on the current trends and convergence in data management tools with respect to FAIR data stewardship and machine-actionability.

Other efforts aim to foster harmonisation on specific aspects of the FAIR Principles, or focus on a specific domain. The ENVRI-FAIR project emphasizes the need to implement common FAIR policies and interoperability solutions across environmental research infrastructures. One way to foster convergence is to provide technical demonstrators for research infrastructures that adopt FAIR implementations offered by others [29]. The RDA I-ADOPT WG is developing an Interoperability Framework for seamless cross-domain terminology alignment for observable property descriptions [30]. In an effort to support and harmonise metadata applications toward FAIR, the GO FAIR initiative has launched a systematic and scalable approach to the creation of machine-actionable metadata called Metadata for Machines (M4M) Workshops [31]. Against this background, the FIP approach is novel in the sense that it offers a transparent vehicle for very specific, yet open and flexible, Community-based solutions for each of the FAIR Principles.

4 Conclusion

FIP creation is not a goal in itself. The ultimate objective is to accelerate convergence onto widespread FAIR implementations. This also calls for a coordinated effort to create an agreed compilation of FAIR-Enabling Resources. The practical testing and uptake of the FIP conceptual model and its supporting tools signal promising applications across a broad spectrum of knowledge domains: from environmental sciences, where ENVRI-FAIR uses the FIP approach in its recurring FAIR assessment evaluation [9], to life sciences, where the GO FAIR Virus Outbreak Data Network (VODAN Implementation Network) has now published its version 1.0 FIP [10, 11]. FIP creation also features prominently in a series of hackathons leading up to the GO FAIR/CODATA Convergence Symposium 2020 [32], where five diverse communities attempt to demonstrate FIP-mediated FAIR convergence.