1 International Data Spaces

Data sovereignty is a fundamental aspect of the International Data Spaces (IDS). It can be defined as a natural person’s or corporate entity’s capability of being entirely self-determined with regard to its data. This is the main reason why the International Data Spaces initiative got its name in 2015 [1]. A Data Space represents a data sharing concept without central storage. Data remains at its source and is shared only when needed. Data providers thus remain sovereign over their data, and whenever data is shared, the IDS has to ensure that usage policies are attached to it, which systems and users can follow. To this end, the IDS aims at meeting the following goals.

1.1 Goals of the International Data Spaces

Trust is a prerequisite for data sharing in a data ecosystem. A participant has to trust the systems themselves, and also trust that other participants who receive valuable data use it only in accordance with the usage policies defined by the data provider. Thus, trust is the basis of the International Data Spaces. Each participant and each software component in the IDS is certified before being granted access to the ecosystem.

Security is strongly coupled with trust. All systems in the IDS have to meet state-of-the-art security requirements in order to guarantee trust and data sovereignty. Thus, security requirements are also part of the certification criteria.

Data sovereignty is a fundamental aspect of the International Data Spaces (IDS). It can be defined as a natural person’s or corporate entity’s capability of being entirely self-determined with regard to its data. This means that a data owner can define usage restrictions on their data before sharing it with data consumers. Data consumers must accept these usage restrictions.

Data ecosystems enable new business models that individual actors cannot realize on their own, because no single actor has all the data it needs to offer innovative services by itself. Therefore, a data ecosystem needs a data space to enable these new innovative services.

Standardized interoperability is needed to build a data space, because different data ecosystems exchange different kinds of data in different formats and over different protocols. Only when interoperability is standardized can every system interoperate within the IDS. Therefore, the IDS architecture is defined in a reference architecture model [2], data and endpoints are described semantically in the information model, and certification ensures that every system follows the architecture and uses the information model in the defined way. In addition, DIN SPEC 27070 defines the IDS Connector.

1.2 Reference Architecture Model

In the following, we provide an overview of the technological components inside the IDS Reference Architecture Model, starting with the most important technological building block of the IDS, the International Data Spaces Connector.

Figure 3.1 shows an overview of the IDS architecture. Alongside the Connector, additional key services are essential for a successful realization of the IDS. The following services are defined by the IDS.

Fig. 3.1 Reference architecture model [3]. ©2021, International Data Spaces Association. Used under permission from International Data Spaces Association

1.2.1 The International Data Spaces Components

Connector

As a technological building block of the IDS, the Connector ensures that participants maintain sovereignty over the data. At the same time, it functions as an interface between the internal systems of the IDS participants and the IDS ecosystem itself.

Depending on the configuration, the tamper-proof Connector hosts a variety of system services ensuring, for example, a secure bidirectional communication, enforcement of usage policies upon exchanged content, system monitoring, and logging of content transactions for clearing purposes. The functionality of a generic Connector may further be extended by custom software (Data Apps) for data processing, visualization or persistence, etc.

As shown in Fig. 3.2, the IDS Connector can be viewed in the phases of configuration and execution. The structure of an IDS Connector in the execution phase consists of three container types:

  1. The Core Container with basic functions for communication between IDS components (Connectors, Broker, and App Store). This is divided into the internal components Data Router, which manages the communication according to predefined configuration parameters, and Data Bus, which exchanges data with other components or stores data within the connector.

  2. App Store Containers for Applications certified and downloaded from the IDS App Store.

  3. Custom Containers which were not stored in the IDS App Store but deployed by the Connector operator itself.

Fig. 3.2 International Data Spaces Connector Architecture [2]. ©2019, International Data Spaces Association. Used under permission from International Data Spaces Association

The application container management technology allows the different services to run in an isolated and secure environment. The Connector architecture follows the principle of processing data as close as possible to the data source.
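The three container types of the execution phase can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names `Connector`, `Container`, and `ContainerType` are hypothetical and not taken from any IDS implementation, and the rule that a Connector holds exactly one Core Container is an assumption read from the singular wording above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ContainerType(Enum):
    CORE = "core"            # hosts the Data Router and the Data Bus
    APP_STORE = "app_store"  # certified apps downloaded from the IDS App Store
    CUSTOM = "custom"        # deployed by the Connector operator itself


@dataclass
class Container:
    name: str
    kind: ContainerType


@dataclass
class Connector:
    containers: List[Container] = field(default_factory=list)

    def by_kind(self, kind: ContainerType) -> List[Container]:
        """Return all deployed containers of the given type."""
        return [c for c in self.containers if c.kind is kind]

    def deploy(self, container: Container) -> None:
        # Assumption: a Connector runs a single Core Container.
        if container.kind is ContainerType.CORE and self.by_kind(ContainerType.CORE):
            raise ValueError("this sketch allows only one Core Container")
        self.containers.append(container)
```

In a real Connector each entry would correspond to an isolated container managed by the container management technology; the list here only mirrors the classification.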

Identity Provider

The Identity Provider should offer a service to create, maintain, manage, monitor, and validate identity information of and for participants in the IDS. This is imperative for secure operation of the IDS and to avoid unauthorized access to data. The Identity Provider administers self-descriptions and attested (certified) attributes of the connectors and issues tokens as needed for the required attributes of a connector.

  • Each International Data Spaces Connector has a private key with a corresponding X.509v3 certificate (device certificate).

  • In contrast to conventional PKI-based enterprise IDM systems, however, these static certificates are used for authentication only and not for the exchange of identity attributes.

  • Instead, these are exchanged using dynamic tokens that the connectors obtain from an attribute server.
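The split between static certificates and dynamic tokens can be illustrated with a toy attribute server. This is a minimal sketch, assuming the connector has already been authenticated with its device certificate (that step is not modeled). The names `issue_token`, `validate_token`, and `SECRET` are hypothetical, and the HMAC with a shared secret is a stand-in chosen to stay within the Python standard library; a real attribute server would be expected to sign tokens asymmetrically.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key of the attribute server (illustration only).
SECRET = b"demo-attribute-server-key"


def issue_token(connector_id: str, attributes: dict,
                ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Issue a short-lived token carrying the connector's attested attributes."""
    issued = time.time() if now is None else now
    payload = {"sub": connector_id, "attrs": attributes, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + signature


def validate_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and the token is unexpired."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None  # tampered or issued by someone else
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        return None  # expired: attributes must be fetched again
    return payload
```

Because tokens expire quickly, a change to a connector's attested attributes takes effect at the next token request without reissuing the device certificate.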

Metadata Broker

A Broker acts as a mediator between data providers offering data and data users requesting data. It also acts as a data source registry. In more detail, a Broker performs the following activities:

  • Provides data providers with functions to publish their data sources

  • Provides data users with functions to search through the data sources of data providers

  • Provides data providers and data users with functions to make agreements on the provision and use of certain data
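The publish and search functions can be sketched as an in-memory registry. This is an illustrative sketch only: `MetadataBroker`, `DataSourceDescription`, and their fields are hypothetical names, keyword matching is a simplifying assumption, and the agreement function from the last bullet is not modeled. Note that the registry holds descriptions only; the data itself stays with the provider.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataSourceDescription:
    provider: str
    title: str
    keywords: Set[str]


@dataclass
class MetadataBroker:
    registry: List[DataSourceDescription] = field(default_factory=list)

    def publish(self, description: DataSourceDescription) -> None:
        """Called by a data provider to register a data source."""
        self.registry.append(description)

    def search(self, keyword: str) -> List[DataSourceDescription]:
        """Called by a data user; matches keywords case-insensitively."""
        needle = keyword.lower()
        return [d for d in self.registry
                if needle in {k.lower() for k in d.keywords}]
```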

Clearing House

The Clearing House provides clearing and settlement services for all financial and data exchange transactions. The Clearing House logs all activities performed in the course of a data exchange. After a data exchange, or parts of it, has been completed, both the data provider and the data consumer confirm the data transfer by logging the details of the transaction at the Clearing House. Based on this logging information, the transaction can then be billed. The logging information can also be used to resolve conflicts (e.g., to clarify whether a data package has been received by the data consumer or not).

The Clearing House supervises the exchange of data (without infringing upon the data sovereignty of the data owners). In more detail, the Clearing House performs the following activities:

  • Supervises and records data exchange transactions

  • Furnishes reports on the search for data sources and on data exchange transactions

  • Supports the rollback of transactions in case of faulty or incomplete data exchange
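The two-sided confirmation described above can be sketched as follows. This is a simplified illustration, not the Clearing House interface: the class and method names are hypothetical, and reporting and rollback are left out. A transaction counts as confirmed, and therefore billable, only once both the data provider and the data consumer have logged it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ROLES = {"provider", "consumer"}


@dataclass
class ClearingHouse:
    # transaction id -> roles that have logged a confirmation
    confirmations: Dict[str, Set[str]] = field(default_factory=dict)

    def log(self, transaction_id: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError("role must be 'provider' or 'consumer'")
        self.confirmations.setdefault(transaction_id, set()).add(role)

    def is_confirmed(self, transaction_id: str) -> bool:
        """True only if both sides have confirmed the transfer."""
        return self.confirmations.get(transaction_id, set()) == ROLES

    def billable(self) -> List[str]:
        return sorted(t for t in self.confirmations if self.is_confirmed(t))

    def disputed(self) -> List[str]:
        """Transactions logged by one side only, e.g. data sent but not received."""
        return sorted(t for t in self.confirmations if not self.is_confirmed(t))
```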

App Store

The IDS promotes the development of a business ecosystem in which participants may develop software (especially data services) and make this software available via the App Store. The App Store Operator performs the following activities:

  • Provides functions by which software developers may describe data services and make these services available to other participants

  • Provides functions by which participants may retrieve and download data services

  • Provides functions for payment and rating of data services

Vocabulary Provider

In order to better define one’s own data, domain-specific vocabularies can be created and made available to all IDS participants. The Vocabulary Provider manages and offers vocabularies (i.e., ontologies, reference data models, or metadata elements) that can be used to annotate and describe datasets. In particular, the Vocabulary Provider provides the Information Model of the IDS, which is the basis for the description of data sources. In addition, other domain-specific vocabularies can be provided. In more detail, the Vocabulary Provider performs the following activities:

  • Provides a central repository for schema and vocabulary information

  • Provides a tool support for collaborative versioning (creating, maintaining, and archiving) of vocabularies and schemas

  • Enables a linkage between the data transferred by the Connector and the vocabulary information

1.2.2 The International Data Spaces Roles

Participants can take on different roles. These roles are grouped into categories according to their level of interaction and organization and are described in detail below (summarized in Table 3.1).

Table 3.1 International Data Space role categories

Category 1: Core Participant

The core participants are involved and required every time data is exchanged in the IDS. Roles assigned to this category are data owner, data provider, data consumer, and data user.

Data Owner

The data owner is defined as a legal or natural person who creates and/or exercises control over the data. This control is enabled by defining usage policies and providing access to data. Data ownership includes at least these two major concepts:

  • To have the (technical) capability and the responsibility to define the usage contracts incl. payment model and usage policies

  • To provide access to the data

Data Provider

The data provider is responsible for providing the data for exchange between a data owner and a data consumer and uses software components that are compliant with the IDS Reference Architecture Model for this purpose. In most cases, but not necessarily, the data provider and data owner are identical. A connection can be established directly between a data provider and a data consumer. To facilitate a data request, the data provider can transmit appropriate metadata to the broker service. Further activities can be the logging of transactions at a clearing house as well as the enrichment or transformation of data by means of data apps.

Data Consumer

The data consumer receives data from a data provider. From a business process modeling perspective, the data consumer is the mirror entity of the data provider; the activities performed by the data consumer are therefore similar to the activities performed by the data provider.

Before connecting to a data provider, the data consumer can search for existing datasets by making an inquiry at a Broker Service Provider.

Data User

Similar to the data owner being the legal entity that has the legal control over its data, the data user is the legal entity that has the legal right to use the data of a data owner as specified by the usage policy. In most cases, the data user is identical with the data consumer. However, there may be scenarios in which these roles are assumed by different participants.

App Provider

App providers develop data apps to be used in the IDS. To be deployable, a data app has to be compliant with the system architecture of the IDS. In addition, data apps can be certified by a Certification Body in order to increase trust in these applications. App providers should describe each data app using metadata (in compliance with a metadata model) with regard to its semantics, functionality, interfaces, etc.

Category 2: Intermediary

Intermediaries act as trusted entities. Roles assigned to this category are Metadata Broker Service Provider, Clearing House, App Store, Vocabulary Provider, and Identity Provider. Only trusted organizations should assume these roles.

The federated architecture of the IDS provides for the operation of (virtually) centralized components that map individual aspects of service delivery within the data space. Namely, these are the core components described above, with the exception of the Connector, which runs in a decentralized manner.

Each of these components must be integrated, operated, and maintained in a functioning data space. These activities are performed by the service provider in its role as intermediary. It should be mentioned here that there can be one service provider for all components or different service providers for individual components.

Category 3: Software and Services

This category comprises IT companies providing software and/or services (e.g., in a software-as-a-service model) to the participants of the IDS. Roles subsumed under this category are App Provider, Service Provider, and Software Provider.

Software Provider

A Software Provider provides software for implementing the functionality required by the IDS. Unlike data apps, software is not provided by the App Store, but delivered over the Software Providers’ usual distribution channels, and used on the basis of individual agreements between the Software Provider and the user (e.g., a data consumer, a data provider, or a Broker Service Provider).

Service Provider

If a participant does not deploy the technical infrastructure required for participation in the IDS itself, it may transfer the data to be made available in the IDS to a Service Provider hosting the required infrastructure for other organizations.

This role also includes providers offering additional data services (e.g., for data analysis, data integration, data cleansing, or semantic enrichment) to improve the quality of the data exchanged in the IDS.

Category 4: Governance Body

The IDS is governed by the Certification Body and the International Data Spaces Association.

International Data Spaces Association

The International Data Spaces Association is a nonprofit organization promoting the continuous development of Data Spaces. It supports and governs the development of the Reference Architecture Model. The International Data Spaces Association is currently organized across several working groups, each one addressing a specific topic (e.g., architecture, use cases and requirements, or certification). Members of the Association are primarily large industrial enterprises, IT companies, SMEs, research institutions, and industry associations.

Certification Body and Evaluation Facility

The Certification Body and the Evaluation Facility are in charge of the certification of the participants and the technical core components in the IDS.

1.2.3 Usage Control

In addition to classic access control, which controls access to certain resources, the IDS reference architecture focuses on data-centric usage control [4]. This aims at enforcing usage restrictions on data even after access has been granted. This is achieved by binding rules to the exchanged data and continuously controlling, for example, how messages are processed, aggregated, or forwarded to further endpoints. On the one hand, the data-centric view allows users to continuously control data flows, not just access. On the other hand, usage control through the IDS connectors ensures that data is not processed in an undesirable way, e.g., by forwarding personal data to public endpoints.

The relevance of usage control can be illustrated by requirements that cannot be met using access control alone. For secrecy, usage control can ensure that classified data is not forwarded by data consumers to unauthorized third parties. Separation of duties can be achieved by ensuring that datasets of, e.g., two competing companies are not aggregated or processed by the same third party in a service. This enables a company to ensure that its own data is not used by a third party to benefit its direct competitors.

Usage control is enforced by monitoring data flows at control points. Within these control points, decision-making engines decide whether to permit, deny, or modify the data as necessary.
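The three possible outcomes of a decision-making engine (permit, deny, modify) can be sketched as a first-match rule evaluation. This is a minimal sketch under stated assumptions: the names `Decision`, `Rule`, `decide`, and `EXAMPLE_RULES` are hypothetical, the `public:` endpoint prefix and the `personal` flag are invented for illustration, and falling back to deny when no rule matches is a design assumption rather than something the reference architecture is quoted as prescribing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class Rule:
    applies: Callable[[dict, str], bool]        # (message, target endpoint)
    decision: Decision
    transform: Optional[Callable[[dict], dict]] = None  # used by MODIFY rules


def decide(message: dict, target: str,
           rules: List[Rule]) -> Tuple[Decision, Optional[dict]]:
    """Apply the first matching rule to a data flow at a control point."""
    for rule in rules:
        if not rule.applies(message, target):
            continue
        if rule.decision is Decision.DENY:
            return Decision.DENY, None
        if rule.decision is Decision.MODIFY:
            if rule.transform is None:
                raise ValueError("a MODIFY rule needs a transform")
            return Decision.MODIFY, rule.transform(message)
        return Decision.PERMIT, message
    return Decision.DENY, None  # assumption: deny when nothing matches


def personal_to_public(message: dict, target: str) -> bool:
    return bool(message.get("personal")) and target.startswith("public:")


def is_personal(message: dict, target: str) -> bool:
    return bool(message.get("personal"))


def drop_name(message: dict) -> dict:
    return {k: v for k, v in message.items() if k != "name"}


EXAMPLE_RULES = [
    Rule(personal_to_public, Decision.DENY),            # never publish personal data
    Rule(is_personal, Decision.MODIFY, drop_name),      # strip the name elsewhere
    Rule(lambda message, target: True, Decision.PERMIT),
]
```

The rule order matters in this sketch: the deny rule for public endpoints has to precede the more general modify rule, otherwise personal data would be forwarded in modified form instead of being blocked.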

The required restrictions must be formally defined by the data owners. User-friendly graphical interfaces are available for this purpose, which transform the specifications into machine-readable output.

The usage restrictions can be attached to the data in two different ways. On the one hand, usage restrictions can be attached directly to the data. For example, decryption can only take place if the usage restrictions are guaranteed to be respected. On the other hand, usage restrictions can be stored in a central instance independently of the data. In this case, the usage restrictions must be exchanged between the systems.
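The two attachment styles differ mainly in where a receiving system looks up the restrictions. The sketch below is illustrative only: `DataPackage`, `resolve_policy`, and the policy field `forward_to_third_parties` are hypothetical, the central instance is reduced to a dictionary, and the cryptographic binding mentioned above (decryption only if the restrictions are respected) is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataPackage:
    data_id: str
    payload: bytes
    policy: Optional[dict] = None  # set when restrictions travel with the data


def resolve_policy(package: DataPackage, central_store: Dict[str, dict]) -> dict:
    """Find the usage restrictions that apply to a received package."""
    if package.policy is not None:
        return package.policy                   # attached directly to the data
    try:
        return central_store[package.data_id]   # stored independently of the data
    except KeyError:
        raise LookupError("no usage policy known for " + package.data_id) from None
```

Refusing to proceed when no policy can be found is an assumption of this sketch; it keeps data without known restrictions from being processed.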

A policy editor (Policy Administration Point, PAP) integrated in the connectors can be used to specify the usage restrictions.

1.3 Certification

Certification is an important element in the IDS to establish a trustworthy data space. A distinction is made between the certification of components and Data Space participants or organizations [5].

1.3.1 Security Profiles

The IDS RAM defines four different security profiles: Base Free, Base, Trust, and Trust+ (Managed Trust). New profiles may be added in the future. The Base Free profile supports the operation of IDS concepts and technologies outside the public trusted Data Space, e.g., for research projects or for operation inside a single security domain, such as inside a company. The Base profile defines the minimal level of trust mechanisms, including the certification process. The Trust profile defines extended security features. The Trust+ or Managed Trust profile relies on Trusted Hardware based on TPM (Table 3.2).

Table 3.2 IDS security profiles and the related dimensions [2]
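A data provider might want to check whether a counterpart's profile is sufficient before sharing data. The following sketch assumes, for illustration, that the four profiles form a strict ordering from Base Free up to Trust+; the names `SecurityProfile` and `meets` are hypothetical, and the actual comparison in the IDS is made along the dimensions listed in Table 3.2 rather than by a single rank.

```python
from enum import IntEnum


class SecurityProfile(IntEnum):
    # Assumed ordering, lowest to highest level of trust mechanisms.
    BASE_FREE = 0   # outside the public trusted Data Space
    BASE = 1        # minimal trust mechanisms, incl. certification
    TRUST = 2       # extended security features
    TRUST_PLUS = 3  # Managed Trust, relies on TPM-based trusted hardware


def meets(offered: SecurityProfile, required: SecurityProfile) -> bool:
    """True if the offered profile is at least as strong as the required one."""
    return offered >= required
```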

1.3.2 Participant Certification

The certification of participants enables trustworthy interaction with these participants. On the one hand, entire data spaces can become trustworthy. On the other hand, in data spaces in which everyone can participate, participants can be filtered according to their trustworthiness.

The trust to be established in the participants of the data space is often essential, especially in an industrial context.

The certification itself focuses on the achievement of defined security levels, which include the infrastructure and the compliance with processes.

1.3.3 Component Certification

The certification of the components focuses on compliance with the required functionalities and security levels that correspond to the International Data Spaces. In addition, interoperability is ensured. A particular challenge is to ensure that correct information is given about which participants can access the data and which software components will be able to access the data.

In summary, certification must be carried out by an approved inspection body and the central certification body of the IDS, which act as developer-independent entities. These entities provide an official certificate, i.e., a signature of the individually relevant manifests that enables reliable technical verification, as well as a digital X.509 certificate for the connector.

1.4 Open Source

The implementations of the IDS ecosystem are by their very nature collaborative, as the results must be verified by stakeholders from different industries before they can become success stories. This open environment will not only speed up the validation process, but it will also help IDS component implementations achieve the highest possible level of quality through collaboration. As a result, technical components of the IDS (Sect. 3.1.2) can be found in the IDSA GitHub repository (https://github.com/International-Data-Spaces-Association), where they are built using best-practice OSS development processes through continuous implementations and feedback from communities and stakeholders.