Abstract
This chapter presents the Trusted Integrated Knowledge Dataspace (TIKD), a trusted data sharing approach based on Linked Data technologies that supports compliance with the General Data Protection Regulation (GDPR) for personal data handling as part of the data security infrastructure for sensitive application environments such as healthcare. State-of-the-art shared dataspaces typically do not consider sensitive data and privacy-aware log records as part of their solutions, defining only how to access data. TIKD complements existing dataspace security approaches through trusted data sharing that includes personal data handling, data privileges, pseudonymization of user activity logging, and privacy-aware data interlinking services. TIKD was implemented on the Access Risk Knowledge (ARK) Platform, a socio-technical risk governance system, and deployed as part of the ARK-Virus Project, which aims to govern the risk management of personal protective equipment (PPE) across a group of collaborating healthcare institutions. The ARK Platform was evaluated, both before and after implementing TIKD, using the ISO 27001 Gap Analysis Tool (GAT), which determines information security standard compliance, and the ISO 27701 standard for privacy information. The results of the security and privacy evaluations indicated that compliance with ISO 27001 increased from 50% to 85% and compliance with ISO 27701 increased from 64% to 90%. This shows that implementing TIKD provides a trusted data security dataspace with significantly improved compliance with the ISO 27001 and ISO 27701 standards for sharing data in a collaborative environment.
Keywords
- Dataspace
- Knowledge Graph
- Trusted data
- Personal data handling
This research has received funding from the ADAPT Centre for Digital Content Technology, funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106_P2) and under the SFI Covid Rapid Response Grant Agreement No. 20/COV/8463, and co-funded by the European Regional Development Fund and the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie grant agreement No. 713567. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
1 Introduction
This chapter relates to the technical priority of data management from the European Big Data Value Strategic Research and Innovation Agenda [23]. It addresses the horizontal concern of data management from the BDV Technical Reference Model and the vertical concerns of cybersecurity. Additionally, this chapter relates to the data for AI enablers of the AI, Data, and Robotics Strategic Research, Innovation, & Deployment Agenda [22].
Sharing sensitive data, for example between healthcare organizations, can facilitate significant societal, environmental, and economic gains such as medical diagnoses and biomedical research breakthroughs. However, as this data is sensitive, organizations understand the importance (and increasing compliance requirements) of securely sharing, storing, managing, and accessing such data. Here, sensitive data is taken to include personal data or personally identifiable information (PII), GDPR special category personal data,Footnote 1 and business confidential or restricted data that does not normally leave an organization. Most recent works in sensitive data sharing have used cryptosystems and blockchain approaches [4, 10, 21]. These approaches were designed to facilitate the sharing of sensitive data, such as sharing patient medical records between healthcare institutions, but need additional infrastructure to support collaborative data sharing environments for the purpose of research or collaborative analysis. This chapter explores the use of a dataspace, a data management framework capable of interrelating heterogeneous data, for the sharing of sensitive data in a collaborative environment. It also illustrates the use of Knowledge Graphs (KGs) in constructing a trusted data sharing environment for sensitive data.
In recent years, KGs have become the base of many information systems which require access to structured knowledge [2]. A KG provides semantically structured information which can be interpreted by computers, offering great promise for building more intelligent systems [24]. KGs have been applied in different domains such as recommendation systems, information retrieval, data integration, medicine, education, and cybersecurity, among others [20]. For example, in the medical domain, KGs have been used to construct, integrate, and map healthcare information [24]. A dataspace integrates data from different sources and heterogeneous formats, offering services without requiring upfront semantic integration [6]. It follows a “pay-as-you-go” approach to data integration where the priority is to quickly set up the fundamental aspects of the dataspace functionality, such as dataset registration and search, and then improve upon the semantic cohesion of the dataspace over time [6, 8]. The dataspace services offered over the aggregated data do not lose their surrounding context, i.e., the data is still managed by its owner, thus preserving autonomy [5].
A dataspace requires security aspects, such as access control and data usage control [2, 17, 18], to avoid data access by unauthorized users. In this sense, access control is a fundamental service in any dataspace where personal data is shared [2, 3, 13, 15, 17, 18]. According to Curry et al. [2], a trusted data sharing dataspace should consider both personal data handling and data security in a clear legal framework. However, there is currently a lack of solutions for dataspaces that consider both the privacy and security aspects of data sharing and collaboration (see Sect. 3). This work explores the following research question: to what extent will the development of a multi-user and multi-organization dataspace, based on Linked Data technologies, personal data handling, data privileges, and data interlinking, contribute to building a trusted sharing dataspace for a collaborative environment? In response, this work proposes the Trusted Integrated Knowledge Dataspace (TIKD)—an approach to the problem of secure data sharing in collaborative dataspaces.
The TIKD is a multi-user and multi-organization Linked Data approach to trustworthy data sharing between an organization’s users. The security access to data follows a context-based access control (CBAC) model, which considers the user and data context to authorize or deny data access. The CBAC implementation is based on the Social Semantic SPARQL Security for Access Control Ontology [19] (S4AC) which defines a set of security policies through SPARQL ASK queries. TIKD defines a privacy protecting user log, based on the PROV ontology, to create user history records. User logs are securely stored following a pseudonymized process based on the Secure Hash Algorithm 3 (SHA-3). The TIKD also provides personal data handling, based on the data privacy vocabularyFootnote 2 (DPV), to comply with the General Data Protection Regulation (GDPR). It implements an interlinking process to integrate external data to the KG based on the Comprehensive Knowledge Archive NetworkFootnote 3 (CKAN) data management tool. The contributions of this research are:
1. A trusted dataspace, based on Knowledge Graph integration and information security management, for collaborative environments such as healthcare
2. An information security management system to securely handle organizational data sharing, personal data, user history logs, and privacy-aware data interlinking by means of a context-based access control that includes data privileges and applies a pseudonymization process for user logs
This work extends TIKD from the former work [7] by updating the access control model, improving the personal data handling process, describing the data classification mechanism, and incorporating a new evaluation process based on the privacy information ISO 27701 standard.
The structure of the remainder of this chapter is as follows: the Use Case section defines the requirements of the ARK-Virus Project. The Related Work section presents the state of the art in dataspace data sharing approaches. The Description of the TIKD section details the services of the dataspace. The Evaluation section presents the results from the ISO 27001 Gap Analysis Tool (GAT) and the ISO 27701 control requirements. Finally, the Conclusion section presents a summary of this research and its future directions.
2 Use Case—Sensitive Data Sharing and Collaboration for Healthcare in the ARK-Virus Project
The ARK-Virus ProjectFootnote 4 extends the ARK Platform to provide a collaborative space for use in the healthcare domain, specifically for the risk governance of personal protective equipment (PPE) use for infection prevention and control (IPC) across diverse healthcare and public service organizations [12]. The consortium consists of the ARK academic team (ADAPT Centre, Dublin City University, and the Centre for Innovative Human Systems, Trinity College Dublin) and a community of practice which includes safety staff in St. James’s Hospital Dublin, Beacon Renal, and Dublin Fire Brigade. Staff across all three organizations are involved in trialing the ARK Platform application, which is hosted in Trinity College Dublin. This creates many overlapping stakeholders that must be appropriately supported when handling sensitive information.
The ARK Platform uses Semantic Web technologies to model, integrate, and classify PPE risk data, from both qualitative and quantitative sources, into a unified Knowledge Graph. Figure 1 illustrates the ARK Platform’s data model supporting the collaborative space for PPE. This model is expressed using the ARK Cube ontologyFootnote 5 and the ARK Platform VocabularyFootnote 6 [9, 12]. The Cube ontology is used in the overall architecture of the ARK Platform to support data analysis through the Cube methodology—an established methodology for analyzing socio-technical systems and for managing associated risks [1, 11]. The ARK Platform Vocabulary allows for the modeling of platform users, access controls, user permissions, and data classifications.
Through the ARK-Virus Project a set of security requirements for the ARK Platform were defined (see Table 1). These requirements included data interlinking, data accessibility (privacy-aware evidence distillation), and secure evidence publication (as linked open data), as priority security aspects. The ARK Platform implements the TIKD to cope with these requirements (see Table 1) and to provide secure management of personal data, pseudonymized data (for compliance with the GDPR, explained later in this chapter), and security logs (for history records).
3 Related Work
This section compares available data sharing approaches with the ARK-Virus requirements (see Table 1) in order to establish their suitability. The approaches analyzed can be divided into two main techniques: dataspace-based and blockchain-based, where blockchain is an Internet database technology characterized by decentralization, transparency, and data integrity [14].
Dataspace approaches to data sharing services are primarily associated with the Internet of Things (IoT) [2, 15, 17, 18], where data integration from heterogeneous devices and access control are the main objective. On the other hand, blockchain approaches [4, 10, 21] integrate cryptography techniques as part of the data management system in order to share data between agents (users or institutions). Table 2 provides a comparison of the state of the art and TIKD in relation to the requirements of the ARK-Virus Project.
Data sharing approaches based on blockchain methods [3, 4, 10, 21] use a unified scheme. In most cases records must be plain text, avoiding the integration of data in different formats, and usage policies, which restrict the kind of action that an agent can perform over data, are not defined. Even when the main concern of these approaches is to keep a record’s information secure, they do not propose any agent information tracking for activity records. TIKD implements an authorization access control based on security policies that consider context information, security roles, and data classification (explained in the next section) in order to share data between users in the same or different organizations.
Typical state-of-the-art dataspaces implement security features such as access control authentication methods [13, 17], defined access roles [2, 15], user attributes [18], and usage control [13, 15] in order to provide data sharing services. In addition to security aspects, dataspace approaches with sharing services cope with data fusion [17], usage control between multiple organizations [13], real-time data sharing [2], and privacy protection [18]. However, these approaches do not provide mechanisms for personal data handling in compliance with GDPR, privacy-aware log records, or privacy-protected interlinking with external resources. TIKD is based on a set of Linked Data vocabularies that support these aspects, e.g., the Data Privacy Vocabulary (DPV) to cope with personal data handling, the Data Catalog VocabularyFootnote 7 (DCAT) to cope with interlinking external resources, and PROVFootnote 8 to cope with user logs.
4 Description of the TIKD
The Trusted Integrated Knowledge Dataspace (TIKD) was designed in accordance with the ARK-Virus Project security requirements (see Sect. 2). The TIKD services (Fig. 2) define data permissions (Knowledge Graph integration, subgraph sharing, and data interlinking), user access grants (security control), and external resource integration (data interlinking) to provide a trusted environment for collaborative working.
TIKD is a multi-user and multi-organization dataspace with the capability of securely sharing information between an organization’s users. The security control module ensures that only authorized users, from the same organization, can access KGs, shared information, and interlinked data. This module follows a context-based approach considering security roles and data classifications (explained later in this section), i.e., access to the organization’s data is determined by the user’s context and the target data classification. The next subsections explain each of these services.
4.1 Knowledge Graph Integration
The Knowledge Graph integration service (Fig. 2, Knowledge Graph integration) is a central component of the TIKD. This service defines a dataspace where i) multiple users can work on a KG within an organization, ii) multiple organizations can create KGs, iii) linking to datasets by means of DCAT, instead of graphs/data, is supported, iv) fine-grained record linkage via DCAT records is supported, and v) evidence and KG integration/linking are supported.
4.2 Security Control
The security control service (Fig. 2, security control) is the main service of the TIKD. This service makes use of Linked Data vocabularies to handle personal data, access control context specification, and privacy protecting user logs. The following subsections explain in detail each one of these services.
4.2.1 Personal Data Handling
Personal data is described through the DPV, proposed by the W3C’s Data Privacy Vocabularies and Controls Community Group [16] (DPVCG). DPV defines a set of classes and properties to describe and represent information about personal data handling for the purpose of GDPR compliance.
The ARK Platform collects users’ personal data through a registration process which enables access to the ARK Platform. The registration process requires a username, email address, organization role, platform role, and a password. The TIKD security control service, in turn, authenticates an ARK user through their username, or email address, and their password. To represent these kinds of personal data, the following DPV classes (Fig. 3) were used:
- Personal data category (dpv:PersonalDataCategory): identifies a category of personal data. The classes dpv:Password, dpv:Username, and dpv:EmailAddress are used to represent the personal data handled by TIKD.
- Data subject (dpv:DataSubject): identifies the individual (the ARK user) whose personal data is being processed.
- Data controller (dpv:DataController): defines the individual or organization that decides the purpose of processing personal data. The data controller is represented by the ARK Platform.
- Purpose (dpv:Purpose): defines the purpose of processing personal data. The security class (dpv:Security) is used to define the purpose.
- Processing (dpv:Processing): describes the processing performed on personal data. In this sense, the ARK Platform performs the action of storing (dpv:Store) the ARK user’s personal data and TIKD performs the action of pseudonymizingFootnote 9 (dpv:PseudoAnonymise) the data to perform log actions.
4.2.2 Data Classification
The ARK Platform uses different data classification levels to define the visibility, accessibility, and consequences of unauthorized access to an access control entityFootnote 10 (ACE). An ACE defines a KG representing an ARK Project or an ARK Risk Register.Footnote 11 Table 3 describes each data classification access level. Considering the data classification levels, a public ACE can be accessed by the general public and mishandling of the data would not impact the organization. Conversely, the impact of unauthorized access or mishandling of a restricted ACE would seriously impact the organization, staff, and related partners. The integration of data classification to the TIKD provides certainty about who can access which data based on the constraints of the data itself.
An ACE can be associated with one or more data entities. A data entityFootnote 12 represents an individual unit (data) or aggregate of related data (group of data), each of which can have its own data classification. The data classification of data entities follows a hierarchical structure whereby the ACE represents the root node and the data entities represent a child or sub-node. In line with this hierarchy, sub-nodes cannot have a less restrictive access level than the root/parent node, i.e., if the ACE data classification is defined as internal, then its data entities cannot be classified as public.
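The hierarchy rule above can be illustrated with a minimal Python sketch. It assumes the four levels named in this chapter are ordered public, internal, confidential, restricted from least to most restrictive; the names `Classification`, `is_valid_child`, and `invalid_entities` are hypothetical and are not part of the ARK Platform.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Access levels, assumed to be ordered from least to most restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def is_valid_child(parent: Classification, child: Classification) -> bool:
    """A data entity may be as restrictive as its ACE or more so, never less."""
    return child >= parent


def invalid_entities(ace_level, entity_levels):
    """Return the names of data entities that violate the hierarchy rule."""
    return [name for name, level in entity_levels.items()
            if not is_valid_child(ace_level, level)]


# Hypothetical ACE classified as internal with two data entities.
violations = invalid_entities(
    Classification.INTERNAL,
    {"survey": Classification.PUBLIC, "report": Classification.CONFIDENTIAL},
)
print(violations)  # the public "survey" entity is flagged
```

Encoding the levels as an ordered enumeration keeps the parent/child check to a single comparison, which is one plausible way to enforce the constraint when a data entity is created or reclassified.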
4.2.3 Access Control
The access controls (AC) were designed to meet the privacy-aware evidence distillation requirement (Table 1) of providing access to users with the appropriate level of clearance. The AC follows a context-based approach, alongside data classifications, to allow or deny access to an ACE.
Considering the security role, the AC mediates every request to the ARK Platform, determining whether the request should be approved or denied. TIKD defines a context-based access control (CBAC) model, based on context and role specification, where data owners can authorize and control data access. In a CBAC model, policies associate one or more subjects with sets of access rights, pertaining to users, resources, and the environment, in order to grant or deny access to resources. In this sense, the set of policies consider the current user’s context information to approve or deny access to ACEs. The AC takes into account the following authorization access elements (Fig. 4):
- ARK user: an ARK user is associated with an organization role, a platform status, and a security role. The security role is assigned after creating or relating an ARK user with an ACE.
- ARK Platform status: defines the user’s status in the ARK Platform, e.g., active, pending, update pending, and update approved.
- Organization role: each organization has the facility to define its own organization and security role hierarchy independently. The ARK Platform contains some predefined security roles (admin, owner, collaborator, and read-only) and platform roles (frontline staff, clinical specialist, and safety manager, among others). However, these roles can be extended according to the organization’s requirements.
- Security role: an ARK user is associated with an ACE through their security role. In this sense, an ARK user could take one of the following predefined security roles: admin, owner, collaborator, or read-only, where owner and admin are the highest level roles.
- Data classification: defines the data visibility of ACEs and data entities considering the rules from Table 3.
- Data entity (evidence): refers to interlinked data. A user can interlink data from external sources to enrich an ACE. In the ARK Platform context, this interlinked data is considered “evidence.” The evidence is under the owning organization’s jurisdiction, i.e., only users from the same organization have access. Additionally, the evidence can take any of the data classification access levels, i.e., a piece of evidence could be defined as public, internal, confidential, or restricted.
The TIKD AC (Fig. 5) is based on the Social Semantic SPARQL Security for Access Control Ontology (S4AC). S4AC provides fine-grained access control over Resource Description Framework (RDF) data. The access control model gives users the means to define policies that restrict access to specific RDF data at the named graph or triple level. It reuses concepts from SIOC,Footnote 13 SKOS,Footnote 14 WAC,Footnote 15 SPIN,Footnote 16 and the Dublin Core.Footnote 17
The main element of the S4AC model is the access policy (Fig. 5). An access policy defines the constraints that must be satisfied to access a given named graph or a specific triple. If the access policy is satisfied, the user is allowed to access the data, but if not, access is denied. TIKD access policies consider ARK user context (the ARK Platform status, the security role, organization role) and the data classification of the target resource (an ACE or a data entity).
The TIKD AC integrates the arkp:AccessControlContext class to the S4AC to define the ARK Platform context information. The ARK user’s context information is represented as a hash string to validate the relationship between the ARK user and the target ACE (Fig. 6a). The ARK user context corresponds to the attributes which define the current state of the user in relationship with the ARK Platform (their status), the ACE (their security role), and the organization (their organization’s role). These attributes are the input for the hash function to generate a corresponding hash string, which will be associated with the user and the ACE (Fig. 6b), through the property arkp:hasContextValidation in the corresponding class.
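The derivation of a context hash string can be sketched as follows. This is only an illustration: the chapter does not state which hash function or serialization TIKD uses for the user context, so SHA3-256 and the field separator are assumptions, and `context_hash` is a hypothetical helper name.

```python
import hashlib


def context_hash(platform_status: str, security_role: str, organization_role: str) -> str:
    """Hash the three context attributes into a single hex string.

    Assumption: SHA3-256 over the attributes joined with a unit separator,
    so that field boundaries cannot be confused with field content.
    """
    payload = "\x1f".join([platform_status, security_role, organization_role])
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()


# Hypothetical user context at the time the user is related to an ACE.
stored = context_hash("active", "collaborator", "safety manager")

# Later, the same context reproduces the stored value...
assert context_hash("active", "collaborator", "safety manager") == stored
# ...while any change to the context (here, the security role) does not.
assert context_hash("active", "read-only", "safety manager") != stored
```

Because the hash is recomputed from the current attributes, a change to the user's status or roles invalidates the stored value without any explicit revocation step.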
4.2.4 Policy Specification
The TIKD AC defines two kinds of policies: global and local. The global policy and context policy compare the ARK user’s context hash string against the hash string from the target ACE (ARK Project or ARK Risk Register). If both are the same, access to the ACE is granted; otherwise, it is denied. The local policy considers the data classification of ACEs and data entities to grant or deny access to an ARK user. Table 4 describes the data classification and the security role required to access the data. Local policies check if an ARK user’s security role has the correct permissions to access the requested data.
A TIKD AC policy is defined by the tuple P = ⟨ACS, AP, R, AEC⟩, where ACS stands for the set of access conditions, AP for the access privilege (create, delete, read, update), R for the resource to be protected, and AEC for the access evaluation context. An access condition is defined through a SPARQL ASK query, representing a condition to evaluate a policy or policies. The AEC is represented by the hash string value produced from the ARK user context.
The policy specification process selects the corresponding global and local policies. After an ARK user sends a request to access an ACE (Fig. 7a), the global policy is selected (Fig. 7b, c). The local policies include the ACE and their data entity data classification configuration (Fig. 7d), which defines data authorization access; according to this configuration, the corresponding ASK queries are selected.
4.2.5 Policy Enforcement
The policy enforcement process executes the corresponding ASK queries and returns the decision to grant or deny access to the ACE (Fig. 7e). The global policies are executed first, and if the ASK query returns a true value, then the local policies are executed. In the ARK Platform, the user context could change at any moment by several factors, e.g., update to organization role, organization change, update to security role, update to platform status, etc. The global policy validates the ARK user context with the target ACE. A correct validation means that the user is granted access to the ACE. On the other hand, the local policy defines a fine-grained data access for data entities allowed to be accessed by the user.
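The two-stage evaluation described above can be sketched in Python. This is a simplified illustration, not the TIKD implementation: plain Python callables stand in for SPARQL ASK queries, and the `Policy` class, the `enforce` function, the identifiers, and the role set used in the local condition are all hypothetical (Table 4 is not reproduced here).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]
# Stand-in for a SPARQL ASK query: takes the request, returns True or False.
AskQuery = Callable[[Request], bool]


@dataclass
class Policy:
    access_conditions: List[AskQuery]  # ACS
    privilege: str                     # AP: create, delete, read, or update
    resource: str                      # R: the protected ACE or data entity
    context_hash: str                  # AEC: hash string of the user context

    def holds(self, request: Request) -> bool:
        """A policy holds when every access condition evaluates to True."""
        return all(ask(request) for ask in self.access_conditions)


def enforce(global_policy: Policy, local_policies: List[Policy], request: Request) -> bool:
    """Run the global policy first; evaluate local policies only if it passes."""
    if not global_policy.holds(request):
        return False
    return all(policy.holds(request) for policy in local_policies)


# Hypothetical policies for one ACE and one of its data entities.
ace_hash = "h1"
global_policy = Policy(
    [lambda r: r["context_hash"] == ace_hash], "read", "ace:project-1", ace_hash)
local_policy = Policy(
    [lambda r: r["security_role"] in {"owner", "admin"}], "read", "entity:evidence-1", ace_hash)

request = {"context_hash": "h1", "security_role": "owner"}
print(enforce(global_policy, [local_policy], request))
```

Short-circuiting on the global policy mirrors the ordering in the text: a user whose context no longer matches the ACE is rejected before any data classification check is run.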
4.2.6 Privacy Protecting User Logs
Finally, the privacy protecting user logs record the actions performed by users during their sessions on the ARK Platform for historical record purposes. User information is pseudonymized in the log data, using the SHA-3 algorithm, by combining the username, email, and registration date parameters.
The user logs record user activities on the platform and the results retrieved by the system (failure, success, warning, etc.) during a session. For example, if the user tries to modify the KG but their role is read-only, the privacy protecting user log process will record this activity as well as the failure response from the system. The PROV ontologyFootnote 18 is used to implement the privacy protecting user logs following an agent-centered perspective, i.e., focusing on the people or organizations involved in the data generation or manipulation process.
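A minimal sketch of the pseudonymization step and a resulting log record is given below. The chapter specifies SHA-3 and the three input parameters, but not the digest size, the way the parameters are combined, or the record layout; SHA3-256, the `|` separator, and the dictionary fields are therefore assumptions, and `pseudonym` and `log_entry` are hypothetical names. The comments indicate which PROV concept each field is meant to loosely mirror.

```python
import hashlib
from datetime import datetime, timezone


def pseudonym(username: str, email: str, registration_date: str) -> str:
    """Derive a stable pseudonym from the three parameters named in the text.

    Assumption: SHA3-256 over the values joined with "|".
    """
    payload = "|".join([username, email, registration_date])
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()


def log_entry(user_pseudonym: str, activity: str, outcome: str) -> dict:
    """Build a log record that refers to the user only by pseudonym."""
    return {
        "agent": user_pseudonym,  # loosely mirrors a prov:Agent
        "activity": activity,     # loosely mirrors a prov:Activity
        "outcome": outcome,       # e.g., failure, success, warning
        "ended_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical read-only user attempting to modify a KG.
alias = pseudonym("alice", "alice@example.org", "2021-03-01")
entry = log_entry(alias, "update KG", "failure")
print(entry["agent"][:12], entry["activity"], entry["outcome"])
```

The same inputs always yield the same pseudonym, so a user's history can be reconstructed across sessions without storing the username or email address in the log itself.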
4.3 Data Interlinking
TIKD supports the integration of KGs and also provides special support for the integration of potentially sensitive external resources (a data interlinking requirement of the ARK-Virus Project), by means of an interlinking service (Fig. 2, data interlinking).
The data interlinking service allows users to add data from an external source as evidence to a risk management project. Evidence is used as supporting data for the KG, providing findings or adding valuable information to enrich the content of the KG. The multi-user and multi-organizational nature of the ARK Platform requires an access restriction to evidence. In this sense, the access control service restricts access to evidence only to users from the same organization.
The TIKD data interlinking process was implemented through CKAN, a data management system which enables organizations and individuals to create and publish datasets, and associated metadata, through a web interface. CKAN is an open-source community project, thus providing a rich number of extensions/plugins.
The data interlinking process (Fig. 8) consists of three main steps: (i) dataset creation, (ii) API communication, and (iii) evidence integration. In step one, a user creates a dataset, containing evidence resources, using CKAN (Fig. 8a). In step two, the API communication (Fig. 8b) handles the evidence requests, i.e., the ARK Platform requests evidence metadata via the CKAN API, which returns the requested information as a DCAT record. In step three (Fig. 8c), users request access to evidence metadata through the ARK Platform, which validates the user’s grants based on the access control, in order to interlink the evidence to the project KG.
Datasets created using CKAN can be classified as public or private—public datasets are visible to everyone and private datasets are visible only to users of the owning organization. Private datasets align with the internal classification of the ARK data classification model.
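The default alignment between CKAN visibility and the ARK data classification can be expressed as a small mapping. The sketch below assumes dataset metadata in the form of a Python dictionary with CKAN's boolean `private` field; the function name `ark_classification` and the use of plain strings for the ARK levels are hypothetical.

```python
def ark_classification(ckan_dataset: dict) -> str:
    """Map CKAN's public/private visibility onto an ARK classification label.

    Private CKAN datasets align with the internal level; datasets without
    the flag, or with it set to False, are treated as public.
    """
    return "internal" if ckan_dataset.get("private", False) else "public"


# Hypothetical dataset metadata as a dictionary.
print(ark_classification({"name": "ppe-audit-2021", "private": True}))
print(ark_classification({"name": "ppe-guidelines", "private": False}))
```

This two-level mapping cannot express the confidential or restricted levels on its own.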
As the ARK-Virus requirements define the visibility of data through a more complex structure than CKAN, the default data classification of CKAN will be altered to align with the ARK data classifications. This will be achieved through CKAN extensions that allow for dataset access to be more restricted than the current private/internal visibility level.
4.4 Data Sharing
TIKD provides the functionality to share data between users from the same organization, considering the ARK-Virus security requirements. Data sharing is performed by means of the data interlinking service and data classifications.
The sharing mechanism allows users from the same organization to share evidence through CKAN. The data classification of the shared evidence remains under the control of the owner or the admin user, i.e., the data classification of shared evidence is not transferable between projects.
The data interlinking service and the sharing mechanism allow organizations to reuse data between projects. Evidence data is shared under a secured scenario where the access control and the data classification determine the visibility of the evidence.
4.5 Subgraph Sharing
The ARK-Virus Project defines a collaborative environment where users can share data from ACEs using a privacy-aware sharing mechanism whereby confidential or sensitive data cannot be shared outside an organization. This sharing functionality helps to reuse information to enrich related ACEs. In this sense, the subgraph sharing service (Fig. 2, subgraph sharing) helps to extend or complement information from one ACE to another.
The subgraph sharing process (Fig. 9) considers the access control policies, from the security control service, to determine which data is accessible to an organization’s users and which data is not, e.g., ACE data defined as public (P-labeled nodes) could be reused by any member of the same organization, whereas restricted data (R-labeled node) cannot be shared with any other member of the organization, i.e., the data defined as restricted is enabled only for the owner of the data, the organization admin, and other explicitly specified users. The accessibility is defined by the data classification (Table 4) of the ACE and its data entities. If the user’s request is allowed, the corresponding subgraph is returned.
The sharing methods defined by TIKD enable collaboration between members from the same organization. The subgraph sharing enables the reuse of data between ACEs. These sharing functionalities are handled by the access control policies which determine whether the requester (user) is able to access evidence or subgraph information.
5 Security and Privacy Evaluations of the ARK Platform
This section presents a security evaluation of the ARK Platform considering the requirements of the ISO 27001 (ISO/IEC 27001) standard and the privacy control requirements of the ISO 27701 (ISO/IEC 27701). The ISO 27001Footnote 19 is a specification for information security management systems (ISMS) to increase the reliability and security of systems and information by means of a set of requirements.
The second standard considered for the evaluation of TIKD is the ISO 27701.Footnote 20 The ISO 27701 is the international standard for personally identifiable information (PII). This standard defines a privacy information management system (PIMS) based on the structure of the ISO 27001. The standard integrates the general requirements of GDPR, the Information Security Management System (ISMS) of ISO 27001, and the ISO 27002 which defines the best security practices.
The requirements of the ISO 27701 include the 114 security controls of Annex A of ISO/IEC 27001 and the guidance of ISO/IEC 27002 on how to implement these security controls. The ISO 27701 defines specific security controls that are directly related to PII, which are grouped into two categories: PII controllers (Annex A) and PII processors (Annex B).
5.1 Security Evaluation
The security evaluation of the ARK PlatformFootnote 21 was conducted using the ISO 27001 GAT. The ISO 27001 GAT can be used to identify gaps in ISO 27001 compliance.
The ISO 27001 GAT consists of 41 questions divided into 7 clauses. Each clause is divided into sub-clauses, containing one or more requirements (questions). For example, the “Leadership” clause is divided into three sub-clauses: the first sub-clause is leadership and commitment which contains three requirements. The first requirement is: “are the general ISMS objectives compatible with the strategic direction?”; a positive answer means that the ISMS supports the achievement of the business objectives. (Figure 10 illustrates this example.)
The ISO 27001 GAT was applied to the ARK Platform both before and after implementing TIKD. Before implementing TIKD, the ARK Platform relied only on access control, based on an authentication process, to provide access to the platform. The results of both evaluations are shown in Table 5, where #Req. is the number of requirements for each sub-clause, Impl. is the number of implemented requirements, and %Impl. is the percentage of implemented requirements.
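The relationship between the three table columns can be made concrete with a minimal Python sketch. It assumes that the overall score is the number of implemented requirements divided by the total number of requirements (rather than an average of the per-sub-clause percentages); the chapter does not state which aggregation the GAT uses. The class name, function name, and all counts other than the three requirements of "leadership and commitment" are hypothetical and are not the values of Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubClause:
    """One row of a gap-analysis table (rows below are illustrative only)."""
    clause: str
    name: str
    n_req: int   # "#Req.": number of requirements in the sub-clause
    n_impl: int  # "Impl.": number of requirements answered positively

    def pct_impl(self) -> float:
        """'%Impl.': share of implemented requirements, in percent."""
        return 100.0 * self.n_impl / self.n_req


def overall_compliance(rows: list[SubClause]) -> float:
    """Assumed aggregation: implemented requirements over all requirements."""
    total_req = sum(r.n_req for r in rows)
    total_impl = sum(r.n_impl for r in rows)
    return 100.0 * total_impl / total_req


# Hypothetical rows; only the size of "Leadership and commitment" (three
# requirements) comes from the text. These are NOT the values of Table 5.
rows = [
    SubClause("Leadership", "Leadership and commitment", n_req=3, n_impl=2),
    SubClause("Leadership", "Policy", n_req=2, n_impl=1),
    SubClause("Operation", "Operational planning and control", n_req=3, n_impl=3),
]

for r in rows:
    print(f"{r.clause} / {r.name}: {r.n_impl}/{r.n_req} = {r.pct_impl():.0f}%")
print(f"Overall: {overall_compliance(rows):.0f}%")
```

Weighting by requirement counts means that a sub-clause with many requirements influences the overall score more than a sub-clause with a single requirement.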
Compliance with the ISO 27001 standard increased from 54% to 85% after implementing TIKD on the ARK Platform. There was a notable increase in the “Operation” and “Performance evaluation” clauses after TIKD was employed. However, some requirements must still be addressed to raise the level of compliance with the ISO 27001 standard further. Table 6 outlines these unaddressed requirements as well as the actions needed to implement them.
5.2 Privacy Information Evaluation
The privacy information evaluation of the ARK Platform (see Note 22) was conducted considering the clauses defined in Annexes A and B of ISO/IEC 27701:2019 (see Note 23), which concern personal data handling. Annex A, PIMS-specific reference control objectives and controls, defines the control requirements for PII controllers. Annex B, PIMS-specific reference control objectives and controls, defines the control requirements for PII processors.
The ISO 27701 evaluation followed the same configuration as the ISO 27001 GAT evaluation (conducted before and after TIKD). Before the implementation of TIKD, the ARK Platform had documented personal data handling; however, some elements were not fully implemented. After implementing TIKD on the ARK Platform, all personal data handling elements were included. Table 7 shows the evaluation results, where the first and second columns give the Annex and the target clause. The third column gives the number of control requirements for the corresponding clause. The “before TIKD” group of columns gives the number and percentage of implemented control requirements for the corresponding Annex clause, and the “after TIKD” group of columns does the same for the later version.
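A before-and-after table of this shape can be read as a per-clause change in percentage points. The following Python sketch illustrates that reading under stated assumptions: the type name, the helper functions, and every count are hypothetical placeholders rather than the values of Table 7. The only pattern taken from the text is that B 8.4 moves from no implemented control requirements to all of them.

```python
from typing import NamedTuple


class ClauseResult(NamedTuple):
    """One row of a before/after table (counts below are placeholders)."""
    annex_clause: str  # e.g. "A 7.3"
    n_controls: int    # number of control requirements in the clause
    before: int        # control requirements implemented before TIKD
    after: int         # control requirements implemented after TIKD


def pct(implemented: int, total: int) -> float:
    """Percentage of implemented control requirements (0 for an empty clause)."""
    return 100.0 * implemented / total if total else 0.0


def improvement(row: ClauseResult) -> float:
    """Change in compliance between the two versions, in percentage points."""
    return pct(row.after, row.n_controls) - pct(row.before, row.n_controls)


# Hypothetical counts, NOT the values of Table 7.
results = [
    ClauseResult("A 7.3", n_controls=10, before=5, after=10),
    ClauseResult("B 8.4", n_controls=3, before=0, after=3),
]

for row in results:
    print(
        f"{row.annex_clause}: {pct(row.before, row.n_controls):.0f}% -> "
        f"{pct(row.after, row.n_controls):.0f}% "
        f"({improvement(row):+.0f} percentage points)"
    )
```

Reporting the change in percentage points, rather than as a relative increase, keeps clauses that start from zero (such as B 8.4 in the text) comparable with the others.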
According to the evaluation results, the Annex A results (A 7.2–7.5) show a compliance improvement after implementing TIKD, mainly in A 7.3 and A 7.5. In A 7.3, obligations to PII principals, the ARK Platform before TIKD was less compliant than the ARK Platform after TIKD implementation, as some control requirements related to implementation aspects were covered only by the latter. In A 7.5, PII sharing, transfer, and disclosure, the ARK Platform before TIKD complied with the documentation control requirements, whereas the ARK Platform after TIKD complied with both the documentation and the implementation control requirements. In this clause, neither version complied with the control requirement “Countries and international organizations to which PII can be transferred are identified and documented”, as sharing information with international organizations is beyond the scope of the ARK Platform.
Similar to Annex A, the Annex B results (B 8.2–8.5) show a compliance improvement after implementing TIKD. In B 8.5, PII sharing, transfer, and disclosure, the low percentage for the ARK Platform after TIKD arises because the ARK-Virus Project does not define subcontractors for processing personal data. Additionally, the control requirements of B 8.5 relate to countries and international organizations, which are beyond the scope of the ARK-Virus Project. In B 8.4, privacy by design and privacy by default, the ARK Platform after TIKD satisfies the control requirements; the version before TIKD did not comply with any of them, as they all relate to implementation aspects that this version did not cover.
6 Conclusions
In this chapter, the Trusted Integrated Knowledge Dataspace (TIKD) was presented as an approach to securely share data in collaborative environments by considering personal data handling, data privileges, access control context specification, and privacy-aware data interlinking.
TIKD was implemented in the ARK Platform, considering the security requirements of the ARK-Virus Project, to explore the extent to which an integrated sharing dataspace, based on Linked Data technologies, personal data handling, data privileges, and data interlinking, contributes to building a trusted sharing dataspace in a collaborative environment. In comparison with state-of-the-art works, TIKD integrates solutions for security aspects in compliance with the ISO 27001 information security standard and GDPR-compliant personal data handling in compliance with the ISO 27701 privacy information standard as part of the data security infrastructure.
The TIKD evaluation considers the requirements of the security standard ISO 27001 and the control requirements of the privacy information standard ISO 27701. The security evaluation of the ARK Platform was conducted using the ISO 27001 Gap Analysis Tool (GAT). The evaluation compared two versions of the ARK Platform, a version before TIKD implementation and a version after TIKD implementation. According to the results, the implementation of the TIKD achieved an 85% ISO 27001 compliance score, improving the security aspects of the ARK Platform as compared to the version before TIKD implementation (54% ISO 27001 compliance score).
The privacy information evaluation was conducted considering the control requirements defined by the ISO/IEC 27701:2019 standard and following the same configuration as the security evaluation. According to the results, the ARK Platform after implementing TIKD achieved a 91% ISO 27701 compliance score, improving the privacy information aspects defined by the standard when compared to the version before TIKD implementation (64% ISO 27701 compliance score).
Future work will focus on addressing the remaining ISO 27001 standard requirements. Additionally, the TIKD will be evaluated by the project stakeholders and their feedback will be used to distill further requirements.
Notes
1. GDPR Art.9-1.
2.
3.
4.
5. Available at https://openark.adaptcentre.ie/Ontologies/ARKCube.
6. Available at https://openark.adaptcentre.ie/Ontologies/ARKPlatform.
7.
8.
9. The action of replacing personally identifiable information with artificial identifiers.
10.
11. These terms are explained later in this section.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21. The evaluation was performed by three computer scientists with strong backgrounds in Linked Data and security systems. The first evaluation was performed in February 2021 and the second was performed in April 2021.
22. The evaluation was performed by the same three computer scientists from the first evaluation. The first evaluation was performed in June 2021 and the second was performed in August 2021.
23.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this chapter
Cite this chapter
Hernandez, J., McKenna, L., Brennan, R. (2022). TIKD: A Trusted Integrated Knowledge Dataspace for Sensitive Data Sharing and Collaboration. In: Curry, E., Scerri, S., Tuikka, T. (eds) Data Spaces. Springer, Cham. https://doi.org/10.1007/978-3-030-98636-0_13
DOI: https://doi.org/10.1007/978-3-030-98636-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98635-3
Online ISBN: 978-3-030-98636-0
eBook Packages: Computer Science (R0)