According to the data protection by design principles required/established by GDPR, which draws from earlier privacy by design principles, privacy and data protection should be proactively embedded in the systems-to-be since their inception and throughout their development (rather than as an afterthought or, even worse, as a mere reaction to data breaches). Thus, our projects introduce privacy in a variety of activities and tools that cover the whole span of the systems development lifecycle. Some (PDP4E, SMOOTH, PAPAYA) mostly focus on the design time, while others (DEFeND, BPR4GDPR, POSEIDON) are also targeting runtime and operation. All in all, they encompass a broad range of processes and activities: requirements operationalization and instantiation (PDP4E), modelling (DEFeND, PDP4E, BPR4GDPR), process engineering (BPR4GDPR, PDP4E), risk management (POSEIDON, PDP4E), verification and validation (BPR4GDPR, SMOOTH, PDP4E), runtime monitoring (BPR4GDPR) and refactoring (BPR4GDPR, PDP4E).
Besides, although each of the said projects is addressing its own, individual goals, common topics can be traced among them, as they all address some of the crucial challenges posed by GDPR:
Data and process inventory. GDPR requires that most organizations keep a registry of the data processing activities they carry out, including data categories, data subjects affected, etc. Even when such a registry is not compulsory (e.g. for small enterprises), it is still pivotal for other activities (e.g. it is quite difficult to address data subject portability requests if there is no such inventory of personal data categories). Hence, features are provided to elicit, map and analyse data (DEFeND, SMOOTH, PDP4E) and processes (BPR4GDPR), to model data flows (SMOOTH, PDP4E) and business processes (BPR4GDPR), even by dynamically discovering processes at runtime (BPR4GDPR).
Consent management. GDPR requires that organizations are able to prove the nature of the consent they have obtained to process personal data. Given the other lawful bases for processing (e.g. contracts, legitimate interests), other GDPR principles (which deal with collection, storage and purpose limitation) and obligations to limit disclosure, it may be difficult to ensure that processing activities are lawful at all times. With that aim, many of these projects try to bridge data subject interests with organization processes, by providing features that support consent and preference management (DEFeND, BPR4GDPR) on the one hand; and policy, data, permission and data subject rights management and enforcement (BPR4GDPR, POSEIDON) on the other. Dashboards are particularly recognized (PAPAYA, BPR4GDPR, POSEIDON) as an appropriate pattern to address the complexity of dynamic consent management from the data subject’s side.
Encryption measures. GDPR requires that security and data protection technical measures are applied and enforced. In this respect, several projects (DEFeND, PAPAYA, BPR4GDPR) address encryption-based protection measures (e.g. privacy-preserving encryption, anonymization or cryptography-based access control).
Distributed data processors. GDPR establishes obligations regarding data processors, which may perform data processing activities on behalf of the data controllers (which are ultimately responsible for them). Several projects (BPR4GDPR, PAPAYA, POSEIDON, SMOOTH) take especially into account such distribution of data processing activities provided “as a Service” among several organizations. This distribution paradigm is even brought to the solutions provided by the projects themselves (PAPAYA, BPR4GDPR, SMOOTH), by sticking to a “compliance as a service” approach that fosters the distribution of the different software modules developed.
Accountability. GDPR compliance not only requires abiding by the measures prescribed there, but also being able to demonstrate that they have been effectively implemented and responsibly adopted. This concept, known as the accountability principle, requires keeping evidence of the measures taken and processes carried out, and it is also being addressed by some projects (POSEIDON, PDP4E).
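To make the interplay between a processing inventory and consent records more tangible, the following is a minimal sketch under stated assumptions. The names `ProcessingActivity`, `ConsentRecord` and `is_lawful` are hypothetical and do not come from any of the projects; lawful bases other than consent are simply assumed to be documented elsewhere.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProcessingActivity:
    """One entry of a (hypothetical) registry of processing activities."""
    name: str
    purpose: str
    data_categories: frozenset
    lawful_basis: str  # e.g. "consent", "contract", "legitimate_interests"


@dataclass(frozen=True)
class ConsentRecord:
    """Evidence that a data subject consented to a given purpose."""
    subject_id: str
    purpose: str
    data_categories: frozenset
    given_on: date
    withdrawn_on: Optional[date] = None


def is_lawful(activity, subject_id, consents, today):
    """Check whether a consent-based activity is covered by a valid consent."""
    if activity.lawful_basis != "consent":
        # Assumption: other lawful bases are verified by a separate procedure.
        return True
    for c in consents:
        covers_data = activity.data_categories <= c.data_categories
        active = c.given_on <= today and (
            c.withdrawn_on is None or today < c.withdrawn_on
        )
        if (c.subject_id == subject_id and c.purpose == activity.purpose
                and covers_data and active):
            return True
    return False
```

Keeping the consent records as immutable evidence, rather than a single boolean flag, is what lets an organization later demonstrate which consent covered which activity at a given date.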
Next, we detail the solutions planned by each project and to what extent they solve the challenges mentioned in the previous section.
In order to cover its functional needs towards GDPR compliance and cope with the operational phases, BPR4GDPR has specified the system architecture highlighted in Fig. 2. As illustrated, the BPR4GDPR architecture is divided into four “quadrants”, reflecting different groups of functionalities. In the following, the main principles and technical ideas are summarized.
Governance provides all functions related to policy management, representing the Policy Decision Point (PDP) of the system. In BPR4GDPR, policies hold a dual role: (1) they provide the means for system governance, in the sense that they set the rules that regulate the operation of BPR4GDPR components; (2) they comprise the knowledge base that feeds the procedure of process reengineering, towards compliant-by-design process models. To this end, BPR4GDPR develops a comprehensive Policy-based Access and Usage Control framework, tailored to the needs of highly distributed environments, involving multiple stakeholders, even in cross-border scenarios. The underlying technology is prior academic work, along with the respective software prototype, whereas policies are grounded on the compliance ontology, providing a high-level codification of GDPR into concepts that need to be taken into consideration by the policy framework.
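The role of a Policy Decision Point can be illustrated with a minimal sketch. It assumes a much simpler rule format than the BPR4GDPR policy framework: the `Rule` and `Request` fields, the `"*"` wildcard and the deny-overrides combination are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str
    action: str
    data_category: str
    purpose: str
    effect: str  # "permit" or "deny"


@dataclass(frozen=True)
class Request:
    role: str
    action: str
    data_category: str
    purpose: str


def _matches(pattern, value):
    # "*" is a hypothetical wildcard meaning "any value".
    return pattern == "*" or pattern == value


def decide(rules, request):
    """Deny-overrides combination with a default-deny fallback."""
    effects = [
        r.effect for r in rules
        if _matches(r.role, request.role)
        and _matches(r.action, request.action)
        and _matches(r.data_category, request.data_category)
        and _matches(r.purpose, request.purpose)
    ]
    if "deny" in effects:
        return "deny"
    if "permit" in effects:
        return "permit"
    return "deny"  # nothing matched: refuse by default
```

Including the purpose in every request reflects purpose limitation: the same role and action may be permitted for one purpose and refused for another.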
Planning concerns the specification of workflow models and their verification as regards compliance with the GDPR, and their subsequent transformation, if needed, so that they become compliant by design. The first step in this direction is facilitated by tools allowing their description in a way that effectively guides their execution, while also being expressive enough to capture associated provisions; these tools are grounded upon prior academic work of BPR4GDPR researchers. Further, in order to automatically incorporate policies as part of workflow design, the BPR4GDPR approach involves sophisticated means for the evaluation of process specifications against a number of compliance aspects. Their main aim is to control access to, usage of and flow of information and prevent illegitimate activity, as well as to determine whether critical tasks are properly included and, if not, impose their execution.
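The idea of checking whether critical tasks are included in a workflow and imposing them otherwise can be sketched as follows. This is a deliberately simplified illustration: workflows are assumed to be linear task lists, and the task names and the `REQUIRED_BEFORE` table are hypothetical, not part of the BPR4GDPR tooling.

```python
# Hypothetical compliance knowledge: task -> task that must precede it.
REQUIRED_BEFORE = {
    "collect_personal_data": "obtain_consent",
    "share_with_third_party": "check_disclosure_policy",
}


def find_violations(workflow):
    """Return (position, task, missing_prerequisite) for each violation."""
    violations = []
    for i, task in enumerate(workflow):
        required = REQUIRED_BEFORE.get(task)
        if required is not None and required not in workflow[:i]:
            violations.append((i, task, required))
    return violations


def reengineer(workflow):
    """Insert each missing prerequisite right before the task that needs it."""
    fixed = []
    for task in workflow:
        required = REQUIRED_BEFORE.get(task)
        if required is not None and required not in fixed:
            fixed.append(required)
        fixed.append(task)
    return fixed
```

Real process models contain branches, loops and parallel paths, so an actual verifier would have to reason over all execution paths rather than a single sequence.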
Monitoring deals with process mining and monitoring with the aim to identify discrepancies between compliant and actual behaviour. To this end, BPR4GDPR implements a Privacy-Aware Process Mining Framework, based on mature technology brought by its partners, particularly ProM [16, 17]. The approach is primarily based on two concepts: streaming process mining, which allows analysing real-time data in order to detect problems, anomalies and potential frauds; and concept drift, which calls for solutions for change detection and continuous update, in order to handle situations where new factors or requirements render the process model out of date and in need of adaptation or improvement.
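A minimal sketch of the two concepts, assuming that the compliant behaviour is given as a set of allowed "directly follows" pairs of activities. The class below is hypothetical and unrelated to the ProM API; the sliding-window deviation rate is only a crude stand-in for proper concept drift detection.

```python
from collections import deque


class StreamingConformanceChecker:
    """Checks an event stream against allowed directly-follows pairs."""

    def __init__(self, allowed_pairs, start_activities,
                 window=100, drift_threshold=0.3):
        self.allowed = set(allowed_pairs)
        self.starts = set(start_activities)
        self.last = {}                      # case id -> last seen activity
        self.recent = deque(maxlen=window)  # 1 = deviation, 0 = conforming
        self.drift_threshold = drift_threshold

    def observe(self, case_id, activity):
        """Process one event; return True if it conforms to the model."""
        previous = self.last.get(case_id)
        if previous is None:
            conforms = activity in self.starts
        else:
            conforms = (previous, activity) in self.allowed
        self.last[case_id] = activity
        self.recent.append(0 if conforms else 1)
        return conforms

    def drift_suspected(self):
        """A high recent deviation rate hints that the model is outdated."""
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) > self.drift_threshold
```

An isolated deviation points to a potentially non-compliant case, whereas a sustained deviation rate suggests that the process itself has changed and the model needs updating.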
Finally, in order to facilitate the deployment of appropriate technical measures, as required by the GDPR, Run-time provides the means for the run-time system operation, particularly in terms of policy enforcement, data management, privacy-enhancing tools, and interaction with data subjects. In this context, the project provides a set of functional components addressing common needs of stakeholders. This so-called Compliance Toolkit consists of modular functions that, fostering “plug and play” to the extent possible, will be easy to deploy, easy to configure and easy to integrate within an organization’s ICT environment, while they will be automatically incorporated into process chains, as a result of re-engineering. The toolkit’s modules fall into three families:
Privacy-enhancing technologies, particularly cryptographic tools, devised for data and communications confidentiality, anonymization and pseudonymization, as well as enforcement of access rights by cryptographic means.
Data management tools that, by means of data access and usage management, provide for controlling data handling, including retention and storage, pre- and post-processing, etc. A core position is held by the Data Management Bus (Fig. 1), comprising the main Policy Enforcement Point (PEP).
User-centred tools, providing for the enforcement of the data subjects’ rights, including information and notification, consent, and consideration of own preferences as regards data handling.
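As an example of the first family, keyed hashing is one common way to pseudonymize identifying fields. The sketch below uses HMAC-SHA256 from the Python standard library; it is an assumption made for illustration, not the technique selected by BPR4GDPR, and the function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed digest (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record, identifying_fields, key):
    """Pseudonymize the identifying fields of a record of string values."""
    return {
        name: pseudonymize(value, key) if name in identifying_fields else value
        for name, value in record.items()
    }
```

The same key always yields the same pseudonym, so records remain linkable for analysis, while re-identification requires access to the key, which therefore has to be stored separately from the data.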
Although the market is full of companies offering their services and tools for GDPR compliance, such solutions are mostly focused on providing generic approaches and frameworks that allow organizations to evaluate their current GDPR readiness level and propose some generic guidelines for moving towards compliance. They do not, however, provide specific methods, techniques and tools to tackle the above challenges. As a result, the above challenges remain. It is therefore important, as indicated by the EU call Secure societies—Protecting freedom and security of Europe and its citizens (topic DS-08-2017: Cybersecurity PPP: Privacy, Data Protection and Digital Identities), “to develop tools and methods to assist organizations to implement GDPR”.
We strongly believe the proposed Data Governance for Supporting GDPR (DEFeND) Platform will tackle these challenges, thereby contributing significantly to the objectives of the call, through the development of tools and methods integrated into a platform that will provide solutions to the above challenges and support continuous GDPR compliance.
The main aim of the DEFeND project is to deliver an innovative data privacy governance platform, which will facilitate scoping and processing of data and data breach management and will support organizations towards GDPR compliance.
Organizations, in order to comply with the GDPR, have to implement in their processes, at a very low level, different tools, solutions and processes, so that privacy is inherently integrated in them. Therefore, it is important that DEFeND provides a solution that not only supports compliance with the relevant GDPR articles, but also addresses the specific needs that organizations might have. That way, DEFeND goes beyond current products that offer general solutions and require special expertise and effort in order to cover the requirements of the organizations (by adapting the general solutions to the special needs of the organizations).
Another important aim for DEFeND is to be affordable. We found that many of the solutions currently available in the market are too expensive for SMEs and require a high level of expertise in order to adapt them. Therefore, it is important that DEFeND is adaptable enough so that organizations with budget restrictions can still make use of it. We plan to achieve this by following a modular strategy that provides different services to users and supports both planning and operational stages. This allows two innovative characteristics: on the one hand, the solutions are more specific to the needs of the organization and, on the other hand, the modules of DEFeND could be extended with new solutions. A further aim of DEFeND is to support not only organizations in complying with GDPR, but also consultants (legal and/or technical) in using it as part of their consultancy services to clients seeking GDPR compliance.
To achieve the above aims, the project focuses on providing a realistic and useful solution that deals with the main research challenges mentioned above, through 7 objectives:
Design and development of a successful, market-oriented, platform to support organizations towards GDPR compliance
Develop a modular solution that covers different aspects of the GDPR
Automated methods and techniques to elicit, map and analyse data that organizations hold for individuals
Advanced modelling languages and methodologies for privacy-by-design and data protection management
Specification, management and enforcement of personal data consent
Integrated encryption and anonymization solutions for GDPR
Deployment and validation of the DEFeND platform in real operational environments
Solutions targeting MEnts to help them adopt the GDPR should be simple, economically affordable, reliable and general-purpose, so that they can be used independently of their business context. SMOOTH aims at implementing a cloud-based platform that meets all these requirements and automatically assesses the compliance of MEnts regarding the basic elements of the GDPR that affect most of them. Figure 3 outlines the SMOOTH platform.
The SMOOTH platform is being implemented following a modular approach comprising a front-end and a back-end. The front-end performs the interactions with the MEnts in order to: (1) get all the contextual information and resources (documents) required for running GDPR compliance validation tests, (2) deliver the GDPR compliance report in a simple, constructive and reliable format. The back-end integrates the technologies implementing the automatic assessment of compliance with the main elements of the GDPR impacting the MEnts. The compliance report is to be generated in the back-end based on the results obtained from the automatic assessment process. These components are detailed in the following.
MEnts access the SMOOTH platform through a registration/subscription process where they have to fill in an entry questionnaire. The questionnaire captures contextual information about the MEnt business, such as its data protection background, the personal information that it is collecting from its customers/providers (if any) and the data protection mechanisms currently in place (if any). This information is highly valuable to the algorithms performing the compliance analysis in the back-end.
It should also be stressed that SMOOTH will use the customers’ data repository only for the purpose of generating the compliance report. The data analysis will happen in real time and, once the compliance report is delivered, all data samples from the MEnt will be removed from the platform. In any case, due to the analysis process, the SMOOTH platform becomes a data processor; therefore, MEnts, being the data controllers, will have to sign an online contract for letting the SMOOTH platform process their data. This contract will be signed the first time MEnts access the SMOOTH platform. This way, we guarantee that SMOOTH is itself compliant with the GDPR.
Finally, the front-end will be the interface to deliver the GDPR compliance report generated in the platform back-end. This report will use a plain and simple language in a constructive tone to expose the failures in the GDPR compliance, in an order of importance, along with appropriate guidance for their resolution.
The goal of the SMOOTH back-end is to automatically produce a compliance report against the main basic elements of the GDPR affecting MEnts. The back-end uses as input the information provided in the registration process and the resources uploaded by the MEnts to the platform. The back-end is formed by three modules, each of them analysing a specific type of resource. The output generated by these modules will be consolidated to produce the GDPR compliance report. In the following, we describe the modules comprising the SMOOTH platform back-end.
SMOODATA. This module analyses the personal information that MEnts are storing, using as input the sample of the customers’ and providers’ information repository uploaded by the MEnts. The module assesses whether the MEnt has sufficient permission to store the information that it is actually storing, that is, whether the MEnt is only storing the personal data items declared in the consent documents (analysed by the previous module) or is also storing, by mistake, personal data that users did not agree to in those consent documents. The module also identifies the presence of “Sensitive Personal Data” in the data repository. This type of data requires special treatment (e.g. sensitive personal data must be encrypted).
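The two checks described for this module can be sketched as set comparisons. This is a simplified illustration under stated assumptions: field names are assumed to be already extracted from the repository sample and from the consent documents, and `SENSITIVE_CATEGORIES` is an illustrative subset chosen here, not the list used by SMOOTH.

```python
# Illustrative subset of special categories; an assumption of this sketch.
SENSITIVE_CATEGORIES = {"health", "religion", "ethnicity", "biometric"}


def assess_repository(stored_fields, consented_fields, field_categories):
    """Compare stored fields with consented ones and flag sensitive data.

    field_categories maps a field name to a data category label.
    """
    stored = set(stored_fields)
    not_consented = sorted(stored - set(consented_fields))
    sensitive = sorted(
        f for f in stored
        if field_categories.get(f) in SENSITIVE_CATEGORIES
    )
    return {"not_consented": not_consented, "sensitive": sensitive}
```

In practice the difficult part is the step assumed away here, namely recognizing which columns of an arbitrary repository hold which kind of personal data.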
Each of the above modules has an added value in its own right. Focusing on specific aspects of the GDPR, each module could well apply to larger organizations beyond the context of MEnts. In SMOOTH, the three modules will be integrated together to create the core of the SMOOTH platform. The output generated by each module will be processed and combined together to create the final GDPR compliance report to be delivered to the MEnts.
As previously explained, privacy and data protection aspects are usually dealt with from perspectives which, despite providing valuable contributions, are not aligned with systematic engineering practice. This makes engineers consider privacy as an unfamiliar aspect they often ignore. Nonetheless, in order to ensure that privacy and data protection features are effectively embedded in the products, systems and services, it seems reasonable to directly involve those who are responsible for creating and developing them, that is, to put the engineers in the loop. Any legal innovation (e.g. data minimization principle, right to be forgotten, data protection impact assessment, or accountability, to name just a few) needs to go along with systematic guidance to engineers, so as to ensure that it is effectively implemented. This idea follows the “code is law” aphorism, in that the features implemented by software products have practical implications for what is allowed as much as the legal regulation does.
Hence, PDP4E claims that engineers must be endowed with methodological and technological tools to systematically apply privacy and data protection principles so as to comply with the regulatory framework. These methods and tools should allow for other, competing requirements and system constraints, and they must especially bear in mind that the know-how and effort required for non-privacy-experts to apply them should be kept to a minimum, by being aligned with the engineers’ expertise. In order to pay effective consideration to privacy and data protection, engineers must be endowed with tools that map data protection principles, data subject rights, and controller obligations, onto engineering terms such as backlog items, database structures, business process models or deployment architectures.
Thus, PDP4E fosters the production of privacy and data protection methods and tools that integrate within the large heritage of software and systems engineering, which have long amassed a substantial wisdom that is methodically and systematically applied by engineers in their daily work, and which might well be taken advantage of for privacy and data protection as well. PDP4E advocates the seamless inclusion of privacy and data protection functions into general-purpose software and system engineering tools of customary use by engineers (as recommended by ENISA), to support that privacy and data protection be embedded throughout the methods and workflows followed by engineers in the SDLC. This represents a “shift left” in the application of privacy and data protection, from the Op[eration]s towards the Dev[elopment] activities. That way, PDP4E results populate the field of Privacy and Data Protection Engineering, which “pursues systematic approaches for the inception and application of privacy-oriented solutions throughout systems and software development processes” and which precisely revolves around methods and tools employed by engineers.
It shall be noted that this approach implicitly considers an honest but reckless engineer, who is willing to introduce privacy and data protection into their developments, but lacks the expertise or the resources (be it money or time) to appropriately address them. Likewise, we also remark that this implies that the organizations developing products are willing to cooperate to achieve privacy and data protection, and they are committed to protecting data subjects from attacks on their privacy, even if these might yield some benefit to the organization itself. That is, the organization assumes being in charge of protecting the rights and freedoms of the data subjects on their behalf, even if it might defy some (illegitimate) business ambitions. This approach is not so peculiar indeed, as it is already applied in other fields (e.g. Occupational Safety and Health, where organizations must look after the work-related risks of their employees). All in all, this Privacy and Data Protection Engineering approach does not prevent organizations holding personal data from intentionally violating privacy and data protection regulations and principles if they are willing to do so, but lowers the practical barriers they may be facing to reach the compliance they have voluntarily assumed and committed to achieve.
PDP4E is providing a set of systematic, economical, engineering methods and tools (as opposed to mere legal regulation, void principles, informal craftsmanship or managerial procedures) that introduce privacy and data protection issues throughout the disciplines of the systems development lifecycle (SDLC), leveraging the wisdom of software and systems engineering and integrating within existent, general-purpose, engineering methods and tools. In particular, PDP4E is addressing four disciplines, viz. risk management, requirements engineering, model-driven design and systems assurance.
The Risk Management discipline addresses potential negative effects of uncertain events. In PDP4E, these mostly refer to the impact on the individuals’ (i.e. data subjects’) rights and freedoms derived from the personal data processing activities carried out by an organization (data controller), in the context of a Privacy and Data Protection Impact Assessment (PIA / DPIA). Nonetheless, following a multilateral security approach, PDP4E also gives appropriate consideration to security risks, business risks and risks related to data processors (vendors that process data on behalf of the data controller under a contract). Typical risk management concepts (e.g. assets, threats, vulnerabilities, impacts, countermeasures or controls) are handled by the PDP4E risk management method, which builds on previous risk assessment methods (LINDDUN and STRIDE) and the use of data flow diagrams (DFDs) to model how personal data flows across different data processing activities and organizations (i.e. data controllers and processors).
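A minimal sketch of threat elicitation over DFD elements follows. The threat names are the LINDDUN categories, but the `CANDIDATE_THREATS` mapping, the three-level scales and the thresholds in `risk_level` are simplifying assumptions of this sketch; they do not reproduce the PDP4E method, and the mapping should be replaced by the tables of the method actually applied.

```python
from dataclasses import dataclass

# Simplified, illustrative mapping from DFD element kind to candidate threats.
CANDIDATE_THREATS = {
    "entity": ("Linkability", "Identifiability", "Unawareness"),
    "process": ("Linkability", "Identifiability", "Non-repudiation",
                "Detectability", "Disclosure of information",
                "Non-compliance"),
    "data_store": ("Linkability", "Identifiability", "Non-repudiation",
                   "Detectability", "Disclosure of information",
                   "Non-compliance"),
    "data_flow": ("Linkability", "Identifiability", "Non-repudiation",
                  "Detectability", "Disclosure of information",
                  "Non-compliance"),
}


@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # one of the keys of CANDIDATE_THREATS
    handles_personal_data: bool


def elicit_threats(elements):
    """List (element, threat) pairs for elements touching personal data."""
    return [
        (e.name, threat)
        for e in elements if e.handles_personal_data
        for threat in CANDIDATE_THREATS[e.kind]
    ]


def risk_level(likelihood, impact):
    """Combine likelihood and impact, each rated 1 (low) to 3 (high)."""
    score = likelihood * impact
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

Each elicited pair is only a candidate: an analyst still has to decide whether the threat applies, rate it, and select countermeasures.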
The Requirements Engineering discipline allows analysing, managing and verifying that a product, system or service meets the needs posed by a variety of stakeholders. In PDP4E, privacy and data protection requirements arising from legal texts, industry standards and generic privacy goals (e.g. unlinkability, transparency, and intervenability) are handled as templates of non-functional requirements (NFRs), which can only make sense if they are parameterized and instantiated within the specific context of the endeavour at hand (i.e. the specification of each project’s functional requirements). In PDP4E, this process is addressed through the successive refinement of abstract needs into operational requirements, and the use of a lightweight version of the “problem frames” approach initially proposed by PROPAN.
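The notion of a requirement template that only becomes meaningful once instantiated can be illustrated with the standard `string.Template` class. The template texts and identifiers below are hypothetical wordings written for this sketch, not PDP4E requirements.

```python
from string import Template

# Hypothetical NFR templates; placeholders are filled per project.
TEMPLATES = {
    "storage_limitation": Template(
        "The $system shall erase $data_category no later than "
        "$retention after $trigger."
    ),
    "transparency": Template(
        "The $system shall inform $data_subject about the purpose "
        "'$purpose' before collecting $data_category."
    ),
}


def instantiate(template_id, **context):
    """Fill a template; raises KeyError if a placeholder is left unbound."""
    return TEMPLATES[template_id].substitute(**context)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: an incompletely parameterized requirement is rejected instead of silently entering the specification.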
The Model-Driven Design discipline allows representing a system under development from different perspectives, so as to support engineers in moving from an abstract understanding of the system to a fine-grained, detailed design; and eventually verifying that the system models match the desired properties. The PDP4E method proposes that models of the system-to-be be enriched with properties that respond to privacy and data protection specific features. For instance, structural models (dealing with data types, attributes and relationships) can include further properties to determine which data is personal, whether it is sensitive, upon what basis it was collected, and how long it can be retained. Procedural models (e.g. dataflows) can represent the processes that deal with personal data, the processing operations it is subjected to, how data flows from one operation to another, for what purpose it is being used, and who is authorized to access it. And architectural models (representing components and their deployment) can represent who stores and processes personal data and under which jurisdiction. But model-driven design goes beyond a merely descriptive approach: it can be leveraged for data mapping and inventory activities (i.e. identifying and categorizing the personal data that will be processed by the system), analysis and reasoning about the most appropriate design solutions regarding privacy and data protection (through the systematic application of privacy design strategies, tactics and patterns), and generation of model-based tests that help verify the application of access control mechanisms.
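The enrichment of a structural model with privacy properties, and its use for data inventory, can be sketched with plain data classes. The classes and checks below are hypothetical stand-ins for illustration; the PDP4E tools operate on Papyrus models rather than on Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    personal: bool = False
    sensitive: bool = False
    lawful_basis: Optional[str] = None
    retention_days: Optional[int] = None


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple


def data_inventory(entities):
    """Derive the personal data inventory directly from the model."""
    return [
        (e.name, a.name, a.lawful_basis, a.retention_days)
        for e in entities for a in e.attributes if a.personal
    ]


def design_issues(entities):
    """Flag personal attributes lacking a lawful basis or retention period."""
    issues = []
    for e in entities:
        for a in e.attributes:
            if a.personal and a.lawful_basis is None:
                issues.append(f"{e.name}.{a.name}: no lawful basis")
            if a.personal and a.retention_days is None:
                issues.append(f"{e.name}.{a.name}: no retention period")
    return issues
```

Because the inventory is computed from the model, it stays consistent with the design as the model evolves, instead of being maintained as a separate document.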
The discipline of Systems Assurance focuses on the actions that must be arranged and executed to achieve and ensure the confidence that a system abides by some given requirements. Compliance with modern privacy regulatory frameworks requires not only sticking to the corresponding legally binding obligations, but also being able to demonstrate that appropriate actions have been taken throughout the development process. Thus, systems assurance becomes key to support privacy principles such as accountability, transparency and intervenability. PDP4E provides a formal model of the regulatory framework (in particular, GDPR and its interpretation through related quasi-, co- and self-regulations) as a method that includes required processes and relevant relations between one another, roles that carry them out, plus their input and output products. Then, during the development of a project, generated artefacts are captured that provide evidence, which can be traced to specific requirements posed by the regulatory framework, so that, all in all, and through a logical argumentation process, compliance with the regulation can be claimed. In order to support that assurance process, PDP4E also provides reusable argumentation patterns that act as templates of typical techniques to achieve and justify compliance.
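The tracing of evidence to regulatory requirements can be reduced, for illustration, to a mapping from requirement identifiers to the artefacts that support them. The function names and identifiers are hypothetical; an assurance framework would additionally record the argument that links each artefact to its claim.

```python
def trace_evidence(requirements, evidence):
    """Map each requirement id to the artefacts offered as evidence for it.

    evidence is an iterable of (artefact, requirement_id) pairs.
    """
    trace = {req: [] for req in requirements}
    for artefact, req in evidence:
        if req in trace:
            trace[req].append(artefact)
    return trace


def compliance_gaps(trace):
    """Requirements for which no evidence has been captured yet."""
    return sorted(req for req, artefacts in trace.items() if not artefacts)
```

A compliance claim can only be argued for requirements that appear with at least one artefact; the remaining ones indicate where further evidence has to be produced.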
This approach is realized in a set of interrelated but loosely coupled Privacy and Data Protection Engineering tools that PDP4E is producing (Fig. 4), which leverage and extend general-purpose software and system engineering tools already used in the context of each of the said disciplines. Thus, the privacy and data protection risk management tool is a new version of a previous security-management tool called MUSA, the requirements engineering and the model-driven design tools are extensions of Papyrus (a modular model-driven engineering framework), together with some ancillary tools implemented on the source code analysis tool Frama-C, and the assurance tool draws from the OpenCert assurance framework. All the tools that PDP4E is creating are equipped with knowledge bases (of different types, depending on the respective discipline), whose contents can be instantiated during a development process. These knowledge bases capture best privacy and data protection practices and make them ready to be used from engineering tools. Likewise, all the tools rely on model-based approaches and produce models of one or another type (controls, requirements, system structure, processes and architecture, argumentations, among others). The results of the different tools are also related: the risk management and the requirements engineering tools provide complementary views (risk-oriented and goal-driven, respectively) of the attributes that shall be met and validated by the design; the assurance tool captures artefacts produced by other tools as evidence for compliance, etc.
The main objective of PAPAYA is to design and develop a platform of privacy preserving analytics modules that allows the outsourcing of analytics operations into untrusted cloud servers while protecting the privacy of the data. Thanks to these newly developed privacy preserving analytics modules, stakeholders will be able to ensure their clients’ privacy (and be compliant with the GDPR) while extracting valuable and meaningful information from the analysed data.
In particular, PAPAYA develops novel privacy preserving neural network classification primitives that are based on partially homomorphic encryption, secure two-party computation or fully homomorphic encryption. A privacy preserving collaborative training solution based on differential privacy is also being implemented. Furthermore, the problems of privacy preserving counting and privacy preserving trajectory clustering are investigated.
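To illustrate what partially homomorphic encryption makes possible, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme chosen here as an assumption; the text does not state which schemes PAPAYA uses. The primes are far too small to be secure, only non-negative integers are supported, and all function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair; real keys need primes of 1024+ bits."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def encrypted_linear_score(pub, enc_features, weights):
    """Server side: weighted sum computed on ciphertexts only.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    power multiplies its plaintext by that (non-negative integer) weight.
    """
    n, _ = pub
    acc = encrypt(pub, 0)
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, n * n)) % (n * n)
    return acc
```

In this setting a client would encrypt its features, an untrusted server would evaluate the linear part of a classifier with plaintext weights, and only the client could decrypt the resulting score. Non-linear activation functions cannot be evaluated this way, which is one reason why such schemes are combined with two-party computation or replaced by fully homomorphic encryption.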
The PAPAYA framework contains components that will be running in the cloud environment (such as privacy preserving machine learning services, auditing services and others), and components that will be running on the client side (such as the Data Subject toolbox which will provide means for end-user privacy and usability of the platform). To facilitate user experience and enable data subjects and data controllers to exercise their rights over their data and control what is disclosed to third parties, the platform will provide dashboards for the different actors featuring, e.g. usable visualizations and auditing components.
The PoSeID-on solution is based on innovative technologies such as blockchain, smart contracts and cloud computing, that provide targeted benefits for end-users, potentially enabling them to manage personal data and data access authorizations in an easy, secure and auditable way. Additionally, it helps both public and private entities to identify new business opportunities, to be compliant with GDPR while processing personal data, as well as to undergo a substantial ICT-driven transformation, which will ensure higher security of end-user’s data. PoSeID-on also impacts society as a whole, as it leads to increased trust in the digital market, in addition to supporting fundamental rights in the digital society.
Through smart contracts, the project aims to meet the need for data confidentiality, inviolability, and access control for data subjects. Through the blockchain technology, references to PII shall be managed and exchanged securely. The blockchain technology was selected for two main reasons. First and foremost, there was the need to maintain an irrevocable record of PII transactions, including permissions handling and all kinds of operations involving PII processing, for providing full control to PII owners, for accountability, and for legal assurances. Second, there was the need to allow multiple entities to share data and to contribute to data processing, without relinquishing control over their own databases, and without relying on a central datastore. By agreeing to participate in the PoSeID-on system, users benefit from full control over their PII, and third parties can provide an auditable ledger of all their PII-related operations to users and regulators. Moreover, it should be highlighted that no PII is ever stored in the blockchain, which only stores information on permissions and on PII handling.
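The property of an irrevocable, auditable record that holds only references to PII can be illustrated with a hash-chained, append-only log. This sketch is a single-node simplification written for illustration: it involves no consensus, distribution or smart contracts, and the class and field names are hypothetical rather than taken from PoSeID-on.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body):
    # Canonical JSON so that the same content always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class PermissionLedger:
    """Append-only log of permission events; stores references, never PII."""

    def __init__(self):
        self.blocks = []

    def append(self, subject_ref, processor, operation, data_ref):
        body = {
            "subject_ref": subject_ref,  # opaque identifier resolved off-chain
            "processor": processor,
            "operation": operation,      # e.g. "grant", "revoke", "access"
            "data_ref": data_ref,        # pointer to the PII, not the PII itself
            "prev_hash": self.blocks[-1]["hash"] if self.blocks else GENESIS,
        }
        self.blocks.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute the chain; any altered or reordered block breaks it."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev_hash"] != prev or _digest(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True
```

Since each entry commits to the hash of its predecessor, rewriting a past permission event would require recomputing every later entry, which is what makes the record auditable by users and regulators.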
Figure 5 illustrates the overall PoSeID-on architecture, identifying the various system components.
Table 1 lists the conceptual components along with a short description of each.
It should be noted that, from the perspective of PoSeID-on, data controllers (entities that determine the purposes, conditions and means for the processing of PII) and data processors (the entities that process PII on behalf of data controllers) are treated in the exact same way, as these functionalities often reside in the same system.
The platform developed by the project is now being assessed in four different pilot deployments (in Italy, France, Spain and Malta), in public, private and mixed contexts. Specifically, the Italian pilot aims at enhancing e-services for public officials, the Spanish pilot aims to improve e-Government services for the citizens of Santander, the Maltese pilot focuses on helping businesses to better sponsor and offer their services to customers, and the French pilot is aimed at simplifying e-services for French citizens. Initially, pilots involve a basic, limited set of users, to be enlarged during the evaluation phase. The pilots run in a controlled environment in order to simulate real-life services and conditions.