1 Introduction

The General Data Protection Regulation (GDPR) is an EU regulation that defines personal data lifecycle requirements at a European level. Its scope ranges from data subjects’ rights and fines to policies and business processes. For example, fines for non-compliance can reach 20 million euros, or up to 4% of the company’s yearly global turnover [1]. Thus, entities working with information systems (IS) must pay close attention to the legal requirements, as non-compliance might cause them (on top of fines) a loss of reputation and increased human and monetary spending.

Even though five years have passed since the GDPR entered into application, companies are still regularly fined. For instance, in January 2023, Meta received a €390,000,000 fine from the Irish Data Protection Commission [2]. According to [3], EU-wide, more than 1000 fines had been issued between July 2018 and March 2022 for violations of the GDPR. The average fine was around €1,500,000, and the highest, at the time of writing, was a €746,000,000 fine imposed on Amazon Luxembourg in July 2021.

As such, it is clear that the GDPR is, at best, partially implemented by many private actors despite the strong financial incentive to achieve compliance. In this context, the need to include GDPR and privacy requirements in the software development life cycle (SDLC) becomes imperative. However, handling requirements emanating from regulations and legal documents is challenging [4, 5]. This situation arises from the conflicting nature of legal prose versus requirements and the SDLC [6, 7]. Legal prose tends to be vague so that it can endure over time and be applied in multiple contexts. Conversely, requirements engineering (RE) aims at precise requirements [8, 9].

To address this issue, requirements engineering researchers have proposed, over recent years, various artifacts to tackle diverse aspects of GDPR compliance—either from a technical point of view or following a wider interdisciplinary approach. This paper aims to map the approaches RE has proposed for GDPR compliance and to understand the current state of the art through a systematic approach. Through a systematic mapping study, the objective is to identify what has been proposed as well as future areas of research, venues, and research methodology. Consequently, the results provide the reader with an exhaustive list of the artifacts proposed in RE for GDPR compliance. The expected outcome is a list of diverse artifacts, ranging from frameworks to extensions of conceptual languages.

2 Background and related work

RE has a longstanding history with regulatory data protection requirements. In its early stages, the RE community investigated and proposed safety requirements for critical systems. Multiple lines of research have concluded that software engineers find regulatory requirements—including data protection regulations—challenging to understand and translate into the information system [4, 6, 10,11,12,13,14].

2.1 Privacy versus data protection regulatory requirements

There are several challenges with privacy requirements. Firstly, the definition of privacy may vary due to cultural elements, personal preferences, and conceptualization [15]. Given this situation, privacy can be a vague term that may encompass different issues [15]. Therefore, regulations dealing with personal data prefer to regulate information privacy or data protection—that is to say, issues regarding allowing data subjects to determine by themselves which type of personal data will be shared, how, and when, including its life cycle, in line with the definition of privacy in [16].Footnote 1 In other words, as a generalization, it is the individual who chooses how their data should be processed rather than the system itself.

Privacy requirements also differ from data protection regulatory requirements because the latter set requirements in one specific aspect of the former. Regulatory data protection requirements emanate from a regulation or legal body. Therefore, regulatory requirements come from a specific type of document(s) and may signal specific requirements. For example, a regulation can mandate organizational requirements—such as appointing a data protection officer (DPO) or identifying the entity responsible for data processing, as the GDPR does [1]—which might not be present as elements in a privacy ontology. Using Glinz’s [9] taxonomy on requirements, regulatory requirements for data protection could be labeled as constraints. Even if the stakeholders do not agree on a requirement set by the regulation—for example, appointing a DPO—this requirement is a restriction imposed by the regulation, in our example the GDPR [1].

Furthermore, working with regulatory requirements for data protection requires specific knowledge and expertise about that regulatory body [4], whereas the same is not necessarily the case for privacy requirements. Regulatory requirements can reference other pieces of regulation and evolve over time [4, 7]. For example, the GDPR entered into force in 2018 and interacts with the Privacy and Electronic Communications Directive (ePrivacy Directive) as well as with the Digital Services Act (DSA) and Digital Markets Act (DMA), which entered into force in recent months. Consequently, knowledge of these and other policies is required for regulatory requirements for data protection in Europe, which is not the case for all privacy requirements.Footnote 2

All in all, privacy requirements are not necessarily the same as regulatory requirements for data protection. The two have different origins, expectations, and specifications, among others. While stakeholders might have different conceptualizations of privacy, regulatory data protection requirements set out requirements independently of the privacy conceptualization, even if the wording allows for different interpretations.

2.2 Privacy requirements engineering

From an RE perspective, privacy is a well-established area of research [17]. Various research papers focus on privacy requirements, including legal concerns, each emphasizing different levels or topics [18]. Several reviews—systematic or not—have been conducted on the subject.

Kalloniatis et al. [17] researched the management and elicitation of privacy requirements, taking a holistic view of the subject. Their research does not follow a systematic approach but reviews well-known frameworks and approaches for privacy requirements [17]. In addition, they highlight the importance of including security and privacy requirements from the early phases of the software development lifecycle, in line with established academic work [4, 5, 17].

Morales-Trujillo et al. [19] share the results of a systematic mapping study on the privacy-by-design paradigm in software engineering. They report an increased interest in the subject in 2018, which they relate to the entry into force of the GDPR [19]. In addition, most papers propose models; however, most contributions are in their initial stages and need further development [19].

Netto et al. [20] carried out a systematic literature review of privacy requirements engineering, focused on the years 2000 to 2016. They found that most of the requirements engineering literature focuses on the elicitation of privacy requirements, followed by their analysis [20]. Furthermore, they highlight that the language used in legal texts is very different from that of requirements engineering, which complicates the work between the two domains, and that there is a lack of modeling languages that can bridge them [20].

Recently, [21] published a systematic literature review on privacy requirements and their perception among IT practitioners, understanding privacy requirements broadly. They provided a list of requirements elicitation techniques, methods, and frameworks published until 2021 [21]. They conclude that the tools and frameworks most used in academia do not align with those used in the private sector or by practitioners [21].

None of the previous works focuses specifically on compliance with regulatory data protection requirements. Some acknowledge the subject and discuss the implications of the corresponding regulation. For example, both [19, 21] point out that there seems to be a peak of published papers related to privacy requirements in 2018, which they relate to the entry into force of the GDPR. However, they do not focus on compliance with a specific regulation or the GDPR.

Several proposals are consistently mentioned and studied throughout these papers as ways to tackle privacy requirements. Some examples are:

  • LINDDUN is a privacy threat modeling framework based on data flow diagrams that allows the analyst to elicit and model privacy threats from the early stages of the SDLC [22, 23]. By including privacy concerns from the beginning of the SDLC, the idea is to help software developers build PbD software [23]. One of the latest developments is LINDDUN GO, a lightweight and gamified approach to the framework [24].

  • Privacy safeguard (PriS) is an organizational goal-oriented framework that helps analyze business processes from a privacy perspective [18, 25, 26]. Based on Enterprise Knowledge Development, “PriS provides a set of concepts for modeling privacy requirements in the organization domain and a systematic way-of-working for translating these requirements into system models” [25]. It identifies eight privacy goals—authentication, authorization, identification, data protection, anonymity, pseudonymity, unlinkability, and unobservability—and seven privacy-process patterns that help to achieve those goals [25, 26]. Through a methodology consisting of four steps, PriS allows the practitioner to elicit privacy goals, analyze and understand their impact, and identify which patterns and techniques may best support achieving them [25, 26].

  • The role-based access control (RBAC) approach is proposed by [5]. Through a goal-driven approach, their framework helps model privacy requirements from the early phases of role engineering to bridge the gap between “high-level privacy requirements and low-level access control policies” [5]. Furthermore, their framework helps model and analyze competing security and privacy requirements [5].

  • Spiekermann and Cranor [27] suggest that privacy requirements should be tackled from an architectural (privacy-by-architecture) and policy (privacy-by-policy) point of view, taking a hybrid approach. Using the FIPPs and privacy reflections as a starting point, they identify that privacy can be divided into three spheres: the user, joint, and recipient spheres. “The ‘user sphere’ encompasses a user’s device [...] The ‘recipient sphere’ is a company-centric sphere of data control that involves back-end infrastructure and data sharing networks” [27], while the joint sphere denotes the services that companies provide to users [27]. Privacy requirements are essential in all three spheres and are divided into data transfer, storage, and processing [27]. Accordingly, they propose that a hybrid approach of privacy-by-policy—which focuses on choice and notice—and privacy-by-architecture—which focuses on data minimization, anonymization, and PETs—“satisfies business needs while minimizing privacy risk” [27].

Other methods and frameworks have also been proposed for privacy requirements engineering.

Across all these proposals, the conceptual model of privacy is not necessarily the same; each places emphasis on different characteristics of privacy. As previously mentioned, these proposals do not focus primarily on regulatory data protection compliance or the GDPR. Indeed, some of them discuss and touch on regulatory data protection, but it is not their main focus. Hence, they fit privacy requirements better than regulatory data protection requirements.

2.3 Regulatory data protection requirements engineering

Data protection and regulatory requirements have long been studied in RE [7, 28]. Multiple frameworks, tools, methodologies, and artifacts have been proposed, either for specific regulatory regimes (or applied to specific laws) or for data protection regulatory requirements in general.

[28] carried out a systematic literature mapping on modeling for regulatory compliance. The authors compared how goal-based and non-goal-based approaches differ with respect to legal and regulatory compliance, highlighting their respective benefits and drawbacks. Their research found that compliance modeling and compliance checking were the most popular modeling topics, followed by analysis [28]. Furthermore, healthcare was the domain that received the most attention, with the Health Insurance Portability and Accountability Act (HIPAA) being the most popular legal document.

[29] identified the critical factors in implementing the GDPR in organizations through a systematic literature review. In broad terms, they suggest that few papers discuss the GDPR [29]. They mention several benefits of implementing the GDPR in an organization, including but not limited to better data management, cost reduction, and better reputation [29]. They concluded that the main challenges are that the GDPR is a complex regulation—in line with what [6] indicates on regulatory data protection requirements engineering—and that there is a lack of people with expertise on the subject [29]. In addition, finding data protection expertise is difficult and expensive, and implementing the GDPR is time-consuming and costly in financial and human resources [29].

[30] carried out a systematic mapping study on automated GDPR compliance using natural language processing (NLP) tools in RE. In particular, they researched which “NLP approaches are useful for RE and for which RE activity?” [30]. They gathered papers up to 2021, with compliance itself being out of their scope [30]. They identified NLP for RE as an ongoing trend.

From an ontological perspective, some proposals either fully tackle GDPR requirements or include parts of them. A non-exhaustive list follows:

  • PrOnto [31] proposes an ontology for GDPR requirements based on legal reasoning. It does not focus on privacy but on legal data protection aspects to check compliance. Another stated goal of PrOnto is to help with legal reasoning and the “web of data and information retrieval” [31]. The methodology used to develop the ontology is the “methodology for building Legal Ontology” (MeLOn), which is frequently used in the legal domain to create ontologies [31]. For example, it has a variety of classes to represent regulatory data protection requirements, such as the obligation, rights, or purpose classes. The authors have since extended this work to propose the DAPRECO knowledge base [32].

  • COPri is a privacy ontology proposed by Gharib et al. [33] that includes some aspects of the GDPR, even though its main purpose is not to be a GDPR ontology. It includes elements that go beyond the scope of the GDPR and does not use legal reasoning.

  • Similarly, LIoPY also includes legal aspects in its ontology [34], although it focuses on IoT instruments. It seeks to include specific attributes of regulatory data protection requirements, such as consent or choice, in privacy policies [34]. It does not use legal reasoning for the ontology.

  • The GDPRov family comprises the GDPRov, GDPRtEXT, and GConsent proposals, which are combined ontologies that tackle specific legal requirements of the GDPR [35,36,37]. GDPRov is an OWL2 ontology that focuses on specific elements of the GDPR, namely the “acquisition, usage, storage, deletion, and sharing of consent and data lifecycles” [35]. It focuses on processes like data deletion and access, consent management, and personal data [35]. GConsent [37] is more specific, focusing solely on GDPR consent requirements. The approach of these ontologies is similar to PrOnto in that the GDPR plays a fundamental role, although they do not explicitly state that they use legal reasoning.
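To make the flavor of these ontology-based approaches concrete, the sketch below models a single consent fact as subject-predicate-object triples, loosely in the spirit of GConsent's focus on GDPR consent. All class and property names here ("ex:Consent", "ex:givenBy", and so on) are illustrative placeholders, not the actual vocabulary of any of the ontologies above.

```python
# Sketch: a consent fact modeled as RDF-style subject-predicate-object
# triples. Names such as "ex:Consent" are invented placeholders, not the
# real GConsent (or PrOnto) vocabulary.

triples = {
    ("consent42", "rdf:type",      "ex:Consent"),
    ("consent42", "ex:givenBy",    "dataSubject7"),
    ("consent42", "ex:forPurpose", "newsletter"),
    ("consent42", "ex:status",     "given"),
    ("consent42", "ex:givenAt",    "2023-01-15"),
}

def objects_of(predicate):
    """Return all objects for a predicate (a toy stand-in for a SPARQL query)."""
    return {o for (s, p, o) in triples if p == predicate}

print(objects_of("ex:status"))  # which consent states are recorded
```

A compliance check then becomes a query over such triples (e.g., is there a recorded consent with status "given" for each processing purpose?), which is essentially what these ontologies enable at scale.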

In this manner, our work differs from other works in RE, as it focuses solely on data protection requirements, specifically the GDPR. Previous work has focused either on privacy requirements—which, as discussed, differ from data protection—or on compliance requirements in general. Meanwhile, data protection is becoming more standardized: the OECD privacy principles [38] and the GDPR have become the de facto international standards for data protection legal instruments. Furthermore, this research aims at studying what has been done in RE to achieve or support compliance, not GDPR requirements from a general perspective.

3 Research method

This research follows the guidelines from Petersen et al. [39,40,41] to gather, analyze, and produce the systematic mapping study. A mapping study’s objective is to discover trends in a specific area, whereas a systematic literature review tries to answer a research question [40]. In this manner, a mapping does not necessarily need to find all the research articles that may answer a question, nor have one, but instead obtain a good representative sample of the area of interest [39, 40, 42]. Given this approach, mapping studies do not require a quality assessment [39].

Following what Petersen et al. [39] indicate, the approach of this study is a “‘thematic analysis’ that counts papers related to specific themes or categories”. At the same time, this mapping study contains a few research questions closer to a systematic literature review than a mapping, as they cannot be answered by reading the abstract alone. However, mappings and reviews can be considered a continuum, each benefiting from the research strategies of the other [39]. Therefore, the researcher does not necessarily have to restrict themselves to reading only one part of an article when doing a mapping study [39].

3.1 Objective and mapping studies

This systematic mapping study was planned following the guidelines from Petersen et al. [39,40,41], as seen in Fig. 1. In particular, the paper-gathering sample plan is based on Petersen et al. [39].

Fig. 1 Systematic mapping process by Petersen et al. [39]

This mapping study aims to discover the trends of what initiatives have been proposed in requirement engineering to achieve GDPR compliance. Its main objective is to summarize and disseminate the current state of affairs of GDPR regulatory requirements in RE.

To discover the trends and fulfill the objective of the mapping study—as mapping studies do not necessarily answer a question [40]—the following sub-questions were chosen:

RQ1: When, where, and in what type of venue has the research been published? (i.e., type of venue)

RQ2: Are the authors of multiple disciplines?

RQ3: What type of research is it?

RQ4: On what stage of the RE process does the paper focus?

RQ5: What compliance elements of the GDPR does the research article focus on?

RQ6.1: What type of proposal is the paper?

RQ6.2: If a modeling language extension is proposed, of which language is it an extension?
This research understands initiatives in a broad manner, as the artifacts or treatments described by Wieringa et al. [43]. Research is understood as investigations that tackle knowledge questions in the domain [43]. We are also interested in knowledge questions, as they act as guiding elements for research.

3.2 Search planning

To create the search string and define the exclusion and inclusion criteria, we followed the PICO (Population, Intervention, Comparison, and Outcomes) approach per Kitchenham and Charters [41]. Although the PICO approach is recommended for systematic reviews rather than mappings, it does help identify keywords, as Petersen et al. [40] did in their mapping.

  • Population: Our population of interest is the GDPR.

  • Intervention: “The intervention is the software methodology/tool/technology/procedure that addresses a specific issue” [41], which in our case is requirements engineering, more precisely compliance.

  • Comparison: We compare what has been proposed in RE, understanding proposals flexibly (so that knowledge questions can be included). Following a similar strategy to Petersen et al. [40], we do not empirically compare the proposals, as this study aims to discover trends, not to conduct a systematic literature review.

  • Outcomes: As this research is a mapping study, as indicated by [40], this item does not necessarily apply. However, the outcome is a systematic list of proposals from RE for GDPR compliance.

As a result, we identified keywords with PICO. Overall, taking a similar approach to Petersen et al. [40], there are four groups of words to be searched:

  • Set 1: Searching elements related to the GDPR, such as data protection regulation.

  • Set 2: The scope is within requirements engineering.

  • Set 3: The requirements need to be linked with compliance or adherence.

  • Set 4: The requirements must come from a legal or regulatory document.

Hence, a list of synonyms was identified, as provided in Table 1. We built the search string based on these identified synonyms.

Table 1 Keywords and synonyms
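As an illustration of how such keyword sets translate into a boolean query, the following sketch ORs the synonyms within each set and ANDs the sets together. The synonym lists shown are abbreviated stand-ins, not the exact contents of Table 1.

```python
def build_query(keyword_sets):
    """OR the synonyms within a set; AND the sets together."""
    clauses = ["(" + " OR ".join(f'"{s}"' for s in synonyms) + ")"
               for synonyms in keyword_sets]
    return " AND ".join(clauses)

# Illustrative (abbreviated) stand-ins for the four sets of Table 1.
keyword_sets = [
    ["GDPR", "general data protection regulation"],  # Set 1: the regulation
    ["requirements engineering", "requirement"],     # Set 2: RE scope
    ["compliance", "adherence"],                     # Set 3: compliance link
    ["regulation", "law", "legal"],                  # Set 4: legal origin
]

print(build_query(keyword_sets))
```

Database-specific syntax (field restrictions, wildcards) would then be layered on top of such a base string, as done per platform in Table 3.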

3.3 Exclusion and inclusion criteria

For the exclusion of research articles, the following criteria were applied (Table 2).

Table 2 Inclusion and exclusion criteria

These criteria were chosen so that the selected articles align with the study’s objective and PICO. Our interest in excluding research published in unranked venues is to perform a quality check, although this is not necessary for mapping studies [40]. Table 2 summarizes the inclusion/exclusion criteria.

Our primary concern is requirements engineering, and we exclude all papers whose main focus is not requirements engineering. Although many research papers talk about requirements, they are excluded if they do not specifically address achieving compliance with GDPR requirements. Similarly, if a specific framework was applied and only tangentially showed compliance with the GDPR, it would also be excluded, as its primary concern was not achieving GDPR compliance. The reason behind this decision is that, although these types of papers may be helpful for several reasons, requirements engineering and GDPR compliance are not their main focus. All technology that falls within the scope of the GDPR must comply with it; hence, briefly claiming compliance and/or discussing GDPR regulatory requirements does not necessarily make a research article interesting for our research objective.

Other exclusion criteria are articles that are not primary studies, have not been peer-reviewed, or are grey literature. The focus of this study, as the research questions show, is what initiatives have been proposed for improving compliance with the GDPR. Hence, although some opinion articles may be interesting, they do not fall within the scope of our research questions.

3.4 Search string

The search string was carefully designed to include synonyms of the identified keywords, allowing us to answer all the questions. Complementary to the PICO approach, we iteratively tested different strings to find the optimal solution, seeking a trade-off between a sample too big to analyze (thousands of papers) and an extremely limited one (a dozen). Furthermore, we verified that the most important papers were still present with the new strings, following a test-retest approach similar to Kitchenham [44].

We decided not to include “privacy” in the search string, as the focus of this study is protecting personal data under the GDPR framework, not “privacy” in broad terms. This decision is: (1) based on the definition of privacy; and (2) due to the test-retest approach.

Firstly, defining privacy is challenging, as there is no clear-cut definition, and it encompasses a wide range of issues [15, 45]; this discussion is shared in Sect. 2. Our second reason for not using “privacy” in the search string is the test-retest approach: when the term was included, the search results in the databases increased enormously and included articles outside the scope of this research—about cryptography, the cloud, legal texts, philosophical texts, and formal code verification, among others. As a result, we decided not to use the word “privacy”, since the articles it surfaced were not of interest. Even without this synonym, we still found the research articles of interest and those identified as essential in the domain.

The databases used to obtain the articles for this research were IEEE, Scopus (which indexes ScienceDirect and Springer), and ACM. These databases were chosen for their notoriety and importance within the field of computer science.

Law research databases were not selected: they were preliminarily tested with our inclusion/exclusion criteria, no papers with an RE focus were found, and hence they were deemed out of scope. At this phase, we queried the HeinOnline and JSTOR databases to identify papers of interest, experimenting with strings with and without the requirements engineering keyword. When querying JSTOR with the keyword “requirements engineering”, for example, we got only 2 papers, both outside the scope of this research, as they did not discuss requirements engineering. Therefore, we decided not to query law databases; the possible impacts are further discussed in Sect. 6.

We designed a specific search string for each database (see Table 3), following the flexibilities or restrictions of each platform. The queries (and subsequent data extraction) were carried out in January 2023.

Table 3 Search string per database

3.5 Data extraction

Fig. 2 Selection of sample papers for the mapping study

Once publications were selected or excluded according to the previously mentioned criteria, the remaining publications were analyzed with the data extraction form shown in Table 4. We conducted the data extraction and reviewed each other’s work, which increases the study’s validity and provides more robustness.

Table 4 Extraction form

To tackle RQ3, we adopt the taxonomy proposed by [46] and [43]. Wieringa [46] has proposed that the engineering cycle can be divided into two main areas of research: knowledge questions and design research. Knowledge questions motivate research that seeks to answer a question about the world [46]. Design research proposes an artifact that contributes to solving, improving, or affecting an environment for a specific problem [47].

Similarly, Wieringa et al. [43] proposed a study classification taxonomy for papers, which we followed in this research. The types of research defined by this taxonomy are:

  • Validation research: “This paper investigates the properties of a solution proposal that has not yet been implemented in the RE practice. [...] The investigation uses a thorough, methodologically sound research set up” [43].

  • Evaluation research: “Techniques are implemented in practice, and the technique is evaluated. That means, it is shown how the technique is implemented in practice...” [39].

  • Solution proposal: this type of article presents a solution to a defined problem [43]. “The solution can be novel or a significant extension of an existing technique” [39].

  • Conceptual proposal: “These papers sketch a new way of looking at things, a new conceptual framework” [43]. Following [42], we prefer to name this type of research conceptual proposals rather than philosophical papers [43], as it aligns better with the objective of the mapping study.

  • Experience papers: the authors share their experience on a subject, where the focus is on the how rather than the what [39, 43].

  • Opinion papers: the authors present their opinion on some subject, without methodology [43].

We left out opinion papers, as they do not propose or carry out primary research.

Table 5 Evaluation and validation research category, proposed by Petersen et al. [40], based on [43]

Along these lines, we also classified the research method followed in papers categorized as validation or evaluation [46]. Although several types of research methods exist, and labeling each precisely would be impractical, we followed the proposal made by Petersen et al. [40], based on Wieringa et al. [43], as shared in Table 5. In this manner, it is possible to see which research methods seem predominant in the field and identify trends.
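These classification facets feed directly into the counting step of the mapping. As a minimal sketch of the "thematic analysis that counts papers", tallying extracted records per facet is enough; the records and field names below are invented for illustration and only loosely mirror the extraction form in Table 4.

```python
from collections import Counter

# Toy extracted records; field names and values are illustrative
# placeholders, not actual data from this study.
records = [
    {"type": "solution proposal",   "re_phase": "elicitation"},
    {"type": "validation research", "re_phase": "specification"},
    {"type": "solution proposal",   "re_phase": "elicitation"},
    {"type": "evaluation research", "re_phase": "management"},
]

# One Counter per facet reveals the predominant categories.
by_type = Counter(r["type"] for r in records)
by_phase = Counter(r["re_phase"] for r in records)

print(by_type.most_common())   # research-type trend in the sample
print(by_phase.most_common())  # RE-phase trend
```

Cross-tabulating two facets (e.g., research type versus RE phase) in the same way yields the bubble plots commonly used to report mapping results.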

To define which process of the RE cycle a paper focuses on, we used the taxonomy provided by Pohl and Rupp [8] and added an extra element presented by Sommerville [48]. In particular, we divided the RE cycle into the following five categories:

  • Elicitation: Refers to the activity that seeks to gather input from the different stakeholders who may be affected by the IS [8]. This ranges from users’ expectations and goals to requirements set by legal documents. For this mapping study, the analysis process was also included in the elicitation phase [48].

  • Specification: “... is the process of writing down the user and system requirements in a requirements document. Ideally, the user and system requirements should be clear, unambiguous, easy to understand, complete, and consistent” [48].

  • Verification and validation: Requirements need to be validated (for example, do they meet the objectives of the stakeholders?) and verified (are all the elements included?) [8]. In the GDPR context, certain articles specify what an IS should and should not include; for example, privacy policies must fulfill a list of requirements to be considered valid.

  • Management: Requirements might evolve over time, change, or be reprioritized; management is linked to all the other RE activities [8].

  • Documentation: Requirements need to be traced, and their specification should be written in a document to keep track of them [8, 48].

Regarding the classification of GDPR topics, despite our best efforts, we could not find a taxonomy or classification scheme for papers that encompasses the whole regulation. Given that the GDPR is a regulation with 99 articles and 173 recitals, classifying the research papers per article would have been neither practical nor useful. Hence, to classify papers by the area of the GDPR they discuss, the authors of this article devised a classification scheme of topics they have found to be commonly discussed, grouped into sets. This situation, cataloged as “emerging classification” by Petersen et al. [40], is common among mapping studies: according to Petersen et al. [40], 40 of the 55 studies reviewed in their mapping used an emerging classification.

The classification we created for this research is as follows:

  • GDPR principles: These are the guiding principles of the regulation: (1) lawfulness, fairness, and transparency, (2) purpose limitation, (3) data minimization, (4) accuracy, (5) storage limitation, (6) integrity and confidentiality (security), and (7) accountability [1].

  • Legal basis for processing (except consent): This is the lawful basis on which a controller can process data. The GDPR defines these bases in Art.6 [1].

  • Consent: Consent is a legal basis for data processing, part of the previous set. It is mentioned throughout the GDPR and stands out as a legal basis with precise requirements, and is thus classified in its own set. It is defined in Art.4(11); Art.6 sets it as a lawful basis, Art.7 defines its conditions, Art.8 sets the rules of consent for children, and Art.9 defines how consent is to be gathered for special categories of data, among other articles and recitals [1, 49, 50]. Consent has a particular type of governance [49] that has sparked an area of research.

  • Data transfers to 3rd countries or international organizations: This is usually abbreviated as data transfer to 3rd countries. Chapter V (Art.44–50) of the GDPR [1] sets the requirements for data transfers to 3rd countries and international organizations. The different mechanisms for doing so must fulfill a set of requirements, such as security, agreements, and contracts, among others [1, 49].

  • Identification of actors: Organizations must identify the actors involved in an information system to define their duties and requirements. Accordingly, they must identify the data protection officer (DPO; Art.37), who the processor and controller are (Art.26–29, for example), whether the processor or controller is outside the European Union (Art.27), and who the data subject is, among others.

  • Duties of actors: The obligations of processors and controllers are stipulated in Chapter IV of the GDPR [1].

  • Data subject rights: These are data subjects’ rights over their personal data, as stated in Chapter III of the GDPR [1]. Among them are the rights: (1) to be informed, (2) to access, (3) to rectification, (4) to erasure, (5) to restrict processing, (6) to data portability, (7) to object, and (8) concerning automated decision making and profiling. These rights impose several requirements on the information system.

  • Privacy policies: Privacy policies are related to the transparency requirement of the GDPR [1]. It is expressed explicitly under Chapter III, as part of the data subjects’ rights, in Art.12–14 of [1], and also relates to the right to be informed. The objective of a privacy policy is to inform the data subject about the data governance model.

  • Privacy-by-Design-and-Default (PbD &D): Relates mainly to Art.25 of the GDPR [1]. The idea of PbD &D is that privacy elements and requirements should be considered from the design stage of an IS [51]. In other words, privacy requirements are dealt with from the early stages of the SDLC, should not conflict with other requirements of the IS, should be the default setting of the IS, and should be user-centric [51].

  • Security requirements: Refers mainly to the requirements set in Art.32 of the GDPR [1], among others (such as Recitals 49 and 83). The idea is that technical and organizational measures (TOMs) should be in place to secure the data processing, particularly based on the risk of such processing [1, 49, 50, 52].

  • Other: A category created to keep track of elements outside this list, recording which element each paper focused on.

  • Specific articles: A category used when a paper discusses a specific article outside the scope of the proposed classification.

  • General: A category created for articles that would either: (a) discuss GDPR compliance in general terms, without referencing articles, reflecting on compliance at a high level; or (b) focus on GDPR compliance as a whole while also addressing some specific articles and issues, with GDPR compliance as their main purpose.

In this classification, we created the “Other” and “Specific Articles” categories to avoid missing topics that could be labeled but did not fall into any of the proposed sets. “Specific Articles” records whether a research article focuses only on a specific article not covered by the proposed sets (GDPR principles, legal basis, consent, data transfer, identification of actors, duties of actors, data subject rights, privacy policies, PbD &D, or security); this way, we can track which specific article the paper focuses on. Meanwhile, “Other” captures another area of interest that does not touch a specific article of the GDPR (such as a record of processing activities in a generic manner). We do not claim that this classification scheme is final or robust, but it was the method we chose to fulfill this study’s objective [40]. To the best of our knowledge, there is no widely accepted taxonomy or classification of interest areas for the GDPR.

To give more robustness to this classification, we decided to record all GDPR articles mentioned in the sampled papers, regardless of whether the paper focused on that article or not. In this way, several GDPR articles could be mentioned throughout a paper without being the main interest of the research article. To illustrate, a paper could focus on consent but mention Articles 4–8, 25, and 44, even though not all of these articles are directly linked to consent.
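The coding protocol described above—one set of topic labels per paper, plus a record of every GDPR article the paper mentions—can be sketched as a simple data structure. This is our illustration only; the field names and sample records are hypothetical, not the authors’ actual extraction form or data:

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class PaperRecord:
    """Hypothetical coding record for one sampled paper."""
    paper_id: str
    topics: set = field(default_factory=set)              # classification sets, e.g. {"Consent"}
    mentioned_articles: set = field(default_factory=set)  # every GDPR article cited anywhere

def article_mentions(records):
    """Tally how many papers mention each GDPR article (as done for Table 13)."""
    counts = Counter()
    for rec in records:
        counts.update(rec.mentioned_articles)  # each article counted once per paper
    return counts

# Illustrative records only: a paper focused on consent may still
# mention articles not directly linked to consent.
sample = [
    PaperRecord("P01", {"Consent"}, {4, 5, 6, 7, 8, 25, 44}),
    PaperRecord("P02", {"Privacy policies"}, {12, 13, 14}),
    PaperRecord("P03", {"Consent", "Privacy policies"}, {6, 7, 12}),
]

print(article_mentions(sample)[7])  # papers mentioning Art.7 -> 2
```

Keeping topics and mentions as separate fields preserves the distinction the study relies on: a mention of Art.4 is recorded even when Art.4 is not the paper’s focus.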

4 Results from the mapping

Fig. 3 Number of publications per year

Table 6 Search results per database

The total number of studies per database is presented in Table 6. The search returned a total of 402 papers; the final sample consists of 90 papers. The selection process is described in Fig. 2, and the list of papers is provided in “Appendix 1”.

4.1 RQ1: When, where and in what type of venue has the paper been published?

4.1.1 When: year of publication

Looking at the number of publications per year in Fig. 3, most publications appeared after 2018, the year the GDPR came into force. Although there have been publications since the year the GDPR was signed (2016), the highest number of publications is in 2019 (25 publications, 27.8%), followed by 2020 (20 publications, 22.2%) and 2022 (20 publications, 22.2%).

4.1.2 Where: venue published

Table 7 Venues that have published more than one article on the topic

We see no clear trend concerning where the sample papers are published. No specific venue has more than five publications (shown in Table 17 in “Appendix 2”). Table 7 shows the frequency of the venues that have published more than one paper. As can be seen, only seven venues have published more than three papers and, more precisely, only three venues have published more than four papers on the subject. The central themes of the first two venues are privacy and security (Information and Computer Security and the ESORICs workshops), while the third venue’s main topic is requirements engineering (RE).

4.1.3 In what: type of venue

Fig. 4 Number of publications per venue

Dividing the sample by the type of venue where the research has been published, we obtain the following: 44 articles (48.9%) have been published in conferences, 33 (36.7%) in journals, and 13 (14.4%) in workshops, as shown in Fig. 4.

4.2 RQ2: Are the authors from multiple disciplines?

Table 8 Research articles that have at least one author not affiliated with computer science or informatics

As mentioned above, each paper author was reviewed to check their affiliations. If the affiliation was not in the paper, it was searched for through a search engine. After this review, we found 20 articles, representing 22% of the paper sample (as shown in Table 8), in which at least one of the authors is not affiliated with a department of computer science or business informatics.

Besides computer science and business informatics, authors came from the following areas: Humanities and Arts, Business Administration and Management, Law, Archival and Library and Information Science, Medicine, Philosophy, Health Science, and Economics (Footnote 3).

4.3 RQ3: What type of research is it?

Using the taxonomy presented in Sect. 3 and Table 9, it can be seen that 11 papers (12%) were classified as knowledge questions only—i.e., not related to the design science process. Focusing on the design science research categories, a significant percentage of the papers include some validation or evaluation (44 articles, corresponding to 48%).

Table 9 Type of research, based on [43]

At the same time, a high number of papers could not be placed under a single category. Forty papers (44%) correspond to exactly one category; the most frequent single categorization is “knowledge question”, followed by “solution proposal”. In comparison, 39 papers (43%) were categorized under two categories, with “solution proposal” plus “proposal validation” and “solution proposal” plus “proposal evaluation” being the two most popular combinations of research type. Finally, the remaining 11 papers (12%) were classified under three categories (such as “knowledge question”, “solution proposal”, and “validation proposal”).

Table 10 Validation and evaluation research methods

When we review which research methods have been applied for validation or evaluation, case studies, industrial case studies, prototypes, and controlled experiments with practitioners are the most popular (Table 10).

4.4 RQ4: On what stage of the RE process does the paper focus?

As this is a mapping study, we followed a high-level classification of the RE process. It was difficult to classify the majority of papers into a single category within the RE cycle, as most would fall under the scope of two or more. In the end, 44 items (48.9%) were classified under a single category, 45 (50%) under two or more categories, and 1 (1.1%) could not be classified under any part of the requirements lifecycle (Footnote 4). The largest number of articles is found under the elicitation category, followed by verification and validation. The classification is presented in Table 11.

Table 11 RE process frequency in papers

To analyze which areas of interest exist in the RE process, we grouped the data based on whether the papers touched on or discussed each area. The total number of individual occurrences is 154 since—as previously mentioned—a paper can have more than one area of interest in the RE process.

47 papers address the requirements elicitation stage, followed by specification with 35 papers, and verification and validation with 33 papers. As presented in Sects. 1 and 2, the focus on elicitation may be related to the fact that GDPR requirements must be interpreted, as they emanate from legal prose. This task of interpreting regulation into software requirements calls for a particular set of skills. A more detailed reflection is presented in Sect. 5.

4.5 RQ5: What compliance elements of the GDPR does the research article focus on?

Quite a few articles touch on only one aspect of the GDPR and seek to contribute only to it. Table 12 shows the areas of interest in the GDPR per year and the total number of research articles that discussed each topic. To gather more insight for this research question, we kept track of which GDPR articles were mentioned in the papers, as previously stated. Consequently, for example, if a research article mentioned Art.4, this would be recorded as a mention.

Table 12 GDPR subjects of interest per year

57 papers refer to the GDPR in a general manner, either mentioning some articles for exemplification purposes without focusing on them, or discussing the GDPR without referencing specific articles. Hence, the category “General GDPR” is the most popular topic. In second place is consent, with 27 papers discussing the subject, followed by privacy policies and, afterward, security elements.

Consent is a popular topic in the paper sample. A significant percentage of the sample acknowledges it as a topic directly in the scope of the paper (27 papers, see Table 14). 31 papers also mention consent somewhere (measured by references to Art.7). Similarly, Art.6, which deals with the legal bases for processing—including consent—is mentioned 37 times. Many of the proposals relate specifically to consent management. By comparison, only 12 papers focus on legal bases other than consent, yet Art.6 is mentioned more frequently, as shown in Table 13.

Table 13 Number of papers mentioning each GDPR article

The third most popular topic is privacy policies, the main focus of 21 papers in the sample. Privacy policy requirements are specified mainly—but not solely—in Art.12–14 of [1]. Art.12 is mentioned 19 times, Art.13 29 times, and Art.14 30 times.

To identify which other elements may have been of interest, these were coded as “Other”, specifying which element they focused on, as seen in Table 14. In this category, the record of processing activities was the focus of four papers.

Table 14 Other GDPR topics of interest, with their references

4.6 RQ6.1: What type of proposal is the paper?

Frameworks and conceptual frameworks are the two most common proposal types, with 22 and 18 papers classified as such, respectively, as shown in Table 15. Given the extent and scope of the GDPR, a framework seems a logical proposal. Tools are the third most common type of proposal. Table 15 shows the frequency of the other types of proposals, with no clear tendency from the fourth position onwards.

As specified in Sect. 3, “Knowledge question” articles and some “Experience papers” do not provide a type of proposal. Hence, they were not categorized for this question.

Table 15 Frequency and percentage of type of proposal

4.7 RQ6.2: If a modeling language extension is proposed, which language is it an extension of?

Table 16 Which modeling language each proposal is based on

Reviewing the proposed modeling languages and proposed extensions of existing modeling languages, we see that STS-ml [54] has been the basis for three investigations. The other languages are: Secure Tropos [55], a goal-oriented language [56], the Unified Modeling Language [57], and a process reference model, as shown in Table 16.

5 Discussion

Some trends and research gaps have been identified and analyzed using the information from the mapping study.

5.1 RQ1: When, where and in what type of venue has the paper been published?

The number of publications per year shows that the topic of requirements compliance for the GDPR remains attractive. Most papers were published in 2019, one year after the regulation entered into force, and there is a drop in 2021, which the COVID-19 pandemic could potentially explain. It would be interesting to check whether publications in other areas dropped in the same manner.

On the other hand, it is interesting to note that prior to 2018 (the year the GDPR came into force), only five papers in the sample had been published. This means that organizations had limited tools, artifacts, methodologies, or approaches to handle the regulatory requirements of the GDPR. Even in 2018, only seven articles were published. Consequently, organizations did not have much work available from an RE perspective.

Regarding the venues where the research was published, there is no clear trend of preference. Indeed, although there are three top venues, none has published more than five articles. A possible explanation might be that, given the importance of the GDPR for organizations, the regulation applies to a wide range of areas.

Similarly, there does not seem to be a significant difference between conferences and journals. However, there are considerably fewer publications in workshops.

5.2 RQ2: Are the authors from multiple disciplines?

Around 22% of the articles selected in the mapping study were written by multidisciplinary teams with at least one author not related to computer science. This is a significant portion of the articles and, given that the GDPR is a legal text related to ethical matters, not surprising.

However, it would be interesting to compare the approaches and interpretations of GDPR articles between research conducted by interdisciplinary and monodisciplinary teams. The interpretation of legal texts and their translation into requirements is a challenging task, usually difficult for software engineers to carry out [5, 10]. For example, does research conducted by teams that include lawyers consider consent or PbD &D in a different way? What are the approaches to specific technologies (such as blockchain) and their compliance? These questions could be addressed in future research.

5.3 RQ3: What type of research is it?

The papers sampled used a wide variety of research methods, with most proposals presenting some type of validation or evaluation. 26 solution proposals also include either an evaluation or a validation. 7 solution proposals additionally address a knowledge question and include an evaluation, while 4 solution proposals address a knowledge question and include a validation. Similarly, 11 papers present knowledge questions alone, and 10 are solution proposals alone. These metrics show that the community is interested in grounding its work in real life and in providing evidence of how proposals interact with stakeholders or the context. In other words, the community seeks the application of its proposals, avoiding leaving them as purely theoretical, and shares their validation and evaluation accordingly.

5.4 RQ4: On what stage of the RE process does the paper focus?

Overall, there is interest in the requirements elicitation stage. As the GDPR is a new regulation, understanding, interpreting, and extracting requirements from this law can be regarded as a new activity. Moreover, since the GDPR is not straightforward about what it expects, interpretations of how to comply will be context-dependent.

The specification stage attracted the second most interest, with verification and validation a close third. GDPR requirements are context-dependent: for instance, gathering consent for a smartphone game for children is not the same as for a website. As a consequence, specifications may vary significantly and affect the design of the IS. On the other side, the verification and validation stage is addressed through the problem of compliance verification. More specifically, there is an interest in validating requirements, such as privacy policies, by addressing questions such as: does the privacy policy meet all the GDPR requirements? Tools for verifying and validating GDPR requirements would be useful for organizations and regulators alike: organizations could use them to check their requirements, while regulators would be able to audit organizations faster.

Because it was almost impossible to frame research papers into just one part of the RE cycle, and because some papers did not specify which area they were interested in, some papers may address two or more RE processes.

Future research could focus on benchmarking and comparing the different proposals, which is outside the scope of this research but would be interesting future work. For example, for the verification and validation tools, which areas of the GDPR do they focus on? If they focus on privacy policies, what do they verify and validate? Do they use AI? What tools are proposed for requirements management? Are the proposed specifications GDPR-compliant from a legal reasoning standpoint?

5.5 RQ5: What compliance elements of the GDPR does the research article focus on?

The most significant area of interest for the GDPR was compliance with the regulation at a general level. Although some of these research papers discuss specific elements of the regulation, others barely discuss them. As this is a mapping study, we did not check for paper quality. However, discussion of, or reference to, specific elements of data protection regulation should be considered in quality checks in future systematic literature reviews. Given the complexity and extensiveness of the regulation, future research should verify claims of GDPR compliance against the law and other regulatory texts.

Consent was the second topic of interest in the selected papers. One possible explanation is the requirements for valid consent (Art.7 [1]). The fact that consent must be free, specific, informed, and unambiguous puts stringent requirements on organizations. How can “specific” be translated into a specification? How can it be proved that consent was informed? Would just ticking a box make consent informed?

These requirements have less to do with the business model and more with how consent is gathered. In other words, they can relate to human–computer interaction, requirements traceability, documentation, and others. Another aspect in which consent differs from the other legal bases in the GDPR is that the user has complete control over this legal basis and may withdraw it at will. Since this legal basis is in the user’s control, a process for withdrawing consent must be implemented, along with keeping a record of it, among other requirements [1]. This could explain why consent has attracted so much interest compared to other legal bases, such as legitimate interest, contract fulfillment, or public interest. Specific topics that papers discuss for consent management include user interfaces, privacy policies, the semantic web, and consent “receipts”, among others.

Interesting future research could analyze users’ acceptance of the proposals that deal with them directly. Furthermore, comparisons and benchmarking of the different proposals would also be interesting. Given the amount of research related to consent, the different proposals should be analyzed to identify in which contexts they can be implemented.

On the other hand, other areas have attracted considerably less interest. For example, the GDPR articles about data transfers to third countries impose strict requirements—from security elements to contractual agreements—that do not seem as popular in the RE literature as consent or privacy policies. For instance, given the usage of cloud services, controllers must verify that a cloud service is compliant with the GDPR if they use it to process data in any way. Similarly, the GDPR principles have not received much attention in the RE literature, even though they are the guiding ideas of the regulation.

5.6 RQ6.1: What type of proposal is the paper?

The literature provides a number of conceptual frameworks on regulatory requirements for the GDPR. Given the novelty of the GDPR, the number of conceptual frameworks aiming to provide a high-level view of the world [43] is not surprising. The new elements of the GDPR probably sparked a strong interest in proposing new conceptual frameworks, as new ontologies and taxonomies are necessary. Similarly, different frameworks have been proposed for achieving compliance with the GDPR, each focusing on a different aspect of the regulation.

GDPR compliance tools have been proposed too. It would be interesting to research which technologies and frameworks these tools work with. For example, do they use artificial intelligence (AI)? If they do, how were the models trained? Did a lawyer provide advice? If a conceptual model was used in a specific part of the tool’s creation, did this conceptual model include legal reasoning? What is the degree of reliability of these tools?

5.7 RQ6.2: If a modeling language extension is proposed, which language is it an extension of?

The modeling languages (or extensions thereof) used in the selected papers are predominantly goal-oriented (such as GRL, SecTro, and STS-ml). For instance, both SecTro and STS-ml use primitives from the i* framework [54], and both place a strong emphasis on security requirements. In future research, the use of these modeling languages in industrial environments should be studied empirically, gathering feedback from practitioners.

6 Threats to validity

As with most systematic mapping studies, there are reliability and validity concerns over the results obtained. As reflected by [42], secondary research can have significant problems with reliability. Even if the same classification for papers is used in two different secondary studies, the same research article may be classified differently [42]. This situation can have multiple causes, such as the authors’ expertise and background, the need for a more concrete classification, or even unclear writing by the authors of the research article being judged [42]. We cannot rule out that this bias is present in this mapping. However, we made our best effort to use clear and well-defined classification schemes with known research methodologies, with the first two authors constantly comparing their results. By having two researchers review each other’s work, we aimed to improve the reliability of the classification of papers and the results of this mapping. Furthermore, we have tried to provide as much detail as possible on our process to support the reliability of the study.

The application of the inclusion/exclusion criteria and the data extraction were conducted using the same pairing strategy. The papers were divided in two so that each researcher could apply the inclusion/exclusion criteria and extract information. Once these tasks were completed, the researchers reviewed each other’s work. If disagreements arose, a meeting was held to discuss each point and come to a conclusion. This approach is based on the recommendations of Petersen et al. [39] and Wohlin et al. [42] for reliability issues.

The study uses well-known and accepted classification schemes and methodologies to gather results. By using well-known, accepted guidelines and methodologies, we aim to reduce bias [39].

One of the biggest threats to this research is the classification scheme used for the areas of interest in the GDPR. In our preliminary research, we could not find a classification scheme for the different GDPR or data protection requirements that would fulfill the objective of this research. In this phase, we unsuccessfully queried different legal databases, such as HeinOnline, in search of GDPR or regulatory data protection requirement schemes. From a computer science perspective, GDPR ontologies exist, such as [31], but they were unsuitable for our research objective. We could also have used the different chapters of the GDPR to classify the requirements, but this approach would have lost valuable information we were interested in; in addition, elements—such as consent—may appear in several articles and chapters of the GDPR.

To face this challenge, we decided to create a classification scheme based on different GDPR guidelines [49, 50, 52, 58], ontologies and vocabularies [31, 59], and the GDPR itself [1]. We proceeded with this approach as previous literature had reported it in other systematic studies, making it a valid option [40]. Indeed, [40] found that 40 out of 55 papers in their sample used this approach, denominating it an “emerging classification”. The classification scheme used in this article has not been proposed or validated elsewhere; thus, it may suffer from low reliability and validity. To re-emphasize, its sole purpose is to help us analyze and gather data for this mapping study.

The content validity of our classification scheme may be low, as we cannot be sure that the classification captures the whole nature of the GDPR—or at least the areas of interest in the GDPR for software engineering. For that reason, the classification may contain bias. Furthermore, it may be inadequate for capturing all the relationships and aspects of the GDPR, being either too narrow or too broad. To mitigate these threats, we also provide the GDPR chapters where each element can be found. Moreover, we discussed the classification scheme with different data protection lawyers, albeit in a non-systematic manner. This classification scheme should be taken cautiously, and its content validity checked.

In this light, the only alternative approach available to classify the interest in the GDPR was categorizing per article. This approach would have been impractical for a mapping study, as it would provide too many details. Furthermore, it would require a precise and detailed analysis of each paper, which falls under the scope of a systematic literature review.

Thus, to provide more validity and mitigate threats, the classification scheme included “Other” and “Specific article” in case the classification sets proved insufficient. As seen in Sect. 4, the record of processing activities was a category not proposed by our classification that appeared in 4 papers. Hence, if our classification scheme did not capture an important aspect, we could still record it with these elements. This gave us more flexibility in the classification and helped mitigate the risk of having missed important GDPR aspects.

All in all, given the threat arising from the use of a custom classification scheme, future work could focus on creating a classification scheme for regulatory data protection requirements. A possible first step could be the creation of an international regulatory data protection ontology.

On another topic, some databases’ search strings only covered abstracts, keywords, and titles. Therefore, some papers could be missing from the sample gathered. Some well-known papers were added via snowballing, but this method is not enough to alleviate the inherent threats of search strings [42]. Likewise, as this is a mapping study, the inclusion/exclusion criteria were applied based on the abstracts and metadata of the population of papers found. How much to read or not to read is subjective and dependent on the authors [60].

However, we do not consider this element a big threat to the validity of our research. As the literature suggests, mapping studies should aim at having a good representative sample of the study area rather than all the papers [39, 40, 42]. Hence, even if our sample misses some papers, this is an accepted feature among mapping studies, and one that differentiates them from literature reviews.

7 Conclusion

GDPR compliance has been an area of interest for the requirements engineering community since the GDPR was signed in 2016 and enacted in 2018. The GDPR entails functional and non-functional requirements, ranging from transparency to specific system functionalities such as retention periods. As a result, many requirements researchers have investigated and proposed different artifacts to help organizations achieve compliance with the GDPR.

From a chronological point of view, before the GDPR was enacted, only 12 papers from our sample had been published, meaning that practitioners had limited tools at the moment organizations were expected to comply. From that date to the present, there has been growing and ongoing interest in requirements for GDPR compliance, and the trend seems set to continue, with new proposals appearing. With the advent of new technological regulations around the world and in the EU, it seems that the RE domain will continue studying the subject.

Most papers (57 out of 90) in the sample discussed GDPR compliance as a whole, therefore not focusing on specific elements to achieve GDPR compliance. Based on topics of interest, consent emerged as the subject attracting the most interest, followed by privacy policies. In both cases, these GDPR articles have very specific requirements that organizations must follow to show compliance. The creation of tools using AI for checking privacy policies seems to be a growing area that future research should focus on. Regarding the methodology used for validation or evaluation, the sample shows no trend, and a diversity of methods is used.

As the GDPR is a legal text that may require knowledge of other regulations or laws, interdisciplinary research is essential. Lawyers and policy experts may have different mental models than software engineers over what the law entails and how to translate it into requirements. For example, erasure may imply deleting the data or rendering it unreadable [61]. In the sample, 20 papers (22%) have at least one author not affiliated with computer science or business informatics, which leads us to deduce that they conducted interdisciplinary research. This result is encouraging, as it means that 22% of the selected papers may be cataloged as interdisciplinary. It could be interesting for future studies to compare how the analysis of GDPR requirements differs between papers with at least one author unrelated to computer science/informatics and those where all the authors are from this discipline.

The most popular publication venue type is conferences, with 44 papers of the sample published there. Journals follow with 33 papers, and workshops with 13. Even with 48.9% of papers published in conferences, no conference or venue has published more than 5 papers, as seen in Table 17; the venues that have published more than one paper on the subject are listed in Table 7. Therefore, the publication venues are still dispersed. This situation could be explained by the fact that GDPR requirements affect a range of disciplines, not only requirements engineering.

All in all, requirements engineering for GDPR compliance seems to be a well-established study area with ongoing interest. Given the range of proposals for different matters, future studies could focus on comparing the different proposals. Furthermore, as new technological regulations enter into force, research could focus on whether the proposals or artifacts for the GDPR can be reused when regulations share common elements, and which lessons can be learned.