1 Introduction

Our healthcare systems are undergoing a wave of digital transformation, characterised by the increasing use of information and communication technology (ICT) in a variety of healthcare contexts. Early use of ICT in healthcare focused on digitalising systems that were previously managed manually. One example is electronic health records, which emerged in the 1990s and are now gradually replacing traditional paper-based records (McLoughlin et al., 2017). In addition, a host of advanced digital technologies is being introduced into healthcare, including artificial intelligence, big data analytics, wearables, sensors for health monitoring, mobile applications and robotics (World Health Organization, 2019). With the advance of ICT, healthcare can extend from hospital settings to private homes. For example, active and assisted living technologies promise to enable older citizens to live more independently in private dwellings, reducing their need for caregiver intervention (Haque et al., 2020).

Due to the proliferation of ICT in healthcare, health data are being collected and processed on an unprecedented scale, with large quantities of health data being stored in electronic health records and beyond. Assistive technologies, often in the form of wearables and sensors, can be privacy-intrusive because they capture a user’s health and well-being status by collecting large quantities of data, such as vital signs, daily activities and even data on the ambient environment (Ienca & Villaronga, 2019). The explosion of health data processing creates two conflicting needs. On the one hand, protecting the data privacy of individuals calls for prohibiting or minimising health data processing; on the other hand, processing large quantities of health data is needed (or even encouraged) to support scientific research, public health management and technological development, among other beneficial purposes.

Privacy-enhancing technologies hold the potential to play an important role in reconciling these two conflicting needs by enabling the processing of health data in a less privacy-intrusive manner. Many privacy-enhancing technologies seek to protect privacy by de-identifying personal data such that natural persons are more difficult to identify (Ribaric et al., 2016). In the healthcare sector, anonymisation and pseudonymisation are two commonly used privacy-enhancing technologies (Hansen et al., 2021). Several anonymisation models have emerged in computer science research. Arora et al. conducted a comparative analysis of several techniques, such as k-anonymity, l-diversity and t-closeness (Arora et al., 2014). In a recent review, Majeed and Lee categorised user attributes into four types (direct identifiers, quasi-identifiers, sensitive attributes and non-sensitive attributes) and discussed how each type could be handled in anonymisation (Majeed & Lee, 2021). To protect user attributes, various anonymisation operations can be applied, such as generalisation, suppression, permutation, perturbation and anatomisation (Majeed & Lee, 2021). Pseudonymisation is another key security technique that can protect personal data processing while facilitating its usage (European Union Agency for Cybersecurity, 2022). The European Union Agency for Cybersecurity (ENISA) demonstrated the benefits of using pseudonymisation techniques in various healthcare contexts, including patient health data exchange, clinical trials and patient-sourced monitoring of health data (European Union Agency for Cybersecurity, 2022).
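To give a concrete, if simplified, sense of how generalisation and suppression operate, the following sketch applies both operations to a handful of invented patient records in order to reach 2-anonymity. The field names, values and threshold are purely illustrative and are not drawn from any of the works cited above.

```python
from collections import Counter

# Illustrative patient records: (age, postcode, diagnosis).
# Age and postcode are quasi-identifiers; diagnosis is the sensitive attribute.
records = [
    (34, "75012", "diabetes"),
    (36, "75011", "asthma"),
    (52, "69003", "diabetes"),
    (55, "69007", "hypertension"),
    (38, "75018", "asthma"),
]

def generalise(record):
    """Generalisation: coarsen quasi-identifiers (10-year age bands,
    2-digit postcode prefixes) so that more records share the same values."""
    age, postcode, diagnosis = record
    band = (age // 10) * 10
    return (f"{band}-{band + 9}", postcode[:2], diagnosis)

def k_anonymise(records, k=2):
    """Suppression: drop any record whose generalised quasi-identifier
    combination is shared by fewer than k records."""
    generalised = [generalise(r) for r in records]
    counts = Counter((age, pc) for age, pc, _ in generalised)
    return [r for r in generalised if counts[(r[0], r[1])] >= k]

print(k_anonymise(records, k=2))
# Each remaining record is indistinguishable from at least one other on its
# quasi-identifiers -- the released table is 2-anonymous.
```

Even this toy example exhibits the trade-off discussed throughout this paper: coarser quasi-identifiers lower the re-identification risk but also lower the utility of the data.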

It should be noted that anonymisation and pseudonymisation have specific meanings under the EU’s legal regime. Rules regarding these terms have been set out in legal norms, notably the General Data Protection Regulation (hereinafter ‘GDPR’) (Regulation (EU) 2016/679, 2016). They are important tools to implement data protection by design, which has become a legal requirement in the EU since the introduction of the GDPR (Tamò-Larrieux, 2018).

Despite the statutory grounds laid out in the GDPR, the interpretation of anonymisation and pseudonymisation remains far from undisputed. Competing understandings exist within various normative instruments, including the Article 29 Data Protection Working Party’s Opinion (Article 29 Data Protection Working Party, 2014), the jurisprudence of the Court of Justice of the European Union (CJEU) and other guidance issued by national regulators. Beyond the legislative arena, the concept of anonymisation is also hotly debated in academia. Ohm claims that the notion of anonymisation and its privacy-protecting promise have failed because researchers now have increasingly powerful means of reversing the anonymisation process (Ohm, 2009). Rubinstein and Hartzog acknowledge the limitations of anonymisation and argue that the focus should be placed not on preventing harm, but on minimising the risk of re-identification and the disclosure of sensitive attributes through process-based data release policies (Rubinstein & Hartzog, 2016). Finck and Pallas contend that any anonymisation process always carries a residual risk (Finck & Pallas, 2020). In light of this complexity, Colonna suggests considering synthetic data as an alternative to anonymisation, in order to facilitate health data sharing while acknowledging its potential shortcomings (Colonna, 2020).

Taking a forward-looking perspective, this paper aims to contribute to the scholarly debate around anonymisation and pseudonymisation by extending the discussion to the contexts of forthcoming EU data laws, with a focus on the draft European Health Data Space (EHDS) Regulation (EHDS Proposal, 2022). It does so by digging into the past, present and future of anonymisation and pseudonymisation in EU data laws. Starting with a positivist enquiry, the paper first investigates the traces and evolution of anonymisation and pseudonymisation in both pre- and post-GDPR EU data protection instruments. It then shifts focus to future EU data laws and examines the roles of anonymisation and pseudonymisation in such instruments, including the draft EHDS Regulation, the newly adopted EU Data Governance Act and the draft EU Data Act. Ultimately, the paper makes preliminary remarks on the draft EHDS Regulation and questions to what extent current legal definitions of anonymisation and pseudonymisation can be reconciled with the health data sharing arrangements proposed in this Regulation.

2 Traces from the Pre-GDPR Era

The notions of anonymisation and pseudonymisation under EU data protection law can be traced back to legal instruments predating the adoption of the GDPR, including the EU Data Protection Directive 95/46 (hereinafter ‘the Directive’), Article 29 Data Protection Working Party Opinion 4/2007 and Article 29 Data Protection Working Party Opinion 05/2014.

2.1 October 1995: Directive 95/46/EC

The story starts in October 1995. At that time, the GDPR’s predecessor, the EU Data Protection Directive 95/46, was passed into law. While the Directive does not explicitly define the term ‘anonymisation’, it does indicate a binary distinction between personal data and anonymous data.

In particular, the Directive makes clear that data protection principles shall apply only to personal data, not to ‘data rendered anonymous’. Recital 26 of the Data Protection Directive provides that:

“Whereas the principles of protection must apply to any information concerning an identified or identifiable person; whereas, to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person; whereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable” (Directive 95/46/EC, 1995) (emphasis added).

Here, it can be seen that the identifiability of the person concerned is the key to drawing a line between personal data and anonymous information. Further, when determining whether a data subject is identifiable, ‘all the means likely reasonably to be used’ by ‘either the controller or by any other person’ to identify that person should be considered. At first glance, the standard of rendering personal data anonymous seems quite high, given the wording ‘all the means’ by ‘any other person’. However, the core lies in what ‘likely reasonably’ means. The Directive does not provide further guidance in this regard. This has been interpreted by the Article 29 Data Protection Working Party in its Opinion on anonymisation techniques and is elaborated below.

2.2 June 2007: Article 29 Data Protection Working Party Opinion 4/2007

Twelve years after the adoption of the EU Data Protection Directive, the then EU data protection watchdog, the Article 29 Data Protection Working Party, issued an Opinion on the concept of personal data (Article 29 Data Protection Working Party, 2007). In this Opinion, ‘anonymous data’ are defined as:

“any information relating to a natural person where the person cannot be identified, whether by the data controller or by any other person, taking account of all the means likely reasonably to be used either by the controller or by any other person to identify that individual” (Article 29 Data Protection Working Party, 2007)

This definition essentially affirmed the notion of ‘data rendered anonymous’ in the Directive. On this note, the Opinion further provided that ‘anonymised data’ constitute anonymous data that were previously personal data, but where identification has been rendered ‘no longer possible’ (Article 29 Data Protection Working Party, 2007). It also pointed out that aggregated data contained in statistical information may not necessarily constitute anonymous data when the sample size is not large enough (Article 29 Data Protection Working Party, 2007).
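The small-sample caveat can be illustrated with a toy scenario, using invented figures: when an aggregate cell covers only one person, the published ‘statistic’ effectively discloses that person’s record. The suppression threshold below is a common safeguard in statistical disclosure control, not a figure prescribed by the Opinion.

```python
# Invented aggregate table: number of patients with a given diagnosis per area.
aggregate = {"Village A": 214, "Village B": 1}

# The cell for Village B covers a single resident. Anyone who knows that one
# person from Village B took part in the study learns that person's diagnosis,
# so the aggregate is not anonymous despite containing no names.
MIN_CELL_SIZE = 5  # illustrative suppression threshold
published = {
    area: (count if count >= MIN_CELL_SIZE else None)  # suppress small cells
    for area, count in aggregate.items()
}
print(published)  # {'Village A': 214, 'Village B': None}
```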

2.3 April 2014: Article 29 Data Protection Working Party Opinion 05/2014

In 2014, the Article 29 Data Protection Working Party issued an Opinion on anonymisation techniques (Article 29 Data Protection Working Party, 2014). Regarding the definition of anonymisation, Opinion 05/2014 stated that ‘anonymisation results from processing personal data in order to irreversibly prevent identification’ (Article 29 Data Protection Working Party, 2014). It reaffirmed what was stated in the EU Data Protection Directive and pointed out that all the means likely reasonably to be used by the controller or any third party should be considered when evaluating the re-identification risk (Article 29 Data Protection Working Party, 2014). The Opinion maintained that while anonymised data fell outside the scope of data protection laws, anonymisation itself was a form of processing of personal data (Article 29 Data Protection Working Party, 2014).

The Opinion established a relatively high threshold for anonymisation, requiring that anonymised data not be re-identifiable by the data controller or any other party. Specifically, the Opinion discussed the advantages and disadvantages of several anonymisation techniques, including noise addition, permutation, differential privacy, aggregation, k-anonymity, l-diversity and t-closeness (Article 29 Data Protection Working Party, 2014). The robustness of these techniques was evaluated against three prevailing re-identification risks, i.e. singling out, linkability and inference (Article 29 Data Protection Working Party, 2014). In the words of the Opinion, ‘an effective anonymisation solution prevents all parties from singling out an individual in a dataset, from linking two records within a dataset (or between two separate datasets) and from inferring any information in such dataset’ (Article 29 Data Protection Working Party, 2014). The bar set here was thus demanding, as a theoretical risk of re-identification by some party is very difficult to rule out. In addition, Opinion 05/2014 specifically pointed out that pseudonymisation is not a method of anonymisation, as it merely reduces the linkability of a dataset with the identity of the data subject. The Opinion stated that a case-by-case analysis should be conducted to determine which techniques are most suitable (Article 29 Data Protection Working Party, 2014). That being said, the Opinion did acknowledge the possible residual risks of an anonymised dataset and thus warned data controllers not to regard anonymisation as a ‘one-off exercise’ (Article 29 Data Protection Working Party, 2014).
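Among the techniques the Opinion surveys, noise addition under differential privacy lends itself to a compact illustration. The sketch below implements the classic Laplace mechanism for a counting query; it is a minimal, self-contained example with invented data, not a description of any method endorsed by the Opinion.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, epsilon: float = 1.0) -> float:
    """Differentially private count: true count plus Laplace(1/epsilon) noise.
    A counting query has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so the noise masks any single individual's
    presence, addressing the 'singling out' and 'inference' risks."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

# Invented cohort: how many patients are over 50?
ages = [34, 36, 52, 55, 38, 61, 47]
print(dp_count(ages, lambda a: a > 50, epsilon=0.5))
```

A smaller epsilon adds more noise, trading utility for a stronger privacy guarantee, which is a quantitative counterpart to the contextual, risk-based reasoning discussed below.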

3 Traces from the Post-GDPR Era

While pre-GDPR instruments took a more stringent approach toward anonymisation, post-GDPR norms started to diverge. The GDPR came as a turning point. By pointing to factors such as cost, time and technology, the GDPR seems to have moved toward a more contextual notion of anonymisation, albeit implicitly. The CJEU spoke more explicitly in its Breyer decision. While that decision came after the adoption of the GDPR, it was made under the rules of the Data Protection Directive. Even so, the Breyer decision went far beyond the wording of the Directive and embraced a more relative and contextual interpretation of anonymisation (Case C-582/14, 2016). This contextual approach culminated in Convention 108+, which clearly states that if re-identification would require unreasonable time, effort or resources, the relevant data are to be considered anonymous (Convention 108+, 2018).

3.1 April 2016: GDPR

In April 2016, the GDPR was passed into law. It has since become the single most influential data protection instrument in the EU, with a wide global reach. Under the GDPR, the distinction between personal data and anonymous data remains vital, as it determines whether the Regulation applies at all.

The GDPR defines personal data as ‘any information relating to an identified or identifiable natural person’ (Regulation (EU) 2016/679, 2016). The GDPR does not provide a definition for the term ‘anonymisation’. In its recitals, the GDPR refers to ‘anonymous information’ as ‘information which does not relate to an identified or identifiable natural person’ or ‘personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable’ (Regulation (EU) 2016/679, 2016). When determining whether a person is identifiable, ‘all the means reasonably likely to be used, such as singling out, either by the controller or by another person’ should be taken into account (Regulation (EU) 2016/679, 2016). The notion of ‘anonymous information’ here is similar to the notion of ‘anonymous data’ in the Data Protection Directive, except that the GDPR uses a softer wording: ‘another person’ instead of the more absolute ‘any other person’ in the Directive.

A major difference between the GDPR and the Directive is that the GDPR provides further guidance on determining what means are reasonably likely to be used to re-identify the natural person. The GDPR provides that several objective factors must be considered, including (1) the costs of identification, (2) the time required for identification, (3) the available technology at the time of the processing and (4) technological developments (Regulation (EU) 2016/679, 2016). While this is not explicitly spelled out, the GDPR appears to take a more contextual approach to anonymisation: if the means to re-identify the natural person are not reasonably likely to be used because of the time, money or technology required, the data can be considered anonymous in the sense of the GDPR, despite a theoretical risk of re-identification. This point was further developed in a CJEU decision delivered later that same year.

While no definition of ‘anonymisation’ is given, the GDPR specifically defines the term ‘pseudonymisation’ as meaning ‘the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information’ (Regulation (EU) 2016/679, 2016). Such ‘additional information’ must be kept separately and technical and organisational measures must be taken to prevent re-identification (Regulation (EU) 2016/679, 2016).
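One common way of satisfying this definition in practice is keyed hashing, where a secret key constitutes the ‘additional information’ held separately. The following is a minimal sketch under that assumption; the GDPR itself does not prescribe any particular technique, and the identifier and key shown are invented.

```python
import hashlib
import hmac

# The secret key is the 'additional information' in the GDPR sense: it must be
# kept separately from the pseudonymised records, under technical and
# organisational safeguards (in practice, a key vault rather than source code).
SECRET_KEY = b"example-key-kept-separately"

def pseudonymise(patient_id: str) -> str:
    """Replace a direct identifier with a keyed pseudonym. The mapping is
    deterministic, so records about the same patient remain linkable, but it
    cannot be reversed without access to the key."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"patient_id": "NL-1234567", "systolic_bp": 128}
safe_record = {**record, "patient_id": pseudonymise(record["patient_id"])}
print(safe_record)
```

Because anyone holding the key can re-create the mapping, data processed in this way remain personal data, which is precisely why EU law treats pseudonymisation as a safeguard rather than an exemption.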

3.2 October 2016: The Breyer Decision

Six months after the adoption of the GDPR, the CJEU delivered an important decision in Patrick Breyer v Bundesrepublik Deutschland (Case C-582/14, 2016). In its ruling, the Court addressed a core issue of data protection law: what is personal data and what is not. In doing so, the Court essentially touched upon the standard of anonymisation. The question before the Court was whether a dynamic IP address should be considered personal data. The Court held that it should, because the online media services provider had legal means enabling it, with the help of the competent authority, to obtain the additional information necessary for re-identification (Case C-582/14, 2016).

In this case, both the Advocate General and the Court discussed re-identification, the standard for distinguishing anonymous data or information from personal data. The Advocate General explicitly criticised Opinion 05/2014’s interpretation of ‘means likely reasonably to be used by the controller or by any other person’ as overly strict, pointing out that the theoretical possibility of identifying a person can never be ruled out entirely, so the test must turn on whether the means are reasonable rather than on excluding all conceivable means (Campos Sánchez-Bordona, 2016). In its ruling, the Court affirmed the Advocate General’s reasoning on this front and held that a means would not be ‘likely reasonably’ to be used to re-identify a person if it was prohibited by law, or if it was practically impossible because the time, cost and manpower required were disproportionate (Case C-582/14, 2016). In other words, the Court essentially established that if re-identification is legally allowed but the effort it requires is disproportionate in terms of cost and manpower, the risk of re-identification is in reality insignificant (Case C-582/14, 2016).

Interestingly, the Breyer case was decided under the Data Protection Directive, not the GDPR. Nevertheless, the Court went beyond the relatively strict wording of the Directive and the stringent approach taken in Opinion 05/2014, essentially arguing for a more relative and contextual approach to anonymisation, more in line with the approach taken in the GDPR.

3.3 May 2018: Convention 108+

Two years after the enactment of the GDPR, the Council of Europe Convention for the protection of individuals with regard to the processing of personal data (Convention 108+) was adopted (Convention 108+, 2018). Convention 108+ is an international treaty for the protection of personal data. It addresses anonymisation in the text accompanying the definition of ‘personal data’, endorsing a binary division between personal data and anonymous data while stating that identifiability is the key to distinguishing between them (Convention 108+, 2018).

While Convention 108+ does not directly define the term ‘anonymisation’, it does address what constitutes anonymous data. According to Convention 108+:

“Data is to be considered as anonymous only as long as it is impossible to re-identify the data subject or if such re-identification would require unreasonable time, effort or resources, taking into consideration the available technology at the time of the processing and technological developments.” (Convention 108+, 2018)

As seen above, Convention 108+’s definition of ‘anonymous data’ broadly aligns with the GDPR’s description of anonymous information. However, it is important to note that Convention 108+ goes one step further by clearly embracing a relative and contextual interpretation of anonymisation. According to Convention 108+, there are two categories of anonymous data: (1) data that do not make it possible to re-identify the data subject; and (2) data that may enable re-identification, but where ‘such re-identification would require unreasonable time, effort or resources’ (Convention 108+, 2018). It therefore seems clear from Convention 108+ that a dataset does not have to be absolutely impossible to link to a data subject in order to be considered anonymous data.

Convention 108+ states that ‘what constitutes unreasonable time, efforts or resources should be addressed on a case-by-case basis’ (Convention 108+, 2018). Further, the Convention lists several factors to consider when evaluating re-identifiability, including the purpose of the processing, its cost, the benefits of identification, the type of controller, the technology used and technological developments (Convention 108+, 2018). On that note, Convention 108+ states that measures should be implemented to avoid re-identification even when data are anonymised (Convention 108+, 2018). This reaffirms the Article 29 Data Protection Working Party’s notion that anonymisation is not a ‘one-off exercise’ (Article 29 Data Protection Working Party, 2014).

Unlike the GDPR, Convention 108+ does not directly define the term ‘pseudonymisation’. However, it specifically points out that the use of a pseudonym does not amount to anonymisation, because the data subject can still be identified or individualised. Pseudonymous data are therefore personal data and subject to the Convention (Convention 108+, 2018).

4 Normative Tensions and Scholarly Debates

This evolutionary account of anonymisation shows that EU data protection instruments took different approaches before and after the GDPR. Prior to the GDPR, the notion of anonymisation was more stringent. The Data Protection Directive used phrases like ‘all the means’ by ‘any other person’ without clarifying what ‘likely reasonably’ meant. Opinion 05/2014 supplemented the Directive by interpreting ‘likely reasonably’, in doing so setting a relatively high standard for achieving anonymisation: anonymisation is achieved only if no person can re-identify the data subject. Post-GDPR instruments, by contrast, point to a more relative and contextual understanding of anonymisation, under which data can be treated as anonymous relative to a specific party if, given the context, that party’s risk of re-identifying the individual is sufficiently low.

Thus, there is a tension between different legal sources, most notably Opinion 05/2014 and the Breyer decision, on the interpretation of anonymisation. Groos and van Veen argue that the Breyer decision in fact deviated from the Article 29 Data Protection Working Party’s rather absolute approach by leaning toward a more relative, contextual approach (Groos & van Veen, 2020). They further contend that the European Data Protection Board’s failure to engage with Breyer and with the literature criticising Opinion 05/2014 is not in the spirit of the rule of law in a liberal democracy (Groos & van Veen, 2020). Mourby likewise discusses the Breyer decision and argues for a relative, governance-based understanding of anonymity: the same piece of information could simultaneously be personal data for the discloser and anonymous data for the recipient, taking into account the governance mechanisms in place to prevent reasonable means of re-identification on the recipient’s part (Mourby, 2020).

This tension is not confined to legal norms; understandings also differ among European data protection authorities. Some regulators, such as the Irish Data Protection Commission (Irish Data Protection Commission, 2019) and the UK’s Information Commissioner’s Office (Information Commissioner’s Office, 2021), are more willing to embrace the idea that anonymisation is contextual and relative. The Dutch Data Protection Authority, however, leans toward the approach described in Opinion 05/2014 and seems to believe that anonymisation must be absolute (Groos & van Veen, 2020). The French data protection regulator, the Commission Nationale de l'Informatique et des Libertés, refers to anonymisation as a process that makes identification impossible, which strikes a more absolute tone (Commission Nationale de l'Informatique et des Libertés, 2020). In light of the complexity surrounding the standard of anonymisation, distinguishing personal data from anonymous data or information remains difficult in practice. The UK’s Information Commissioner’s Office has pointed out that the question of when a piece of data is personal data or anonymous information is one of the most challenging issues facing organisations today (Shah, 2021).

The normative instruments and academic discussion reviewed above show that the debate centres on one core question: what is the standard of anonymisation? As Rubinstein and Hartzog observe, there has always been a formalist approach and a pragmatist approach toward anonymisation (Rubinstein & Hartzog, 2016). Formalists (such as Ohm) argue for a higher standard for anonymisation, while pragmatists tend to acknowledge the difficulty of achieving absolute anonymisation and focus more on re-identification risk management (Groos & van Veen, 2020; Mourby, 2020; Rubinstein & Hartzog, 2016).

In any event, Opinion 05/2014 makes clear that anonymisation is not a ‘one-off exercise’ (Article 29 Data Protection Working Party, 2014). Further, Convention 108+ states that measures should be implemented to avoid re-identification even when data are anonymised (Convention 108+, 2018). Groos and van Veen also emphasise that anonymisation is not a way to escape the obligations of the GDPR, and that it is itself a data processing activity. Indeed, anonymisation should enable the legitimate use of data, for instance in health research (Groos & van Veen, 2020). On that note, scholars have discussed the governance of health data sharing beyond the mere use of technical measures such as anonymisation. For example, Desai et al. propose a ‘Five Safes’ model describing a secure context for data access in research, comprising safe projects, safe people, safe data, safe settings and safe outputs (Desai et al., 2016). Building on this, Groos and van Veen propose a ‘Six Safes’ model, adding a ‘safe transit’ pillar. They apply this model in the health research setting and argue that anonymisation, together with developments in co-governance, could promote data exchange and reduce obstacles (Groos & van Veen, 2020). Mourby notes that the UK has established a model for the governance of data reuse consisting of several controls to reduce risk, including not only de-identification but also oversight, training, accreditation of relevant parties and data access agreements, accompanied by a Code of Practice (Mourby, 2020). Colonna suggests considering four factors when evaluating the risk of re-identification in health data sharing: the nature of the dataset, the robustness of the anonymisation algorithm, the data release context and additional controls (Colonna, 2020).

In addition to the legal debates, an interesting question at the intersection of technical and legal discussions around privacy-enhancing technologies (PETs) is to what extent they meet the requirements of EU data protection law. It is worth noting that the terms ‘anonymisation’ and ‘pseudonymisation’ have specific legal meanings under EU data protection law, and the answer to this question depends on whether a PET falls into the legal realm of anonymisation or of pseudonymisation.

Attempts have been made to determine the relationship between PETs and these two legal concepts. A study on PETs in the assisted living context indicates that certain groups of PETs, such as blind vision, secure processing and data hiding methods, are more likely to be viewed as pseudonymisation methods rather than anonymisation methods under the GDPR, mainly due to their technical reversibility (He, 2022). However, it is important to note that the assessment may change in specific cases because the legal concept of anonymisation is not absolute but context-specific and sensitive to many factors, such as time, costs and available technologies for re-identification (He, 2022). In an era where both PETs and related legal concepts are constantly evolving, it is essential to consider both the technical features of a PET and the legal interpretation of anonymisation and pseudonymisation when conducting similar analysis.

5 Traces in Future EU Data Laws

Ongoing debates do not seem to have deterred EU policymakers from relying on anonymisation and pseudonymisation. On the contrary, there is a clear trend of anonymisation and pseudonymisation playing important roles in the data sharing schemes proposed in future EU data laws, including the newly adopted Data Governance Act, the forthcoming Data Act and the EHDS Regulation.

5.1 February 2022: The Data Act Proposal

On 23 February 2022, the Commission proposed a draft Data Act. Among other things, the proposal aims to facilitate the use by public sector bodies of enterprise-held data where there is an exceptional need (Data Act Proposal, 2022). In its recitals, the draft Data Act states that data holders need to anonymise personal data when it is necessary to make such data available to a public sector body. When anonymisation is not possible, other measures, such as pseudonymisation and aggregation, must be used before disclosure (Data Act Proposal, 2022).

However, the anonymisation requirement set out in the recitals disappears from the articles of the draft Data Act. When it comes to the requirements for making data available to a public sector body, Article 18 of the draft Data Act only requires data holders to ‘take reasonable efforts to pseudonymise the data’ before making the requested disclosure (Data Act Proposal, 2022). This differs markedly from the recitals of the same draft Act, under which anonymisation must be the first step. It is unclear whether this omission of an anonymisation requirement is intentional or accidental. Apart from the pseudonymisation requirement, the draft Data Act does not provide further guidance on the use of anonymisation and pseudonymisation techniques. Although Article 17(2)(d) stipulates that data requests should generally concern non-personal data (Data Act Proposal, 2022), further clarification is needed on the process for handling requests that relate to personal data. Some scholars have suggested explicitly mentioning anonymisation in Article 18(5) to address this issue (Drexl et al., 2022).

5.2 May 2022: The EHDS Proposal

On 3 May 2022, the Commission proposed a regulation on the European Health Data Space (hereinafter ‘draft EHDS Regulation’), aimed at creating a European Health Union by facilitating access to health data across the EU (EHDS Proposal, 2022). Specifically, the draft EHDS Regulation establishes a data permit system for secondary use of electronic health data. Each Member State must appoint a health data access body, which will be responsible for reviewing data access applications made by applicants that need to process health data. The health data access body can grant the applicant a data permit and provide them with access to the health data requested, in a secure environment (EHDS Proposal, 2022).

Notably, the requested electronic health data would only be provided on an anonymised or pseudonymised basis. Article 44 of the draft EHDS Regulation provides that:

“2. The health data access bodies shall provide the electronic health data in an anonymised format, where the purpose of processing by the data user can be achieved with such data, taking into account the information provided by the data user.

3. Where the purpose of the data user’s processing cannot be achieved with anonymised data, taking into account the information provided by the data user, the health data access bodies shall provide access to electronic health data in pseudonymised format.” (EHDS Proposal, 2022)

Where health data are provided on a pseudonymised basis, there are extra requirements. The additional information for re-identification must be held only by the health data access body, with data users being unable to re-identify the natural person from the obtained health data (EHDS Proposal, 2022). Additionally, data users must provide a description of the legal basis of data processing in accordance with Article 6(1) of the GDPR. Data users should also provide information on ethical assessments, where applicable (EHDS Proposal, 2022).

Given the requirement for health data to be shared in an anonymised or pseudonymised manner, one might expect anonymisation and pseudonymisation to play important roles for the sharing of health data in a future European Health Union. Despite the reliance on these tools, the EHDS proposal does not define or elaborate on what anonymisation and pseudonymisation entail. Rather, it only stipulates that all GDPR definitions continue to apply for the purpose of the EHDS Regulation (EHDS Proposal, 2022). This means that the GDPR definition of pseudonymisation applies to the EHDS, together with the notion of ‘anonymous information’ in the recitals of the GDPR, and that the term ‘anonymisation’ remains undefined.

5.3 May 2022: The Data Governance Act

The EHDS proposal is not alone in establishing data sharing mechanisms centred on anonymisation and pseudonymisation techniques. On 16 May 2022, the Data Governance Act was passed into law. It lays down conditions for the re-use of certain categories of data held by public sector bodies within the EU (Regulation (EU) 2022/868, 2022). This newly adopted Act attaches great importance to anonymisation and pseudonymisation techniques. It empowers public sector bodies to impose anonymisation or pseudonymisation requirements on the re-use of personal data (Regulation (EU) 2022/868, 2022).

Here, an interesting change should be noted in the legislative process of this Act. The Proposal for a Data Governance Act stated that:

“Depending on the case at hand, before its transmission, personal data should be fully anonymised, so as to definitively not allow the identification of the data subjects, or data containing commercially confidential information modified in such a way that no confidential information is disclosed.” (Data Governance Act Proposal, 2020) (emphasis added)

The draft Act uses ‘fully’ and ‘definitively’ when referring to data anonymisation. This suggests a more absolute understanding of anonymisation, in the sense that the anonymised data should ‘definitively not allow the identification of the data subjects’. However, the draft Data Governance Act does not explain what ‘fully’ or ‘definitively’ means. It thus remains unclear how the notion of anonymisation in the draft Act can be reconciled with the CJEU’s more relative interpretation in its Breyer decision. This may explain why words like ‘fully’ and ‘definitively’ were removed from the official version adopted in May 2022 (Regulation (EU) 2022/868, 2022).

6 Critical Aspects of the Draft EHDS Regulation

While the draft EHDS Regulation proposes a health data sharing system on the basis of anonymisation and pseudonymisation techniques, there are aspects that require further clarification or careful monitoring in the further drafting and implementation of this Regulation.

First, the draft EHDS Regulation does not clarify what anonymisation and pseudonymisation entail in the health data sharing context, beyond making a reference to the GDPR definitions. From a legislative perspective, it is understandable that EU legislators prefer a uniform interpretation of key terms in EU data laws, to enhance legal certainty. However, this does not mean that the EHDS cannot provide further clarity regarding these key terms in the healthcare sector. From a practical perspective, the lack of further guidance essentially means that health data access bodies will need to make their own assessments of what ‘anonymised data’ and ‘pseudonymised data’ mean, and under what conditions the techniques can be used. Considering the competing interpretations of anonymisation among normative sources, as illustrated above, data access bodies across the EU cannot be expected to adopt a uniform understanding in this regard. This undermines legal certainty.

Such a situation could be further exacerbated in the case of a single data holder. According to the draft EHDS Regulation, if a data access application pertains to data from only one data holder in a single Member State, that sole data holder would be responsible for issuing data permits and providing the requested health data to the applicant (EHDS Proposal, 2022). In other words, in the case of a data access application relating to only one data holder, it is up to that data holder to provide the data to the data user on an anonymised or pseudonymised basis. In the absence of an EU-level interpretation, it is highly likely that data holders across the EU may adopt different standards when providing access to health data. Thus, it seems reasonable to expect further guidance on anonymisation and pseudonymisation in the health data sharing context, including what ‘anonymised’ and ‘pseudonymised’ mean, what the standards of anonymisation and pseudonymisation are and what considerations should be made in choosing specific techniques.

Second, the draft EHDS Regulation imposes similar requirements on health data shared on an anonymised and/or pseudonymised basis. Such an approach has its pros and cons. The debate regarding the standard of anonymisation has been important because this standard determines what constitutes personal data and what does not. The difference in regulatory implications can be huge, as personal data are subject to the rules of the GDPR, among many other EU laws, while anonymous information is not personal data and thus falls outside the scope of the GDPR. This difference, however, would become less pronounced under the data sharing mechanism proposed by the EHDS, because health data released on an anonymised basis would still be subject to the terms of the data permit and the requirements of the EHDS. Essentially, data users could use the anonymised data only in a secure environment provided by the health data access body and would act as joint controllers of the anonymised data together with that body. In addition, data users would be prohibited from re-identifying the data subjects concerned. In this regard, anonymised and pseudonymised data are treated similarly under the draft EHDS Regulation, in the sense that both are subject to the requirements of the EHDS and the data permits under which they are granted.

On the plus side, the EHDS’s data sharing approach better protects data subjects by assuming that neither anonymised nor pseudonymised data are risk-free and by subjecting both to the requirements of the data permit and the EHDS Regulation. It thus has the potential to counteract the ambiguity surrounding the interpretation and implementation of anonymisation and pseudonymisation techniques: whichever technique is applied, the resulting data would be treated similarly. As stated above, Opinion 05/2014, Convention 108+ and many scholarly contributions have underscored that anonymisation should not be a one-off exercise. Under the EHDS, the focus can shift from seeking absolute anonymity to managing re-identification risk.

A possible problem with the EHDS arrangement is that it could discourage the use of anonymisation techniques. In the context of health data sharing, a ‘negative correlation’ between privacy and data utility has been observed (Colonna, 2020). In other words, de-identification affects the usability of datasets. From a data user’s perspective, pseudonymised datasets are much more desirable than anonymised datasets because of their better utility. By subjecting anonymised and pseudonymised data to similar restrictions, the EHDS proposal creates an even stronger incentive for data users to apply for pseudonymised datasets instead of anonymised ones, given that the usual benefit of using anonymised data (i.e. a lighter regulatory burden) cannot be enjoyed in this context.

Technically speaking, the draft EHDS Regulation does impose additional obligations on data users who apply for pseudonymised datasets. In their applications, applicants must explain the legal basis for data processing in accordance with Article 6(1) of the GDPR and include information on ethical assessments if required by national law (EHDS Proposal, 2022). Depending on the particulars of the situation, these additional requirements may not be burdensome, especially if the data user needs to conduct ethical reviews anyway. However, it is up to the applicant to evaluate whether the additional administrative efforts would be proportionate to the potential benefits of pseudonymised datasets. There could be cases in which data users apply for pseudonymised datasets when the use of anonymised datasets could have been enough. This may not align with the data minimisation principle. However, given the lack of further guidance, it will be extremely difficult for health data access bodies and data holders to draw a line regarding when access to anonymised datasets should be granted. The effect of this data sharing arrangement therefore needs to be monitored carefully, to ensure that suitable techniques are used in the relevant data sharing contexts, so that the data minimisation principle is respected.

7 Concluding Remarks

The paper explores the interaction of anonymisation and pseudonymisation with EU data protection law by examining the past, present and future of these techniques in various EU data protection legal instruments. A closer look at these instruments from an evolutionary perspective reveals a shift in the approach to interpreting anonymisation. This shift is apparent in the GDPR, was further explored in the CJEU’s Breyer decision and culminated in Convention 108+. By navigating these legal instruments, the paper reveals a tension between pre-GDPR instruments, notably Opinion 05/2014 on anonymisation techniques, and post-GDPR norms.

The paper highlights that ongoing debates do not seem to have deterred EU policymakers from relying on anonymisation and pseudonymisation. On the contrary, there is a clear trend of anonymisation and pseudonymisation playing important roles in the data sharing schemes proposed in future EU data laws, including the newly adopted Data Governance Act, the forthcoming Data Act and the EHDS Regulation. While all three instruments attach great importance to the use of anonymisation and pseudonymisation techniques, none gives further clarification of these terms in the data sharing context. The paper indicates that discrepancies exist within and between these new EU data law instruments. EU-level guidance on anonymisation and pseudonymisation, especially in the health data sharing context, is therefore still greatly needed.

The paper argues that the current approach adopted by the EHDS may discourage the use of anonymisation techniques, as compared with pseudonymisation. By extending data protection rules to anonymised data, the draft EHDS Regulation clearly embraces the idea of risk management. However, imposing similar legal obligations on anonymised data and pseudonymised data may favour the latter even when anonymisation is more suitable than pseudonymisation. This in turn risks compromising the data minimisation principle. Differences between anonymisation and pseudonymisation techniques should be taken into account and reflected in rulemaking.

Finally, if a legal obligation to anonymise or pseudonymise health data were to be introduced under the EHDS, such an obligation should also be considered for the processing of health data outside the scope of the EHDS. While the EHDS mainly governs the processing of health data accessed under its own framework, this does not mean that health data processing activities elsewhere are less risky. The coverage of the anonymisation and pseudonymisation requirements could therefore be expanded, for instance by introducing a general legal obligation under the GDPR to anonymise or pseudonymise health data before processing, so that the privacy-enhancing power and data-sharing-facilitation potential of these techniques can be unleashed to a fuller extent.