Introduction

The re-use of health data for research, innovation, and policy-making offers great potential, which the European Commission (2022) is trying to capture by proposing the European Health Data Space (EHDS). While data are needed for any kind of (clinical) research, it is the use of large data sets, which could be made available by large-scale infrastructures such as the EHDS, that offers the opportunity to transform healthcare (Tretter et al. 2023). However, the application of artificial intelligence (AI) required to unlock the value of big data is fraught with ethical challenges (Howe III and Elenberg 2020; Gerke et al. 2020).

According to the German Ethics Council, “Data sovereignty […] is the central ethical and legal goal in confronting the challenges and opportunities presented by big data” (German Ethics Council 2017, p. 30). Although not used uniformly throughout the literature, the concept of data sovereignty transfers the historical idea of a subject’s position of control within a domain to data processes, and at the same time normatively demands that data subjects be able to articulate and assert corresponding claims (Hummel et al. 2018). This requires the implementation of technical measures to generate corresponding empowerment effects. From a regulatory standpoint, the concept of data sovereignty takes up important basic decisions already laid down in law and reinterprets them in a specific functionality (Hummel et al. 2021a). Self-determination and informed consent are central to the principle of patient data sovereignty. From the perspective of a data user that uses health data to generate value, the thorough implementation of data sovereignty measures can also be seen as a necessary condition to meet the requirements of responsible research and innovation as proposed in various European frameworks (European Commission 2011; Burget et al. 2017).

Although facets of data sovereignty have been widely discussed at a theoretical level in certain studies (e.g., Hummel et al. 2021a; Wiertz 2022), these studies do not take a stance on how to advance data sovereignty in clinical practice—especially when AI is involved. Considering the enormous implications for data collection and processing, there is a need for key pillars to guide the implementation of data sovereignty and informed consent in clinical research, ideally in a standardized way. This complex task requires an exploratory, multidisciplinary approach and perspectives from varying fields of research as well as patients (Hummel et al. 2018). Using the qualitative method of a narrative literature review in combination with a design thinking approach, we contribute to the discussion by answering the research question: What are the practical requirements for an AI-ready implementation of data sovereignty and informed consent? The paper is structured as follows: The next section will outline the methodology of our study, followed by a presentation of the results. The study closes with a discussion of the findings and a conclusion.

Methodology

Answering our complex research question requires an exploratory and therefore qualitative, cross-disciplinary approach involving several fields of science and learning. The need to share knowledge across disciplines has generated great interest in design thinking, which has been applied in many organizations and institutions in an attempt to foster creative thinking and broader problem-solving for user-centered innovation (Inglesis Barcellos and Botura 2018). Design thinking can be described as a strategy to solve complex problems with the help of researchers from different disciplines (Wölbling et al. 2012). Various process models of design thinking exist. A common model from the HPI School of Design Thinking uses a systematic approach consisting of six phases: understand, observe, define the point of view, ideate, prototype, and test. The design thinking process is non-linear and iterative.

In line with Lauf et al. (2022), we modify this original design by transforming the stated phases to adapt the approach to our research purposes: awareness building, knowledge building, point of view, ideate, concept development, and validation. These adaptations were necessary because our approach is a conceptual research methodology rather than a more technical procedure commonly used in design thinking. Specifically, we applied literature reviews in the first two phases while changing phases five and six from a prototyping focus to concept development and focus group discussions. The adapted iterative approach is shown in Fig. 1, with an embedded loop from the last to the first phase to support an agile research process.

Fig. 1 Methodology based on a Design Thinking process with six phases (HPI School of Design Thinking 2022)

Our research group consists of 15 members with backgrounds in different disciplines, including the authors and additional researchers. Starting with awareness building, the researchers developed their own understanding of the possible requirements for data sovereignty and informed consent and the consequential implications for the development of IT solutions in the German healthcare system. Subsequently, in the second phase of knowledge building, the researchers conducted a narrative literature analysis within their field of research to obtain comprehensive information on potential requirements. Compared with a systematic literature review, a narrative literature review is more flexible and reflective (Pautasso 2019) and therefore better suited to the iterative and agile nature of our design thinking process. After this step, each researcher formed an individual point of view, based on the insights accumulated from the previous phases. In the ideate phase, the researchers relied on the rather subjective insights that they had gained to generate different ideas. These initial ideas were transformed into concrete requirements and implications in the subsequent concept development phase.

While the researchers worked independently in the five phases described so far, the final phase of validation was carried out in focus groups. Focus groups are appropriate when the researcher wants to obtain a variety of possibly divergent perspectives on a selected topic in order to capture the issue at hand as holistically as possible (Stewart 2014). In addition to the 15 researchers, two patient representatives participated in order to validate the concepts from a patient perspective. The focus groups used recommended collaborative tools such as digital whiteboards and presentations to support the research process (Brown 2008). After the third iteration of plenary discussions, the research process ended as there was a high level of perceived congruence between all researchers and patient representatives.

Results

The following sections describe the central requirements for a practical implementation of data sovereignty and informed consent as a central goal to overcome the ethical challenges associated with the use of big data in health. As a result of our study, we identified six requirements that will be outlined in detail here. While only the third requirement explicitly deals with AI, all of the other requirements are also necessary conditions for its application (Table 1).

Table 1 Key requirements for practical implementation of data sovereignty and informed consent

The concerns underlying the first requirement are supported by recent work on medical AI ethics (Tang et al. 2023). The requirement aims to enable data sovereignty and informed consent through three fundamental technologies: a digital consent management system, privacy-preserving technologies, and privacy risk quantification. The GDPR outlines specific conditions for processing health data (Art. 9(2) GDPR), with explicit consent from the data subject being a common legal basis for medical research; digital consent management systems can facilitate obtaining and managing this consent. As the volume of medical data increases, granular consent management becomes essential (Appenzeller et al. 2020) to facilitate active patient participation and engagement. Appenzeller et al. (2022) propose a workflow for sovereign dynamic consent that prioritizes patient involvement and provides a technical system for granular decisions with quantification of privacy impact, thereby enhancing informed consent decisions. Another fundamental part of patient-centered research is the use of privacy-preserving technologies. Anonymization and pseudonymization are commonly used in medical research, but re-identification risks remain (Sweeney 2002). To mitigate these risks, various further privacy-preserving technologies exist (Dwork 2008) and have already been successfully implemented in a large-scale real-world use case (Kenny et al. 2021). The final technological building block we have identified for data sovereignty is the quantification of privacy risks. While digital consent empowers the patient, the degree of choice can overwhelm the data subject (Appenzeller et al. 2022). Therefore, comprehensible privacy risk quantification should support patients in their decision to share data. Privacy risk quantification for personal data includes two main approaches: data-based quantification, which analyzes shared data for factors such as uniqueness and regulatory fines, and rule-based quantification, which examines privacy policies or text-based regulations that affect data privacy (Deußer et al. 2020; Kelley et al. 2009).
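To make the interplay of granular consent and data-based risk quantification more tangible, the following Python sketch may help. It is an illustration under stated assumptions and not the system proposed by Appenzeller et al. (2022): the class `ConsentDecision`, the functions `is_use_permitted` and `uniqueness_risk`, and all category and purpose labels are hypothetical. The risk indicator is a simple uniqueness measure over quasi-identifiers, in the spirit of the re-identification argument of Sweeney (2002).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class ConsentDecision:
    """One granular decision: a data category released for one purpose until a date.

    All field values (e.g. "lab_results") are hypothetical labels for illustration.
    """
    category: str
    purpose: str
    valid_until: date
    withdrawn: bool = False


def is_use_permitted(decisions, category, purpose, on_date):
    """Return True only if an active, unexpired decision covers this exact use."""
    return any(
        d.category == category
        and d.purpose == purpose
        and not d.withdrawn
        and on_date <= d.valid_until
        for d in decisions
    )


def uniqueness_risk(records, quasi_identifiers):
    """Data-based risk indicator (assumed metric, for illustration only).

    Returns the share of records that are unique on the given quasi-identifiers
    and the size k of the smallest group of indistinguishable records.
    """
    if not records:
        raise ValueError("at least one record is required")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    unique_share = sum(1 for key in keys if group_sizes[key] == 1) / len(keys)
    return unique_share, min(group_sizes.values())
```

In such a design, a consent interface could show the patient how the uniqueness share changes when an additional attribute is released, so that the granular decision is informed by a comprehensible risk figure rather than by the raw data alone.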

The second requirement aims to achieve data sovereignty and informed consent through the rigorous application of usability and interaction design standards in healthcare software. This requirement was identified because healthcare software currently suffers from poor usability and interface design (Wachter 2017). However, both aspects are essential to ensure that data sovereignty and informed consent can be achieved in practice by end users (Feth 2023). The principles of human-centered design and engineering are well understood and outlined. As such, there are three key documents that should be implemented in a software development process. Firstly, ISO 9241-210 (International Organization for Standardization 2019) provides the terminology needed to work towards usable software and describes the human-centered design process and its activities. Secondly, ISO 9241-110 (International Organization for Standardization 2020) provides guidelines on the fundamental qualities of usable software. To be considered usable, software must fulfil seven principles (suitability for user’s task, self-descriptiveness, conformity with user expectations, learnability, controllability, user error robustness, user engagement) which are described in more detail in the document. Finally, DIN EN ISO 13485 (International Organization for Standardization 2022) specifies requirements for a quality management system in which an organization must demonstrate its ability to provide medical devices and related services that consistently meet customer and applicable regulatory requirements. The document describes the need for thorough documentation, particularly with regard to design documentation.

The third requirement concerns the implementation of state-of-the-art knowledge to ensure trustworthy AI. Although AI and machine learning approaches have the potential to significantly advance clinical research (Yu et al. 2018), their application must be developed to meet specific quality standards and be secured against their risks (Beck et al. 2023; Gerke et al. 2020). The concept of trustworthy AI has been identified as central to achieving this. The concept has three main components: “(1) it should comply with the law, (2) it should fulfil ethical principles and (3) it should be robust” (European Commission 2019, p. 3). Based on these key parameters and the EU core values, the EU identified seven key requirements for trustworthy AI: (1) human agency and oversight, (2) technical robustness and safety, (3) privacy and data governance, (4) transparency, (5) diversity, non-discrimination and fairness, (6) societal and environmental well-being, and (7) accountability (European Commission 2019). Putting these rather abstract requirements into practice is still a challenge, as their implementation is highly dependent on the type of technology used and the field of application. One way to define application-specific quality criteria is a risk-based approach (Poretschkin et al. 2021). In this approach, the object to be analyzed is specified and evaluated along six dimensions of trustworthiness (fairness, autonomy and control, transparency, reliability, safety and security, data protection). The risk-based approach has already been proven in classic IT security and functional safety (Aleksandrov et al. 2021) and can therefore serve as a means to assess and optimize the trustworthiness of AI for healthcare software. However, Botsman (2017) and Tretter et al. (2023) point out that it will continue to be the central task of human personnel to act as trust-bearers in clinical contexts and to ensure that this trust is constantly maintained.
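As a rough illustration of how an evaluation along the six dimensions could be recorded, consider the following Python sketch. The three-level rating scale, the default threshold, and the function name `dimensions_needing_review` are assumptions made for this example and are not taken from Poretschkin et al. (2021); only the six dimension names come from the text above.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Hypothetical three-level rating scale (an assumption of this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The six dimensions of trustworthiness named in the text.
DIMENSIONS = (
    "fairness",
    "autonomy and control",
    "transparency",
    "reliability",
    "safety and security",
    "data protection",
)


def dimensions_needing_review(ratings, threshold=Risk.MEDIUM):
    """Return the dimensions rated at or above the threshold, in canonical order.

    Raises ValueError if any dimension is unrated, so that no dimension
    is silently skipped in the assessment.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    return [d for d in DIMENSIONS if ratings[d] >= threshold]
```

For a given application, an assessor would rate each dimension once and then concentrate detailed examination on the dimensions returned by the function, which mirrors the idea of deriving application-specific quality criteria from the identified risks.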

As a fourth requirement, a health information exchange platform must also meet several criteria related to user acceptance and trust, as existing research has shown that while patients recognize the potential for secondary use of health data, concerns about trust, privacy, and transparency are widespread (Hutchings et al. 2020). By iteratively discussing existing knowledge from the literature with the patient representatives involved in the research group, we identified three factors as being most critical for patient acceptance: the safety and soundness of the application; the transparency of the application’s data use and privacy policies; and the convenience, ease of access, and immediate availability of the application. First, safety and soundness of the application refers to the expectation and perception that the technology will carry out certain processes according to the user’s wishes and assumptions, while not harming the user in any way (Backhaus 2017; Hutchings et al. 2020). There are certain specificities related to digital trust, mainly because there is no direct interaction with another physical party (Taddeo 2009). However, research on trust transfer theory shows that patients tend to transfer their initial cognitive trust in their healthcare provider to the health IT application (Lu et al. 2011; McKnight et al. 2002). This highlights the importance of involving key healthcare users in the development and deployment of applications. To increase trust, developers should also implement trust-building measures such as informational accuracy, understandability, and the promotion of autonomy and patient empowerment, as recommended by van Haasteren et al. (2020). Second, the transparency of the application’s data use and privacy policies relates particularly to what types of data are shared for what purpose and with whom, the duration of data storage, and the security measures taken to ensure data privacy (Hutchings et al. 2020; Mangal et al. 2022). Finally, the convenience, ease of access, and immediate availability of the application are important for both the patient and the healthcare provider (Hassol et al. 2004; Vergouw et al. 2020).

As a fifth requirement, in a scenario where individuals donate their data for secondary use, rigorous patient involvement should be ensured (Saelaert et al. 2023). In order to comply with legal requirements and to enable informed decision-making, patients are required to provide their consent when participating in a medical research study, which is usually a very narrow consent to use data for a specific purpose for a predefined period of time. This obviously limits the use of the data for that purpose and does not allow for secondary analyses or inclusion in other potentially relevant medical research. The term “data donation” has been coined in Germany to expand the use of data in medical research (Molnár-Gábor 2021). It is the approval of using personal health information for medical research given at a specific point in time after detailed explanation of the consequences (Bundesministerium für Gesundheit 2020). Patients donate their data for research without receiving any compensation and justify their decision mainly with altruism, solidarity, gratitude, and generally supporting a common good (Richter et al. 2018). Self-determination in the context of data donation requires understandable and comprehensive information about the goals, processes, and specific projects of secondary use. These criteria also apply when AI is used to process the data (Perni et al. 2023). All data donation scenarios require high-quality citizen participation and the promotion of social data literacy as accompanying measures. The realization of self-determination through an opt-in broad consent faces significant practical and ethical challenges. Against this background, the ethical benefits of a quality-assured opt-out scenario are currently being discussed (Strech et al. 2020).

The sixth requirement is to develop effective legislation that balances different interests in order to achieve data sovereignty (Hummel et al. 2021b). In general, the processing of health data is allowed if there is consent or if there is Union or Member State law that provides for the processing. The GDPR sets out specific rules for the processing of sensitive data such as health data. The processing of this type of data is prohibited. Only with certain exceptions can the categories of data listed in Art. 9 (1) GDPR be processed. The regulation therefore contains opening clauses that allow for national implementation; cf. Art. 9 (2) GDPR (Kuner et al. 2020). Typically, data are not processed primarily for research purposes, but for the individual diagnosis and treatment of an individual patient. Research purposes therefore constitute a so-called secondary use. This secondary use of data causes great difficulties in practice regarding their legitimate use and the extent of such use, due to the vagueness of the underlying provisions (Peloquin et al. 2020). The GDPR grants privileges for the secondary use of data sets for research purposes; see Art. 5 (1)(b) GDPR. At the same time, the regulation strengthens the sovereignty of individuals by granting certain rights (see Art. 13 GDPR). Whenever health data are processed and Art. 9(2) GDPR is the legal basis, the safeguards of Art. 89 GDPR must be respected (Kuner et al. 2020).

By contrast, the current national legislation is more research-friendly. In the past 3 years, the German parliament has passed two new laws to expand digitization in the healthcare system. In November 2019, the Digital Healthcare Act (DVG) was passed with the aim of promoting digitization in the healthcare sector. A central database is planned in which the health data of around 90% of the German population will be stored (Schrahe and Städter 2020). Later, these data will also be made available to third parties for research purposes. Additionally, in October 2020, the Patient Data Protection Act (PDSG) came into force. From 2023 on, insured persons will have the option of voluntarily making some of their health data stored in their electronic health record available for research (Bretthauer and Spiecker genannt Döhmann 2020; Orak 2021). Both laws, in combination with the GDPR and other national laws based on the opening clauses thereof, have a major impact on the research of health data in combination with AI (Bretthauer and Spiecker genannt Döhmann 2020; Weichert 2020a; see BVerfG decision of the Second Chamber of the First Senate of 19 March 2020, No. 1 BvQ 1/20-, Rn. 1–18).

Furthermore, the use of vague terms in the GDPR leaves the scope of the privilege for scientific research open. There is no legally binding definition of the term “scientific research” at the European level, which could lead to different interpretations and consequently raise various questions (Forgo 2015; Roßnagel 2019). The GDPR attempts to strike a balance between two conflicting fundamental rights: in the context of research data processing, the freedom of research according to Art. 13 EU Charter of Fundamental Rights (CFR) and the protection of personal data according to Art. 8 CFR are in conflict. According to Art. 13 CFR, research is an activity aimed at acquiring new knowledge in a methodical, systematic, and verifiable manner (Jarass 2021). Applied research is covered, but not the mere application of previously acquired knowledge. It is irrelevant whether the findings are scientifically recognized and where the research takes place. The regulatory content of Art. 13 CFR largely corresponds to that of Art. 5 (3) Grundgesetz and is inspired by it (Weichert 2020b). All in all, scientific research is understood in a broad sense. This is also confirmed by Recital 159 GDPR. Although Art. 13 CFR does not limit the scope of protection, it must be limited in the event of conflicting fundamental rights: see Art. 52 (1) CFR. Otherwise, freedom of research would mean that data protection would pale into insignificance (Spiecker genannt Döhmann 2021). The fact that both fundamental rights have to be balanced is also shown by Art. 89 GDPR (Spiecker genannt Döhmann 2021). It strengthens the protection of personal data. Art. 89 (2) and (3) GDPR, however, provide for exceptions to the rights of the data subject, which protect the freedom of research (Kuner et al. 2020).

From a legal perspective, the first step is therefore to formulate effective legislation. The secondary use of health data does not only allow for the processing of the collected data for other purposes, but also limits the rights of data subjects. It is important not only to conduct research, but also to regulate it in a way that respects fundamental rights, and this has become increasingly important in the context of the GDPR.

Discussion

By applying a narrative literature review combined with a design thinking approach, we were able to derive six requirements that need to be met to implement data sovereignty and informed consent in AI-enabled clinical research. Based on the existing literature, the requirements were operationalized into concrete implications for the development of ethical IT solutions. These are summarized in Table 2.

Table 2 Implications for IT solution development preserving patient data sovereignty and informed consent within the German healthcare system

Our study advances existing knowledge from both a theoretical and a practical perspective. From a theoretical perspective, other authors have examined how data sovereignty can be achieved in a data-sharing use case primarily from an ethical perspective. In particular, Hummel et al. (2018) propose to adopt data sovereignty and informed consent as normative reference points for responsible and ethically sound data-sharing mechanisms. Their recommendations overlap with our requirements. For example, Hummel et al. (2018) highlight the responsibility of multiple stakeholders and levels to achieve data sovereignty. They also emphasize that individuals have a right to education on data literacy, while at the same time technology should be designed to be easy to understand so as not to overwhelm individuals with information they cannot process. On the other hand, it is clear that the recommendations presented are not specifically tailored to the healthcare use case, as several of them aim to balance individual claims for data sovereignty against market dynamics (e.g., by ensuring data interoperability or a plurality of data platforms). However, the case for healthcare is different, as the likely evolution in Europe is towards a much more regulated value chain with few actors. Furthermore, the recommendations put forward are very theoretical in nature. Therefore, we believe that our paper adds value by targeting the recommendations to a healthcare use case, taking into account the latest legislative initiatives at both EU and national level, such as the EHDS (European Commission 2022), the AI Act (European Commission 2021), or the Health Data Utilization Act (Bundesministerium für Gesundheit 2023; see also Ebert and Spiecker gen. Döhmann 2021; Kühling and Schildbach 2024; Roos and Maddaloni 2023). In addition, our requirements are underpinned by implications that pave the way for implementation.

From a practical perspective, the German Ethics Council (2017) has put forward recommendations to implement data sovereignty as a guiding principle for the use of big data in health. The recommendations aim at realizing the potential of big data, preserving individual freedom and privacy, ensuring fairness and solidarity, and promoting responsibility and trust. Again, there is an overlap with the requirements derived from our research. However, the paper falls short in providing evidence-based tools and measures to guide implementation. Furthermore, both the technical and legal frameworks have evolved significantly since the publication of the paper. In summary, our study contributes to the literature by providing up-to-date and evidence-based practical guidelines on how to implement data sovereignty and informed consent in a healthcare setting.

Conclusion

Achieving patient-centered AI-driven clinical research requires data sovereignty and informed patient consent. The complexity of the context requires a cross-disciplinary approach and active patient involvement in the development of solutions. We identify six requirements, which are further substantiated with concrete implications for the development of IT solutions in the German healthcare system. The paper thus closes a research gap by providing key pillars on how data sovereignty in clinical research can be implemented in practice. We recommend three directions for further research in the field.

First, we have identified several approaches, tools, and standards that are suitable for achieving data sovereignty and informed consent in a healthcare context. However, most of these approaches have not yet been sufficiently implemented. For example, although standards for usability and interaction design are well documented, existing healthcare technologies still perform poorly in this regard. Similarly, risk-based approaches to assess AI trustworthiness have proven valuable, but not yet in the context of clinical research.

Secondly, we have outlined in detail the legal challenges of secondary data use that are rooted in the current legislation. Further efforts are needed to balance the interests of all stakeholders (patients, clinical researchers, national governments, etc.) in order to develop a legal framework that adequately addresses the different legitimate needs while enabling innovation through data use.

Thirdly, in our study we took great care to include different scientific perspectives as well as patient representatives. Collaboration between many disciplines and end users is needed to digitally enable data sovereignty and informed consent. It is this collaboration between different disciplines itself, and the involvement of patient research partners to represent the patient perspective, that has made the analysis of requirements presented here possible and offers much potential for scientific progress. Our results point to the need for extensive stakeholder involvement in the development of the architecture of future IT solutions for the secondary use of healthcare data. Contributions on stakeholder involvement have been made both in healthcare and in other fields (Hendricks et al. 2018; Reed 2008). We therefore call on researchers to develop a stakeholder participation agenda to support the development of the EU-wide health data sharing solution. In addition to drawing on existing research, such an agenda should also incorporate learning from countries that have already implemented such a solution.