Introduction

Data assets are digital goods and the basis for all Information Systems (IS). They have become a strategic asset for societal prosperity and economic competitiveness. Accordingly, studying data as a concept is essential for further developments in IS research (Singi et al., 2020). According to recent estimates, data assets will grow in quantity and importance in the coming years (Statista, 2022). One reason for this growth is increased data sharing, which in turn enables data-driven decision-making (Munoz-Arcentales et al., 2019). However, those who share their data fear a loss of control and competitive disadvantage, which is why a data economy that protects the interests of individuals and organizations is vital (Lauf et al., 2021). In this context, data sovereignty becomes a success factor, as its implementation empowers actors to decide on the use of their data as an economic asset (Banse, 2021), thus paving the way to a digital space in which providers and consumers can control all of their data actions.

Practically speaking, data sovereignty is a key building block of safe environments in which data providers and consumers overcome trust issues while sharing data. Given the importance of handling data according to sovereignty principles, policymakers must ensure “fair data sharing practices” (European Commission, 2022, p. 26) and create secure frameworks. Legislation derived from the European Strategy for Data, such as the Data Governance Act (DGA) and the Data Act (DA), as well as the General Data Protection Regulation (GDPR), which entered into force in 2016, regulates the data protection of different actors. These acts directly influence technological design in order to balance economic opportunities with society’s interests in sharing and reusing data (Labadie et al., 2019). In addition, politicians, organizations, and other stakeholders recognize data sovereignty as essential for controlling the data of individuals and organizations; however, when they refer to data sovereignty, it is often unclear whether they share the same understanding of the concept.

A deeper understanding of how organizations and individuals technically implement control over data when sharing it is crucial for research into digital self-determination and motivates the study of data sovereignty in IS. First, academia calls for more alignment and less isolation in exploring the core aspects and relations of data sovereignty. Second, terminological ambiguity persists in IS research, for example, in studies on Indigenous peoples (Taylor & Kukutai, 2016), data sovereignty in the cloud (Irion, 2012), and data sovereignty of individuals and enterprises (Jarke et al., 2019). Third, holistic research that examines data sovereignty as an overall concept is either absent or falls short of exploring how data can be handled in a sovereign way within IS (Hummel et al., 2021; Kushwaha et al., 2020).

Moreover, previous IS research has faced challenges, left loose ends, or reached divergent conclusions. This is reflected in data sovereignty definitions with contrasting focuses on law (Docter & Fuchs, 2020), self-determination (Banse, 2021; Jarke et al., 2019; Nagel & Lycklama, 2021), data flows (Lauf et al., 2021), or informational freedom (German Ethics Council, 2017). Further, studies have focused on implementing data sovereignty without clarifying the concept’s foundation (Opriel et al., 2021; Plattform Industrie 4.0, 2022). Other research has analyzed the impact of data sovereignty on data sharing without examining the concept itself (Azkan et al., 2022). Previous articles and studies have described sovereignty as a capability (Nagel & Lycklama, 2021) without providing a theoretical foundation. This study aims to fill these gaps by analyzing the current state of research and developing a conceptual model that helps researchers and practitioners navigate this fragmented field and reach a shared understanding of the concept.

This paper is structured as follows: It begins by describing data sovereignty and its background and by analyzing previous contributions from academia and practice in IS and adjacent domains. A Multivocal Literature Review (MLR), described in detail in the appendix, is applied to develop a conceptual model (Fig. 1) that specifies the core aspects of data sovereignty (Table 2). The model draws on agency theory to support a consistent understanding of the concept within IS and to form a baseline for further analytical, exploratory, and design-oriented research. Using real-world examples, the paper illustrates all core aspects and explains their roles and relations. It concludes by discussing theoretical and practical implications, limitations, and future research opportunities.

Background and related work

In the digital world, the concept of sovereignty describes forms of independence, control, and autonomy over digital infrastructures, technologies, data, and digital content (Pohle & Thiel, 2020). Discussions of sovereignty with a technological focus began in the 1980s (Grant, 1983; Hinsley, 1986), when the concept was extended to various forms and domains, such as technological, digital, data, or cyber sovereignty (Hellmeier & von Scherenberg, 2023). Data sovereignty is a relatively new term used in the context of decision-making and data ownership (Hummel et al., 2021). Over time, researchers have shaped its meaning and emphasized different nuances. Table 1 summarizes data sovereignty definitions from different research domains to contextualize the concept.

Table 1 Collection of data sovereignty definitions

Direct comparisons reveal different perspectives on the same term. For example, Polatin-Reuben and Wright (2014) noted the lack of a definition and framed the concept at the national level, whereas Jarke et al. (2019) and Nagel and Lycklama (2021) described it for individuals and enterprises. Other publications, such as that of the German Ethics Council (2017), included technical aspects such as big data, while Docter and Fuchs (2020) introduced the legal perspective. Research has often referred to the notion of digital self-determination, which goes beyond data sovereignty in that it considers not only one’s own data but also “data about oneself” (Verhulst, 2023, p. 8) and is related to protecting personal data and user consent. In contrast to data sovereignty, digital self-determination does not distinguish between data and their actors but treats both as one entity (Verhulst, 2023).

Within IS, current discussions on data sovereignty are increasingly driven by regulations that balance data protection and use before, during, and after the sharing process, such as the European GDPR, the DGA (European Commission, 2020), and the DA (European Commission, 2022). However, control over data is not only a fundamental European principle. Regulations such as China’s Personal Information Protection Law (PIPL) and the California Consumer Privacy Act (CCPA) show that it is gaining attention globally (Chander et al., 2021). Because the global rise in data exploitation stems mainly from the market power of monopolistic US and Chinese organizations, demand for new data governance models is increasing.

The technical implementation of data sovereignty can enable beneficial outcomes of data sharing, since it allows organizations to balance data protection and use. These outcomes are, first, cost sharing, where actors save money and time by sharing their data under the conditions of data sovereignty; second, the greater common good, where organizations can, for example, be motivated to share data to achieve CO2 targets; and third, joint innovation, which can only occur when actors work together because most participants cannot realize the application on their own (Data Spaces Support Centre, 2023b). These examples show that value is not created by a single player but through the combination of various actors and the enrichment of data in data ecosystems (Gelhaar et al., 2021).

The ecosystem concept originally stems from ecological science and focuses on living organisms that co-exist in a healthy environment (Chapin et al., 2011). Ecosystems, and data ecosystems in particular, do not function through central governance but rather work in balance. They can be open or closed (Oliveira & Lóscio, 2018): open data ecosystems are free for everyone to join, whereas closed ones often impose technical or legal entry barriers (Capiello et al., 2020; Janssen et al., 2012; van den Homberg & Susha, 2018). Actors in data ecosystems depend on and benefit from each other in equilibrium, without any one being dominant. To create a trusted environment, all actors should therefore be equipped with an instrument to control their own data without being controlled by a central instance. Consequently, implementing data sovereignty is an essential part of this (Gelhaar et al., 2021; Otto et al., 2022).

Conceptualizing data sovereignty in IS research

This section proposes a conceptual model of data sovereignty consisting of core conceptual aspects and their relations. Conceptual models are critical for simplifying and abstracting reality and help researchers and practitioners understand, organize, and communicate complex or novel concepts (Houy et al., 2012). As described in the appendix, the conceptual model was developed by consulting the IS data sovereignty literature listed in Tables 1, 2, and S1. It is grounded in agency theory to ensure that the model can fully explain the concept and offer a basis for real-world application (Eisenhardt, 1989).

Table 2 Specification of the core conceptual data sovereignty aspects and relations

The core of agency theory, developed during the 1960s and 1970s, is the analysis of the relationship between two actors, a principal and an agent (Eisenhardt, 1989). Its underlying assumption is that both actors act in their own self-interest and pursue objectives that often differ. In addition, the theory assumes an information asymmetry between them. To avoid mistrust, control mechanisms that create greater transparency are put in place (Eisenhardt, 1989). With the help of this theory, challenges in organizational relationships can be uncovered more effectively and governance structures understood more deeply (Eisenhardt, 1989).

Through the lens of agency theory, data sovereignty can be implemented as an instrument whose central objective is to establish more trust. As outlined above, contractual agreements provide the necessary transparency on the actions of both actors (here, data providers and consumers). According to Eisenhardt (1989), the theory can be applied to buyer–supplier and other agency relationships and is, therefore, suited to the data-sharing relationships that arise in open or closed data ecosystems. By implementing data sovereignty, actors gain an instrument that paves the way for a more balanced power structure and supports all parties in pursuing their objectives.

The presented conceptual model applies the concept of data sovereignty to IS research. It supports researchers and practitioners in developing a holistic understanding of the concept, helps all stakeholders communicate it, and guides practitioners who seek to implement data sovereignty technically. It aims for a completeness not yet provided by existing IS literature and definitions (see Table 1).

The seven core aspects referenced in Table 2 result from our analysis of the IS literature and our experience in this field, with agency theory serving as the basis for developing the conceptual model (Creswell, 2009). Details of the MLR search process, covering scientific and grey literature, are described in the appendix. The modeling process considered the contributions listed in Table S1, focusing explicitly on data sovereignty in the IS domain. When explaining each core aspect, we use examples to show how we derived the conceptual model. Table 2 summarizes all core aspects and relations and lists their specifications.

The conceptual model in Fig. 1 uses directed arrows to represent the relations among the core conceptual aspects. The model treats the data asset as its central component, which must be protected in an organizational or personal context when shared with other parties (Nagel & Lycklama, 2021). During its lifecycle, from creation to sharing and deletion (Rahul & Banyal, 2020), a data asset can reach different statuses and versions because it is modified by activities in the data value chain (Curry, 2016). These activities are performed by the data provider or by the data consumer who has gained access to the data asset (Otto et al., 2022). To implement data sovereignty, the provider and consumer must negotiate a contract that specifies the conditions of use of the data asset (Zrenner et al., 2019). Access and usage policies are examples of such contracts (Gil et al., 2020). Due to frequent mistrust between the parties involved (Lauf et al., 2021), a data provider often seeks to ensure that the consumer only performs the data value chain activities described in the contractual agreement. Therefore, a manual or technical data infrastructure helps ensure trust, because it supports the management of contracts through enforcement techniques (Munoz-Arcentales et al., 2019). Nevertheless, all stakeholders involved always require some trust (Nagel & Lycklama, 2021), even though the concept reduces the minimum amount of trust needed to create a data sovereignty solution. The following subsections describe each core aspect in detail.
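To make the relations described above concrete, the following minimal Python sketch encodes the seven core aspects and a set of directed relations as a small graph. The edge list and labels (e.g., "negotiates", "enforces") are our own hypothetical reading of the prose, not a reproduction of the exact arrows in Fig. 1, and the helper names are illustrative only.

```python
from collections import defaultdict

# The seven core aspects named in the text.
CORE_ASPECTS = {
    "data asset",
    "data provider",
    "data consumer",
    "contractual agreement",
    "data value chain activities",
    "data infrastructure",
    "trust",
}

# Hypothetical directed relations (source, label, target) read from the prose;
# they are an illustration, not the authoritative arrows of Fig. 1.
RELATIONS = [
    ("data provider", "negotiates", "contractual agreement"),
    ("data consumer", "negotiates", "contractual agreement"),
    ("data provider", "performs", "data value chain activities"),
    ("data consumer", "performs", "data value chain activities"),
    ("data value chain activities", "modify", "data asset"),
    ("data infrastructure", "enforces", "contractual agreement"),
    ("data infrastructure", "supports", "trust"),
    ("data provider", "requires", "trust"),
    ("data consumer", "requires", "trust"),
]


def build_graph(relations):
    """Map each source aspect to its outgoing (label, target) pairs."""
    graph = defaultdict(list)
    for source, label, target in relations:
        graph[source].append((label, target))
    return dict(graph)


def unconnected(relations, aspects):
    """Return the core aspects that take part in no relation."""
    used = {s for s, _, _ in relations} | {t for _, _, t in relations}
    return aspects - used


def sources_of(relations, target):
    """Return the aspects with a direct arrow into `target`, sorted by name."""
    return sorted({s for s, _, t in relations if t == target})


if __name__ == "__main__":
    print(unconnected(RELATIONS, CORE_ASPECTS))  # set()
    print(sources_of(RELATIONS, "data asset"))   # ['data value chain activities']
```

In this encoding, only the value chain activities point directly at the data asset, which matches the model’s claim that providers and consumers act on the asset through those activities rather than being linked to it directly.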

Fig. 1 Conceptualization of data sovereignty in IS

Data asset

Based on the conceptual model, data sovereignty can be defined as an instrument for keeping control over an actor’s data asset. Data assets can range from individual files to complete batches and full data streams. Such data assets must be controlled in terms of their access and usage (Munoz-Arcentales et al., 2019). Data are defined as assets describing intangible objects that can be reproduced repeatedly (Capiello et al., 2020). However, there is no single definition of the concept in IS research (McKinney & Yoos, 2010). Data are contextual, and their ownership is difficult to define. Unlike traditional commodities, they cannot be classified as private or common goods (Jentzsch, 2018), since there are no legally binding concepts regarding their ownership (Bärenfänger, 2017). The data asset is placed at the base of the model because it is key to every application of data sovereignty. Since data value chain and lifecycle activities modify the status of the data asset, these activities are directly related to it and are positioned at the bottom of the model as a baseline.

Data provider and data consumer

A data provider can decide to keep their data private for internal use, share it publicly, or grant access to a restricted number of third parties based on custom rules. To keep control over the data asset, the data provider and data consumer create and negotiate contracts. Providers and consumers can be individuals, enterprises, or organizations sharing data assets (Cavanillas et al., 2016; Marfia et al., 2017). In a contractual agreement, the provider role can be further divided into a data owner, who creates the data asset and exercises control over it, and a data provider, whom the owner authorizes to make the asset available to other parties (Hummel et al., 2021; Otto et al., 2019). In addition, other sources, such as the Data Spaces Support Centre Glossary, use the term data recipient instead of data consumer (Data Spaces Support Centre, 2023a). Besides contractual arrangements between both partners, data-providing enterprises can share data directly or through existing systems, such as data marketplaces (Nagel & Lycklama, 2021). Here, a data consumer can buy either the data asset itself or limited usage rights. Since all activities take place between the two actors, the model places the data provider on the left and the data consumer on the right.
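The split between data owner and data provider, and the consumer’s choice between acquiring the asset itself or limited usage rights, can be sketched as follows. This is a minimal illustration under our own assumptions: the class `DataAsset`, the enum `Acquisition`, and the helper `offer` are hypothetical names, not part of any cited framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class Acquisition(Enum):
    """What a data consumer obtains, e.g., via a data marketplace."""
    DATA_ASSET = "data asset itself"
    USAGE_RIGHTS = "limited usage rights"


@dataclass
class DataAsset:
    name: str
    owner: str  # the data owner creates the asset and exercises control over it
    authorized_providers: set = field(default_factory=set)

    def authorize_provider(self, requester: str, provider: str) -> None:
        """Only the data owner may authorize a provider to make the asset available."""
        if requester != self.owner:
            raise PermissionError(f"{requester!r} is not the owner of {self.name!r}")
        self.authorized_providers.add(provider)

    def may_provide(self, party: str) -> bool:
        """The owner, or a provider the owner has authorized, may offer the asset."""
        return party == self.owner or party in self.authorized_providers


def offer(asset: DataAsset, provider: str, consumer: str, what: Acquisition) -> dict:
    """Record an offer from a provider to a consumer (hypothetical helper)."""
    if not asset.may_provide(provider):
        raise PermissionError(f"{provider!r} may not provide {asset.name!r}")
    return {"asset": asset.name, "provider": provider,
            "consumer": consumer, "acquires": what.value}


if __name__ == "__main__":
    asset = DataAsset("capacity forecast", owner="Supplier")
    asset.authorize_provider("Supplier", "Marketplace")
    print(offer(asset, "Marketplace", "OEM", Acquisition.USAGE_RIGHTS))
```

Keeping authorization on the asset rather than on the provider reflects the text’s point that control originates with the data owner, who delegates only the act of making the asset available.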

Contractual agreement

As stated above, exercising data sovereignty can promote data sharing between organizations. Traditionally, written contracts are drawn up to increase trust, resulting in a contractual agreement after mutual consent. Due to a lack of control, these agreements are often not fully honored and offer limited trustworthiness (Nagel & Lycklama, 2021). IS research has addressed this problem by enforcing contractual agreements that are negotiated and monitored semi-automatically with the help of infrastructures and architectures to reduce (un)intentional data misuse (Jarke et al., 2019). Accordingly, systems and processes in various domains focus on smart contracts (Ghazizadeh & Sun, 2021). The data provider and consumer can be two neutral actors creating a contract based on rights and obligations, data usage policies, and terms and conditions (Zrenner et al., 2019), as described in more detail in the data infrastructure section. They can give or revoke their consent to change access rights and specify how their data can be accessed and used. The contractual agreement is located in the middle of the model because it contains the main conditions for maintaining control over data assets, the main goal of data sovereignty.
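The consent mechanics described above (an agreement that takes effect only after mutual consent and stops permitting actions once either party revokes it) can be sketched as a small data structure. This is a hypothetical illustration, not a smart-contract implementation or any framework’s API; the class name, fields, and action strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ContractualAgreement:
    """Hypothetical agreement between a data provider and a data consumer."""
    provider: str
    consumer: str
    permitted_actions: frozenset          # usage terms, e.g. {"read", "aggregate"}
    obligations: tuple = ()               # e.g. ("delete after project end",)
    consents: set = field(default_factory=set)

    def give_consent(self, party: str) -> None:
        self._check_party(party)
        self.consents.add(party)

    def revoke_consent(self, party: str) -> None:
        # Either party may withdraw consent at any time.
        self._check_party(party)
        self.consents.discard(party)

    @property
    def active(self) -> bool:
        # The agreement only takes effect after mutual consent.
        return {self.provider, self.consumer} <= self.consents

    def permits(self, action: str) -> bool:
        return self.active and action in self.permitted_actions

    def _check_party(self, party: str) -> None:
        if party not in (self.provider, self.consumer):
            raise ValueError(f"{party!r} is not a party to this agreement")


if __name__ == "__main__":
    contract = ContractualAgreement("Supplier", "OEM", frozenset({"read", "aggregate"}))
    contract.give_consent("Supplier")
    contract.give_consent("OEM")
    print(contract.permits("read"))    # True
    contract.revoke_consent("OEM")
    print(contract.permits("read"))    # False
```

On its own, such a record only states the terms; the model assigns the task of checking and executing them to the data infrastructure.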

Data value chain and data lifecycle activities

As depicted in Fig. 1, the data value chain includes the activities in the lifecycle of a data asset: creation, storage, usage, sharing, archiving, and destruction (Rahul & Banyal, 2020). In this context, implementing data sovereignty enables an organization or individual to control the data asset throughout its lifecycle. According to Curry (2016), an information flow consists of activities that perform transformation steps to turn a data input into a data output. For data sovereignty, the ability to keep control must extend over all data value chain activities, from creation through transformation to deletion, rather than focusing on individual activities (Banse, 2021). The activities must be consistent with the contractual agreements and usage conditions to enable self-determination. Accordingly, the data asset in Fig. 1 is not directly linked to the data provider or consumer (Nagel & Lycklama, 2021); instead, the data provider and data consumer perform value chain activities on the data asset.
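The requirement that control extend over every lifecycle activity, and that each activity be consistent with the contractual agreement, can be illustrated as a state machine over the six stages listed above. The stages come from the text; the transition table is our own hypothetical assumption, since the cited source lists the stages rather than the allowed moves between them.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation"
    STORAGE = "storage"
    USAGE = "usage"
    SHARING = "sharing"
    ARCHIVING = "archiving"
    DESTRUCTION = "destruction"


# Hypothetical transition table: which stage may follow which.
TRANSITIONS = {
    Stage.CREATION: {Stage.STORAGE},
    Stage.STORAGE: {Stage.USAGE, Stage.SHARING, Stage.ARCHIVING, Stage.DESTRUCTION},
    Stage.USAGE: {Stage.STORAGE, Stage.SHARING, Stage.ARCHIVING, Stage.DESTRUCTION},
    Stage.SHARING: {Stage.STORAGE, Stage.USAGE, Stage.ARCHIVING, Stage.DESTRUCTION},
    Stage.ARCHIVING: {Stage.STORAGE, Stage.DESTRUCTION},
    Stage.DESTRUCTION: set(),  # terminal: nothing follows destruction
}


def run_lifecycle(steps, permitted):
    """Walk a data asset through `steps`, starting from creation.

    Every step is checked twice: it must be a valid lifecycle transition,
    and it must be covered by the stages the contractual agreement permits.
    """
    history = [Stage.CREATION]
    for step in steps:
        current = history[-1]
        if step not in TRANSITIONS[current]:
            raise ValueError(f"invalid transition {current.value} -> {step.value}")
        if step not in permitted:
            raise PermissionError(f"{step.value} is not permitted by the agreement")
        history.append(step)
    return [stage.value for stage in history]


if __name__ == "__main__":
    allowed = {Stage.STORAGE, Stage.USAGE, Stage.DESTRUCTION}
    print(run_lifecycle([Stage.STORAGE, Stage.USAGE, Stage.DESTRUCTION], allowed))
```

Checking the whole sequence, rather than a single activity, mirrors the text’s point that sovereignty must cover the complete value chain from creation to deletion.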

Data infrastructure

The data infrastructure component enforces the terms and conditions determined in the contractual agreement (Munoz-Arcentales et al., 2019; Nagel & Lycklama, 2021). It is centrally located in the model since it operates between the data provider and consumer by validating and executing the terms and restrictions (Nagel & Lycklama, 2021) specified in the contractual agreement (see Fig. 1). These terms are divided into access control (AC) and usage control (UC), which protect data assets in almost all activities of the data value chain and lifecycle. As the term implies, AC focuses on controlling access itself. Since AC alone loses control once access is granted, UC extends control over data before and after third-party access (Gil et al., 2020), specifying which actors in an ecosystem may access the data and how they may use it (Zrenner et al., 2019). However, AC and UC requirements specified in contractual agreements add no value unless they are enforced correctly. Therefore, data infrastructure components, such as software systems, must validate the conditions of the contractual agreement and execute the actions described in the policies (Gil et al., 2020). Concepts based on decentralized identities (Ernstberger et al., 2023) and initiatives such as the International Data Spaces Association (IDSA) and Gaia-X address these problems through standards and technical implementations of data infrastructure components. Their solutions are applied in various domains, such as the cloud, IoT devices (Qarawlus et al., 2021), and manufacturing (Landolfi et al., 2019).
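The difference between AC and UC can be illustrated with a hypothetical enforcement point: AC makes a one-time decision when access is requested, whereas UC re-checks conditions every time the data are used, even after access has been granted. The policy fields (allowed purposes, a usage limit, an expiry date) are illustrative assumptions; this is not the IDSA or Gaia-X policy language, and the expiry handling merely simulates a deletion obligation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UsagePolicy:
    allowed_consumers: frozenset  # access control: who may obtain the data
    allowed_purposes: frozenset   # usage control: for what the data may be used
    max_uses: int                 # usage control: how often
    expires_at: datetime          # usage control: until when


class EnforcementPoint:
    """Hypothetical infrastructure component that validates a policy
    at access time (AC) and on every subsequent use (UC)."""

    def __init__(self, data, policy: UsagePolicy):
        self._data = data
        self._policy = policy
        self._granted = set()
        self._uses = 0

    def request_access(self, consumer: str) -> bool:
        # Access control: a single yes/no decision when access is requested.
        if consumer not in self._policy.allowed_consumers:
            return False
        self._granted.add(consumer)
        return True

    def use(self, consumer: str, purpose: str, now: datetime):
        # Usage control: conditions are re-evaluated on every use.
        if consumer not in self._granted:
            raise PermissionError("no access granted")
        if purpose not in self._policy.allowed_purposes:
            raise PermissionError(f"purpose {purpose!r} not permitted")
        if self._uses >= self._policy.max_uses:
            raise PermissionError("usage limit reached")
        if now >= self._policy.expires_at:
            self._data = None  # simulate a deletion obligation after expiry
            raise PermissionError("policy expired; data deleted")
        self._uses += 1
        return self._data


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    policy = UsagePolicy(frozenset({"OEM"}), frozenset({"capacity planning"}),
                         max_uses=2, expires_at=t0 + timedelta(days=30))
    point = EnforcementPoint({"capacity": 100}, policy)
    print(point.request_access("OEM"))                   # True
    print(point.use("OEM", "capacity planning", t0))     # {'capacity': 100}
```

A failed check leaves the usage counter unchanged, so only successful, policy-conforming uses count toward the limit.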

Trust

According to Schilke and Cook (2013), trust has emerged as a central theme in inherently uncertain relationships, and Botsman (2017) defines it as a “confident relationship with the unknown” (p. 8). While trust can be established from the outset in private and closed scenarios because the actors know each other, it is more challenging in open scenarios, where the data provider and consumer are partly unknown to each other due to complex supply chain networks with many participants (Gil et al., 2020). In the conceptual model, trust must be considered from two angles. First, trust is required by the data provider and consumer (Peterson et al., 2011): actors in open and closed data ecosystems must establish a fundamental trust relationship in the methods and technologies used to enter a relationship and realize data sovereignty. Second, trust can be enhanced as soon as parties, such as data providers and consumers, establish contractual agreements via a data-sharing infrastructure to accelerate business transactions (Yang et al., 2021). Thus, the basic trust required by data consumers and providers helps strengthen overall trust in the data infrastructure that enforces the policies specified in the contractual agreements. Supporting trust as a core aspect of a robust conceptual model, Munoz-Arcentales et al. (2019) stated: “Trust. It is the basis for all the relations between different organizations. Thus, being part of trusted environments is a key part of every operation, including data exchange. Data usage control is achieved thanks to this principle” (p. 592), which makes trust an essential component of data sovereignty.

Examples from the field

The model was evaluated using concrete examples from the field. Such real-life scenarios can demonstrate its usefulness and possible applications. One example from the German automotive industry concerns data exchange in the supply chain. The case study, its requirements, and the results presented by Opriel et al. (2021) can be mapped to the core aspects. In their study, data exchange occurs between an original equipment manufacturer (OEM) and a specific supplier (data provider, data consumer). They exchange industrial information on demand and capacity (data assets) at different stages (data value chain and lifecycle activities) based on current standards such as Electronic Data Interchange (EDI) (data infrastructure). In their analysis of problems, barriers, and business requirements, the researchers identified the need for trust and the possibilities of contractual agreements: “[Data sovereignty] can foster trust in each other and reduce risks being affected in data breaches (P16) […]. In order to secure legal aspects, the system shall provide functionalities to link usage policies with contractual definitions (R16)” (Opriel et al., 2021, p. 436). Here, data sovereignty is implemented as an instrument to overcome trust issues arising from power imbalances between the participating actors and therefore serves as an excellent example of agency theory’s applicability.

Another example concerns the shared use of data in a network of enterprises. In its white paper, Plattform Industrie 4.0 (2022) demonstrated that the technical implementation of data sovereignty plays a crucial role in multilateral data sharing for Collaborative Condition Monitoring (CCM) between participants such as component suppliers and factory operators (data provider, data consumer). They share and use (data value chain and lifecycle activities) datasets, such as sensor data (data asset), to leverage data-driven business models via a decentralized, federated infrastructure (Plattform Industrie 4.0, 2022). As in the previous case, the core aspects of the conceptual model can be mapped directly to their results, as summarized in Table 3. Component suppliers, machine suppliers, and factory operators create legally binding concepts to ensure trust between each other. Moreover, this example shows the relevance of agency theory to this actor relationship and highlights that data sovereignty is a suitable instrument for overcoming mistrust and weakening power imbalances, even when each actor pursues its own interests.

Table 3 Examples mapped to the core conceptual data sovereignty aspects

Discussion and future research opportunities

The presented conceptual model offers a new approach to understanding the implementation of data sovereignty in IS research by considering adjacent domains. It contributes to the existing literature by laying a foundation for further research and by filling the research gaps described above, which arise from conflicting and inconsistent conceptualizations. The following discussion describes the study’s practical and theoretical implications and addresses its limitations and future research opportunities.

This study’s results have direct implications for practice, as they guide individuals and companies and provide them with a shared understanding of the concept. The model aims to help those who implement data sovereignty technically, e.g., actors in research projects, organizations building data-sharing ecosystems, and stakeholders strengthening the role of data sovereignty through regulatory bodies. In addition, industry and research projects related to IDSA or Gaia-X can help further communicate and develop this topic by designing systems based on data sovereignty principles. Individuals and society can also play a greater role in demanding technology that implements data sovereignty by design across all data lifecycle stages, in line with European values. The conceptual model can help refine this vision and clarify communication.

For theory, this work’s conceptual model is a necessary addition to ongoing academic discussions. By defining and describing core aspects, it addresses terminological ambiguity and brings together the viewpoints of current research streams and existing definitions. We acknowledge that other models in IS research have sought to offer a shared understanding of data sovereignty. However, Ernstberger et al.’s (2023) model takes an almost exclusively technical-layer perspective, Zrenner et al.’s (2019) model is limited to the manufacturing domain, and the model of Otto et al. (2019) is a specific reference architecture. An additional theoretical contribution arises from linking the different research streams that describe the core conceptual aspects. To the best of our knowledge, some of these aspects (e.g., the data value chain and trust) had not previously been contextualized in this manner, so this study offers an approach with the potential to open up new perspectives. Given its fundamental nature, the theoretical contributions of this IS research can be tested and applied in other research areas, e.g., with a legal or political focus.

Despite careful evaluation, this fundamentals study has limitations, as it could not cover all relevant research strands. These limitations can nevertheless provide input for future research following different paradigms, namely design science research (DSR), which aims to develop artifacts that address real-world problems (Hevner et al., 2004), and behavioral research, which focuses on why groups or individuals act in specific ways and how they can be influenced (Skinner, 1965). Agency theory was chosen to ground the conceptual model theoretically. However, this choice has limitations; for example, the theory is not closely tied to the field of IS, defined “as a system[s] in […] organization[s]” (Davis, 2000, p. 67). Moreover, the literature analysis has limitations, since different databases or search queries could lead to different results.

In line with the DSR paradigm, further limitations are summarized in Table 4 and described in the following. First, future research should develop an artifact that supports individuals in controlling their data (RQ#1). Although the model also applies to implementing data sovereignty for individuals, enforcement for individuals requires further attention. Research on the enforcement of data sovereignty for individuals exists (Lomotey et al., 2022); however, as this was outside the scope of this study, future research could explore which artifacts need to be developed to enhance individuals’ ability to control their data. Additionally, the future development of data sovereignty as an instrument was not covered in this research. Therefore, identifying the capabilities needed to implement data sovereignty as an instrument is critical (RQ#2). Building on this, design-oriented studies of maturity models that track and measure the implementation of data sovereignty (RQ#3) could be a promising research direction. Furthermore, IT artifacts in policy management and data spaces, as well as reference models and methods, are needed to establish, develop, improve, and ensure data sovereignty in internal and external data management activities (RQ#4), such as validation, enforcement, signing, watermarking, or data integrity concepts (Hellmeier et al., 2023).

Table 4 Summary of future research opportunities

Regarding this study’s limitations, the behavioral research paradigm offers various research opportunities. Owing to the qualitative literature approach, subjectivity can be seen as a limitation. Although examples from the field were mapped to the conceptual model (Opriel et al., 2021; Plattform Industrie 4.0, 2022), applying the model in practice, e.g., in the “common European data spaces” (Data Spaces Support Centre, 2023b, p. 5), would demonstrate its utility in various data-sharing projects (RQ#5). This could also help validate the application of agency theory to reach an overall understanding of implementing data sovereignty as an instrument. Moreover, the cost of such an implementation has not been discussed in this research. The relationship between the value of data and data economics on the one hand and data sovereignty on the other, acknowledging that data assets may vary in criticality and value, is an exciting research strand. Open questions concern the maintenance costs of data infrastructure and of standards for enforcement (RQ#6). Finally, data sovereignty is considered a prerequisite for more data sharing (Azkan et al., 2022). This study has not explicitly analyzed whether data sovereignty positively or negatively affects data sharing, so future research should explore this aspect and re-evaluate the topic’s importance (RQ#7).

Summary

As IS research on data sovereignty is still in its infancy, this study included academic and practitioner literature in its investigation to establish a common understanding of the concept (see Fig. 1). The analysis of the current research stream shows that data sovereignty is not uniformly defined and that contrasting explanations and definitions have been offered. This fundamentals paper expands IS knowledge on data sovereignty by providing a conceptual model grounded in agency theory and evaluated with documented real-world examples. It specifies the core aspects, derived from the literature, that are needed to implement data sovereignty. The technological implementation of data sovereignty is essential for guaranteeing trusted data sharing between individuals and organizations and for fostering innovation. However, further practical and theoretical implications have yet to be uncovered, and future research must still evaluate and apply the proposed model.