Introduction

The ever-increasing availability of data and the diffusion of digital technologies have shifted the perception of data from a byproduct to a strategic resource that offers growing opportunities for staying competitive and finding new avenues for diversification (Legner et al. 2020). The research and advisory firm Gartner predicts that “organizations that promote data sharing will outperform their peers on most business value metrics” (Goasduff 2021). Additionally, new legislation such as the German Act on Corporate Due Diligence Obligations in Supply Chains (Supply Chain Act) or the Corporate Sustainability Reporting Directive (CSRD) requires organizations to collect data about their direct and indirect suppliers and to ensure that neither they nor their suppliers engage in activities that harm the environment or promote inhumane working conditions. Realizing the positive intentions of the Supply Chain Act while managing the additional costs of bureaucracy and inter-organizational cooperation requires organizations to collect data from multiple parties, such as suppliers, government agencies, customers, non-governmental organizations, or whistleblowers (Federal Ministry for Economic Cooperation & Development 2023; Germany’s Federal Parliament 2021). A recent study by the German research institution IW Köln (Kolev-Schaefer & Neligan 2024) finds that most companies, even those not directly subject to the Supply Chain Act, are now required to supply information to suppliers or customers that fall under it (Footnote 1).

This can be particularly challenging. The automotive industry, for instance, is characterized by extremely complex supply chains in which car manufacturers work with thousands of direct and even more indirect suppliers (Footnote 2). At the same time, car manufacturers are tasked with creating transparency and continuously monitoring their direct and indirect suppliers for environmental or human rights risks (CSR-in-Deutschland 2023). Achieving this transparency and monitoring while fostering collaboration across supply chains is challenging for the car manufacturers involved (Leckel & Linnartz 2023). BMW, for example, has come under pressure because one of its suppliers in Morocco allegedly releases arsenic into nearby waters while mining cobalt, causing environmental damage and potentially harming humans (Blum et al. 2023).

Collecting the data needed to fulfill these requirements and to generate new business can hardly be achieved alone or solely within sequential supply chain relationships; it requires inter-organizational industrial data ecosystems (Legenvre & Hameri 2024; Oliveira et al. 2019). However, such data ecosystems rarely emerge naturally (i.e., without legislative force) because organizations have a range of concerns about sharing their data. These include the fear that their data will be used against them, negative competitive effects, data misappropriation, and a general lack of control over what can be done with their data (Fassnacht et al. 2023; Jussen et al. 2024; Opriel et al. 2021). A recent study from Germany finds that data sovereignty issues, such as access control, data protection, and unclear data usage rights, are among organizations’ top concerns when sharing data (Röhl et al. 2021).

Data spaces are inter-organizational information systems (IOISs) that explicitly address these concerns and aim to support inter-organizational data sharing by implementing data sovereignty both organizationally and technically (Otto & Jarke 2019). They are one way, and in our opinion one of the most promising, to realize flourishing industrial data ecosystems in which organizations can access multiple data sources in a distributed system while agreeing on a set of bilateral (peer-to-peer) and multilateral (ecosystem) data usage rules (Otto 2022b). These rules can include access control and usage control policies, which formalize how data users may access and use data providers’ data and which are enforced through technical implementations such as so-called data space connectors (Otto 2022a; Otto & Jarke 2019) (Footnote 3). When entering a data space, participants are required to accept the rules defined in its governance framework (Data Spaces Support Center 2023).
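To make the distinction between access control and usage control more tangible, the following Python sketch shows how a connector-side check might evaluate a provider's policy before releasing data. It is a minimal illustration under our own assumptions: the classes `UsagePolicy`, `UsageRequest`, and `ConnectorGuard`, the three usage conditions (purpose, expiry date, maximum number of uses), and the participant names are hypothetical. Actual connectors express such policies in dedicated policy languages (e.g., ODRL-based profiles) and enforce them with considerably richer machinery.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UsagePolicy:
    """Hypothetical policy a data provider attaches to an offered data asset."""
    allowed_consumers: set[str]   # access control: who may receive the data
    allowed_purposes: set[str]    # usage control: what it may be used for
    valid_until: date             # usage control: temporal restriction
    max_uses: int                 # usage control: how often it may be used


@dataclass
class UsageRequest:
    consumer: str
    purpose: str
    on: date


@dataclass
class ConnectorGuard:
    """Simplified enforcement point standing in for a data space connector."""
    policy: UsagePolicy
    uses: int = 0

    def decide(self, request: UsageRequest) -> tuple[bool, str]:
        # Access control is checked first, then the usage conditions.
        if request.consumer not in self.policy.allowed_consumers:
            return False, "access denied: consumer not permitted"
        if request.purpose not in self.policy.allowed_purposes:
            return False, "usage denied: purpose not permitted"
        if request.on > self.policy.valid_until:
            return False, "usage denied: policy expired"
        if self.uses >= self.policy.max_uses:
            return False, "usage denied: usage limit reached"
        self.uses += 1  # only permitted uses are counted
        return True, "permitted"


guard = ConnectorGuard(UsagePolicy(
    allowed_consumers={"oem-a"},          # hypothetical participant
    allowed_purposes={"due-diligence"},   # hypothetical purpose label
    valid_until=date(2025, 12, 31),
    max_uses=1,
))
print(guard.decide(UsageRequest("oem-a", "due-diligence", date(2025, 6, 1))))  # (True, 'permitted')
print(guard.decide(UsageRequest("oem-a", "marketing", date(2025, 6, 1))))      # (False, 'usage denied: purpose not permitted')
print(guard.decide(UsageRequest("oem-a", "due-diligence", date(2025, 6, 2))))  # (False, 'usage denied: usage limit reached')
```

The point of the sketch is that the provider's conditions travel with the data offer and are checked at the point of use, rather than ending once the data has left the provider, which is what separates usage control from plain access control.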

The political and economic importance of data spaces is reflected in a range of European initiatives that invest significant effort in promoting them. For instance, Gaia-X is a European initiative providing architectural guidance for creating data ecosystems in many domains, such as agriculture, logistics, energy, and health (Gaia-X 2022). In Gaia-X’s view, a data space is the “sum of all its participants, which may be data providers, users and intermediaries”, designed to uphold data sovereignty and trust among its participants (Gaia-X 2023). The German initiative Catena-X Automotive Network aims to create an open data ecosystem based on the Catena-X data space, which covers the automotive supply chain, is tailored explicitly to its needs, and includes typical stakeholders in different roles (e.g., OEMs, suppliers, IT services). Members must use a standardized interface, a data space connector, to access this data ecosystem (Catena-X 2022b). In a recent article in the German newspaper Süddeutsche Zeitung, Catena-X is positioned as the data-based enabler for complying with the German Supply Chain Act (see above) (Martin-Jung 2022). This considerable effort is complemented by multiple initiatives, research projects, and large-scale community events (e.g., the Data Space Symposium with more than 1000 participants interested in data spaces). Against this background, data spaces now need to prove their value in practice and, correspondingly, their relevance to the IS research community. However, while technical standards are critical, data spaces currently lack proven business models for sustainable, long-term success (Bub 2023).
Additionally, legislation such as the Data Governance Act (DGA) regulates data intermediation services by limiting data usage and setting conditions for the provision of services and infrastructures (Richter 2023), which complicates data space implementation in practice (Gemein et al. 2023).

While there is a large landscape of projects working on data spaces and the facilitation of data ecosystems, the relationship between these two concepts is a blind spot in IS research. This is problematic because the terms are often used interchangeably, with a range of adverse effects. First (1), suggesting that data spaces are the only way to initiate and maintain data ecosystems could exclude or suppress other data infrastructures. For instance, data marketplaces, which are not data spaces (although they could play a role in them), generate data ecosystems of data providers and users who rely on the marketplace’s centralized technical infrastructure to buy, sell, and generally trade data. Data spaces therefore require a specific definition as one kind of data infrastructure that enables data ecosystems, acknowledging that other infrastructures capable of generating data ecosystems exist and could be more relevant to the underlying use case. Second (2), this blurry overlap obscures the distinction between the concepts and hinders clear scholarly discussion and alignment (Schönwerth 2022). It is therefore necessary to position data spaces as data infrastructures that can generate one or multiple data ecosystems, and to recognize that there is not one data space but likely a whole array tailored to different industries and communities (e.g., see the overview of the Common European Data Spaces, European Commission 2024b). Conflating the two could provoke unrealistic expectations among researchers and practitioners and misrepresent their intertwined roles. Third (3), data spaces generate data ecosystems, and participants (often data providers and data users) can either be part of the digital infrastructure by using a dedicated data space connector or contribute or consume data or services in the larger data ecosystem outside that infrastructure.
Differentiating between actors engaged in the data space and those in the data ecosystem is necessary because the former are integrated technically (data space), while the latter need not be (data ecosystem). Disentangling this overlap matters because being engaged in a data space imposes a distinct set of requirements (e.g., adhering to specific rules and technical integration) compared with being part of a broader data ecosystem. Fourth (4), this blurriness may prevent IS researchers from exploring data spaces and their ecosystems effectively and contributes to the lack of a distinct research stream in the IS community.

To summarize, this fundamentals article addresses these issues and aims to clarify the role of data spaces in conjunction with data ecosystems (1), illustrate and systematize data spaces and their initiatives (2), and craft a research agenda for IS research (3 and 4).

From data sharing to data spaces

Over the years, research has increasingly acknowledged that data sharing is and has been a pivotal activity in many industries, such as e-commerce (e.g., Ghoshal et al. 2020) or healthcare (e.g., Li & Qin 2017). In the following, we narratively review (Schryen et al. 2020) the field of industrial data sharing between at least two organizations. We start by exploring supply chain data sharing and outline how this notion developed into what we today call data ecosystems.

Supply chain data sharing (Footnote 4)

Data sharing is not a novel activity; it has been discussed in the literature for over 40 years. Early papers explored data sharing within organizations in response to the introduction of computers and databases (e.g., Brathwaite 1983). Data is essential for organizations to conduct core business functions, such as record-keeping or documenting business transactions (Davenport & Prusak 1998). Early research also identified the value of information partnerships that produce shared benefits, such as cost sharing and distributing excess capacity through customer data sharing (Konsynski & McFarlan 1990), and the benefits of introducing Electronic Data Interchange (EDI) for sharing electronic business documents between organizations (Hansen & Hill 1989; Mukhopadhyay et al. 1995). These ISs offered many benefits for supply chains, at least for those organizations that could afford the implementation and integration costs of EDI systems (Stefansson 2002). Another example is information sharing in Vendor Managed Inventory (VMI), in which suppliers manage the inventory at the customer’s site and require information to handle replenishment (Lee et al. 2000).

Consequently, inter-organizational data sharing is necessary to perform fundamental coordinating activities in all individual parts of supply chains (e.g., resource collection, assembly) (Cachon & Fisher 2000; Luo et al. 2013; Stefansson 2002; Wang et al. 2021). It affects upstream and downstream business processes of supply chain participants, such as production, and is a known strategy to mitigate the bullwhip effect, which occurs when the variability of orders increases as they move upstream, so that even small variations in consumer demand cause large fluctuations in a supplier’s production (Lee et al. 1997; Wang & Disney 2016). Coping with the bullwhip effect without data sharing is barely possible because suppliers and OEMs react with a time lag, and sharing information can help respond to long-lead-time bullwhips (Bray & Mendelson 2012). The underlying hypothesis is that when customers provide more complete data, suppliers can improve their forecasting and accordingly have more room to address bullwhips (Moyaux et al. 2007). For instance, Aviv (2007 p. 790) found that if retailers share information upstream, “it makes sense that the manufacturer will end up with a gain due to his improved ability to anticipate demand.” However, the data and information shared are typically restricted to very specific data in bilateral relationships (Adner 2017), namely the data necessary to achieve a specific purpose (Legenvre & Hameri 2024; Wixom et al. 2020). This narrow scope most likely stems from prevailing fears of misconduct or breaches of confidentiality by data users (Kuo et al. 2014). Cachon & Fisher (2000 p. 1033) already compared the value of “traditional information sharing”, restricted to orders only, with “full information sharing”, which offers a range of benefits (e.g., improving order quantity decisions or allocating batches based on inventory positions).
Refusing to share data, and the resulting information asymmetry, can lead to lost sales and supply chain inefficiencies (Wang et al. 2021), but using shared data requires “trust in the veracity of the reported information” (Cachon & Lariviere 2001 p. 629). One party in the supply chain could exploit another party’s lack of information and adapt its buying and selling strategies accordingly (Cachon & Lariviere 2001; Makadok 2010).
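The mechanism behind the bullwhip effect and the value of sharing demand data can be illustrated with a small simulation. The Python sketch below is a stylized, textbook-style model under our own assumptions, not a model taken from the cited studies: i.i.d. normal end-customer demand, two stages (retailer and wholesaler), and an order-up-to policy with a moving-average forecast (lead time 2, window 4). Without sharing, the wholesaler forecasts from the retailer's orders. With sharing, it forecasts from end-customer demand. The function names `replenish` and `bullwhip_ratios` are hypothetical.

```python
import random
import statistics


def replenish(incoming, forecast_basis, lead_time=2, window=4):
    """Orders one order-up-to stage places upstream (stylized model).

    The stage keeps its order-up-to level at lead_time times a moving-average
    forecast of `forecast_basis`. Each period it reorders what it shipped
    (`incoming`) plus the change in that order-up-to level.
    """
    c = lead_time / window
    return [
        incoming[t] + c * (forecast_basis[t] - forecast_basis[t - window])
        for t in range(window, len(incoming))
    ]


def bullwhip_ratios(n=50_000, seed=7, window=4):
    """Order variance at each stage divided by end-customer demand variance."""
    rng = random.Random(seed)
    demand = [rng.gauss(100, 10) for _ in range(n)]      # end-customer demand
    retailer = replenish(demand, demand, window=window)
    aligned_demand = demand[window:]                      # same periods as retailer orders
    # Without sharing, the wholesaler only sees the retailer's orders.
    wholesaler_blind = replenish(retailer, retailer, window=window)
    # With sharing, it forecasts from end-customer demand instead.
    wholesaler_shared = replenish(retailer, aligned_demand, window=window)
    base = statistics.pvariance(demand)
    return {
        "retailer": statistics.pvariance(retailer) / base,
        "wholesaler_no_sharing": statistics.pvariance(wholesaler_blind) / base,
        "wholesaler_sharing": statistics.pvariance(wholesaler_shared) / base,
    }


print(bullwhip_ratios())
```

For this stylized setting, the ratios can also be derived analytically: about 2.5 at the retailer, about 7.4 at the wholesaler without sharing, and 5.0 with sharing. The simulated values should therefore land close to these figures. Variability still grows upstream in both cases, but sharing end-customer demand noticeably dampens the amplification, which is consistent with the argument above.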

Industrial data ecosystems

While the IS discipline has researched digital transformation and information management for decades, data ecosystems are a novel socio-technical manifestation of digitally transformed systems of organizations that offers a wealth of research opportunities (e.g., Curry et al. 2022; Heinz et al. 2022; Hevner & March 2003; Legner et al. 2017; Oliveira et al. 2019). Data ecosystems advocate an alternative view of inter-organizational data sharing, moving away from sequential and bilateral data sharing. Instead, data ecosystems center on dynamic data sharing based on common value drivers (e.g., customer value or compliance) rather than on merely executing fundamental business functions (Jacobides et al. 2018; Legenvre et al. 2022).

In particular, the seminal article by Moore (1993) contributed significantly to popularizing business ecosystems. The term originates from the biological ecosystem, a systemic demarcation of a natural environment consisting of organisms and the physical and environmental factors with which they interact (Tansley 1935). In contrast to other forms of inter-organizational cooperation, ecosystems are inherently more dynamic: they revolve around a shared purpose, such as customer innovation from data, retain the involved parties through a continuous balance of value received and effort given, and allow organizations to enter and exit freely (Otto 2022b). Nowadays, the “ecosystem” concept is mainly associated with digital platforms. It reorganizes the sequential supply chain logic into a more open, dynamic, and shared understanding of inter-organizational collaboration that requires aligning actors beyond already established bilateral relationships (Adner 2017; Legenvre et al. 2022). Instead of clear boundaries confining value-creation activities, these ecosystems are populated by more-or-less autonomous actors in multilateral relationships who act as complementors and generate value and network effects (Hein et al. 2020). The synthesis of inter-organizational data sharing and (platform) ecosystems spurred the concept of data ecosystems, which are “socio-technical complex networks in which actors interact and collaborate with each other to find, archive, publish, consume, or reuse data as well as to foster innovation, create value, and support new business” (Oliveira et al. 2019 p. 589). In short, data ecosystems are about “creating, managing and sustaining data sharing initiatives” (Oliveira & Lóscio 2018 p. 1). Table 1 juxtaposes dominant data-sharing practices within supply chains and data ecosystems based on the literature’s narrative description and our experience.

Table 1 Contrasting data sharing in supply chains and industrial data ecosystems (see also Legenvre et al. (2022) and Adner (2017) for a juxtaposition of supply chains and ecosystems)

What characterizes the novelty of data ecosystems is the dedicated focus on data as a transaction object for value creation and capture, together with the accompanying need to consider its peculiarities (e.g., Prieëlle et al. 2020). Data (as well as the products and services derived from it) can be shared and reproduced indefinitely, which sharply distinguishes it from finite resources (Shapiro et al. 1998; Veit et al. 2014). At the core of data ecosystems are the capabilities of each party to contribute to data sharing by receiving data (data users), sending it (data providers), or facilitating the process (data intermediaries) (e.g., Oliveira et al. 2019). The value resulting from these relationships is manifold. For instance, companies can use external data sources to innovate existing services and business models (e.g., Beverungen et al. 2022; Lim et al. 2018; Vesselkov et al. 2019), use others’ data for internal optimization, or use existing data as new business assets (e.g., Cappiello et al. 2020). Overall, the evolving nature of data ecosystems is shaped by the inherent dynamics of ecosystems and the intrinsic characteristics of data itself. Beyond being reproducible ad infinitum (e.g., Veit et al. 2014), data is inherently portable, which makes it quickly accessible and shareable, independently of geography, through standardized interfaces (APIs) or open data sets (e.g., Gregory et al. 2022). Different algorithms and combinations of data can then produce a variety of information or applications from the same data (e.g., Yoo et al. 2010).

Data sharing infrastructure

Industrial data sharing requires technical infrastructure. Since data ecosystems can evolve in many variants, we discuss only some of the more prominent ones (e.g., see Ditfurth & Lienemann 2022). Broadly, we categorize data-sharing infrastructure as based either on data intermediaries (digital platforms engaged in facilitating data sharing) or on IOISs. With regard to inter-organizational data sharing in supply chains, the literature has predominantly discussed IOISs, which connect the individual systems of organizations (e.g., Enterprise Resource Planning (ERP) systems) for deeper supply chain integration and automated data sharing (Holland 1995; Johnston & Vitale 1988). These shared systems reduce manual data exchange, contribute to productivity gains and flexibility (Cash & Konsynski 1985), and replace “traditional” means of inter-organizational communication such as the telephone or fax (Suomi 1992). They are typically embedded in bilateral data-sharing scenarios or in multilateral networks that revolve around a focal entity (e.g., a supplier) (Kumar & van Dissel 1996).

Scaling inter-organizational data sharing beyond bilateral IOISs faces a range of challenges. First and foremost, sharing data between organizations (as opposed to consumers) requires considering, at scale, the potential harms (e.g., disclosed business secrets) that could arise, even unintentionally (Zrenner et al. 2019). Retaining control over one’s data is commonly referred to as data sovereignty, which can be technically implemented through formalized usage control policies (Otto & Jarke 2019). Data spaces are technical data-sharing infrastructures that share characteristics with both IOISs and data intermediaries. On the one hand, they aim to open up a shared space in which organizations can find trusted data sources, and thus to provide open and fertile soil for inter-organizational optimization and business innovation. This resembles their position as data intermediaries (“a mediator between those who wish to make their data available, and those who seek to leverage that data”, Janssen & Singh 2022 p. 2), i.e., two-sided markets for inter-organizational data sharing (Ditfurth & Lienemann 2022). On the other hand, a data space itself is decentralized and does not store data centrally on one platform; it uses so-called data space connectors to facilitate data sharing (as an IOIS) between two parties. In this sense, data spaces are IOISs, as they “enable the movement of information across organizational boundaries” (Johnston & Vitale 1988 p. 153), but they do so by accommodating organizational barriers and spanning boundaries with technically implemented data sovereignty.

Taken together, data spaces have a dual nature: they act both as data intermediaries (multilateral, ecosystem view) and as IOISs (bilateral, data-sharing view), using connectors to ensure technical data sovereignty and onboarding mechanisms to generate a trusted pool of data ecosystem actors (Braud et al. 2021). Figure 1 illustrates our conceptual understanding of data spaces as data intermediaries (the big picture) and IOISs (a zoomed-in transaction).

Fig. 1

The dual nature of data spaces as data intermediaries and IOISs

In the following, we summarize and discuss options to operationalize industrial data sharing in data ecosystems based on existing literature (e.g., Oliveira et al. 2019; van den Broek & van Veenstra 2015) and our observations in practice (see Table 2). We will then discuss data spaces in detail and align them with data ecosystems.

  • First (1), data and the information extracted from it are essential for supply chain partners to communicate and work together (Ahmed & Omar 2019). They are the basis of IOISs that automatically share information between supply chain participants (Holland 1995). Typically, these data-sharing relationships are sequential, in that participants in the supply chain share data to ensure compliance or to improve processes (van den Broek & van Veenstra 2015). However, there is also a growing tendency to share data across supply chain partners and supply chains. In the circular economy, for example, data is shared between different material suppliers, battery manufacturers, and recyclers to decide how a battery should be reused or recycled.

  • Second (2), data ecosystems emerge around data intermediaries (Ditfurth & Lienemann 2022), in which data providers offer data that can be searched and accessed by data users based on various decision criteria (e.g., the data type or the price) (Jussen et al. 2023a). Some types of data intermediaries are usually open to anyone (e.g., data marketplaces), while others are restricted only to data users that fulfill pre-defined governance policies (e.g., data trusts) (Ditfurth & Lienemann 2022). In governmental data sharing, data collaboratives revolve around one or a few private or public organizations that provide data to others to foster innovation with the distinct goal of contributing to a societal good and public governance (Klievink et al. 2018; Susha et al. 2022).

  • Third (3), a data ecosystem emerges around data spaces. Data spaces share some characteristics with data intermediaries: like data intermediaries, they need to orchestrate and bring together data providers and data users and, like digital platforms, must exploit network effects (Otto 2022b). Unlike most data intermediaries, however, data spaces are open only to participants who possess technical access (i.e., a data space connector) and share data between data providers and data users. Before data is shared, data providers and data users must find each other’s offers and demands by providing metadata to a data catalog (which can be operated by a data intermediary acting as a data space participant). The actual data is kept decentralized with the data provider and is shared only once negotiations are successful. In this regard, data spaces share characteristics with both data intermediaries (as digital platforms) and IOISs, integrating two decentralized systems through data space connectors (e.g., Zrenner et al. 2019).
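The data space flow described in the third point can be sketched in a few lines of Python. The flow has four steps: providers publish metadata to a catalog, users search the catalog, the parties negotiate, and the provider delivers the data directly. This is a deliberately simplified illustration under our own assumptions. The classes `Offer`, `Catalog`, and `Provider`, the purpose-based "negotiation", and the battery example are hypothetical and do not reproduce any real connector API or the protocols used in practice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Offer:
    """Metadata published to the catalog; deliberately contains no payload."""
    provider: str
    asset_id: str
    description: str
    allowed_purposes: set[str]


class Catalog:
    """Shared metadata catalog (could be operated by an intermediary participant)."""

    def __init__(self) -> None:
        self.offers: list[Offer] = []

    def register(self, offer: Offer) -> None:
        self.offers.append(offer)

    def search(self, keyword: str) -> list[Offer]:
        return [o for o in self.offers if keyword in o.description]


@dataclass
class Provider:
    name: str
    data: dict[str, bytes] = field(default_factory=dict)    # stays at the provider
    offers: dict[str, Offer] = field(default_factory=dict)

    def publish(self, catalog: Catalog, asset_id: str, payload: bytes,
                description: str, purposes: set[str]) -> None:
        self.data[asset_id] = payload
        offer = Offer(self.name, asset_id, description, set(purposes))
        self.offers[asset_id] = offer
        catalog.register(offer)  # only metadata leaves the provider

    def negotiate(self, asset_id: str, consumer: str, purpose: str) -> Optional[dict]:
        # Toy negotiation: agree only if the requested purpose is permitted.
        if purpose in self.offers[asset_id].allowed_purposes:
            return {"asset_id": asset_id, "consumer": consumer, "purpose": purpose}
        return None

    def transfer(self, agreement: dict) -> bytes:
        # Direct provider-to-consumer delivery against a concluded agreement;
        # a real connector would also verify the agreement and enforce policies.
        return self.data[agreement["asset_id"]]


catalog = Catalog()
supplier = Provider("battery-supplier")
supplier.publish(catalog, "cell-42", b"<battery data>", "battery cell composition", {"recycling"})
offer = catalog.search("battery")[0]   # the user discovers metadata only
agreement = supplier.negotiate(offer.asset_id, "recycler", "recycling")
payload = supplier.transfer(agreement) if agreement is not None else None
```

The design choice worth noting is that the catalog never holds the payload. It plays the intermediary-like, multilateral role, while the actual exchange remains a bilateral, IOIS-like transfer between provider and user.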

Table 2 Views on data sharing in IOISs, data intermediaries, and data spaces

Data space-enabled data ecosystems

Data spaces in data ecosystems

Having characterized data spaces above as drivers of data ecosystems, we now define the relevant constructs for data ecosystems enabled by data spaces. Data sharing is the process of giving others access to data that they would not otherwise have access to (Jussen et al. 2023a; Jussen et al. 2024). Figure 2 conceptualizes three layers of data space-enabled data ecosystems, and we define data spaces as follows:

Fig. 2

The Catena-X and Mobility Data Space data spaces as illustrative case scenarios contextualized within data ecosystems (for a detailed description of the illustrated use cases used for the scenarios, see https://catena-x.net/en/benefits-pros/sustainability and https://mobility-dataspace.eu/use-cases last accessed: 14.07.2024)

Data spaces are decentralized data infrastructures designed to enable data-sharing scenarios across organizational boundaries by implementing mechanisms for secure and trustworthy data sharing, such as distributed data storage and the sharing of metadata. They guarantee data sovereignty by ensuring that the data provider controls access to and use of the shared data.

Data spaces allow the formation of flexible organizational forms that grant a delimited set of members access to a secure and trusted space to share data, which can be embedded in a larger data ecosystem. Data ecosystem parties can share data without explicitly using data space technology; they may contribute data but are not part of the demarcated set of members who share data in a data space under the same set of data sovereignty mechanisms (e.g., defined data usage policies). The conceptual boundary may be formed around a technology (e.g., AI), a domain (e.g., automotive), or other factors (e.g., using a specific architecture or a shared purpose). We differentiate between organizations directly (technically) engaged in data sharing through a data space (data space members) and those that are part of the data ecosystem (data ecosystem parties) but do not directly engage in the data space themselves (see Fig. 2). For example, the Catena-X Automotive Network envisions a data space integrating data from all parties along a supply chain (Catena-X 2024b). Members access the data space through dedicated data space connectors. This software component acts as an interface between the internal systems of data space members and the data space itself (Pettenpohl et al. 2022). It can also be extended with additional functions; the International Data Spaces (IDS) Connector, for instance, can interpret and technically enforce data usage policies (Otto et al. 2022b; Zrenner et al. 2019). The Catena-X Automotive Network uses the Eclipse Dataspace Connector Component (Spiekermann 2022) (Footnote 5). The data ecosystem corresponding to Catena-X’s data space could be the automotive data ecosystem, which potentially comprises more than one data space. In the Catena-X case, use cases such as Manufacturing-as-a-Service (Catena-X 2023b) or Circular Economy (Catena-X 2023a) each involve a sub-group of the data ecosystem’s parties acting in the data space.
Each use case requires different data and parties that act in the data space and are part of the data ecosystem.

From a technical point of view, the term data space describes a specific data infrastructure concept, which can be characterized by four properties (Franklin et al. 2005; Halevy et al. 2006; Otto 2022a, 2022b; Otto & Burmann 2021):

  1. Distributed: Data spaces are distributed by design, which means that they do not require physical data integration but leave the data at the data source and make it accessible only when it is needed.

  2. No common schema: Data spaces do not require a common database schema to which data from different sources must adhere. Rather, data integration occurs at the semantic level, e.g., through common vocabularies.

  3. Data redundancy: The distributed architecture of data spaces allows for redundancy of data, i.e., multiple data objects can coexist in a data space describing the same real-world object.

  4. Nested and overlapping: Data spaces can be overlapping and nested so that data providers and data users can be members in multiple data spaces, and data can be shared between data spaces.
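The four properties can be illustrated with a small Python sketch. It makes several simplifying assumptions that are ours alone. Two providers keep records in their own local schemas, and hypothetical mappings translate each record into a shared vocabulary at query time, standing in for semantic integration through common vocabularies. The query is evaluated against the sources when it is issued (distributed), and both descriptions of the same vehicle are returned side by side (redundancy). A membership table shows one participant belonging to two data spaces (overlap). All identifiers, field names, values, and data space names are illustrative.

```python
# Hypothetical local records: each provider keeps its own schema (no common schema).
SOURCES = {
    "oem": [{"vin": "VIN-001", "co2_g_km": 142}],
    "fleet_operator": [{"vehicle_id": "VIN-001", "co2_mg_km": 139_000}],
}

# Hypothetical mappings from each local schema to a shared vocabulary.
MAPPINGS = {
    "oem": lambda r: {"vehicleId": r["vin"], "co2PerKm_g": r["co2_g_km"]},
    "fleet_operator": lambda r: {"vehicleId": r["vehicle_id"],
                                 "co2PerKm_g": r["co2_mg_km"] // 1000},
}

# Hypothetical memberships: one participant belongs to two data spaces.
MEMBERSHIPS = {
    "automotive-ds": {"oem", "tier1-supplier"},
    "mobility-ds": {"oem", "fleet_operator"},
}


def query(vehicle_id: str) -> list[dict]:
    """Evaluate a query against the sources at request time.

    Nothing is copied into a central store beforehand, each record is mapped
    to the shared vocabulary on the fly, and all matches are returned, so
    redundant descriptions of the same vehicle coexist.
    """
    results = []
    for source, records in SOURCES.items():
        for record in records:
            mapped = MAPPINGS[source](record)
            if mapped["vehicleId"] == vehicle_id:
                results.append({"source": source, **mapped})
    return results


def shared_members(space_a: str, space_b: str) -> set[str]:
    """Participants that are members of both (overlapping) data spaces."""
    return MEMBERSHIPS[space_a] & MEMBERSHIPS[space_b]


print(query("VIN-001"))
print(shared_members("automotive-ds", "mobility-ds"))
```

In this toy example, the two sources report slightly different values (142 and 139 g/km) for the same vehicle. Because data spaces tolerate redundancy, the sketch returns both records with their source rather than silently picking one.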

Based on this understanding, we define data ecosystems based on data spaces as follows (see also Fig. 2):

Data ecosystems are socio-technical systems that emerge around one or multiple (federated) data spaces. They represent the sum of collaborative data-sharing activities built on the secure and trustworthy data-sharing paradigm of data spaces to realize shared goals (e.g., innovation, compliance, optimization) for their members.

To summarize this understanding of data spaces and their data ecosystems, we establish four principles from practice and research to guide our understanding of how they work in data ecosystems:

  1. There is more than one data ecosystem. They may be differentiated by referencing a technology (e.g., AI), domain (e.g., automotive), or other conceptual boundaries delineating one data ecosystem from another. Different data ecosystems can intersect and generate an overlapping data ecosystem through data spaces being part of more than one data ecosystem. Each data ecosystem is operationalized through at least one data space (see (2) in Fig. 3).

  2. A data ecosystem can span more than one data space. For example, the Gaia-X data ecosystem consists of various data spaces, such as Agri-Gaia or the Mobility Data Space (e.g., Otto 2022a). The Gaia-X-based health data ecosystem conceptualizes all relevant health stakeholders acting in multiple data spaces that should be connected (Gaia-X 2021). Data spaces can connect through technical integration (e.g., APIs) and be part of more than one data ecosystem (e.g., Otto & Burmann 2021). However, if data spaces exist for the same domain in different countries, they would arguably not be connected initially but would operate in parallel (see (1) in Fig. 3). The four options indicate development steps (see trajectories in Fig. 3). The fourth option, i.e., overlapping data ecosystems with connected data spaces, appears to be the goal, as otherwise further, “larger” data silos (i.e., isolated data spaces) would emerge (see (2) in Fig. 3).

  3. Each data space can enable more than one data-sharing use case between its members or a sub-group of its members. For example, Catena-X (2022a) currently proposes ten use cases, each of which naturally concerns only a smaller sub-group of all Catena-X (2024a) members. Each data space provides the technical infrastructure (e.g., standards, interoperability) to facilitate the data-sharing use cases.

  4. Governance policies are possible across all levels of abstraction, i.e., rules and guidelines can be specified on the ecosystem, data space, and use case level (e.g., Otto 2022b). These policies can be legal requirements, rules set by the data space, i.e., Governance Frameworks or Rulebooks (e.g., see Data Spaces Support Center 2023), or rules and terms negotiated bilaterally between data-sharing participants.
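Principle 4 implies that a single data-sharing transaction must satisfy rules from several levels at once. The Python sketch below illustrates this layering under simplifying assumptions that are entirely ours. The four levels are ordered from most general to most specific. The concrete rules, the transaction fields, and the function `first_violation` are hypothetical, and real governance frameworks are expressed in legal text, rulebooks, and machine-readable policies rather than Python lambdas.

```python
from typing import Callable, Optional

Rule = Callable[[dict], bool]

# Hypothetical rule sets, ordered from the most general level to the most specific.
LEVELS: dict[str, list[Rule]] = {
    "ecosystem (legal requirements)": [
        lambda tx: tx["legal_basis"] or not tx["personal_data"],
    ],
    "data space (rulebook)": [
        lambda tx: tx["provider_onboarded"] and tx["consumer_onboarded"],
    ],
    "use case": [
        lambda tx: tx["purpose"] in {"carbon-footprint", "circular-economy"},
    ],
    "bilateral agreement": [
        lambda tx: tx["price_eur"] >= tx["agreed_min_price_eur"],
    ],
}


def first_violation(tx: dict, levels: dict[str, list[Rule]] = LEVELS) -> Optional[str]:
    """Return the first level whose rules reject the transaction, or None if all levels permit it."""
    for level, rules in levels.items():
        if not all(rule(tx) for rule in rules):
            return level
    return None


tx = {
    "personal_data": False, "legal_basis": False,
    "provider_onboarded": True, "consumer_onboarded": True,
    "purpose": "carbon-footprint", "price_eur": 120, "agreed_min_price_eur": 100,
}
print(first_violation(tx))                              # None
print(first_violation({**tx, "purpose": "marketing"}))  # use case
```

Combining the levels conjunctively reflects the idea that bilateral terms can narrow, but not override, what the data space rulebook and the legal framework already permit.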

Fig. 3

Scenarios of various data ecosystems and connected data spaces

Data spaces (initiatives) in practice

The complexity of data space initiatives offers research opportunities for IS from multiple angles. Broadly, we classify these initiatives into three categories that are not exhaustive but serve as a starting point (see Table 3). First (1), live data spaces are technical infrastructures that actively facilitate data sharing. Second (2), architecture and standardization initiatives support data ecosystem formation by guiding the architecture design and standardization of data spaces (e.g., connector technologies or vocabularies). Third (3), support organizations help organizations implement their data spaces by creating a central point of contact that pools, accumulates, and distributes design knowledge about data spaces.

Table 3 Overview of selected data ecosystem initiatives (see also Otto et al. (2022a) or Anjomshoaa et al. (2022) for more examples of data ecosystems and data spaces)

Examples of live data spaces

The Catena-X Automotive Network e.V. was founded in 2021 with the vision of creating a data ecosystem in the automotive industry based on European values (Catena-X 2024b). The initiative was launched by a consortium of 17 founding members who applied for a research and development project funded by the German government. At the same time, the non-profit association Catena-X was founded, which is open to all interested partners and currently has over 130 members (Catena-X 2024a). Within the Catena-X research project, initial use cases were defined to demonstrate the added value of data sharing between stakeholders in the automotive industry. All use cases have in common that data must be shared across company borders, i.e., no single company can realize a use case alone.

The German Federal Ministry for Digital and Transport established the Mobility Data Space (MDS) initiative in 2019 with a funding volume of 18 million euros (Delhaes 2021). More than 200 stakeholders in the German mobility landscape, from science, business, and public administration, were involved in its conception (Mobility Data Space 2023). The stated goal of this data ecosystem is to ensure a trustworthy, sovereign, and decentralized exchange of data in the mobility sector. The MDS builds technically on the architecture of the IDS initiative to ensure data sovereignty, interoperability, and secure data exchange and to enable innovative data-based mobility solutions. Current use cases include traffic optimization, predictive maintenance of transport infrastructure, AI-supported mobility optimization, and the further development of autonomous driving. In 2021, a neutral non-profit company supporting the data space was founded to develop the MDS further and orchestrate it technically and commercially (Mobility Data Space 2023).

Examples of architecture and standardization initiatives

The International Data Spaces (IDS) initiative fosters interoperability and data sovereignty in data sharing (Otto & Jarke 2019). The initiative started in 2015 with a Fraunhofer-Gesellschaft project funded by the German Federal Ministry of Education and Research (BMBF), followed in early 2016 by the founding of the non-profit International Data Spaces Association (IDSA), which now includes over 130 members from more than 20 countries (International Data Spaces Association 2022). In the IDS Reference Architecture Model (IDS-RAM), the IDSA specifies a technology-agnostic software architecture for data spaces.

The Gaia-X initiative was launched in 2019 by the German and French governments and institutionalized through a non-profit organization based in Brussels. Gaia-X extends the scope of the IDS initiative to the (cloud) infrastructure level (Otto et al. 2021). The Gaia-X initiative, which has over 350 members, develops a specification for an overall Gaia-X architecture that aims to help create data ecosystems for data sharing in a trusted European environment (Gaia-X 2022).

Examples of support organizations

The Data Spaces Support Centre (DSSC) is funded by the European Commission with 14 million euros to help implement the requirements set out in the European Strategy for Data (EU Commission 2020). The DSSC strives to facilitate common data spaces that collectively create an interoperable data-sharing environment within and across sectors, grounded in European values (European Commission 2021).

Research opportunities for IS research

Data ecosystems and data spaces are complex socio-technical systems with many facets and research opportunities (Burmeister et al. 2021; Oliveira et al. 2019). Data spaces are predominantly practice-driven IT artifacts. Through funded projects and calls for papers in our field, they have gained increasing attention in IS research. The European research landscape and legislative efforts reinforce this attention by building data spaces in almost all industries, e.g., public administration, mobility, media, or tourism (European Commission 2024a). We propose a set of research questions inspired by a Socio-Technical Systems (STS) view of data ecosystems and data spaces (Bostrom and Heinen 1977), which we derive from our extensive experience working in the field of data spaces. While there are likely more, we propose three research perspectives that, in our experience, are valuable to explore.

First (1), from a dual socio-technical view, organizations and people act in data ecosystems, build complex relationships, and develop motivations for engaging in data sharing. In this view, data spaces are IT artifacts designed to span organizational boundaries, as they aim to mitigate barriers produced by a lack of trust and to foster data sovereignty. Second (2), from a social view, data spaces need to produce generative mechanisms that ensure longevity and business success by establishing transactions between data providers and data users as a foundation for network effects. In particular, research should account for the unique role of data as a digital transaction object in these data space-enabled data ecosystems, as data is shared among different parties in different scenarios. Third (3), data spaces are technical infrastructures whose components must be configured. The growing interest in data spaces gives IS research the opportunity to accompany these developments and to explore data spaces as new drivers of digital transformation as they evolve. Table 4 summarizes these three research perspectives, which we discuss in detail below.

Table 4 Research questions for data space-enabled data ecosystems

Data spaces for organizational boundary spanning

Plainly spoken, data spaces are IT artifacts that should help organizations transcend organizational boundaries and share data while upholding data sovereignty. This means that data spaces take on the role of boundary spanners (e.g., Aldrich & Herker 1977), working to mitigate the factors that prevent inter-organizational data sharing. Boundaries between organizations take many forms. One boundary in operationalizing data-sharing use cases is heterogeneity in data standards, interoperability, and rules. Data standardization is necessary because it is the precursor for the interoperable use of shared data and is “aimed at inducing conformity of practice and behaviour” (Hawkins & Blind 2017 p. 3). In established industries, this is a complex, challenging task because of “varying incentives and capabilities of existing industry players” (Dinçkol et al. 2023 p. 2). Data spaces contribute to resolving the data standardization tension in data sharing, which is characterized by diverging positions on whether, and on which, standards companies can agree (Jussen et al. 2023b). For instance, Cofinity-X (2024), the operating company of the large-scale Catena-X Automotive Network data space, outlines a path for the automotive industry away from a state in which data sharing is hindered by the absence of unifying standards, a path facilitated by the data space connector technology and shared Catena-X standards (Catena-X 2024c). Another example is SCSN, which provides a messaging standard that defines what information and data can be shared in which format and maintains this standard for all participants of the data space (Smart Connected Supplier Network 2024). However, mitigating the tensions and potential conflicts of finding common standards is not a simple task; it requires the inter-organizational orchestration of parties that have long-established standards.
Explicating this field of tension and outlining pathways to resolve it is an excellent opportunity for IS research. For instance, this field could be investigated from different perspectives, such as actor-network theory, which could characterize the motivations of individual actors to uphold or adapt their standards and how these motivations weigh against the benefits of participating in a data space and its data ecosystem.
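To illustrate why shared standards are the precursor for the interoperable use of shared data, the following Python sketch lets two participants keep their internal field names while mapping them once to a common message format maintained by the data space. The schema, field names, and participants are hypothetical and do not reproduce the actual SCSN or Catena-X standards; real standards also cover semantics, units, and versioning, which this sketch omits.

```python
from datetime import date

# Hypothetical shared message standard maintained by the data space
# (illustrative only; not the actual SCSN or Catena-X specification).
SHARED_ORDER_STANDARD: dict[str, type] = {
    "order_id": str,
    "part_number": str,
    "quantity": int,
    "delivery_date": date,
}

# Each (hypothetical) participant maps its internal field names to the standard once.
MAPPINGS: dict[str, dict[str, str]] = {
    "supplier_a": {"OrderNo": "order_id", "PartNo": "part_number",
                   "Qty": "quantity", "DueDate": "delivery_date"},
    "supplier_b": {"auftrag": "order_id", "teil": "part_number",
                   "menge": "quantity", "termin": "delivery_date"},
}


def validate(message: dict) -> None:
    """Reject messages that do not conform to the shared standard."""
    missing = SHARED_ORDER_STANDARD.keys() - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected_type in SHARED_ORDER_STANDARD.items():
        if not isinstance(message[name], expected_type):
            raise TypeError(f"{name} must be of type {expected_type.__name__}")


def to_standard(participant: str, record: dict) -> dict:
    """Translate a participant's internal record into a standard-conformant message."""
    mapping = MAPPINGS[participant]
    # Fields without a mapping stay internal and are not shared.
    message = {mapping[key]: value for key, value in record.items() if key in mapping}
    validate(message)
    return message


if __name__ == "__main__":
    a = to_standard("supplier_a", {"OrderNo": "4711", "PartNo": "X-1", "Qty": 20,
                                   "DueDate": date(2024, 5, 1), "InternalCost": 3.5})
    b = to_standard("supplier_b", {"auftrag": "0815", "teil": "X-1", "menge": 5,
                                   "termin": date(2024, 6, 1)})
    print(sorted(a) == sorted(b))  # True: one consumer can process both messages
```

The design choice worth noting is that the standard is agreed on once per data space, so each participant maintains a single mapping instead of one per trading partner; the hard part the text describes, agreeing on the schema itself, is exactly what this sketch takes as given.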

Another boundary between organizations in data sharing emerges from missing trust, ignorance about short-term and long-term benefits, or data misappropriation (e.g., Opriel et al. 2021). Against this stand the potential benefits and incentives that organizations expect from continuing digitization and the growing data economy. In this conflict, data spaces can assist organizations in overcoming boundaries as an artifact that integrates organizational activities (Schotter et al. 2017). This requires exploring the capabilities of data spaces to generate a trusted data ecosystem and balancing the concerns of data providers against them. In personal data sharing, the Privacy-Calculus Theory (Laufer & Wolfe 1977) helps weigh the potential benefits of using technology against the risks associated with sharing data. We expect that something similar is needed for inter-organizational data sharing, which would require an effort to theorize the balance between the existing concerns of data providers and the capabilities of each data space to mitigate them. We find this theorizing effort especially relevant since only successful adoption across organizations can lead to the long-term dissemination and utilization of data spaces in practice.

From our experience working with large data space projects, we know that establishing interoperability and interconnectivity between data spaces is a current issue. Consequently, a prevailing research issue is how to prevent new “larger” data silos from emerging at the data space level by researching ways to establish federations of data spaces and data ecosystems. Naturally, this becomes even more complicated when these data spaces intersect different domains, technologies, or (inter-)national boundaries with their specific requirements and restrictions. Given that a variety of data spaces (and all their potential instantiations) are currently being built and initiatives are being founded, a central task is to harmonize (design) knowledge about data spaces and make it available to others.

Data spaces for generativity

Viewed as data intermediaries and digital platforms, data spaces produce generative mechanisms as digital infrastructure (Yoo et al. 2010). For instance, they allow different organizations to dynamically recombine data to realize specific digital solutions for specific use cases. However, the value mechanisms occurring in data spaces and their surrounding data ecosystems are, as of now, not well understood. Instead, there are only some indications, such as existing use cases (see the examples above) and commitments from organizations and politics. One way to make this value and generativity more transparent is to draw on established and proven modeling approaches, such as the e3-value modeling language, to explicitly identify and outline the value objects, actors, and their relationships, both bilaterally and multilaterally (Gordijn & Akkermans 2003). With the rising number of data spaces and research about them, more and more data will become available to model generative value patterns. This implies a stronger focus on theorizing business models for data spaces and on collecting empirical data as data spaces emerge and operate in practice.
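A heavily simplified sketch may help to show what such a value model records. The Python code below captures value transfers between actors and flags actor pairs in which value flows in only one direction, loosely inspired by the idea of economic reciprocity in e3-value. It is not the e3-value notation or tooling (value ports, value interfaces, and quantification are omitted), and the actors and value objects in the example are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueTransfer:
    source: str        # actor that provides the value object
    target: str        # actor that receives it
    value_object: str  # e.g., data, a fee, or a service


def value_profile(transfers: list[ValueTransfer]) -> dict[str, dict[str, set[str]]]:
    """Summarize what each actor gives and receives across all transfers."""
    profile: dict[str, dict[str, set[str]]] = defaultdict(
        lambda: {"gives": set(), "receives": set()})
    for t in transfers:
        profile[t.source]["gives"].add(t.value_object)
        profile[t.target]["receives"].add(t.value_object)
    return dict(profile)


def one_directional_pairs(transfers: list[ValueTransfer]) -> list[tuple[str, str]]:
    """Return (giver, receiver) pairs for which no value flows back bilaterally."""
    flows = {(t.source, t.target) for t in transfers}
    return sorted((s, r) for s, r in flows if (r, s) not in flows)


if __name__ == "__main__":
    model = [
        ValueTransfer("data provider", "data user", "production data"),
        ValueTransfer("data user", "data provider", "usage fee"),
        ValueTransfer("operator", "data provider", "onboarding service"),
        ValueTransfer("data provider", "operator", "membership fee"),
        ValueTransfer("operator", "data user", "onboarding service"),
    ]
    print(one_directional_pairs(model))  # [('operator', 'data user')]
    print(sorted(value_profile(model)["operator"]["gives"]))  # ['onboarding service']
```

In this toy model, the operator's onboarding service to the data user has no bilateral counter-flow. A modeler would then either add a missing transfer (e.g., a fee) or check whether the value returns multilaterally, for instance indirectly via the data provider; the bilateral check alone cannot distinguish these cases.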

In their role as data intermediaries, data spaces should offer ample ground for theorizing on novel data network effects (similar to Gregory et al. 2022) that do not accumulate through collecting more and more data about customers but through granting potential data users access to a sphere of potential data and, subsequently, potential for innovation and optimization. It would therefore be an interesting avenue to explore whether data spaces and their network effects, arguably trusted network effects, have specifics that differ from traditional marketplaces or digital platform infrastructures. This could fill the gap of generating such structures in industrial domains, which are traditionally more restrictive when it comes to sharing their data. Accompanying these processes on an individual, organizational, data space, and potentially data ecosystem level could yield promising and valuable research on data-sharing adoption in traditional industries (e.g., Legenvre & Hameri 2024). In that regard, it would be interesting to see how data-sharing initiatives in traditional industries differ from value creation in not-yet-connected industries. For the former, data sharing can act as an enabler and support efficiency. For the latter, data sharing is novel and not possible without data spaces.

Data spaces for organizational infrastructure

From a configurational perspective, data spaces have a variety of components that can and must be shaped according to the organization’s capabilities and application scenarios. For instance, the data space connector is not one unified entity but a design artifact with a range of potential functions. In the data space connector report, Giussani & Steinbuß (2023) list existing data space connectors and show how they differ (e.g., closed source versus open source). SovityFootnote 6 provides data space connectors under a freemium business model: one version is free and open source, and the other is maintained, updated, and implemented by the company. Since the data space itself has the potential to generate data ecosystems, it can host data intermediaries and other complementary roles as active data space participants. Navigating this field of potential (e.g., implementing data trusts or data marketplaces) requires a configurational analysis of data spaces as infrastructure that outlines the range of options for their implementation. This means that choices about how to implement and configure data spaces have consequences at the technological and individual organizational levels as well as for the emerging ecosystem as a whole.

Conclusion

In this fundamentals article, we offer an introduction to and overview of data spaces and data space-enabled data ecosystems. We were motivated by their strong presence in practice and by legislative pressure, both of which increasingly push data spaces to the center of the European digital transformation. Data spaces are one way to foster trusted inter-organizational data sharing and thus have the opportunity to be a motor for the European data economy. At their heart is the implementation of data sovereignty, which ensures that data rights holders can retain control over their data. In our article, we discussed data spaces from two perspectives. First, from a bilateral view, data spaces work as IOIS that connect two independent organizations and subsequently integrate their data-sharing activities. Second, from a multilateral data intermediary point of view, participating in data spaces grants users access to an ecosystem of potential co-actors that engage in a trusted environment and are possible candidates for data sharing.

We provided a contextualization of data spaces and data ecosystems. In particular, we conceptualize that data ecosystems can emerge around data spaces. We distinguish between those participants who are actively engaged in the data space through technical data space connectors and those on the periphery who might contribute data to data space participants but are not technically integrated themselves. In this, we conceptualize how data ecosystems relate to data spaces, i.e., how many data spaces can occupy a data ecosystem or whether they can overlap. We also formulate research questions based on our experience and the arguments we discussed in this article. While not comprehensive or conclusive, we find these research perspectives to be intriguing and promising and hope they spur the interest of the IS research community.

Overall, we believe that this fundamentals article provides an important introduction and overview that opens up interesting avenues for future research and, hopefully, spurs researchers to communicate their theoretical and empirical findings on data spaces and their ecosystems. By its nature, a fundamentals article is restricted to discussing the issues we consider most important. Consequently, it does not provide a deep theorization of data spaces and only accounts for a fraction of potential research questions. While we embed our research agenda in a socio-technical view, we are fully aware that there are more facets and potential research questions to explore regarding data spaces and their data ecosystems. Additionally, this fundamentals article concentrates on data spaces while acknowledging that the possibilities of data infrastructures are diverse and manifold. As data spaces mature and are used in practice, a reasonable route is to explore how they impact data-sharing practice as opposed to, or in combination with, other (de-)centralized data-sharing infrastructures.