1 Introduction

Big data represents the leading edge of innovation, competition, and productivity [1]. A multitude of advanced analytical algorithms and applications harness big data to pioneer novel theories and technologies, such as artificial intelligence and edge computing. Amid the big data surge, the processes of data sharing and exchanging occur ubiquitously and continuously. Such sharing and exchanging take place between specific entities, be they individuals, devices, or databases. These entities relay information amongst themselves, with the mechanisms of transmission spanning electronic methods or specialized systems [2]. Notably, while data exchanging entails a bi-directional transfer, data sharing is a unidirectional process. In recent decades, the paradigm of hosting, sharing, and exchanging data in the cloud has emerged as the predominant design choice. This has led to the rise of third-party platforms as the preferred means for participants in data sharing and exchanging. For instance, Amazon introduced the “Amazon Web Services (AWS) Data Exchange”, a platform that allows customers to tap into third-party data sources within the AWS marketplace. This service ensures reliable access for customers on an unprecedented scale and doubles as a streamlined tool for data ingestion and utilization [3].

Data sharing and exchanging offer a plethora of benefits, including fee-less transactions, tamper resistance, enhanced services, high transparency, and real-time engagement for all involved parties. A pertinent example is Google Drive’s collaboration with WhatsApp [4], allowing users to back up their chat histories and media to the cloud, ensuring data portability and recovery without transaction fees. Nevertheless, this paradigm faces a multitude of challenges:

  • A predominant challenge in data sharing and exchanging concerns the willingness of ordinary individuals to engage and share their data resources. Concomitant issues of privacy, security, and costs (e.g., energy consumption, network bandwidth) might deter participants, especially if the rewards are not deemed sufficient. Thus, crafting mechanisms to incentivize participation becomes a pressing priority.

  • There exists an inherent trade-off between data privacy and accessibility. For cloud-based data-sharing and exchanging platforms, striking a balance between security and efficiency becomes pivotal during mechanism design.

  • As the digital market for data sharing and exchanging evolves, devising an equitable data pricing strategy emerges as a new challenge. The quest for an efficient digital market necessitates mechanisms that price data transparently while safeguarding data privacy.

Given these challenges, designing incentive-based mechanisms stands out as a pivotal research area within the realm of data sharing and exchanging.

In recent years, the design of incentive mechanisms has become increasingly prevalent in crowdsensing applications within the realm of computing. One illustrative case is Waze, a crowdsourcing-based navigation application. The platform introduced a mechanism termed “Awazeing Race” to motivate both existing and prospective users to engage with the Major Traffic Event (MTE) tool in the Waze Map Editor (WME). This initiative was aimed at enhancing the volume of user-contributed MTEs and closures, thereby enriching the overall Waze experience for local users. A review of pertinent literature reveals that incentive mechanisms in computing can be broadly classified into three categories: entertainment, service, and monetary incentives [5]. Entertainment-centric incentives predominantly employ location-based mobile games to spur participation [6,7,8]. Service-oriented incentives, on the other hand, leverage the promise of enhanced service benefits as a motivational strategy. For instance, in GPS applications, users not only consume data but also contribute to its generation, driven by the aspiration for superior service quality [9, 10].

Monetary incentives have emerged as a prevalent strategy to motivate mobile sensors to participate. Within this domain, price determination and the criteria for winner selection have piqued the interest of numerous researchers. For instance, ride-sharing apps like Uber use dynamic pricing algorithms that incentivize drivers (mobile sensors) by increasing fares during peak demand times, effectively balancing supply and demand [11]. Nevertheless, the intricacies of designing incentive mechanisms escalate when applied to the data-sharing and exchanging process. Our analysis reveals that, relative to the aforementioned categories, monetary incentives garner more extensive attention in computing research. As delineated in Table 2, a significant fraction of researchers have gravitated towards leveraging game theory algorithms in computing to pursue objectives such as utility maximization [12,13,14], profit maximization [15,16,17], and social welfare maximization [13, 18,19,20,21]. Concurrently, there are studies employing economic incentives to attain analogous goals. Notwithstanding this proliferation, it is noteworthy that only a limited number of researchers have delved into the holistic design of incentive mechanisms spanning the entire data-sharing and exchanging platform. A real-world example is the use of cashback rewards by credit card companies to encourage consumers to share transaction data, which is then utilized for personalized marketing and data analysis [22].

We decompose the data sharing and exchanging process into four principal components: data creation, data storage, data access, and data privacy preservation. Contrary to the classifications of earlier researchers, we posit that it is redundant to segregate incentive mechanisms into entertainment-based, service-based, and money-based categories. Instead, an amalgamation of both monetary and non-monetary incentives is imperative to galvanize holistic participation in the data-sharing and exchanging ecosystem. For instance, on such a platform, integrating service-based with monetary incentives can be an efficacious strategy. This would entail providing participants with both service credits and direct monetary rewards. Notably, even though providers in the data-sharing and exchanging paradigm might concurrently serve as requesters, the allure of service credits remains undiminished, proving invaluable when they seek access to future data resources. Take Microsoft Azure, for instance, which provides credits to users who contribute to its machine learning datasets, encouraging a reciprocal data-sharing ecosystem [23].

In the ensuing sections of this survey, we commence by presenting a preliminary definition of data sharing, data exchanging, and the underlying incentive mechanisms. Subsequently, we delve into a thorough review and discourse on the associated incentive mechanisms and optimization algorithms that underpin the life cycles of data sharing and exchanging. Ultimately, we shed light on the prevailing challenges and opportunities encompassing data creation, storage, access, and privacy preservation in the context of data exchange and sharing.

Our primary contributions to this domain can be distilled as follows:

  • We put forth a nuanced taxonomy of the incentive-driven processes in data sharing and exchanging, predicated on its lifecycle. Concurrently, we encapsulate the challenges inherent to each phase.

  • Our discourse extends to a meticulous examination of incentive mechanisms pivotal to data sharing and exchanging. Although we bifurcate these mechanisms into monetary and non-monetary classifications, our stance diverges from preceding researchers; we advocate for a synergistic integration of both categories to stimulate greater participation in data sharing and exchanging.

  • For the first time, we systematically deconstruct the lifecycle of data sharing and exchanging into its quartet of elements: data creation, data storage, data access, and data privacy preservation. Each segment is underpinned by a comprehensive exploration to serve as a point of reference. Additionally, we provide an exhaustive analysis of the nuances of privacy preservation spanning the entire lifecycle.

  • We highlight five emergent research trajectories in the ambit of incentive mechanisms for data sharing and exchanging: computational efficiency, trustworthiness, data privacy and security, data management systems, and data quality. Within each avenue, we discern current lacunae and prospective directions. A notable proposition is the conceptualization of a system rooted in blockchain technology for data sharing and exchange, synergized with diverse incentive mechanisms. The integration of such mechanisms with deep learning algorithms, we posit, will pave the way for the next generation of incentive-centric data-sharing and exchanging frameworks.

The remainder of this paper is organized in a systematic fashion. Section 2 delineates the preliminary definitions central to our discussion. In Sect. 3, we present a review of the pertinent existing literature. Section 4 offers a comprehensive analysis of the lifecycle associated with data sharing and exchanging. Challenges inherent to the field are highlighted in Sect. 5, while potential research avenues are explored in Sect. 6. Finally, Sect. 7 elucidates the research opportunities present within various incentive-based data-sharing and exchanging applications.

2 Preliminary definition

2.1 Data sharing

Data sharing occurs among n distinct entities, which can be represented by the list \(\mathbb {B}=(\beta _1, \beta _2, \beta _3, \ldots , \beta _n)\). This sharing can take various forms: it can transpire between individuals with extensive databases, between individuals and public organizations, or between public and private entities. We can denote a collection of datasets as \(\mathbb {D}=(d_1, d_2, d_3, \ldots , d_m)\). In the context of data sharing, an entity \(\beta _i\) can access the dataset \(d_j\) provided it obtains the requisite authority from another entity. Database access grants have traditionally been utilized as a mechanism for data sharing. To access such a granted database, user accounts must belong to one or more user groups, upon which authorization to the databases is conferred. For instance, the SQL GRANT statement is a widely recognized mechanism employed across various database systems, such as SQL Server [24].
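
A minimal sketch of the access-grant model just described, assuming the entities \(\beta _i\) and datasets \(d_j\) defined above; the registry class and its method names are illustrative rather than any specific database API:

```python
# Toy registry of access grants: entity beta_i may read dataset d_j only
# after another entity has conferred that authority, as described above.
class SharingRegistry:
    def __init__(self):
        self.grants = set()                    # set of (entity, dataset) pairs

    def grant(self, entity, dataset):
        self.grants.add((entity, dataset))     # confer authority on the entity

    def can_access(self, entity, dataset):
        return (entity, dataset) in self.grants

reg = SharingRegistry()
reg.grant('beta_1', 'd_2')
print(reg.can_access('beta_1', 'd_2'))   # True: authority was granted
print(reg.can_access('beta_3', 'd_2'))   # False: no grant exists
```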

Cloud-based data sharing has become a ubiquitous method in the contemporary era of data dissemination. Cloud storage and computing serve as pivotal elements in the domains of data sharing and exchange. These cloud infrastructures utilize standard protocols to provide access to a myriad of configurable resources, encompassing applications, storage solutions, networks, servers, and various services. The concept of using the cloud for data sharing can trace its origins to an internal document of Compaq in 1996. This idea matured over the subsequent decade, culminating in the advent of cloud computing. In 2006, Amazon took a significant step in this direction by launching the Elastic Compute Cloud (EC2) to bolster its Amazon Web Services. Following suit, Google introduced the Google App Engine in 2008. In the 2010s, IBM's unveiling of IBM SmartCloud ushered in sophisticated Smarter Computing frameworks. Within these advanced data sharing and computing architectures, cloud-based data sharing is an integral component [25].

Utilizing cloud-based platforms for data sharing offers a multitude of advantages, notably reducing costs and infrastructural management overheads. Users benefit from the “pay-as-you-go” model, incurring costs only for data processing and storage, be it in a public or private cloud. Furthermore, cloud services are renowned for their scalability, adeptly adjusting to varying demands, expediting development tasks, and delivering efficient computing solutions [26, 27]. Data sharing via the cloud empowers entities to seamlessly access data remotely [28]. Numerous applications leverage cloud data-sharing capabilities, enhancing quality of life and productivity. For instance, Google Docs [29] furnishes a collaborative environment for users to disseminate diverse data types like documents and images. Similarly, DocuSign [30] facilitates the sharing of documents requiring signatures. However, with the exponential proliferation of IoT devices and the advent of 5G technology, the demands on data sharing have intensified. Critical questions arise, such as the cloud’s capacity to process and store voluminous data from myriad IoT devices, and whether latency issues can be effectively managed for time-sensitive applications like autonomous vehicles. Addressing these concerns, recent years have seen the emergence of cutting-edge edge computing paradigms, including fog computing (FC) [31], mobile edge computing (MEC) [32], and mobile cloud computing (MCC) [33]. Shifting data sharing to the edge has proven increasingly efficient and prevalent. These avant-garde distributed computing frameworks share a core principle: rather than relying on centralized cloud resources, they harness computational power closer to the end-users, typically through smart or edge devices. Such a configuration optimizes data sharing by processing data at the edge, markedly reducing transmission times. A practical example of MEC is the deployment of edge servers by telecom operators to provide low-latency gaming experiences on mobile devices. MCC’s real-world application can be seen in services like iCloud, which seamlessly integrate edge devices with cloud storage to optimize data accessibility and processing [34].

2.2 Data exchanging

Data exchange, while falling under the umbrella of data sharing, is distinct in its bidirectional nature. In this paradigm, entities engaged in the data-sharing ecosystem reciprocally exchange their resources to fulfill their respective data requirements.

Table 1 Data exchanging highlights

Similar to data sharing, data exchanging takes place among n entities and is denoted by the list \(\mathbb {B}=(\beta _1, \beta _2, \beta _3, \ldots , \beta _n)\). These exchanges can involve various entities, from individuals with vast databases to interactions between individuals and public organizations, and between public and private organizations. The datasets involved are represented as \(\mathbb {D}=(d_1, d_2, d_3, \ldots , d_m)\). As in data sharing, each entity \(\beta _i\) can access the dataset \(d_j\) if it obtains the authority from another entity. However, data exchanging additionally involves a defined set of goals or targets, represented as \(\mathbb {T}=(t_1, t_2, t_3, \ldots , t_i)\). To realize a data exchanging target \(t_i\), an appropriate incentive schema should be in place to spur the target to completion. Data sharing historically took place in centralized databases and the cloud; however, the trend is increasingly shifting towards decentralization. Table 1 elucidates the significant milestones in the evolution of languages used for data exchange.

Data exchange enables the transfer of data between various systems and organizations while maintaining its integrity and meaning, ensuring that no modifications or alterations are made to the content [35]. This process often involves incentives to foster participation. The data requesters can compensate the data owners through various means, including monetary rewards or alternative data resources. Several algorithms, drawing from fields such as economics and game theory, have been developed to determine the optimal compensation or reward in the data-exchanging scenario. A practical example of this mechanism in action is the platform Airbnb [36]. Airbnb, a peer-to-peer service for people to list, discover, and book accommodations around the world, embodies the essence of data exchange. Property owners list their homes for travelers to rent, essentially sharing their data (property details, availability, price, etc.) with potential guests. In return, they receive monetary compensation when travelers book their spaces. Simultaneously, Airbnb uses optimization and ranking algorithms to gauge the success of each property listing. Properties that fare well, receive positive reviews, or fit specific criteria are then prioritized and given more visibility in the platform’s search results, benefiting the homeowners further. This system of rewards, both in visibility and monetary compensation, exemplifies the principles of data exchange.

Fig. 1 Lifecycle of data sharing and exchanging

Participants in data exchange might be hesitant to share their data if they believe the data they receive in return falls short of their expectations. A significant incident occurred in 2021 with the Microsoft Exchange Server data breach, in which attackers gained access to user emails and passwords, eroding trust in secure data exchange [37]. In such a climate, participants may conclude that sharing their data is not worthwhile without adequate incentives. Thus, establishing a fair data valuation and incentivizing participants to engage in the exchange becomes paramount. Within the sphere of data pricing, numerous factors require optimization. Challenges such as determining the right price point and allocating value based on data quality are pivotal. Consequently, the development of robust and precise pricing algorithms is integral to the success of data exchange.

2.3 Data sharing and exchanging life-cycle

The process of data sharing and exchanging hinges on four primary components: data creation, data storage, data access, and data privacy preservation. The genesis of this process, data creation or collection, rests on pivotal decisions regarding the nature, method, and volume of data collection. Central questions include: What kind of data should be collated? What are the optimal methods for its collection? How extensive should the data pool be? Once created, the data’s preservation requires both secure and efficient storage solutions. Data access, the subsequent phase, revolves around granting permissions to various stakeholders involved in the data sharing and exchange process. Meanwhile, data privacy preservation is not merely an isolated component but an omnipresent factor throughout the data lifecycle, ensuring the integrity and confidentiality of shared data. The entire lifecycle of data sharing and exchange can be visualized in Fig. 1.

The distinction between data exchanging and data sharing lies in the transactional nature of the former. Data exchange embodies a two-way transfer mechanism, characterized by a reciprocal trading process. Hence, in the context of data exchange, it becomes paramount to motivate multiple entities to actively engage in the exchange while ensuring that the process remains robustly secure. These considerations underscore the key research themes in this realm.

2.4 Incentive mechanisms

Incentive mechanisms have traditionally played a pivotal role in the realm of human resources management, acting as catalysts to drive employee motivation, performance, and overall achievement [38]. A notable example can be seen in Google’s work environment, which has garnered a reputation for being exceptionally gratifying. The tech giant has ingeniously embedded incentive-driven strategies into its human resource management framework. Through an intricate system of incentives, ranging from peer bonuses to performance-based rewards, Google has fostered an organizational climate ripe with trust. This ecosystem not only promotes collaboration and teamwork but also empowers employees within similar departments to synergize their efforts and aid one another [39]. As the digital landscape evolved, especially with the proliferation of the Internet of Things (IoT) and the ubiquity of big data, these incentive mechanisms have found their application extended to the domain of data science.

The surge in mobile device usage has catalyzed the development of a myriad of mobile crowdsensing applications. These tools harness the power of collective intelligence, leveraging mobile users to share data for various sensing tasks in a crowdsourced fashion. Challenge.gov stands as a testament to this trend: a digital platform where the public collaboratively addresses pressing issues faced by federal agencies. Through this platform, innovative solutions are crowd-sourced, facilitating more informed and effective governmental decisions [40]. Existing research classifies incentive mechanisms within crowdsensing applications into three primary categories: entertainment-based, service-based, and money-based [5]:

  • Entertainment-based mechanisms: These are designed to pique user interest by integrating elements of fun and engagement. Specifically, they encourage participation through location-based mobile games. Such gamified mechanisms have been explored and discussed in various studies, highlighting their effectiveness in promoting user engagement in crowdsensing tasks [6,7,8].

  • Service-based mechanisms: Such incentives offer tangible service benefits to users in return for their participation, capitalizing on the mutual relationship where both the provider and the participant stand to gain. A prime example can be observed in GPS applications. Here, users, while benefiting from the service, also act as data providers. The underlying principle is that a collective effort from all users ensures a more refined and accurate service [9, 10].

  • Money-based mechanisms: Monetary rewards remain a tried-and-true incentive. Within the realm of crowdsensing, the intricacies lie in determining the appropriate pricing strategy and selecting winners. These components are pivotal and have piqued the interest of many researchers aiming to optimize and refine monetary incentive systems.

In summary, as the digital landscape grows more interconnected, the potential of mobile crowdsensing applications continues to expand. Harnessing this potential effectively necessitates the design and implementation of robust incentive mechanisms that cater to a diverse user base.

In recent times, the burgeoning field of blockchain technology has emerged as a pivotal solution for safeguarding data privacy. Numerous scholars have delved into the realm of incentive mechanisms within the blockchain environment. Broadly, these mechanisms can be categorized into two predominant types: those rooted in game theory and external incentives [41].

Consensus algorithms, intrinsic to blockchain operations, necessitate incentives to galvanize miners to compute the hash functions that underpin the creation of new blocks of transactions. The overarching objective of achieving consensus within blockchain networks is to ensure a unanimous agreement among all participating nodes. This process empowers even untrusted nodes, enabling the network to select an individual node or a cluster of nodes responsible for appending new transactions. Various incentive strategies have been formulated in the blockchain context, including, but not limited to, Proof of Work (PoW), Proof of Stake (PoS), and Zero-Knowledge Proof.
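
As a concrete illustration of the hash puzzles that PoW incentives compensate, the following minimal sketch searches for a nonce whose SHA-256 digest meets a difficulty target; the block data and difficulty level are illustrative:

```python
# Minimal proof-of-work loop: find a nonce such that SHA-256(data + nonce)
# starts with `difficulty` zero hex digits. The block reward is the incentive
# that compensates miners for this brute-force search.
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):   # puzzle solved
            return nonce
        nonce += 1

print(mine("tx-batch-42"))   # nonce that solves the illustrative puzzle
```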

Beyond blockchain, federated learning has emerged as another privacy-preserving paradigm, in which participants collaboratively train a shared model without exposing their raw data, and it likewise depends on well-designed incentives. During the COVID-19 pandemic, federated learning was instrumental at UCSF in developing AI models to predict the need for supplemental oxygen in patients, leveraging data across 20 hospitals without compromising patient privacy [42].

The intricate process underlying federated learning is illustrated in Fig. 2.

Fig. 2 Federated learning process

Consequently, introducing incentives in federated learning becomes imperative to counteract potential challenges posed by selfish nodes and participants of suboptimal quality.

3 Existing data sharing and exchanging incentive mechanisms

This section categorizes the prevailing incentives in data sharing and exchanging into two distinct types: monetary and non-monetary incentives. As depicted in Fig. 4, these incentives structure the landscape of existing research. Notably, from our analysis, a synergistic approach combining both monetary and non-monetary incentives could be more effective in motivating participants to actively engage in the data-sharing and exchanging processes.

3.1 Monetary incentives

In the realm of computing, the emphasis predominantly falls on monetary incentives, as evidenced by a majority of the research in this domain. As illustrated in Table 2, we have encapsulated 24 notable studies from the computing sector. These studies provide insights into the performance metrics, types of mechanisms employed, applications, and optimization objectives of each respective paper. A discernible trend from our analysis indicates that game theory remains the quintessential algorithmic approach for designing incentive mechanisms. A majority of these works pivot around objectives of utility maximization and social cost minimization during the formulation of their optimization strategies.

3.1.1 Game theory-based incentives

Numerous game theory algorithms have prominently featured in the incentive mechanisms for data sharing and exchanging. According to a survey by Liang et al. [43], the dominant algorithms in this realm include the Stackelberg game, non-cooperative game, bargaining game, and the Vickrey-Clarke-Groves (VCG) game.

Table 2 Related works for monetary incentives

In the Stackelberg game, the decision-making process is divided into two periods: the leader commits to its quantity first, and the follower responds. Detailed formulations of the game can be found in Machado’s work [60]. Each node n within the network selects its respective quantity, denoted as \(\mathcal {Q}_n\), with an associated production cost of \(\varsigma _n \mathcal {Q}_n\). For a scenario involving one leader and one follower in the Stackelberg game, the demand curve is defined as:

$$\begin{aligned} P(\mathcal {Q}_1+\mathcal {Q}_2)=a-b(\mathcal {Q}_1+\mathcal {Q}_2) \end{aligned}$$
(1)

The total profit of node n can be denoted as \(\Pi _n(\mathcal {Q}_1+\mathcal {Q}_2)\) and calculated by:

$$\begin{aligned} \Pi _n(\mathcal {Q}_1+\mathcal {Q}_2)=P(\mathcal {Q}_1+\mathcal {Q}_2)\mathcal {Q}_n-\varsigma _n \mathcal {Q}_n \end{aligned}$$
(2)

In the second period, the follower’s maximum profit or revenue can be defined as:

$$\begin{aligned} \max _{\mathcal {Q}_2}\Pi ^2=(P(\mathcal {Q}_1+\mathcal {Q}_2)-\varsigma )\mathcal {Q}_2=(a-b(\mathcal {Q}_1+\mathcal {Q}_2)-\varsigma )\mathcal {Q}_2 \end{aligned}$$
(3)

In the initial period, the leader anticipates the follower’s best response \(R_2(\mathcal {Q}_1)\), obtained from Eq. (3), so its maximum profit or revenue can be represented as:

$$\begin{aligned} \max _{\mathcal {Q}_1}\Pi ^1=(P(\mathcal {Q}_1+\mathcal {Q}_2)-\varsigma )\mathcal {Q}_1=(a-b(\mathcal {Q}_1+R_2(\mathcal {Q}_1))-\varsigma )\mathcal {Q}_1 \end{aligned}$$
(4)
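
To make the backward induction of Eqs. (1)-(4) concrete, the following sketch solves the two-period game symbolically with sympy, assuming the linear demand above and a common unit cost c standing in for \(\varsigma\); the setup is a generic illustration rather than any cited paper’s model:

```python
# Backward induction for the one-leader, one-follower Stackelberg game.
import sympy as sp

a, b, c, Q1, Q2 = sp.symbols('a b c Q1 Q2', positive=True)

price = a - b * (Q1 + Q2)                   # demand curve, Eq. (1)
profit2 = (price - c) * Q2                  # follower's profit, Eq. (3)

# Second period: the follower best-responds to the leader's quantity.
R2 = sp.solve(sp.diff(profit2, Q2), Q2)[0]  # R2(Q1) = (a - c - b*Q1) / (2*b)

# First period: the leader maximizes while anticipating R2(Q1), Eq. (4).
profit1 = (a - b * (Q1 + R2) - c) * Q1
Q1_star = sp.solve(sp.diff(profit1, Q1), Q1)[0]
Q2_star = sp.simplify(R2.subs(Q1, Q1_star))

print(Q1_star, Q2_star)   # (a - c)/(2*b) and (a - c)/(4*b): the leader
                          # commits first and produces twice the follower's output
```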

The increasing demand for more efficient and advanced data-sharing and exchanging mechanisms has led researchers to explore various game theory models. The Stackelberg game, in particular, has been at the forefront of such explorations due to its effectiveness in handling hierarchical decision-making processes. A closer look at recent literature sheds light on its widespread application across multiple domains: Li et al. [13] ventured into the domain of WiFi-based indoor localization systems. Recognizing the challenges of constructing a radio map via conventional site surveys, they turned to crowdsourcing as a remedy. Mobile users were incentivized to contribute their indoor trajectories. Employing a two-stage Stackelberg game, the authors ensured the dual goals of maximizing mobile users’ utility while ensuring profitability for the crowdsourcing platform. Xiong et al. [16] innovatively bridged the spheres of mobile blockchain and edge computing. Addressing the resource-intensive nature of solving proof-of-work puzzles, they introduced a model leveraging edge computing for mobile blockchain. Their approach used a two-stage Stackelberg game to optimize the allocation of edge computing resources. Sarikaya et al. [44] tapped into the promising realm of federated learning. With a Stackelberg game at its core, their system discerned a Nash Equilibrium to adeptly balance the interplay between worker diversity and training latency. In the vehicle edge computing scenario, Zeng et al. [61] married the Stackelberg game with a novel reputation incentive mechanism. Their empirical findings affirmed that this fusion not only augmented profits for edge servers but also slashed average delays by an impressive 76% against conventional mobile edge computing setups. Zhou et al. [62] envisioned an optimization framework for cloud-edge computing networks by leaning on the Stackelberg game. Their gradient-based Iterative Search Algorithm unearthed the optimal solution to utility maximization. Benchmarking against existing algorithms, their model showcased remarkable efficiency. Lastly, Li et al. [63] unfolded a contract-Stackelberg blueprint tailored for vehicular fog-edge computing. Central to their design was the Stackelberg game, which orchestrated pricing strategies to synchronize the utilities of all stakeholders involved. In essence, the Stackelberg game’s versatility in addressing hierarchical decision-making challenges has rendered it an invaluable asset for researchers. Its capacity to harmonize competing objectives while ensuring stakeholder satisfaction underscores its potential in shaping the future of data sharing and exchanging mechanisms.

The Vickrey-Clarke-Groves (VCG) mechanism stands out in the realm of mechanism design for its truth-inducing properties, fostering participants to reveal their genuine valuations. This mechanism ensures an outcome that optimizes social welfare. In VCG, each winner’s payment \(\rho _i\) is the difference between the total cost incurred by the others when verifier i does not participate and the total cost incurred by the others when verifier i joins, where \(\zeta _j\) denotes verifier j’s cost and \(W^*_{-i}\) and \(W^*_{i}\) denote the optimal winner sets without and with verifier i, respectively. It can be defined as:

$$\begin{aligned} \rho _i=\sum _{{\nu _j}\ne {\nu _i}}\zeta _j(W^*_{-i})-\sum _{{\nu _j}\ne {\nu _i}}\zeta _j(W^*_{i}) \end{aligned}$$
(5)
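
The following toy sketch instantiates the payment rule of Eq. (5) for a reverse auction in which an auctioneer must select k verifiers at minimum total cost; the bidders, their costs, and the brute-force winner selection are all illustrative:

```python
# VCG payments for cost-minimizing winner selection, Eq. (5): each winner is
# paid the externality it imposes on the other bidders.
from itertools import combinations

def min_cost_allocation(costs, k, exclude=None):
    """Cheapest set of k verifiers, optionally excluding one bidder."""
    pool = [v for v in costs if v != exclude]
    return min(combinations(pool, k), key=lambda s: sum(costs[v] for v in s))

def vcg_payments(costs, k):
    winners = min_cost_allocation(costs, k)
    payments = {}
    for i in winners:
        others_with_i = sum(costs[j] for j in winners if j != i)
        alt = min_cost_allocation(costs, k, exclude=i)
        others_without_i = sum(costs[j] for j in alt)
        payments[i] = others_without_i - others_with_i   # Eq. (5)
    return winners, payments

costs = {'v1': 3, 'v2': 5, 'v3': 4, 'v4': 9}
print(vcg_payments(costs, k=2))   # winners ('v1', 'v3'), each paid 5
```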

Several research endeavors have adopted the VCG mechanism within the context of blockchain ecosystems. Notably, these studies predominantly center around the allocation of computational resources between miners and edge service providers. For instance, Jiao et al. [18] formulated an auction game between edge computing service providers and miners requiring computational resources. Through their proposed auction mechanism, they managed to optimize social welfare. Furthermore, their methodology ensures individual rationality, truthfulness, and computational efficiency. In another study, Gu et al. [19] leveraged the VCG auction mechanism to address issues related to storage transactions. Implementing their model on the Ethereum platform, they were able to demonstrate that their approach fosters secure, efficient, and cost-effective resource trading.

The VCG mechanism has also been embraced in a myriad of domains including edge computing, wireless networks, crowdsourcing, and crowdsensing, among others. For instance, in the sphere of mobile crowdsensing, Li et al. [20] leveraged the VCG mechanism. Their theoretical algorithms aimed to enhance the efficiency of platforms while making them more appealing for prospective participants. Similarly, Zhou et al. [21] pioneered a novel framework within the crowdsensing domain. Their methodology combined the rewarding potential of the VCG mechanism with edge computing to alleviate computational traffic and workload. Moreover, they integrated advanced deep learning algorithms, such as Convolutional Neural Networks (CNN), to sieve out spurious and irrelevant information that could be disseminated by inauthentic participants. Their empirical case study further reinforced the robustness of their proposed framework. Liu [64], venturing into the realm of ridesharing systems, harnessed the VCG mechanism to conceive a cost-sharing architecture. He meticulously devised two VCG-centric mechanisms tailored for both rudimentary and intricate scenarios. His model notably underscored the potential of minimizing societal costs. Lastly, Borjigin et al. [65] melded VCG algorithms into their innovative multiple-Walrasian auction mechanism, particularly for the valuation service of trees in the network function virtualization market. Their primary objective in utilizing the VCG mechanism was to accentuate and maximize societal effectiveness.

In non-cooperative games, players act independently, making decisions based on predictions of other players’ strategies and payoffs, with the aim of identifying a Nash Equilibrium [66]. Such games are characterized by four fundamental components: players, actions, strategies, and payoffs. Assume a set of players participating in the game, denoted by \(\mathbb {P}=\{\rho _1,\rho _2, \ldots , \rho _n\}\), and a set of strategies, denoted by \(\mathbb {S}=\{\phi _1,\phi _2, \ldots , \phi _m\}\), which represents how a player will act in every possible distinguishable circumstance. The payoffs are the utilities of the players: if the utility of player i is denoted \(\mu _i(\phi _i, \overrightarrow{\phi }_{-i})\), then the other players’ strategies are \(\overrightarrow{\phi }_{-i}=\{\phi _1, \phi _2, \ldots , \phi _{i-1}, \phi _{i+1}, \ldots , \phi _m \}\). To find the optimal utility, player i’s strategy \({\phi _i}^*\) is the best response to the strategies specified for the other \(n-1\) players. The Nash Equilibrium can be defined as follows:

$$\begin{aligned} {\phi _i}^*= \mathop {\textrm{argmax}}\limits _{\phi _i}\mu _i(\phi _i, \overrightarrow{\phi }_{-i}) \end{aligned}$$
(6)
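
A minimal sketch of locating a pure-strategy Nash Equilibrium of Eq. (6) by best-response iteration in a two-player finite game; the payoff tables below form an illustrative coordination game and are not taken from any cited work:

```python
# Each round, every player plays the argmax of Eq. (6) against the other's
# current strategy; a profile that is a mutual best response is a Nash
# Equilibrium. mu1[s1][s2] and mu2[s2][s1] are the players' utilities.
def nash_by_best_response(mu1, mu2, strategies, start=(0, 0), max_rounds=100):
    s1, s2 = start
    for _ in range(max_rounds):
        b1 = max(strategies, key=lambda s: mu1[s][s2])   # player 1's argmax
        b2 = max(strategies, key=lambda s: mu2[s][b1])   # player 2's argmax
        if (b1, b2) == (s1, s2):                         # mutual best response
            return s1, s2
        s1, s2 = b1, b2
    return None                                          # no pure equilibrium found

strategies = [0, 1]
mu1 = {0: {0: 2, 1: 0}, 1: {0: 0, 1: 1}}   # coordination game payoffs
mu2 = {0: {0: 2, 1: 0}, 1: {0: 0, 1: 1}}
print(nash_by_best_response(mu1, mu2, strategies))   # (0, 0)
```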

Zhang et al. [45] introduced a game-theoretic model tailored to enhance the outcomes of the non-cooperative equilibria observed in crowdsourcing applications. Their research identified a delicate balance between social welfare and non-cooperative equilibria. In response, they developed incentive mechanisms rooted in non-cooperative games, pinpointing an optimized solution that maximizes social welfare. Zhan et al. [14] highlighted that as the Internet of Things (IoT) continues to evolve, federated learning emerges as an adept solution to address issues related to network bandwidth, storage, and most pertinently, privacy. Yet, the federated learning landscape is devoid of robust incentive mechanisms, primarily due to the challenges posed by the reluctance to share information and the complexities of contribution evaluation. Addressing this, they introduced a two-tiered incentive mechanism, with the latter stage anchored in a non-cooperative game. This mechanism aimed to galvanize edge nodes, motivating them to more actively and efficiently participate in the training process. Hossain et al. [67] utilized a non-cooperative game approach to address the challenge of resource constraints within a vehicular edge computing setting. In their model, each vehicle autonomously devises its strategy, determining whether to offload a task to a multi-access edge computing server or a cloud server, with the objective of optimizing its benefits.

A bargaining game pertains to a scenario wherein players negotiate to decide the division of benefits derived from cooperation. An illustrative example of this is the negotiation between a seller and a buyer over the price of an automobile. There exists a set of players’ strategies denoted as \(\mathbb {S}=\{\phi _1,\phi _2, \ldots , \phi _m\}\). For any two players, where \(\phi _i\) is the seller’s strategy and \(\phi _j\) is the buyer’s, the seller determines the selling price \({\phi _i}^*\), with an expected utility denoted as \({\mu _i}^*\). Similarly, the buyer determines his or her utility \({\mu _j}^*\). If \({\mu _i}^*>{\mu _j}^*\), there is disagreement between the two players, and negotiations need to continue. When \({\mu _i}^*\le {\mu _j}^*\), the bargaining game is performed, and the price strategy \(({\phi _i}^*,{\phi _j}^* )\) is the Nash Equilibrium of this game [43].
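
For intuition, the Nash bargaining solution to such a seller-buyer negotiation maximizes the product of the two players’ gains from trade; the following sketch assumes illustrative linear utilities, with hypothetical seller and buyer valuations v_s and v_b:

```python
# Grid search for the Nash bargaining price: maximize the product of the
# seller's gain (p - v_s) and the buyer's gain (v_b - p) over feasible prices.
v_s, v_b = 40.0, 100.0    # hypothetical seller/buyer valuations
prices = [v_s + k * (v_b - v_s) / 1000 for k in range(1001)]
p_star = max(prices, key=lambda p: (p - v_s) * (v_b - p))
print(round(p_star, 1))   # 70.0 = (v_s + v_b) / 2, an even split of the surplus
```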

Recent research has delved into the application of bargaining games in various sectors: Magerkurth et al. [47] crafted a multi-stage bargaining game tailored for crowdfunding platforms. Their primary objective was to navigate the challenges of crowdfunding benefit allocation, with the ultimate goal of optimizing social welfare. In another study, Lu et al. [48] advanced an incentive mechanism that integrated a bargaining game. Recognizing the constraints of non-cooperative games, they introduced a two-sided rating protocol. Through systematic rating, they devised strategies anchored on intrinsic parameters, aiming for the pinnacle of social welfare maximization. Wang et al. [49] ingeniously melded a Nash bargaining game with deep reinforcement learning methodologies, focusing on enhancing communication in heterogeneous vehicular networks. The core of their approach lies in optimizing the network’s overall performance, striving for the zenith of total reward maximization. Kim [68], on the other hand, conceived a resource management model for pervasive edge computing infrastructure, founded on a bargaining game. He embarked on a comprehensive exploration of the allocation challenges related to computation and communication resources, offering solutions via his proposed model.

3.1.2 Demand and supply model-based incentives

The challenge of determining appropriate reward pricing in incentive mechanisms is perennial. A renowned economic model, the demand and supply model (Fig. 3), offers insight into determining the price associated with data sharing and exchanging. The model elucidates the interplay between data owners and data requesters: an equilibrium price emerges when the quantity demanded aligns with the quantity supplied, enabling efficient resource allocation. Consider the following demand and supply functions, in which P symbolizes the price corresponding to each quantity:

$$\begin{aligned} Q_d=a-bP \end{aligned}$$
(7)
$$\begin{aligned} Q_s=-c+dP \end{aligned}$$
(8)
Fig. 3 Demand and supply model
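
A small sketch that computes the market-clearing point of Eqs. (7) and (8); the parameter values are illustrative, and in an incentive mechanism the equilibrium price would correspond to the data reward:

```python
# Solve a - b*P = -c + d*P for the equilibrium price, then recover the
# equilibrium quantity from the demand curve of Eq. (7).
def equilibrium(a, b, c, d):
    p_star = (a + c) / (b + d)     # price at which demand meets supply
    q_star = a - b * p_star        # quantity traded at that price
    return p_star, q_star

print(equilibrium(a=100, b=2, c=20, d=3))   # (24.0, 52.0)
```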

In a recent study by Ma et al. [51], a time and location correlation incentive mechanism was introduced for deep data collection in crowdsourcing networks. They established a metric termed “Quality of Information Satisfaction Degree” (QoISD) to assess the adequacy of collected sensing data. By designing two demand-based incentive mechanisms, they aimed to optimize the QoISD and the associated rewards. Simulations affirmed their method’s efficacy, reducing costs and enhancing QoISD. Sun et al. [69] proposed a dynamic digital twin-based incentive mechanism for resource allocation in the aerial-assisted Internet of Vehicles. This two-stage algorithm adeptly handles fluctuating resource supply and demands, ensuring efficient resource scheduling and allocation. Meanwhile, Esfandiari et al. [70] leveraged demand-supply theory to counteract nodes’ selfish behaviors in disruption-tolerant networks, enhancing criteria such as delivery ratio, delay, dropped messages, and overhead ratio.

3.1.3 Cost model-based incentives

The cost model allows for the determination of the final price of a product by taking into account the total production cost and adding the intended profit margin. When applied to incentive mechanisms, this model provides a means to establish the appropriate reward or price. Let the desired income be represented by \(\eta\), the total cost be \(\varsigma\), and a predefined profit percentage be \(\rho\). The relationship between the cost and income can then be expressed as follows:

$$\begin{aligned} \eta =\varsigma (1+\rho ) \end{aligned}$$
(9)
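
As a one-line numeric instance of Eq. (9), with an illustrative total cost of 200 units and a 15% profit margin:

```python
# Cost-plus pricing, Eq. (9): eta = varsigma * (1 + rho).
cost, markup = 200.0, 0.15   # illustrative total cost and profit percentage
eta = cost * (1 + markup)    # desired income / reward
print(eta)                   # 230.0
```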

Cost models offer a straightforward and cost-effective approach when compared to other economic-based models. The implementation of a cost model as an incentive mechanism results in efficient computation due to its relative simplicity. However, it’s important to note that cost models have limitations as they tend to overlook elements like competition and replacement costs. They primarily consider internal factors while neglecting external ones, as highlighted in prior research [43, 71].

Cheng et al. [54] identified a challenge in the context of crowdsourcing platforms, particularly when these platforms sent location-based requests to workers. The challenge revolved around optimizing the assignment of workers to tasks. To address this challenge, they devised three effective heuristic methods: the greedy approach, g-divide and conquer, and cost model-based adaptive algorithms. Experimental results demonstrated the efficiency and effectiveness of these methods in maximizing workers’ rewards within a limited budget. Xue et al. [72] applied both public and private cost models for rational miners in a Bitcoin mining pool. They introduced a Budget Feasible Reward Optimization (BFRO) model aimed at maximizing the reward function while adhering to budget constraints. To solve the BFRO problem, they developed a budget-feasible reverse auction mechanism.

3.1.4 Competition model-based incentives

Competition-based models assist organizations in formulating their pricing strategies by taking into account the pricing strategies of their competitors. In contrast to the cost model, competition-based models consider an external factor: competition within the market. Prices or rewards are determined by assessing market information. In these models, participants establish their prices by benchmarking against similar tasks, aiming to align with a leader’s pricing decisions, which are then followed by others.
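
A toy sketch of the follower behavior described above, in which a participant benchmarks competitor prices for similar tasks and tracks the leader’s price with a small, illustrative undercut margin:

```python
# Competition-based pricing: take the cheapest observed competitor (the
# price leader) as the reference and undercut it by a fixed margin.
def follower_price(competitor_prices, undercut=0.05):
    leader = min(competitor_prices)     # leader's price sets the benchmark
    return leader * (1 - undercut)

print(follower_price([10.0, 12.5, 11.0]))   # 9.5
```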

Dong et al. [55] employed a competition-based model to establish QoE-ensured pricing in mobile networks. They combined game theory and the competition model to depict social behavior and understand the relationships among devices, service organizers, and users. Damien et al. [56] highlighted the common implementation of cooperation and competition modes in crowdsourcing platforms. They introduced a hybrid model called “coopetition,” which blends both approaches. Their experiments demonstrated that the hybrid model outperformed the two traditional ones. Ghasemi et al. [73] designed a competition-based pricing strategy for the cloud market environment. Their experimental results showcased a significant increase in profits for providers compared to other pricing policies discussed in previous literature.

Fig. 4 Related incentive mechanisms

3.2 Non-monetary incentives

Non-monetary incentives in previous research can be categorized into two main types: entertainment-based incentives and service-based incentives [5]. Entertainment-based incentives primarily utilize location-based mobile games as a means to motivate participants [6,7,8]. These incentives leverage the engagement and enjoyment derived from gaming experiences to encourage participation. On the other hand, service-based incentives focus on rewarding participants with improved services or benefits. For example, in GPS applications, users who also contribute data may receive enhanced services [9, 10]. Providing better services as rewards serves as an incentive for participants to contribute their input.

3.2.1 Entertainment-based incentives

Many researchers have directed their efforts towards the development of games as entertainment-based incentives to motivate participants to engage with various platforms. For instance, Barkhuus et al. [9] created an entertainment-based game inspired by Weiser’s concept of seams, gaps, and breaks within different forms of media. In their system, participants were required to upload specific content to a server to earn game points. Additionally, players had the opportunity to collaborate with teammates to double their earned points. Neustaedter et al. [74] identified that creating and sustaining location-based games in the real world posed significant challenges over extended periods. Consequently, they studied Geocaching, a well-established location-based game that relies on sustained, active engagement from participants. To gain insights into user participation and growth, they also introduced an online survey system as part of their efforts to address this issue. Rossitto et al. [75] conducted a qualitative study involving an interactive audio drama facilitated through a location-based application. The objective of their research was to enhance user engagement and overall experience in location-based services. They employed entertainment-based incentives and devised a game in which audience members could trigger new scenes within the audio drama by standing in predefined locations. Lammes et al. [76] introduced various location-based mobile games as alternatives to conventional maps for capturing players’ local information. Their approach combined insights from game studies with non-representational perspectives on maps as technological constructs. We observe that location-based games are the predominant incentives utilized within entertainment-based models. These incentive systems reward participants with game points to stimulate user engagement. By revealing the network coverage map of predefined areas, these incentive mechanisms have the potential to enhance system performance. Nonetheless, entertainment-based incentives face various challenges, particularly security issues related to location-based games: there is a significant risk of sensitive location information being compromised.

3.2.2 Service-based incentives

Service-based incentives are commonly employed in scenarios where service providers also act as service requesters. In the context of data exchange, data requesters often double as data providers. As a result, service-based incentives represent another significant mechanism within the data-sharing and exchanging process.

Gupta et al. [77] employed a service-based incentive mechanism in their global crowdsourcing platform, named “mClerk,” through which low-income workers could access new employment opportunities. Users in mClerk both sent and received tasks via SMS. Huang et al. [78] proposed a blockchain-based decentralized platform for Internet of Things (IoT) data exchange, in which the data requester also acts as a data provider. Owing to the characteristics of blockchain, their system eliminates the need for a third party in data exchange, making it more secure and transparent compared with centralized data exchange systems. Yi et al. [79] analyzed and modeled a novel information dissemination process with a service-oriented incentive mechanism in the Industrial Internet of Things (IIoT), depicting the dynamic evolution of IIoT devices’ interactions. Vimalajeewa et al. [80] designed a service-based joint model for distributed learning in smart agriculture. By incorporating a service-based incentive mechanism with deep neural networks and federated learning, their experiments showed that the proposed model compared well with a centralized approach and demonstrated state-of-the-art performance.

Compared with many monetary incentives, service-based incentives are simpler and lower-cost. However, some disadvantages of service-based incentives remain, such as ensuring the truthfulness of all users, maximizing users’ utility, and protecting users’ sensitive information. We will discuss further opportunities in Sect. 6.

4 Data sharing and exchanging lifecycle

The data sharing and exchanging lifecycle can be subdivided into four primary segments: data creation, data storage, data access, and data security and privacy. Within each of these segments, several crucial components exist. A visual representation of the detailed data sharing and exchanging lifecycle can be found in Fig. 5.

Fig. 5 Data sharing and exchanging lifecycle components

4.1 Data creation

Within the data creation phase, several key components come into play, including the data management system, data quality, data processing, and data transformation. The process of data creation and integration in data sharing and exchanging can be notably intricate due to these components. Data creation and integration form the bedrock of data sharing and exchanging systems, laying the groundwork by providing the necessary data for all participants to store, access, and analyze throughout the entire lifecycle. Prior research has witnessed the development of numerous monetary incentive algorithms aimed at augmenting data collection and survey responses. Wang et al. [81] introduced a two-stage auction algorithm and a truthful online reputation updating algorithm to enhance mobile crowdsourcing data collection systems. Their systems were capable of selecting optimal workers, determining winners, and calculating payments based on these algorithms.

A data management system serves as the database for storing, accessing, and manipulating the integrated data intended for sharing and exchanging purposes. Traditionally, data management systems primarily handled structured data, and cloud storage has become a prevalent approach in recent times. Numerous researchers have been actively involved in developing data management systems tailored for data-sharing purposes. Liu et al. [82] highlighted security concerns in traditional cloud data management systems and introduced a secure multi-owner data sharing management scheme named “Mona.” This scheme utilized group signature and dynamic broadcast encryption techniques to enhance security. The majority of research in data management systems has concentrated on structured data and cloud-based systems. Before data sharing and exchange can occur, it is imperative to ensure high data quality during the data creation step. Data quality plays a critical role in determining the effectiveness and efficiency of data sharing and exchange processes. In recent years, an increasing number of researchers have considered data quality as a significant parameter when designing incentive mechanisms. For example, Yang et al. [83] observed that data quality was often overlooked in the mobile crowdsensing domain. To address this issue, they integrated quality estimation and monetary incentives into their model to support data sharing. Additionally, they employed an unsupervised learning approach to quantify data quality and implemented outlier detection techniques to filter out anomalous data. Similarly, Luo et al. [84] identified limitations in using data mining techniques to control data quality. They introduced a cross-validation approach to identify a validating crowd capable of verifying the contributions made by sensor data providers. Furthermore, they employed weighted oversampling methods and privacy-aware trust algorithms to enhance the services of mobile crowdsensing systems. However, it’s worth noting that many researchers continue to rely on traditional machine learning methods for data quality filtering.
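
To illustrate the kind of outlier filtering such quality-control pipelines rely on, the following sketch discards sensor readings that deviate strongly from the median; this median-absolute-deviation rule is a generic example, not the specific method of Yang et al. [83] or Luo et al. [84]:

```python
# Robust outlier filter: flag readings whose distance from the median exceeds
# k times the median absolute deviation (MAD) of the batch.
from statistics import median

def filter_outliers(readings, k=5.0):
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    return [r for r in readings if abs(r - med) <= k * mad]

data = [21.0, 21.4, 20.9, 21.2, 85.0, 21.1]   # one spurious sensor reading
print(filter_outliers(data))                  # the 85.0 reading is dropped
```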

Data processing and data transformation are crucial steps aimed at converting raw data into meaningful and structured information. When creating or collecting data, it’s essential to establish a standardized data structure for efficient big data management. This involves tasks such as data format conversion, data cleaning, and factor extraction, among others. To facilitate data creation and integration, unified data processing and transformation formats become essential. These unified data integration frameworks can significantly reduce the time spent on data wrangling and help save costs. For instance, Ma et al. [85] introduced a novel graph-based data integration framework built upon a unified conceptual model. They applied this framework to address a real-world refueling problem and demonstrated improved precision and recall results. Given the diversity of data types in the realm of big data, some researchers have developed data integration frameworks tailored for unstructured data. Williams et al. [86] designed an image data integration platform for bioimages sourced from various channels, including high-content screening, multi-dimensional microscopy, and digital pathology. They also established a computational resource for remote access to their system, enabling users to re-analyze the data. Nevertheless, unified data integration frameworks may still face challenges, such as data security concerns and process efficiency optimization.

4.2 Data storage

Within the data storage process, several key components play critical roles: data backups, data replication, data deduplication, and cloud storage. Data backups serve as a crucial means of ensuring data protection and mitigating costs in the event of data loss. Some organizations still employ tape backup as their method of choice for safeguarding against data loss. This involves storing data on magnetic media. However, it’s important to note that tape backups can be vulnerable to corruption. Even when organizations opt for cloud storage or other backup solutions, the possibility of disasters leading to system shutdowns remains a concern. In modern data management practices, a secure approach involves the combination of full backups and partial backups. This strategy enhances data protection and resilience against data loss scenarios. A full backup corresponds to a specific moment in time, involving the capture of a comprehensive system image, which is then stored on a secondary device. In contrast, partial backups encompass differential and incremental methods. However, regardless of the traditional backup strategies implemented, a persistent risk remains: the potential for system corruption [87, 88].
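
The distinction between the partial-backup strategies just mentioned can be made precise with a small sketch: a differential backup copies everything changed since the last full backup, while an incremental backup copies everything changed since the last backup of any kind. The file table and timestamps below are illustrative:

```python
# Select files for partial backups from last-modified timestamps.
files = {'a.db': 10, 'b.log': 25, 'c.cfg': 40}   # name -> last-modified time

def differential(files, last_full):
    # Everything modified since the most recent FULL backup.
    return [f for f, t in files.items() if t > last_full]

def incremental(files, last_backup):
    # Everything modified since the most recent backup of ANY kind.
    return [f for f, t in files.items() if t > last_backup]

print(differential(files, last_full=20))     # ['b.log', 'c.cfg']
print(incremental(files, last_backup=30))    # ['c.cfg']
```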

Data replication: The key distinction between backups and replication lies in the accessibility of replicas, which are more readily available to production systems.

Data deduplication: Data deduplication is a vital data cleaning process in data storage, serving to mitigate data redundancy and optimize storage space utilization. The primary objective of data deduplication algorithms is to enhance the efficiency of databases by eliminating redundancies without compromising data accuracy or integrity. In recent research, there has been a significant focus on developing secure data deduplication mechanisms. For instance, Fan et al. [89] introduced a hybrid data deduplication mechanism tailored for cloud storage systems, addressing security concerns associated with the deduplication process. Their experimental results demonstrated the effectiveness of their approach in resolving security issues within data deduplication. Similarly, Rashid et al. [90] proposed a two-level data deduplication framework designed for cloud storage systems. The framework comprised two tiers: the enterprise level and the cloud storage provider level. At the enterprise level, data deduplication was performed, and the deduplicated data was stored in the cloud. Subsequently, at the cloud storage provider level, duplicate data was systematically removed to optimize storage space while ensuring data security and control. The authors showcased the advantages of their framework in terms of security, control, space efficiency, and reduced storage costs.
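
A minimal sketch of the block-level deduplication idea described above: identical blocks are stored once, keyed by a content hash, and later occurrences keep only a reference. The block contents are illustrative:

```python
# Content-addressed deduplication: store each unique block once and
# represent the logical stream as a list of hash references.
import hashlib

def dedup(blocks):
    store, layout = {}, []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep one physical copy per block
        layout.append(digest)             # logical layout is hash references
    return store, layout

store, layout = dedup([b"alpha", b"beta", b"alpha", b"alpha"])
print(len(store), len(layout))   # 2 unique blocks back 4 logical blocks
```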

Cloud storage and edge storage: Data storage is an essential method for preserving data, and much research attention has been devoted to developing incentive mechanisms for this purpose. Conventional data storage relies on established mechanisms for accessing multiple configurable resources. Over the past few decades, numerous researchers have dedicated their efforts to enhancing cloud storage systems through various incentive mechanisms. However, with the advent of 5G technology, the Internet of Things (IoT), and the proliferation of big data, cloud-based data storage has exhibited certain limitations. Cloud computing, in its current form, lacks some crucial functionalities required to cope with the surging volumes of big data effectively. These shortcomings include challenges related to low latency and jitter, ensuring high availability, and scalability. Consequently, several transformative changes are poised to impact our daily lives. Key questions arise, such as “Can services be delivered closer to end-users through distributed computing?” “Can your smartphone serve as your primary data repository?” “Can your vehicle monitor machine health, facilitate software updates, and identify real-time maintenance issues promptly?” “What if smart edge devices could offer deterministic latency and support time-sensitive applications while analyzing real-time and streaming data at the edge?” These questions present formidable challenges for data storage as we design incentive mechanisms for data sharing and exchange.

In response to these challenges, recent studies have focused on the development of edge data storage and processing solutions aimed at addressing the aforementioned questions. Ge et al. [91] investigated the data caching resource allocation problem in fog radio access network environments. They employed a Stackelberg game to incentivize data providers to participate in the resource allocation process and applied a simple search method to solve the optimization problem, thereby optimizing the data caching resource allocation. Alioua et al. [92] developed an incentive mechanism for edge caching in Internet-of-Vehicles systems. Their mechanism focused on the economic side of caching by considering a competitive cache-enablers market. They employed a Stackelberg game between the data provider and multiple mobile network operators and derived a Nash equilibrium to reduce the caching cost. Nonetheless, many opportunities remain to improve data storage in the data sharing and exchanging process.
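As a rough illustration of how such Stackelberg formulations are solved, the snippet below sets up a single leader (a data provider) pricing a caching resource and several followers (operators) with closed-form best responses, then finds the leader’s price by grid search. The logarithmic utilities and all parameter values are assumptions for illustration; they are not the models used in [91] or [92].

```python
import numpy as np

# Followers (e.g., mobile network operators) value cached data at v_i and
# choose a cache demand d to maximize  v*log(1+d) - p*d,
# whose closed-form best response is d*(p) = max(0, v/p - 1).
valuations = np.array([4.0, 6.0, 9.0])  # assumed follower valuations
unit_cost = 1.0                          # leader's marginal caching cost

def follower_demand(price, v):
    return np.maximum(0.0, v / price - 1.0)

def leader_profit(price):
    total_demand = follower_demand(price, valuations).sum()
    return (price - unit_cost) * total_demand

# Simple search: evaluate leader profit on a price grid, keep the best.
prices = np.linspace(1.01, 8.0, 700)
best_price = max(prices, key=leader_profit)
print(f"Stackelberg price ~ {best_price:.2f}, "
      f"follower demands = {follower_demand(best_price, valuations).round(2)}")
```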

4.3 Data access

In the context of incentive mechanisms for data sharing and exchanging, “data access” is a broad concept that encompasses the authorization to access the data. This area comprises several critical data access components, including identity and authentication, access control, encryption, and trust management.

Identity and authentication refers to the process of granting different parties access to data. In the past, authentication protocols were primarily designed for single-server environments, which are ill-suited to the new architectures of big data and the IoT. Around 2015, an increasing amount of sensitive data, such as healthcare records, began to transition into digital formats. Consequently, many researchers began developing more efficient authentication schemes to safeguard e-healthcare databases. For instance, both Wu et al. and Jiang et al. concentrated on devising three-factor authentication protocols to mitigate various types of attacks [93, 94]. Recognizing the limitations of single-server authentication schemes, some researchers began to explore inter-cloud identity management systems such as OpenID and SAML, which offer Single-Sign-On (SSO) authentication capabilities.

Access control is employed to prevent unauthorized entities from accessing devices and sharing or exchanging data. Historically, the majority of research has been concentrated on designing access control systems for the cloud. However, as edge computing architectures have evolved, there have been relatively few developments in edge access control mechanisms. Yu et al. designed an access control system by leveraging techniques from various encryption schemes, establishing efficient fine-grained data access control [95]. Additionally, they introduced a novel framework for access control within the healthcare domain in a cloud computing environment [96].

Encryption has been a popular research topic for many years. However, traditional encryption methods like the Triple Data Encryption Algorithm (TDEA, also known as Triple DES or 3DES) have their limitations. They require devices to have prior knowledge of information recipients’ identities and to share credentials, which may not be feasible in many data-sharing and exchanging scenarios where recipients are often unknown. To address these challenges, encryption methods tailored for data sharing and exchanging environments have been developed, providing solutions for scenarios where traditional algorithms fall short. For instance, Attribute-Based Encryption (ABE) is one such encryption algorithm, involving a key authority between a data sender and recipient [97]. This approach offers more flexibility and adaptability in complex data-sharing and exchanging systems.

Encryption methods have evolved to address the security needs of various computing environments, including centralized cloud servers and emerging edge paradigms. Researchers have proposed innovative encryption schemes to protect data in these diverse settings. In centralized cloud environments, Wu et al. combined hierarchical identity-based encryption (HIBE) with ciphertext-policy attribute-based encryption (CP-ABE) to create an efficient encryption scheme for sharing confidential data [98]. Li et al. extended this approach to safeguard healthcare data on cloud servers, utilizing attribute-based encryption (ABE) techniques to encrypt patients’ personal health record (PHR) files [99].

With the advent of edge computing paradigms, encryption methods have been adapted to suit these environments. Alrawais et al. introduced an efficient key exchange protocol based on CP-ABE and digital signature techniques in fog computing environments, achieving improved performance in terms of confidentiality, authentication, verifiability, and access control [100]. Jiang et al. also designed an encryption scheme based on CP-ABE for fog computing in the Internet of Things (IoT) context [101].

Additionally, within the realm of attribute-based encryption (ABE), key-policy ABE (KP-ABE) has emerged as another scheme that contributes to enhancing the security of big data [102]. These various encryption methods cater to the specific security needs of different computing environments, ensuring data protection and confidentiality.
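To illustrate the policy idea that distinguishes CP-ABE from KP-ABE, the toy sketch below evaluates a boolean attribute policy of the kind a ciphertext (in CP-ABE) or a key (in KP-ABE) would carry. It performs no cryptography whatsoever; the attribute names and policy are invented for illustration, and a real deployment would use an actual ABE library.

```python
# Policies are nested tuples: ("ATTR", name), ("AND", p1, p2, ...),
# or ("OR", p1, p2, ...). No cryptography is performed here.
def satisfies(policy, attributes):
    op = policy[0]
    if op == "ATTR":
        return policy[1] in attributes
    if op == "AND":
        return all(satisfies(p, attributes) for p in policy[1:])
    if op == "OR":
        return any(satisfies(p, attributes) for p in policy[1:])
    raise ValueError(f"unknown operator: {op}")

# CP-ABE-style: the ciphertext carries the policy, users carry attributes.
policy = ("OR",
          ("AND", ("ATTR", "cardiologist"), ("ATTR", "hospital-A")),
          ("ATTR", "auditor"))

print(satisfies(policy, {"cardiologist", "hospital-A"}))  # True: access
print(satisfies(policy, {"nurse", "hospital-A"}))         # False: denied
```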

Trust management is a critical aspect of data access control models, particularly in the context of data sharing and exchanging processes. Recent research has emphasized the development of secure and efficient incentive mechanisms within trust management systems. For instance, Fernandes et al. introduced an incentive-based trust management mechanism called “Pinocchio” [103], which focuses on ensuring the honesty of participants. The Pinocchio framework aims to reduce instances of free-riding and enhance the service quality of distributed trust management infrastructure.

Similarly, Lafuente et al. designed a cooperation incentive mechanism as part of a trust management system [104]. This mechanism contributes to the creation of a robust system architecture while encouraging user honesty. Their framework comprises three integral components: the identity manager, the trust manager, and the cooperation manager. These efforts collectively address the crucial role of trust management in data sharing and exchanging processes.
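As a concrete flavor of how a trust manager might score participants, the following is a minimal beta-reputation sketch, a common building block in the trust management literature; it is a generic illustration and not the mechanism used by Pinocchio [103] or the framework in [104].

```python
class BetaTrust:
    """Beta-reputation trust score with a uniform Beta(1, 1) prior."""

    def __init__(self):
        self.positive = 1.0  # pseudo-count of honest interactions
        self.negative = 1.0  # pseudo-count of dishonest interactions

    def record(self, honest, weight=1.0):
        if honest:
            self.positive += weight
        else:
            self.negative += weight

    @property
    def score(self):
        # Expected probability that the next interaction is honest.
        return self.positive / (self.positive + self.negative)

peer = BetaTrust()
for outcome in [True, True, False, True, True]:
    peer.record(outcome)
print(f"trust score: {peer.score:.2f}")  # 5/7 ~ 0.71
```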

4.4 Data security and privacy

Data security and privacy are paramount concerns throughout the data sharing and exchanging lifecycle. In each phase of this process, various data security and privacy challenges can arise.

Data creation One initial security concern pertains to hardware security. Unauthorized access and cloning of sensor tags are potential risks, enabling adversaries to reprogram data. Malicious attackers may gain control over user-accessible IoT devices and disseminate false information. Additionally, attackers can easily acquire confidential information, such as passwords for wireless sensors like RFID, allowing them to eavesdrop on sensitive data. The second security challenge pertains to software security, emphasizing the inadequacy of relying solely on hardware security measures. It is crucial to recognize that a significant portion of cyberattacks targets software vulnerabilities. One of the most prominent security threats involves hackers targeting widely-used operating systems like iOS and Android. Malicious actors who compromise smart mobile devices gain access to both enterprise and personal data, posing various threats such as cryptographic attacks and code injection attacks. Furthermore, security issues in both hardware and software can lead to problems like inaccurate data creation or data breaches.

Data storage Ensuring the protection of cloud and edge data centers is crucial to prevent physical damage and unauthorized privilege escalation during the data storage process. These data centers, which are often managed by various business organizations, must be safeguarded to maintain the integrity and security of data. Furthermore, an additional challenge lies in effectively thwarting external attackers, ensuring well-trained security personnel are in place, and maintaining the data center with a high level of professionalism. In the realm of cloud security, the primary threats encompass denial of service (DoS) attacks, shared cloud computing services, system vulnerabilities, and potential risks posed by malicious or negligent insiders. Distributed denial-of-service (DDoS) attacks, in particular, pose a significant threat to cloud platforms, as they have the potential to disrupt and disable cloud systems, thereby denying access to services [105]. System vulnerabilities can persist in complex cloud computing infrastructures, and attackers who exploit these weaknesses can potentially compromise the integrity of the entire cloud system. In addition to external threats, cloud computing management teams must also be vigilant in guarding against internal security threats that may arise from human actions.

Data access The first security threat pertains to network security during data access. Distributed Denial of Service (DDoS) attacks and wireless jamming are common and pose significant risks. Additionally, Man-in-the-Middle attacks and counterfeit attacks may occur when authorized nodes access the network. Malicious attackers can compromise a portion of the network, potentially leading to widespread network vulnerabilities. Unauthorized attackers might falsify data, communicate undetected, and gain access to sensitive database information. Furthermore, the theft of Intellectual Property (IP) represents a grave concern for network security. While IPv4 and IPv6 offer certain protections for applications, numerous security threats persist, including DDoS attacks, Man-in-the-Middle attacks, packet sniffing, and more. The second security concern revolves around heterogeneous networks. Sharing data across such networks introduces security challenges, such as DoS attacks and malicious code injections. Without secure authentication and key agreement mechanisms, attackers can easily disrupt and compromise services.

5 Challenges in incentive-based data sharing and exchanging

5.1 Algorithms to improve data quality

The necessity for high-quality data is paramount when various entities engage in data sharing and exchange. Given the immense volume associated with big data, assessing data quality within a constricted timeframe becomes a challenge. A pervasive issue in many incentive mechanisms is the potential for participants to contribute counterfeit or subpar data to gain increased rewards. With the diversification of data types, established and recognized data quality standards and frameworks have emerged to regulate quality within the data-sharing sector. The challenges of preemptively identifying inferior data and ensuring authenticity during multi-entity exchanges remain central to discussions on data quality [106].

Furthermore, the heterogeneous nature of big data necessitates careful consideration not only of data quality but also of selecting the optimal database systems for storing high-caliber data. Traditional database management systems are ill-equipped to address the uncertainty inherent in diverse data types collected prior to sharing and exchanging. The evolution from data warehousing to data catalogs, data hubs, and data fabric structures, coupled with the transition from SQL to NoSQL platforms and from conventional data types to textual, visual, and video data, highlights the challenges posed by database management systems. These encompass addressing the unpredictability of a broad spectrum of NoSQL tools, choosing the appropriate non-relational database for data integration, and sustaining the database.

However, the process of converting to high-quality data is both time-intensive and financially demanding. Data transformation can be sluggish, particularly in the absence of a standardized data collection framework. When contributors fail to supply data of requisite quality or in the mandated formats, the financial implications of modifying data to fulfill sharing and exchange specifications escalate. Hence, establishing and adhering to a unified high-quality data format emerges as yet another pivotal challenge.

5.2 Incentive mechanisms

With the advent of new applications in data-sharing and exchanging, these processes have undergone significant enhancements in both security and efficiency. However, there exists a conspicuous absence of incentive mechanisms that would motivate individual nodes to actively participate throughout the process. For instance, federated learning stands as a secure technique in edge computing, where each node is tasked with training a deep learning model at the edge, and only the training parameters are shared with the parameter server. Nevertheless, incentivizing every node to partake in the federated learning procedure is a formidable challenge. Furthermore, gauging the contribution and data quality of the edge node becomes another intricate issue in the development of incentive mechanisms.

Within the realms of IoT and 5G, numerous applications necessitate immediate data sharing and exchange frameworks. Consequently, for real-time data sharing and exchange, it’s imperative that our incentive structures effectively motivate data proprietors to engage in prompt negotiations. Additionally, spurring participants to actively generate and furnish real-time data poses another challenge [107]. In tandem with real-time data production considerations, ensuring data quality becomes an indispensable criterion for data sharing and exchange.

Challenges related to incentive-driven storage resource allocation and offloading are also evident in the data sharing and exchange process, especially within the contexts of IoT and edge computing. As many applications gravitate towards edge computing to curtail computation time and alleviate cloud load, crafting efficient and cost-effective incentive mechanisms to address data allocation and offloading remains a topic inviting further exploration.

5.3 Unified data management systems

The intricate processes of data sharing and exchanging necessitate a consolidated data management system, encompassing facets like data transformation and pre-processing. One of the most pressing challenges in the contemporary big data landscape is the prevalence of “dirty data”. Contemporary data scientists find themselves dedicating approximately 60% of their working time to cleaning and organizing data rather than to analysis.

For instance, both data replication and data deduplication stand as pivotal components necessitating a unified data management approach. Data replication principally aims to store identical data across multiple locations, enhancing data availability and bolstering storage system resilience and reliability [108]. Yet, designing replication strategies is not devoid of challenges. Replication is both costly and time-intensive. Synchronizing voluminous and real-time data updates is a herculean task, further complicated by bandwidth constraints. Navigating these new processes and managing voluminous data traffic remain pressing challenges. A unified framework could proffer applications with an abstracted interface, harmonizing with extant replication systems.

Conversely, data deduplication serves as a pivotal data-cleaning phase in storage. It’s an imperative strategy aimed at minimizing data redundancy and conserving storage capacity. Yet, the deduplication process is intricate, exacerbated by the absence of a unified model capable of eliminating redundancy from the diverse data typologies in big data. Deduplication, too, is time-intensive and poses risks, such as inadvertently purging valuable information. As a result, when orchestrating the data sharing and exchanging paradigm, it’s crucial to factor in the deduplication process, possibly integrating redundancy algorithms to oversee dataset duplication. Crafting such a unified data management framework is paramount for efficient data sharing and exchange.

Additionally, harnessing machine learning for data unification presents a promising research avenue. Techniques like entity resolution [109] and tools like the Python dedupe library [110] can be pivotal when architecting data-sharing and exchanging systems. A structured and integrated data management system can significantly truncate the time dedicated to data cleaning.
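For a flavor of what such machine-learning-assisted unification replaces, the sketch below performs naive entity resolution with standard-library string similarity; the records, field weights, and threshold are invented for illustration. Production tools such as the dedupe library [110] instead learn blocking rules and field weights from labeled examples.

```python
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Acme Corporation", "city": "Boston"},
    {"id": 2, "name": "ACME Corp.",       "city": "Boston"},
    {"id": 3, "name": "Globex Inc.",      "city": "Austin"},
]

def similarity(a, b):
    """Weighted similarity over two fields (weights are illustrative)."""
    name = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    city = 1.0 if a["city"].lower() == b["city"].lower() else 0.0
    return 0.7 * name + 0.3 * city

for r1, r2 in combinations(records, 2):
    if similarity(r1, r2) > 0.75:
        print(f"likely duplicates: record {r1['id']} and record {r2['id']}")
```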

5.4 Secure and trustworthy data access control systems

In the realm of data access, pivotal challenges encompass devising efficient and secure identity and authentication systems, crafting robust access control frameworks, selecting optimal encryption techniques, and enhancing trust management infrastructures.

Access control plays a quintessential role in ensuring that unauthorized actors are deterred from accessing IoT devices and retrieving data. Historically, a significant portion of research has centered on sculpting access control systems tailored for cloud environments. Yet, with the evolution of edge architectures, there’s a conspicuous lacuna in access control strategies pertinent to edge data sharing and exchanging. As such, conceptualizing and implementing potent access control algorithms emerges as a pressing challenge. Given the sheer volume of edge devices within a 5G-integrated IoT ecosystem engaged in data sharing and exchange, a novel set of challenges presents itself. These include efficiently defining the access model and optimizing scarce resources, particularly critical for battery-dependent devices engaged in data activities.

Further complicating matters, many data-sharing and exchanging scenarios, especially in the burgeoning 5G-enabled IoT landscape, feature a plethora of unknown recipients. This implies that conventional encryption algorithms may be ill-equipped to navigate the intricate dynamics of modern data-sharing and exchange systems. As a result, there’s a pressing need to evolve encryption frameworks to be both more secure and agile to cater to diverse access patterns. When users access data, another emerging concern is effectively masking the data to safeguard the sensitive facets of the original dataset.

Trust issues, often rife in data sharing and exchanging processes, tend to surface especially between service providers and remote servers. Establishing and nurturing trust relationships—whether device-to-device, device-to-user, or user-to-server—stands as a formidable challenge. As data interactions intensify, it becomes paramount to architect efficient and secure trust management algorithms that foster confidence between the data provider and the data requester.

5.5 Privacy-preserving strategies and security principle

Data security and privacy stand as paramount considerations in the crafting of incentive mechanisms throughout the data sharing and exchanging life cycles. Each phase of the data-sharing process and the associated life cycle necessitates robust privacy-preserving mechanisms to ensure comprehensive protection.

During the data creation and collection phase, there is a conspicuous absence of adequate privacy-preserving algorithms. In emergent applications, such as smart transportation and smart homes, the data accrued from devices is intrinsically personal. Given the highly distributed nature of this data and the mandate to share it with central servers, the role of privacy-preserving algorithms becomes indispensable. Moreover, during data sharing and exchange, participants often harbor genuine privacy apprehensions. The absence of secure incentives, anchored in preserving participant privacy, can deter parties from engaging in the data-sharing process.

The data storage domain has witnessed the proposal of myriad privacy-preserving algorithms, including but not limited to K-anonymity [111], L-diversity [112], T-closeness [113], and differential privacy [114]. Notwithstanding these advancements, there’s a void in the arena of privacy-preserving algorithms tailored for unstructured data. While a plethora of research endeavors have zeroed in on privacy mechanisms for cloud storage, a relatively smaller subset has focused on edge environments. Crafting mechanisms to safeguard data across both cloud and edge domains is a pressing challenge, necessitating a fortified access control system.
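Among the listed techniques, differential privacy [114] is perhaps the easiest to make concrete. The sketch below applies the standard Laplace mechanism to a counting query; the count, sensitivity, and privacy budget are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value under epsilon-differential privacy.

    sensitivity is the maximum change in the query output when one
    individual's record is added or removed (1 for a counting query).
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

# Example: privately release how many participants opted in to sharing.
true_count = 412
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
print(f"noisy count: {noisy_count:.1f}")
```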

The advancement of privacy-preserving methodologies in data sharing contends with the rapid generation and dissemination of information. The incessant flow of data from the IoT and online platforms calls for agile privacy-preserving frameworks capable of adapting to evolving data topologies. Furthermore, the detailed personal information garnered necessitates a rigorous ethical framework. The conception of algorithms capable of effectuating real-time data anonymization and obfuscation without diminishing the data’s intrinsic value remains an urgent area of scholarly investigation.

Concurrently, on the security spectrum, the assimilation of diverse data sources poses significant vulnerabilities. The diversity of data structures, coupled with the intricacies of communication across systems, demands all-encompassing security protocols to thwart unauthorized incursions and potential data compromises. Additionally, with the ascendancy of blockchain and other decentralized data custodianship paradigms, safeguarding data integrity across these fragmented networks introduces novel complexities. The establishment of secure data conduits, potent encryption modalities, and indelible audit trails is paramount to preserve data sanctity and confidentiality throughout its lifecycle. Addressing these challenges transcends technical confines, extending into the domain of policy-making where clear and enforceable governance structures are indispensable for engendering stakeholder confidence.

Fig. 6 Data sharing and exchanging with blockchain

5.6 Data ethics principles and fairness algorithms

Complementing the technical dimensions, the ethical fabric of data interactions also calls for attention. Ethical guidelines governing data transactions are imperative. Globally, governments exhibit an escalating impetus for legislative measures dedicated to ensuring data privacy and security. Yet, the quest for universal, pragmatic, and cohesive data ethics principles remains an unresolved challenge in the data science landscape, particularly in the context of data sharing and exchange.

The evolution of data ethics principles amid the data sharing and exchange paradigm has been deeply influenced by the intersection of technological progress, societal norms, and the ever-evolving regulatory framework. Initially anchored in fundamental principles of confidentiality and informed consent, the scope of data ethics has progressively broadened to include intricate issues like algorithmic transparency, data sovereignty, and the principled application of artificial intelligence. In our increasingly interconnected digital milieu, the call for sophisticated governance models is pronounced, models that judiciously balance individual rights within digital ecosystems. The shift from traditional data protection statutes to advanced regulatory frameworks is a testament to this evolution, taking into account the profound implications of data handling on societal welfare and individual liberties. Emblematic of this shift is the GDPR, which stands as a benchmark in the global narrative for privacy and ethical data management.

However, the ethical challenges are pervasive across various stages in data sharing and exchanging life cycles. In the data creation stage, obtaining informed consent, ensuring the accuracy of data, and preventing biases in data collection are crucial. For instance, facial recognition technologies used for creating datasets have faced scrutiny for biases that result in inaccurate recognition across different demographics, as seen in some law enforcement applications [115, 116]. Mitigating such biases and ensuring fairness in data creation is an ongoing challenge. In the data storage stage, questions about the long-term protection of privacy are challenging. Organizations must secure data against breaches and unauthorized access. Cloud storage services, like those provided by Amazon AWS or Google Cloud, have faced incidents where misconfigurations led to data leaks [117]. Ensuring that ethical standards are met involves implementing rigorous security protocols and regularly updating them to guard against emerging threats. Determining who has the right to access data and for what purposes is fraught with ethical dilemmas in the data access stage. Ethical data access involves transparent policies about who can access data, the purposes for which it can be used, and mechanisms that enable individuals to control their own data. Therefore, more detailed and specific regulations are needed.

Achieving fairness in data sharing and exchange is rife with challenges centered on the just allocation of data-derived advantages and responsibilities. A crucial concern is guaranteeing equitable access to these benefits, a task complicated by imbalances in technical knowledge and computational capabilities. Furthermore, it is essential to circumvent biases in data accumulation and algorithmic processes, which risk reinforcing societal disparities. Additionally, the consent and privacy of data subjects must be prioritized to prevent the exploitation of personal information. A lack of transparency regarding data utilization can diminish trust. With AI’s evolution, fairness has become increasingly significant. For instance, federated learning faces the dilemma of equitably distributing rewards among servers [118], while in generative AI, particularly within natural language processing, fairness is paramount to ensure non-discriminatory outcomes. In generative AI, fairness challenges include biased datasets causing stereotype perpetuation and the risk of AI amplifying societal biases, requiring meticulous curation for balance [119].

6 Opportunities in data sharing and exchanging

Given the myriad challenges delineated in Sect. 5, we perceive multiple opportunities for researchers to refine and enhance incentive-based mechanisms for data sharing and exchange. To provide a structured understanding, we categorize these opportunities across three distinct dimensions: data creation, data storage, and data access.

6.1 Design trustworthy, efficient, and economic incentive mechanisms for data sharing and exchanging

The task of crafting reliable, efficient, secure, and economically viable incentive mechanisms across the complete spectrum of the data sharing and exchanging lifecycle remains an intricate and unresolved academic conundrum.

During the data creation phase, there is a profound necessity to introduce incentive-driven data collection algorithms. Data must meet thresholds of quality and relevance before it is ready for sharing and exchange. As a result, there’s a compelling case for devising incentive-based algorithms that could galvanize nodes to amass a more considerable volume of data and stimulate respondents to relay this data to the pertinent data creation servers.

Yet, a discernible challenge emerges with current data collection algorithms: they tend to be financially burdensome, especially when they grapple with identifying suitable candidates for data relay. In contemporary research, there has been an inclination towards leveraging machine learning and deep learning paradigms. These techniques, as evidenced by various studies [120,121,122,123,124,125], have shown promise in filtering out unreliable candidates and in prognosticating the behavior of potential candidates using their historical data footprints. The overarching sentiment is that such algorithmic approaches can streamline the targeting of genuine and apt candidates, leading to cost optimization.

Turning our gaze to the data storage aspect, one cannot overlook the constraints imposed by limited data caching resources, a reality that persists regardless of whether the storage medium is edge-based or cloud-centric. Thus, pioneering an incentive mechanism to judiciously allocate these caching resources becomes an essential research endeavor [126]. Notably, while many data storage incentive approaches have been grounded in theoretical game mechanics, there remains ample scope for refinement and innovation. The burgeoning domain of distributed data storage platforms, exemplified by technologies like blockchain [59], illuminates a path forward. Herein, the design of an incentive-laden data storage platform appears to be a promising research vector. Additionally, given the inherent unpredictability in framing these incentive mechanisms, there’s merit in melding deep learning and machine learning algorithms. Such an amalgamation could furnish more precise parameter estimations, amplifying optimization efficacy.

Furthermore, as we navigate the rapidly evolving technological landscape, a salient question emerges: How do we conceptualize and implement incentive mechanisms within nascent data-sharing and exchanging modalities? Crafting robust, efficient, and cost-effective incentive frameworks specifically tailored for paradigms like federated learning, blockchain, and edge computing indeed constitutes a valuable research frontier.

6.2 Design unified data management systems and frameworks

Constructing an efficient and cohesive data integration framework is imperative during data preparation. Such unified frameworks [127] hold the potential to streamline data-wrangling endeavors, culminating in considerable time and cost savings.

Given the vast heterogeneity intrinsic to big data, several scholars have ventured into developing specialized integration frameworks tailored for structured and unstructured data [128, 129]. Yet, even these sophisticated frameworks are not without their constraints. Crucially, as we advance these systems, considerations surrounding data privacy and security cannot be sidelined. Moreover, there’s an evident merit in harnessing deep learning methodologies. By doing so, the framework could potentially auto-adjust its structure, accommodating the multifaceted nature of data.

Augmenting the data management system with capabilities for real-time data processing is another pivotal aspect. The rapid proliferation of big data and the IoT has ushered in an era where numerous applications are tethered to the immediacy of data sharing and exchange. Paradigmatic instances include autonomous vehicles [130], emergency fire response [131], and medical emergency services [132]. Yet, it’s evident that real-time data sharing in these sectors is fraught with limitations. Take, for instance, the unfortunate event of a vehicular accident. Present protocols necessitate a phone call and a wait for police intervention, a process that inadvertently prolongs the accident’s aftermath and could delay critical medical interventions. Hence, it becomes paramount for the data management system to evolve, equipping itself with the agility to seamlessly handle real-time data streams.

6.3 Employ artificial intelligence algorithms to improve data quality

Data quality directly impacts the efficiency and accuracy of data sharing and exchange processes [133]. Leveraging artificial intelligence techniques to discern and eliminate inauthentic and subpar data is an invaluable research trajectory.

While many researchers have hitherto applied conventional machine learning and deep learning algorithms to this domain [134, 135], relying solely on these traditional methods to sieve out low-quality data can be resource-intensive. As a remedy, federated learning emerges as a promising paradigm: by adopting a distributed approach to deep learning, computation resources can be conserved. Additionally, the presence of counterfeit data undermines data quality, diminishing the efficacy of data sharing and exchange; devising mechanisms to detect such spurious data is an avenue worth exploring in future research. Integrating anomaly detection systems into federated learning architectures can significantly bolster the integrity of data quality: utilizing the distributed topology of the network, these systems can pinpoint and segregate questionable data entries, enhancing the robustness of the dataset across the collective nodes. Beyond conventional deep learning techniques, reinforcement learning presents a compelling alternative. It can expedite model creation, obviating the need to amass training and testing datasets beforehand, and its algorithms can adapt autonomously to evolving data trends, providing a scalable solution for maintaining data quality amidst the complexities of expansive network environments.
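As a minimal sketch of the anomaly-screening idea in a federated setting, the snippet below flags client updates whose L2 norm is a cohort outlier before server-side aggregation. The z-score rule, threshold, and synthetic updates are assumptions for illustration; practical defenses use more robust statistics.

```python
import numpy as np

def filter_updates(updates, z_threshold=2.5):
    """Drop client updates whose L2 norm is a cohort outlier."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    z = (norms - norms.mean()) / (norms.std() + 1e-12)
    keep = np.abs(z) < z_threshold
    return [u for u, k in zip(updates, keep) if k], keep

rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.1, size=100) for _ in range(9)]
poisoned = [rng.normal(0.0, 5.0, size=100)]   # one inflated update
kept, mask = filter_updates(honest + poisoned)
print(f"kept {len(kept)} of {len(mask)} client updates")
```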

6.4 Using distributed data storage techniques to ensure data security and privacy

In contrast to centralized data storage solutions, distributed storage methods have gained increasing prominence. Recently, blockchain has emerged as a widely researched approach for data storage. Blockchain, characterized as a decentralized, digital, secure, and transparent ledger for cryptographic data transactions, has begun to revolutionize numerous sectors [136]. The adoption of blockchain enables participants to authenticate transactions without relying on a centralized certifying authority. This capability underscores its potential to offer industries a reliable, precise, and immutable record-keeping system.

The academic community has shown burgeoning interest in blockchain, leading to its application across a plethora of domains including the Internet of Things (IoT) [137, 138], cyber-physical systems [139], education [136], supply chain management [140], and crowdsourcing and crowdsensing [141], among others.

Figure 6 provides an illustrative depiction of how blockchain can enhance the data sharing and exchange paradigm [118]. The figure delineates n entities, represented as \(\mathbb{B}=(\beta_1, \beta_2, \beta_3, \ldots, \beta_n)\). Each of these entities can assume the role of either a data provider or a requester. The datasets, denoted by \(\mathbb{D}=(d_1, d_2, d_3, \ldots, d_m)\), constitute the content intended for sharing and exchange between entities. To safeguard data privacy throughout these operations, various encryption techniques, represented by \(\varepsilon\), are invoked. To spur entities’ participation, an incentive algorithm, \(\Gamma\), is integrated. This algorithm may encompass both monetary and non-monetary rewards. Considering the data storage and processing phases, a suite of distributed data storage and processing strategies can be implemented to bolster data privacy and efficiency. We advocate for the incorporation of blockchain and smart contracts as robust mechanisms for secure data storage.
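The tamper-evidence property that makes blockchain attractive here can be shown with a minimal hash-chained ledger, sketched below for a single sharing transaction between two entities. It is a toy model with no consensus, networking, or smart contracts, and the entity and dataset names echo the notation above purely for illustration.

```python
import hashlib
import json
import time

def make_block(prev_hash, payload):
    """Create a block whose hash covers the previous block's hash."""
    block = {"prev": prev_hash, "time": time.time(), "payload": payload}
    serialized = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(serialized).hexdigest()
    return block

chain = [make_block("0" * 64, {"genesis": True})]
chain.append(make_block(chain[-1]["hash"],
                        {"provider": "beta_1", "requester": "beta_2",
                         "dataset": "d_3", "reward": 10}))

# Editing any earlier block breaks every later prev-hash link.
valid = all(chain[i]["prev"] == chain[i - 1]["hash"]
            for i in range(1, len(chain)))
print("chain valid:", valid)
```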

Fig. 7 Reinforcement learning in data sharing and exchanging

6.5 Design authentication and encryption mechanisms

Efficient and private authentication schemes [142, 143] have become paramount in the realm of data sharing and exchange. Historically, authentication protocols within the cloud ecosystem were primarily tailored to single-server environments. This model, however, is increasingly incongruent with the emergent architectures of 5G and the IoT [144], which champion distributed service environments. With the introduction of cloud data sharing in 2009, there was an uptick in user growth and a pronounced demand for shared services. This phenomenon led researchers to embark on the quest for robust trust and security authentications that would seamlessly link cloud users to services. Established frameworks like the SSL Authentication Protocol (SAP) [145] were often perceived as cumbersome and unintuitive by a vast swath of users.

In recognizing the constraints of singular server authentication systems, several scholars veered towards the development of inter-cloud identity management solutions. This journey saw the emergence of protocols such as OpenID [146] and SAML [147], both championing Single-Sign-On (SSO) authentication. Yet, these systems inherently hinge on third-party intermediaries, potentially ushering in unforeseen security vulnerabilities. Consequently, the crafting of efficient, private authentication schemes that resonate with a distributed service environment remains an ongoing academic challenge. In today’s data exchange ecosystems, the vast majority of IoT devices necessitate users to establish personal accounts, often requiring the divulgence of sensitive information. Thus, the dual challenge of guaranteeing user anonymity whilst ensuring efficient authentication becomes evident. Several burgeoning research opportunities have been identified:

  • Synergies: Most of the cutting-edge research delving into efficient, privacy-centric authentication [148, 149] is anchored in the domains of cloud computing and mobile cloud computing. As the locus of future data sharing and exchange is likely to shift towards edge computing, it is imperative to discern potential collaborations between mobile cloud computing and other edge paradigms.

  • Security vs. privacy trade-offs: In the process of devising novel authentication protocols, it becomes essential to strike an equilibrium between security and privacy. For instance, within the paradigm of lightweight authentication [150], the assurance of rigorous user anonymity takes precedence. Furthermore, for devices tethered to batteries, the nexus between energy conservation and security emerges as a captivating area of research.

Historically, the bulk of research efforts have been funneled into crafting access control systems [151] that seamlessly integrate with cloud computing. By contrast, research exploring access control mechanisms within the context of edge computing has been sparse. Thus, the development and implementation of pragmatic access control algorithms tailored to the edge environment stand out as a promising research trajectory. Given the anticipated surge in edge devices in the near future, a new set of challenges centered on efficient access model identification and optimization of finite resources comes to the fore, especially pertinent for battery-dependent devices.

In the nascent stages of trust management within the realm of cloud computing, Service Level Agreements (SLAs) [152] emerged as the foundational technique. However, these were not universally consistent across cloud providers, leading to potential trust issues. Most scholarly endeavors in trust management have historically been rooted in centralized services; around 2016, however, a modest shift was observed, with more research gravitating toward distributed computing services. Trust management, viewed through the lens of distributed computing within data sharing and exchange, thus emerges as a potential research frontier.

Regarding encryption mechanisms integral to data sharing and exchange processes, it’s evident that a majority of these mechanisms are tailored for cloud-based data sharing and fog data sharing ecosystems. However, the landscape is replete with opportunities to architect more efficient encryption algorithms that dovetail with mobile edge computing and cloud computing paradigms. While a significant chunk of the academic community is engrossed with the CP-ABE algorithms [153], alternative encryption strategies that seamlessly integrate with CP-ABE, such as fully homomorphic encryption (FHE) [154] and ciphertext policy attribute-based proxy re-encryption (CP-ABPRE) [155], also hold promise. As a result, refining encryption mechanisms in the data sharing and exchange space remains a paramount academic pursuit.

6.6 Use reinforcement learning to solve the un-shared decision problem

In data sharing and exchanging processes, it’s commonplace for multiple nodes to collaboratively complete a task. For instance, within the federated learning [156] paradigm, nodes might share their training weights with a central server. Yet, due to privacy and security concerns, these nodes might withhold sharing their individual decisions. Similarly, in various data-sharing scenarios, participants often abstain from disclosing their strategies, especially when bidding for identical tasks [118]. This presence of incomplete information can spawn multiple challenges. For one, participants might propose suboptimal decisions. Further, this ambiguity can lead nodes to adopt untrustworthy or irrational strategies.

Reinforcement learning emerges as a potent solution to these challenges. It empowers participants to discern the most efficacious actions within a data sharing and exchanging milieu, ensuring they grasp external strategies and consequently make optimal decisions. Within reinforcement learning, an “agent” epitomizes the algorithm while the “environment” symbolizes the context in which the agent operates. The environment defines the state (or strategies) and relays it to the agent. The agent, drawing on its accumulated knowledge, selects an action; the environment then updates the state and returns a reward. Based on this feedback, the agent decides whether to revise its policy and submits its next action to the environment. This cyclical process persists until the environment reaches a terminal state. There exist multiple algorithms within the reinforcement learning umbrella, like Q-learning [157] and actor-critic learning [158], which are adept at addressing these challenges. Figure 7 delineates our proposed mechanism detailing how reinforcement learning operates in data-sharing and exchanging environments. Such environments span diverse applications, from smart cities to healthcare interoperability and federated learning. Within these contexts, given the competitive nature of users and the sensitivity of their data, they are often reticent about sharing their unique strategies during data exchange. Take federated learning as an example: users might deploy their local data to refine global deep learning models and then share only model parameters with the server. They might remain tight-lipped about certain strategies, like the extent of data computing resources they’ve utilized. In such a landscape, as illustrated in Fig. 7, reinforcement learning can deftly address the challenge posed by undisclosed decisions. It enables participants to intuit the strategies of their counterparts, thus formulating their strategy to maximize gains within data-sharing and exchanging platforms.
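A minimal sketch of this loop appears below: a tabular Q-learning-style agent (here degenerating to a single-state bandit, since there is no state transition) learns which bid to place when the other participants’ strategies are hidden and only a reward is observed. The bid levels, hidden win probabilities, and learning parameters are illustrative assumptions, not a model from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
bids = [1, 2, 3, 4]                      # discrete actions: bid levels
hidden_win_prob = [0.9, 0.7, 0.4, 0.1]   # other parties' behaviour, unseen

q = np.zeros(len(bids))                  # estimated value of each action
alpha, epsilon = 0.1, 0.2                # learning rate, exploration rate

for step in range(5000):
    # Epsilon-greedy action selection over the bid levels.
    a = rng.integers(len(bids)) if rng.random() < epsilon else int(q.argmax())
    won = rng.random() < hidden_win_prob[a]
    reward = bids[a] if won else 0.0     # only the payoff is ever observed
    q[a] += alpha * (reward - q[a])      # single-state Q-learning update

print("estimated value per bid:", q.round(2))
print("learned bid:", bids[int(q.argmax())])
```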

6.7 Designing fair and ethical algorithms for data sharing and exchange

The creation of an ethical environment for data sharing and exchange is anchored in regulatory compliance. Regulations like GDPR prescribe ethical compliance, yet the interpretability of these laws, including the right to explanation, raises open questions, as discussed by Walke [159]. Ensuring that AI algorithms offer intelligible explanations to users remains a critical issue. Furthermore, adhering to the data minimization principle is essential to optimize resource use. As technology advances rapidly, automated tools for continuous and immediate compliance monitoring become crucial, eliminating any lag in adapting to new developments, as suggested in the study [160]. A platform that involves a diverse group of developers and users in the design process could ensure inclusive and up-to-date compliance monitoring.

The examination of data and algorithms for inherent biases is essential. It is incumbent upon researchers to devise algorithms capable of detecting and preventing the reinforcement or intensification of social inequalities. Within AI-driven data sharing and exchange platforms, there is a pressing need to evolve more sophisticated methods for uncovering and countering biases in intricate algorithms, notably those employing deep learning where interpretability poses significant challenges. Addressing biases in natural language processing, specifically, those pertaining to gender, race, and age, is of particular importance [161,162,163]. Given the dynamic generation of data, the creation of efficacious algorithms for the continuous monitoring of biases is indispensable, ensuring their timely adaptation and the prevention of new forms of social inequality. Investigations into real-time adaptive learning [164] and bias detection algorithms [165] represent valuable avenues for future research.

In the context of data sharing and exchange, it is imperative to balance the objectives of data process transparency with the imperatives of security and privacy. Crafting algorithms that simultaneously enhance transparency and bolster security is a complex yet vital endeavor. Blockchain technology offers a promising avenue, delivering comprehensive data visibility while maintaining the sanctity of sensitive information, crucial for fostering trust and ensuring compliance in data-centric systems [166,167,168]. Furthermore, evolving consent mechanisms to afford individuals robust control over their data necessitates innovative solutions. Research inquiries may delve into designing universally comprehensible consent protocols, sustaining informed consent amidst changing data use scenarios, and exploring the role of AI in consent management, scrutinizing the ethical dimensions of automation in such critical processes.

In summary, the exploration of data ethics in the context of data sharing and exchange is a critical area ripe for scholarly inquiry, especially regarding its broader societal ramifications. Investigative efforts could delve into the integration of ethical principles in the initial design stages of data infrastructures to promote just and responsible data utilization. Scholars are poised to scrutinize the broader societal repercussions of data-sharing protocols, emphasizing their effects on communal trust, individual privacy rights, and overall societal safety. Furthermore, research might illuminate pathways to harmonize technological progression with ethical considerations, aiming for a just distribution of data sharing’s advantages across the societal spectrum, thereby addressing the digital divide and mitigating social disparities. The ultimate objective is to cultivate data ecosystems that align with ethical norms and actively enhance societal wellbeing and fairness.

7 Incentive-based data sharing and exchanging applications

7.1 Healthcare

Interoperability in healthcare serves as a pivotal data sharing and exchanging system, enabling third parties to cohesively exchange, interpret, and utilize data. However, certain stakeholders, like pharmacies and clinics, might be hesitant to share their customers’ data on third-party platforms. Consequently, these data-sharing and exchanging platforms need to architect robust incentive mechanisms that both entice various entities to share data and ensure the protection of sensitive information. Recent adoption of FHIR (Fast Healthcare Interoperability Resources) APIs by leading institutions, including the Mayo Clinic and the UK’s NHS, has streamlined health record management globally [169]. These APIs empower developers to create applications that enhance patient involvement and clinical care standards.

Furthermore, inherent data security and privacy challenges in interoperability systems may deter patients from sharing their data. One paramount security concern is software security. With the proliferation of health and wellness apps on mobile devices, patients can effortlessly schedule appointments, view reports, and more. But this convenience amplifies risks, especially with heightened concerns about data breaches and malevolent hackers targeting healthcare interoperability platforms. In 2020, the U.S. Department of Veterans Affairs debuted a unified health platform, enabling veterans to seamlessly retrieve their records across providers, reflecting a push towards greater healthcare interoperability and fortified data security [170].

A second concern stems from human factors. Irrespective of whether actions are malicious or inadvertent, data leakage due to human error or intent can have grave repercussions. For instance, employees within participating organizations might pilfer data or misplace devices. Another critical security threat is Distributed Denial of Service (DDoS) attacks. The healthcare sector, being a recurrent target, necessitates fortified defenses. As the digitization of healthcare continues its upward trajectory, it’s imperative that those spearheading security efforts be proficiently trained.

Additionally, potential vulnerabilities in the hospital supply chain can’t be overlooked. Collaborators and partners in the healthcare ecosystem might inadvertently introduce data breach risks. For example, The IBM Blockchain Platform has been utilized to bolster transparency and enable real-time tracking in the medical supply chain, offering an immutable ledger that safeguards data integrity and security across stakeholders [171]. As such, when formulating incentive mechanisms for data sharing in healthcare interoperability platforms, ensuring comprehensive security and privacy across every stage of the supply chain becomes paramount [172]. Incentive structures should not only motivate stakeholders to share data but also integrate privacy-preserving algorithms at every phase to bolster trust and safety.

7.2 Smart home

Smart homes can be significantly enhanced through data sharing and exchanging, especially when it comes to refining customer behavior prediction algorithms. However, homeowners might hesitate to share their data unless they perceive tangible benefits in return. This reluctance is compounded when considering the sharing of highly sensitive information such as location data, utility consumption, or footage from security cameras. In 2021, Google’s Nest thermostats began using radar for presence sensing, optimizing energy efficiency through smart data use [173]. Simultaneously, Amazon’s Alexa for Residential was introduced, enabling custom voice services in rentals, marrying data sharing with tenant privacy [174].

The sphere of smart home data sharing and exchanging isn’t devoid of security and privacy challenges. Two primary concerns include threats to confidentiality and authentication. Breaches leading to unintended disclosure of personal information epitomize confidentiality threats. Consider smart home control systems like Nest: these systems monitor room temperatures, electricity usage, and other metrics to deduce occupancy patterns. Such intimate insights into daily life can become areas of vulnerability, especially when unauthorized access occurs. A misplaced or stolen password can open the door to malicious actors, empowering them to alter system settings, issue deceptive commands to devices, or even disengage safety features like smart locks, potentially resulting in dire consequences [175]. One notable incident involved a hacker gaining access to a family’s Wi-Fi network through a smart thermostat and manipulating the home’s heating and security systems [176]. Another example is the breach of Ring security cameras, where attackers harassed occupants through the device’s speaker system, showcasing the risks of insufficient device security [177]. Moreover, the interconnected nature of smart home devices can inadvertently amplify users’ anxieties. When multiple devices share data, they can collectively paint a detailed portrait of a user’s behavior. For instance, AI-driven appliances might draw from data streams of ovens, microwaves, refrigerators, and security systems, thereby reconstructing a user’s daily routine. Such predictive capabilities underscore the importance of crafting effective incentive mechanisms that inspire homeowners to willingly share data. Simultaneously, safeguarding sensitive data emerges as a paramount research area in the realm of smart homes.

7.3 Smart grid

Data sharing and exchanging are not novel concepts for the energy sector. Activities such as supplier switching and requests for meter data have been prevalent for some time. The primary concern in the realm of smart grids is data privacy. Smart meters, pivotal components of intelligent grids, continually relay private data from households to utility companies and service providers. Analogous to smart homes, this transmitted information can be used to deduce customer behavior patterns and ascertain periods of home occupancy. The interconnected nature of the smart grid means that multiple devices can coordinate the electricity supply and network demand concurrently. This connectedness, however, can also be a potential vulnerability: attackers might exploit the network to compromise these devices. Recent incidents, such as the SolarWinds cyberattack [178], reveal the fragility of interconnected systems and the necessity for enhanced grid security protocols.

Furthermore, the remote operability of many devices in smart grid systems necessitates robust security measures. Devices that are remotely controlled should boast long operational lifespans, and crucially, the ability to efficiently receive and implement security software updates. Lacking these safeguards, equipment can be susceptible to attacks such as Denial of Service (DoS), rendering them inoperative. In response, the U.S. energy sector’s adoption of the NIST Cybersecurity Framework [179] highlights the commitment to reinforcing device security across the grid.

Another paramount aspect of data sharing in smart grids is fostering collaboration amongst different companies and providers. Given that many of these entities are competitors, the challenge lies in crafting an equitable reward system and distributing resources judiciously to incentivize data sharing. For example, the ENTSO-E Transparency Platform [180] in Europe enhances cooperation by providing comprehensive electricity data, boosting market efficiency.

7.4 Smart logistics

Data sharing and exchanging play an instrumental role in the realm of smart logistics. Through third-party platforms, companies can exchange sensitive data, leading to enhanced predictability and increased productivity. However, smart logistics platforms encounter several security and privacy challenges during data sharing and exchanging.

The foremost challenge is data corruption. Attackers infiltrating the system can dispatch deceptive requests, prompting devices to render incorrect decisions. Another significant threat is linked to equipment maintenance. In the absence of robust security measures, Denial of Service (DoS) attacks could compromise and incapacitate vital equipment or facilities. The 2021 ransomware attack [181] on Colonial Pipeline exemplifies the risks of DoS, emphasizing the need for resilient security in logistics infrastructure.

Furthermore, data privacy emerges as a consistent concern. The supply chain, often encompassing confidential user information, mandates rigorous protective measures to prevent breaches [182]. Additionally, participants in these data-sharing platforms often stand as market competitors. Consequently, companies may harbor reservations about revealing proprietary data, especially in the absence of compelling incentives. Thus, devising appropriate incentive mechanisms, complemented by robust privacy-preserving algorithms, becomes a pivotal concern in this domain. To address this, initiatives like the Digital Container Shipping Association (DCSA) are setting industry-wide standards to foster data sharing while protecting competitive interests [183].

7.5 Automotive industry

The advent of data sharing and exchange has been instrumental in advancing the automotive sector, especially in the realm of autonomous vehicle technology. The capability for real-time data exchange is essential, enabling vehicular and infrastructural intercommunications to enhance safety and traffic management. Notwithstanding the advantages, the integrity of Vehicle-to-Everything (V2X) communication systems is of paramount concern. It is imperative to establish robust incentive mechanisms to motivate all parties, including vehicle owners and manufacturers, to contribute data, while concurrently ensuring stringent cybersecurity protocols. Strategic partnerships between automakers and cybersecurity enterprises have culminated in the fortification of V2X systems, with Tesla and BMW emerging as pioneers in this technological integration [184].

Nevertheless, the exchange of data within the automotive industry raises significant privacy concerns. Such data could potentially disclose an individual’s location, daily patterns, and private conversations, especially with the presence of camera recording devices in vehicles. It is, therefore, essential to implement stringent data security protocols that bolster user confidence in sharing their information. Furthermore, sophisticated incentive algorithms can be employed to refine data sharing and exchange systems, ensuring the automotive industry’s advancement without compromising personal privacy.

7.6 Financial services

Data sharing has revolutionized the financial sector, enabling banks and fintech entities to offer bespoke services. For example, Plaid [185] is a platform that links bank accounts to financial apps, streamlining transactions and enhancing user experiences. Similarly, Yodlee [186] offers data aggregation and analytics services, providing insights to both consumers and financial institutions. Nonetheless, the sector is navigating a labyrinth of rigorous data privacy and security regulations. Incentive-driven data-sharing frameworks have the potential to catalyze the secure exchange of data, aligning with regulatory mandates such as GDPR and CCPA. The drive towards open banking, propelled by API technology, has facilitated the creation of novel applications that elevate service delivery, ranging from unified financial interfaces to advanced fraud detection mechanisms [187].

However, breaches of financial records pose considerable risks. Stakeholders are often reticent to share financial data without assurance of robust security measures. Additionally, within financial data-sharing platforms, participants may also be competitors from different institutions, naturally cautious about disclosing customer data without substantial incentives and algorithms to safeguard their clients’ information. Consequently, incentive-based data sharing and exchange platforms are of paramount importance in the financial sector, balancing competitive interests with collaborative imperatives.

8 Conclusion

In this comprehensive survey, we explored various incentive mechanisms and optimization algorithms related to data sharing and exchanging, offering foundational definitions and related concepts. We segmented the lifecycle of data sharing and exchanging into four distinct parts, presenting in-depth insights on associated works within each category. Among the challenges identified in the design of incentive mechanisms, two primary concerns stand out in the majority of incentive-based applications: the challenge of motivating different users, especially competitors, to engage in data sharing and exchanging; and the imperative to protect sensitive user data. Addressing the former, combining both monetary and non-monetary incentives appears to be an effective approach to stimulate user participation in the sharing process. For ensuring data security, the integration of tailored encryption algorithms and the use of distributed data storage methods, such as blockchain and federated learning, emerge as sound strategies. In scenarios where data quality is paramount, deep learning presents a potential solution to both identify fake users and anticipate user behavior. In our rapidly evolving digital landscape, the crafting of trustworthy, efficient, and economical incentive mechanisms for data sharing and exchanging holds significant importance across numerous domains.