1 Introduction

The concept of data monetization is new in the academic literature. However, practitioner firms like Gartner (Moore 2015), EY (EY Global 2018), Deloitte (Deloitte AI Institute 2021), KPMG (Mohasseb 2015), and academic institutions such as MIT (Barbara and Jeanne 2017), (Wixom, Cashing in on your Data 2014), (Moore 2015) have published several articles on the concept. As companies continue to generate massive amounts of data and handle existing historical data, they are turning to consulting companies to help them understand how to maximize the value of their data. Data monetization is the commercialization of data and information assets. Data monetization occurs when organizations exchange data and information assets for financial return or something with equivalent value (Buff et al. 2014). According to Prakash (2014), data monetization refers to the organization's ability to create additional revenue from existing data sources (internal and external), to create useful information, insights, and observations. Fortune 500 organizations such as Amazon, Facebook, and Apple focus on data-driven business models to develop new products and services and improve the customer experience while generating additional revenue streams. This concept focuses on data as a product and as such is managed that way (Marcinkowski and Gawin 2020).

Data monetization goes beyond selling raw or processed data, so the term “monetization” may be misleading. Beyond selling data directly for cash, data monetization occurs when organizations use data to create value-driven products, convert data and analytics into financial returns and other tangible benefits such as supplier-funded advertising and discounts, or avoid costs that would otherwise arise from operational inefficiencies.

Data monetization aims to reduce operational costs by leveraging internal data, to generate revenue through other models such as selling data and wrapping data around products and services, or both (Footnote 1).

The data monetization global market is estimated to grow from US$2.1 billion in 2020 to US$15.5 billion in 2030 (compound annual growth rate of 22.1%) (Kanhaiya et al. 2022). This will be driven by the increasing magnitude of generated data, awareness of data monetization, emerging technology opportunities and trends (Moore 2015) such as Business Intelligence and Analytics (BI&A), cloud computing, blockchain, Internet of Things (IoT), social networks and post-COVID-19 pandemic business approaches and strategies (Mordor Intelligence 2022).
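The growth figures quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the standard compound annual growth rate formula and a ten-year horizon (2020 to 2030); the function names `cagr` and `project` are illustrative and do not come from the cited report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Project a value forward by compounding `rate` once per year."""
    return start_value * (1 + rate) ** years


# Market size figures in US$ billions, as quoted in the text.
START_2020 = 2.1
END_2030 = 15.5
YEARS = 10

implied_rate = cagr(START_2020, END_2030, YEARS)
projected_2030 = project(START_2020, 0.221, YEARS)

print(f"Implied CAGR 2020-2030: {implied_rate:.1%}")
print(f"2030 value at a 22.1% CAGR: US${projected_2030:.1f} billion")
```

Both directions of the calculation agree with the quoted figures to within rounding, which suggests the three numbers in the estimate are internally consistent.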

Organizations wanting to develop a successful data monetization strategy will require a good understanding of the different data monetization models, their implications, opportunities, and limitations. Given the recency of data monetization as a discipline, efforts have been dedicated towards producing academic research across different areas; however, there is still work to be done. The first published Systematic Literature Review (SLR) was conducted by Liu and Chen in 2015 to raise awareness, within the academic community, of the potential of data monetization research. The authors contributed to understanding data monetization by providing use cases, guiding principles and a framework that combines Analytics 3.0 (advanced analytics) and BI&A 3.0 (mobile and sensor-based analytics) to better understand the subject. Analytics 3.0 is a stage where organizations realize measurable business impacts by applying a combination of traditional analytics, big data and powerful data gathering and analysis methods to a company’s offerings, embedding data smartness into the company’s products and services. This review was limited to a short period (i.e., from 2010 to 2015) and only considered articles that contain “data monetization” or “monetization” within their titles, thereby excluding papers that discuss data business models that are not explicitly titled “monetization”. In 2016, Thomas and Leiponen conducted an SLR on data commercialization based on 51 articles. They chose the term commercialization rather than monetization to differentiate the trade in data through commercial transactions. They identified monetization challenges and models beyond the internal use case for monetization. The identified models included data suppliers, data managers, data custodians, application developers, service providers and data aggregators. 
While the authors considered broader terms beyond “data monetization”, the review was limited to a short review period (i.e., from 2010 to 2013) and narrowly focused on monetization models through the lens of the players involved and therefore did not provide insights beyond a single construct (players). Hanafizadeh and Nik (2020), using an SLR, proposed a configuration model called “Data monetization configuration” (DaMoc) and tested it with a real application (i.e., Cardlytics). They identified the following global themes categorized into different layers: the monetization layer (theme consisting of trading model, goods, end consumer), the data refinement process layer (theme consisting of assets, data driven operation and value), the base layer (theme consisting of resources and supplies) and the accessing and processing restrictions layer (theme consisting of privacy, legal and ethical issues of data processing). Similar to Liu and Chen (2015), the review only considered articles that contained “data monetization” within their title thereby significantly limiting their scope. In addition, despite the longer publication period (i.e., from 2006 to 2018), the authors had a small paper sample of only 18 papers. Faroukhi et al. (2020a, b) used a longer publication period (i.e., from 2000 to 2020) and a larger paper sample (i.e., 97 papers) to conduct an SLR using a Big Data Value Chain (BDVC) framework. The BDVC framework describes steps for administering an organization’s data related processes. The steps range from data generation to data exposition. They further proposed two monetization models: a reduced data monetization model and a full data monetization model. The reduced data monetization model aims to monetize data only through the storage and visualization phases. The full data monetization model is more generic, expensive and supports monetization along the whole BDVC. 
Unfortunately, this study does not show how BDVC and data monetization are integrated, and it does not provide a reconciliation to existing literature.

While the abovementioned SLRs have contributed to a domain-based review on data monetization, there continues to be a notable lack of comprehensive work in the academic literature that seeks to systematically map the literature in a way that consolidates and structures knowledge in the domain for both the academic and practitioner community, highlights areas in need of further research, and guides the planning of new research while supporting claims of relevance and novelty.

This paper conducts a technology-empowered SLR. The data is electronically sourced from online scientific databases and then analyzed through a combination of VOSviewer (Footnote 2) and manual effort. The SLR provides a thorough evaluation of the literature, going beyond monetization models that focus on trading data or data monetization in the context of data value chains. Indeed, the study finds, identifies, selects, analyzes, evaluates, and systematically synthesizes work that focuses on different models of data monetization. The work described in this paper is not limited by industry and geography. It consolidates the existing literature, develops a content categorization and a corresponding conceptual framework, and provides a structure for exploring specific research areas in data monetization that cater to a variety of stakeholders such as academic researchers, business managers and decision makers. The paper further provides practitioners with insights into existing data monetization models, which may serve as a starting point for various data monetization initiatives.

The literature review seeks to answer the following research question: What are the main subjects, challenges, and opportunities of data monetization in the academic literature and what are the corresponding implications for practitioners and academic researchers?

To achieve this objective, the study describes the areas of data monetization, maps the literature on the subject, proposes a categorization and corresponding conceptual framework highlighting the literature produced to date, proposes new research questions capable of increasing the quality and relevance of the academic literature, and proposes managerial implications for organizations.

The paper is structured as follows. Sect. 2 presents the conceptual background and Sect. 3 defines the research method, followed by a discussion of the findings, managerial implications and research agenda in Sect. 4. Section 5 presents the conclusion, limitations, and considerations for future work.

2 Conceptual background

2.1 Data

Data is a public good that is consumed by people but whose supply is not affected by people’s consumption (non-rivalrous) (Thomas and Leiponen 2016). Non-rivalrous means that multiple actors can exploit a single dataset: it is replicable, and using it does not make its value disappear. However, once the data is revealed, its value drops. Data value varies by the way one uses it, combines it and whether one can make it available at the right time (Parvinen et al. 2020). Most organizations now acknowledge data as a strategic asset, and many practitioners have gone as far as calling data, not oil, the most valuable resource in the world, thereby making data one of the most important assets for digital transformation.

As the world becomes more digital, the volume of data continues to increase, and the notion of big data is becoming a widespread phenomenon. Big data is a major enabler of data monetization and has quickly gained popularity among industries that own huge data assets. Gartner in its glossary defines big data as a “high volume”, “high velocity”, “high variety” information asset that requires a cost-effective, innovative form of information processing that enables enhanced insights, decision making and process automation (Footnote 3). According to Faroukhi et al. (2020a, b), “high volume” indicates large amounts of generated data that cannot be processed through traditional processing and storage means. “High velocity” indicates the speed at which data can be generated. “High variety” refers to the characteristics of data that come in different formats, including structured formats such as traditional relational database values, semi-structured formats like XML, and unstructured formats, which could include email and IoT data. While the benefits of big data are real, its characteristics are often constraining. Big data often has data quality issues, which make its use challenging. It must be well arranged and free from gaps and erroneous records to enable efficient data-based decisions (Marcinkowski and Gawin 2020). In addition, the violation of data privacy leads to several ethical and legal issues (Faroukhi et al. 2020a, b).

The analysis of this data is now so widespread that it has become one of the four technology trends of the decade, together with mobile, cloud, and social business (Saynajoki et al. 2017). Organizations are sitting on large amounts of data (historical and transactional) without knowing how to maximize its value. The new data economy has led to several data use cases, and one of them is monetization. Since the terms data monetization and big data monetization are used interchangeably throughout the academic literature, this paper will use data to mean both big data and data.

2.2 Data monetization

Data monetization is an evolution from Business Intelligence (BI), which has become a strategic tool for many organizations in the last two decades. BI is the process, strategy and technology involved in using business information by organizations for data-driven analysis to extract usable and shareable information. During the 2000s, data was used for descriptive analytics, which allowed organizations to extract, process and aggregate data for internal use within the organization (Zakaria et al. 2020). With the proliferation of digital technologies such as big data analytics, IoT, Cloud, Machine Learning, and Artificial Intelligence (AI), data monetization has become a formal discipline and models beyond traditional BI have been developed. This is demonstrated by Liu and Chen (2015), who developed an analytics framework for data monetization by adapting two evolution models: the evolution of Business Intelligence and Analytics (BI&A) and the Analytics 3.0 framework.

With this, it can be said that organizations have been performing internal data monetization by improving internal operations through traditional analytics capabilities and data mining approaches to improve their bottom line returns (Alfaro et al. 2019), meet business needs and solve internal problems (Najjar 2013). In the literature, Alfaro et al. (2019) investigated the monetization journey of BBVA (a global financial group) which improved pre-existing internal monetization activities, and pursued new approaches by establishing a data science center of excellence and getting the center to collaborate with the business on data monetization projects. Marcinkowski and Gawin (2020) shared insights from a facility that evolved from “service based” to “data driven”, which led to a strong and loyal customer base as well as additional collaborative opportunities. Internal data monetization focuses on reducing operational cost and improving business operations (existing processes and products). It informs strategic business decisions and refines business processes by inputting data into the management process (Schroeder 2016). Internal monetization is usually the first stage for organizations as they deal with limited organizational and technological resources (Lange et al. 2021).

Data monetization as a concept has been heavily studied by MIT’s Center for Information Systems Research (MIT CISR). One of their very first case studies was Owens and Minor (OM). In the early 2000s, OM, a distributor of healthcare supplies, created information offerings based on data from products and services distributed from hundreds of suppliers to several thousand hospitals. These offerings, called “Spend Analytics”, were sold to hospitals and supplier organizations that needed to make better decisions about procurement and product distribution/market penetration respectively. OM later evolved by creating a separate solution that focused on its information-based offerings (Wixom, Cashing in on your Data 2014). This was an early case of external data monetization. By 2014, MIT CISR started to produce more data monetization research in the form of research briefings.

Several papers in the literature have tackled external data monetization from different perspectives. In terms of holistic case studies, Najjar and Kettinger (2013) studied the data monetization journey between retailers, suppliers, and a supplier portal. The paper identified three pathways to monetization and followed the third pathway, where the retailer built a technical data infrastructure (supplier platform) and leveraged the analytics capabilities of the suppliers. Parvinen et al. (2020) studied 24 companies to provide a wide-ranging view of data-monetization practices in large and middle-sized companies. They identified three models: selling data, selling analysis, and selling data-based services. They created a matrix of these models against customer types (current customers, actors in the current value chain, anyone), devised a path using this matrix and came up with steps on the path to monetization. De Reuver et al. (2015) studied the case of 3cixty (a data platform for mobile context-aware travel applications), which developed a multi-sided platform to serve multiple users (advertisers, app developers, government organizations and end-users). The objective of a multi-sided platform is to facilitate the transactions between different user groups such as consumers and app developers. The platform provides apps and services for city visitors. The paper explored different revenue models from the end-user perspective. The authors discovered that the more willing a user is to share data, the less likely the user is to be willing to pay for the app. De La Vega et al. (2018) introduced data monetization in the context of IoT based on the study of two companies: Company A provides smart city services and Company B owns a smart building. They propose a data marketplace with a peer-to-peer architecture powered by blockchain to enforce trust and non-repudiation among peers.

Another traditional business model that has influenced data monetization is a two-sided market, defined as a trading platform dealing with two distinct user groups that provide each other with benefits. The trading platform involves a data broker, a data provider, and a data consumer. The data broker acts as an intermediary which connects two or more market participants via the platform and simplifies their interactions. Large companies operate their own data platforms to manage regular data interactions with third parties, while smaller companies tend to exchange via neutral platforms (Spiekermann 2019). The two-sided platform is a component of a data marketplace where firms and individuals can buy, sell or trade second or third-party data. Examples of data marketplaces include: Salesforce’s Data Studio, Oracle’s BlueKai, Adobe’s Audience Marketplace (Sinha 2019) and Snowflake Data Marketplace. The two-sided markets were studied by Agarwal et al. (2019), Bataineh et al. (2020a, b) and Saleh et al. (2021).

With nascent digital technologies such as Blockchain, data monetization continues to evolve. Al-Zahrani (2020) proposed a subscription-based data sharing model that leverages not only blockchain but also the Data as a Service (DaaS) and cloud data center concepts. In this model, users subscribe to a data provider for a specific period and pay for data access based on the selected subscription plan. Javaid et al. (2020), Abubaker et al. (2022), Madinen et al. (2022), and Khezr et al. (2022a, b) leveraged IoT with blockchain technology to provide trustful trading through automatic review systems for monetizing IoT data using Ethereum smart contracts.

3 Research method

A systematic literature review (SLR) is a rigorous research methodology, not just to gather, organize and analyze existing research on the subject but to help researchers develop evidence-based guidance for research in their area of study. This SLR follows Kitchenham’s 2004 approach for conducting a systematic literature review, which consists of three stages: (1) identification and selection of studies; (2) data collection and extraction; and (3) data synthesis and interpretation.

Scopus, Web of Science Core and ABI/INFORM Global were selected as the databases of choice. The search strategy was developed by first identifying key terms and synonyms from the research topic (“data”, “monetization”, and “model”), which were then translated into Boolean queries.

The initial search was conducted in Scopus to assess the validity of the query at a high level. From the results, one paper was selected based on how close its title was to the research topic: Advancing data monetization and the creation of data-based business models (Parvinen, et al. 2020). From this paper, additional terms were identified, such as “information model”, “data business model”, and “data commercialization”. This approach, called Pearl growing (also known as ‘Citation mining’ or ‘Snowballing’), helps to ensure that relevant literature is identified. This approach was also leveraged to extract papers manually.

The query results were extracted into Mendeley (Footnote 4) to support export to Covidence (Footnote 5). All papers from the three databases were exported into Covidence for de-duplication. The abstracts were initially screened in Covidence to exclude all papers that are either not relevant or do not meet the inclusion criteria by identifying papers as either “Yes”, “No” or “Maybe”. The results were exported back from Covidence to Mendeley for the papers to be extracted for a full text review.
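To illustrate the de-duplication step, the following is a minimal sketch of how records exported from several databases might be merged. It is not a description of how Covidence works internally: the matching rule (same DOI or same normalized title), the record fields, and the example records are all hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each DOI or normalized title."""
    seen_keys: set[str] = set()
    unique: list[dict] = []
    for record in records:
        # A record is identified by its title and, when present, its DOI.
        keys = {"title:" + normalize_title(record["title"])}
        if record.get("doi"):
            keys.add("doi:" + record["doi"].lower())
        if keys & seen_keys:
            continue  # already exported from another database
        seen_keys |= keys
        unique.append(record)
    return unique


# Hypothetical export records; titles and DOIs are placeholders.
records = [
    {"source": "Scopus", "title": "A Hypothetical Study of Data Monetization",
     "doi": "10.0000/example.1"},
    {"source": "Web of Science", "title": "A hypothetical study of data monetization.",
     "doi": ""},
    {"source": "ABI/INFORM", "title": "Another Hypothetical Paper",
     "doi": "10.0000/example.2"},
]

unique = deduplicate(records)
print([record["source"] for record in unique])
```

Matching on either key is a deliberate choice here: the same paper is often exported with a DOI from one database and without one from another, so title matching catches cases that DOI matching alone would miss.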

In addition to the papers identified by the initial query, four papers were manually identified through “backward snowballing”.

The query used is broken into three blocks. The first block captures “data” related terms, and the second block captures “monetization” related terms. These two sets of words are separated by a proximity operator to capture articles where the terms appear a few words apart. The third and last block of words captures “model” related terms. The query is shown below:

((data OR insight OR information OR "digital business" OR "business Intelligence") W/4 (moneti* OR commerciali* OR "revenue generat*")) AND (model OR strategy OR approach* OR offering)
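The three-block structure of the query can also be assembled programmatically, which makes it easier to adapt the term lists. The sketch below is illustrative only: the helper names are hypothetical, and the `W/n` proximity operator is the Scopus form, so other databases may need a different operator passed in (for example `NEAR` for Web of Science).

```python
# Term blocks taken from the query in the text.
DATA_TERMS = ["data", "insight", "information",
              '"digital business"', '"business Intelligence"']
MONETIZATION_TERMS = ["moneti*", "commerciali*", '"revenue generat*"']
MODEL_TERMS = ["model", "strategy", "approach*", "offering"]


def or_block(terms: list[str]) -> str:
    """Join terms into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query(data_terms: list[str], monetization_terms: list[str],
                model_terms: list[str], distance: int = 4,
                operator: str = "W") -> str:
    """Combine the blocks: (data <proximity> monetization) AND model."""
    proximity = f"{operator}/{distance}"
    first = f"({or_block(data_terms)} {proximity} {or_block(monetization_terms)})"
    return f"{first} AND {or_block(model_terms)}"


query = build_query(DATA_TERMS, MONETIZATION_TERMS, MODEL_TERMS)
print(query)
```

With the default arguments the function reproduces the query string given above, so any change to a term list can be compared directly against the original search.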

3.1 Exclusion criteria

Results prior to 2013 were excluded given the recency of the topic. The practice of data monetization, although common since 2000 (Wikipedia 2014), has only been covered in published research since 2013. This is evident from the initial results of our query, where we searched the database to identify when papers on “data monetization” were first published.

3.2 Inclusion criteria

The systematic literature review prioritizes peer-reviewed journal and conference papers. Selected papers should explicitly describe at least one model for data monetization, and only work published in the English language was considered. The paper selection steps are represented in the PRISMA results flow diagram in Fig. 1.

Fig. 1 Systematic review selection and review procedure adapted from PRISMA (2009)

3.3 Content analysis

To answer the research question, both thematic and co-occurrence analyses were performed. Given that the literature on data monetization is nascent and the final number of articles selected for analysis was only 54, the thematic analysis was performed manually and supported by a tool-based co-occurrence analysis.

Firstly, the authors independently identified key themes from all 54 papers. Secondly, to aid the development of key themes, the authors developed a term co-occurrence map based on the titles and abstracts of all 54 papers (see Fig. 2) using VOSviewer’s (Footnote 6) enrichment technique called co-word network analysis. Co-word analysis assumes that words that frequently appear together have a thematic relationship with one another. Thematic analysis is an analysis that requires researchers to systematically extract qualitative data (e.g., text) from a collection of documents (e.g., articles, interviews) for identifying, analyzing, and reporting on a theme. Thematic analysis empowers researchers with autonomy in dealing with the themes that manifest themselves from the research data (Naveen et al. 2021), (Marc et al. 2022).

Fig. 2 A network visualization of terms based on keyword co-occurrence in 54 data monetization articles using full counting and the 75 most relevant terms. The size of the nodes indicates the frequency of the keyword. In this case, a minimum threshold of 4 is set as a frequency. The link between nodes indicates the similarity of keywords, i.e., the closer the node, the greater the similarity. Red (Cluster 1) = challenges, blue (Cluster 3) = business model, and yellow (Cluster 6) = marketplace, light blue (Cluster 4) and purple (Cluster 5) = revenue and players, green (Cluster 2) = platform

Each color in Fig. 2 represents a thematic cluster, and each node in the network represents an entity (e.g., article, author, country, institution, keyword, journal); in the case of Fig. 2, a node is a keyword in the title and abstract of the 54 papers. The size of the node indicates the number of times that the keyword is used. The bigger the node, the greater the occurrence of the keyword. Each circle in Fig. 2 represents a term that appears at least four times in the titles and abstracts of all 54 papers. Setting the criterion that keywords are included when they have appeared in a minimum of four articles helps curate a pragmatic set of clusters for network visualization based on how prolific or prominent the keywords in the clusters are in the corpus (Marc et al. 2022). The link between the nodes represents the keywords that co-occur or occur together. The thickness of the link signals the number of times that the keywords co-occur or occur together (Satish et al. 2022).
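The node and link quantities described above can be illustrated with a small co-occurrence count. This is a simplified sketch and not VOSviewer’s algorithm: it counts each term once per paper (the figure caption reports full counting), it omits term extraction, relevance scoring and clustering, and the example terms and the `min_docs` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents: list[set[str]], min_docs: int = 4):
    """Return node sizes and link weights for terms above a threshold.

    documents: one set of terms per paper (from its title and abstract).
    min_docs:  minimum number of papers a term must appear in to be kept.
    """
    # Node size: number of papers in which each term appears.
    doc_freq = Counter(term for doc in documents for term in doc)
    kept = {term for term, count in doc_freq.items() if count >= min_docs}

    # Link weight: number of papers in which both terms appear together.
    links: Counter = Counter()
    for doc in documents:
        for pair in combinations(sorted(doc & kept), 2):
            links[pair] += 1

    nodes = {term: doc_freq[term] for term in kept}
    return nodes, dict(links)


# Hypothetical term sets for four papers; a lower threshold suits the toy size.
papers = [
    {"data", "privacy", "platform"},
    {"data", "privacy"},
    {"data", "platform"},
    {"data", "privacy", "blockchain"},
]

nodes, links = cooccurrence(papers, min_docs=2)
print(nodes)
print(links)
```

In this toy corpus the term “blockchain” falls below the threshold and is dropped, while the pair (“data”, “privacy”) receives the heaviest link, which mirrors how larger nodes and thicker links are read in Fig. 2.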

The network map produced six clusters. The authors then reconciled the themes identified from the manual exercise conducted independently. The explanation for each cluster was developed and the themes were identified manually, wherein keywords in each cluster were organized to convey a coherent narrative that explains the essence and scope of the similarities within a cluster. As a result, the map was further refined, re-categorized and coded, as can be seen in Appendix A: Themes mapped to clusters and paper count, which shows the number of articles that discuss each concept. Where discrepancies occurred between the authors, there were extended discussions until a mutual agreement was reached. As a result, three final categories were identified: Data Monetization Strategy (DMS), Data Monetization Infrastructure (DMI) and Data Monetization Challenges (DMC).

DMS was extracted from several clusters. Clusters 3 and 6 (blue and yellow networks) were refined to form the data monetization models, while Cluster 5 (purple network) and Cluster 4 (light blue network) were for the players and the revenue models. DMI was identified based on Cluster 2 (green network) with terms such as Internet, platform, IoT, blockchain, and device. DMC was identified based on Cluster 1 (red network) with terms such as challenge, gap, security, and privacy. Using this as a foundation, the authors included the additional themes that were identified and agreed upon from the manual exercise.

Hence DMS includes operating model, players, and revenue model and DMI includes cloud, blockchain, sensors and IoT. For DMC related terms, challenges related to data monetization were extracted from all 54 papers. Using a word art, the top seven challenges were identified as security, privacy, pricing, contract design, data quality, beliefs, and data skills. The total number of papers that discuss the identified themes can be found in Appendix A: Themes mapped to clusters and paper count. The breakdown of papers by themes can be found in Appendix B: Categorization of papers by themes identified.

4 Findings and proposed research agenda

4.1 Basic characteristics of the literature

A total of 54 papers were collected for data extraction and synthesis. In Fig. 3, we illustrate the trend of publications, with a significant spike in 2020 and 2021 accounting for about 50% of the total publications. In Appendix C: Overview of studies, year, journal and paper methodology, we illustrate the breakdown of papers by journal and methodology. About 35% of publications were made in an IEEE/data related journal or conference, with about 20% leveraging the case study approach, 23% literature review, 14% mathematical approach and the other 40% some form of qualitative and exploratory approaches. Given that the query was last executed in Oct 2022, papers published after that date are not captured.

Fig. 3 Count of papers by year and article type

In the following subsections, and based on the network visualization map generated through VOS Viewer and a review of all 54 papers, we answer the research question: What are the main subjects, challenges, and opportunities of data monetization in the academic literature and what are the corresponding implications for practitioners and academic researchers? We do so by identifying categorization and subcategorization areas and proposing areas for further research at the categorization level. We also point to the managerial implications of our findings.

4.2 Data monetization strategy (DMS)

The data monetization framework presented in Fig. 4 was devised to visualize the main components of a data monetization strategy. The framework includes the models to the left, the players to the right, and the revenue models to the top. This framework builds upon existing classifications in the literature and enhances them by introducing the players organized by the value they add to the monetization ecosystem, as well as the overarching revenue models. The mapping of the revenue models to the monetization players is represented in Table 1. In addition, the studies that discuss the DMS category and subcategories are shown in Appendix B: Categorization of papers by themes identified.

Fig. 4 Data Monetization Framework. Adapted from Barbara and Jeanne (2017), Susan Moore (2015), Parvinen et al. (2020) and Zakaria et al. (2020)

Table 1 Mapping of revenue models to monetization players. Adapted from Spiekermann (2019), Kemppainen et al. (2018), Thomas and Leiponen (2016)

In Sects. 4.2.1, 4.2.2 and 4.2.3 the authors describe the findings, and in Sects. 4.2.4 and 4.2.5 the authors present managerial implications and a research agenda respectively.

Figure 4 is the first attempt to visualize the monetization models identified from the literature. The categorizations by Wixom (2014) (data wrapping, bartering, and selling) and Parvinen et al. (2020) continue to be a basis in the data monetization literature. We contribute to the literature by enhancing the existing models with critical elements such as the revenue model and players based on the value generated within the ecosystem.

4.2.1 Operating model (OM)

The operating model is depicted to the center left of the framework in Fig. 4. The first dimension is the type of data monetization (internal, indirect, and external data monetization). Internal data monetization is when a company has data assets, extracts value from them, and does not wish to share those assets with other parties. Internal data monetization existed long before the concept was introduced. Any company using data to improve its performance can be considered to be engaged in internal data monetization.

Internal data monetization seeks to reduce operational cost, improve business operations, and improve the organization’s reputation (Alfaro, et al. 2019). Marcinkowski and Gawin (2020), Alfaro et al. (2019), Najjar (2013), Schroeder (2016), Lange et al. (2021), Firouzi et al. (2020), and Quach et al. (2022) all describe an internal data monetization model. Examples of internal data monetization include efforts to help banks optimize the placement of bank branches, the use of data from sensors in buildings to complement service-driven operations and decrease property utilization cost, and the development of business intelligence and analytical capabilities to meet business needs and solve internal problems.

External data monetization is when a company shares its data assets with other parties such as suppliers and customers. External monetization can take various forms. Susan Moore (2015) proposed direct and indirect data monetization types. In direct monetization, the trade in data is through commercial transactions that involve monetary rewards. The indirect method uses data, refines it, and produces information assets, services or products that can be sold. Indirect monetization can, for instance, help identify new customer needs and create new revenue opportunities. Direct monetization involves selling data directly and indirect monetization involves data wrapping and bartering.

Barbara and Jeanne (2017) defined three business models within external data monetization, namely data wrapping, bartering, and selling. Data wrapping involves wrapping core offerings with an analytics feature. Here, the organization makes money by distinguishing its offerings with features and experiences. An example is when commercial banks create a financial tool for customers that automatically categorizes their transactions into common budgeting categories (Alfaro, et al. 2019), (Firouzi, et al. 2020), (Quach et al. 2022). In a 2014 poll by MIT CISR, 73% of executives chose wrapping as the data monetization approach that offers the greatest future potential for their companies (Wixom, Cashing in on your Data 2014). The desired outcomes of wrapping include increased value, market share, and product price, as well as cost-effective services.

Data bartering involves providing data in exchange for non-monetary rewards such as reports, favorable terms, free services, benchmark metrics, or analytics software (Wixom, Cashing in on your Data 2014). This is popular amongst social media companies such as Facebook that provide free access to their platforms in exchange for user data. The key concern with bartering is that organizations or individuals may not recognize the true worth of the information they are giving up. Therefore, data regulations need to be established to protect both parties.

Data selling is the most common form of external data monetization. Common methods involve retailers selling Point of Sale (POS) transaction data to consumer research firms like IQVIA, Kantar, Nielsen, etc. Given that selling raw data poses privacy concerns and questions contractual obligations, companies are developing alternate revenue streams by selling information from reports and analytics (Wixom, Cashing in on your Data 2014). Data selling can take various forms, from a marketplace model where data providers can offer their data (Grubenmann et al. 2017), (Faroukhi et al. 2020a, b), (Rao and Ng 2016), (Kemppainen, et al. 2018), (Spiekermann 2019), (Firouzi, et al. 2020), (Lange et al. 2021); to watching targeted ads for rewards (Yu et al. 2020), (Trzaskowski 2022); to subscription based data sharing models (Al-Zahrani 2020) or merchant models where a third party collects data from its owners, processes it and sells the information to consumers (Saleh, et al. 2021).

As part of data selling, Parvinen et al. (2020), Firouzi et al. (2020) and Calvin et al. (2021) identify three business models that align with the information offering consumption path identified by Buff et al. (2014). These are described below.

Selling data (data offering) This involves selling raw and prepared data directly.

Selling analysis (insights offering) This involves selling analysis and restricting access to the original data. Given that data doesn’t change hands, privacy and security concerns are mitigated. The less versatile use of the analysis given that buyers cannot combine it with other data sources brings to light questions around value. Criteria such as data quality and business context relevance also play a critical role in determining the value of insights (Rix et al. 2021a, b).

Selling data-based services (action offerings) This involves creating a new service that provides customers with relevant signals on the business environment, helps scale how data is delivered using multi-sided business models, and helps customers act on insight. An example of a business model that provides customers with relevant signals on the business environment is Facebook’s sale of advertising space, which enables publishers to target specific user groups based on their user data (Matsakis 2018). Business models that help customers act on insights could include consulting, onsite support, process automation and process outsourcing. This is similar to services provided by major management consulting firms such as KPMG, Deloitte, and PwC.

Faroukhi et al. (2020a, b) introduced data monetization from a Big Data Value Chain (BDVC) perspective. The BDVC describes steps that aim to administer organizations’ data related processes. These steps are data generation, acquisition, preprocessing, storage, analysis, visualization and exposition. Given the importance of the data lifecycle, the steps are mapped against the data consumption path (selling data, selling analysis, and selling data-based services). Data monetization can occur at any step. Selling raw data occurs in the initial value chain steps (data generation to data storage). As data becomes more refined, the rest of the steps facilitate selling analysis.
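The mapping between value chain steps and offerings can be illustrated with a minimal Python sketch. The step names follow the list above; the cut-off at the storage step follows the statement that raw data is sold in the steps from data generation to data storage. The names BDVC_STEPS, Offering and offering_for_step are hypothetical and are not taken from Faroukhi et al. (2020a, b), and the sketch deliberately leaves out data-based services because the text does not tie them to specific steps.

```python
from enum import Enum

# Value chain steps in the order listed in the text.
BDVC_STEPS = [
    "generation", "acquisition", "preprocessing", "storage",
    "analysis", "visualization", "exposition",
]


class Offering(Enum):
    SELLING_DATA = "selling data"
    SELLING_ANALYSIS = "selling analysis"


def offering_for_step(step: str) -> Offering:
    """Return the offering that a value chain step most directly supports."""
    index = BDVC_STEPS.index(step)  # raises ValueError for an unknown step
    # Assumption: storage is the last step at which the data is still sold raw.
    if index <= BDVC_STEPS.index("storage"):
        return Offering.SELLING_DATA
    return Offering.SELLING_ANALYSIS


for step in BDVC_STEPS:
    print(f"{step:>13} -> {offering_for_step(step).value}")
```

An organization could use such a table to check which offerings its current position in the value chain already supports.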

4.2.2 Players (P)

The data monetization ecosystem consists of a variety of players as identified to the right of Fig. 4. Players can be individuals, organizations, or systems. To make an effective external data monetization strategy, a data consuming party and a data providing party must exist. A data monetization player can take on a variety of roles which adds complexity to the ecosystem. The roles could range from public cloud platform providers to third party brokers, facilitators, and consultants (Najjar 2013), (Rix et al. 2021a, b). Thomas and Leiponen (2016) describe data monetization players as models of value creation within data ecosystems. They include data suppliers, data managers, data custodians, application developers, service providers and data aggregators. Faroukhi et al. (2020a, b) proposed to build data business models around four pillars, namely data users, data providers, data facilitators and data aggregators. Rix et al. (2021a, b) proposed 10 roles for the data ecosystem. As per the literature, the following are data monetization players and the value they provide to the monetization ecosystem.

Data providers These are the originators/owners of the data (Grubenmann et al. 2017) (Spiekermann 2019) (Thomas and Leiponen 2016) (Rix et al. 2021a, b). They can also be called data suppliers as they generate the data leveraged in the ecosystem. It could be smart phone users or individuals having some personal data to sell (Bataineh et al. 2020a, b). This could include user generated, IoT sensor generated or company data. Data providers may also take on any of the other roles defined below. For a multi-sided market, stability of the data providers in the ecosystem and the nature of the domains and parties involved in collecting and sharing data determine the success of this market (Bataineh et al. 2020a, b), (Parvinen et al. 2020).

Data aggregators They combine the data and provide users with aggregated services and data, thereby enabling them to produce a targeted advertising business model. They also perform data crawling and visualization. The most common data aggregators are price comparison services such as the travel search engine Kayak (Faroukhi et al. 2020a, b), (Gopalkrishnan 2013), (Hartmann 2016). There are also aggregators such as Meta, Google, and Twitter where the user does not pay to use the services, but the aggregator monetizes the service through advertising (Robinson 2017).

Data managers They improve the data. These are organizations that catalogue, clean, and parse data that is not in an easily usable format or improve the value of the data with additional context. They add value to data by improving the efficiency, interpretability, and the overall functionality of the data (Thomas and Leiponen 2016), (Klaus 2011), (Saynajoki et al. 2017).

Data regulators They define and help enforce data standards. These organizations recommend and ensure the security, privacy, and ethical use of data. They define standard data technologies and standardization of data transfer (Faroukhi et al. 2021).

Data banks They are the custodians of the data. Data banks are organizations that enable the reuse and resale of data by providing a ‘trusted’ infrastructure (Thomas and Leiponen 2016), (Schwab 2011). According to Saynajoki et al. (2017) data banks orchestrate external data distribution between companies. They also reassure end users and data consumers through provenance validation and certification and auditing services to ensure that the integrity and the quality of data is maintained from sourcing to use (Perrin 2013).

Data brokers They collect and bundle data for prospective buyers. The broker is a platform equipped with the needed infrastructure to store and share data. They provide services that enable providers and consumers to perform data selling and buying transactions (Bataineh et al. 2020a, b), (Lin et al. 2020), (Schroeder 2016). They can be referred to as orchestrators (Rix et al. 2021a, b). Examples of such platforms are Snowflake and the Azure data marketplace. Lin et al. (2020) proposed that adopting brokers has three advantages. First, brokers can continue the trading process when both sides are offline. Second, brokers facilitate a refund thereby protecting the rights of participants. Third, parties do not need to reveal sensitive information because of a decentralized architecture. Brokers resolve privacy issues as they are the bridges that link data providers and consumers.

Data facilitators They are the typical third parties with the required capabilities to share data with data consumers. Facilitators do not own the data but provide services such as data cleaning, data analytics and consulting services (Najjar 2013). Data facilitators could correspond to a technical platform based on tools for data collection, integration, processing, storage, analysis, and visualization (Faroukhi et al. 2020a, b). They provide the physical architecture and the provision of outsourced analytics services.

Tool providers Schroeder (2016) and Calvin et al. (2021) identified this player category. Hardware and software infrastructures are a significant facilitator of data monetization. From IoT to cloud to on-premises tools, the producers of these tools have a significant contribution to make as they enable all players in the monetization ecosystem. Examples include but are not limited to Microsoft, AWS, and Google, which provide both software and hardware solutions. Data brokers, facilitators and tool providers facilitate transactions within the data monetization ecosystem.

Service providers Service providers develop new services for data, distinct to the resale, analysis or repackaging of data or the development of specific applications (Perrin 2013), (Saynajoki et al. 2017).

Application developers They are organizations and software entrepreneurs that design, build and sell applications that enable data monetization (Hammell 2012), (de Reuver et al. 2015). They design and build tools to analyze data (Saynajoki et al. 2017).

Consultants They demonstrate the value of data monetization to data providers and support them in developing strategies (Rix et al. 2021a, b). They can also provide data sourcing and consultation services to help buyers find the right data according to their use case (Luch Kelly 2022). The complexity of the data monetization ecosystem brings about new questions such as what are the use cases for data monetization? What is the best and most scalable architecture that supports a chosen strategy? How can the organization’s structure be designed to successfully deploy a chosen strategy? What are the privacy and ethical considerations to be made? How can existing processes be optimized based on findings from the data? etc. Answering these questions and guiding organizations through their data monetization strategy and execution has brought about a myriad of services offered by consulting firms such as IBM, KPMG, Deloitte, etc. Service providers, application developers and consultants enrich the data monetization ecosystem with their products and services.

Data consumers These are the parties that need and consume the data. They are individuals, businesses or systems that use collected data from various sources such as product usage, behaviors, preferences, Internet activities, IoT, etc. (Faroukhi et al. 2020a, b) and are willing to buy real-time data streams (Lin et al. 2020). Data consumer requirements vary in terms of the type, quality, and amount of data based on their scope and the applications they need (Bataineh et al. 2020a, b). Al-Zahrani (2020) refers to these players as data subscribers.

4.2.3 Revenue model (RM)

The data monetization ecosystem consists of several revenue models as identified at the top of Fig. 4. The identified operating models leverage one or more revenue models, which serve to provide compensation to the players within the ecosystem. A revenue model is seen as one fee or a combination of fees for different players (Kemppainen et al. 2018). The revenue model determines how players will be charged/rewarded for the value they receive/provide in the monetization ecosystem. Data monetization models can use a combination of revenue models to achieve the desired objectives. Spiekermann (2019) using eight revenue models and Kemppainen et al. (2018) using 14 revenue models created a revenue and price model taxonomy which serves as a conceptual starting point. For any revenue model, users’ willingness to pay for and share personal data is critical to success. De Reuver et al. (2015) discovered that the more a user is willing to share data, the less likely they are to pay for an application. As per our literature review, the following consolidated data monetization revenue models have been identified.

Free of charge A strategy to attract users and build a community. This is sometimes referred to as a freemium model, where businesses give away basic data to encourage further engagement and charge a premium for access to more detailed data (Thomas and Leiponen 2016), (de Reuver et al. 2015). Popular music and movie streaming platforms such as YouTube, Spotify, Alexa, and Hulu music give users access to a broad selection of music and movies, attracting a large user base while restricting the services offered to non-paying members.

Advertising Revenue is mainly generated from advertisers. The competitive advantage for models relying on advertising as the main source of revenue lies in platforms that enable better ways to gather and evaluate information (Tucker 2014). Kemppainen et al. (2018) propose that when adopting a human centered approach to personal data management, a no-advertising policy serves as the foundation of a revenue model. The no-advertising model reflects the changing attitudes towards personal data usage, individual rights to privacy and companies’ need to find alternative revenue models.

Subscription (membership) Several subscription-based models exist in the literature. Subscriptions can be either free of charge or fee based and are renewed periodically. Organizations use package tiers in which raw data forms the basic tier while more refined, aggregated data forms the top tier (Najjar 2013), (Spiekermann 2019). In an advertising and subscription-based revenue model, the key drivers of revenue are the number of users and their willingness to pay (Kemppainen et al. 2018).

Pay-per-use A price is charged per unit of data consumed with this unit needing to be defined. This option is popular for Application Programming Interface (API) access.

Transaction based model Consists of a transaction fee that is time or volume based. The platform operator facilitates data transactions between the stakeholders (Kemppainen et al. 2018).

Service based model Consists of a service fee, a connection fee, and a membership fee. The platform operator generates revenue by offering value-adding services on the platform or charging for the usage of the platform (Kemppainen et al. 2018).

Licensing Data marketplaces often provide standardized licensing models as well as regulations regarding data access and usage (Spiekermann 2019).
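To make the difference between the fee structures concrete, the following Python sketch computes the charge for the same usage under a pay-per-use, a subscription, and a volume-based transaction model, and then under a combination of the three. All names (Usage, pay_per_use, subscription, transaction_based, combined) and all prices are hypothetical; amounts are kept in integer cents, and the unit of data is assumed to be a record, since the text notes that the unit must be defined case by case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    records: int       # units of data consumed (assumed unit: one record)
    transactions: int  # data transactions facilitated by the platform
    months: int        # subscription periods covered


def pay_per_use(usage: Usage, cents_per_record: int) -> int:
    """Charge a price per unit of data consumed."""
    return usage.records * cents_per_record


def subscription(usage: Usage, cents_per_month: int) -> int:
    """Charge a periodic fee regardless of consumption."""
    return usage.months * cents_per_month


def transaction_based(usage: Usage, cents_per_transaction: int) -> int:
    """Charge a volume-based fee per facilitated transaction."""
    return usage.transactions * cents_per_transaction


def combined(usage: Usage, cents_per_record: int, cents_per_month: int,
             cents_per_transaction: int) -> int:
    """Combine several revenue models into a single charge."""
    return (pay_per_use(usage, cents_per_record)
            + subscription(usage, cents_per_month)
            + transaction_based(usage, cents_per_transaction))


usage = Usage(records=12_000, transactions=30, months=3)
print("pay-per-use:", pay_per_use(usage, 2))
print("subscription:", subscription(usage, 49_900))
print("transaction:", transaction_based(usage, 150))
print("combined:", combined(usage, 2, 49_900, 150))
```

Which model yields the most revenue depends on the usage profile: a heavy consumer of records pays more under pay-per-use, whereas a light consumer pays more under a flat subscription.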

4.2.4 Managerial implications of DMS

Today, due to access to a wealth of information, almost every business can aim to be a data business. For an organization to effectively develop a data monetization strategy, there are key considerations to be made. The first step is to assess the current ecosystem to identify opportunities, gaps and risks. Organizations need to understand where they are in the monetization journey and where they want to end up. This evaluation requires careful consideration of the data asset inventory, characteristics of the data that are central to realizing benefits, and metrics for assessing the value of data and return on investment (Quach et al. 2022). In addition, organizations need to determine the value of their data (as not all data is of equal value), what insights it can produce, who would be interested (entities internal or external to the organization), how to deliver the information in the most useful format, how much can be charged (revenue model), when to deliver the data (as some data may be required in real time), and finally how to process the data to add value. These questions need to be answered through both internal assessment and competitive market research.

Furthermore, the structure of the organization as well as its analytical and technical capabilities will determine the most successful pathways to data monetization.

The evaluation of the organization’s structure involves answering questions such as: How are personnel organized to successfully deliver on data mandates? What is the organization’s attitude towards innovation and disruption? Are there dedicated resources for data monetization? Analytical capabilities involve evaluating data skillsets and identifying skill gaps, while technical capabilities involve evaluating the technical data infrastructure (digital platform). Organizations can decide to build in-house platforms, which can be expensive, or leverage data marketplaces, which provide productive and transparent means for data monetization. Data marketplaces provide a platform to sell datasets, data services or APIs. They enable data monetization by providing access to a network of data buyers, avoiding costly data integration operations, and enabling small companies to grow data monetization capabilities. A data marketplace offers three benefits. Firstly, it empowers individuals and organizations to monetize rich data that is automatically generated and has become abundant with the advent of IoT. Secondly, it allows non-technical users such as business managers to easily navigate the complex world of data, as these marketplaces are designed like regular everyday websites. Thirdly, it can thrive as a result of big data and the network effect of a two-sided model that brings data producers and consumers together.

4.2.5 Research agenda on DMS

The following three DMS areas require further research. (1) To understand the factors to be considered in an effective data monetization strategy. Such factors could range from establishing a data monetization center of excellence as signified by Alfaro et al. (2019) to developing a data monetization strategy that is part of the organization’s broader strategy. (2) With the myriad of players, there is a need for the academic community to further investigate the interdependencies between multiple roles that players can take on, the value co-creation process, as well as how the overall data monetization ecosystem is governed. (3) There is a need to understand data monetization revenue models based on business models and players within the data monetization ecosystem. Although Kemppainen et al. (2018) studied revenue models at a conceptual level by looking at business models that are suitable for other multi-sided markets, there is insufficient literature on revenue models for data monetization.

4.3 Data monetization infrastructure (DMI)

As per the literature, we identified the Cloud, Blockchain, Sensors and IoT as data monetization infrastructures. In Sects. 4.3.1, 4.3.2 and 4.3.3 we describe the findings and in Sects. 4.3.4 and 4.3.5 we present the managerial implications and the research agenda respectively. In addition, the studies that discuss the DMI category and subcategories are presented in Appendix B: Categorization of papers by themes identified.

4.3.1 Cloud (C)

Cloud computing delivers on-demand computing on a pay-as-you-go model via the Internet. It enables organizations to switch from a CAPEX (capital expenditures) model to an OPEX (operating expenses) model for Information Technology resources. Data monetization relies on distributed architectures such as cloud computing and a trustless (i.e., involved participants do not need to know or trust each other) data trading infrastructure. Cloud computing remains a suitable solution to provide a secure, comprehensive, robust, scalable, and elastic ecosystem to host data monetization. It also provides an efficient model for data monetization as a service (Faroukhi et al. 2020a, b). In the data economy, most data and IoT services reside in the cloud. Massive amounts of data are being generated with the growth of IoT, and the value of the data needs to be extracted by a supportable ecosystem such as IoT-Cloud, which solves the problems of network resource occupation, high latency, and additional network load by distributing the execution of computing tasks in a balanced manner to maximize the benefits of the system (Yu et al. 2020a, b). Cloud computing enables the provision of complementary data for AI-driven services. Complementary data is data formed by integrating multiple data types from multiple sources (Saleh et al. 2021). Cloud computing enables data platforms that implement the most valuable data monetization business models. The platforms are not only for selling data but also for delivering various data products and services (Lange et al. 2021).

Cloud computing can be deployed as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS). It supports various data models such as Analytics as a Service (AaaS), Data as a Service (DaaS), etc.

Analytics as a Service (AaaS) provides a data analytics platform service in the cloud. With huge volumes of data being available, people want answers, not more data. Moreover, the cost of the in-house infrastructure that supports data analytics continues to rise. AaaS, sometimes referred to as ‘agile analytics’, is defined as generating insights from data wherever this data may be located and turning a general-purpose analytical platform into a shared utility (Demirkan and Delen 2013). AaaS is a multifaceted concept that can be offered as SaaS when presented as a reporting application for business end-users. It can also be offered as PaaS that provides data scientists with a data analysis suite for their development. Finally, it can be offered as IaaS that provides virtualized resources to host vast amounts of data (Naous et al. 2017).

Data as a Service (DaaS) is a data management framework provided through services in the cloud to bring data storage, integration, processing, analysis services, security, availability, elasticity, and quality directly to the consumer. DaaS provides data centers on the cloud. It enhances data accessibility through different channels, allows the cleansing and enriching of data to occur in a centralized place and eliminates geographical and scalability limitations. This data is offered to different systems, applications, or users with elastic access to data, scalability, high availability, and system performance on demand (based on service level agreements, or SLAs) regardless of geographical or organizational separation of the network (Rajesh 2012).

4.3.2 Blockchain (BC)

Blockchain is a distributed and decentralized ledger with the main purpose of removing third parties. It is a series of data blocks, produced and joined chronologically. It consists of a consensus method, distributed ledger, smart contracts, peer-to-peer network and block list containing a cryptographic hash, which together guarantee reliable transactions by executing a decentralized consensus protocol (Al-Zahrani 2020). The technology uses digital networks in which different types of users can interact and share data (Xie 2020). For data monetization that relies on decentralized peer-to-peer architecture and IoT, this monetization technique is highly effective given that there is no need for a third party and high interoperability exists between fog nodes (Khezr et al. 2022a, b). Fog computing is also referred to as edge computing. In a fog computing architecture, companies can make data available to other companies in a peer-to-peer fashion, without needing a cloud intermediary, thereby maximizing the locality of the processing and avoiding bottlenecks. In this architecture, data processing, filtering and stream-based event generation are done in a fog node. Blockchain allows relationships, commercial agreements, data delivery, access control and access logs to be performed directly between data producers and consumers without the need for mutual trust or a central entity (De La Vega et al. 2018), (Kolade 2022). Data security is established, and the data users have some confidence in the quality of the data (Javaid et al. 2020), which matters because poor data quality has not only financial impacts but also a negative impact on productivity and the business’s reputation. Blockchain provides a secure, transparent, anonymous, cost effective and decentralized solution for IoT data (Javaid et al. 2020), (Khezr et al. 2022a, b). It reduces the risk of privacy incidents and avoids disputes with transactions (Xie 2020).
In blockchain driven data monetization, ownership rights and identity authentication, the performance of the blockchain network, pricing, security, privacy, and transparency (Al-Zahrani 2020) all contribute to the effectiveness of this monetization infrastructure. Blockchain can help address privacy concerns by offloading the computation over sensitive data to an external network, where the data may be split across different nodes and protected with cryptographic techniques (Shrobe et al. 2018).
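The idea of blocks that are joined chronologically through cryptographic hashes can be illustrated with a minimal Python sketch. It covers only the hash linking that makes tampering detectable; consensus, smart contracts and the peer-to-peer network are omitted. The function names (block_hash, append_block, is_valid) and the sample data trades are hypothetical and do not reproduce the design of any of the cited systems.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, timestamp, data, previous_hash):
    """Hash the block contents together with the predecessor's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp,
         "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data, timestamp):
    """Append a new block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "timestamp": timestamp,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, timestamp, data, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check every link and every stored hash in the chain."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["timestamp"],
                                block["data"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
append_block(ledger, {"seller": "sensor-17", "buyer": "acme", "price": 5}, timestamp=1)
append_block(ledger, {"seller": "sensor-17", "buyer": "globex", "price": 7}, timestamp=2)
print(is_valid(ledger))          # True

ledger[0]["data"]["price"] = 1   # tamper with a recorded trade
print(is_valid(ledger))          # False
```

Because each block stores the hash of its predecessor, altering a recorded trade invalidates the stored hash of that block, which is the property that lets trading parties audit transactions without trusting a central entity.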

The design features of blockchain such as immutability, transparency, and traceability are being applied to several fields such as medicine, economics, IoT, etc.

In 2022, there was an uptick in the literature on blockchain technology for data monetization as researchers continued to seek solutions to the dominance of storage and delivery networks by cloud providers. This has mainly been due to the successful utilization of technologies such as Non-Fungible Tokens (NFTs). An NFT is a digital asset that offers ownership guarantees for every asset added to the blockchain network. Blockchain solutions promise not only to provide users with the ability to control their data but also alternatives to challenges found in centralized frameworks (e.g., security and availability) (Madine et al. 2022), (Khezr et al. 2022a, b).

4.3.3 Sensors (S) and IoT

The proliferation of sensors and IoT based devices has led to Analytics 3.0, allowing organizations to make data driven decisions and unlock value through data monetization (Faroukhi et al. 2020a, b). The goal of IoT is to increase the connectedness of people and things. Sensors drive the IoT ecosystem as they detect and measure changes in position, temperature, light, etc. Sensors turn objects into data-generating mediums that often interact with their environment. Infrastructure is required to support data collection, transmission, processing, analysis, reporting and advanced querying. The use of sensors is common in industries such as Energy and Mining, Power and Utilities, Healthcare, Transportation and Vehicles, Industrial Internet, Hospitality, Technology, Financial Services, and Retail. Lengyel et al. (2015) proposed a Sensor Hub framework set up as Platform as a Service (PaaS) that serves as an enabler for data monetization. The solution enables collecting sensor data, transmitting, processing, analyzing, and supporting the utilization of data. For smart buildings, for instance, knowledge gleaned is used in optimizing cleaning and waste management processes and in saving heating, cooling, and lighting energy (Saynajoki et al. 2017). IoT is typically enabled by distributed and decentralized architectures such as cloud computing and blockchain, which can offer a secure and dependable way of monetizing IoT data.

4.3.4 Managerial implications of DMI

Given the variety of data sources for data monetization, organizations need to have the right technical infrastructure to retrieve, store, share and track data. Therefore, infrastructure is the technological facilitator for data monetization. There are several infrastructure configurations that support data monetization, ranging from the most fundamental, such as a web plugin that controls the access of ad platforms to a user’s browser profile (Parra-Arnau 2017), to data management platforms, to cloud network environments via interfaces and communication protocols (Faroukhi et al. 2020a, b), to data trading platforms that build a secure, reliable and scalable data sharing infrastructure (Lin et al. 2020), (Madine et al. 2022), (Abubaker et al. 2022). In a data marketplace, platform architecture can be based on a centralized or decentralized approach. In a centralized approach, data products are offered by different providers via a central location, which could be a cloud infrastructure. This central location contains semantified (restructured and optimized to capture contextual relationships) and reconciled data that application developers can access via an application programming interface (API) (de Reuver et al. 2015). In a decentralized approach, the data products remain with the data provider; blockchain is an example of such a framework. A data monetization strategy must take infrastructure into consideration since a technical data infrastructure can be developed in-house, outsourced, or delivered as a service (Najjar 2013). Organizations need to consider the following questions: What data is required and how will it be acquired? In what way will the data be processed? In what way will the data be distributed? (Marcinkowski and Gawin 2020).

Infrastructure considerations must include the following. (1) Data-as-a-Service (DaaS) for providing raw and anonymized data. Such a direct data monetization strategy is indicated when the organization lacks sufficient infrastructure and analytics capabilities. (2) Insights-as-a-Service (IaaS) for when the organization has the capability to aggregate both internal and external data to produce analytical insights and visualization. (3) Analytics-as-a-Service (AaaS) for when the organization not only provides analytical insights but empowers the data consumers with BI tools requiring zero setup and maintenance. Notice that this is similar to the cloud service models (Infrastructure, Platform and Software as a Service) and therefore places a heavy infrastructure burden on the providing organization. (4) An indirect strategy such as data-driven business models that leverage existing data to improve productivity and increase efficiency (Trianz 2022).

With the many infrastructure opportunities come challenges with security, legal and privacy issues, as well as the need for suitable standards. Organizations must avoid the tendency of using an existing infrastructure to enable data monetization as existing infrastructures may be unable to fulfil storage, bandwidth, processing and security requirements. Organizations must plan for a dedicated infrastructure that is secure, scalable, accessible, and well governed (Trianz 2022).

Today, data marketplaces are platforms that allow organizations to share their data with internal and external partners as well as the public. Studies show that organizations that leverage next-generation data marketplaces will gain a competitive digital edge because data marketplaces are the best demand generation platforms and the easiest route to data monetization. A data marketplace can be personal (because it allows consumers to get paid for sharing their data), B2B and IoT based, with B2B marketplaces understandably making up the majority.

Data marketplaces can offer large volumes of actionable data and APIs without having to complete complex transformations. They can be offered both as centralized and decentralized platforms (Luch Kelly 2022), although there has been an increased interest in decentralized platforms due to their promise to address security and privacy challenges.

4.3.5 Research agenda on DMI

The following three DMI areas require further research. (1) Academic research still needs to develop seamless, elegant architectures that can address models for data monetization. Consideration should be given to architectures that support plug-and-play of the components based on the specific monetization model under consideration. Also, additional research needs to be dedicated to addressing how the cloud as a data monetization infrastructure can help resolve challenges such as data quality, security, and privacy. (2) Research on the application of blockchain to data monetization remains highly theoretical and its potential remains untapped due to challenges in integrating it with existing technologies (Bechtsis et al. 2021). Academic research on blockchain for data monetization infrastructure must progress to the applicability stage. (3) The integrated application of both centralized (i.e., cloud computing) and decentralized (i.e., blockchain) technologies needs to be further explored.

4.4 Data monetization challenges (DMC)

The top seven challenges across all 54 papers are discussed below. Privacy/trust/security and contract design/pricing are the most frequently recurring themes in the data monetization literature. Twenty-four papers identified privacy as a major challenge to data monetization, given that external data monetization involves distributing raw data or data insights.

In Sects. 4.4.1–4.4.6 we describe the findings, and in Sects. 4.4.7 and 4.4.8 we present the managerial implications and research agenda, respectively. In addition, the studies that discuss the DMC category and subcategories are listed in Appendix B: Categorization of papers by themes identified.

4.4.1 Security (S) and privacy (PV)

Data monetization cannot be discussed without security and privacy. A major obstacle to data sharing is a lack of trust and security. Data security refers to the process of keeping data confidential and protecting it from theft, errors, and accidental destruction (Parvinen et al. 2020). Earlier research on privacy suggests that people make trade-offs between utility, price, and privacy (de Reuver et al. 2015). Even though consumers value their privacy, they tend to provide their information in exchange for money or a service (e.g., users of online services such as Google and Facebook) (Sánchez 2022). There is increasing regulatory and security concern about the behavior of organizations that sell personal data (Thomas and Leiponen 2016). Security and privacy issues prevent data owners from sharing data amongst themselves despite the profitability of data sharing.

Users’ perception of privacy infringement will continue to pose a risk to the free flow of data between data monetization players. Empirical studies reveal a dichotomy in human behavior that continues to baffle privacy experts and has been a major hurdle in the development of models that put a price on privacy. Parra-Arnau (2017) attempted to resolve privacy in a web tracking scenario by creating a privacy model that allows for the optimal trade-off between economic reward and privacy. The user’s privacy is ensured by means of collaborative masking. Rao and Ng (2016) introduced the idea of obfuscating user information to protect user privacy: personally identifiable information (PII) is stripped out, or noise is introduced into the data, before the data is sold.
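The obfuscation idea described above (stripping PII or adding noise before sale) can be illustrated with a minimal sketch. It is not the mechanism of Rao and Ng (2016); the field names, the choice of Gaussian noise, and the function names are assumptions made purely for illustration.

```python
import random

# Hypothetical set of fields treated as PII; a real deployment would
# derive this from a data classification policy.
PII_FIELDS = {"name", "email", "phone"}


def strip_pii(record, pii_fields=PII_FIELDS):
    """Return a copy of the record without the fields listed as PII."""
    return {key: value for key, value in record.items() if key not in pii_fields}


def obfuscate(record, numeric_fields, noise_scale, seed=None):
    """Strip PII, then perturb the chosen numeric fields.

    Gaussian noise with standard deviation `noise_scale` is an
    assumption of this sketch, not a choice taken from the literature.
    """
    rng = random.Random(seed)
    cleaned = strip_pii(record)
    for field in numeric_fields:
        if field in cleaned:
            cleaned[field] = cleaned[field] + rng.gauss(0.0, noise_scale)
    return cleaned
```

A seller could call `obfuscate(record, ["age"], noise_scale=2.0)` before releasing a record; a larger `noise_scale` gives more privacy at the cost of less accurate data.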

Effective contract designs can help alleviate security and privacy concerns through the establishment of appropriate assurance practices (Najjar 2013). Regulatory complexity and the absence of a legal framework may lead to considerable legal uncertainty with regard to trading data. The current regulatory environment lacks a cohesive and comprehensive set of laws to support a data monetization ecosystem. Existing laws are siloed because they were created in an ad hoc manner for different institutional purposes (Spiekermann 2019). For example, in the US, there is no single regulation to protect personal privacy. Instead, there is a set of laws and regulations for specific sectors of activity or regions, such as the California Consumer Privacy Act (CCPA) and the Health Insurance Portability and Accountability Act (HIPAA). In the EU, by contrast, the General Data Protection Regulation (GDPR) exists as a single body of rules protecting privacy and personal data (Perrin 2013).

4.4.2 Pricing (PC)

Putting a price tag on data is not an easy task. To accommodate diverse demands, data sellers devise different plans and pricing schemes for their buyers, because buyers can obtain varying utility from the same data. Data users may have different uses for a particular dataset, different skillsets, and varying complementary knowledge (Sinha 2019; Rix 2021a, b). The players involved must mutually agree on the valuation of the traded data. The following characteristics of data identified by Agarwal et al. (2019) further make this a unique problem: data can be replicated at zero marginal cost, its value to a firm is combinatorial (i.e., the value of a particular dataset to a firm may depend on other datasets available), and the authenticity and usefulness of data are difficult to verify a priori without first applying it.

Rao and Ng (2016) proposed an information market for Internet users to enable the exchange of data. Their model gives users an idea of the value of their information using Shannon’s information theory, in which entropy serves as a measure of the uncertainty of information. Using this, one can estimate the value and price of the information type in the information market. While Shannon’s information theory helps quantify the amount of information that has been divulged, there is also a need to understand the demand from buyers interested in the information. Buyers therefore state the amount they would be willing to pay for an information category, and the average is used to determine the demand in the information market. Another technique, identified by Chao Li (2013), links the price of the data to the amount of noise added to the data by a third party called a “market maker”. In this scenario, the market maker may be prone to act maliciously since it holds the unperturbed data. Al-Zahrani (2020) proposes a subscription-based data-sharing model in which users subscribe to a data provider for a specific period and pay for data access based on the selected subscription plan. Thomas and Leiponen (2016) argued that data packaging and pricing models must be considered to identify what data can be made available, in what mode and at what prices, while taking into consideration associated costs.
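The two ingredients described above, an entropy-based measure of how much information is divulged and an average of buyers' stated bids as a demand signal, can be sketched as follows. This is an illustrative sketch only: multiplying entropy by the average bid in `indicative_price` is an assumption of this example and not the pricing function of Rao and Ng (2016), and all function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_bid(bids):
    """Average of the amounts buyers state they are willing to pay."""
    if not bids:
        raise ValueError("bids must not be empty")
    return sum(bids) / len(bids)


def indicative_price(values, bids):
    """Illustrative price: information content scaled by average demand.

    Treating the average bid as a price per bit is an assumption of
    this sketch, chosen only to combine the two quantities.
    """
    return shannon_entropy(values) * average_bid(bids)
```

An attribute whose values are evenly split between two categories carries one bit of entropy, whereas an attribute with a single constant value carries none and would therefore be priced at zero under this sketch.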

Recently, there has been more focus on pricing data, but there is still work to be done. Calvin et al. (2021), using typology formation, derived three pricing models for manufacturing from 11 features, namely price determination, price discovery, measurement unit, payment flow, timing of price determination, bundle component, bundling type, degree of integration, differentiation, price dynamics, and value creation. While this typology can be applied beyond manufacturing, academic research needs to consider quantitative and practical applications.

Stein et al. (2021) proposed a framework that provides four approaches: criteria-based for internal qualitative valuation, reporting-based for external qualitative valuation, cost-based for internal quantitative valuation and transaction-based for external quantitative valuation. The framework was tested using a case study in the manufacturing context, but it still needs to be tested in a broader context across multiple industries.

Monteiro et al. (2021) identified the need to focus on the value dimension of the Vs of big data. They acknowledge that academic research on value is lacking compared to the three classical dimensions (Volume, Velocity and Variety) and existing studies do not agree on the right way to measure and define this value.

4.4.3 Contract design (CD)

Data is non-rivalrous and only partially exclusive. Non-rivalry means that the same data can be used by many and partially exclusive implies that data is only exclusive within a specified type of use. These characteristics of data emphasize how critical it is to have clear contractual agreements (Thomas and Leiponen 2016). A contract is a legal agreement that states how parties must interact and fulfil their obligations. Contracts involve NDAs (non-disclosure agreements), data sharing and purchase contracts.

Given that data monetization involves strategy designs that involve multiple players and revenue structures, designing an optimal and fair contract agreeable by all parties is critical. Designing contracts helps address IP (intellectual property), privacy, and security issues by ensuring data sold or shared is used for the intended and agreed upon purpose (Najjar 2013). Sinha et al. (2019) propose a contract-theoretical framework to accommodate heterogeneous honest buyers as well as adversarial types. The framework proposes that the seller add noise to data query answers, charge more for lower noise, and thwart rational adversaries by levying fines.
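The contract structure summarized above (noisier answers are cheaper, and fines deter rational misuse) can be illustrated with a small sketch. It is not the framework of Sinha et al. (2019): the tier names, prices, Gaussian noise, and the expected-fine deterrence rule are all assumptions introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractTier:
    name: str
    noise_scale: float  # standard deviation of noise added to each answer
    price: float        # price charged per query


# Hypothetical menu: lower noise costs more.
MENU = [
    ContractTier("basic", noise_scale=10.0, price=5.0),
    ContractTier("standard", noise_scale=3.0, price=20.0),
    ContractTier("premium", noise_scale=0.5, price=60.0),
]


def menu_is_consistent(menu):
    """Check that price strictly increases as noise decreases."""
    ordered = sorted(menu, key=lambda tier: tier.noise_scale, reverse=True)
    return all(a.price < b.price for a, b in zip(ordered, ordered[1:]))


def answer_query(true_answer, tier, rng):
    """Return the query answer perturbed according to the buyer's tier."""
    return true_answer + rng.gauss(0.0, tier.noise_scale)


def fine_deters_misuse(misuse_gain, detection_probability, fine):
    """A rational adversary is deterred when the expected fine is at
    least as large as the gain from misuse (simplifying assumption)."""
    return detection_probability * fine >= misuse_gain
```

For example, `answer_query(1000.0, MENU[2], random.Random(0))` returns a value close to 1000 for a premium buyer, while the same query under the basic tier is perturbed far more heavily.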

Further academic research needs to consider answering questions that tackle how contracts should be designed to cater for IP protection, pricing concerns, regulatory complexity, data reuse/licensing and data quality.

4.4.4 Data quality (DQ)

The quality of data plays a major role in data monetization. Data quality addresses issues such as accuracy, completeness, consistency, interpretability, and reliability (Thomas and Leiponen 2016). Depending on the quality of data, organizations can choose to be involved only in specific models for data monetization. For low quality data and organizations not willing or able to process data, data monetization may simply be selling raw datasets. Mature organizations with the right infrastructure can sell more than just raw data. They can sell insights, data-based products, and other refined data outcomes. According to Faroukhi et al. (2020a, b), data users often struggle with low quality data, diverse data sources, data management, regulated strategies, and the violation of data privacy. They propose a data management platform that ensures end-to-end integrity of all the processes within an organization so that valuable information can be exploited. Javaid et al. (2020) propose a review system based on blockchain technology that holds the reviews of users who have used IoT data so that other users can trust the data they are using. The system gives users confidence that the quality of data is satisfactory. IoT data is heterogeneous in nature and therefore creates compatibility issues on different platforms (Al-Zahrani 2020). Poor quality data costs businesses an average of $15 million in losses per year (Moore 2018). Poor quality data has a negative impact on customer trust, product reliability and ultimately business reputation (Al-Zahrani 2020).

4.4.5 Beliefs (B)

Perceptions are difficult to change because they reflect deep-rooted beliefs of individuals and organizations, hence the need to nurture trust between the parties involved. The lack of trust and security can cause data providers to fear that competitors could benefit from disclosures of in-house data (Spiekermann 2019). It could also discourage data owners from participating, as organizations that stand to benefit economically tend to optimize surveillance and manipulation tactics (Trzaskowski 2022). Contracts can undoubtedly help alleviate concerns. However, shared values are required to give players in the ecosystem a chance. A collaborative, mutually beneficial relationship requires demonstrated trustworthiness, inter-organizational coordination to establish governance mechanisms, and successful, repeatable interactions that demonstrate reliability (Najjar 2013).

4.4.6 Data skills (DS) and other challenges

Having the right skillset can make or break a data monetization agenda. The right data skillset includes the technical skills required to orchestrate data from data providers to data consumers. Organizations need to develop strategies to hire and retain the talent required to deliver an end-to-end data strategy (Alfaro et al. 2019).

Other DMCs identified throughout the literature include identifying a trade-off between information transparency and risk of losing information advantage to data consumers (Najjar 2013), the organization’s position in the value network, organization type and culture (Parvinen et al. 2020), IP protection (Thomas and Leiponen 2016), poor infrastructure (Bram et al. 2015), willingness of users to share personal data with app developers and pay for platform applications (de Reuver et al. 2015), lack of demand for data (Spiekermann 2019), regulatory complexity (Najjar 2013), data provenance (Schroeder 2016), standards and accessibility (Schroeder 2016), and internal politics (Schroeder 2016). In the context of Big Data Value Chain (BDVC) and cloud, challenges are related to deployment, scalability, exposition, networking, and enormous resources (Faroukhi, El Alaoui, et al. 2021).

4.4.7 Managerial implications for DMC

Organizations looking to monetize data must deploy security systems such as centralized authentication and authorization, role- and data-based access control, encryption and data anonymization. The owners of data monetization infrastructure must consider the legal risks, data protection barriers, competitive barriers, data availability problems, and data delivery methods. Data marketplaces can address many of these challenges as they rely on privacy-assured, transaction-secured and transparent platforms. They remove the effort of finding data providers and foster trustworthy transactions (Luch Kelly 2022).
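Two of the controls listed above, role-based access control and pseudonymization of identifiers, can be combined as in the following minimal sketch. The role names, field names, and the use of a keyed HMAC-SHA256 hash as the pseudonymization method are assumptions for illustration; note that keyed hashing yields pseudonymized rather than fully anonymized data.

```python
import hashlib
import hmac

# Hypothetical mapping of roles to the fields each role may see.
ROLE_FIELDS = {
    "analyst": {"customer_id", "region", "spend"},
    "partner": {"region", "spend"},
}


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed hash so that records remain
    linkable without exposing the original value."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def view_for_role(record, role, secret_key):
    """Return only the fields the role may access; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    view = {key: value for key, value in record.items() if key in allowed}
    if "customer_id" in view:
        view["customer_id"] = pseudonymize(str(view["customer_id"]), secret_key)
    return view
```

Under these assumptions an external partner receives only the region and spend fields, while an internal analyst additionally sees a stable pseudonym in place of the customer identifier.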

Given the risks of cybersecurity incidents and the reputational implications of such incidents, many industries (e.g., the health sector) choose not to monetize data. But privacy and data protection laws provide the tools required to ensure individual data is protected and organizations are transparent (i.e., they reveal their commercial practices) (Trzaskowski 2022).

With regard to pricing, the million-dollar question remains: what are the most effective means to determine price equilibrium for all the players? Since data is an experience good, how can pricing mechanisms function if buyers are less willing to pay because they cannot recognize the value of data that has not been fully disclosed (Spiekermann 2019; Rix 2021a, b)? How can pricing models be developed to consider the cost of collecting, maintaining, and making data available? In a data marketplace, how do you determine pricing that satisfies consumers and covers the cost for providers? Without financial incentives, datasets will be poorly maintained. Can someone get the same (or better) data for free somewhere else? Finally, can organizations ensure that data is accurate, updated and obtained through ethical means? These questions still need to be further explored.

4.4.8 Research agenda for DMC

The following DMC-related research questions have been identified. (1) Investigate how data monetization can be designed with issues such as privacy and security at the forefront. Privacy by design is an approach that takes privacy and security into account when designing a data product or service; it is particularly important because technological developments outpace legal developments. Existing reports and principles provide such design guidance. Examples of such principles include privacy as the default, end-to-end security, and avoiding false dichotomies such as privacy vs. security. (2) Contribute to the academic literature on pricing models for data products and develop pricing packages and contract designs with security and privacy in mind. (3) Conduct research on developing data standards that improve quality, accessibility, and combinatorial insights. (4) Conduct research on developing regulation and policies around different types of data such as open data, proprietary data and social media data, given that these three types of data barely overlap and have different sources, uses and implications (Schroeder 2016). (5) Conduct research on developing contracts that are designed to cater for IP protection, pricing concerns, regulatory complexity, data usage/licensing and data quality.

5 Conclusion and limitations

This paper contributes to improving the understanding of data monetization in three ways. First, it provides a holistic understanding of areas within data monetization using a framework derived from the literature. The framework outlines the existing business models based on the research of Wixom (2014), Parvinen et al. (2020), Faroukhi et al. (2020a, b) and enhances it by introducing the players based on value contribution to the monetization ecosystem and the revenue models. The framework categorizes the models based on identified dimensions. The models include internal monetization, indirect monetization, data wrapping, data bartering and data selling with most of the literature describing a model for selling data. The framework goes further by mapping the data selling models against the BDVC phases. Second, the paper systematically derives a broad categorization and sub-categorization for the key themes in data monetization. The categories are Data Monetization Strategy (DMS), Data Monetization Infrastructure (DMI), and Data Monetization Challenges (DMC). The literature review identifies challenges such as privacy, data management, pricing, contract agreement and security which can serve as input for industry as they carve out their data monetization strategy. Third, the paper highlights managerial implications and future research agendas based on the proposed categorization. For DMS, the paper proposes that academic researchers focus on understanding factors to be considered in designing an effective data monetization strategy, developing data monetization revenue models based on chosen business models and players within the monetization ecosystem and on the interdependencies between multiple roles players can take on, the value co-creation process as well as how the overall ecosystem is governed. Organizations need to understand the structure as well as the analytical and technical capabilities that can determine the pathway to data monetization. 
As stated by Hartmann et al. (2016), there is a need to understand factors that impact data monetization strategies from an ecosystem perspective. This includes the characteristics of data and technological interdependencies that impact data monetization. There is a need to understand how different factors such as data quality impact the choice of a monetization model. For DMI, the paper proposes the development of a cloud and blockchain architecture that supports data monetization models as well as practical applicability of cloud/blockchain to address DMC issues. For DMC, the paper proposes that future research and practice should consider how data monetization can be designed with privacy and security at the forefront, developing data standards that improve quality, accessibility, and combinatorial insights, developing regulation and policies around different types of data and developing contracts that alleviate data monetization concerns.

To the best of our knowledge, this study is the first that considers all the data monetization models that currently exist in the academic literature. Note that the selected literature addresses data monetization from a business perspective, treating data as a resource to generate revenue. Current academic literature does not address monetization from a social perspective where data is treated as a public good or for social initiatives (e.g., Open Data initiative). Insights from data monetization have several sociological and cultural aspects that require research exploration (Thomas and Leiponen 2016).

Inevitably, the work has limitations due to the research design and the exploratory nature of content analysis. From a research design perspective, the determination of the sample based on the search string, selection of timelines, database identification and criteria for paper selection (via inclusion and exclusion criteria) contribute to this limitation. Also, since the research follows an exploratory approach, the authors acknowledge the subjectivity of the outcome. Given the novelty of the research area, the authors do not anticipate that changes to these parameters would alter the overall findings.

For future work, we suggest expanding the search criteria by introducing grey literature. The data monetization framework can be further refined to improve the validity of the findings beyond academia. Researchers can also take up alternative methods such as semantic analysis to search for new concepts and better categorization or even to validate the findings of this review. Finally, identified research areas can be further explored to improve the discipline of data monetization.

6 Appendix A: Themes mapped to clusters and paper count

| Theme category | Number of papers that discuss | Cluster # - Color |
| --- | --- | --- |
| Data monetization strategy (DMS) | 54 | |
| 1.1 Operating model (OM) | 19 | Cluster 3, 6 - Blue |
| 1.2 Players (P) | 29 | Cluster 4 - Yellow, Cluster 5 - Purple |
| 1.3 Revenue model (RM) | 10 | |
| Data monetization infrastructure (DMI) | 24 | |
| 2.1 Cloud (C) | 10 | Cluster 2 - Green |
| 2.2 Blockchain (BC) | 14 | |
| 2.3 Sensors (S) and IoT | 9 | |
| Data monetization challenges (DMC) | 33 | |
| 3.1 Security (S) | 13 | Cluster 1 - Red, Cluster 7 - Orange |
| 3.2 Privacy (PV) | 24 | |
| 3.3 Pricing (PC) | 15 | |
| 3.4 Contract design (CD) | 3 | |
| 3.5 Data quality (DQ) | 7 | |
| 3.6 Beliefs (B) | 6 | |
| 3.7 Data skills (DS) | 4 | |

7 Appendix B: Categorization of papers by themes identified

8 Appendix C: Overview of SLR studies, year, journal and paper methodology

| Title | Authors | Year | Journal | Paper methodology |
| --- | --- | --- | --- | --- |
| Data Monetization: Lessons from a retailer's journey | Najjar, M.S., Kettinger, W.J | 2013 | MIS Quarterly Executive | Case study |
| A review of data monetization: Strategic use of big data | Liu, C.-H.; Chen, C.-L | 2015 | International Conference on Electronic Business (ICEB) | Literature review |
| Designing viable multi-sided data platforms: The case of context-aware mobile travel applications | de Reuver, M; Haaker, T; Nikayin, F; Kosman, R | 2015 | Lecture Notes in Computer Science | Case study with survey |
| How much is your information worth – A method for revenue generation for your information | Rao, D; Ng, W K | 2015 | IEEE International Conference on Big Data | Deductive: Markovian decision process |
| SensorHUB: An IoT driver framework for supporting sensor networks and data analysis | Lengyel, L; Ekler, P; Ujj, T; Balogh, T; Charaf, H | 2015 | International Journal of Distributed Sensor Networks | Descriptive and case study |
| Utilization and Monetization of Healthcare Data in Developing Countries | Bram, J T; Warwick-Clark, B; Obeysekare, E; Mehta, K | 2015 | Big Data | Exploratory |
| A User-Centric Approach to Pricing Information | Rao, D; Ng, W K | 2016 | IEEE 2nd International Conference on Big Data Computing Service and Applications | Shannon's information theory |
| Big data business models: Challenges and opportunities | Ralph Schroeder | 2016 | Cogent Social Sciences | Interview |
| Big data commercialization | Thomas, L D W; Leiponen, A | 2016 | IEEE Engineering Management Review | Systematic literature review |
| Capturing value from big data – a taxonomy of data-driven business models used by start-up firms | Hartmann, P.M., Zaki, M., Feldmann, N., Neely, A | 2016 | International Journal of Operations and Production Management | DDBM framework with clustering algorithm on 100 companies |
| Monetizing Personal Data: A Two-Sided Market Approach | Bataineh, A S; Mizouni, R; El Barachi, M; Bentahar, J | 2016 | Procedia Computer Science | Experimental analysis |
| Monetizing the user's information asset in internet information market | Rao, D; Ng, W K | 2016 | IEEE International Congress on Big Data | Mathematical analysis: Information pricing model |
| Some remarks and ideas about monetization of sensitive data | Piotrowska, A M; Klonowski, M | 2016 | Lecture Notes in Computer Science | Analyze monetization protocol developed by Bilogrevic et al |
| Data commercialisation: Extracting value from smart buildings | Säynäjoki, A; Pulkka, L; Säynäjoki, E.-S.; Junnila, S | 2017 | Buildings | Exploratory: Literature and qualitative analysis |
| Decentralizing the Semantic Web: Who will pay to realize it? | Grubenmann, T; Dell'Aglio, D; Bernstein, A; Moor, D; Seuken, S | 2017 | CEUR Workshop Proceedings | Exploratory |
| Pay-per-tracking: A collaborative masking model for web browsing | Parra-Arnau, J | 2017 | Information Sciences | Experimental analysis |
| A Peer-to-Peer Architecture for Distributed Data Monetization in Fog Computing Scenarios | De La Vega, F; Soriano, J; Jimenez, M; Lizcano, D | 2018 | Wireless Communications and Mobile Computing | Case study |
| Emerging Revenue Models for Personal Data Platform Operators: When Individuals are in Control of Their Data | Kemppainen, Laura; Koivumäki, Timo; Pikkarainen, Minna; Poikola, Antti | 2018 | Journal of Business Models | Qualitative questionnaire |
| A marketplace for data: An algorithmic solution | Agarwal, A; Dahleh, M; Sarkar, T | 2019 | 2019 ACM Conference on Economics and Computation | Descriptive: Mathematical model |
| BBVA's data monetization journey | Alfaro, E; Bressan, M; Girardin, F; Murillo, J; Someh, I; Wixom, B H | 2019 | MIS Quarterly Executive | Case study |
| Data Marketplaces: Trends and Monetisation of Data Goods | Spiekermann, M | 2019 | Intereconomics | Exploratory and qualitative |
| A Fully Decentralized Infrastructure for Subscription-based IoT Data Trading | Lin, C.-H.V.; Huang, C.-C.J.; Yuan, Y.-H.; Yuan, Z.-S.S | 2020 | IEEE International Conference on Blockchain | Exploratory |
| Advancing data monetization and the creation of data-based business models | Parvinen, P; Pöyry, E; Gustafsson, R; Laitila, M; Rossi, M | 2020 | Communications of the Association for Information Systems | Qualitative: Exploratory and descriptive |
| An adaptable Big Data Value Chain (BDVC) framework for end-to-end big data monetization | Faroukhi, A Z; Alaoui, I E; Gahi, Y; Amine, A | 2020 | Big Data and Cognitive Computing | Systematic literature review |
| An Intelligent Game based Offloading Scheme for Maximizing Benefits of IoT-Edge-Cloud Ecosystems | Yu, M; Liu, A; Xiong, N N; Wang, T | 2020 | IEEE Internet of Things Journal | Mathematical analysis |
| Big data monetization throughout Big Data Value Chain: a comprehensive review | Faroukhi, A Z; El Alaoui, I; Gahi, Y; Amine, A | 2020 | Journal of Big Data | Systematic literature review |
| Data-driven business model development – insights from the facility management industry | Marcinkowski, B; Gawin, B | 2020 | Journal of Facilities Management | Qualitative case study |
| Monetizing Mobile Data via Data Rewards | Yu, H; Wei, E; Berry, R A | 2020 | IEEE Journal on Selected Areas in Communications | Two-stage Stackelberg game |
| Reputation System for IoT Data Monetization Using Blockchain | Javaid, A; Zahid, M; Ali, I; Khan, R.J.U.H.; Noshad, Z; Javaid, N | 2020 | Lecture Notes in Networks and Systems | Exploratory |
| Subscription-Based Data-Sharing Model Using Blockchain and Data as a Service | Al-Zahrani, F A | 2020 | IEEE Access | Inductive: Model creation |
| Toward monetizing personal data: A two-sided market analysis | Bataineh, A S; Mizouni, R; Bentahar, J; El Barachi, M | 2020 | Future Generation Computer Systems | Mathematical analysis |
| A Novel Approach for Big Data Monetization as a Service | Faroukhi, A Z; El Alaoui, I; Gahi, Y; Amine, A | 2021 | Advances in Intelligent Systems and Computing | Systematic literature review |
| Big Data Monetization: Platforms and Business Models | Monteiro, D.S.M.P., Meira, S.R.L., Ferraz, F.S | 2021 | Iberian Conference on Information Systems and Technologies, CISTI | Systematic literature review |
| Cloud as platform for monetizing complementary data for AI-driven services: A two-sided cooperative game | Bataineh, A.S., Bentahar, J., Wahab, O.A., Mizouni, R., Rjoub, G | 2021 | IEEE International Conference on Services Computing, SCC | Modelling |
| Conceptualizing Data Ecosystems for Industrial Food Production | Calvin, R., Hannah, S., Qiang, C., Jana, F., Wolfgang, M | 2021 | IEEE 23rd Conference on Business Informatics, CBI | Design data ecosystem and case study |
| Data-driven secure, resilient and sustainable supply chains: gaps, opportunities, and a new generalised data sharing and data monetisation framework | Bechtsis, D., Tsolakis, N., Iakovou, E., Vlachos, D | 2021 | International Journal of Production Research | Literature review and case studies |
| From Qualitative to Quantitative Data Valuation in Manufacturing Companies | Stein, H., Holst, L., Stich, V., Maass, W | 2021 | IFIP Advances in Information and Communication Technology | Case study and exploratory action research |
| 'Ideation is Fine, but Execution is Key': How Incumbent Companies Realize Data-Driven Business Models | Lange, H.E., Drews, P., Hoft, M | 2021 | IEEE 23rd Conference on Business Informatics, CBI | 19 expert interviews and literature review |
| Insight monetization intermediary platform using recommender systems | Hanafizadeh, P; Barkhordari Firouzabadi, M; Vu, K M | 2021 | Electronic Markets | Design science: Literature review and model creation |
| Pricing Models for Data Products in the Industrial Food Production | Rix, C., Frank, J., Stich, V., Urban, D | 2021 | IFIP Advances in Information and Communication Technology | Exploratory following the procedure of typology formation by Welter |
| Towards data markets in renewable energy forecasting | Goncalves, C; Pinson, P; Bessa, R J | 2021 | IEEE Transactions on Sustainable Energy | Mathematical analysis |
| Untangling the Open Data Value Paradox: How Organizations Benefit from Revealing Data | Enders, T., Benz, C., Satzger, G | 2021 | Lecture Notes in Information Systems and Organisation | Semi-structured expert interviews |
| Blockchains and the disruption of the sharing economy value chains | Kolade, O., Adepoju, D., Adegbile, A | 2022 | Strategic Change | Conceptual paper |
| Data-driven value extraction and human well-being under EU law | Trzaskowski, Jan | 2022 | Electronic Markets | Exploratory |
| Cloud Computing as a Platform for Monetizing Data Services: A Two-Sided Game Business Model | Bataineh, Ahmed Saleh, Jamal Bentahar, Rabeb Mizouni, Omar Abdel Wahab, Gaith Rjoub, and May El Barachi | 2021 | IEEE Transactions on Network and Service Management | Modelling |
| AI-Driven Data Monetization: The Other Face of Data in IoT-Based Smart and Connected Health | Firouzi, Farshad, Bahar Farahani, Mojtaba Barzegari, and Mahmoud Daneshmand | 2020 | IEEE Internet of Things Journal | Conceptual, reference architecture and case study |
| A Scalable, Standards-Based Approach for IoT Data Sharing and Ecosystem Monetization | Figueredo, Ken, Dale Seed, and Chonggang Wang | 2020 | IEEE Internet of Things Journal | Conceptual, reference architecture and case study |
| User incentives for blockchain-based data sharing platforms | Jaiman, Vikas, Leonard Pernice, and Visara Urovi | 2022 | PLoS ONE 17 | Architecture proposal and evaluation |
| Trustful data trading through monetizing IoT data using BlockChain based review system | Abubaker, Zain, Asad Ullah Khan, Ahmad Almogren, Shahid Abbas, Atia Javaid, Ayman Radwan, and Nadeem Javaid | 2022 | Concurrency and Computation: Practice and Experience | Exploratory |
| Blockchain and NFTs for Time-bound Access and Monetization of Private Data | Madine, Mohammad, Khaled Salah, Raja Jayaraman, Ammar Battah, Haya Hasan, and Ibrar Yaqoob | 2022 | IEEE Access | Exploratory |
| Towards a secure and dependable IoT data monetization using blockchain and fog computing | Khezr, Seyednima, Abdulsalam Yassine, and Rachid Benlamri | 2022 | Cluster Computing | Exploratory and evaluation |
| An Edge Intelligent Blockchain-based Reputation System for IIoT Data Ecosystem | Khezr, Seyednima, Abdulsalam Yassine, Rachid Benlamri, and M. Shamim Hossain | 2022 | IEEE Transactions on Industrial Informatics | Exploratory and evaluation |
| A General Approach on Privacy and its Implications in the Digital Economy | Sánchez, Mariola | 2022 | Journal of Economic Issues | Exploratory |
| Digital technologies: tensions in privacy and data | Quach, Sara, Park Thaichon, Kelly D. Martin, Scott Weaven, and Robert W. Palmatier | 2022 | Journal of the Academy of Marketing Science | Exploratory |

9 Appendix D: Summary of players and value generated

| Value generated | Players | Definition |
| --- | --- | --- |
| Generate data | Data providers | Originators/owners of the data/data suppliers, as they generate the data leveraged in the ecosystem. They could be smartphone users or individuals having some personal data to sell. The data could be user generated, IoT sensor generated or company data |
| Combine data | Data aggregators | Combine the data and provide aggregated services and data, thereby enabling them to produce a targeted advertising business model. They also perform data crawling and visualization. Common data aggregators are price comparison services such as the travel search engine Kayak. Others include Meta, Google, and Twitter |
| Improve data | Data managers | These organizations catalogue, clean, and parse information that is not in an easily usable format or improve the value of the data with additional context. They add value to data by improving the interpretability and the overall functionality of the data |
| Define and enforce data standards | Data regulators | Define and help enforce data standards. These organizations recommend and ensure the security, privacy, and ethical use of data |
| Custodians of data | Data bank | Custodians of data that enable the reuse and resale of data by providing a ‘trust’ infrastructure |
| Facilitate data transactions | Data brokers | Collect and bundle data for prospective buyers. The broker is an online platform or cloud platform equipped with the needed infrastructure to store and share data. They provide services that enable the data provider and data consumers to perform data selling and buying transactions. They can be referred to as the orchestrators |
| | Data facilitators | Have the capabilities to share data with data consumers. Facilitators do not own the data but provide services such as data cleaning, data analytics and consulting services. Data facilitators could correspond to a technical platform based on tools for data collection, integration, processing, storage, analysis, and visualization. They provide the physical architecture and the provision of outsourced analytics services |
| | Tool providers | Provide hardware and software infrastructure for data monetization. Examples include but are not limited to Microsoft, AWS, and Google, who provide both software and hardware solutions |
| Enrich monetization ecosystem | Service providers | Develop new services for data, distinct from the resale, analysis or repackaging of data or the development of specific applications |
| | App developers | Design, build and sell applications that enable data monetization |
| | Consultant | Demonstrates the value of data monetization to data providers and supports them in developing strategies |

Consume data

Data consumers

Consume/subscribe to the data. They are individuals, businesses or systems that use collected data