1 Introduction

The Internet allows for almost ubiquitous transactions, access to and exchange of information, and instant communication. Due to the unprecedented supply of data, the Internet has altered how people relate to and access information and data. Besides the emergence of new markets, this wealth of information has also led to a major transformation of existing markets. Prior to the Web 2.0 development, the market for data could be characterized as a private large-scale information exchange between major companies [17]. However, in light of the newly available abundance of data sources as well as the variety of storage and processing options, it is not surprising that data are increasingly supplied and demanded publicly on the Internet. Besides (partially) free platforms such as Wikipedia or Wolfram Alpha, commercial data marketplaces have emerged; examples include http://knoema.com, Microsoft Azure, Freebase, and http://datamarket.com (recently acquired by Qlik). In this paper, we develop a classification framework for data marketplaces.

In recent years, we have conducted several surveys in the area of data marketplaces to gain an understanding of their offerings, their functionality, their business models, and their dynamics [35, 36]. During this analysis, we found that a clear definition of data marketplaces and of the market for data is still missing. This paper aims at closing this gap by providing such a definition and presenting a classification framework for data marketplaces. Every emerging market is characterized by participants entering and leaving while developing solutions and strategies for the many challenges that new markets or products entail. The high number of providers that have left the field of trading data in the past few years illustrates that data marketplaces seem to be particularly challenging. Interviews with the founders of the visualization tool Swivel revealed that, aside from the “usual” management issues, the main obstacle to their business was that users were willing to pay for the services only in the “single-digit” range [14]. The Internet, the very medium that led to the transformation of data markets in the first place, is also one of the major threats to their business model: users are accustomed to having constant access to information for free, which results in a rather low willingness to pay for data. Thus, companies with a focus on data provisioning need to find suitable strategies to generate revenue with their offerings.

Considering how alluring these strategies are as a field of research for business administration and information systems, and how much the concept of a data market is discussed in the blogosphere, the lack of formal research on this topic is surprising. On the Internet, informal evaluations by journalists and platform operators can easily be found; informative examples are [7, 10, 16, 17, 22]. The only studies of the market with formalized standards are by Schomm, Stahl, and Vossen, who surveyed the data market on a selected sample in 2012 and 2013. They characterize the market by an increasing “proliferation of data as a commodity” and identify several trends, most notably a trend towards high-quality data [35, 36].

The theoretical and empirical research concerning data, data markets, and data marketplaces is filled with a number of different, partly contradictory terms, such as electronic markets, e-hubs, or data vendors [15]. Most of these terms do not properly describe the underlying concepts of data exchange. Providing a definition for data marketplaces allows us to relate this development to traditional markets and marketplaces as known from the field of economics. Furthermore, it provides clarity whenever the term is used, which is currently not the case, as the term data market may refer to the overall market for data, to an online platform facilitating the trading of data, or even to vendors. A clear definition also allows further studies to clearly include and exclude providers of marketplaces, as we have done in the latest iteration of our data marketplaces study [37].

The remainder of this paper is organized as follows: After a short review of existing classification models from neo-classical economics in Sect. 2, our own model of categorizing electronic marketplaces is presented as a framework for provider characterization in Sect. 3; additionally, the obstacles to the marketability of information goods are presented: the difficulty of value attribution, information asymmetries, and the particular cost structure of information goods. How those can be overcome in the case of data and how data are distributed and allocated on electronic marketplaces are discussed in Sect. 4. A consideration of the relevance to our study is given in Sect. 5, where we outline our latest survey results. Section 6 concludes the paper.

2 Markets and marketplaces

In everyday language, the terms market and marketplace are commonly used synonymously without taking their differences into account. However, to understand data marketplaces, it is important to define both terms and establish a common understanding. Neo-classical economics, the currently widely accepted economic model, considers marketplaces to be the physical or virtual implementation of markets. Markets are defined as the place where the interactions of buyers and sellers determine the price and the quantity of a good or a service [21, 32]. This implies that a market commonly focuses on one product [3]. In contrast, the marketplace for a given good is the explicit place of encounter in terms of time and location where market participants prepare and execute transactions, i.e., it provides the infrastructure for trading [13]. An example is a marketplace focusing on flowers, e.g., Dutch tulips. The marketplace can be considered the real interpreter of supply and demand that coordinates output [33]. Thus, the difference between a market and a marketplace can be attributed to the level of abstraction: a marketplace is the infrastructure that enables the abstract concept of a market. Indeed, the sum of all market-based transactions, e.g., selling and buying a specific good in a specific region, constitutes a market [26]. For instance, one could investigate the PC market in the UK, which is constituted by all PC-related transactions in the country through various channels, such as online and offline marketplaces.
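The idea that the interactions of buyers and sellers determine price and quantity can be made concrete with the textbook case of linear demand and supply curves. The following is a minimal sketch under that assumption (real demand for data need not be linear); the class name `LinearMarket` and all parameter values are hypothetical and chosen only for illustration. It computes the market-clearing point at which quantity demanded equals quantity supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearMarket:
    """Hypothetical market with linear demand Qd = a - b*P and supply Qs = c + d*P."""

    a: float  # quantity demanded at a price of zero
    b: float  # how strongly demand falls as the price rises (b > 0)
    c: float  # quantity supplied at a price of zero
    d: float  # how strongly supply rises with the price (d > 0)

    def demand(self, price: float) -> float:
        return self.a - self.b * price

    def supply(self, price: float) -> float:
        return self.c + self.d * price

    def equilibrium(self) -> tuple[float, float]:
        """Return the (price, quantity) pair at which the market clears."""
        # Setting a - b*P = c + d*P and solving for P gives P* = (a - c) / (b + d).
        price = (self.a - self.c) / (self.b + self.d)
        return price, self.demand(price)


market = LinearMarket(a=100.0, b=2.0, c=10.0, d=1.0)
price, quantity = market.equilibrium()
print(f"P* = {price:.2f}, Q* = {quantity:.2f}")  # P* = 30.00, Q* = 40.00
```

At any price above P*, supply exceeds demand and the surplus pushes the price down; below P*, the shortage pushes it up. This is the equalizing role of the price that the pricing-mechanism function refers to.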

In a market, the abstract place of trade, potential and realized trading relationships determine the economic equilibrium of price and quantity of a product [32]. Either the entirety of the economic structure or segments of it can be addressed with the term “market” [33]. A market serves three main functions:

  1. Institution: The market as an institution is a framework of rules that governs the behavior of the participating agents. It assigns the roles of the agents (e.g., intermediary, seller) and sets expectations and protocols for their behavior. Further, participants willing to trade find a medium allowing them to satisfy their exchange goals [28].

  2. Transaction: The market is constituted by the sum of all market-based transactions. In turn, the market defines the process of transactions [31]. According to [27], a transaction consists of four distinct phases: (1) the information phase, where agents collect information on products and form concrete exchange intentions in the form of bids and offers; (2) the negotiation phase, where the product, the contract terms, and the price are negotiated, ending in a contract; (3) the transaction phase, where the contract is fulfilled and the commodity is exchanged; and (4) the after-sales phase, where customer service is crucial to individualize and enhance customer satisfaction and to retain the customer. Other authors use different phases; for example, [31] splits the information phase into an information phase and an intention phase while merging the transaction and after-sales phases. To be considered part of an electronic market, at least one transaction phase needs to be performed electronically. Most researchers consider the information phase to be the minimal requirement, as in this phase demand and supply are matched globally and immediately [31].

  3. Pricing mechanism: Markets are a mechanism through which buyers and sellers interact to set prices. More precisely, the price is the equalizing element that coordinates the actions of buyers and sellers in a market. Furthermore, prices signal the conditions of exchange to other participants [32]. The market as a pricing mechanism is closely linked to the efficient market hypothesis: once the optimal price has equalized supply and demand and cleared the market, the allocation of goods is Pareto-optimal and social welfare is maximized [19, 32].

3 Information technology, data, marketplaces

The rapid development of modern information and communication technology (ICT) has also created a new medium through which market relations, transactions, and information can be processed and realized [31]. This new medium is an electronic infrastructure that companies, individuals, and governments can use to create virtual marketplaces where none existed before [5]. Indeed, ICT has led to the creation of virtual trading areas where products, services, and information are sold [27]. Furthermore, owing to the on-demand availability of large computing power, high-capacity storage as a service from the cloud, and application service provisioning, completely new categories of goods have emerged, most notably data. Data in various forms (raw, aggregated, processed, etc.) can nowadays be traded just like any other good, and the platforms supporting this trade resemble marketplaces for traditional goods.

Moreover, ICT integration has changed several traditional market mechanisms, e.g., by enabling more flexible price setting or faster transaction processing, but its defining new quality is the mechanization of information processing, which leads to a drastic increase in information production [28].

This reshaping has consequences for the current set of definitions. The position of electronic markets in the existing framework is not self-evident, and to date no commonly agreed-upon definition of electronic markets and marketplaces has been established. A patchwork of definitions, ranging from electronic markets as an agora [31] to electronic markets as information systems [9], hampers research in this area. We suggest that the relationship between electronic markets and electronic marketplaces is analogous to the (often neglected) distinction between their real-world counterparts.

3.1 Electronic markets

As implied above, “the electronic market as an electronic medium is based on the new digital communication and transaction infrastructure” [31]. Accordingly, electronic markets are submarkets qualified by the electronic infrastructure they are based upon [3]. Analogously to the economic market definition, an electronic market is the abstract summary of all market-based allocation on the basis of electronic media [33].

Understood as a submarket, an electronic market retains the three main market functions defined above: institution, transaction, and pricing mechanism. Electronic markets deviate from their traditional counterparts mainly in two regards. First, the implementation of the institution function is more complex, because the ubiquitous nature of electronic markets makes it difficult to assign rules and a common language. Second, they deviate in pricing [31]. As in traditional markets, the price is the principal signal of the value and conditions of the good offered, but its composition with regard to transaction costs and the cost structure of virtual goods may differ. Since transaction costs are one of the main elements of pricing, their reduction through ICT typically leads to a drastic drop in the cost of a good [2].

Electronic markets are often discussed in terms of their transformative power and can be considered a convergence towards a perfect market [13]. With respect to higher accessibility, lower entry barriers, and ubiquity, electronic markets have a considerable advantage over traditional markets [31]. Given their higher transparency, electronic markets are also usually credited with improved coordination of allocation [31]. Transaction cost theory in particular asserts that implementing an electronic infrastructure makes transaction costs negligible, which improves competition and almost completes the convergence towards a perfect market [34].

3.2 Electronic marketplaces

Following the distinction between markets and marketplaces introduced above, an electronic marketplace is the concrete agency or infrastructure, implemented in an electronic medium, that allows participants to meet and perform market transactions. Yet, the term is often used to describe various concepts of e-commerce and market organization, or as a synonym for electronic markets. Wang and Archer [39] summarize prevalent definitions and group them by concept. They identify two fundamental types among the many definitions, electronic marketplaces as governance structures and as business models, which can be characterized as follows:

The Business Model dimension is effectively the definition of an electronic marketplace: the concrete virtual institution and place of exchange that brings together supply and demand and supports the trade between providers and customers, i.e., transferring the market function into an electronic infrastructure. Any type of business action on online platforms or any type of electronic venture falls into this dimension with no regard to whether they are based on competition or on collaboration [39].

Definitions covered by the Governance Structure dimension actually refer to electronic markets in the abstract sense. As such, these definitions do not really reference electronic marketplaces.

3.3 Typology of electronic marketplaces

Electronic marketplaces manifest in different shapes and can be categorized along various dimensions. As a result of the overlapping definitions of electronic marketplaces, the categorizations are equally confusing. Each model uses different definitions which makes a general classification of the various forms of business models difficult.

In their literature review, [39] identify nine common categories of electronic marketplaces: number of participants, relationship dimension, participant behavior, ownership, industry scope, market mechanism, products, power asymmetries, and fee structure. Some models implement all or most of these categories, e.g., [1, 13, 30, 33]. Although those models are capable of reflecting every particular manifestation of specific platforms, they do not allow for meaningful conclusions on the prevalence of categories in quantitative empirical research. For example, applying the model by [1] in a study with 31 samples returned zero findings in six categories, which illustrates that simpler models with fewer categories enable a more concise typology.

Other models, e.g., [8, 12, 24, 27], differentiate providers based on the relationship dimension into buyer-biased, seller-biased, and neutral. These simpler models, however, often merge several dimensions, especially the ownership and the relationship dimension, without specifying that they are indeed distinct. For the evaluation of data marketplaces, an examination of the prevalent forms of electronic marketplaces with respect to the relationship dimension and the market/hierarchy differentiation is most interesting. In a market, market forces operate freely; in contrast, in a hierarchy, i.e., an exchange of goods within organizational boundaries, the operator of the infrastructure, be it supplier or buyer, has an advantage over the other party involved.

All transactions between suppliers and buyers can be classified as either hierarchical or market-based. In the market, the quantity and price of a good are determined by market forces among competitive offerings while hierarchical relations are characterized by pre-determined limitations for a specific price and specific buyers or suppliers. The relative advantages of the strategies depend on the transaction costs and the structure of the good [20].

As no comprehensive model incorporating all of the above has yet been developed, we present a new model, illustrated in Fig. 1, incorporating the market/hierarchy divide as well as the correlations between the nine categories identified by [39]. The high correlations between the number of participants, the relationship dimension and market mechanism as well as the correlation between ownership and power asymmetries allow for an aggregation of categories [39].

Fig. 1: A model of electronic marketplaces discerning between three ownership types, inspired by [27] and [8], cited from [40]

First, providers are placed on a scale between hierarchy and market. Furthermore, marketplaces are categorized based on their ownership, which can be (a) private, i.e., owned by a single company (seller or buyer); (b) consortium-based, i.e., owned by a small number of companies (sellers or buyers); or (c) independent, i.e., the marketplace is run as a platform without any connection to sellers or buyers. The differentiation between vendor-based and platform-based electronic marketplaces has important implications: while marketplaces run as platforms are inherently independent, marketplaces driven by vendors (or buyers) are likely to be biased in their respective favor.

Based on these dimensions, our model differentiates six business models. At the hierarchy level, privately owned platforms typically facilitate the procurement or sales of their owner (a company) in closed systems and only allow for one-to-many or many-to-one relations. Between hierarchy and market, consortium-based platforms implement many-to-few or few-to-many relations and are typically a collaboration of several companies in the same industry that seek to facilitate their sales or procurement processes. These platforms are closed because entry into the platform (i.e., into the consortium) is possible only in theory.

At the market level, many-to-many marketplaces are usually operated by an independent intermediary and have only minimal entry restrictions. Many-to-many platforms on which the operators also trade their own products and services are a special case. These platforms are neither independent nor neutral, because their operators run them with the interest of facilitating their own sales [15]. In this case, the operator and competing suppliers form the supply side, even though the association between these agents may not be formalized. For the purpose of our model, such platforms are consortium marketplaces because they operate in a similar way to real consortia. Competition on consortium platforms is higher than on purely hierarchical systems but still lower than on marketplaces due to the entry restrictions [32].

This model, depicted in Fig. 1, intends to close the gap between theoretical models that are hard to apply in empirical studies and simpler models with little explanatory power and a simplified focus on the ownership. Through the aggregation of ownership, power asymmetries, and number of participants into six different business models the number of types is manageable while allowing for meaningful conclusions.
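The ownership rules of this model, including the special case of operators who trade on their own many-to-many platform, can be encoded as a small classification function. This is a minimal sketch, not a reproduction of Fig. 1: it covers only the three ownership types and their position on the hierarchy/market scale as described in the text, and does not enumerate all six business models. The `Platform` fields and the example platforms are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    PRIVATE = "private"          # owned by a single company (seller or buyer)
    CONSORTIUM = "consortium"    # owned by a small number of companies
    INDEPENDENT = "independent"  # run without any connection to sellers or buyers


# Position on the hierarchy-market scale and typical relation per ownership type.
SCALE = {
    Ownership.PRIVATE: ("hierarchy", "one-to-many or many-to-one"),
    Ownership.CONSORTIUM: ("between hierarchy and market", "many-to-few or few-to-many"),
    Ownership.INDEPENDENT: ("market", "many-to-many"),
}


@dataclass
class Platform:
    """Hypothetical platform description used only for this illustration."""

    name: str
    owning_companies: int  # 0 means an independent intermediary runs the platform
    operator_sells_own_goods: bool = False


def classify(p: Platform) -> Ownership:
    if p.owning_companies == 1:
        return Ownership.PRIVATE
    if p.owning_companies > 1:
        return Ownership.CONSORTIUM
    # A many-to-many platform whose operator also trades its own products is not
    # neutral, so the model treats it as a consortium marketplace.
    if p.operator_sells_own_goods:
        return Ownership.CONSORTIUM
    return Ownership.INDEPENDENT


for p in (Platform("shop", 1), Platform("exchange", 0), Platform("hybrid", 0, True)):
    kind = classify(p)
    print(p.name, kind.value, SCALE[kind][0])
```

Keeping ownership as the single input and deriving the scale position from it mirrors the aggregation argument above: because ownership, power asymmetries, and number of participants are highly correlated, one attribute suffices to place a platform.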

4 Discussion

Following the integration of electronic data marketplaces into the existing neo-classical framework for markets, several qualifiers and disqualifiers can be developed to allow for a clear identification and characterization of data marketplaces as electronic marketplaces. In particular, the following criteria are defined:

  1. Having established that markets and marketplaces are shaped by the goods they focus on, a provider’s primary business model needs to be providing data and/or related services to be a data marketplace.

  2. Data marketplace providers need to offer an infrastructure that allows customers to upload, browse, download, buy, and sell machine-readable (e.g., RDF or XML) data. To classify as an electronic marketplace in the narrow sense, the provider has to host the data, and it needs to be clear whether specific data come from the community or from the operator.

This has some further implications regarding side constraints. As already indicated, marketplaces should focus on one particular good; for data marketplaces, this good is data and data-related services. However, to be readily exchangeable, automatically processable, and hence useful, data have to be in a machine-readable format. This rule excludes, for example, Wikipedia: its infrastructure allows users to freely upload and access information, which, however, is not easily machine-processable. Data vendors that only link to data locations without hosting the data themselves (such as the list of data sets on KDnuggets.com) are also excluded, because this type of provider offers a directory rather than the data itself.

Even if they meet the above criteria, offerings from government agencies or non-governmental organizations (NGOs) providing free data are not regarded as data marketplaces, because publishing and trading data is not their core business. They are at most remotely relevant, as they publish data as a side effect of their general purpose and do not aim at commoditizing data or at finding an appropriate business model. A large number of cities, provinces, and countries (the Global Open Data Index counts 79 countries) participate in the Open Government movement [11]. This movement aims at publishing government data to allow for more transparent and citizen-oriented participation and innovation [23]. Transnational organizations such as the United Nations or the World Bank and NGOs like interaction.org promote their objectives by sharing their findings. Research on this emerging field is still developing; two notable works are [6, 29].
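The qualifiers and disqualifiers of this section can be read as a checklist applied to each candidate provider, as in a survey. The following is a minimal sketch of such a checklist. Its field names are hypothetical, it reads criterion 2 conjunctively (all five functions required), which is one possible interpretation, and its set of machine-readable formats is an assumption: the text names only RDF and XML, and CSV and JSON are added here purely for illustration.

```python
from dataclasses import dataclass, field

# Assumption: which formats count as machine-readable. The text names RDF and XML
# as examples; CSV and JSON are added here purely for illustration.
MACHINE_READABLE = {"rdf", "xml", "csv", "json"}

# Criterion 2 read conjunctively: all five functions must be supported.
REQUIRED_FUNCTIONS = {"upload", "browse", "download", "buy", "sell"}


@dataclass
class Provider:
    """Hypothetical description of a provider; field names are illustrative."""

    name: str
    data_is_core_business: bool  # criterion 1: data and/or data services are the primary business
    functions: set[str] = field(default_factory=set)  # supported customer functions
    formats: set[str] = field(default_factory=set)    # formats the data are offered in
    hosts_data: bool = False         # hosts the data itself instead of linking to them
    provenance_clear: bool = False   # clear whether data come from community or operator
    government_or_ngo: bool = False  # publishes data as a side effect of another mission


def is_data_marketplace(p: Provider) -> bool:
    """Apply the qualifiers and disqualifiers of this section to a single provider."""
    if p.government_or_ngo or not p.data_is_core_business:
        return False
    machine_readable = any(f.lower() in MACHINE_READABLE for f in p.formats)
    return (
        REQUIRED_FUNCTIONS <= p.functions
        and machine_readable
        and p.hosts_data
        and p.provenance_clear
    )


marketplace = Provider("example-market", True, set(REQUIRED_FUNCTIONS), {"XML", "CSV"}, True, True)
directory = Provider("example-directory", True, {"browse"}, {"CSV"}, hosts_data=False)
print(is_data_marketplace(marketplace), is_data_marketplace(directory))  # True False
```

A wiki-style site offering only HTML pages fails the machine-readability check, and a link directory fails the hosting check, matching the exclusions discussed above.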

5 Relevance

The purpose of defining data marketplaces is to closely monitor relevant real-world implementations of the opportunities offered by recent technical innovations, such as cloud infrastructures. Empirical research on this emerging field is still scarce. Three surveys on data marketplaces were conducted by some of the authors between 2012 and 2014, and the most recent one employed the classification system presented here [35–37].

The latest survey examined 72 different data marketplaces and data providers to characterize the data market and to identify trends in data-related business models. Empirical research using the provider definition outlined here shows that the technical side of information provisioning is far less challenging than its economic side, e.g., choosing the appropriate pricing strategy. Regarding pricing models, we found that flat rates are clearly more common than pay-per-use models among the surveyed data providers, often in combination with freemium models. Whether this is to accommodate customers or to exploit lock-in effects should be the subject of future research. Customers might be dissatisfied with granular pricing models that also restrict unfocused data exploration and may prefer simpler subscriptions. Up to now, flat rates remain the most attractive pricing model for providers, because subscription plans generate more stable revenue and, given practically nonexistent marginal costs, incur no additional costs. Furthermore, pay-per-use models have not (yet) reached the level of sophistication necessary to prevent arbitrage. Research is currently being conducted to find technical and policy amendments [4]. Until then, the trend towards flat rates is not surprising and most likely indicates rather low competition among providers, so that they still have plenty of options for differentiation and no need for price competition.
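To illustrate what arbitrage in a pay-per-use model can look like, the following sketch checks a price list for two elementary weaknesses, under the simplifying assumption that each request buys a set of records and is priced by a function of that set. If a larger request is cheaper than a smaller one it contains, buyers purchase the larger one; if a request costs more than two disjoint parts bought separately, buyers split it. This is a toy check written for this illustration, not the approach of [4]; the function name and the example price functions are hypothetical, and the exhaustive search over subsets is only feasible for tiny inputs.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

PriceFn = Callable[[FrozenSet[str]], float]


def find_arbitrage(items: List[str], price: PriceFn) -> List[str]:
    """List simple arbitrage opportunities in a pay-per-use price function.

    Checks every pair of non-empty subsets of `items`, so it is only practical
    for tiny examples (the number of subsets grows exponentially).
    """
    subsets = [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]
    issues = []
    for a in subsets:
        for b in subsets:
            # Monotonicity: a strict superset should never be cheaper than its subset.
            if a < b and price(b) < price(a):
                issues.append(f"buy {sorted(b)} instead of {sorted(a)}")
            # Subadditivity: the union of two disjoint requests should not cost more
            # than buying both parts separately (min() comparison avoids duplicates).
            if a.isdisjoint(b) and min(a) < min(b) and price(a | b) > price(a) + price(b):
                issues.append(f"split {sorted(a | b)} into {sorted(a)} and {sorted(b)}")
    return issues


records = ["r1", "r2", "r3"]
print(find_arbitrage(records, lambda s: 5.0 + len(s)))  # []
print(find_arbitrage(records, lambda s: float(len(s) ** 2))[:1])
```

A base fee plus a per-record charge passes both checks, whereas a price that grows quadratically with request size invites splitting, and a bulk discount that undercuts smaller bundles invites buying more than needed. A flat rate sidesteps both issues because the price does not depend on the request at all.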

The study presented in [37] suggests two distinct scenarios with very different data access requirements. In the first scenario, data are used as a type of manufacturing input, and customers expect complete, formatted, and reliable data. To process the acquired data further and use them as a basis for the production of another good, their quality must be extremely high and access to them must be reliable; however, they do not need to be very specific or pre-analyzed. In the second scenario, data are considered an add-on in the decision-making process, i.e., a specialized product that can be spot-purchased whenever necessary or acquired on a regular basis. Here, quality is less crucial than specificity: customers in the add-on scenario do not depend on data quality but rather expect a highly individualized product that matches their particular needs. In contrast, data buyers in the first scenario are more likely to expect a constant standard they can depend on. Examples of the first scenario are the financial data APIs offered by Xignite, Bloomberg PolarLake, or Interactive Data. Specialized inputs in the second scenario include enrichment services like CrowdSource, crawling services like 80legs, or address sellers like xDayta [18].

In general, hierarchical structures are more prevalent among the providers than intermediate platforms [37]. This could possibly be linked to the reach/scope hypothesis touched upon in Sect. 3.3. Considering that the data market is mainly a B2B market, a possible explanation is that hierarchical relationships are easier to implement, which is favorable in a B2B setting. Also, private customers tend to have a lower willingness to pay for data [25]. The observations hint that the data market is developing towards a mainstream market that also targets non-technical companies and users: a high number of providers offer several access possibilities but limit the number of data formats. The restriction to mostly standard formats such as reports or CSV files probably aims to lower the prerequisites for using the data.

6 Conclusion

The development of IT has brought about innovations in both technical and commercial areas, which have led to the emergence of new business models for data exchange. The question of how to make data provisioning profitable is relevant to entrepreneurs and academic researchers alike. Still, the successful distribution of data is impeded by complex pricing mechanisms combined with a generally low willingness to pay on the buyers’ side. Well-studied economic principles for markets and marketplaces can help to explain and mitigate these concerns, which makes an integration of electronic data marketplaces into the existing economic framework necessary. The theoretical foundation provided in this paper complements previous studies by ourselves and others on the dynamics of selling data and data-related services. It allows for further research on the business models of data providers, pricing strategies, and the distribution of data.

As pointed out earlier, the Internet has enabled a number of service developments in recent years. Many of these resemble traditional phenomena from the real world, or try to transform such phenomena to the virtual world, often even without realizing what is going on. It then happens that only after a while the originators of such a transition discover that what they considered “new” in the virtual world has had quite a tradition in the real world already, and there are indeed underpinnings that could help to better understand the virtual world.

It is our conviction that this paper, which tries to bridge the gap between computer science and economics, can help to avoid unnecessary explorations of what has been explored already in other domains. At the same time, this study provides a common language to facilitate the comprehension of what is happening on the data market and on data marketplaces.

As a result, several topics for further research arise. Indeed, the obstacles and concerns raised by the proliferation of data on data marketplaces should lead to research concerning the cloud sourcing of data. Pricing strategies such as those based on data quality [38], trading options, or auctioning systems all need to be reconsidered for data marketplaces and adapted to the digital nature of the goods at stake. Moreover, like any market, data markets will undergo, and partially have already undergone, a diversification into “black” and “white” markets, where data are traded illegally in the former. It will therefore be both interesting and relevant to investigate how such trading can be detected and how data can be protected from being traded on a black market.