1 Introduction

The increasing salience of digital and data products has presented new challenges for economists and business analysts trying to understand their value. Standard econometric methods that work well for physical goods or financial products are often inadequate when applied to data products due to their unique characteristics. The non-rival nature of data products and associated spillovers, together with their high fixed but low marginal costs, mean standard pricing and valuation techniques will generally be inappropriate (Shapiro and Varian 1999, Goldfarb and Tucker 2019, Jones and Tonetti 2020). There is a need for the development of alternative valuation methods that can account for these distinctive features of data.

Recent efforts to devise new techniques mainly focus on a subset of desired features that the pricing of data should satisfy (Muschalle et al. 2013). However, there is at present no consensus on a standard approach. The proposed approaches can be broadly divided into static appraisals, focused on ‘one-off’ business decision-making (Coyle and Manley 2021; Pei 2022), and dynamic ones generally focused on monetization or on theoretical data markets (Ghorbani and Zou 2019; Deep and Koutris 2016), although few such markets exist. Other examples include real-time algorithmic pricing techniques (Agarwal et al. 2019; Yoon et al. 2020). While the value of data is widely acknowledged across the public and private sectors, only a few empirical methods have been applied to real business situations. Moreover, the benefits organizations seek from data valuation extend beyond monetization: value is a powerful metric for data management within the organization, enabling the allocation and redirection of resources to the most valuable assets. Reviews of the literature to date on empirical valuation methods and some of their applications are presented in Coyle and Manley (2023), Pei (2022), and Fleckenstein et al. (2023).

We introduce here a novel method for valuing data, drawing on real options theory. This addresses a gap in the prior literature for a straightforward valuation method appropriate to practical business decision contexts. Our approach builds on the well-established real options literature, outlined below, and we discuss the key parameters needed to value data options in practice. We then show how this approach can be applied by providing a case study of Tesco's Clubcard, the customer loyalty scheme of one of the UK’s largest supermarket businesses. This case illustrates how the method could help businesses make strategic decisions about the value of the data they hold; although there is no single 'correct' point value, the options approach we propose can determine whether the value is non-zero and help identify which contextual features determine the value of the data held.

One of the main advantages of our methodology is that it allows for a representation of the uncertainty associated with investments in data initiatives and thus the value they potentially create, even when data are being accumulated for other business purposes and future uses might not yet be fully known. Traditional valuation methods, such as discounted cash flow analysis, rely on a single point estimate of the future value of the asset being valued. This can be problematic when the value of the asset is uncertain and dependent on multiple sources of uncertainty. Real option valuation, on the other hand, explicitly accounts for uncertainty, future technological developments and changes in market behavior and thus allows for a range of potential outcomes to be considered. Furthermore, real options valuation relies on well-understood toolkits borrowed from financial mathematics, which makes the method relatively straightforward to apply. In this, it contrasts with other methods proposed in the prior literature.

2 Real options

2.1 Background

The commonly used approach to determine the value of investments is to compute the net present value (NPV) of the investment, which is the sum of the present values of the expected future cash flows it generates, discounted at a suitable rate. Project valuation founded on comparing the net present value of future costs and earnings has become highly sophisticated, accounting for tax and financial structures, and potentially cascading risks. Risk analysis in particular forms an important part of the exercise. However, the standard method, while widely accepted and used in practice, often assumes a statistically stable environment (although discount factors can be adjusted for risk drawn from a given distribution) and relies on a set of projections or forecasts of costs and returns, which may additionally be subject to optimism or pessimism bias. Importantly, it lacks the flexibility to account for decisions changing the future path of the variables of interest during the life of the investment project (Kumar 2015).
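For reference, the textbook calculation underlying this approach discounts each expected cash flow $CF_{t}$ over a horizon of $T$ periods at a (possibly risk-adjusted) discount rate $\rho$; the notation here is ours, introduced only for comparison with the option formula below:

$$\mathrm{NPV} = \sum_{t=0}^{T} \frac{CF_{t}}{(1+\rho)^{t}}.$$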

Real options provide a dynamic alternative accommodating uncertainty and incorporating mid-course flexibility. A real option is an opportunity (but not requirement) to act or take a decision at some point in the future, creating economic value in the context of uncertainty about how things will turn out. They are named ‘real’ because they are usually considered in the context of decisions to invest (or not) in physical assets. Real options are similar to financial options in providing the holder with the ability, but not the obligation, to take a specific course of action. But while financial options provide the right to buy or sell a financial asset, real options incorporate the flexibility to make decisions that could affect the outcome of an investment project. Examples of real options include the option to abandon a project if it becomes unprofitable, to expand the scope of a project if it exceeds expectations, or to delay a project to wait for more favorable market conditions. If the option possibilities are ignored or there is no flexibility, any current valuation will overestimate downside risks or fail to realize positive opportunities. Thus, the real option value, relative to the NPV, is increasing in the amount of uncertainty in the business decision environment, and in the flexibility of the project (Mayer and Schultmann 2017). It is therefore a useful approach in volatile contexts, for example, when the decision-maker can benefit from ending a project that is performing less well than expected or expanding an outperforming project (Damodaran 2005).

A real option valuation can then be used to supplement the standard NPV approach for valuing an investment, enabling decision-makers to understand the implications of adjusting their strategy as market conditions or the economic environment change. In this sense, real options can act as a form of insurance, limiting downside risk while maximizing upside opportunities (de Neufville et al. 2006), by actively responding to developments that were uncertain at the time of the initial decision (Van Putten and MacMillan 2004). One limitation, however, is that real options may inflate the value of risky projects. Although higher volatility can increase the potential range of outcomes for an underlying asset, and therefore increase the potential upside, it may also decrease the risk-adjusted rate of return. If the rate of return used in the valuation is not appropriately risk-adjusted, the result can be an inappropriately high estimated value of the option (Marzo 2005).

2.2 Real option valuation approaches

There are various methods of calculating the value of real options, which fall into either analytical or numerical approaches (Mayer and Schultmann 2017). Practitioners have generally preferred analytical methods, such as the Black-Scholes formula, for their simplicity, while numerical procedures dominate in academic research. This is because numerical methods can better handle multiple sources of uncertainty and large numbers of scenarios that cannot easily be modelled otherwise. Monte-Carlo simulation and game-theoretic approaches are popular numerical methods that can accommodate any stochastic process assumed to be followed by the underlying asset's value; they are understood well enough for theoretical manipulation and flexible enough for practical use. However, these numerical methods can be computationally intensive and time-consuming, and hence their use outside academia remains limited (Csapi 2019, Mayer and Schultmann 2017).
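As a hedged illustration of what such a numerical approach involves (this sketch is ours, not drawn from the works cited above), the snippet below values a simple European call by Monte-Carlo simulation of a geometric Brownian motion; for this payoff it converges to the Black-Scholes value introduced below, but the same scaffolding can be adapted to other stochastic processes or to several sources of uncertainty. The function and parameter names are illustrative.

```python
import random
from math import exp, sqrt

def mc_call_value(S, K, tau, r, sigma, n_paths=100_000, seed=0):
    """Monte-Carlo value of a European call on an asset following a
    geometric Brownian motion; converges to the Black-Scholes price
    as n_paths grows."""
    rng = random.Random(seed)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                       # standard Normal draw
        s_t = S * exp((r - 0.5 * sigma ** 2) * tau + sigma * sqrt(tau) * z)
        payoff_sum += max(s_t - K, 0.0)               # call payoff at expiry
    return exp(-r * tau) * payoff_sum / n_paths       # discounted average payoff
```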

3 Methods

We propose a real option valuation approach for estimating the value of data when the potential uses of that data are not immediately apparent. Consider a retail company that is considering collecting data on its consumers with one immediate use case in mind, but with the possibility of using that data for other purposes in future; for example, the initial use case could be more efficient stock control, while a future use case such as selling shelf locations to particular suppliers might take time to emerge. The value of the collected data is thus dependent on its possible future use cases through its option value.

One of the main advantages of using real option valuation to estimate the value of data is that it allows for a more comprehensive analysis of the as-yet-unknown potential value. As pointed out above, standard valuation methods, such as discounted cash flow analysis, rely on a single point estimate of the future value of the asset for one use case at a time. Real option valuation, on the other hand, explicitly accounts for uncertainty and allows for a range of potential outcomes to be considered. Data are an intangible asset, like financial options, but at the same time, like any other asset, they require companies to decide how much to invest in gathering and structuring them so that they are available to inform future business activity. There is no consensus about the methodology for valuing data assets, and hence none about how businesses should decide how much to invest in them. Here we propose that a real option approach captures the economic value provided by data when the additional scope for insight it provides may help decision-makers in an uncertain business environment. This is consistent with findings that companies using data analytics tend to have higher productivity than others (e.g., Brynjolfsson et al. 2021). This method is also simple to implement.

Specifically, we present a method based on the Black-Scholes formula, due to its simplicity and familiarity to practitioners. Moreover, the Black-Scholes formula requires relatively few inputs, which is advantageous when there are information constraints on parameters and the use of proxies is necessary. This can be especially important when dealing with complex data sets where there may be limited information available on certain parameters. With this method, results are easily reproducible, which is useful when comparing valuations over time or across different data sets. The use of a widely accepted and understood method can also reduce the likelihood of bias in the valuation process and can ensure that the limitations surrounding the model are well-known and accounted for in the valuation process. In terms of limitations, in particular, the Black-Scholes model assumes continuous asset prices, and the existence of an efficient and arbitrage-free market. These conditions cannot always be satisfied or checked in the context of real options and, as a result, a real options valuation can move far from the ‘true value’ (Csapi 2019, Marzo 2005). Nevertheless, we suggest the benefits of using this approach outweigh these limitations, not only because of the absence of other standard methods for data valuation, but also because the alternative (often implicit) assumption of zero value is most likely wrong.

3.1 Variable selection for the Black-Scholes method

In financial economics, the Black-Scholes model is widely used for pricing options contracts. It allows calculation of the price P of a contract that grants the holder the right to buy an asset at a predetermined price K, known as the strike price, after a specified period of time τ has passed. The underlying asset's value S is assumed to follow a geometric Brownian motion, characterized by its volatility σ. At the expiration date, the option holder evaluates the value of the underlying asset. If exercising the option by purchasing the asset at the strike price is financially advantageous, they do so. Otherwise, they let the option expire, accepting the loss of the option price P.

The equation for the option price reads

$$P = S\,N(d_{1}) - Ke^{-r\tau}N(d_{2}),$$
(1)

where

$$d_{1} = \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^{2}\right)\tau}{\sigma\sqrt{\tau}},$$
$$d_{2} = d_{1} - \sigma\sqrt{\tau},$$

and r is the risk-free rate, \(N\) is the cumulative Normal distribution and P the real option value.
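As a minimal sketch (ours, not part of the formal exposition), equation (1) can be evaluated directly with standard library tools; the function name and the numerical inputs below are hypothetical placeholders.

```python
from math import log, sqrt, exp
from statistics import NormalDist

def data_option_value(S, K, tau, r, sigma):
    """Black-Scholes call value of equation (1), read as a data option.

    S     -- value of the underlying asset (e.g. conventional value of the dataset)
    K     -- strike price (e.g. estimated cost of exploiting the dataset)
    tau   -- time to expiration, in years
    r     -- annual risk-free rate
    sigma -- annualised volatility of the underlying value
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    N = NormalDist().cdf  # cumulative standard Normal distribution
    return S * N(d1) - K * exp(-r * tau) * N(d2)

# Hypothetical example: a dataset whose known use case is worth 10 (in some
# monetary unit), costing 8 to exploit, over a two-year horizon.
print(round(data_option_value(S=10.0, K=8.0, tau=2.0, r=0.03, sigma=0.25), 3))
```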

The Black-Scholes formula is a well-established model found in finance textbooks and much used in practice, so does not require an in-depth elaboration here. However, in the context of real options values for data, it is essential for practitioners to select appropriate values or proxies to be used as input parameters for their specific applications. This is particularly important in this context as the selection of input parameters can greatly influence the final valuation results (as we show in the example below). Therefore, we now contextualize each input parameter and discuss the use of specific proxies, when applicable.

3.1.1 Value of the underlying asset

This is the initial value of the project using conventional market methods. This value is subject to uncertainty and will change with market behaviors and as new information arises. In the context of data valuation, it is the value of the dataset established in a conventional way. This value is uncertain and may change over time; for example, the value of a static set of customer data may decrease as customers' preferences change. Various methods are possible, but a natural choice would be the cost of creating and maintaining the data over time. An alternative would be an estimate of the income its use has generated (Coyle and Manley 2023). For investment decisions, income-based approaches may be more appropriate, so one can use an estimate of the present value of forecast future income derived from one or more known use cases of the data, such as increased revenue or profits and/or reduced total costs from more efficient processes. These calculations will often already have been made as part of a business case. However, the cost of creating and maintaining the data is a reasonable alternative and is the method of data valuation being adopted in national statistics.

3.1.2 Strike price

In the context of options, the strike price refers to a pre-determined fixed price that allows the holder to take a specified action. For the purpose of valuing data options, the strike price would be the estimated cost of analyzing the dataset for its intended use case, which would include expenses such as labor costs, computing power, setup of necessary infrastructure, and other relevant implementation costs. The relevant accounting information is regularly estimated by firms and can be directly used in the data valuation.

3.1.3 Time to expiration

The time to expiration parameter represents the duration during which the option can be exercised. In the context of real option valuation for data, however, a fixed value for this parameter may be too inflexible. Instead, firms' regular budget decisions naturally provide a decision point that can mark the end of an option's period. Additionally, while data do not have an expiration date, their value often diminishes over time, and many valuable business use cases for data are time sensitive. The appropriate duration for this parameter will depend on the specifics of each case: a one-time data purchase may require a shorter expiration date than an ongoing data collection project. Nonetheless, businesses will generally set a cut-off time to ensure unprofitable projects are abandoned.

3.1.4 Risk-free rate of return

The theoretical return on a risk-free investment. This could be based on US Treasury bills, for US-based valuation. Alternatives would be the Bank of England SONIA rates and EURIBOR for the UK and Europe.

3.1.5 Volatility of the underlying asset

Volatility refers to the degree of variation of an asset's price or value over time. It is a measure of the degree of uncertainty or risk associated with the asset's future value. Valuing a data option requires knowledge of how the value generated by a data set has changed over time, or how the price of a particular data set has fluctuated. Some data owners will plausibly have their own information, particularly data hubs that sell data regularly, or companies with large volumes of data that they have utilized in multiple ways. However, this may only be applicable to a small set of use cases. To estimate the volatility, practitioners may examine the stock price volatility of competitor firms that have undergone similar data-driven projects, or the firm's own past projects (Li and Hall 2020). Alternatively, an approach proposed by Ker and Mazzini (2020) involves estimating the value of data to the US economy by looking at the market capitalization of a constructed list of 64 US-based data-driven firms with public stock listings. In the absence of more specific proxies, the volatility of this index, adjusted for the market volatility, can serve as a possible estimate for the volatility parameter in valuing a data option.
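As a sketch of how such a proxy might be computed in practice (the series below is invented purely for illustration), the annualised volatility of a monthly comparator series, whether a constructed index of data-driven firms or past project revenues, can be estimated from its log changes:

```python
import statistics
from math import log, sqrt

def annualised_volatility(values, periods_per_year=12):
    """Annualised volatility of log changes in a regularly sampled value series."""
    log_changes = [log(b / a) for a, b in zip(values, values[1:])]
    return statistics.stdev(log_changes) * sqrt(periods_per_year)

# Hypothetical monthly values of a comparator index or project revenue stream
series = [100, 103, 101, 106, 104, 109, 112, 108, 115, 118, 116, 121, 125]
print(round(annualised_volatility(series), 3))  # candidate sigma for the data option
```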

We anticipate that most applications will take into account only one source of uncertainty, typically the potential value of the dataset. In some situations, however, there may be other sources of uncertainty, such as uncertainty regarding the strike price or the time to expiration, that could further complicate the analysis. Nonetheless, the flexibility of our real option valuation method allows practitioners to incorporate additional sources of uncertainty through modifications of the Black-Scholes method (Damodaran 2005), in which the variation of the option value is assessed by varying the input variables. It is worth noting that any additional source of uncertainty will increase the option value of the dataset. Therefore, when considering only one source of uncertainty, the resulting value can be considered a conservative estimate of the 'true' option value of the dataset.

4 An application to Tesco's Clubcard scheme

4.1 Context

Tesco, one of the largest UK groceries and general retailers, introduced its popular Clubcard scheme in 1995. This initiative aimed to enhance customer engagement and loyalty by providing personalized offers based on customer shopping behavior. A rich set of data on customer purchasing history enables Tesco to tailor its marketing strategies and deliver a more personalized shopping experience to its customers, reducing customer churn and optimizing inventory.

The scheme has emerged as a highly successful data initiative within the retail sector. While some authors (East and Hogg 1997) have previously argued that Tesco's growth may be attributed to factors such as geographical store expansion, there is widespread recognition that the Clubcard scheme has played a pivotal role in providing Tesco with a competitive advantage, enabling it to consistently outperform competitors (Boothby 2007, Mesure 2003, Humby et al. 2007). In particular, Boothby (2007) shows that Tesco's cumulative sales-to-space growth has been faster than that of its direct UK competitor, Sainsbury's plc.

The main cost associated with the Tesco Clubcard scheme has been attributed to opportunity costs, specifically the loss of revenue resulting from reducing prices and offering discounts in exchange for customer data whenever a customer uses their card in a purchase. Between 1995 and 2002, the scheme resulted in nearly £1 billion in such foregone income (Mesure 2003). While there is no exact breakdown of this figure into fixed initial set-up costs and unrealized profits, it can be assumed that the former is low compared with the overall opportunity cost. This assumption is supported by Tesco's acquisition of a 53% stake in Dunnhumby in 2001 for £30 million (Wood and Lyons 2010), as Dunnhumby was the main "data driver" behind the initial Clubcard scheme and bore the costs of its implementation. It is reasonable to assume that the price paid for the implementation of the scheme was a fraction of the company's valuation, and thus negligible compared with the total opportunity cost.

4.2 Variable selection and key assumptions

In 1995, Tesco and Sainsbury's had similar stock market values and market shares. In our study, we therefore compare Tesco's performance to that of Sainsbury's, as they were direct competitors with similar initial financial conditions. Our assumption is that, in the counterfactual absence of any alterations to their circumstances or strategies, the two competitors would have sustained comparable sales-to-space growth.

According to historical figures on space and sales growth (Boothby 2007), Tesco would have achieved a cumulative sales growth of approximately 55% had it followed Sainsbury's performance from 1995 to 2001. However, Tesco considerably outperformed its competitor, realizing a cumulative growth of 96%. A key assumption is, then, that Tesco's surplus sales growth, compared to Sainsbury's, can be attributed to the innovation introduced by the Clubcard scheme.

In practice, to estimate the value of the underlying variable S, we rely on Tesco's annual pre-tax profits derived from publicly accessible accounting data sourced from the UK Companies House database. From this figure, we remove the yearly expected profit attributable to sales growth associated with the expansion of physical in-store space. This is computed using Sainsbury’s sales-to-space growth ratio as a proxy for Tesco’s counterfactual performance in the absence of the Clubcard scheme. The space and sales growth figures used for the adjustment between 1995 and end-2001 are extrapolated from Boothby (2007), and their cumulative values can be seen in Table 1.

Table 1 Comparison of Tesco’s and Sainsbury’s percentage space and sales growth over the seven-year period 1995-2001.

Our analysis is restricted to the interval from 1995 to the end of 2001 due to the availability of estimates for the opportunity costs exclusively within this timeframe (Mesure 2003). We use the opportunity costs as a proxy for the strike price K, neglecting set-up costs that would be directly related to the data collection process.

We derive the historical volatility from the average monthly change in the retail sales value for food stores, as reported by the UK Office for National Statistics (2024). This is a direct, interpretable measure of the uncertainty associated with the sector's financial performance and its inherent risk (Malik 2015), and it is consistent with the other metrics used for value, making it a suitable proxy.

Taking into account these considerations, our illustrative exercise of valuing the data option associated with the establishment of the Tesco Clubcard scheme incorporates the variables presented in Table 2.

Table 2 Variables and proxies for Tesco Clubcard example

4.3 The option value of Tesco's Clubcard scheme

With our method, by simply plugging the estimated proxies into equation (1) we estimate the data option value of the Clubcard scheme to be £206.5m. Table 3 summarizes the estimated necessary variables based on our proxies set out in Table 2. The final two columns of Table 3 show the impact of changing the proxy values by 20% of their original setting on the value of the data real option; the time horizon, strike price and risk-free rate values have the biggest impact. This application of the Black-Scholes formula can be considered a backward-looking exercise because the variables used in the calculation, such as Tesco's excess profits, space expansion, volatility, and estimated opportunity costs, are now known. This means that in this illustration, where the data and relevant parameters are available retrospectively, it was possible to calculate the data option value easily. For the intended applications of this method, the parameter values necessary to calculate the option value of a digital scheme will probably not be available at the desired time of computation. Therefore, these values would need to be estimated from historical data or substituted with alternative proxies.

Table 3 Parameter values and impact analysis of their change used in the Tesco Clubcard example for the proxies in Table 2.
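For readers who wish to reproduce this kind of impact analysis, the sketch below varies each input by ±20% and recomputes equation (1). The baseline numbers are placeholders only, not the values reported in Table 3, which should be substituted in a genuine replication.

```python
from math import log, sqrt, exp
from statistics import NormalDist

def bs_call(S, K, tau, r, sigma):
    # Black-Scholes call value, as in equation (1)
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * tau) * N(d2)

# Placeholder baseline inputs -- substitute the Table 3 values to reproduce
# the reported impact analysis.
baseline = {"S": 10.0, "K": 8.0, "tau": 2.0, "r": 0.03, "sigma": 0.25}
base_value = bs_call(**baseline)
for name in baseline:
    for factor in (0.8, 1.2):
        shifted = dict(baseline, **{name: baseline[name] * factor})
        print(f"{name} x{factor:.1f}: {bs_call(**shifted):.3f} (baseline {base_value:.3f})")
```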

While the primary goal of this exercise is to apply the methodology to a real example and suggest how to choose suitable proxies for valuing data options, our findings indicate an estimated option value of the same order as the reported £300 million that Tesco received from selling its Clubcard data (Wallop 2023). In practical settings, practitioners may face uncertainties and challenges in estimating the values and uncertainties associated with data options. With this backward-looking exercise, our intention is to demonstrate the process and illustrate how to approach the selection of proxies and the associated uncertainties when valuing data options in real-world contexts.

5 Conclusion

In conclusion, our proposed method offers a simple and practical solution to the challenge of valuing data. Using well-established option pricing theory incorporating relevant parameters, we have demonstrated a flexible and easy-to-use method that can be adopted by practitioners across various industries.

While we have emphasized that the value calculated with our method is likely a lower bound on the real value of the option, since the full spectrum of future use cases for a data set is unknown, it offers a valuable starting point for companies looking to make strategic decisions about their data assets. Agarwal et al. (2019) argue that it is unlikely a company will have priors regarding the usefulness of a dataset; however, we have provided an alternative approach based on parameters that could be known or estimated, making an option valuation possible even though future use cases may indeed be unknown. We have demonstrated our method using the simple example of Tesco's Clubcard scheme data. Although this example was designed for illustrative purposes only (since the relevant information is public), it provides a guide to selecting appropriate values and a proof of concept. The estimated option value for the data is a plausible lower bound, given the reported sale value of Tesco’s Clubcard data.

In future research, it would be beneficial to explore the use of modified Black-Scholes formulae that can account for early exercise, such as in “American” options, or for uncertainty regarding the exercise date. Additionally, efforts should be directed towards developing tools for practitioners to assess the value and uncertainty of their use cases and to select the appropriate proxy for volatility when only a subset of potential use cases is known. These developments would enhance the applicability of our valuation method.