
Pricing private data

  • SPECIAL THEME
Electronic Markets

Abstract

We consider a market where buyers can access unbiased samples of private data by appropriately compensating the individuals to whom the data corresponds (the sellers) according to their privacy attitudes. We show how bundling the buyers’ demand can decrease the price that buyers have to pay per data point, while ensuring that sellers are willing to participate. Our approach leverages the inherently randomized nature of sampling, along with the risk-averse attitude of sellers in order to discover the minimum price at which buyers can obtain unbiased samples. We take a prior-free approach and introduce a mechanism that incentivizes each individual to truthfully report his preferences in terms of different payment schemes. We then show that our mechanism provides optimal price guarantees in several settings.


Notes

  1. It is possible to produce an unbiased statistic from a biased sample (e.g., with the Horvitz-Thompson estimator), but here we take a prior-free approach and do not restrict attention to a specific statistic. It is thus natural in our setting to aim for unbiased samples.

  2. This implies that the value that buyer b sees in obtaining access to an unbiased sample of \(d_{b}\) individuals is higher than the price that he is asked to pay. Our mechanisms work more generally for settings where the demand does not change drastically with the price or the market-maker has a good estimate of the right range for the price. See (Gkatzelis et al. 2012) for details.

  3. In order to verify this fact, note that the value \(\overline {r}\) which maximizes \(f(r)(c^{max} w - r)\) will always correspond to the \(\hat {r}_{i}\) value for some seller i. If this were not the case, then slightly decreasing the value of \(\overline {r}\) would not affect f(r), but it would increase \((c^{max} w - r)\), a contradiction. Hence, in order to avoid suboptimality, the value of \(\overline {r}\) would in general be “controlled” by some seller i for whom \(\overline {r}=\hat {r}_{i}\). This seller can, in general, increase \(\overline {r}\) by lying in order to slightly increase \(\hat {r}_{i}\).

  4. For instance, this is the case if \(c_{i}(k) = k\) and \(c_{j}(k) = 2 \sqrt {k}\).

  5. We can significantly reduce the number of values that a seller needs to report (1) if the sampling is based on bundling buyers’ demand (e.g., if we use the ordering distribution) and (2) if there are relatively few different sample sizes requested by buyers, e.g., because buyers choose from predefined sample size options.

  6. It is unrealistic to assume that the mechanism will choose the distribution for the sampling as a function of the values that sellers report because then \(\pi^{max}\) cannot in general be elicited truthfully.

  7. A potential issue here is that seller j might not put in the effort needed for quantifying this certainty equivalent value (because his utility is not affected in any way by his report) and, as a result, not report the correct value. We can avoid this by not telling seller j which lottery each question corresponds to and/or by adding artificial questions about certainty equivalents of lotteries.

  8. Note that \(q_{\psi}\) is different from \(p_{\psi}\), which is a distribution over the number of times that seller i will be sampled.

References

  • Acquisti, A., John, L.K., & Loewenstein, G. (2013). What is privacy worth? The Journal of Legal Studies, 42(2), 249–274.

  • Aperjis, C., & Huberman, B.A. (2012). A market for unbiased private data: Paying individuals according to their privacy attitudes. First Monday, 17(5).

  • Carrascal, J.P., Riederer, C., Erramilli, V., Cherubini, M., & De Oliveira, R. (2013). Your browsing behavior for a Big Mac: economics of personal information online. In 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013 (pp. 189–200).

  • Cummings, R., Ligett, K., Roth, A., Wu, Z.S., & Ziani, J. (2015). Accuracy for sale: Aggregating data with a variance constraint. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS 2015, Rehovot, Israel, January 11-13, 2015 (pp. 317–324).

  • Cvrcek, D., Kumpost, M., Matyas, V., & Danezis, G. (2006). A study on the value of location privacy. In Proceedings of Workshop on Privacy in the Electronic Society (pp. 109–118).

  • Dandekar, P., Fawaz, N., & Ioannidis, S. (2012). Privacy auctions for recommender systems. In WINE (pp. 309–322).

  • Ghosh, A., & Roth, A. (2011). Selling privacy at auction. In ACM Conference on Electronic Commerce (pp. 199–208).

  • Gkatzelis, V., Aperjis, C., & Huberman, B.A. (2012). Pricing private data. SSRN eLibrary.

  • Haddadi, H., Mortier, R., & Hand, S. (2012). Privacy analytics. SIGCOMM Computer Communications Review, 42(2), 94–98.

  • Hann, I.-H., Hui, K.-L., Lee, S.-Y.T., & Png, I.P.L. (2007). Overcoming online information privacy concerns: An information-processing theory approach. Journal of Management Information Systems, 24(2), 13–42.

  • Holt, C.A., & Laury, S.K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.

  • Huberman, B.A., Adar, E., & Fine, L.R. (2005). Valuating privacy. IEEE Security Privacy, 3(5), 22–25.

  • Mas-Colell, A., Whinston, M.D., & Green, J.R. (1995). Microeconomic Theory: Oxford University Press.

  • Riederer, C., Erramilli, V., Chaintreau, A., Krishnamurthy, B., & Rodriguez, P. (2011). For sale: your data: by: you. In Tenth ACM Workshop on Hot Topics in Networks, HOTNETS (p. 13).

  • Roth, A., & Schoenebeck, G. (2012). Conducting truthful surveys, cheaply. In ACM Conference on Electronic Commerce (pp. 826–843).

  • Singer, N. (2012). You for sale: Mapping, and sharing, the consumer genome: The New York Times.

Author information

Corresponding author

Correspondence to Vasilis Gkatzelis.

Additional information

Responsible Editor: Rainer Böhme

The first two authors were working at HP Labs when this work took place.

Appendix: A Proofs

We now provide the proofs of the results that were omitted from the main section.

Proof of Theorem 4.1

Consider some seller i and first suppose that \(c_{i} < \max _{j \ne i} \hat {c}_{j}\). We observe that reporting any \(\hat {c}_{i} < \max _{j \ne i} \hat {c}_{j}\) will not make a difference in the utility of seller i regardless of the outcome of the sampling; furthermore, he will derive a strictly positive utility every time he is sampled. On the other hand, if he reports \(\hat {c}_{i} > \max _{j \ne i} \hat {c}_{j}\), then seller i will be excluded from the sampling and derive zero utility. The second case to consider is that \(c_{i} > \max _{j \ne i} \hat {c}_{j}\). Then, by reporting \(\hat {c}_{i} = c_{i}\), seller i is excluded from the sampling and his utility is equal to zero. However, there is no way of getting positive utility in this case. In particular, by reporting \(\hat {c}_{i} < \max _{j \ne i} \hat {c}_{j}\), seller i will get negative utility whenever he is sampled.

Thus, reporting \(\hat {c}_{i} \ne c_{i}\) can never increase the utility of seller i but in some circumstances may actually decrease it. This shows that truthful reporting is a dominant strategy for each seller. To show ex-post individual rationality, we observe that by reporting \(\hat {c}_{i} = c_{i}\) seller i gets a positive utility whenever sampled and zero utility otherwise.
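The case analysis above can be checked numerically. The following is a minimal sketch, not the paper's mechanism as specified in the main text: it assumes linear costs, assumes that an included seller is paid \(\max _{j \ne i} \hat {c}_{j}\) per sample, and assumes that a report exactly equal to the threshold keeps the seller in the sampling. The function names `utility` and `truthful_is_weakly_best` are hypothetical.

```python
def utility(true_cost, report, other_reports, times_sampled):
    """Payoff of a linear-cost seller under the assumed threshold rule."""
    threshold = max(other_reports)  # max over j != i of the reported costs
    if report > threshold:
        return 0.0  # excluded from the sampling
    # Included: paid the threshold per sample, independent of the own report.
    return (threshold - true_cost) * times_sampled


def truthful_is_weakly_best(true_cost, other_reports, lies, max_samples=3):
    """Check dominance and ex-post individual rationality on a small grid."""
    for k in range(max_samples + 1):
        honest = utility(true_cost, true_cost, other_reports, k)
        if honest < 0:
            return False  # truthful reporting must never lose money
        if any(utility(true_cost, lie, other_reports, k) > honest for lie in lies):
            return False  # some misreport would be strictly better
    return True


others = [3.0, 5.0, 4.0]
lies = [0.0, 2.0, 4.9, 5.1, 10.0]
print(all(truthful_is_weakly_best(c, others, lies) for c in [1.0, 4.5, 6.0, 9.0]))
```

The grid covers both cases of the proof: true costs below the threshold of 5.0 (where overbidding forfeits a positive payoff) and true costs above it (where underbidding yields a negative payoff whenever the seller is sampled).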

Proof of Lemma 4.3

Let \(Z_{i}\) be a random variable that denotes the number of times seller i is sampled. We have that \({\sum }_{i = 1}^{n} Z_{i} = {\sum }_{b \in B} d_{b}\) in order to meet the demand. Observe that the expected number of times that seller i is sampled under distribution ψ is \(\mathbb {E}[Z_{i}] = {\sum }_{k=0}^{m} k p_{\psi }(k)\). Since ψ ∈ Ψ, this distribution produces unbiased samples, which implies that each seller is sampled the same expected number of times. Thus, summing over all sellers,

$$n^{\prime} \sum\limits_{k=0}^{m} k p_{\psi}(k) = \sum\limits_{i = 1}^{n} \mathbb{E}[Z_{i}] = \sum\limits_{b \in B} d_{b},$$

which concludes the proof.
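As a sanity check of the identity, the sketch below computes \(p_{\psi}\) exactly for one particular unbiased distribution, chosen here only for illustration: each buyer independently draws \(d_{b}\) of the sellers uniformly at random. The demand vector and the number of sellers are made-up values, and `sample_count_distribution` is a hypothetical helper name.

```python
from fractions import Fraction


def sample_count_distribution(demands, n):
    """p[k] = Pr[a fixed seller lands in exactly k buyers' samples].

    Assumes each buyer b independently draws d_b of the n sellers uniformly,
    so the seller is hit by buyer b with probability d_b / n.
    """
    p = [Fraction(1)]
    for d in demands:
        hit = Fraction(d, n)
        nxt = [Fraction(0)] * (len(p) + 1)
        for k, mass in enumerate(p):
            nxt[k] += mass * (1 - hit)   # buyer b misses the seller
            nxt[k + 1] += mass * hit     # buyer b samples the seller
        p = nxt
    return p


demands = [4, 2, 2, 1]   # illustrative sample sizes d_b
n = 5                    # illustrative number of sellers
p = sample_count_distribution(demands, n)
expected_samples = sum(k * pk for k, pk in enumerate(p))
print(n * expected_samples == sum(demands))
```

Exact rational arithmetic is used so that the equality can be tested without a floating-point tolerance.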

Proof of Theorem 4.2

If the payment is determined by the function \(\pi^{max}\), Theorem 4.1 implies that it is a dominant strategy for seller i to report \(c_{i}\) truthfully and that we get ex-post individual rationality. For a seller i with \(c_{i} < \max _{j \ne i} \hat {c}_{j}\), it is a dominant strategy to also report his certainty equivalent for \((p_{\psi }, \pi^{max})\) truthfully in Step (2), because his report \(\hat {e}_{i}\) does not affect the threshold \(\overline {r}_{i}\). Finally, if the payment is determined to be \(\pi _{i}(k) \equiv \overline {r}_{i} + \hat {c}_{i} (k - w)\) and seller i has reported \(\hat {c}_{i}\) truthfully, we have ex-post individual rationality because \(\pi _{i}(k) - c_{i} k = \overline {r}_{i} - c_{i} w > \hat {r}_{i} - c_{i} w = \hat {e}_{i} > 0\).

We now turn to the seller i with \(c_{i} > \max _{j \ne i} \hat {c}_{j}\). By reporting any value \(\hat {c}_{i} > \max _{j \ne i} \hat {c}_{j}\), the seller will not be sampled and gets utility zero. By reporting \(\hat {c}_{i} < \max _{j \ne i} \hat {c}_{j} < c_{i}\), seller i gets a negative utility if assigned payment \(\pi^{max}\) in Step (3). On the other hand, if assigned the payment \(\pi _{i}(k) \equiv \overline {r}_{i} + \hat {c}_{i} (k - w)\), seller i derives utility \(u_{i}(\overline {r}_{i} - \hat {c}_{i} w - (c_{i} - \hat {c}_{i})k)\) which may be positive for small values of k.

We now show that if seller i is risk-neutral or risk-averse, i.e., not risk-seeking, he derives negative utility in expectation. In particular, we have \(\overline {r}_{i} - \hat {c}_{i} w - (c_{i} - \hat {c}_{i})k < (c_{i} - \hat {c}_{i}) (w-k)\). Thus, \({\sum }_{k} p_{\psi }(k) (\overline {r}_{i} - \hat {c}_{i} w - (c_{i} - \hat {c}_{i})k) < {\sum }_{k} p_{\psi }(k) (c_{i} - \hat {c}_{i}) (w-k) = 0\). And since \(u_{i}\) is increasing and either concave or linear, with \(u_{i}(0) = 0\), we get \({\sum }_{k} p_{\psi }(k) u_{i}(\overline {r}_{i} - \hat {c}_{i} w - (c_{i} - \hat {c}_{i})k) < {\sum }_{k} p_{\psi }(k) u_{i}((c_{i} - \hat {c}_{i}) (w-k)) \leq u_{i}(0) = 0\), where the last inequality is Jensen's inequality (it holds with equality when \(u_{i}\) is linear).

To conclude the proof, we consider the case that the seller i with \(c_{i} > \max _{j \ne i} \hat {c}_{j}\) is risk-seeking. Then, it is possible that seller i is better off reporting \(\hat {c}_{i} < c_{i}\) in order to get utility \(u_{i}(\overline {r}_{i} - \hat {c}_{i} w - (c_{i} - \hat {c}_{i})k)\), which is positive for small values of k but negative for large values. Even though such preferences are very unlikely, for the sake of completeness we describe how our CE mechanism can be extended to deal with this issue.

To avoid such situations, the mechanism can ask each seller \(j \neq i\) to report his certainty equivalent for the lottery \((p_{\psi }, \pi _{-i}^{max})\), where \(\pi _{-i}^{max} \equiv \max _{j\ne i} \hat {c}_{j}\), and use these values to determine the threshold \(\overline {r}_{i}\) for seller i. Then, seller i will be included in the sampling only if \(\hat {r}_{i} < \overline {r}_{i}\). This guarantees truthful reporting and individual rationality for seller i. Moreover, each seller \(j \neq i\) has no reason to lie about his certainty equivalent for \((p_{\psi }, \pi _{-i}^{max})\) (see Note 7).

Proof of Theorem 4.5

Let S denote the set of all subsets of B (i.e., the powerset of B), and let \(S_{b} \subseteq S\) denote the set of all such subsets that include buyer b. We sort all buyers in a non-increasing order of the sample sizes that they request, i.e., \(d_{b^{\prime }} \geq d_{b}\) if \(b^{\prime } < b\). What follows is described from the perspective of some arbitrary seller i. Given a distribution \(\psi \in {\Psi }(N^{\prime })\), let \(q_{\psi }(s)\) denote the probability that seller i’s data is sold to all buyers in s but nobody else (see Note 8). Since \(q_{\psi }\) is a distribution over S, we have that \({\sum }_{s\in S} q_{\psi }(s)=1\). Since \(\psi \in {\Psi }(N^{\prime })\), the probability that buyer b gets seller i in his sample must be equal to \(d_{b}/n^{\prime }\), where \(n^{\prime }\) is the number of sellers in \(N^{\prime }\). Equivalently, for each buyer b, \({\sum }_{s\in S_{b}} q_{\psi }(s) = d_{b}/n^{\prime }\).

The ordering distribution \(\psi ^{\ast } = \psi ^{\ast }(N^{\prime })\) satisfies the following simple predicate: If buyer b gets access to the data of seller i then so does buyer b − 1; equivalently, \(q_{\psi ^{\ast }}(s)=0\) for every \(s \in (S_{b} \setminus S_{b-1})\). We will show that G(ψ) is minimized at \(\psi ^{\ast }\) over all \(\psi \in {\Psi }(N^{\prime })\) using proof by contradiction. Assume that \(G(\psi ) < G(\psi ^{\ast })\) for some distribution \(\psi \in {\Psi }(N^{\prime })\) that does not satisfy the predicate. We will gradually modify distribution ψ until it satisfies the predicate without increasing the expected value G in the process (if g is strictly concave, then this modification leads to a decrease in G).

Let b be the first buyer in the ordering for which the predicate is not true, i.e., there exists some set \(s_{A}\) that contains b and does not contain b − 1 (\(s_{A} \in (S_{b} \setminus S_{b-1})\)) such that \(q_{A} \equiv q_{\psi }(s_{A}) > 0\). Since \(d_{b-1} \geq d_{b}\), there must also exist some \(s_{B}\) that contains b − 1 but not b, and occurs with some positive probability \(q_{B} \equiv q_{\psi }(s_{B}) > 0\). Define \(q_{\text {min}} \equiv \min (q_{A}, q_{B}) > 0\). Let \(s_{I}\) (resp., \(s_{U}\)) denote the outcome that contains exactly the intersection (resp., union) of the buyers in \(s_{A}\) and \(s_{B}\). We modify ψ by removing probability mass \(q_{\text {min}}\) from \(s_{A}\) and \(s_{B}\) and moving it to \(s_{I}\) and \(s_{U}\). This leads to a new probability distribution \(\hat {q}\) over S such that \(\hat {q}(s_{A})=q_{A}-q_{\text {min}}\), \(\hat {q}(s_{B})=q_{B}-q_{\text {min}}\), \(\hat {q}(s_{I})=q_{\psi }(s_{I})+q_{\text {min}}\), and \(\hat {q}(s_{U})=q_{\psi }(s_{U})+q_{\text {min}}\); for all other \(s \in S\) we have \(\hat {q}(s)=q_{\psi }(s)\). The distribution \(\hat {q}\) corresponds to some distribution \(\hat {\psi } \in {\Psi }(N^{\prime })\).

We now show that \(G(\hat {\psi }) \leq G(\psi )\). Let \(n_{\alpha }\) be the number of buyers in \(s_{\alpha }\), and d the number of buyers in \(s_{A} \setminus s_{B}\); thus, \(n_{A} = n_{I} + d\), \(n_{U} = n_{B} + d\), and \(G(\psi ) - G(\hat {\psi })\) is equal to

$$\begin{array}{@{}rcl@{}} &&q_{\text{min}} g(n_{I}+d) + q_{\text{min}} g(n_{B}) - q_{\text{min}} g (n_{I}) - q_{\text{min}}g(n_{B}+d) \\ &=&q_{\text{min}} [(g(n_{I}+d)-g(n_{I})) - (g(n_{B}+d)-g(n_{B}))] \geq 0 \end{array} $$

The inequality holds because g is concave and \(n_{B} > n_{I}\).

We can repeat the modification step for this same pair of buyers b and b − 1 as long as the predicate is not satisfied. After every modification, either \(q(s_{A})\) or \(q(s_{B})\) becomes 0 and the probabilities of these sets are never raised again during the modification steps for this same pair of buyers. Thus, only a finite number of modifications is needed until the induced lottery satisfies the predicate. Since the expected value G does not increase at any point during this process, we conclude that G(ψ) is minimized at \(\psi ^{\ast }\).
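A single modification step of this proof can be sketched as follows. The example is illustrative only: the three buyers, the probabilities in `before`, and the choice \(g(k)=\sqrt{k}\) are assumptions made for the demonstration, and the helper names are hypothetical. Outcomes are represented as frozensets of buyer indices, with buyer 1 playing the role of b − 1 and buyer 2 the role of b.

```python
from fractions import Fraction
import math


def expected_payment(q, g):
    """G = sum over outcomes s of q(s) * g(number of buyers in s)."""
    return sum(mass * g(len(s)) for s, mass in q.items())


def marginal(q, buyer):
    """Probability that the given buyer obtains the seller's data."""
    return sum(mass for s, mass in q.items() if buyer in s)


def uncross(q, s_a, s_b):
    """Move q_min mass from s_a and s_b to their intersection and union."""
    q_min = min(q.get(s_a, Fraction(0)), q.get(s_b, Fraction(0)))
    moved = dict(q)
    for s, delta in ((s_a, -q_min), (s_b, -q_min),
                     (s_a & s_b, q_min), (s_a | s_b, q_min)):
        moved[s] = moved.get(s, Fraction(0)) + delta
    return {s: mass for s, mass in moved.items() if mass != 0}


s_a = frozenset({2, 3})   # contains b = 2 but not b - 1 = 1
s_b = frozenset({1})      # contains b - 1 = 1 but not b = 2
before = {s_a: Fraction(1, 4), s_b: Fraction(1, 2), frozenset(): Fraction(1, 4)}
after = uncross(before, s_a, s_b)

# Each buyer's marginal is preserved, so the samples stay unbiased.
print([marginal(before, b) == marginal(after, b) for b in (1, 2, 3)])
# With a concave g the expected payment does not increase.
print(expected_payment(after, math.sqrt) <= expected_payment(before, math.sqrt))
```

Because every buyer in the intersection or union is counted the same number of times before and after the move, the per-buyer marginals are unchanged, which is why the modified distribution still yields unbiased samples.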

Proof of Theorem 5.1

Consider two instances (A and B) with n sellers and \(m \equiv n^{2}+n+1\) buyers. In both instances, buyers’ demand is the same: one buyer demands a sample of n (i.e., all of the sellers) and the remaining \(n^{2}+n\) buyers demand a sample of just one seller. The two instances differ with respect to the sellers’ cost functions and have different payment functions: \(\pi ^{max}_{A}(k)=\min \{k,n\}\) and \(\pi ^{max}_{B}(k)=\max \{k,n\}\); note that both can arise from concave \(c_{i}\)’s.

Since we are interested in distributions that are oblivious to the payment function and the two instances differ only with respect to the payment functions, it suffices to show that if a distribution ψ ∈ Ψ(N) gives a 2-approximation for instance A, then the same distribution ψ cannot give a better than (2−𝜖)-approximation for instance B for 𝜖>0. More formally, we will show that if

$$ \sum\limits_{k=0}^{m} p_{\psi}(k) \pi_{A}^{max}(k) \leq 2 {\Pi}^{\text{OPT}}_{A}, \tag{2} $$

then

$$ \sum\limits_{k=0}^{m} p_{\psi}(k) \pi_{B}^{max}(k) > (2 - 6/n) {\Pi}^{\text{OPT}}_{B}. \tag{3} $$

Setting n>6/𝜖 will then conclude the proof.

Since \(\pi _{A}^{max}\) is concave, \(\psi ^{\ast }\) is optimal for instance A and

$$\begin{array}{@{}rcl@{}} {\Pi}^{\text{OPT}}_{A} &=& \sum\limits_{k=0}^{m} p_{\psi^{\ast}}(k) \pi_{A}^{max}(k) \\ &=&\frac{1}{n} \min\{n^{2}+n+1,n\} + \frac{n-1}{n} \min\{1,n\} = 2 - \frac{1}{n}. \end{array} $$

Thus, (2) implies that \({\sum }_{k=0}^{m} p_{\psi }(k) \pi _{A}^{max}(k) < 4\). Then,

$$\sum\limits_{k=0}^{n} p_{\psi}(k)\pi^{max}_{A}(k) = \sum\limits_{k=0}^{n} k p_{\psi}(k)<4, $$

which, together with Lemma 4.3, implies

$$ \sum\limits_{k=n+1}^{m} k p_{\psi}(k) > n-2. \tag{4} $$

Moreover, \({\sum }_{k = n+1}^{m} p_{\psi }(k)\pi ^{max}_{A}(k) = n {\sum }_{k = n+1}^{m} p_{\psi }(k) < 4\), which implies that

$$ \sum\limits_{k = 0}^{n} p_{\psi}(k) > 1 - 4/n. \tag{5} $$

Now consider instance B. The buyer that requested a sample of size n will be given access to the data of all sellers. To determine what samples other buyers will get, suppose we randomly split the set of n 2+n buyers demanding a single seller into n groups of n + 1 buyers each. We label these groups {1,2,...,n}. We then assign seller i to the buyers of group i. This gives unbiased samples, because each buyer is equally likely to be in each group. Note that exactly n buyers get access to the data of a given seller, so \({\Pi }^{\text {OPT}}_{B} \leq n\).

To conclude the proof, we show that (4) and (5) imply (3). First note that in order to satisfy the demand of the buyer who requests n sellers, every seller will be sampled at least once and paid at least max{1,n}=n. Then, (5) implies that \({\sum }_{k=0}^{n} p_{\psi }(k) \pi ^{max}_{B}(k) > n-4\). On the other hand, (4) implies that \({\sum }_{k=n+1}^{m} p_{\psi }(k) \pi ^{max}_{B}(k) > n-2\). Summing these two inequalities, we conclude that \({\sum }_{k=0}^{m} p_{\psi }(k) \pi _{B}^{max}(k) > 2n-6\), which together with \({\Pi }^{\text {OPT}}_{B} \leq n\) implies (3).
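The value \({\Pi }^{\text {OPT}}_{A} = 2 - 1/n\) used in this proof can be reproduced numerically. The sketch below assumes one concrete realization of the ordering distribution: sellers are placed in a uniformly random order and each buyer receives the first \(d_{b}\) sellers of that order, so a seller at rank r is sampled by every buyer with \(d_{b} \geq r\). This realization satisfies the nesting predicate from the proof of Theorem 4.5, but it is an assumption of the example, and the function names are hypothetical.

```python
from fractions import Fraction


def ordering_distribution(demands, n):
    """p[k] = Pr[a seller is sampled k times] under nested prefix sampling."""
    p = [Fraction(0)] * (len(demands) + 1)
    for rank in range(1, n + 1):                  # each rank has probability 1/n
        k = sum(1 for d in demands if d >= rank)  # buyers whose prefix reaches rank
        p[k] += Fraction(1, n)
    return p


def instance_a_optimum(n):
    """Expected payment per seller in instance A with pi_A(k) = min(k, n)."""
    demands = [n] + [1] * (n * n + n)   # one buyer wants n, the rest want 1
    p = ordering_distribution(demands, n)
    return sum(pk * min(k, n) for k, pk in enumerate(p))


for n in (2, 5, 10):
    print(n, instance_a_optimum(n) == 2 - Fraction(1, n))
```

The seller at rank 1 is sampled by all \(n^{2}+n+1\) buyers and is paid n, while every other seller is sampled once and is paid 1, which gives the two terms of the sum in the proof.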

Proof Sketch for Theorem 5.2

Let ψ ∈ Ψ(N) be the distribution that achieves \({\Pi }^{\text {OPT}}\); for notational simplicity, we write \(p \equiv p_{\psi }\), \(p_{o} \equiv p_{\psi ^{\ast }}\), and \(\pi \equiv \pi ^{max}\) for the remainder of the proof. We wish to show that

$$ \sum\limits_{k=0}^{m} p_{o}(k) \pi(k) \leq 2{\Pi}^{\text{OPT}} = 2 \sum\limits_{k=0}^{m} p(k) \pi(k). \tag{6} $$

Let \(P(k)={\sum }_{k^{\prime } \leq k}{p(k^{\prime })}\) denote the cumulative distribution function of p, and \(P^{-1}(t) = \inf \{k \mid P(k) \geq t\}\) denote its generalized inverse distribution function; the functions \(P_{o}(\cdot )\) and \(P_{o}^{-1}(\cdot )\) are defined similarly for \(p_{o}\).

We decompose the [0,1] interval into subintervals \((t_{l}, t_{r})\) for which we know that, for any two values \(t, t^{\prime } \in (t_{l}, t_{r})\), we have \(P^{-1}(t) = P^{-1}(t^{\prime })\) and \(P_{o}^{-1}(t)=P_{o}^{-1}(t^{\prime })\). In order to do so, we let \(T = \{P(k) \mid (p(k)>0) \vee (p_{o}(k)>0)\}\) be the set of distinct values in [0,1] that either P(⋅) or \(P_{o}(\cdot )\) takes. If \(0 < t_{1} < t_{2} < \ldots < t_{|T|} = 1\) is an ordering of the values in T and \(t_{0} = 0\), then we let \(I = \{(t_{0}, t_{1}], (t_{1}, t_{2}], \ldots , (t_{|T|-1}, 1]\}\), which is a set of intervals that satisfy the property that we wanted. Given this property, (6) can be rewritten as follows:

$$\sum\limits_{t_{r}\in T}{(t_{r}-t_{r-1})\pi(P_{o}^{-1}(t_{r}))} \leq 2 \sum\limits_{t_{r}\in T}{(t_{r}-t_{r-1})\pi(P^{-1}(t_{r}))}. $$
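The rewriting in terms of quantile intervals can be checked on a small example. The sketch below is illustrative: the two distributions and the payment function are made up, the helper names are hypothetical, and the breakpoint set is taken to be the union of the values attained by the two cumulative distribution functions, which is how the prose describes T.

```python
from fractions import Fraction
from itertools import accumulate


def generalized_inverse(p, t):
    """P^{-1}(t) = smallest k whose cumulative probability P(k) is >= t."""
    for k, cumulative in enumerate(accumulate(p)):
        if cumulative >= t:
            return k
    raise ValueError("t exceeds the total probability mass")


def breakpoints(*distributions):
    """Sorted CDF values attained at points of positive mass."""
    points = set()
    for p in distributions:
        for mass, cumulative in zip(p, accumulate(p)):
            if mass > 0:
                points.add(cumulative)
    return sorted(points)


def expectation_via_quantiles(p, pi, points):
    """Sum over intervals (t_{r-1}, t_r] of (t_r - t_{r-1}) * pi(P^{-1}(t_r))."""
    total, previous = Fraction(0), Fraction(0)
    for t in points:
        total += (t - previous) * pi(generalized_inverse(p, t))
        previous = t
    return total


p = [Fraction(1, 4), Fraction(0), Fraction(1, 2), Fraction(1, 4)]
p_o = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)]
pi = lambda k: min(k, 2)          # an arbitrary increasing payment function
T = breakpoints(p, p_o)

for dist in (p, p_o):
    direct = sum(mass * pi(k) for k, mass in enumerate(dist))
    print(direct == expectation_via_quantiles(dist, pi, T))
```

Using a common breakpoint set for both distributions is what allows the two sides of (6) to be compared interval by interval.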

Let \(T_{A} = \{t_{r}\in T | P^{-1}(t_{r})>P_{o}^{-1}(t_{r})\}\) and \(T_{B} = T \setminus T_{A}\). By definition of the set \(T_{A}\), and using the fact that π(⋅) is an increasing function, it is easy to see that for any \(t_{r} \in T_{A}\) we have \(\pi (P^{-1}(t_{r})) \geq \pi (P_{o}^{-1}(t_{r}))\). Therefore

$$\begin{array}{@{}rcl@{}} \sum\limits_{t_{r}\in T_{A}}{(t_{r}-t_{r-1})\pi (P_{o}^{-1}(t_{r}))} &\leq &\sum\limits_{t_{r}\in T_{A}}{(t_{r}-t_{r-1}) \pi(P^{-1}(t_{r}))} \\ &\leq &\sum\limits_{t_{r}\in T}{(t_{r}-t_{r-1})\pi(P^{-1}(t_{r}))}. \end{array} $$

In order to prove the theorem (for a complete proof see (Gkatzelis et al. 2012)) we only need to show the following inequality; summing it with the previous one proves the theorem since \(T_{A} \cup T_{B} = T\).

$$\sum\limits_{t_{r}\in T_{B}}{(t_{r}-t_{r-1})\pi(P_{o}^{-1}(t_{r}))}\leq \sum\limits_{t_{r}\in T}{(t_{r}-t_{r-1})\pi(P^{-1}(t_{r}))}. $$

Proving this inequality is significantly more demanding, but the main intuition behind why it holds is that, even though π may not be concave, the payment that a seller receives per buyer is decreasing in the total number of buyers that get access to his data. To verify that this is true, for some k, let i be the seller for which \(c_{i}(k) = \pi (k)\). Then, for \(k^{\prime } \leq k\),

$$\begin{array}{@{}rcl@{}} \pi(k) = c_{i}(k) &\leq& \frac{k}{k^{\prime}}c_{i}(k^{\prime}) \leq \frac{k}{k^{\prime}}\pi(k^{\prime})\\ \Rightarrow \frac{\pi(k)}{k} &\leq& \frac{\pi(k^{\prime})}{k^{\prime}}, \end{array} $$

where the first inequality holds because of the concavity of \(c_{i}(\cdot )\) and the second holds by definition of \(\pi (k^{\prime }) = \max _{i} c_{i}(k^{\prime })\).
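A quick numerical illustration of this per-buyer property, using the two cost functions mentioned in Note 4 (\(c_{i}(k) = k\) and \(c_{j}(k) = 2\sqrt{k}\)). The range of k and the helper name `max_payment` are choices made for this sketch only.

```python
import math

# The two concave cost functions from Note 4.
costs = [lambda k: float(k), lambda k: 2.0 * math.sqrt(k)]


def max_payment(k):
    """pi(k) = max over sellers of c_i(k)."""
    return max(c(k) for c in costs)


ks = range(1, 11)
per_buyer = [max_payment(k) / k for k in ks]

# Payment per buyer never increases as more buyers get access.
print(all(a >= b - 1e-12 for a, b in zip(per_buyer, per_buyer[1:])))

# pi itself is not concave: its increments grow around k = 4.
print(max_payment(5) - max_payment(4) > max_payment(4) - max_payment(3))
```

For small k the square-root cost dominates and for large k the linear cost does, so the pointwise maximum has a kink at k = 4 even though its per-buyer value is still non-increasing.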

We use this to upper bound the payment from distribution \(p_{o}\) due to \(t_{r} \in T_{B}\) intervals; we show that for every expected buyer of \(p_{o}\) in the interval \((t_{l}, t_{r}]\) with \(t_{r} \in T_{B}\), there exists some other interval \((\bar {t}_{l},\bar {t}_{r}]\) with \(\bar {t}_{r}\leq t_{r}\) and the same expected number of buyers for p. Since \(\bar {t}_{r}\leq t_{r}\), the former interval leads to a smaller payment per buyer.

Cite this article

Gkatzelis, V., Aperjis, C. & Huberman, B.A. Pricing private data. Electron Markets 25, 109–123 (2015). https://doi.org/10.1007/s12525-015-0188-8

  • DOI: https://doi.org/10.1007/s12525-015-0188-8
