Privacy-preserving worker allocation in crowdsourcing

Regular Paper · The VLDB Journal

Abstract

Crowdsourcing has become a prevalent way to obtain answers to tasks that require human intelligence. A crowdsourcing platform is typically responsible for allocating workers to each received task, giving priority to high-quality workers. However, the allocation results can in turn reveal information about workers' quality: for example, workers who are not allocated are presumably less qualified. They may be upset if such information becomes public, which is an invasion of their privacy. To alleviate such concerns, we study the privacy-preserving worker allocation problem in this paper, aiming to allocate workers properly while protecting their privacy. We propose worker allocation methods that satisfy differential privacy; they first compute a weight for each potential allocation and then sample an allocation according to the weights. In our experiments on synthetic data, the Markov Chain Monte Carlo-based method improves worker quality by 18.9% over the trivial random allocation method. On real data, it achieves differential privacy with less than 20% loss in quality even when \(\epsilon = \frac{1}{3}\).
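The two-step recipe above (weight every candidate allocation, then sample by weight) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes, hypothetically, that the quality of an allocation \(A\) is \(f(A) = \sum_{w \in A} r(w)\) with reliabilities \(r(w) \in [0,1]\), and it uses the weight \(e^{\epsilon f(A)}\) that appears in the proof of Theorem 3. The function names are hypothetical, and `mcmc_allocate` is a generic Metropolis sampler with swap proposals, not necessarily the paper's MCMC-based method.

```python
import math
import random
from itertools import combinations


def exp_mech_allocate(reliability, B, eps, rng=random):
    """Sample one size-B allocation A with probability proportional to
    exp(eps * f(A)), where f(A) is assumed to be the sum of reliabilities."""
    candidates = list(combinations(sorted(reliability), B))  # every size-B subset
    weights = [math.exp(eps * sum(reliability[w] for w in A)) for A in candidates]
    return set(rng.choices(candidates, weights=weights, k=1)[0])


def mcmc_allocate(reliability, B, eps, steps=5000, rng=random):
    """Generic Metropolis chain over size-B allocations (requires more than B workers).

    Proposal: swap one allocated worker for one unallocated worker; this proposal
    is symmetric, so accepting with probability min(1, exp(eps * (f(new) - f(old))))
    leaves the distribution proportional to exp(eps * f(A)) invariant."""
    workers = sorted(reliability)
    current = set(rng.sample(workers, B))
    for _ in range(steps):
        out_w = rng.choice(sorted(current))
        in_w = rng.choice([w for w in workers if w not in current])
        delta = reliability[in_w] - reliability[out_w]  # f(new) - f(old)
        if rng.random() < math.exp(min(0.0, eps * delta)):
            current = (current - {out_w}) | {in_w}
    return current


rel = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.0}  # hypothetical reliabilities
print(exp_mech_allocate(rel, 2, 1.0, random.Random(0)))
print(mcmc_allocate(rel, 2, 1.0, steps=1000, rng=random.Random(0)))
```

Exhaustive enumeration costs \(\binom{|W|}{B}\) weight evaluations, so it only suits small pools; the Markov chain avoids enumeration at the price of producing approximate samples after a finite number of steps.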

Figs. 1–9 (images not reproduced)

Notes

  1. http://www.image-net.org/.

  2. https://www.mturk.com/.

References

  1. Amazon Mechanical Turk. https://www.mturk.com/

  2. Ele.me. https://www.ele.me/

  3. Uber. https://www.uber.com/

  4. Abadi, M., Chu, A., Goodfellow, I.J., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proc. of the ACM CCS, pp. 308–318 (2016)

  5. Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: differential privacy for location-based systems. In: Proc. of the ACM CCS, pp. 901–914 (2013)

  6. Beimel, A., Nissim, K., Stemmer, U.: Private learning and sanitization: Pure vs. approximate differential privacy. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 363–378 (2013)

  7. Béziaud, L., Allard, T., Gross-Amblard, D.: Lightweight privacy-preserving task assignment in skill-aware crowdsourcing. In: DEXA (2), 10439 of Lecture Notes in Computer Science, pp. 18–26 (2017)

  8. Bhaskar, R., Laxman, S., Smith, A.D., Thakurta, A.: Discovering frequent patterns in sensitive data. In: Proc. of the KDD, pp. 503–512. ACM (2010)

  9. Borodin, A., El-Yaniv, R.: Online computation and competitive analysis. Cambridge University Press (2005)

  10. Duguépéroux, J., Allard, T.: From task tuning to task assignment in privacy-preserving crowdsourcing platforms. Trans. Large Scale Data Knowl. Centered Syst. 44, 67–107 (2020)

  11. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. Proc. of the EUROCRYPT 4004, 486–503 (2006)

  12. Dwork, C., McSherry, F., Nissim, K., Smith, A.D.: Calibrating noise to sensitivity in private data analysis. Proc. of the TCC 3876, 265–284 (2006)

  13. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)

  14. Fan, J., Li, G., Ooi, B.C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: Proc. of the SIGMOD, pp. 1015–1030 (2015)

  15. Fisher, R.A.: Statistical methods for research workers, 5th edn. (1934)

  16. Geweke, J., et al.: Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, volume 196. Federal Reserve Bank of Minneapolis, Research Department Minneapolis, MN (1991)

  17. Hu, H., Zheng, Y., Bao, Z., Li, G., Feng, J., Cheng, R.: Crowdsourced POI labelling: location-aware result inference and task assignment. In: Proc. of the ICDE, pp. 61–72. IEEE (2016)

  18. Ipeirotis, P.G., Provost, F., Wang, J.: Quality management on Amazon Mechanical Turk. In: Proc. of the SIGKDD, pp. 64–67 (2010)

  19. Kajino, H., Arai, H., Kashima, H.: Preserving worker privacy in crowdsourcing. Data Min. Knowl. Discov. 28(5–6), 1314–1335 (2014)

  20. Karger, D.R., Oh, S., Shah, D.: Budget-optimal task allocation for reliable crowdsourcing systems. Oper. Res. 62(1), 1–24 (2014)

  21. Karp, R.M.: On-line algorithms versus off-line algorithms: How much. In: Algorithms, Software, Architecture: Information Processing 92: Proceedings of the IFIP 12th World Computer Congress, volume 1, p. 416 (1992)

  22. Khattak, F.K., Salleb-Aouissi, A.: Quality control of crowd labeling through expert evaluation. In: Proc. of the NIPS 2nd Workshop on Computational Social Science and the Wisdom of Crowds, volume 2, p. 5 (2011)

  23. Li, H., Liu, Q.: Cheaper and better: Selecting good workers for crowdsourcing. In: Proc. of the HCOMP, pp. 20–21 (2015)

  24. Liu, Y., Guo, B., Chen, C., Du, H., Yu, Z., Zhang, D., Ma, H.: FoodNet: toward an optimized food delivery network based on spatial crowdsourcing. IEEE Trans. Mobile Comput. 18(6), 1288–1301 (2018)

  25. Marshall Hall, J.: Combinatorial theory. Blaisdell, Waltham, Mass, 196 (1986)

  26. McSherry, F.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proc. of the SIGMOD, pp. 19–30 (2009)

  27. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proc. of the FOCS, pp. 94–103 (2007)

  28. Shen, E., Yu, T.: Mining frequent graph patterns with differential privacy. In: Proc. of the KDD, pp. 545–553 (2013)

  29. Shu, J., Jia, X., Yang, K., Wang, H.: Privacy-preserving task recommendation services for crowdsourcing. IEEE Trans. Services Comput. (2018)

  30. Tao, Q., Tong, Y., Zhou, Z., Shi, Y., Chen, L., Xu, K.: Differentially private online task assignment in spatial crowdsourcing: A tree-based approach. In: Proc. of the ICDE, pp. 517–528 (2020)

  31. To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. Proc. of the VLDB Endowment 7(10), 919–930 (2014)

  32. Tong, Y., Zhou, Z., Zeng, Y., Chen, L., Shahabi, C.: Spatial crowdsourcing: a survey. The VLDB J. 29(1), 217–250 (2020)

  33. Varshney, L.R.: Privacy and reliability in crowdsourcing service delivery. In: Annual SRII Global Conference, pp. 55–60 (2012)

  34. Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: Crowdsourcing entity resolution. Proc. of the VLDB 5(11), 1483–1494 (2012)

  35. Zhao, Z., Wei, F., Zhou, M., Chen, W., Ng, W.: Crowd-selection query processing in crowdsourcing databases: A task-driven approach. In: Proc. of the EDBT, pp. 397–408 (2015)

  36. Zheng, L., Chen, L.: DLTA: A framework for dynamic crowdsourcing classification tasks. IEEE Trans. Knowl. Data Eng. 31(5), 867–879 (2019)

  37. Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: is the problem solved? Proc. of the VLDB 10(5), 541–552 (2017)

Acknowledgements

Libin Zheng’s work is supported by the National Natural Science Foundation of China No. 62102463, Basic and Applied Basic Research Project of Guangzhou Basic Research Program (202102080401), and Zhuhai Industry-University-Research Cooperation Project (ZH22017001210010PWC). Peng Cheng is sponsored by Shanghai Pujiang Program 19PJ1403300 and the National Natural Science Foundation of China No. 62102149. Lei Chen’s work is partially supported by the National Key Research and Development Program of China Grant No. 2018AAA0101100, the Hong Kong RGC GRF Project 16209519, CRF Projects C6030-18G, C1031-18G, C5026-18G, AOE Project AoE/E-603/18, Theme-based project TRS T41-603/20R, China NSFC No. 61729201, Guangdong Basic and Applied Basic Research Foundation 2019B151530001, Hong Kong ITC ITF grants ITS/044/18FX and ITS/470/18FX, Microsoft Research Asia Collaborative Research Grant, HKUST-NAVER/LINE AI Lab, Didi-HKUST joint research lab, and HKUST-Webank joint research lab grants.

Author information

Corresponding author

Correspondence to Libin Zheng.

Appendices

A Proof of Lemma 1

Proof

We show that \(g(\cdot )\) can be constructed to meet (i) and (ii), respectively.

  1. (i)

    We construct a bipartite graph with \(\varPhi '\) and \(\varPhi \) as the left and right sides, respectively. In addition, each \(\phi \in \varPhi \) has \(\lfloor \frac{|W| - (k -1 )}{k} \rfloor \) copies on its side. Each copy of \(\phi \) is linked to every \(\phi ' \in \varPhi '\) such that \(\phi \subset \phi '\), i.e., \(|\phi ' \setminus \phi | = 1\). An example is given in Fig. 10. We show that there is a matching covering all \(\phi \in \varPhi \) (including their copies), which directly implies that \(g(\cdot )\) meeting (i) can be constructed. Hall’s Theorem [25] states that a bipartite graph has a matching covering \(\varPhi \) if and only if

    $$\begin{aligned} \forall X \subseteq {\varPhi }, \quad |X| \le |Nbor(X)|\,, \end{aligned}$$

    where Nbor(X) contains all the neighbor nodes of X in \(\varPhi '\). Given any \(X \subseteq \varPhi \), we construct \(X^*\) from X by adding all copies of every \(\phi \in X\), so that each \(\phi \in X\) has \(\lfloor \frac{|W| - (k -1 )}{k} \rfloor \) copies in \(X^*\). Since all copies of a node share the same neighbors, \(Nbor(X) = Nbor(X^*)\). There are \(\frac{|X^*|}{\lfloor \frac{|W| - (k -1 )}{k} \rfloor }\) distinct \(\phi \)’s in \(X^*\), each of which has \(|W| - (k -1 )\) neighbors in \(\varPhi '\). Different \(\phi \in X^*\) can be adjacent to the same node in \(Nbor(X^*)\); however, each \(\phi '\in Nbor(X^*)\) is adjacent to at most k distinct \(\phi \)’s in \(X^*\), one for each element of \(\phi '\) that can be removed. As a result,

    $$\begin{aligned} |Nbor(X^*)| \ge \frac{|X^*|}{\lfloor \frac{|W| - (k -1 )}{k} \rfloor } * (|W|-(k-1)) * \frac{1}{k} \ge |X^*|. \end{aligned}$$

    Then \(|Nbor(X)| = |Nbor(X^*)| \ge |X^*| \ge |X| \).

  2. (ii)

    We again construct a bipartite graph between \(\varPhi \) and \(\varPhi '\), but this time each \(\phi \in \varPhi \) has \(\lceil \frac{|W| - (k -1) }{k} \rceil \) copies. An example is given in Fig. 11. We show that there is a matching covering all \(\phi ' \in \varPhi '\), which directly implies that \(g(\cdot )\) meeting (ii) can be constructed. By Hall’s Theorem, we need to show that

    $$\begin{aligned} \forall X \subseteq {\varPhi '}, \quad |X| \le |Nbor(X)|\,. \end{aligned}$$

    Given any \(X \subseteq \varPhi '\), each \(\phi ' \in X\) has \(k * \lceil \frac{|W| - (k -1) }{k} \rceil \) neighbors in \(\varPhi \). Each \(\phi \in Nbor(X)\) has at most \(|W| -(k - 1)\) neighbors in X. As a result,

    $$\begin{aligned} |Nbor(X)|\ge & {} |X| * k * \lceil \frac{|W| - (k -1) }{k} \rceil \\&* \frac{1}{|W| -(k - 1)} \ge |X|\,. \end{aligned}$$

    \(\square \)

Fig. 10

An example bipartite graph for proving (i) of Lemma 1, where \(W = \{a,b,c,d,e,f\}\), \(k = 2\), \(\varPhi = W\), \(\varPhi ' = \{ab, ac, \ldots , ef\}\), and \(\lfloor \frac{|W| - (k -1 )}{k} \rfloor = 2\). Each element of \(\varPhi \) has 2 copies on the bottom side of the graph

Fig. 11

An example bipartite graph for proving (ii) of Lemma 1, where \(W = \{a,b,c,d,e,f\}\), \(k = 2\), \(\varPhi = W\), \(\varPhi ' = \{ab, ac, \ldots , ef\}\), and \(\lceil \frac{|W| - (k -1 )}{k} \rceil = 3\). Each element of \(\varPhi \) has 3 copies on the bottom side of the graph
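The matching argument in the proof of Lemma 1 can be checked mechanically on the examples of Figs. 10 and 11. The sketch below is an illustration, not part of the proof: it builds the bipartite graph with copies as described (each copy of a size-\((k-1)\) set linked to every size-\(k\) superset) and computes a maximum matching with Kuhn's augmenting-path algorithm. The function names are hypothetical.

```python
from itertools import combinations


def max_matching(left, adj):
    """Size of a maximum bipartite matching (Kuhn's augmenting-path algorithm)."""
    owner = {}  # right node -> left node currently matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current owner can be re-routed to another node
            if v not in owner or augment(owner[v], seen):
                owner[v] = u
                return True
        return False

    return sum(1 for u in left if augment(u, set()))


def lemma1_matching(workers, k, copies):
    """Left side: `copies` copies of every size-(k-1) subset (Phi).
    Right side: every size-k subset (Phi'). Edge iff the small set is a proper subset.
    Returns (matching size, number of left nodes, number of right nodes)."""
    small = [frozenset(c) for c in combinations(workers, k - 1)]
    large = [frozenset(c) for c in combinations(workers, k)]
    left = [(phi, i) for phi in small for i in range(copies)]
    adj = {(phi, i): [p for p in large if phi < p] for (phi, i) in left}
    return max_matching(left, adj), len(left), len(large)


# Fig. 10 (case (i)): 2 copies per singleton; Fig. 11 (case (ii)): 3 copies.
print(lemma1_matching("abcdef", 2, 2))  # (12, 12, 15): every copy is matched
print(lemma1_matching("abcdef", 2, 3))  # (15, 18, 15): every pair is matched
```

In case (i) the matching saturates the copies, so each singleton is paired with 2 distinct pairs; in case (ii) it saturates \(\varPhi '\), so each singleton is used at most 3 times.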

B Proof of Theorem 3

Proof

Given two neighboring worker pools \(W_1\) and \(W_2\) such that \(W_1 \setminus W_2 = \{w_1\}\), and a query Q, let \(P_1(A_h) \) and \(P_2(A_h) \) denote the probabilities of obtaining the output \(A_h\) when running Q over \(W_1\) and \(W_2\), respectively. Let \(\varPhi _1\) and \(\varPhi _2\) denote the sets of all size-B subsets of \(W_1\) and \(W_2\), respectively. We then need to bound \(\frac{P_1(A_h)}{P_2(A_h) }\) and \(\frac{P_2(A_h)}{P_1(A_h) }\) for any \(A _h\in \varPhi =\varPhi _1 \cup \varPhi _2\).

Let us first consider \(A _h\) such that \(w_1 \notin A _h\). Then, according to Algorithm 1,

$$\begin{aligned} P_1(A _h) = \frac{e^{\epsilon f(A _h)} }{\sum \limits _{A \in \varPhi _1} e^{\epsilon f(A) }} \,. \end{aligned}$$

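\(P_2(A _h)\) is obtained analogously, with \(\varPhi _2\) in place of \(\varPhi _1\). Since \(\varPhi _2 \subset \varPhi _1\), the normalizing sum of \(P_1\) is at least that of \(P_2\), so \(P_1(A _h) \le P_2(A _h)\). Hence \(\frac{P_1(A _h)}{P_2(A _h) } \le 1 \le \frac{P_2(A _h)}{P_1(A _h) }\), and we focus on \(\frac{P_2(A _h)}{P_1(A _h) }\).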

$$\begin{aligned} \frac{P_2(A _h)}{P_1(A _h) }= & {} \frac{\sum \limits _{A \in \varPhi _2 } e^{\epsilon f(A)} + \sum \limits _{A \in \varPhi _1 \setminus \varPhi _2 } e^{\epsilon f(A) } }{ \sum \limits _{A \in \varPhi _2} e^{\epsilon f(A)} } \\= & {} 1 + \frac{\sum \limits _{A \in \varPhi _1 \setminus \varPhi _2 } e^{\epsilon f(A)} }{\sum \limits _{A \in \varPhi _2 } e^{\epsilon f(A) } } \\ \end{aligned}$$

Let \(\varDelta \varPhi = \varPhi _1 \setminus \varPhi _2\). Each answer \(A_p \in \varDelta \varPhi \) can be written as \(\tilde{A} \cup \{w_1\}\), where \(\tilde{A}\) is a size-\((B-1)\) worker subset of \(W_2\), i.e., \(\tilde{A}\subset W_2 \wedge |\tilde{A}| = B - 1\). Given \(\tilde{A}\), for any \(A_p \in \varDelta \varPhi \) and any \(A_q \in \varPhi _2\) such that \(\tilde{A} \subset A_p \) and \(\tilde{A} \subset A_q \) (so that \(A_p\) and \(A_q\) differ in a single worker), we have \(f(A_p) \le f(A_q) + 1\). Note that \(\tilde{A}\) and \(A_q\) are size-\((B-1)\) and size-B subsets of \(W_2\), respectively. According to Lemma 1, we can construct a mapping \(g: \varPhi _2 \rightarrow \varDelta \varPhi \) such that \(g(A_q) \setminus A_q = \{w_1\}\) for each \(A_q \in \varPhi _2\), and each \(A_p \in \varDelta \varPhi \) is reached via \(g(\cdot )\) at least \(\lfloor {\frac{|W_2| - (B-1)}{B}} \rfloor \) times. Let \(\alpha = \lfloor {\frac{|W_2| - (B-1)}{B}} \rfloor \); then

$$\begin{aligned} \frac{\sum \limits _{A \in \varDelta \varPhi } e^{\epsilon f(A)} }{\sum \limits _{A \in \varPhi _2 } e^{\epsilon f(A) } }\le & {} \frac{\frac{1}{\alpha } \sum \limits _{A \in \varPhi _2 } e^{\epsilon f(g(A)) } }{\sum \limits _{A \in \varPhi _2 } e^{\epsilon f(A)} } \\\le & {} \frac{\frac{1}{\alpha } \sum \limits _{A \in \varPhi _2 } e^{\epsilon (f(A) + 1) } }{\sum \limits _{A \in \varPhi _2 } e^{\epsilon f(A)} } \\= & {} \frac{e^\epsilon }{\alpha } \,. \end{aligned}$$

Then,

$$\begin{aligned} \frac{P_2(A _h)}{P_1(A _h) } \le 1 + \frac{e^\epsilon }{\alpha } \le e^{\epsilon + \ln (\frac{1}{\alpha } + 1)} \,. \end{aligned} \tag{5}$$
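The last inequality in (5) needs only \(\epsilon \ge 0\), so that \(e^\epsilon \ge 1\):

$$\begin{aligned} e^{\epsilon + \ln (\frac{1}{\alpha } + 1)} = e^\epsilon \left( \frac{1}{\alpha } + 1\right) = \frac{e^\epsilon }{\alpha } + e^\epsilon \ge \frac{e^\epsilon }{\alpha } + 1 \,. \end{aligned}$$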

We then consider the other case, \(w_1 \in A _h\), which implies \(A _h \notin \varPhi _2\) and hence \(P_2(A _h) = 0\). For any \(t \in \{0...B\}\), let \(\varPhi _1^t = \{A\in \varPhi _1 : |A \cap A_h |= t\} \). Then \(|\varPhi _1^t| = \left( {\begin{array}{c}B\\ t\end{array}}\right) \left( {\begin{array}{c}|W_1| - B\\ B - t\end{array}}\right) \) and \(\forall A \in \varPhi _1^t\,,~ f(A) \ge f(A_h) - (B - t)\). Therefore,

$$\begin{aligned} P_1(A _h)&= \frac{e^{\epsilon f(A_h)}}{\sum \limits _{t \in \{0...B\}}\sum \limits _{A \in \varPhi _1^t} e^{\epsilon f(A)} } \\&\le \frac{e^{\epsilon f(A_h) } }{\sum \limits _{t \in \{0...B\}} \left( {\begin{array}{c}B\\ t\end{array}}\right) \left( {\begin{array}{c}|W_1| - B\\ B - t\end{array}}\right) e^{\epsilon \left( f(A_h) - (B - t)\right) }} \\&\le \frac{ 1 }{ \sum \limits _{t \in \{0...B\}} \left( {\begin{array}{c}B\\ t\end{array}}\right) \left( {\begin{array}{c}|W_1| - B\\ B - t\end{array}}\right) e^{\epsilon ( t - B ) }} \,. \end{aligned} \tag{6}$$

Combining Eqs. (5) and (6), and recalling that \(P_1(A_h) \le P_2(A_h)\) whenever \(w_1 \notin A_h\), we obtain for any \(A_h\),

$$\begin{aligned} \begin{aligned} P_1(A _h) \le ~&e^{\epsilon + \ln (\frac{1}{\alpha } + 1) }* P_2(A _h) + \\&\frac{ 1 }{ \sum \limits _{t \in \{0...B\}} \left( {\begin{array}{c}B\\ t\end{array}}\right) \left( {\begin{array}{c}|W_1| - B\\ B - t\end{array}}\right) e^{\epsilon ( t - B ) }} \,, \\ P_2(A _h) \le ~&e^{\epsilon + \ln (\frac{1}{\alpha } + 1)} * P_1(A _h) + \\&\frac{ 1 }{ \sum \limits _{t \in \{0...B\}} \left( {\begin{array}{c}B\\ t\end{array}}\right) \left( {\begin{array}{c}|W_1| - B\\ B - t\end{array}}\right) e^{\epsilon ( t - B ) }} \,. \end{aligned} \end{aligned}$$

Finally, since \(|W_1| - |W_2| = 1\), we have \({{\mathcal {O}}}(|W_1|) = {{\mathcal {O}}}(|W_2|) \), so we use a single W in the theorem statement. \(\square \)
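Both cases of the proof can be checked numerically on small pools by computing the sampler's exact output distribution. This sketch rests on hypothetical assumptions: \(f(A)\) is the sum of the reliabilities of the workers in \(A\), with reliabilities drawn from \([0, 1)\) (so that \(f(A_p) \le f(A_q) + 1\) holds), and Algorithm 1 samples \(A\) with probability proportional to \(e^{\epsilon f(A)}\), as in the expression for \(P_1(A_h)\) above. It needs \(|W_1| \ge 2B\) so that \(\alpha \ge 1\); the function names are illustrative.

```python
import math
import random
from itertools import combinations


def output_distribution(rel, pool, B, eps):
    """Exact distribution of the exponential-weight sampler over size-B subsets of pool:
    P(A) proportional to exp(eps * f(A)), f(A) assumed to be the sum of reliabilities."""
    subsets = [frozenset(c) for c in combinations(sorted(pool), B)]
    weights = [math.exp(eps * sum(rel[w] for w in A)) for A in subsets]
    total = sum(weights)
    return {A: wt / total for A, wt in zip(subsets, weights)}


def theorem3_terms(n1, B, eps):
    """Factor 1 + e^eps / alpha (alpha computed from |W_2| = n1 - 1) and the term in (6)."""
    alpha = (n1 - 1 - (B - 1)) // B
    factor = 1 + math.exp(eps) / alpha
    delta = 1 / sum(math.comb(B, t) * math.comb(n1 - B, B - t) * math.exp(eps * (t - B))
                    for t in range(B + 1))
    return factor, delta


def check_theorem3(n1, B, eps, seed=0):
    rng = random.Random(seed)
    rel = {w: rng.random() for w in range(n1)}  # hypothetical reliabilities in [0, 1)
    w1 = n1 - 1                                 # the worker present only in W_1
    P1 = output_distribution(rel, set(range(n1)), B, eps)
    P2 = output_distribution(rel, set(range(n1 - 1)), B, eps)
    factor, delta = theorem3_terms(n1, B, eps)
    for A, p1 in P1.items():
        p2 = P2.get(A, 0.0)
        if w1 in A:   # second case: A_h contains w_1, so P_2(A_h) = 0 and (6) applies
            assert p2 == 0.0 and p1 <= delta + 1e-12
        else:         # first case: P_1 <= P_2 <= (1 + e^eps / alpha) * P_1
            assert p1 <= p2 + 1e-12 and p2 <= factor * p1 + 1e-12
    return factor, delta


print(check_theorem3(9, 2, 1.0))
```

The check uses the intermediate factor \(1 + e^\epsilon / \alpha\), which is at most the \(e^{\epsilon + \ln (\frac{1}{\alpha } + 1)}\) stated in (5).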

C Proof of Theorem 4

Proof

Case \(\mathbf {B \le \frac{|W|+1}{2}}\). Let \(\hat{w}\) be the worker with the largest reliability. We consider all size-B worker subsets that include and exclude \(\hat{w}\), denoted by \({{\mathcal {W}}}^+ = \{W^+\}\) and \({{\mathcal {W}}^-} = \{W^-\}\), respectively. Then, we have

$$\begin{aligned} |{{\mathcal {W}}}^+| = \left( {\begin{array}{c}|W| - 1\\ B-1\end{array}}\right) \text {~~and~~} |{{\mathcal {W}}}^-| = \left( {\begin{array}{c}|W| - 1\\ B\end{array}}\right) \,. \end{aligned}$$

According to Lemma 1 (applied with \(W\leftarrow W \setminus \{\hat{w}\}\)), we can construct a mapping \(g: {{\mathcal {W}}^-} \rightarrow {{\mathcal {W}}}^+\) such that \( g(W^-) \setminus W^-= \{\hat{w}\}\), and each \( W^+ \in {{\mathcal {W}}}^+\) is the image of at most \(\lceil \frac{|W| - B}{B} \rceil = \lceil \frac{|W|}{B} \rceil - 1 \) subsets \(W^-\).
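The equality between the two ceilings above comes from part (ii) of the proof of Lemma 1, instantiated with the \(|W| - 1\) workers of \(W \setminus \{\hat{w}\}\) and \(k = B\):

$$\begin{aligned} \left\lceil \frac{(|W| - 1) - (B - 1)}{B} \right\rceil = \left\lceil \frac{|W| - B}{B} \right\rceil = \left\lceil \frac{|W|}{B} - 1 \right\rceil = \left\lceil \frac{|W|}{B} \right\rceil - 1 \,. \end{aligned}$$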

Since \(g(W^-)\) is obtained from \(W^-\) by replacing one worker with \(\hat{w}\), the most reliable worker, we have \(\forall W^-, ~~ f(W^-) \le f( g(W^-))\). Then, the probability that Algorithm 1 selects a worker allocation in \({{\mathcal {W}}}^-\) is

$$\begin{aligned} \begin{aligned}&Prob(\widehat{A} \in {{\mathcal {W}}}^-) \\ =&\frac{\sum \limits _{W^-\in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)}}{\sum \limits _{W^+\in { {{\mathcal {W}}}^+}} e^{\epsilon f(W^+)} + \sum \limits _{W^-\in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)}}\\ \le&\frac{\sum \limits _{W^- \in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)}}{ \frac{1}{\lceil \frac{|W|}{B} \rceil - 1}\sum \limits _{W^- \in {{\mathcal {W}}}^-} e^{\epsilon f(g(W^-)) } + \sum \limits _{W^-\in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)} } \\ \le&\frac{\sum \limits _{W^- \in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)}}{\frac{1}{\lceil \frac{|W|}{B} \rceil - 1 } \sum \limits _{W^- \in {{\mathcal {W}}}^-} e^{\epsilon f(W^-) } + \sum \limits _{W^-\in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)} } \\ =&1 - \frac{1}{\lceil \frac{|W|}{B} \rceil } \,. \end{aligned} \end{aligned}$$
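The final equality in the display is plain arithmetic. Writing \(S = \sum _{W^-\in {{\mathcal {W}}}^-} e^{\epsilon f(W^-)}\) and \(m = \lceil \frac{|W|}{B} \rceil - 1\) (so that \(m \ge 1\) whenever \(|W| > B\)):

$$\begin{aligned} \frac{S}{\frac{1}{m} S + S} = \frac{m}{1 + m} = 1 - \frac{1}{m + 1} = 1 - \frac{1}{\lceil \frac{|W|}{B} \rceil } \,. \end{aligned}$$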

Hence the probability that \(\widehat{A} \) excludes \(\hat{w}\) is at most \( 1 - \frac{1}{\lceil \frac{|W|}{B} \rceil } \). Conditioned on this event, \(\widehat{A}\) is distributed as the output of Algorithm 1 over \(W \setminus \{\hat{w}\}\), in which the worker with the second largest reliability is the most reliable one. The same analysis then shows that \(\widehat{A}\) also excludes this worker with probability at most \(1 - \frac{1}{\lceil \frac{|W|- 1}{B} \rceil } \le 1 - \frac{1}{\lceil \frac{|W|- 1}{B - 1} \rceil } \). In general, letting \(\hat{w}_k\) denote the worker with the k-th largest reliability for \(1 \le k \le B \), we have

$$\begin{aligned} Prob(\hat{w}_{k}\notin \widehat{A} |\hat{w}_1...\hat{w}_{k-1} \notin \widehat{A} ) \le 1 - \frac{1}{\lceil \frac{|W| - (k - 1)}{B - (k - 1)} \rceil } \,. \end{aligned}$$

Therefore, by the chain rule, for any \(b \le B\), \( Prob(\hat{w}_1, \ldots , \hat{w}_{b} \notin \widehat{A} ) \le \prod \limits _{ k \in \{0, \ldots , b-1\} }( 1 - \frac{1}{\lceil \frac{|W| - k}{B - k } \rceil } )\).

Case \(\mathbf {B > \frac{|W|+1}{2}}\). Let \(B' = |W| - B\). In this case, the algorithm instead selects \(B'\) workers \(\widehat{A}\) using the reversed reliabilities, and the final allocation is \(W \setminus \widehat{A}\). With respect to the reversed reliabilities, the originally most reliable worker \(\hat{w}\) now has the smallest \(r'\). Again denoting the size-\(B'\) worker subsets that include and exclude \(\hat{w} \) by \({{\mathcal {W}}}^+ \) and \({{\mathcal {W}}}^- \), respectively, missing \(\hat{w}\) in the final allocation \(W \setminus \widehat{A} \) is equivalent to selecting \(\hat{w}\) in \(\widehat{A} \). We have

$$\begin{aligned} \frac{|{{\mathcal {W}}}^+|}{|{{\mathcal {W}}}^-|} = \frac{B'}{|W| - B'} = \frac{|W| - B}{B} \,. \end{aligned}$$
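The ratio follows from counting: a subset in \({{\mathcal {W}}}^+\) consists of \(\hat{w}\) plus \(B'-1\) of the other \(|W|-1\) workers, while a subset in \({{\mathcal {W}}}^-\) consists of \(B'\) of those \(|W|-1\) workers, so

$$\begin{aligned} \frac{\binom{|W|-1}{B'-1}}{\binom{|W|-1}{B'}} = \frac{B'!\,(|W|-1-B')!}{(B'-1)!\,(|W|-B')!} = \frac{B'}{|W|-B'} \,. \end{aligned}$$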

Similarly, we can construct a mapping \(g: {{\mathcal {W}}^+} \rightarrow {{\mathcal {W}}}^-\) such that \( W^+ \setminus g(W^+) = \{\hat{w}\}\), with each \( W^- \in {{\mathcal {W}}}^-\) being the image of at most \(\lceil \frac{|W| - B}{B} \rceil \) subsets \(W^+\). In addition, since \(\hat{w}\) has the smallest \(r'\), we have \(\forall W^+, ~~ f'(W^+) \le f'( g(W^+))\). We can then derive the bound for \(Prob(\widehat{A} \in {{\mathcal {W}}}^+) \) in the same way as the bound for \(Prob(\widehat{A} \in {{\mathcal {W}}}^-) \) in the case \(B \le \frac{|W|+1}{2}\), with \(f(\cdot )\) replaced by \(f'(\cdot )\). The subsequent derivation for \( Prob(\hat{w}_1, \ldots , \hat{w}_{b} \in \widehat{A} )\) follows the previous case as well. \(\square \)
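An exact computation on a small pool illustrates the bound of the case \(B \le \frac{|W|+1}{2}\). As before, this is a sketch under hypothetical assumptions: \(f(A)\) is the sum of reliabilities, and Algorithm 1 samples \(A\) with probability proportional to \(e^{\epsilon f(A)}\). The helper names are illustrative, and the reversed-reliability case is not covered.

```python
import math
import random
from itertools import combinations


def exclusion_prob(rel, B, eps, top):
    """Exact probability that the exponential-weight sampler (P(A) proportional to
    exp(eps * f(A)), f assumed to be the sum of reliabilities) misses every worker in top."""
    num = den = 0.0
    for A in combinations(sorted(rel), B):
        wt = math.exp(eps * sum(rel[w] for w in A))
        den += wt
        if not set(top) & set(A):
            num += wt
    return num / den


def ceil_div(a, b):
    return -(-a // b)  # integer ceiling division


def theorem4_bound(n, B, b):
    """prod_{k=0}^{b-1} (1 - 1 / ceil((n - k) / (B - k)))."""
    return math.prod(1 - 1 / ceil_div(n - k, B - k) for k in range(b))


rng = random.Random(1)
rel = {w: rng.random() for w in range(8)}  # hypothetical reliabilities, |W| = 8
ranking = sorted(rel, key=rel.get, reverse=True)
for b in (1, 2, 3):
    print(b, exclusion_prob(rel, 3, 1.0, ranking[:b]), theorem4_bound(8, 3, b))
```

Even as \(\epsilon \rightarrow 0\) (uniform sampling), the exclusion probabilities for \(|W| = 8\) and \(B = 3\) are \(35/56\), \(20/56\) and \(10/56\), below the bounds \(2/3\), \(1/2\) and \(5/12\).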

About this article

Cite this article

Zheng, L., Chen, L. & Cheng, P. Privacy-preserving worker allocation in crowdsourcing. The VLDB Journal 31, 733–751 (2022). https://doi.org/10.1007/s00778-021-00713-1

  • DOI: https://doi.org/10.1007/s00778-021-00713-1
