Preserving worker privacy in crowdsourcing

Kajino, Hiroshi; Arai, Hiromi; Kashima, Hisashi

doi:10.1007/s10618-014-0352-3

Published: 29 May 2014

Volume 28, pages 1314–1335 (2014)

Data Mining and Knowledge Discovery

Hiroshi Kajino¹,
Hiromi Arai² &
Hisashi Kashima³

Abstract

This paper proposes a quality control method for crowdsourcing that preserves worker privacy. Crowdsourcing allows us to outsource tasks to a number of workers. The results obtained are often of low quality because workers differ in skill. Therefore, we need quality control methods that estimate reliable results from low-quality ones. In this paper, we point out a privacy problem for crowdsourcing workers: personal information about a worker can be inferred from the results that the worker provides. To formulate and address this problem, we define the worker-private quality control problem, a variation of the quality control problem that preserves the privacy of workers. We propose a worker-private latent class protocol with which a requester can estimate the true results while worker privacy is preserved. The key ideas are to decentralize the computation and to introduce secure computation. We theoretically guarantee the security of the proposed protocol and experimentally examine its computational efficiency and accuracy.

References

Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp 439–450
Bernstein M, Chi EH, Chilton L, Hartmann B, Kittur A, Miller RC (2011) Crowdsourcing and human computation: systems, studies and platforms. In: Proceedings of CHI 2011 Workshop on Crowdsourcing and Human Computation, pp 53–56
Burkhart M, Strasser M, Many D, Dimitropoulos X (2010) SEPIA: privacy-preserving aggregation of multi-domain network events and statistics. In: Proceedings of the 19th USENIX Conference on Security, pp 223–240
Damgård I, Jurik M (2001) A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In: Proceedings of the 4th International Workshop on Practice and Theory in Public Key Cryptography: Public Key Cryptography, pp 119–136
Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. J R Stat Soc Ser C 28(1):20–28
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Ertekin S, Hirsh H, Rudin C (2012) Learning to predict the wisdom of crowds. In: Proceedings of Collective Intelligence 2012
Lease M (2011) On quality control and machine learning in crowdsourcing. In: Proceedings of the Third Human Computation Workshop, pp 97–102
Lin X, Clifton C, Zhu M (2005) Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8(1):68–81
Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: Advances in Cryptology-CRYPTO ’00, pp 36–54
Nabar SU, Kenthapadi K, Mishra N, Motwani R (2008) A survey of query auditing techniques for data privacy. In: Privacy-Preserving Data Mining: Models and Algorithms, pp 415–431
Raykar VC, Yu S, Zhao LH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11:1297–1322
Shamir A (1979) How to share a secret. Commun ACM 22(11):612–613. doi:10.1145/359168.359176
Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 614–622
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Syst 10(5):557–570. doi:10.1142/S0218488502001648
Varshney LR (2012) Privacy and reliability in crowdsourcing service delivery. In: Proceedings of the 2012 Annual SRII Global Conference, pp 55–60
Welinder P, Branson S, Belongie S, Perona P (2010) The multidimensional wisdom of crowds. Adv Neural Inf Process Syst 23:2424–2432
Whitehill J, Ruvolo P, Wu T, Bergsma J, Movellan J (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. Adv Neural Inf Process Syst 22:2035–2043
Yang B, Sato I, Nakagawa H (2012) Privacy-preserving EM algorithm for clustering on social network. In: Advances in Knowledge Discovery and Data Mining 16th Pacific-Asia Conference, PAKDD 2012, pp 542–553

Acknowledgments

H. Kajino and H. Kashima were supported by the FIRST program.

Author information

Authors and Affiliations

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Hiroshi Kajino
Advanced Center for Computing and Communication, RIKEN, Hirosawa 2-1, Wako, Saitama, 351-0198, Japan
Hiromi Arai
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Hisashi Kashima

Corresponding author

Correspondence to Hiroshi Kajino.

Additional information

Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, and Rosa Meo.

Appendix 1: Extensions to multi-class and real-valued labels

We give the detailed update rules of the LC methods modified to handle multi-class and real-valued labels, and then explain how to extend each inference algorithm to preserve worker privacy.

Appendix 1.1: Multi-class labels

The LC method was originally proposed for multi-class labels by Dawid and Skene (1979). Consider a task that asks for a $K$-class label ($K\ge 2$). For each task $i\in {\mathcal I}$ and worker $j\in {\mathcal {J}}$, a crowd label $y_{i,j}\in {\mathcal K} := \{0,\dots ,K-1\}$ is generated by the multinomial distribution

$$\begin{aligned} \pi _{jkl} = \Pr [y_{i,j} = k \mid y_{i}=l, \Pi _{j}], \end{aligned}$$

where $\sum _{k\in {\mathcal K}} \pi _{jkl} = 1$ holds for all $l\in {\mathcal K}$, and we denote $\Pi _{j} = \{\pi _{jkl} \mid k,l\in {\mathcal K}\}$. Also, for each $i\in {\mathcal I}$, the true label $y_i\in {\mathcal K}$ is generated by

$$\begin{aligned} p_l = \Pr [y_{i} = l], \end{aligned}$$

where $\sum _{l\in {\mathcal K}} p_l = 1$ holds. The model parameters $\Pi =\bigcup _{j\in {\mathcal {J}}}\Pi _{j}$ and $\{p_l \mid l\in {\mathcal K}\}$ and the posterior probabilities of the true labels $\mu _{il} = \Pr [y_i = l \mid {\mathcal Y}, \Pi ]$ are estimated using the following EM algorithm.

E-step:
for each $i\in {\mathcal I}$, update $\{\mu _{il} \mid l\in {\mathcal K}\}$ as
$$\begin{aligned} \mu _{il}&= \dfrac{p_l \rho _{il}}{\sum _{l^{\prime }\in {\mathcal K}} p_{l^{\prime }}\rho _{il^{\prime }}},\\ \mathrm{where\ } \log \rho _{il}&= \sum _{j\in {\mathcal {J}}_{i}} \sum _{k\in {\mathcal K}} {\mathbf I}(y_{i,j}=k) \log \pi _{jkl}. \end{aligned}$$
M-step:
for each $j\in {\mathcal {J}}$, update $\Pi _j$ as
$$\begin{aligned} \pi _{jkl} = \dfrac{\sum _{i\in {\mathcal I}_j} \mu _{il} {\mathbf I}(y_{i,j} = k)}{\sum _{i\in {\mathcal I}_j} \mu _{il}}, \end{aligned}$$
and for each $l\in {\mathcal K}$, update $p_l$ as
$$\begin{aligned} p_l = \dfrac{1}{|{\mathcal I}|}\sum _{i\in {\mathcal I}}\mu _{il}. \end{aligned}$$
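
The update rules above can be sketched in a few lines of plain Python. This is a minimal, centralized illustration and not the authors' implementation: the function name `dawid_skene`, the dictionary layout of the labels, the vote-fraction initialisation of $\mu_{il}$, and the small `smoothing` constant (added only to keep every logarithm finite) are all assumptions made for this sketch.

```python
import math


def dawid_skene(labels, n_classes, n_iter=50, smoothing=1e-6):
    """EM for the multi-class latent class model.

    labels: dict mapping (task i, worker j) -> observed class k in 0..K-1.
    Returns (mu, pi, p) with mu[i][l], pi[j][k][l] and p[l].
    """
    tasks = sorted({i for i, _ in labels})
    workers = sorted({j for _, j in labels})
    K = range(n_classes)

    # Assumed initialisation: mu_il = fraction of votes for class l on task i.
    votes = {i: [0] * n_classes for i in tasks}
    for (i, _), k in labels.items():
        votes[i][k] += 1
    mu = {i: [v / sum(votes[i]) for v in votes[i]] for i in tasks}

    for _ in range(n_iter):
        # M-step: pi_jkl = sum_i mu_il I(y_ij = k) / sum_i mu_il (plus smoothing).
        pi = {j: [[smoothing] * n_classes for _ in K] for j in workers}
        for (i, j), k in labels.items():
            for l in K:
                pi[j][k][l] += mu[i][l]
        for j in workers:
            for l in K:
                total = sum(pi[j][k][l] for k in K)
                for k in K:
                    pi[j][k][l] /= total
        # M-step: p_l = (1/|I|) sum_i mu_il (plus smoothing).
        denom = len(tasks) + n_classes * smoothing
        p = [(sum(mu[i][l] for i in tasks) + smoothing) / denom for l in K]

        # E-step: mu_il is proportional to p_l * rho_il, computed in log space.
        log_post = {i: [math.log(p[l]) for l in K] for i in tasks}
        for (i, j), k in labels.items():
            for l in K:
                log_post[i][l] += math.log(pi[j][k][l])
        for i in tasks:
            top = max(log_post[i])
            weights = [math.exp(v - top) for v in log_post[i]]
            mu[i] = [w / sum(weights) for w in weights]

    return mu, pi, p


# Usage: three consistent workers and one worker who always answers 0.
example = {}
for task, answer in enumerate([0, 1, 2, 0, 1, 2]):
    for worker in ("A", "B", "C"):
        example[(task, worker)] = answer
    example[(task, "D")] = 0
mu, pi, p = dawid_skene(example, n_classes=3)
estimated = [max(range(3), key=lambda l: mu[i][l]) for i in range(6)]
```

Working in log space and subtracting the maximum before exponentiating avoids underflow when a task has many labels.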

This algorithm can be extended to preserve worker privacy. In the E-step, the parties calculate $\{\log \rho _{il} \mid i\in {\mathcal I}, l\in {\mathcal K}\}$ using our secure sum protocol, and the requester calculates and broadcasts $\{\mu _{il}\mid i\in {\mathcal I}, l\in {\mathcal K}\}$. In the M-step, each worker $j$ calculates $\{\pi _{jkl} \mid k,l\in {\mathcal K}\}$, and the requester calculates $\{p_l \mid {l\in {\mathcal K}}\}$.
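
To make the division of labour in the E-step concrete, the sketch below lets each worker evaluate its own term $\log \pi_{j,y_{i,j},l}$ locally and combines the terms with a secure sum. The secure sum protocol of the paper is not reproduced in this appendix, so the code substitutes a generic additive-masking scheme as a stand-in; `MODULUS`, `SCALE`, `secure_sum`, and `private_log_rho` are hypothetical names and values chosen for illustration, and the simulation runs all parties in one process.

```python
import math
import secrets

MODULUS = 2 ** 64  # assumed ring size for the shares
SCALE = 10 ** 6    # assumed fixed-point precision


def encode(x):
    """Map a real number to a fixed-point residue modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Inverse of encode; residues in the upper half represent negatives."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def make_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Stand-in secure sum: only masked shares and partial sums are exchanged."""
    n = len(private_values)
    # Row j holds the shares that party j sends to parties 0..n-1.
    sent = [make_shares(encode(x), n) for x in private_values]
    # Party m publishes only the sum of the shares it received.
    partials = [sum(sent[j][m] for j in range(n)) % MODULUS for m in range(n)]
    return decode(sum(partials) % MODULUS)


def private_log_rho(labels_by_worker, pi, tasks, n_classes):
    """log rho_il = sum_j log pi_{j, y_ij, l}, aggregated via secure_sum.

    labels_by_worker: dict worker j -> {task i: label k} (private to j).
    pi: dict worker j -> pi_j[k][l] (private to j).
    """
    log_rho = {}
    for i in tasks:
        log_rho[i] = []
        for l in range(n_classes):
            # A worker who did not label task i submits 0.
            terms = [
                math.log(pi[j][own[i]][l]) if i in own else 0.0
                for j, own in labels_by_worker.items()
            ]
            log_rho[i].append(secure_sum(terms))
    return log_rho
```

The requester would then form $\mu_{il}$ from the aggregated $\log \rho_{il}$ and $p_l$ exactly as in the E-step formula. This stand-in says nothing about the security guarantees proved for the actual protocol.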

Appendix 1.2: Real-valued labels

The LC method was modified to deal with real-valued labels by Raykar et al. (2010). Consider a task that asks for a real-valued label. For each task $i\in {\mathcal I}$ and worker $j\in {\mathcal {J}}$, a crowd label $y_{i,j}\in \mathbb {R}$ is generated by the normal distribution

$$\begin{aligned} p(y_{i,j}\mid y_{i}, \tau _j, \gamma ) = \mathcal {N}(y_{i,j} \mid y_{i}, 1/\tau _j + 1/\gamma ), \end{aligned}$$

where $\tau _j > 0$ is the precision parameter of the normal distribution, interpreted as the ability of worker $j$, and $\gamma $ acts as a regularizer. Let us denote $1/\lambda _j := 1/\tau _j + 1/\gamma $. Assuming that the crowd labels are generated by this model, the true labels and the precision parameters are estimated by the following EM-like algorithm.

E-step: for each $i\in {\mathcal I}$, update the true label $y_i$ as
$$\begin{aligned} y_i = \dfrac{\sum _{j\in {\mathcal {J}}_i} \lambda _j y_{i,j}}{\sum _{j\in {\mathcal {J}}_i} \lambda _j}. \end{aligned}$$
M-step: for each $j\in {\mathcal {J}}$, update $\lambda _j$ by solving
$$\begin{aligned} \dfrac{1}{\lambda _j} = \dfrac{1}{|{\mathcal I}_j|}\sum _{i\in {\mathcal I}_j} (y_{i,j} - y_i)^2. \end{aligned}$$
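
The two updates can be sketched in plain Python as follows. This is an illustrative, centralized version rather than the authors' code: the name `real_valued_em`, the initial value $\lambda_j = 1$, and the default `gamma=100.0` are assumptions. The cap $\lambda_j \le \gamma$ is one reading of the definition $1/\lambda_j = 1/\tau_j + 1/\gamma$ with $\tau_j > 0$, and it also avoids a division by zero when a worker's residuals vanish.

```python
def real_valued_em(labels, gamma=100.0, n_iter=20):
    """EM-like estimation for real-valued crowd labels.

    labels: dict mapping (task i, worker j) -> real-valued label y_ij.
    Returns (y_hat, lam): estimated true labels and per-worker lambda_j.
    """
    tasks = sorted({i for i, _ in labels})
    workers = sorted({j for _, j in labels})
    lam = {j: 1.0 for j in workers}  # assumed initialisation
    y_hat = {}

    for _ in range(n_iter):
        # E-step: y_i is the lambda-weighted mean of the labels for task i.
        num = {i: 0.0 for i in tasks}
        den = {i: 0.0 for i in tasks}
        for (i, j), y in labels.items():
            num[i] += lam[j] * y
            den[i] += lam[j]
        y_hat = {i: num[i] / den[i] for i in tasks}

        # M-step: 1/lambda_j is worker j's mean squared residual.
        sq = {j: 0.0 for j in workers}
        count = {j: 0 for j in workers}
        for (i, j), y in labels.items():
            sq[j] += (y - y_hat[i]) ** 2
            count[j] += 1
        for j in workers:
            mse = sq[j] / count[j]
            # Assumed cap: 1/lambda_j >= 1/gamma because tau_j > 0.
            lam[j] = gamma if mse <= 1.0 / gamma else 1.0 / mse

    return y_hat, lam
```

For example, with two workers whose labels deviate from the truth by about 0.1 and one whose labels deviate by about 2, the noisy worker ends up with a much smaller $\lambda_j$ and contributes little to the weighted mean.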

This algorithm can also be extended to preserve worker privacy. In the E-step, the parties calculate $\left\{ \sum _{j\in {\mathcal {J}}_i} \lambda _j y_{i,j}, \sum _{j\in {\mathcal {J}}_i} \lambda _j \mid {i\in {\mathcal I}}\right\} $ using our secure sum protocol, and the requester calculates and broadcasts $\{y_i \mid {i\in {\mathcal I}}\}$. In the M-step, each worker $j$ calculates $\lambda _j$.
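
The message flow of one such round can be sketched as below, with each worker's labels and $\lambda_j$ kept inside a `Worker` object. The class, the function `run_round`, and the `aggregate` parameter are hypothetical; `aggregate` defaults to Python's built-in `sum` and only marks where the secure sum protocol would be plugged in, so this sketch by itself provides no privacy. It also assumes every task has at least one label and reuses the assumed cap $\lambda_j \le \gamma$.

```python
class Worker:
    """Holds one worker's private labels and precision lambda_j."""

    def __init__(self, labels, gamma=100.0):
        self.labels = dict(labels)  # task id -> private real-valued label
        self.lam = 1.0              # assumed initial value
        self.gamma = gamma          # assumed regularization constant

    def e_step_terms(self, task):
        """Return (lambda_j * y_ij, lambda_j), or zeros for a skipped task."""
        if task not in self.labels:
            return 0.0, 0.0
        return self.lam * self.labels[task], self.lam

    def m_step(self, broadcast):
        """Update lambda_j locally from the broadcast estimates y_i."""
        residuals = [(y - broadcast[i]) ** 2 for i, y in self.labels.items()]
        mse = sum(residuals) / len(residuals)
        self.lam = self.gamma if mse <= 1.0 / self.gamma else 1.0 / mse


def run_round(workers, tasks, aggregate=sum):
    """One E-step and M-step; the requester sees only aggregated sums."""
    estimates = {}
    for i in tasks:
        terms = [w.e_step_terms(i) for w in workers]
        numerator = aggregate(t[0] for t in terms)
        denominator = aggregate(t[1] for t in terms)
        estimates[i] = numerator / denominator  # computed by the requester
    for w in workers:
        w.m_step(estimates)  # estimates are broadcast to every worker
    return estimates
```

Calling `run_round` repeatedly plays the role of the EM-like iterations; only the two per-task sums and the broadcast estimates cross the boundary between workers and requester.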

About this article

Cite this article

Kajino, H., Arai, H. & Kashima, H. Preserving worker privacy in crowdsourcing. Data Min Knowl Disc 28, 1314–1335 (2014). https://doi.org/10.1007/s10618-014-0352-3

Received: 11 October 2013
Accepted: 02 May 2014
Published: 29 May 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10618-014-0352-3
