
A workload-dependent task assignment policy for crowdsourcing


Abstract

Crowdsourcing marketplaces have emerged as an effective tool for high-speed, low-cost labeling of massive data sets. Since the labeling accuracy can greatly vary from worker to worker, we are faced with the problem of assigning labeling tasks to workers so as to maximize the accuracy associated with their answers. In this work, we study the problem of assigning workers to tasks under the assumption that workers’ reliability could change depending on their workload, as a result of, e.g., fatigue and learning. We offer empirical evidence of the existence of a workload-dependent accuracy variation among workers, and propose solution procedures for our Crowdsourced Labeling Task Assignment Problem, which we validate on both synthetic and real data sets.



Notes

  1. To show this, let OPT be the value of an optimal solution, and let \(\rho = \frac {LB}{UB}\). Since \(\frac {LB}{UB} \leq \frac {LB}{OPT}\), we have \(LB \geq \rho \, OPT\). Since \(LB \leq OPT\), we conclude \(\rho \, OPT \leq LB \leq OPT\), i.e., that the solution of value LB is a ρ-approximate solution to the problem, with \(\rho = \frac {LB}{UB}\).

  2. For decreasing accuracies, mean = 0.15⋅2t; for increasing accuracies, mean = 0.85⋅2t. Standard deviation = 0.14.
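The bound argument in footnote 1 can be sketched in a few lines. This is a minimal illustration, not code from the paper; the numeric values are made up:

```python
def approximation_ratio(lb: float, ub: float) -> float:
    """Given a feasible solution of value lb (a lower bound) and a valid
    upper bound ub on the unknown optimum OPT, return rho = lb/ub.
    Since lb <= OPT <= ub, we have rho*OPT <= lb <= OPT, so the
    lb-solution is certified to be a rho-approximation."""
    assert 0 < lb <= ub
    return lb / ub

# Hypothetical bounds: a heuristic solution of value 80 and an LP
# relaxation bound of 100 certify an 0.8-approximation.
rho = approximation_ratio(80.0, 100.0)
```

Note that the certificate needs no knowledge of OPT itself; the two bounds alone suffice.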


Acknowledgments

The authors acknowledge support from the EC’s FP7 “Smart H2O” project (http://smarth2o-fp7.eu/). The work of S. Coniglio is partly supported by the German Federal Ministry for Economic Affairs and Energy, BMWi, grant 03ET7528B.

Author information

Correspondence to Ilio Catallo.

Appendix A: Derivation of the pricing subproblem

Consider the primal linear program \(\max \{c^{\top} x : Ax \leq b, x \geq 0\}\), with \(c \in \mathbb {R}^{n}\), \(x \in \mathbb {R}^{n}\), \(A \in \mathbb {R}^{m \times n}\), and \(b \in \mathbb {R}^{m}\). By aggregating the constraints \(Ax \leq b\) with a vector \(y \in \mathbb {R}^{m}_{+}\), we obtain the valid inequality \(y^{\top} A x \leq y^{\top} b\). If we choose \(y\) such that \(y^{\top} A \geq c^{\top}\), then \(c^{\top} x \leq y^{\top} A x \leq y^{\top} b\). This implies that, for any \(y \geq 0\) satisfying \(y^{\top} A \geq c^{\top}\), we obtain an upper bound of value \(y^{\top} b\) on the value of an optimal solution to the primal problem. The tightest such upper bound is obtained by solving \(\min \{y^{\top} b : y^{\top} A \geq c^{\top}, y \geq 0\}\) (the dual problem). For each \(j = 1, \dots, n\), the dual constraint of \(x_{j}\) can be obtained by first aggregating all the primal constraints as \({\sum }_{i=1}^{m} y_{i} {\sum }_{j=1}^{n} a_{ij} x_{j} \leq {\sum }_{i=1}^{m} y_{i} b_{i}\), then collecting \(x_{j}\) on the left-hand side, yielding \({\sum }_{j=1}^{n} x_{j} ({\sum }_{i=1}^{m} y_{i} a_{ij}) \leq {\sum }_{i=1}^{m} y_{i} b_{i}\), and finally imposing \({\sum }_{i=1}^{m} y_{i} a_{ij} \geq c_{j}\).
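The weak-duality argument above can be checked numerically. The following sketch solves a small primal and its dual with SciPy; the instance is made up for illustration only:

```python
import numpy as np
from scipy.optimize import linprog

# A small instance of max {c^T x : A x <= b, x >= 0} and its dual
# min {b^T y : A^T y >= c, y >= 0}. Weak duality: any feasible y gives
# an upper bound b^T y on the primal optimum; at optimality they meet.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
b = np.array([4.0, 2.0])

# linprog minimizes, so we negate c for the primal max problem.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
# Dual: min b^T y s.t. A^T y >= c, rewritten as -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

primal_opt = -primal.fun
dual_opt = dual.fun
```

Here the primal optimum is attained at \(x = (2, 2)\) and the dual at \(y = (2, 1)\), both with value 10, so the dual bound is tight.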

We now derive the dual constraint for Problem (14)–(19) corresponding to variable \(\lambda_{ih}\). We first aggregate Inequalities (15)–(19), each multiplied by the corresponding dual variable (the calculation of the right-hand side is omitted):

$$\begin{aligned} &{\sum}_{i \in \mathcal{T}} \alpha_{i} \left( \eta - {\sum}_{h \in \mathcal{H}} c_{h} \lambda_{ih} \right) + \beta \left( {\sum}_{i \in \mathcal{T}} {\sum}_{h \in \mathcal{H}} \left( {\sum}_{j \in \mathcal{A}} {\sum}_{k \in \mathcal{I}} w_{hjk}\right) \lambda_{ih} - b\right) \\ &\quad + {\sum}_{j \in \mathcal{A}} {\sum}_{k \in \mathcal{I}} \gamma_{jk} \left( {\sum}_{i \in \mathcal{T}} {\sum}_{h \in \mathcal{H}} w_{hjk} \lambda_{ih} - 1 \right) + {\sum}_{i \in \mathcal{T}} \delta_{i} \left( {\sum}_{h \in \mathcal{H}} \lambda_{ih} - 1 \right) \\ &\quad + {\sum}_{j \in \mathcal{A}} {\sum}_{k \in \mathcal{I} \setminus \{1\}} \epsilon_{jk} \left( {\sum}_{i \in \mathcal{T}} {\sum}_{h \in \mathcal{H}} (w_{hjk} - w_{hj,k-1}) \lambda_{ih} \right) \geq (\cdot). \end{aligned}$$
(31)

Let \(\epsilon_{j,|\mathcal {I}|+1} = 0\). After collecting \(\lambda_{ih}\) (and omitting the coefficient of η and the right-hand side), Inequality (31) becomes:

$$\begin{aligned} {\sum}_{i \in \mathcal{T}} {\sum}_{h \in \mathcal{H}} \left( -\alpha_{i} c_{h} + {\sum}_{j \in \mathcal{A}} {\sum}_{k \in \mathcal{I}} \left( \beta + \gamma_{jk} + \epsilon_{jk} - \epsilon_{j,k+1} \right) w_{hjk} + \delta_{i} \right)\lambda_{ih} + (\cdot)\, \eta \geq (\cdot). \end{aligned}$$

For each \(i \in \mathcal {T}\) and \(h \in \mathcal {H}\), the dual constraint corresponding to \(\lambda_{ih}\) thus reads:

$$ -\alpha_{i} c_{h} + {\sum}_{j \in \mathcal{A}} {\sum}_{k \in \mathcal{I}} \left( \beta + \gamma_{jk} + \epsilon_{jk} - \epsilon_{j,k+1} \right) w_{hjk} + \delta_{i} \geq 0, $$
(32)

where the right-hand side is zero since \(\lambda_{ih}\) does not appear in the objective function of Problem (14)–(19).

By standard linear programming duality, the reduced cost of a column equals the slack of the corresponding dual constraint. More precisely, a primal column with a positive reduced cost corresponds to a violated dual constraint. Hence, the pricing subproblem amounts to finding a feasible workplan that minimizes the left-hand side of (32) (or, equivalently, that maximizes its opposite); this yields Problem (21)–(24). When solving it for index \(i\), any solution of value greater than or equal to \(\delta_{i}\) yields a new column with a nonnegative reduced cost which, when added to Problem (14)–(19), might improve its solution.
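The resulting iterate-between-master-and-pricing scheme is standard column generation. The sketch below illustrates it on a toy cutting-stock instance, not on the paper's Problem (14)–(19); all data and variable names are made up, and the pricing knapsack is solved by brute force for clarity:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Master LP: min sum_p x_p  s.t.  sum_p a_ip x_p >= d_i,  x >= 0,
# where each column a_p is a feasible cutting pattern for a roll.
W = 10                       # roll width (made-up instance)
sizes = np.array([3, 4, 5])  # piece sizes
demand = np.array([4, 3, 2]) # required number of pieces per size

# Start with one trivial single-size pattern per piece type.
columns = [np.eye(len(sizes))[:, i] * (W // s)
           for i, s in enumerate(sizes)]

while True:
    A = np.column_stack(columns)
    # Restricted master LP (>= constraints written as -A x <= -d).
    res = linprog(np.ones(A.shape[1]), A_ub=-A, b_ub=-demand,
                  bounds=[(0, None)] * A.shape[1], method="highs")
    y = -res.ineqlin.marginals  # dual prices of the demand constraints
    # Pricing: brute-force knapsack  max y^T a  s.t.  sizes^T a <= W.
    best_val, best_col = 0.0, None
    for a in itertools.product(*(range(W // s + 1) for s in sizes)):
        a = np.array(a)
        if sizes @ a <= W and y @ a > best_val:
            best_val, best_col = y @ a, a
    if best_val <= 1 + 1e-9:    # reduced cost 1 - y^T a >= 0: stop
        break
    columns.append(np.asarray(best_col, dtype=float))
```

On this instance the loop generates one extra pattern and terminates with a master LP value of 3.5 rolls; the same stop-when-pricing-cannot-improve logic carries over to any column generation scheme.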


About this article


Cite this article

Catallo, I., Coniglio, S., Fraternali, P. et al. A workload-dependent task assignment policy for crowdsourcing. World Wide Web 20, 1179–1210 (2017). https://doi.org/10.1007/s11280-016-0428-7


Keywords

  • Crowdsourcing
  • Task assignment
  • Human computation