How many crowdsourced workers should a requester hire?

Abstract

Recent years have seen an increased interest in crowdsourcing as a way of obtaining information from a potentially large group of workers at a reduced cost. The crowdsourcing process we consider in this paper is as follows: a requester hires a number of workers to work on a set of similar tasks. After completing the tasks, each worker reports back his or her outputs, and the requester aggregates the reported outputs to obtain aggregate outputs. A crucial question arises during this process: how many crowd workers should a requester hire? In this paper, we investigate from an empirical perspective the optimal number of workers a requester should hire when crowdsourcing tasks, with a particular focus on the crowdsourcing platform Amazon Mechanical Turk. Specifically, we report the results of three studies involving different tasks and payment schemes. We find that both the expected error in the aggregate outputs and the risk of a poor combination of workers decrease as the number of workers increases. Surprisingly, we find that the optimal number of workers a requester should hire for each task is around 10 to 11, regardless of the underlying task and payment scheme. To derive this result, we employ a principled analysis based on bootstrapping and segmented linear regression. Beyond this result, we also find that top-performing workers are, overall, more consistent across multiple tasks than other workers. Our results thus contribute to a better understanding of, and provide new insights into, how to design more effective crowdsourcing processes.
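The analysis mentioned above, bootstrapping the aggregate error over subsets of workers and then locating the point of diminishing returns with a segmented (piecewise) linear regression, can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's exact procedure: the worker-by-task matrix, the mean aggregation rule, and the exhaustive breakpoint search are assumptions made for the sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: worker_outputs[i, j] is worker i's answer on task j,
    # truth[j] is the gold answer. These stand in for data collected on MTurk.
    n_workers, n_tasks = 50, 20
    truth = rng.normal(size=n_tasks)
    worker_outputs = truth + rng.normal(scale=1.0, size=(n_workers, n_tasks))

    def bootstrap_error(k, n_boot=500):
        """Expected absolute error of the mean-aggregated output when k workers
        are drawn with replacement from the observed pool."""
        errs = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.choice(n_workers, size=k, replace=True)
            aggregate = worker_outputs[idx].mean(axis=0)
            errs[b] = np.abs(aggregate - truth).mean()
        return errs.mean()

    ks = np.arange(1, 31)
    errors = np.array([bootstrap_error(k) for k in ks])

    # Segmented (two-piece) linear regression: try every candidate breakpoint
    # and keep the one with the smallest total squared error.
    def segment_sse(x, y):
        slope, intercept = np.polyfit(x, y, 1)
        return np.sum((y - (slope * x + intercept)) ** 2)

    best_bp, best_sse = None, np.inf
    for bp in ks[2:-2]:  # require a few points on each side of the break
        left, right = ks <= bp, ks > bp
        sse = segment_sse(ks[left], errors[left]) + segment_sse(ks[right], errors[right])
        if sse < best_sse:
            best_bp, best_sse = bp, sse

    print(f"Estimated breakpoint (diminishing returns) at k = {best_bp}")

On the paper's actual data, the breakpoint estimated by this kind of analysis is what underlies the reported recommendation of roughly 10 to 11 workers; the synthetic data above only illustrates the mechanics.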

Keywords

Crowdsourcing · Human computation · Amazon Mechanical Turk

Mathematics Subject Classification (2010)

68T99 · 90B99


Copyright information

© The Author(s) 2016

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Rotterdam School of Management, Erasmus University, Rotterdam, The Netherlands
  2. Department of Management Sciences, University of Waterloo, Waterloo, Canada
  3. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
