A Study on the Influence of the Number of MTurkers on the Quality of the Aggregate Output

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8953)


Recent years have seen an increased interest in crowdsourcing as a way of obtaining information from a large group of workers at a reduced cost. In general, there are arguments for and against using multiple workers to perform a task. On the positive side, multiple workers bring different perspectives to the process, which may result in a more accurate aggregate output since biases of individual judgments might offset each other. On the other hand, a larger population of workers is more likely to have a higher concentration of poor workers, which might bring down the quality of the aggregate output.

In this paper, we empirically investigate how the number of workers on the crowdsourcing platform Amazon Mechanical Turk influences the quality of the aggregate output in a content-analysis task. We find that both the expected error in the aggregate output and the risk of a poor combination of workers decrease as the number of workers increases.
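
The relationship between the number of workers and the quality of the aggregate output can be illustrated with a minimal sketch. It is not the procedure used in the paper: the aggregation rule (a simple mean), the error measure (absolute distance to the gold standard), the use of a standard deviation as a stand-in for "risk", the function name `error_profile`, and the toy scores are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def error_profile(outputs, gold, n):
    """Mean and spread of |aggregate - gold| over all size-n worker subsets.

    Assumptions (not from the paper): the aggregate is the mean of the
    subset's outputs, and "risk" is the population standard deviation of
    the error across subsets.
    """
    errors = [abs(mean(subset) - gold) for subset in combinations(outputs, n)]
    return mean(errors), pstdev(errors)


# Hypothetical worker scores for a single task; hypothetical gold score of 3.0.
outputs = [1.0, 2.0, 3.0, 4.0, 5.0]
gold = 3.0

for n in range(1, len(outputs) + 1):
    expected_error, risk = error_profile(outputs, gold, n)
    print(n, round(expected_error, 3), round(risk, 3))
```

On this toy data, both numbers shrink as `n` grows because individual deviations from the gold standard partly cancel in the mean. Real worker data need not be this well behaved.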

Moreover, our results show that restricting the population of workers to up to the overall top 40% of workers is likely to produce more accurate aggregate outputs, whereas removing up to the overall worst 40% of workers can actually make the aggregate output less accurate. We find that this result holds because top-performing workers are consistent across multiple tasks, whereas worst-performing workers tend to be inconsistent. Our results thus contribute to a better understanding of, and provide valuable insights into, how to design more effective crowdsourcing processes.
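
The idea of restricting the population to top-performing workers can be sketched as follows. This is a hedged illustration, not the authors' method: the ranking criterion (lowest error on earlier tasks), the helper name `keep_top_fraction`, the worker identifiers, and all numbers are hypothetical.

```python
from statistics import mean


def keep_top_fraction(past_errors, fraction):
    """Return the ids of the best `fraction` of workers (lowest past error).

    Assumption: each worker has a single past-error score from earlier tasks.
    At least one worker is always kept.
    """
    ranked = sorted(past_errors, key=past_errors.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical error of each worker on earlier tasks (lower is better).
past_errors = {"w1": 0.2, "w2": 0.5, "w3": 1.0, "w4": 1.5, "w5": 3.0}
# Hypothetical outputs of the same workers on a new task; gold score is 3.0.
new_outputs = {"w1": 3.1, "w2": 2.8, "w3": 3.5, "w4": 2.0, "w5": 5.0}
gold = 3.0

top = keep_top_fraction(past_errors, 0.4)
filtered_error = abs(mean([new_outputs[w] for w in top]) - gold)
unfiltered_error = abs(mean(list(new_outputs.values())) - gold)
print(top, filtered_error, unfiltered_error)
```

In this toy setup the filtered aggregate is closer to the gold score only because the top workers were constructed to stay accurate on the new task, which mirrors the consistency the abstract attributes to top performers.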



The authors acknowledge Craig Boutilier, Pascal Poupart, Daniel Lizotte, and Xi Alice Gao for useful discussions. The authors thank Carol Acton, Katherine Acheson, Stefan Rehm, Susan Gow, and Veronica Austen for providing gold-standard outputs for our experiment. The authors also thank the Natural Sciences and Engineering Research Council of Canada for funding this research.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Rotterdam School of Management, Erasmus University, Rotterdam, The Netherlands
  2. Department of Management Sciences, University of Waterloo, Waterloo, Canada
  3. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
