Social Network Analysis and Mining, Volume 3, Issue 4, pp 873–888

High-throughput crowdsourcing mechanisms for complex tasks

  • Guido Sautter
  • Klemens Böhm
Original Article


Crowdsourcing has been identified as a way to facilitate large-scale data processing that requires human input. However, working with a large anonymous user community also poses new challenges. In particular, both misjudgment and dishonesty threaten the quality of the results. Common countermeasures are based on redundancy, which gives rise to a tradeoff between result quality and throughput. Ideally, measures should (1) maintain high throughput and (2) ensure high result quality at the same time. Existing research on crowdsourcing mostly focuses on result quality and pays little attention to throughput, let alone to the tradeoff between the two. One reason is that the number of tasks (atomic units of work) considered is usually small. A further problem is that the tasks themselves are small as well. As a consequence, existing mechanisms for improving result quality do not scale to the number or complexity of tasks that arise, for instance, in proofreading and processing digitized legacy literature. This paper proposes novel mechanisms that (1) are independent of the size and complexity of tasks and (2) allow trading result quality for throughput to a significant extent. Both mathematical analyses and extensive simulations demonstrate the effectiveness of the proposed mechanisms.
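The tradeoff between redundancy and throughput mentioned above can be illustrated with a minimal sketch. It is not one of the mechanisms proposed in the paper; it models only the generic baseline of plain majority voting, under the assumption that every task is a binary decision and that each worker is independently correct with the same probability `p`. The function names `majority_vote_accuracy` and `relative_throughput` and the value `p = 0.8` are hypothetical and chosen purely for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, r: int) -> float:
    """Probability that a strict majority of r workers is correct.

    Assumes (for illustration only) that each worker decides a binary
    task independently and is correct with probability p. r must be odd
    so that ties cannot occur.
    """
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd integer")
    needed = r // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(r, k) * p**k * (1 - p) ** (r - k)
        for k in range(needed, r + 1)
    )


def relative_throughput(r: int) -> float:
    """Distinct tasks completed per unit of worker effort when each task is done r times."""
    return 1.0 / r


if __name__ == "__main__":
    p = 0.8  # assumed per-worker accuracy, not a figure from the paper
    for r in (1, 3, 5):
        print(r, round(majority_vote_accuracy(p, r), 4), round(relative_throughput(r), 3))
```

Under these assumptions, raising the redundancy from 1 to 3 to 5 lifts the expected accuracy from 0.8 to about 0.896 to about 0.942, while the throughput falls to one third and then one fifth of the non-redundant rate. This is the kind of cost that motivates looking for mechanisms less dependent on blanket redundancy.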


Crowdsourcing, Data quality, Throughput



Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  1. Computer Science Department, Karlsruhe Institute of Technology, Karlsruhe, Germany
