Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection

Abstract

The suitability of crowdsourcing to solve a variety of problems has been investigated widely. Yet, there is still a lack of understanding about the distinct behavior and performance of workers within microtasks. In this paper, we first introduce a fine-grained data-driven worker typology based on different dimensions and derived from behavioral traces of workers. Next, we propose and evaluate novel models of crowd worker behavior and show the benefits of behavior-based worker pre-selection using machine learning models. We also study the effect of task complexity on worker behavior. Finally, we evaluate our novel typology-based worker pre-selection method in image transcription and information finding tasks involving crowd workers completing 1,800 HITs. Our proposed method for worker pre-selection leads to a higher quality of results when compared to the standard practice of using qualification or pre-screening tests. For image transcription tasks, our method resulted in an accuracy increase of nearly 7% over the baseline, and of almost 10% in information finding tasks, without a significant difference in task completion time. Our findings have important implications for crowdsourcing systems where a worker’s behavioral type is unknown prior to participation in a task. We highlight the potential of leveraging worker types to identify and aid those workers who require further training to improve their performance. Having proposed a powerful automated mechanism to detect worker types, we reflect on promoting fairness, trust and transparency in microtask crowdsourcing platforms.
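The behavior-based pre-selection idea summarized in the abstract can be illustrated with a minimal, self-contained sketch. The feature names (task time, mouse, scroll and key events), the toy values, and the nearest-centroid decision rule are all illustrative assumptions for exposition; they are not the paper's actual features or machine learning models:

```python
# Minimal sketch of behavior-based worker pre-selection (assumed
# features and decision rule, not the paper's implementation).
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical session-level behavioral traces per worker:
# (task time in s, mouse events, scroll events, key presses)
ACCEPTED = [(310, 420, 55, 180), (290, 390, 48, 170), (260, 350, 40, 150)]
REJECTED = [(35, 20, 2, 10), (40, 15, 3, 8), (30, 25, 1, 12)]

def centroid(rows):
    """Component-wise mean of a list of equally sized feature tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

GOOD, BAD = centroid(ACCEPTED), centroid(REJECTED)

def preselect(trace):
    """Admit a worker when their session trace lies closer to the
    centroid of previously accepted sessions than to the rejected one."""
    return dist(trace, GOOD) < dist(trace, BAD)

print(preselect((300, 400, 50, 175)))  # True: behaves like accepted sessions
print(preselect((33, 18, 2, 9)))       # False: behaves like rejected sessions
```

In a real deployment the labels would come from gold questions or post-hoc review, and the classifier would be a trained model over many more behavioral features; the point of the sketch is only that session-level traces, not demographics or qualification tests, drive the admission decision.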

Notes

  1. http://www.captcha.net/

  2. http://www.crowdflower.com/

  3. http://github.com/Valve/fingerprintjs

  4. Note that worker types describe session-level behavior of the workers rather than properties of a person.

  5. Untrustworthy workers are those who failed at least one attention check question.

  6. Shortened URL: https://goo.gl/jjv0gp

  7. The first point at which a worker provides an incorrect response after having provided at least one correct response (Gadiraju et al. 2015b).

  8. CrowdFlower suggests a minimum accuracy of 70% by default.

References

  1. Berg, Bruce Lawrence (2004). Methods for the social sciences. Qualitative Research Methods for the Social Sciences. Boston: Pearson Education.

  2. Bozzon, Alessandro; Marco Brambilla; Stefano Ceri; Matteo Silvestri; and Giuliano Vesci (2013). Choosing the Right Crowd: Expert Finding in Social Networks. EDBT’13. Joint 2013 EDBT/ICDT Conferences, Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, 18-22 March 2013. New York: ACM Press, pp. 637–648.

  3. Cheng, Justin; Jaime Teevan; Shamsi T Iqbal; and Michael S Bernstein (2015). Break it down: A comparison of macro- and microtasks. CHI’15. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, Republic of Korea, 18-23 April 2015. New York: ACM Press, pp. 4061–4064.

  4. Dang, Brandon; Miles Hutson; and Matthew Lease (2016). MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Amazon Mechanical Turk. HCOMP’16. Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, Austin, Texas, USA, 30 October-3 November 2016. AAAI Press, pp. 1–3.

  5. Demartini, Gianluca; Djellel Eddine Difallah; and Philippe Cudré-Mauroux (2012). ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. WWW’12. Proceedings of the 21st World Wide Web Conference 2012, Lyon, France, 16-20 April 2012. New York: ACM Press, pp. 469–478.

  6. Denzin, Norman K (1978). The research act: A theoretical orientation to sociological methods, Vol. 2. New York: McGraw-Hill.

  7. Difallah, Djellel Eddine; Gianluca Demartini; and Philippe Cudré-Mauroux (2013). Pick-a-crowd: tell me what you like, and I’ll tell you what to do. WWW’13. Proceedings of the 22nd International World Wide Web Conference, Rio de Janeiro, Brazil, 13-17 May 2013. New York: ACM Press, pp. 367–374.

  8. Difallah, Djellel Eddine; Michele Catasta; Gianluca Demartini; Panagiotis G Ipeirotis; and Philippe Cudré-Mauroux (2015). The dynamics of micro-task crowdsourcing: The case of Amazon MTurk. WWW’15. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18-22 May 2015. New York: ACM Press, pp. 238–247.

  9. Dow, Steven; Anand Kulkarni; Scott Klemmer; and Björn Hartmann (2012). Shepherding the crowd yields better work. CSCW’12. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, Seattle, WA, USA, 11-15 February 2012. New York: ACM Press, pp. 1013–1022.

  10. Eckersley, Peter (2010). How unique is your web browser? PETS’10. Proceedings of the 10th International Symposium on Privacy Enhancing Technologies Symposium, Berlin, Germany, 21-23 July 2010. Heidelberg: Springer, pp. 1–18.

  11. Eickhoff, Carsten; Christopher G Harris; Arjen P de Vries; and Padmini Srinivasan (2012). Quality through flow and immersion: gamifying crowdsourced relevance assessments. SIGIR’12. Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, Portland, OR, USA, 12-16 August 2012. New York: ACM Press, pp. 871–880.

  12. Feyisetan, Oluwaseyi; Elena Simperl; Max Van Kleek; and Nigel Shadbolt (2015a). Improving paid microtasks through gamification and adaptive furtherance incentives. WWW’15. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18-22 May 2015. New York: ACM Press, pp. 333–343.

  13. Feyisetan, Oluwaseyi; Markus Luczak-Roesch; Elena Simperl; Ramine Tinati; and Nigel Shadbolt (2015b). Towards hybrid NER: a study of content and crowdsourcing-related performance factors. ESWC’15. Proceedings of The Semantic Web. Latest Advances and New Domains - 12th European Semantic Web Conference, Portoroz, Slovenia, 31 May-4 June 2015. Heidelberg: Springer, pp. 525–540.

  14. Gadiraju, Ujwal; and Neha Gupta (2016). Dealing with Sub-optimal Crowd Work: Implications of Current Quality Control Practices. International Reports on Socio-Informatics (IRSI), Proceedings of the CHI 2016 - Workshop: Crowd Dynamics: Exploring Conflicts and Contradictions in Crowdsourcing, Vol. 13. pp. 15–20.

  15. Gadiraju, Ujwal; and Ricardo Kawase (2017). Improving Reliability of Crowdsourced Results by Detecting Crowd Workers with Multiple Identities. ICWE’17. Proceedings of the 17th International Conference on Web Engineering, Rome, Italy, 5-8 June 2017. Heidelberg: Springer, pp. 190–205.

  16. Gadiraju, Ujwal; and Stefan Dietze (2017). Improving learning through achievement priming in crowdsourced information finding microtasks. LAK’17. Proceedings of the seventh international learning analytics & knowledge conference, Vancouver, BC, Canada, 13-17 March 2017. New York: ACM Press, pp. 105–114.

  17. Gadiraju, Ujwal; Ricardo Kawase; and Stefan Dietze (2014). A taxonomy of microtasks on the web. HT’14. Proceedings of the 25th ACM Conference on Hypertext and Social Media, Santiago, Chile, 1-4 September 2014. New York: ACM Press, pp. 218–223.

  18. Gadiraju, Ujwal; Besnik Fetahu; and Ricardo Kawase (2015a). Training workers for improving performance in crowdsourcing microtasks. EC-TEL’15. Design for Teaching and Learning in a Networked World - Proceedings of the 10th European Conference on Technology Enhanced Learning, Toledo, Spain, 15-18 September 2015. Heidelberg: Springer, pp. 100–114.

  19. Gadiraju, Ujwal; Ricardo Kawase; Stefan Dietze; and Gianluca Demartini (2015b). Understanding malicious behavior in crowdsourcing platforms: The case of online surveys. CHI’15. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI 2015, Seoul, Republic of Korea, 18-23 April 2015. New York: ACM Press, pp. 1631–1640.

  20. Gadiraju, Ujwal; Alessandro Checco; Neha Gupta; and Gianluca Demartini (2017a). Modus operandi of crowd workers: The invisible role of microtask work environments. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), vol. 1, no. 3, pp. 49:1–49:29.


  21. Gadiraju, Ujwal; Besnik Fetahu; Ricardo Kawase; Patrick Siehndel; and Stefan Dietze (2017b). Using worker self-assessments for competence-based preselection in crowdsourcing microtasks. ACM Transactions on Computer-Human Interaction (TOCHI), vol. 24, no. 4, pp. 30:1–30:26.


  22. Gadiraju, Ujwal; Jie Yang; and Alessandro Bozzon (2017c). Clarity is a Worthwhile Quality – On the Role of Task Clarity in Microtask Crowdsourcing. HT’17. Proceedings of the 28th ACM Conference on Hypertext and Social Media, Prague, Czech Republic, 4-7 July 2017. New York: ACM Press, pp. 5–14.

  23. Gaikwad, Snehalkumar Neil S; Durim Morina; Adam Ginzberg; Catherine Mullings; Shirish Goyal; Dilrukshi Gamage; Christopher Diemert; Mathias Burton; Sharon Zhou; Mark Whiting et al. (2016). Boomerang: Rebounding the consequences of reputation feedback on crowdsourcing platforms. UIST’16. Proceedings of the 29th Annual Symposium on User Interface Software and Technology, Tokyo, Japan, 16-19 October 2016. New York: ACM Press, pp. 625–637.

  24. Ipeirotis, Panagiotis G; Foster Provost; and Jing Wang (2010). Quality management on Amazon Mechanical Turk. HCOMP’10. Proceedings of the ACM SIGKDD Workshop on Human Computation. New York: ACM Press, pp. 64–67.

  25. Irani, Lilly C; and M Silberman (2013). Turkopticon: Interrupting worker invisibility in Amazon Mechanical Turk. CHI’13. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 27 April-2 May 2013. New York: ACM Press, pp. 611–620.

  26. Kazai, Gabriella; and Imed Zitouni (2016). Quality management in crowdsourcing using gold judges behavior. WSDM’16. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 22-25 February 2016. New York: ACM Press, pp. 267–276.

  27. Kazai, Gabriella; Jaap Kamps; and Natasa Milic-Frayling (2011). Worker types and personality traits in crowdsourcing relevance labels. CIKM’11. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, United Kingdom, 24-28 October 2011. New York: ACM Press, pp. 1941–1944.

  28. Kazai, Gabriella; Jaap Kamps; and Natasa Milic-Frayling (2012). The face of quality in crowdsourcing relevance labels: demographics, personality and labeling accuracy. CIKM’12. Proceedings of the 21st ACM International conference on Information and Knowledge Management, Maui, HI, USA, 29 October-02 November 2012. New York: ACM Press, pp. 2583–2586.

  29. Kazai, Gabriella; Jaap Kamps; and Natasa Milic-Frayling (2013). An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Information Retrieval, vol. 16, no. 2, pp. 138–178.


  30. Kittur, Aniket; Jeffrey V Nickerson; Michael Bernstein; Elizabeth Gerber; Aaron Shaw; John Zimmerman; Matt Lease; and John Horton (2013). The future of crowd work. CSCW’13. Proceedings of the 16th ACM Conference on Computer Supported Cooperative Work, San Antonio, TX, USA, 23-27 February 2013. New York: ACM Press, pp. 1301–1318.

  31. Marshall, Catherine C; and Frank M Shipman (2013). Experiences surveying the crowd: Reflections on methods, participation, and reliability. WebSci’13. Proceedings of the 5th Annual ACM Web Science Conference. New York: ACM Press, pp. 234–243.

  32. Martin, David; Benjamin V Hanrahan; Jacki O’Neill; and Neha Gupta (2014). Being a Turker. CSCW’14. Proceedings of the 17th ACM conference on Computer Supported Cooperative Work & Social Computing, Baltimore, MD, USA, 15-19 February 2014. New York: ACM Press, pp. 224–235.

  33. Oleson, David; Alexander Sorokin; Greg P. Laughlin; Vaughn Hester; John Le; and Lukas Biewald (2011). Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing. HCOMP’11. Papers from the 2011 AAAI Workshop on Human Computation, San Francisco, California, USA, 8 August 2011. AAAI Press, pp. 43–48.

  34. Rokicki, Markus; Sergej Zerr; and Stefan Siersdorfer (2015). Groupsourcing: Team competition designs for crowdsourcing. WWW’15. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18-22 May 2015. New York: ACM Press, pp. 906–915.

  35. Rzeszotarski, Jeffrey; and Aniket Kittur (2012). CrowdScape: interactively visualizing user behavior and output. UIST’12. Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, Cambridge, MA, USA, 7-10 October 2012. New York: ACM Press, pp. 55–62.

  36. Rzeszotarski, Jeffrey M; and Aniket Kittur (2011). Instrumenting the crowd: using implicit behavioral measures to predict task performance. UIST’11. Proceedings of the 24th annual ACM symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 16-19 October 2011. New York: ACM Press, pp. 13–22.

  37. Sheshadri, Aashish; and Matthew Lease (2013). SQUARE: A Benchmark for Research on Computing Crowd Consensus. HCOMP’13. Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, 7-9 November 2013, Palm Springs, CA, USA. AAAI Press, pp. 156–164.

  38. Glaser, Barney; and Anselm Strauss (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine.

  39. Strauss, Anselm L (1987). Qualitative analysis for social scientists. Cambridge: Cambridge University Press.

  40. Taras, Maddalena (2002). Using assessment for learning and learning from assessment. Assessment & Evaluation in Higher Education, vol. 27, no. 6, pp. 501–510.


  41. Venanzi, Matteo; John Guiver; Gabriella Kazai; Pushmeet Kohli; and Milad Shokouhi (2014). Community-based bayesian aggregation models for crowdsourcing. WWW’14. Proceedings of the 23rd International World Wide Web Conference, Seoul, Republic of Korea, 7-11 April 2014. New York: ACM Press, pp. 155–164.

  42. Vuurens, Jeroen BP; and Arjen P de Vries (2012). Obtaining high-quality relevance judgments using crowdsourcing. IEEE Internet Computing, vol. 16, no. 5, pp. 20–27.


  43. Wang, Jing; Panagiotis G Ipeirotis; and Foster Provost (2011). Managing crowdsourcing workers. WCBI’11. Proceedings of the Winter Conference on Business Intelligence, Salt Lake City, Utah, USA, 12-14 March 2011. Citeseer, pp. 10–12.

  44. Wood, Robert E (1986). Task complexity: Definition of the construct. Organizational Behavior and Human Decision Processes, vol. 37, no. 1, pp. 60–82.


  45. Yang, Jie; Judith Redi; Gianluca Demartini; and Alessandro Bozzon (2016). Modeling Task Complexity in Crowdsourcing. HCOMP’16. Proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing, Austin, Texas, USA, 30 October-3 November 2016. AAAI Press, pp. 249–258.


Author information


Corresponding author

Correspondence to Ujwal Gadiraju.


About this article


Cite this article

Gadiraju, U., Demartini, G., Kawase, R. et al. Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection. Comput Supported Coop Work 28, 815–841 (2019). https://doi.org/10.1007/s10606-018-9336-y


Keywords

  • Behavioral traces
  • Crowdsourcing
  • Microtasks
  • Pre-selection
  • Pre-screening
  • Workers
  • Worker typology