Social Network Analysis and Mining, Volume 2, Issue 4, pp 329–344

Social network mining of requester communities in crowdsourcing markets

  • Daniel Schall
  • Florian Skopik
Original Article


Crowdsourcing is a new computing approach in which human tasks are outsourced to a large number of human workers. It has attracted attention not only from industry but also from various academic communities. Amazon Mechanical Turk (AMT) was the first commercial platform to offer crowdsourcing services to its customers and is often referred to as a platform supplying ‘artificial artificial intelligence’. Recent research efforts have not addressed the community structure of large-scale crowdsourcing platforms. In this work, we discuss detailed statistics of the popular AMT marketplace to provide insights into task properties and requester behavior. We present a model that automatically infers requester communities from task keywords, using hierarchical clustering to identify relations between the keywords associated with tasks. We present novel techniques to rank communities and requesters using a graph-based algorithm. Furthermore, we introduce models and methods for discovering relevant crowdsourcing brokers, who can act as intermediaries between requesters and platforms such as AMT.
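The keyword clustering step can be illustrated with a minimal sketch. The abstract does not state which similarity measure or linkage criterion the authors use, so the Jaccard similarity over co-occurring tasks, the average-linkage rule, the `min_similarity` cutoff, and the toy task data below are all assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy data: each task is the set of keywords attached to it.
TASKS = [
    {"survey", "research"},
    {"survey", "research", "opinion"},
    {"transcription", "audio"},
    {"transcription", "audio", "english"},
    {"survey", "opinion"},
]


def keyword_index(tasks):
    """Map every keyword to the set of task ids it appears in."""
    index = {}
    for task_id, keywords in enumerate(tasks):
        for kw in keywords:
            index.setdefault(kw, set()).add(task_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets of task ids."""
    return len(a & b) / len(a | b)


def cluster_keywords(tasks, min_similarity=0.3):
    """Average-linkage agglomerative clustering of keywords (assumed variant).

    Repeatedly merges the two most similar clusters and stops once no pair
    reaches `min_similarity`. Returns the remaining clusters and the merge
    history, which is a flat record of the dendrogram.
    """
    index = keyword_index(tasks)
    clusters = [frozenset([kw]) for kw in sorted(index)]
    merges = []

    def linkage(c1, c2):
        # Mean pairwise keyword similarity between the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(index[x], index[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        score, c1, c2 = max(
            ((linkage(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if score < min_similarity:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), round(score, 3)))
    return clusters, merges


if __name__ == "__main__":
    final_clusters, history = cluster_keywords(TASKS)
    for cluster in final_clusters:
        print(sorted(cluster))
```

On this toy input the keywords fall into two groups, one around surveys and one around transcription. Requesters whose tasks draw mostly on one group could then be treated as candidates for the same community, although how the paper maps keyword clusters to requester communities is not specified in the abstract.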


Keywords: Crowdsourcing, Mechanical Turk, Hierarchical clustering, Community detection, Community ranking, Broker discovery



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Siemens Corporate Technology, Wien, Austria
  2. Safety and Security Department, AIT Austrian Institute of Technology, Seibersdorf, Austria
