The VLDB Journal, Volume 26, Issue 6, pp 855–880

Answer validation for generic crowdsourcing tasks with minimal efforts

  • Nguyen Quoc Viet Hung
  • Duong Chi Thang
  • Nguyen Thanh Tam
  • Matthias Weidlich
  • Karl Aberer
  • Hongzhi Yin
  • Xiaofang Zhou
Regular Paper


Crowdsourcing has been established as an essential means to scale human computation in diverse Web applications, ranging from data integration to information retrieval. Yet crowd workers have wide-ranging levels of expertise: large worker populations are heterogeneous and comprise a significant number of faulty workers. As a consequence, quality assurance for crowd answers is commonly seen as the Achilles heel of crowdsourcing. Although various techniques for quality control have been proposed in recent years, a post-processing phase in which crowd answers are validated is still required. Such validation, however, is typically conducted by experts, whose availability is limited and whose work incurs comparatively high costs. This work aims at guiding an expert in the validation of crowd answers. We present a probabilistic model that helps to identify the most beneficial validation questions in terms of both improvement in result correctness and detection of faulty workers. By seeking expert feedback on the most problematic cases, we are able to obtain a set of high-quality answers, even if the expert does not validate the complete answer set. Our approach is applicable to a broad range of crowdsourcing tasks, including classification and counting. Our comprehensive evaluation using both real-world and synthetic datasets demonstrates that our techniques save up to 60% of expert effort compared to baseline methods when striving for perfect result correctness. In absolute terms, for most cases, we achieve close to perfect correctness after expert input has been sought for only 15% of the crowdsourcing tasks.
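The idea of asking the expert about the most problematic cases first can be illustrated with a minimal sketch. It is not the probabilistic model of the paper: it assumes plain vote shares as the label distribution of a task and uses Shannon entropy as the uncertainty score. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def label_distribution(answers):
    """Empirical label distribution for one task (plain vote shares, an assumption)."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def entropy(dist):
    """Shannon entropy in bits of a label distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def most_uncertain_task(crowd_answers, validated=()):
    """Pick the not-yet-validated task whose vote distribution has the highest entropy."""
    candidates = {t: a for t, a in crowd_answers.items() if t not in validated}
    return max(candidates, key=lambda t: entropy(label_distribution(candidates[t])))

# Hypothetical toy data: task id -> labels given by four crowd workers.
crowd_answers = {
    "t1": ["cat", "cat", "cat", "cat"],   # full agreement
    "t2": ["cat", "dog", "cat", "dog"],   # evenly split
    "t3": ["dog", "dog", "dog", "cat"],   # mild disagreement
}
next_task = most_uncertain_task(crowd_answers)
```

Here the evenly split task would be presented to the expert first. A fuller treatment would also weigh worker reliability, which this sketch ignores.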


Keywords: Crowdsourcing, Validation, Guiding user feedback, Generic tasks, Probabilistic model

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Nguyen Quoc Viet Hung (1)
  • Duong Chi Thang (2)
  • Nguyen Thanh Tam (2)
  • Matthias Weidlich (3)
  • Karl Aberer (2)
  • Hongzhi Yin (4)
  • Xiaofang Zhou (4)

  1. Griffith University, Gold Coast, Australia
  2. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  3. Humboldt-Universität zu Berlin, Berlin, Germany
  4. The University of Queensland, Brisbane, Australia
