Effective Solution for Labeling Candidates with a Proper Ration for Efficient Crowdsourcing

  • Zhao Chen
  • Peng Cheng
  • Chen Zhang
  • Lei Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)


A core problem in crowdsourcing research is how to reduce cost, in other words, how to get better results with a limited budget. To save budget, most researchers concentrate on the internal steps of crowdsourcing, whereas in this work we focus on the pre-processing stage: how to select the input that the crowd will work on. A straightforward application of this work is to help budget-limited machine learning researchers obtain better-balanced training data from crowd labeling. Specifically, we formulate the prior-information-based input manipulation procedure as the Candidate Selection Problem (CSP) and propose an end-squeezing algorithm for it. Our results show that a considerable cost reduction can be achieved by manipulating the input to the crowd with the help of some additional prior information. We verify the effectiveness and efficiency of the proposed approach through extensive experiments.



The work is partially supported by the Hong Kong RGC GRF Project 16207617, National Grand Fundamental Research 973 Program of China under Grant 2014CB340303, the National Science Foundation of China (NSFC) under Grant No. 61729201, Science and Technology Planning Project of Guangdong Province, China, No. 2015B010110006, Webank Collaboration Research Project, and Microsoft Research Asia Collaborative Research Grant.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  2. Shandong University of Finance and Economics, Jinan, China
