
Learning from crowdsourced labeled data: a survey

  • Published in: Artificial Intelligence Review

Abstract

With the rapid growth of crowdsourcing systems, many applications based on the supervised learning paradigm can obtain massive labeled data at relatively low cost. However, the uncertain reliability of crowdsourced labelers poses great challenges to learning procedures. Improving the quality of both labels and learning models therefore plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of label quality and learning-model quality. Then, by reviewing recently proposed models and algorithms for ground truth inference and for learning models, we analyze the connections and distinctions among these techniques and clarify the current state of research in this area. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, along with open source libraries and tools. Finally, we discuss some potential issues for future study.
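
Ground truth inference, mentioned above, aggregates the multiple noisy labels that different crowd workers assign to the same instance into a single integrated label. As a minimal illustration of the simplest baseline in this family, majority voting, consider the following sketch (illustrative Python, not taken from the paper; the function name and the toy data are assumptions):

    # Minimal sketch of majority voting, the simplest ground truth
    # inference baseline: the integrated label of an instance is the
    # label chosen by the most workers. Illustrative only.
    from collections import Counter

    def majority_vote(labels):
        """Infer an integrated label for one instance from its crowdsourced
        labels. Ties are broken arbitrarily by Counter.most_common."""
        if not labels:
            raise ValueError("instance has no crowdsourced labels")
        return Counter(labels).most_common(1)[0][0]

    # Three workers label two instances; one worker errs on instance "b",
    # but the integrated labels are still correct.
    crowd_labels = {"a": [1, 1, 0], "b": [0, 1, 0]}
    integrated = {i: majority_vote(ls) for i, ls in crowd_labels.items()}
    print(integrated)  # {'a': 1, 'b': 0}

More sophisticated agreement-based methods covered by the survey, such as the EM approach of Dawid and Skene, instead weight each worker's labels by an estimated per-worker confusion matrix rather than counting all votes equally.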



Author information

Corresponding author

Correspondence to Jing Zhang.

Additional information

This research has been supported by the China Postdoctoral Science Foundation under grant 2016M590457, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under grant IRT13059, the National 973 Program of China under grant 2013CB329604, and the US National Science Foundation under grant IIS-1115417.


About this article


Cite this article

Zhang, J., Wu, X. & Sheng, V.S. Learning from crowdsourced labeled data: a survey. Artif Intell Rev 46, 543–576 (2016). https://doi.org/10.1007/s10462-016-9491-9

