Abstract
Crowdsourcing is now widely used in machine learning for data labeling. Traditionally, annotators are asked to provide a single label for each instance; more recent approaches allow annotators who are in doubt to choose a subset of labels (a candidate set), as a way to extract more information from them. In both settings, the reliability of the labelers can be modeled from the collections of labels that they provide. In this paper, we propose an Expectation-Maximization-based method for crowdsourced data with candidate sets. The method iteratively maximizes the likelihood of the parameters that model labeler reliability while estimating the ground truth. The experimental results suggest that the proposed method performs better than the baseline aggregation schemes in terms of estimated accuracy.
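To make the alternation between reliability estimation and ground-truth estimation concrete, the following is a minimal sketch of an EM loop over candidate sets. The abstract does not specify the reliability model, so the one used here is an assumption made only for illustration: each annotator includes the true label in the candidate set with probability `hit` and includes each wrong label independently with probability `false_inc`. The function name `em_candidate_sets`, the `Votes` layout and the toy data are likewise hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical layout: (instance id, annotator id) -> candidate set of labels.
Votes = Dict[Tuple[int, int], Set[int]]


def clamp(x: float, eps: float = 1e-6) -> float:
    """Keep a probability strictly inside (0, 1) so that its log is finite."""
    return min(max(x, eps), 1.0 - eps)


def em_candidate_sets(votes: Votes, n_classes: int, n_iter: int = 50):
    """EM under the assumed model; labels are 0..n_classes-1, n_classes >= 2."""
    instances = sorted({i for i, _ in votes})
    annotators = sorted({a for _, a in votes})
    K = n_classes

    # Initialisation: each candidate set spreads one vote evenly over its labels.
    post = {}
    for i in instances:
        score = [1e-3] * K  # small smoothing so no label starts at zero
        for (j, _), cand in votes.items():
            if j == i:
                for c in cand:
                    score[c] += 1.0 / len(cand)
        total = sum(score)
        post[i] = [s / total for s in score]

    for _ in range(n_iter):
        # M-step: maximise the expected complete-data log-likelihood.
        prior = [clamp(sum(post[i][c] for i in instances) / len(instances))
                 for c in range(K)]
        hit, false_inc = {}, {}
        for a in annotators:
            mine = [(i, cand) for (i, b), cand in votes.items() if b == a]
            # Expected number of sets that contain the true label.
            exp_hits = sum(post[i][c] for i, cand in mine for c in cand)
            # Expected number of wrong labels included across all sets.
            exp_false = sum(post[i][c] * (len(cand) - (c in cand))
                            for i, cand in mine for c in range(K))
            hit[a] = clamp(exp_hits / len(mine))
            false_inc[a] = clamp(exp_false / (len(mine) * (K - 1)))

        # E-step: posterior over the true label of each instance.
        for i in instances:
            logp = [math.log(prior[c]) for c in range(K)]
            for (j, a), cand in votes.items():
                if j != i:
                    continue
                for c in range(K):
                    inside = c in cand
                    n_false = len(cand) - inside  # wrong labels if c were true
                    logp[c] += math.log(hit[a] if inside else 1.0 - hit[a])
                    logp[c] += n_false * math.log(false_inc[a])
                    logp[c] += (K - 1 - n_false) * math.log(1.0 - false_inc[a])
            m = max(logp)  # log-sum-exp shift for numerical stability
            w = [math.exp(v - m) for v in logp]
            z = sum(w)
            post[i] = [x / z for x in w]

    return post, hit, false_inc


# Toy data (made up): 4 instances, 3 annotators, 3 classes.
toy_votes: Votes = {
    (0, 0): {0}, (0, 1): {0}, (0, 2): {0, 1},
    (1, 0): {1}, (1, 1): {1}, (1, 2): {1, 2},
    (2, 0): {2}, (2, 1): {2}, (2, 2): {2},
    (3, 0): {0}, (3, 1): {0, 1}, (3, 2): {0},
}
posteriors, hit_rate, false_rate = em_candidate_sets(toy_votes, n_classes=3)
estimated = {i: max(range(3), key=lambda c: posteriors[i][c]) for i in posteriors}
```

In this sketch, `estimated` holds the most probable label per instance, while `hit_rate` and `false_rate` hold the per-annotator reliability estimates. A baseline aggregation scheme of the kind the abstract mentions would correspond roughly to stopping after the initialisation step, where every candidate set simply votes for the labels it contains.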
Acknowledgments
IBM and AP are both supported by the Spanish Ministry MINECO through BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and the project TIN2017-82626-R funded by (AEI/FEDER, UE). IBM is also supported by the grant BES-2016-078095. AP is also supported by the Basque Government through the BERC 2014-2017 and the ELKARTEK programs, and by the MINECO through BCAM Severo Ochoa excellence accreditation SVP-2014-068574. JHG is supported by the Basque Government (IT609-13, Elkartek BID3A) and the MINECO (TIN2016-78365-R).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Beñaran-Muñoz, I., Hernández-González, J., Pérez, A. (2018). Crowd Learning with Candidate Labeling: An EM-Based Solution. In: Herrera, F., et al. Advances in Artificial Intelligence. CAEPIA 2018. Lecture Notes in Computer Science, vol 11160. Springer, Cham. https://doi.org/10.1007/978-3-030-00374-6_2
DOI: https://doi.org/10.1007/978-3-030-00374-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00373-9
Online ISBN: 978-3-030-00374-6
eBook Packages: Computer Science, Computer Science (R0)