Crowd Learning with Candidate Labeling: An EM-Based Solution

Beñaran-Muñoz, Iker; Hernández-González, Jerónimo; Pérez, Aritz

doi:10.1007/978-3-030-00374-6_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11160)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

Crowdsourcing is widely used in machine learning for data labeling. In the traditional setting, annotators are asked to provide a single label for each instance; novel approaches allow annotators, in case of doubt, to choose a subset of labels (a candidate set) as a way to extract more information from them. In both the traditional and these novel approaches, the reliability of the labelers can be modeled based on the collections of labels that they provide. In this paper, we propose an Expectation-Maximization-based method for crowdsourced data with candidate sets. The method iteratively maximizes the likelihood of the parameters that model the reliability of the labelers while estimating the ground truth. The experimental results suggest that the proposed method performs better than the baseline aggregation schemes in terms of estimated accuracy.
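
The general idea of alternating between ground-truth estimation and reliability estimation can be illustrated with a minimal sketch. The preview does not give the paper's model, so everything below is an assumption for illustration rather than the authors' formulation: each annotator is assumed to include each label in the candidate set independently, with a probability that depends on the true class; every annotator is assumed to label every instance; and the names `em_candidate_sets`, `alpha`, `prior`, and `eps` are hypothetical.

```python
import numpy as np


def em_candidate_sets(L, n_iter=50, eps=1e-2):
    """Illustrative EM for candidate-set annotations (assumed model, see lead-in).

    L: binary array of shape (n_instances, n_annotators, n_classes);
       L[i, a, l] = 1 if annotator a put label l in the candidate set of instance i.
    Returns (posterior over true classes, alpha, prior).
    """
    L = np.asarray(L, dtype=float)
    n, _, K = L.shape

    # Initialisation: normalised approval-style vote counts per instance.
    votes = L.sum(axis=1) + eps
    q = votes / votes.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: smoothed class prior and per-annotator inclusion probabilities.
        prior = (q.sum(axis=0) + eps) / (n + K * eps)
        # alpha[a, c, l]: probability that annotator a includes label l
        # when the true class is c (smoothing keeps it strictly inside (0, 1)).
        num = np.einsum("ic,ial->acl", q, L) + eps
        den = q.sum(axis=0)[None, :, None] + 2 * eps
        alpha = num / den

        # E-step: posterior of the true class given all candidate sets.
        log_lik = (np.einsum("ial,acl->ic", L, np.log(alpha))
                   + np.einsum("ial,acl->ic", 1.0 - L, np.log1p(-alpha)))
        log_post = np.log(prior) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(log_post)
        q /= q.sum(axis=1, keepdims=True)

    return q, alpha, prior


# Toy data (made up): two annotators always pick only the true label,
# a third hedges by also adding one extra label.
truth = np.array([0, 0, 1, 1, 2, 2])
L = np.zeros((6, 3, 3))
for i, c in enumerate(truth):
    L[i, 0, c] = 1
    L[i, 1, c] = 1
    L[i, 2, [c, (c + 1) % 3]] = 1

posterior, alpha, prior = em_candidate_sets(L)
print(posterior.argmax(axis=1))
```

The initialisation from vote counts plays the role of a simple aggregation baseline; the EM iterations then reweight each annotator's candidate sets according to the estimated `alpha`. A real implementation would also need to handle annotators who label only some instances.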

Acknowledgments

IBM and AP are both supported by the Spanish Ministry MINECO through BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and the project TIN2017-82626-R funded by (AEI/FEDER, UE). IBM is also supported by the grant BES-2016-078095. AP is also supported by the Basque Government through the BERC 2014-2017 and the ELKARTEK programs, and by the MINECO through BCAM Severo Ochoa excellence accreditation SVP-2014-068574. JHG is supported by the Basque Government (IT609-13, Elkartek BID3A) and the MINECO (TIN2016-78365-R).

Author information

Authors and Affiliations

Basque Center for Applied Mathematics, Al. Mazarredo 14, Bilbao, Spain
Iker Beñaran-Muñoz & Aritz Pérez
University of the Basque Country UPV/EHU, P. Manuel de Lardizabal 1, Donostia, Spain
Jerónimo Hernández-González

Corresponding author

Correspondence to Iker Beñaran-Muñoz.

Editor information

Editors and Affiliations

Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Francisco Herrera
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Sergio Damas
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Rosana Montes
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Sergio Alonso
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Óscar Cordón
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Antonio González
School of Engineering, Pablo de Olavide University, Seville, Spain
Alicia Troncoso

About this paper

Cite this paper

Beñaran-Muñoz, I., Hernández-González, J., Pérez, A. (2018). Crowd Learning with Candidate Labeling: An EM-Based Solution. In: Herrera, F., et al. Advances in Artificial Intelligence. CAEPIA 2018. Lecture Notes in Computer Science, vol 11160. Springer, Cham. https://doi.org/10.1007/978-3-030-00374-6_2

DOI: https://doi.org/10.1007/978-3-030-00374-6_2
Published: 27 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00373-9
Online ISBN: 978-3-030-00374-6
eBook Packages: Computer Science, Computer Science (R0)