Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2011: Machine Learning and Knowledge Discovery in Databases, pp 553–568

Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion


  • Ping Zhang &
  • Zoran Obradovic

Part of the Lecture Notes in Computer Science book series (LNAI, volume 6913)

Abstract

Supervised learning from multiple annotators is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators are not only unreliable, but also have varying performance depending on the data. The proposed approach uses a Gaussian mixture model (GMM) and the Bayesian information criterion (BIC) to find the best-fitting model to approximate the distribution of the instances. Then the maximum a posteriori (MAP) estimation of the hidden true labels and the maximum-likelihood (ML) estimation of the quality of multiple annotators are carried out alternately. Experiments on emotional speech classification and CASP9 protein disorder prediction tasks show performance improvement of the proposed approach compared to the majority voting baseline and a previous data-independent approach. Moreover, the approach also provides more accurate estimates of individual annotators' performance for each Gaussian component, thus paving the way for understanding the behavior of each annotator.
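
The model-selection step named in the abstract (fit GMMs of different sizes, keep the one preferred by BIC) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional data, a plain EM fit with a deterministic quantile initialisation, and the convention BIC = -2 ln L + p ln n (lower is better); the paper may use the opposite sign. The names `fit_gmm_1d`, `bic`, `data`, `scores`, and `best_k` are hypothetical.

```python
import math

def _log_joint(x, weights, means, variances):
    """Log of weight_j * N(x | mean_j, var_j) for every component j."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - 0.5 * (x - m) ** 2 / v
        for w, m, v in zip(weights, means, variances)
    ]

def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a short list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm_1d(xs, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D GMM with EM and return its log-likelihood."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumption: deterministic start, means at evenly spaced quantiles.
    means = [ordered[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = _log_joint(x, weights, means, variances)
            norm = _logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)  # floor guards against collapse
    return sum(_logsumexp(_log_joint(x, weights, means, variances)) for x in xs)

def bic(log_likelihood, k, n):
    """BIC = -2 ln L + p ln n, with p = 3k - 1 free parameters in 1-D."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 independent weights
    return -2.0 * log_likelihood + n_params * math.log(n)

# Toy data (made up for illustration): two well-separated groups of points.
data = [-1 + 0.1 * i for i in range(21)] + [9 + 0.1 * i for i in range(21)]
scores = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
```

On this toy data the two-component fit scores a lower BIC than the single-component fit, so `best_k` is never 1. In the setting of the abstract, the selected components would then serve as the data regions over which each annotator's quality is estimated.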

Keywords

  • multiple noisy experts
  • data-dependent experts
  • Gaussian mixture model
  • Bayesian information criterion



Author information

Authors and Affiliations

  1. Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA, 19122, USA

    Ping Zhang & Zoran Obradovic


Editor information

Editors and Affiliations

  1. Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece

    Dimitrios Gunopulos

  2. Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland

    Thomas Hofmann

  3. Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy

    Donato Malerba

  4. Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece

    Michalis Vazirgiannis


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, P., Obradovic, Z. (2011). Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23808-6_36

  • DOI: https://doi.org/10.1007/978-3-642-23808-6_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23807-9

  • Online ISBN: 978-3-642-23808-6

  • eBook Packages: Computer Science, Computer Science (R0)
