Abstract
Crowdsourcing services have proven efficient at collecting large amounts of labeled data for supervised learning tasks. However, the low cost of crowd workers leads to unreliable labels, posing a new challenge for learning a reliable classifier. Although various methods have been proposed to infer the ground truth or to learn from crowd data directly, there is no guarantee that these methods work well for highly biased or noisy crowd labels. Motivated by this limitation of crowd data, in this paper we propose a novel framework for improving the performance of crowdsourcing learning tasks with some additional expert labels: we treat each labeler as a personal classifier, combine all labelers’ opinions from a model combination perspective, and summarize the evidence from crowds and experts naturally via a Bayesian classifier in the intermediate feature space formed by the personal classifiers. We also introduce active learning into our framework and propose an uncertainty sampling algorithm for actively obtaining expert labels. Experiments show that our method significantly improves learning quality compared with methods that use crowd labels alone.
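The pipeline the abstract outlines can be sketched roughly as follows. This is an illustrative toy, not the paper’s actual model: plain logistic regression stands in for both the personal classifiers and the Bayesian combiner, the crowd data and worker noise rates are simulated, and the expert-labeled subset is chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Fit a tiny logistic-regression 'personal classifier' by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic task: 2-D points, true label = sign of x0 + x1.
X = rng.normal(size=(200, 2))
y_true = (X[:, 0] + X[:, 1] > 0).astype(float)

# Three simulated crowd workers who flip the true label with different rates.
noise = [0.1, 0.25, 0.4]
crowd = [np.where(rng.random(len(y_true)) < r, 1 - y_true, y_true) for r in noise]

# Step 1: one personal classifier per worker, trained on that worker's labels.
personal = [train_logreg(X, yk) for yk in crowd]

# Step 2: intermediate feature space = the personal classifiers' probabilities.
Z = np.column_stack([predict_proba(X, w, b) for w, b in personal])

# Step 3: a combiner trained on a small set of expert (ground-truth) labels.
expert_idx = np.arange(20)  # pretend the first 20 points were expert-labeled
w_c, b_c = train_logreg(Z[expert_idx], y_true[expert_idx])
posterior = predict_proba(Z, w_c, b_c)

# Step 4: uncertainty sampling -- query the expert on the instance whose
# posterior is closest to 0.5 among points without an expert label yet.
pool = np.setdiff1d(np.arange(len(X)), expert_idx)
query = pool[np.argmin(np.abs(posterior[pool] - 0.5))]
print("query instance:", query)
```

The queried instance would then receive an expert label, the combiner would be retrained, and the loop would repeat.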
Notes
We assume that an expert always gives true labels and use the two terms ‘expert labels’ and ‘ground truth’ interchangeably. As experts can also make mistakes, this assumption is a simplification and may be relaxed in future work.
A task for which true labels are completely infeasible to obtain is beyond the scope of our current work, as it is no longer a traditional supervised learning problem.
We assume at this point that all labelers give full labels, to keep the notation simple. We will discuss our assumption on the number of labels and the case of missing labels in Section 3.6.1.
References
Ambati, V., Vogel, S., & Carbonell, J. (2010). Active learning and crowd-sourcing for machine translation. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC) (Vol. 7, pp. 2169–2174).
Bishop, C.M. (2006). Pattern recognition and machine learning. New York: Springer.
Brew, A., Greene, D., & Cunningham, P. (2010). Using crowdsourcing and active learning to track sentiment in online media. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI) (pp. 145–150).
Dagan, I., & Engelson, S.P. (1995). Committee-based sampling for training probabilistic classifiers. In Proceedings of the International Conference on Machine Learning (ICML’95) (Vol. 95, pp. 150–157).
Dawid, A.P., & Skene, A.M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28(1), 20–28.
Donmez, P., Carbonell, J.G., & Schneider, J. (2009). Efficiently learning the accuracy of labeling sources for selective sampling. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 259–268).
Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., & Dredze, M. (2010). Annotating named entities in twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (pp. 80–88).
Frank, A., & Asuncion, A. (2010). UCI machine learning repository.
Hu, Q., He, Q., Huang, H., Chiew, K., & Liu, Z. (2014). Learning from crowds under experts’ supervision. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 200–211).
Ipeirotis, P.G., Provost, F., & Wang, J. (2010). Quality management on amazon mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 64–67).
Jaakkola, T., & Jordan, M. (1997). A variational approach to Bayesian logistic regression models and their extensions. In Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics.
Kajino, H., Tsuboi, Y., & Kashima, H. (2012). A convex formulation for learning from crowds. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (pp. 73–79).
Kajino, H., Tsuboi, Y., Sato, I., & Kashima, H. (2012). Learning from crowds and experts. In Proceedings of the 4th Human Computation Workshop (HCOMP’12) (pp. 107–113).
Karger, D.R., Oh, S., & Shah, D. (2011). Iterative learning for reliable crowdsourcing systems. In Proceedings of Advances in Neural Information Processing Systems (NIPS’11) (pp. 1953–1961).
Kuncheva, L.I. (2004). Combining pattern classifiers: Methods and algorithms. Hoboken: Wiley-Interscience.
Kuncheva, L.I., Bezdek, J.C., & Duin, R.P.W. (2001). Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognition, 34(2), 299–314.
Lewis, D.D., & Gale, W.A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3–12).
Liu, Q., Peng, J., & Ihler, A. (2012). Variational inference for crowdsourcing. In Proceedings of Advances in Neural Information Processing Systems (NIPS’12) (pp. 701–709).
McCallum, A.K., & Nigam, K. (1998). Employing EM and pool-based active learning for text classification. In Proceedings of the International Conference on Machine Learning (ICML’98).
Merz, C.J. (1999). Using correspondence analysis to combine classifiers. Machine Learning, 36(1), 33–58.
Raykar, V.C., & Yu, S. (2011). Ranking annotators for crowdsourced labeling tasks. In Proceedings of Advances in Neural Information Processing Systems (NIPS’11) (pp. 1809–1817).
Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., & Moy, L. (2010). Learning from crowds. The Journal of Machine Learning Research, 11(4), 1297–1322.
Seung, H.S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Proceedings of the 5th Annual Workshop on Computational Learning Theory (pp. 287–294): ACM.
Sheng, V.S., Provost, F., & Ipeirotis, P.G. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 614–622).
Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. (2008). Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 254–263). Association for Computational Linguistics.
Tang, W., & Lease, M. (2011). Semi-supervised consensus labeling for crowdsourcing. In Proceedings of the 2nd ACM SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR’11).
Tian, Y., & Zhu, J. (2012). Learning from crowds in the presence of schools of thought. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 226–234).
Wauthier, F.L., & Jordan, M.I. (2011). Bayesian bias mitigation for crowdsourcing. In Proceedings of Advances in Neural Information Processing Systems (NIPS’11) (pp. 1800–1808).
Welinder, P., Branson, S., Belongie, S., & Perona, P. (2010). The multidimensional wisdom of crowds. In Proceedings of Advances in Neural Information Processing Systems (NIPS’10) (pp. 2424–2432).
Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Yan, Y., et al. (2010). Modeling annotator expertise: Learning when everybody knows a bit of something. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS’10) (Vol. 9, pp. 932–939).
Yan, Y., Rosales, R., Fung, G., & Dy, J. (2011). Active learning from crowds. In Proceedings of the International Conference on Machine Learning (ICML’11) (pp. 1161–1168).
Acknowledgments
This work is partly supported by the Fundamental Research Funds for the Central Universities under Grant No. 2042015kf0038, and the Scientific Research Foundation for Introduced Talents of Wuhan University.
Additional information
A conference version of our framework can be found in Hu et al. (2014).
Cite this article
Hu, Q., He, Q., Huang, H. et al. A formalized framework for incorporating expert labels in crowdsourcing environment. J Intell Inf Syst 47, 403–425 (2016). https://doi.org/10.1007/s10844-015-0371-6