Abstract
We present a robust active learning technique for settings with weak and adversarial oracles. Our work falls under the general umbrella of active learning, in which training data is insufficient and oracles are queried to supply labels for the most informative samples in order to expand the training set. In addition, we consider problems where a large percentage of oracles may be strategically lying, as in adversarial settings. We present an adversarial active learning technique that explores the duality between oracle modeling and data modeling. We demonstrate on real datasets that our adversarial active learning technique is superior not only to the heuristic majority-voting technique but also to one of the state-of-the-art adversarial crowdsourcing techniques, Generative model of Labels, Abilities, and Difficulties (GLAD), when genuine oracles are outnumbered by weak and malicious oracles, and even in the extreme cases where all the oracles are either weak or malicious. To test our technique more rigorously, we compare our adversarial active learner to the ideal active learner that always receives correct labels. We demonstrate that our technique is as effective as the ideal active learner when only one third of the oracles are genuine.
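To illustrate why the majority-voting baseline named above can break down once genuine oracles are outnumbered, here is a minimal sketch. It is not the paper's method or experimental setup. The three oracle behaviours (`genuine` always correct, `weak` guessing uniformly at random, `malicious` always flipping a binary label) and the helper names `majority_vote` and `vote_accuracy` are simplifying assumptions made only for this example.

```python
import random
from collections import Counter

# Hypothetical oracle models (assumptions for illustration, binary labels 0/1).
def genuine(true_label, rng):
    # Always returns the correct label.
    return true_label

def weak(true_label, rng):
    # Guesses uniformly at random, ignoring the true label.
    return rng.choice([0, 1])

def malicious(true_label, rng):
    # Always returns the wrong label (a crude stand-in for strategic lying).
    return 1 - true_label

def majority_vote(labels):
    # Most frequent label; ties are broken toward the smaller label.
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

def vote_accuracy(oracles, true_labels, seed=0):
    # Fraction of samples for which the majority vote equals the true label.
    rng = random.Random(seed)
    correct = 0
    for y in true_labels:
        votes = [oracle(y, rng) for oracle in oracles]
        correct += majority_vote(votes) == y
    return correct / len(true_labels)

# Synthetic ground truth for the demonstration.
truth_rng = random.Random(1)
truth = [truth_rng.choice([0, 1]) for _ in range(1000)]

print("all genuine:            ", vote_accuracy([genuine] * 3, truth))
print("genuine, weak, malicious:", vote_accuracy([genuine, weak, malicious], truth))
print("genuine outnumbered:    ", vote_accuracy([genuine, malicious, malicious], truth))
```

In the mixed case the genuine and malicious votes cancel, so the outcome follows the weak oracle's coin flip and accuracy sits near chance. When two malicious oracles outvote one genuine oracle, every aggregated label is wrong under these assumptions.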
References
Balcan, M., Beygelzimer, A., Langford, J.: Agnostic active learning. In: ICML, pp. 65–72 (2006)
Beygelzimer, A., Langford, J., Tong, Z., Hsu, D.J.: Agnostic active learning without constraints. In: Advances in Neural Information Processing Systems, vol. 23, pp. 199–207. Curran Associates, Inc. (2010)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM 2017 (2017)
French, S.: Group consensus probability distributions: a critical survey. Bayesian Stat. 2, 183–202 (1985)
Jagabathula, S., Subramanian, L., Venkataraman, A.: Reputation-based worker filtering in crowdsourcing. In: NIPS, pp. 2492–2500 (2014)
LIBSVM: LIBSVM Data: Classification, Regression, and Multi-label (2014). http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: Advances in Neural Information Processing Systems, pp. 692–700 (2012)
Ma, F., et al.: Faitcrowd: fine grained truth discovery for crowdsourced data aggregation. In: Proceedings of the 21th ACM SIGKDD, pp. 745–754 (2015)
Miller, B., et al.: Adversarial active learning. In: Proceedings of the 2014 AISec Workshop, pp. 3–14 (2014)
Raykar, V.C., Yu, S.: Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J. Mach. Learn. Res. 13, 491–518 (2012)
Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD, pp. 614–622 (2008)
Uebersax, J.S.: Statistical modeling of expert ratings on medical treatment appropriateness. J. Am. Stat. Assoc. 88, 421–427 (1993)
Vuurens, J.B., de Vries, A.P.: Obtaining high-quality relevance judgments using crowdsourcing. IEEE Internet Comput. 16, 20–27 (2012)
Welinder, P., Branson, S., Perona, P., Belongie, S.J.: The multidimensional wisdom of crowds. In: NIPS, pp. 2424–2432 (2010)
Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., Movellan, J.R.: Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: NIPS, pp. 2035–2043 (2009)
Acknowledgement
The research reported herein was supported in part by NIH award 1R01HG006844, NSF awards CICI-1547324, IIS-1633331, CNS-1837627, OAC-1828467 and ARO award W911NF-17-1-0356.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Y., Kantarcioglu, M., Xi, B. (2019). Adversarial Active Learning in the Presence of Weak and Malicious Oracles. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_8
DOI: https://doi.org/10.1007/978-3-030-26142-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science, Computer Science (R0)