Abstract
Crowdsourcing platforms are frequently used for collecting training data. Quality assurance is the most obvious problem, but not the only one. This work proposes an iterative approach that helps to reduce the cost of building training/testing datasets. Information about classifier confidence is used to decide whether new labels from a crowdsourcing platform are required for a particular object. The conducted experiments confirm that the proposed method reduces costs by over 50 % in the best scenarios while simultaneously increasing the percentage of correctly classified objects.
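The core idea described in the abstract, buying additional crowd labels only for objects on which the current classifier is not confident, can be sketched as follows. This is a minimal illustration under assumed names (`needs_crowd_labels`, `triage`, the 0.8 threshold), not the authors' actual implementation.

```python
# Sketch: confidence-gated label acquisition (hypothetical names and threshold).
# An object is sent back to the crowd only when the classifier's confidence
# in its prediction falls below a chosen threshold; otherwise its predicted
# label is accepted, saving the cost of extra crowd labels.

def needs_crowd_labels(confidence, threshold=0.8):
    """Decide whether to buy more crowd labels for a single object."""
    return confidence < threshold

def triage(predictions, threshold=0.8):
    """Split objects into auto-labeled vs. sent-to-crowd.

    predictions: iterable of (object_id, predicted_label, confidence).
    Returns (auto, crowd): accepted (id, label) pairs and ids needing labels.
    """
    auto, crowd = [], []
    for obj_id, label, conf in predictions:
        if needs_crowd_labels(conf, threshold):
            crowd.append(obj_id)          # pay for additional crowd labels
        else:
            auto.append((obj_id, label))  # trust the classifier, save cost
    return auto, crowd

# Example: three objects with classifier confidences.
auto, crowd = triage([("a", 1, 0.95), ("b", 0, 0.55), ("c", 1, 0.91)])
```

In an iterative loop, the newly collected crowd labels would be added to the training set, the classifier retrained, and the triage repeated until the budget is exhausted or confidence is uniformly high.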
Notes
- 5. While it may seem strange, solving 1000 CAPTCHAs costs only a few USD.
- 7. We use the word "partially" to stress that the initial number of cases used for training was really small, so our classifier may be quite imprecise.
- 8. We use a standard 10-fold cross-validation approach.
- 10. Deep learning algorithms are an exception.
- 14. It is the official name used by Amazon for entities that outsource jobs.
- 15. Higher wages on a crowdsourcing platform increase the quality and motivation of workers and can be used for collecting labels that are less noisy.
Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 690962.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Nielek, R., Georgiew, F., Wierzbicki, A. (2016). Crowd Teaches the Machine: Reducing Cost of Crowd-Based Training of Machine Classifiers. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science(), vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39383-4
Online ISBN: 978-3-319-39384-1
eBook Packages: Computer Science, Computer Science (R0)