Abstract
Crowdsourcing platforms are frequently used for collecting training data. Quality assurance is the most obvious problem, but not the only one. This work proposes an iterative approach that helps to reduce the cost of building training/testing datasets. Information about classifier confidence is used to decide whether new labels from a crowdsourcing platform are required for a particular object. The conducted experiments confirm that the proposed method reduces costs by over 50 % in the best scenarios while simultaneously increasing the percentage of correctly classified objects.
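The core idea described in the abstract, buying additional crowd labels only for objects on which the current classifier is not confident, can be sketched as follows. This is a minimal illustration under assumed names (`needs_crowd_labels`, `triage`, the 0.8 threshold), not the authors' actual implementation.

```python
# Sketch: confidence-gated label acquisition (hypothetical names and threshold).
# An object is sent back to the crowd only when the classifier's confidence
# in its prediction falls below a chosen threshold; otherwise its predicted
# label is accepted, saving the cost of extra crowd labels.

def needs_crowd_labels(confidence, threshold=0.8):
    """Decide whether to buy more crowd labels for a single object."""
    return confidence < threshold

def triage(predictions, threshold=0.8):
    """Split objects into auto-labeled vs. sent-to-crowd.

    predictions: iterable of (object_id, predicted_label, confidence).
    Returns (auto, crowd): accepted (id, label) pairs and ids needing labels.
    """
    auto, crowd = [], []
    for obj_id, label, conf in predictions:
        if needs_crowd_labels(conf, threshold):
            crowd.append(obj_id)          # pay for additional crowd labels
        else:
            auto.append((obj_id, label))  # trust the classifier, save cost
    return auto, crowd

# Example: three objects with classifier confidences.
auto, crowd = triage([("a", 1, 0.95), ("b", 0, 0.55), ("c", 1, 0.91)])
```

In an iterative loop, the newly collected crowd labels would be added to the training set, the classifier retrained, and the triage repeated until the budget is exhausted or confidence is uniformly high.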
Notes
- 5. While it may seem strange, solving 1000 CAPTCHAs costs only a few USD.
- 7. We use the word "partially" to stress that the initial number of cases used for training was really small, so our classifier may be quite imprecise.
- 8. We use a standard 10-fold cross-validation approach.
- 10. Deep learning algorithms are an exception.
- 14. It is the official name used by Amazon for entities that outsource jobs.
- 15. Higher wages on a crowdsourcing platform increase the quality and motivation of workers and can be used for collecting labels that are less noisy.
Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 690962.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Nielek, R., Georgiew, F., Wierzbicki, A. (2016). Crowd Teaches the Machine: Reducing Cost of Crowd-Based Training of Machine Classifiers. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science(), vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39383-4
Online ISBN: 978-3-319-39384-1
eBook Packages: Computer Science, Computer Science (R0)