Crowd Teaches the Machine: Reducing Cost of Crowd-Based Training of Machine Classifiers

  • Conference paper
  • Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)

Abstract

Crowdsourcing platforms are frequently used for collecting training data. Quality assurance is the most obvious problem, but not the only one. This work proposes an iterative approach that helps reduce the cost of building training/testing datasets. Information about classifier confidence is used to decide whether new labels from the crowdsourcing platform are required for a particular object. The conducted experiments confirm that the proposed method reduces costs by over 50 % in the best scenarios and at the same time increases the percentage of correctly classified objects.
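The abstract's core idea, requesting crowd labels only when the classifier is unsure, can be sketched as follows. This is a minimal illustration with scikit-learn (which the paper's footnotes mention), not the authors' implementation: the dataset, the model, and the confidence threshold of 0.8 are all assumptions made for the example.

```python
# Sketch of confidence-gated label acquisition: train on a small seed set,
# then send only low-confidence objects to the crowd.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
seed_X, seed_y = X[:50], y[:50]      # small "partially trained" seed set
pool_X, pool_y = X[50:], y[50:]      # unlabeled pool; pool_y stands in for the crowd

clf = LogisticRegression(max_iter=1000).fit(seed_X, seed_y)

CONF_THRESHOLD = 0.8                 # illustrative value, not the paper's
confidence = clf.predict_proba(pool_X).max(axis=1)

# Objects below the threshold would be sent to the crowdsourcing platform;
# confident ones are labeled by the classifier at no labeling cost.
needs_crowd = confidence < CONF_THRESHOLD
saved_fraction = 1.0 - needs_crowd.mean()
print(f"crowd labels requested for {needs_crowd.sum()} of {len(pool_X)} objects")
print(f"labeling cost avoided on ~{saved_fraction:.0%} of the pool")
```

In an iterative variant, the crowd-labeled objects would be appended to the seed set and the classifier retrained, shrinking the low-confidence region over successive rounds.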


Notes

  1. http://www.pewinternet.org/2014/08/06/future-of-jobs/.

  2. http://www.makeuseof.com/tag/6-human-jobs-computers-will-never-replace/.

  3. www.mturk.com.

  4. www.crowdflower.com.

  5. Although it may seem strange, solving 1000 CAPTCHAs costs only a few USD.

  6. https://github.com/ipeirotis/Troia-Server.

  7. We use the word "partially" to stress that the initial number of cases used for training was very small, so our classifier may be quite imprecise.

  8. We use a standard 10-fold cross-validation approach.

  9. http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip.

  10. Deep learning algorithms are an exception.

  11. http://www.nltk.org/.

  12. http://nielek.com/crowdai/.

  13. http://scikit-learn.org/stable/.

  14. It is the official name used by Amazon for entities that outsource jobs.

  15. Higher wages on a crowdsourcing platform increase the quality and motivation of workers and can be used to collect labels that are less noisy.
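The standard 10-fold cross-validation mentioned in the notes above can be reproduced in a few lines of scikit-learn (also referenced in the notes). The dataset and model here are placeholders, not the paper's sentiment data or classifier:

```python
# 10-fold cross-validation on a synthetic stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```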


Acknowledgments

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 690962.

Author information

Corresponding author: Radoslaw Nielek.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nielek, R., Georgiew, F., Wierzbicki, A. (2016). Crowd Teaches the Machine: Reducing Cost of Crowd-Based Training of Machine Classifiers. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_44

  • DOI: https://doi.org/10.1007/978-3-319-39384-1_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39383-4

  • Online ISBN: 978-3-319-39384-1
