An Introduction to Crowdsourcing for Language and Multimedia Technology Research

Chapter in: Information Retrieval Meets Information Visualization (PROMISE 2012)

Part of the book series: Lecture Notes in Computer Science (volume 7757)

Abstract

Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is typically expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods, which draw on scalable pools of workers available on demand, offer a flexible means of rapid, low-cost construction of many of these datasets, supporting existing research requirements and potentially enabling new research initiatives that would otherwise not be possible.
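
To make this workflow concrete, below is a minimal sketch of how such an on-demand labelling task might be posted programmatically. It assumes Amazon Mechanical Turk accessed through the boto3 Python SDK against the requester sandbox; the task title, reward, timing parameters, and HTML form are illustrative placeholders, not anything prescribed in the chapter.

```python
# Minimal sketch: posting a data-annotation task ("HIT") to Amazon Mechanical
# Turk via boto3. All task content below is an illustrative assumption.
import boto3

# Use the requester sandbox endpoint so experiments do not spend real money.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An HTMLQuestion wraps an arbitrary HTML form that workers fill in. A real
# form must also copy the assignmentId from the page URL into the hidden
# field before submission; that JavaScript is omitted here for brevity.
question_xml = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <!DOCTYPE html>
    <html><body>
      <form action="https://workersandbox.mturk.com/mturk/externalSubmit" method="post">
        <input type="hidden" name="assignmentId" value="" id="assignmentId"/>
        <p>Is this image relevant to the query "rock concert"?</p>
        <img src="https://example.org/image_001.jpg" alt="item to label"/>
        <p><label><input type="radio" name="relevant" value="yes"/> Yes</label>
           <label><input type="radio" name="relevant" value="no"/> No</label></p>
        <p><input type="submit"/></p>
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>"""

hit = mturk.create_hit(
    Title="Judge image relevance for a search query",
    Description="Look at one image and say whether it matches the query.",
    Keywords="image, labeling, relevance",
    Reward="0.05",                    # USD per completed assignment
    MaxAssignments=5,                 # collect 5 independent labels per item
    LifetimeInSeconds=24 * 3600,      # HIT remains available for one day
    AssignmentDurationInSeconds=300,  # workers get 5 minutes per assignment
    Question=question_xml,
)
print("HIT created:", hit["HIT"]["HITId"])
```

With MaxAssignments set to 5, each item receives several independent labels, which can then be aggregated, for example by simple majority vote, to reduce the impact of individual worker noise.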

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jones, G.J.F. (2013). An Introduction to Crowdsourcing for Language and Multimedia Technology Research. In: Agosti, M., Ferro, N., Forner, P., Müller, H., Santucci, G. (eds) Information Retrieval Meets Information Visualization. PROMISE 2012. Lecture Notes in Computer Science, vol 7757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36415-0_9

  • DOI: https://doi.org/10.1007/978-3-642-36415-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36414-3

  • Online ISBN: 978-3-642-36415-0
