Abstract
Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is often expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods, using scalable pools of workers available on demand, offer a flexible means of rapid, low-cost construction of many of these datasets, supporting existing research requirements and potentially enabling new research initiatives that would otherwise not be possible.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Jones, G.J.F. (2013). An Introduction to Crowdsourcing for Language and Multimedia Technology Research. In: Agosti, M., Ferro, N., Forner, P., Müller, H., Santucci, G. (eds) Information Retrieval Meets Information Visualization. PROMISE 2012. Lecture Notes in Computer Science, vol 7757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36415-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36414-3
Online ISBN: 978-3-642-36415-0
eBook Packages: Computer Science (R0)