Abstract
Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is often expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods, using scalable pools of workers available on demand, offer a flexible means of rapid, low-cost construction of many of these datasets, supporting existing research requirements and potentially enabling new research initiatives that would otherwise not be possible.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Jones, G.J.F. (2013). An Introduction to Crowdsourcing for Language and Multimedia Technology Research. In: Agosti, M., Ferro, N., Forner, P., Müller, H., Santucci, G. (eds) Information Retrieval Meets Information Visualization. PROMISE 2012. Lecture Notes in Computer Science, vol 7757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36415-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36414-3
Online ISBN: 978-3-642-36415-0
eBook Packages: Computer Science (R0)