Abstract
Crowd labor markets such as Amazon Mechanical Turk (MTurk) have emerged as popular platforms on which researchers can run web-based experiments relatively inexpensively and easily. Some work even suggests that MTurk can be used to run large-scale field experiments, such as electronic markets, in which groups of participants interact synchronously in real time. Beyond technical issues, several methodological questions arise, chief among them how results from MTurk and laboratory experiments compare. Our data show comparable results between MTurk and a standard lab setting with student subjects in a controlled environment for rather simple individual decision tasks. For a rather complex market experiment, however, our data show stark differences between the experimental settings. Each experimental setting (lab and MTurk) has its own benefits and drawbacks; which of the two is better suited for a specific experiment depends on the theory or artifact to be tested. We discuss potential causes for the differences that we cannot control for (language understanding, education, cognition, and context) and provide guidance for selecting the appropriate setting for an experiment. In any case, researchers studying complex artifacts such as group decisions or markets should not prematurely adopt MTurk based on extant literature reporting comparable results across experimental settings for rather simple tasks.
Notes
https://www.mturk.com/mturk/, accessed on February 20, 2018.
We do not suggest that information systems experiments are per se more complex than experiments in other disciplines. However, methodological papers comparing lab experiments to MTurk experiments have so far focused on rather simple settings such as the ultimatum game or the prisoner's dilemma, which are arguably less complex than the information markets studied here.
The lab experiment reported here is part of a larger series of lab experiments. In other settings, it is relevant to have two parallel markets and decide among them. For consistency, we used the same setting with two parallel markets here. As a downside, it increases complexity for subjects. However, having multiple parallel markets is common in most real-world applications of information markets.
Note that there are technical solutions for running web-based group experiments, such as Lioness and NodeGame.
The decision for random rather than fixed effects or pooled regression is based on theoretical and empirical arguments. On the theoretical side, pooled regression is not adequate as it does not account for the interdependencies in the data. A fixed effects model would rule out time-invariant heterogeneity between cohorts, which is not desirable in an analysis that considers data from both the lab and MTurk. On the empirical side, we tested the appropriateness of each of the three modeling approaches for each of the regression models reported in the following. We used an F test to detect potential significant increases in goodness-of-fit of a fixed effects model over a pooled regression model. We did not find evidence for such increases and, thus, do not further consider fixed effects models. Further, we used the Lagrange Multiplier test to examine random effects (Breusch and Pagan 1980). We found significant evidence for the existence of random effects for multiple regression models. For consistency and on theoretical grounds, we apply two-way random effects models throughout.
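The Breusch and Pagan (1980) Lagrange Multiplier test referenced here can be sketched in a few lines. The function below is a minimal pure-Python illustration for the balanced one-way case (the panel of residuals is hypothetical and not from the experiment); it is a sketch of the test logic, not a reproduction of the authors' analysis.

```python
def breusch_pagan_lm(residuals):
    """Breusch-Pagan (1980) LM test for random effects on a balanced
    panel of OLS residuals, given as a list of per-group lists
    (N groups, T periods each). The statistic is asymptotically
    chi-squared with 1 degree of freedom under the null of no
    random effects."""
    n = len(residuals)      # number of groups (e.g., cohorts)
    t = len(residuals[0])   # observations per group
    # sum over groups of (sum of residuals within group)^2
    sum_sq_group = sum(sum(e) ** 2 for e in residuals)
    # sum of all squared residuals
    sum_sq_all = sum(e_it ** 2 for e in residuals for e_it in e)
    return (n * t) / (2 * (t - 1)) * (sum_sq_group / sum_sq_all - 1) ** 2

# Toy panel with strong within-group correlation of residuals:
clustered = [[1.0, 1.0], [-1.0, -1.0]]
print(breusch_pagan_lm(clustered))  # 2.0 on this toy panel
```

A statistic above the 5% chi-squared critical value of about 3.84 would, as in the analysis described above, point toward a random effects specification rather than pooled regression.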
Note that by pooling the data from both settings we constrain the variance of the residuals to be the same for both settings, even though this need not hold given the different settings from which the data originate.
In the cognitive reflection test (CRT), participants answer the following three questions: (1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? (correct answer: 5 cents) (2) If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets? (correct answer: 5 min) (3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? (correct answer: 47 days)
The speculation is based on a general perception that education is positively correlated with intelligence and the fact that MTurk workers are more representative of the general population than students are (Paolacci et al. 2010). We did not perform intelligence tests with the participants.
References
Aïmeura E, Lawani O, Dalkir K (2016) When changing the look of privacy policies affects user trust: an experimental study. Comput Hum Behav 58:368–379
Amir O, Rand DG, Gal YK (2012) Economic games on the internet: the effect of $1 stakes. PLoS ONE 7(2):e31461
Barber BM, Odean T (2000) Trading is hazardous to your wealth: the common stock investment performance of individual investors. J Finance 55(2):773–806
Bennouri M, Gimpel H, Robert J (2011) Measuring the impact of information aggregation mechanisms: an experimental investigation. J Econ Behav Organ 78(3):302–318
Berg JE, Rietz TA (2003) Prediction markets as decision support systems. Inf Syst Front 5(1):79–93
Berg JE, Nelson FD, Rietz TA (2008) Prediction market accuracy in the long run. Int J Forecast 24(2):285–300
Berinsky AJ, Huber GA, Lenz GS (2012) Evaluating online labor markets for experimental research: Amazon. com’s mechanical turk. Polit Anal 20(3):351–368
Bichler M, Kersten G, Strecker S (2003) Towards a structured design of electronic negotiations. Group Decis Negot 12(4):311–335
Blohm I, Riedl C, Leimeister JM, Krcmar H (2011) Idea evaluation mechanisms for collective intelligence in open innovation communities: do traders outperform raters? In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Breusch TS, Pagan AR (1980) The Lagrange multiplier test and its applications to model specification in econometrics. Rev Econ Stud 47:239–253
Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s mechanical turk: a new source of inexpensive, yet high-quality data? Perspect Psychol Sci 6(1):3–5
Casey LS, Chandler J, Levine AS, Proctor A, Strolovitch DZ (2017) Intertemporal differences among MTurk workers: time-based sample variations and implications for online data collection. SAGE Open. https://doi.org/10.1177/2158244017712774
Chandler D, Kapelner A (2013) Breaking monotony with meaning: motivation in crowdsourcing markets. J Econ Behav Organ 90:123–133
Chen DL, Horton JJ (2016) Are online labor markets spot markets for tasks? A field experiment on the behavioral response to wage cuts. Inf Syst Res 27(2):403–423
Chilton LB, Horton JJ, Miller RC, Azenkot S (2010) Task search in a human computation market. In: Proceedings of the ACM SIGKDD workshop on human computation (HCOMP ‘10), New York, NY
Djamasbi S, Bengisu B, Loiacono E, Whitefleet-Smith J (2008) Can a reasonable time limit improve the effective usage of a computerized decision aid? Commun Assoc Inf Syst 23:22
Fair RC, Shiller RJ (1989) The informational context of ex-ante forecasts. Rev Econ Stat 71:325–331
Ferreira A, Antunes P, Herskovic V (2011) Improving group attention: an experiment with synchronous brainstorming. Group Decis Negot 20(5):643–666
Frederick S (2005) Cognitive reflection and decision making. J Econ Perspect 19(4):25–42
Graves JT, Acquisti A, Anderson R (2014) Experimental measurement of attitudes regarding cybercrime. In: 13th annual workshop on the economics of information security (WEIS 2014), University Park/State College, PA
Hanson R (2003) Combinatorial information market design. Inf Syst Front 5(1):107–119
Healy PJ, Linardi S, Lowery JR, Ledyard JO (2010) Prediction markets: alternative mechanisms for complex environments with few traders. Manag Sci 56(11):1977–1996
Horton JJ, Rand DG, Zeckhauser RJ (2011) The online laboratory: conducting experiments in a real labor market. Exp Econ 14(3):399–425
Jian L, Sami R (2012) Aggregation and manipulation in prediction markets: effects of trading mechanism and information distribution. Manag Sci 58(1):123–140
Jilke S, Van Ryzin GG, Van de Walle S (2015) Responses to decline in marketized public services: an experimental evaluation of choice overload. J Public Adm Res Theor 26(3):421–432
Jones JL, Collins RW, Berndt DJ (2009) Information markets: a research landscape. Commun Assoc Inf Syst 25(1):27
Kaufmann N, Schulze T, Veit D (2011) More than fun and money. Worker motivation in crowdsourcing—a study on mechanical turk. In: Proceedings of the 17th Americas conference on information systems (AMCIS 2011), Detroit, MI
Kern R, Thies H, Satzger G (2011) Efficient quality management of human-based electronic services leveraging group decision making. In: Proceedings of the 19th European conference on information systems (ECIS 2011), Helsinki, Finland
Kersten G, Noronha S (1999) Negotiation via the world wide web: a cross-cultural study of decision making. Group Decis Negot 8(3):251–279
Kersten G, Köszegi ST, Vetschera R (2002) The effects of culture in anonymous negotiations: experiment in four countries. In: Proceedings of the 35th Hawaii international conference on system sciences (HICSS-35’02), Big Island, HI
Landemore H, Elster J (eds) (2012) Collective wisdom: principles and mechanisms. Cambridge University Press, New York
Lavoie J (2009) The innovation engine at Rite-Solutions: lessons from the CEO. J Predict Mark 3:1–11
Ledyard J, Hanson R, Ishikida T (2009) An experimental test of combinatorial information markets. J Econ Behav Organ 69(2):182–189
Levy Y, Ellis TJ (2011) A guide for novice researchers on experimental and quasi-experimental studies in information systems research. Interdiscip J Inf Knowl Manag 6:151–161
Malone TW, Laubacher R, Dellarocas C (2010) The collective intelligence genome. MIT Sloan Manag Rev 51(3):21–31
Mao A, Chen Y, Gajos KZ, Parkes D, Procaccia AD, Zhang H (2012) TurkServer: enabling synchronous and longitudinal online experiments. In: Proceedings of the fourth workshop on human computation (HCOMP ‘12), Toronto, Canada
Mason W, Suri S (2012) Conducting behavioral research on Amazon’s mechanical turk. Behav Res Methods 44(1):1–23
Mullinix KJ, Leeper TJ, Druckman JN, Freese J (2015) The generalizability of survey experiments. J Exp Polit Sci 2:109–138
Nagar Y, Malone TW (2011) Making business predictions by combining human and machine intelligence in prediction markets. In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Palvia P, Leary D, Mao E, Midha V, Pinjani P, Salam AF (2004) Research methodologies in MIS: an update. Commun Assoc Inf Syst 14:24
Paolacci G, Chandler J, Ipeirotis P (2010) Running experiments on Amazon mechanical turk. Judgm Decis Mak 5(5):411–419
Pilz D, Gewald H (2013) Does money matter? Motivational factors for participation in paid-and non-profit-crowdsourcing communities. In: 11th International conference on Wirtschaftsinformatik, Leipzig, Germany, pp 577–591
Pinsonneault A, Barki H, Gallupe RB, Hoppen N (1999) Electronic brainstorming: the illusion of productivity. Inf Syst Res 10(2):110–133
Plott CR, Sunder S (1988) Rational expectations and the aggregation of diverse information in laboratory security markets. Econom J Econom Soc 56(5):1085–1118
Qiu L, Rui H, Whinston A (2011) A twitter-based prediction market: social network approach. In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Ross J, Zaldivar A, Irani L, Tomlinson B (2009) Who are the turkers? Worker demographics in Amazon mechanical turk. Technical report, University of California, Irvine, CA
Slamka C, Luckner S, Seemann T, Schröder J (2008) An empirical investigation of the forecast accuracy of play-money prediction markets and professional betting markets. In: Proceedings of the 16th European conference on information systems (ECIS 2008), Galway, Ireland, paper 236
Spann M, Skiera B (2003) Internet-based virtual stock markets for business forecasting. Manag Sci 49(10):1310–1326
Straub T, Gimpel H, Teschner F, Weinhardt C (2014) Feedback and performance in crowd work: a real effort experiment. In: Proceedings of the 22nd European conference on information systems (ECIS)
Straub T, Gimpel H, Teschner F, Weinhardt C (2015) How (not) to incent crowd workers. Bus Inf Syst Eng 57:167–179
Teschner F, Mazarakis A, Riordan R, Weinhardt C (2011) Participation, feedback & incentives in a competitive forecasting community. In: Proceedings of the international conference on information systems (ICIS 2011), Shanghai, China
Teschner F, Rothschild D, Gimpel H (2017) Manipulation in conditional decision markets. Group Decis Negot. https://doi.org/10.1007/s10726-017-9531-0
Wolfers J, Zitzewitz E (2004) Prediction markets. J Econ Perspect 18(2):107–126
Teschner, F., Gimpel, H. Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments. Group Decis Negot 27, 197–214 (2018). https://doi.org/10.1007/s10726-018-9565-y