Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments

Group Decision and Negotiation 27, 197–214 (2018)

Abstract

Crowd labor markets such as Amazon Mechanical Turk (MTurk) have emerged as popular platforms on which researchers can run web-based experiments relatively cheaply and easily. Some work even suggests that MTurk can be used to run large-scale field experiments in which groups of participants interact synchronously in real time, for example in electronic markets. Beyond technical issues, several methodological questions arise, chief among them how results from MTurk compare to those from laboratory experiments. Our data show comparable results between MTurk and a standard lab setting with student subjects in a controlled environment for rather simple individual decision tasks. For a rather complex market experiment, however, the two experimental settings produce starkly different results. Each setting, lab and MTurk, has its own benefits and drawbacks; which of the two is better suited for a specific experiment depends on the theory or artifact to be tested. We discuss potential causes of the differences that we cannot control for (language understanding, education, cognition, and context) and provide guidance for selecting the appropriate setting for an experiment. In any case, researchers studying complex artifacts such as group decisions or markets should not prematurely adopt MTurk based on extant literature reporting comparable results across experimental settings for rather simple tasks.


Notes

  1. https://www.mturk.com/mturk/, accessed on February 20, 2018.

  2. We do not suggest that information systems experiments are per se more complex than experiments in other disciplines. However, methodological papers comparing lab experiments to MTurk experiments have so far focused on rather simple settings such as the ultimatum game or the prisoner's dilemma, which are arguably less complex than the information markets studied here.

  3. The lab experiment reported here is part of a larger series of lab experiments. In other settings, it is relevant to have two parallel markets and decide among them. For consistency, we used the same setting with two parallel markets here. As a downside, it increases complexity for subjects. However, having multiple parallel markets is common in most real-world applications of information markets.

  4. Note that there are technical solutions for running web-based group experiments, such as LIONESS and nodeGame.

  5. The decision for random rather than fixed effects or pooled regression is based on theoretical and empirical arguments. On the theoretical side, pooled regression is not adequate, as it does not account for the interdependencies in the data. A fixed effects model would rule out time-invariant heterogeneity between cohorts, which is not desirable in an analysis that considers data from both the lab and MTurk. On the empirical side, we tested the appropriateness of each of the three modeling approaches for each of the regression models reported below. We used an F test to detect potential significant increases in goodness of fit of a fixed effects model over a pooled regression model. We did not find evidence for such improvements and thus do not further consider fixed effects models. Further, we used the Lagrange multiplier test to examine random effects (Breusch and Pagan 1980). We found significant evidence for the existence of random effects for multiple regression models. For consistency and in line with the theoretical argument, we apply two-way random effects models throughout. A code sketch of this model-selection logic follows these notes.

  6. Note that by pooling the data from both settings, we constrain the variance of the residuals to be the same for both settings, even though this assumption may not hold given that the data originate from different settings. A sketch of how to check this assumption also follows these notes.

  7. In the cognitive reflection test (CRT; Frederick 2005), participants answer the following three questions (the arithmetic behind the first answer is worked out after these notes):

     (a) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? (correct answer: 5 cents)

     (b) If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets? (correct answer: 5 min)

     (c) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? (correct answer: 47 days)

  8. The speculation is based on a general perception that education is positively correlated with intelligence and on the fact that MTurk workers are more representative of the general population than students are (Paolacci et al. 2010). We did not perform intelligence tests with the participants.
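
The following is a minimal Python sketch of the model-selection logic described in note 5, run on simulated data with the linearmodels and SciPy packages. The panel dimensions, variable names, and the hand-computed Breusch-Pagan statistic are illustrative assumptions rather than the authors' actual analysis code, and the RandomEffects estimator shown here fits one-way (cohort) effects rather than the two-way model applied in the paper.

    import numpy as np
    import pandas as pd
    from scipy import stats
    from linearmodels.panel import PooledOLS, PanelOLS, RandomEffects

    # Simulated balanced panel: 20 cohorts observed over 10 periods.
    # Names and dimensions are placeholders, not the paper's data.
    rng = np.random.default_rng(0)
    n, t = 20, 10
    idx = pd.MultiIndex.from_product([range(n), range(t)],
                                     names=["cohort", "period"])
    df = pd.DataFrame({"x": rng.normal(size=n * t)}, index=idx)
    cohort_effect = rng.normal(size=n).repeat(t)  # time-invariant heterogeneity
    df["y"] = 1.0 + 0.5 * df["x"] + cohort_effect + rng.normal(size=n * t)

    y, X = df["y"], df[["x"]].assign(const=1.0)

    # (1) Pooled OLS ignores the interdependencies within cohorts.
    pooled = PooledOLS(y, X).fit()

    # (2) F test: do cohort fixed effects significantly improve goodness
    #     of fit over pooling? (H0: all cohort effects are jointly zero.)
    fe = PanelOLS(y, X, entity_effects=True).fit()
    print(fe.f_pooled)

    # (3) Breusch-Pagan (1980) Lagrange multiplier test for random effects,
    #     computed from pooled residuals (balanced-panel formula, chi2, 1 df).
    e = pooled.resids
    lm = n * t / (2 * (t - 1)) * ((e.groupby(level="cohort").sum() ** 2).sum()
                                  / (e ** 2).sum() - 1) ** 2
    print(f"LM = {lm:.2f}, p = {stats.chi2.sf(lm, 1):.4f}")

    # (4) If the LM test rejects pooling, estimate the random effects model.
    print(RandomEffects(y, X).fit())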
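Relatedly, a minimal sketch of the residual-variance check suggested by note 6: after a pooled fit, compare the residual variance between the lab and MTurk subsamples. The residuals and sample sizes below are simulated stand-ins, not the study's data.

    import numpy as np
    from scipy import stats

    # Hypothetical residuals from a pooled fit, split by setting.
    rng = np.random.default_rng(1)
    resid_lab = rng.normal(scale=1.0, size=120)
    resid_mturk = rng.normal(scale=1.6, size=120)

    # Pooling imposes a single residual variance on both settings; Levene's
    # test checks whether that equal-variance assumption is tenable.
    w, p = stats.levene(resid_lab, resid_mturk)
    ratio = resid_mturk.var(ddof=1) / resid_lab.var(ddof=1)
    print(f"variance ratio (MTurk/lab) = {ratio:.2f}")
    print(f"Levene W = {w:.2f}, p = {p:.4f}")
    # A small p-value argues for estimating the settings separately or
    # using heteroskedasticity-robust standard errors instead of pooling.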
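Finally, the arithmetic behind the first CRT answer in note 7, which trips up the intuitive response of 10 cents (a 10-cent ball would make the bat cost only 90 cents more). With b denoting the ball's price in dollars:

    \begin{aligned}
    b + (b + 1.00) &= 1.10 \\
    2b &= 0.10 \\
    b &= 0.05 \quad \text{(5 cents, so the bat costs \$1.05)}
    \end{aligned}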

References

  • Aïmeur E, Lawani O, Dalkir K (2016) When changing the look of privacy policies affects user trust: an experimental study. Comput Hum Behav 58:368–379

  • Amir O, Rand DG, Gal YK (2012) Economic games on the internet: the effect of $1 stakes. PLoS ONE 7(2):e31461

  • Barber BM, Odean T (2000) Trading is hazardous to your wealth: the common stock investment performance of individual investors. J Finance 55(2):773–806

  • Bennouri M, Gimpel H, Robert J (2011) Measuring the impact of information aggregation mechanisms: an experimental investigation. J Econ Behav Organ 78(3):302–318

  • Berg JE, Rietz TA (2003) Prediction markets as decision support systems. Inf Syst Front 5(1):79–93

  • Berg JE, Nelson FD, Rietz TA (2008) Prediction market accuracy in the long run. Int J Forecast 24(2):285–300

  • Berinsky AJ, Huber GA, Lenz GS (2012) Evaluating online labor markets for experimental research: Amazon.com's Mechanical Turk. Polit Anal 20(3):351–368

  • Bichler M, Kersten G, Strecker S (2003) Towards a structured design of electronic negotiations. Group Decis Negot 12(4):311–335

  • Blohm I, Riedl C, Leimeister JM, Krcmar H (2011) Idea evaluation mechanisms for collective intelligence in open innovation communities: do traders outperform raters? In: Proceedings of the thirty-second international conference on information systems (ICIS 2011), Shanghai, China

  • Breusch TS, Pagan AR (1980) The Lagrange multiplier test and its applications to model specification in econometrics. Rev Econ Stud 47:239–253

  • Buhrmester M, Kwang T, Gosling SD (2011) Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality data? Perspect Psychol Sci 6(1):3–5

  • Casey LS, Chandler J, Levine AS, Proctor A, Strolovitch DZ (2017) Intertemporal differences among MTurk workers: time-based sample variations and implications for online data collection. SAGE Open. https://doi.org/10.1177/2158244017712774

  • Chandler D, Kapelner A (2013) Breaking monotony with meaning: motivation in crowdsourcing markets. J Econ Behav Organ 90:123–133

  • Chen DL, Horton JJ (2016) Are online labor markets spot markets for tasks? A field experiment on the behavioral response to wage cuts. Inf Syst Res 27(2):403–423

  • Chilton LB, Horton JJ, Miller RC, Azenkot S (2010) Task search in a human computation market. In: Proceedings of the ACM SIGKDD workshop on human computation (HCOMP '10), New York, NY

  • Djamasbi S, Tulu B, Loiacono E, Whitefleet-Smith J (2008) Can a reasonable time limit improve the effective usage of a computerized decision aid? Commun Assoc Inf Syst 23:22

  • Fair RC, Shiller RJ (1989) The informational content of ex ante forecasts. Rev Econ Stat 71:325–331

  • Ferreira A, Antunes P, Herskovic V (2011) Improving group attention: an experiment with synchronous brainstorming. Group Decis Negot 20(5):643–666

  • Frederick S (2005) Cognitive reflection and decision making. J Econ Perspect 19(4):25–42

  • Graves JT, Acquisti A, Anderson R (2014) Experimental measurement of attitudes regarding cybercrime. In: 13th annual workshop on the economics of information security (WEIS 2014), University Park/State College, PA

  • Hanson R (2003) Combinatorial information market design. Inf Syst Front 5(1):107–119

  • Healy PJ, Linardi S, Lowery JR, Ledyard JO (2010) Prediction markets: alternative mechanisms for complex environments with few traders. Manag Sci 56(11):1977–1996

  • Horton JJ, Rand DG, Zeckhauser RJ (2011) The online laboratory: conducting experiments in a real labor market. Exp Econ 14(3):399–425

  • Jian L, Sami R (2012) Aggregation and manipulation in prediction markets: effects of trading mechanism and information distribution. Manag Sci 58(1):123–140

  • Jilke S, Van Ryzin GG, Van de Walle S (2015) Responses to decline in marketized public services: an experimental evaluation of choice overload. J Public Adm Res Theor 26(3):421–432

  • Jones JL, Collins RW, Berndt DJ (2009) Information markets: a research landscape. Commun Assoc Inf Syst 25(1):27

  • Kaufmann N, Schulze T, Veit D (2011) More than fun and money: worker motivation in crowdsourcing—a study on Mechanical Turk. In: Proceedings of the 17th Americas conference on information systems (AMCIS 2011), Detroit, MI

  • Kern R, Thies H, Satzger G (2011) Efficient quality management of human-based electronic services leveraging group decision making. In: Proceedings of the 19th European conference on information systems (ECIS 2011), Helsinki, Finland

  • Kersten G, Noronha S (1999) Negotiation via the world wide web: a cross-cultural study of decision making. Group Decis Negot 8(3):251–279

  • Kersten G, Köszegi ST, Vetschera R (2002) The effects of culture in anonymous negotiations: experiment in four countries. In: Proceedings of the 35th Hawaii international conference on system sciences (HICSS-35 '02), Big Island, HI

  • Landemore H, Elster J (eds) (2012) Collective wisdom: principles and mechanisms. Cambridge University Press, New York

  • Lavoie J (2009) The innovation engine at Rite-Solutions: lessons from the CEO. J Predict Mark 3:1–11

  • Ledyard J, Hanson R, Ishikida T (2009) An experimental test of combinatorial information markets. J Econ Behav Organ 69(2):182–189

  • Levy Y, Ellis TJ (2011) A guide for novice researchers on experimental and quasi-experimental studies in information systems research. Interdiscip J Inf Knowl Manag 6:151–161

  • Malone TW, Laubacher R, Dellarocas C (2010) The collective intelligence genome. MIT Sloan Manag Rev 51(3):21–31

  • Mao A, Chen Y, Gajos KZ, Parkes D, Procaccia AD, Zhang H (2012) TurkServer: enabling synchronous and longitudinal online experiments. In: Proceedings of the fourth workshop on human computation (HCOMP '12), Toronto, Canada

  • Mason W, Suri S (2012) Conducting behavioral research on Amazon's Mechanical Turk. Behav Res Methods 44(1):1–23

  • Mullinix KJ, Leeper TJ, Druckman JN, Freese J (2015) The generalizability of survey experiments. J Exp Polit Sci 2:109–138

  • Nagar Y, Malone TW (2011) Making business predictions by combining human and machine intelligence in prediction markets. In: Proceedings of the thirty-second international conference on information systems (ICIS 2011), Shanghai, China

  • Palvia P, Leary D, Mao E, Midha V, Pinjani P, Salam AF (2004) Research methodologies in MIS: an update. Commun Assoc Inf Syst 14:24

  • Paolacci G, Chandler J, Ipeirotis P (2010) Running experiments on Amazon Mechanical Turk. Judgm Decis Mak 5(5):411–419

  • Pilz D, Gewald H (2013) Does money matter? Motivational factors for participation in paid- and non-profit-crowdsourcing communities. In: 11th international conference on Wirtschaftsinformatik, Leipzig, Germany, pp 577–591

  • Pinsonneault A, Barki H, Gallupe RB, Hoppen N (1999) Electronic brainstorming: the illusion of productivity. Inf Syst Res 10(2):110–133

  • Plott CR, Sunder S (1988) Rational expectations and the aggregation of diverse information in laboratory security markets. Econometrica 56(5):1085–1118

  • Qiu L, Rui H, Whinston A (2011) A Twitter-based prediction market: social network approach. In: Proceedings of the thirty-second international conference on information systems (ICIS 2011), Shanghai, China

  • Ross J, Zaldivar A, Irani L, Tomlinson B (2009) Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Technical report, University of California, Irvine, CA

  • Slamka C, Luckner S, Seemann T, Schröder J (2008) An empirical investigation of the forecast accuracy of play-money prediction markets and professional betting markets. In: Proceedings of the 16th European conference on information systems (ECIS 2008), Galway, Ireland, paper 236

  • Spann M, Skiera B (2003) Internet-based virtual stock markets for business forecasting. Manag Sci 49(10):1310–1326

  • Straub T, Gimpel H, Teschner F, Weinhardt C (2014) Feedback and performance in crowd work: a real effort experiment. In: Proceedings of the 22nd European conference on information systems (ECIS)

  • Straub T, Gimpel H, Teschner F, Weinhardt C (2015) How (not) to incent crowd workers. Bus Inf Syst Eng 57:167–179

  • Teschner F, Mazarakis A, Riordan R, Weinhardt C (2011) Participation, feedback & incentives in a competitive forecasting community. In: Proceedings of the international conference on information systems (ICIS 2011), Shanghai, China

  • Teschner F, Rothschild D, Gimpel H (2017) Manipulation in conditional decision markets. Group Decis Negot. https://doi.org/10.1007/s10726-017-9531-0

  • Wolfers J, Zitzewitz E (2004) Prediction markets. J Econ Perspect 18(2):107–126


Author information

Corresponding author

Correspondence to Henner Gimpel.

About this article


Cite this article

Teschner, F., Gimpel, H. Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments. Group Decis Negot 27, 197–214 (2018). https://doi.org/10.1007/s10726-018-9565-y

