Abstract
Crowd labor markets such as Amazon Mechanical Turk (MTurk) have emerged as popular platforms on which researchers can run web-based experiments relatively inexpensively and easily. Some work even suggests that MTurk can be used to run large-scale field experiments, such as electronic markets, in which groups of participants interact synchronously in real time. Beyond technical issues, several methodological questions arise, chief among them how results from MTurk and laboratory experiments compare. Our data show comparable results between MTurk and a standard lab setting with student subjects in a controlled environment for rather simple individual decision tasks. For a rather complex market experiment, however, our data show stark differences between the experimental settings. Each experimental setting (lab and MTurk) has its own benefits and drawbacks; which of the two is better suited for a specific experiment depends on the theory or artifact to be tested. We discuss potential causes for the differences that we cannot control for (language understanding, education, cognition, and context) and provide guidance for selecting the appropriate setting for an experiment. In any case, researchers studying complex artifacts such as group decisions or markets should not prematurely adopt MTurk based on extant literature reporting comparable results across experimental settings for rather simple tasks.
Notes
https://www.mturk.com/mturk/, accessed on February 20, 2018.
We do not suggest that information systems experiments are per se more complex than experiments in other disciplines. However, methodological papers comparing lab experiments to MTurk experiments have so far focused on rather simple settings such as the ultimatum game or the prisoner's dilemma, which are arguably less complex than the information markets studied here.
The lab experiment reported here is part of a larger series of lab experiments. In other settings, it is relevant to have two parallel markets and decide among them. For consistency, we used the same setting with two parallel markets here. As a downside, it increases complexity for subjects. However, having multiple parallel markets is common in most real-world applications of information markets.
Note that there are technical solutions for running web-based group experiments, such as Lioness and NodeGame.
The decision for random rather than fixed effects or pooled regression is based on theoretical and empirical arguments. On the theoretical side, pooled regression is not adequate as it does not account for the interdependencies in the data. A fixed effects model would rule out time-invariant heterogeneity between cohorts, which is not desirable in an analysis that considers data from both the lab and MTurk. On the empirical side, we tested the appropriateness of each of the three modeling approaches for each of the regression models reported in the following. We used an F test to detect potential significant increases in goodness-of-fit of a fixed effects model over a pooled regression model. We did not find evidence for such increases and, thus, do not further consider fixed effects models. Further, we used the Lagrange Multiplier test to examine random effects (Breusch and Pagan 1980). We found significant evidence for the existence of random effects for multiple regression models. For consistency and on theoretical grounds, we apply two-way random effects models throughout.
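The Breusch and Pagan (1980) Lagrange Multiplier test referenced here can be sketched in a few lines. The function below is a minimal pure-Python illustration for the balanced one-way case (the panel of residuals is hypothetical and not from the experiment); it is a sketch of the test logic, not a reproduction of the authors' analysis.

```python
def breusch_pagan_lm(residuals):
    """Breusch-Pagan (1980) LM test for random effects on a balanced
    panel of OLS residuals, given as a list of per-group lists
    (N groups, T periods each). The statistic is asymptotically
    chi-squared with 1 degree of freedom under the null of no
    random effects."""
    n = len(residuals)      # number of groups (e.g., cohorts)
    t = len(residuals[0])   # observations per group
    # sum over groups of (sum of residuals within group)^2
    sum_sq_group = sum(sum(e) ** 2 for e in residuals)
    # sum of all squared residuals
    sum_sq_all = sum(e_it ** 2 for e in residuals for e_it in e)
    return (n * t) / (2 * (t - 1)) * (sum_sq_group / sum_sq_all - 1) ** 2

# Toy panel with strong within-group correlation of residuals:
clustered = [[1.0, 1.0], [-1.0, -1.0]]
print(breusch_pagan_lm(clustered))  # 2.0 on this toy panel
```

A statistic above the 5% chi-squared critical value of about 3.84 would, as in the analysis described above, point toward a random effects specification rather than pooled regression.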
Note that by pooling the data from both settings we constrain the variance of the residuals to be the same for both settings, even though this need not hold given the different settings from which the data originate.
In the cognitive reflection test (CRT), participants answer the following three questions: (1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? (correct answer: 5 cents) (2) If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets? (correct answer: 5 min) (3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? (correct answer: 47 days)
The speculation is based on a general perception that education is positively correlated with intelligence and the fact that MTurk workers are more representative of the general population than students are (Paolacci et al. 2010). We did not perform intelligence tests with the participants.
References
Aïmeura E, Lawani O, Dalkir K (2016) When changing the look of privacy policies affects user trust: an experimental study. Comput Hum Behav 58:368–379
Amir O, Rand DG, Gal YK (2012) Economic games on the internet: the effect of $1 stakes. PLoS ONE 7(2):e31461
Barber BM, Odean T (2000) Trading is hazardous to your wealth: the common stock investment performance of individual investors. J Finance 55(2):773–806
Bennouri M, Gimpel H, Robert J (2011) Measuring the impact of information aggregation mechanisms: an experimental investigation. J Econ Behav Organ 78(3):302–318
Berg JE, Rietz TA (2003) Prediction markets as decision support systems. Inf Syst Front 5(1):79–93
Berg JE, Nelson FD, Rietz TA (2008) Prediction market accuracy in the long run. Int J Forecast 24(2):285–300
Berinsky AJ, Huber GA, Lenz GS (2012) Evaluating online labor markets for experimental research: Amazon. com’s mechanical turk. Polit Anal 20(3):351–368
Bichler M, Kersten G, Strecker S (2003) Towards a structured design of electronic negotiations. Group Decis Negot 12(4):311–335
Blohm I, Riedl C, Leimeister JM, Krcmar H (2011) Idea evaluation mechanisms for collective intelligence in open innovation communities: do traders outperform raters? In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Breusch TS, Pagan AR (1980) The Lagrange multiplier test and its applications to model specification in econometrics. Rev Econ Stud 47:239–253
Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s mechanical turk: a new source of inexpensive, yet high-quality data? Perspect Psychol Sci 6(1):3–5
Casey LS, Chandler J, Levine AS, Proctor A, Strolovitch DZ (2017) Intertemporal differences among MTurk workers: time-based sample variations and implications for online data collection. SAGE Open. https://doi.org/10.1177/2158244017712774
Chandler D, Kapelner A (2013) Breaking monotony with meaning: motivation in crowdsourcing markets. J Econ Behav Organ 90:123–133
Chen DL, Horton JJ (2016) Are online labor markets spot markets for tasks? A field experiment on the behavioral response to wage cuts. Inf Syst Res 27(2):403–423
Chilton LB, Horton JJ, Miller RC, Azenkot S (2010) Task search in a human computation market. In: Proceedings of the ACM SIGKDD workshop on human computation (HCOMP ‘10), New York, NY
Djamasbi S, Bengisu B, Loiacono E, Whitefleet-Smith J (2008) Can a reasonable time limit improve the effective usage of a computerized decision aid? Commun Assoc Inf Syst 23:22
Fair RC, Shiller RJ (1989) The informational context of ex-ante forecasts. Rev Econ Stat 71:325–331
Ferreira A, Antunes P, Herskovic V (2011) Improving group attention: an experiment with synchronous brainstorming. Group Decis Negot 20(5):643–666
Frederick S (2005) Cognitive reflection and decision making. J Econ Perspect 19(4):25–42
Graves JT, Acquisti A, Anderson R (2014) Experimental measurement of attitudes regarding cybercrime. In: 13th annual workshop on the economics of information security (WEIS 2014), University Park/State College, PA
Hanson R (2003) Combinatorial information market design. Inf Syst Front 5(1):107–119
Healy PJ, Linardi S, Lowery JR, Ledyard JO (2010) Prediction markets: alternative mechanisms for complex environments with few traders. Manag Sci 56(11):1977–1996
Horton JJ, Rand DG, Zeckhauser RJ (2011) The online laboratory: conducting experiments in a real labor market. Exp Econ 14(3):399–425
Jian L, Sami R (2012) Aggregation and manipulation in prediction markets: effects of trading mechanism and information distribution. Manag Sci 58(1):123–140
Jilke S, Van Ryzin GG, Van de Walle S (2015) Responses to decline in marketized public services: an experimental evaluation of choice overload. J Public Adm Res Theor 26(3):421–432
Jones JL, Collins RW, Berndt DJ (2009) Information markets: a research landscape. Commun Assoc Inf Syst 25(1):27
Kaufmann N, Schulze T, Veit D (2011) More than fun and money. Worker motivation in crowdsourcing—a study on mechanical turk. In: Proceedings of the 17th Americas conference on information systems (AMCIS 2011), Detroit, MI
Kern R, Thies H, Satzger G (2011) Efficient quality management of human-based electronic services leveraging group decision making. In: Proceedings of the 19th European conference on information systems (ECIS 2011), Helsinki, Finland
Kersten G, Noronha S (1999) Negotiation via the world wide web: a cross-cultural study of decision making. Group Decis Negot 8(3):251–279
Kersten G, Köszegi ST, Vetschera R (2002) The effects of culture in anonymous negotiations: experiment in four countries. In: Proceedings of the 35th Hawaii international conference on system sciences (HICSS-35’02), Big Island, HI
Landemore H, Elster J (eds) (2012) Collective wisdom: principles and mechanisms. Cambridge University Press, New York
Lavoie J (2009) The innovation engine at Rite-Solutions: lessons from the CEO. J Predict Mark 3:1–11
Ledyard J, Hanson R, Ishikida T (2009) An experimental test of combinatorial information markets. J Econ Behav Organ 69(2):182–189
Levy Y, Ellis TJ (2011) A guide for novice researchers on experimental and quasi-experimental studies in information systems research. Interdiscip J Inf Knowl Manag 6:151–161
Malone TW, Laubacher R, Dellarocas C (2010) The collective intelligence genome. MIT Sloan Manag Rev 51(3):21–31
Mao A, Chen Y, Gajos KZ, Parkes D, Procaccia AD, Zhang H (2012) TurkServer: enabling synchronous and longitudinal online experiments. In: Proceedings of the fourth workshop on human computation (HCOMP ‘12), Toronto, Canada
Mason W, Suri S (2012) Conducting behavioral research on Amazon’s mechanical turk. Behav Res Methods 44(1):1–23
Mullinix KJ, Leeper TJ, Druckman JN, Freese J (2015) The generalizability of survey experiments. J Exp Polit Sci 2:109–138
Nagar Y, Malone TW (2011) Making business predictions by combining human and machine intelligence in prediction markets. In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Palvia P, Leary D, Mao E, Midha V, Pinjani P, Salam AF (2004) Research methodologies in MIS: an update. Commun Assoc Inf Syst 14:24
Paolacci G, Chandler J, Ipeirotis P (2010) Running experiments on Amazon mechanical turk. Judgm Decis Mak 5(5):411–419
Pilz D, Gewald H (2013) Does money matter? Motivational factors for participation in paid-and non-profit-crowdsourcing communities. In: 11th International conference on Wirtschaftsinformatik, Leipzig, Germany, pp 577–591
Pinsonneault A, Barki H, Gallupe RB, Hoppen N (1999) Electronic brainstorming: the illusion of productivity. Inf Syst Res 10(2):110–133
Plott CR, Sunder S (1988) Rational expectations and the aggregation of diverse information in laboratory security markets. Econom J Econom Soc 56(5):1085–1118
Qiu L, Rui H, Whinston A (2011) A twitter-based prediction market: social network approach. In: Proceedings of the thirty second international conference on information systems (ICIS 2011), Shanghai, China
Ross J, Zaldivar A, Irani L, Tomlinson B (2009) Who are the turkers? Worker demographics in Amazon mechanical turk. Technical report, University of California, Irvine, CA
Slamka C, Luckner S, Seemann T, Schröder J (2008) An empirical investigation of the forecast accuracy of play-money prediction markets and professional betting markets. In: Proceedings of the 16th European conference on information systems (ECIS 2008), Galway, Ireland, paper 236
Spann M, Skiera B (2003) Internet-based virtual stock markets for business forecasting. Manag Sci 49(10):1310–1326
Straub T, Gimpel H, Teschner F, Weinhardt C (2014) Feedback and performance in crowd work: a real effort experiment. In: Proceedings of the 22nd European conference on information systems (ECIS)
Straub T, Gimpel H, Teschner F, Weinhardt C (2015) How (not) to incent crowd workers. Bus Inf Syst Eng 57:167–179
Teschner F, Mazarakis A, Riordan R, Weinhardt C (2011) Participation, feedback & incentives in a competitive forecasting community. In: Proceedings of the international conference on information systems (ICIS 2011), Shanghai, China
Teschner F, Rothschild D, Gimpel H (2017) Manipulation in conditional decision markets. Group Decis Negot. https://doi.org/10.1007/s10726-017-9531-0
Wolfers J, Zitzewitz E (2004) Prediction markets. J Econ Perspect 18(2):107–126
Teschner, F., Gimpel, H. Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments. Group Decis Negot 27, 197–214 (2018). https://doi.org/10.1007/s10726-018-9565-y