Natural Language & Linguistic Theory, Volume 34, Issue 2, pp. 481–495

A streamlined approach to online linguistic surveys

  • Michael Yoshitaka Erlewine
  • Hadas Kotek


Abstract

More and more researchers in linguistics use large-scale experiments to test hypotheses about the data they study, in addition to more traditional informant work. In this paper we describe turktools, a new set of free, open-source tools that allow linguists to post studies online. These tools support a wide range of linguistic tasks, including grammaticality surveys, sentence completion tasks, and picture-matching tasks, making large-scale linguistic studies easy to implement. They further help streamline the design of such experiments and assist in the extraction and analysis of the resulting data. Surveys created using these tools can be posted on Amazon’s Mechanical Turk, a popular crowdsourcing platform that mediates between ‘Requesters’, who post surveys online, and ‘Workers’, who complete them. As a result, many linguistic surveys can be completed within hours or days and at relatively low cost. Alternatively, researchers can host these randomized experiments on their own servers using a supplied server-side component.
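The randomized designs mentioned above typically counterbalance items across conditions so that each participant sees every item in exactly one condition. A minimal sketch of this standard Latin-square list construction in Python (the function name and representation are illustrative only, not the turktools API):

```python
def latin_square_lists(n_items, n_conditions):
    """Return one list of (item, condition) pairs per survey version.

    Each version shows every item exactly once, and item-condition
    pairings rotate across versions, so each item appears in every
    condition across the full set of lists.
    """
    lists = []
    for version in range(n_conditions):
        pairs = [(item, (item + version) % n_conditions)
                 for item in range(n_items)]
        lists.append(pairs)
    return lists

# Four items in two conditions yield two counterbalanced lists.
for lst in latin_square_lists(4, 2):
    print(lst)
```

In practice each list would then be shuffled and interleaved with fillers before being shown to a participant; the sketch covers only the counterbalancing step.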


Keywords: Experimental methods · Online surveys · Web-based experiments · Crowdsourcing · Amazon Mechanical Turk · Software



Acknowledgments

For helpful comments and discussion of this paper and the associated tools, we would like to thank Martin Hackl, David Pesetsky, Coppe van Urk, and participants of our 2013 workshop at MIT on designing linguistic experiments for Mechanical Turk. The current paper has also greatly benefited from the feedback of four anonymous NLLT reviewers, as well as the editor Marcel den Dikken. Any and all errors are ours.

Supplementary material

11049_2015_9305_MOESM1_ESM.pdf (PDF 326 kB)



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. National University of Singapore, Singapore
  2. McGill University, Montreal, Canada
