A streamlined approach to online linguistic surveys


More and more researchers in linguistics use large-scale experiments to test hypotheses about the data they research, in addition to more traditional informant work. In this paper we describe turktools, a new set of free, open-source tools that allow linguists to post studies online. These tools support the creation of a wide range of linguistic tasks, including grammaticality surveys, sentence completion tasks, and picture-matching tasks, making large-scale linguistic studies easy to implement. Our tools further help streamline the design of such experiments and assist in the extraction and analysis of the resulting data. Surveys created using the tools described in this paper can be posted on Amazon’s Mechanical Turk service, a popular crowdsourcing platform that mediates between ‘Requesters’ who post surveys online and ‘Workers’ who complete them. This allows many linguistic surveys to be completed within hours or days and at relatively low cost. Alternatively, researchers can host these randomized experiments on their own servers using a supplied server-side component.

  1. In AMT jargon, these tasks are called Human Intelligence Tasks, or HITs. The organization of linguistic surveys into HITs will be discussed in the Appendix (online).

  2. The process of designing an experiment can itself be very valuable. As is often the case, expanding the scope of one’s investigation can lead to interesting findings about the factors that affect the phenomenon in question. Although this goal by itself can be achieved without experimentation, we believe that the exercise of turning a theoretical research question into a testable set of experimental predictions can inform one’s thinking about the problem.

  3. Participants in university lab settings tend to be college students, and hence have a restricted distribution of age, education, and socio-economic status.

  4. An anonymous reviewer asks whether there has been a comparison of AMT and lab data for tasks involving timing, for example for Self-Paced Reading. To the best of our knowledge, although there is ongoing work attempting to answer this question (see Tily and Gibson 2015), there are no published results.

  5. A screen capture of this map can be found at http://turktools.net/crowdsourcing/.

  6. Data collected on April 24, 2013. The vast majority of experiments were on English and restricted IP addresses of workers to within the US. Our experiments request that workers participate in each experiment only once.

  7. The only quantitative data cited by Fort et al. (2011) to motivate this concern come from Little (2009), who reports that, over a 75-day period in their lab at MIT’s Computer Science and Artificial Intelligence Lab, 22% of their workers completed 80% of the tasks that they posted on AMT. However, those tasks were not linguistic experiments that request that workers participate only once per experiment, unlike the experiments behind the results we report above from the Experimental Syntax-Semantics Lab at MIT.

  8. For example, Cowart (1997, 2012) gives practical suggestions for systematically constructing item paradigms in Excel. Myers (2009b) presents MiniJudge, a tool designed to facilitate this process of constructing linguistic stimuli online.

  9. Our supplied skeletons support choices introduced with buttons below the sentence, as in Fig. 1, or with a drop-down menu.

  10. Some of these modifications require custom JavaScript programming in the template. Our own templates utilize the jQuery JavaScript library (http://jquery.com/), and we recommend its use for such custom programming.

  11. Turktools is an ongoing, open-source project. The documentation will be continuously updated as necessary, and we encourage contributions by other users. Details can be found at: http://turktools.net/use/.

  12. At the time of writing, these tools require the use of Python 2.6.x or 2.7.x, available at http://python.org. However, the tools described here and their prerequisites and usage are subject to change. Please consult the latest information at http://turktools.net before using these tools.

  13. In the interest of space, we do not critically review the Gibson et al. (2011) paper and turkolizer tool.

  14. The strengths and increased flexibility of WebExp and Ibex come with a higher technical barrier to entry than turktools, both in experiment creation and in the deployment of experiments. Both are written as server-side software packages that are designed to run on the researchers’ own servers, configured in a particular way. To recruit participants for WebExp/Ibex experiments on AMT, a simple template is used in AMT to redirect participants to the externally-hosted survey. AMT provides a sample template, called “Survey Link,” for such purposes. An additional step of cross-referencing submissions between the AMT and WebExp/Ibex results is then necessary to verify experiment participation before paying participants on AMT.
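
The cross-referencing step mentioned in the last note can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of turktools, WebExp, or Ibex: we assume that the externally hosted survey shows each participant a unique completion code which the worker then pastes into the AMT form, and the column names `worker_id` and `completion_code` are hypothetical rather than the actual export format of either system. The sketch is written for Python 3.

```python
import csv
import io


def verify_participation(amt_rows, survey_rows, code_field="completion_code"):
    """Split AMT submissions into verified and flagged worker IDs.

    A submission counts as verified when its completion code also appears in
    the survey results and has not already been claimed by an earlier
    submission. All field names are hypothetical.
    """
    # Codes that the externally hosted survey actually recorded.
    survey_codes = {
        row[code_field].strip() for row in survey_rows if row.get(code_field)
    }
    approved, flagged = [], []
    claimed = set()  # codes already matched to an AMT submission
    for row in amt_rows:
        code = (row.get(code_field) or "").strip()
        if code in survey_codes and code not in claimed:
            approved.append(row["worker_id"])
            claimed.add(code)
        else:
            # Missing, unknown, or reused code: leave for manual review.
            flagged.append(row["worker_id"])
    return approved, flagged


# Made-up example data standing in for the two exported result files.
AMT_CSV = "worker_id,completion_code\nW1,abc123\nW2,zzz999\nW3,def456\nW4,abc123\n"
SURVEY_CSV = "completion_code\nabc123\ndef456\n"

amt_rows = list(csv.DictReader(io.StringIO(AMT_CSV)))
survey_rows = list(csv.DictReader(io.StringIO(SURVEY_CSV)))

approved, flagged = verify_participation(amt_rows, survey_rows)
print(approved)  # ['W1', 'W3']
print(flagged)   # ['W2', 'W4']
```

In this sketch a reused code (worker W4) is flagged rather than rejected outright, since deciding whether to pay such a submission is a judgment left to the researcher.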


For helpful comments and discussion of this paper and the associated tools, we would like to thank Martin Hackl, David Pesetsky, Coppe van Urk, and participants of our 2013 workshop at MIT on designing linguistic experiments for Mechanical Turk. The current paper has also greatly benefited from the feedback of four anonymous NLLT reviewers, as well as the editor Marcel den Dikken. Any and all errors are ours.

