Introduction

Fast internet connections and significant technological improvements have made it possible to perform complex experiments within a modern web browser. It is becoming increasingly popular to combine browser-based experiments with recruiting participants on platforms such as Amazon’s Mechanical Turk (MTurk) or Prolific Academic (Palan & Schitter, 2018). There are several reasons why researchers opt for online instead of lab-based testing. The first is efficiency. The recruitment platforms (e.g., MTurk) have access to large numbers of participants, allowing thousands of participants to be tested simultaneously, which would not be possible in a lab-based setting. They are also not restricted to office hours or teaching schedules, and do not require an on-campus presence for participants or researchers. Second, participants from online platforms are a better reflection of the general population than the undergraduate students who typically participate in experiments on campus (Berinsky et al., 2012). Finally, online experiments are more economical, because there is no need to spend time recruiting, scheduling, and testing participants.

Our lab has had an overwhelmingly positive experience with running online studies (Grootswagers et al., 2017, 2018, 2020). While in the early days even relatively simple online studies required extensive JavaScript programming, recent advances have made it much easier to get complex studies up and running (Anwyl-Irvine, Massonnié, Flitton, Kirkham, & Evershed, 2020b; Barnhoorn et al., 2014; De Leeuw, 2015; Henninger et al., 2019; Peirce et al., 2019). These tools generally come with tutorials and hands-on guides, but they are often specific to a single platform or method. Therefore, becoming familiar with the infrastructure, tools, and terminology can be challenging, especially when starting from scratch. This document aims to facilitate this process by introducing the basics of online testing. It is intended to serve as a high-level overview, and to guide the reader to relevant in-depth literature, reviews, and tutorials.

The basics

The core infrastructure needed for online experiments consists of (1) a browser-based experiment, (2) a server to host the experiment, and (3) a participant recruitment tool. Figure 1 illustrates the general infrastructure and workflow for online experiments. Experiments are programmed to run in a browser and are hosted on a server. Participants are recruited from online marketplaces and perform the task on their local machine. The data are uploaded to the hosting server, where the experimenter can collect the results.

Fig. 1 Infrastructure model for online experiments

Creating the experiment

The experiment needs to be able to run in a web browser (e.g., Safari, Google Chrome, Internet Explorer). It therefore needs to be programmed in a language that runs in the browser (i.e., JavaScript, together with HTML and CSS; server-side languages such as PHP only run on the hosting server). The most popular language for online experiments is JavaScript, and several JavaScript libraries (e.g., jsPsych, PsychoJS, OSWeb, Lab.js) are tailored to behavioural experiments. These libraries provide high-level functions for experiment-specific needs, such as stimulus presentation, timing control, randomisation, and response collection. Some (e.g., Lab.js) are accompanied by web-based task builders that allow experiments to be created without any programming. Several free and open-source graphical experiment builders can export experiments as browser-compatible JavaScript code. For example, PsychoPy (Peirce et al., 2019) can export to PsychoJS, and OpenSesame (Mathôt et al., 2012) to OSWeb. There are also commercial solutions that provide experiment builders as part of a complete experiment hosting infrastructure, such as Testable, Inquisit, and Gorilla (Anwyl-Irvine et al., 2020b).
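To give a flavour of what programming against such a library looks like, the sketch below shows a minimal two-event trial (a fixation cross followed by a probe) written against the jsPsych 6 API; the script and stylesheet paths are placeholders that depend on where the library files are hosted, and the stimuli and response keys are arbitrary choices for this example.

    <!DOCTYPE html>
    <html>
      <head>
        <!-- Placeholder paths: point these at wherever the jsPsych 6 files are hosted -->
        <script src="jspsych/jspsych.js"></script>
        <script src="jspsych/plugins/jspsych-html-keyboard-response.js"></script>
        <link rel="stylesheet" href="jspsych/css/jspsych.css">
      </head>
      <body></body>
      <script>
        // Fixation cross: shown for 500 ms, no response allowed
        var fixation = {
          type: 'html-keyboard-response',
          stimulus: '<p style="font-size:48px">+</p>',
          choices: jsPsych.NO_KEYS,
          trial_duration: 500
        };
        // Probe: wait for an F or J keypress; the reaction time is recorded automatically
        var probe = {
          type: 'html-keyboard-response',
          stimulus: '<p style="font-size:48px">X</p>',
          choices: ['f', 'j']
        };
        jsPsych.init({
          timeline: [fixation, probe],
          on_finish: function () {
            // For demonstration only; a real study would send the data to the hosting server
            jsPsych.data.displayData('csv');
          }
        });
      </script>
    </html>

The library handles stimulus presentation, response registration, and data logging; randomisation and more complex designs are built up by adding trials to the timeline.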

Deciding on a suitable experiment creation method is often a matter of personal preference. Experiment builders are easy to use but can lack flexibility. Some JavaScript libraries are also easier to use than others, and the choice can be guided by previous experience with experiment programming. For example, PsychoJS has a similar code structure to its PsychoPy counterpart and may therefore be well suited to those already experienced in coding PsychoPy experiments in Python.

Hosting the experiment

The experiment needs to be accessible over the internet. This involves hosting the experiment code, stimuli, and libraries on a server, so that participants can load the experiment in their web browser. The experiment then runs in the browser on the participant’s computer. When the participant completes the experiment, the script sends their data back to the server, which therefore needs to be able to receive and store experiment data. Several paid hosting services exist that are specifically aimed at collecting behavioural data online, such as Pavlovia, Gorilla, or Inquisit. Alternatively, experiments can be hosted directly on a web server (or a cloud service such as Google or Amazon). This requires knowledge of servers and security technology, but is flexible and allows for secure and private data storage. JATOS (Lange et al., 2015) is an example of a free and open-source application that facilitates setting up and running a web server for hosting online studies.
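As a hedged illustration of this data flow, the sketch below sends the collected trial data from the participant’s browser back to the hosting server at the end of the experiment; the /save-data endpoint and the field names are hypothetical stand-ins for whatever route the chosen hosting solution (or a custom server-side script) actually exposes.

    // Hypothetical upload step: POST the trial data as JSON to the hosting server.
    // '/save-data' is a placeholder for the route provided by the hosting solution
    // or implemented in a custom server-side script.
    function uploadData(trialData, participantId) {
      return fetch('/save-data', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ participant: participantId, trials: trialData })
      }).then(function (response) {
        if (!response.ok) {
          throw new Error('Upload failed with status ' + response.status);
        }
        return response;
      });
    }

    // Example usage at the end of the experiment:
    // uploadData(collectedTrials, participantId).then(showCompletionScreen);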

When choosing a hosting solution, factors to consider are cost, flexibility, and ease of use. Commercial services (e.g., Gorilla or Inquisit) are generally very user-friendly but are also the most expensive option and use their own experiment builders. Pavlovia is a non-commercial, low-cost hosting service that is still user-friendly and accommodates different types of JavaScript experiments. These hosting services all charge a fee per participant or offer limited-term usage licences. In contrast, JATOS is free and open-source software for hosting experiments that is flexible but requires more technical skill to set up on a server.

Recruiting participants

The final step is to recruit participants. This requires an online marketplace where participants can view and sign up for experiments. When they decide to participate, they receive the link (URL) to the experiment server and complete the task. Examples of such marketplaces are SONA Systems (often used for undergraduate testing at universities), MTurk, or Prolific (Palan & Schitter, 2018). To compensate participants for their participation (e.g., with course credit or payment), online experiments often display a unique completion code that participants enter into the recruitment system, so the experimenter can verify their participation. It is also useful to consider participants’ time zones: MTurk workers (largely based in the US), for example, are more likely to be online and see the experiment if it is posted during their daytime. Recruitment systems allow the experimenter to specify how many participants are needed, and some provide additional screening criteria. When all participants have completed the experiment, the researcher can simply download the data from the server and start analysing.
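As a hedged illustration of how the experiment links back to the recruitment platform, the sketch below reads a participant identifier from the experiment URL’s query string and displays a completion code at the end of the session; the parameter name participant and the code EXAMPLE-CODE are placeholders, since parameter names and verification mechanisms differ between platforms.

    // Recruitment platforms typically append identifiers to the experiment URL,
    // e.g. https://example.org/experiment.html?participant=ABC123
    // ('participant' is a placeholder; each platform uses its own parameter names).
    var params = new URLSearchParams(window.location.search);
    var participantId = params.get('participant') || 'unknown';

    // At the end of the experiment, display a completion code that the participant
    // enters on (or is redirected back to) the recruitment platform for verification.
    var completionCode = 'EXAMPLE-CODE';  // placeholder; set per study
    function showCompletionScreen() {
      document.body.innerHTML =
        '<p>Thank you for participating!</p>' +
        '<p>Your completion code is: <strong>' + completionCode + '</strong></p>';
    }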

Frequently asked questions

The basic infrastructure needed for online testing is not overly complex, as noted in the previous section. In addition, the available infrastructure has improved significantly in recent years with the development of more sophisticated hosting solutions and programming libraries. Once one is familiar with these tools, it is possible to go from hypothesis to experimental data within hours. The remainder of this paper covers a number of frequently asked questions regarding online testing.

How good are the data?

Several studies have compared data from online marketplaces to data collected in the lab (Barnhoorn et al., 2014; Crump et al., 2013; de Leeuw & Motz, 2016; Simcox & Fiez, 2014; Zwaan & Pecher, 2012), with overall positive results. Tutorials and reviews suggest that the quality of online experiment data is generally better when experiments are short, pay well, are fun, and have clear instructions. It is worth keeping in mind that participants from online marketplaces (e.g., MTurk) are not as familiar with psychology experiments as undergraduate students. It is therefore essential to provide very clear instructions and, where needed, to include practice trials to ensure participants understand the task.

How good is the timing?

Despite the progress in web-based technology, stimulus and response timing in the browser remains less reliable than that achieved with dedicated equipment in the lab. In general, latencies and their variability are higher in web-based than in lab-based environments. Several studies have assessed the quality of timing in online studies, with encouraging results (Anwyl-Irvine, Dalmaijer, Hodges, & Evershed, 2020a; Bridges et al., 2020; Pronk et al., 2019; Reimers & Stewart, 2015). An online evaluation of a masked priming experiment showed that very short stimulus durations (i.e., under 50 ms) can be problematic (but see Barnhoorn et al., 2014), but other classic experimental psychology paradigms that rely on reaction times (e.g., Stroop, flanker, and Simon tasks) have been successfully replicated online (Crump et al., 2013).
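One reason very short durations are difficult is that browsers can only update the display on the screen’s refresh cycle (roughly every 16.7 ms on a common 60-Hz monitor), so presentation times are effectively quantised to whole frames. The sketch below uses the standard requestAnimationFrame API to log the intervals between successive frames, which gives a rough feel for the display timing granularity on a given participant’s machine.

    // Measure the interval between successive display refreshes.
    // On a 60 Hz screen the intervals should cluster around ~16.7 ms;
    // dropped frames show up as intervals of roughly twice that value.
    var lastTimestamp = null;
    var intervals = [];
    function logFrame(timestamp) {
      if (lastTimestamp !== null) {
        intervals.push(timestamp - lastTimestamp);  // time since the previous frame, in ms
      }
      lastTimestamp = timestamp;
      if (intervals.length < 120) {                 // sample roughly two seconds at 60 Hz
        window.requestAnimationFrame(logFrame);
      } else {
        var mean = intervals.reduce(function (a, b) { return a + b; }, 0) / intervals.length;
        console.log('Mean frame interval: ' + mean.toFixed(2) + ' ms');
      }
    }
    window.requestAnimationFrame(logFrame);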

What are the limitations?

Online experiments only work for some stimulus modalities. While the online approach is well suited to experiments consisting of visual stimuli and keyboard or mouse responses (but see the previous question on timing), other paradigms are harder or impossible to move online. For example, studies requiring auditory stimuli are possible (Cooke et al., 2011; Gibson et al., 2011; Schnoebelen & Kuperman, 2010; Slote & Strand, 2016), but may require a more extensive set-up procedure, such as a check that the participant’s audio equipment works. Presenting stimuli in other modalities, such as tactile or olfactory stimuli, is impossible in an online environment.
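As a hedged example of such a check, the sketch below plays a short test sound and only continues once the participant confirms they heard it; the file name check-tone.mp3 is a placeholder, and a real study might instead ask participants to identify which of several sounds was played.

    // Simple audio check: play a test sound and ask for confirmation before continuing.
    function runAudioCheck(onPassed) {
      var testSound = new Audio('check-tone.mp3');  // placeholder file name
      testSound.addEventListener('ended', function () {
        if (window.confirm('Did you hear the test sound? Click OK to continue.')) {
          onPassed();
        } else {
          window.alert('Please check your speakers or headphones and reload the page.');
        }
      });
      testSound.play().catch(function () {
        // Browsers may block playback until the participant has interacted with the page
        window.alert('Audio could not be played; please click anywhere on the page and try again.');
      });
    }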

A second limitation is the lack of experimental control. For example, while a participant’s screen resolution is reported by the browser, there is no way to know the physical size of the screen or the participant’s distance from it. It is therefore impossible to control the exact visual angle of stimuli, which can be a limiting factor for some experiments. It is also hard to test whether participants are paying attention to the experiment. A common approach is to exclude participants based on their performance on catch trials (Mason & Suri, 2012). Still, there can be a large amount of variability in attention amongst online participants, who may be distracted by other sources while performing experiments, such as listening to the radio, looking at their phone, or watching their children.
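As a hedged illustration of the catch-trial approach, the sketch below computes a participant’s accuracy on trials flagged as catch trials and marks them for exclusion below an 80% threshold; the field names (isCatchTrial, correct) and the threshold are assumptions made for this example rather than a fixed convention.

    // Flag participants whose catch-trial accuracy falls below a chosen threshold.
    function failsAttentionCheck(trials, threshold) {
      threshold = threshold || 0.8;  // assumed 80% cut-off; adjust per study
      var catchTrials = trials.filter(function (t) { return t.isCatchTrial; });
      if (catchTrials.length === 0) {
        return false;                // no catch trials in this session, nothing to check
      }
      var numCorrect = catchTrials.filter(function (t) { return t.correct; }).length;
      return (numCorrect / catchTrials.length) < threshold;
    }

    // Example usage:
    // var exclude = failsAttentionCheck(participantTrials);  // true -> exclude this participant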

Conclusion

Online experiments offer large-scale participant testing in a short time and are cheaper to run than their lab-based counterparts. They can be a suitable option for many research questions but offer less experimental control. This manuscript has provided a high-level overview of the required infrastructure. For more in-depth reading, the reader is referred to the more specialised tutorials and reviews cited above. The JavaScript experiment libraries (e.g., jsPsych, PsychoJS, Lab.js) also have associated hands-on tutorials and include many examples of classic cognitive science experiments, which are a good place to start when programming an online experiment.

Open Practices Statement

Any relevant data and materials are available at https://osf.io/xkdy4